Data Complexity

With the right data on hand we can answer complex questions, produce reliable forecasts and even prescribe how to achieve required outcomes. Management decisions rely on the resulting publications, presented in reports, dashboards and, increasingly, self-service portals. We therefore need to make sure we use the right data as input. However, the still-growing complexity of our data is a challenge for data owners and data stewards, among others.

We describe some, not all, of the elements that contribute to the complexity of data.

Volume

The ever-increasing numbers of producers, owners, users, media and sharing capabilities cause dazzling volumes and growth rates for data. Not so long ago we stored data in ERP, CRM and similar systems. Today we tend to produce more data in unstructured formats, such as documents, spreadsheets and presentations. In many cases this unstructured data is an extract of the structured data, made for analytics and reporting. The publications are then often shared with management and colleagues, tripling the source data.

The reason for this multiplication is often simply that we cannot find the required data in the right format, so we take a “best fit” extract and produce another needle in the haystack, making it even more difficult to find the right data next time. We are in a loop, as you can see in the picture.

We will present more sources and details about volume and growth rates during this course.

Structure

More and more sources become available, such as mobile devices, sensors, cameras, cars and even e-bikes. All these devices produce and consume data via an infrastructure, fixed or mobile. The expanding range of data-producing devices will automatically lead to more types of data, and to a more complex infrastructure to connect them.

For analytics, reports, and other presentations there is a growing demand for interoperability between all these types. We need to find and use technology that enables us to combine data from various sources, types, and formats. Tim Berners-Lee, the inventor of the World Wide Web, already set out some rules in 2006 for a technology we now know as Linked Data. You can read his publication here.
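To make the idea of combining data from different sources and formats more tangible, here is a minimal Python sketch in the spirit of Linked Data. It does not use a real RDF library. It only illustrates the principle of naming things with URIs and describing them as subject, predicate, object statements. The base URI, the two sample extracts and all function names are assumptions invented for this illustration.

```python
import csv
import io
import json

# Hypothetical base URI; example.org is reserved for illustration purposes.
BASE = "http://example.org/"

# Source 1: a CSV extract (for instance from a CRM system), invented here.
CSV_DATA = "id,name\n42,Acme Ltd\n"

# Source 2: a JSON message (for instance from a sensor platform), invented here.
JSON_DATA = '[{"customer": 42, "device": "ebike-7"}]'


def triples_from_csv(text):
    """Turn each CSV row into (subject, predicate, object) statements."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "customer/" + row["id"]
        triples.add((subject, BASE + "name", row["name"]))
    return triples


def triples_from_json(text):
    """Turn each JSON item into statements that link two URIs."""
    triples = set()
    for item in json.loads(text):
        subject = BASE + "customer/" + str(item["customer"])
        device = BASE + "device/" + item["device"]
        triples.add((subject, BASE + "uses", device))
    return triples


def describe(graph, subject):
    """Collect everything known about one subject, whatever the source."""
    return {p: o for s, p, o in graph if s == subject}


# Because both sources use the same URI for the customer, a plain set
# union is enough to combine them.
graph = triples_from_csv(CSV_DATA) | triples_from_json(JSON_DATA)
acme = describe(graph, BASE + "customer/42")
print(acme)
```

The point of the sketch is the shared identifier: once the CSV row and the JSON item refer to the customer by the same URI, the original format of each source no longer matters when the data is combined.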

Abstraction

Many of us know the term abstract from art. Abstraction is the term used for communicating or thinking in concepts that differ from reality, mostly in a simplified form. In computer science we use abstraction to define, create, and apply a framework or template for production according to standardised rules and formats.

This sounds like a good way to reduce data complexity. However, because abstraction models and procedures are themselves not standardised, abstraction in fact increases complexity: we now also have to figure out which templates or frameworks were used when the data types and formats were produced.
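As an illustration of abstraction as a template, the following Python sketch defines an abstract class that prescribes one standardised output format, and two producers that follow it. The class names, field names and the temperature example are assumptions chosen for this illustration only.

```python
from abc import ABC, abstractmethod


class Measurement(ABC):
    """Abstract template: every producer must be able to report Celsius."""

    @abstractmethod
    def value_celsius(self) -> float:
        """Return the temperature in degrees Celsius."""

    def to_record(self) -> dict:
        # The standardised output format shared by all producers.
        return {
            "quantity": "temperature",
            "unit": "C",
            "value": round(self.value_celsius(), 1),
        }


class CelsiusSensor(Measurement):
    """A producer that already measures in Celsius."""

    def __init__(self, reading: float):
        self.reading = reading

    def value_celsius(self) -> float:
        return self.reading


class FahrenheitSensor(Measurement):
    """A producer that measures in Fahrenheit and converts."""

    def __init__(self, reading: float):
        self.reading = reading

    def value_celsius(self) -> float:
        return (self.reading - 32) * 5 / 9


sensors = [CelsiusSensor(100.0), FahrenheitSensor(212.0)]
records = [sensor.to_record() for sensor in sensors]
print(records)
```

Both producers deliver identical records because they follow the same template. A consumer who receives data built on a different, unknown template would first have to discover its units and field names, which is the extra complexity described above.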