Big Data – A New Dimension for Information
Big data extends existing IT concepts to cover the data volumes generated within a specific market segment or in the context of a specific task, for instance the collection, processing and presentation of all information required for a country's national traffic planning during a given time frame.
Based on Experton Group's definition, these data reach a new dimension because they are created from multiple sources within a very short time. The large number of heterogeneous data sources is what differentiates big data from traditional data volumes, such as those held in typical data warehouses.
Big data therefore does not refer to isolated volumes of structured information, as in traditional financial or technical-scientific workloads, which also produced large amounts of data in the past. Rather, it refers to large volumes of structured and unstructured information, whether file-oriented or block-oriented, whose relevance or validity is not verified at the time of creation.
The raw processor performance of individual systems will no longer play the critical role. Instead, mainframes, supercomputers and standard servers will complement each other and work together to process big data.
The challenges arising in the big data context can only partially be addressed with current storage techniques such as virtualization, data deduplication or storage management, because the data volumes are not homogeneous: they consist of structured as well as unstructured data spread across multiple storage systems.
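To make the limitation concrete, the core idea behind data deduplication can be sketched in a few lines: identical blocks are stored only once, identified by a content hash. This is a minimal illustration of the principle only (real systems use variable-length chunking and persistent indexes); all names here are hypothetical, and the sketch also shows why dedup helps little with heterogeneous, mostly unique unstructured data.

```python
import hashlib

def dedup_blocks(blocks):
    """Content-addressed deduplication sketch: keep each unique block once.

    Illustrative only; block boundaries and the in-memory index are
    simplifying assumptions, not how production storage systems work.
    """
    store = {}   # digest -> block content, each unique block stored once
    refs = []    # per-input references into the store
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs

# Two identical blocks collapse into one stored copy.
store, refs = dedup_blocks([b"abc", b"abc", b"xyz"])
```

With structured, repetitive data the store shrinks well; with heterogeneous unstructured data most blocks are unique and little is saved.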
New procedures include parsers that decompose and analyze data streams, new distribution mechanisms for analysis results, new storage approaches that hold these data efficiently, and new organizational rules for collecting, processing and storing information.
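A stream parser of the kind mentioned above might, in its simplest form, split an incoming mixed stream into structured and unstructured buckets for downstream analysis. The following is a hedged sketch under the assumption that each record arrives as one line and structured records are JSON; the function and record names are illustrative, not from any particular product.

```python
import json

def parse_stream(lines):
    """Decompose a mixed record stream into structured and unstructured parts.

    Minimal sketch: one record per line, JSON counts as structured,
    anything else is routed to the unstructured bucket.
    """
    structured, unstructured = [], []
    for line in lines:
        try:
            structured.append(json.loads(line))
        except json.JSONDecodeError:
            unstructured.append(line)
    return structured, unstructured

# Hypothetical traffic-planning records: one sensor reading, one free-text report.
records = ['{"sensor": "a1", "speed": 42}', "free-text incident report"]
parsed, raw = parse_stream(records)
```

Each bucket can then feed a different pipeline, e.g. structured records into a warehouse and free text into an indexing or text-analysis stage.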
De facto real-time provisioning of the gathered information is only possible if this information is processed with new computing technologies that have their roots in supercomputing and academic environments.
Approaches include neural computing and grids consisting of various classes of computer systems running distributed algorithms. Long-forgotten approaches such as "soups" – in which individual, unorganized database entities represent organizational entities and are accessed by all kinds of applications – could also experience a renaissance for big data processing.
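The grid-style distributed processing mentioned above follows a scatter/gather pattern: each node analyzes its own data partition, and the partial results are merged centrally. The sketch below illustrates that pattern in plain Python under stated assumptions (in-process "nodes", a hypothetical `category` field); a real grid would run the map step on separate machines.

```python
from collections import Counter

def map_partition(records):
    # Scatter step: each "node" counts categories in its own partition.
    return Counter(r["category"] for r in records)

def reduce_counts(partials):
    # Gather step: merge the partial results from all nodes.
    total = Counter()
    for partial in partials:
        total += partial
    return total

# Hypothetical traffic observations, already partitioned across two nodes.
partitions = [
    [{"category": "car"}, {"category": "truck"}],
    [{"category": "car"}, {"category": "bike"}],
]
result = reduce_counts(map_partition(p) for p in partitions)
```

Because the map step touches only local data, the same pattern scales from standard servers to supercomputer-class nodes working side by side.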