Big Data can provide companies with very useful information. But a great amount of data is basically useless unless it is converted into something meaningful. Only once they are combined and normalized can they show their full potential.
by Valerio Alessandroni
In 1999 the advent of blogs gave rise to what we would come to know as "Web 2.0": from consumers of content generated by a limited number of sources, Internet users became the authors of that content. The amount of information available exploded, and it did not take long before companies and ICT experts understood that the data generated online, although almost impossible to manage, had huge potential in terms of business intelligence. In 2010, Google's CEO at the time, Eric Schmidt, recalled that five Exabytes (billions of billions of bytes) of information had been created between the dawn of civilization and 2003. Today those same five Exabytes are created every two days.
A seismic shift generating huge amounts of data
Just a few examples are enough to grasp this historic shift. In 2016, the estimated volume of mobile traffic worldwide was 6.2 billion Gbytes (6.2 Exabytes) per month. By 2020, almost 40,000 Exabytes (40 Zettabytes) of data are expected. Over 3.5 billion queries are carried out daily on Google, while Facebook users grow by roughly 22% year after year. To this we might add the 187 million e-mails, 38 million WhatsApp messages and 18 million text messages exchanged every minute, and so on.
It is estimated that by 2020 China alone will generate 20% of all data produced on the planet. It is difficult to imagine figures of this sort. To give an idea, here are a couple of benchmarks. Assuming all the beaches on Earth contain 700.5 billion billion grains of sand, the 40 ZB mentioned above would be equivalent to 57 times that amount. Or, if we could store all 40 ZB of data on Blu-ray discs, the weight of those discs (excluding covers) would equal that of the aircraft carrier Nimitz. The expression "Big Data" refers to such large amounts of data.
Megadata will become the cornerstone of the future Internet 3.0
Of all these data, only about one fourth could prove useful to companies and consumers, if correctly classified and processed. But only 3% of these data are "tagged", and an even lower percentage, estimated at about 0.5%, is actually analyzed. In their raw form, Big Data are indeed difficult to exploit, but this will not always be the case. We can foresee that megadata will become the cornerstone of the future Internet 3.0, or "Semantic Web", which will imply a shift to mass production and targeted consumption. To reach this result, however, tools capable of processing and structuring megadata without excessive costs will be required. Currently, only very few companies have the competence and means to exploit megadata, and this is heavily holding back the development of a market which, by 2027, should be worth 103 billion dollars.
It is however evident that such a huge amount of data cannot be managed and analyzed with traditional methods to extract the information it "hides", and thus to carry out operations such as the predictive maintenance of plants, the analysis of consumer behaviour, market projections and so on.
The five "V"s that characterize Big Data
In general, it may be said that Big Data are characterized by five "V"s: volume, velocity, variety, veracity and value. We have already mentioned volume. Velocity refers to the amount of data accumulated per unit of time, considering that Big Data arrive in a huge and continuous flow.
As for variety, this refers to the nature of the data, which may be structured or unstructured. Structured data are essentially organized data, with a defined length and format, while unstructured and generally unorganized data do not fit into the traditional row-and-column structure of relational databases. Examples of such data are texts, images, videos and so on.
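The distinction can be sketched with a minimal example. The field names and values below are invented for illustration; the point is that a structured record maps directly onto a relational row, while the same information in free text does not.

```python
# Hypothetical sensor event, once as structured and once as
# unstructured data (all names and values are invented).

structured_reading = {            # structured: fixed fields, defined types
    "sensor_id": "S-017",
    "timestamp": "2020-03-01T10:15:00",
    "temperature_c": 71.4,
}

unstructured_reading = (          # unstructured: free text, no fixed layout
    "Sensor S-017 reported about 71 degrees shortly after 10:15 on March 1st."
)

# The structured record maps directly onto a row/column layout;
# the free-text version would first need parsing or tagging.
row = (structured_reading["sensor_id"],
       structured_reading["timestamp"],
       structured_reading["temperature_c"])
print(row)  # → ('S-017', '2020-03-01T10:15:00', 71.4)
```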
Veracity indicates the lack of consistency and the uncertainty in the data: since they come from different sources, the available data may be messy, making it difficult to check their quality and accuracy. In particular, a mass of disorderly data may create confusion, while too few data may provide incomplete information.
Finally, value. It must be remembered that a mass of data is worthless and basically useless unless it is turned into something meaningful from which information can be derived. This is the most important of the five "V"s.
The importance of making decisions based on models and trends
Once combined and normalized, Big Data can display their full potential. For instance, analysis models and algorithms may be applied to identify possible operating and energy savings, future malfunctions of a company's production systems may be predicted, and so on. Making decisions based on the models and trends identified in the gathered data can make the difference in the management of a building. The problem is doing so with a BMS (Building Management System), which is typically designed to command and control the building's systems while data are gathered. A new platform must therefore be associated with the BMS to take care of Data Mining, that is, of the software which can derive useful information from the large amount of available data.
Data Mining: fully exploiting the data available in the company
Data Mining is a process whereby knowledge is extracted from large data banks by applying algorithms which detect "hidden" associations (patterns) in the data and make them visible.
Statistical data analysis may highlight abnormal behaviour and connections between environmental and operating parameters, which may be useful, for instance, to optimize maintenance actions.
It may be discovered that a certain type of fault in an appliance always occurs when there is a voltage drop greater than 5%, or when the appliance is used along with another device.
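A minimal sketch of how such an association could be surfaced, using an invented event log: for each event we record the observed voltage drop and whether a fault occurred, then measure how often faults coincide with drops above the 5% threshold.

```python
# Toy event log (invented data): each record notes the voltage drop (%)
# observed and whether the appliance faulted.
events = [
    {"voltage_drop_pct": 6.2, "fault": True},
    {"voltage_drop_pct": 1.1, "fault": False},
    {"voltage_drop_pct": 7.8, "fault": True},
    {"voltage_drop_pct": 0.4, "fault": False},
    {"voltage_drop_pct": 5.3, "fault": True},
    {"voltage_drop_pct": 2.9, "fault": False},
]

# Conditional frequency: how often does a fault accompany a drop > 5%?
drops = [e for e in events if e["voltage_drop_pct"] > 5.0]
fault_rate = sum(e["fault"] for e in drops) / len(drops)
print(f"Fault rate when drop > 5%: {fault_rate:.0%}")  # → 100% in this toy log
```

A real data-mining system would of course evaluate many candidate conditions over millions of records, but the underlying idea, counting how strongly two variables co-occur, is the same.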
Unlike statistics, which makes it possible to process general information such as unemployment or birth rates, data mining is used to seek connections among several variables relating to individual subjects: by knowing the average behaviour of a telephone company's clients, for instance, it is possible to try to predict how much the average client will spend in the near future. Or, by analyzing the vibrations of a mechanical line shaft, the moment of its breakage may be predicted in advance.
A concrete example in the inspection of hydraulic valves
Data Mining techniques are becoming very important with the progress of Industry 4.0 and the Internet of Things. The latter makes it possible to collect great amounts of data using "intelligent" sensors placed inside objects (tools, electric appliances, industrial devices and so on). These data, however, would be worthless if they did not provide us with useful information.
The purpose of Data Mining is to extract "hidden" information from large amounts of data which, observed as such, would appear random or meaningless.
By evaluating production data, for instance, an important German company managed to reduce the time needed to inspect hydraulic valves by 17.4%.
With 40,000 valves produced each year, the company can count on a saving of 14 days.
When we talk about millions of items, even a few seconds saved may build up rapidly, turning cents into millions of euros.
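The figures above can be checked with a quick back-of-envelope calculation, assuming the saving is spread uniformly over the valves and counting 24-hour days:

```python
# Back-of-envelope check of the figures in the text: a 17.4% shorter
# inspection over 40,000 valves/year adds up to roughly 14 days.
valves_per_year = 40_000
days_saved      = 14
seconds_saved   = days_saved * 24 * 3600           # 1,209,600 s in total
per_valve       = seconds_saved / valves_per_year  # seconds saved per valve
print(f"~{per_valve:.0f} seconds saved per valve")  # → ~30 seconds
```

About 30 seconds per valve: exactly the "few seconds saved" per item that, at scale, accumulates into days and euros.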
The capability of generating new knowledge from Big Data is therefore proving to be one of the key competences of the future.