Interesting statistics around big data, data analysis, and usage

Dr. Rupa Mahanti
Published in Analyst’s corner · Dec 17, 2022 · 4 min read

Data, Data Analysis and Statistics

[Image: Data Analysis, created using photo and elements in Canva]

Evolution of data and big data

Until the advent of computers, only a limited number of facts were collected and documented, given the cost, the scarcity of resources, and the effort needed to capture, store, and maintain them. Even with the advent of computers, high costs and limited storage space meant that the amount of data that could be stored was relatively small. However, advances in technology, the decreasing cost of disk hardware, and the availability of cloud storage have made it possible to capture and store large volumes of data, of many different varieties, generated at great velocity in a relatively short time. In other words, we have big data!

This story presents some interesting statistics related to the generation of data, its volume, and its growth, followed by some interesting statistics on the analysis and usage of data.

Data generation, volume, and growth statistics

  1. Between the dawn of civilization and 2003, five exabytes of data had been created, according to Google’s then-CEO Eric Schmidt. By 2010, this volume of data was being created every two days, and by 2021, every 40 minutes.
  2. By 2024, 149 zettabytes of data will have been copied, collected, and organized. Compared to the 59 zettabytes produced in 2020, this is huge.
  3. Eight megabytes of new data are created every second by every human on the planet.
  4. According to one estimate, 1.145 trillion megabytes of data are created every day.
  5. In 2020, according to Domo, almost 2.5 quintillion bytes of data were generated each day.
  6. According to the same Domo report, in 2020 each individual generated approximately 1.7 MB of data every second.
  7. 80–90% of the data generated today is unstructured. (Source: CIO)
  8. In 2020, the ratio of unique to replicated data was 1:9; by 2024, it will be 1:10. (Source: IDC)
  9. The amount of digital data in the universe is growing at an exponential rate, doubling every two years.
  10. By 2025, the world will produce slightly over 180 zettabytes of data.
  11. As of January 2022, there were 2,701 data centers in the United States, with a further 487 located in Germany. The United Kingdom ranked third with 456 data centers, while China recorded 443.
  12. Data interactions went up by 5,000% between 2010 and 2020 (Source: Forbes). To be more precise, data usage increased from 1.2 trillion gigabytes to almost 60 trillion gigabytes (see the quick arithmetic check after this list).
  13. By one estimate, it would take 181 million years to download all the data on the internet.
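
Several of these figures can be cross-checked with simple unit arithmetic. The short Python sketch below is my own illustration, not part of any cited report; it converts statistics 4, 5, and 12 above into raw bytes, showing that the first two agree to within an order of magnitude and that the third matches the quoted 5,000% growth.

```python
# Sanity-checking some of the statistics above with plain unit arithmetic.
MB = 10**6          # megabyte (decimal units, as storage vendors count)
GB = 10**9          # gigabyte
QUINTILLION = 10**18

# Statistic 4 vs. statistic 5: 1.145 trillion MB/day vs. 2.5 quintillion bytes/day.
daily_stat_4 = 1.145e12 * MB        # ~1.145e18 bytes per day
daily_stat_5 = 2.5 * QUINTILLION    # ~2.5e18 bytes per day, same order of magnitude
print(f"{daily_stat_4:.3e} vs {daily_stat_5:.3e} bytes per day")

# Statistic 12: growth from 1.2 trillion GB (2010) to almost 60 trillion GB (2020).
usage_2010 = 1.2e12 * GB
usage_2020 = 60e12 * GB
growth_pct = (usage_2020 - usage_2010) / usage_2010 * 100
print(f"Increase: {growth_pct:.0f}%")   # 4900%, i.e. roughly the quoted 5,000%
```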

Data analysis and usage

  1. As per Professor Patrick Wolfe, Executive Director of University College London’s Big Data Institute, only about 0.5% of all data is currently analyzed, and that percentage is shrinking as more data is collected.
  2. A 2020 report by MicroStrategy found that around 94% of companies believe that data and analytics are imperative for their growth and digital transformation.
  3. According to Sigma Computing, 63% of companies are unable to gather insights from big data.
  4. Forrester reports suggest that between 60% and 73% of all data is never used for analytics.
  5. An IBM study states that data scientists spend 80% of their time finding, organizing, and cleansing data (that is, improving data quality) and only 20% actually analyzing it; a small illustrative sketch of this cleansing work follows this list.
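
To make the “finding, organizing, and cleansing” point concrete, here is a minimal, hypothetical pandas sketch of the kind of routine cleansing work that consumes most of that 80%. The table, column names, and cleaning rules are my own illustrative assumptions, not taken from the IBM study.

```python
import pandas as pd

# Hypothetical raw extract with the usual quality problems:
# duplicate rows, inconsistent casing, missing values, wrong types.
raw = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, None],
    "name": ["  alice ", "  alice ", "BOB", "carol", "dave"],
    "revenue": ["1200", "1200", "950", None, "300"],
})

clean = (
    raw
    .drop_duplicates()                  # remove replicated rows
    .dropna(subset=["customer_id"])     # drop rows missing the key field
    .assign(
        # standardize name casing and whitespace
        name=lambda df: df["name"].str.strip().str.title(),
        # coerce revenue from strings to numbers (missing values become NaN)
        revenue=lambda df: pd.to_numeric(df["revenue"]),
    )
)
print(clean)
```

Every step here (deduplication, dropping rows with a missing key, standardizing text, fixing data types) is a basic data quality improvement of the sort the statistic refers to.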

Food for thought and the way ahead!

The statistics presented in this story reveal that the volume of data is increasing exponentially every day, and that most companies believe data and analytics will be imperative for their growth and digital transformation. Isn’t it ironic, then, that only a tiny fraction of this enormous volume of data is ever analyzed?

If data is collected but not used, it is not serving any real purpose for the organization. In a way, it is a burden to the organization in terms of storage and maintenance costs, as well as the risk of theft, which may lead to non-compliance costs (for example, penalties and fines) and reputational damage.

Also, the fact that data scientists spend 80% of their time finding, organizing, and cleansing data reflects the poor state of both data quality and data governance. It is not astonishing that a 2022 data quality survey found that poor data quality impacts 26% of companies’ revenue.

Organizations need to invest in data quality and data governance to get the best value from their data.

What do you think?

Please do let me know whether this article was helpful, and what more you would like to read about data, big data, analytics, quality, and governance. Leave a comment here or connect on LinkedIn or ResearchGate.

You may also like the articles in the following lists:

Free Resources for Data Analysis, Data Quality, etc…

Data Quality, Data Analysis, Business Analysis

Thank you for reading! Take care!

Biography: Rupa Mahanti is a consultant, researcher, speaker, data enthusiast, and author of several books on data (data quality, data governance, and data analytics). She is also the publisher of “The Data Pub” newsletter on Substack.
