Big Data, Big Dupe: A Progress Report

My new book, Big Data, Big Dupe, was published early this month. Since its publication, several readers have expressed their gratitude in emails. As you can imagine, this is both heartwarming and affirming. Big Data, Big Dupe confirms what these seasoned data professionals recognized long ago on their own, and in some cases have been arguing for years. Here are a few excerpts from emails that I’ve received:

I hope your book is wildly successful in a hurry, does its job, and then sinks into obscurity along with its topic.  We can only hope! 

I hope this short book makes it into the hands of decision-makers everywhere just in time for their budget meetings… I can’t imagine the waste of time and money that this buzz word has cost over the past decade.

Like yourself I have been doing business intelligence, data science, data warehousing, etc., for 21 years this year and have never seen such a wool over the eyes sham as Big Data…The more we can do to destroy the ruse, the better!

I’m reading Big Data, Big Dupe and nodding my head through most of it. There is no lack of snake oil in the IT industry.

Having been in the BI world for the past 20 years…I lead a small (6 to 10) cross-functional/cross-team collaboration group with like-minded folks from across the organization. We often gather to pontificate, share, and collaborate on what we are actively working on with data in our various business units, among other topics.  Lately we’ve been discussing the Big Data, Big Dupe ideas and how within [our organization] it has become so true. At times we are like ‘been saying this for years!’…

I believe deeply in the arguments you put forward in support of the scientific method, data sensemaking, and the right things to do despite their lack of sexiness.

As the title suggests, I argue in the book that Big Data is a marketing ruse. It is a term in search of meaning. Big Data is not a specific type of data. It is not a specific volume of data. (If you believe otherwise, please identify the agreed-upon threshold in volume that must be surpassed for data to become Big Data.) It is not a specific method or technique for processing data. It is not a specific technology for making sense of data. If it is none of these, what is it?

The answer, I believe, is that Big Data is an unredeemably ill-defined and therefore meaningless term that has been used to fuel a marketing campaign that began about ten years ago to sell data technologies and services. Existing data products and services at the time were losing their luster in public consciousness, so a new campaign emerged to rejuvenate sales without making substantive changes to those products and services. This campaign has promoted a great deal of nonsense and downright bad practices.

Big Data cannot be redeemed by pointing to an example of something useful that someone has done with data and exclaiming “Three cheers for Big Data,” for that useful thing would have still been done had the term Big Data never been coined. Much of the disinformation that’s associated with Big Data is propagated by good people with good intentions who prolong its nonsense by erroneously attributing beneficial but unrelated uses of data to it. When they equate Big Data with something useful, they make a semantic connection that lacks a connection to anything real. That semantic connection is no more credible than attributing a beneficial use of data to astrology. People do useful things with data all the time. How we interact with and make use of data has been gradually evolving for many years. Nothing that is qualitatively different about data or its use emerged roughly ten years ago to correspond with the emergence of the term Big Data.

Although there is no consensus about the meaning of Big Data, one thing is certain: the term is responsible for a great deal of confusion and waste.

I read an article yesterday titled “Big Data – Useful Tool or Fetish?” that exposes some failures of Big Data. For example, it cites the failed $200,000,000 Big Data initiative of the Obama administration. You might think that I would applaud this article, but I don’t. I certainly appreciate the fact that it recognizes failures associated with Big Data, but its argument is logically flawed. Big Data is a meaningless term. As such, Big Data can neither fail nor succeed. By pointing out the failures of Big Data, this article endorses its existence, and in so doing perpetuates the ruse.

The article correctly assigns blame to the “fetishization of data” that is promoted by the Big Data marketing campaign. While Big Data now languishes with an “increasingly negative perception,” skilled professionals and useful technologies, both gradually growing in number, continue to make good use of data, as they always have.


Take care,

P.S. On March 6th, Stacey Barr interviewed me about Big Data, Big Dupe. You can find an audio recording of the interview on Stacey’s website.

4 Comments on “Big Data, Big Dupe: A Progress Report”


By Travis McTeer. March 5th, 2018 at 8:31 am

Ideally, data sensemaking would involve good data and people with the necessary skills to use it. Unfortunately, as the shine continues to wear off of so-called Big Data, the hype is shifting to Self Serve Analytics. So in effect we are moving from p-hacking to outright statistical ignorance. As someone who has spent more than a decade fighting tooth and nail to establish a good data culture at my organization, I fear for the long-term reputation of data analysis as a perceived valuable source of insight.

By Stephen Few. March 5th, 2018 at 8:55 am

Travis,

“Self-service analytics” is yet another marketing campaign that was cooked up by technology vendors. It appeals to a somewhat different audience than the Big Data campaign. Whereas Big Data is shrouded in mystery, self-service analytics demystifies analytics by suggesting that anyone can do it–no skills required–given the right tool. Organizations are suckers for techno-magical solutions. As you no doubt know, turning the thinking over to a machine, especially one that is powered by crappy analytical software, is dangerous. This is true even if the software is good, for humans need to be involved in the loop.

By Vasim Chaudhari. March 30th, 2018 at 9:04 am

Hi Stephen,

I have been reading and following your books and blog, and I derive great learning from them.

Could you please share some of your thoughts on Data Science, Machine Learning, and AI, which are currently much in demand in both Tech and Business?

Thanks,
Vasim

By Stephen Few. March 30th, 2018 at 9:40 am

Hi Vasim,

My opinion about data science is that the term is a misnomer. There is no science of data. People who call themselves data scientists may indeed do useful work, but what they’re doing doesn’t qualify as a scientific domain. The term “computer science” is also an unfortunate misnomer. Computer science departments produce engineers, not scientists.

My opinion of machine learning is that it can certainly be useful when directed by skilled data sensemakers. Effective machine learning is directed by people, not by machines.

My opinion of AI is that it is vastly overhyped, especially as it applies to data sensemaking. True AI, as originally defined (i.e., a computer that exhibits general intelligence), has not been achieved and might never be achieved. Attempts to replace human thinking with machine thinking must be approached with great caution. Most claims about AI being incorporated in data analytics are bogus. Even if computers did possess the general intelligence that is required for data sensemaking, we should be very concerned about allowing them to do our thinking for us. When we surrender tasks to machines that we can handle ourselves with proper training, our ability to perform those tasks diminishes and eventually disappears. We dare not lose our ability to reason. It is essential to our humanity.
