Technology

How “Big Data” Went Bust

And what comes next.


Five years ago—in February 2012—an article in the New York Times’ Sunday Review heralded the arrival of a new epoch in human affairs: “The Age of Big Data.” Society was embarking on a revolution, the article informed us, one in which the collection and analysis of enormous quantities of data would transform almost every facet of life. No longer would data analysis be confined to spreadsheets and regressions: The advent of supercomputing, combined with the proliferation of internet-connected sensors that could record data constantly and send it to the cloud, meant that the sort of advanced statistical analysis described in Michael Lewis’ 2003 baseball book Moneyball could be applied to fields ranging from business to academia to medicine to romance. Not only that, but sophisticated data analysis software could help identify utterly unexpected correlations, such as a relationship between a loan recipient’s use of all caps and his likelihood of defaulting. This would surely yield novel insights that would change how we think about, well, just about everything.

The Times was not the first to arrive at this conclusion: Its story drew on a seminal McKinsey report from 2011 and was buttressed by an official report from the 2012 World Economic Forum in Davos, Switzerland, titled “Big Data, Big Impact.” But the pronouncement by the paper of record seems as apt a milestone as any to mark the era’s onset. The following month, Barack Obama’s White House launched a $200 million national big data initiative, and the frenzy commenced: Academia, nonprofits, governments, and companies raced to figure out just what “big data” was and how they could capitalize on it.

The frenzy, as it turned out, was short-lived. Five years later, data plays a vastly expanded role in our lives, yet the term big data has gone out of fashion—and acquired something of an unsavory reputation. It’s worth looking back at what, exactly, happened to the revolution we were promised, and where data, analytics, and algorithms are headed now.

The tech consulting firm Gartner dropped big data from its famous “hype cycle” report in 2015, and it hasn’t returned. That isn’t because companies were giving up on the concept of mining vast data sets for insights, the company clarified. It’s because the practice had already become so prevalent that it no longer qualified as an “emerging technology.” Big data helps to power the algorithms behind our news feeds, Netflix recommendations, automated stock trades, autocorrect features, and health trackers, among countless other tools. But we’re less likely to use the term big data these days—we just call it data. We’ve begun to take for granted that data sets can contain billions or even trillions of observations and that sophisticated software can detect trends in them.

When the term is still deployed, it’s more likely to carry a pejorative connotation, as in Cathy O’Neil’s 2016 book Weapons of Math Destruction, or Frank Pasquale’s 2015 The Black Box Society. The haste to implement and apply big data, via what’s often called “data-driven decision-making,” resulted in grievous mistakes.

Some were blatant: There was the time Target sent coupons for baby items to the family of a teenage girl who hadn’t told anyone she was pregnant. Or the time Pinterest congratulated single women on their impending marriages. Or the Google Photos snafu, in which the company’s vaunted A.I. mistook black people for gorillas due to a lack of diversity in the data it was trained on. (It’s worth pointing out that, in this case at least, the “big data” wasn’t quite big enough.)

Others were more subtle, and perhaps more insidious. Among these are the types of opaque, data-powered institutional models O’Neil chronicles in her important book: the ones that (arguably) encoded racial bias in recidivism models used by courts to sentence criminals, or fired beloved schoolteachers based on questionable test-score data. And fresh examples of big data gone wrong keep coming—like the Facebook algorithms that evidently helped Russian agents sow division in the American electorate via well-targeted, hyperpartisan fake news.

The problem with “big data” is not that data is bad. It’s not even that big data is bad: Applied carefully, massive data sets can reveal important trends that would otherwise go undetected. It’s the fetishization of data, and its uncritical use, that tends to lead to disaster, as Julia Rose West recently wrote for Slate. And that’s what “big data,” as a catchphrase, came to represent.

By its nature, big data is hard to interpret. When you’re collecting billions of data points—clicks or cursor positions on a website; turns of a turnstile in a large public space; hourly wind speed observations from around the world; tweets—the provenance of any given data point is obscured. This in turn means that seemingly high-level trends might turn out to be artifacts of problems in the data or methodology at the most granular level. But perhaps the bigger problem is that the data you have are usually only a proxy for what you really want to know. Big data doesn’t solve that problem—it magnifies it.

For instance, public opinion polling is widely used as a proxy for how people will vote in an election. But as surprise elections throughout the decades have reminded us—from Tom Bradley’s 1982 loss in the California gubernatorial race on through to Brexit and Trump—there is not always a perfect correspondence between the two. Facebook used to measure users’ interest in a given post mainly by whether they hit the “like” button on it. But as the algorithmically optimized news feed began to be overrun by clickbait, like-bait, and endless baby photos—causing user satisfaction to plunge—the company’s higher-ups gradually realized that “liking” something is not quite the same as actually liking it.

The wider the gap between the proxy and the thing you’re actually trying to measure, the more dangerous it is to place too much weight on it. Take the aforementioned example from early in O’Neil’s book: school districts’ use of mathematical models that tie teacher evaluations to student test scores. Student test scores are a function of numerous important factors outside of a teacher’s control. Part of the draw of big data was that you could find meaningful correlations even in very noisy data sets, thanks to the sheer volume of data, coupled with powerful software algorithms that can theoretically control for confounding variables. The model O’Neil describes, for instance, drew on a wide range of demographic correlations from students across many districts and systems to generate an “expected” set of test scores against which their actual results could be compared. (For this reason, O’Neil considers it an example of “big data,” at least colloquially speaking, even if the data set was not large enough to meet the threshold of some technical definitions of the term.)
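To make that “expected vs. actual” logic concrete, here is a minimal sketch in Python. The data, variables, and model are invented for illustration; this is a generic value-added-style calculation, not the proprietary model O’Neil describes.

```python
# Illustrative sketch of an "expected vs. actual" teacher score.
# All numbers are synthetic; this is NOT the proprietary D.C. model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic students: a prior-year score and a poverty indicator, both
# outside the teacher's control, plus a noisy current-year score.
n = 200
prior = rng.normal(70, 10, n)
poverty = rng.integers(0, 2, n)
actual = 0.8 * prior - 3 * poverty + rng.normal(5, 8, n)

# Predict an "expected" score from the factors the teacher can't control.
X = np.column_stack([prior, poverty])
expected = LinearRegression().fit(X, actual).predict(X)

# A teacher's rating is then driven by her students' average residual:
# how far actual scores land above or below the expected ones.
one_class = slice(0, 25)  # pretend the first 25 students share a teacher
print(f"Value-added estimate: {np.mean(actual[one_class] - expected[one_class]):+.2f} points")
```

With only 25 students in a class, a few unusual scores can swing that residual substantially, which is why the outputs of such models invite close scrutiny.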

Imagine for a second that such a system were applied within the context of a single school—with just the teachers in each grade compared with one another. Without the magic of big data, anomalies in student test scores in a given year would be glaring. No intelligent person examining them could be under the illusion that they corresponded neatly even with the abilities of the students taking them, let alone the teacher they had in a given year. Moreover, it would be relatively easy to investigate them on a case-by-case basis and figure out what was going on.

The system implemented under then-D.C. schools chancellor Michelle Rhee, however, was far more opaque. Because the data set was big rather than small, it had to be crunched and interpreted by a third-party consultant using a proprietary mathematical model. That lent a veneer of objectivity, but it foreclosed the possibility of closely interrogating any given output to see exactly how the model was arriving at its conclusions. O’Neil’s analysis suggested, for instance, that some teachers may have received a low score not because their students performed poorly, but because those same students had performed suspiciously well the prior year—perhaps because the teacher in the grade below had fudged the students’ answers to boost his own rating. But officials confronted with that possibility evinced little interest in diving into the mechanics of the model to confirm it.

That is not to say that student test scores, opinion polls, content-ranking algorithms, or recidivism prediction models need be ignored altogether. Aside from swearing off data and reverting to anecdote and intuition, there are at least two viable ways to deal with the problems that arise from the imperfect relationship between a data set and the real-world outcome you’re trying to measure or predict.

One is, in short: moar data. This has long been Facebook’s approach. When it became apparent that users’ “likes” were a flawed proxy for what they actually wanted to see more of in their feeds, the company responded by adding more and more proxies to its model. It began measuring other things, like the amount of time users spent looking at a post in their feeds, the amount of time they spent reading a story they had clicked on, and whether they hit “like” before or after they had read the piece. When Facebook’s engineers had gone as far as they could in weighting and optimizing those metrics, they found that users were still unsatisfied in important ways. So the company added yet more metrics to the sauce: It started running huge user-survey panels, added new reaction emojis by which users could convey more nuanced sentiments, and started using A.I. to detect clickbait-y language in posts by pages and publishers. The company knows none of these proxies are perfect. But by constantly adding more of them to the mix, it can theoretically edge ever closer to an algorithm that delivers to users the posts that they most want to see.
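Here is a toy sketch of what “adding more proxies” looks like in practice: a ranking score that blends several imperfect engagement signals. The signal names and weights are hypothetical, not Facebook’s actual feed-ranking formula.

```python
# Hypothetical blend of engagement proxies into one ranking score.
from dataclasses import dataclass

@dataclass
class PostSignals:
    liked: bool            # did the user hit "like"?
    dwell_seconds: float   # time spent looking at the post in the feed
    read_seconds: float    # time spent reading the clicked-through story
    survey_score: float    # 0-1 rating from a user-survey panel, if available
    clickbait_prob: float  # model-estimated probability the headline is clickbait

def ranking_score(s: PostSignals) -> float:
    """Weighted mix of imperfect proxies for 'the user wants to see this.'"""
    score = 1.0 if s.liked else 0.0
    score += 0.02 * min(s.dwell_seconds, 60)   # cap so dwell time can't dominate
    score += 0.01 * min(s.read_seconds, 300)
    score += 2.0 * s.survey_score              # direct feedback, weighted heavily
    score -= 1.5 * s.clickbait_prob            # penalize likely clickbait
    return score

print(ranking_score(PostSignals(liked=True, dwell_seconds=12.0,
                                read_seconds=95.0, survey_score=0.8,
                                clickbait_prob=0.1)))
```

Every new signal nudges the score closer to what users actually want, but each one also adds a weight that someone has to choose and defend.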

One downside of the moar data approach is that it’s hard and expensive. Another is that the more variables are added to your model, the more complex, opaque, and unintelligible its methodology becomes. This is part of the problem Pasquale articulated in The Black Box Society. Even the most sophisticated algorithm, drawing on the best data sets, can go awry—and when it does, diagnosing the problem can be nigh-impossible. There are also the perils of “overfitting” and false confidence: The more sophisticated your model becomes, the more perfectly it seems to match up with all your past observations, and the more faith you place in it, the greater the danger that it will eventually fail you in a dramatic way. (Think mortgage crisis, election prediction models, and Zynga.)
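Overfitting is easy to demonstrate on a toy problem. In the sketch below (synthetic data, assumed purely for illustration), a high-degree polynomial hugs a dozen noisy training points almost perfectly, yet it typically predicts fresh data from the same process worse than a simpler fit does.

```python
# Overfitting in miniature: a complex model matches past observations
# closely but tends to generalize worse than a simple one.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train error {train_err:.3f}, test error {test_err:.3f}")
```

The more closely a model matches what it has already seen, the easier it is to mistake that fit for understanding.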

Another possible response to the problems that arise from biases in big data sets is what some have taken to calling “small data.” Small data refers to data sets that are simple enough to be analyzed and interpreted directly by humans, without recourse to supercomputers or Hadoop jobs. Like “slow food,” the term arose as a conscious reaction to the prevalence of its opposite.

Danish author and marketing consultant Martin Lindstrom made the case for it in his 2016 book Small Data: The Tiny Clues That Uncover Big Trends. For example, in the late 1990s and early 2000s, the Danish toymaker Lego had moved toward larger blocks, theme parks, and video games, based on reams of research suggesting that millennials required instant gratification and were more attracted to easier projects than intricate ones. It didn’t work. That data-driven paradigm was finally upended by a much smaller-scale, ethnographic study that its marketers conducted in 2004, interviewing individual children about their most prized possessions. They found that kids developed the most attachment and loyalty to products that allowed them to demonstrate hard-earned mastery—like a pair of old sneakers worn down by hundreds of hours of skateboarding. Lego, in Lindstrom’s telling (he consulted for the company and is an enthusiast himself), refocused on its original, small blocks, and that helped to spur its revival.

Amazon is, in many ways, a leading example of the power of big data. Its data on hundreds of millions of customers’ buying and browsing habits have helped to make it perhaps the most successful retailer the world has ever seen. Yet as author Brad Stone recounts in his book The Everything Store, CEO Jeff Bezos has an interesting (and, for his employees, intimidating) way of counterbalancing all that impersonal analysis. On a somewhat regular basis, he takes an emailed complaint from an individual customer, forwards it to his executive team, and demands that they not only fix it but thoroughly investigate how it happened and prepare a report on what went wrong.

This suggests that Bezos understands not only big data’s power to make systems more efficient, but its potential to obscure the causes and mechanisms of specific problems that aren’t being effectively measured in the aggregate. A safeguard, when making decisions based on things you know how to measure, is to make sure there are also mechanisms by which you can be made aware of the things you don’t know how to measure. “The question is always, what data don’t you collect?” O’Neil said in a phone interview. “What’s the data you don’t see?”

There is some hope, then, that in moving away from “big data” as a buzzword, we’re moving gradually toward a more nuanced understanding of data’s power and pitfalls. In retrospect, it makes sense that the sudden proliferation of data-collecting sensors and data-crunching supercomputers would trigger a sort of gold rush, and that fear of missing out would in many cases trump caution and prudence. It was inevitable that thoughtful people would start to call our collective attention to these cases, and that there would be a backlash, and perhaps ultimately a sort of Hegelian synthesis.

Yet the threats posed by the misuse of big data haven’t gone away just because we no longer speak that particular term in reverent tones. Glance at the very peak of Gartner’s 2017 hype cycle and you’ll find the terms machine learning and deep learning, alongside related terms such as autonomous vehicles and virtual assistants that represent real-world applications of these computing techniques. These are new layers of scaffolding built on the same foundation as big data, and they all rely on it. They’re already leading to real breakthroughs—but we can rest assured that they’re also leading to huge mistakes.