
BOOK REVIEW: How big data exposes everyday lies

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are, by Seth Stephens-Davidowitz

Author Seth Stephens-Davidowitz is a data scientist, someone employed to analyse and interpret complex digital data. He has worked at Google, which hired him after learning about the strength and accuracy of his data research into racism.

His exploration of data has led to fascinating revelations about mental illness, human sexuality, child abuse, abortion, advertising, religion and health. The datasets made possible by the digital explosion, which did not exist a couple of decades ago, offer new perspectives on all manner of issues.

The microscope made it possible to see that there is more to a drop of pond water than we thought, and the telescope showed us there is so much more to the night sky than we imagined. Digital data similarly reveals that there is more to human behaviour and society than we thought, and that the reality is often very different from what we assumed.

“One of the primary goals of this book… is to provide the missing evidence of what can be done with Big Data—how we can find the needles, if you will, in those larger and larger haystacks,” the author explains.

In the past we might have suspected something; now, using Big Data, we can prove it, or show that the world works in precisely the opposite manner.

The author’s grandmother often stressed that having friends in common was a key factor in a successful marriage, as it had been in hers. Is this sound advice?

A team of computer scientists recently analysed the biggest dataset ever assembled on human relationships, Facebook, to answer this question. The data showed that having a common core group of friends is a strong predictor that a relationship will not last. Having separate social circles may actually make relationships stronger.

So why did grandmother believe the opposite of what is true? People tend to exaggerate the relevance of their own experience: we give far too much weight to certain data points, namely ourselves. Similarly, we tend to overestimate the prevalence of anything that makes for a memorable story. Consider, for example, whether more people in OECD countries die in terrorist attacks or by drowning in bathtubs. (The answer is bathtubs!)

Four unique powers of Big Data

The author claims four unique powers of Big Data.

The first power of Big Data is, obviously, new data – data that could not be understood in small quantities.

The second power is providing honest data. Even in the digital age, people hide their thoughts, prejudices and desires from themselves and from others; this is the origin of the book’s title, “Everybody Lies”. However, internet searches, for example, are made in private and with people’s anonymity protected, so in aggregate they are accurate and honest reflections of what people really think.

We can also zoom in on small subsets of people: the third power of Big Data. For example, are people sick with the flu more likely to make flu-related searches? Which searches most closely track housing prices? If searches for schools in a district increase, we can expect housing prices there to change.

The fourth power is that we can do many causal experiments with Big Data. What types of crucial information will make the stock market move? In the US, one answer is the monthly unemployment rate. Financial institutions do whatever they can to maximise the speed with which they receive, analyse and act on this information when making buy or sell decisions. Today, once the labour statistics are released, the market moves in less time than it takes you to blink.

By analysing Big Data, we can also identify information of real value even when we cannot explain why it matters. The size of a racehorse’s heart, and particularly the size of its left ventricle, is the single most important predictor of its success.

In the same vein, horses with small spleens earned virtually nothing. A horse’s pedigree, meanwhile, is a far less reliable predictor of success than we used to believe. This realisation will eventually affect the price of pedigreed horses.

Walmart’s Big Data analysis identified a strong positive correlation between sales of strawberry Pop-Tarts and impending hurricanes. Similarly, the quality of a wine can be explained largely by the weather during the growing season, and much less by the host of other factors we have become used to considering.

If your goal is to predict which wine will excel, what products will sell or which horses will win, you don’t need to be concerned with why your model works. “Just get the numbers right,” Stephens-Davidowitz recommends.

Big Data comes in many forms: not only numbers, but also text and even images. Traditionally, when academics or businesspeople wanted data, they conducted surveys.

Do newspapers influence readers’ left- or right-wing political leanings, or do readers’ leanings influence the newspapers? Using Big Data, researchers have shown that, just as supermarkets identify which ice cream people want and then fill their shelves with it, newspapers identify the viewpoints people want to read and fill their pages with them.

The influence runs in the opposite direction to what many thought. The two big datasets, how people in a district vote and which papers they read, don’t lie.

Pictures are also data, as the changing ways people have posed for photographs show. Researchers studied 949 scanned yearbooks from American high schools, dating from 1905 to 2013, and created an “average” face from the pictures of each decade. The image data showed how Americans, particularly women, started smiling in photos.

People originally thought of photographs as being like paintings, for which you posed for hours; holding a smile that long would have been impossible. When Kodak began associating photos with happiness, smiling for the camera became the way people showed others what a good time they were having.

This is the stuff of science, not pseudoscience. In the past, the world’s most famous linguists analysed individual texts; today they can reveal patterns across billions of books. Yet the methodologies taught to graduate students in psychology, political science, sociology and business have been virtually untouched by the digital revolution.

This book demonstrates how much they have missed. 

Readability:  Light --+-- Serious
Insight:        High -+--- Low
Practical:      High --+-- Low

  • Ian Mann of Gateways consults internationally on leadership and strategy and is the author of Executive Update. Views expressed are his own.