Probably Overthinking It

How to Use Data to Answer Questions, Avoid Statistical Traps, and Make Better Decisions

An essential guide to the ways data can improve decision making.
 
Statistics are everywhere: in news reports, at the doctor’s office, and in every sort of forecast, from the stock market to the weather. Blogger, teacher, and computer scientist Allen B. Downey knows well that people have an innate ability both to understand statistics and to be fooled by them. As he makes clear in this accessible introduction to statistical thinking, the stakes are big. Simple misunderstandings have led to incorrect medical prognoses, underestimated the likelihood of large earthquakes, hindered social justice efforts, and resulted in dubious policy decisions. There are right and wrong ways to look at numbers, and Downey will help you see which are which.
 
Probably Overthinking It uses real data to delve into real examples with real consequences, drawing on cases from health campaigns, political movements, chess rankings, and more. He lays out common pitfalls—like the base rate fallacy, length-biased sampling, and Simpson’s paradox—and shines a light on what we learn when we interpret data correctly, and what goes wrong when we don’t. Using data visualizations instead of equations, he builds understanding from the basics to help you recognize errors, whether in your own thinking or in media reports. Even if you have never studied statistics—or if you have and forgot everything you learned—this book will offer new insight into the methods and measurements that help us understand the world.

256 pages | 126 line drawings, 22 tables | 6 x 9 | © 2023

Mathematics and Statistics

Reviews

“Downey presents a large assortment of graphs and numerical results drawn from legitimate databases and provides clear-cut examples to demonstrate how interpretive pitfalls arise. His style is lively and designed to appeal to the curious reader, and his choice of graphical formats skillfully illustrates his points. He explains challenging issues fully in a clear, logical manner.” 

Choice

“While it eschews the technical density of a textbook, it demands more intellectual engagement than a typical pop science book, drawing readers in with its broad scope of topics and colorful storytelling.”

Implicit Assumptions

“Downey’s pure love for the subject shines through abundantly, as does his social conscience and belief in the importance of statistical methods to illuminate the greatest, most challenging issues of our time.”

Aubrey Clayton, author of Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science

“Probably Overthinking It shows how fascinating and interesting statistics can be. Readers don’t need to be expert mathematicians. They just need to bring their curiosity about the world.”

Ravin Kumar, data scientist at Google

“Probably Overthinking It is a delightful exposition of commonly encountered statistical fallacies and paradoxes and why they matter. The illustrations are powerful and the prose is exceptionally clear. There are few domains of human activity to which the lessons of this volume are not applicable.”

Samuel H. Preston, coauthor of Demography: Measuring and Modeling Population Processes

“Mark Twain once observed that ‘facts are stubborn things, but statistics are more pliable.’ Downey understands just how that happens, even to people who are not trying to obfuscate. It was an honest researcher who in 1971 found data that seemed to indicate smoking by pregnant women might be good for their babies—a misinterpretation that may have delayed anti-smoking measures by a decade. In this clear and cogent analysis, Downey explains why the data was misunderstood, as well as much else. It is a valuable book.”

Floyd Norris, Johns Hopkins University, former chief financial correspondent for the New York Times

Table of Contents

Introduction
1. Are You Normal? Hint: No
2. Relay Races and Revolving Doors
3. Defy Tradition, Save the World
4. Extremes, Outliers, and GOATs
5. Better Than New
6. Jumping to Conclusions
7. Causation, Collision, and Confusion
8. The Long Tail of Disaster
9. Fairness and Fallacy
10. Penguins, Pessimists, and Paradoxes
11. Changing Hearts and Minds
12. Chasing the Overton Window
Epilogue
Acknowledgments
Bibliography
Index

Excerpt

Let me start with a premise: we are better off when our decisions are guided by evidence and reason. By “evidence,” I mean data that is relevant to a question. By “reason,” I mean the thought processes we use to interpret evidence and make decisions. And by “better off,” I mean we are more likely to accomplish what we set out to do and more likely to avoid undesired outcomes.

Sometimes interpreting data is easy. For example, one of the reasons we know that smoking causes lung cancer is that when only 20% of the population smoked, 80% of people with lung cancer were smokers. If you are a doctor who treats patients with lung cancer, it does not take long to notice numbers like that. 
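
The arithmetic behind that comparison can be sketched in a few lines of Python. This is an illustration, not code from the book: the function name `relative_risk` is made up for the example, and it assumes the lung cancer patients come from the same population in which 20% of people smoke, with no adjustment for age or other factors.

```python
# Shares taken from the passage above.
smoker_share_of_population = 0.20
smoker_share_of_cases = 0.80


def relative_risk(share_of_population, share_of_cases):
    """Risk in the exposed group divided by risk in the unexposed group.

    The total number of cases and the population size cancel out,
    so only the two shares are needed.
    """
    exposed = share_of_cases / share_of_population
    unexposed = (1 - share_of_cases) / (1 - share_of_population)
    return exposed / unexposed


rr = relative_risk(smoker_share_of_population, smoker_share_of_cases)
print(f"Smokers are roughly {rr:.0f} times as likely to be among the cases")
```

Under those assumptions, smokers are overrepresented among cases by a factor of 4 and nonsmokers are underrepresented by a factor of 4, which is why the ratio comes out so large.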

But interpreting data is not always that easy. For example, in 1971 a researcher at the University of California, Berkeley, published a paper about the relationship between smoking during pregnancy, the weight of babies at birth, and mortality in the first month of life. He found that babies of mothers who smoke are lighter at birth and more likely to be classified as “low birthweight.” Also, low-birthweight babies are more likely to die within a month of birth, by a factor of 22. These results were not surprising.

However, when he looked specifically at the low-birthweight babies, he found that the mortality rate for children of smokers is lower, by a factor of 2. That was surprising. He also found that among low-birthweight babies, children of smokers are less likely to have birth defects, also by a factor of 2. These results make maternal smoking seem beneficial for low-birthweight babies, somehow protecting them from birth defects and mortality. The paper was influential. In a 2014 retrospective in the International Journal of Epidemiology, one commentator suggests it was responsible for “holding up anti-smoking measures among pregnant women for perhaps a decade” in the United States. Another suggests it “postponed by several years any campaign to change mothers’ smoking habits” in the United Kingdom.

But it was a mistake. In fact, maternal smoking is bad for babies, low birthweight or not. The reason for the apparent benefit is a statistical error I will explain in chapter 7. Among epidemiologists, this example is known as the low-birthweight paradox. A related phenomenon is called the obesity paradox. Other examples in this book include Berkson’s paradox and Simpson’s paradox. As you might infer from the prevalence of “paradoxes,” using data to answer questions can be tricky. But it is not hopeless. Once you have seen a few examples, you will start to recognize them, and you will be less likely to be fooled. And I have collected a lot of examples.

So we can use data to answer questions and resolve debates. We can also use it to make better decisions, but it is not always easy. One of the challenges is that our intuition for probability is sometimes dangerously misleading. For example, in October 2021, a guest on a well-known podcast reported with alarm that “in the [United Kingdom] 70-plus percent of the people who die now from COVID are fully vaccinated.” He was correct; that number was from a report published by Public Health England, based on reliable national statistics. But his implication, that the vaccine is useless or actually harmful, is wrong.
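
One way to see why the implication does not follow is to compare the share of deaths with the share of people who are vaccinated. The sketch below uses the 70% figure from the quotation, but the 90% vaccination coverage is a purely hypothetical value chosen for illustration; it is not taken from the excerpt or from the Public Health England report, and the calculation ignores age and other differences between the two groups. The function name `risk_ratio` is likewise made up for this example.

```python
# From the quotation above: share of COVID deaths among the fully vaccinated.
vaccinated_share_of_deaths = 0.70

# HYPOTHETICAL: assumed share of the at-risk population that is fully
# vaccinated. This number is an illustration, not a reported statistic.
assumed_vaccinated_share_of_population = 0.90


def risk_ratio(share_of_population, share_of_deaths):
    """Risk of death for the vaccinated divided by risk for the unvaccinated."""
    vaccinated = share_of_deaths / share_of_population
    unvaccinated = (1 - share_of_deaths) / (1 - share_of_population)
    return vaccinated / unvaccinated


ratio = risk_ratio(assumed_vaccinated_share_of_population,
                   vaccinated_share_of_deaths)
print(f"Risk for vaccinated relative to unvaccinated: {ratio:.2f}")
```

With these assumed inputs the ratio is well below 1, meaning the vaccinated group has the lower risk even though it accounts for most of the deaths. When nearly everyone belongs to one group, that group can dominate the death count simply because of its size.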
