How do bad numbers get into circulation in our political discourse, and how do they stay there, even after being refuted?
As a statistician and political scientist, I care about getting the numbers right, and I am also interested in how people get things wrong. With economic statistics, it is often all about interpretation: were President Obama’s policies a failure given that unemployment was higher at the end of his first term than when he took office, or were his policies a success given that the unemployment rate was in decline? Are record-high budget deficits a national scandal or at a reasonable level as a percentage of GDP? But there are also examples of hard numbers, political statistics that have lodged themselves in outlets such as The New York Times opinion page even though they are just plain wrong. What can be done to correct this sort of mistake?
Errors in newspaper op-eds fall into a kind of limbo: they are influential and get the attention of millions of readers (including many in the economic and political elites), but they are labelled as opinion pieces and thus are not subject to the same level of caution and fact-checking associated with news articles. The Times and other leading news organizations will gladly run corrections if reporting in a news article is incorrect, but they seem to have a much lower bar for sloppy analyses in the opinion pages.
1. The Happy Tea Partiers
Take Arthur Brooks, conservative scholar and president of the American Enterprise Institute, who has written extensively on happiness. In a recent Times op-ed, Brooks presents a lot of statistics in a very reasonable-sounding way, in the Brooksian mode of low-key concern:
“Who is happier about life — liberals or conservatives? The answer might seem straightforward. After all, there is an entire academic literature in the social sciences dedicated to showing conservatives as naturally authoritarian, dogmatic, intolerant of ambiguity, fearful of threat and loss, low in self-esteem and uncomfortable with complex modes of thinking. And it was the candidate Barack Obama in 2008 who infamously labelled blue-collar voters “bitter,” as they “cling to guns or religion.” Obviously, liberals must be happier, right?
Wrong. Scholars on both the left and right have studied this question extensively, and have reached a consensus that it is conservatives who possess the happiness edge. … This pattern has persisted for decades. The question isn’t whether this is true, but why.”
Brooks concludes: “None, it seems, are happier than the Tea Partiers, many of whom cling to guns and faith with great tenacity. Which some moderately liberal readers of this newspaper might find quite depressing.”
Where did Brooks’s dramatic and surprising claim that Tea Partiers are the happiest Americans come from? Jay Livingston, a sociologist at Montclair State University in New Jersey, did some detective work. Looking at General Social Survey data, he found that if you average responses from 1972 through 2010, self-identified extremely conservative people are indeed the happiest, on average, of all ideological categories. But if you look only at 2009–10 (i.e., the Tea Party era), self-identified extremely conservative people turn out to be the most unhappy, as Livingston’s graphs show.
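The pitfall here is purely arithmetic and easy to reproduce. Here is a minimal sketch using hypothetical numbers (not the actual General Social Survey values): a group whose long-run pooled average looks high can still be at the bottom in the most recent years.

```python
# Toy illustration of the pooling pitfall: averaging four decades of
# surveys can mask a reversal in the most recent years.
# All scores below are invented for illustration, not real GSS data.
import statistics

# (year, mean happiness score for one ideological group)
happiness = [(1975, 2.4), (1985, 2.4), (1995, 2.3), (2005, 2.3),
             (2009, 1.8), (2010, 1.7)]  # hypothetical values

pooled = statistics.mean(score for _, score in happiness)
recent = statistics.mean(score for year, score in happiness if year >= 2009)

print(round(pooled, 2))  # the 1972-2010-style pooled average looks high...
print(round(recent, 2))  # ...while the recent-era average is the lowest
```

The pooled mean (2.15 in this toy example) tells you nothing about the recent trend (1.75); citing the former as evidence about the latter is exactly the slip described above.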
This mistake is consequential—the data show the reverse of the pattern claimed in the op-ed—but the steps leading to it are understandable. Brooks had carefully studied the data on happiness and political attitudes several years ago in the course of writing a book; he assumed the pattern he found then would still hold, and when writing this recent op-ed he did not check it against more recent data. That is understandable. We cannot think to check everything.
Still, I did not like the idea of those false Tea Party numbers appearing uncorrected in the newspaper, and I sent the editors a note explaining the situation. The editorial-page team has not, to my knowledge, ever run a correction. I understand that correcting errors is not the top priority of the Times, but as a statistician I remain upset.
2. The Declining Jews
Ron Unz is a former businessman and political activist who posted a long article claiming, among other things, that Harvard University discriminates in favor of Jews in its undergraduate admissions. He based this claim on counts of Jewish-appearing names among Ivy League undergraduates, National Merit Scholar semi-finalists, International Mathematical Olympiad participants, and other lists of high-achieving high-school students. Unz’s claims originally appeared in The American Conservative, an obscure magazine published by Unz and originally associated with Patrick Buchanan, but gained wide circulation after being touted by New York Times columnist David Brooks, who wrote:
“You’re going to want to argue with Unz’s article all the way along, especially for its narrow, math-test-driven view of merit. But it’s potentially ground-shifting. Unz’s other big point is that Jews are vastly overrepresented at elite universities and that Jewish achievement has collapsed. In the 1970s, for example, 40 percent of top scorers in the Math Olympiad had Jewish names. Now 2.5 percent do.”
It turned out, though, that Unz’s numbers were way off. It is not so easy to count Jews based on their names. Unz’s estimates came from different, incompatible sources, and he used different rules when looking at different lists. The purported drop from 40% to 2.5% is actually a much more gradual decline from 25%-30% to 12%-15%, easily explainable based on demographic changes and increased competition from Asian-Americans in recent decades.
I learned of these problems from Janet Mertz, a professor of oncology at the University of Wisconsin who has published several articles in recent years on the sex and ethnicity distribution of high-end mathematics achievement. After reading David Brooks’s column a few months ago, Mertz tried without success to correct the record. Brooks and The New York Times did not respond to her emails, did not publish any of her letters, and Unz reports that he and others “ignored or dismissed” Mertz. I looked into it and found Mertz’s criticisms (along with others sent to me by another correspondent) to be convincing. Indeed, Unz later admitted that he derived his 40% and 2.5% calculations from “five minutes of cursory surname analysis.” There has still been no correction in Brooks’s column or elsewhere in the Times.
One could argue that an opinion writer has more latitude because the reader knows that the op-ed page is for, well, opinion, and not objective reporting. But columnists do issue corrections on their own when they see fit, including Brooks. In one recent example, he ran the following correction of one of his columns:
“An earlier version of this column misstated the location of a statue in Washington that depicts a rambunctious horse being reined in by a muscular man. The sculpture, Michael Lantz’s ‘Man Controlling Trade’ (1942), is outside the Federal Trade Commission, not the Department of Labor.”
I was amazed to see that the Times considers the location of a statue to be worthy of a formal correction, while an erroneous published number on Jewish academic achievement, a number that is off by a factor of five, is allowed to stand.
3. How Did It All Go Wrong, and How Can We Do Better Next Time?
It should be no surprise that newspaper columnists writing on deadline make mistakes, or that a political activist can get hold of a number and refuse to let go, no matter how carefully his error is pointed out to him. What is scarier to me is how this behavior mirrors similar practices in science, and the way we learn, more generally.
In statistics, it is typically necessary and appropriate to combine information from different sources. But when doing so, one should calibrate: where possible, one should check that different data sources and methods give similar answers when estimating the same quantity. That is one thing that was not done in the examples above.
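The calibration step can be made concrete. A minimal sketch, with hypothetical figures and an arbitrary tolerance (neither comes from the examples above): estimate the same quantity from two sources and flag disagreement before pooling or publishing.

```python
# A sketch of cross-source calibration: two estimates of one quantity
# should roughly agree before they are combined or cited together.
def relative_gap(a, b):
    """Relative disagreement between two estimates of the same quantity."""
    return abs(a - b) / ((abs(a) + abs(b)) / 2)

# Hypothetical estimates of the same proportion from two methods,
# e.g. a careful count versus a quick surname-based count.
source_a = 0.28
source_b = 0.025

TOLERANCE = 0.25  # arbitrary threshold for this illustration

if relative_gap(source_a, source_b) > TOLERANCE:
    print("sources disagree: check methods before pooling")
else:
    print("sources roughly agree")
```

A gap this large (the two estimates differ by an order of magnitude) is not proof that either source is wrong, but it is exactly the kind of red flag that should trigger a check of the counting rules before the number goes into print.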
How do we think about this from a statistical perspective? A typical example of one of these stories starts with a jolt, with data that at first seems surprising but then can be fit into one’s larger worldview. In the case of Arthur Brooks, the surprise was survey data revealing extreme conservatives to be happier than other Americans. This is not what one would expect given the “angry” reputation of Tea Partiers, but, upon reflection, it is consistent with Brooks’s view of conservatives as generous, well-adjusted people, expressed in his books such as Who Really Cares: The Surprising Truth about Compassionate Conservatism and Gross National Happiness.
As for Unz, he has long been writing about the high proportion of Jews and Asians at Harvard, but it is perhaps only recently that he thought to get some lists of Mathematical Olympiad participants. He took a glance through, tried to count the Jewish names, and found a decline from 40% to 2.5%—a drop by a factor of 16. One’s first reaction is: That’s big news! And the second reaction could echo this initial take: even if the data are not perfect, this might seem too big a discrepancy to have occurred by mistake. In fact, though, large and dramatic numbers can be wrong, and in this case it turned out that a series of mistakes was enough to invalidate the larger claims. To track the chain of errors required some work, but once that work had been done, the published numbers were clearly wrong.
For David Brooks, the big surprise was seeing something interesting in a borderline anti-Semitic article in a fringe publication. But the article appeared to be backed up by hard numbers, which made it that much more convincing. Perhaps Unz’s very outrageousness serves as a sort of protective coloration: his conclusions are so disturbing that we want to be careful not to dismiss him out of hand. So when he presents numbers implying that “Jewish achievement has collapsed,” we want to be fair to him and take his argument seriously. This is the essence of what Brooks was writing in his column: we should not dismiss Unz’s arguments just because the conclusions may be unpalatable. But in this case the numbers turned out to be wrong.
As the saying goes, just because something is counterintuitive, that does not make it true. Mistakes are inevitable. But when a published number is clearly wrong, and the publication is informed of the error, and the number is still not corrected, it sends all the wrong messages and gives all the wrong incentives if the goal is clear, honest, and accurate communication. It encourages all sides in any given debate to use data carelessly if they have reason to believe that a wrong number, when injected into public discussion, will not be corrected.