Science Journalism and the Art of Expressing Uncertainty

It is all too easy for unsupported claims to get published in scientific journals. How can journalists address this?

Note: This piece was originally published on August 4, 2013.

Journalism is filled with examples of erroneous reporting turning into received opinion when reporters, editors, and the public take a story at face value after it came from a generally trusted source. Consider, for example, the claims of Iraq’s weapons of mass destruction, or the various public and corporate scandals where authorities ranging from government officials to the chairman of General Electric are taken at their word.

As a scientist, I am concerned about the publication and promotion of speculative research, but I also believe that journalists can address this problem. Indeed, the traditional journalistic tool of interviewing knowledgeable outsiders can help if the focus is on the aspects of uncertainty associated with any scientific claim. Modern science is, by and large, a set of research directions rather than a collection of nuggets of established truths.

In science reporting, the trusted sources are respected journals, which are not in fact infallible and often publish thought-provoking but speculative claims as settled truth. The story continues from there: The journal or the authors themselves promote the work in the news media, and established outlets report the claims without question. The journalists involved are implicitly following an assumption: If an article is published in a well-regarded publication, treat it as true. In fact, this is a dangerous supposition.

Just to cite a few recent examples, news media have reported a finding that African countries are poor because they have too much genetic diversity (published in the American Economic Review); that parents who pay for college will actually encourage their children to do worse in class (American Journal of Sociology); and that women’s political attitudes show huge variation across the menstrual cycle (Psychological Science). Each of these topics is, in its own way, fascinating, but the particular studies have serious flaws, either in the design of their data collection (the political attitudes study), the analysis (the study of college grades), or the interpretation of their data analysis (the genetic diversity study). Flawed research can still contribute in some way toward our understanding—remember our view of science as a set of research directions—but journalists can mislead their readers if they present such claims unquestioningly.

The statistical errors in these published papers are important but subtle, subtle enough that all three were published in the top journals in their fields. Papers such as these represent a fundamental difficulty in science reporting. On one hand, they are flawed in the sense that their conclusions are not fully supported by their data (at least, according to me and various other observers); on the other, we cannot expect a typical science reporter on his or her own to catch methodological errors that escaped several peer reviewers as well as the articles’ authors. My goal here is to suggest a strategy for science writers to express uncertainty about published studies without resorting to meaningless relativism.

I will get to my recommendations in the context of a paper from 2007 by sociologist Satoshi Kanazawa on the correlation between attractiveness of parents and sex of children. Some detail is required here to understand the statistical problems with this paper. But my ultimate reason for talking about this particular example is that it demonstrates the challenge of reporting on statistical claims. This study was reported in what I view as an inappropriately uncritical way in a leading outlet for science journalism, and I will address how this reporting could be improved without requiring some extraordinary level of statistical expertise on the part of the journalist.

I brought this case up a few years ago at a meeting of the National Association of Science Writers, when I spoke on the challenges of statistical inference for small effects. Using a dataset of 3,000 parents, Kanazawa found that the children of attractive parents were more likely to be girls, compared to the children of less attractive parents. The correlation was “statistically significant”—that is, there was less than a 5% chance of seeing a difference this extreme if there were no correlation in the general population. This result, along with some more general claims about evolutionary psychology, was published in the Journal of Theoretical Biology and received wide media exposure.

But Kanazawa’s claims were not supported by the data in the way claimed in his paper. Simply put, his sample size was so small that it would be essentially impossible to learn anything about the correlation between parental beauty and child’s sex in the population. This may sound surprising, given that a sample size of 3,000 seems large. But it is not large, given the scientific context.

There is a vast scientific literature on the human sex ratio, and any plausible differences in the probability of a female birth, comparing beautiful and ugly parents, would have to be very small: on the order of one-third of a percentage point or less. For example, it could be that the probability of having a girl is 48.9% for attractive parents and 48.7% for unattractive parents. It turns out that you would need a sample size far greater than 3,000 to detect such a small effect. To develop your intuition on this, consider national opinion polls, which typically interview about 1,500 people and have a margin of error of three percentage points either way. If you crunch the numbers, you would find that you need a representative sample of hundreds of thousands of people to detect differences of less than one-third of a percentage point. So from a mathematical standpoint, Kanazawa’s study never had a chance to provide an adequate estimate for what it was purporting to estimate.
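To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not the calculation from any of the papers discussed. It assumes simple random sampling, two groups of equal size, a proportion near 50%, and the usual normal approximation. The function names are made up for this illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for one proportion
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def n_per_group_to_detect(diff, p=0.5, z=1.96):
    """Group size at which a true difference `diff` between two
    proportions is just z standard errors from zero.
    Assumption: two independent groups of equal size n, so the
    standard error of the difference is sqrt(2 * p * (1 - p) / n)."""
    return math.ceil(2 * p * (1 - p) * (z / diff) ** 2)

# A typical opinion poll of 1,500 people.
poll_moe = margin_of_error(1500)

# A difference of one-third of a percentage point, as in the text.
needed = n_per_group_to_detect(0.01 / 3)

print(f"Poll of 1,500: margin of error about {100 * poll_moe:.1f} points")
print(f"Needed per group: about {needed:,}")
print(f"Needed in total:  about {2 * needed:,}")
```

Under these assumptions the poll's margin of error comes out between two and three percentage points, and the required sample runs well into the hundreds of thousands. Even that is optimistic: a difference sitting exactly at 1.96 standard errors would be detected only about half the time, so a study designed for reliable detection would need to be larger still.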

What about the claim of statistical significance, namely, that a pattern as extreme as in the data would occur by chance less than 5% of the time? The answer is that events that are somewhat rare will happen if you look hard enough. In this case, there were various ways to slice the data. For example, in the survey, attractiveness was measured on a scale of one to five. Kanazawa’s statistically significant difference was a comparison of the most beautiful people (category 5) with categories 1-4. But he could have compared categories 4-5 to 1-3, or compared 3-5 to 1-2. Or, perhaps more reasonably, he could have fit a model called a linear regression, which can be considered as an average of all these comparisons. It turns out that, of all these, the comparison he looked at happened to be the one that was largest in the data at hand, and this comparison was, on its own, statistically significant.
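A small simulation can illustrate why having several ways to slice the data matters. The sketch below is hypothetical and does not use Kanazawa's data. It assumes 3,000 parents spread evenly over the five attractiveness categories and a probability of a girl of 48.5% that is the same in every category, so any "significant" result is a false alarm. It then checks the three splits mentioned above with an ordinary two-proportion z-test.

```python
import math
import random

def z_stat(girls_a, n_a, girls_b, n_b):
    """Two-proportion z statistic using the pooled proportion."""
    pooled = (girls_a + girls_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (girls_a / n_a - girls_b / n_b) / se

def simulate(n_sims=1000, per_category=600, p_girl=0.485, seed=1):
    """Return (rate at which the single split 5 vs 1-4 is significant,
    rate at which at least one of the three splits is significant)
    when there is truly no effect. All settings are assumptions."""
    rng = random.Random(seed)
    single_hits = 0
    any_hits = 0
    for _ in range(n_sims):
        # girls[i] = number of girls born to parents in category i + 1
        girls = [sum(rng.random() < p_girl for _ in range(per_category))
                 for _ in range(5)]
        zs = []
        for cut in (4, 3, 2):  # 5 vs 1-4, then 4-5 vs 1-3, then 3-5 vs 1-2
            top = sum(girls[cut:])
            bottom = sum(girls[:cut])
            zs.append(z_stat(top, (5 - cut) * per_category,
                             bottom, cut * per_category))
        if abs(zs[0]) > 1.96:
            single_hits += 1
        if max(abs(z) for z in zs) > 1.96:
            any_hits += 1
    return single_hits / n_sims, any_hits / n_sims

single_rate, any_rate = simulate()
print(f"Pre-chosen split significant:   {single_rate:.1%}")
print(f"At least one split significant: {any_rate:.1%}")
```

The pre-chosen split should be flagged close to the nominal 5% of the time. The second rate can never be lower than the first, and in runs like this it is typically noticeably higher. That gap is the price of picking the comparison after seeing the data.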

At one level, we can call this a mistake. And this mistake did come under scrutiny, including from me; I later published a letter in the journal and an article in the magazine American Scientist expanding on the above criticisms. But broadly speaking, the quirky claim of an association between attractiveness and sex ratio received positive press attention at the time. For example, the Freakonomics Blog reported that this study suggests:

“There are more beautiful women in the world than there are handsome men. Why? Kanazawa argues it’s because good-looking parents are 36 percent more likely to have a baby daughter as their first child than a baby son—which suggests, evolutionarily speaking, that beauty is a trait more valuable for women than for men. The study was conducted with data from 3,000 Americans, derived from the National Longitudinal Study of Adolescent Health, and was published in the Journal of Theoretical Biology.”

Actually, from a quantitative perspective, the claim contradicts what is known about variation in the human sex ratio from the scientific literature. A difference of 36 percent is literally 100 times larger than anything that could reasonably be expected in the population.

As I said to the audience of science writers, this story demonstrates the challenges of reporting on technical work. It is the sort of error that can, and does, make its way past the author, peer reviewers, journal editors, and into the news media. This sort of thing happens—none of us is infallible—but it is worth thinking about how the news media could play a more active and constructive role in the scientific conversation.

How could journalists do more? This is where the importance of expert feedback comes in. Just as a careful journalist runs the veracity of a scoop by as many reliable sources as possible, he or she should interview as many experts as possible before reporting on a scientific claim. The point is not necessarily to interview an opponent of the study, or to present “both sides” of the story, but rather to talk to independent scholars, get their views, and troubleshoot as much as possible. The experts might very well endorse the study, but even then they are likely to add more nuance and caveats. In the Kanazawa study, for example, any expert in sex ratios would have questioned a claim of a 36% difference, or even, for that matter, a 3.6% difference. It is true that the statistical concerns (namely, the small sample size and the multiple comparisons) are a bit subtle for the average reader. But any sort of reality check would have helped by pointing out where this study took liberties.

The point is not that we need reflexive skepticism, or that every story becomes a controversy. Rather, journalists should remember to put any dramatic claims in context, given that publication in a leading journal does not by itself guarantee that work is free of serious error.

So what is new here? Journalists, who already know about the importance of interviewing experts, can bring their training in contextualizing stories to draw a clearer picture of the uncertainty that underlies so much scientific endeavor. We now live in a world of post-publication review—existing peer review serves a function but is not complete—and news reporting can be part of this. And it should not be a problem for a journalist to find these experts; many scientists would be flattered to be quoted in the press.

If journalists go slightly outside the loop (for example, asking a cognitive psychologist to comment on the work of a social psychologist, or asking a computer scientist for views on the work of a statistician), they have a chance to get a broader view. To put it another way: some of the problems of hyped science arise from the narrowness of subfields, but you can take advantage of this by moving to a neighboring subfield to get an enhanced perspective.

Just as is the case with so many other beats, science journalism has to adhere to the rules of solid reporting and respect the need for skepticism. And this skepticism should not be exercised for the sake of manufacturing controversy—two sides clashing for the sake of getting attention—but for the sake of conveying to readers a sense of uncertainty, which is central to the scientific process. The point is not that all articles are fatally flawed, but that many newsworthy studies are coupled with press releases that, quite naturally, downplay uncertainty.

For an example of the value of critical science reporting, consider the recent discussion of the data-analysis choices of economists Carmen Reinhart and Kenneth Rogoff in their now-famous 2010 paper on public debt and economic growth. In this case, the lively discussion came only after some critics released a paper with detailed refutations of the analysis of Reinhart and Rogoff. The bigger point, though, is that when reporters recognize the uncertainty present in all scientific conclusions, I suspect they will be more likely to ask interesting questions and employ their journalistic skills.

Mayo

Consulting experts might work if the journalist knows how, and genuinely wants to, find an expert capable of unearthing flaws in the claim or argument they propose to publish. In the case of politically-tinged arguments, it seems more likely they’d select the “expert” that supports their story. On the other hand, if one is a science reporter, or even just reporting on statistical results, it does not seem far-fetched to have the capability to raise a handful of standard methodological questions in their report, e.g., about statistical and substantive significance, the magnitude of an effect (indicated or well ruled out), the relationship between a statistical correlation and some substantive causal claim, and the assumptions of the analysis. This they could/should do even without invoking experts.

Andrew Gelman
