Valentin Amrhein, Sander Greenland, Blakeley McShane, and I write:
Dr Ioannidis writes against our proposals [here and here] to abandon statistical significance in scientific reasoning and publication, as endorsed in the editorial of a recent special issue of an American Statistical Association journal devoted to moving to a “post p<0.05 world.” We appreciate that he echoes our calls for “embracing uncertainty, avoiding hyped claims…and recognizing ‘statistical significance’ is often poorly understood.” We also welcome his agreement that the “interpretation of any result is far more complicated than just significance testing” and that “clinical, monetary, and other considerations may often have more importance than statistical findings.”
Nonetheless, we disagree that a statistical significance-based “filtering process is useful to avoid drowning in noise” in science and instead view such filtering as harmful. First, the implicit rule to not publish nonsignificant results biases the literature with overestimated effect sizes and encourages “hacking” to get significance. Second, nonsignificant results are often wrongly treated as zero. Third, significant results are often wrongly treated as truth rather than as the noisy estimates they are, thereby creating unrealistic expectations of replicability. Fourth, filtering on statistical significance provides no guarantee against noise. Instead, it amplifies noise because the quantity on which the filtering is based (the p-value) is itself extremely noisy and is made more so by dichotomizing it.
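To make the first and fourth points concrete, here is a minimal simulation sketch, not taken from the letter. It assumes a hypothetical true effect of 0.1 measured with a standard error of 0.1 (a low-power setting), draws many normally distributed study estimates, and keeps only those passing the conventional |z| > 1.96 cutoff. All names and numbers are illustrative assumptions.

```python
import random
import statistics


def simulate_significance_filter(true_effect=0.1, std_error=0.1,
                                 n_studies=100_000, seed=1):
    """Simulate many studies of one effect and apply a p < 0.05 filter.

    All parameter values are hypothetical, chosen to mimic a noisy,
    low-power study; they do not come from the letter.
    """
    rng = random.Random(seed)
    # Each study's estimate is the true effect plus normal sampling noise.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Two-sided test at the 5% level: keep estimates with |z| > 1.96.
    significant = [e for e in estimates if abs(e / std_error) > 1.96]
    return {
        "mean_all": statistics.fmean(estimates),
        "mean_abs_significant": statistics.fmean([abs(e) for e in significant]),
        "share_significant": len(significant) / n_studies,
    }


result = simulate_significance_filter()
exaggeration = result["mean_abs_significant"] / 0.1
print(f"mean of all estimates:         {result['mean_all']:.3f}")
print(f"mean |estimate| if significant: {result['mean_abs_significant']:.3f}")
print(f"share reaching significance:    {result['share_significant']:.2f}")
print(f"exaggeration ratio:             {exaggeration:.1f}")
```

Under these assumptions the average over all studies sits close to the true effect, while the average magnitude among the "significant" studies is more than twice as large. A literature built only from the filtered subset would overstate the effect even with no hacking at all.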
We also disagree that abandoning statistical significance will reduce science to “a state of statistical anarchy.” Indeed, the journal Epidemiology banned statistical significance in 1990 and is today recognized as a leader in the field.
Valid synthesis requires accounting for all relevant evidence—not just the subset that attained statistical significance. Thus, researchers should report more, not less, providing estimates and uncertainty statements for all quantities, justifying any exceptions, and considering ways the results are wrong. Publication criteria should be based on evaluating study design, data quality, and scientific content—not statistical significance.
Decisions are seldom necessary in scientific reporting. However, when they are required (as in clinical practice), they should be made based on the costs, benefits, and likelihoods of all possible outcomes, not via arbitrary cutoffs applied to statistical summaries such as p-values, which capture little of this picture.
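As a rough illustration of what deciding on costs, benefits, and likelihoods can look like, here is a small expected-value sketch. The options, probabilities, and payoffs are made up for the example and are not from the letter. In practice the probabilities would come from a fitted model and the payoffs from the clinical or economic context.

```python
def expected_value(outcomes):
    """Return the probability-weighted payoff of a list of (prob, payoff) pairs."""
    total_prob = sum(p for p, _ in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Hypothetical decision: each option lists (probability, net payoff) pairs.
options = {
    "treat": [
        (0.30, 10.0),   # treatment helps
        (0.60, -1.0),   # no benefit, patient still bears the cost
        (0.10, -5.0),   # harmful side effect
    ],
    "do_not_treat": [
        (1.00, 0.0),    # status quo baseline
    ],
}

# Compare options by expected payoff instead of by a p-value threshold.
values = {name: expected_value(outcomes) for name, outcomes in options.items()}
best = max(values, key=values.get)
print(values, "->", best)
```

Note that no significance cutoff appears anywhere: an uncertain benefit can still justify acting when the potential gain is large relative to the costs, and a "significant" effect can fail to justify acting when it is not.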
The replication crisis in science is not the product of the publication of unreliable findings. The publication of unreliable findings is unavoidable: as the saying goes, if we knew what we were doing, it would not be called research. Rather, the replication crisis has arisen because unreliable findings are presented as reliable.
I especially like our title and our last paragraph!
Let me also emphasize that we have a lot of positive advice on how researchers can design studies and collect and analyze data (see for example here, here, and here). “Abandon statistical significance” is not the main thing we have to say. We’re writing about statistical significance to do our best to clear up some points of confusion, but our ultimate message in most of our writing and practice is to offer positive alternatives.
P.S. Also to clarify: “Abandon statistical significance” does not mean “Abandon statistical methods.” I do think it’s generally a good idea to produce estimates accompanied by uncertainty statements. There’s lots and lots to be done.