Lately I’ve been reading a book before bed, a little at a time: Beyond Significance Testing, by Rex B. Kline. It isn’t exactly a suspenseful page-turner; if I read it at some time of day when I’m not already sleepy, I might get through it faster.
The book aims to convince readers that Null Hypothesis Significance Testing (NHST) should no longer be practiced, and it suggests alternatives such as using confidence intervals and always reporting effect sizes. My favorite quote so far comes from a paragraph about one proposed fix for the NHST problem: simply using more careful, less overreaching language when talking about p-values and significance tests (for example, phasing out the word “significant”):
You can put candles in a cow pie, but that does not make it a birthday cake.
You can tell what the author thinks of that idea. Ouch.
Part I of the book also makes an interesting argument that NHST is not only bad social science, but also bad FOR social science. The idea is that because p-values are colloquially understood to mean something they do not, researchers believe the findings of a single study are more robust and reliable than they really are. For example, a p-value is the probability of observing data at least as extreme as the actual data, assuming the null hypothesis is true; it is NOT the probability that the null hypothesis is true given the data. According to the book, this and other misreadings of the logic of significance testing bias the literature toward results “about fad topics that clutter the research literature but have little scientific value,” results that are never replicated:
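To make the gap between those two conditional probabilities concrete, here is a small Python sketch that applies Bayes’ rule. The inputs are my own illustrative assumptions, not figures from the book: half of the null hypotheses a field tests are true, the real effect is d = 0.5 whenever the null is false, each study has 20 people per group, and the test is a two-sided, two-sample z-test with known unit variance (a simplification of the usual t-test).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def power_two_sided_z(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """P(p < alpha | H0 false): power of a two-sided, two-sample z-test.

    Assumes known unit variance in both groups (a simplifying assumption).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is effect_size.
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    # Reject in either tail.
    return (1 - STD_NORMAL.cdf(z_crit - noncentrality)) + STD_NORMAL.cdf(-z_crit - noncentrality)


def prob_null_given_significant(prior_null: float, power: float, alpha: float = 0.05) -> float:
    """Bayes' rule: P(H0 true | p < alpha), given an assumed prior P(H0 true)."""
    false_positives = alpha * prior_null        # P(significant and H0 true)
    true_positives = power * (1 - prior_null)   # P(significant and H0 false)
    return false_positives / (false_positives + true_positives)


# Illustrative (hypothetical) inputs, not taken from the book.
ALPHA = 0.05
PRIOR_NULL = 0.5
pwr = power_two_sided_z(effect_size=0.5, n_per_group=20, alpha=ALPHA)

print(f"P(p < {ALPHA} | H0 true)  = {ALPHA}")
print(f"P(p < {ALPHA} | H0 false) = {pwr:.3f}  (power)")
print(f"P(H0 true | p < {ALPHA})  = {prob_null_given_significant(PRIOR_NULL, pwr, ALPHA):.3f}")
```

With these assumed inputs, power comes out around 0.35, and a significant result still comes from a true null roughly 12% of the time rather than 5%. That figure moves a lot if you change the prior or the sample size, which is exactly the information a p-value by itself doesn’t carry.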
…if one believes that p < .01 implies that the result is likely to be repeated more than 99 times out of 100, why bother to replicate? A related cognitive error is the belief that statistically significant findings should be replicated, but not ones for which [the null hypothesis] was not rejected (F. Schmidt & Hunter, 1997).
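The 99-out-of-100 belief is easy to check with a rough calculation. The sketch below rests on my own simplifying assumptions, not the book’s: a z-test, an exact replication with the same sample size, and a true effect exactly equal to the one the first study observed (an optimistic assumption). It estimates the chance that the replication is also significant at .05 in the same direction.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def replication_probability(p_observed: float, alpha: float = 0.05) -> float:
    """Chance an exact replication is significant in the same direction.

    Assumes a two-sided z-test and that the true effect equals the effect
    observed in the first study, so the replication's z statistic is
    Normal(z_obs, 1).
    """
    z_obs = STD_NORMAL.inv_cdf(1 - p_observed / 2)   # z implied by the first p-value
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)       # cutoff for significance
    # Count only same-direction rejections: replication z > z_crit.
    return 1 - STD_NORMAL.cdf(z_crit - z_obs)


for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} -> replication probability ~ {replication_probability(p):.2f}")
```

Under those assumptions, a study with p = .01 replicates only about 73% of the time, and one with p = .05 is a coin flip. If the true effect is smaller than the first study’s estimate, real replication rates could be lower still.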
This bias perpetuates research for which the practical, meaningful significance of the results is not clear.
A lot of these arguments make sense to me. NHST seems to hide much of the error and uncertainty that is part of doing science, making the results of individual studies look more definitive and certain than they actually are. I’m looking forward to finishing the rest of the book and starting to practice the alternatives it suggests.