The p-value, an Unproven Ally
Updated on November 14, 2013
Scientific research relies on careful observation of results and analysis of their cause. Are two factors truly correlated, or is their apparent relationship merely due to chance? For many years, researchers in the sciences and social sciences have relied on the p-value to help decide whether results should be believed. More specifically, a p-value of less than 0.05 is commonly treated as the threshold for statistical significance, despite a less-than-convincing track record. Is this convention good for the research community?
On one level, a threshold of 0.05 corresponds to accepting a 1 in 20 chance of observing a result at least as extreme when the null hypothesis (e.g., no real correlation between two factors) is in fact true. But even that level of uncertainty assumes the cutoff itself is a sound basis for conclusions. A study published in the journal PNAS questions the utility of the p<0.05 threshold for many published results. Its author, Valen Johnson of Texas A&M University, uses a powerful new class of Bayesian tests, uniformly most powerful Bayesian tests, to probe the notion that p<0.05 is a useful cutoff value. Johnson argues that as many as 25% of published results may be false. That is, studies that just reach significance at p<0.05 correspond to Bayes factors of only 3 to 5, which are considered fairly weak evidence to support a conclusion.
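Johnson's uniformly most powerful Bayesian tests are beyond the scope of a blog post, but a related and widely cited calibration, the Sellke-Bayarri-Berger bound, gives a quick feel for the numbers. It caps the Bayes factor against the null that any given p-value can support, and it lands in the same weak-evidence territory Johnson describes. The sketch below is purely illustrative; it is not the UMPBT machinery from the PNAS paper:

```python
import math

def max_bf_against_null(p):
    """Sellke-Bayarri-Berger calibration: an upper bound on the
    Bayes factor against the null implied by a p-value (valid for p < 1/e)."""
    assert 0 < p < 1 / math.e
    return 1.0 / (-math.e * p * math.log(p))

for p in (0.05, 0.01, 0.005):
    print(f"p = {p}: Bayes factor against the null <= {max_bf_against_null(p):.1f}")

# Approximate output:
#   p = 0.05  -> at most 2.5   (weak evidence, on the same order as Johnson's 3-to-5 range)
#   p = 0.01  -> at most 8.0
#   p = 0.005 -> at most 13.9  (roughly five times stronger than p = 0.05)
```

By this calibration, even a "significant" p-value of 0.05 never corresponds to better than roughly 2.5-to-1 evidence against the null.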
Given the wider difficulty of reproducing published results, could the hallowed convention of p<0.05 be partly responsible? Certainly, the use of inappropriate statistics, researcher error, and even fraud contribute to the problem. Still, Johnson suggests that aiming for a p-value below 0.005 would greatly reduce issues with reproducibility, saying, "Very few studies that fail to replicate are based on P values of 0.005 or smaller." Another solution is to abandon the frequentist tests that dominate research papers in favor of Bayesian analyses.
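To see why a stricter cutoff could help, consider a toy simulation (all numbers invented for illustration: a hypothetical field in which only 10% of tested hypotheses reflect real effects, each studied with a one-sample t-test of 25 subjects). Tightening the threshold from 0.05 to 0.005 sharply cuts the fraction of "significant" findings that are false positives:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n = 20_000, 25                     # hypothetical numbers for illustration
true_effect = rng.random(n_studies) < 0.10    # assume 10% of hypotheses are real
effect_size = 0.8                             # assumed standardized effect size

# Simulate one-sample studies: real effects have mean 0.8, nulls have mean 0.
means = np.where(true_effect, effect_size, 0.0)
data = rng.normal(means[:, None], 1.0, size=(n_studies, n))
_, p = stats.ttest_1samp(data, 0.0, axis=1)

for alpha in (0.05, 0.005):
    sig = p < alpha
    false_frac = np.mean(~true_effect[sig])   # false positives among "discoveries"
    print(f"alpha = {alpha}: {sig.sum()} significant, "
          f"{false_frac:.0%} of them are false positives")
```

Under these made-up assumptions, roughly a third of the p<0.05 "discoveries" are false, versus only a few percent at p<0.005; the trade-off is that some real effects no longer reach significance.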
How have you used the p-value in your work? Do you think researchers are willing to lower the standard cutoff to 0.005 in an effort to increase confidence in their results? Because the number of published papers is already staggeringly high, it is critical to ensure that the results they present can be trusted. We'd love to hear from you.