The text below, entitled "Making it quick and easy to report replications", appears in The Psychologist in its May opinion special on replication and was written by Alex O. Holcombe and Hal Pashler.
What proportion of the statistically significant empirical findings reported in psychological journals are not real? That is, for how many is there actually no phenomenon in the world resembling the one reported?
Our basic statistical practices are intended to place a small cap on the proportion of errors we are willing to tolerate in our literature, namely the alpha level we use in our statistics – typically 5 per cent. The alpha level is the maximum probability, for a published study, that an effect at least as large as the one measured could have occurred if no effect were actually present. This ought to place an upper bound on the proportion of spurious results appearing in our literature. But it doesn't work.
One of the biggest reasons is publication bias: scientists are much more likely to publish significant findings than to report null effects. There are a host of reasons for this practice, some of them reflecting confusions about statistics, some reflecting editorial tastes that may have some merit. Researchers who obtain non-significant results tend not to publish them, probably because such results are both difficult to publish and little rewarded when they do appear.
To see how this publication bias can lead to a literature infested with error, imagine that 1000 investigators each conduct a study looking for a difference, and that a difference actually exists in, say, 10 per cent of these studies, or 100 cases. If the investigations have a power of .5, then 50 of these real differences will be detected.
Of the 900 studies looking for effects that do not exist, 5 per cent, or 45, will nonetheless yield a significant result. The upshot is that 45 of the 95 significant results (47 per cent) will be type 1 errors. As Joe Simmons and colleagues (2011) recently pointed out in Psychological Science, hidden flexibility in data analysis and presentation is likely to inflate the rate of type 1 errors still further.
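The arithmetic of this example can be checked with a short script; the numbers are the illustrative ones from the text (a 10 per cent base rate, power of .5, alpha of .05), not estimates of the actual literature:

```python
# Worked example from the text: 1000 studies, 10% of tested effects real.
n_studies = 1000
prior = 0.10    # proportion of hypotheses that are actually true
power = 0.5     # probability of detecting a real effect
alpha = 0.05    # false-positive rate when no effect exists

true_effects = n_studies * prior           # 100 real effects
true_positives = true_effects * power      # 50 of them detected
null_studies = n_studies * (1 - prior)     # 900 studies of null effects
false_positives = null_studies * alpha     # 45 spurious "findings"

significant = true_positives + false_positives   # 95 significant results
error_rate = false_positives / significant       # roughly 47 per cent
print(f"{false_positives:.0f} of {significant:.0f} significant results "
      f"({error_rate:.0%}) are type 1 errors")
```

Publication bias enters because only the 95 significant results are likely to appear in journals, while the 905 non-significant ones stay in the file drawer.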
The harm done by publication bias has been recognised since at least 1959, when Theodore Sterling canvassed 294 psychology papers and found that 286 of them reported a positive result. Thirty-seven years later, Sterling re-evaluated the literature and concluded that little had changed. Indeed, the problem appears to be getting worse, and not just in psychology. The title of a recent paper in Scientometrics by Fanelli (2012) declares ‘Negative results are disappearing from most disciplines and countries’ based on an analysis of various scientific fields.
The problem is so bad that after a series of calculations involving some additional considerations, John Ioannidis concluded in a 2005 paper that 'most published research findings are false'. Although the estimate of error depends on several unknowns – the proportion of sought-after effects that actually exist, statistical power, and so forth – Ioannidis' conclusion is, unfortunately and disturbingly, quite reasonable.
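The dependence on those unknowns can be made explicit. The proportion of significant results that reflect real effects (the positive predictive value) is a simple function of the base rate of true hypotheses, power, and alpha; the sketch below uses this standard formula, with the input values chosen purely for illustration:

```python
def ppv(prior, power, alpha=0.05):
    """Positive predictive value: the share of significant results
    that reflect real effects, given the proportion of tested
    hypotheses that are true (prior) and the statistical power."""
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)

# With the earlier example's numbers, only just over half of
# significant findings are real; lower priors or lower power
# push the majority into error, as Ioannidis argued.
print(ppv(prior=0.10, power=0.5))   # about 0.53
print(ppv(prior=0.05, power=0.3))   # well below 0.5
```

Hidden analytic flexibility effectively raises alpha above its nominal .05, which drives the positive predictive value down further still.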
Given that the incentives for publishing null results are modest, we must make it easy and quick for scientists to report them. The traditional system – writing a cover letter, submitting, being rejected, repeating until the manuscript is sent out for review, waiting for reviews, revising, and writing rejoinders to reviewers – consumes far too much of researchers' time.
To provide a quick way to report replication studies, we have created, together with Bobbie Spellman and Sean Kang, a new website called PsychFileDrawer.org. The site is designed specifically for replications of previously published studies, as this allows the reporting process to be quick. In the case of exact replications, for instance, researchers can simply indicate that their methodology was identical to that of the published study. When their method differs somewhat, they can report only the differences. Researchers are invited to report their results in as much detail as they can, but we believe even those who simply report the main effect and associated statistics are making a valuable contribution to the research community.
In addition to allowing researchers to report their method and results, a forum is provided for users to discuss each report. The website further allows users to vote on ‘What are important studies that your field of Psychology gives credence to, but which – as far as you know – have not been replicated in any published follow-up work?’ Each registered user is allowed up to three votes. As of this writing, the study with the most votes is entitled ‘Improving fluid intelligence with training on working memory’, which was published in PNAS in 2008. As the list solidifies, we hope it will encourage investigators to conduct replications of these studies and report the results on the site.
The most novel feature of PsychFileDrawer is an 'article-specific networking tool', designed for users who are timid about posting their findings or who simply wish to connect with others interested in a particular published finding about which nothing has yet been posted. With this feature, users register their interest in learning about unpublished replication attempts relating to a particular study. Whenever other users express interest in the same study, the website automatically puts them in touch with each other via e-mail so they can discuss their experiences and, we hope, post their results on the site.
The website is still in beta testing and we continue to add new features. We hope readers will visit and provide suggestions for how it might be improved.