Making it quick and easy to report replications

The text below, entitled “Making it quick and easy to report replications”, appears in The Psychologist in their May opinion special on replication and was written by Alex O. Holcombe and Hal Pashler.

What proportion of the statistically significant empirical findings reported in psychological journals are not real? That is, for how many is there actually no phenomenon in the world resembling the one reported?

The intention of our basic statistical practices is to place a small cap on the proportion of errors we are willing to tolerate in our literature – namely, the alpha level we use in our statistics, typically 5 per cent. The alpha level is the maximum probability that a study would yield an effect at least as large as the one measured if there were actually no effect present. This ought to place an upper bound on the proportion of spurious results appearing in our literature. But it doesn’t work.

One of the biggest reasons is publication bias: scientists are much more likely to publish significant findings than to report null effects. There are a host of reasons for this practice, some reflecting confusions about statistics, some reflecting editorial tastes that may have some merit. We also know that researchers who obtain non-significant results tend not to publish them, probably because such results are difficult to publish and there is little reward for doing so.

To see how this publication bias can lead to a literature infested with error, imagine that 1000 investigators each conduct a study looking for a difference, and a difference actually exists in, say, 10 per cent of these studies, or 100 cases. If the investigations have a power of .5, then 50 of these differences will be discovered.

Of the 900 studies looking for effects that do not exist, 5 per cent (45) will nonetheless yield significant results. The result, then, will be that 45 out of 95 significant results (47 per cent) will be type 1 errors. As Joe Simmons and colleagues (2011) recently pointed out in Psychological Science, hidden flexibility in data analysis and presentation is likely to inflate the rate of type 1 errors still further.
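The arithmetic above can be checked with a short sketch. The function name and parameters are ours, chosen for illustration; only the numbers come from the example in the text.

```python
def false_positive_share(n_studies=1000, true_rate=0.10, power=0.5, alpha=0.05):
    """Return (true positives, false positives, share of significant
    results that are type 1 errors) under the stated assumptions."""
    n_true = n_studies * true_rate      # 100 studies where a real effect exists
    n_null = n_studies - n_true         # 900 studies with no effect to find
    true_pos = n_true * power           # 50 real effects detected
    false_pos = n_null * alpha          # 45 spurious "discoveries"
    return true_pos, false_pos, false_pos / (true_pos + false_pos)

tp, fp, share = false_positive_share()
print(tp, fp, round(share, 2))  # 50.0 45.0 0.47
```

Lowering the assumed base rate of true effects, or the studies' power, drives the error share even higher.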

The harm done by publication bias has been recognised since at least 1959, when Theodore Sterling canvassed 294 psychology papers and found that 286 of them reported a positive result. Thirty-seven years later, Sterling re-evaluated the literature and concluded that little had changed. Indeed, the problem appears to be getting worse, and not just in psychology. The title of a recent paper in Scientometrics by Fanelli (2012) declares ‘Negative results are disappearing from most disciplines and countries’ based on an analysis of various scientific fields.


New feature: Important psychology experiments that should be replicated

Heartened by the rapidly growing awareness of the importance of doing more replication studies in psychology (see editorials linked in our earlier blog postings below), we have added a new feature with which users collectively create a list of important studies that need replication. Here is the current list. You can vote for up to three items (nominations of studies not already listed count as one vote – the software assumes that anyone nominating a study also wants to vote for it).

So please consider: what are the important studies in your field of psychology that currently lack published replications? When the list has grown and the rankings are stable, the psychological community may use it as a guide to which attempted replications would have the most influence. Results could then be reported on the site in cases where publishing a journal article is too difficult or researchers do not have the time to do it.

Journals of Negative Results have not been wildly successful.

One way in which others have addressed the bias against publishing replication attempts is to create new journals devoted to them. Some have come and gone, but still available online are the Journal of Negative Results in Biomedicine, the Journal of Articles in Support of the Null Hypothesis, and the Journal of Negative Results. However, with the exception of the Journal of Negative Results in Biomedicine, which published 16 papers in 2011 and X in 2010, these journals publish infrequently (JASNH) or seem defunct (JNR). We suspect that these journals receive few submissions, and will continue to do so while the reward for publishing negative results remains very small. Writing up experiments for publication takes a lot of time away from other activities that are better rewarded.

With our site, we have attempted to make things quick and easy for researchers by requiring that they post only a summary of their results. And if the methodology was very similar or identical to that of a published study, describing the methodology also takes little time. We hope this will encourage researchers to make some contributions, despite the lack of career rewards.

Data, Data, Everywhere . . . Especially in My File Drawer

Barbara A. Spellman (published in Perspectives on Psychological Science, January 2012, Vol. 7, No. 1, pp. 58–59)

I don’t know about you, but most of my data are not published in good journals, or even in bad journals; most of my data are sitting in my file drawer.1 And most of those data have never even been sent to a journal. Some data are from novel studies that just “didn’t work.” Some are from studies my coauthors and I now call “pilot studies”—studies we did before the ones that “worked” and were published. Some are from actual or conceptual replications of other people’s research—a few of those worked and many of those did not. The successful replications are unpublishable; journals reject such research saying “But we already knew that.” Of course, the failures to replicate are also unpublishable; we all learned that our first week in graduate school.2 I’m told that the justification for that practice is that “there are a lot of reasons why a good published study will fail to replicate.”

These days, however, many of us are concerned with the flip side of that statement: There are lots of reasons why a bad result (i.e., one that incorrectly rejects the null hypothesis) will get published. I’m not talking about deliberately miscoding or inventing data as has been recently alleged against a couple of highly visible psychologists. And I’m not simply talking about the “random” set of Type I errors that are likely to occur (see Rosenthal, 1979). Rather, I’m talking about well-intentioned scientists making well-intentioned (although biased) decisions that lead to incorrect results.

A selection of cleverly titled articles over the last few years have made the argument well, pointing to problems in how research is run, analyzed, reported, evaluated, reviewed, and selected for publication.3 The first of these, published in PLoS Medicine and therefore not specific to psychology research, was Ioannidis’s (2005) article “Why Most Published Research Findings Are False.” Perspectives on Psychological Science (PPS) later published a controversial paper by Vul, Harris, Winkielman, and Pashler (2009) titled “Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition,” which had originally been titled “Voodoo Correlations in Social Neuroscience.” And 2011 was a busy year for publications about problems in our science, not only in the news but also in our journals. Generalizing from the Vul et al. paper, in early 2011, PPS published Fiedler’s (2011) “Voodoo Correlations Are Everywhere—Not Only in Neuroscience.” “The (Mis)reporting of Statistical Results in Psychology Journals” by Bakker and Wicherts (2011) appeared in Behavior Research Methods. Most recently, Psychological Science has published Simmons, Nelson, and Simonsohn’s (2011) paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” So, yes, because it’s now time to worry about what we do actually publish, it’s also time to revisit our thoughts about what we choose not to publish.

So, what can be done and what is being done? There are now more journals, typically online, that make the review process quicker and more open. In addition, various method-related websites have sprung up where people can, for example, post nonrefereed research or “register” experiments before they are run. In Fall 2011, I created a website where you can post information about your attempted replications of published studies, regardless of whether they succeeded or failed.4 At the same time, Hal Pashler and colleagues created a website that allows exactly the same thing. When we discovered our “replication” in early November, while both sites were still in development (I sent my link to Hal as part of beta testing), we were astonished. The sites were very similar in tone and content. As I write this introduction, we have plans to combine the sites. By the time you read this introduction, you should be able to use our final product [Ed:]. Note that in addition to posting attempted replications, you will be able to post comments and questions. Posted replications must be signed and must have met ethical guidelines for research. Browsing may be done anonymously.

Meanwhile, PPS gets many submissions about scientific methodology. Because it is not a “methods journal” per se, most are politely rejected. But because methodology is something we have in common across the field, sometimes we have published single papers or sets of papers on method-related issues.

This issue contains several articles reacting to these recent events and publications. Some address the problems; some address potential solutions. In “Short, Sweet, and Problematic? The Rise of the Short Report in Psychological Science,” Ledgerwood and Sherman discuss the pluses and minuses of the trend toward shorter and faster publications. One of the minuses, of course, is the problem of false positives, which is taken up in more detail by Bertamini and Munafo in “Bite-Size Science and Its Undesired Side Effects.” Hegarty and Walton give us reason to worry about Journal Impact Factors as proxies for scientific merit in “The Consequences of Predicting Scientific Impact in Psychology Using Journal Impact Factors.”

The final paper in this issue is Chan and Arvey’s “Meta-Analysis and the Development of Knowledge.” It describes the many ways that meta-analyses can be useful (with lots of examples and not a lot of math). I am a fan of meta-analyses and look forward to PPS getting more meta-analysis manuscripts that include more unpublished research.5 I encourage people who post to the replication website to collaborate on such endeavors.

We all know that science proceeds not only by the accretion of new facts but also by the weeding out of what was once falsely believed. I hope this new website will provide a place to discuss what is robust and what is not, to discover and report limiting conditions on our findings, and to provide more complete input to the meta-analyses that we so badly need, and, therefore, help us improve our theories. We should not feel attacked when other scientists report failures to replicate our work; it’s not an accusation that we did something wrong. Rather, we should see failures to replicate—and successful replications—first as compliments, because people thought our work was worth paying attention to and spending time on, and second as providing more pieces to the puzzle that is the field of psychology.


I would like to thank Tony Greenwald, Greg Mitchell, Brian Nosek, Hal Pashler, and Jeff Sherman for never-dull discussions of what can and should be done.


  • 1. Okay, my more recent data are scattered across computer files.

  • 2. Very occasionally, a major psychology journal will publish a systematic set of studies that fail to replicate some phenomenon.

  • 3. Of course, this selection is not exhaustive. For example, PPS published an entire special issue on ways of improving the practice of psychological science in January 2009.

  • 4. Here’s the disclaimer: This site has no affiliation with PPS or the Association for Psychological Science.

  • 5. Of course, such unpublished research would need to be evaluated as to quality by authors of the meta-analyses.

Spellman, B. (2012). Introduction to the Special Section: Data, Data, Everywhere . . . Especially in My File Drawer. Perspectives on Psychological Science, 7(1), 58–59. DOI: 10.1177/1745691611432124

Comment on Association for Research in Personality replication problem editorial

Dear Association for Research in Personality

Regarding your article entitled “Personality Psychology Has a Serious Problem (And so Do Many Other Areas of Psychology)”,

We agree wholeheartedly with your diagnosis of a major problem in publication practices in psychology. As you explain, any solution has to include a reduction in the systematic bias against publishing non-replications that now exists. Such a bias seems to be present in the editorial practices of all of the major psychology journals. In addition, discussions with colleagues lead us to believe that investigators themselves tend to lose interest in a phenomenon when they fail to replicate a result, partly because they know that publishing negative findings is likely to be difficult and writing the manuscript time-consuming. Given these biases, it seems inevitable that our literature and even our textbooks are filling with fascinating “findings” that lack validity.

To help address this problem, together with colleagues we have created a new website that allows psychology researchers to post brief notices of replication attempts (whether successful or unsuccessful). In designing the website, we put a premium on making the submission process quick and easy, in recognition of the fact that the incentives for posting are modest. The site has entered beta testing, and we hope readers of P will post studies from their file drawers and provide us with feedback regarding the site.


Alex O. Holcombe, School of Psychology, University of Sydney

Hal Pashler, Department of Psychology, University of California San Diego


The file drawer problem, c.1959

It’s a very old problem, and it’s time to fix it!

Ben Goldacre points out a paper from 1959 finding that 286 out of 294 psychology papers reported a positive result. The author of that paper came back 36 years later and found little had changed.

Publishing negative results is a pain with little reward. But we hope you can find time to post a summary at