The Sunday before the 2004 presidential election, the Green Bay Packers beat the Washington Redskins. This news was considered auspicious for the Kerry campaign. After all, every time the Washington team was defeated in its last home game before a presidential election, the incumbent party lost. Conversely, when Washington won, so did the party in power. This perfect record extended back to 1936, even before the team moved to Washington, stretching through fifteen elections.
As we now know, this record came to a crashing halt once the votes were counted. But even beforehand, it is likely that few observers treated the connection as more than an interesting coincidence. After all, it is very hard to construct a plausible cause-and-effect relationship between a football game and a presidential election.
It is useful to ask why such a series of coincidences should exist. After all, if the odds of the coincidence in any one election are 50-50, the chances of it repeating over fifteen elections are around one in 32,000. The answer lies in how such patterns are found in the first place. There are numerous possible combinations that can be searched in hopes of finding one that predicts an election. If the Redskins did a poor job of predicting the election, what about another team? If the last home game didn’t work, what about the last away game–or the last game played anywhere? Search far enough through all the possible combinations and it is no surprise to find two series that seem connected.
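For readers who want to see the arithmetic, here is a minimal Python sketch of our own, not part of the original argument. It reproduces the one-in-32,000 figure and then, under the simplifying assumption that candidate series are independent, shows how quickly a "perfect" predictor turns up once many series are searched; the candidate counts are hypothetical.

```python
# A back-of-the-envelope check of the arithmetic above.  The 50-50 assumption
# is the article's; the candidate-series counts below are hypothetical.
p_single = 0.5 ** 15  # one arbitrary series matching fifteen straight elections
print(f"one series, fifteen elections: 1 in {1 / p_single:,.0f}")  # ~1 in 32,768

# Now suppose analysts comb through many candidate series (other teams, away
# games, and so on).  Treating the candidates as independent for simplicity,
# the chance that at least one matches all fifteen elections grows quickly.
for n_candidates in (100, 1_000, 10_000, 50_000):
    p_any = 1 - (1 - p_single) ** n_candidates
    print(f"{n_candidates:>6} candidate series: {p_any:.1%} chance of a perfect match")
```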
Recently, education has seen a number of studies that show, with statistical confidence, that:
• Students in charter schools get lower test scores–and higher ones.
• The No Child Left Behind law reduced student achievement–and increased it.
• Small classes increase achievement–and they don’t.
• States with high-stakes testing do better–and they do worse.
And many other similar examples.
When these studies are released, critics–usually those who don’t like the conclusions–find grounds to argue with the study methodology. They may question the comparability of the control group, for instance.
Yet, as the Redskins story underlines, there may be broader reasons to treat the results with skepticism, if the published studies represent a small and selective subset of all studies that could be done. Even if the samples used in any one study are unimpeachably random, the results could easily be misleading if the decision to publish is not similarly random. And there is every indication that the publishing decision is heavily influenced by what the study finds.
To illustrate, assume twenty studies of equal quality are conducted on the effect a policy has on student achievement. One study shows a statistically significant positive effect and one a significant negative effect; the rest show an insignificant effect. Which of these studies is likely to be published?
It is extremely unlikely that any journal editor would publish all twenty. They would fear losing their readership if they allowed a single topic to dominate their publication. Yet it takes all twenty to get a balanced view of the state of the research.
Of the twenty, the eighteen not showing an effect are least likely to be published. Editors and reviewers would regard them as the least interesting of the lot.
If, as seems likely, the policy under review is part of the educational culture wars, editors and reviewers are likely to be influenced by whether they like the results. Both sides may view the eighteen studies that show no effect as undermining their position.
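The thought experiment is easy to simulate. The sketch below is our illustration rather than a reconstruction of any of the studies above; the sample size, score spread, and 5% threshold are assumptions. It runs twenty studies of a policy that in truth does nothing and "publishes" only the ones that happen to reach statistical significance.

```python
import math
import random

random.seed(1)

N_STUDIES = 20     # as in the thought experiment above
N_STUDENTS = 400   # hypothetical sample size per study
SIGMA = 15         # hypothetical spread of score changes
CRITICAL_Z = 1.96  # two-sided 5% significance threshold

published = []
for study in range(N_STUDIES):
    # The policy truly has no effect: score changes are centred on zero.
    changes = [random.gauss(0, SIGMA) for _ in range(N_STUDENTS)]
    mean = sum(changes) / N_STUDENTS
    z = mean / (SIGMA / math.sqrt(N_STUDENTS))
    if abs(z) > CRITICAL_Z:  # "interesting" enough to publish
        published.append(("positive" if z > 0 else "negative", round(z, 2)))

print(f"{N_STUDIES} studies run, {len(published)} deemed publishable: {published}")
# Readers see only the significant findings; the null results never appear,
# so the published record looks nothing like the full set of twenty studies.
```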
Other factors may influence the publishing decision as well. Whether the publication recently published similar research may have an effect. In some cases, particularly with journals published by think tanks, the authors’ connections may make the difference.
Yet to truly understand what “the research shows”–and what it doesn’t show–one needs to know about the dogs that didn’t bark as well as those that did.
To get a more complete picture of what the research shows, one role we hope to play is as a publishing outlet for well-constructed research on educational programs and policies. We are especially interested in studies that don’t fit the editorial biases of other journals. We would like to hear from others interested in participating, particularly as reviewers.
We hope to encourage research reports stripped of literature reviews. Too often we have to wade through pages of background, including summaries of previous research, before finding what the researchers actually did: how the students were chosen, what programs the control and treatment groups had, whether teachers in the control group had the same level of training and support, the numbers of students and teachers involved, attrition rates, and so on. We find that good literature reviewing is quite a different skill, and one usually better handled as an enterprise separate from reporting on any individual study.
How would the connection between the Redskins and the election have fared if judged by the standards of educational research? Pretty well, as it turns out. Even if the 2004 and 1932 results, where the pattern broke down, are included, a regression analysis gives a very impressive p-value of .00007, allowing it to easily pass any test of statistical significance.
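The .00007 figure comes from the regression just described, which we do not reproduce here. A cruder sign-test sketch of our own (reading the passage as fifteen elections that fit the pattern plus the two that broke it, an assumption rather than a figure stated above) makes the same point: such a record sails past conventional significance thresholds.

```python
from math import comb

n_elections = 17  # assumed: fifteen that fit the pattern plus 1932 and 2004
n_hits = 15       # elections where the football result "called" the outcome

# One-sided binomial tail: the chance of at least this many hits if each
# election were a 50-50 coin flip with respect to the football result.
p_value = sum(comb(n_elections, k)
              for k in range(n_hits, n_elections + 1)) / 2 ** n_elections
print(f"P(>= {n_hits} hits in {n_elections} coin flips) = {p_value:.5f}")
# Roughly 0.001 -- not the regression's figure, but still far below any
# conventional threshold, which is the point of the comparison.
```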