Can we trust the scientific method?
Let's do a study!
We'll do a randomized controlled trial (RCT), which is the gold standard in many fields of science.
Do Malachite crystals prevent malware infections?
Study design (RCT, part 1)
- Take a group of 20 computer users.
- Split them randomly into two groups.
Study design (RCT, part 2)
- Give one group a malachite crystal to put on their desk.
- Give the other group a fake malachite crystal that cannot be easily distinguished from a real one (control group).
- After 6 months check how many malware infections they had.
Simulate study with random data
import os
import numpy
from scipy import stats

# Ten random "malware infection counts" (0-3) per group - pure noise, no effect.
a = [float(os.urandom(1)[0] % 4) for _ in range(10)]
b = [float(os.urandom(1)[0] % 4) for _ in range(10)]
print("%s\n%s" % (a, b))
t, p = stats.ttest_ind(a, b)
print("%.2f;%.2f;%.2f" % (numpy.mean(a), numpy.mean(b), p))
A p-value is the probability of getting a result at least as extreme as the one observed,
by chance alone, assuming there is no real effect (and idealized conditions).
In many fields of science p<0.05 is considered significant.
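To make this concrete, here is a minimal sketch (assuming numpy and scipy are available, as in the simulation above) that runs many "studies" on pure noise and counts how often the t-test comes out significant; the loop size and seed are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Run many "studies" where both groups are drawn from the same
# distribution, i.e. there is no real effect at all.
n_studies = 10_000
false_positives = 0
for _ in range(n_studies):
    a = rng.integers(0, 4, size=10)  # group with the "real" crystal
    b = rng.integers(0, 4, size=10)  # control group
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

# Roughly 5% of the studies are "significant" by chance alone.
print(false_positives / n_studies)
```

So even with everything done correctly, about one in twenty null studies will clear the p<0.05 bar.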
Run the simulation above a few times and you get p<0.05: a "significant" result created out of purely random data.
What is stopping scientists from doing this?
Let's look at a real example: SSRIs (Antidepressants)
Publication Bias and Antidepressants
- 74 studies on SSRIs, data from the FDA.
- 37 out of 38 studies with positive results published.
- 14 out of 36 studies with negative results published, of those 11 claimed a positive outcome.
Turner et al. 2008, NEJM
With publication bias you can create results out of nothing.
But it's not efficient: at p<0.05 you need about 20 studies on average before one comes out "significant" by chance.
How to interpret our results?
In a scientific study many decisions have to be made:
- What to do with dropouts?
- What to do with corner-case results?
- What exact outcome are we looking for?
- What variables do we control for?
Each of these decisions has a small impact on the result.
Even if there is no real effect, one of these variations may skew the numbers enough to reach significance.
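The decisions listed above can be simulated. The following sketch (scipy/numpy assumed; the four "analysis variants" are illustrative, not from any real study) compares one pre-chosen analysis against the practice of trying several defensible-looking variants and keeping whichever works:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n_studies = 5000
naive_hits = 0   # significant with the one pre-chosen analysis
hacked_hits = 0  # significant if *any* of several variants works
for _ in range(n_studies):
    a = rng.normal(size=20)
    b = rng.normal(size=20)
    # Several defensible-looking analyses of the same null data:
    pvalues = [
        stats.ttest_ind(a, b).pvalue,                          # keep everyone
        stats.ttest_ind(a[:15], b[:15]).pvalue,                # "dropouts" removed
        stats.ttest_ind(a[abs(a) < 2], b[abs(b) < 2]).pvalue,  # "outliers" removed
        stats.mannwhitneyu(a, b).pvalue,                       # different test
    ]
    if pvalues[0] < 0.05:
        naive_hits += 1
    if min(pvalues) < 0.05:
        hacked_hits += 1

print(naive_hits / n_studies)   # stays near 0.05
print(hacked_hits / n_studies)  # noticeably inflated
```

Each individual variant looks reasonable on its own; the inflation comes only from having the freedom to pick among them after seeing the data.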
This may be a subconscious process
- Scientists don't start and say: "Today I'm gonna p-hack my result."
- They may subconsciously favor decisions that look like they may lead to the result they expect.
What stops scientists from p-Hacking?
The scientific method is a way to create evidence for whatever theory you like.
A lot of things were wrong with this study.
Psychology is facing a Replication Crisis
Many effects of psychology that were considered facts failed to replicate.
Don't be too snarky about psychologists. Your field is probably not any better. You just don't know yet.
Other fields have a replication crisis as well
Pharma company Amgen failed to replicate 47 out of 53 preclinical cancer studies in 2012.
(Though there are a few problems with this result.)
Some fields don't have a replication problem - because nobody is even trying to replicate.
What can be done about all this?
The scientific process from analysis to publication needs to be decoupled from its results.
Announce in a public registry what you plan to do in your research.
Later people can check if you published your results and if you changed your research on the way.
This is typically done in drug trials.
It doesn't work very well - but it's better than nothing.
We know Big Pharma is bad
But think about this: Whenever you read about problems in drug trials you should consider that most
other fields don't do preregistration at all.
Right now there's a trend that people from computer science want to change medicine
(Big Data / ML).
Some people in medicine are very worried about this - because the computer science people bring their
weak scientific standards with them.
Turn the scientific publication process upside down.
- First publish a protocol for your experiment to a scientific journal.
- Journal decides on publication based on the protocol before the results are in.
- Publish results - independent of outcome.
- Sharing of data, code, methods.
- Large-scale collaboration (one well-designed large study is better than many small ones).
- Higher statistical threshold (p<0.05 means practically nothing).
How's my field doing?
- Are statistical results preregistered in any way?
- Are negative results usually published?
- Are there independent replications of all relevant results?
If you answer all these questions with "No" you are probably not doing science.
You're the alchemists of our time.
Existing incentives (citation counts, Impact Factor) strongly favor interesting results - not correct results.
Isn't science self-correcting?
If you confront scientists with evidence for Publication Bias and p-hacking -
surely they'll immediately change their practices. That's what scientists do, right?
There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published. Such research being unknown to other investigators may be repeated independently until eventually by chance a significant result occurs—an “error of the first kind”—and is published. Significant results published in these fields are seldom verified by independent replication. The possibility thus arises that the literature of such a field consists in substantial part of false conclusions resulting from errors of the first kind in statistical tests of significance.
Sterling 1959, Journal of the American Statistical Association
This article presents evidence that published results of scientific investigations are not a representative sample of results of all scientific studies. [...] These results also indicate that practices leading to publication bias have not changed over a period of 30 years.
Sterling 1995, The American Statistician
If science is self-correcting it's pretty damn slow in doing so.
Are you prepared for boring science?
There is a choice between TED-talk science and boring science.
TED-talk science:
- Mostly positive and surprising results.
- Large effects.
- Many citations.
- Media attention.
- You may be able to give a TED talk about it.
- Usually not true.
Boring science:
- Mostly negative results.
- Small effects.
- Closer to the truth.
I prefer boring science.
But this is a tough sell.