What "No Evidence of Harm" Actually Means

Nulla Verba · 9 min read

"No evidence of harm."

You'll find this phrase in regulatory documents, product labels, and public health communications. It sounds reassuring. A vaccine was studied, no harm was found, therefore it's safe.

But the phrase can mean at least four very different things. Only one of them is reassuring.

Say you've lost your keys. You pat one pocket of a four-pocket jacket and say: "no evidence my keys are in my jacket." Technically true: you checked one pocket, but you know nothing about the other three. That's what "no evidence of harm" is often short for in safety reporting: we checked one pocket.


The Four Meanings

1. We looked and found nothing (Strong)

The study was designed to detect the outcome in question. It had adequate sample size, appropriate follow-up duration, and pre-specified endpoints. The outcome was measured. Nothing was found.

This is genuine evidence of safety, at least for that outcome at that detection threshold.

Example: A trial of 70,000 infants specifically designed to detect intussusception at 1:10,000, with active case finding, finds no signal. That's meaningful.1

This is rare.5 Most "no evidence of harm" claims are not this.

2. We looked but couldn't see (Weak)

The study measured the outcome, but lacked statistical power to detect realistic effect sizes. The sample was too small, or the event too rare, for absence of signal to mean much.

Example: A trial of 341 pregnant women reports "no serious adverse events." With N=341, you can only detect events occurring in roughly 1 in 50 people (2%). Anything rarer is invisible. "No evidence" here means "we couldn't have seen it if it existed at 1:100 or 1:1,000."

This is the most common meaning of "no evidence of harm" in vaccine safety.
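To make the power problem concrete, here is a back-of-envelope sketch for the N=341 example: the probability that a trial of that size observes even one case, as a function of the true event rate. It assumes independent events and ignores the harder problem of distinguishing an observed case from background rates, so if anything it overstates what such a trial can "detect":

```python
# Back-of-envelope power sketch for the N=341 example above.
# Assumes independent events at a fixed true rate. It ignores the
# harder problem of attributing an observed case to the vaccine,
# so it OVERSTATES what a trial of this size can actually detect.

def prob_at_least_one(n: int, rate: float) -> float:
    """Probability that a trial of n subjects observes >= 1 event."""
    return 1 - (1 - rate) ** n

n = 341
for denom in (50, 100, 1_000):
    p = prob_at_least_one(n, 1 / denom)
    print(f"true rate 1:{denom}: chance the trial sees any case = {p:.2f}")
```

At 1:1,000 the odds are that the trial sees nothing at all; and even where a lone case would likely appear, a single event against background noise is not a detectable safety signal.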

3. We didn't measure it (Worthless)

The outcome wasn't in the study design. It wasn't a pre-specified endpoint. Nobody looked.

Example: A trial tracks injection-site reactions and fever for 7 days. Someone asks about autoimmune conditions. The answer "no evidence of harm" means "we didn't measure autoimmune conditions," not "we measured them and found nothing."

This is common.

4. We measured at the wrong time (Worthless for that outcome)

The outcome was measured, but the follow-up duration was too short for the outcome type. A 7-day window cannot detect conditions that take weeks or months to manifest.

Example: A study follows infants for 42 days and reports "no neurological events." But conditions that manifest months or years later (whether metabolic, autoimmune, or developmental) cannot be detected in a 42-day window. The study cannot detect what it cannot observe.


Real Examples

Thimerosal (1999)

On July 9, 1999, the American Academy of Pediatrics and Public Health Service issued a joint statement about thimerosal in vaccines:

"There are no data or evidence of any harm caused by the level of exposure that some children may have encountered in following the existing immunization schedule." — AAP/PHS Joint Statement, MMWR 1999

Reassuring. But the same bodies later acknowledged:

"Limited data on toxicity from low-dose exposures to ethylmercury are available" — AAP Technical Report, 2001

Translation: We hadn't studied it. "No evidence of harm" meant "absence of studies," not "studies showing absence of harm."

The phrase "no evidence of harm" covered both "we looked and found nothing" and "we didn't look." The public heard the first meaning. The technical reality was the second.

Tdap in Pregnancy (2014)

The foundational trial for Tdap vaccination in pregnancy (NCT00707148) enrolled 48 pregnant women (33 vaccine, 15 placebo). Solicited adverse events were tracked for 7 days. The study reported "no Tdap-associated serious adverse events."

With N=48 and 7-day follow-up, this study can detect:

- Events occurring in >10-20% of subjects
- Reactions appearing within one week

It cannot detect:

- Anything rarer than roughly 1:5
- Neurological events (typical onset: weeks to months)
- Developmental effects (require years of follow-up)
- Autoimmune conditions (median onset: 1-4 weeks)

"No serious adverse events" meant "none observed in 48 people over 7 days." That's very different from "none exist."

Boostrix in Pregnancy (UK)

The UK's Summary of Product Characteristics for Boostrix states:

"There is no vaccine related adverse effect on pregnancy or on the health of the foetus/newborn child" — EMC SmPC

This statement rests primarily on trial NCT02377349: 341 pregnant women received the vaccine, 346 received placebo. Follow-up for serious adverse events: 2 months post-delivery.

With N=341, this trial can detect events at roughly 1:50 (2%). But the NNT to prevent one infant death from pertussis ranges from roughly 5,800 (outbreak year) to 64,000 (non-outbreak year).2

The trial can rule out very common harms. It cannot rule out harms at the frequency that would matter for net benefit calculations.
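The size of that mismatch can be put in numbers. A hedged back-of-envelope sketch, using the "rule of three" (zero events observed in N subjects bounds the true rate below roughly 3/N at 95% confidence) together with the NNT figures quoted above:

```python
# Hedged sketch: compare what the trial could rule out (rule-of-three
# bound) with the rate at which benefit accrues (1/NNT), using the
# figures quoted above. Illustrative arithmetic, not a power analysis.

n_trial = 341
floor = 3 / n_trial  # smallest excludable harm rate, ~1:114

for label, nnt in [("outbreak year", 5_800), ("non-outbreak year", 64_000)]:
    # how many times more frequent than the benefit a harm must be
    # before a trial of this size could rule it out
    ratio = floor * nnt
    print(f"{label}: harm must be ~{ratio:.0f}x the benefit rate to be excluded")
```

On these assumptions, a harm would have to occur dozens to hundreds of times more often than the benefit before this trial could exclude it.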


The 7-Day Window Problem

Many vaccine trials use a "solicited adverse event" window of 7-8 days. Different outcomes have different onset timelines:3

| Outcome Type | Typical Onset | 7-Day Window Detects? |
| --- | --- | --- |
| Injection site reactions | Hours to days | Yes |
| Fever, fatigue | Hours to days | Yes |
| Allergic reactions | Minutes to hours | Yes |
| Guillain-Barré syndrome | 2-6 weeks | No |
| Autoimmune conditions | 1-4 weeks | No |
| Narcolepsy | Weeks to months | No |
| Developmental effects | Years | No |

Short solicited windows are practical for detecting common acute reactions. The problem is when "no adverse events in 7 days" gets reported as "no adverse events," full stop.


A Historical Lesson

Every serious vaccine safety signal detected post-licensure was at a rate undetectable by pre-licensure trials:4

| Case | Rate | Detection Method | Pre-Licensure Trial Could Detect? |
| --- | --- | --- | --- |
| RotaShield / Intussusception | ~1:10,000 | VAERS (9 months post-licensure) | No (needed ~60K) |
| Pandemrix / Narcolepsy | ~1:18,000 | Clinical observation (10 months) | No (needed ~55K) |
| 1976 Swine Flu / GBS | ~1:100,000 | Passive surveillance (2 months) | No (needed ~300K) |

Pre-licensure trials are structurally incapable of detecting the kinds of rare events that have historically led to vaccine withdrawals. This isn't a flaw in any particular trial. It's a feature of how trials work: you can't detect 1:10,000 events in a trial of 1,000 people.

"No evidence of harm" in pre-licensure trials means "no evidence of common harm." It cannot mean "no evidence of rare harm."
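The structural point can be made concrete: at a true rate of 1:10,000, the most likely outcome of a small trial is zero cases. An illustrative sketch, assuming independent events:

```python
# Why pre-licensure trials miss rare events: at a true rate of
# 1:10,000, a 1,000-person trial most likely observes zero cases.
# Assumes independent events; illustrative only.

def prob_zero_events(n: int, rate: float) -> float:
    """Probability that a trial of n subjects observes no events at all."""
    return (1 - rate) ** n

for n in (1_000, 10_000, 30_000):
    p = prob_zero_events(n, 1 / 10_000)
    print(f"N={n}: chance of seeing zero cases at 1:10,000 = {p:.2f}")
```

A 1,000-person trial has roughly a 90% chance of seeing nothing; even a 30,000-person trial yields only a handful of cases, which must then be distinguished from background.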


The Checklist

When you encounter "no evidence of harm," ask:

1. What was the sample size?

- N=50 can detect ~1:10
- N=300 can detect ~1:100
- N=3,000 can detect ~1:1,000
- N=30,000 can detect ~1:10,000

If the sample is smaller than the rate you care about, "no evidence" tells you nothing.

2. What was the follow-up duration?

- 7 days catches acute reactions
- 6 weeks might catch some autoimmune onset
- Years are needed for developmental outcomes

If the follow-up is shorter than the outcome's typical onset, "no evidence" is meaningless for that outcome.

3. What outcomes were pre-specified?

- Was the outcome you care about actually measured?
- Or is "no evidence" really "not measured"?

4. What was the comparator?

- Saline placebo? (Can attribute effects to vaccine)
- Another vaccine? (Can only detect differences between vaccines)
- No control? (Can't attribute anything)
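The sample-size thresholds in item 1 roughly track the statistical "rule of three": if zero events are observed in N subjects, the 95% upper bound on the true event rate is about 3/N. A quick illustrative check:

```python
# The checklist's sample-size thresholds roughly track the "rule of
# three": zero events in N subjects puts a 95% upper bound of ~3/N
# on the true event rate. Illustrative heuristic, not a power calc.

def rule_of_three(n: int) -> float:
    """Approximate 95% upper bound on the event rate after 0 events in n."""
    return 3 / n

for n in (50, 300, 3_000, 30_000):
    bound = rule_of_three(n)
    print(f"N={n:>6}: rate bounded below ~1:{round(1 / bound)}")
```

So a clean result in a 300-person trial licenses only "rarer than about 1:100", nothing stronger.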


The Translation Guide

| What They Say | What It Often Means |
| --- | --- |
| "No evidence of harm" | "Harm not detected at our detection threshold" |
| "Well-tolerated" | "No common (>1-2%) adverse events observed" |
| "No serious adverse events" | "None in our sample over our follow-up period" |
| "Similar rates in both groups" | Only meaningful if control was inert |
| "Extensive safety record" | May mean decades of use, not decades of study |

The Point

"No evidence of harm" is not the same as "evidence of no harm."

The phrase should prompt questions: What was measured? For how long? With what power to detect? The answers determine whether "no evidence" is reassuring or meaningless.

When the sample is small, the follow-up is short, or the outcome wasn't measured, "no evidence of harm" means: we don't know.

The problem is not uncertainty. Uncertainty is inevitable in medicine. The problem is pretending different kinds of uncertainty are the same.

Honesty requires distinguishing "we looked and found nothing" from "we didn't look" or "we couldn't see." The current convention of lumping all three under "no evidence of harm" serves clarity poorly.

Notice the asymmetry: when ambiguity exists, it reliably cuts in the reassuring direction. The public hears "safe." When challenged, experts can retreat to "we only claimed no evidence in our specific study": a motte-and-bailey where the defensible claim is not the one that shaped decisions.


Pre-licensure trials aren't the whole safety system. A future post will examine what post-market surveillance can and cannot detect.

If you spot an error in my reasoning, data, or sources, tell me. I'll correct it publicly.


  1. The first rotavirus vaccine, RotaShield, was licensed in 1998 based on trials of ~10,000 infants. It was withdrawn 9 months later after VAERS detected intussusception at ~1:10,000, a rate the trials couldn't see. The successor vaccine RotaTeq was required to run the REST trial with ~70,000 infants specifically powered to detect intussusception. See: Vesikari et al., NEJM 2006

  2. Calculated from UKHSA 2024 surveillance data. See The NNT-Detection Gap for full methodology. 

  3. Onset timelines: GBS typically presents within 6 weeks of vaccination, with most cases within 2-3 weeks (Institute for Vaccine Safety). Autoimmune conditions show median onset of 1-3 weeks post-vaccination (Chen et al. 2022). Pandemrix-associated narcolepsy showed highest risk within 6 months, with elevated risk extending to 12 months (Miller et al. 2020). 

  4. RotaShield: ~1:10,000 intussusception rate, detected via VAERS, withdrawn October 1999 (CDC archived; Iskander et al. 2007). Pandemrix: ~1:18,400 attributable risk in children/adolescents, signal detected August 2010 (CDC archived; Sarkanen et al. 2018). 1976 Swine Flu: ~1:100,000 excess GBS risk, program suspended December 1976 after ~10 weeks (Schonberger et al. 1979; Sencer & Millar 2006). 

  5. Pre-licensure trials typically enroll ≤50,000 subjects, sufficient for efficacy but not for detecting events rarer than ~1:10,000. See Jacobson et al. 2001