2 of 102
The last post ended with a checklist. Question four was: What was the comparator?
Aluminium adjuvants have been used in vaccines since 1926.[1] They're in most childhood vaccines on the UK and Italian schedules, starting in the first months of life, with multiple doses in the first year.[2][13]
The standard reassurance: aluminium adjuvants have been used for a century, in billions of doses. But a track record of use is not a track record of testing.
So: has aluminium adjuvant been tested for safety on its own, against an inert placebo?
In 2022, a team at the Copenhagen Trial Unit published the most comprehensive answer to that question.[3]
The Systematic Review
Krauss, Jefferson, and colleagues searched 11 databases in 6 languages, from 1946 through June 2021. They screened 15,446 records and assessed 396 full-text articles. They found 102 randomised controlled trials, enrolling 26,457 participants, that compared an aluminium-containing product to a placebo or no intervention.[4]
102 trials sounds like a substantial evidence base. But the review classified each trial by what the control arm actually received. Of those 102, only 2 compared aluminium alone, with no vaccine antigens, against an inert placebo.
What the Other 100 Compared
The review's Supplementary Material 2 contains the detailed characteristics of all 102 trials, each with 2-3 pages documenting methods, interventions, and risk of bias.[5] Walking through the entries reveals a pattern.
Take Adler 2019, a cytomegalovirus vaccine study. One arm received the vaccine with aluminium phosphate; the control arm received the same vaccine without it. This isolates the aluminium, but only against a background of vaccine antigens. Any shared effect is invisible.
This is the dominant pattern across the 102 trials. The "placebo" in most trials was not saline. It was the vaccine minus the aluminium, or a different vaccine, or aluminium with a different antigen.
These trials can tell you whether adding aluminium to a vaccine changes the safety profile relative to that vaccine alone. They cannot tell you what aluminium does compared to nothing.
Some of those 100 did include a saline placebo arm. Bernstein 2008, for example, tested an H5N1 vaccine across 9 arms in 394 adults, with a saline group of 29.[8] But it had no aluminium-alone arm. The aluminium was always mixed with vaccine antigens. Comparing vaccine-with-aluminium to saline tests the whole vaccine, not the aluminium. Without a matching arm that contains aluminium and no antigens, a saline placebo doesn't help answer this question.
The Two Trials
Basavaraj 2014
An Indian Phase I trial of HNVAC, an H1N1 influenza vaccine made by Bharat Biotech.[6] The trial had four arms: vaccine with aluminium hydroxide (n=60), vaccine without aluminium (n=60), aluminium hydroxide in phosphate buffer alone (n=20), and phosphate buffer alone (n=20).
Phosphate buffer is not the same as plain saline. It's saline with added phosphate salts to maintain pH, used as a standard carrier liquid in injectable preparations. It contains no active pharmaceutical ingredient, but it's not the simplest possible placebo either. That last comparison, aluminium-in-phosphate-buffer versus phosphate buffer alone, has no vaccine antigens in either arm. The only difference is the aluminium.
Twenty subjects per arm.
Landrum 2017
A US military Phase I trial of experimental Staphylococcus aureus vaccine antigens at two sites (San Antonio and Portsmouth).[7] Subjects in each of 11 cohorts were randomised 6:1:1 to receive vaccine, aluminium placebo, or normal saline placebo. The aluminium placebo was 800 μg/mL aluminium hydroxide in phosphate buffered saline. The saline placebo was sterile 0.9% normal saline.
Twenty-two subjects received aluminium. Twenty-two received saline.
This is the purest comparison in the literature: aluminium in saline versus saline, with identical injection procedure, in a double-blind trial.
Jefferson and Krauss extracted only these two placebo arms from Landrum. They did not use the vaccine arms. Their Supplementary Material 2 entry reads: "Vaccine type: no vaccine. Only placebo arm extracted in this review."[5]
The Evidence Base
Combining the two trials with aluminium-alone versus inert placebo:
| Trial | Al arm | Inert arm | Total | Follow-up |
|---|---|---|---|---|
| Basavaraj 2014 | 20 | 20 | 40 | Single dose, 6 weeks |
| Landrum 2017 | 22 | 22 | 44 | Single dose, 84 days |
| Combined | 42 | 42 | 84 | |
Eighty-four subjects. That is the entire randomised evidence base for the safety of aluminium adjuvant compared to an inert placebo, across a century of use.
With 42 subjects per arm, even a clean observation of zero serious events would only give a 95% upper bound of 3/42 ≈ 7% (the rule of three). That is the ceiling: events rarer than 1 in 14 cannot be ruled out. The floor is not zero either: under maximum ignorance (a uniform prior, which gives Laplace's rule of succession), the expected rate given zero events in 42 subjects is (0 + 1)/(42 + 2), or 1 in 44. But we do not have that clean zero. Neither trial reported serious adverse events separately for the aluminium and inert arms.
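The arithmetic behind those two numbers can be sketched in a few lines of Python. This is an illustration, not anything taken from either trial: the function names are mine, and the "exact" bound is the standard one-sided binomial bound that the rule of three approximates.

```python
def rule_of_three(n):
    """Approximate 95% upper bound on the event rate after 0 events in n subjects."""
    return 3 / n


def exact_upper_bound(n, alpha=0.05):
    """Exact one-sided bound: the rate p at which (1 - p)**n equals alpha."""
    return 1 - alpha ** (1 / n)


def laplace_estimate(events, n):
    """Posterior mean of the rate under a uniform prior (rule of succession)."""
    return (events + 1) / (n + 2)


n = 42  # subjects per arm across the two trials
print(f"rule of three: {rule_of_three(n):.3f}")        # 0.071
print(f"exact bound:   {exact_upper_bound(n):.3f}")    # 0.069
print(f"Laplace:       {laplace_estimate(0, n):.4f}")  # 0.0227
```

The exact bound and the shortcut agree to within a fraction of a percentage point at this sample size, so the "1 in 14" ceiling does not depend on which one you use.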
For context: the adverse events that led to historical vaccine withdrawals, all detected through post-market surveillance rather than trials, occurred at rates of 1:10,000 (RotaShield intussusception[9]), 1:18,000 (Pandemrix narcolepsy[10]), and 1:100,000 (1976 swine flu Guillain-Barré[11]). To have a 95% chance of seeing even one 1:10,000 event, you need roughly 30,000 subjects per arm.
84 is not 30,000.
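Where the 30,000 comes from: it is the rule of three run backwards. A minimal sketch, assuming the only goal is a 95% chance of observing at least one event in an arm (attributing that event to the product would take more); `subjects_needed` is a hypothetical helper name.

```python
import math


def subjects_needed(rate, confidence=0.95):
    """Smallest n per arm such that P(at least one event) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate))


# Rates quoted above for the three historical withdrawals.
for label, rate in [("1:10,000", 1 / 10_000),
                    ("1:18,000", 1 / 18_000),
                    ("1:100,000", 1 / 100_000)]:
    print(f"{label:>10}: {subjects_needed(rate):,} subjects per arm")

# Chance that a 42-subject arm shows even one 1:10,000 event.
print(f"P(>=1 event in 42): {1 - (1 - 1 / 10_000) ** 42:.4f}")
```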
The Buried Table
Landrum's published paper pooled the aluminium and saline placebo groups into a single "Placebo" column (N=44) for its safety table. The rationale, in the statistical methods section: "Initial analyses found reactogenicity and adverse event rates were similar for saline and alum placebo recipients. Therefore, pooled results from all placebo recipients are presented."
But the supplementary material preserves the separated data.[12] Tables 3 and 4 break out reactogenicity by treatment group, including Alum placebo (N=22) and Saline placebo (N=22) as separate columns. These reactions were measured during a 7-day solicited window after injection:
| Local reaction | Alum (N=22) | Saline (N=22) |
|---|---|---|
| Tenderness | 9 (41%) | 4 (18%) |
| Pain/Ache | 9 (41%) | 7 (32%) |
| Redness | 2 (9%) | 1 (5%) |
| Heat | 2 (9%) | 0 (0%) |

| Systemic reaction | Alum (N=22) | Saline (N=22) |
|---|---|---|
| Headache | 5 (23%) | 3 (14%) |
| Muscle aches | 5 (23%) | 2 (9%) |
| Nausea | 2 (9%) | 0 (0%) |
| Fatigue | 2 (9%) | 3 (14%) |
No single comparison reaches statistical significance on a Fisher's exact test (the test the paper used). With 22 per arm, almost nothing could. You would need the aluminium rate to be more than triple the saline rate before a pairwise test has reasonable power to detect it.
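To see how little a single 22-versus-22 comparison can show, here is a sketch of Fisher's exact test applied to each row of the tables above. The two-sided convention used (sum the probabilities of all tables no more likely than the observed one) is an assumption on my part; the paper does not say which variant it used.

```python
from math import comb


def fisher_two_sided(events_a, n_a, events_b, n_b):
    """Two-sided Fisher exact p-value for events_a/n_a versus events_b/n_b."""
    total_events = events_a + events_b
    n = n_a + n_b
    denom = comb(n, n_a)

    def prob(k):
        # Hypergeometric probability of k events landing in arm A.
        return comb(total_events, k) * comb(n - total_events, n_a - k) / denom

    lo = max(0, total_events - n_b)
    hi = min(total_events, n_a)
    p_obs = prob(events_a)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# (alum events, saline events), 22 subjects per arm, from the tables above.
landrum = {
    "Tenderness": (9, 4), "Pain/Ache": (9, 7), "Redness": (2, 1), "Heat": (2, 0),
    "Headache": (5, 3), "Muscle aches": (5, 2), "Nausea": (2, 0), "Fatigue": (2, 3),
}
p_values = {name: fisher_two_sided(a, 22, b, 22) for name, (a, b) in landrum.items()}
for name, p in p_values.items():
    print(f"{name:13s} p = {p:.2f}")
```

Every row comes out above 0.05, including tenderness, where the aluminium rate is more than double the saline rate.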
But look at the pattern. Aluminium is higher in 7 of 8 comparisons. The only exception is fatigue. Across all 8 categories, the aluminium group reported 36 total events to saline's 20, a ratio of 1.8 to 1. A Cochran-Mantel-Haenszel test, which combines multiple small comparisons into one, gives a combined odds ratio[16] of 2.1 (p = 0.016) across all 8 reactions. For local reactions alone: odds ratio 2.3 (p = 0.044). (Reproducible analysis)
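For readers who want to check the aggregate by hand, here is a minimal standard-library sketch of the Cochran-Mantel-Haenszel calculation. It is not the linked analysis script. It assumes the test is run without a continuity correction and that each reaction category is treated as one stratum, and the function name `cmh` is mine.

```python
import math


def cmh(strata):
    """strata: list of (a, n1, c, n2) = events and arm size for arm 1, then arm 2.

    Returns (Mantel-Haenszel odds ratio, chi-square statistic, p-value).
    No continuity correction is applied (an assumption, see the lead-in).
    """
    num = den = diff = var = 0.0
    for a, n1, c, n2 in strata:
        b, d, n = n1 - a, n2 - c, n1 + n2
        m1 = a + c                      # total events in this stratum
        num += a * d / n                # Mantel-Haenszel OR numerator
        den += b * c / n                # Mantel-Haenszel OR denominator
        diff += a - n1 * m1 / n         # observed minus expected events in arm 1
        var += n1 * n2 * m1 * (n - m1) / (n ** 2 * (n - 1))
    chi2 = diff ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return num / den, chi2, p


ARM = 22  # subjects per arm in Landrum's supplementary tables
local = [(9, ARM, 4, ARM), (9, ARM, 7, ARM), (2, ARM, 1, ARM), (2, ARM, 0, ARM)]
systemic = [(5, ARM, 3, ARM), (5, ARM, 2, ARM), (2, ARM, 0, ARM), (2, ARM, 3, ARM)]

or_all, chi2_all, p_all = cmh(local + systemic)
or_local, chi2_local, p_local = cmh(local)
print(f"all 8 reactions: OR {or_all:.1f}, p = {p_all:.3f}")
print(f"local only:      OR {or_local:.1f}, p = {p_local:.3f}")
```

Under those assumptions the sketch should land on the figures quoted above. Adding a continuity correction would push the p-values up somewhat, which is one more reason to read them as approximate.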
The paper broke the data into eight small categories, tested each one separately, found none significant, and called the groups "similar." The aggregate tells a different story. This is a textbook case of what happens when you split a small sample into many underpowered subgroups: each comparison fails individually, but the consistent direction across all of them is not random noise.
Adjuvants are designed to provoke local immune activation. Whether that activation stays local, what else it targets, and what happens beyond the first few days are open questions. To treat the local difference as "expected" and therefore harmless is to mistake no evidence of harm for evidence of no harm. But even within these two trials, the pattern extends beyond the injection site. Headache, muscle aches, nausea: these are systemic reactions, not explained by localised inflammation. Analysing systemic reactions alone, the combined odds ratio across both trials is 2.7 (p = 0.011).[15]
And this is only the first seven days. Landrum tracked adverse events through Day 84, with a follow-up safety call at Day 168, but pooled the aluminium and saline groups for all reporting beyond the 7-day window. The separated breakdown was not published. The 7-day signal is a floor, not a ceiling: by Day 8 the two groups had been merged into one.
Twenty-two subjects. Seven days of separated data. That is what we can see.
The Same Pattern
Basavaraj's Phase I data is not buried. It is in Table 2 of the published paper. The two placebo arms, aluminium hydroxide in phosphate buffer (N=20) versus phosphate buffer alone (N=20), are listed side by side. Local reactions were monitored for 7 days, systemic reactions for 21 days, with safety follow-up to 6 weeks.[6]
| Reaction | Al placebo (N=20) | Buffer placebo (N=20) |
|---|---|---|
| Pain at injection site | 4 (20%) | 1 (5%) |
| Fever | 4 (20%) | 1 (5%) |
| Headache | 3 (15%) | 1 (5%) |
| Lower limb pain | 1 (5%) | 0 (0%) |
| Malaise | 1 (5%) | 0 (0%) |
| Nausea | 1 (5%) | 0 (0%) |
Aluminium is higher in 6 of 6 non-zero comparisons. Not a single reaction favours the buffer-only group. Total events: 14 in the aluminium arm, 3 in the buffer arm, a ratio of 4.7 to 1. A Cochran-Mantel-Haenszel test across all 6 comparisons: odds ratio 5.5 (p = 0.005). (Reproducible analysis)
The paper's conclusion: "Aluminium hydroxide did not have a significant effect either on immunogenicity or on reactogenicity."
Two Trials, One Pattern
These are the only two trials that compared aluminium adjuvant to an inert placebo. They were conducted independently, in different countries, with different vaccines, different populations, and different designs. Together they provide 14 non-zero reaction comparisons across 84 subjects.
Aluminium is higher in 13 of 14. The only exception is fatigue in Landrum (2 vs 3). Within each study alone, aggregating across categories is significant: p = 0.016 for Landrum, p = 0.005 for Basavaraj. Combining all 14 strata across both trials: odds ratio 2.6 (p = 0.0005).[17] A sign test on the direction alone: 13 of 14 comparisons with more events in the aluminium arm, one-sided p = 0.0009. The sign test requires no distributional assumptions about the size of the differences, though like the pooled test it treats the 14 comparisons as independent, and the same subjects contribute to several categories. (Combined analysis)
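The sign test is simple enough to verify directly. A minimal sketch, assuming the null hypothesis that each of the 14 comparisons is equally likely to go either way and that they are independent of one another:

```python
from math import comb


def sign_test_one_sided(successes, n):
    """P(at least `successes` of n fair coin flips come up heads)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


p_one_sided = sign_test_one_sided(13, 14)  # (14 + 1) / 2**14
p_two_sided = min(1.0, 2 * p_one_sided)
print(f"one-sided p = {p_one_sided:.4f}")  # 0.0009
print(f"two-sided p = {p_two_sided:.4f}")  # 0.0018
```

The 0.0009 figure is the one-sided tail. Doubling it for a two-sided test still leaves it well under 0.01.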
The 2022 systematic review that identified these as the only two aluminium-vs-placebo trials did not catch this.[3] It classified all 102 trials by control type, correctly flagged these two as the only clean comparisons, then pooled them with the other 100 for its meta-analysis. It accepted both papers' "no significant difference" at face value. It did not aggregate across reaction categories within each trial, and did not notice that the two trials virtually replicate each other.
What the Evidence Shows
Aluminium adjuvant causes more reactions than an inert placebo. The combined odds ratio across both trials is 2.6 (p = 0.0005), driven by injection-site pain, tenderness, fever, and headache. This is not ambiguous. Two independent trials, in different countries, with different designs, found the same thing. Every paper that looked at this data declared "no significant difference." The data says otherwise.
That is what was measured. Here is what was not.
Both trials enrolled healthy adults aged 18-55/65. Both gave a single dose. Landrum measured reactions for 7 days, then pooled the groups. Basavaraj measured local reactions for 7 days and systemic reactions for 21 days, with safety follow-up to 6 weeks. Landrum tracked serious adverse events through Day 84 but reported them only for the pooled placebo group (1 SAE in 44 combined subjects, arm unknown). Basavaraj claimed "did not cause serious AEs" without defining the term or showing data. Neither trial was designed to detect rare or delayed harms.
With 42 subjects per arm, the rule of three gives a 95% upper bound of 7%, but only for outcomes the trial was actually looking for. That tells you nothing about what nobody measured.
The evidence base cannot speak to:
- Events rarer than 1 in 14
- Neurological outcomes
- Autoimmune conditions
- Developmental effects
- Effects in children, pregnant women, or the elderly
- Effects of repeated doses
- Long-term biopersistence (how long aluminium remains in tissue after injection)
Aluminium adjuvant is given to nearly every infant on the UK schedule, beginning at 8 weeks of age, with multiple doses in the first year. The two trials that tested it against an inert placebo enrolled adults only, gave one dose, and followed up for weeks. They found more adverse reactions in that narrow window. Nobody has looked further.
The Point
Two things are true. Aluminium adjuvant causes more common reactions than a placebo, and we have almost no evidence about anything beyond common reactions.
The 100 other trials provide some safety information, but they answer a different question.[14] Post-market surveillance watches for signals, but without a placebo comparison it cannot attribute causation, and its track record is strongest for acute, distinctive events rather than incremental increases in common conditions. Two trials asked the direct question. They found a signal. They missed it in their own data.
For anything administered to billions of people, starting at 8 weeks of age, you would want a safety evidence base larger than 84 adults followed for weeks. The evidence base for aluminium adjuvant against inert placebo is not that.
This is one of those pockets from the jacket that nobody checked. When someone finally did check, the data was already there.
If you spot an error in my reasoning, data, or sources, tell me. I'll correct it publicly.
1. Glenny AT, Pope CG, Waddington H, Wallace U. "Immunological notes. XVII-XXIV." J Pathol Bacteriol 1926;29:31-40. Demonstrated that diphtheria toxoid precipitated with aluminium potassium sulphate ("potash alum") produced stronger antibody responses than soluble toxoid alone.
2. "Complete routine immunisation schedule from January 2026." UK Health Security Agency / GOV.UK. gov.uk. The 8-week vaccines include DTaP/IPV/Hib/HepB (6-in-1) and MenB, both aluminium-adjuvanted.
3. Krauss SR, Barbateskovic M, Klingenberg SL, et al. "Aluminium adjuvants versus placebo or no intervention in vaccine randomised clinical trials: a systematic review with meta-analysis and Trial Sequential Analysis." BMJ Open 2022;12:e058795. doi:10.1136/bmjopen-2021-058795. PMC9226993.
4. Supplementary Material 1 (PDF, 5.2MB): search strategies across 11 databases, PRISMA flow diagram, risk of bias summary for all 102 trials, and forest plots.
5. Supplementary Material 2 (PDF, 153MB): complete characteristics of all 102 included trials (251 pages, image-based). Methods, participants, interventions, outcomes, and risk of bias assessments for each trial.
6. Basavaraj VH, Sampath G, Hegde NR, et al. "Evaluation of safety and immunogenicity of HNVAC, an MDCK-based H1N1 pandemic influenza vaccine, in Phase I single centre and Phase II/III multi-centre, double-blind, randomized, placebo-controlled, parallel assignment studies." Vaccine 2014. doi:10.1016/j.vaccine.2014.05.039. CTRI registrations: CTRI/2010/091/000152, CTRI/2010/091/000661.
7. Landrum ML, Lalani T, Niknian M, et al. "Safety and immunogenicity of a recombinant Staphylococcus aureus α-toxoid and a recombinant Panton-Valentine leukocidin subunit, in healthy adults." Hum Vaccin Immunother 2017;13(4):791-801. doi:10.1080/21645515.2016.1248326. NCT01011335.
8. Bernstein DI, Edwards KM, Dekker CL, et al. "Effects of Adjuvants on the Safety and Immunogenicity of an Avian Influenza H5N1 Vaccine in Adults." J Infect Dis 2008;197:667-75. doi:10.1086/527489. NCT00280033.
9. Estimated rate of ~1 per 10,000 vaccine recipients. Zanardi LR, Haber P, Mootrey GT, et al. "Intussusception among recipients of rotavirus vaccine: reports to the Vaccine Adverse Event Reporting System." Pediatrics 2001;107(6):E97. PMC2094741.
10. Estimated rate of ~1 per 18,000 vaccinated children/adolescents. Sarkanen TO, Alakuijala APE, Dauvilliers YA, Partinen MM. "Incidence of narcolepsy after H1N1 influenza and vaccinations: Systematic review and meta-analysis." Sleep Med Rev 2018;38:177-186. PMC4962758.
11. Estimated rate of ~1 per 100,000 vaccinees. Schonberger LB, Bregman DJ, Sullivan-Bolyai JZ, et al. "Guillain-Barré syndrome following vaccination in the National Influenza Immunization Program, United States, 1976-1977." Am J Epidemiol 1979;110(2):105-123. doi:10.1093/oxfordjournals.aje.a112795.
12. Landrum et al. Supplementary Material, Tables 3-4. Available from the publisher.
13. "Calendario vaccinale." Ministero della Salute. salute.gov.it. The Italian schedule begins hexavalent (DTaP/IPV/HepB/Hib) and pneumococcal vaccines at 3 months, with meningococcal B from 3-4 months, all aluminium-adjuvanted. Part of the Piano Nazionale di Prevenzione Vaccinale (PNPV) 2023-2025.
14. The Krauss/Jefferson review pooled safety data across all 102 trials and found no significant increase in serious adverse events (RR 1.18, 95% CI 0.97-1.43) and a small significant increase in non-serious events (RR 1.13, 95% CI 1.07-1.20). But in 100 of those 102 trials, the control arm also contained aluminium, vaccine antigens, or both. When both arms share the same adjuvant, any aluminium-specific effect cancels out. These pooled results tell you whether adding aluminium changes a vaccine's safety profile relative to that vaccine alone. They cannot detect effects of aluminium compared to nothing. In fact, the significant increase in non-serious events (RR 1.13) is consistent with a real aluminium effect that persists even in comparisons designed to dilute it.
15. Analysing systemic reactions alone (excluding injection-site tenderness, pain, redness, and heat), the combined odds ratio across both trials is 2.7 (p = 0.011), with aluminium higher in 8 of 9 systemic comparisons. The signal does not depend on local reactions. See the combined analysis script for the full local/systemic breakdown.
16. An odds ratio measures how much more likely an event is in one group compared to another. An OR of 2.1 means the odds of a reaction were roughly twice as high in the aluminium group. An OR of 1.0 would mean no difference.
17. A caveat on these p-values. The Cochran-Mantel-Haenszel test assumes independent strata. In a typical meta-analysis each stratum is a different study with different subjects, so independence holds. Here, the same subjects contribute to multiple reaction categories: one person can report both tenderness and headache. This within-subject correlation means the strata are not fully independent, and the p-values may overstate significance. A more rigorous approach (e.g. GEE modelling within-subject correlation, or a composite "any reaction yes/no" endpoint) would avoid this problem entirely, but requires individual-level data that neither trial published. The consistent direction across 13 of 14 comparisons, replicated independently in both trials, is robust to this concern. The exact p-values should be treated as approximate.