How to Read a Clinical Trial Critically
A practical framework for researchers and clinicians: eight questions every RCT reader should answer before trusting a result.
A clinical trial lands in your inbox. The abstract reads cleanly: randomised, placebo-controlled, statistically significant, p = 0.02. Before you forward it to your team or change a treatment protocol, there are eight questions that will tell you whether that result deserves your trust.
This framework is adapted from CONSORT, the Cochrane Risk of Bias tool, and the routine analysis SinaPilot applies to every uploaded RCT.
Question 1: Is the Primary Endpoint Pre-Specified?
The single most important question in clinical trial appraisal is whether the reported primary outcome was declared before the data were collected. Post-hoc primary endpoints, where the authors have moved the goalposts to whichever outcome crossed the significance threshold, are widespread in the published literature.
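Where to look: the trial's registry entry (ClinicalTrials.gov or another registry) or a published protocol, and check that it is dated before enrolment began. If the trial has neither, treat any "primary" endpoint with scepticism.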
Red flag phrase: "As an exploratory outcome, we also assessed..."
Question 2: What Was the Control Condition?
Placebo response rates of 30–40% are common in psychiatry, pain, and GI trials. An active intervention with a 40% response rate against a placebo response rate of 38% is not clinically meaningful, even if a very large sample (or plain chance) made the trial technically "positive".
What to calculate: absolute risk reduction (ARR) and number needed to treat (NNT), not just the relative risk reduction. An 80% relative risk reduction sounds impressive until you realise the baseline risk was 0.5% (ARR = 0.4 percentage points, NNT = 250).
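The arithmetic is easy to script. The sketch below is an illustration, not SinaPilot's implementation: the function names are hypothetical, and it assumes you already have the event proportion in each arm. `risk_summary` treats the event as something to prevent (e.g. stroke); `nnt_from_rates` uses the absolute difference, so it also works for beneficial outcomes such as response rates.

```python
import math


def risk_summary(control_risk: float, treatment_risk: float) -> dict[str, float]:
    """ARR, RRR and NNT for a harmful binary outcome (risks as proportions 0-1)."""
    for p in (control_risk, treatment_risk):
        if not 0.0 <= p <= 1.0:
            raise ValueError("risks must be proportions between 0 and 1")
    arr = control_risk - treatment_risk  # absolute risk reduction
    rrr = arr / control_risk if control_risk > 0 else math.nan  # relative risk reduction
    nnt = 1 / arr if arr != 0 else math.inf  # number needed to treat
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}


def nnt_from_rates(rate_a: float, rate_b: float) -> float:
    """NNT for any binary outcome: reciprocal of the absolute difference in rates."""
    diff = abs(rate_a - rate_b)
    return 1 / diff if diff > 0 else math.inf


# 80% relative reduction on a 0.5% baseline risk: 0.5% -> 0.1%
summary = risk_summary(control_risk=0.005, treatment_risk=0.001)
print(f"ARR = {summary['ARR']:.3%}, RRR = {summary['RRR']:.0%}, NNT = {summary['NNT']:.0f}")

# 40% vs 38% response rate (the placebo comparison in this section)
print(f"NNT = {nnt_from_rates(0.40, 0.38):.0f}")
```

By convention NNT is reported rounded up to the next whole patient; the sketch leaves it unrounded so the 250 above is easy to check. The 40% vs 38% comparison works out to an NNT of 50.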
Question 3: Was Randomisation Adequate?
Adequate randomisation is not a simple yes/no question. Key questions:
- Was allocation concealed? (If the enrolling clinician could predict which arm a patient would enter, selection bias is likely.)
- Were groups balanced at baseline? (Table 1 should show comparable demographics and disease severity.)
- Was the randomisation unit the patient, or was it a cluster (e.g., ward, practice)?
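Cluster-randomised trials require statistical models that account for intra-cluster correlation, because patients in the same ward or practice tend to resemble one another. Many papers nonetheless analyse them as if patients were individually randomised, which understates standard errors and produces confidence intervals that are too narrow and p-values that are too small.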
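A standard way to quantify the cost of clustering is the design effect, DE = 1 + (m - 1) x ICC, where m is the average cluster size and ICC is the intra-cluster correlation coefficient; dividing the number of patients by DE gives the effective sample size. This is a minimal sketch that assumes equal cluster sizes; the function names and the trial numbers are hypothetical.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation caused by clustering, assuming equal cluster sizes."""
    if mean_cluster_size < 1:
        raise ValueError("cluster size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_patients: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent patients the clustered sample is roughly 'worth'."""
    return n_patients / design_effect(mean_cluster_size, icc)


# Hypothetical trial: 20 practices x 50 patients each, ICC = 0.05
de = design_effect(50, 0.05)
n_eff = effective_sample_size(1000, 50, 0.05)
print(f"design effect = {de:.2f}, effective n = {n_eff:.0f} of 1000 enrolled")
```

Here DE is 3.45, so analysing the trial as 1,000 independent patients would make the standard errors about the square root of 3.45 (roughly 1.86) times too small.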
Question 4: How Much Data Was Missing?
Missing data is not neutral. A 15% dropout rate that differs between arms (e.g., more side-effect-related dropouts in the treatment arm) can bias the result towards efficacy: if only completers are analysed, the treatment arm retains mainly the patients who tolerated the drug.
What to check:
- Total dropout rate per arm
- Reason for dropout (if reported)
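- Imputation method: last observation carried forward (LOCF) is often assumed to be conservative, but its bias can run in either direction, and in progressive diseases it freezes patients at an earlier, better value; multiple imputation or a likelihood-based approach such as MMRM is preferred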
Acceptable threshold: < 10% dropout with plausible MCAR (missing completely at random) assumption. Above 20%, results should be considered exploratory regardless of p-values.
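The per-arm check can be scripted as below. This is a hypothetical sketch: the counts are invented, the under-10% and over-20% bands follow the thresholds above, and the "10-20%" label is this sketch's own assumption since the rule of thumb does not name that band. Whether MCAR is plausible, and why patients left, still has to be judged by reading the paper.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Randomised and dropout counts for one trial arm."""
    name: str
    randomised: int
    dropped_out: int

    @property
    def dropout_rate(self) -> float:
        return self.dropped_out / self.randomised


def classify_dropout(rate: float) -> str:
    """Rule-of-thumb bands from the text; the middle label is an assumption."""
    if rate < 0.10:
        return "under 10%: acceptable if MCAR is plausible"
    if rate > 0.20:
        return "over 20%: treat results as exploratory"
    return "10-20%: interpret with caution"


def dropout_report(arms: list[Arm]) -> float:
    """Print per-arm dropout and return the largest between-arm gap."""
    for arm in arms:
        rate = arm.dropout_rate
        print(f"{arm.name}: {rate:.1%} ({classify_dropout(rate)})")
    rates = [arm.dropout_rate for arm in arms]
    gap = max(rates) - min(rates)
    # A large gap signals differential dropout, whatever the overall rate.
    print(f"between-arm gap: {gap:.1%}")
    return gap


# Hypothetical counts, not taken from any real trial
gap = dropout_report([Arm("treatment", 200, 36), Arm("placebo", 200, 14)])
```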
Question 5: What Statistical Test Was Used?
Common statistical mismatches in RCTs:
| Data type | Correct test | Frequently misused |
|---|---|---|
| Continuous, 2 groups | t-test or Mann-Whitney | Paired test on unpaired data |
| Repeated measures | MMRM or repeated-measures ANOVA | Separate t-tests per timepoint |
| Time-to-event | Kaplan-Meier + log-rank | Percentage at arbitrary cutoff |
| Count data with overdispersion | Negative binomial | Poisson regression |
If the statistical analysis section uses a test you do not recognise, look it up before trusting the result. "We used appropriate statistical methods" is not an adequate description.
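For the last row of the table, a quick screening step (an illustration, not a formal test) is the index of dispersion: for Poisson counts the variance roughly equals the mean, so a variance-to-mean ratio well above 1 points towards a negative binomial model. The counts and the 1.5 cut-off below are assumptions for this sketch; in a regression setting the check belongs on residuals after adjusting for covariates, not on raw counts.

```python
import statistics


def dispersion_index(counts: list[int]) -> float:
    """Variance-to-mean ratio of count data (about 1 for Poisson counts)."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    return statistics.variance(counts) / mean


# Hypothetical exacerbation counts per patient over one year
exacerbations = [0, 0, 1, 0, 5, 12, 0, 2, 0, 9]
ratio = dispersion_index(exacerbations)
print(f"variance/mean = {ratio:.1f}")
if ratio > 1.5:  # illustrative cut-off chosen for this sketch, not a standard
    print("Counts look overdispersed: a plain Poisson model would understate uncertainty.")
```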
Question 6: Are All Outcomes Reported?
Outcome reporting bias, the selective reporting of statistically significant outcomes, is one of the most replicated findings in meta-science. A trial that pre-registered eight outcomes but reports only the three that reached significance has not shown that the other five were unaffected; it has chosen not to report them, most plausibly because they were null or unfavourable.
How to detect it: Compare the registered outcomes (ClinicalTrials.gov) with the published results section. Any registered outcome that does not appear in the paper should be noted in your appraisal.
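The comparison is a set difference. In this hypothetical sketch the outcome lists are typed in by hand (no registry API is used), and matching is exact after lower-casing and collapsing whitespace, so reworded outcomes or shifted time points still need a human read.

```python
def normalise(outcome: str) -> str:
    """Crude normalisation so trivial wording differences do not raise false alarms."""
    return " ".join(outcome.lower().split())


def unreported_outcomes(registered: list[str], published: list[str]) -> list[str]:
    """Registered outcomes with no exact (normalised) match in the paper."""
    published_set = {normalise(o) for o in published}
    return [o for o in registered if normalise(o) not in published_set]


# Hypothetical outcome lists, not from a real registry entry
registered = ["HbA1c at 24 weeks", "Body weight at 24 weeks", "Severe hypoglycaemia events"]
published = ["HbA1c at 24 weeks", "body weight at 24  weeks"]
for outcome in unreported_outcomes(registered, published):
    print("Not reported:", outcome)
```

Swapping the arguments lists outcomes that appear in the paper but not in the registry, the pattern the red-flag phrase under Question 1 hints at.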
Question 7: Who Funded the Study?
Industry-funded trials are not automatically biased, but they are systematically more likely to report positive results. A 2017 Cochrane review pooling 75 studies that compared industry-sponsored with non-industry-sponsored research found that industry-funded trials reported favourable results significantly more often than independently funded trials, across therapeutic areas.
This does not mean the result is wrong; it means your prior probability that the result is a true positive should be adjusted accordingly.
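One way to make "adjust your prior" concrete (a standard illustration, not something the review itself computes) is the positive predictive value of a significant result: PPV = (power x prior) / (power x prior + alpha x (1 - prior)), where prior is the probability that the effect is real before the trial. The priors, 80% power and alpha of 0.05 below are assumptions, and the formula ignores bias, which is exactly what a funding concern would add.

```python
def positive_predictive_value(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Probability that a statistically significant result reflects a true effect.

    Assumes unbiased conduct and a single pre-specified test; bias or selective
    reporting would push the value lower still.
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative priors only; choosing a prior is a judgment call, not a calculation
for prior in (0.5, 0.2, 0.1):
    print(f"prior {prior:.0%}: PPV = {positive_predictive_value(prior):.0%}")
```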
Question 8: What Is the Effect Size, Not Just the P-Value?
A p-value of 0.001 tells you that data at least this extreme would be unlikely if the null hypothesis were true. It does not tell you whether the effect is large enough to matter clinically.
Useful effect size measures:
- Cohen's d (standardised mean difference): roughly 0.2 small, 0.5 medium, 0.8 large (Cohen's conventional benchmarks)
- Odds ratio with 95% CI
- Absolute risk reduction and NNT (for binary outcomes)
- Minimal clinically important difference (MCID): does the effect size exceed the threshold that patients or clinicians would consider meaningful?
A beta-blocker that lowers office blood pressure by 2 mmHg and reaches p < 0.0001 only because the sample is very large is statistically significant but clinically irrelevant.
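To see the blood-pressure example in numbers, the sketch below computes Cohen's d with a pooled standard deviation and a large-sample z-test p-value (a normal approximation standing in for the t-test, which gives practically the same answer at this size). The 15 mmHg standard deviation, 5,000 patients per arm and 5 mmHg MCID are assumptions for illustration, not values from any trial or guideline.

```python
import math
from statistics import NormalDist


def cohens_d(mean_diff: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return mean_diff / math.sqrt(pooled_var)


def two_sided_p(mean_diff: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Large-sample z-test p-value for a difference in means."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = abs(mean_diff) / se
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical trial: 2 mmHg mean difference, SD 15 mmHg, 5,000 patients per arm
diff, sd, n = 2.0, 15.0, 5000
d = cohens_d(diff, sd, sd, n, n)
p = two_sided_p(diff, sd, sd, n, n)
mcid = 5.0  # hypothetical MCID for illustration, not a guideline value
print(f"d = {d:.2f}, p = {p:.1e}, meets MCID: {abs(diff) >= mcid}")
```

With these inputs the p-value lands far below 0.0001 while d is about 0.13, under Cohen's "small" benchmark, and the difference falls short of the assumed MCID.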
Putting It Together
These eight questions form a checklist, not an algorithm. A trial can fail three of them and still contain a useful result. The goal is not to discard studies but to understand exactly how much weight a result deserves and under what conditions.
A structured AI review runs all eight checks automatically and surfaces the findings before you have finished reading the discussion section. Human judgment remains essential for clinical interpretation — but the mechanical checklist should never be the bottleneck.
Want to apply this checklist to your next paper in seconds? SinaPilot's peer-review critique feature generates a structured report covering statistical issues, conflicts of interest (COI), and study design for any uploaded PDF.