
How to Conduct and Write a Systematic Literature Review

The complete guide to conducting rigorous systematic reviews following PRISMA and other established protocols

By Chandler Supple · 14 min read

You need to synthesize everything known about a topic. A regular literature review won't cut it because you need comprehensive coverage, explicit methods, and reproducible results. Your advisor suggests a systematic review. You start reading about PRISMA guidelines, database search strategies, and quality assessment tools. Two hours later you're more confused than when you started. How many databases are enough? What makes inclusion criteria rigorous? How do you know if you found everything relevant?

Systematic reviews aren't just comprehensive literature reviews. They're research studies with explicit methods designed to minimize bias and maximize reproducibility. They follow protocols, use standardized screening processes, assess study quality systematically, and synthesize findings rigorously. Get it right and you produce the highest level of evidence. Get it wrong and you've spent months on a flawed review that won't get published.

This guide shows you how to conduct systematic literature reviews properly. You'll learn how to develop answerable research questions, design comprehensive search strategies, screen and select studies systematically, assess quality using established tools, and synthesize findings appropriately for your field and data.

Understanding What Makes a Review "Systematic"

The difference between a narrative literature review and a systematic review comes down to method and reproducibility.

In a narrative review, you read what you find, summarize it, and draw conclusions. Your search process isn't documented in detail. Selection criteria are implicit. Quality varies across included studies, but you don't assess it formally. Someone else reviewing the same topic would find different studies and reach different conclusions.

In a systematic review, every decision is explicit and documented. Your search strategy is comprehensive and reproducible. Inclusion criteria are defined before you start screening. You assess study quality using established tools. Your synthesis methods are specified in advance. Someone following your protocol should find the same studies and reach similar conclusions.

This rigor is why systematic reviews sit at the top of evidence hierarchies in fields like medicine, education, and social sciences. They're designed to minimize bias and maximize comprehensiveness.

Defining Your Research Question Using PICO or PEO

Systematic reviews require focused, answerable questions. Vague questions lead to vague searches and unclear inclusion criteria.

PICO Framework (for Intervention Studies)

If you're reviewing intervention effectiveness, use PICO:

  • Population: Who? (e.g., "adults with Type 2 diabetes")
  • Intervention: What intervention? (e.g., "mindfulness-based stress reduction")
  • Comparison: Compared to what? (e.g., "usual care" or "no intervention")
  • Outcome: What outcomes? (e.g., "HbA1c levels, quality of life")

Example question: "In adults with Type 2 diabetes (P), does mindfulness-based stress reduction (I) compared to usual care (C) improve glycemic control and quality of life (O)?"

This focused question directly translates to search terms and inclusion criteria.

PEO Framework (for Qualitative or Observational Reviews)

If you're not reviewing interventions, use PEO:

  • Population: Who or what context?
  • Exposure (or phenomenon of interest): What exposure or phenomenon are you studying?
  • Outcomes: What outcomes, experiences, or factors?

Example: "In first-generation college students (P), what experiences with academic advising (E) affect persistence and degree completion (O)?"

Why This Matters

Your PICO or PEO components become your search terms. They define your inclusion criteria. They structure your data extraction. They organize your synthesis. Without a well-defined question, you'll struggle at every subsequent stage.

Developing Your Search Strategy

A systematic review requires a comprehensive search designed to find all relevant studies, not just the most prominent ones.

Choosing Databases

You need multiple databases because no single database covers all relevant literature. Common choices by field:

Health sciences: PubMed/MEDLINE, Embase, Cochrane Library, CINAHL

Social sciences: PsycINFO, Web of Science, Scopus, ERIC (for education), Sociological Abstracts

Multidisciplinary: Web of Science, Scopus, ProQuest

Aim for at least 3-4 databases covering your field. Also consider grey literature sources: ProQuest Dissertations, conference proceedings, government reports, organizational websites.

Building Your Search String

This is where many systematic reviews fail. Your search needs to be sensitive (catching all relevant studies) but not so broad that you get 50,000 irrelevant hits.

Start with your PICO/PEO components. For each component, brainstorm synonyms and related terms:

Population (first-generation college students):

  • "first-generation" OR "first generation" OR "FGS" OR "first-gen"
  • AND "college student*" OR "university student*" OR "undergraduate*" OR "postsecondary student*"

Phenomenon (academic advising):

  • "academic advising" OR "academic advisor*" OR "academic counsel*" OR "student support services" OR "academic guidance"

Outcome (persistence, completion):

  • "persistence" OR "retention" OR "degree completion" OR "graduation" OR "attrition" OR "dropout*"

Combine with AND between components, OR within components. Use truncation (*) to catch word variations ("advisor" and "advisors").

Final string: ("first-generation" OR "first generation") AND ("college student*" OR "undergraduate*") AND ("academic advising" OR "academic advisor*") AND ("persistence" OR "retention" OR "graduation")
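If you're juggling synonym lists for several components, a short script can assemble the boolean string consistently and make revisions painless. Here's a minimal Python sketch, assuming your components live in a plain dict; the function name and terms are illustrative, and you'd still adapt quoting and truncation syntax to each database:

```python
# Minimal sketch: assemble a boolean search string from PICO/PEO components.
# Terms are illustrative; quoting and truncation syntax vary by database.

def build_search_string(components: dict[str, list[str]]) -> str:
    """OR synonyms within each component, AND across components."""
    groups = []
    for terms in components.values():
        quoted = " OR ".join(f'"{t}"' for t in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)

components = {
    "population": ["first-generation", "first generation", "first-gen"],
    "phenomenon": ["academic advising", "academic advisor*", "academic counsel*"],
    "outcome": ["persistence", "retention", "degree completion", "graduation"],
}

print(build_search_string(components))
# ("first-generation" OR "first generation" OR "first-gen") AND (...) AND (...)
```

Keeping the synonym lists in one place like this also makes it easy to regenerate and document a revised string every time you add or drop a term.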

Testing and Refining Your Search

Before running your full search, test it. Identify 5-10 key articles you know should be included. Run your search string. Did it find them? If not, what terms are you missing?

If your search returns 50,000 hits, it's too sensitive. Add specificity. If it returns 200 hits and misses key studies, it's too narrow. Add synonyms.

Document your search development process. Note what terms you tested, what changes you made, and why.
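The key-article test is easy to script once you've exported the retrieved record IDs. A minimal sketch, assuming you have both sets as DOIs (or PMIDs); the IDs below are placeholders:

```python
# Sketch: check search recall against known key articles.
# IDs are placeholders; use DOIs, PMIDs, or database accession numbers.

known_key_articles = {"10.1000/a1", "10.1000/b2", "10.1000/c3"}
retrieved_ids = {"10.1000/a1", "10.1000/c3", "10.1000/d4", "10.1000/e5"}

found = known_key_articles & retrieved_ids
missed = known_key_articles - retrieved_ids

print(f"Recall on key articles: {len(found)}/{len(known_key_articles)}")
for doi in sorted(missed):
    print(f"Missed: {doi} -> inspect its keywords for terms your string lacks")
```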

Documenting Your Search

Record the exact search string used in each database, including filters (date range, language, publication type). Record the search date and number of results.

Include full search strings in your protocol appendix. Someone should be able to replicate your search exactly.


Setting Inclusion and Exclusion Criteria

Your inclusion and exclusion criteria determine what studies make it into your review. These must be defined before you start screening.

Common Criteria Categories

Study design: What types will you include? RCTs only? Observational studies? Qualitative studies? Mixed methods? Be specific about minimum methodological requirements.

Population: Age ranges, geographic location, specific conditions or characteristics, setting (clinical, community, etc.)

Intervention/Exposure/Phenomenon: What specific interventions, exposures, or phenomena count? Where do you draw boundaries?

Outcomes: What outcomes must be reported for inclusion? What measurement approaches are acceptable?

Language: English only? Multiple languages? (Be aware that limiting to English introduces language bias, while searching multiple languages requires translation resources.)

Publication date: Will you restrict by date? If so, justify it (e.g., "Limited to 2010-2025 because [policy/practice/measurement] changed fundamentally in 2010").

Publication type: Peer-reviewed only? Include grey literature? Conference abstracts?

Being Specific About Boundaries

Vague criteria lead to inconsistent screening. Don't say "studies examining educational outcomes." Say "studies measuring at least one of: GPA, standardized test scores, course completion rates, or degree attainment."

Don't say "studies of adults." Say "studies where mean participant age is 18+ or where age range includes only adults 18+."

Specificity enables consistent screening and reproducibility.

Pilot Testing Your Criteria

Before full screening, pilot test your criteria on a sample of 50 records. Do you and your co-reviewer agree on inclusion decisions? If agreement is low (<80%), your criteria need clarification.

Common sources of disagreement: ambiguous population boundaries, unclear outcome definitions, debatable study quality thresholds. Refine criteria to address ambiguity.
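Percent agreement is simple to compute, but Cohen's kappa also corrects for chance agreement and is the statistic reviewers typically expect. A minimal sketch for two screeners' binary include/exclude decisions; the decision lists are illustrative:

```python
# Sketch: percent agreement and Cohen's kappa for two screeners.
# 1 = include, 0 = exclude; the lists are illustrative pilot decisions.

def cohens_kappa(r1: list[int], r2: list[int]) -> float:
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each reviewer's marginal include/exclude rates
    p1_inc, p2_inc = sum(r1) / n, sum(r2) / n
    expected = p1_inc * p2_inc + (1 - p1_inc) * (1 - p2_inc)
    if expected == 1:  # both reviewers made identical uniform decisions
        return 1.0
    return (observed - expected) / (1 - expected)

reviewer_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
reviewer_b = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

agreement = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / len(reviewer_a)
print(f"Percent agreement: {agreement:.0%}")          # 80%
print(f"Cohen's kappa: {cohens_kappa(reviewer_a, reviewer_b):.2f}")  # 0.58
```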

Screening and Selecting Studies

Screening happens in stages: title/abstract screening, then full-text screening.

Title and Abstract Screening

At this stage, you're being liberal. If you're unsure, include it. You'll assess it more carefully at full-text stage.

Use screening software (Covidence, Rayyan, or even Excel) to track decisions. If you have a co-reviewer, screen independently. Check inter-rater agreement regularly. If it drops below 80%, pause and discuss disagreements to clarify criteria.

Typical title/abstract rejection reasons:

  • Wrong population (study of K-12 students when you're focused on college)
  • Wrong study design (opinion piece when you need empirical studies)
  • Wrong topic (completely unrelated)
  • Wrong publication type (book review, editorial)

Full-Text Screening

Now you're applying criteria rigorously. Obtain full text of everything that passed title/abstract screening.

Read each paper against your criteria. Document specific reasons for exclusion: "Excluded: participants were high school students (population criteria)" or "Excluded: no measure of academic persistence (outcome criteria)."

Track exclusion reasons. You'll report them in your PRISMA flow diagram showing exactly how many studies were excluded at each stage for each reason.

Resolving Disagreements

When reviewers disagree, discuss until you reach consensus. If you can't agree, involve a third reviewer or your advisor. Document how disagreements were resolved.

Creating Your PRISMA Flow Diagram

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) requires a flow diagram showing:

  • Records identified through database searches (by source)
  • Records after duplicates removed
  • Records screened (title/abstract)
  • Records excluded (with reasons)
  • Full-text articles assessed
  • Full-text articles excluded (with reasons)
  • Studies included in synthesis

This diagram demonstrates the comprehensiveness and rigor of your search and selection process.
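Screening tools like Covidence can generate these numbers for you, but if you track decisions in a spreadsheet the counts are straightforward to compute. A minimal pandas sketch, assuming a hypothetical screening log with columns `source`, `title`, `doi`, `excluded_at`, and `exclusion_reason` (adapt the names to however you track decisions):

```python
# Sketch: compute PRISMA flow numbers from a screening log.
# Column names are assumptions; adapt to your own tracking sheet.
import pandas as pd

records = pd.read_csv("screening_log.csv")

print("Identified per source:")
print(records["source"].value_counts())

# Deduplicate on DOI where present, falling back to normalized title
records["dedup_key"] = records["doi"].fillna(
    records["title"].str.lower().str.strip()
)
unique = records.drop_duplicates(subset="dedup_key")
print(f"After duplicates removed: {len(unique)}")

# 'excluded_at' holds 'title_abstract', 'full_text', or NaN (included)
ta_excluded = unique[unique["excluded_at"] == "title_abstract"]
ft_assessed = unique[unique["excluded_at"] != "title_abstract"]
ft_excluded = unique[unique["excluded_at"] == "full_text"]
included = unique[unique["excluded_at"].isna()]

print(f"Screened (title/abstract): {len(unique)}")
print(f"Excluded at title/abstract: {len(ta_excluded)}")
print(f"Full-text articles assessed: {len(ft_assessed)}")
print("Full-text exclusions by reason:")
print(ft_excluded["exclusion_reason"].value_counts())
print(f"Included in synthesis: {len(included)}")
```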


Assessing Study Quality

Not all studies are created equal. Systematic reviews assess quality systematically using established tools.

Choosing the Right Tool for Your Study Types

For RCTs: The Cochrane Risk of Bias tool (RoB 2) assesses the randomization process, deviations from intended interventions, missing outcome data, outcome measurement, and selective reporting of results.

For observational studies: The Newcastle-Ottawa Scale (NOS) assesses selection of study groups, comparability of groups, and ascertainment of outcomes or exposures.

For qualitative studies: Critical Appraisal Skills Programme (CASP) checklist or JBI qualitative assessment tool.

For mixed methods: Mixed Methods Appraisal Tool (MMAT).

Use established tools rather than creating your own. They've been validated and enable comparison across reviews.

Conducting Quality Assessment

Two reviewers independently assess each study's quality. Rate each quality domain (e.g., low risk, high risk, unclear risk of bias). Provide justifications: "High risk of performance bias: participants and personnel were not blinded to intervention allocation."

Calculate inter-rater agreement. Resolve disagreements through discussion.

Using Quality Assessment in Your Synthesis

Quality assessment informs how you interpret findings. Options:

  • Sensitivity analysis: Run main analysis with all studies, then repeat excluding low-quality studies. Do results change?
  • Narrative integration: Note when findings come primarily from lower-quality studies
  • Quality as a moderator: Examine whether study quality predicts effect sizes

Don't automatically exclude low-quality studies unless your protocol specified quality thresholds. Instead, acknowledge limitations and assess whether they affect conclusions.

Extracting Data Systematically

Create a standardized data extraction form before you start extracting. This ensures consistency across studies and reviewers.

What to Extract

Study characteristics:

  • Authors, year, country, publication type
  • Study design, setting, duration
  • Funding source, conflicts of interest

Participant characteristics:

  • Sample size, demographics (age, gender, relevant characteristics)
  • Inclusion/exclusion criteria used in the study
  • Recruitment methods, response rates, attrition

Intervention/Exposure/Phenomenon details:

  • Specific description of intervention or exposure
  • Dose, duration, delivery method
  • Comparison or control condition

Outcomes:

  • Outcome measures used, measurement timing
  • Results for each outcome (means, SDs, effect sizes, p-values)
  • For qualitative: key themes, quotes, concepts
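One way to keep the form standardized is to define its fields in code before extraction begins, so every study yields an identically structured record. A minimal sketch using a Python dataclass; the field names are illustrative and should come from your protocol:

```python
# Sketch: a standardized extraction record. Field names are illustrative;
# define yours from your protocol before extraction begins.
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class ExtractionRecord:
    study_id: str
    authors: str
    year: int
    country: str
    design: str                   # e.g., "RCT", "cohort", "qualitative"
    sample_size: int
    population: str
    intervention: str
    comparison: Optional[str]     # None for single-group or qualitative studies
    outcomes: list[str] = field(default_factory=list)
    effect_estimates: dict[str, float] = field(default_factory=dict)
    funding_source: str = "not reported"
    extractor: str = ""           # who extracted, for double-extraction checks

record = ExtractionRecord(
    study_id="Smith2021", authors="Smith et al.", year=2021, country="US",
    design="RCT", sample_size=142, population="adults with Type 2 diabetes",
    intervention="MBSR, 8 weeks", comparison="usual care",
    outcomes=["HbA1c", "quality of life"],
    effect_estimates={"HbA1c": -0.42}, extractor="reviewer_a",
)
print(asdict(record))  # ready to dump as a row in a CSV or JSON log
```

Because every record has the same fields, double-extracted studies can be compared field by field when you check consistency.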

Pilot Testing Your Extraction Form

Before extracting all studies, pilot your form on 5-10 studies. Can you extract all relevant information? Are categories clear? Do you need to add fields?

Revise the form as needed, then extract data from all included studies. If you have co-extractors, double-extract a sample (at least 10%) to check consistency.

Synthesizing Your Findings

How you synthesize depends on your data type and research question.

Narrative Synthesis (for Heterogeneous Studies)

When studies are too different for meta-analysis (different outcomes, designs, populations), use narrative synthesis.

Don't just summarize study by study. Instead:

Group studies by relevant characteristics: population type, intervention type, outcome measured, study design.

Identify patterns: What do studies consistently find? Where do findings conflict? What explains differences?

Explore moderators: Do effects differ by setting, population characteristics, intervention intensity?

Assess certainty: How confident are you in conclusions? Use frameworks like GRADE to rate certainty of evidence.

Use tables to present study characteristics and findings systematically. Readers should be able to see patterns across studies.

Meta-Analysis (for Quantitative Synthesis)

If you have sufficient studies reporting the same outcomes with similar designs and populations, meta-analysis statistically combines effect sizes.

Calculate effect sizes: Convert reported statistics (means, SDs, t-tests, odds ratios) to a common metric (Cohen's d, Hedges' g, odds ratios, risk ratios).

Assess heterogeneity: Use the I² statistic to quantify how much effect sizes vary across studies. As rough benchmarks, I² around 25% indicates low heterogeneity, around 50% moderate, and around 75% high.

Choose fixed vs. random effects: A fixed-effect model assumes a single true effect underlies all studies. A random-effects model assumes true effects vary across populations. Random effects is more conservative and more commonly used.

Conduct sensitivity analyses: Remove outliers, exclude low-quality studies, adjust for publication bias. Do results hold?

Assess publication bias: Use funnel plots and tests (Egger's test) to detect whether small, non-significant studies are missing (suggesting publication bias).

Use software: R (meta package), Stata, Review Manager (RevMan), or Comprehensive Meta-Analysis (CMA).
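Those packages handle the details properly, but the core random-effects calculation is compact enough to sketch, which can help demystify their output. A minimal Python implementation of inverse-variance pooling with the DerSimonian-Laird tau² estimate and I²; the effect sizes and variances are illustrative:

```python
# Sketch: fixed- and random-effects pooling (DerSimonian-Laird) with I².
# Effect sizes and variances are illustrative; for a real review, use a
# vetted package (R's meta/metafor, RevMan) and report CIs properly.
import numpy as np

g = np.array([0.30, 0.52, 0.11, 0.45, 0.27])  # per-study Hedges' g
v = np.array([0.02, 0.05, 0.01, 0.04, 0.03])  # per-study variance of g

# Fixed effect: inverse-variance weighted mean
w = 1 / v
fixed = np.sum(w * g) / np.sum(w)

# Heterogeneity: Cochran's Q and I²
q = np.sum(w * (g - fixed) ** 2)
df = len(g) - 1
i_squared = max(0.0, (q - df) / q) * 100

# DerSimonian-Laird between-study variance tau²
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - df) / c)

# Random effects: re-weight with tau² added to each study's variance
w_re = 1 / (v + tau2)
random_effect = np.sum(w_re * g) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

print(f"Fixed effect:  g = {fixed:.3f}")
print(f"Random effect: g = {random_effect:.3f} "
      f"(95% CI {random_effect - 1.96 * se_re:.3f} "
      f"to {random_effect + 1.96 * se_re:.3f})")
print(f"Q = {q:.2f}, I² = {i_squared:.0f}%, tau² = {tau2:.4f}")
```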

Thematic Synthesis (for Qualitative Studies)

For systematic reviews of qualitative research, use thematic synthesis:

Line-by-line coding: Code findings from each study

Develop descriptive themes: Group codes into themes staying close to original studies

Generate analytical themes: Move beyond primary studies to develop new interpretive constructs

This goes beyond summarizing to create new conceptual understanding across studies.

Writing Your Systematic Review

Systematic reviews follow a standard structure:

Abstract (250-300 words)

Structured: Background, Objective, Methods, Results, Conclusions. Include specific numbers ("23 studies, 4,567 participants").

Introduction (3-5 pages)

Establish why this review is needed. What's known? What's unclear or contested? What gap does this review fill? State objectives and research questions clearly.

Methods (5-8 pages)

Describe every methodological decision in detail:

  • Protocol registration (PROSPERO, OSF)
  • Eligibility criteria
  • Information sources and search strategy (full strings in appendix)
  • Selection process (screening stages, reviewers, software)
  • Data extraction process
  • Quality assessment (tools used, process)
  • Synthesis methods (statistical approach if meta-analysis)

Someone should be able to replicate your review exactly from your methods section.

Results (8-12 pages)

Present results systematically:

  • Study selection (PRISMA flow diagram, reasons for exclusion)
  • Study characteristics (table summarizing all included studies)
  • Quality assessment (summary of ratings across studies)
  • Synthesis results (narrative synthesis organized by themes, meta-analysis with forest plots if applicable)

Use tables and figures extensively. They communicate patterns more effectively than text.

Discussion (5-8 pages)

Interpret findings:

  • Summary of main findings
  • Comparison to prior reviews (if any)
  • Strengths and limitations of the review itself
  • Limitations of the evidence base (study quality, gaps in literature)
  • Implications for practice, policy, or theory
  • Future research directions

Conclusion (1-2 pages)

Concise summary of key findings and their significance.

Common Mistakes That Undermine Systematic Reviews

Searching only one or two databases. You'll miss relevant studies, and reviewers will question comprehensiveness.

Not documenting search strategies fully. If you can't show exactly what you searched, your review isn't reproducible.

Making inclusion decisions alone. Single-reviewer screening introduces bias. Standard practice is two independent reviewers.

Not using established quality assessment tools. Ad hoc quality assessment lacks validity and comparability.

Conducting meta-analysis on heterogeneous studies. If studies measure different outcomes, use different populations, or have different designs, combining them statistically may not be meaningful.

Ignoring publication bias. If your meta-analysis shows strong effects but you haven't assessed whether small negative studies are missing, your conclusions may be biased.

Not registering your protocol. Pre-registering (on PROSPERO, OSF, or journal registries) demonstrates methods were decided a priori, not data-driven.

Key Takeaways

Systematic reviews are research studies with explicit methods designed for comprehensiveness and reproducibility. They require more rigor than narrative reviews but produce higher-quality evidence.

Start with a focused research question using PICO or PEO frameworks. Your question determines everything that follows: search terms, inclusion criteria, data extraction, and synthesis approach.

Develop comprehensive search strategies covering multiple databases and grey literature. Test your search string on known key articles. Document every database, search term, and date so someone could replicate your search exactly.

Define inclusion and exclusion criteria before screening begins. Be specific about boundaries. Use two independent reviewers for screening and quality assessment. Calculate inter-rater agreement and resolve disagreements systematically.

Assess study quality using established tools appropriate for your study types. Don't automatically exclude lower-quality studies, but use quality assessment to inform interpretation through sensitivity analyses or narrative discussion.

Choose synthesis methods appropriate for your data. Use narrative synthesis for heterogeneous studies, meta-analysis when you have sufficient homogeneous quantitative studies, or thematic synthesis for qualitative reviews.

Write your review following PRISMA guidelines. Include a flow diagram showing your search and selection process, comprehensive methods section enabling replication, and transparent reporting of all findings including those that don't fit your expectations.

Frequently Asked Questions

How long does a systematic review take?

Plan for 6-12 months minimum, often longer. This includes protocol development (1-2 months), searching and screening (2-4 months), data extraction and quality assessment (2-3 months), synthesis and writing (2-4 months). More complex reviews or those requiring meta-analysis may take 18-24 months.

Do I need two reviewers for everything?

Best practice is two independent reviewers for screening and quality assessment. For data extraction, at least double-extract 10-20% to verify consistency. If resources are limited, document this as a limitation but strive for at least partial dual review.

How many databases should I search?

Minimum 3-4 major databases relevant to your field. More is better for comprehensiveness. Also search grey literature sources, reference lists of included studies, and consider forward citation searching of key articles. The goal is comprehensive coverage, not hitting a specific number.

What if I find too many studies to feasibly screen?

First, verify your search isn't too sensitive (test against known key articles). If you genuinely have thousands of relevant hits, consider narrowing your scope (more specific population, shorter date range, specific outcome types) or using machine learning screening tools to assist (with caution and validation).

Should I include unpublished studies?

Including grey literature (dissertations, reports, conference papers) reduces publication bias but requires more effort. Evaluate tradeoffs based on your field (more critical in fields with high publication bias), resources, and whether grey literature sources are searchable and accessible in your topic area.

Chandler Supple

Co-Founder & CTO at River

Chandler spent years building machine learning systems before realizing the tools he wanted as a writer didn't exist. He co-founded River to close that gap. In his free time, Chandler loves to read American literature, including Steinbeck and Faulkner.
