Actionable Diagnostics & Clinical Reasoning

Clinical case simulation and metaflow backup route

Authors

Minh-Quan Ton-Ngoc, Hao Chung The

Published

May 15, 2026

1 How to use this notebook

Learning objectives

Explain why unbiased metagenomics is useful in a difficult CNS infection case.

In this case, that objective expands into the wider concept of clinical metagenomics:

  • why unbiased sequencing can help when routine tests are unrevealing,
  • how metagenomics supports clinical reasoning rather than replacing it,
  • why sample type, host background, and classification uncertainty matter,
  • and how to interpret both informative and non-informative results in a hospital setting.

Suggested workshop flow

  • Start with the Diagnostic simulation activity below.
  • Use the admission record and manual like a real hospital case discussion.
  • Ask learners to commit to an early differential diagnosis before they see the metagenomic result.
  • Then move into the direct metagenomic output section and the deeper analytical interpretation.

Why this format works for doctors

This notebook is designed to feel like a clinical reasoning exercise first and a bioinformatics notebook second. The goal is to let learners move from:

  • syndrome recognition,
  • to staged diagnostic refinement,
  • to specimen-level metagenomic interpretation,
  • to a final clinically defensible diagnosis.

2 Aim of this notebook

This is not a full replication study. Instead, it is a guided self-study notebook for healthcare workers learning how clinical metagenomics can support difficult diagnostic decisions. The notebook:

  1. reconstructs the diagnostic teaching context,
  2. inspects the locally available re-analysis outputs,
  3. shows which samples are informative and which are not,
  4. makes the role of unclassified signal explicit,
  5. compares the recreated results against the original paper,
  6. and connects the sequencing outputs back to clinical reasoning in a realistic hospital workflow.

2.1 How this notebook fits the teaching package

This notebook is designed as a single self-guided teaching resource. It does not separate student and instructor versions. Instead, it lets learners:

  • read the clinical scenario,
  • decide what information they need next,
  • answer short questions before opening hints,
  • reveal worked solutions only when needed,
  • run or inspect metagenomic workflow outputs,
  • and progressively connect sequencing evidence back to clinical reasoning.

The final diagnosis is still delayed until after learners have worked through the clinical and metagenomic clues.

2.2 Interpretive guardrails used throughout

  • This notebook uses the local files available in the workshop directory; it does not reconstruct the full original UCSF SURPI workflow.
  • The supplied workshop export files nejm_with_unclassified.csv, nejm_without_unclassified.csv, and nejm.phyloseq/ retain only one sample after strict post-filtering.
  • The broader four-sample diagnostic set is therefore reconstructed here from the underlying per-sample taxonomic summaries in ../materials/clinical-reasoning/Readbased_Analysis/.
  • For abundance-based summaries, the main metric is f_weighted_at_rank.
  • The value called unweighted_fraction is still useful, but after filtering its values need not sum to 100%, so it should not be read as a conventional compositional percentage.
  • Diversity and ordination plots are shown only as descriptive teaching visuals. With four runs, they are not suitable for formal inference.
  • AMR outputs are inspected because they are present, but they are secondary to the diagnostic question in this case.
  • The final in-depth analytical material is retained near the end because it is most valuable for learners who want a research-style debrief after the clinical exercise.

3 Diagnostic simulation activity

This front section functions as a realistic diagnostic simulation for clinicians. Work through it sequentially before opening the later metagenomic interpretation.

3.1 Admission record

Hospital: Tertiary referral hospital, Ho Chi Minh City

Department: Emergency Department → Intensive Care Unit

Patient: 14-year-old male

Reason for admission: Fever, severe headache, altered mental status, and one generalized seizure

3.1.1 Background

The patient has a known history of X-linked agammaglobulinemia (XLA) and receives regular intravenous immunoglobulin replacement. He has no previous history of central nervous system infection.

3.1.2 History of present illness

He had been well until 6 days before admission, when he developed persistent fever, diffuse myalgia, and progressive frontal headache. He was initially treated at a district-level facility with supportive care and oral amoxicillin-clavulanate, without clear improvement.

Over the 48 hours before transfer, the headache became severe and was associated with photophobia, lethargy, and reduced interaction with family. On the day of admission, he had a generalized tonic-clonic seizure lasting approximately 4 minutes, prompting urgent transfer.

3.1.3 Examination on arrival

  • Temperature: 39.5°C
  • Heart rate: 128/min
  • Blood pressure: 105/65 mmHg
  • Respiratory rate: 22/min
  • SpO2: 97% on room air

The patient is stuporous but withdraws to pain. Marked neck stiffness is present. Kernig and Brudzinski signs are positive. No petechiae, purpura, eschar, or visible jaundice are seen. Cardiorespiratory examination is otherwise unremarkable.

3.1.4 Initial laboratory results

  • WBC: 16,500/µL, neutrophil predominant
  • Hemoglobin: 13.2 g/dL
  • Platelets: 115,000/µL
  • CRP: elevated
  • Creatinine: normal
  • AST: mildly elevated
  • ALT: mildly elevated
  • Total bilirubin: normal

3.1.5 First discussion task

Discuss with your group:

  1. What syndromes are present at admission?
  2. What are the most likely diagnostic categories at this stage?
  3. What information would you request next from the manual?

Which summary best describes the admission syndrome?

Hint: Start by naming the syndrome before trying to name the pathogen.

3.2 Manual: additional information on request

How to use the manual: Open only the blocks your group believes are necessary. This preserves the diagnostic pacing of a real case conference.

  • Lives in a peri-urban district in southern Vietnam
  • No travel outside southern Vietnam in the last 6 months
  • No known tuberculosis contact
  • No known sick contacts
  • No known animal bite
  • Family recently cleaned a ground-floor storage area after heavy rain
  • The patient occasionally played football on a waterlogged paved lot near home during the rainy season
  • Minor skin abrasions on the feet were noted by the family at that time
  • Rodents are commonly seen around the neighborhood and storage area

After admission, the patient had further seizures and was transferred to the ICU. He was intubated for airway protection and started empirically on:

  • intravenous meropenem
  • intravenous vancomycin
  • intravenous acyclovir
  • antiseizure treatment

CSF analysis:

  • Appearance: clear, colorless
  • Opening pressure: 240 mmH2O
  • WBC: 380 cells/µL
  • Differential: 88% lymphocytes, 12% neutrophils
  • RBC: 0
  • Protein: 1.4 g/L
  • CSF glucose: 2.2 mmol/L
  • Serum glucose at the same time: 5.5 mmol/L

Routine microbiology and related tests:

  • CSF Gram stain: no organisms seen
  • CSF bacterial culture: no growth at 48 hours
  • Blood cultures: no growth at 48 hours
  • CSF GeneXpert MTB/RIF: negative
  • CSF multiplex PCR panel: negative for HSV-1/2, VZV, enterovirus, CMV
  • HIV test: negative

Initial CT head: no acute hemorrhage or mass lesion.

MRI brain during ICU admission:

  • patchy bilateral T2 hyperintensities in the basal ganglia and adjacent deep white matter
  • basilar leptomeningeal thickening/enhancement
  • no clear abscess

Figure: Day 13 MRI findings from the neuroleptospirosis case. Panels A to C show basal ganglia and deep frontal white matter hyperintensities and basilar meningeal thickening.

These images correspond to Panels A, B, and C from day 13 of the patient’s third hospitalization.

  • Panel A is an axial T2-weighted image showing persistent hyperintensities in the basal ganglia and deep frontal white matter.
  • Panels B and C are sagittal and axial T2-weighted FLAIR images showing thickening in and around the basilar meninges.
  • Together, these findings support a serious meningoencephalitic process with basilar meningeal involvement, which keeps diagnoses such as TB meningitis, fungal meningitis, unusual bacterial infection, and inflammatory disease in the differential before metagenomic results are available.

Despite broad empiric antimicrobial and antiviral therapy, the patient remained febrile and neurologically unstable. No targeted diagnosis was established from routine workup.

  • Japanese encephalitis serology: Not performed in practice
  • Dengue IgM serology: Not performed in practice
  • Leptospira serology / MAT: Not performed in practice
  • Brain biopsy: Not performed in practice
  • External reference-laboratory confirmatory PCR before mNGS: Not performed in practice

After reviewing CSF and routine microbiology, which statement is most appropriate?

Hint: Ask whether the tests exclude all infection, or only the most common and directly tested causes.

3.3 Before the pipeline result

Write down:

  • your top three diagnostic hypotheses,
  • one diagnosis that has become less likely,
  • and the single most important unanswered question in the case.

At this point, strong answers usually stay at the level of diagnostic categories or syndromic possibilities rather than jumping too early to one exact organism.

4 Generate the metagenomics outputs

Before opening the pipeline result, pause and connect this case back to the earlier workshop sessions.

You have already been taught how to use:

  • metaflow,
  • and the scripts that process output returned from metaflow.

In this case, the goal is to retrieve shotgun metagenomics data from the original paper, run the read-based pipeline, process the returned outputs, and then interpret the result clinically.

4.1 Step 1: identify the study and sample files

The study accession for this case is:

PRJNA234452

Only paired-end reads should be used for this exercise.

4.1.1 Active task

Before opening the solution, decide how you would retrieve the correct FASTQ files.

Write down:

  • where you would look for the FASTQ URLs,
  • how you would distinguish single-end from paired-end files,
  • and how you would avoid accidentally downloading the wrong files.

Start from the study accession. Then look for a metadata field that stores FASTQ download URLs. The key practical clue is that paired-end files usually end with _1.fastq.gz and _2.fastq.gz.

Approach 1: use kingfisher, as taught on day 1

Use kingfisher to retrieve run accessions and FASTQ files for PRJNA234452. Keep only paired-end FASTQ files matching patterns such as:

*_1.fastq.gz
*_2.fastq.gz

Do not use the single-end files for this exercise.

Approach 2: retrieve files manually from ENA, as taught on day 1

  1. Visit the ENA study page: https://www.ebi.ac.uk/ena/browser/view/PRJNA234452
  2. Click Show Column Selection.
  3. Select fields needed to retrieve FASTQ URLs.
  4. The most important column is fastq_ftp.
  5. Download the report in TSV format.
  6. Open the TSV and inspect the FASTQ links.
  7. Each run lists three links in the fastq_ftp column: the first (with no _1 or _2 suffix) is single-end, and the next two are the paired-end reads.
  8. Extract only the paired-end links.

You can then use wget and a small shell script to download them.
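
The filtering in steps 6 to 8 can also be sketched in Python. This is a minimal sketch, not the workshop's own script: it assumes the downloaded report is tab-separated with columns named run_accession and fastq_ftp, and that fastq_ftp holds semicolon-separated links. The function name paired_end_urls is hypothetical.

```python
import csv
import io


def paired_end_urls(tsv_text):
    """Return {run_accession: [url_1, url_2]} for runs that have both mates.

    Assumes columns named run_accession and fastq_ftp, with fastq_ftp
    holding semicolon-separated links (an assumption about the report layout).
    """
    pairs = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        links = [u.strip() for u in row["fastq_ftp"].split(";") if u.strip()]
        # Keep only links ending in _1.fastq.gz or _2.fastq.gz; this drops
        # the single-end file, which carries no _1/_2 suffix.
        mates = sorted(
            u for u in links if u.endswith(("_1.fastq.gz", "_2.fastq.gz"))
        )
        # Only report runs where both mates are present.
        if len(mates) == 2:
            pairs[row["run_accession"]] = mates
    return pairs
```

Passing the text of the downloaded TSV to paired_end_urls returns one entry per run with the _1 link first and the _2 link second. If the links in the report lack a protocol prefix, one may need to be added before handing them to wget.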

The four samples used in this notebook are:

Run accession | Description
SRR1145846 | Illumina MiSeq paired end sequencing: Child with meningoencephalitis: untreated CSF
SRR1145847 | Illumina MiSeq paired end sequencing: Returning traveler with fever: serum (control)
SRR1145844 | Illumina MiSeq paired end sequencing: Child with meningoencephalitis, DNase-treated CSF
SRR1145845 | Illumina MiSeq paired end sequencing: Child with meningoencephalitis, serum
Code
#!/bin/bash

# Create output directory
mkdir -p fastq && cd fastq || exit 1

# Download files
urls=(
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/006/SRR1145846/SRR1145846_1.fastq.gz"
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/007/SRR1145847/SRR1145847_1.fastq.gz"
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/004/SRR1145844/SRR1145844_1.fastq.gz"
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/005/SRR1145845/SRR1145845_1.fastq.gz"
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/006/SRR1145846/SRR1145846_2.fastq.gz"
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/007/SRR1145847/SRR1145847_2.fastq.gz"
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/004/SRR1145844/SRR1145844_2.fastq.gz"
"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR114/005/SRR1145845/SRR1145845_2.fastq.gz"
)

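# -c resumes partial downloads; report any URL that fails instead of failing silently
for url in "${urls[@]}"; do
    wget -c "$url" || echo "Download failed: $url" >&2
done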

4.1.2 Optional Unix check

After downloading, you should be able to confirm that each run has exactly two paired FASTQ files:

Code
ls fastq/*_{1,2}.fastq.gz

You should see eight files total: four _1.fastq.gz files and four _2.fastq.gz files.
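
The same check can be made explicit per run rather than by counting files by eye. The sketch below is illustrative only: the function name check_pairs is hypothetical, and it assumes the files sit in a single directory and follow the <run>_1.fastq.gz / <run>_2.fastq.gz naming used above.

```python
from pathlib import Path

# The four run accessions used in this notebook.
RUNS = ["SRR1145844", "SRR1145845", "SRR1145846", "SRR1145847"]


def check_pairs(fastq_dir, expected_runs):
    """Return {run: [missing file names]} for runs lacking either mate file."""
    present = {p.name for p in Path(fastq_dir).glob("*.fastq.gz")}
    missing = {}
    for run in expected_runs:
        # Each run should contribute exactly one _1 and one _2 file.
        absent = [
            f"{run}_{mate}.fastq.gz"
            for mate in (1, 2)
            if f"{run}_{mate}.fastq.gz" not in present
        ]
        if absent:
            missing[run] = absent
    return missing
```

An empty dictionary from check_pairs("fastq", RUNS) means all eight files are present. Any entry names the run and the mate file that still needs to be downloaded.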

4.2 Step 2: run the read-based pipeline

After download, run metaflow using the config file:

microgen2026.config

The exact command may depend on the workshop environment, but the practical idea is:

Code
nextflow run <metaflow_pipeline> -c microgen2026.config

Open microgen2026.config before launching the run. Check whether outdir points to the folder where you want the NEJM case outputs to be written.
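
If you prefer to check the setting without reading the whole file, a small helper can list the values assigned to outdir. This is a rough sketch under stated assumptions: the layout of microgen2026.config is not shown here, so the pattern assumes outdir is assigned a quoted string on one line (for example outdir = '...' or params.outdir = "..."). The function name find_outdir_values is hypothetical.

```python
import re

# Matches lines such as:  outdir = 'results'  or  params.outdir = "results"
# Lines starting with // are skipped because they do not begin with the key.
OUTDIR_PATTERN = re.compile(
    r"""^\s*(?:params\.)?outdir\s*=\s*["']([^"']*)["']""",
    re.MULTILINE,
)


def find_outdir_values(config_text):
    """Return every quoted value assigned to outdir (or params.outdir)."""
    return OUTDIR_PATTERN.findall(config_text)
```

Reading the config file as text and passing it to find_outdir_values returns a list of the assigned paths. An empty list means the pattern did not match, in which case open the file and inspect it directly.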

4.3 Step 3: process the metaflow outputs

Next, process the returned outputs with:

process_metaflow_daa_microgen2026.py

This script organizes the read-based analysis results into teaching-friendly output files that can be loaded directly in R.

Code
python process_metaflow_daa_microgen2026.py

If the script cannot find the pipeline output, check that the input path, whether set inside the script or passed as a command argument, matches the outdir used by metaflow.

4.4 Step 4: expected processed outputs

After successful processing, you should expect:

  • phyloseq objects,
  • taxburst objects,
  • AMR results,
  • a taxonomy classification table that includes unclassified reads,
  • and a taxonomy classification table that excludes unclassified reads.

In this notebook, those downstream files are represented by:

nejm.phyloseq/
nejm.taxburst/
nejm.amr/
nejm_with_unclassified.csv
nejm_without_unclassified.csv
nejm_sampleinformation.csv
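
Before moving on, it can help to confirm that these files and folders exist where you expect them. The sketch below simply tests for the names listed above inside a directory you supply. The function name missing_outputs is hypothetical, and the sketch assumes all outputs are written to one directory.

```python
from pathlib import Path

# Names taken from the list of downstream files in this notebook.
EXPECTED_OUTPUTS = [
    "nejm.phyloseq",
    "nejm.taxburst",
    "nejm.amr",
    "nejm_with_unclassified.csv",
    "nejm_without_unclassified.csv",
    "nejm_sampleinformation.csv",
]


def missing_outputs(base_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected output names that are absent from base_dir."""
    base = Path(base_dir)
    # exists() is true for both files and directories.
    return [name for name in expected if not (base / name).exists()]
```

An empty list from missing_outputs means every expected file and folder is in place. Otherwise the returned names show which processing outputs were not produced.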

4.5 Step 5: backup path if pipeline execution fails

If you cannot run the full pipeline during the workshop, you can continue the downstream exercise using the prepared backup files:

/day-4/materials/clinical-reasoning/Readbased_Analysis
/day-4/materials/clinical-reasoning/nejm.phyloseq
/day-4/materials/clinical-reasoning/nejm.amr
/day-4/materials/clinical-reasoning/nejm_with_unclassified.csv
/day-4/materials/clinical-reasoning/nejm_without_unclassified.csv
/day-4/materials/clinical-reasoning/nejm_sampleinformation.csv

These backup files are acceptable for continuing the downstream analysis. The teaching objective is interpretation, not forcing every learner to solve an infrastructure problem.

4.6 Pipeline result release

Open this section only after the pre-mNGS discussion.

In a live workshop, this is the moment when the case moves from broad clinical uncertainty to specimen-level metagenomic narrowing.

Actionable specimen: untreated CSF

Top retained hit in the recreated workflow: Leptospira santarosai

Immediate practical reading: the untreated CSF is the only specimen that yields a clinically coherent retained signal. The comparator CSF, serum, and control do not provide equally reliable organism-level evidence in this recreated analysis.

  • No alternative pathogen shows similarly coherent support in the strict retained-hit view.
  • The full analytic debrief below will show that the untreated CSF still contains substantial unresolved signal, so the result should be interpreted as strong-but-not-perfect evidence rather than as a magically clean answer.

Once the metagenomic result is released, what is the most important next reasoning step?

Hint: Do not stop at the taxonomy label. Reconnect the sequence result to the patient and the specimen.

4.7 Immediate clinical synthesis

Discuss with your group:

  1. Which disease now best matches the admission context plus the untreated CSF result?
  2. Which earlier alternatives become less likely?
  3. Which specimens in this recreated workflow seem useful, and which seem non-informative?
  4. What confirmatory or management-oriented next step would be most appropriate?

What is the main educational message of this case?

Hint: Think about both halves of the workflow: what the result helps you include, and what it helps you rule out.