Symptom Checker Validation: Safety & Accuracy Results

We tested CareRoute using standardized clinical vignettes and an interactive protocol that mirrors real use. Below are the headline results and a transparent explanation of methods, definitions, and limitations.

medRxiv preprint • Peer review in progress

Study at a Glance

88.9%
Overall triage accuracy

40/45 cases correct across ER, doctor/urgent care, and self-care.

100%
Emergency safety

0/15 under-triages on emergency cases in the study set.

~2.5 min
Emergency assessment time

Fewer, targeted questions when red flags are detected.

~67%
Elicitation coverage

Median fraction of clinically relevant findings uncovered by questioning.

Why this matters

Appropriate triage reduces delays for emergencies and prevents unnecessary ER visits for minor issues. Our results suggest CareRoute balances caution with efficiency, especially on high-risk presentations.

Methods (Transparent Summary)

Standardized clinical vignettes

We used 45 medical case vignettes widely referenced in symptom-checker research (15 emergency, 15 doctor-visit, 15 self-care). A clinician evaluator interacted with CareRoute naturally: starting from a main complaint, answering follow-up questions, and receiving a triage recommendation.

15 Emergency
Require immediate ER care
15 Doctor visit
Need timely medical attention
15 Self-care
Safe to manage at home

Interactive testing protocol

  • Begin with only the main symptom (e.g., “chest pain”).
  • Answer follow-up questions in natural language.
  • Record time, number of questions, and triage outcome.

What we measured

  • Accuracy: Correct ER / doctor / self-care recommendation.
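  • Emergency safety: Share of emergency cases that were not under-triaged (under-triage means recommending a less urgent level of care than the reference answer).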
  • Elicitation coverage: % of key clinical findings the questions surfaced.
  • User burden: Time and question count per case.
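As a rough illustration of how the measures above could be computed from per-case records, here is a minimal Python sketch. The `CaseResult` record, the `URGENCY` ranking, and the sample rows are hypothetical and are not taken from the study data or from CareRoute's own tooling.

```python
from dataclasses import dataclass

# Assumed ordering of the three care levels; a higher rank is more urgent.
URGENCY = {"self-care": 0, "doctor": 1, "emergency": 2}


@dataclass
class CaseResult:
    reference: str    # reference care level for the vignette
    recommended: str  # care level the tool recommended
    seconds: float    # time taken to reach the recommendation
    questions: int    # number of follow-up questions asked


def is_under_triage(r: CaseResult) -> bool:
    """True when the recommendation is less urgent than the reference."""
    return URGENCY[r.recommended] < URGENCY[r.reference]


def summarize(results: list[CaseResult]) -> dict:
    """Compute accuracy, emergency safety, and user-burden averages."""
    if not results:
        raise ValueError("no results to summarize")
    emergencies = [r for r in results if r.reference == "emergency"]
    if not emergencies:
        raise ValueError("emergency safety needs at least one emergency case")
    correct = sum(r.recommended == r.reference for r in results)
    under = sum(is_under_triage(r) for r in emergencies)
    return {
        "accuracy": correct / len(results),
        "emergency_safety": 1 - under / len(emergencies),
        "mean_seconds": sum(r.seconds for r in results) / len(results),
        "mean_questions": sum(r.questions for r in results) / len(results),
    }


# Illustrative rows only (NOT study data).
results = [
    CaseResult("emergency", "emergency", 150, 8),
    CaseResult("doctor", "emergency", 240, 12),  # an over-triage
    CaseResult("self-care", "self-care", 300, 14),
    CaseResult("doctor", "doctor", 270, 10),
]
summary = summarize(results)
```

With the study's own counts, the same arithmetic gives 40/45 for accuracy and 1 - 0/15 for emergency safety.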

Smart Questioning in Action (Kidney Stone)

The vignette

"A 45-year-old man presents with sudden left-sided flank pain radiating to the groin, with nausea and vomiting. He is writhing in pain, unrelieved by position changes."

Reference answer: Emergency care (pain control and evaluation).

How CareRoute performed

Started with just "flank pain"
Asked 16 strategic questions
Extracted 8 of 9 key symptoms (89%)
Result: Emergency care ✓

Active symptom extraction

Unlike static symptom lists, CareRoute discovers critical features through targeted follow-ups:

Key questions asked

  • "Does the pain radiate anywhere?"
  • "Are you experiencing nausea or vomiting?"
  • "Does changing position help the pain?"
  • "How quickly did the pain start?"

Critical findings discovered

  • Pain radiating to groin
  • Sudden onset
  • Nausea/vomiting
  • Pain unrelieved by position changes

Safety-first design

In this study set, there were zero under-triages on emergency cases (0/15). When errors occurred (5 of 45 cases), they were conservative over-triages, reflecting a bias toward caution.

What this protects against

Missing time-sensitive conditions where delays increase risk (e.g., heart attack, stroke, sepsis).

Trade-offs

Some non-emergent cases may be escalated to a doctor/urgent care to prioritize safety.

Key Definitions

Triage accuracy

Whether the recommended level of care matched the reference answer for each vignette.

Emergency safety

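The share of emergency cases that were not under-triaged, i.e., not assigned a less urgent level of care than the reference answer. 15/15 (100%) in this study set, meaning 0/15 under-triages.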

Elicitation coverage

Fraction of clinically relevant findings surfaced by questioning (median ~67%).
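The definition above can be sketched in a few lines of Python. This is an assumed formulation (set overlap per vignette, then the median across vignettes); the function name and the placeholder finding labels are hypothetical, and the preprint may define the measure differently.

```python
from statistics import median


def elicitation_coverage(key_findings, surfaced):
    """Fraction of a vignette's key findings that the questions surfaced."""
    key = set(key_findings)
    if not key:
        raise ValueError("a vignette needs at least one key finding")
    # Findings surfaced that are not in the key set are ignored.
    return len(key & set(surfaced)) / len(key)


# Placeholder vignettes (NOT study data): (key findings, findings surfaced).
cases = [
    ({"a", "b", "c"}, {"a", "b"}),                      # 2 of 3
    ({"a", "b", "c", "d"}, {"a", "b", "c", "d", "x"}),  # 4 of 4
    ({"a", "b"}, {"a"}),                                # 1 of 2
]
per_case = [elicitation_coverage(key, found) for key, found in cases]
study_coverage = median(per_case)
```

For the kidney stone example on this page, 8 of 9 key findings surfaced corresponds to a per-case coverage of 8/9, or about 89%.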

Limitations & Scope

  • Research-only setting: Vignettes are standardized cases, not live patient encounters.
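  • Sample size: 45 cases (15 per triage level) is informative but not definitive. For example, 0 under-triages in 15 emergency cases is still statistically compatible with a true under-triage rate of up to roughly 20%; broader studies are warranted.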
  • Generalizability: Results apply to the tested version and protocol and may evolve with updates.
  • Not a diagnosis: The tool guides care level; it does not replace clinical judgment.
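To make the sample-size caveat concrete, the sketch below computes 95% Wilson score intervals for the two headline counts on this page (40/45 and 0/15). The choice of the Wilson interval is an assumption made for illustration; the preprint may report different intervals or none.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Two-sided Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Counts reported on this page.
acc_lo, acc_hi = wilson_interval(40, 45)  # overall triage accuracy
ut_lo, ut_hi = wilson_interval(0, 15)     # emergency under-triage rate

print(f"accuracy 95% CI: {acc_lo:.1%} to {acc_hi:.1%}")
print(f"under-triage 95% CI: {ut_lo:.1%} to {ut_hi:.1%}")
```

Under these assumptions the accuracy interval spans roughly 76% to 95%, and the upper bound on the emergency under-triage rate is about 20%, which is why larger case sets are needed before treating the point estimates as precise.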

Read the Full Study & Data

The preprint details methods and results; we also provide data artifacts for transparency.

Frequently Asked Questions

How was the validation performed?

Using 45 standardized clinical vignettes with an interactive testing protocol conducted by a clinician evaluator.

What were the headline results?

88.9% overall triage accuracy, 0/15 under-triage on emergencies, ~2.5 minutes to assess emergencies, and ~67% median elicitation coverage.

Is this peer-reviewed?

The study is posted as a medRxiv preprint and is under peer review.

Does this replace medical advice?

No. CareRoute suggests an appropriate level of care; consult a clinician for medical advice.

Disclaimer: CareRoute provides guidance and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of a qualified health provider for concerning symptoms.

Last updated: August 31, 2025 • Reviewed by Dr. Prathima Madda, MBBS