When AI Steers Clinicians Wrong

Clinical Takeaway

Physicians who received flawed ChatGPT recommendations had lower diagnostic reasoning scores than those who received error-free output.

Context

Clinicians increasingly turn to large language models (LLMs) for help with diagnostic reasoning, but these tools can generate plausible-sounding errors. How vulnerable are physicians to this fluency trap?

In this trial, 44 physicians who had completed a 20-hour AI literacy program were given 6 fictitious clinical vignettes, with optional ChatGPT (GPT-4o) consultation. They were randomized to receive either error-free ChatGPT suggestions or suggestions into which deliberate, clinically significant errors had been inserted. The primary outcome was a composite diagnostic reasoning score.

Key Results

Most participants in both groups consulted ChatGPT (≈70%).
Clinicians exposed to flawed AI recommendations had significantly lower …

Reviewing Author

Raja-Elie E. Abdulnour, M.D.

Disclosures

Nothing to disclose