Clinicians increasingly turn to large language models (LLMs) for help with diagnostic reasoning, but these tools can generate plausible-sounding errors (). How vulnerable are physicians to this fluency trap?
In this trial, 44 physicians who had completed a 20-hour AI literacy program worked through 6 fictitious clinical vignettes with optional ChatGPT (GPT-4o) consultation. They were randomized to receive either error-free ChatGPT suggestions or suggestions into which deliberate, clinically significant errors had been inserted. The primary outcome was a composite diagnostic reasoning score.
In both groups, most participants (≈70%) consulted ChatGPT.
Clinicians exposed to flawed AI recommendations had significantly lower …