Large language models (LLMs), such as ChatGPT, perform well in diagnostic reasoning (NEJM JW Gen Med Jan 15 2025), but their ability to balance testing strategies, treatment decisions, and risk is less well understood. Investigators randomized 67 attending physicians and 25 residents (in internal medicine, emergency medicine, or family medicine) to use either conventional resources alone (e.g., UpToDate, Google) or conventional resources plus GPT-4 via the ChatGPT Plus (OpenAI) interface. Each participant was presented with five complex clinical vignettes based on actual patient encounters.
The results were as follows:
Physicians using ChatGPT performed significantly better than those using conventional resources alone, according to an expert-v…