A new study in Radiology evaluated whether radiologists in 6 countries could distinguish AI-generated ("deepfake") radiographs from real images; in a later phase, 4 large language models (LLMs) were given the same task. At first look ("Did you notice anything unusual about these images?"), only 41% of the 17 practicing radiologists flagged the AI-generated images as having poor technical quality. Once told that some images were deepfakes and asked to categorize each image as real or AI-generated, the radiologists reached a mean detection accuracy above 70%. Diagnostic accuracy for clinical findings remained high and comparable between deepfakes and real images (~92%), indicating that deepfakes can appear clinically convincing. Neither years of experience nor prior familiarity with AI improved detection among the radiologists.

The LLMs evaluated the same 154 x-ray images ("Is this radiograph AI-generated or authentic?"). GPT-4o performed best, detecting deepfakes with 85% accuracy, while the other 3 models were less accurate: GPT-5 (83%), Llama 4 Maverick (59%), and Gemini 2.5 Pro (56%). Overall, both clinicians and AI frequently misidentified deepfake radiographs as authentic.
Put AI to work: Meanwhile, the CEO of NYC Health + Hospitals recently said he’s ready to replace radiologists with AI for some applications, according to Radiology Business.
