In July, Babylon Health released the results of testing its AI system against publicly available questions from the MRCGP (Membership of the Royal College of General Practitioners) exam. As we reported at the time, the system passed with a score of 81 percent. In a separate test, conducted with the Royal College of Physicians, Stanford University and Yale New Haven Health, Babylon’s system and seven primary care physicians were each given 100 independently devised symptom sets. Babylon passed with a score of 80.
Now these results are being questioned in a letter to The Lancet. The authors, a medical doctor and two medical informatics academics, argue that the methodology was questionable. Their letter, ‘Safety of patient-facing digital symptom checkers’, says there ‘is a possibility that it [Babylon’s service] might perform significantly worse’. They criticized the symptom-checking test as not reflecting real-world use, because the data was entered only by doctors, not by patients or other clinicians. While the authors commended Babylon for being open about its research, they felt there was an “urgent need” for improvements in evaluation methods. “Such guidelines should form the basis of a regulatory framework, as there is currently minimal regulatory oversight of these technologies.” Babylon promises a response and additional improvements, presumably drawing on its $100 million investment in AI announced in September. DigitalHealth (UK), Mobihealthnews