OpenAI: GPT-5 is the most advanced model for answering medical questions

GPT-5 has become OpenAI's most reliable model in the field of medicine, the company says: in an evaluation on HealthBench, a benchmark developed with the participation of 250 practicing doctors, the model outperformed previous versions in the accuracy and quality of its responses. The testing analyzed 5,000 dialogues simulating consultations between patients and digital assistants.
The model is already being used in the pharmaceutical and insurance industries. For example, Amgen applies GPT-5 in drug development, drawing on its capabilities for deep analysis of scientific and clinical data, while Oscar Health has noted the model's effectiveness in interpreting complex medical regulations when working with specific patient cases.
Its introduction into the work of US federal agencies has also been announced. GPT-5 is available in three variants: GPT-5, GPT-5 mini and GPT-5 nano. OpenAI predicts that in the coming months the model will find wide application in new scenarios that are not yet obvious.
However, as interest in using AI in healthcare grows, so does attention to safety. Representatives of Microsoft, OpenAI's strategic partner, have noted that medical scenarios are high-value but also high-risk use cases: potential AI errors in interpreting data can have serious consequences for patients, which underscores the need for strict expert oversight when the model is used in clinical practice.
In March 2024, a group of scientists from Austria and Germany presented a comprehensive study of the application of ChatGPT, including its fourth version, in the medical sciences. A review of scientific publications issued since the release of this LLM (large language model) showed that testing has focused mainly on medical education, consultation and research, as well as on individual stages of clinical work, including diagnosis, decision-making and medical documentation.
In medical consultations, the study's authors point out, ChatGPT demonstrates high accuracy on oncology topics (possibly because public sources such as the National Cancer Institute were included in the training data), while its effectiveness in other specializations requires further evaluation. Overall, the scientists noted, ChatGPT does not meet high clinical standards: specialized modifications and standardized evaluation methods are needed for real-world implementation.
Current assessment methods rely excessively on subjective expert opinions and lack objectivity and scalability, the study says. A promising direction is the development of automated quantitative metrics for assessing response quality, which the authors see as a key condition for clinical integration of the technology. Professional versions of ChatGPT tailored to specific medical specialties and subjected to rigorous quantitative testing may pave the way for its practical use in medicine.
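As an illustration of what such an automated quantitative metric could look like, below is a minimal, purely hypothetical Python sketch: a rubric of weighted criteria, each checked with a naive keyword heuristic, yields a score between 0 and 1 for a model's answer. The criteria, keywords and weights are invented for illustration; real benchmarks such as HealthBench rely on far richer physician-written rubrics and more robust grading.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str      # what a good answer should contain
    keywords: list[str]   # naive keyword proxy for "criterion satisfied"
    weight: float         # relative importance assigned by experts

def score_response(response: str, rubric: list[Criterion]) -> float:
    """Return a 0..1 score: the weighted share of rubric criteria the answer meets."""
    text = response.lower()
    earned = sum(c.weight for c in rubric if any(k in text for k in c.keywords))
    total = sum(c.weight for c in rubric)
    return earned / total if total else 0.0

# Hypothetical rubric for a chest-pain consultation (illustrative only).
rubric = [
    Criterion("Advises seeing a clinician", ["see a doctor", "consult a"], 2.0),
    Criterion("Flags emergency warning signs", ["emergency", "call 911"], 3.0),
    Criterion("Avoids a definitive diagnosis", ["cannot diagnose", "may be"], 1.0),
]

answer = ("I cannot diagnose you online; please consult a physician, "
          "and call 911 if the chest pain worsens.")
print(f"rubric score: {score_response(answer, rubric):.2f}")  # -> 1.00
```

Such keyword matching is obviously too crude for clinical use; its only purpose here is to show how expert judgment can be encoded into a repeatable, scalable score.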
At the same time, ChatGPT-4 had a number of significant shortcomings that limited its clinical application: the model works exclusively with text, is unable to analyze images, and lacks the logic of expert systems; its "justifications" are merely probabilistic predictions of the next words, which can lead to paradoxical situations in which a correct answer is accompanied by an absurd explanation. The reliability of the answers depends directly on the quality of the training data, yet the model does not distinguish reliable from false information, which creates the risk of dangerous and biased recommendations. A particular problem is the model's tendency to generate plausible but entirely fictitious information presented in a convincing form. This requires mandatory expert verification of all conclusions before they are used in medical practice.
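To make the "probabilistic next-word prediction" point concrete, here is a minimal sketch using the openly available GPT-2 model from the Hugging Face transformers library (GPT-4's and GPT-5's weights are not public, so GPT-2 stands in purely to illustrate the mechanism): at each step the model simply assigns a probability to every possible next token, and the reply is assembled from those predictions rather than from clinical reasoning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small public model used only to demonstrate the mechanism.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The patient's symptoms most likely indicate"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token only

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    # Each candidate continuation is just a high-probability token,
    # not the result of checking facts against medical knowledge.
    print(f"{tok.decode(int(idx))!r}: p={p.item():.3f}")
```

An answer produced this way can read fluently and confidently while having no guaranteed connection to medical ground truth, which is exactly why the researchers insist on expert verification.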
The scientists also stated that ChatGPT's responses are often superficial and lack the necessary depth: the model is not a specialized medical tool and requires additional adaptation for clinical use. An important limitation is the dependence of the results on how the query is phrased; even a slight change in the question can lead to a completely different answer.
Data privacy is a separate issue, as the use of a proprietary model to process personal health information may violate patient privacy requirements. These limitations highlight the need for significant refinement of the model and the development of strict protocols for its use in healthcare.
Surveys in the US and Russia show a similar trend: interest in using AI in healthcare is combined with caution and uneven levels of trust. For example, according to the Annenberg Public Policy Center, 63% of Americans are willing to rely on AI-generated answers when searching for medical information, while 79% regularly turn to the Internet for health-related information. In Russia, according to a MAR CONSULT study, users are interested in new technologies but prefer face-to-face interaction with a doctor, and mistrust of AI remains high: 46% do not trust machine algorithms, 51% doubt that AI can take a patient's individual characteristics into account, and 36% are concerned about leaks of personal data.
According to a forecast by analysts at the Swiss reinsurer Swiss Re, by 2034 healthcare and pharmaceuticals will lead in the level of insurance risks associated with the use of AI. The study is based on an analysis of the current market situation and cases of negative AI impact across industries. While the IT sector is considered the most vulnerable today, experts expect that over the next decade the greatest risks will come from the introduction of AI into clinical practice, the protection of medical data, and decision-making based on self-learning models.
As ChatGPT rapidly makes its way into medical education, researchers from Sichuan University in China conducted one of the first large-scale studies of how medical students perceive the technology. The survey involved 1,133 future doctors from various medical schools in Sichuan Province. The results showed that 62.9% had already used ChatGPT in their studies, most often to search for medical information (84.4%) and to complete specialized academic assignments (60.4%). At the same time, 76.9% of students expressed concern that the AI bot could spread inaccurate medical information, and 65.4% worried about the risk of dishonest borrowing of content. Despite this, more than 60% of participants said they were willing to use ChatGPT to solve educational problems during clinical training and generally assessed its potential in medical education positively.
vademec