15 February, 2026
ChatGPT Health's Mixed Results Spark Concerns Over AI Accuracy

OpenAI has introduced a new feature, ChatGPT Health, which claims to analyze personal health data from fitness trackers and medical records. The tool aims to help users understand long-term health trends rather than isolated moments of illness. After joining a waitlist, a user granted ChatGPT access to extensive health data, including 29 million steps and 6 million heartbeat measurements from the Apple Health app. Upon analysis, the AI assigned a failing grade to the user's cardiac health, prompting immediate concern and a visit to a healthcare professional.

The user, shocked by the assessment, contacted their doctor, who quickly dismissed the AI's evaluation. The physician confirmed that the user's risk of a heart attack was so low that their insurance would not even cover additional cardio fitness tests. Cardiologist Eric Topol of the Scripps Research Institute likewise criticized the AI's assessment as "baseless," emphasizing that the technology is not ready for clinical use.

While AI has significant potential to enhance medical insights and widen access to care, the early iterations of these health-focused AI tools have raised serious concerns. Many users fear that the information provided by these systems may be misleading or even dangerous, especially when it comes to personal health.

Concerns About Accuracy and Reliability

Shortly after the launch of ChatGPT Health, competitor Anthropic introduced its own tool, Claude for Healthcare, which similarly promises to analyze health metrics. The user noted that Claude also evaluated their cardiac health as a C grade, relying on data that Topol found questionable. Both ChatGPT and Claude include disclaimers indicating that they are not substitutes for professional medical advice. Nevertheless, they provided detailed health analyses based on user data without sufficient caution regarding their accuracy.

Using ChatGPT Health requires users to share intimate health information, which raises valid privacy concerns. Although OpenAI asserts that ChatGPT Health includes protective measures, such as data encryption and a prohibition on using the data for AI training, the platform is not covered by HIPAA, the US federal health privacy law.

After integrating data from the user's medical records, ChatGPT revised its assessment, raising the grade from F to D. Topol expressed dismay over the reliance on metrics such as VO2 max, a measure of oxygen consumption during exercise, which he noted can be estimated inaccurately by consumer devices. Additionally, the user's heart rate data showed inconsistencies linked to switching to new Apple Watch devices, which could have skewed the AI's analysis.

Randomness and Inconsistency in Results

The analysis from both AI tools proved to be inconsistent. When the user repeated queries, the grades fluctuated between F and B, demonstrating significant variability in the AI’s evaluations. ChatGPT also overlooked key personal data, such as the user’s age and gender, which should have informed its assessments. Topol criticized this randomness as “totally unacceptable,” warning that it could alarm individuals about their health or provide a false sense of security to those who may be at risk.

OpenAI acknowledged the variability in responses but could not replicate the extreme fluctuations experienced by the user. The company stated that it is working on improving the stability of responses before the tool is made widely available.

Anthropic’s Claude also displayed similar variations in its assessments. While these AI tools might offer valuable insights into general health trends, they have not demonstrated the capability to provide accurate personalized analysis.

The potential of AI in healthcare is promising, yet the current state of these technologies raises questions about their reliability. Users must evaluate the risks associated with trusting AI for health assessments, particularly when the stakes are so high. As the landscape of AI in healthcare continues to evolve, the importance of rigorous validation and careful consideration of user data cannot be overstated.