E-ISSN 2757-8062
Volume: 56 Issue: 3 Year: 2025

Zeynep Kamil Med J. 2025; 56(3): 119-126 | DOI: 10.14744/zkmj.2025.31391

ChatGPT for clinical use in labor management: A prospective cohort study

Ali Selçuk Yeniocak¹, Can Tercan¹, Emrah Dağdeviren¹, Emrullah Akay¹, Deniz Aras¹, Seda Maş¹, Gizem Berfin Uluutku Bulutlar², Eralp Bulutlar³, Süleyman Salman⁴
¹Department of Obstetrics and Gynecology, Başakşehir Çam and Sakura City Hospital, Istanbul, Turkey
²Department of Obstetrics and Gynecology, University of Health Sciences, Turkey. Istanbul Zeynep Kamil Maternity and Children’s Diseases Health Training and Research Center, Istanbul, Turkey
³Department of Obstetrics and Gynecology, Haydarpaşa Numune Training and Research Hospital, Istanbul, Turkey
⁴Department of Obstetrics and Gynecology, Gaziosmanpaşa Taksim Training and Research Hospital, Istanbul, Turkey

INTRODUCTION: Artificial intelligence, particularly machine learning, has shown promise in medical applications. This study evaluates the diagnostic accuracy and generalizability of the large language model ChatGPT4.0 in predicting labor protraction.
METHODS: This prospective, single-center cohort study analyzed retrospective data from 100 term pregnancies at low risk for labor protraction. The minimum sample size, calculated with G*Power for 95% statistical power, was 46 patients. ChatGPT4.0 was tasked with identifying the 14 cases delivered by cesarean section due to labor protraction and with predicting the duration of active labor. The process was repeated after one week to assess consistency. Statistical analyses included the Kolmogorov-Smirnov, Mann-Whitney U, Fisher’s Exact, Friedman’s, and independent t-tests, with significance set at p<0.05.
RESULTS: ChatGPT4.0 achieved 80% overall diagnostic accuracy, with 28.57% sensitivity and 88.37% specificity, at both the initial and follow-up predictions (p=0.105). However, predicted labor durations differed significantly from the actual durations: 3.66±1.69 hours at the initial prediction and 6.23±0.50 hours at follow-up, versus 5.17±2.80 hours observed (p<0.001). The difference between initial and follow-up predictions was not statistically significant (p=0.388).
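The reported accuracy, sensitivity, and specificity can be cross-checked with a short calculation. The sketch below is illustrative only: the abstract does not give the confusion-matrix counts, so the values of 4 true positives and 76 true negatives are assumptions, chosen because they are the whole-number counts that reproduce the reported percentages for 14 protraction cesareans among 100 cases.

```python
# Cross-check of the reported diagnostic metrics.
# ASSUMPTION: the abstract reports only percentages; the true-positive and
# true-negative counts below are reconstructed, not taken from the paper.

total_cases = 100              # stated in METHODS
protraction_cesareans = 14     # stated in METHODS

true_positives = 4             # assumed: 4 / 14 = 28.57%
false_negatives = protraction_cesareans - true_positives

non_protraction_cases = total_cases - protraction_cesareans
true_negatives = 76            # assumed: 76 / 86 = 88.37%
false_positives = non_protraction_cases - true_negatives


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to 2 decimals."""
    return round(100 * numerator / denominator, 2)


sensitivity = percent(true_positives, true_positives + false_negatives)
specificity = percent(true_negatives, true_negatives + false_positives)
accuracy = percent(true_positives + true_negatives, total_cases)

print(f"sensitivity = {sensitivity}%")
print(f"specificity = {specificity}%")
print(f"accuracy    = {accuracy}%")
```

Under these assumed counts, the three reported figures are mutually consistent. With only 14 positive cases in 100, overall accuracy is dominated by the 86 negative cases, which is why 80% accuracy can coexist with a sensitivity below 30%.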
DISCUSSION AND CONCLUSION: ChatGPT4.0 demonstrates high specificity in identifying labor protraction risk but shows inconsistencies in prediction accuracy, raising concerns about reliability and generalizability. Further research is needed to refine AI tools for clinical applications while ensuring ethical and safety standards. AI has potential in obstetric decision-making but requires rigorous evaluation before integration into practice. A significant limitation of ChatGPT is its restricted generalizability, largely due to the “black box” nature of the algorithm.

Keywords: AI, artificial intelligence, ChatGPT4.0, labor, large language model, LLM, machine learning, ML, obstetrics, protraction.

Corresponding Author: Ali Selçuk Yeniocak, Türkiye
Manuscript Language: English