Artificial Intelligence Documentation Tools in Surgery: A Systematic Review
Background: Clinical documentation burden is a major contributor to burnout in surgery. Artificial intelligence (AI) tools, such as automatic speech recognition (ASR) and large language models (LLMs), may streamline documentation without sacrificing quality.
Objective: We systematically reviewed the performance of ASR- and LLM-based documentation tools in surgical settings.
Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we searched MEDLINE, Embase, CENTRAL, and Scopus (January 2015–October 2025) for studies evaluating AI-enabled documentation (e.g., ambient scribes, advanced ASR, LLM-assisted drafting) in surgical care. Two reviewers screened studies, extracted data, and assessed risk of bias using the Risk Of Bias In Non-randomized Studies of Exposures (ROBINS-E) tool. Because the included studies were heterogeneous, meta-analysis was not performed, and results are synthesized narratively.
Results: Seven studies published between 2023 and 2025 in otolaryngology, neurosurgery, plastic surgery, and urology were included. The tools evaluated included LLM-assisted operative reports, ambient clinic scribes, and ASR dictation. AI scribes improved documentation efficiency, shortening documentation time relative to comparators (5.16 vs. 10.58 min; 5–50 s vs. 7.1–7.4 min), and hybrid clinician-in-the-loop workflows achieved the best balance of speed and quality. AI scribe notes were non-inferior to clinician notes on the Physician Documentation Quality Instrument-9 (PDQI-9; 33.6/45). Operative note quality was highest for hybrid drafts generated by a generative pre-trained transformer (GPT) and reviewed by an attending surgeon (79% approved as-is) and lowest for GPT-only notes (23%). Whisper ASR was non-inferior to Dragon Medical One in word error rate and superior when linguistic errors were excluded.
Conclusion: Early evidence suggests that clinician-supervised AI documentation may accelerate note generation while maintaining comparable quality, with hybrid workflows outperforming AI-only approaches. However, the evidence base is small, heterogeneous, and largely non-randomized, and downstream outcomes, including burnout, remain unmeasured. Real-world trials incorporating patient, workflow, safety, and governance outcomes are needed to guide supervised implementation.
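The Results compare Whisper and Dragon Medical One by word error rate (WER). For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the ASR output divided by the number of reference words, WER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions. The sketch below is a minimal illustration under stated assumptions: whitespace tokenization, no case or punctuation normalization, and a hypothetical function name `word_error_rate`. It does not reproduce the included studies' scoring procedures, including how linguistic errors were excluded, and the example sentences are illustrative rather than study data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / N,
    computed as a word-level Levenshtein distance. Assumes plain whitespace
    tokenization with no normalization of case or punctuation."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (row 0: j insertions from empty).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # curr[0] = i deletions to turn i reference words into nothing.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Illustrative (invented) transcripts: one deleted word out of six.
print(word_error_rate("the patient tolerated the procedure well",
                      "the patient tolerated procedure well"))
```

Because insertions count against the reference length, WER can exceed 1.0 when the ASR output contains many extra words, which is one reason studies often report it alongside other error categories.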
- Kunze KN, Bepple J, Bedi A, Ramkumar PN, Pean CA. Commercial Products Using Generative Artificial Intelligence Include Ambient Scribes, Automated Documentation and Scheduling, Revenue Cycle Management, Patient Engagement and Education, and Prior Authorization Platforms. Arthroscopy. 2025;41(11):4950-4955. doi: 10.1016/j.arthro.2025.05.021
- Dimou FM, Eckelbarger D, Riall TS. Surgeon burnout: a systematic review. J Am Coll Surg. 2016;222(6):1230-1239. doi: 10.1016/j.jamcollsurg.2016.03.022
- Kataria S, Ravindran V. Electronic health records: a critical appraisal of strengths and limitations. J R Coll Physicians Edinb. 2020;50(3):262-268. doi: 10.4997/jrcpe.2020.309
- Kroth PJ, Morioka-Douglas N, Veres S, et al. Association of electronic health record design and use factors with clinician stress and burnout. JAMA Netw Open. 2019;2(8):e199609. doi: 10.1001/jamanetworkopen.2019.9609
- McPeek-Hinz E, Boazak M, Sexton JB, et al. Clinician burnout associated with sex, clinician type, work culture, and use of electronic health records. JAMA Netw Open. 2021;4(4):e215686. doi: 10.1001/jamanetworkopen.2021.5686
- Melnick ER, Dyrbye LN, Sinsky CA, et al. The association between perceived electronic health record usability and professional burnout among US physicians. Mayo Clin Proc. 2020;95(3):476-487. doi: 10.1016/j.mayocp.2019.09.024
- Varghese C, Harrison EM, O’Grady G, Topol EJ. Artificial intelligence in surgery. Nat Med. 2024;30(5):1257-1268. doi: 10.1038/s41591-024-02970-3
- Chryssofos S, Ochoa E, Sacks JM. The Digital Scribe: A New Wave of Efficiency and Quality of Life for Plastic Surgeons. Plast Reconstr Surg Glob Open. 2025;13(5):e6754. doi: 10.1097/GOX.0000000000006754
- van Buchem MM, Kant IMJ, King L, Kazmaier J, Steyerberg EW, Bauer MP. Impact of a Digital Scribe System on Clinical Documentation Time and Quality: Usability Study. JMIR AI. 2024;3(1):e60020. doi: 10.2196/60020
- Ormond MJ, Garling EH, Woo JJ, Modi IT, Kunze KN, Ramkumar PN. Artificial Intelligence in Commercial Industry: Serving the End-to-End Patient Experience Across the Digital Ecosystem. Arthroscopy. 2025;41(5):1683-1690. doi: 10.1016/j.arthro.2025.01.064
- Higgins J, Thomas J, Chandler J, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane; 2022.
- Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi: 10.1136/bmj.n71
- Bero L, Chartres N, Diong J, et al. The risk of bias in observational studies of exposures (ROBINS-E) tool: concerns arising from application to observational studies of exposures. Syst Rev. 2018;7(1):242. doi: 10.1186/s13643-018-0915-2
- Ong JHW, Tung JYM, Sng GGR, et al. A pilot study using ambient artificial intelligence scribes in clinical documentation in a urology outpatient clinic. BJU Int. 2025;136(3):417. doi: 10.1111/bju.16784
- Abdelhady AM, Davis CR. Plastic Surgery and Artificial Intelligence: How ChatGPT Improved Operation Note Accuracy, Time, and Education. Mayo Clin Proc Digit Health. 2023;1(3):299-308. doi: 10.1016/j.mcpdig.2023.06.002
- Hack S, Attal R, Locatelli G, et al. Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI-Augmented Operative Notes. Laryngoscope. 2025. doi: 10.1002/lary.70063
- Ali A, Kumar RP, Polavarapu H, et al. Bridging the Gap: Can Large Language Models Match Human Expertise in Writing Neurosurgical Operative Notes? World Neurosurg. 2024;192:e34-e41. doi: 10.1016/j.wneu.2024.08.062
- Hopkins BS, Dallas J, Yu J, et al. The use of generative artificial intelligence-based dictation in a neurosurgical practice: a pilot study. Neurosurg Focus. 2025;59(1):E8. doi: 10.3171/2025.4.FOCUS24834
- Moryousef J, Nadesan P, Uy M, Matti D, Guo Y. Assessing the Efficacy and Clinical Utility of Artificial Intelligence Scribes in Urology. Urology. 2025;196:12-17. doi: 10.1016/j.urology.2024.11.061
- Thomson A, Perera M, Murphy D, Lawrentschuk N. Scribe smarter, not harder: how artificial intelligence scribes stack up against human clinicians. BJU Int. 2025;137(1):15-17. doi: 10.1111/bju.70037
- Shah SJ, Crowell T, Jeong Y, et al. Physician Perspectives on Ambient AI Scribes. JAMA Netw Open. 2025;8(3):e251904. doi: 10.1001/jamanetworkopen.2025.1904
- Shah SJ, Devon-Sand A, Ma SP, et al. Ambient artificial intelligence scribes: physician burnout and perspectives on usability and documentation burden. J Am Med Inform Assoc. 2025;32(2):375-380. doi: 10.1093/jamia/ocae295
- Albrecht M, Shanks D, Shah T, et al. Enhancing clinical documentation with ambient artificial intelligence: a quality improvement survey assessing clinician perspectives on work burden, burnout, and job satisfaction. JAMIA Open. 2025;8(1):ooaf013. doi: 10.1093/jamiaopen/ooaf013
