Artificial Intelligence Documentation Tools in Surgery: A Systematic Review

Damien Gibson; Victor Yu; Kate Alexander; Kun Yu; Scott Leslie; Ruban Thanigasalam; Nicola Jeffery; Daniel Steffens

doi:10.36922/sti.0457

REVIEW

Damien Gibson^1,2,3*, Victor Yu^1,3, Kate Alexander^1, Kun Yu^4, Scott Leslie^1,2,3, Ruban Thanigasalam^1,5, Nicola Jeffery^1,3, Daniel Steffens^1,3,6

¹ Surgical Outcomes Research Centre, Royal Prince Alfred Hospital, Sydney, New South Wales, Australia

² Faculty of Medicine and Health, Central Clinical School, The University of Sydney, Sydney, New South Wales, Australia

³ Department of Urology, Royal Prince Alfred Hospital, Sydney, New South Wales, Australia

⁴ Data Science Institute, University of Technology Sydney, Sydney, New South Wales, Australia

⁵ Department of Urology, Chris O’Brien Lifehouse, Sydney, New South Wales, Australia

⁶ NHMRC Clinical Trials Centre, The University of Sydney, Sydney, New South Wales, Australia

STI 2026, 46(1), 0457 https://doi.org/10.36922/sti.0457

Received: 27 December 2025 | Revised: 13 February 2026 | Accepted: 24 February 2026 | Published online: 29 May 2026

© 2026 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Abstract

Background: Clinical documentation burden is a major contributor to burnout in surgery. Artificial intelligence (AI) tools, such as automatic speech recognition (ASR) and large language models (LLMs), may streamline documentation without sacrificing quality.

Objective: We systematically reviewed the performance of ASR- and LLM-based documentation tools in surgical settings.

Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we searched MEDLINE, Embase, CENTRAL, and Scopus (January 2015–October 2025) for studies evaluating AI-enabled documentation (e.g., ambient scribes, advanced ASR, LLM-assisted drafting) in surgical care. Two reviewers screened studies, extracted data, and assessed risk of bias using the Risk of Bias in Non-randomized Studies of Exposures (ROBINS-E) tool. Heterogeneity among the included studies precluded meta-analysis, so results are presented narratively.

Results: Seven studies published between 2023 and 2025 across otolaryngology, neurosurgery, plastic surgery, and urology were included. The tools evaluated included LLM-assisted operative reports, ambient clinic scribes, and ASR dictation. AI scribes improved documentation efficiency (5.16 min vs. 10.58 min) and reduced documentation time (5–50 s vs. 7.1–7.4 min), with hybrid clinician-in-the-loop workflows achieving the best balance of speed and quality. AI scribe notes were non-inferior to clinician notes on the Physician Documentation Quality Instrument-9 (33.6/45). Operative note quality was highest with hybrid, attending-reviewed generative pre-trained transformer (GPT) drafts (79% as-is approval) and lowest with GPT-only notes (23%). Whisper ASR was non-inferior to Dragon Medical One for word error rate and superior when linguistic errors were excluded.
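
For readers unfamiliar with the word error rate used in the ASR comparison, the following is a minimal sketch of the standard definition: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The function name, the lowercase-and-split normalization, and the example transcripts are assumptions made for illustration; the included studies may have computed the metric with different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Normalization here (lowercasing, whitespace split) is an illustrative
    assumption, not the procedure of any study in the review.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical dictation example: one substituted word out of five.
reference = "the patient underwent laparoscopic cholecystectomy"
hypothesis = "the patient underwent laparoscopic colectomy"
print(word_error_rate(reference, hypothesis))
```

A lower value indicates a more accurate transcript. Because insertions are counted, the rate can exceed 1 when the transcript contains many extra words.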

Conclusion: The available evidence suggests that clinician-supervised AI documentation may accelerate note generation while maintaining comparable quality, with hybrid use outperforming AI-only approaches. However, the evidence base is early, heterogeneous, and largely non-randomized, and downstream outcomes, including burnout, remain unmeasured. Real-world trials incorporating patient, workflow, safety, and governance outcomes are needed to guide supervised implementation.

Keywords

Artificial intelligence

Surgical documentation

Ambient digital scribes

Large language models

Operative reports

Clinical workflow

Surgeon burnout

Funding

Professor Daniel Steffens holds a Cancer Institute NSW Career Development Fellowship. No other authors have received any funding or support.

Conflict of interest

The authors declare no conflict of interest.

Surgical Technology International, Electronic ISSN: 1090-3941 Published by AccScience Publishing