AccScience Publishing / STI / Volume 46 / Issue 1 / DOI: 10.36922/sti.0457
REVIEW

Artificial Intelligence Documentation Tools in Surgery: A Systematic Review

Damien Gibson1,2,3*, Victor Yu1,3, Kate Alexander1, Kun Yu4, Scott Leslie1,2,3, Ruban Thanigasalam1,5, Nicola Jeffery1,3, Daniel Steffens1,3,6
1 Surgical Outcomes Research Centre, Royal Prince Alfred Hospital, Sydney, New South Wales, Australia
2 Faculty of Medicine and Health, Central Clinical School, The University of Sydney, Sydney, New South Wales, Australia
3 Department of Urology, Royal Prince Alfred Hospital, Sydney, New South Wales, Australia
4 Data Science Institute, University of Technology Sydney, Sydney, New South Wales, Australia
5 Department of Urology, Chris O’Brien Lifehouse, Sydney, New South Wales, Australia
6 NHMRC Clinical Trials Centre, The University of Sydney, Sydney, New South Wales, Australia
STI 2026, 46(1), 0457 https://doi.org/10.36922/sti.0457
Received: 27 December 2025 | Revised: 13 February 2026 | Accepted: 24 February 2026 | Published online: 29 May 2026
© 2026 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
Abstract

Background: Clinical documentation burden is a major contributor to burnout in surgery. Artificial intelligence (AI) tools, such as automatic speech recognition (ASR) and large language models (LLMs), may streamline documentation without sacrificing quality.

Objective: We systematically reviewed the performance of ASR- and LLM-based documentation tools in surgical settings.

Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, MEDLINE, Embase, CENTRAL, and Scopus were searched (January 2015–October 2025) for studies evaluating AI-enabled documentation (e.g., ambient scribes, advanced ASR, LLM-assisted drafting) in surgical care. Two reviewers screened studies, extracted data, and assessed risk of bias using the Risk of Bias in Non-randomized Studies of Exposures (ROBINS-E) tool. Heterogeneity among the included studies precluded meta-analysis, so results are presented narratively.

Results: Seven studies published between 2023 and 2025 across otolaryngology, neurosurgery, plastic surgery, and urology were included. The tools evaluated included LLM-assisted operative reports, ambient clinic scribes, and ASR dictation. AI scribes improved documentation efficiency (5.16 min vs. 10.58 min) and reduced documentation time (5–50 s vs. 7.1–7.4 min), with hybrid clinician-in-the-loop workflows achieving the best balance of speed and quality. AI scribe notes were non-inferior to clinician notes on the Physician Documentation Quality Instrument-9 (33.6/45). Operative note quality was highest with hybrid attending-reviewed generative pre-trained transformer (GPT) drafts (79% as-is approval) and lowest with GPT-only notes (23%). Whisper ASR was non-inferior to Dragon Medical One for word error rate and superior when linguistic errors were excluded.
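
For readers unfamiliar with the metric, word error rate (WER) is conventionally defined as the word-level edit distance between a reference transcript and the ASR output, divided by the number of words in the reference. The Python sketch below illustrates that conventional definition only. It assumes lowercase normalization and whitespace tokenization; the function name and the example sentences are hypothetical and are not drawn from the included studies, which may have normalized text or scored errors differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch: tokenization is a plain lowercase whitespace split,
    which is an assumption and not the method of any study in this review.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word in a five-word reference.
wer = word_error_rate("left ureteric stent was removed",
                      "left ureteric stent was moved")
print(f"WER = {wer:.2f}")  # WER = 0.20
```

Under this definition a lower WER means a more accurate transcript, and a comparison that excludes a category of errors (such as the linguistic errors mentioned above) would simply not count those word mismatches in the numerator.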

Conclusion: Early evidence suggests clinician-supervised AI documentation may accelerate note generation while maintaining comparable quality, with hybrid use outperforming AI-only approaches. However, the evidence base is limited, heterogeneous, and largely non-randomized, and downstream outcomes, including burnout, remain unmeasured. Real-world trials incorporating patient, workflow, safety, and governance outcomes are needed to guide supervised implementation.

Keywords
Artificial intelligence
Surgical documentation
Ambient digital scribes
Large language models
Operative reports
Clinical workflow
Surgeon burnout
Funding
Professor Daniel Steffens holds a Cancer Institute NSW Career Development Fellowship. No other authors have received any funding or support.
Conflict of interest
The authors declare no conflict of interest.
Surgical Technology International, Electronic ISSN: 1090-3941 Published by AccScience Publishing