Quality assurance informs large-scale use of ambient AI clinical documentation

March 26, 2025

Kaiser Permanente analysis shares lessons learned in one of the largest rollouts of assisted clinical notetaking technology

Software that uses generative artificial intelligence (AI) to document medical visits and assist clinicians in drafting notes was largely accurate and well-received by doctors, according to a Kaiser Permanente report in the journal NEJM AI. The authors described lessons learned from one of health care’s largest rollouts of AI ambient clinical documentation technology.

Key to safe deployment was a quality assurance feedback loop. The organization carried out a 10-week pilot in early 2024, before the system was deployed throughout Kaiser Permanente’s 8 regions, 600 medical offices, and 40 hospitals. Because of how widely it would be used, the organization closely analyzed clinician feedback on their experience with the technology’s accuracy and usability.

“Kaiser Permanente adopted and scaled AI clinical documentation technology to respond to growing physician strain and burnout from the long hours they spend documenting patient care, both in the office and at home late into the evening,” said Brian Hoberman, MD, chief information officer for The Permanente Federation. “We developed a responsible AI framework for novel technology and ensured that the tools are fair, appropriate, valid, effective, and safe. As the first large-scale clinical AI deployment, this project received special focused analysis. We will build on lessons learned to drive future deployments of other AI initiatives, features, and enhancements.”


Use of the ambient AI documentation tool was — and remains — voluntary for clinicians, who seek approval from the patient before recording the medical visit. AI clinical documentation provides a transcript of the patient visit as well as a draft summary that physicians edit before adding it to the patient record. The audio recording is not maintained and patients’ data remain secured and private.

The software could allow doctors to spend more time talking to patients and less time focused on a computer screen, said co-author Vincent Liu, MD, chief data officer for The Permanente Medical Group and research scientist with the Kaiser Permanente Division of Research. “We have already seen the technology improve physician workloads and reduce documentation burden,” Dr. Liu said. “Both doctors and patients have told us that the AI clinical documentation tool also improves face-to-face communication during a visit.”

Co-author Khang Nguyen, MD, medical director for the Southern California Permanente Medical Group Care Transformation Office, emphasized that despite the involvement of AI, the tool does not make decisions or recommendations to the doctor about patient care. “This is not meant to replace clinicians, but to augment them, and to reduce the burden of documentation that can be distracting,” Dr. Nguyen said. “There are concerns about AI technology but in this use, we found benefits and no evidence of harm.”

Quality assurance central to rollout

While the AI clinical documentation technology was shown to be safe in pilot deployments in 2024, for the expanded rollout, Kaiser Permanente leaders commissioned a formalized quality assurance (QA) plan guided by responsible AI principles. This QA plan used existing processes and resources, incorporating flexibility given the evolving nature of AI technology, and helped the organization learn how to evaluate generative AI tools.

“In health care today, quality is often equated with safety. But quality also means taking advantage of technologies that improve patient experience and care, while supporting the wellness of our physicians and clinicians by enabling more focused time with patients and less with the computer,” said Nancy Gin, MD, FACP, chief quality officer of The Permanente Federation.

A QA team was assembled that included physicians, administrators, informaticists, safety officers, quality risk leaders, and evaluators. More than 1,000 physicians were recruited to be users in the 10-week pilot period.

Kaiser Permanente’s longstanding value-based model, in which care is led by physicians and integrated with coverage by Kaiser Foundation Health Plan, has established a partnership that was key to the deployment’s success.

“Like any technology we adopt in the care setting, we thoroughly evaluate all AI-based tools and systems before use and regularly monitor their performance to ensure they meet the highest standards for patient privacy and safety,” said Andrew Bindman, MD, EVP and chief medical officer for Kaiser Permanente. “The findings from our quality assurance process helped inform physician training on how and when to use the AI tool and reinforced that our clinicians and care teams are the medical decision-makers — not AI.”

The users gave feedback by rating the draft note on a 5-star scale after a patient encounter, by participating in feedback forums, and through a structured survey evaluating draft notes using a modified Physician Documentation Quality Instrument (PDQI), which measures dimensions such as accuracy, thoroughness, and perceived bias.

The pilot included 63,000 patient encounters and produced a large number of user comments, providing rapid and ongoing feedback to leaders. Two quality team members read through more than 3,600 clinician free-text comments.

Star ratings were positive. Among the 14% of patient encounters that received a rating, 47% were rated 5 stars, 31% were rated 4 stars, and 7% were rated 1 or 2 stars. The authors described this as a “largely positive user reaction.”

The QA team received 1,252 responses to the PDQI questionnaire, with two-thirds of them coming from primary care providers. The average score was 4.35 out of 5 points. The most highly rated domain was that the tool was “free from bias,” while the lowest-rated domain was “note thoroughness.”

A minority of users said the AI tool had difficulty tracking multiple speakers, left out information, or made erroneous assumptions that were subsequently corrected by physicians. The QA team communicated frequently with the software vendor during the QA evaluation, and the vendor made rapid improvements in response to user feedback.

Use of the AI tool might differ by medical specialty because patient visits differ. The QA team provided specific findings so each specialty could deploy the technology in a way that suited its workflow.

The national rollout design was based on an earlier successful pilot implementation of AI documentation technology at Kaiser Permanente in Northern California. That pilot — which used a different vendor’s AI clinical documentation product but a similar QA process — was described in an article published in January 2024 in NEJM Catalyst.

That analysis found that doctors using the AI tool spent less time looking at medical records outside of work hours and less time looking at health record notes during appointments. In addition, a patient survey found that a majority felt they spent more time speaking with the physician during the visit, that the doctor spent less time looking at the computer, and that they were neutral about or very comfortable with the technology.

Rapid analysis of QA activities allowed Kaiser Permanente’s executive leadership to make informed decisions about continuing with deployment, and helped its vendor partner improve based on real-world evidence. Since the technology was rolled out throughout Kaiser Permanente, it has been used to support more than 4 million encounters.

Additional co-authors of the NEJM AI article were Carol Cain, PhD, and Scott Young, MD, of The Permanente Federation; and Anna C. Davis, PhD, of Kaiser Foundation Health Plan and Hospitals Quality. The full list of authors appears in the NEJM AI article.

Jan Greene is a science writer with the Kaiser Permanente Division of Research in Northern California.
