New evaluation system to render AI chatbots safe, empathetic | Center for Biomedical Ethics and Society

September 23, 2024

V-CARES will focus on detecting hallucinations, omissions and misaligned values in AI-generated responses on critical health topics.

By: Paul Govern

The public’s use of artificial intelligence (AI) chatbots for health care information continues to grow amid a welter of questions about their accuracy, safety and reliability, and amid excitement over their potential to improve access to care.

A team at Vanderbilt University Medical Center will address the concerns and hopes surrounding health-related AI chatbots with the aid of a two-year funding award of up to $7.3 million from the Advanced Research Projects Agency for Health (ARPA-H).

ARPA-H, established in 2022, is an agency within the U.S. Department of Health and Human Services that supports transformative high-risk, high-reward research to drive biomedical and health breakthroughs to benefit everyone.

Led by principal investigator Susannah Rose, MSSW, PhD, associate professor of Biomedical Informatics, and co-principal investigator Zhijun Yin, PhD, MS, assistant professor of Biomedical Informatics and Computer Science, the team will build the Vanderbilt Chatbot Accuracy and Reliability Evaluation System (V-CARES). Using mental health as a demonstration case, the system will focus on detecting hallucinations, omissions and misaligned values, with the aim of ensuring safety and empathy in AI-generated responses on critical health topics.

“We chose screening and treatment for major depression and generalized anxiety disorder because, from a safety and reliability standpoint, these chatbots pose a number of lingering challenges and unresolved questions,” said Rose, a core faculty member in the Center for Biomedical Ethics and Society and executive director of the AI Discovery & Vigilance to Accelerate Innovation & Clinical Excellence (ADVANCE) Center at VUMC. “In tackling this sensitive category of chatbot, V-CARES looks to furnish recommendations broadly applicable to the evaluation of chatbots across health care, and to do it in a way that is ethical and applicable to diverse populations.”

The project will combine human expertise with advanced computational techniques. Along the way, the team will engage patients and clinicians, develop comprehensive knowledge bases, and deploy state-of-the-art machine learning.

“We will pursue a novel multiexpert ensemble learning framework,” Yin said, “integrating various AI models and human expertise to achieve accurate detection of potential issues in chatbot responses.”

A key aspect of the research involves ensuring that the evaluation system addresses the concerns and values of diverse stakeholders.

“By incorporating community members, patients and clinicians throughout the process, we aim to create a system that not only improves technical accuracy but also aligns with users’ diverse values and expectations,” Rose said.

The project team brings together researchers with expertise in computer science, biomedical informatics, anthropology, bioethics, clinical psychology, psychiatry, social work and behavioral economics.

“Research like this is essential to developing AI solutions that effectively serve patients and health systems, and to provide the evidence needed to build and deploy systems that people can trust and rely upon in practice,” said project team member Peter Embí, MD, MS, professor and chair of Biomedical Informatics, senior vice president for Research and Innovation, and co-director of the ADVANCE Center. “This project is also a prime example of the multidisciplinary and cross-sector work that is central to our center’s mission.”

Other Vanderbilt researchers on the project include Laurie Novak, PhD, MHSA, Shelagh Mulvaney, PhD, Siru Liu, PhD, Bryan Steitz, PhD, Bradley Malin, PhD, and Keith Meador, MD, MPH. They are joined by Murat Kantarcioglu, PhD, from Virginia Tech in Blacksburg, Virginia, and Christopher Symons, PhD, MSc, and Amy Bucher, PhD, from Lirio, Inc., a Tennessee-based tech company combining AI innovation with behavioral science to improve health outcomes.