Topic Models in Microbiome Analysis

Abstract

Microbiome data give us a glimpse into the microbial dynamics and interactions that underlie so much of human health. Getting the most out of these data is often challenging, however: any analysis needs to account for the large number of taxa relative to the number of samples (high dimensionality) and for the fact that meaningful signals can appear across a range of resolutions (from entire phyla to individual strains). We will explore how these issues can be addressed through topic modeling, an idea originally developed in the population genetics and language modeling literatures. In contrast to traditional clustering, where each sample (or document) is assigned to a single cluster, topic models suppose that each observation is a continuous blend of representative prototypes. While broadly useful, topic models do require users to specify the number of topics K, which governs the resolution at which the topics are learned. We will discuss a new technique, which we call topic alignment, for comparing topics across a collection of topic models. Through simulation studies, we show that this approach can distinguish between true and spurious topics by accounting for the stability of the recovered topics across K. We will illustrate how topic modeling and alignment can clarify the ecosystem dynamics in gut and vaginal microbiome data. Code for all examples is available through online vignettes (https://go.wisc.edu/uc1qq5, https://go.wisc.edu/73ne7a), and topic alignment is available through the R package Alto (https://go.wisc.edu/8ez208).

Department students and members are invited to meet with Dr. Sankaran after the presentation. Sign up for your small-group appointment here.


Kris Sankaran is an assistant professor in the Statistics Department at UW-Madison and is a discovery fellow at the Wisconsin Institute for Discovery. His group's research focuses on the statistical foundations for microbiome analysis. They actively develop visualization and modeling software packages to support practical microbiome analysis projects, from experimental design to interpretation. He completed his PhD at Stanford University in 2018 and his postdoc at the Mila-Quebec AI Institute in 2020.