Abstract
In many areas of science, technological advances have led to devices and platforms that produce data streams consisting of an enormous number of measurements per subject; examples include wearable device data, imaging data, and geospatial/temporal climate measurements. Frequently, researchers deal with these data by extracting summary statistics (e.g., the mean) and modeling those, but this approach can miss key insights when the summaries do not capture all of the relevant information in the raw data. In this talk, I present a general distributional regression framework we have developed for distribution-on-scalar or distribution-on-function regression. The framework models distributions using quantile functions, represents them with sparse, near-lossless basis functions called quantlets, and yields joint inference that determines which predictors affect the distribution and characterizes which aspects of the distribution differ, while accounting for multiple testing.
The framework is built on the Functional Mixed Model framework developed over the past two decades. It encompasses any number of discrete and continuous predictors linearly or nonlinearly related to the response, arbitrary numbers of random effect levels to handle multi-level sampling designs, nonstationary spatial or temporal relationships between distributions, and Gaussian models as well as heavier-tailed options that flexibly handle outliers. Because it relies on basis projection and quantlets form a sparse, near-lossless basis, the approach scales to enormous sample sizes, both in the number of distributional responses and in the number of repeated measurements used to estimate each distribution. I will describe the general modeling framework with its sparse, near-lossless basis transform strategy; describe and characterize the quantlet basis functions; and illustrate the power and generality of the framework through various applications, including imaging data from cancer and multiple sclerosis studies, high-frequency wearable data from a non-human-primate glaucoma study, geospatial/temporal climate data, and actigraphy data in a teen activity study. Time permitting, I will mention an informative missingness model to account for the inherent bias that missingness can induce in the distributional summaries of wearable device data.
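To make the basis projection idea concrete, here is a minimal Python sketch of distribution-on-scalar regression on quantile functions. It is an illustration under stated assumptions, not the speaker's method: a truncated SVD basis stands in for quantlets (whose construction is not given in this abstract), ordinary least squares stands in for the Functional Mixed Model, and the simulated data and all function names are hypothetical.

```python
import numpy as np

def empirical_quantiles(samples, probs):
    """Empirical quantile function of one subject's measurements on a common grid."""
    return np.quantile(samples, probs)

def fit_sketch(Q, X, n_basis):
    """Project quantile functions onto a low-dimensional basis, then regress.

    Q: (n_subjects, n_probs) empirical quantile functions.
    X: (n_subjects, n_predictors) design matrix, including an intercept column.
    NOTE: the SVD basis is a stand-in for quantlets (an assumption of this sketch).
    """
    mean_q = Q.mean(axis=0)
    _, _, Vt = np.linalg.svd(Q - mean_q, full_matrices=False)
    basis = Vt[:n_basis]                      # orthonormal rows, (n_basis, n_probs)
    coefs = (Q - mean_q) @ basis.T            # basis coefficients per subject
    B, *_ = np.linalg.lstsq(X, coefs, rcond=None)  # regression in coefficient space
    return mean_q, basis, B

def predict_quantile_function(x, mean_q, basis, B):
    """Map fitted basis coefficients back to a quantile function."""
    return mean_q + (x @ B) @ basis

# Simulated example (hypothetical data): the predictor shifts both the
# location and the scale of each subject's distribution.
rng = np.random.default_rng(0)
probs = np.linspace(0.01, 0.99, 99)
x = np.repeat([0.0, 1.0], 20)
Q = np.vstack([
    empirical_quantiles(rng.normal(loc=xi, scale=1 + 0.5 * xi, size=500), probs)
    for xi in x
])
X = np.column_stack([np.ones_like(x), x])

mean_q, basis, B = fit_sketch(Q, X, n_basis=3)
q0 = predict_quantile_function(np.array([1.0, 0.0]), mean_q, basis, B)
q1 = predict_quantile_function(np.array([1.0, 1.0]), mean_q, basis, B)
diff = q1 - q0  # estimated effect of the predictor at each quantile level
```

In this sketch, `diff` varies across quantile levels, which is the kind of information a mean-only summary would miss: the predictor changes the spread of the distribution as well as its center.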
Department students and members are invited to meet with Dr. Morris before the presentation. Sign up for a one-on-one or small group appointment here.
Dr. Morris is the George S. Pepper Professor of Public Health and Preventive Medicine and a professor of biostatistics at the Perelman School of Medicine at the University of Pennsylvania, where he also serves as director of the Division of Biostatistics. He received his PhD in statistics in 2000 from Texas A&M University, working with Raymond J. Carroll, and was a distinguished professor at the University of Texas M.D. Anderson Cancer Center until 2019, when he moved to the University of Pennsylvania.
Dr. Morris' research interests focus on developing quantitative methods to extract knowledge from biomedical big data. This includes work to relate complex biomedical object data (such as functions, images, and manifolds) to patient outcomes and characteristics using flexible, automated regression methods, and to integrate information across multiple types of multi-platform genomic, proteomic, imaging, and wearable device data to uncover the biomedical insights these complex data contain. He has done extensive applied work in cancer research, including constructing novel prognostic indices for hepatocellular carcinoma and helping develop and characterize molecular subtypes of colorectal cancer to discover new precision therapeutic strategies.
During the pandemic, he became involved in scientific communication, trying to bring sound statistical principles to scientific discourse in the media and on social media. He has worked with various media outlets to communicate what the scientific evidence shows on pandemic-related matters, including vaccines, and with fact checkers and others to debunk misinformation. He has also worked on various projects related to COVID-19, vaccines, and the pandemic, including large observational studies of national pediatric consortium data.