Abstract
Randomized controlled trials (RCTs) are the gold standard for causal inference on treatment effects, but their sample sizes are often small, for example in rare-disease indications with small patient populations, where ethical concerns or patient reluctance may limit assignment to the control group. Hybrid controlled trials use external controls (ECs) from historical studies or large observational databases to enhance statistical efficiency. However, non-randomized ECs can introduce biases that compromise validity and inflate the Type I error rate for treatment discovery, particularly in small samples. To address this, we extend the Fisher randomization test (FRT) to hybrid controlled trials. Our approach uses a test statistic that combines RCT and EC data, while inference is based solely on the randomization in the RCT. This method strictly controls the Type I error rate, even with biased ECs, and improves power by incorporating unbiased ECs. To mitigate the power loss caused by biased ECs, we introduce Conformal Selective Borrowing, which uses individual conformal p-values to selectively incorporate unbiased ECs. The score function can be constructed from either computationally efficient parametric models or off-the-shelf machine learning models, with model-agnostic reliability. We identify a risk-benefit trade-off in the power of the FRT, associated with different selection thresholds for conformal p-values, analogous to the mean squared error trade-offs observed in data-integrative estimators. We propose a data-driven selection of the threshold value to achieve robust performance across different levels of hidden bias. The advantages of our method are demonstrated through simulations and an application to a small lung cancer trial with ECs from the National Cancer Database.
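For readers who want a concrete picture of the idea, the following is a minimal sketch and not the speaker's implementation. It assumes no covariates, complete randomization in the RCT, a two-sided test, and a simplified conformal-style p-value whose score is the absolute deviation from the RCT control mean. The function names and the threshold parameter `gamma` are hypothetical choices made for illustration.

```python
import numpy as np

def conformal_pvalues(y_control, y_ec):
    """Simplified conformal-style p-value for each external control (EC).

    Assumption: the score is |y - mean of RCT controls|; no covariates.
    """
    center = y_control.mean()
    calib = np.abs(y_control - center)   # scores of the RCT controls
    scores = np.abs(y_ec - center)       # scores of the external controls
    # Count calibration scores at least as extreme as each EC score.
    ge = (calib[None, :] >= scores[:, None]).sum(axis=1)
    return (1 + ge) / (len(calib) + 1)

def borrowing_statistic(y, a, y_ec, gamma):
    """Treated mean minus the mean of RCT controls pooled with selected ECs."""
    y_c = y[a == 0]
    keep = conformal_pvalues(y_c, y_ec) > gamma   # borrow ECs that look compatible
    pooled = np.concatenate([y_c, y_ec[keep]])
    return y[a == 1].mean() - pooled.mean()

def frt_pvalue(y, a, y_ec, gamma=0.3, n_draws=2000, seed=0):
    """Monte Carlo Fisher randomization test under the sharp null.

    Only the RCT assignment vector is re-randomized; EC outcomes stay fixed,
    and the selection step is repeated for every re-randomization.
    """
    rng = np.random.default_rng(seed)
    t_obs = borrowing_statistic(y, a, y_ec, gamma)
    t_null = np.empty(n_draws)
    for b in range(n_draws):
        a_star = rng.permutation(a)   # keeps the number of treated units fixed
        t_null[b] = borrowing_statistic(y, a_star, y_ec, gamma)
    return (1 + np.sum(np.abs(t_null) >= abs(t_obs))) / (n_draws + 1)
```

In this sketch, `gamma = 1.0` borrows no ECs (the statistic reduces to the RCT difference in means) and `gamma = 0.0` borrows all of them, so intermediate values illustrate the trade-off between the power gained from unbiased ECs and the power lost to biased ones.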
Department students and members are invited to meet with Dr. Yang after the presentation. Sign up for your small-group appointment here.
Shu Yang is an associate professor of statistics at North Carolina State University. She received her PhD in applied mathematics and statistics from Iowa State University and completed her postdoctoral training at the Harvard T.H. Chan School of Public Health. Her primary research interests are causal inference and data integration, particularly with applications to comparative effectiveness research in health studies. She also works extensively on methods for missing data and spatial statistics. She has been principal investigator for several NSF, NIH, and FDA research projects.