Di Gravio C**, Tao R*, Schildcrout JS*. Design and analysis of two-phase studies with multivariate longitudinal data. Biometrics. 2022 Jan 11.
Abstract
Two-phase studies are crucial when outcome and covariate data are available in a first-phase sample (e.g., a cohort study), but costs associated with retrospective ascertainment of a novel exposure limit the size of the second-phase sample, in whom the exposure is collected. For longitudinal outcomes, one class of two-phase studies stratifies subjects based on an outcome vector summary (e.g., an average or a slope over time) and oversamples subjects in the extreme-value strata while undersampling subjects in the medium-value stratum. Based on the choice of the summary, two-phase studies for longitudinal data can increase efficiency of time-varying and/or time-fixed exposure parameter estimates. In this manuscript, we extend efficient, two-phase study designs to multivariate longitudinal continuous outcomes, and we detail two analysis approaches. The first approach is a multiple imputation analysis that combines complete data from subjects selected for phase two with the incomplete data from those not selected. The second approach is a conditional maximum likelihood analysis that is intended for applications where only data from subjects selected for phase two are available. Importantly, we show that both approaches can be applied to secondary analyses of previously conducted two-phase studies. We examine finite sample operating characteristics of the two approaches and use the Lung Health Study (Connett et al. (1993), Controlled Clinical Trials, 14, 3S-19S) to examine genetic associations with lung function decline over time.
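The stratified sampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it uses a single simulated outcome (the manuscript covers multivariate outcomes), takes each subject's least-squares slope as the summary, and assumes 10th and 90th percentile cutoffs with 20 subjects drawn per stratum. All function names, cutoffs, sample sizes, and simulated data are hypothetical choices made for the example.

```python
import random


def ols_slope(times, values):
    """Least-squares slope of one subject's outcome on time."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def assign_strata(summaries, low_q=0.1, high_q=0.9):
    """Label subjects 'low', 'mid', or 'high' using empirical cutoffs.

    The quantile levels are illustrative assumptions, not values
    taken from the manuscript.
    """
    ordered = sorted(summaries.values())
    n = len(ordered)
    low_cut = ordered[int(low_q * (n - 1))]
    high_cut = ordered[int(high_q * (n - 1))]
    strata = {}
    for subject_id, summary in summaries.items():
        if summary <= low_cut:
            strata[subject_id] = "low"
        elif summary >= high_cut:
            strata[subject_id] = "high"
        else:
            strata[subject_id] = "mid"
    return strata


def sample_phase_two(strata, n_per_stratum, rng):
    """Draw a fixed number of subjects per stratum (all, if fewer exist)."""
    selected = []
    for label, n_take in n_per_stratum.items():
        members = sorted(s for s, lab in strata.items() if lab == label)
        selected.extend(rng.sample(members, min(n_take, len(members))))
    return sorted(selected)


# Simulated phase-one cohort: 200 subjects, 5 visits each (hypothetical data).
rng = random.Random(2022)
visit_times = [0, 1, 2, 3, 4]
cohort = {}
for subject_id in range(200):
    intercept = rng.gauss(0.0, 1.0)
    slope = rng.gauss(-0.5, 0.3)
    cohort[subject_id] = [
        intercept + slope * t + rng.gauss(0.0, 0.5) for t in visit_times
    ]

# Summarize each outcome vector by its slope, then stratify and sample.
summaries = {s: ols_slope(visit_times, y) for s, y in cohort.items()}
strata = assign_strata(summaries)

# Equal counts per stratum oversample the small extreme strata
# relative to the large middle stratum.
phase_two_ids = sample_phase_two(
    strata, {"low": 20, "mid": 20, "high": 20}, rng
)
print(len(phase_two_ids), "subjects selected for exposure ascertainment")
```

Under these assumptions, nearly every subject in the two extreme strata is selected, while only a small fraction of the middle stratum is. An analysis of the resulting sample would then need to account for the outcome-dependent selection, for example with the multiple imputation or conditional maximum likelihood approaches named in the abstract.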