Power and sample size calculation for differential expression analysis in single-cell RNA sequencing data
Presenting author: Chih-Yuan Hsu, Department of Biostatistics, Vanderbilt University Medical Center
Co-authored by:
- Qi Liu, Department of Biostatistics, Vanderbilt University Medical Center
- Yu Shyr, Department of Biostatistics, Vanderbilt University Medical Center
Abstract:
Single-cell RNA sequencing (scRNAseq) has been widely used to characterize cellular heterogeneity in complex tissues. Power and sample size calculations for scRNAseq experiments generally assume that the data follow a specific distribution, such as the zero-inflated Poisson, negative binomial, or zero-inflated negative binomial. However, data after normalization may no longer follow the assumed distribution, leading to unsatisfactory and biased estimates of power and sample size. To address this issue, we propose an analytic method based on generalized estimating equations for calculating power and sample size in the differential expression analysis of scRNAseq data. The method starts from normalized pilot data, is not restricted to any particular normalization method, makes no assumption about the data distribution, and accounts for cell-cell correlation within subjects. The method therefore greatly facilitates the design of scRNAseq experiments.
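
To illustrate why cell-cell correlation within subjects matters for power, the following is a minimal sketch and not the method proposed in this abstract. It assumes a two-group comparison of a single gene's mean normalized expression, an exchangeable within-subject correlation (the hypothetical parameter icc), equal numbers of cells per subject, and a large-sample normal approximation. Under those assumptions the variance of a group mean is inflated by the standard design effect 1 + (m - 1) * icc, where m is the number of cells per subject. All function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def design_effect(cells_per_subject, icc):
    """Variance inflation from exchangeable within-subject correlation."""
    return 1.0 + (cells_per_subject - 1) * icc


def power_two_group(n_subjects_per_group, cells_per_subject,
                    delta, sigma, icc, alpha=0.05):
    """Approximate power of a two-sided Wald test for a mean difference.

    delta: assumed difference in mean normalized expression between groups
    sigma: assumed per-cell standard deviation (e.g. estimated from pilot data)
    icc:   assumed correlation between cells of the same subject
    The negligible rejection probability in the opposite tail is ignored.
    """
    deff = design_effect(cells_per_subject, icc)
    total_cells = n_subjects_per_group * cells_per_subject
    var_diff = 2.0 * sigma ** 2 * deff / total_cells
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(delta) / math.sqrt(var_diff) - z_crit)


def subjects_per_group(cells_per_subject, delta, sigma, icc,
                       alpha=0.05, power=0.8):
    """Smallest number of subjects per group reaching the target power
    under the same approximation as power_two_group."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    deff = design_effect(cells_per_subject, icc)
    n = 2.0 * sigma ** 2 * deff * (z_a + z_b) ** 2 / (cells_per_subject * delta ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    n = subjects_per_group(cells_per_subject=100, delta=0.5, sigma=1.0, icc=0.05)
    print("subjects per group:", n)
    print("approximate power:", power_two_group(n, 100, 0.5, 1.0, 0.05))
```

In this sketch, ignoring the correlation (icc = 0) treats every cell as an independent observation and can substantially overstate power. With only a handful of subjects the normal approximation is rough, so the numbers should be read as indicative only.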