What is biostatistics?
Biostatistics is the branch of statistics responsible for the proper interpretation of scientific data generated in the biomedical sciences. In these sciences, subjects (patients, mice, cells, etc.) exhibit considerable variation in their response to stimuli. This variation may be due to different treatments, or it may be due to chance, measurement error, or underlying characteristics of the individual subjects. Biostatistics is particularly concerned with disentangling these different sources of variation, distinguishing correlation from causation, and making valid inferences from known samples. (For example, when patients are treated with two different therapies, do the results justify the conclusion that one treatment is better than the other?)
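As a rough illustration of the two-therapy question, the sketch below uses a permutation test, one of several possible approaches and not necessarily the one a biostatistician would choose for a given study. It asks how often a difference in mean outcome at least as large as the observed one would arise if treatment labels were assigned purely by chance. The outcome scores, the function name `permutation_test`, and its parameters are all hypothetical and invented for this example.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed mean difference (A minus B) and an
    approximate p-value under random relabeling of subjects.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled outcomes, i.e. reassign treatment labels at random.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point rounding in ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical outcome scores, made up purely for illustration.
therapy_a = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 11.8, 12.5]
therapy_b = [9.2, 10.4, 8.7, 9.9, 10.8, 8.5, 9.6, 10.0]

observed, p_value = permutation_test(therapy_a, therapy_b)
print(f"Observed mean difference: {observed:.3f}")
print(f"Approximate p-value: {p_value:.4f}")
```

A small p-value suggests that chance alone is an unlikely explanation for the observed difference. It does not by itself address measurement error or underlying differences between the subjects, which is why study design matters as much as the analysis.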
Who are biostatisticians?
Biostatisticians are specialists in the evaluation of data as scientific evidence. They understand the generic construct of data and they provide the mathematical framework that transcends the scientific context to generalize the findings; in other words, biostatisticians use mathematics to enhance science and bridge the gap between theory and practice. Their expertise includes the design and conduct of experiments, the mode and manner in which data are collected, the analysis of data, and the interpretation of results.
Biostatistics is integral to the advancement of knowledge in biology, clinical medicine, health policy, public health, health economics, proteomics, genomics, and other disciplines. In the era of big data and precision medicine, biostatisticians have an especially crucial role to play as data scientists.
The key concepts of precision medicine are prevention and treatment strategies that take an individual's molecular profile and clinical information into account. Single-cell next-generation sequencing (NGS) technologies, liquid biopsy for circulating tumor DNA (ctDNA), microbiomics, radiomics, and other types of high-throughput assays have exploded in popularity in recent years, thanks to their ability to produce an enormous volume of data quickly and at relatively low cost. The emergence of these big data has advanced the goals of precision medicine. However, across the entire continuum from big data capture to utilization, many more challenges lie ahead: from analysis of high-throughput biomarkers, to maximum exploitation of the electronic health record (EHR), to the ultimate goal of clinical guidance based on a patient’s genome.
Because of these challenges, the field of biostatistics is in a period of disruptive change. That change has been a long time coming: John Tukey called for a reformation of academic statistics almost 60 years ago, pointing to the existence of an as-yet unrecognized science in “The Future of Data Analysis.” More than ten years ago, John Chambers, Bill Cleveland, and Leo Breiman independently urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics. Cleveland even suggested the catchy name “Data Science” for his envisioned field.
Biostatisticians as data scientists
At Vanderbilt, our biostatisticians understand the changing landscape for statistical science: we not only facilitate biomedical research by providing methodological expertise and by closely collaborating with scientists and physician-researchers, but we are also leaders in the data revolution. From methodology to application to education, we embrace the concepts of data science and continue to advance our ability to extract evidence from data. We welcome the challenges and opportunities of the big data era (from the explosive growth in sheer volume of data, to treating unstructured text as quantitative data, to machine learning, to burgeoning applications for artificial intelligence) and look forward to ever-more-rapid advances in biomedical knowledge.