## PUBHLTH 223: Introduction to Biostatistics for Public Health

*Primary Audience: Undergraduate*

This introductory course is designed to give students the basic skills to organize and summarize data, along with an introduction to the fundamental principles of statistical inference. The course emphasizes an understanding of statistical concepts and interpretation of numeric data summaries along with basic analysis methods, using examples and exercises from medical and public health studies. The course does not require a high-level mathematics background, and will highlight the use and integration of statistical software, spreadsheets and word processing software in conducting and presenting data summaries and analyses.

## PUBHLTH 390R: Introduction to Data Science Using R

*Primary Audience: Undergraduate*

This course focuses on data visualization and data transformation, followed by other topics including exploratory data analysis and programming. You will learn the most important tools in R to do data science and gain hands-on experience through in-class coding activities and homework assignments. Along with the introduction of tools in R, you will also learn about basic concepts in data science. Throughout the course, you will practice communicating your results with others.

## PUBHLTH 460: Telling Stories with Data: Statistics, Modeling, and Data Visualization

*Primary Audience: Undergraduate*

The aim of this course is to provide students with the skills necessary to tell interesting and useful stories in real-world encounters with data. Specifically, they will develop the statistical and programming expertise necessary to analyze datasets with complex relationships between variables. Students will gain hands-on experience summarizing, visualizing, modeling, and analyzing data. Students will learn how to build statistical models that can be used to describe and evaluate multidimensional relationships that exist in the real world. Specific methods covered will include linear and logistic regression. Students will work with the R statistical computing language and, by the end of the course, will be expected to do substantial independent programming. The course will not provide explicit or detailed training in R programming. To the extent possible, the course will draw on real datasets from biological and biomedical applications. This course is designed for students who are looking for a second course in applied statistics/biostatistics (e.g., beyond PUBHLTH 223, 390B or STAT 240), or an accelerated introduction to statistics and modern statistical computing.

## PUBHLTH 490Z: Statistical Modeling for Health Data Science

*Primary Audience: Undergraduate*

This course is aimed at developing a broad understanding of statistical models with application to real data. Specifically, students will gain hands-on, in-depth experience analyzing data using simple/multiple linear regression, logistic regression, multinomial and Poisson regression and an introduction to machine learning. This course is designed for students who are looking for a second course in applied statistics/biostatistics beyond PUBHLTH 460 but can also be taken after PUBHLTH 223, 390B, STAT 240, or an equivalent introduction to statistics and modern statistical computing.

## 540: Intro Biostatistics

*Primary Audience: Foundational pre-requisite for MS Biostatistics*

Principles of statistics applied to analysis of biological and health data, evaluation of public health and clinical programs. Gen Ed: R2 (Analytical Reasoning).

## 640: Intermediate Biostatistics

*Primary Audience: Foundational pre-requisite for MS Biostatistics*

Principles of statistics applied to analysis of biological and health data. Continuation of Bioepi 540 including analysis of variance, regression, nonparametric statistics, sampling, and categorical data analysis.

## 597D-E-A: R for data science (Levels 1-3)

*Primary Audience: Undergraduates, MS/PhD in Biostatistics, Epidemiology*

R has emerged as a preferred programming language in data science. This sequence of three courses covers topics in R programming to develop powerful, robust, and reusable data science tools. Main topics in Part I include data wrangling, visualization, and reporting using R Markdown. Part II focuses on programming, modeling, iteration, and the development of web apps using R Shiny. In Part III, we learn how to collaborate on code using GitHub and write R packages.

## 691F: Data Mgmt & Analysis/SAS

*Primary Audience: Undergraduates, MS/PhD in Biostatistics, Epidemiology*

SAS software is used widely outside academia. Many graduates find it a useful skill in job hunting. This course covers using SAS for basic data manipulation and analysis. It also reinforces understanding of key statistical concepts. You will use SAS to: read data from many formats; generate univariate statistics and histograms; define new variables using logic and functions; merge and subset data sets; make bivariate tables and scatterplots; test for association; perform linear and logistic regression.

## 690C: Data Mgmt & Analysis/Stata

*Primary Audience: Undergraduates, MS/PhD in Biostatistics, Epidemiology*

This course is an introduction to the design, management, and use of data management systems for the collection and analysis of research data, especially epidemiologic research data on humans. MS Excel, MS Access, and Stata are emphasized. Topics include database development, Health Insurance Portability and Accountability Act (HIPAA) compliance, data manipulation and cleaning, data summarization, and selected topics in statistical analysis programming.

## 690A: Fundamentals of Probability and Statistical Inference

*Primary Audience: MS Biostatistics, MS/PhD Epidemiology*

The goal of this 3-credit course is to introduce the fundamentals of probability theory, statistical inference tools, and their application to biostatistics. The course is intended for first-year graduate students in the Biostatistics MS program. The topics in this course include basic concepts of probability, random variables, important probability distributions (e.g., normal, exponential, binomial and Poisson), marginal distribution, conditional distribution, joint distribution, expectation and variance, conditional expectation, law of large numbers, central limit theorem, sampling distributions, point estimation, maximum likelihood estimation, method of moments and estimating equations, interval estimation, and hypothesis testing. Examples from biostatistical applications will be used whenever possible. Simple simulations with R software will be used to illustrate some concepts in probability and statistical inference.

## 690Z: Health Data Science and Statistical Modeling

*Primary Audience: MS Biostatistics, MS/PhD Epidemiology*

This course is for students who want to learn essential statistical and computational skills for health data science. Students will obtain hands-on experience in implementing a wide range of commonly used statistical methods with real data from public health and biomedical research using the statistical programming language R. The course motivates statistical reasoning and methods through real health data. The focus of the course is to train students in refining a scientific question into a statistical framework, choosing proper regression models, writing scripts and executing them in R, and interpreting scientifically meaningful findings.

## 690P: Topics in Biostatistics and Data Science

*Primary Audience: MS Biostatistics, MS/PhD Epidemiology*

The course introduces advanced central topics in biostatistics and health data science including maximum likelihood inference, survival analysis, design and analysis of clinical trials, models for correlated data, Bayesian modeling, and causal inference. The course motivates statistical reasoning and methods through substantive research questions and features of data typically available in public health and biomedical research. Students will obtain hands-on experience in applying selected methods to real data using the statistical programming language R.

## 683: Introduction to Causal Inference in a Big Data World

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

With the recent and ongoing 'data explosion', methods to delineate causation from correlation are perhaps more pressing now than ever. This course will introduce a general framework for causal inference: 1) clear statement of the scientific question, 2) definition of the causal model and parameter of interest, 3) assessment of identifiability, 4) choice and implementation of estimators including parametric and semi-parametric methods, and 5) interpretation of findings. The methods include G-computation, inverse probability weighting (IPW), and targeted maximum likelihood estimation (TMLE) with Super Learning, an ensemble machine learning method. Students gain practical experience implementing these estimators and interpreting results through discussion assignments, R labs, R assignments, and a final project.

## 690T: Applied Statistical Genetics

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

This course will provide fundamental statistical concepts and tools relevant to the analysis of high-dimensional genomics data arising from population-based association studies. A first course in statistics is assumed.

## 730: Applied Bayesian Statistical Modeling

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

Bayesian modeling approaches provide natural ways for researchers in many disciplines to structure their data and knowledge, and they yield direct and intuitive answers to scientific questions. In this course, students will learn how to construct Bayesian models to relate (potentially complex) data to scientific questions, to fit such models using statistical programs (R, JAGS and/or Stan), to interpret model results and, lastly, to check model assumptions. Specific methods covered will include Bayesian linear and logistic regression, as well as hierarchical (regression) models. Additional topics (survival analysis, time series analysis, spline regression models) will be discussed as time allows. The class also includes the discussion of selected papers from the literature that serve as case studies of Bayesian analyses in public health, as well as a project in which students carry out Bayesian modeling for a real data set.

## 740: Analysis of Mixed Models Data

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

In many applications, one of the key assumptions foundational to statistical analysis is violated: the assumption of statistical independence. That is, in life, observations often have some amount of dependence among them. This course mainly explores mixed models, one key method that accommodates correlated data. Other approaches will be touched on briefly. In this course you will learn what the implications of correlated data are from a statistical perspective, how to think about them, how to analyze them and interpret the data analysis, and the theoretical underpinnings of the data analysis approaches. Learning this material will prepare you for the many professional environments in which you might encounter this kind of data. You will also reinforce your data management skills in SAS and R. By the end of this course, students will 1) recognize correlated data occurring by design or chance; 2) diagnose the likely effects of ignoring correlation in data analysis; 3) discuss correlated and longitudinal data; 4) learn about mean models for longitudinal data; 5) learn about mixed models; and 6) understand the theoretical bases of the methods employed.

## 743: Analysis of Categorical Data in Public Health

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

This course provides an overview of statistical methods for analyzing data where the outcome variable is categorical or discrete. The course will emphasize the theoretical underpinnings of the methods as well as an applied understanding of the computation and interpretation, both of which are necessary to succeed with real data analysis. We will cover inference for binomial and multinomial variables with contingency tables, generalized linear models, logistic regression for binary responses, logit models for multiple response categories, log-linear models, some statistical machine learning approaches, inference for matched pairs, and correlated, clustered data. Examples will be taken from public health and biomedical research.

## 748: Applied Survival Analysis

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

The aim of this course is to provide students with a strong foundation in statistical techniques used for the analysis of time-to-event data. Specific topics will include types of censoring mechanisms, graphical and numerical description of survival data, methods for comparison of survival between groups, and models to explain and predict survival as a function of baseline and time-varying covariates. Advanced topics including the analysis of competing risks and recurrent event data will be introduced as time permits. The course will include hands-on analysis of datasets using the statistical software R.

## 749: Statistical Methods in Clinical Trials

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

The aim of this course is to provide students with a strong foundation in statistical techniques used for the design, analysis and interpretation of clinical trials. Topics include types of clinical research, study design, treatment allocation, randomization and stratification, quality control, sample size requirements, patient consent, introduction to survival analysis and interpretation of results from clinical trials. Special topics introduced include group sequential methods in clinical trials, including hypothesis testing (one-sided, two-sided and equivalence tests), analysis techniques in the context of sequential trials (repeated confidence intervals) and an introduction to flexible monitoring approaches. Statistical software (SAS/R) will be introduced as needed.

## 750: Applied Statistical Learning

*Primary Audience: MS/PhD Biostatistics, MS/PhD Epidemiology*

The goal of this course is to introduce some statistical modeling approaches that have been developed in the last few years and are widely used in medical and public health research, but are not covered in the core courses of the MS/PhD programs in Biostatistics and Epidemiology. Topics include penalized regression, methods for classification, evaluation of predictions, and robust regression. The cross-validation and bootstrapping procedures, which are important in evaluating and performing inference for models, will be introduced. The course emphasizes helping students to understand the concepts and ideas of some modern statistical methods and apply these methods to medical and public health research studies. Implementation of different methods with R software will be introduced whenever appropriate.

## 790A: Advanced Statistical Inference

*Primary Audience: PhD Biostatistics*

The goal of this 3-credit course is to introduce advanced statistical inference tools that are widely used in biostatistics methodology research and in applications in medicine and public health. The course is intended for second-year graduate students who have taken first-year graduate-level probability and mathematical statistics courses from a text like "Statistical Inference" by Casella and Berger. The topics in this course include classical likelihood inference plus modern topics like M-estimation, the jackknife, and the bootstrap.

## 790C: Special Topic: Causal Inference

*Primary Audience: PhD Biostatistics*

This course will introduce students to both the statistical theory and the practice of causal inference. We will review the basics of causal inference, then introduce a missing data perspective on causal inference and instrumental variable methods. We then cover three advanced topics chosen based on a survey of students. Tentative topics include randomization inference, mediation analysis, principal stratification, measurement error, natural experiments, and causal inference with interference.