Topic outline

  • Start Here

    Before you begin the course, please take a moment to take a short knowledge pre-test. Thank you for participating.


    Quiz: 1
  • Course Home Page

    This Biostatistics course provides an introduction to the field of biostatistics, the branch of statistics responsible for the interpretation and application of scientific data generated in public health, clinical medicine, biology, and other health sciences. Students will develop foundational skills and knowledge in biostatistics (through online didactics) and gain deeper understanding of its relevance and application to public health, health policy, clinical medicine, and health economics (with globally available peers and mentors). All components of this training (like all NextGenU.org trainings) are free, including registration, learning, testing, and a certificate of completion.

    There are 7 modules to complete through online study and peer and mentored activities. These modules provide an introduction to probability and sampling distributions, confidence intervals, hypothesis testing, regression analysis, confounding and interactions, and the application of biostatistics in the practice and study of public health.

    There are practice quizzes in each module, and at the end of the course you’ll have a final exam, and a chance to give your assessment of this training. We will give you all the results of your assessments, such as your final exam and peer activities. We can report your testing information and share your work with anyone (your school, employer, etc.) that you request. We hope this is a wonderful learning experience for you, and that your assessments will teach us how we can make it even better.

    Select the Next button to begin Module 1: The Basics of Biostatistics

    This course was sponsored by the University of the Incarnate Word and was developed in partnership with the Association for Prevention Teaching and Research (APTR) and the US Centers for Disease Control and Prevention (CDC). Like all NextGenU courses, it is competency-based, using competencies from the Association of Schools and Programs of Public Health (ASPPH). This course uses learning resources from world-class academic and governmental organizations such as Penn State, Rice University, and the US Centers for Disease Control and Prevention (CDC).

    Approximate time required for the required readings for the course is 47 hours at an average rate of 144 words/minute; in addition, there are required activities.

    Forum: 1, Glossaries: 2
  • Module 1: The Basics of Biostatistics

    Data are everywhere around us. Making sense of the massive amounts of data for the purpose of improving population health requires understanding statistical principles and developing skills in applying these concepts. In this first module, we start by looking at the different types of data we often encounter, and what we can say about them in a basic sense using descriptive statistics. We will learn about different ways to present the data to highlight important messages, and investigate designing studies to understand the practical applications of important statistical concepts. The type of data often determines what statistical tools we can use, so this module is crucial for upcoming materials in this course.

    Resources in this module are quite diverse. The resources are not an exhaustive reference for tools for understanding and presenting descriptive statistics, but rather a starting point, covering the basic concepts.

    Competencies covered in this module:
    1. Describe the roles biostatistics serves in the discipline of public health.
    2. Distinguish among the different measurement scales and the implications for selection of statistical methods to be used based on these distinctions.
    3. Apply descriptive techniques commonly used to summarize public health data.
    4. Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question.

    Upon completion of this module, students should be able to:

    • Explore the basic principles of statistics and some of its common uses
    • Understand the basic principles of probability, descriptive statistics, and data analysis
    • Understand how to generate descriptive statistics from data
    • Understand the different types of variables, how they are used, and how to summarize the data
    • Understand and identify the different types of plots and graphs
    • Generate descriptive statistics from data, calculate descriptive statistics and standard deviations, and understand the methods of summarizing a single quantitative variable
    • Summarize and describe the distribution of a categorical variable, and understand the uses and implications of the normal distribution
    • Understand the basic types of data, the main ways in which data is used, and important considerations when using data in analysis
    • Identify the design of a study and explain how this impacts interpretation
    • Apply knowledge and skills in working with different data types in a chosen public health setting
  • Module 1: Lesson 1: Introduction to Biostatistics

    Learning Objectives
    • Explore the basic principles of statistics and some of its common uses
    • Understand the basic principles of probability, descriptive statistics, and data analysis
    • Understand how to generate descriptive statistics from data
    Pie Chart

    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 4, Quiz: 1
  • Module 1: Lesson 2: Types of Variables, Plots and Graphs

    Learning Objectives
    • Understand the different types of variables, how they are used, and how to summarize the data
    • Understand and identify the different types of plots and graphs
    English Dialects

    Approximate time required for the readings for this lesson (at 144 words/minute): 2 hours.

    URLs: 5, Quiz: 1
  • Module 1: Lesson 3: Descriptive Statistics and Distribution

    Learning Objectives
    • Generate descriptive statistics from data, calculate descriptive statistics and standard deviations, and understand the methods of summarizing a single quantitative variable
    • Summarize and describe the distribution of a categorical variable, and understand the uses and implications of the normal distribution
    Visualisation of mode, median, and mean

    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 5, Quiz: 1
  • Module 1: Lesson 4: Data Analysis and Study Design

    Learning Objectives
    • Understand the basic types of data, the main ways in which data is used, and important considerations when using data in analysis
    • Identify the design of a study and explain how this impacts interpretation
    • Apply knowledge and skills in working with different data types in a chosen public health setting
    Research design and evidence

    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 4, Workshop: 1, Assignment: 1, Quiz: 1
  • Module 2: Probability and Sampling Distributions

    In this module, we cover three foundational topics in biostatistics: 1) probability of events, 2) random variables, and 3) sampling distributions. Probability can be a simple concept, but is sometimes not intuitive and may require much thinking and practice to fully grasp. Once we model health phenomena as random variables, we can use the principles of probability to help us learn about their distributions and understand what is likely to happen. The central limit theorem and the normal model are very important tools that allow us to understand the big picture from a small snapshot (i.e., the relationship between population parameters and sample statistics), and help us arrive at a concrete numerical measure of likelihood.

    The resources in this module try to achieve a balance between theory and practice. These concepts may seem very different at first, but they are very much connected. Please feel free to explore different ways of presenting the concepts, and go back and forth between theory and practice until you are confident in your understanding and in calculating event probabilities.

    Competencies covered in this module:
    1. Describe basic concepts of probability, random variation and commonly used statistical probability distributions.
    2. Distinguish among the different measurement scales and the implications for selection of statistical methods to be used based on these distinctions.
    3. Apply descriptive techniques commonly used to summarize public health data.
    4. Apply common statistical methods for inference.
    5. Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question.

    Upon completion of this module, students should be able to:

    • Relate the probability of an event to the likelihood of this event occurring
    • Understand how to interpret and generate proportions from data
    • Explain how relative frequency can be used to estimate the probability of an event
    • Understand the concepts of probability, conditional probability, and independence
    • Understand the concept of random variables
    • Distinguish between samples and population and identify different types of samples
    • Understand sampling distribution, variance, and the central limit theorem
    • Understand the implications and uses of normality and skewness
    • Be able to calculate and correctly interpret probability data from a sampling distribution
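
    The central limit theorem named in the objectives above can be illustrated with a short simulation. The following is a minimal sketch in Python (standard library only) and is not part of the course materials; the exponential population, the sample size of 50, and the helper name sample_means are illustrative assumptions.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical right-skewed population: exponential values with mean near 1.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Compare the sampling distribution of the mean with what the theorem predicts.
print("population mean:       ", round(pop_mean, 3))
print("mean of sample means:  ", round(statistics.fmean(means), 3))
print("SD of sample means:    ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):       ", round(pop_sd / 50 ** 0.5, 3))
```

    Even though the population is skewed, the mean of the sample means should sit close to the population mean, and their standard deviation close to sigma / sqrt(n).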
  • Module 2: Lesson 1: Probability, Frequency, and the Concepts of Probability

    Learning Objectives
    • Relate the probability of an event to the likelihood of this event occurring
    • Understand how to interpret and generate proportions from data
    • Explain how relative frequency can be used to estimate the probability of an event
    • Understand the concepts of probability, conditional probability, and independence

    Image of dice on Barchart. By Ipipipourax [CC BY-SA 3.0  (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons


    Approximate time required for the readings for this lesson (at 144 words/minute): 2 hours.
    URLs: 6, Quiz: 1
  • Module 2: Lesson 2: Variables, Sampling and Distribution

    Learning Objectives
    • Understand the concept of random variables
    • Distinguish between samples and population and identify different types of samples
    • Understand sampling distribution, variance, and the central limit theorem
    • Understand the implications and uses of normality and skewness
    • Be able to calculate and correctly interpret probability data from a sampling distribution
    Variables sampling - By Dan Kernler [CC BY-SA 4.0  (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons

    Approximate time required for the readings for this lesson (at 144 words/minute): 7 hours.

    URLs: 10, Workshop: 1, Quiz: 1
  • Module 3: Confidence Intervals

    Standard deviation diagram

    In this short module, we build on the concept of sampling distributions and use samples to make inferences about the population. We will start by understanding inference and estimation, then learn to calculate confidence intervals. We will also look at factors and conditions that influence estimation.

    Unlike the sequential presentation in previous modules, most of the resources in this module cover the same key concepts in slightly different ways.

    Competencies covered in this module:
    1. Distinguish among the different measurement scales and the implications for selection of statistical methods to be used based on these distinctions.
    2. Apply common statistical methods for inference.
    3. Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question.

    Upon completion of this module, students should be able to:

    • Understand and be able to apply point estimation and confidence interval estimation
    • Recognize inference on means versus inference on proportions
    • Be able to calculate one-sided and two-sided confidence intervals for mean and proportion
    • Understand the effect of sample size and other conditions on estimation as well as understand and be able to calculate the sample size needed to achieve the desired confidence level
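
    The calculations named in the objectives above can be sketched in a few lines. This is a minimal Python sketch (standard library only), not the course's own method: it shows two-sided intervals only, treats the standard deviation as known (a z-interval), uses the large-sample Wald interval for a proportion, and reads "sample size needed" as the sample size for a target margin of error at a given confidence level. All function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_interval(sample_mean, sd, n, confidence=0.95):
    """Two-sided z-interval for a mean, treating sd as known (an assumption)."""
    margin = z_critical(confidence) * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


def proportion_interval(successes, n, confidence=0.95):
    """Two-sided Wald interval for a proportion (large-sample approximation)."""
    p_hat = successes / n
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_mean(sd, margin, confidence=0.95):
    """Smallest n whose z-interval half-width is at most `margin`."""
    return math.ceil((z_critical(confidence) * sd / margin) ** 2)


# Hypothetical numbers, for illustration only.
mean_ci = mean_interval(sample_mean=120, sd=15, n=100)
prop_ci = proportion_interval(successes=40, n=200)
needed_n = sample_size_for_mean(sd=15, margin=2)
print(mean_ci, prop_ci, needed_n)
```

    Because the margin shrinks with the square root of n, halving the margin of error requires roughly four times as many observations.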
  • Module 3: Lesson 1: Point and Confidence Interval Estimation

    Learning Objectives
    • Understand and be able to apply point estimation and confidence interval estimation
    • Recognize inference on means versus inference on proportions
    • Be able to calculate one-sided and two-sided confidence intervals for mean and proportion
    Relative survival of ovarian cancer by stage

    Approximate time required for the readings for this lesson (at 144 words/minute): 3 hours.

    URLs: 6, Quiz: 1
  • Module 3: Lesson 2: Effect of Sample Size on Confidence Interval

    Learning Objective
    • Understand the effect of sample size and other conditions on estimation as well as understand and be able to calculate the sample size needed to achieve the desired confidence level

    Image of stacks at different levels, intervals


    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 2, Quiz: 1
  • Module 4: Hypothesis Testing

    H0 and H1 error diagram

    Hypothesis testing is a cornerstone of science. In biostatistics, it is often used to assess anything from the effectiveness of an intervention to changes in the distribution of a health outcome. But there are many tests, so how do they work? When do you use which? How does hypothesis testing differ from estimation, and from regression modelling? We will answer these questions in this module. The concept of hypothesis testing can appear counter-intuitive at first, but this is the common “frequentist” approach of statistics; there are other approaches that are beyond the scope of this course. Pictures can help a great deal in understanding ideas such as the null and alternative hypotheses, power, types of errors, etc.

    These days, most hypothesis testing, power, and sample size calculations are done using statistical software, so there is less emphasis on your ability to do hand calculations. However, it is still very important that you understand the fundamental theory underlying these concepts and which quantities influence these values.

    What you should know:

    • Write hypothesis statements
    • Choose the appropriate test
    • Interpret results of the hypothesis test

    What software can do:

    • Run the hypothesis test chosen by you (calculate test statistic, report p-value, etc.)
    • Calculate sample size given other values (e.g. desired power, effect size, etc.)

    Several resources in this module cover the same topics but with different approaches. Reading about different ways to explain the same complex concept may help you find what speaks to you best. Read as many as needed to understand key concepts in this module.

    Statistics is a hands-on discipline. The learning activity in this module asks you to run through some hypothesis tests from beginning to end. You should be able to calculate by hand and arrive at a conclusion. For more complex tests, you will likely need the assistance of statistical packages to do the bulk of the work for you.

    Competencies covered in this module:
    1. Describe preferred methodological alternatives to commonly used statistical methods when assumptions are not met.
    2. Apply common statistical methods for inference.
    3. Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question.

    Upon completion of this module, students should be able to:

    • Reinforce understanding of probability distributions in the context of hypothesis testing, especially in drawing conclusions and types of errors
    • Distinguish types of explanatory and response variables
    • Understand the relationship between confidence interval and hypothesis testing
    • Understand and apply hypothesis tests for a single mean and a single proportion as well as for two means (independent and paired/matched samples), and understand chi-squared test and ANOVA
    • Understand inference, estimation, and the basics of hypothesis testing
    • Understand implications of multiple testing and Bonferroni correction
    • Understand and be able to correctly interpret p-values
    • Understand the importance and implications of Type I and Type II errors
    • Understand factors that affect study power and sample size requirements, and how they impact study design
    • Summarize and describe nonparametric tests and understand the conditions under which they are applied
    • Be able (1) to apply appropriate hypothesis tests to variable types in order to explore relationships and (2) to draw conclusions based on such hypothesis testing and to interpret p-values
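
    To make the p-value and Bonferroni objectives above concrete, here is a minimal Python sketch (standard library only). It is an illustration under stated assumptions, not the course's procedure: a one-sample z-test with a known standard deviation, a two-sided alternative, and hypothetical numbers and function names.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, null_mean, sd, n):
    """Return (z, two-sided p-value) for H0: population mean == null_mean.

    Treats sd as the known population standard deviation (an assumption).
    """
    z = (sample_mean - null_mean) / (sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def bonferroni_reject(p_values, alpha=0.05):
    """Reject each null only if its p-value is at most alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical study: observed mean 123 against a null mean of 120.
z, p = one_sample_z_test(sample_mean=123, null_mean=120, sd=15, n=100)
print("z =", round(z, 3), "p =", round(p, 4))

# The same p-value judged alongside two other hypothetical tests.
decisions = bonferroni_reject([p, 0.004, 0.03], alpha=0.05)
print(decisions)
```

    In this sketch the first test falls below 0.05 on its own but does not survive the Bonferroni threshold of 0.05 / 3, which is the kind of multiple-testing effect the objectives refer to.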
  • Module 4: Lesson 1: Principles of Hypothesis Testing

    Learning Objectives
    • Reinforce understanding of probability distributions in the context of hypothesis testing, especially in drawing conclusions and types of errors
    • Distinguish types of explanatory and response variables
    • Understand the relationship between confidence interval and hypothesis testing

    Image by Joanne Krupa, PhD, joanne.krupa@videotron.ca


    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 7, Quiz: 1
  • Module 4: Lesson 2: Applications of Hypothesis Testing

    Learning Objectives
    • Understand and apply hypothesis tests for a single mean and a single proportion as well as for two means (independent and paired/matched samples), and understand chi-squared test and ANOVA
    • Understand inference, estimation, and the basics of hypothesis testing
    • Understand implications of multiple testing and Bonferroni correction
    • Understand and be able to correctly interpret p-values
    • Understand the importance and implications of Type I and Type II errors

    Contingency table


    Approximate time required for the readings for this lesson (at 144 words/minute): 5 hours.

    URLs: 11, Quiz: 1
  • Module 4: Lesson 3: Power and Sample Size

    Learning Objective
    • Understand factors that affect study power and sample size requirements, and how they impact study design

    PS Power & Sample Size logo


    Approximate time required for the readings for this lesson (at 144 words/minute): 2 hours.

    URLs: 5, Quiz: 1
  • Module 4: Lesson 4: Nonparametric Tests

    Learning Objectives
    • Summarize and describe nonparametric tests and understand the conditions under which they are applied
    • Be able (1) to apply appropriate hypothesis tests to variable types in order to explore relationships and (2) to draw conclusions based on such hypothesis testing and to interpret p-values

    Skewed distribution


    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 5, Workshop: 1, Quiz: 1
  • Module 5: Regression Analysis

    Linear least squares example

    In the previous module, we learned methods for deciding whether to reject a null hypothesis based on the evidence we have. What if we want more than a simple yes/no answer, and want to predict how the outcome changes as the predictors change? This is the reason we use regression models, as we will learn in this module. This module has four lessons, and the division is based on the nature of the outcome variable: if it is a continuous variable, we use simple or multiple linear regression; if it is a binary categorical variable, we use logistic regression. These are some very common and basic statistical models; there are many more regression methods in the field, but learning the basic concepts well will help you quickly pick up other methods when you need to. You will notice that some concepts cross over from epidemiology, such as risk, odds, confounding, bias, etc. It is helpful to have some prior knowledge of epidemiology, but if you don’t, these concepts will also be explained in this module.

    Resources on regression can vary a great deal in their depth and complexity. Don’t let the math scare you. Focus on the main ideas to start and add the mathematical details as you continue progressing in the topic.

    The learning activity in this module asks you to go through some regression models. Even more so than in hypothesis testing, the tedious calculations are done by the computer, but you will need to answer questions about the model and about using it. These are essential skills that you will need to understand research literature and reports.

    Competencies covered in this module:
    1. Apply common statistical methods for inference.
    2. Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question.

    Upon completion of this module, students should be able to:

    • Understand linear relationships, outliers, and the basics of correlation
    • Understand the difference between correlation and simple linear regression, and when to apply one or the other
    • Understand homoscedasticity, and its applications to correlation and regression
    • Understand linear regression and how it relates to prediction
    • Understand multiple linear regression and its applications
    • Understand simple logistic regression analysis
    • Understand multiple logistic regression analysis and distinguish between adjusted and unadjusted regression coefficients
    • Be able (1) to distinguish between risks, absolute and relative risks, as well as odds and odds ratios, and (2) to differentiate relative risks from odds ratios and know how to conduct both methods
    • Be able (1) to distinguish between correlation, linear and multiple regression, as well as logistic regression, and (2) to understand the purpose and methods of linear (simple and multiple) and logistic regression including when to use each of them
    • Be able to specify regression models and interpret regression results
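
    The distinction between relative risk and odds ratio in the objectives above comes down to two short formulas on a 2x2 table. The following is a minimal Python sketch with hypothetical counts and function names; the cell labels a, b, c, d are an assumed convention (a and b are exposed with and without the outcome, c and d are unexposed with and without it).

```python
def risk_ratio(a, b, c, d):
    """Relative risk: risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio: odds of the outcome in the exposed over odds in the unexposed."""
    return (a * d) / (b * c)


# Hypothetical common outcome: 30% risk in the exposed, 10% in the unexposed.
common = (30, 70, 10, 90)
print(round(risk_ratio(*common), 3), round(odds_ratio(*common), 3))

# Hypothetical rare outcome: 0.3% risk in the exposed, 0.1% in the unexposed.
rare = (3, 997, 1, 999)
print(round(risk_ratio(*rare), 3), round(odds_ratio(*rare), 3))
```

    When the outcome is common, the odds ratio overstates the relative risk; when it is rare, the two are close. This matters for logistic regression because an exponentiated logistic regression coefficient is an odds ratio, not a relative risk.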
  • Module 5: Lesson 1: Simple Linear Regression Analysis

    Learning Objectives
    • Understand linear relationships, outliers, and the basics of correlation
    • Understand the difference between correlation and simple linear regression, and when to apply one or the other
    • Understand homoscedasticity, and its applications to correlation and regression
    • Understand linear regression and how it relates to prediction

    Linear regression


    Approximate time required for the readings for this lesson (at 144 words/minute): 5 hours.

    URLs: 10, Quiz: 1
  • Module 5: Lesson 2: Multiple Linear Regression Analysis

    Learning Objective
    • Understand multiple linear regression and its applications

    English achievement model


    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 2, Quiz: 1
  • Module 5: Lesson 3: Logistic Regression Analysis

    Learning Objectives
    • Understand simple logistic regression analysis
    • Understand multiple logistic regression analysis and distinguish between adjusted and unadjusted regression coefficients
    • Be able (1) to distinguish between risks, absolute and relative risks, as well as odds and odds ratios, and (2) to differentiate relative risks from odds ratios and know how to conduct both methods


    Approximate time required for the readings for this lesson (at 144 words/minute): 4 hours.

    URLs: 9, Quiz: 1
  • Module 5: Lesson 4: Overview of Correlation and Regression Analysis

    Learning Objectives
    • Be able (1) to distinguish between correlation, linear and multiple regression, as well as logistic regression, and (2) to understand the purpose and methods of linear (simple and multiple) and logistic regression including when to use each of them
    • Be able to specify regression models and interpret regression results

    Visualization of errors-in-variables linear regression


    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 2, Workshops: 2, Quiz: 1
  • Module 6: Confounding and Interactions

    Simple Confounding Case

    Health phenomena are complex. Often there are variables that interact with the variables in our primary relationship of interest. In order to understand how these other variables influence the primary association and to be sure that the effect we find actually reflects the relationship we are interested in, we can use some statistical tools to help us. This short module offers an introductory look at how to use biostatistics to identify the nature and extent of these influences. These tools are best used in conjunction with domain expertise; therefore, it is important that you also understand the context of a particular research question or a relationship.

    The learning activity in this module is a mentored activity that ties together several aspects of statistical analysis in the context of a research project. We hope that through discussions with your mentor, you can further your understanding of these tools and see them in action.

    Competencies covered in this module:
    1. Describe the roles biostatistics serves in the discipline of public health.
    2. Apply common statistical methods for inference.

    Upon completion of this module, students should be able to:

    • Know the definition of a confounder, and understand the concepts of adjustment and stratification, as well as the concepts of confounding and effect modification
    • Identify the statistical techniques for dealing with confounding and effect modification and their strengths and limitations
    • Be able to comment on the validity of study conclusions with respect to confounding and alternative explanations and appreciate that association neither means causation nor indicates the directionality of potential cause and effect
    • Identify potential confounders in a relationship from a theoretical perspective and understand the consequences of using faulty reasoning and improper methods in studies
    • Apply analytical statistics in the context of a research project and be able to think critically about practical application of statistical concepts
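
    Stratification and adjustment, as named in the objectives above, can be shown with a small worked example. This is a minimal Python sketch with invented counts, not data from the course; it uses the Mantel-Haenszel risk ratio as one standard way to pool stratum-specific estimates, and all names are hypothetical. Each stratum is written as (cases among exposed, total exposed, cases among unexposed, total unexposed).

```python
def risk_ratio(a, n1, c, n0):
    """Risk ratio: a cases among n1 exposed vs c cases among n0 unexposed."""
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata of a third variable."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


# Invented data: the third variable is more common among the exposed
# and also raises the risk of the outcome.
strata = [
    (40, 80, 10, 20),  # third variable present: risk 0.5 in both groups
    (2, 20, 8, 80),    # third variable absent: risk 0.1 in both groups
]

# Crude analysis ignores the third variable by collapsing the strata.
totals = [sum(column) for column in zip(*strata)]
crude_rr = risk_ratio(*totals)
stratum_rrs = [risk_ratio(*s) for s in strata]
adjusted_rr = mantel_haenszel_rr(strata)

print("crude RR:   ", round(crude_rr, 3))
print("stratum RRs:", [round(r, 3) for r in stratum_rrs])
print("adjusted RR:", round(adjusted_rr, 3))
```

    Here the crude risk ratio suggests an association, while every stratum and the adjusted estimate show none, which is the pattern expected when the third variable confounds the relationship. If the stratum-specific ratios differed from each other instead, that would point toward effect modification rather than confounding.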
  • Module 6: Lesson 1: Confounding and Effect Modification

    Learning Objectives
    • Know the definition of a confounder, and understand the concepts of adjustment and stratification, as well as the concepts of confounding and effect modification
    • Identify the statistical techniques for dealing with confounding and effect modification and their strengths and limitations

    Assessing the role of a confounder


    Approximate time required for the readings for this lesson (at 144 words/minute): 3 hours.

    URLs: 7, Quiz: 1
  • Module 6: Lesson 2: Confounders and their Impact on Study Conclusions

    Learning Objectives
    • Be able to comment on the validity of study conclusions with respect to confounding and alternative explanations and appreciate that association neither means causation nor indicates the directionality of potential cause and effect
    • Identify potential confounders in a relationship from a theoretical perspective and understand the consequences of using faulty reasoning and improper methods in studies
    • Apply analytical statistics in the context of a research project and be able to think critically about practical application of statistical concepts

    Correlation vs causation


    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 2, Assignment: 1, Quiz: 1
  • Module 7: Biostatistics in Public Health

    US timeline. Number of overdose deaths from all drugs

    As we come to the end of this course, it is time to review the major concepts in the context of public health research and practice. In this module, we will look at the importance of evidence-based decision-making, and some tips and advice on how to read public health literature to spot misleading claims and inappropriate methods. Finally, we will look at two examples of how biostatistics is used in public health initiatives that benefit the population.

    Competencies covered in this module:
    1. Describe the roles biostatistics serves in the discipline of public health.
    2. Apply descriptive and inferential methodologies according to the type of study design for answering a particular research question.
    3. Apply basic informatics techniques with vital statistics and public health records in the description of public health characteristics and in public health research and evaluation.
    4. Interpret results of statistical analyses found in public health studies.
    5. Develop written and oral presentations based on statistical analyses for both public health professionals and educated lay audiences.

    Upon completion of this module, students should be able to:

    • Understand and explain the relative strengths and limitations of biostatistics as it applies in various settings in public health
    • Be able to detect misleading claims and inappropriate methods in research papers as well as appreciate that a statistically significant result may not be clinically significant
    • Appreciate that many interests may influence researchers towards favourable interpretation and presentation of their findings
    • Understand that public health programs rely on biostatistics principles and methodologies to collect, analyze, use, and present data
    • Relate specific public health contributions to biostatistics concepts learned in this course
  • Module 7: Lesson 1: Limitations and Misinterpretations of Biostatistics

    Learning Objectives
    • Understand and explain the relative strengths and limitations of biostatistics as it applies in various settings in public health
    • Be able to detect misleading claims and inappropriate methods in research papers as well as appreciate that a statistically significant result may not be clinically significant
    • Appreciate that many interests may influence researchers towards favourable interpretation and presentation of their findings

    Magnifying glass with focus on paper


    Approximate time required for the readings for this lesson (at 144 words/minute): 3 hours.
    URLs: 4, Quiz: 1
  • Module 7: Lesson 2: Biostatistics in Public Health Programs

    Learning Objectives
    • Understand that public health programs rely on biostatistics principles and methodologies to collect, analyze, use, and present data
    • Relate specific public health contributions to biostatistics concepts learned in this course

    Most common cancers - female, by mortality


    Approximate time required for the readings for this lesson (at 144 words/minute): 1 hour.

    URLs: 4, Workshop: 1, Assignment: 1, Quiz: 1
  • Final Exam

    Quiz: 1