Practical Survey Data Analysis Using RStudio & R with DHS Data for Public Health Research
This course provides a practical, step-by-step guide to Demographic and Health Survey (DHS) data analysis using R and RStudio for public health, epidemiology, and health research. It is designed for MSc, MPH, and PhD students, researchers, and data analysts who want to analyze survey data and produce publication-ready tables and figures.
What you’ll learn
- Import, understand, and manage Demographic and Health Survey (DHS) data in R and RStudio.
- Perform descriptive and exploratory analysis of DHS survey data using R.
- Apply survey design concepts and conduct survey-weighted analyses in R.
- Analyze key public health topics (childhood malnutrition, maternal health, child feeding, women’s empowerment) using DHS data.
- Produce clean, reproducible, and publication-ready tables and results from DHS data.
Course Content
- Introduction to RStudio for DHS Data Analysis –> 2 lectures • 9min.
- Reading and Understanding DHS Data and Codebooks (Multi-Country) –> 6 lectures • 21min.
- Preparing Nutritional Indicators from DHS Data: Stunting, Wasting, & Underweight –> 13 lectures • 1hr 9min.
- Descriptive, Univariate & Bivariate Analysis with Statistical Tests (Unweighted) –> 8 lectures • 26min.
- Descriptive, Univariate & Bivariate Analysis | Survey-Weighted | p-value –> 7 lectures • 23min.
- Bar Diagrams for Categorical Variables in RStudio (ggplot2) –> 3 lectures • 22min.
- Bar Diagrams for Binary Variables in RStudio (ggplot2) –> 5 lectures • 41min.
- Box Plots for Continuous Variables in RStudio (ggplot2) –> 5 lectures • 22min.
- Logistic Regression in RStudio Using the gtsummary Package –> 10 lectures • 55min.
- Logistic Regression in RStudio Using the gtsummary Package (Sampling Weights) –> 4 lectures • 20min.
- Linear Regression in RStudio for Public Health Research –> 3 lectures • 8min.
- Survey Logistic Regression (svy) in RStudio | Cluster, Strata, Sampling weight –> 7 lectures • 28min.
- Multilevel Logistic Regression in RStudio (glmer + gtsummary) –> 6 lectures • 24min.
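To give a flavor of the nutritional-indicator section listed above, here is a minimal sketch in base R. It assumes the standard DHS children's recode variables `hw70`, `hw71`, and `hw72` (height-for-age, weight-for-age, and weight-for-height z-scores, stored multiplied by 100), the usual cutoff of -2 SD, and special codes of 9996 and above for flagged or missing values. The data frame and the helper `flag_indicator` are made up for illustration; check your survey's codebook before reusing the cutoffs.

```r
# Toy child records (made-up values); z-scores are stored as z * 100.
kr <- data.frame(
  hw70 = c(-250, -150, -310, 9998, 40, NA),  # height-for-age
  hw71 = c(-210,  -90, -180, 9998, 10, NA),  # weight-for-age
  hw72 = c(-120,  -30, -220, 9998, 60, NA)   # weight-for-height
)

# Hypothetical helper: turn a stored z-score into a 0/1 indicator.
# Values at or above `flag_min` are treated as flagged and set to NA.
flag_indicator <- function(z100, cutoff = -200, flag_min = 9996) {
  z100[!is.na(z100) & z100 >= flag_min] <- NA
  as.integer(z100 < cutoff)
}

kr$stunted     <- flag_indicator(kr$hw70)  # low height-for-age
kr$underweight <- flag_indicator(kr$hw71)  # low weight-for-age
kr$wasted      <- flag_indicator(kr$hw72)  # low weight-for-height

# Unweighted proportion stunted among children with valid measurements.
stunting_prev <- mean(kr$stunted, na.rm = TRUE)
print(kr)
print(stunting_prev)
```

Setting flagged values to `NA` before applying the cutoff keeps them out of both the numerator and the denominator.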
Requirements
You will learn how to work with DHS/NFHS-type survey datasets using RStudio, starting from data preparation and variable modification to descriptive, univariate, and bivariate analysis. The course covers both unweighted and survey-weighted (svy) analysis, including correct handling of sampling weights, clusters, and strata.
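As a rough illustration of what handling weights, clusters, and strata can look like, the sketch below uses the `survey` package on simulated data. It assumes the common DHS conventions that `v005` is the sample weight scaled by 1,000,000, `v021` is the primary sampling unit, and `v022` is the stratum; the stratum variable differs between surveys, so treat these names as assumptions rather than a fixed recipe.

```r
library(survey)

# Simulated data standing in for a DHS children's file (values are made up).
set.seed(1)
kr <- data.frame(
  v021    = rep(1:6, each = 5),                    # cluster (PSU) id
  v022    = rep(c(1, 1, 1, 2, 2, 2), each = 5),    # stratum
  v005    = rep(c(800000, 1200000, 1000000,
                  900000, 1100000, 1000000), each = 5),
  stunted = rbinom(30, 1, 0.35)                    # 0/1 outcome
)

# DHS weights are conventionally divided by 1,000,000 before use.
kr$wt <- kr$v005 / 1e6

# Declare the complex design: clusters, strata, and sampling weights.
dhs_design <- svydesign(
  ids     = ~v021,
  strata  = ~v022,
  weights = ~wt,
  data    = kr,
  nest    = TRUE
)

# Weighted prevalence with a design-based standard error.
prev <- svymean(~stunted, dhs_design, na.rm = TRUE)
print(prev)
print(confint(prev))
```

The point estimate matches a plain weighted mean; the cluster and strata information changes the standard error and confidence interval.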
Key topics include descriptive statistics, chi-square tests, t-tests, bar diagrams, box plots, and logistic regression for binary outcomes such as stunting, underweight, and wasting. You will learn how to estimate unadjusted and adjusted odds ratios, change reference categories, interpret results correctly, and generate publication-ready tables using the gtsummary package.
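The sketch below shows one way to estimate unadjusted and adjusted odds ratios and change a reference category, using base R's `glm()` and `relevel()` on simulated data. The variable names (`wealth`, `residence`, `stunted`) and all values are hypothetical, and the confidence intervals are simple Wald intervals. The optional last step assumes the `gtsummary` package is installed.

```r
# Simulated data (made up for illustration only).
set.seed(42)
n <- 300
kr <- data.frame(
  wealth = factor(sample(c("Poorest", "Middle", "Richest"), n, replace = TRUE),
                  levels = c("Poorest", "Middle", "Richest")),
  residence = factor(sample(c("Urban", "Rural"), n, replace = TRUE))
)
p <- ifelse(kr$wealth == "Poorest", 0.45,
            ifelse(kr$wealth == "Middle", 0.35, 0.20))
kr$stunted <- rbinom(n, 1, p)

# Change the reference category so odds ratios compare against "Richest".
kr$wealth <- relevel(kr$wealth, ref = "Richest")

# Unadjusted (crude) and adjusted logistic regression models.
fit_crude <- glm(stunted ~ wealth, data = kr, family = binomial)
fit_adj   <- glm(stunted ~ wealth + residence, data = kr, family = binomial)

# Odds ratios with Wald 95% confidence intervals.
or_crude <- exp(cbind(OR = coef(fit_crude), confint.default(fit_crude)))
or_adj   <- exp(cbind(OR = coef(fit_adj), confint.default(fit_adj)))
print(round(or_adj, 2))

# Optional: a formatted table, if gtsummary is available.
if (requireNamespace("gtsummary", quietly = TRUE)) {
  print(gtsummary::tbl_regression(fit_adj, exponentiate = TRUE))
}
```

An odds ratio above 1 for `wealthPoorest` would mean higher odds of stunting in the poorest group than in the richest, the reference group.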
The course also emphasizes data visualization with ggplot2, showing how to create clean, professional graphs suitable for theses, reports, and journal articles. You will learn how to export tables and figures to Microsoft Word while preserving formatting.
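As a small example of the kind of figure described here, the following sketch draws a bar chart with `ggplot2` and saves it as a high-resolution PNG that can be inserted into a Word document. The prevalence values are made up for illustration, and the file name and dimensions are arbitrary choices.

```r
library(ggplot2)

# Hypothetical prevalence estimates (made-up numbers, not survey results).
prev <- data.frame(
  wealth = factor(c("Poorest", "Middle", "Richest"),
                  levels = c("Poorest", "Middle", "Richest")),
  pct    = c(45, 35, 20)
)

# Bar diagram of a binary outcome by a categorical variable.
p <- ggplot(prev, aes(x = wealth, y = pct)) +
  geom_col(fill = "steelblue", width = 0.6) +
  geom_text(aes(label = paste0(pct, "%")), vjust = -0.5) +
  labs(x = "Wealth group", y = "Stunting (%)",
       title = "Stunting by wealth group (illustrative data)") +
  ylim(0, 60) +
  theme_minimal(base_size = 12)

# Save at 300 dpi, a resolution often requested for print.
out_file <- file.path(tempdir(), "stunting_by_wealth.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)
```

Fixing the factor levels keeps the bars in a meaningful order instead of the default alphabetical one.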
This is a hands-on, applied course, using real DHS-style data examples from countries such as Bangladesh, India, Nepal, Ethiopia, Nigeria, Kenya, and Tanzania. Advanced topics such as generalized estimating equations (GEE), multilevel models, and longitudinal analysis will be added progressively.
By the end of this course, you will be confident in analyzing survey data in RStudio and producing results ready for academic publication and policy research.