R for Statistical Analysis
From Basic to Advanced Understanding
Preface
Welcome to the Book
This book is an attempt to bring together three things that matter deeply to modern data analysis: rigorous statistical thinking, the R programming language, and reproducible research workflows powered by Quarto.
Whether you are a student encountering regression for the first time, a researcher who has been “copying and pasting” R code from Stack Overflow for years, or a data scientist looking to formalise your workflow, this book is written for you.
Core Philosophy
Reproducibility First. The reproducibility crisis in science is, at least partially, a tooling crisis. When analyses live in opaque spreadsheets or undocumented scripts, they cannot be checked, extended, or trusted. Quarto provides a framework where code, prose, and output live together in a single, auditable document.
Quarto as the Engine. Quarto is not just the tool used to publish this book — it is part of the curriculum. By the time you finish, you will know how to write papers, reports, websites, and presentations that anyone can replicate from a single source file.
Dual Workflow. This book teaches the modern Tidyverse ecosystem — dplyr, ggplot2, tidyr, purrr — as the primary path. But it never leaves base R behind. Understanding base R ensures you can read legacy code, work in constrained environments, and truly understand what the tidyverse is abstracting over.
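As a small illustration of the dual workflow, the sketch below computes the same grouped summary twice, once with dplyr and once with base R. The built-in `mtcars` dataset and the object names `tidy_means` and `base_means` are chosen here for illustration only, and the example assumes dplyr is already installed.

```r
library(dplyr)

# Tidyverse path: group the rows by cylinder count, then summarise
tidy_means <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))

# Base R path: the same summary using aggregate() with a formula
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
names(base_means)[2] <- "mean_mpg"

# TRUE when both paths produce the same group means
all.equal(tidy_means$mean_mpg, base_means$mean_mpg)
```

Seeing the two versions side by side makes it easier to recognise what `group_by()` and `summarise()` are doing on your behalf when you later meet `aggregate()` in older code.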
How to Read This Book
The book is divided into five parts:
| Part | Focus |
|---|---|
| I | Foundations & Quarto Workflow |
| II | Data Wrangling & Visualization |
| III | Core Statistical Analysis |
| IV | Advanced Statistical Modeling |
| V | Advanced R Programming & Publishing |
You can read linearly or jump to the chapter most relevant to your needs. Each chapter begins with a list of learning objectives and ends with exercises.
Software Requirements
To follow along with the examples, you will need:
- R (version 4.3 or later) — https://cran.r-project.org
- RStudio (version 2023.06 or later) — https://posit.co/download/rstudio-desktop/
- Quarto (version 1.4 or later) — https://quarto.org/docs/get-started/
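One possible way to check your setup against the versions listed above is sketched here. It assumes the Quarto command-line tool is on your system `PATH`; a copy of Quarto bundled with RStudio may not be, in which case the lookup reports nothing even though Quarto is available inside RStudio. The variable names are illustrative.

```r
# Compare the running R version against the minimum listed above
r_ok <- getRversion() >= "4.3.0"

# Locate the Quarto executable; Sys.which() returns "" when it is not on the PATH
quarto_path <- Sys.which("quarto")
quarto_found <- nzchar(quarto_path)

# Ask Quarto for its version only when the executable was found
quarto_version <- if (quarto_found) {
  system2(quarto_path, "--version", stdout = TRUE)
} else {
  NA_character_
}

message("R is 4.3 or later: ", r_ok)
message("Quarto version found: ", quarto_version)
```

The RStudio version is not checked here because it can only be queried from inside a running RStudio session.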
```r
# Run this once to install all packages used in this book
packages <- c(
  # Data import / wrangling
  "tidyverse", "data.table", "readxl", "haven",
  # Visualization
  "ggplot2", "plotly", "corrplot", "ggfortify",
  # Statistics & Modeling
  "skimr", "lme4", "car", "multcomp", "emmeans",
  # Regression tables
  "modelsummary", "gtsummary", "broom",
  # Multivariate
  "factoextra", "cluster",
  # Time Series
  "forecast", "tsibble", "feasts",
  # Utilities
  "renv", "here", "janitor"
)
install.packages(packages)
```
Conventions Used
Throughout the book, the following conventions apply:
- `code()`: inline R code or function names
- **Bold**: new statistical or conceptual terms at first introduction
- *Italics*: emphasis or titles
- `#>`: output printed to the console
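To make the `#>` convention concrete, here is a minimal example of how code and its console output might appear together. The vector `x` is illustrative only.

```r
# Code appears exactly as you would type it
x <- c(2, 4, 6)
mean(x)
#> [1] 4
```

Because lines beginning with `#>` are comments in R, you can copy a whole block into the console and it will still run.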