R for Statistical Analysis

From Basic to Advanced Understanding

Author

Pawan

Published

March 21, 2026

Preface

Welcome to the Book

This book is an attempt to bring together three things that matter deeply to modern data analysis: rigorous statistical thinking, the R programming language, and reproducible research workflows powered by Quarto.

Whether you are a student encountering regression for the first time, a researcher who has been “copying and pasting” R code from Stack Overflow for years, or a data scientist looking to formalise your workflow, this book is written for you.

Note: What Makes This Book Different?

Every single output — every table, every figure, every number in the text — is generated by live R code embedded in this document. Nothing is copy-pasted. This means you can trust the outputs, and more importantly, you can change the code and see what happens.

Core Philosophy

Reproducibility First. The reproducibility crisis in science is, at least partially, a tooling crisis. When analyses live in opaque spreadsheets or undocumented scripts, they cannot be checked, extended, or trusted. Quarto provides a framework where code, prose, and output live together in a single, auditable document.

Quarto as the Engine. Quarto is not just the tool used to publish this book — it is part of the curriculum. By the time you finish, you will know how to write papers, reports, websites, and presentations that anyone can replicate from a single source file.

Dual Workflow. This book teaches the modern Tidyverse ecosystem — dplyr, ggplot2, tidyr, purrr — as the primary path. But it never leaves base R behind. Understanding base R ensures you can read legacy code, work in constrained environments, and truly understand what the tidyverse is abstracting over.
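As a small illustration of the dual workflow, the sketch below computes the same grouped mean twice, once with dplyr and once with base R. The built-in mtcars dataset and the object names tidy_means and base_means are choices made for this example, not taken from later chapters.

```r
library(dplyr)

# Tidyverse path: mean miles-per-gallon for each cylinder count
tidy_means <- mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# Base R path: the same summary using aggregate()
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
names(base_means)[2] <- "mean_mpg"

# Both results are ordered by cyl, so the columns can be compared directly
all.equal(tidy_means$mean_mpg, base_means$mean_mpg)
```

The two paths should agree; seeing them side by side makes it easier to read older base R code once you know the dplyr version.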

How to Read This Book

The book is divided into five parts:

Part   Focus
I      Foundations & Quarto Workflow
II     Data Wrangling & Visualization
III    Core Statistical Analysis
IV     Advanced Statistical Modeling
V      Advanced R Programming & Publishing

You can read linearly or jump to the chapter most relevant to your needs. Each chapter begins with a list of learning objectives and ends with exercises.

Software Requirements

To follow along with the examples, you will need:

  1. R (version 4.3 or later) — https://cran.r-project.org
  2. RStudio (version 2023.06 or later) — https://posit.co/download/rstudio-desktop/
  3. Quarto (version 1.4 or later) — https://quarto.org/docs/get-started/

# Run this once to install all packages used in this book
packages <- c(
  # Data import / wrangling
  "tidyverse", "data.table", "readxl", "haven",
  # Visualization
  "ggplot2", "plotly", "corrplot", "ggfortify",
  # Statistics & Modeling
  "skimr", "lme4", "car", "multcomp", "emmeans",
  # Regression tables
  "modelsummary", "gtsummary", "broom",
  # Multivariate
  "factoextra", "cluster",
  # Time Series
  "forecast", "tsibble", "feasts",
  # Utilities
  "renv", "here", "janitor"
)

install.packages(packages)
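If you would rather not reinstall packages you already have, a minimal sketch along the following lines may help. It checks the running R version against the 4.3 minimum stated above and lists only the packages that are missing. The helper missing_packages() and the short wanted vector are hypothetical names introduced for this example; they are not part of the book's code.

```r
# Minimum R version from the software requirements list
min_r <- "4.3.0"
if (getRversion() < min_r) {
  warning("R ", as.character(getRversion()),
          " is older than the recommended ", min_r)
}

# Hypothetical helper (an assumption for this sketch): return the entries
# of `pkgs` that are not installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Illustrative subset; in practice you would pass the full `packages` vector
wanted <- c("stats", "here", "janitor")
to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that running the sketch only reports what is missing and never changes your library.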

Conventions Used

Throughout the book, the following conventions apply:

  • code() — inline R code or function names
  • Bold — new statistical or conceptual terms at first introduction
  • Italics — emphasis or titles
  • #> — output printed to the console
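
To make the last convention concrete, here is a short illustrative snippet (the vector x is made up for this example). Lines without a prefix are code you would type; the line starting with #> is what R prints.

```r
# Code appears as plain R
x <- c(1, 2, 3)
mean(x)
#> [1] 2
```

Because #> begins with R's comment character, you can paste a whole block, output included, back into the console without errors.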
Tip: Tip Boxes

These highlight practical advice, shortcuts, or best practices.

Warning: Warning Boxes

These flag common mistakes or potential confusion points.

Important: Important Boxes

These highlight critical concepts you must not skip.

About the Author

Pawan is an independent economic researcher and data analyst with over seven years of experience in social and economic research, focusing on India’s food and energy economy. His work spans field surveys, data analysis, visualization, and reproducible research workflows.


This book was written entirely in Quarto. The source files are available at github.com/pawan1198/r-stats-book.