Appendix C: Common R Packages for Data Science

This appendix catalogs the R packages used throughout this book, organised by task, with brief descriptions and installation commands.

Installing All Book Packages

# Run this once to install every package used in this book
pkgs <- c(
  # Core tidyverse
  "tidyverse", "tibble", "dplyr", "tidyr", "readr",
  "ggplot2", "purrr", "stringr", "forcats", "lubridate",

  # Data import
  "readxl", "haven", "jsonlite", "googlesheets4",
  "writexl", "DBI", "RSQLite", "RPostgres",

  # Data cleaning
  "janitor", "skimr", "naniar",

  # Visualisation
  "plotly", "corrplot", "GGally", "ggfortify",
  "patchwork", "scales", "RColorBrewer", "viridis",
  "ggthemes", "ggrepel",

  # Statistics
  "moments", "effectsize", "car", "multcomp", "emmeans",

  # Regression and modelling
  "broom", "modelsummary", "gtsummary", "gt",
  "lme4", "broom.mixed", "MASS", "performance",

  # Multivariate
  "factoextra", "cluster", "FactoMineR",

  # Time series
  "forecast", "tseries", "tsibble", "feasts", "fable",

  # Machine learning
  "caret", "tidymodels",

  # Functional programming / utilities ("purrr" is already listed above)
  "furrr", "bench", "microbenchmark",
  "here", "renv", "fs", "glue",

  # Quarto / reporting
  "knitr", "rmarkdown", "quarto",

  # Debugging / style
  "lintr", "styler",

  # Datasets
  "gapminder", "palmerpenguins", "nycflights13"
)

# Install packages not already installed
new_pkgs <- pkgs[!pkgs %in% installed.packages()[, "Package"]]
if (length(new_pkgs)) install.packages(new_pkgs)
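Because install.packages() only issues a warning when a package fails to build, it can be worth confirming afterwards that everything actually loads. The following is a minimal sketch and not part of the book's script: the helper name check_installed is hypothetical, and the short vector passed to it is illustrative (in practice you would pass the full pkgs vector).

```r
# Hypothetical helper: report which of the given packages can be loaded.
check_installed <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  names(ok) <- pkgs
  ok
}

# Illustrative input: two base packages and one name that does not exist
status <- check_installed(c("stats", "utils", "notARealPackage123"))
missing_pkgs <- names(status)[!status]

if (length(missing_pkgs)) {
  message("Could not load: ", paste(missing_pkgs, collapse = ", "))
}
```

Any names reported here can be passed to install.packages() again to see the full error output for that package.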

Package Catalogue

Data Import and Export

Table 1: Packages for data import and export.
Package         Purpose                                                Install
readr           Read CSV, TSV, and delimited text files                install.packages("tidyverse")
readxl          Read Excel files (.xls, .xlsx) without Java            install.packages("tidyverse")
haven           Read SPSS (.sav), Stata (.dta), SAS (.sas7bdat) files  install.packages("haven")
writexl         Write data frames to Excel without Java                install.packages("writexl")
jsonlite        Parse and generate JSON                                install.packages("jsonlite")
DBI             Unified database interface                             install.packages("DBI")
RSQLite         Interface to SQLite databases                          install.packages("RSQLite")
RPostgres       Interface to PostgreSQL databases                      install.packages("RPostgres")
googlesheets4   Read and write Google Sheets                           install.packages("googlesheets4")
httr2           HTTP requests for web APIs                             install.packages("httr2")
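
As a rough illustration of how DBI and RSQLite from Table 1 work together, the sketch below copies a built-in data frame into a temporary in-memory SQLite database and queries it back. The table name "cars" and the query are illustrative choices rather than examples from the book, and the code assumes both packages are installed.

```r
library(DBI)

# Open an in-memory SQLite database (nothing is written to disk)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a table named "cars" (illustrative name)
dbWriteTable(con, "cars", mtcars)

# Run a SQL query and get the result back as a data frame
heavy <- dbGetQuery(con, "SELECT mpg, wt FROM cars WHERE wt > 5")
print(heavy)

# Always close the connection when finished
dbDisconnect(con)
```

The same DBI calls apply to other back ends; only the first argument to dbConnect() changes, for example RPostgres::Postgres() together with connection details for a PostgreSQL server.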

Data Wrangling

Table 2: Packages for data manipulation and cleaning.
Package         Purpose
dplyr           Core data manipulation: filter, select, mutate, summarise
tidyr           Reshape data: pivot_longer, pivot_wider, separate, unite
data.table      High-performance data manipulation for large datasets
janitor         Clean data frame names and values (clean_names)
skimr           Rich summary statistics with skim()
lubridate       Parse, manipulate, and do arithmetic on dates and times
stringr         Consistent string manipulation functions
forcats         Tools for working with categorical (factor) variables
naniar          Visualise and analyse missing data patterns
validate        Validate data against rules
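
To make the dplyr and tidyr verbs in Table 2 concrete, here is a small sketch on the built-in mtcars data. The filter threshold and the column names mean_mpg, mean_hp, statistic, and value are illustrative choices, and the code assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Filter rows, then summarise by group: mean mpg and horsepower per cylinder count
by_cyl <- mtcars %>%
  filter(wt < 4) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), mean_hp = mean(hp), .groups = "drop")

# Reshape from wide to long: one row per (cyl, statistic) pair
long <- by_cyl %>%
  pivot_longer(
    cols = c(mean_mpg, mean_hp),
    names_to = "statistic",
    values_to = "value"
  )

print(long)
```

pivot_wider() with names_from = statistic and values_from = value would reverse the last step.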

Visualisation

Table 3: Packages for data visualisation.
Package         Purpose
ggplot2         Grammar of Graphics: the core visualisation package
plotly          Interactive charts; converts ggplot2 with ggplotly()
patchwork       Combine multiple ggplot2 plots with / and |
corrplot        Visualise correlation matrices
GGally          Pairs plots, parallel coordinates, and ggplot2 extensions
ggrepel         Non-overlapping text labels for scatter plots
ggthemes        Additional themes (Economist, Tufte, FiveThirtyEight, …)
viridis         Perceptually uniform colour palettes (colourblind-safe)
scales          Format axis labels (dollar, percent, comma, …)
ggfortify       autoplot() for time series, PCA, survival objects
leaflet         Interactive choropleth and point maps

Statistical Modelling

Table 4: Packages for statistical modelling and inference.
Package         Purpose
stats           Base R statistics: lm, glm, t.test, aov, etc.
car             Companion to Applied Regression: Anova(), vif(), leveneTest()
multcomp        Multiple comparisons and simultaneous inference
emmeans         Estimated marginal means for factorial designs
lme4            Linear and generalised linear mixed models
MASS            Negative binomial regression, stepwise selection, LDA
survival        Survival analysis: Kaplan-Meier, Cox regression
nlme            Mixed effects models with correlation structures
mgcv            Generalised additive models (GAMs)
glmnet          LASSO, Ridge, and Elastic Net regression
broom           Tidy model outputs: tidy, glance, augment
modelsummary    Publication-quality regression tables
gtsummary       Clinical summary tables with tbl_summary()
effectsize      Standardised effect sizes (Cohen’s d, eta-squared, …)
performance     Model quality indices: R², AIC, ICC, RMSE
see             Visualisation for easystats packages
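
The sketch below shows two entries from Table 4 working together: a linear model fitted with stats::lm() and then reduced by the stepwise selection that MASS provides. The choice of predictors is illustrative only; MASS ships with standard R installations as a recommended package.

```r
library(MASS)  # recommended package, normally installed alongside R

# Full linear model on a built-in dataset (predictors chosen for illustration)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward stepwise selection by AIC; trace = 0 silences the step-by-step log
reduced <- stepAIC(full, direction = "backward", trace = 0)

# Compare the two fits
print(summary(reduced)$coefficients)
print(AIC(full, reduced))
```

Note that attaching MASS after dplyr masks dplyr::select(); calling MASS::stepAIC() without library(MASS) avoids that conflict.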

Time Series

Table 5: Packages for time series analysis and forecasting.
Package         Purpose
forecast        ARIMA, ETS, Holt-Winters forecasting; auto.arima()
tseries         Unit root tests (adf.test), ARCH effects
tsibble         Tidy time series data structure
feasts          Feature extraction and visualisation for tsibble
fable           Tidy forecasting framework (ARIMA, ETS, …)
prophet         Facebook’s additive forecasting model (trends + seasonality)
xts             Extensible time series for irregular data
zoo             Ordered observations for irregular time series

Reproducibility and Workflow

Table 6: Packages for reproducible research workflows.
Package         Purpose
renv            Reproducible package environments with lockfile
here            File paths relative to project root
targets         Make-style pipeline for complex, cached workflows
quarto          Render Quarto documents from R
knitr           R code chunks in documents; kable() for tables
rmarkdown       Dynamic documents with R Markdown
lintr           Static code analysis and style checking
styler          Automatic code reformatting to tidyverse style
usethis         Automate project and package setup tasks
devtools        Package development tools

Datasets

Table 7: Packages that provide datasets for learning and examples.
Package         Key Datasets
gapminder       gapminder: GDP, life expectancy, population for 142 countries, 1952–2007
palmerpenguins  penguins: measurements for 344 penguins of 3 species
nycflights13    flights, airlines, airports, planes, weather (2013)
ISLR2           Auto, Boston, Caravan, Wage, and many others (ISLR textbook)
wooldridge      100+ datasets for Wooldridge’s Econometrics textbook
AER             CPS1985, Fatalities, PSID, and many applied econometrics datasets
datasets        mtcars, iris, airquality, USArrests, etc. (built into R)
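
To see what a dataset package offers before loading anything, data() can list its contents. This short sketch uses only the built-in datasets package; the variable name available is illustrative.

```r
# List the datasets shipped in the built-in `datasets` package
available <- data(package = "datasets")$results[, "Item"]
print(head(available))

# Built-in datasets can be used directly, without calling library()
str(mtcars)
print(dim(iris))
```

The same call works for any installed dataset package in Table 7, for example data(package = "palmerpenguins").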

Package Discovery

Finding the right package for a new task:
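
A reasonable first step, before looking for something new, is to search the documentation of packages that are already installed. The sketch below uses only base R and is one possible approach, not necessarily the one this section had in mind; the search terms are illustrative.

```r
# Search help-page titles and keywords; here restricted to the stats package
hits <- help.search("regression", package = "stats")
print(head(hits$matches[, c("Package", "Topic", "Title")]))

# Find objects on the search path whose names match a regular expression
readers <- apropos("^read\\.")
print(readers)

# At the console, ??regression is shorthand for help.search("regression")
```

Dropping the package argument searches every installed package, which is slower the first time because R has to build its help index.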


Getting Package Help

# Built-in help
?dplyr::filter
help(package = "ggplot2")
vignette("dplyr")                  # Package tutorial
vignette(package = "tidyr")        # List all vignettes

# Check package version
packageVersion("ggplot2")

# List all exported objects in a package (the package must be attached first)
library(dplyr)
ls("package:dplyr")

# Package news (changelog)
news(package = "ggplot2")
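
The value returned by packageVersion() is a version object rather than a string, which matters when checking for a minimum version. A minimal sketch using only base R follows; the version numbers are illustrative.

```r
# packageVersion() returns an object of class "package_version"
v <- packageVersion("stats")
print(class(v))

# It can be compared directly against a version string
print(v >= "3.0.0")

# Comparison is numeric per component, so 4.10.0 is newer than 4.9.0;
# a plain string comparison of "4.10.0" and "4.9.0" would get this wrong
print(package_version("4.10.0") > package_version("4.9.0"))  # TRUE
```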