06 February 2022
Welcome to a very special edition of Big Book of R updates.
For the past week, my kid has been asking to help me out with Big Book of R and created this amazing featured image. It’s an outer-space theme complete with a space-monkey (holding a banana) tethered to a rocket and a space-cow (saying “moo”) being abducted by a UFO. Beautiful!
This update also sees the addition of a new chapter called Data, Databases and Engineering.
Many thanks to the many contributors to this round of additions!
Exploring Enterprise Databases with R: A Tidyverse Approach
by John David Smith, Sophie Yang, M. Edward (Ed) Borasky, Jim Tyhurst, Scott Came, Mary Anne Thygesen
A great resource for moving beyond standard R development to incorporating R workflows into enterprise-grade technologies such as Docker and databases.
R for Data Engineers
by Greg Wilson
Years ago, Patrick Burns wrote The R Inferno, a guide to R for those who think they are in hell. Upon first encountering the language after two decades of using Python, I thought Burns was an optimist—after all, hell has rules.
I have since realized that R does too, and that they are no more confusing or contradictory than those of other programming languages. They only appear so because R draws on a tradition unfamiliar to those of us raised with derivatives of C. Counting from one, copying data rather than modifying it, lazy evaluation: to quote the other bard, these are not mad, just differently sane.
Welcome, then, to a universe where the strange will become familiar, and everything familiar, strange. Welcome, thrice welcome, to R.
R Function a Day
A book that collects, and provides an easy way to access and search, the tweets from the R Function A Day account, which was maintained for one year (from 24.01.2021 to 24.01.2022).
Intro to GIS and Spatial Analysis
by Manuel Gimond
A well-structured book that serves as an introduction to GIS and spatial data analysis. The book is structured around the author's Introduction to GIS and Spatial Analysis course (ES214). It provides a good introduction to working with geographical datasets and performing spatial analysis such as point pattern analysis, hypothesis testing, spatial autocorrelation and spatial interpolation.
The Grammar of Experimental Designs
by Emi Tanaka
A book about designing experiments using the edibble package.
Regression and Other Stories
by Andrew Gelman, Jennifer Hill, Aki Vehtari
Many textbooks on regression focus on theory and the simplest of examples. Real statistical problems, however, are complex and subtle. This is not a book about the theory of regression. It is a book about how to use regression to solve real problems of comparison, estimation, prediction, and causal inference. It focuses on practical issues such as sample size and missing data and a wide range of goals and techniques. It jumps right into methods and computer code you can use fresh out of the box.
The PDF is free for personal use.
Library of Statistical Techniques
by Nick Huntington-Klein and volunteers
In short, LOST is a Rosetta Stone for statistical software.
LOST is a publicly-editable website with the goal of making it easy to execute statistical techniques in statistical software.
Each page of the website contains a statistical technique — which may be an estimation method, a data manipulation or cleaning method, a method for presenting or visualizing results, or any of the other kinds of things that statistical software typically does.
For each of those techniques, the LOST page will contain code for performing that method in a variety of packages and languages. It may also contain information (or links) with thorough descriptions of the method, but the focus here is on implementation. How can you do it in your language of choice? If there are multiple ways, how are those ways different? Is the way you used to do it outdated, or does it do something unexpected? What’s the R equivalent of that command you know about in Stata or SAS, or vice versa?
Translating Stata to R
This website is for Stata users who are interested in learning R. But it could also be useful for those going the other way around. We provide side-by-side code snippets for common tasks in both Stata and R, so that users have a dictionary for navigating across the two languages.
The Hitchhiker’s Guide to Responsible Machine Learning
by Przemysław Biecek, Anna Kozak, Aleksander Zawada
A graphic novel approach to responsible machine learning.
Behavior Analysis with Machine Learning Using R
by Enrique Garcia Ceja
This book aims to provide an introduction to machine learning concepts and algorithms applied to a diverse set of behavior analysis problems. It focuses on the practical aspects of solving such problems based on data collected from sensors or stored in electronic records. The included examples demonstrate how to perform several of the tasks involved in a data analysis pipeline, such as data exploration, visualization, preprocessing, representation, model training/validation, and so on. All of this is done using the R programming language and real-life datasets.
Causal Inference: The Mixtape
by Scott Cunningham
Causal inference encompasses the tools that allow social scientists to determine what causes what. In a messy world, causal inference is what helps establish the causes and effects of the actions being studied—for example, the impact (or lack thereof) of increases in the minimum wage on employment, the effects of early childhood education on incarceration later in life, or the influence on economic growth of introducing malaria nets in developing regions. Scott Cunningham introduces students and practitioners to the methods necessary to arrive at meaningful answers to the questions of causation, using a range of modeling techniques and coding instructions for both the R and the Stata programming languages.