You can’t do data science without data, and data aren’t going to wrangle themselves…
Data wrangling is the process of taking data in whatever form they exist and, through a variety of steps, turning them into a form that suits your current needs. We’ll talk about how to import data in several common formats into R; how to transform, manage, and manipulate data in a cohesive way using dplyr; what it means for data to be “tidy” and how to make them so; and what to do when your data are spread across multiple tables.
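To make those pieces concrete, here is a minimal sketch of what a small wrangling pipeline might look like. Everything in it is made up for illustration: the tables `scores_wide` and `subjects`, their column names, and the values are hypothetical, and the data are typed in directly rather than imported from a file. It assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical "wide" data: one row per subject, one column per visit
scores_wide = tibble::tribble(
  ~id, ~visit_1, ~visit_2,
    1,       10,       12,
    2,        8,       NA,
    3,       15,       11
)

# Hypothetical second table with subject-level information
subjects = tibble::tribble(
  ~id, ~group,
    1, "control",
    2, "treatment",
    3, "treatment"
)

scores_tidy =
  scores_wide %>%
  # tidy: one row per subject-visit instead of one column per visit
  pivot_longer(
    visit_1:visit_2,
    names_to = "visit",
    names_prefix = "visit_",
    values_to = "score"
  ) %>%
  # manipulate: convert the visit label to a number, drop missing scores
  mutate(visit = as.integer(visit)) %>%
  filter(!is.na(score)) %>%
  # combine: bring in group membership from the second table
  left_join(subjects, by = "id")

# summarize: average score within each group
group_means =
  scores_tidy %>%
  group_by(group) %>%
  summarize(mean_score = mean(score))
```

Each step here is one of several reasonable choices; for example, a different join or a different way of handling the missing score might suit a real analysis better.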
The topic is made up of the following components:
It has been argued that data carpentry is a better term than data wrangling. I like the analogy to carpentry a lot, but most folks call this data wrangling anyway.
The Slack channel for this topic is here.
The code that I produced while working through examples in lecture is here.