Tidyverse tips I forget about

The tidyverse is an opinionated collection of R packages designed for data science. I am a regular user of the tidyverse, and that means I know two things. First, almost any data manipulation, data visualization, or data modeling problem can be solved with the tidyverse. Second, I regularly have to search Google or StackOverflow to remember which package to use or the parameters of a given function.

This post is a place for me to log my own tips for the tidyverse features and functions I use regularly. This way I can come to this page and CTRL + F to remind myself how to use them.

I am also including functions and features from outside the tidyverse, such as the janitor package. So while this post is heavily focused on the tidyverse, it also covers some other packages I use regularly.

Getting Data
Data Cleaning
Data Manipulation
Data Vis
Building models
Other Functions

Getting Data

  • readr package
    • many useful functions for getting data, such as read_csv or read_file
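
As a reminder of how these two differ, here is a minimal sketch. The temporary file and its contents are made up for illustration, and show_col_types assumes readr 2.0 or later.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,a", "2,b"), tmp)

# read_csv() parses the file into a tibble and guesses the column types
df <- read_csv(tmp, show_col_types = FALSE)

# read_file() returns the whole file as a single character string
raw <- read_file(tmp)
```

read_csv is the one I want for tabular data; read_file is handy when I need the raw text, for example before running regular expressions over it.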

Data Cleaning

  • Add a prefix to column names with dplyr::rename_all
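library(dplyr) # provides %>%
# paste0() joins without a separator, so the new names contain no space
data.frame(col1 = 1, col2 = 2, col3 = 3) %>% 
  dplyr::rename_all(function(x) paste0("prefix_", x))
##   prefix_col1 prefix_col2 prefix_col3
## 1           1           2           3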

Note the use of rename_at below, versus rename_all above: rename_at only renames the columns selected inside vars().

data.frame(col1_ignore = 1, col2_ending = 2, col3_ending = 3) %>% 
  dplyr::rename_at(dplyr::vars(dplyr::ends_with("ending")), function(x) paste0("prefix_", x))
##   col1_ignore prefix_col2_ending prefix_col3_ending
## 1           1                  2                  3
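
Since dplyr 1.0.0, rename_all and rename_at are superseded by rename_with, which covers both cases through its .cols argument. A minimal sketch of the same two renames, assuming dplyr 1.0.0 or later (the object names are just for illustration):

```r
library(dplyr)

df <- data.frame(col1_ignore = 1, col2_ending = 2, col3_ending = 3)

# Equivalent of rename_all: .cols defaults to everything()
all_prefixed <- df %>%
  rename_with(~ paste0("prefix_", .x))

# Equivalent of rename_at: restrict the rename with a tidyselect helper
some_prefixed <- df %>%
  rename_with(~ paste0("prefix_", .x), .cols = ends_with("ending"))

names(all_prefixed)
names(some_prefixed)
```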

Data Manipulation

  • forcats::fct_infreq()
    • I love the forcats package. There, I’ve said it. This function reorders the levels of a factor by the number of observations in each level, most frequent first. Very helpful when setting the factor levels for dummy variables or plotting charts (where you want the bars to be ordered from highest to lowest, for example).
  • tidyr::separate() & tidyr::unite()
    • split one column into several, or combine several columns into one.
  • tidyr::full_seq()
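    • generates the complete sequence of values between the minimum and maximum of a vector, at a given period. Paired with tidyr::complete(), it fills in the gaps in a sequence, such as values by date where certain days have no rows (the added rows get NA).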
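
A quick sketch of the three tips above, so I can copy and paste the next time I forget. The data frames and column names are made up for illustration.

```r
library(forcats)
library(tidyr)

# fct_infreq(): order factor levels by how often each value occurs
f <- factor(c("b", "a", "c", "a", "c", "c"))
levels(fct_infreq(f))

# separate(): split one column into several
df <- data.frame(id = 1:2, date = c("2020-01", "2021-02"))
split_df <- separate(df, date, into = c("year", "month"), sep = "-")

# unite(): combine several columns back into one
joined_df <- unite(split_df, "date", year, month, sep = "-")

# full_seq() + complete(): add rows for the days missing from a daily series
daily <- data.frame(
  day   = as.Date(c("2020-01-01", "2020-01-02", "2020-01-04")),
  value = c(10, 20, 40)
)
filled <- complete(daily, day = full_seq(day, period = 1))
```

In the last step, period = 1 means one day because day is a Date column; the row added for 2020-01-03 gets NA in value.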

Data Vis

In this section I am including things that help make data pretty for tables or output, not just graphs and more standard visualizations.

  • janitor::adorn_*
    • the janitor package also rocks, much like the forcats package. I’ve used the tabyl function for quite some time, but I just found out it has a series of adorn_* functions that help round values, create percentages, etc. Examples below.
data.frame(row = c(1,2), x = c(1,3), y = c(1,4)) %>% 
  janitor::adorn_percentages("col") %>% 
  janitor::adorn_pct_formatting() # formats the proportions as percentages
##  row     x     y
##    1 25.0% 20.0%
##    2 75.0% 80.0%
data.frame(x = 1.227, y = 2.375) %>% 
  janitor::adorn_rounding(digits = 0, skip_first_col = FALSE)
##   x y
## 1 1 2
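
The adorn_* functions are meant to be chained after tabyl. Here is a minimal sketch of a typical chain, using the built-in mtcars data as a stand-in for real data; the object name out is only for illustration.

```r
library(janitor)
library(dplyr) # for %>%

out <- mtcars %>%
  tabyl(cyl, gear) %>%                  # cross-tab of cylinders by gears
  adorn_totals("row") %>%               # add a "Total" row
  adorn_percentages("row") %>%          # convert counts to row proportions
  adorn_pct_formatting(digits = 1) %>%  # format proportions as percentages
  adorn_ns()                            # append the raw counts in parentheses

out
```

The order matters: adorn_totals goes before adorn_percentages so the totals are computed on the counts, and adorn_ns goes last so it can add the counts back next to the formatted percentages.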

Building models

Other Functions