The tidyverse is an opinionated collection of R packages designed for data science. I am a regular user of the tidyverse, and that means I know two things. First, almost any data manipulation, data visualization, or data modeling problem can be solved with the tidyverse. Second, I regularly have to search Google or Stack Overflow to remember which package to use or the parameters of a given function.
This post is a place for me to log my own tips for the tidyverse features and functions I use regularly. This way I can come to this page and CTRL + F to remind myself how to use them.
I am also adding functions and features from outside the tidyverse, such as the janitor package. So while this post is heavily focused on the tidyverse, it also includes some other packages I use regularly.
Getting Data
Data Cleaning
Data Manipulation
Data Vis
Building models
Other Functions
dplyr::rename
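library(dplyr) # provides %>% and vars()
# paste() puts a space after the prefix; use paste0() to avoid it
data.frame(col1 = 1, col2 = 2, col3 = 3) %>%
  dplyr::rename_all(function(x) paste("prefix_", x))
##   prefix_ col1 prefix_ col2 prefix_ col3
## 1            1            2            3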
Notice rename_at below, versus rename_all above: rename_at only renames the columns selected inside vars().
data.frame(col1_ignore = 1, col2_ending = 2, col3_ending = 3) %>%
  dplyr::rename_at(vars(tidyr::ends_with("ending")), function(x) paste("prefix_", x))
##   col1_ignore prefix_ col2_ending prefix_ col3_ending
## 1           1                   2                   3
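As of dplyr 1.0.0, rename_all() and rename_at() are superseded by rename_with(), which covers both cases through its .cols argument. Below is a minimal sketch of the same two renames, assuming dplyr 1.0.0 or later is installed. The helper name add_prefix is made up for illustration, and it uses paste0() so the new names have no space after the prefix, which differs from the output above.

```r
library(dplyr)

df <- data.frame(col1_ignore = 1, col2_ending = 2, col3_ending = 3)

# Hypothetical helper: prepend "prefix_" with no separating space
add_prefix <- function(x) paste0("prefix_", x)

# Rename every column (the rename_all() case)
all_renamed <- df %>% rename_with(add_prefix)

# Rename only the columns ending in "ending" (the rename_at() case)
some_renamed <- df %>% rename_with(add_prefix, .cols = ends_with("ending"))

names(all_renamed)
names(some_renamed)
```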
forcats::fct_infreq()
I love the forcats package. There, I’ve said it. But this function relevels a factor based on the count of observations, most frequent level first. Very helpful when setting the factor levels for dummy variables or plotting charts (where you want the bars to be ordered from highest to lowest, for example).
tidyr::separate() & tidyr::unite()
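A minimal sketch of the pair, using a made-up data frame: separate() splits one column into several on a delimiter, and unite() pastes several columns back into one. The column names and values here are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Made-up example data: one date string per row
df <- data.frame(id = 1:2, date = c("2020-01-15", "2021-06-30"))

# Split "date" into three character columns on the "-" delimiter
split_df <- df %>%
  separate(date, into = c("year", "month", "day"), sep = "-")

# Paste the three columns back together into a single "date" column
united_df <- split_df %>%
  unite("date", year, month, day, sep = "-")

split_df
united_df
```

By default separate() leaves the new columns as character; passing convert = TRUE asks it to guess better types.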
tidyr::full_seq()
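A minimal sketch, with made-up data: full_seq() takes a numeric vector and a period and returns every value from the minimum to the maximum, which makes it handy inside tidyr::complete() for filling in missing rows.

```r
library(tidyr)

# Made-up example: 2017 and 2018 are missing
years <- c(2015, 2016, 2019)
all_years <- full_seq(years, period = 1)

# Use it with complete() to add rows for the missing years (n becomes NA)
df <- data.frame(year = years, n = c(3, 5, 2))
filled <- complete(df, year = full_seq(year, period = 1))

all_years
filled
```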
In this section I am including things that help make data pretty for tables or output, not just graphs and more standard visualizations.
janitor::adorn_*
The janitor package also rocks, much like the forcats package. I’ve used the tabyl function for quite some time, but I just found out the package has a series of adorn_* functions that help round, create percents, etc. Example below.
data.frame(row = c(1,2), x = c(1,3), y = c(1,4)) %>%
  janitor::adorn_percentages("col") %>%
  janitor::adorn_pct_formatting()
##  row     x     y
##    1 25.0% 20.0%
##    2 75.0% 80.0%
data.frame(x = 1.227, y = 2.375) %>%
  janitor::adorn_rounding(digits = 0, skip_first_col = FALSE)
##   x y
## 1 1 2
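The adorn_* functions are usually chained after tabyl(). Here is a minimal sketch of that pattern, assuming the built-in mtcars data as a stand-in for real data; the object name cyl_by_gear is made up for illustration.

```r
library(dplyr)
library(janitor)

# Cross-tabulate two columns, then decorate the table step by step
cyl_by_gear <- mtcars %>%
  tabyl(cyl, gear) %>%                  # counts of cyl (rows) by gear (columns)
  adorn_totals("row") %>%               # append a "Total" row
  adorn_percentages("row") %>%          # convert counts to row proportions
  adorn_pct_formatting(digits = 1) %>%  # format proportions as percent strings
  adorn_ns()                            # add the underlying counts in parentheses

cyl_by_gear
```

The order matters here: adorn_totals() runs while the table still holds counts, and adorn_ns() comes last because it reads the original counts stored on the tabyl.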
tidymodels package
tidymodels