class: center, middle, inverse, title-slide

# Intro to the Tidyverse
### Thomas Mock
### RStudio, Inc.
### updated: 2020-06-11

---

# Today's Agenda

- RStudio
  - Writing code
  - File manipulation
  - Package control

--

- R coding basics
  - Math
  - Assignment
  - Functions
  - Load and install packages

--

- `Tidyverse`
  - Read data in with `readr`
  - Tidy data with `tidyr`
  - Transform data with `dplyr`
  - Plot data with `ggplot2`

---

# RStudio

RStudio is an IDE (integrated development environment)

- A place to write
  - Console
  - R Scripts
  - R Markdown
  - Code Completion
  - Debugging

--

- A place to open *things*
  - File and path exploration
  - Open plots, data, .R/.Rmd files

--

- A place for projects
  - Self-contained structure
  - Consistent/easy pathing
  - Keep relevant files/code together with output

---
class: center, middle, inverse

# `RStudio Basics`

---

![](static-img/rstudio-editor.png)

---

![](static-img/R-file.png)

---

![](static-img/rmd-file.png)

---

# RStudio Diagnostics

The script editor highlights syntax errors

![](static-img/rstudio-diagnostic.png)

--

Hover over the cross to see the problem

![](static-img/rstudio-diagnostic-tip.png)

--

RStudio also warns about potential problems

![](static-img/rstudio-diagnostic-warn.png)

---

# Everyone makes mistakes!

Errors are *ok*; they happen to everyone!

```r
# x must exist first; 10 is the value that gives the 31.4 printed below
x <- 10
my_variable <- x * 3.14
my_variab1e
```

```
## Error: object 'my_variab1e' not found
```

--

```r
my_variable
```

```
## [1] 31.4
```

--

* R is essentially the most aggressive spell-checker of all time!

--

* Focus on what the error says; if you don't understand it, Googling the message can often help!
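---

# Checking a name before you use it

The typo above can also be investigated from code. The sketch below is illustrative only: it assumes `x` is 10 (the value that produces the 31.4 shown earlier) and uses base R's `exists()` and `tryCatch()` to show that R only recognises a name spelled exactly as it was assigned.

```r
# Assumption: x is 10, which reproduces the 31.4 from the previous slide
x <- 10
my_variable <- x * 3.14

# R only knows names spelled exactly as they were assigned
exists("my_variable")   # the correctly spelled name
exists("my_variab1e")   # the typo: digit 1 in place of the letter l

# Capture the error message as text instead of stopping the script
msg <- tryCatch(
  my_variab1e,
  error = function(e) conditionMessage(e)
)
msg
```

Printing `msg` gives you the exact text to read closely or paste into a search engine.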
---
class: center, middle, inverse

# R Code Basics

---

# R Code Basics

Assignment

.pull-left[

```r
x <- 3 * 4
x
```

```
## [1] 12
```

```r
y <- 5
y * x
```

```
## [1] 60
```

]

--

.pull-right[

```r
tmp_df <- data.frame(
  col_1 = c(1, 2, 3),
  col_2 = c("a", "b", "c")
)

tmp_df
```

```
## col_1 col_2
## 1 1 a
## 2 2 b
## 3 3 c
```

]

---

# Functions

A function is essentially shorthand to call specific code

```r
function_name(arg1, arg2, arg3)
```

--

```r
seq(from, to, by, length.out, along.with)
```

--

```r
seq(from = 10, to = 100, by = 10)
```

```
## [1] 10 20 30 40 50 60 70 80 90 100
```

--

```r
seq(10, 100, 10)
```

```
## [1] 10 20 30 40 50 60 70 80 90 100
```

--

```r
result_out <- seq(10, 100, length.out = 5)
result_out
```

```
## [1] 10.0 32.5 55.0 77.5 100.0
```

---

# The `%>%` `==` and then

Rather than multiple assignment or nesting functions

```r
did_something <- do_something(data)
did_another_thing <- do_another_thing(did_something)
final_thing <- do_last_thing(did_another_thing)
```

--

```r
final_thing <- do_last_thing(
  do_another_thing(
    do_something(
      data
    )
  )
)
```

--

```r
final_thing <- data %>%
  do_something() %>%
  do_another_thing() %>%
  do_last_thing()
```

---

# The Pipe `%>%`

```r
data %>%
  do_something(.) %>%
  do_another_thing(.) %>%
  do_last_thing(.)
```

--

`do_something(data)` is equivalent to:

--

* `data %>% do_something(data = .)`

--

* `data %>% do_something(.)`

--

* `data %>% do_something()`

---

```r
data_in <- seq(10, 100, by = 10)
result_out <- mean(data_in)
result_out
```

```
## [1] 55
```

--

```r
mean(seq(10, 100, by = 10))
```

```
## [1] 55
```

--

```r
seq(10, 100, by = 10) %>%
  mean()
```

```
## [1] 55
```

--

```r
mean_output <- seq(10, 100, by = 10) %>%
  mean()

mean_output
```

```
## [1] 55
```

---

## About the penguins

```r
penguins <- palmerpenguins::penguins

penguins %>%
  glimpse()
```

```
## Rows: 344
## Columns: 7
## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "…
## $ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen",…
## $ culmen_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,…
## $ culmen_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,…
## $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18…
## $ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,…
## $ sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "…
```

---

# More Complex Example

```r
penguins %>%
  filter(species == "Adelie" & !is.na(sex)) %>%
  group_by(sex, island) %>%
  summarize(mean = mean(body_mass_g, na.rm = TRUE))
```

--

```
## `summarise()` regrouping output by 'sex' (override with `.groups` argument)
```

```
## # A tibble: 6 x 3
## # Groups: sex [2]
## sex island mean
## <chr> <chr> <dbl>
## 1 FEMALE Biscoe 3369.
## 2 FEMALE Dream 3344.
## 3 FEMALE Torgersen 3396.
## 4 MALE Biscoe 4050
## 5 MALE Dream 4046.
## 6 MALE Torgersen 4035.
```

---
class: center, middle
background-image: url("static-img/tidyverse-default.png")

---

# `Tidyverse`

An opinionated collection of R packages for data science.
<br>

--

All packages share an underlying design philosophy, grammar, and data structures

--

* Core packages
  - `readr`, `tidyr`, `dplyr`, `ggplot2`

--

![](static-img/tidy1.png)

---

# `Tidyverse`

`Tidyverse` is an R package, so you need to do two things before you can use it

* `install.packages("tidyverse")`
  * This downloads and installs the `tidyverse`

--

* `library(tidyverse)`
  * This loads and gives you access to the `tidyverse` package

---

# `tidyverse` Core Principles

* Built around `data` - usually as a `data.frame` or `tibble`
* Built around `tidy` data
  * Each `variable` in its own `column`
  * Each `observation` or `case` in its own `row`
  * Each type of observational unit forms a table

--

![](static-img/tidy-1.png)

---

# Untidy data

```r
untidy_df
```

```
## # A tibble: 5 x 7
*## age_group male_2016 female_2016 male_2017 female_2017 male_2018 female_2018
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 < 18 22000 20000 22000 20000 22000 20000
## 2 18-30 36000 35000 36000 35000 36000 35000
## 3 31-50 50000 40000 50000 40000 50000 40000
## 4 51-60 62000 60000 62000 60000 62000 60000
## 5 > 60 75000 72000 75000 72000 75000 72000
```

---

# Tidy data

```r
tidy_df
```

```
## # A tibble: 30 x 4
## age_group gender year income
## <chr> <chr> <chr> <dbl>
*## 1 < 18 male 2016 22000
## 2 18-30 male 2016 36000
## 3 31-50 male 2016 50000
## 4 51-60 male 2016 62000
## 5 > 60 male 2016 75000
*## 6 < 18 female 2016 20000
## 7 18-30 female 2016 35000
## 8 31-50 female 2016 40000
## 9 51-60 female 2016 60000
## 10 > 60 female 2016 72000
## # … with 20 more rows
```

---

# Tidy the data

```r
untidy_df %>%
  pivot_longer(cols = male_2016:female_2018,
               names_to = "gender_year",
               values_to = "income") %>%
* separate(gender_year, into = c("gender", "year"))
```

```
## # A tibble: 30 x 4
## age_group gender year income
## <chr> <chr> <chr> <dbl>
*## 1 < 18 male 2016 22000
## 2 < 18 female 2016 20000
## 3 < 18 male 2017 22000
## 4 < 18 female 2017 20000
## 5 < 18 male 2018 22000
*## 6
< 18 female 2018 20000 ## 7 18-30 male 2016 36000 ## 8 18-30 female 2016 35000 ## 9 18-30 male 2017 36000 ## 10 18-30 female 2017 35000 ## # … with 20 more rows ``` --- # Tidy the data ```r untidy_df %>% pivot_longer( * cols = male_2016:female_2018, * names_to = c("gender", "year"), * names_pattern = "(.*)_(.*)", * values_to = "income" ) ``` ``` ## # A tibble: 30 x 4 ## age_group gender year income ## <chr> <chr> <chr> <dbl> *## 1 < 18 male 2016 22000 ## 2 < 18 female 2016 20000 ## 3 < 18 male 2017 22000 ## 4 < 18 female 2017 20000 ## 5 < 18 male 2018 22000 *## 6 < 18 female 2018 20000 ## 7 18-30 male 2016 36000 ## 8 18-30 female 2016 35000 ## 9 18-30 male 2017 36000 ## 10 18-30 female 2017 35000 ## # … with 20 more rows ``` --- # Read data in Read in data with `readr`, `haven`, `readxl` ### `readr` * `read_csv()`, `read_tsv()`, `read_delim()` ### `haven` * `read_sas()`, `read_spss()`, `read_stata()`, `read_dta()` ### `readxl` * `read_xls()`, `read_xlsx()`, `read_excel()` --- .pull-left[ ### `dplyr` ### 6 Main verbs * `filter()` * `arrange()` * `select()` * `mutate()` * `group_by()` * `summarise()` ### Simple use * `pull()` * `n()`/`count()` * `glimpse()` ] -- .pull-right[ ### Advanced iterations * `across()` * `rowwise()` ### More info * [`dplyr.tidyverse.org`](https://dplyr.tidyverse.org/) * [`R for Data Science`](https://r4ds.had.co.nz/transform.html) <img src="static-img/hex-dplyr.png" width="40%" /> ] --- #### Meet the penguins .center[ ![](https://allisonhorst.github.io/palmerpenguins/man/figures/lter_penguins.png) ] --- .center[ ![](https://allisonhorst.github.io/palmerpenguins/man/figures/culmen_depth.png) ] --- # penguins dataset `data.frame` vs `tibble` ```r penguin_df <- palmerpenguins::penguins %>% as.data.frame() class(penguin_df) ``` ``` ## [1] "data.frame" ``` -- ```r penguins <- as_tibble(penguin_df) class(penguins) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` --- # penguins dataset `data.frame` vs `tibble` ```r penguin_df ``` ``` *## species 
island culmen_length_mm culmen_depth_mm flipper_length_mm *## 1 Adelie Torgersen 39.1 18.7 181 *## 2 Adelie Torgersen 39.5 17.4 186 ## 3 Adelie Torgersen 40.3 18.0 195 ## 4 Adelie Torgersen NA NA NA ## 5 Adelie Torgersen 36.7 19.3 193 ## 6 Adelie Torgersen 39.3 20.6 190 ## 7 Adelie Torgersen 38.9 17.8 181 ## 8 Adelie Torgersen 39.2 19.6 195 ## 9 Adelie Torgersen 34.1 18.1 193 ## 10 Adelie Torgersen 42.0 20.2 190 ## 11 Adelie Torgersen 37.8 17.1 186 ## 12 Adelie Torgersen 37.8 17.3 180 ## 13 Adelie Torgersen 41.1 17.6 182 ## 14 Adelie Torgersen 38.6 21.2 191 ## 15 Adelie Torgersen 34.6 21.1 198 ## 16 Adelie Torgersen 36.6 17.8 185 ## 17 Adelie Torgersen 38.7 19.0 195 ## 18 Adelie Torgersen 42.5 20.7 197 ## 19 Adelie Torgersen 34.4 18.4 184 ## 20 Adelie Torgersen 46.0 21.5 194 ## 21 Adelie Biscoe 37.8 18.3 174 ## 22 Adelie Biscoe 37.7 18.7 180 ## 23 Adelie Biscoe 35.9 19.2 189 ## 24 Adelie Biscoe 38.2 18.1 185 ## 25 Adelie Biscoe 38.8 17.2 180 ## 26 Adelie Biscoe 35.3 18.9 187 ## 27 Adelie Biscoe 40.6 18.6 183 ## 28 Adelie Biscoe 40.5 17.9 187 ## 29 Adelie Biscoe 37.9 18.6 172 ## 30 Adelie Biscoe 40.5 18.9 180 ## 31 Adelie Dream 39.5 16.7 178 ## 32 Adelie Dream 37.2 18.1 178 ## 33 Adelie Dream 39.5 17.8 188 ## 34 Adelie Dream 40.9 18.9 184 ## 35 Adelie Dream 36.4 17.0 195 ## 36 Adelie Dream 39.2 21.1 196 ## 37 Adelie Dream 38.8 20.0 190 ## 38 Adelie Dream 42.2 18.5 180 ## 39 Adelie Dream 37.6 19.3 181 ## 40 Adelie Dream 39.8 19.1 184 ## 41 Adelie Dream 36.5 18.0 182 ## 42 Adelie Dream 40.8 18.4 195 ## 43 Adelie Dream 36.0 18.5 186 ## 44 Adelie Dream 44.1 19.7 196 ## 45 Adelie Dream 37.0 16.9 185 ## 46 Adelie Dream 39.6 18.8 190 ## 47 Adelie Dream 41.1 19.0 182 ## 48 Adelie Dream 37.5 18.9 179 ## 49 Adelie Dream 36.0 17.9 190 ## 50 Adelie Dream 42.3 21.2 191 ## 51 Adelie Biscoe 39.6 17.7 186 ## 52 Adelie Biscoe 40.1 18.9 188 ## 53 Adelie Biscoe 35.0 17.9 190 ## 54 Adelie Biscoe 42.0 19.5 200 ## 55 Adelie Biscoe 34.5 18.1 187 ## 56 Adelie Biscoe 41.4 18.6 191 ## 57 
Adelie Biscoe 39.0 17.5 186 ## 58 Adelie Biscoe 40.6 18.8 193 ## 59 Adelie Biscoe 36.5 16.6 181 ## 60 Adelie Biscoe 37.6 19.1 194 ## 61 Adelie Biscoe 35.7 16.9 185 ## 62 Adelie Biscoe 41.3 21.1 195 ## 63 Adelie Biscoe 37.6 17.0 185 ## 64 Adelie Biscoe 41.1 18.2 192 ## 65 Adelie Biscoe 36.4 17.1 184 ## 66 Adelie Biscoe 41.6 18.0 192 ## 67 Adelie Biscoe 35.5 16.2 195 ## 68 Adelie Biscoe 41.1 19.1 188 ## 69 Adelie Torgersen 35.9 16.6 190 ## 70 Adelie Torgersen 41.8 19.4 198 ## 71 Adelie Torgersen 33.5 19.0 190 ## 72 Adelie Torgersen 39.7 18.4 190 ## 73 Adelie Torgersen 39.6 17.2 196 ## 74 Adelie Torgersen 45.8 18.9 197 ## 75 Adelie Torgersen 35.5 17.5 190 ## 76 Adelie Torgersen 42.8 18.5 195 ## 77 Adelie Torgersen 40.9 16.8 191 ## 78 Adelie Torgersen 37.2 19.4 184 ## 79 Adelie Torgersen 36.2 16.1 187 ## 80 Adelie Torgersen 42.1 19.1 195 ## 81 Adelie Torgersen 34.6 17.2 189 ## 82 Adelie Torgersen 42.9 17.6 196 ## 83 Adelie Torgersen 36.7 18.8 187 ## 84 Adelie Torgersen 35.1 19.4 193 ## 85 Adelie Dream 37.3 17.8 191 ## 86 Adelie Dream 41.3 20.3 194 ## 87 Adelie Dream 36.3 19.5 190 ## 88 Adelie Dream 36.9 18.6 189 ## 89 Adelie Dream 38.3 19.2 189 ## 90 Adelie Dream 38.9 18.8 190 ## 91 Adelie Dream 35.7 18.0 202 ## 92 Adelie Dream 41.1 18.1 205 ## 93 Adelie Dream 34.0 17.1 185 ## 94 Adelie Dream 39.6 18.1 186 ## 95 Adelie Dream 36.2 17.3 187 ## 96 Adelie Dream 40.8 18.9 208 ## 97 Adelie Dream 38.1 18.6 190 ## 98 Adelie Dream 40.3 18.5 196 ## 99 Adelie Dream 33.1 16.1 178 ## 100 Adelie Dream 43.2 18.5 192 ## 101 Adelie Biscoe 35.0 17.9 192 ## 102 Adelie Biscoe 41.0 20.0 203 ## 103 Adelie Biscoe 37.7 16.0 183 ## 104 Adelie Biscoe 37.8 20.0 190 ## 105 Adelie Biscoe 37.9 18.6 193 ## 106 Adelie Biscoe 39.7 18.9 184 ## 107 Adelie Biscoe 38.6 17.2 199 ## 108 Adelie Biscoe 38.2 20.0 190 ## 109 Adelie Biscoe 38.1 17.0 181 ## 110 Adelie Biscoe 43.2 19.0 197 ## 111 Adelie Biscoe 38.1 16.5 198 ## 112 Adelie Biscoe 45.6 20.3 191 ## 113 Adelie Biscoe 39.7 17.7 193 ## 114 Adelie Biscoe 
42.2 19.5 197 ## 115 Adelie Biscoe 39.6 20.7 191 ## 116 Adelie Biscoe 42.7 18.3 196 ## 117 Adelie Torgersen 38.6 17.0 188 ## 118 Adelie Torgersen 37.3 20.5 199 ## 119 Adelie Torgersen 35.7 17.0 189 ## 120 Adelie Torgersen 41.1 18.6 189 ## 121 Adelie Torgersen 36.2 17.2 187 ## 122 Adelie Torgersen 37.7 19.8 198 ## 123 Adelie Torgersen 40.2 17.0 176 ## 124 Adelie Torgersen 41.4 18.5 202 ## 125 Adelie Torgersen 35.2 15.9 186 ## 126 Adelie Torgersen 40.6 19.0 199 ## 127 Adelie Torgersen 38.8 17.6 191 ## 128 Adelie Torgersen 41.5 18.3 195 ## 129 Adelie Torgersen 39.0 17.1 191 ## 130 Adelie Torgersen 44.1 18.0 210 ## 131 Adelie Torgersen 38.5 17.9 190 ## 132 Adelie Torgersen 43.1 19.2 197 ## 133 Adelie Dream 36.8 18.5 193 ## 134 Adelie Dream 37.5 18.5 199 ## 135 Adelie Dream 38.1 17.6 187 ## 136 Adelie Dream 41.1 17.5 190 ## 137 Adelie Dream 35.6 17.5 191 ## 138 Adelie Dream 40.2 20.1 200 ## 139 Adelie Dream 37.0 16.5 185 ## 140 Adelie Dream 39.7 17.9 193 ## 141 Adelie Dream 40.2 17.1 193 ## 142 Adelie Dream 40.6 17.2 187 ## 143 Adelie Dream 32.1 15.5 188 ## 144 Adelie Dream 40.7 17.0 190 ## 145 Adelie Dream 37.3 16.8 192 ## 146 Adelie Dream 39.0 18.7 185 ## 147 Adelie Dream 39.2 18.6 190 ## 148 Adelie Dream 36.6 18.4 184 ## 149 Adelie Dream 36.0 17.8 195 ## 150 Adelie Dream 37.8 18.1 193 ## 151 Adelie Dream 36.0 17.1 187 ## 152 Adelie Dream 41.5 18.5 201 ## 153 Chinstrap Dream 46.5 17.9 192 ## 154 Chinstrap Dream 50.0 19.5 196 ## 155 Chinstrap Dream 51.3 19.2 193 ## 156 Chinstrap Dream 45.4 18.7 188 ## 157 Chinstrap Dream 52.7 19.8 197 ## 158 Chinstrap Dream 45.2 17.8 198 ## 159 Chinstrap Dream 46.1 18.2 178 ## 160 Chinstrap Dream 51.3 18.2 197 ## 161 Chinstrap Dream 46.0 18.9 195 ## 162 Chinstrap Dream 51.3 19.9 198 ## 163 Chinstrap Dream 46.6 17.8 193 ## 164 Chinstrap Dream 51.7 20.3 194 ## 165 Chinstrap Dream 47.0 17.3 185 ## 166 Chinstrap Dream 52.0 18.1 201 ## 167 Chinstrap Dream 45.9 17.1 190 ## 168 Chinstrap Dream 50.5 19.6 201 ## 169 Chinstrap Dream 50.3 20.0 
197 ## 170 Chinstrap Dream 58.0 17.8 181 ## 171 Chinstrap Dream 46.4 18.6 190 ## 172 Chinstrap Dream 49.2 18.2 195 ## 173 Chinstrap Dream 42.4 17.3 181 ## 174 Chinstrap Dream 48.5 17.5 191 ## 175 Chinstrap Dream 43.2 16.6 187 ## 176 Chinstrap Dream 50.6 19.4 193 ## 177 Chinstrap Dream 46.7 17.9 195 ## 178 Chinstrap Dream 52.0 19.0 197 ## 179 Chinstrap Dream 50.5 18.4 200 ## 180 Chinstrap Dream 49.5 19.0 200 ## 181 Chinstrap Dream 46.4 17.8 191 ## 182 Chinstrap Dream 52.8 20.0 205 ## 183 Chinstrap Dream 40.9 16.6 187 ## 184 Chinstrap Dream 54.2 20.8 201 ## 185 Chinstrap Dream 42.5 16.7 187 ## 186 Chinstrap Dream 51.0 18.8 203 ## 187 Chinstrap Dream 49.7 18.6 195 ## 188 Chinstrap Dream 47.5 16.8 199 ## 189 Chinstrap Dream 47.6 18.3 195 ## 190 Chinstrap Dream 52.0 20.7 210 ## 191 Chinstrap Dream 46.9 16.6 192 ## 192 Chinstrap Dream 53.5 19.9 205 ## 193 Chinstrap Dream 49.0 19.5 210 ## 194 Chinstrap Dream 46.2 17.5 187 ## 195 Chinstrap Dream 50.9 19.1 196 ## 196 Chinstrap Dream 45.5 17.0 196 ## 197 Chinstrap Dream 50.9 17.9 196 ## 198 Chinstrap Dream 50.8 18.5 201 ## 199 Chinstrap Dream 50.1 17.9 190 ## 200 Chinstrap Dream 49.0 19.6 212 ## 201 Chinstrap Dream 51.5 18.7 187 ## 202 Chinstrap Dream 49.8 17.3 198 ## 203 Chinstrap Dream 48.1 16.4 199 ## 204 Chinstrap Dream 51.4 19.0 201 ## 205 Chinstrap Dream 45.7 17.3 193 ## 206 Chinstrap Dream 50.7 19.7 203 ## 207 Chinstrap Dream 42.5 17.3 187 ## 208 Chinstrap Dream 52.2 18.8 197 ## 209 Chinstrap Dream 45.2 16.6 191 ## 210 Chinstrap Dream 49.3 19.9 203 ## 211 Chinstrap Dream 50.2 18.8 202 ## 212 Chinstrap Dream 45.6 19.4 194 ## 213 Chinstrap Dream 51.9 19.5 206 ## 214 Chinstrap Dream 46.8 16.5 189 ## 215 Chinstrap Dream 45.7 17.0 195 ## 216 Chinstrap Dream 55.8 19.8 207 ## 217 Chinstrap Dream 43.5 18.1 202 ## 218 Chinstrap Dream 49.6 18.2 193 ## 219 Chinstrap Dream 50.8 19.0 210 ## 220 Chinstrap Dream 50.2 18.7 198 ## 221 Gentoo Biscoe 46.1 13.2 211 ## 222 Gentoo Biscoe 50.0 16.3 230 ## 223 Gentoo Biscoe 48.7 14.1 210 ## 
224 Gentoo Biscoe 50.0 15.2 218 ## 225 Gentoo Biscoe 47.6 14.5 215 ## 226 Gentoo Biscoe 46.5 13.5 210 ## 227 Gentoo Biscoe 45.4 14.6 211 ## 228 Gentoo Biscoe 46.7 15.3 219 ## 229 Gentoo Biscoe 43.3 13.4 209 ## 230 Gentoo Biscoe 46.8 15.4 215 ## 231 Gentoo Biscoe 40.9 13.7 214 ## 232 Gentoo Biscoe 49.0 16.1 216 ## 233 Gentoo Biscoe 45.5 13.7 214 ## 234 Gentoo Biscoe 48.4 14.6 213 ## 235 Gentoo Biscoe 45.8 14.6 210 ## 236 Gentoo Biscoe 49.3 15.7 217 ## 237 Gentoo Biscoe 42.0 13.5 210 ## 238 Gentoo Biscoe 49.2 15.2 221 ## 239 Gentoo Biscoe 46.2 14.5 209 ## 240 Gentoo Biscoe 48.7 15.1 222 ## 241 Gentoo Biscoe 50.2 14.3 218 ## 242 Gentoo Biscoe 45.1 14.5 215 ## 243 Gentoo Biscoe 46.5 14.5 213 ## 244 Gentoo Biscoe 46.3 15.8 215 ## 245 Gentoo Biscoe 42.9 13.1 215 ## 246 Gentoo Biscoe 46.1 15.1 215 ## 247 Gentoo Biscoe 44.5 14.3 216 ## 248 Gentoo Biscoe 47.8 15.0 215 ## 249 Gentoo Biscoe 48.2 14.3 210 ## 250 Gentoo Biscoe 50.0 15.3 220 ## 251 Gentoo Biscoe 47.3 15.3 222 ## 252 Gentoo Biscoe 42.8 14.2 209 ## 253 Gentoo Biscoe 45.1 14.5 207 ## 254 Gentoo Biscoe 59.6 17.0 230 ## 255 Gentoo Biscoe 49.1 14.8 220 ## 256 Gentoo Biscoe 48.4 16.3 220 ## 257 Gentoo Biscoe 42.6 13.7 213 ## 258 Gentoo Biscoe 44.4 17.3 219 ## 259 Gentoo Biscoe 44.0 13.6 208 ## 260 Gentoo Biscoe 48.7 15.7 208 ## 261 Gentoo Biscoe 42.7 13.7 208 ## 262 Gentoo Biscoe 49.6 16.0 225 ## 263 Gentoo Biscoe 45.3 13.7 210 ## 264 Gentoo Biscoe 49.6 15.0 216 ## 265 Gentoo Biscoe 50.5 15.9 222 ## 266 Gentoo Biscoe 43.6 13.9 217 ## 267 Gentoo Biscoe 45.5 13.9 210 ## 268 Gentoo Biscoe 50.5 15.9 225 ## 269 Gentoo Biscoe 44.9 13.3 213 ## 270 Gentoo Biscoe 45.2 15.8 215 ## 271 Gentoo Biscoe 46.6 14.2 210 ## 272 Gentoo Biscoe 48.5 14.1 220 ## 273 Gentoo Biscoe 45.1 14.4 210 ## 274 Gentoo Biscoe 50.1 15.0 225 ## 275 Gentoo Biscoe 46.5 14.4 217 ## 276 Gentoo Biscoe 45.0 15.4 220 ## 277 Gentoo Biscoe 43.8 13.9 208 ## 278 Gentoo Biscoe 45.5 15.0 220 ## 279 Gentoo Biscoe 43.2 14.5 208 ## 280 Gentoo Biscoe 50.4 15.3 224 ## 281 
Gentoo Biscoe 45.3 13.8 208 ## 282 Gentoo Biscoe 46.2 14.9 221 ## 283 Gentoo Biscoe 45.7 13.9 214 ## 284 Gentoo Biscoe 54.3 15.7 231 ## 285 Gentoo Biscoe 45.8 14.2 219 ## 286 Gentoo Biscoe 49.8 16.8 230 ## 287 Gentoo Biscoe 46.2 14.4 214 ## 288 Gentoo Biscoe 49.5 16.2 229 ## 289 Gentoo Biscoe 43.5 14.2 220 ## 290 Gentoo Biscoe 50.7 15.0 223 ## 291 Gentoo Biscoe 47.7 15.0 216 ## 292 Gentoo Biscoe 46.4 15.6 221 ## 293 Gentoo Biscoe 48.2 15.6 221 ## 294 Gentoo Biscoe 46.5 14.8 217 ## 295 Gentoo Biscoe 46.4 15.0 216 ## 296 Gentoo Biscoe 48.6 16.0 230 ## 297 Gentoo Biscoe 47.5 14.2 209 ## 298 Gentoo Biscoe 51.1 16.3 220 ## 299 Gentoo Biscoe 45.2 13.8 215 ## 300 Gentoo Biscoe 45.2 16.4 223 ## 301 Gentoo Biscoe 49.1 14.5 212 ## 302 Gentoo Biscoe 52.5 15.6 221 ## 303 Gentoo Biscoe 47.4 14.6 212 ## 304 Gentoo Biscoe 50.0 15.9 224 ## 305 Gentoo Biscoe 44.9 13.8 212 ## 306 Gentoo Biscoe 50.8 17.3 228 ## 307 Gentoo Biscoe 43.4 14.4 218 ## 308 Gentoo Biscoe 51.3 14.2 218 ## 309 Gentoo Biscoe 47.5 14.0 212 ## 310 Gentoo Biscoe 52.1 17.0 230 ## 311 Gentoo Biscoe 47.5 15.0 218 ## 312 Gentoo Biscoe 52.2 17.1 228 ## 313 Gentoo Biscoe 45.5 14.5 212 ## 314 Gentoo Biscoe 49.5 16.1 224 ## 315 Gentoo Biscoe 44.5 14.7 214 ## 316 Gentoo Biscoe 50.8 15.7 226 ## 317 Gentoo Biscoe 49.4 15.8 216 ## 318 Gentoo Biscoe 46.9 14.6 222 ## 319 Gentoo Biscoe 48.4 14.4 203 ## 320 Gentoo Biscoe 51.1 16.5 225 ## 321 Gentoo Biscoe 48.5 15.0 219 ## 322 Gentoo Biscoe 55.9 17.0 228 ## 323 Gentoo Biscoe 47.2 15.5 215 ## 324 Gentoo Biscoe 49.1 15.0 228 ## 325 Gentoo Biscoe 47.3 13.8 216 ## 326 Gentoo Biscoe 46.8 16.1 215 ## 327 Gentoo Biscoe 41.7 14.7 210 ## 328 Gentoo Biscoe 53.4 15.8 219 ## 329 Gentoo Biscoe 43.3 14.0 208 ## 330 Gentoo Biscoe 48.1 15.1 209 ## 331 Gentoo Biscoe 50.5 15.2 216 ## 332 Gentoo Biscoe 49.8 15.9 229 ## 333 Gentoo Biscoe 43.5 15.2 213 ## 334 Gentoo Biscoe 51.5 16.3 230 ## 335 Gentoo Biscoe 46.2 14.1 217 ## 336 Gentoo Biscoe 55.1 16.0 230 ## 337 Gentoo Biscoe 44.5 15.7 217 ## 338 
Gentoo Biscoe 48.8 16.2 222 ## 339 Gentoo Biscoe 47.2 13.7 214 ## 340 Gentoo Biscoe NA NA NA ## 341 Gentoo Biscoe 46.8 14.3 215 ## 342 Gentoo Biscoe 50.4 15.7 222 ## 343 Gentoo Biscoe 45.2 14.8 212 ## 344 Gentoo Biscoe 49.9 16.1 213 ## body_mass_g sex ## 1 3750 MALE ## 2 3800 FEMALE ## 3 3250 FEMALE ## 4 NA <NA> ## 5 3450 FEMALE ## 6 3650 MALE ## 7 3625 FEMALE ## 8 4675 MALE ## 9 3475 <NA> ## 10 4250 <NA> ## 11 3300 <NA> ## 12 3700 <NA> ## 13 3200 FEMALE ## 14 3800 MALE ## 15 4400 MALE ## 16 3700 FEMALE ## 17 3450 FEMALE ## 18 4500 MALE ## 19 3325 FEMALE ## 20 4200 MALE ## 21 3400 FEMALE ## 22 3600 MALE ## 23 3800 FEMALE ## 24 3950 MALE ## 25 3800 MALE ## 26 3800 FEMALE ## 27 3550 MALE ## 28 3200 FEMALE ## 29 3150 FEMALE ## 30 3950 MALE ## 31 3250 FEMALE ## 32 3900 MALE ## 33 3300 FEMALE ## 34 3900 MALE ## 35 3325 FEMALE ## 36 4150 MALE ## 37 3950 MALE ## 38 3550 FEMALE ## 39 3300 FEMALE ## 40 4650 MALE ## 41 3150 FEMALE ## 42 3900 MALE ## 43 3100 FEMALE ## 44 4400 MALE ## 45 3000 FEMALE ## 46 4600 MALE ## 47 3425 MALE ## 48 2975 <NA> ## 49 3450 FEMALE ## 50 4150 MALE ## 51 3500 FEMALE ## 52 4300 MALE ## 53 3450 FEMALE ## 54 4050 MALE ## 55 2900 FEMALE ## 56 3700 MALE ## 57 3550 FEMALE ## 58 3800 MALE ## 59 2850 FEMALE ## 60 3750 MALE ## 61 3150 FEMALE ## 62 4400 MALE ## 63 3600 FEMALE ## 64 4050 MALE ## 65 2850 FEMALE ## 66 3950 MALE ## 67 3350 FEMALE ## 68 4100 MALE ## 69 3050 FEMALE ## 70 4450 MALE ## 71 3600 FEMALE ## 72 3900 MALE ## 73 3550 FEMALE ## 74 4150 MALE ## 75 3700 FEMALE ## 76 4250 MALE ## 77 3700 FEMALE ## 78 3900 MALE ## 79 3550 FEMALE ## 80 4000 MALE ## 81 3200 FEMALE ## 82 4700 MALE ## 83 3800 FEMALE ## 84 4200 MALE ## 85 3350 FEMALE ## 86 3550 MALE ## 87 3800 MALE ## 88 3500 FEMALE ## 89 3950 MALE ## 90 3600 FEMALE ## 91 3550 FEMALE ## 92 4300 MALE ## 93 3400 FEMALE ## 94 4450 MALE ## 95 3300 FEMALE ## 96 4300 MALE ## 97 3700 FEMALE ## 98 4350 MALE ## 99 2900 FEMALE ## 100 4100 MALE ## 101 3725 FEMALE ## 102 4725 MALE ## 103 3075 FEMALE ## 104 
4250 MALE ## 105 2925 FEMALE ## 106 3550 MALE ## 107 3750 FEMALE ## 108 3900 MALE ## 109 3175 FEMALE ## 110 4775 MALE ## 111 3825 FEMALE ## 112 4600 MALE ## 113 3200 FEMALE ## 114 4275 MALE ## 115 3900 FEMALE ## 116 4075 MALE ## 117 2900 FEMALE ## 118 3775 MALE ## 119 3350 FEMALE ## 120 3325 MALE ## 121 3150 FEMALE ## 122 3500 MALE ## 123 3450 FEMALE ## 124 3875 MALE ## 125 3050 FEMALE ## 126 4000 MALE ## 127 3275 FEMALE ## 128 4300 MALE ## 129 3050 FEMALE ## 130 4000 MALE ## 131 3325 FEMALE ## 132 3500 MALE ## 133 3500 FEMALE ## 134 4475 MALE ## 135 3425 FEMALE ## 136 3900 MALE ## 137 3175 FEMALE ## 138 3975 MALE ## 139 3400 FEMALE ## 140 4250 MALE ## 141 3400 FEMALE ## 142 3475 MALE ## 143 3050 FEMALE ## 144 3725 MALE ## 145 3000 FEMALE ## 146 3650 MALE ## 147 4250 MALE ## 148 3475 FEMALE ## 149 3450 FEMALE ## 150 3750 MALE ## 151 3700 FEMALE ## 152 4000 MALE ## 153 3500 FEMALE ## 154 3900 MALE ## 155 3650 MALE ## 156 3525 FEMALE ## 157 3725 MALE ## 158 3950 FEMALE ## 159 3250 FEMALE ## 160 3750 MALE ## 161 4150 FEMALE ## 162 3700 MALE ## 163 3800 FEMALE ## 164 3775 MALE ## 165 3700 FEMALE ## 166 4050 MALE ## 167 3575 FEMALE ## 168 4050 MALE ## 169 3300 MALE ## 170 3700 FEMALE ## 171 3450 FEMALE ## 172 4400 MALE ## 173 3600 FEMALE ## 174 3400 MALE ## 175 2900 FEMALE ## 176 3800 MALE ## 177 3300 FEMALE ## 178 4150 MALE ## 179 3400 FEMALE ## 180 3800 MALE ## 181 3700 FEMALE ## 182 4550 MALE ## 183 3200 FEMALE ## 184 4300 MALE ## 185 3350 FEMALE ## 186 4100 MALE ## 187 3600 MALE ## 188 3900 FEMALE ## 189 3850 FEMALE ## 190 4800 MALE ## 191 2700 FEMALE ## 192 4500 MALE ## 193 3950 MALE ## 194 3650 FEMALE ## 195 3550 MALE ## 196 3500 FEMALE ## 197 3675 FEMALE ## 198 4450 MALE ## 199 3400 FEMALE ## 200 4300 MALE ## 201 3250 MALE ## 202 3675 FEMALE ## 203 3325 FEMALE ## 204 3950 MALE ## 205 3600 FEMALE ## 206 4050 MALE ## 207 3350 FEMALE ## 208 3450 MALE ## 209 3250 FEMALE ## 210 4050 MALE ## 211 3800 MALE ## 212 3525 FEMALE ## 213 3950 MALE ## 214 3650 FEMALE ## 215 
3650 FEMALE ## 216 4000 MALE ## 217 3400 FEMALE ## 218 3775 MALE ## 219 4100 MALE ## 220 3775 FEMALE ## 221 4500 FEMALE ## 222 5700 MALE ## 223 4450 FEMALE ## 224 5700 MALE ## 225 5400 MALE ## 226 4550 FEMALE ## 227 4800 FEMALE ## 228 5200 MALE ## 229 4400 FEMALE ## 230 5150 MALE ## 231 4650 FEMALE ## 232 5550 MALE ## 233 4650 FEMALE ## 234 5850 MALE ## 235 4200 FEMALE ## 236 5850 MALE ## 237 4150 FEMALE ## 238 6300 MALE ## 239 4800 FEMALE ## 240 5350 MALE ## 241 5700 MALE ## 242 5000 FEMALE ## 243 4400 FEMALE ## 244 5050 MALE ## 245 5000 FEMALE ## 246 5100 MALE ## 247 4100 <NA> ## 248 5650 MALE ## 249 4600 FEMALE ## 250 5550 MALE ## 251 5250 MALE ## 252 4700 FEMALE ## 253 5050 FEMALE ## 254 6050 MALE ## 255 5150 FEMALE ## 256 5400 MALE ## 257 4950 FEMALE ## 258 5250 MALE ## 259 4350 FEMALE ## 260 5350 MALE ## 261 3950 FEMALE ## 262 5700 MALE ## 263 4300 FEMALE ## 264 4750 MALE ## 265 5550 MALE ## 266 4900 FEMALE ## 267 4200 FEMALE ## 268 5400 MALE ## 269 5100 FEMALE ## 270 5300 MALE ## 271 4850 FEMALE ## 272 5300 MALE ## 273 4400 FEMALE ## 274 5000 MALE ## 275 4900 FEMALE ## 276 5050 MALE ## 277 4300 FEMALE ## 278 5000 MALE ## 279 4450 FEMALE ## 280 5550 MALE ## 281 4200 FEMALE ## 282 5300 MALE ## 283 4400 FEMALE ## 284 5650 MALE ## 285 4700 FEMALE ## 286 5700 MALE ## 287 4650 <NA> ## 288 5800 MALE ## 289 4700 FEMALE ## 290 5550 MALE ## 291 4750 FEMALE ## 292 5000 MALE ## 293 5100 MALE ## 294 5200 FEMALE ## 295 4700 FEMALE ## 296 5800 MALE ## 297 4600 FEMALE ## 298 6000 MALE ## 299 4750 FEMALE ## 300 5950 MALE ## 301 4625 FEMALE ## 302 5450 MALE ## 303 4725 FEMALE ## 304 5350 MALE ## 305 4750 FEMALE ## 306 5600 MALE ## 307 4600 FEMALE ## 308 5300 MALE ## 309 4875 FEMALE ## 310 5550 MALE ## 311 4950 FEMALE ## 312 5400 MALE ## 313 4750 FEMALE ## 314 5650 MALE ## 315 4850 FEMALE ## 316 5200 MALE ## 317 4925 MALE ## 318 4875 FEMALE ## 319 4625 FEMALE ## 320 5250 MALE ## 321 4850 FEMALE ## 322 5600 MALE ## 323 4975 FEMALE ## 324 5500 MALE ## 325 4725 <NA> ## 326 5500 
MALE ## 327 4700 FEMALE ## 328 5500 MALE ## 329 4575 FEMALE ## 330 5500 MALE ## 331 5000 FEMALE ## 332 5950 MALE ## 333 4650 FEMALE ## 334 5500 MALE ## 335 4375 FEMALE ## 336 5850 MALE ## 337 4875 . ## 338 6000 MALE ## 339 4925 FEMALE ## 340 NA <NA> ## 341 4850 FEMALE ## 342 5750 MALE ## 343 5200 FEMALE ## 344 5400 MALE ``` --- # penguins dataset ```r penguins <- as_tibble(penguin_df) ``` ```r penguins ``` ``` *## # A tibble: 344 x 7 *## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g *## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie Torge… 39.1 18.7 181 3750 ## 2 Adelie Torge… 39.5 17.4 186 3800 ## 3 Adelie Torge… 40.3 18 195 3250 ## 4 Adelie Torge… NA NA NA NA ## 5 Adelie Torge… 36.7 19.3 193 3450 ## 6 Adelie Torge… 39.3 20.6 190 3650 ## 7 Adelie Torge… 38.9 17.8 181 3625 ## 8 Adelie Torge… 39.2 19.6 195 4675 ## 9 Adelie Torge… 34.1 18.1 193 3475 ## 10 Adelie Torge… 42 20.2 190 4250 ## # … with 334 more rows, and 1 more variable: sex <chr> ``` --- # penguins dataset ```r head(penguins, 5) ``` ``` ## # A tibble: 5 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie Torge… 39.1 18.7 181 3750 ## 2 Adelie Torge… 39.5 17.4 186 3800 ## 3 Adelie Torge… 40.3 18 195 3250 ## 4 Adelie Torge… NA NA NA NA ## 5 Adelie Torge… 36.7 19.3 193 3450 ## # … with 1 more variable: sex <chr> ``` -- ```r tail(penguins, 5) ``` ``` ## # A tibble: 5 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Gentoo Biscoe NA NA NA NA ## 2 Gentoo Biscoe 46.8 14.3 215 4850 ## 3 Gentoo Biscoe 50.4 15.7 222 5750 ## 4 Gentoo Biscoe 45.2 14.8 212 5200 ## 5 Gentoo Biscoe 49.9 16.1 213 5400 ## # … with 1 more variable: sex <chr> ``` --- ### [`dplyr::slice()`](https://dplyr.tidyverse.org/reference/slice.html) ```r slice(penguins, 1:3) ``` ``` ## # A tibble: 3 x 7 ## species island culmen_length_mm culmen_depth_mm 
flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Adelie Torge… 39.1 18.7 181 3750 ## 2 Adelie Torge… 39.5 17.4 186 3800 *## 3 Adelie Torge… 40.3 18 195 3250 ## # … with 1 more variable: sex <chr> ``` -- ```r slice(penguins, 1, 3, 5) ``` ``` ## # A tibble: 3 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Adelie Torge… 39.1 18.7 181 3750 *## 2 Adelie Torge… 40.3 18 195 3250 ## 3 Adelie Torge… 36.7 19.3 193 3450 ## # … with 1 more variable: sex <chr> ``` --- #### [`dplyr::slice_min()`](https://dplyr.tidyverse.org/reference/slice.html) & [`dplyr::slice_max()`](https://dplyr.tidyverse.org/reference/slice.html) ```r # bottom 3 beak lengths slice_min(penguins, order_by = culmen_length_mm, n = 3) ``` ``` ## # A tibble: 3 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g *## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Adelie Dream 32.1 15.5 188 3050 *## 2 Adelie Dream 33.1 16.1 178 2900 ## 3 Adelie Torge… 33.5 19 190 3600 ## # … with 1 more variable: sex <chr> ``` -- ```r # top 3 beak lengths slice_max(penguins, order_by = culmen_length_mm, n = 3) ``` ``` ## # A tibble: 3 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g *## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Gentoo Biscoe 59.6 17 230 6050 *## 2 Chinst… Dream 58 17.8 181 3700 ## 3 Gentoo Biscoe 55.9 17 228 5600 ## # … with 1 more variable: sex <chr> ``` --- ### [`dplyr::slice_sample()`](https://dplyr.tidyverse.org/reference/slice.html) ```r slice_sample(penguins, n = 10) # random selection ``` ``` ## # A tibble: 10 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie Dream 38.1 17.6 187 3425 ## 2 Chinst… Dream 47.6 18.3 195 3850 ## 3 Adelie Biscoe 40.1 18.9 188 4300 ## 4 Adelie Dream 36.6 18.4 184 3475 ## 5 Adelie Torge… 37.8 17.3 180 3700 ## 6 Gentoo Biscoe 45.2 
16.4 223 5950
## 7 Gentoo Biscoe 49.6 16 225 5700
## 8 Adelie Torge… 36.7 18.8 187 3800
## 9 Adelie Dream 44.1 19.7 196 4400
## 10 Gentoo Biscoe 44.4 17.3 219 5250
## # … with 1 more variable: sex <chr>
```

---

### [`tibble::glimpse()`](https://tibble.tidyverse.org/reference/glimpse.html)

```r
glimpse(penguins)
```

```
*## Rows: 344
*## Columns: 7
## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "…
## $ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen",…
## $ culmen_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,…
## $ culmen_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,…
## $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18…
## $ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,…
## $ sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "…
```

---

### Quick Pause for `Logic`

Logical operators in R. See `?base::Logic` for additional details.

| Operator | Description | `TRUE` | `FALSE` |
|:-------- |:---------- | :---------:| :----------: |
| `<` | Less than | `3 < 5` | `100 < 1` |
| `<=` | Less than or equal to | `2 <= 2` | `4 <= 2` |
| `>` | Greater than | `5 > 3` | `1 > 100` |
| `>=` | Greater than or equal to | `25.1 >= 25` | `12 >= 100` |
| `==` | Exactly equal to | `"cat" == "cat"` | `"cat" == "dog"` |
| `!=` | NOT equal to | `5 != 3` | `as.character(5) != "5"` |
| `x %in% y` | Returns TRUE for x that are present in y | `3 %in% c(1, 2, 3)` | `3 %in% c(4:9)` |
| `!(x %in% y)` | Returns TRUE for x that are NOT present in y | `!(3 %in% c(4:9))` | `!("cat" %in% c("dog", "cat", "rat"))` |
| `x \| y` | x OR y | `5 == 3 \| 3 != 2` | `"cat" == "dog" \| 3 != 3` |
| `x & y` | x AND y | `3 == 3 & "dog" == "dog"` | `5 == 3 & 3 != 2` |

---

### [`dplyr::filter()`](https://dplyr.tidyverse.org/reference/filter.html)

Returns rows where the logical argument is `TRUE`

```r
# species EQUAL to Adelie
filter(penguins, species == "Adelie")
```

```
## # A tibble: 152 x 7
##
species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie Torge… 39.1 18.7 181 3750 ## 2 Adelie Torge… 39.5 17.4 186 3800 ## 3 Adelie Torge… 40.3 18 195 3250 ## 4 Adelie Torge… NA NA NA NA ## 5 Adelie Torge… 36.7 19.3 193 3450 ## 6 Adelie Torge… 39.3 20.6 190 3650 ## 7 Adelie Torge… 38.9 17.8 181 3625 ## 8 Adelie Torge… 39.2 19.6 195 4675 ## 9 Adelie Torge… 34.1 18.1 193 3475 ## 10 Adelie Torge… 42 20.2 190 4250 ## # … with 142 more rows, and 1 more variable: sex <chr> ```

---

### [`dplyr::filter()`](https://dplyr.tidyverse.org/reference/filter.html)

```r
penguins %>%
  # species matching Adelie or Gentoo
  filter(species %in% c("Adelie", "Gentoo"))
```

```
## # A tibble: 276 x 7
## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Adelie Torge… 39.1 18.7 181 3750
## 2 Adelie Torge… 39.5 17.4 186 3800
## 3 Adelie Torge… 40.3 18 195 3250
## 4 Adelie Torge… NA NA NA NA
## 5 Adelie Torge… 36.7 19.3 193 3450
## 6 Adelie Torge… 39.3 20.6 190 3650
## 7 Adelie Torge… 38.9 17.8 181 3625
## 8 Adelie Torge… 39.2 19.6 195 4675
## 9 Adelie Torge… 34.1 18.1 193 3475
## 10 Adelie Torge… 42 20.2 190 4250
## # … with 266 more rows, and 1 more variable: sex <chr>
```

---

### `dplyr::filter()`

```r
penguins %>%
  # species NOT EQUAL to Chinstrap and culmen length greater than or equal to 53
  filter(species != "Chinstrap" & culmen_length_mm >= 53)
```

```
## # A tibble: 5 x 7
## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Gentoo Biscoe 59.6 17 230 6050
## 2 Gentoo Biscoe 54.3 15.7 231 5650
## 3 Gentoo Biscoe 55.9 17 228 5600
## 4 Gentoo Biscoe 53.4 15.8 219 5500
## 5 Gentoo Biscoe 55.1 16 230 5850
## # … with 1 more variable: sex <chr>
```

---

### [`dplyr::arrange()`](https://dplyr.tidyverse.org/reference/arrange.html)

`arrange` defaults to smallest to largest ```r penguins %>%
arrange(culmen_length_mm) ``` ``` ## # A tibble: 344 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Adelie Dream 32.1 15.5 188 3050 ## 2 Adelie Dream 33.1 16.1 178 2900 ## 3 Adelie Torge… 33.5 19 190 3600 ## 4 Adelie Dream 34 17.1 185 3400 ## 5 Adelie Torge… 34.1 18.1 193 3475 ## 6 Adelie Torge… 34.4 18.4 184 3325 ## 7 Adelie Biscoe 34.5 18.1 187 2900 ## 8 Adelie Torge… 34.6 21.1 198 4400 ## 9 Adelie Torge… 34.6 17.2 189 3200 ## 10 Adelie Biscoe 35 17.9 190 3450 ## # … with 334 more rows, and 1 more variable: sex <chr> ``` --- ### `dplyr::arrange()` `desc` means descending order, ie largest to smallest ```r penguins %>% arrange(desc(culmen_length_mm)) ``` ``` ## # A tibble: 344 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Gentoo Biscoe 59.6 17 230 6050 ## 2 Chinst… Dream 58 17.8 181 3700 ## 3 Gentoo Biscoe 55.9 17 228 5600 ## 4 Chinst… Dream 55.8 19.8 207 4000 ## 5 Gentoo Biscoe 55.1 16 230 5850 ## 6 Gentoo Biscoe 54.3 15.7 231 5650 ## 7 Chinst… Dream 54.2 20.8 201 4300 ## 8 Chinst… Dream 53.5 19.9 205 4500 ## 9 Gentoo Biscoe 53.4 15.8 219 5500 ## 10 Chinst… Dream 52.8 20 205 4550 ## # … with 334 more rows, and 1 more variable: sex <chr> ``` --- ### [`dplyr::arrange()`](https://dplyr.tidyverse.org/reference/arrange.html) ```r penguins %>% arrange(desc(flipper_length_mm), desc(culmen_length_mm)) ``` ``` ## # A tibble: 344 x 7 ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Gentoo Biscoe 54.3 15.7 231 5650 ## 2 Gentoo Biscoe 59.6 17 230 6050 ## 3 Gentoo Biscoe 55.1 16 230 5850 ## 4 Gentoo Biscoe 52.1 17 230 5550 ## 5 Gentoo Biscoe 51.5 16.3 230 5500 ## 6 Gentoo Biscoe 50 16.3 230 5700 ## 7 Gentoo Biscoe 49.8 16.8 230 5700 ## 8 Gentoo Biscoe 48.6 16 230 5800 ## 9 Gentoo Biscoe 49.8 15.9 229 5950 ## 10 Gentoo Biscoe 49.5 16.2 229 
5800 ## # … with 334 more rows, and 1 more variable: sex <chr> ``` --- ### [`dplyr::select()`](https://dplyr.tidyverse.org/reference/select.html) ```r penguins %>% * select(species, sex) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 2 *## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "… *## $ sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "FEMALE", "… ``` --- ### [`dplyr::select()`](https://dplyr.tidyverse.org/reference/select.html) ```r penguins %>% * select(species, sex, island, body_mass_g) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 4 *## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie… *## $ sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "FEMALE… *## $ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torg… *## $ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250,… ``` -- ```r penguins %>% select(species, sex, island, body_mass_g) %>% * select(-island) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 3 *## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie… *## $ sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "FEMALE… *## $ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250,… ``` --- ### [`dplyr::select()`](https://dplyr.tidyverse.org/reference/select.html) ```r penguins %>% * select(sex, everything()) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 7 ## $ sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "… ## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "… ## $ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen",… ## $ culmen_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,… ## $ culmen_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,… ## $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18… ## $ body_mass_g <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,… ``` --- 
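### `dplyr::select()` on a toy table

The `select()` patterns shown so far can be tried on a small stand-in table. This is only a minimal sketch: `toy_penguins` and its two rows are hypothetical values made up for illustration (not the real `penguins` data), and it assumes the `dplyr` and `tibble` packages are installed.

```r
library(dplyr)

# Hypothetical two-row stand-in for `penguins` (values invented for illustration)
toy_penguins <- tibble::tibble(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  body_mass_g = c(3750, 5000),
  sex = c("MALE", "FEMALE")
)

# Move `sex` to the front; everything() keeps the remaining columns in order
reordered <- toy_penguins %>%
  select(sex, everything())
names(reordered)

# Drop a single column with a minus sign
dropped <- toy_penguins %>%
  select(-island)
names(dropped)
```

Checking `names()` instead of printing the whole table is a quick way to confirm which columns survived and in what order.

---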
### [`dplyr::select()`](https://dplyr.tidyverse.org/reference/select.html) ```r penguins %>% * select(starts_with("culmen"), contains("flip")) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 3 ## $ culmen_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,… ## $ culmen_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,… ## $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18… ``` --- ### [`dplyr::mutate()`](https://dplyr.tidyverse.org/reference/mutate.html) ```r penguins %>% select(species) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 1 *## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "… ``` ```r penguins %>% * mutate(species = factor(species, * levels = c("Adelie", "Chinstrap", "Gentoo"), * labels = c("AD", "CS", "GT"))) %>% select(species) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 1 *## $ species <fct> AD, AD, AD, AD, AD, AD, AD, AD, AD, AD, AD, AD, AD, AD, AD, A… ``` --- ### [`dplyr::mutate()`](https://dplyr.tidyverse.org/reference/mutate.html) ```r penguins %>% * mutate(body_mass_kg = body_mass_g / 1000, * body_mass = body_mass_kg * 1000) %>% select(body_mass_kg, body_mass_g, body_mass) %>% head(10) ``` ``` ## # A tibble: 10 x 3 ## body_mass_kg body_mass_g body_mass ## <dbl> <dbl> <dbl> *## 1 3.75 3750 3750 ## 2 3.8 3800 3800 ## 3 3.25 3250 3250 ## 4 NA NA NA ## 5 3.45 3450 3450 ## 6 3.65 3650 3650 ## 7 3.62 3625 3625 ## 8 4.68 4675 4675 ## 9 3.48 3475 3475 ## 10 4.25 4250 4250 ``` --- ### [`dplyr::group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) ```r penguins %>% group_by(species) ``` ``` ## # A tibble: 344 x 7 *## # Groups: species [3] ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie Torge… 39.1 18.7 181 3750 ## 2 Adelie Torge… 39.5 17.4 186 3800 ## 3 Adelie Torge… 40.3 18 195 3250 ## 4 Adelie Torge… NA NA NA NA ## 5 Adelie Torge… 36.7 19.3 193 3450 ## 6 Adelie Torge… 
39.3 20.6 190 3650 ## 7 Adelie Torge… 38.9 17.8 181 3625 ## 8 Adelie Torge… 39.2 19.6 195 4675 ## 9 Adelie Torge… 34.1 18.1 193 3475 ## 10 Adelie Torge… 42 20.2 190 4250 ## # … with 334 more rows, and 1 more variable: sex <chr> ``` --- ### [`dplyr::group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) ```r penguins %>% * group_by(species) %>% slice(1) ``` ``` ## # A tibble: 3 x 7 ## # Groups: species [3] ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> *## 1 Adelie Torge… 39.1 18.7 181 3750 *## 2 Chinst… Dream 46.5 17.9 192 3500 *## 3 Gentoo Biscoe 46.1 13.2 211 4500 ## # … with 1 more variable: sex <chr> ``` --- ### [`dplyr::group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) ```r penguins %>% * group_by(species) %>% slice_max(culmen_length_mm, n = 1) ``` ``` ## # A tibble: 3 x 7 ## # Groups: species [3] ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie Torge… 46 21.5 194 4200 ## 2 Chinst… Dream 58 17.8 181 3700 ## 3 Gentoo Biscoe 59.6 17 230 6050 ## # … with 1 more variable: sex <chr> ``` --- ### [`dplyr::group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) ```r penguins %>% * group_by(species) %>% arrange(desc(culmen_length_mm)) %>% slice(1) ``` ``` ## # A tibble: 3 x 7 ## # Groups: species [3] ## species island culmen_length_mm culmen_depth_mm flipper_length_… body_mass_g ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie Torge… 46 21.5 194 4200 ## 2 Chinst… Dream 58 17.8 181 3700 ## 3 Gentoo Biscoe 59.6 17 230 6050 ## # … with 1 more variable: sex <chr> ``` --- ### [`dplyr::group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) ```r penguins %>% * group_by(species, island) %>% count() ``` ``` ## # A tibble: 5 x 3 *## # Groups: species, island [5] ## species island n ## <chr> <chr> <int> ## 1 Adelie Biscoe 44 ## 2 Adelie Dream 56 ## 3 Adelie Torgersen 52 ## 4 
Chinstrap Dream 68 ## 5 Gentoo Biscoe 124 ``` --- ### [`dplyr::summarize()`](https://dplyr.tidyverse.org/reference/summarise.html) ```r penguins %>% summarize(mean = mean(body_mass_g)) ``` ``` ## # A tibble: 1 x 1 ## mean ## <dbl> ## 1 NA ``` -- ```r penguins %>% * summarize(mean = mean(body_mass_g, na.rm = TRUE)) ``` ``` *## # A tibble: 1 x 1 *## mean ## <dbl> ## 1 4202. ``` --- ### [`dplyr::summarize()`](https://dplyr.tidyverse.org/reference/summarise.html) ```r penguins %>% summarize(median(body_mass_g, na.rm = TRUE)) ``` ``` ## # A tibble: 1 x 1 *## `median(body_mass_g, na.rm = TRUE)` ## <dbl> ## 1 4050 ``` -- ```r penguins %>% summarize(median_mass= median(body_mass_g, na.rm = TRUE)) ``` ``` ## # A tibble: 1 x 1 *## median_mass ## <dbl> ## 1 4050 ``` --- ### [`dplyr::summarize()`](https://dplyr.tidyverse.org/reference/summarise.html) ```r penguins %>% group_by(species) %>% * summarize(mean_mass = mean(body_mass_g, na.rm = TRUE), * sd_mass = sd(body_mass_g, na.rm = TRUE), * n = n()) ``` ``` ## `summarise()` ungrouping output (override with `.groups` argument) ``` ``` ## # A tibble: 3 x 4 ## species mean_mass sd_mass n ## <chr> <dbl> <dbl> <int> ## 1 Adelie 3701. 459. 152 ## 2 Chinstrap 3733. 384. 68 ## 3 Gentoo 5076. 504. 
124 ``` --- ### [`dplyr::mutate() + across()`](https://dplyr.tidyverse.org/reference/across.html) ```r penguins %>% * mutate(across(c(species, island), factor)) %>% select(species, island) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 2 *## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adeli… *## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torger… ``` -- ```r penguins %>% select(species, island) %>% glimpse() ``` ``` ## Rows: 344 ## Columns: 2 *## $ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "… *## $ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgerse… ``` --- ### [`dplyr::summarize() + across`](https://dplyr.tidyverse.org/reference/across.html) ```r penguins %>% group_by(species) %>% summarize( * across(c(body_mass_g, culmen_length_mm), mean, na.rm = TRUE), * n = n() ) ``` ``` ## `summarise()` ungrouping output (override with `.groups` argument) ``` ``` ## # A tibble: 3 x 4 *## species body_mass_g culmen_length_mm n ## <chr> <dbl> <dbl> <int> ## 1 Adelie 3701. 38.8 152 ## 2 Chinstrap 3733. 48.8 68 ## 3 Gentoo 5076. 47.5 124 ``` --- ### [`dplyr::summarize() + across`](https://dplyr.tidyverse.org/reference/across.html) ```r penguins %>% group_by(species) %>% summarize( across(c(body_mass_g, culmen_length_mm, culmen_depth_mm), list( * mean = ~mean(.x, na.rm = TRUE), * sd = ~sd(.x, na.rm = TRUE)) ), * n = n() ) ``` ``` ## `summarise()` ungrouping output (override with `.groups` argument) ``` ``` ## # A tibble: 3 x 8 *## species body_mass_g_mean body_mass_g_sd culmen_length_m… culmen_length_m… ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 Adelie 3701. 459. 38.8 2.66 ## 2 Chinst… 3733. 384. 48.8 3.34 ## 3 Gentoo 5076. 504. 47.5 3.08 ## # … with 3 more variables: culmen_depth_mm_mean <dbl>, ## # culmen_depth_mm_sd <dbl>, n <int> ``` --- # [`tidyr`](https://tidyr.tidyverse.org/) The goal of tidyr is to help you create tidy data. 
Tidy data is data where:

* Each variable is in a column.
* Each observation is a row.
* Each value is a cell.

--

### Make Taller and Make Wider

* `pivot_longer()` - "lengthens" data, increasing the number of rows and decreasing the number of columns.
* `pivot_wider()` - "widens" data, increasing the number of columns and decreasing the number of rows.

--

### Separate and unite columns

* `separate()` - Separate one column into multiple columns.
* `unite()` - Unite multiple columns into one.

---

# Tidy the data

```r
untidy_df
```

```
## # A tibble: 5 x 7
*## age_group male_2016 female_2016 male_2017 female_2017 male_2018 female_2018
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 < 18 22000 20000 22000 20000 22000 20000
## 2 18-30 36000 35000 36000 35000 36000 35000
## 3 31-50 50000 40000 50000 40000 50000 40000
## 4 51-60 62000 60000 62000 60000 62000 60000
## 5 > 60 75000 72000 75000 72000 75000 72000
```

---

# Tidy the data

```r
untidy_df %>%
* pivot_longer(cols = male_2016:female_2018,
*              names_to = "gender_year",
*              values_to = "income")
```

```
## # A tibble: 30 x 3
## age_group gender_year income
## <chr> <chr> <dbl>
*## 1 < 18 male_2016 22000
## 2 < 18 female_2016 20000
## 3 < 18 male_2017 22000
## 4 < 18 female_2017 20000
## 5 < 18 male_2018 22000
*## 6 < 18 female_2018 20000
## 7 18-30 male_2016 36000
## 8 18-30 female_2016 35000
## 9 18-30 male_2017 36000
## 10 18-30 female_2017 35000
## # … with 20 more rows
```

---

# Tidy the data ```r untidy_df %>% pivot_longer(cols = male_2016:female_2018, names_to = "gender_year", values_to = "income") %>% * separate(gender_year, into = c("gender", "year")) ``` ``` ## # A tibble: 30 x 4 ## age_group gender year income ## <chr> <chr> <chr> <dbl> *## 1 < 18 male 2016 22000 ## 2 < 18 female 2016 20000 ## 3 < 18 male 2017 22000 ## 4 < 18 female 2017 20000 ## 5 < 18 male 2018 22000 *## 6 < 18 female 2018 20000 ## 7 18-30 male 2016 36000 ## 8 18-30 female 2016 35000 ## 9 18-30 male 2017 36000 ## 10 18-30 female 2017 35000 ##
# … with 20 more rows ``` --- # Tidy the data ```r untidy_df %>% pivot_longer( * cols = male_2016:female_2018, * names_to = c("gender", "year"), * names_pattern = "(.*)_(.*)", * values_to = "income" ) ``` ``` ## # A tibble: 30 x 4 ## age_group gender year income ## <chr> <chr> <chr> <dbl> *## 1 < 18 male 2016 22000 ## 2 < 18 female 2016 20000 ## 3 < 18 male 2017 22000 ## 4 < 18 female 2017 20000 ## 5 < 18 male 2018 22000 *## 6 < 18 female 2018 20000 ## 7 18-30 male 2016 36000 ## 8 18-30 female 2016 35000 ## 9 18-30 male 2017 36000 ## 10 18-30 female 2017 35000 ## # … with 20 more rows ``` --- # Untidy the data ```r tidy_df %>% * unite("gender_year", c("gender", "year"), sep = "_") ``` -- ``` ## # A tibble: 30 x 3 *## age_group gender_year income ## <chr> <chr> <dbl> *## 1 < 18 male_2016 22000 ## 2 18-30 male_2016 36000 ## 3 31-50 male_2016 50000 ## 4 51-60 male_2016 62000 ## 5 > 60 male_2016 75000 ## 6 < 18 female_2016 20000 ## 7 18-30 female_2016 35000 ## 8 31-50 female_2016 40000 ## 9 51-60 female_2016 60000 ## 10 > 60 female_2016 72000 ## # … with 20 more rows ``` --- # Untidy the data ```r tidy_df %>% unite("gender_year", c("gender", "year"), sep = "_") %>% * pivot_wider(names_from = gender_year, values_from = income) ``` -- ``` ## # A tibble: 5 x 7 *## age_group male_2016 female_2016 male_2017 female_2017 male_2018 female_2018 ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 < 18 22000 20000 22000 20000 22000 20000 ## 2 18-30 36000 35000 36000 35000 36000 35000 ## 3 31-50 50000 40000 50000 40000 50000 40000 ## 4 51-60 62000 60000 62000 60000 62000 60000 ## 5 > 60 75000 72000 75000 72000 75000 72000 ``` --- # Untidy the data ```r tidy_df %>% * pivot_wider(names_from = c(gender, year), values_from = income) ``` -- ``` ## # A tibble: 5 x 7 *## age_group male_2016 female_2016 male_2017 female_2017 male_2018 female_2018 ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 < 18 22000 20000 22000 20000 22000 20000 ## 2 18-30 36000 35000 36000 35000 36000 35000 ## 3 
31-50 50000 40000 50000 40000 50000 40000 ## 4 51-60 62000 60000 62000 60000 62000 60000 ## 5 > 60 75000 72000 75000 72000 75000 72000 ```

---

# `ggplot2`

### 3 Core parts

.pull-left[

* `ggplot` - builds base layer

* `geom_` is the shape
  * `geom_point()`
  * `geom_line()`
  * `geom_bar()`
  * `geom_boxplot()`
  * `geom_?()`

]

--

.pull-right[

* `aes` is the mappings/relationships
  * Horizontal Dimensions (x)
  * Vertical Dimensions (y)
  * Color
  * Shape
  * Size
  * Transparency
  * Relationships

]

---

```r
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
  <GEOM_FUNCTION>()
```

--

```r
ggplot(penguins, aes(x = culmen_length_mm, y = culmen_depth_mm, color = species)) +
  geom_point()
```

--

![](intro-to-tidyverse_files/figure-html/unnamed-chunk-87-1.png)<!-- -->

---

Supply the data, tell `ggplot2` the aesthetic mappings, and then add layers of `plots` via `geom_`

--

.pull-left[

* `geom_point()` - Points
* `geom_dotplot()` - Dot plot
* `geom_hline()` - Horizontal reference line
* `geom_vline()` - Vertical reference line
* `geom_boxplot()` - A box and whisker plot
* `geom_density()` - Smoothed density estimates
* `geom_errorbarh()` - Horizontal error bars
* `geom_hex()` - Hexagonal heatmap of 2d bin counts
* `geom_jitter()` - Jittered points
* `geom_linerange()` - Vertical interval line
* `geom_pointrange()` - Vertical point line
* `geom_line()` - Connect observations with a line

]

.pull-right[

* `geom_step()` - Connect observations via step lines
* `geom_polygon()` - Polygons
* `geom_segment()` - Line segment
* `geom_ribbon()` - Ribbon plot
* `geom_area()` - Area plot
* `geom_rug()` - Rug plots in the margins
* `geom_smooth()` - Smoothed conditional means
* `geom_label()` - Label points with text
* `geom_text()` - Add text
* `geom_violin()` - Violin plot
* `geom_sf()` - Visualize sf objects
* `geom_map()` - Plot map
* `geom_qq_line()` - A quantile-quantile plot
* `geom_histogram()` - Histogram plot

]

---

# `ggplot2`

### [Gapminder
dataset](https://cran.r-project.org/web/packages/gapminder/index.html)

Excerpt from the Gapminder data. For each of 142 countries, the data provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.

```r
gapminder_df <- gapminder::gapminder
glimpse(gapminder_df)
```

```
## Rows: 1,704
## Columns: 6
## $ country <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia,…
## $ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997,…
## $ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.…
## $ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 1…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134,…
```

---

### `ggplot2`

```r
gapminder_df %>%
  filter(country == "Malawi") %>%
* ggplot(aes(x = year, y = lifeExp)) +
* geom_line(colour = "#1380A1", size = 1)
```

![](intro-to-tidyverse_files/figure-html/unnamed-chunk-89-1.png)<!-- -->

---

### `ggplot2`

```r
gapminder_df %>%
  filter(country == "Malawi") %>%
  ggplot(aes(x = year, y = lifeExp)) +
  geom_line(colour = "#1380A1", size = 1) +
* geom_point(size = 2)
```

![](intro-to-tidyverse_files/figure-html/unnamed-chunk-90-1.png)<!-- -->

---

### `ggplot2`

```r
gapminder_df %>%
* filter(country == "China" | country == "United States") %>%
  ggplot(aes(x = year, y = lifeExp, colour = country)) +
  geom_line(size = 1)
```

![](intro-to-tidyverse_files/figure-html/unnamed-chunk-91-1.png)<!-- -->

---

### `ggplot2`

```r
gapminder_df %>%
  filter(country == "China" | country == "United States") %>%
  ggplot(aes(x = year, y = lifeExp, colour = country,
*            size = pop)) +
  geom_point() +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
```

![](intro-to-tidyverse_files/figure-html/unnamed-chunk-92-1.png)<!-- -->

---

# `ggplot2` ```r bar_df <- gapminder_df %>% filter(year == 2007 &
continent == "Africa") %>% arrange(desc(lifeExp)) %>% head(5) (bars <- ggplot(bar_df, aes(x = country, y = lifeExp)) + geom_bar(stat = "identity")) ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-93-1.png)<!-- --> --- ### `ggplot2` ```r bar_df <- gapminder_df %>% filter(year == 2007 & continent == "Africa") %>% arrange(desc(lifeExp)) %>% head(5) (bars <- ggplot(bar_df, aes(x = country, y = lifeExp)) + * geom_col()) ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-94-1.png)<!-- --> --- ### `ggplot2` ```r grouped_bars <- ggplot(grouped_bar_df, aes(x = country, y = lifeExp, fill = as.factor(year))) + geom_bar(stat="identity", position="dodge") grouped_bars ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-96-1.png)<!-- --> --- ### ggplot2 ```r hist_plot <- gapminder_df %>% filter(year == 2007) %>% ggplot(aes(x = lifeExp)) + * geom_histogram(binwidth = 5, color = "white") hist_plot ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-97-1.png)<!-- --> --- ### ggplot2 ```r hist_plot + labs(x = "Life Expectancy (Years)", y = "Count", title = "Life expectancy", subtitle = "Year = 2007", caption = "Source: Gapminder") ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-98-1.png)<!-- --> --- ### ggplot2 ```r ggsave("life_exp_2007.png", hist_plot, height = 4, width = 6, units = "in", dpi = 450) ``` -- ```r knitr::include_graphics("life_exp_2007.png") ``` <img src="life_exp_2007.png" width="600px" /> --- ### ggplot2 Unlimited customization via theme! 
```r
theme(line, rect, text, title, aspect.ratio,
  axis.title, axis.title.x, axis.title.x.top, axis.title.x.bottom,
  axis.title.y, axis.title.y.left, axis.title.y.right,
  axis.text, axis.text.x, axis.text.x.top, axis.text.x.bottom,
  axis.text.y, axis.text.y.left, axis.text.y.right,
  axis.ticks, axis.ticks.x, axis.ticks.x.top, axis.ticks.x.bottom,
  axis.ticks.y, axis.ticks.y.left, axis.ticks.y.right,
  axis.ticks.length, axis.ticks.length.x, axis.ticks.length.x.top,
  axis.ticks.length.x.bottom, axis.ticks.length.y, axis.ticks.length.y.left,
  axis.ticks.length.y.right,
  axis.line, axis.line.x, axis.line.x.top, axis.line.x.bottom,
  axis.line.y, axis.line.y.left, axis.line.y.right,
  legend.background, legend.margin, legend.spacing, legend.spacing.x,
  legend.spacing.y, legend.key, legend.key.size, legend.key.height,
  legend.key.width, legend.text, legend.text.align, legend.title,
  legend.title.align, legend.position, legend.direction, legend.justification,
  legend.box, legend.box.just, legend.box.margin, legend.box.background,
  legend.box.spacing,
  panel.background, panel.border, panel.spacing, panel.spacing.x,
  panel.spacing.y, panel.grid, panel.grid.major, panel.grid.minor,
  panel.grid.major.x, panel.grid.major.y, panel.grid.minor.x,
  panel.grid.minor.y, panel.ontop,
  plot.background, plot.title, plot.subtitle, plot.caption, plot.tag,
  plot.tag.position, plot.margin,
  strip.background, strip.background.x, strip.background.y, strip.placement,
  strip.text, strip.text.x, strip.text.y, strip.switch.pad.grid,
  strip.switch.pad.wrap,
  ..., complete = FALSE, validate = TRUE)
```

---

### `ggplot2`

.pull-left[

Saved themes apply multiple theme elements all at once

* Saves writing
* Reproducibility

Built-in themes

* `theme_grey()` # default
* `theme_bw()`
* `theme_minimal()`
* `theme_classic()`

]

--

.pull-right[

New packages

* `bbplot`
  * `bbc_style()`
* `urbnthemes`
  * `theme_urbn_print()`
* `ggthemes`
  * `theme_few()`
  * `theme_excel()`
  * `theme_economist()`
* Build your own package/theme!
] --- ### `ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + * theme_grey() ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-101-1.png)<!-- --> --- ### `ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + * theme_bw() ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-102-1.png)<!-- --> --- ### `ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + * theme_minimal() ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-103-1.png)<!-- --> --- ### `ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + * theme_classic() ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-104-1.png)<!-- --> --- ### `ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + * ggthemes::theme_economist_white() ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-105-1.png)<!-- --> --- ### `ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + * ggthemes::theme_few() ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-106-1.png)<!-- --> --- ### `ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + geom_hline(yintercept = 0, size = 1, colour="#333333") + * bbplot::bbc_style() ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-107-1.png)<!-- --> --- ### 
`ggplot2` ```r gapminder_df %>% filter(country == "China" | country == "United States") %>% ggplot(aes(x = year, y = lifeExp, colour = country)) + geom_line(size = 1) + geom_hline(yintercept = 0, size = 1, colour="#333333") + * theme_minimal() + * theme(panel.grid.major.x = element_blank(), * panel.grid.minor.x = element_blank()) ``` ![](intro-to-tidyverse_files/figure-html/unnamed-chunk-108-1.png)<!-- --> --- # Whirlwind of information! `tidyverse` focuses on `tidy` data - `Tidyverse` - Read data in with `readr` - Tidy data with `tidyr` - Transform data with `dplyr` - Plot data with `ggplot2` -- - Next Steps * [R for Data Science book (free!)](https://r4ds.had.co.nz/) * [RStudio Cloud Primers (free!)](https://rstudio.cloud/learn/primers)
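
---

### Putting it together

The tidy, transform, and plot steps from this deck can be chained in one short script. The sketch below rebuilds `untidy_df` by hand from the values printed on the "Tidy the data" slides (so there is no `readr` step here), and the names `income_summary` and `income_plot` are assumptions introduced only for this example. It assumes `dplyr`, `tidyr`, `tibble`, and `ggplot2` are installed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rebuild the `untidy_df` table printed earlier in the deck
untidy_df <- tibble::tribble(
  ~age_group, ~male_2016, ~female_2016, ~male_2017, ~female_2017, ~male_2018, ~female_2018,
  "< 18",  22000, 20000, 22000, 20000, 22000, 20000,
  "18-30", 36000, 35000, 36000, 35000, 36000, 35000,
  "31-50", 50000, 40000, 50000, 40000, 50000, 40000,
  "51-60", 62000, 60000, 62000, 60000, 62000, 60000,
  "> 60",  75000, 72000, 75000, 72000, 75000, 72000
)

# Tidy: one row per age group, gender, and year
tidy_df <- untidy_df %>%
  pivot_longer(
    cols = male_2016:female_2018,
    names_to = c("gender", "year"),
    names_pattern = "(.*)_(.*)",
    values_to = "income"
  )

# Transform: mean income and row count per gender
income_summary <- tidy_df %>%
  group_by(gender) %>%
  summarize(mean_income = mean(income), n = n())

# Plot: one bar per gender (print `income_plot` to draw it)
income_plot <- ggplot(income_summary, aes(x = gender, y = mean_income)) +
  geom_col()
```

Keeping each stage in its own named object makes it easy to inspect the intermediate tables with `glimpse()` before plotting.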