Rich,

Did I miss something? The summarise() command is telling you that you had not explicitly grouped the data, so it made a guess. The canonical way is:
... %>% group_by(year, month, day, hour) %>% summarise(...)

You decide which fields to group by, sometimes including others so they are in the output.

Avi

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 4:53 PM
To: r-help@r-project.org
Subject: [R] tidyverse: grouped summaries (with summerize)

I changed the data files so the date-times are in five separate columns: year, month, day, hour, and minute; for example,

year,month,day,hour,min,cfs
2016,03,03,12,00,149000
2016,03,03,12,10,150000
2016,03,03,12,20,151000
2016,03,03,12,30,156000
2016,03,03,12,40,154000
2016,03,03,12,50,150000
2016,03,03,13,00,153000
2016,03,03,13,10,156000
2016,03,03,13,20,154000

The script is based on the example (on page 59 of 'R for Data Science'):

library('tidyverse')
disc <- read.csv('../data/water/disc.dat', header = TRUE, sep = ',', stringsAsFactors = FALSE)
disc$year <- as.integer(disc$year)
disc$month <- as.integer(disc$month)
disc$day <- as.integer(disc$day)
disc$hour <- as.integer(disc$hour)
disc$min <- as.integer(disc$min)
disc$cfs <- as.double(disc$cfs, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means, sds
disc_by_month <- group_by(disc, year, month)
summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

but my syntax is off because the results are:

> source('disc.R')
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
> ls()
[1] "disc"          "disc_by_month"
> disc_by_month
# A tibble: 590,940 × 6
# Groups:   year, month [66]
    year month   day  hour   min    cfs
   <int> <int> <int> <int> <int>  <dbl>
 1  2016     3     3    12     0 149000
 2  2016     3     3    12    10 150000
 3  2016     3     3    12    20 151000
 4  2016     3     3    12    30 156000
 5  2016     3     3    12    40 154000
 6  2016     3     3    12    50 150000
 7  2016     3     3    13     0 153000
 8  2016     3     3    13    10 156000
 9  2016     3     3    13    20 154000
10  2016     3     3    13    30 155000
# … with 590,930 more rows

I have the same results if I use as.numeric rather than as.integer and as.double.

What am I doing incorrectly?

TIA,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
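
For reference, here is a minimal sketch of the grouped monthly summary discussed in this thread. It uses a small made-up data frame in the same column layout as the sample above; the April rows and the names `monthly` and `vol_sd` are illustrative and not from the original post. It may also be relevant that source() does not auto-print top-level values by default, so a summarize() result that is neither assigned nor printed would not appear on the console, while the informational message about grouping still would. Setting `.groups` states the grouping of the result explicitly, which silences that message.

```r
library(dplyr)

# Hypothetical sample in the same layout as the posted data
# (year, month, day, hour, min, cfs); values are for illustration only.
disc <- data.frame(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  day   = c(3L, 3L, 1L, 1L),
  hour  = c(12L, 13L, 0L, 1L),
  min   = c(0L, 0L, 0L, 0L),
  cfs   = c(149000, 153000, 120000, 124000)
)

# Group by year and month, then compute the monthly mean and sd of cfs.
# .groups = "drop" returns an ungrouped result and suppresses the
# "has grouped output by 'year'" message.
monthly <- disc %>%
  group_by(year, month) %>%
  summarise(
    vol    = mean(cfs, na.rm = TRUE),
    vol_sd = sd(cfs, na.rm = TRUE),
    .groups = "drop"
  )

# Assign and print explicitly so the result shows up under source().
print(monthly)
```

Grouping by additional fields, as in group_by(year, month, day, hour), would give one row per hour instead of one row per month.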