Never use stringsAsFactors = TRUE on uncleaned data. For one thing, if you give a factor to as.numeric it converts the integer codes of the levels, not the character representation, which is what happened to your cfs column.
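A minimal sketch of the factor pitfall, using made-up cfs strings. The "n/a" entry is a hypothetical stand-in for whatever non-numeric debris makes read.table() turn a column into a factor when stringsAsFactors = TRUE. as.numeric() on the factor returns level codes, while going through as.character() first recovers the numbers.

```r
# Hypothetical cfs strings; "n/a" stands in for unclean entries
cfs_factor <- factor(c("136000", "126000", "130000", "n/a"))

# as.numeric() on a factor returns the integer level codes
# (levels sort as "126000", "130000", "136000", "n/a")
codes <- as.numeric(cfs_factor)

# Converting through character recovers the actual numbers;
# the non-numeric entry becomes NA (coercion warning suppressed here)
values <- suppressWarnings(as.numeric(as.character(cfs_factor)))

print(codes)   # level codes: 3 1 2 4
print(values)
```

With stringsAsFactors = FALSE the column arrives as character, so as.numeric(discharge$cfs) alone gives the real values, turning the junk entries into NA with a warning.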
library(dplyr)

dta <- read.csv( text =
"sampdate,samptime,cfs
2020-08-26,09:30,136000
2020-08-26,09:35,126000
2020-08-26,09:40,130000
2020-08-26,09:45,128000
2020-08-26,09:50,126000
2020-08-26,09:55,125000
2020-08-26,10:00,121000
2020-08-26,10:05,117000
2020-08-26,10:10,120000
", stringsAsFactors = FALSE)

dtad <- ( dta
        %>% group_by( sampdate )
        %>% summarise( exp_value = mean(cfs, na.rm = TRUE)
                     , Count = n()
                     )
        )

On August 31, 2021 2:11:05 PM PDT, Rich Shepard <rshep...@appl-ecosys.com> wrote:
>On Sun, 29 Aug 2021, Jeff Newmiller wrote:
>
>> The general idea is to create a "grouping" column with repeated values for
>> each day, and then to use aggregate to compute your combined results. The
>> dplyr package's group_by/summarise functions can also do this, and there
>> are also proponents of the data.table package which is high performance
>> but tends to depend on altering data in-place unlike most other R data
>> handling functions.
>
>Jeff,
>
>I've read a number of docs discussing dplyr's summarize and group_by
>functions (including that section of Hadley's 'R for Data Science' book), yet
>I'm missing something; I think that I need to separate the single sampdate
>column into columns for year, month, and day and group_by year/month
>summarizing within those groups.
>
>The data are of this format:
>sampdate,samptime,cfs
>2020-08-26,09:30,136000
>2020-08-26,09:35,126000
>2020-08-26,09:40,130000
>2020-08-26,09:45,128000
>2020-08-26,09:50,126000
>2020-08-26,09:55,125000
>2020-08-26,10:00,121000
>2020-08-26,10:05,117000
>2020-08-26,10:10,120000
>
>My current script is:
>
>-------8<--------------
>library('tidyverse')
>
>discharge <- read.table('../data/discharge.dat', header = TRUE, sep = ',',
>stringsAsFactors = TRUE)
>discharge$sampdate <- as.Date(discharge$sampdate)
>discharge$cfs <- as.numeric(discharge$cfs, length = 6)
>
># use dplyr.summarize grouped by date
>
># need to separate sampdate into %Y-%M-%D in order to group_by the month?
>by_month <- discharge %>%
>  group_by(sampdate ...
>summarize(by_month, exp_value = mean(cfs, na.rm = TRUE), sd(cfs))
>---------------->8--------
>
>and the results are:
>
>> str(discharge)
>'data.frame': 93254 obs. of 3 variables:
> $ sampdate: Date, format: "2020-08-26" "2020-08-26" ...
> $ samptime: Factor w/ 728 levels "00:00","00:05",..: 115 116 117 118 123 128
>133 138 143 148 ...
> $ cfs     : num 176 156 165 161 156 154 144 137 142 142 ...
>> ls()
>[1] "by_month"  "discharge"
>> by_month
># A tibble: 93,254 × 3
># Groups:   sampdate [322]
>   sampdate   samptime   cfs
>   <date>     <fct>    <dbl>
> 1 2020-08-26 09:30      176
> 2 2020-08-26 09:35      156
> 3 2020-08-26 09:40      165
> 4 2020-08-26 09:45      161
> 5 2020-08-26 09:50      156
> 6 2020-08-26 09:55      154
> 7 2020-08-26 10:00      144
> 8 2020-08-26 10:05      137
> 9 2020-08-26 10:10      142
>10 2020-08-26 10:15      142
># … with 93,244 more rows
>
>I don't know why the discharge values are truncated to 3 digits when they're
>6 digits in the input data.
>
>Suggested readings appreciated,
>
>Rich
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.
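Since the question is about monthly summaries, here is a minimal base-R sketch of the grouping-column idea, using aggregate() and applied to months instead of days. The small data set is made up for illustration, and the column names month, exp_value, sd_cfs and Count are arbitrary choices. Note that in R's date format codes %m is the month, while %M is minutes.

```r
# Hypothetical readings spanning two months (not the real discharge.dat)
discharge <- read.csv(text = "sampdate,samptime,cfs
2020-08-26,09:30,136000
2020-08-26,09:35,126000
2020-09-01,09:30,120000
2020-09-01,09:35,110000
", stringsAsFactors = FALSE)

discharge$sampdate <- as.Date(discharge$sampdate)
discharge$cfs <- as.numeric(discharge$cfs)

# Grouping column: one repeated value per calendar month, e.g. "2020-08"
discharge$month <- format(discharge$sampdate, "%Y-%m")

# The formula interface drops rows with NA cfs by default (na.action = na.omit)
by_month <- aggregate(cfs ~ month, data = discharge, FUN = mean)
names(by_month)[names(by_month) == "cfs"] <- "exp_value"

# Same grouping and ordering, so these rows line up with by_month
by_month$sd_cfs <- aggregate(cfs ~ month, data = discharge, FUN = sd)$cfs
by_month$Count  <- aggregate(cfs ~ month, data = discharge, FUN = length)$cfs

print(by_month)
```

The same month column can be used in the dplyr pipeline above, as group_by( month ) in place of group_by( sampdate ).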