The data file begins this way:

year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:

library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',', stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
    group_by(year, month) %>%
    summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:

source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:

head(vel)
  year month day hour min  fps
1 2016     3   3   12   0 1.74
2 2016     3   3   12  10 1.75
3 2016     3   3   12  20 1.76
4 2016     3   3   12  30 1.81
5 2016     3   3   12  40 1.79
6 2016     3   3   12  50 1.75

and the resulting grouping:

vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
    year month   flow
   <int> <int>  <dbl>
 1     0    NA NaN
 2  2016     3   2.40
 3  2016     4   3.00
 4  2016     5   2.86
 5  2016     6   2.51
 6  2016     7   2.18
 7  2016     8   1.89
 8  2016     9   1.38
 9  2016    10   1.73
10  2016    11   2.01
# … with 57 more rows

I cannot find why line 1 is there. Other data sets don't produce this result.

TIA,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
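
The three "NAs introduced by coercion" warnings suggest that at least some fields in the file do not parse as numbers. A minimal sketch of one way to locate such rows follows, in base R only. It assumes the cause is a malformed line somewhere in the file, which the post does not confirm. The sample text is hypothetical: the first two data rows are taken from the listing above, and the third is an invented bad line used only to show what the check reports. The names txt, raw, num, and bad are illustrative.

```r
# Hypothetical sample: the first two data rows come from the post; the third
# row is an invented malformed line used only to illustrate the check.
txt <- "year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
0,,,,,
"

# Read every column as character so nothing is coerced while reading.
raw <- read.csv(text = txt, header = TRUE, colClasses = "character")

# Convert each column to numeric, silencing the coercion warnings
# because the NAs they announce are exactly what we want to find.
num <- suppressWarnings(as.data.frame(lapply(raw, as.numeric)))

# A row is suspect if any of its fields failed to convert.
bad <- which(!complete.cases(num))

# Show the suspect rows as they were read from the file.
# (Row index + 1 is the line number in the file, because of the header.)
print(raw[bad, , drop = FALSE])
```

To run the same check against the real file, replace the text = txt argument with the file path used in the script. If bad comes back empty, the malformed-line assumption does not hold and the cause lies elsewhere.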