I think we wandered away into a package rather than base R, but the request 
seems easy enough.

Just FYI, Rich: you seem not to have incorporated the advice we gave about the
first argument yet, and your use of group_by() is a tad odd.

disc %>%
     group_by(hour) %>%
     group_by(day) %>%
     group_by(year, month) %>%
     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

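Not sure why you use disc once and then pass disc_by_month a second,
superfluous time inside summarize(); in a pipe the data already arrives as the
first argument. If you read the manual page for group_by() at
https://dplyr.tidyverse.org/reference/group_by.html you may note it is
normally called ONCE, with several arguments naming, in order, the columns of
the data.frame to group by. Each new call to group_by() replaces the previous
grouping unless you pass .add = TRUE, so in your chain only the last call,
group_by(year, month), has any effect.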

disc %>%
     group_by(hour, day, year, month) %>%
     summarize(vol = mean(cfs, na.rm = TRUE))
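A quick way to see what each style of grouping actually does is to ask dplyr
with group_vars(). The sketch below is only illustrative: the data frame toy is
hypothetical (Rich's column names, invented values), and the .add argument
assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for disc: same column names, made-up values.
toy <- tibble(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  day   = c(3L, 3L, 1L, 1L),
  hour  = c(12L, 13L, 12L, 13L),
  cfs   = c(149000, 153000, 120000, 122000)
)

# Chained calls: each group_by() replaces the previous grouping,
# so only the last call survives.
chained <- toy %>%
  group_by(hour) %>%
  group_by(day) %>%
  group_by(year, month)
group_vars(chained)   # "year" "month"

# If you really want to build groups up in steps, .add = TRUE keeps them.
added <- toy %>%
  group_by(hour) %>%
  group_by(day, .add = TRUE)
group_vars(added)     # "hour" "day"

# The usual idiom: one call listing every grouping column.
single <- toy %>% group_by(year, month, day, hour)
group_vars(single)    # "year" "month" "day" "hour"
```

So the three group_by() calls in the original pipe behave exactly like a single
group_by(year, month).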

Not sure most people would group in that order, as the result is sorted by
hour first. Many might reverse the sequence, i.e. group_by(year, month, day,
hour).
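Here is a sketch of the reversed order next to a plain monthly summary, again on
a hypothetical toy data frame with invented values. It assumes dplyr 1.0.0 or
later for the .groups argument of summarize(). Note that grouping by all four
columns gives one mean per hour, while grouping by year and month alone gives
one mean per month, which appears to be what the name disc_by_month is after.

```r
library(dplyr)

# Hypothetical toy data with Rich's column names and made-up values.
toy <- tibble(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  day   = c(3L, 3L, 1L, 1L),
  hour  = c(12L, 13L, 12L, 13L),
  cfs   = c(149000, 153000, 120000, 122000)
)

# Coarsest unit first: rows come out ordered by year, month, day, then hour,
# with one row per hour present in the data.
hourly <- toy %>%
  group_by(year, month, day, hour) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")

# Only year and month: one row per month.
monthly <- toy %>%
  group_by(year, month) %>%
  summarize(vol = mean(cfs, na.rm = TRUE), .groups = "drop")

print(hourly)
print(monthly)
```

Note that disc_by_month is not passed to summarize(); the pipe already supplies
the data.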

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Rich Shepard
Sent: Monday, September 13, 2021 6:32 PM
To: R mailing list <r-help@r-project.org>
Subject: Re: [R] tidyverse: grouped summaries (with summerize)

On Tue, 14 Sep 2021, Eric Berger wrote:

> This code is not correct:
> disc_by_month %>%
>     group_by(year, month) %>%
>     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
> It should be:
> disc %>% group_by(year,month) %>% summarize(vol=mean(cfs,na.rm=TRUE))

Eric/Avi:

That makes no difference:
> disc_by_month
# A tibble: 590,940 × 6
# Groups:   year, month [66]
     year month   day  hour   min    cfs
    <int> <int> <int> <int> <int>  <dbl>
  1  2016     3     3    12     0 149000
  2  2016     3     3    12    10 150000
  3  2016     3     3    12    20 151000
  4  2016     3     3    12    30 156000
  5  2016     3     3    12    40 154000
  6  2016     3     3    12    50 150000
  7  2016     3     3    13     0 153000
  8  2016     3     3    13    10 156000
  9  2016     3     3    13    20 154000
10  2016     3     3    13    30 155000
# … with 590,930 more rows

I wondered if I need to group first by hour, then day, then year-month.
This, too, produces the same output:

disc %>%
     group_by(hour) %>%
     group_by(day) %>%
     group_by(year, month) %>%
     summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))

And disc shows the read dataframe.

I don't understand why the columns are not grouping.

Thanks,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.