Inline. Bert
On Mon, Sep 17, 2018 at 11:54 AM Rich Shepard <rshep...@appl-ecosys.com> wrote:

> My dataframe has 113K rows split by a factor into 58 separate data.frames,
> with a different numbers of rows (see error output below).
>
> I cannot think of a way of proving a sample of data; if a sample for a MWE
> is desired advice on produing one using dput() is needed.
>

This is gibberish. What does "proving a sample of data" mean? etc. Please
proofread and edit.

>
> To summarize each group within this dataframe I'm using by() and getting
> an error because of the different number of rows:
>
> > by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
> + mean.rain <- mean(rainfall_by_site[, 'prcp'])
> + })
>

You are misspecifying your function. It has argument x, but you do not use
x in your function. Also, the assignment at the end is unnecessary and
probably wrong for your use case. Please go through a tutorial on how to
write functions in R.

You are probably also misusing by(), but as you did not provide sufficient
information (head(your_data_frame) or similar would have told us its
structure, rather than having us guess) nor a reproducible example, it's
hard (for me) to figure out your intent. **PLEASE** follow the posting guide
and provide such information. You have been requested to do this several
times already.
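A minimal sketch of a corrected call, assuming (as the code in the post suggests) that rainfall_by_site has a site column 'name' and a numeric rainfall column 'prcp'. The toy values are invented for illustration; they are not the real data. It also shows the dput(head(...)) idiom for posting a small sample.

```r
## Toy stand-in for the real data: the column names 'name' and 'prcp' are
## taken from the post; the sites and values are invented for illustration.
rainfall_by_site <- data.frame(
  name = c("site1", "site1", "site2", "site2", "site2", "site3"),
  prcp = c(0.2, 0.4, 1.0, 0.0, 0.5, 0.3)
)

## Prints a pasteable structure() of the first rows, suitable for a post
dput(head(rainfall_by_site))

## by() hands each group's rows to the function as x, so use x rather than
## the whole data frame, and simply return the value (no assignment inside)
mean.rain <- by(rainfall_by_site, rainfall_by_site[, "name"],
                function(x) mean(x[, "prcp"]))
mean.rain
```

The three toy groups have 2, 3 and 1 rows; unequal group sizes are not a problem for by() once the function works on x.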
Here is the sort of thing I think you wanted to do:

> set.seed(54321) ## for reproducibility
> df <- data.frame(f = sample(LETTERS[1:3], 12, rep = TRUE), y = runif(12))
> df
   f          y
1  B 0.04529991
2  B 0.65272100
3  A 0.99406601
4  A 0.67763735
5  A 0.91854517
6  C 0.46244494
7  A 0.57141480
8  A 0.45193882
9  B 0.16770701
10 B 0.06826135
11 A 0.89691069
12 C 0.27383703
> by(df, df$f, function(x)mean(x$y))
df$f: A
[1] 0.7517521
------------------------------------------------------
df$f: B
[1] 0.2334973
------------------------------------------------------
df$f: C
[1] 0.368141

Note that you do not first break up the df into separate df's, which sounds
like what you tried to do.

However, note that if all you want to do is summarize a *single* numeric
column by a factor, you do not need to use by() at all, which is designed to
work on (several columns of) the whole data frame simultaneously. For a
single column, tapply() is all you need (or, as Duncan noted, functionality
in the dplyr package):

> with(df,tapply(y,f,mean))
        A         B         C
0.7517521 0.2334973 0.3681410

Finally, if I have misunderstood your intent, my apologies. I tried.

-- Bert

> mean.rain <- by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
> + mean.rain <- mean(rainfall_by_site[, 'prcp'])
> + })
> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
>   arguments imply differing number of rows: 4900, 1085, 1894, 2844, 3520,
> 647, 239, 3652, 3701, 3063, 176, 4713, 4887, 119, 165, 1221, 3358, 1457,
> 4896, 166, 690, 1110, 212, 1727, 227, 236, 1175, 1485, 186, 769, 139, 203,
> 2727, 4357, 1035, 1329, 1454, 973, 4536, 208, 350, 125, 3437, 731, 4894,
> 2598, 2419, 752, 427, 136, 685, 4849, 914, 171
>
> My web searches have not found anything relevant; perhaps my search terms
> (such as 'R: apply by() with different factor row numbers') can be
> improved.
>
> The help pages found using apropos('by') appear the same: ?by,
> ?by.data.frame, ?by.default and provide no hint on how to work with unequal
> rows per factor.
>
> How can I apply by() on these data.frames?
>
> Rich
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.