If I follow what you are trying to do, you want the mean of z for each value of y.
tapply(df$z, df$y, mean)

> On Nov 17, 2021, at 8:20 AM, Luigi Marongiu <marongiu.lu...@gmail.com> wrote:
>
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
>                 y = rep(letters[1:5],3),
>                 z = rnorm(15),
>                 stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
>   s = df[df$y == i,]
>   m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
> > (m)
>          a          b          c          d          e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael’s Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
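
P.S. A minimal sketch, assuming the example data frame from the original post, that checks the `tapply` call against the loop. The `set.seed(1)` call and the names `m_loop` and `m_tapply` are not in the original; they are added here only to make the comparison reproducible and readable.

```r
# Rebuild the example data from the original post.
# set.seed() is an assumption added for reproducibility.
set.seed(1)
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# The loop from the original post: subset once per group, then average.
m_loop <- vector()
for (i in unique(df$y)) {
  s <- df[df$y == i, ]
  m_loop <- append(m_loop, mean(s$z))
}
names(m_loop) <- unique(df$y)

# The vectorized version: one call that splits z by y and applies mean().
m_tapply <- tapply(df$z, df$y, mean)

# tapply() orders groups by sorted factor level, while the loop uses order
# of first appearance, so align by name before comparing.
same <- all.equal(unname(m_loop[names(m_tapply)]), as.vector(m_tapply))
print(same)
```

The loop rescans all rows of `df` for every distinct `y` and grows `m` one element at a time, so its cost rises steeply with the number of groups. `tapply` splits the data once, which is where most of the speed-up should come from.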