If I follow what you are trying to do, you want the mean of z for each value of 
y.

tapply(df$z, df$y, mean)


> On Nov 17, 2021, at 8:20 AM, Luigi Marongiu <marongiu.lu...@gmail.com> wrote:
> 
> Hello,
> I have a dataframe with 3 variables. I want to loop through it to get
> the mean value of the variable `z`, as follows:
> ```
> df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
> y = rep(letters[1:5],3),
> z = rnorm(15),
> stringsAsFactors = FALSE)
> m = vector()
> for (i in unique(df$y)) {
> s = df[df$y == i,]
> m = append(m, mean(s$z))
> }
> names(m) = unique(df$y)
>> (m)
> a          b          c          d          e
> -0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
> ```
> The problem is that I have one million `y` values, so the work takes
> almost a day. I understand that vectorization will speed up the
> procedure. But how shall I write the procedure in vectorial terms?
> Thank you
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael’s Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to