Hello,

I have a data frame with 40 columns and around 450,000 rows. The first column is a factor id and the remaining columns are numeric. Some rows share the same id. I want to merge each set of rows sharing an id (an id set) into a single row (a summarizing row) with that id. To build the summarizing row, I'd like to apply a different function to each of the original columns in the id set: some columns in the summarizing row will equal the mean of that column over the id set, others the minimum, and others the maximum.
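
To make the desired result concrete, here is a minimal sketch on a toy data frame. The column names (id, x, y, z) and the assignment of mean, min and max to particular columns are made up for illustration; the real data has 40 columns. It uses base R's tapply() only to show the intended output, and makes no claim about speed or memory use at 450,000 rows.

```r
# Toy data: one factor id column plus numeric columns (hypothetical names).
df <- data.frame(
  id = factor(c("a", "a", "b", "c", "c", "c")),
  x  = c(1, 3, 5, 2, 4, 6),   # to be summarized by mean
  y  = c(9, 7, 5, 3, 1, 8),   # to be summarized by min
  z  = c(2, 8, 4, 6, 0, 1)    # to be summarized by max
)

# One summary function per numeric column (an assumed mapping).
funs <- list(x = mean, y = min, z = max)

# Summarize each column by id with its own function.
cols <- lapply(names(funs), function(nm) {
  as.vector(tapply(df[[nm]], df$id, funs[[nm]]))
})
names(cols) <- names(funs)

# One summarizing row per id, in the order of the factor levels.
summary_df <- data.frame(id = levels(df$id), as.data.frame(cols))
print(summary_df)
```

Each id set (for example the three rows with id "c") collapses to one row in summary_df, with x holding the mean, y the minimum and z the maximum over that set.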

To do this, I tried using the by() function. However, this was extremely slow (it ran for more than two hours before I stopped it), and it used up all 16 GB of memory on my machine. Is there a function that is more efficient, in terms of both time and memory, for this sort of thing?

Thank you very much,
Schraga Schwartz
