Hi,

In addition to Rainer's suggestion (which is to give a small example of what your input data looks like and an example of what you want as output), given the size of your input data, you might want to try the data.table package instead of plyr::ddply -- especially while you are exploring different combinations/calculations over your data.
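Since I haven't seen your data, here is only a rough sketch of what the group-wise idiom looks like in data.table. The toy data, the column names (V1, V2, V9) and the summary computed (a per-group mean and row count) are all made up for illustration; swap in your own columns and function.

```r
library(data.table)

# Hypothetical stand-in for the real data: the grouping column plays the
# role of "column 9" (called V9 here); V1 and V2 are made-up value columns.
set.seed(1)
n <- 1000
dat <- data.frame(
  V1 = rnorm(n),
  V2 = runif(n),
  V9 = sample(c("a", "b", "c"), n, replace = TRUE)
)

# The plyr version would look roughly like this (not run here):
#   plyr::ddply(dat, "V9", plyr::summarise, mean_v1 = mean(V1), n = length(V1))

# data.table version: convert once, then compute per-group summaries.
# `by = V9` splits the rows by V9; `.N` is the number of rows in each group.
dt <- as.data.table(dat)
res <- dt[, list(mean_v1 = mean(V1), n = .N), by = V9]
print(res)
```

Any function that takes the columns of one subset and returns a value (or a list of values) can go where mean(V1) is, so the same pattern should carry over to whatever you want to compute per group.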
Usually, the equivalent data.table approach (to the ddply one) tends to be orders of magnitude faster and more memory efficient. When my data is small, I often use both (I think the plyr/ddply "language" is rather beautiful), but when my data gets into the 1000++ rows, I'll universally switch to data.table.

HTH,
-steve

On Sat, Aug 17, 2013 at 4:33 PM, Dylan Doyle <ddoyle....@gmail.com> wrote:
>
> Hello R users,
>
> I have recently begun a project to analyze a large data set of approximately
> 1.5 million rows; it also has 9 columns. My objective consists of locating
> particular subsets within this data, i.e. take all rows with the same value in
> column 9 and perform a function on that subset. It was suggested to me that I
> use the ddply() function from the plyr package. Any advice would be greatly
> appreciated.
>
> Thanks much,
>
> Dylan
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech