Hi,

In addition to Rainer's suggestion (which is to give a small example
of what your input data look like and an example of the output you
want), given the size of your input data you might want to try the
data.table package instead of plyr::ddply -- especially while you are
exploring different combinations/calculations over your data.

The equivalent data.table approach (to the ddply one) tends to be
orders of magnitude faster and usually more memory efficient.
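
For instance, here is a minimal sketch of the same grouped summary
done both ways (the column names `grp` and `value` are made up for
illustration -- substitute your own):

library(plyr)
library(data.table)

## toy data: one grouping column and one numeric column
set.seed(1)
df <- data.frame(grp   = sample(letters[1:5], 1e4, replace = TRUE),
                 value = rnorm(1e4))

## plyr way: split on grp and summarise each piece
res.plyr <- ddply(df, .(grp), summarise, mean.value = mean(value))

## data.table way: the same grouped summary, typically much faster on
## large data
dt <- as.data.table(df)
res.dt <- dt[, list(mean.value = mean(value)), by = grp]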

When my data are small, I often use both (I think the plyr/ddply
"language" is rather beautiful), but once the data get past a few
thousand rows I switch to data.table almost without exception.

HTH,
-steve


On Sat, Aug 17, 2013 at 4:33 PM, Dylan Doyle <ddoyle....@gmail.com> wrote:
>
> Hello R users,
>
>
> I have recently begun a project to analyze a large data set of approximately
> 1.5 million rows and 9 columns. My objective is to locate particular subsets
> within this data, i.e., take all rows with the same value in column 9 and
> perform a function on that subset. It was suggested to me that I use the
> ddply() function from the plyr package. Any advice would be greatly
> appreciated.
>
>
> Thanks much,
>
> Dylan
>



-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
