----------------------------------------
> From: had...@rice.edu
> Date: Mon, 7 Feb 2011 11:00:59 -0600
> To: mdo...@mdowle.plus.com
> CC: r-h...@stat.math.ethz.ch
> Subject: Re: [R] aggregate function - na.action
>
> > Does FAQ 1.8 answer that ok ?
> > "Ok, I'm starting to see what data.table is about, but why didn't you
> > enhance data.frame in R? Why does it have to be a new package?"
> > http://datatable.r-forge.r-project.org/datatable-faq.pdf
>
> Kind of. I think there are two sets of features data.table provides:
>
> * a compact syntax for expressing many common data manipulations
> * high performance data manipulation
>
> FAQ 1.8 answers the question for the syntax, but not for the
> performance related features.
>
> Basically, I'd love to be able to use the high performance components
> of data table in plyr, but keep using my existing syntax. Currently
> the only way to do that is for me to dig into your C code to
> understand why it's fast, and then implement those ideas in plyr.
Without looking (the original subject would have caused me to miss most of this
thread), the usual problem is data structures that don't know about the
algorithm's access patterns, or that are characterized by nothing beyond the
order of an operation on some kind of collection (O(n) to access, for example).
I think the author suggested page loading time as a contributing factor, IIRC,
and that would be great news, since it is one of my personal rants :) People
complain about "running out of memory", but it is unlikely that your algorithm
just randomly picks one of those "billions and billions" of bits after the
prior memory operation. Cache-aware structures and algorithms can be a big
deal; see, for example, the many good white papers on the Intel site. Tables
generally connote random access, but usually you just want to stream the data
or, hopefully, operate on local blocks. Long before VM thrashing sets in,
low-level cache pollution can become a problem.
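To make "operate on local blocks" concrete, here is a rough sketch in Python (not R, and not anything taken from data.table or plyr). The names blocks() and blocked_mean() and the default block size are made up for illustration. It only shows the access pattern: one forward pass over contiguous chunks, with no random indexing into the whole collection.

```python
from itertools import islice


def blocks(values, block_size):
    """Yield consecutive lists of at most `block_size` items from `values`."""
    it = iter(values)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def blocked_mean(values, block_size=4096):
    """Mean computed one contiguous block at a time, in a single forward pass.

    The block size is an arbitrary illustrative default, not a tuned value.
    """
    total = 0.0
    count = 0
    for block in blocks(values, block_size):
        total += sum(block)   # touches only the current block
        count += len(block)
    return total / count if count else float("nan")
```

Only one block is held at a time, so the same loop works whether `values` is an in-memory list or something read lazily from disk.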
Personally, I've always thought a streaming source would be nice. I'm not sure
whether you would want a prefetch() or similar interface signature to let your
algorithm prepare your structs, etc.
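As a rough idea of what such an interface might look like, here is a minimal Python sketch. Everything in it is hypothetical: prefetching(), its depth parameter, and grouped_sum() are names invented for this example and do not correspond to any existing R or data.table API. A worker thread reads a few chunks ahead of the consumer, and the consumer aggregates chunk by chunk.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetching(source, depth=2):
    """Iterate over `source`, reading up to `depth` items ahead in a worker thread.

    Sketch only: an error raised by `source` ends the stream early instead of
    being re-raised in the consumer.
    """
    buf = queue.Queue(maxsize=depth)

    def fill():
        try:
            for item in source:
                buf.put(item)   # blocks while the buffer is full
        finally:
            buf.put(_DONE)      # always signal the end so the consumer cannot hang

    threading.Thread(target=fill, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        yield item


def grouped_sum(chunks):
    """Sum values per key over a stream of chunks of (key, value) pairs."""
    totals = {}
    for chunk in chunks:
        for key, value in chunk:
            totals[key] = totals.get(key, 0) + value
    return totals
```

Usage would be along the lines of grouped_sum(prefetching(read_chunks(path))), where read_chunks is whatever (hypothetical) generator yields blocks of rows from the file.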
>
> Hadley
>
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.