On Wed, 9 Dec 2009, Peng Yu wrote:

On Tue, Dec 8, 2009 at 11:06 PM, David Winsemius <dwinsem...@comcast.net> wrote:

On Dec 9, 2009, at 12:00 AM, Peng Yu wrote:

On Tue, Dec 8, 2009 at 10:37 PM, David Winsemius <dwinsem...@comcast.net> wrote:

On Dec 8, 2009, at 11:28 PM, Peng Yu wrote:

I have the following code, which times split() on a data.frame and
split() on each column (as a vector) separately. The runtimes differ
by about a factor of 10. As m and k increase, the difference becomes
even bigger.

I'm wondering why the performance on data.frame is so bad. Is it a bug
in R? Can it be improved?

You might want to look at the data.table package. The author claims
significant speed improvements over data.frames.

This bug was found a long time ago and a package has been
developed for it. Should the fix be integrated into data.frame rather
than implemented in an additional package?

What bug?

Is the slow speed in splitting a data.frame a performance bug?


NO!

The two computations are not equivalent.

One is a list whose elements are split vectors, and the other is a list of data.frames containing those vectors.

If you take the trouble to assemble that list of data frames from the list of split vectors you will see that it is very time consuming.

Read up on memory management issues. Think about what the computer actually has to do in terms of memory access to split a data.frame versus split a vector.
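
A minimal sketch of that point, reusing m, n, k, x and f from the original post. The names by_column, by_frame, groups and rebuilt are hypothetical, chosen only for this illustration. It shows that the two results have different shapes, and it times the extra step of assembling one data frame per group from the split vectors. Timings will differ by machine and R version, so none are quoted here.

```r
m <- 30000
n <- 6
k <- 3000

set.seed(0)
x <- replicate(n, rnorm(m))                  # m-by-n numeric matrix
f <- sample(1:k, size = m, replace = TRUE)   # grouping variable

# Column-wise split: a list of n elements, each itself a list holding
# one plain numeric vector per group.
by_column <- lapply(seq_len(ncol(x)), function(i) split(x[, i], f))
names(by_column) <- paste0("V", seq_len(ncol(x)))  # match as.data.frame(x)

# data.frame split: a list with one data.frame per group.
by_frame <- split(as.data.frame(x), f)

# The step the column-wise version skips: build a data.frame for every
# group out of the already-split vectors.
groups <- names(by_column[[1]])
print(system.time(
  rebuilt <- lapply(groups, function(g) {
    as.data.frame(lapply(by_column, function(col) col[[g]]))
  })
))
names(rebuilt) <- groups

# Same values per group, but only after the extra assembly work.
g1 <- groups[1]
print(all.equal(rebuilt[[g1]]$V1, by_frame[[g1]]$V1))
```

Comparing str(by_column[[1]][[1]]) with str(by_frame[[1]]) makes the difference in what each call returns easy to see.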

---

And even if it were simply a matter of having code that is slow for some application, that would not be a bug. Read the FAQ!

Chuck




David.

> system.time(split(as.data.frame(x),f))
   user  system elapsed
  1.700   0.010   1.786
> system.time(lapply(
+         1:dim(x)[[2]]
+         , function(i) {
+           split(x[,i],f)
+         }
+         )
+     )
   user  system elapsed
  0.170   0.000   0.167

###########
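m=30000   # number of rows
n=6       # number of columns
k=3000    # number of groups

set.seed(0)
x=replicate(n,rnorm(m))
f=sample(1:k, size=m, replace=TRUE)

# split the whole data.frame by f
system.time(split(as.data.frame(x),f))

# split each column (as a vector) by f
system.time(lapply(
     1:dim(x)[[2]]
     , function(i) {
       split(x[,i],f)
     }
     )
 )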

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT





Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
