Re: [R] why is nrow() so slow?

hadley wickham Tue, 15 Sep 2009 09:01:19 -0700

On Tue, Sep 15, 2009 at 9:48 AM, ivo welch <ivo...@gmail.com> wrote:
> dear R wizards:  here is the strange question for the day.  It seems to me
> that nrow() is very slow.  Let me explain what I mean:
>
> ds= data.frame( NA, x=rnorm(10000) )   ##  a sample data set
>
>> system.time( { for (i in 1:10000) NA } )   ## doing nothing takes virtually no time
>   user  system elapsed
>  0.000   0.000   0.001
>
> ## this is something that should take time; we need to add 10,000 values 10,000 times
>> system.time( { for (i in 1:10000) mean(ds$x) } )
>   user  system elapsed
>  0.416   0.001   0.416
>
> ## alas, this should be very fast.  it is just reading off an attribute of ds.  it takes almost a quarter of the time of mean()!
>> system.time( { for (i in 1:10000) nrow(ds) } )
>   user  system elapsed
>  0.124   0.001   0.125


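I just encountered this same problem.  nrow() is slow on a data frame
because each call goes through several layers before it reaches the
row count.  Each line below is what the line above it evaluates to:

 nrow(df)
 dim(df)[1]
 dim.data.frame(df)[1]
 c(.row_names_info(df, 2L), length(df))[1]

If you call .row_names_info(df, 2L) directly, you skip the method
dispatch and the extra vector, and it's roughly 7 times faster
(0.187s vs 0.027s elapsed in the timings below).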
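As a quick sanity check of that chain, here is a minimal sketch (not part of the original message) that evaluates each layer on the sample data frame from the question and confirms that all of them return the same row count.  It assumes base R, where dim.data.frame is visible as an ordinary function; the variable names are only illustrative.

```r
# Sample data frame, built the same way as in the original question
ds <- data.frame(NA, x = rnorm(10000))

# Evaluate each layer of the call chain separately
n_nrow   <- nrow(ds)                  # top-level call
n_dim    <- dim(ds)[1L]               # what nrow() does internally
n_method <- dim.data.frame(ds)[1L]    # the S3 method dim() dispatches to
n_direct <- .row_names_info(ds, 2L)   # the call that actually counts rows

# All four should be the same integer
stopifnot(
  identical(n_nrow, n_dim),
  identical(n_dim, n_method),
  identical(n_method, n_direct)
)
```

If the stopifnot() passes silently, the shortcut returns exactly what nrow() returns for this data frame.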

> system.time( { for (i in 1:10000) nrow(ds) })
   user  system elapsed
  0.183   0.002   0.187

> system.time( { for (i in 1:10000) .row_names_info(ds, 2) })
   user  system elapsed
  0.026   0.000   0.027
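If you want the shortcut without scattering .row_names_info() calls through your code, one option is a small wrapper.  This is only a sketch: fast_nrow is a hypothetical name, not a base R function, and it falls back to nrow() for anything that is not a data frame.  The is.data.frame() check adds some overhead of its own, so the gain may be smaller than in the raw timings above, and the numbers will differ by machine and R version.

```r
# Hypothetical helper (not part of base R): row count of a data frame
# without going through dim() and its S3 dispatch
fast_nrow <- function(x) {
  if (is.data.frame(x)) {
    .row_names_info(x, 2L)  # type 2L: number of rows implied by row.names
  } else {
    nrow(x)                 # matrices and other objects: usual path
  }
}

ds <- data.frame(NA, x = rnorm(10000))

# Same loop style as the timings above
n_iter <- 10000
t_nrow <- system.time(for (i in seq_len(n_iter)) nrow(ds))
t_fast <- system.time(for (i in seq_len(n_iter)) fast_nrow(ds))

# Elapsed seconds for each variant
print(c(nrow = t_nrow[["elapsed"]], fast_nrow = t_fast[["elapsed"]]))
```

In a tight loop it is usually simpler to compute the row count once before the loop and reuse it, which avoids the question entirely.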

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.