On Tue, Sep 15, 2009 at 9:48 AM, ivo welch <ivo...@gmail.com> wrote:
> dear R wizards: here is the strange question for the day. It seems to me
> that nrow() is very slow. Let me explain what I mean:
>
> ds= data.frame( NA, x=rnorm(10000) ) ## a sample data set
>
>> system.time( { for (i in 1:10000) NA } ) ## doing nothing takes virtually no time
>    user  system elapsed
>   0.000   0.000   0.001
>
> ## this is something that should take time; we need to add 10,000 values 10,000 times
>> system.time( { for (i in 1:10000) mean(ds$x) } )
>    user  system elapsed
>   0.416   0.001   0.416
>
> ## alas, this should be very fast. it is just reading off an attribute of ds.
> ## it takes almost a quarter of the time of mean()!
>> system.time( { for (i in 1:10000) nrow(ds) } )
>    user  system elapsed
>   0.124   0.001   0.125
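To check the "reading off an attribute" intuition, a short sketch (not from the original thread) can look at where a data frame keeps its row count. It assumes the same toy data frame as the question; .row_names_info() is an internal base R helper whose interface is not guaranteed to stay stable.

```r
# Same kind of data frame as in the question (seed added for reproducibility).
set.seed(1)
ds <- data.frame(NA, x = rnorm(10000))

# There is no separate "nrow" slot: the count is implied by the
# "row.names" attribute. attr() returns the row names expanded
# into a full vector, one entry per row.
rn <- attr(ds, "row.names")
print(length(rn))

# .row_names_info() reads the count from the stored attribute
# without expanding it. type = 2L gives the plain row count;
# type = 1L gives the same count, negated when the row names are
# automatic (as they are here).
print(.row_names_info(ds, 2L))
print(.row_names_info(ds, 1L))
```

So the information really is a cheap attribute lookup; any slowness has to come from how nrow() gets there.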
I just encountered this same problem. nrow() is slow because of the chain
of function calls it goes through:

nrow(df)
  -> dim(df)[1]
  -> dim.data.frame(df)[1]
  -> c(.row_names_info(df, 2L), length(df))[1]

If you call .row_names_info(df, 2L) directly it's about 7 times faster
(0.187 s vs. 0.027 s elapsed):

> system.time( { for (i in 1:10000) nrow(ds) })
   user  system elapsed
  0.183   0.002   0.187
> system.time( { for (i in 1:10000) .row_names_info(ds, 2) })
   user  system elapsed
  0.026   0.000   0.027

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
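A minimal sketch for timing each layer of the nrow() call chain separately, assuming the same toy data frame as in the question. The helper fast_nrow() is hypothetical (not part of base R) and wraps the internal .row_names_info(), so it should not be relied on as a stable API. Note that wrapping the call in your own function adds back one level of call overhead, which is why the direct call is timed as well. Absolute timings will vary by machine and R version.

```r
set.seed(1)
ds <- data.frame(NA, x = rnorm(10000))

# Hypothetical helper: reads the row count straight from the
# row.names attribute, skipping nrow() -> dim() -> dim.data.frame().
fast_nrow <- function(df) .row_names_info(df, 2L)

# Every layer of the chain should return the same integer.
stopifnot(
  identical(nrow(ds), dim(ds)[1]),
  identical(nrow(ds), dim.data.frame(ds)[1]),
  identical(nrow(ds), fast_nrow(ds))
)

# Time each layer over the same 10,000 iterations used in the thread.
n_iter <- 10000
print(system.time(for (i in seq_len(n_iter)) nrow(ds)))
print(system.time(for (i in seq_len(n_iter)) dim(ds)[1]))
print(system.time(for (i in seq_len(n_iter)) dim.data.frame(ds)[1]))
print(system.time(for (i in seq_len(n_iter)) fast_nrow(ds)))
print(system.time(for (i in seq_len(n_iter)) .row_names_info(ds, 2L)))
```

Each step down the chain removes one function call (and, at the dim() step, one S3 dispatch), so the timings should shrink roughly in that order.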