Hello,

I've been working with a data set that contains time stamps, and part of my summary needs to look at weekend data only. While trying to subset the data, I found a large performance difference depending on how I index the data frame. I've put together a minimal example here to illustrate what I'm seeing.

   is.weekend <- function(x) {
       tm <- as.POSIXlt(x,origin="1970/01/01")
       format(tm,"%a") %in% c("Sat","Sun")
   }

   use.lapply <- function(data) {
       data[do.call(rbind,lapply(data$TIME,FUN=is.weekend)),]
   }

   use.sapply <- function(data) {
       data[sapply(data$TIME,FUN=is.weekend),]
   }

   use.vapply <- function(data) {
       data[vapply(data$TIME,FUN=is.weekend,FUN.VALUE=logical(1)),]
   }

   use.indexing <- function(data) {
       data[is.weekend(data$TIME),]
   }

Here are the timings for each method on my data:

    > names(csv.data)
   [1] "TIME"     "FILE"     "RADIAN"   "BITS"     "DURATION"

    > length(csv.data$TIME)
   [1] 21471

    > system.time(v1 <- use.lapply(csv.data))
      user  system elapsed
    19.562   6.402  25.967

    > system.time(v2 <- use.sapply(csv.data))
      user  system elapsed
    19.456   6.492  25.951

    > system.time(v3 <- use.vapply(csv.data))
      user  system elapsed
    19.334   6.468  25.808

    > system.time(v4 <- use.indexing(csv.data))
      user  system elapsed
     0.032   0.020   0.052

    > all(identical(v1,v2),identical(v2,v3),identical(v3,v4))
   [1] TRUE
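
Since csv.data comes from a file that isn't attached, here is a rough sketch that should reproduce the same comparison on synthetic data. The `fake.data` frame, its start time, and its size are made up purely for illustration; only the TIME column matters. Note that `format(tm,"%a")` depends on the locale, so the weekend test assumes English day abbreviations.

```r
# Synthetic stand-in for csv.data (hypothetical values, not my real data).
set.seed(1)
n <- 2000
fake.data <- data.frame(
    TIME = 1300000000 + sort(sample.int(60 * 60 * 24 * 365, n)),
    BITS = sample.int(16, n, replace = TRUE)
)

# Same weekend test as above; assumes an English locale for "%a".
is.weekend <- function(x) {
    tm <- as.POSIXlt(x, origin = "1970/01/01")
    format(tm, "%a") %in% c("Sat", "Sun")
}

# Element-by-element: is.weekend is applied to each time stamp separately.
t.loop <- system.time(w1 <- vapply(fake.data$TIME, is.weekend, logical(1)))

# Whole vector at once: a single call to is.weekend.
t.vec <- system.time(w2 <- is.weekend(fake.data$TIME))

# Both approaches should select exactly the same rows.
print(identical(w1, w2))

# Elapsed seconds for each approach (values will vary by machine).
print(c(loop = t.loop[["elapsed"]], vectorized = t.vec[["elapsed"]]))
```

The absolute numbers will differ from mine, but the gap between the two approaches should be visible even with a couple of thousand rows.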



Forgive what is probably a trivial question, but why is there such a large difference between the *apply functions and the direct indexing method? On the surface, it seems that use.indexing passes the entire vector to the function in a single call, while the others /might/ iterate over the values, passing one element at a time. In either case, every element has to be part of the calculation...
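
One way to check that guess is to count how many times the function is actually entered under each approach. This is only a sketch: `counted.is.weekend`, the `calls` counter, and the `times` vector are hypothetical names introduced for this check and are not part of the code I timed above.

```r
# Instrumented copy of is.weekend; 'calls' counts how often it is entered.
calls <- 0
counted.is.weekend <- function(x) {
    calls <<- calls + 1
    tm <- as.POSIXlt(x, origin = "1970/01/01")
    format(tm, "%a") %in% c("Sat", "Sun")
}

# 100 made-up time stamps, one per day.
times <- 1300000000 + (0:99) * 86400

# sapply: the function is entered once for every element.
calls <- 0
r.sapply <- sapply(times, counted.is.weekend)
calls.sapply <- calls

# Direct call: the function is entered once with the whole vector.
calls <- 0
r.vector <- counted.is.weekend(times)
calls.vector <- calls

print(c(sapply = calls.sapply, vectorized = calls.vector))
print(identical(r.sapply, r.vector))
```

With 100 time stamps, the sapply version enters the function 100 times and the direct call enters it once, so any fixed per-call cost in as.POSIXlt and format is paid once per element in the *apply versions.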

Thanks for any insight.

Jesse

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
