William Dunlap wrote:

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of smu
Sent: Wednesday, November 11, 2009 7:58 AM
To: r-help@r-project.org
Subject: [R] partial cumsum

Hello,

I am searching for a function to calculate "partial" cumsums.

For example it should calculate the cumulative sums until a NA appears,
and restart the cumsum calculation after the NA.

this:

x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)

should become this:

1 3 6 NA 5  11  18  26  35  45

Perhaps
   > ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
    [1]  1  3  6 NA  5 11 18 26 35 45


Nice simple function!  Here's a different approach I use that's faster for long 
vectors with many NA values.  Note however, that this approach can suffer from 
catastrophic round-off error because it does a cumsum over the whole vector 
(after replacing NA's with zeros) and then subtracting out the cumsum at the 
most recent NA values.  Most of the body of this function is devoted to 
allowing (an unreasonable degree of) flexibility in specification of where to 
reset.

cumsum.reset <- function(x, reset.at=which(is.na(x)), na.rm=F) {
   # compute the cumsum of x, resetting the cumsum to 0 at each element indexed 
by reset.at
   if (is.logical(reset.at)) {
       if (length(reset.at)>length(x)) {
           if ((length(reset.at) %% length(x))!=0)
               stop("length of reset.at must be a multiple of length of x")
           x <- rep(x, len=length(reset.at))
       } else if (length(reset.at)<length(x)) {
           if ((length(x) %% length(reset.at))!=0)
               stop("length of x must be a multiple of length of reset.at")
           reset.at <- rep(reset.at, len=length(x))
       }
       reset.at <- which(reset.at)
   } else if (!is.numeric(reset.at)) {
       stop("reset.at must be logical or numeric")
   }
   if (length(i <- which(reset.at<=1)))
       reset.at <- reset.at[-i]
   if (any(i <- is.na(x[reset.at])))
       x[reset.at[i]] <- 0
   if (na.rm && any(i <- which(is.na(x))))
       x[i] <- 0
   y <- cumsum(x)
   d <- diff(c(0, y[reset.at-1]))
   r <- rep(0, length(x))
   r[reset.at] <- d
   return(y - cumsum(r))
}



x <- c(1, 2, 3, NA, 5, 6, 7, 8, 9, 10)
cumsum.reset(x)
[1]  1  3  6  0  5 11 18 26 35 45
ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum)
[1]  1  3  6 NA  5 11 18 26 35 45


The speedup from not breaking the input vector into smaller vectors is (to me) 
surprisingly small -- only a factor of 3:

x <- replace(rnorm(1e6), sample(1e6, 10000), NA)
all.equal(replace(ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum), is.na(x), 0), 
cumsum.reset(x))
[1] TRUE
system.time(cumsum.reset(x))
user system elapsed 0.31 0.03 0.35
system.time(ave(x, rev(cumsum(rev(is.na(x)))), FUN=cumsum))
user system elapsed 0.99 0.05 1.15


So, I'd go with the ave() approach unless this type of cumsum is the core of a 
long computationally intensive job.  And if that's the case, it would make 
sense to code it in C.

-- Tony Plate

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
any ideas?

thank you and best regards,

    stefan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to