If you use Rcpp::Rcpp.package.skeleton() to make a package out of the attached C++ code you can speed up the run counting quite a bit - to 1.4 s. from 48 s. for the 101 x 107 x 17 x 103 example.
The package you make will define an R function called CountColumnRuns that computes the number of runs of TRUE of a given minimum length for each column in a matrix. You can adapt it to your 4-dimensional array with f5 <- function (condition, spellLength = 2) { stopifnot(is.logical(condition), !anyNA(condition)) d <- dim(condition) dn <- dimnames(condition) tmp <- array(aperm(condition, c(3, 1, 2, 4)), c(d[3], prod(d[-3]))) runCounts <- RcppCountRuns::CountColumnRuns(tmp, spellLength) array(runCounts, dim = d[-3], dimnames = dn[-3]) } and call it as f5( x > 1, spellLength=2) to get the counts of all runs of length at least 2 in the 3rd dimension of your 4-d array. #include <Rcpp.h> int CountRunsRaw(const int *x, int n, int minRun) { int count(0); int runLength(0); const int* lastX(x + n); for( ; x != lastX ; x++) { if (*x) { runLength++; } else { if (runLength >= minRun) { count++; } runLength = 0; } } if (runLength >= minRun) { count++; } return count; } // [[Rcpp::export]] int CountRuns(const Rcpp::LogicalVector& x, int minRun) { return CountRunsRaw(&x[0], x.size(), minRun); } // [[Rcpp::export]] Rcpp::IntegerVector CountColumnRuns(const Rcpp::LogicalMatrix& x, int minRun) { Rcpp::IntegerVector out(x.ncol()); for(int i = 0 ; i < x.ncol() ; i++) { out[i] = CountRuns(x(Rcpp::_, i), minRun); } return out; } Bill Dunlap TIBCO Software wdunlap tibco.com On Mon, Jul 11, 2016 at 9:42 AM, William Dunlap <wdun...@tibco.com> wrote: > How fast is fast enough and what size and shape is your dataset > (show the output of str(yourData))? You will get the fastest execution > time by using C or C++ or Fortran, but you will want to parameterize > the problem well enough that you can amortize the time it takes to > write the code over many problems. > > Peter Langflder's suggested you use aperm followed by colSums > for your earlier problem. For this one you can use aperm followed > by filter (to identify the runs of a given minimum length, column by > column) and then use colSums (to count the number of runs filter > identifies in each column). E.g., > > f4 <- function (condition, spellLength = 2) > { > # obscure way to count runs of length>spellLength > # in 3'rd dimension of logical array 'condition'. > stopifnot(is.logical(condition), !anyNA(condition)) > coef <- c(-1, rep(1, spellLength)) > d <- dim(condition) > dn <- dimnames(condition) > tmp <- array(aperm(condition * 2 - 1, c(3, 1, 2, 4)), c(d[3], > prod(d[-3]))) > fTmp <- filter(rbind(tmp, -1), coef, sides = 1) > sfTmp <- colSums(fTmp == spellLength + 1, na.rm = TRUE) > array(sfTmp, dim = d[-3], dimnames = dn[-3]) > } > > f4(x>=1) is not a great deal faster (48 s. vs. 67 s.) than > apply(x>=1, c(1,2,4), FUN=f3) where f3 is > f3 <- function (condition, spellLength = 2) > { > stopifnot(is.logical(condition), !anyNA(condition)) > y <- rle(condition) > sum(y$lengths[y$values] >= spellLength) > } > and where x has dimensions c(101,107,17,103). > > The relative speed will depend on the size and shape of your dataset, > so show the output of str(yourData). > > > > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Sun, Jul 10, 2016 at 1:38 PM, Debasish Pai Mazumder <pai1...@gmail.com> > wrote: > >> Thanks for your response. It is faster than before but still very slow. >> Any other suggestion ? >> -Deb >> >> >> On Sun, Jul 10, 2016 at 2:13 PM, William Dunlap <wdun...@tibco.com> >> wrote: >> >>> There is no need to test that a logical equals TRUE: >>> 'logicalVector==TRUE' is the >>> same as just 'logicalVector'. >>> >>> There is no need to convert logical vectors to numeric, since rle() >>> works on both >>> types. >>> >>> There is no need to use length(subset(x, logicalVector)) to count how >>> many elements >>> in logicalVector are TRUE, just use sum(logicalVector). >>> >>> There is no need to make a variable, 'ans', then immediately return it. >>> >>> Hence your >>> >>> b[b == TRUE] = 1 >>> y <- rle(b) >>> ans <- length(subset(y$lengths[y$values == 1], y$lengths[y$values == >>> 1] >= 2)) >>> return(ans) >>> >>> could be replaced by >>> >>> y <- rle(b) >>> sum(y$lengths[y$values] >= 2) >>> >>> This gives some speedup, mainly for long vectors, but I find it more >>> understandable. >>> E.g., if f1 is your original function and f2 has the above replacement I >>> get: >>> > d <- -sin(1:10000+sqrt(1:4)) >>> > system.time(for(i in 1:10000)f1(d,.3)) >>> user system elapsed >>> 5.19 0.00 5.19 >>> > system.time(for(i in 1:10000)f2(d,.3)) >>> user system elapsed >>> 3.65 0.00 3.65 >>> > c(f1(d,.3), f2(d,.3)) >>> [1] 1492 1492 >>> > length(d) >>> [1] 10000 >>> >>> If it were my function, I would also get rid of the part that deals with >>> the threshhold >>> and direction of the inequality and tell the user to to use f(data <= >>> 0.3) instead of >>> f(data, .3, "below"). I would also make the spell length an argument >>> instead of >>> fixing it at 2. E.g. >>> >>> > f3 <- function (condition, spellLength = 2) >>> { >>> stopifnot(is.logical(condition), !anyNA(condition)) >>> y <- rle(condition) >>> sum(y$lengths[y$values] >= spellLength) >>> } >>> > f3( d >= .3 ) >>> [1] 1492 >>> >>> >>> >>> Bill Dunlap >>> TIBCO Software >>> wdunlap tibco.com >>> >>> On Sun, Jul 10, 2016 at 11:58 AM, Debasish Pai Mazumder < >>> pai1...@gmail.com> wrote: >>> >>>> Hi Everyone, >>>> Thanks for your help. It works. I have similar problem when I am >>>> calculating number of spell. >>>> I am also calculation spell (definition: period of two or more days >>>> where x >>>> exceeds 70) using similar way: >>>> >>>> *new = apply(x,c(1,2,4),FUN=function(y) {fun.spell.deb(y, 70)})* >>>> >>>> where fun.spell.deb.R: >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *## Calculate spell durationfun.spell.deb <- function(data, threshold = >>>> 1, >>>> direction = c("above", "below")){ #coln <- grep(weather, names(data))# >>>> var <- data[,8] if(missing(direction)) {direction <- "above"} >>>> if(direction=="below") {b <- (data <= threshold)} else {b <- (data >= >>>> threshold)} b[b==TRUE] = 1 y <-rle(b) ans >>>> <-length(subset((y$lengths[y$values==1]), (y$lengths[y$values==1])>=2)) >>>> return(ans)}* >>>> >>>> Do you have any idea how to make the "apply" faster here? >>>> >>>> -Deb >>>> >>>> >>>> On Sat, Jul 9, 2016 at 3:46 PM, Charles C. Berry <ccbe...@ucsd.edu> >>>> wrote: >>>> >>>> > On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote: >>>> > >>>> > I have 4-dimension array x(lat,lon,time,var) >>>> >> >>>> >> I am using "apply" to calculate over time >>>> >> new = apply(x,c(1,2,4),FUN=function(y) {length(which(y>=70))}) >>>> >> >>>> >> This is very slow. Is there anyway make it faster? >>>> >> >>>> > >>>> > If dim(x)[3] << prod(dim(x)[-3]), >>>> > >>>> > new <- Reduce("+",lapply(1:dim(x)[3],function(z) x[,,z,]>=70)) >>>> > >>>> > will be faster. >>>> > >>>> > However, if you can follow Peter Langfelder's suggestion to use >>>> rowSums, >>>> > that would be best. Even using rowSums(aperm(x,c(1,2,4,3)>=70,dims=3) >>>> and >>>> > paying the price of aperm() might be better. >>>> > >>>> > Chuck >>>> > >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >>> >> > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.