> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys
> Sent: Tuesday, May 04, 2010 2:02 PM
> To: jim holtman
> Cc: R mailing list
> Subject: Re: [R] Avoiding for-loop for splitting vector into
> subvectors based on positions
>
> Thanks, works nicely. I have to do some clocking to see how much the
> improvement is, but I surely learnt again.
>
> Attentive readers might have noticed my initial code contains
> an error.
> tmp <- x[pos2[i]:pos2[i+1]]
> should be:
> tmp <- x[pos2[i]:(pos2[i+1]-1)]
> of course...
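To see the off-by-one in the quoted correction concretely, here is a minimal sketch using the x and pos from the original post. The names `wrong` and `right` are mine, introduced only for illustration. With the uncorrected index, each subvector also picks up the first element of the next group:

```r
x <- 1:10
pos <- c(1, 4, 7)
pos2 <- c(pos, length(x) + 1)   # append a sentinel one past the end of x

i <- 1
wrong <- x[pos2[i]:pos2[i + 1]]        # runs through x[4], the start of the next group
right <- x[pos2[i]:(pos2[i + 1] - 1)]  # stops just before the next start

length(wrong)  # 4
length(right)  # 3
```

For the last group the uncorrected version would index x[11], which does not exist and comes back as NA.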

I think you also wanted your for loop to run along 1:length(pos)
instead of 1:length(x).

Your subject line asked how to avoid a for loop, but you seem to
be interested in how to make your function run quickly. These are
different questions. The following test functions seem to show that
your time (and probably memory) problems arise from growing the
result vector inside the loop:
    out <- c()
    for(i in 1:length(pos)) {
        ...
        out <- c(out, length(tmp))
    }
instead of preallocating it and inserting into it:
    out <- numeric(length(pos)) # or integer or list or ... ?
    for(i in 1:length(pos)) {
        ...
        out[i] <- length(tmp)
    }

makeData <- function (nX, nPos) {
    # make data for timing tests
    pos <- sort(sample(nX, size=nPos, replace=FALSE))
    pos[1] <- 1L
    list(x = seq_len(nX), pos = pos)
}

f0 <- function (x, pos, FUN = length) {
    # OP's code, slightly modified
    pos2 <- c(pos, length(x) + 1)
    retval <- c()
    for (i in seq_len(length(pos))) {
        tmp <- x[pos2[i]:(pos2[i + 1] - 1)]
        retval <- c(retval, FUN(tmp))
    }
    retval
}

f1 <- function (x, pos, FUN = length) {
    # like f0 but we preallocate the result
    pos2 <- c(pos, length(x) + 1)
    retval <- numeric(length(pos))
    for (i in seq_len(length(pos))) {
        tmp <- x[pos2[i]:(pos2[i + 1] - 1)]
        retval[i] <- FUN(tmp)
    }
    retval
}

f2 <- function (x, pos, FUN = length) {
    # use tapply
    groupId <- rep(seq_along(pos), diff(c(pos, length(x) + 1)))
    tapply(x, groupId, FUN)
}

f3 <- function (x, pos, FUN = length) {
    # lapply(split(...))
    groupId <- rep(seq_along(pos), diff(c(pos, length(x) + 1)))
    unlist(lapply(split(x, groupId), FUN))
}

# make one million numbers in 400 thousand groups
z <- makeData(nX=1e6, nPos=4e5)
t0 <- system.time( r0 <- f0(z$x, z$pos) )
t1 <- system.time( r1 <- f1(z$x, z$pos) )
t2 <- system.time( r2 <- f2(z$x, z$pos) )
t3 <- system.time( r3 <- f3(z$x, z$pos) )

> rbind(t0=t0, t1=t1, t2=t2, t3=t3)
   user.self sys.self elapsed user.child sys.child
t0    429.44     3.30  425.84         NA        NA
t1      3.20     0.00    3.16         NA        NA
t2      6.91     0.01    6.72         NA        NA
t3      2.68     0.02    2.72         NA        NA

The results from each, r0-r3, are almost the same. f1 produced
a "numeric" (double precision) result instead of an integer one
(length() returns an integer). tapply() spends time seeing if
FUN always returns the same kind of result and simplifies the
answer if it does. The others will run into problems if FUN
doesn't always return a single number. Choose a method based on
how general the code needs to be and how much error checking you
require.

In any case, growing a vector that is destined to be large can
take a lot of time.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> On Tue, May 4, 2010 at 5:50 PM, jim holtman <jholt...@gmail.com> wrote:
>
> > Try this:
> >
> > > x <- 1:10
> > > pos <- c(1,4,7)
> > > pat <- rep(seq_along(pos), times=diff(c(pos, length(x) + 1)))
> > > split(x, pat)
> > $`1`
> > [1] 1 2 3
> >
> > $`2`
> > [1] 4 5 6
> >
> > $`3`
> > [1] 7 8 9 10
> >
> >
> > On Tue, May 4, 2010 at 11:29 AM, Joris Meys <jorism...@gmail.com> wrote:
> >
> >> Dear all,
> >>
> >> I'm trying to optimize code and want to avoid for-loops as much as
> >> possible.
> >> I'm applying a calculation on subvectors from a big one, and I get the
> >> subvectors by using a vector of starting positions:
> >>
> >> x <- 1:10
> >> pos <- c(1,4,7)
> >> n <- length(x)
> >>
> >> I try to do something like this :
> >> pos2 <- c(pos, n+1)
> >>
> >> out <- c()
> >> for(i in 1:n){
> >>     tmp <- x[pos2[i]:pos2[i+1]]
> >>     out <- c(out, length(tmp))
> >> }
> >>
> >> Never mind the length function, I apply a far more complicated one. It's
> >> about the use of the indices in the for-loop. I didn't see any way of
> >> doing that with an apply, unless there is a very convenient way of
> >> splitting my vector in a list of the subvectors or so.
> >>
> >> Anybody an idea?
> >> Cheers
> >> --
> >> Joris Meys
> >> Statistical Consultant
> >>
> >> Ghent University
> >> Faculty of Bioscience Engineering
> >> Department of Applied mathematics, biometrics and process control
> >>
> >> Coupure Links 653
> >> B-9000 Gent
> >>
> >> tel : +32 9 264 59 87
> >> joris.m...@ugent.be
> >> -------------------------------
> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?
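
A footnote on the remark above that the loop and lapply versions run into problems when FUN does not always return a single number: one possible way to get that check cheaply is vapply(). The sketch below is a hypothetical variant (I call it f4; it is not one of the functions timed above, and I have not benchmarked it), assuming FUN is meant to return exactly one number per group:

```r
f4 <- function(x, pos, FUN = length) {
    # same grouping vector as in f2 and f3
    groupId <- rep(seq_along(pos), diff(c(pos, length(x) + 1)))
    # vapply() stops with an error unless every call to FUN returns
    # exactly one number; integer results are promoted to double
    unname(vapply(split(x, groupId), FUN, FUN.VALUE = numeric(1)))
}

x <- 1:10
pos <- c(1, 4, 7)
f4(x, pos)         # group sizes: 3 3 4
f4(x, pos, mean)   # group means: 2 5 8.5
# f4(x, pos, range) stops with an error, since range() returns two values
```

Like f1, this returns a double vector even when FUN returns integers; use FUN.VALUE = integer(1) if an integer result matters.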