Taking Hadley's clue, I guess

  x * unlist(lapply(rle(x)$lengths, seq_len))

is faster than my previous suggestion (which Dan had inspired in the
first place). For

> A vector of the following format:
> (0,0,1,0,0,0,3,0,0,0,2,0,1,0,0,0,0,0,6)

one might

> z <- numeric(length(x))
> r <- rle(x)
> i <- r$values==1
> z[cumsum(r$lengths)[i]] <- r$lengths[i]
> z
[1] 0 0 1 0 0 0 3 0 0 0 2 0 1 0 0 0 0 0 0 6

but since 'rle' keeps coming up, perhaps you're really wanting to know
basic things about run-length encoding, such as the length of each run
of 1's

> r$lengths[r$values==1]
[1] 1 3 2 1 6

?

If space and time are an issue, you might also consider representing
your data as 'raw' to save space

> x <- rbinom(1000000, 1, .5)
> object.size(x)
[1] 8000040
> system.time(x*unlist(lapply(rle(x)$lengths, seq_len)), gcFirst=TRUE)
   user  system elapsed
  1.800   0.000   1.798
> x <- as.raw(x)
> head(x)
[1] 01 01 01 01 00 01
> object.size(x)
[1] 1000040
> system.time((x==1)*unlist(lapply(rle(x)$lengths, seq_len)), gcFirst=TRUE)
   user  system elapsed
  1.730   0.000   1.730

(the timings are quite variable; perhaps they're about equal?)

Martin

"carlos martinez" <[EMAIL PROTECTED]> writes:

> Appreciate the ingenious and effective suggestions and feedback from:
>
> Dan Davison
> Vincent Goulet
> Martin Morgan
> Hadley Wickham
>
> The variety of technical approaches proposed so far is clear proof of the
> strong and flexible capabilities of the R system, and especially of the
> dynamics and technical understanding of the R user base.
>
> We tested all four recommendations with an input vector of more than 850000
> components, and got response times from about 40 seconds down to 20 seconds.
>
> All four approaches produced the desired vector. Wickham's approach
> produced an extra vector, but the second vector included the correct
> format.
>
> Just one additional follow-up, to obtain from the same input vector:
> c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>
> A vector of the following format:
> (0,0,1,0,0,0,3,0,0,0,2,0,1,0,0,0,0,0,6)
>
> Will it be easier and more efficient to start from the original input vector,
> or to start from the above second vector
> (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)?
>
> Thanks for your responses.
>
> -------------------------------------------------------------------------
> Hadley Wickham Approach
>
> How about:
>
> unlist(lapply(split(x, cumsum(x == 0)), seq_along)) - 1
>
> Hadley
> --------------------------------------------------------------------------
> -----Original Message-----
> From: Martin Morgan [mailto:[EMAIL PROTECTED]
> Sent: Saturday, April 12, 2008 5:00 PM
> To: Dan Davison
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Rd] HOW TO AVOID LOOPS
>
> (anonymous 'off-list' response; some extra calcs but tidy)
>
>> x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>> x * unlist(lapply(rle(x)$lengths, seq))
> [1] 0 0 1 0 1 2 3 0 0 1 2 0 1 0 1 2 3 4 5 6
>
>
> Dan Davison <[EMAIL PROTECTED]> writes:
>
>> On Sat, Apr 12, 2008 at 06:45:00PM +0100, Dan Davison wrote:
>>> On Sat, Apr 12, 2008 at 01:30:13PM -0400, Vincent Goulet wrote:
>>> > On Sat, Apr 12, at 12:47, carlos martinez wrote:
>>> > >> Looking for a simple, effective, minimum-execution-time solution.
>>> > >>
>>> > >> For a vector as:
>>> > >>
>>> > >> c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>>> > >>
>>> > > To transform it to the following vector without using any loops:
>>> > >
>>> > >> (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
>>> > >>
>>> > > Appreciate any suggestions.
>>> >
>>> > This does it -- but it is admittedly ugly:
>>> >
>>> > > x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>>> > > ind <- which(x == 0)
>>> > > unlist(lapply(mapply(seq, ind, c(tail(ind, -1) - 1, length(x))),
>>> >                 function(y) cumsum(x[y])))
>>> > [1] 0 0 1 0 1 2 3 0 0 1 2 0 1 0 1 2 3 4 5 6
>>> >
>>> > (The mapply() part is used to create the indexes of each sequence
>>> > in x starting with a 0. The rest is then straightforward.)
>>>
>>>
>>> Here's my effort. Maybe a bit easier to digest? Only one *apply so
>>> probably more efficient.
>>>
>>> function(x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)) {
>>>     d <- diff(c(0,x,0))
>>>     starts <- which(d == 1)
>>>     ends <- which(d == -1)
>>>     x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n))
>>>     x
>>> }
>>>
>>
>> Come to think of it, I suggest using the existing R function rle(), rather
>> than my dodgy substitute.
>>
>> e.g.
>>
>> g <- function(x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)) {
>>
>>     runs <- rle(x)
>>     runlengths <- runs$lengths[runs$values == 1]
>>     x[x == 1] <- unlist(lapply(runlengths, function(n) 1:n))
>>     x
>> }
>>
>> Dan
>>
>> p.s. R-help would perhaps have been more appropriate than R-devel
>>
>>
>>> Dan
>>>
>>>
>>> >
>>> > HTH
>>> >
>>> > ---
>>> > Vincent Goulet, Associate Professor
>>> > École d'actuariat
>>> > Université Laval, Québec
>>> > [EMAIL PROTECTED] http://vgoulet.act.ulaval.ca
>

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
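
For readers who want to try the suggestions side by side, here is a minimal sketch that wraps the one-liners quoted in this thread and checks that they agree on the example vector. The function names (count_rle, count_split, run_ends) are hypothetical and only there for illustration; the bodies are taken from the messages above. The last step is an assumption about one way to answer the follow-up question, deriving the run-end vector from the counted vector rather than from the original input. Note that the target vector as quoted in the follow-up has 19 elements, while the computed result has 20, matching the input length.

```r
# Example input from the thread
x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)

# rle-based counter (Martin's suggestion, with seq_len)
count_rle <- function(x) x * unlist(lapply(rle(x)$lengths, seq_len))

# split-based counter (Hadley's suggestion); use.names = FALSE drops the
# names that split() attaches, which otherwise print as an extra row
count_split <- function(x) {
  unlist(lapply(split(x, cumsum(x == 0)), seq_along), use.names = FALSE) - 1
}

# Run length placed at the last position of each run of 1's (Martin's reply)
run_ends <- function(x) {
  z <- numeric(length(x))
  r <- rle(x)
  i <- r$values == 1
  z[cumsum(r$lengths)[i]] <- r$lengths[i]
  z
}

counts <- count_rle(x)
print(counts)
print(all(counts == count_split(x)))

# Assumed alternative: start from the counted vector and keep a value only
# where the next element is 0 (or the vector ends)
from_counts <- counts * (c(counts[-1], 0) == 0)
print(from_counts)
print(all(from_counts == run_ends(x)))
```

One caveat on the split-based version: it counts positions since the most recent 0, so for a vector that begins with 1 the first run would come out one too low. The rle-based version does not depend on where the vector starts.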