Thanks very much Bill, good catch! John
________________________________ From: William Dunlap <wdun...@tibco.com> Cc: r-h...@stat.math.ethz.ch Sent: Thu, January 6, 2011 3:52:47 PM Subject: RE: [R] algorithm help > -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of array chip > Sent: Thursday, January 06, 2011 3:29 PM > To: ted.hard...@wlandres.net > Cc: r-h...@stat.math.ethz.ch > Subject: Re: [R] algorithm help > [[elided Yahoo spam]] > > I made a routine to do this: > > f.fragment<-function(a,b) { > dat<-as.data.frame(cbind(a,b)) > > L <- rle(dat$a)$lengths > V <- rle(dat$a)$values > pos <- c(1,cumsum(L)) > V1 <- c(-1,V) > start<-1+pos[V1==0] > end<-pos[V1==1] > > cbind(stretch=1:length(start),start=dat$b[start] > ,end=dat$b[end],no.of.1s=L[V==1]) > > } > > f.fragment(dat$a,dat$b) > > stretch start end no.of.1s > [1,] 1 13 20 4 > [2,] 2 34 46 2 > [3,] 3 49 77 4 > [4,] 4 97 97 1 You need to be more careful about the first and last rows in the dataset. I think yours only works when a starts with 0 and ends with 1. > f.fragment(c(1,1,0,0), c(11,12,13,14)) stretch start end no.of.1s [1,] 1 NA 12 2 > f.fragment(c(1,1,0,1), c(11,12,13,14)) stretch start end no.of.1s [1,] 1 14 12 2 [2,] 1 14 14 1 > f.fragment(c(0,1,0,1), c(11,12,13,14)) stretch start end no.of.1s [1,] 1 12 12 1 [2,] 2 14 14 1 > f.fragment(c(0,1,0,0), c(11,12,13,14)) stretch start end no.of.1s [1,] 1 12 12 1 [2,] 2 NA 12 1 > f.fragment(c(1,1,1,1), c(11,12,13,14)) stretch end no.of.1s [1,] 1 14 4 [2,] 0 14 4 > f.fragment(c(0,0,0,0), c(11,12,13,14)) stretch start [1,] 1 NA The following does better. It keeps things as logical vectors as long as possible, which tends to work better when dealing with runs. f <- function(a, b) { isFirstIn1Run <- c(TRUE, a[-1] != a[-length(a)]) & a==1 isLastIn1Run <- c(a[-1] != a[-length(a)], TRUE) & a==1 data.frame(stretch=seq_len(sum(isFirstIn1Run)), start = b[isFirstIn1Run], end = b[isLastIn1Run], no.of.1s = which(isLastIn1Run) - which(isFirstIn1Run) + 1) } > f(c(1,1,0,0), c(11,12,13,14)) stretch start end no.of.1s 1 1 11 12 2 > f(c(1,1,0,1), c(11,12,13,14)) stretch start end no.of.1s 1 1 11 12 2 2 2 14 14 1 > f(c(0,1,0,1), c(11,12,13,14)) stretch start end no.of.1s 1 1 12 12 1 2 2 14 14 1 > f(c(0,1,0,0), c(11,12,13,14)) stretch start end no.of.1s 1 1 12 12 1 > f(c(1,1,1,1), c(11,12,13,14)) stretch start end no.of.1s 1 1 11 14 4 > f(c(0,0,0,0), c(11,12,13,14)) [1] stretch start end no.of.1s <0 rows> (or 0-length row.names) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > John > > > > > ________________________________ > From: "ted.hard...@wlandres.net" <ted.hard...@wlandres.net> > > Cc: r-h...@stat.math.ethz.ch > Sent: Thu, January 6, 2011 2:57:47 PM > Subject: RE: [R] algorithm help > > On 06-Jan-11 22:16:38, array chip wrote: > > Hi, I am seeking help on designing an algorithm to identify the > > locations of stretches of 1s in a vector of 0s and 1s. Below is > > an simple example: > > > >> > dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) > > ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) > > > >> dat > > a b > > 1 0 4 > > 2 0 12 > > 3 1 13 > > 4 1 16 > > 5 1 18 > > 6 1 20 > > 7 0 28 > > 8 0 30 > > 9 1 34 > > 10 1 46 > > 11 0 47 > > 12 1 49 > > 13 1 61 > > 14 1 73 > > 15 1 77 > > 16 0 84 > > 17 0 87 > > 18 0 90 > > 19 0 95 > > 20 1 97 > > > > In this dataset, "b" is sorted and denotes the location for each > > number in "a". > > So I would like to find the starting & ending locations for each > > stretch of 1s within "a", also counting the number of 1s in each > > stretch as well. > > Hope the results from the algorithm would be: > > > > stretch start end No.of.1s > > 1 13 20 4 > > 2 34 46 2 > > 3 49 77 4 > > 4 97 97 1 > > > > I can imagine using for loops can do the job, but I feel it's not a > > clever way to do this. Is there an efficient algorithm that can do > > this fast? > > > > Thanks for any suggestions. > > John > > The basic information you need can be got using rle() ("run length > encoding"). See '?rle'. In your example: > > rle(dat$a) > # Run Length Encoding > # lengths: int [1:8] 2 4 2 2 1 4 4 1 > # values : num [1:8] 0 1 0 1 0 1 0 1 > ## Note: F -> 0, T -> 1 > > The following has a somewhat twisted logic at the end, and may > [[elided Yahoo spam]] > > L <- rle(dat$a)$lengths > V <- rle(dat$a)$values > pos <- c(1,cumsum(L)) > V1 <- c(-1,V) > 1+pos[V1==0] > # [1] 3 9 12 20 > ## Positions in the series dat$a where each run of "T" (i.e. 1) > ## starts > > Hoping this helps, > Ted. > > -------------------------------------------------------------------- > E-Mail: (Ted Harding) <ted.hard...@wlandres.net> > Fax-to-email: +44 (0)870 094 0861 > Date: 06-Jan-11 Time: 22:57:44 > ------------------------------ XFMail ------------------------------ > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.