> What I was trying to do was get a vector saying, for each item,
> whether that item is the same as the preceding item. Now that I think
> of it, I could do this easily by copying the vector, shifting it over
> one (by removing the first element and adding something to the end),
> and then just compare the elements of the two vectors directly.

Right. Did you look at rle() yet? Though for your particular simple case,

> system.time(verylong[1:(n-1)] == verylong[2:n])
   user  system elapsed
  0.001   0.000   0.002

is nearly instantaneous.

On Wed, Dec 5, 2012 at 5:04 PM, Stephen Politzer-Ahles
<politzerahl...@gmail.com> wrote:
> Hi Sarah,
>
> Thanks a lot for your explanation. I was mistakenly under the
> impression that duplicated() only looked at immediately preceding
> element, not all preceding elements.
>
> What I was trying to do was get a vector saying, for each item,
> whether that item is the same as the preceding item. Now that I think
> of it, I could do this easily by copying the vector, shifting it over
> one (by removing the first element and adding something to the end),
> and then just compare the elements of the two vectors directly.
>
> Best,
> Steve
>
> On Wed, Dec 5, 2012 at 3:08 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>> Hi,
>>
>> duplicated() doesn't just look at consecutive values, but anywhere in
>> the object. Since your 12320-element vector has only 48 separate
>> values, and all of them occur before the last 30 elements, so
>> duplicated() returns TRUE.
>>
>> You might be looking for something involving rle(). What are you
>> trying to accomplish?
>>
>> Sarah
>>
>> On Wed, Dec 5, 2012 at 3:53 PM, Stephen Politzer-Ahles
>> <politzerahl...@gmail.com> wrote:
>>> Hello,
>>>
>>> duplicated() does not seem to work for a long vector. For example, if
>>> you download the data from
>>> https://docs.google.com/open?id=0B6-m45Jvl3ZmNmpaSlJWMXo5bmc (a vector
>>> with about 12,000 numbers) and then run the following code which does
>>> duplicated() over the whole vector but just shows the last 30
>>> elements:
>>>
>>> data.frame( tail(verylong, 30), tail(duplicated(verylong), 30) )
>>>
>>> you'll see that at the end of the very long vector everything is
>>> listed as a duplicate of the preceding element (even though it
>>> shouldn't be). On the other hand, if you run the following code which
>>> just takes out the last 30 elements of the vector and does duplicated
>>> on them:
>>>
>>> data.frame( tail(verylong, 30), duplicated(tail(verylong, 30)) )
>>>
>>> you get the correct results (FALSE shows up wherever the value in the
>>> first column changes). Does anyone know why this happens, and if
>>> there's a fix? I notice the documentation for duplicated() says: "Long
>>> vectors are supported for the default method of duplicated, but may
>>> only be usable if nmax is supplied." But I've tried running this with
>>> a high value of nmax given, and it still gives me the same problem.
>>>
>>> So far the only way I've figured out to get this duplicated()-like
>>> vector is to use a for loop going through one item at a time, but that
>>> takes about a minute to run.
>>>
>>> Best,
>>> Steve Politzer-Ahles
>>>
>>
>>

--
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
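
To make the distinction in this thread concrete, here is a minimal base-R sketch on a small made-up vector. The vector x and the result names (dup_any, same_as_prev, same_via_rle) are hypothetical and chosen only for illustration; x stands in for verylong, which is not reproduced here. The sketch assumes the vector has at least one element and treats the first element as "not the same as the preceding item", which is a convention picked for this example rather than anything stated in the thread.

```r
# Hypothetical toy vector standing in for `verylong` from the thread.
x <- c(5, 5, 7, 7, 7, 5, 9)
n <- length(x)

# duplicated(): TRUE when the value has occurred ANYWHERE earlier.
dup_any <- duplicated(x)
print(dup_any)
# FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE

# Same as the immediately preceding element: compare the vector with a
# shifted copy of itself. The first element has no predecessor, so it is
# set to FALSE here (an assumption made for this sketch).
same_as_prev <- c(FALSE, x[-1] == x[-n])
print(same_as_prev)
# FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE

# One way to get the same answer with rle(): number the positions within
# each run; anything past the first position repeats its predecessor.
runs <- rle(x)
same_via_rle <- sequence(runs$lengths) > 1
print(identical(same_as_prev, same_via_rle))
# TRUE
```

The two results differ at the sixth element: the value 5 has been seen before, so duplicated() reports TRUE, but the element immediately before it is 7, so the shifted comparison reports FALSE.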