Thanks, Bill. I also had some concerns about how reliable numeric values converted to character might be, so I'm glad to have an authoritative criticism. Of course, I was really just being cute with R's versatility.
But Jim Holtman's solution seems like the best way to go, anyway, does it not?

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Thursday, May 14, 2009 10:44 AM
To: Bert Gunter; Gabor Grothendieck; christiaan pauw
Cc: r-help@r-project.org
Subject: RE: [R] Duplicates and duplicated

The table()-based solution can have problems when there are
very closely spaced floating point numbers in x, as in
   x1<-c(1, 1-.Machine$double.eps, 1+2*.Machine$double.eps)[c(1,2,3,2,3)]
It also relies on table(x) turning x into a factor with the default
levels=as.character(sort(x)) and that default may change.
It omits NA's from the result. (I think it also ought to put the
results in the original order of the data, so one can, e.g., omit
or select values which are duplicated.)

The ave()-based solution fails when there are NA's or NaN's in the data.
   x2 <- c(1,2,3,NA,10,6,3)

The ave()-based solution can be slower than necessary on long
datasets, especially ones with few or no duplicates.
   x3 <- sample(1e5,replace=FALSE) ; x3[17] <- x3[length(x3)-17]

I think the following function avoids these problems. It never
converts the data to character, but uses match() on the original
data to convert it to a set of unique integers that tabulate can handle.

   f2 <- function(x){
      ix<-match(x,x)                      # position of the first occurrence of each value
      tix<-tabulate(ix, nbins=length(x))  # one count per position, so copies near the end are covered
      retval<-logical(length(x))
      retval[which(tix!=1)]<-TRUE         # 0: a later copy; >1: first copy of a repeated value
      retval
   }

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> Sent: Thursday, May 14, 2009 9:10 AM
> To: 'Gabor Grothendieck'; 'christiaan pauw'
> Cc: r-help@r-project.org
> Subject: Re: [R] Duplicates and duplicated
>
> ...
> or, similar in character to Gabor's solution:
>
> tbl <- table(x)
> (tbl[as.character(sort(x))]>1)+0
>
>
> Bert Gunter
> Nonclinical Biostatistics
> 467-7374
>
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
> Sent: Thursday, May 14, 2009 7:34 AM
> To: christiaan pauw
> Cc: r-help@r-project.org
> Subject: Re: [R] Duplicates and duplicated
>
> Noting that:
>
> > ave(x, x, FUN = length) > 1
>  [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
>
> try this:
>
> > rbind(x, dup = ave(x, x, FUN = length) > 1)
>     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> x      1    2    3    4    4    5    6    7    8     9
> dup    0    0    0    1    1    0    0    0    0     0
>
>
> On Thu, May 14, 2009 at 2:16 AM, christiaan pauw <cjp...@gmail.com> wrote:
> > Hi everybody.
> > I want to identify not only duplicate numbers but also the original
> > number that has been duplicated.
> > Example:
> > x=c(1,2,3,4,4,5,6,7,8,9)
> > y=duplicated(x)
> > rbind(x,y)
> >
> > gives:
> >   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> > x    1    2    3    4    4    5    6    7    8     9
> > y    0    0    0    0    1    0    0    0    0     0
> >
> > i.e. the second 4 [,5] is a duplicate.
> >
> > What I want is for both the first and the second 4, i.e. [,4] and [,5],
> > to be TRUE:
> >
> >   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> > x    1    2    3    4    4    5    6    7    8     9
> > y    0    0    0    1    1    0    0    0    0     0
> >
> > I assume it can be done by sorting the vector and then checking if the
> > next or the previous entry matches using identical().
> > I am just unsure how to write such a loop, the logic of
> > which (I think) is as follows:
> >
> > sort x
> > for every value of x, check if the next value is identical and return
> > TRUE (or 1) if it is and FALSE (or 0) if it is not
> > AND
> > check if the previous value is identical and return TRUE (or 1) if it
> > is and FALSE (or 0) if it is not
> >
> > Am I thinking correctly, and can someone help me write such a function?
> >
> > regards
> > Christiaan
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
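
For reference, the result asked for in the original question (TRUE for every copy of a repeated value, the first copy included) can be sketched in two ways. This is only a minimal sketch and not code quoted from any message in the thread: the function names all_dups and all_dups_match are made up for illustration, the first relies on the fromLast argument of duplicated(), and the second is one possible variant of the match()/tabulate() idea that sets nbins explicitly and looks the counts up by first-occurrence position.

```r
# Hypothetical helper (name invented here): flag every copy of a repeated value.
all_dups <- function(x) {
  # duplicated(x) marks later copies; fromLast = TRUE marks earlier copies.
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

# Hypothetical variant of the match()/tabulate() approach (an assumption,
# not the function posted in the thread).
all_dups_match <- function(x) {
  ix <- match(x, x)                          # position of each value's first occurrence
  counts <- tabulate(ix, nbins = length(x))  # how often each first position is hit
  counts[ix] > 1                             # a value is repeated if its count exceeds 1
}

x  <- c(1, 2, 3, 4, 4, 5, 6, 7, 8, 9)   # data from the original question
x2 <- c(1, 2, 3, NA, 10, 6, 3)          # NA example mentioned in the thread
x1 <- c(1, 1 - .Machine$double.eps,
        1 + 2 * .Machine$double.eps)[c(1, 2, 3, 2, 3)]  # closely spaced doubles

all_dups(x)    # FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
all_dups(x2)   # FALSE FALSE TRUE FALSE FALSE FALSE TRUE
all_dups(x1)   # FALSE TRUE TRUE TRUE TRUE

# The two sketches agree on these inputs.
stopifnot(identical(all_dups(x),  all_dups_match(x)),
          identical(all_dups(x2), all_dups_match(x2)),
          identical(all_dups(x1), all_dups_match(x1)))
```

Neither version converts the data to character, and both return results in the original order of the data, so the output can be used directly to select or drop the repeated values, e.g. x[!all_dups(x)].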