I have now narrowed down the problem, but I am still confused about why match() treats the last two components as identical to the first two (component 4 matches component 1, and component 5 matches component 2):
> match(a,a)
[1] 1 2 3 1 2
> a
[[1]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
[49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
[55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[2]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B"
[49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
[61] "YPRCTy1-4" "YPR158C-D"

[[3]]
 [1] "YARCTy1-1" "YAR009C" "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
 [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
[13] "YERCTy1-1" "YER138C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[19] "YJRWTy1-1" "YJR027W" "YJRWTy1-2" "YJR029W" "YLRCTy1-1" "YLR157C-B"
[25] "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-4"
[31] "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
[37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[4]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[5]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B"
[49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
[55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"

On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:

> Thanks, Bill. match() is nice and efficient. However, I ran into a problem:
>
> My real data is a large _list_ named "read.genes". I found conflicting results
> between match() and unique(): the lengths of the outcomes are different
> (and my final results are wrong too).
> I suspect that some distinct list components are treated as the same when
> they are converted to vectors (the help page for match() says "Factors, raw
> vectors and lists are converted to character vectors"). Is that possible?
> And, just as important, how can I fix this?
>
> > read.genes[[1]]
> [1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"
>
> > duplicates <- as.vector(table(match(read.genes, read.genes)))
> > length(duplicates)
> [1] 1424
> > read.genes.uniq <- unique(read.genes)
> > length(read.genes.uniq)
> [1] 1469
> > sum(duplicates)
> [1] 9945348
> > length(read.genes)
> [1] 9945348
>
> On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap <wdun...@tibco.com> wrote:
>
>> table(match(x, x)) gives you the numbers, but the labels are
>> a bit more work.
>>
>> E.g., I'll define another list
>> > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
>> > tb <- table(m <- match(x, x))
>> > m
>> [1] 1 1 3 4 3
>> > tb
>>
>> 1 3 4
>> 2 2 1
>> which says that the first element of x is seen twice,
>> the third twice, and the fourth once. How best to organize
>> that depends on what you want to do with the data.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> > -----Original Message-----
>> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zhenjiang xu
>> > Sent: Wednesday, August 31, 2011 9:25 AM
>> > To: r-help
>> > Subject: [R] counting the duplicates in an object of list
>> >
>> > Hi all,
>> >
>> > I have a list x:
>> >
>> > > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
>> >
>> > I can get the unique elements with unique(), but how can I get the
>> > number of duplicates for each unique element?
>> >
>> > > unique(x)
>> > [[1]]
>> > [1] "1" "2"
>> >
>> > [[2]]
>> > [1] "2" "3"
>> >
>> > Thanks
>> >
>> > --
>> > Best,
>> > Zhenjiang
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>
> --
> Best,
> Zhenjiang

--
Best,
Zhenjiang
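A note on the collision in the first message: components 1 and 4 of `a` share their first 44 strings, and components 2 and 5 share their first 46; they differ only further along. One plausible explanation (an assumption, not confirmed in this thread, and possibly dependent on the R version) is that match() turns each list component into a single character string, as its help page describes, and that this conversion does not preserve the full contents of long components, so distinct components end up with identical strings. A minimal sketch of a workaround, assuming every component is a character vector and that the separator "\r" never occurs inside a gene name, is to build one string key per component yourself with paste(collapse = ...) and then call match() on those plain strings. The helper name count_list_duplicates() is hypothetical.

```r
# Hypothetical helper: count how often each distinct component of a list
# occurs, without relying on match()'s own list-to-character conversion.
# Assumptions: every component is a character vector, and `sep` never occurs
# inside any string (otherwise c("a\rb") and c("a", "b") would collide; an
# empty component and c("") also give the same key).
count_list_duplicates <- function(lst, sep = "\r") {
  # One string per component, built from the whole vector.
  keys <- vapply(lst, paste, character(1), collapse = sep, USE.NAMES = FALSE)
  # match() on a plain character vector compares complete strings and
  # returns, for each component, the index of its first occurrence.
  m <- match(keys, keys)
  counts <- table(m)
  # The table labels are those first-occurrence indices in increasing order,
  # so they also pick out the distinct components in order of first appearance.
  first <- as.integer(names(counts))
  list(index = m, unique = lst[first], counts = as.vector(counts))
}

# Usage on the list from the original question
x <- list(a = c("1", "2"), b = c("2", "3"), c = c("1", "2"), d = c("2", "3"))
res <- count_list_duplicates(x)
res$index   # 1 2 1 2
res$counts  # 2 2
res$unique  # components a and b
```

On the real data, length(res$counts) should then agree with length(unique(read.genes)), which is the consistency check that failed in the earlier message. The design keeps match()'s hashing, which is what made it efficient, while controlling exactly which strings it compares.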