I tried converting the elements to strings before, but with data this large, paste() took forever to finish. Is there any way to make the default width.cutoff larger and have match() use it?
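A minimal sketch of one way to follow Bill's suggestion without deparse(), assuming every element of the list is a character vector (as in read.genes), contains no NA, and that the separator "\r" never occurs inside any gene name. The helper name count_list_duplicates() is hypothetical. paste(collapse = ...) does not truncate, so the keys are exact under those assumptions; whether it runs fast enough on ~10 million elements has not been measured here. Numeric elements would break the assumption (c("1", "2") and c(1, 2) would get the same key).

```r
# Hypothetical helper (not from the thread). Assumes: every element is a
# character vector, no NA values, and "\r" never appears inside a string.
count_list_duplicates <- function(x, sep = "\r") {
  # One untruncated key per element, built by paste() alone (no deparse()).
  keys <- vapply(x, paste, character(1), collapse = sep, USE.NAMES = FALSE)
  # For each element, the index of the first element with the same key.
  first <- match(keys, keys)
  # counts[i] = how many elements have element i as their first occurrence.
  counts <- tabulate(first, nbins = length(x))
  idx <- which(counts > 0)
  # Unique elements (in order of first appearance) with their counts;
  # 'first' can be used to index the original list, as Bill suggested.
  list(unique = x[idx], count = counts[idx], first = first)
}

x <- list(a = c("1", "2"), b = c("2", "3"), c = c("1", "2"), d = c("2", "3"))
res <- count_list_duplicates(x)
res$count
# [1] 2 2
```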
On Wed, Sep 7, 2011 at 10:42 PM, William Dunlap <wdun...@tibco.com> wrote:

> match(aList, aList) probably does what as.character(aList) does:
> cut off the character strings at 500 characters (because
> deparse(x, nlines=1, width.cutoff) requires that width.cutoff<=500).
> Try converting the elements to character strings yourself before
> passing them to match. E.g.,
>
>   ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))
>
> and use match on that. You can use the indices it returns on
> the original list.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
> Sent: Wednesday, September 07, 2011 7:25 PM
> To: William Dunlap
> Cc: r-help
> Subject: Re: [R] counting the duplicates in an object of list
>
> Now I have nailed down the problem, but I am still confused about why
> match() treats the first two components and the last two as the same.
>
> > match(a,a)
> [1] 1 2 3 1 2
>
> > a
> [[1]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
> [49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
> [55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[2]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
> [43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B"
> [49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
> [55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
> [61] "YPRCTy1-4" "YPR158C-D"
>
> [[3]]
>  [1] "YARCTy1-1" "YAR009C" "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
>  [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
> [13] "YERCTy1-1" "YER138C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [19] "YJRWTy1-1" "YJR027W" "YJRWTy1-2" "YJR029W" "YLRCTy1-1" "YLR157C-B"
> [25] "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-4"
> [31] "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
> [37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[4]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
> [49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[5]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
> [43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B"
> [49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
> [55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"
>
> On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:
>
> > Thanks, Bill. match() is nice and efficient. However, I ran into a problem:
> >
> > My real data is a large _list_ named "read.genes". I found conflicting
> > results between match() and unique(): the lengths of the outcomes are
> > different (and my final results are wrong too). I suspect that some
> > different list components are regarded as the same when they are
> > converted to vectors (the help page for match() says "Factors, raw
> > vectors and lists are converted to character vectors"). Is that
> > possible? And just as important, how do I fix it?
> >
> > > read.genes[[1]]
> > [1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"
> >
> > > duplicates <- as.vector(table(match(read.genes, read.genes)))
> >
> > > length(duplicates)
> > [1] 1424
> > > read.genes.uniq <- unique(read.genes)
> > > length(read.genes.uniq)
> > [1] 1469
> >
> > > sum(duplicates)
> > [1] 9945348
> > > length(read.genes)
> > [1] 9945348
> >
> > On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap <wdun...@tibco.com> wrote:
> >
> > > table(match(x, x)) gives you the numbers but the labels are
> > > a bit more work.
> > >
> > > E.g., I'll define another list
> > > > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
> > > > tb <- table(m <- match(x, x))
> > > > m
> > > [1] 1 1 3 4 3
> > > > tb
> > >
> > > 1 3 4
> > > 2 2 1
> > >
> > > which says that the first element of x is seen twice,
> > > the third twice, and the fourth once. How best to organize
> > > that depends on what you want to do with the data.
> > >
> > > Bill Dunlap
> > > Spotfire, TIBCO Software
> > > wdunlap tibco.com
> > >
> > > > -----Original Message-----
> > > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > > > On Behalf Of zhenjiang xu
> > > > Sent: Wednesday, August 31, 2011 9:25 AM
> > > > To: r-help
> > > > Subject: [R] counting the duplicates in an object of list
> > > >
> > > > Hi all,
> > > >
> > > > I have a list x:
> > > >
> > > > > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
> > > >
> > > > I can get the unique elements with unique(), but how can I get the
> > > > number of duplicates for each unique element?
> > > >
> > > > > unique(x)
> > > > [[1]]
> > > > [1] "1" "2"
> > > >
> > > > [[2]]
> > > > [1] "2" "3"
> > > >
> > > > Thanks
> > > >
> > > > --
> > > > Best,
> > > > Zhenjiang
> > > >
> > > > ______________________________________________
> > > > R-help@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Best,
> > Zhenjiang
>
> --
> Best,
> Zhenjiang

--
Best,
Zhenjiang
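A small consistency check in the spirit of the length comparison in the thread (1424 groups from match() versus 1469 from unique()). It assumes duplicated(), like unique(), compares whole list elements rather than deparsed strings, and uses it as the reference. The helper name keys_are_faithful() is hypothetical. Because each key is computed from its own element only, equal elements always get equal keys; the only possible error is two different elements sharing a key, and that shows up as a position flagged by duplicated(keys) but not by duplicated(x). The same check should apply to match(read.genes, read.genes) as the keys.

```r
# Hypothetical helper (not from the thread): TRUE when the keys group the
# list elements exactly as duplicated() does on the list itself.
keys_are_faithful <- function(x, keys) {
  # duplicated() on a list compares whole elements, so it is the reference.
  # Keys derived per element can only merge groups, never split them, so an
  # identical duplicated() pattern means no two distinct elements collided.
  identical(duplicated(keys), duplicated(x))
}

x <- list(c("1", "2"), c("1", "3"), c("1", "2"))
# Exact keys: whole vector pasted with a separator assumed absent from the data.
good <- vapply(x, paste, character(1), collapse = "\r")
# Deliberately lossy keys: only the first string, so elements 1 and 2 collide.
bad <- vapply(x, function(e) e[1], character(1))
keys_are_faithful(x, good)
# [1] TRUE
keys_are_faithful(x, bad)
# [1] FALSE
```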