Hmm, frustrating. BTW, unique() works all right. It seems not to use deparse(), or to use it differently.
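You can imitate the truncation Bill describes below without relying on R internals. The sketch below assumes, following Bill's explanation, that converting a list to character keeps only roughly the first 500 characters of each element's deparsed text. Here substr() stands in for that cutoff. The gene IDs and the full_key helper are made up for illustration:

```r
# Illustration only: substr(..., 1, 500) imitates the ~500-character cutoff
# Bill describes; it is an assumption, not R's actual internal code path.
ids <- sprintf("YGENE%05d", 1:60)      # 60 made-up gene IDs, 10 chars each
lst <- list(ids, c(ids, "YEXTRA01"))   # second element has one extra ID

# Hypothetical helper: the full deparsed text of an element as one string
full_key <- function(el) paste(deparse(el), collapse = "\n")

keys <- vapply(lst, full_key, character(1))
short_keys <- substr(keys, 1, 500)     # keep only the first 500 characters

match(short_keys, short_keys)  # 1 1: the two elements look identical
match(keys, keys)              # 1 2: full-length keys keep them distinct
```

When compared by truncated text, two elements that differ only after the cutoff collapse into one group. This is consistent with the 1 2 3 1 2 result reported further down. Full-length keys keep them apart.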
On Wed, Sep 7, 2011 at 11:27 PM, William Dunlap <wdun...@tibco.com> wrote:
> I don't think you can increase width.cutoff above 500, and it isn't an
> argument to as.character or match. The best solution would be to avoid
> the internal use of deparse when using match() or unique() on lists and
> to hash the list elements directly, but that is a fair bit of work.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
> Sent: Wednesday, September 07, 2011 8:04 PM
> To: William Dunlap
> Cc: r-help
> Subject: Re: [R] counting the duplicates in an object of list
>
> I tried converting the elements to strings before, but because of the
> large data size it took forever to finish with paste(). Is there any way
> to set a longer default width.cutoff and pass it to match()?
>
> On Wed, Sep 7, 2011 at 10:42 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> match(aList, aList) probably does what as.character(aList) does: cut off
> the character strings at 500 characters (because deparse(x, nlines=1,
> width.cutoff) requires that width.cutoff<=500). Try converting the
> elements to character strings yourself before passing them to match.
> E.g.,
>
>   ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))
>
> and use match on that.
> You can use the indices it returns on the original list.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
> Sent: Wednesday, September 07, 2011 7:25 PM
> To: William Dunlap
> Cc: r-help
> Subject: Re: [R] counting the duplicates in an object of list
>
> I have now nailed down the problem, but I am still confused about why
> match() treats the last two components as identical to the first two.
>
> > match(a,a)
> [1] 1 2 3 1 2
>
> > a
> [[1]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
> [49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
> [55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[2]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
> [43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B"
> [49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
> [55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
> [61] "YPRCTy1-4" "YPR158C-D"
>
> [[3]]
>  [1] "YARCTy1-1" "YAR009C" "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
>  [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
> [13] "YERCTy1-1" "YER138C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [19] "YJRWTy1-1" "YJR027W" "YJRWTy1-2" "YJR029W" "YLRCTy1-1" "YLR157C-B"
> [25] "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-4"
> [31] "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
> [37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[4]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
> [49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[5]]
>  [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
> [43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B"
> [49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
> [55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"
>
> On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:
>
> Thanks, Bill. match() is nice and efficient. However, I have run into a
> problem:
>
> My real data is a large _list_ named "read.genes". I found conflicting
> results between match() and unique(): the lengths of the outcomes differ
> (and my final results are wrong too). I suspect that some different list
> components are regarded as the same when they are converted to vectors
> (the R help page for match() says "Factors, raw vectors and lists are
> converted to character vectors"). Is that possible? And, just as
> important, how can I fix it?
>
> > read.genes[[1]]
> [1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"
>
> > duplicates <- as.vector(table(match(read.genes, read.genes)))
> > length(duplicates)
> [1] 1424
> > read.genes.uniq <- unique(read.genes)
> > length(read.genes.uniq)
> [1] 1469
>
> > sum(duplicates)
> [1] 9945348
> > length(read.genes)
> [1] 9945348
>
> On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> table(match(x, x)) gives you the numbers but the labels are
> a bit more work.
>
> E.g., I'll define another list
>
> > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
> > tb <- table(m <- match(x, x))
> > m
> [1] 1 1 3 4 3
> > tb
>
> 1 3 4
> 2 2 1
>
> which says that the first element of x is seen twice,
> the third twice, and the fourth once. How best to organize
> that depends on what you want to do with the data.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > On Behalf Of zhenjiang xu
> > Sent: Wednesday, August 31, 2011 9:25 AM
> > To: r-help
> > Subject: [R] counting the duplicates in an object of list
> >
> > Hi all,
> >
> > I have a list x:
> >
> > > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
> >
> > I can get the unique elements with unique(), but how can I get the
> > number of duplicates for each unique element?
> >
> > > unique(x)
> > [[1]]
> > [1] "1" "2"
> >
> > [[2]]
> > [1] "2" "3"
> >
> > Thanks
> >
> > --
> > Best,
> > Zhenjiang
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

--
Best,
Zhenjiang
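The sketch below combines Bill's deparse() suggestion with his table(match(x, x)) idea. It assumes every list element is an atomic vector that deparse() represents faithfully. It builds one full-length key per element, matches the keys, and then uses the first-occurrence indices on the original list to recover the "labels" (the distinct elements themselves). The helper name count_list_dups is hypothetical:

```r
# Hypothetical helper: count how often each distinct element of a list
# occurs, comparing full-length deparse() keys instead of the list itself.
count_list_dups <- function(lst) {
  keys <- vapply(lst, function(el) paste(deparse(el), collapse = "\n"),
                 character(1), USE.NAMES = FALSE)
  m <- match(keys, keys)                    # index of each first occurrence
  counts <- tabulate(m, nbins = length(lst))
  firsts <- which(counts > 0)               # positions of distinct elements
  list(values = lst[firsts], count = counts[firsts], first = firsts)
}

# Bill's example list from the thread
x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
res <- count_list_dups(x)
res$first    # 1 3 4
res$count    # 2 2 1
res$values   # the three distinct elements, in first-occurrence order
```

Two caveats apply. First, this still calls deparse() once per element, so on millions of elements it may be about as slow as the paste() approach mentioned in the thread. Second, deparse() by default writes doubles with 15 significant digits, so numeric elements that differ only beyond that precision could share a key.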