I don't think you can increase width.cutoff above 500, and in any case it isn't an argument to as.character() or match(). The best solution would be to avoid the internal use of deparse() when using match() or unique() on lists and to hash each list element directly, but that is a fair bit of work.
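Until something like that exists in R itself, one user-level workaround is to build one key string per list element without going through deparse(), and then run match() on the keys. What follows is only a minimal sketch under stated assumptions: list_keys() is a hypothetical helper (not part of base R and not from this thread), every list element is a character vector with no NA values, and the separator never occurs in the data.

```r
# Hypothetical helper: one key per list element, built with paste() only.
# The length prefix keeps character(0) and "" from producing the same key.
# Assumes character vectors without NA and a separator absent from the data.
list_keys <- function(x, sep = "\r") {
  vapply(x,
         function(el) paste(c(length(el), el), collapse = sep),
         character(1),
         USE.NAMES = FALSE)
}

# Made-up data: long vectors that differ only in their last entry,
# far beyond the first 500 characters of a deparsed string.
base <- sprintf("YXX%03dW-B", 1:80)
a <- list(c(base, "end1"), c(base, "end2"), c(base, "end1"))

keys <- list_keys(a)
m <- match(keys, keys)                       # first occurrence of each element: 1 2 1
counts <- tabulate(m, nbins = length(a))[unique(m)]   # 2 1
uniq <- a[unique(m)]                         # the distinct elements, in order of appearance
```

The indices in m refer to positions in the original list, so a[unique(m)] recovers the distinct elements and counts lines up with them. Whether this is fast enough on a list with several million elements has not been tested here.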
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
Sent: Wednesday, September 07, 2011 8:04 PM
To: William Dunlap
Cc: r-help
Subject: Re: [R] counting the duplicates in an object of list

I tried converting the elements to strings before, but due to the large data size it took forever to finish with paste(). Is there any way to set the default width.cutoff longer and pass it to match()?

On Wed, Sep 7, 2011 at 10:42 PM, William Dunlap <wdun...@tibco.com> wrote:

match(aList, aList) probably does what as.character(aList) does: cut off the character strings at 500 characters (because deparse(x, nlines=1, width.cutoff) requires that width.cutoff<=500). Try converting the elements to character strings yourself before passing them to match. E.g.,

    ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))

and use match on that. You can use the indices it returns on the original list.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
Sent: Wednesday, September 07, 2011 7:25 PM
To: William Dunlap
Cc: r-help
Subject: Re: [R] counting the duplicates in an object of list

Now I nailed down the problem, but I am still confused why match() treats the first two components and the last two as the same.

> match(a,a)
[1] 1 2 3 1 2
> a
[[1]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
[49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
[55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[2]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YNLCTy1-1" "YNL284C-B"
[49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
[61] "YPRCTy1-4" "YPR158C-D"

[[3]]
 [1] "YARCTy1-1" "YAR009C" "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
 [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
[13] "YERCTy1-1" "YER138C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[19] "YJRWTy1-1" "YJR027W" "YJRWTy1-2" "YJR029W" "YLRCTy1-1" "YLR157C-B"
[25] "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-4"
[31] "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
[37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[4]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W" "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[5]]
 [1] "YARCTy1-1" "YAR009C" "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C" "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W" "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W" "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C" "YMRCTy1-4" "YMR050C" "YOLWTy1-1" "YOL103W-B"
[49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
[55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"

On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:

Thanks, Bill. match() is nice and efficient. However, I ran into a problem: my real data is a large _list_ named "read.genes". I found conflicting results between match() and unique(): the lengths of the outcomes are different (and my final results are wrong too).
I suspect that some different list components are regarded as the same when they are converted to character vectors (the help page for match() says "Factors, raw vectors and lists are converted to character vectors"). Is that possible? And, just as important, how can I fix this?

> read.genes[[1]]
[1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"
> duplicates <- as.vector(table(match(read.genes, read.genes)))
> length(duplicates)
[1] 1424
> read.genes.uniq <- unique(read.genes)
> length(read.genes.uniq)
[1] 1469
> sum(duplicates)
[1] 9945348
> length(read.genes)
[1] 9945348

On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap <wdun...@tibco.com> wrote:

table(match(x, x)) gives you the numbers but the labels are a bit more work. E.g., I'll define another list

> x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
> tb <- table(m <- match(x, x))
> m
[1] 1 1 3 4 3
> tb

1 3 4
2 2 1

which says that the first element of x is seen twice, the third twice, and the fourth once. How best to organize that depends on what you want to do with the data.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zhenjiang xu
> Sent: Wednesday, August 31, 2011 9:25 AM
> To: r-help
> Subject: [R] counting the duplicates in an object of list
>
> Hi all,
>
> I have a list x:
>
> > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
>
> I can get the unique elements with unique(), but how can I get the
> number of duplicates for each unique element?
>
> > unique(x)
> [[1]]
> [1] "1" "2"
>
> [[2]]
> [1] "2" "3"
>
> Thanks
>
> --
> Best,
> Zhenjiang
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Best,
Zhenjiang