match(aList, aList) probably does what as.character(aList) does:
it cuts off the character strings at 500 characters (because deparse(x,
nlines=1, width.cutoff) requires that width.cutoff<=500).  Try
converting the elements to character strings yourself before passing them
to match.  E.g.,
    ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))
and use match on that.  You can use the indices it returns on
the original list.
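
A minimal sketch of that suggestion, using made-up data (the list `a` below and its gene-like labels are hypothetical, not the poster's data). Each element is deparsed in full and joined into one string, so long vectors that differ only near the end should still get distinct keys:

```r
# Hypothetical data: long character vectors that differ only in the last entry
a <- list(
  c(sprintf("gene%03d", 1:100), "X"),
  c(sprintf("gene%03d", 1:100), "Y"),
  c(sprintf("gene%03d", 1:100), "X")
)

# One full-length string per list element (deparse() may return several
# lines for a long vector, so paste them together)
ac <- sapply(a, function(ai) paste(collapse = "\n", deparse(ai)))

# Index of the first occurrence of each element
m <- match(ac, ac)

# The indices refer to positions in the original list
distinct <- a[unique(m)]   # the distinct elements themselves
counts <- table(m)         # how often each distinct element occurs
```

Here `m` indexes into the original list, so `a[unique(m)]` recovers the distinct elements and `table(m)` counts them.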

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
Sent: Wednesday, September 07, 2011 7:25 PM
To: William Dunlap
Cc: r-help
Subject: Re: [R] counting the duplicates in an object of list

I have now narrowed down the problem, but I am still confused about why match() 
treats the last two components as identical to the first two.

> match(a,a)
[1] 1 2 3 1 2

> a
[[1]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
[49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
[55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[2]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B"
[49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
[61] "YPRCTy1-4" "YPR158C-D"

[[3]]
 [1] "YARCTy1-1" "YAR009C"   "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
 [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
[13] "YERCTy1-1" "YER138C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[19] "YJRWTy1-1" "YJR027W"   "YJRWTy1-2" "YJR029W"   "YLRCTy1-1" "YLR157C-B"
[25] "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-4"
[31] "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
[37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[4]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[5]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B"
[49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
[55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"

On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu 
<zhenjiang...@gmail.com> wrote:
Thanks, Bill. match() is nice and efficient. However, I met a problem:

My real data is a large _list_ named "read.genes". I found conflicting results 
between match() and unique(): the lengths of the outcomes are different (and 
my final results are wrong too). I suspect that some different list components 
are regarded as the same when they are converted to vectors (the help page for 
match() says "Factors, raw vectors and lists are converted to character 
vectors"). Is that possible? And, just as important, how can I fix this?

> read.genes[[1]]
[1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"

> duplicates <- as.vector(table(match(read.genes, read.genes)))

> length(duplicates)
[1] 1424
> read.genes.uniq <- unique(read.genes)
> length(read.genes.uniq)
[1] 1469

> sum(duplicates)
[1] 9945348
> length(read.genes)
[1] 9945348
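
One way to test that suspicion directly is to look for elements that match() maps to an earlier element that is not actually identical to them. The sketch below assumes nothing beyond base R; `find_collisions` is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: report list elements that match() pairs with a
# non-identical earlier element
find_collisions <- function(lst) {
  m <- match(lst, lst)
  # Compare each element with the element match() says it equals
  same <- vapply(seq_along(lst),
                 function(i) identical(lst[[i]], lst[[m[i]]]),
                 logical(1))
  bad <- which(!same)
  data.frame(index = bad, matched_to = m[bad])
}

# Small example in which match() and identical() agree
x <- list(c("1", "2"), c("2", "3"), c("1", "2"))
collisions <- find_collisions(x)
```

An empty data frame means match() and identical() agree on every element. Any rows returned point at the components worth inspecting by hand.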

On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap 
<wdun...@tibco.com> wrote:
table(match(x, x)) gives you the numbers but the labels are
a bit more work.

E.g., I'll define another list
 > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
 > tb <- table(m <- match(x, x))
 > m
 [1] 1 1 3 4 3
 > tb

 1 3 4
 2 2 1
which says that the first element of x is seen twice,
the third twice, and the fourth once.  How best to organize
that depends on what you want to do with the
data.
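
As one possible way to attach labels to those counts, the names of the table can be turned back into positions in x. This is only a sketch; the data frame layout and the variable names `first` and `result` are illustrative choices:

```r
x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
m <- match(x, x)
tb <- table(m)

# The table names are the positions of the first occurrences in x
first <- as.integer(names(tb))

# Pair a printable label for each distinct element with its count
result <- data.frame(
  element = sapply(x[first], function(xi) paste(deparse(xi), collapse = " ")),
  count = as.vector(tb)
)
```

If the elements themselves are needed rather than labels, `x[first]` gives them in the same order as the counts.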

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On 
> Behalf Of zhenjiang xu
> Sent: Wednesday, August 31, 2011 9:25 AM
> To: r-help
> Subject: [R] counting the duplicates in an object of list
>
> Hi all,
>
> I have a list x:
>
>  > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
>
> I can get the unique elements with unique(), but how can I get the
> number of duplicates for each unique element?
>
> > unique(x)
> [[1]]
> [1] "1" "2"
>
> [[2]]
> [1] "2" "3"
>
> Thanks
>
> --
> Best,
> Zhenjiang
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Best,
Zhenjiang
