I don't think you can raise width.cutoff above 500, and in any
case it is not an argument to as.character() or match().  The
best solution would be for match() and unique() to stop using
deparse() internally on lists and to hash each list element
directly, but that is a fair bit of work.
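
Until something like that exists, a user-level workaround is to build one
key per element without going through deparse() at all. The following is
only a sketch under stated assumptions: every element is a character
vector, the separator "\037" never occurs in the data, and the helper name
key_of and the sample identifiers are made up for illustration. It is not
the internal hashing change described above.

```r
# Hypothetical helper: one key per element, built without deparse().
# Assumes each element is a character vector and that the separator
# "\037" (ASCII unit separator) never appears inside the data.
key_of <- function(v) paste(v, collapse = "\037")

base <- sprintf("gene%04d", 1:100)        # made-up identifiers
a <- list(base, c(base, "extra"), base)   # elements 1 and 3 are equal

keys <- vapply(a, key_of, character(1))   # each key is well over 500 chars
m <- match(keys, keys)                    # first index of each equal element
counts <- table(m)                        # how often each first index occurs
```

Because the keys are ordinary strings, match() compares them in full, so two
elements that differ only after the 500th character still get different
indices.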

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
Sent: Wednesday, September 07, 2011 8:04 PM
To: William Dunlap
Cc: r-help
Subject: Re: [R] counting the duplicates in an object of list

I tried converting the elements to strings before, but because the data are so 
large, paste() took forever to finish. Is there any way to set a longer default 
width.cutoff and pass it to match()?
On Wed, Sep 7, 2011 at 10:42 PM, William Dunlap <wdun...@tibco.com> wrote:
match(aList, aList) probably does what as.character(aList) does:
it cuts the character strings off at 500 characters (because deparse(x,
nlines=1, width.cutoff) requires that width.cutoff<=500).  Try
converting the elements to character strings yourself before passing them
to match.  E.g.,
    ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))
and use match on that.  You can use the indices it returns on
the original list.
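
As a rough illustration of that suggestion, the sketch below runs the same
idea on a small made-up list (the list contents and the names m, counts and
uniq are assumptions for the example, and vapply() stands in for sapply()
only to guarantee a character result):

```r
# Small made-up list; elements 1 and 3 are equal.
a <- list(c("x", "y"), c("y", "z"), c("x", "y"), 1:3)

# deparse() may return several lines for a long element, so collapse
# them into a single string per element.
ac <- vapply(a, function(ai) paste(deparse(ai), collapse = "\n"),
             character(1))

m <- match(ac, ac)                      # first index of each equal element
counts <- table(m)                      # occurrences per first index
uniq <- a[as.integer(names(counts))]    # indices applied to the original list
```

Here m is 1 2 1 4, and uniq holds the three distinct elements taken from the
original list rather than from the strings.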

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
From: zhenjiang xu [mailto:zhenjiang...@gmail.com]
Sent: Wednesday, September 07, 2011 7:25 PM
To: William Dunlap
Cc: r-help
Subject: Re: [R] counting the duplicates in an object of list

I have now nailed down the problem, but I am still confused about why match() 
treats the first two components as the same as the last two (component 1 is 
matched with 4, and 2 with 5), even though they differ.

> match(a,a)
[1] 1 2 3 1 2

> a
[[1]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
[49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
[55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[2]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B"
[49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
[61] "YPRCTy1-4" "YPR158C-D"

[[3]]
 [1] "YARCTy1-1" "YAR009C"   "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
 [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
[13] "YERCTy1-1" "YER138C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[19] "YJRWTy1-1" "YJR027W"   "YJRWTy1-2" "YJR029W"   "YLRCTy1-1" "YLR157C-B"
[25] "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-4"
[31] "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
[37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[4]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[5]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B"
[49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
[55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"

On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:
Thanks, Bill. match() is nice and efficient. However, I ran into a problem:

My real data is a large _list_ named "read.genes". I found conflicting results 
between match() and unique(): the lengths of the outcomes are different (and 
my final result is wrong too). I suspect that some different list components 
are regarded as the same when they are converted to vectors (the help page of 
match() says "Factors, raw vectors and lists are converted to character 
vectors"). Is that possible? And, just as important, how can I fix it?

> read.genes[[1]]
[1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"

> duplicates <- as.vector(table(match(read.genes, read.genes)))

> length(duplicates)
[1] 1424
> read.genes.uniq <- unique(read.genes)
> length(read.genes.uniq)
[1] 1469

> sum(duplicates)
[1] 9945348
> length(read.genes)
[1] 9945348
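
One way to confirm that kind of suspicion is to check every matched pair
with identical(). This is only a sketch: the function name false_matches
and the toy data are made up, and the deliberately wrong index vector
stands in for whatever match() returned on the real list.

```r
# Hypothetical helper: positions whose matched element is not actually
# identical to them. 'm' is an index vector such as match(lst, lst).
false_matches <- function(lst, m) {
  same <- vapply(seq_along(lst),
                 function(i) identical(lst[[i]], lst[[m[i]]]),
                 logical(1))
  which(!same)
}

lst <- list(c("a", "b"), c("a", "c"), c("a", "b"))
m_wrong <- c(1L, 1L, 1L)              # pretend element 2 was matched to 1
bad <- false_matches(lst, m_wrong)    # positions with a false match
ok  <- false_matches(lst, c(1L, 2L, 1L))
```

On the real data the call would be false_matches(read.genes,
match(read.genes, read.genes)); any position it reports is a component that
was lumped together with a different one.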

On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap <wdun...@tibco.com> wrote:
table(match(x, x)) gives you the numbers but the labels are
a bit more work.

E.g., I'll define another list
 > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
 > tb <- table(m <- match(x, x))
 > m
 [1] 1 1 3 4 3
 > tb

 1 3 4
 2 2 1
which says that the first element of x is seen twice,
the third twice, and the fourth once.  How best to organize
that depends on what you want to do with the data.
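
For the labels, one possible arrangement (a sketch only; the data frame
layout and the column names first_index, count and label are made up for
illustration) is to keep the index of the first occurrence next to its
count and a readable label:

```r
x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))

m <- match(x, x)
tb <- table(m)
first <- as.integer(names(tb))        # index of each first occurrence in x

result <- data.frame(first_index = first, count = as.vector(tb))
# Readable label built from the original element, not from the table names.
result$label <- vapply(x[first],
                       function(xi) paste(xi, collapse = ","),
                       character(1))
```

x[first] is also the list of unique elements, in the same order as the
counts.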

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of zhenjiang xu
> Sent: Wednesday, August 31, 2011 9:25 AM
> To: r-help
> Subject: [R] counting the duplicates in an object of list
>
> Hi all,
>
> I have a list x:
>
>  > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
>
> I can get the unique elements with unique(), but how can I get the
> number of duplicates for each unique element?
>
> > unique(x)
> [[1]]
> [1] "1" "2"
>
> [[2]]
> [1] "2" "3"
>
> Thanks
>
> --
> Best,
> Zhenjiang
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.