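I have now narrowed down the problem, but I am still confused about why
match() treats the 4th component as equal to the 1st, and the 5th as equal to
the 2nd, even though they differ (their lengths are 54 vs. 58 and 58 vs. 62).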

> match(a,a)
[1] 1 2 3 1 2
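
One way to confirm that such matches are spurious is to compare each
component with the one match() pairs it with, using identical(). This is a
minimal sketch on a made-up list; the names `toy`, `m`, and `genuine` are
only for illustration and are not part of the real data.

```r
# Made-up list standing in for `a`; components 4 and 5 repeat 1 and 2.
toy <- list(c("g1", "g2"), c("g1", "g2", "g3"), "g1",
            c("g1", "g2"), c("g1", "g2", "g3"))

# Index of the first component that each component is matched to.
m <- match(toy, toy)

# TRUE where the matched pair is exactly equal, FALSE for a spurious match.
genuine <- vapply(seq_along(toy),
                  function(i) identical(toy[[i]], toy[[m[i]]]),
                  logical(1))

print(m)
print(genuine)
print(lengths(toy))   # differing lengths are a quick sign of a false match
```

On the real list, any FALSE in `genuine` would mark a pair that match()
reports as equal although the two components differ.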

> a
[[1]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
[49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
[55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[2]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B"
[49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
[61] "YPRCTy1-4" "YPR158C-D"

[[3]]
 [1] "YARCTy1-1" "YAR009C"   "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
 [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
[13] "YERCTy1-1" "YER138C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[19] "YJRWTy1-1" "YJR027W"   "YJRWTy1-2" "YJR029W"   "YLRCTy1-1" "YLR157C-B"
[25] "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-4"
[31] "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
[37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[4]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
[37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
[43] "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
[49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"

[[5]]
 [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
 [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
[13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
[19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
[25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
[31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
[37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
[43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B"
[49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
[55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"
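
A possible workaround, assuming the cause is the way match() converts list
components to character strings, is to build one key string per component
explicitly and match on the keys instead. The sketch below uses a made-up
list of character vectors; the separator and the names `x`, `keys`, `counts`,
and `uniq` are assumptions for illustration only.

```r
# Made-up stand-in for the real list of character vectors.
x <- list(c("YAL065C", "YAL063C"), "YAR050W",
          c("YAL065C", "YAL063C"), "YAL065C")

# Assumption: this separator never occurs inside a gene name.
sep <- "\t"

# One key per component: all of its elements pasted into a single string.
keys <- vapply(x, paste, character(1), collapse = sep)

m      <- match(keys, keys)       # first position of each distinct key
counts <- as.vector(table(m))     # number of occurrences per distinct key
uniq   <- x[!duplicated(keys)]    # the distinct components, in first-seen order

# Sanity check: the number of distinct keys should agree with unique().
stopifnot(length(counts) == length(unique(x)))

print(m)
print(counts)
```

Because `counts` and `uniq` are both in order of first appearance, the i-th
count belongs to the i-th distinct component.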

On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:

> Thanks, Bill. match() is nice and efficient. However, I ran into a problem:
>
> My real data is a large _list_ named "read.genes". I found conflicting
> results between match() and unique(): the lengths of the outcomes differ
> (and my final results are wrong too). I suspect that some different list
> components are regarded as the same when they are converted to character
> vectors (the help page for match() says "Factors, raw vectors and lists are
> converted to character vectors"). Is that possible? And, just as important,
> how can I fix it?
>
> > read.genes[[1]]
> [1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"
>
> > duplicates <- as.vector(table(match(read.genes, read.genes)))
>
> > length(duplicates)
> [1] 1424
> > read.genes.uniq <- unique(read.genes)
> > length(read.genes.uniq)
> [1] 1469
>
> > sum(duplicates)
> [1] 9945348
> > length(read.genes)
> [1] 9945348
>
> On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap <wdun...@tibco.com> wrote:
>
>> table(match(x, x)) gives you the numbers but the labels are
>> a bit more work.
>>
>> E.g., I'll define another list
>>  > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2),
>>  +           2^(0:4))
>>  > tb <- table(m <- match(x, x))
>>  > m
>>  [1] 1 1 3 4 3
>>  > tb
>>
>>  1 3 4
>>  2 2 1
>> which says that the first element of x is seen twice,
>> the third twice, and the fourth once.  How best to organize
>> that depends on what you want to do with the
>> data.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> > -----Original Message-----
>> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zhenjiang xu
>> > Sent: Wednesday, August 31, 2011 9:25 AM
>> > To: r-help
>> > Subject: [R] counting the duplicates in an object of list
>> >
>> > Hi all,
>> >
>> > I have a list x:
>> >
>> >  > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
>> >
>> > I can get the unique elements with unique(), but how can I get the
>> > number of duplicates for each unique element?
>> >
>> > > unique(x)
>> > [[1]]
>> > [1] "1" "2"
>> >
>> > [[2]]
>> > [1] "2" "3"
>> >
>> > Thanks
>> >
>> > --
>> > Best,
>> > Zhenjiang
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Best,
> Zhenjiang
>



-- 
Best,
Zhenjiang
