Hmm, frustrating. BTW, unique() works all right. It seems either not to use
deparse() or to use it differently.
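
For reference, here is a minimal sketch of the string-key idea discussed in this thread, applied to the small list from the original question. It is only an illustration under stated assumptions: count_list_duplicates() is a hypothetical helper name (not from the thread or from any package), every list element is assumed to be a character vector without NA values, and no string is assumed to contain the separator "\r". It builds one key per element with paste(collapse=) rather than deparse(), so it may still be slow on a very large list.

```r
# Hypothetical helper (not from the thread): count how often each distinct
# element occurs in a list of character vectors.
count_list_duplicates <- function(x, sep = "\r") {
  # Assumption: every element is a character vector with no NA values and
  # none of its strings contains `sep`, so the collapsed key is unambiguous.
  keys <- vapply(x, paste, character(1), collapse = sep)

  # match() on a plain character vector compares whole strings, so the keys
  # are not truncated the way list elements may be.
  first <- match(keys, keys)   # index of the first occurrence of each key
  counts <- table(first)       # how many elements share each first index

  idx <- as.integer(names(counts))
  list(values = x[idx], counts = as.vector(counts))
}

x <- list(a = c("1", "2"), b = c("2", "3"), c = c("1", "2"), d = c("2", "3"))
res <- count_list_duplicates(x)
res$values   # the distinct elements, in order of first appearance
res$counts   # 2 2
```

With this approach length(res$values) should agree with length(unique(x)) whenever the assumptions above hold, which gives a quick consistency check.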

On Wed, Sep 7, 2011 at 11:27 PM, William Dunlap <wdun...@tibco.com> wrote:

> I don't think you can increase width.cutoff above 500, and
> it isn't an argument to as.character or match.  The best
> solution would be to avoid the internal use of deparse
> when using match() or unique() on lists and to hash the
> list element directly, but that is a fair bit of work.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> *From:* zhenjiang xu [mailto:zhenjiang...@gmail.com]
> *Sent:* Wednesday, September 07, 2011 8:04 PM
> *To:* William Dunlap
> *Cc:* r-help
> *Subject:* Re: [R] counting the duplicates in an object of list
>
> I tried converting the elements to strings before, but due to the large
> data size it took forever to finish with paste(). Is there any way to set the
> default width.cutoff longer and pass it to match()?
>
> On Wed, Sep 7, 2011 at 10:42 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> match(aList, aList) probably does what as.character(aList) does:
> cut off the character strings at 500 characters (because deparse(x,
> nlines=1, width.cutoff) requires that width.cutoff<=500).  Try
> converting the elements to character strings yourself before passing them
> to match.  E.g.,
>
>     ac <- sapply(a, function(ai) paste(collapse="\n", deparse(ai)))
>
> and use match on that.  You can use the indices it returns on
> the original list.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> *From:* zhenjiang xu [mailto:zhenjiang...@gmail.com]
> *Sent:* Wednesday, September 07, 2011 7:25 PM
> *To:* William Dunlap
> *Cc:* r-help
> *Subject:* Re: [R] counting the duplicates in an object of list
>
> I have now narrowed down the problem, but I am still confused about why
> match() treats the last two components as the same as the first two.
>
> > match(a,a)
> [1] 1 2 3 1 2
>
> > a
> [[1]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B" "YNLWTy1-2" "YNL054W-B"
> [49] "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B"
> [55] "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[2]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
> [43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YNLCTy1-1" "YNL284C-B"
> [49] "YNLWTy1-2" "YNL054W-B" "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
> [55] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
> [61] "YPRCTy1-4" "YPR158C-D"
>
> [[3]]
>  [1] "YARCTy1-1" "YAR009C"   "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D"
>  [7] "YDRCTy1-3" "YDR261C-D" "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B"
> [13] "YERCTy1-1" "YER138C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [19] "YJRWTy1-1" "YJR027W"   "YJRWTy1-2" "YJR029W"   "YLRCTy1-1" "YLR157C-B"
> [25] "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-4"
> [31] "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B" "YPLWTy1-1"
> [37] "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[4]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-3"
> [37] "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"   "YMRCTy1-3" "YMR045C"
> [43] "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B" "YORWTy1-2" "YOR142W-B"
> [49] "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B" "YPRWTy1-3" "YPR158W-B"
>
> [[5]]
>  [1] "YARCTy1-1" "YAR009C"   "YBLWTy1-1" "YBL005W-B" "YBRWTy1-2" "YBR012W-B"
>  [7] "YDRCTy1-1" "YDR098C-B" "YDRCTy1-2" "YDR210C-D" "YDRCTy1-3" "YDR261C-D"
> [13] "YDRWTy1-4" "YDR316W-B" "YDRWTy1-5" "YDR365W-B" "YERCTy1-1" "YER138C"
> [19] "YERCTy1-2" "YER160C"   "YGRWTy1-1" "YGR027W-B" "YGRCTy1-2" "YGR038C-B"
> [25] "YGRCTy1-3" "YGR161C-D" "YHRCTy1-1" "YHR214C-B" "YJRWTy1-1" "YJR027W"
> [31] "YJRWTy1-2" "YJR029W"   "YLR035C-A" "YLRCTy1-1" "YLR157C-B" "YLRWTy1-2"
> [37] "YLR227W-B" "YLRWTy1-3" "YMLWTy1-1" "YML045W"   "YMLWTy1-2" "YML039W"
> [43] "YMRCTy1-3" "YMR045C"   "YMRCTy1-4" "YMR050C"   "YOLWTy1-1" "YOL103W-B"
> [49] "YORWTy1-2" "YOR142W-B" "YPLWTy1-1" "YPL257W-B" "YPRCTy1-2" "YPR137C-B"
> [55] "YPRWTy1-3" "YPR158W-B" "YPRCTy1-4" "YPR158C-D"
>
> On Wed, Sep 7, 2011 at 9:15 PM, zhenjiang xu <zhenjiang...@gmail.com> wrote:
>
> Thanks, Bill. match() is nice and efficient. However, I ran into a problem:
>
> My real data is a large _list_ named "read.genes". I found conflicting results
> between match() and unique(): the lengths of the outcomes are different
> (and my final results are wrong too). I suspect that some different list
> components are regarded as the same when they are converted to vectors (the
> R help for match() says "Factors, raw vectors and lists are converted to
> character vectors"). Is that possible? And, just as important, how can I
> fix this?
>
> > read.genes[[1]]
> [1] "YAL065C" "YAL063C" "YAR050W" "YHR211W"
>
> > duplicates <- as.vector(table(match(read.genes, read.genes)))
> > length(duplicates)
> [1] 1424
> > read.genes.uniq <- unique(read.genes)
> > length(read.genes.uniq)
> [1] 1469
>
> > sum(duplicates)
> [1] 9945348
> > length(read.genes)
> [1] 9945348
>
> On Wed, Aug 31, 2011 at 12:42 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> table(match(x, x)) gives you the numbers but the labels are
> a bit more work.
>
> E.g., I'll define another list
>  > x <- list(c("1", "2", "4"), c("1", "2", "4"), 2^(0:4), 3^(1:2), 2^(0:4))
>  > tb <- table(m <- match(x, x))
>  > m
>  [1] 1 1 3 4 3
>  > tb
>
>  1 3 4
>  2 2 1
> which says that the first element of x is seen twice,
> the third twice, and the fourth once.  How to organize
> that the best depends on what you want to do with the
> data.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zhenjiang xu
> > Sent: Wednesday, August 31, 2011 9:25 AM
> > To: r-help
> > Subject: [R] counting the duplicates in an object of list
> >
> > Hi all,
> >
> > I have a list x:
> >
> >  > x=list(a=c('1','2'),b=c('2','3'),c=c('1','2'),d=c('2','3'))
> >
> > I can get the unique elements with unique(), but how can I get the
> > number of duplicates for each unique element?
> >
> > > unique(x)
> > [[1]]
> > [1] "1" "2"
> >
> > [[2]]
> > [1] "2" "3"
> >
> > Thanks
> >
> > --
> > Best,
> > Zhenjiang
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



-- 
Best,
Zhenjiang

