Re: [R] Counting occurrences of variables in a dataframe

Kai Mx Sun, 12 Feb 2012 04:31:32 -0800

Amazing. Thanks everybody for the help. I have about 12,000 rows of data
with up to 50 recurrences, but it seems to work like a charm.


Best,

Kai


On Sun, Feb 12, 2012 at 8:11 AM, Petr Savicky <savi...@cs.cas.cz> wrote:

> On Sat, Feb 11, 2012 at 04:05:25PM -0500, David Winsemius wrote:
> >
> > On Feb 11, 2012, at 1:17 PM, Kai Mx wrote:
> >
> > >Hi everybody,
> > >I have a large dataframe similar to this one:
> > >knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
> > >kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
> > >'20101201', '20110105', '20101001', '20110504', '20110603',
> > >'20110201'),
> > >format="%Y%m%d")
> > >kdata <- data.frame (knames, kdate)
> >
> > >  ave(unclass(kdate), knames, FUN=order )
> >  [1] 2 2 1 1 1 2 1 2 1 1
> >
> >
> > That call used the standalone vectors rather than the data frame
> > columns, but you could also do this:
> >
> > > kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
> > > kdata
> >    knames      kdate ord
> > 1      ab 2011-10-01   2
> > 2      aa 2011-11-02   2
> > 3      ac 2010-10-01   1
> > 4      ad 2010-03-15   1
> > 5      ab 2010-12-01   1
> > 6      ac 2011-01-05   2
> > 7      aa 2010-10-01   1
> > 8      ad 2011-05-04   2
> > 9      ae 2011-06-03   1
> > 10     af 2011-02-01   1
>
> Hi.
>
> This is a good solution if there are at most two occurrences
> of each name, because order() and rank() agree for groups of one
> or two distinct dates. If there are more occurrences, then the function
> "order" should be replaced by "rank". Replacing the name "aa" at row 2
> by "ab", we get
>
>  knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
>  kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
>    '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
>    format="%Y%m%d")
>  kdata <- data.frame(knames, kdate)
>
>  kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order))
>  kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
>  kdata
>
>     knames      kdate ord rank
>  1      ab 2011-10-01   3    2
>  2      ab 2011-11-02   1    3
>  3      ac 2010-10-01   1    1
>  4      ad 2010-03-15   1    1
>  5      ab 2010-12-01   2    1
>  6      ac 2011-01-05   2    2
>  7      aa 2010-10-01   1    1
>  8      ad 2011-05-04   2    2
>  9      ae 2011-06-03   1    1
>  10     af 2011-02-01   1    1
>
> Sorted by date, the rows named "ab" come in the order row 5, row 1,
> row 2, so row 1 should get index 2 and row 2 index 3, as in the
> "rank" column.
>
> If some of the dates repeat, then rank() by default assigns the
> average of the tied indices. In this case, the following function f(),
> which breaks ties by order of appearance, may be used
>
>  knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
>  kdate <- as.Date(c('20111001', '20111001', '20101001', '20100315',
>    '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
>    format="%Y%m%d")
>  kdata <- data.frame(knames, kdate)
>
>  kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
>  f <- function(x) rank(x, ties.method="first")
>  kdata$f <- with(kdata, ave(unclass(kdate), knames, FUN=f))
>  kdata
>
>     knames      kdate rank f
>  1      ab 2011-10-01  2.5 2
>  2      ab 2011-10-01  2.5 3
>  3      ac 2010-10-01  1.0 1
>  4      ad 2010-03-15  1.0 1
>  5      ab 2010-12-01  1.0 1
>  6      ac 2011-01-05  2.0 2
>  7      aa 2010-10-01  1.0 1
>  8      ad 2011-05-04  2.0 2
>  9      ae 2011-06-03  1.0 1
>  10     af 2011-02-01  1.0 1
>
> Hope this helps.
>
> Petr Savicky.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
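
For anyone reading this thread later, here is a minimal sketch of why
"order" and "rank" give different answers once a name occurs three times.
It looks only at the three "ab" dates from the examples above; the variable
names d, d_tied, ord, rnk and rnk_first are made up for illustration and
are not part of the original code.

```r
# Dates of the three "ab" rows (rows 1, 2 and 5), in row order.
d <- unclass(as.Date(c("2011-10-01", "2011-11-02", "2010-12-01")))

ord <- order(d)  # which element comes 1st, 2nd, 3rd when sorted: 3 1 2
rnk <- rank(d)   # chronological position of each element:        2 3 1

# Without ties, rank() is the inverse permutation of order().
inv <- order(order(d))

# With a repeated date, the default rank() averages the tied positions,
# while ties.method = "first" numbers ties in order of appearance.
d_tied <- unclass(as.Date(c("2011-10-01", "2011-10-01", "2010-12-01")))
rnk_avg   <- rank(d_tied)                         # 2.5 2.5 1.0
rnk_first <- rank(d_tied, ties.method = "first")  # 2 3 1

print(rbind(ord = ord, rnk = rnk, inv = inv))
print(rbind(rnk_avg = rnk_avg, rnk_first = rnk_first))
```

So order() answers "which row is the k-th earliest", whereas the per-row
occurrence counter asked for in the original question is what rank() returns.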
