Amazing. Thanks, everybody, for the help. I have about 12,000 rows of data with up to 50 recurrences per name, and it seems to work like a charm.
Best,
Kai

On Sun, Feb 12, 2012 at 8:11 AM, Petr Savicky <savi...@cs.cas.cz> wrote:

> On Sat, Feb 11, 2012 at 04:05:25PM -0500, David Winsemius wrote:
> >
> > On Feb 11, 2012, at 1:17 PM, Kai Mx wrote:
> >
> > >Hi everybody,
> > >I have a large dataframe similar to this one:
> > >knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
> > >kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
> > >'20101201', '20110105', '20101001', '20110504', '20110603',
> > >'20110201'),
> > >format="%Y%m%d")
> > >kdata <- data.frame (knames, kdate)
> >
> > > ave(unclass(kdate), knames, FUN=order )
> >  [1] 2 2 1 1 1 2 1 2 1 1
> >
> >
> > That was actually not using the dataframe values but you could also do
> > this:
> >
> > > kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
> > > kdata
> >    knames      kdate ord
> > 1      ab 2011-10-01   2
> > 2      aa 2011-11-02   2
> > 3      ac 2010-10-01   1
> > 4      ad 2010-03-15   1
> > 5      ab 2010-12-01   1
> > 6      ac 2011-01-05   2
> > 7      aa 2010-10-01   1
> > 8      ad 2011-05-04   2
> > 9      ae 2011-06-03   1
> > 10     af 2011-02-01   1
>
> Hi.
>
> This is a good solution, if there are at most two occurrences
> of each name. If there are more occurrences, then function "order"
> should be replaced by "rank". Replacing name "aa" at row 2 by "ab",
> we get
>
>   knames <-c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
>   kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
>   '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
>   format="%Y%m%d")
>   kdata <- data.frame (knames, kdate)
>
>   kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order))
>   kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
>   kdata
>
>      knames      kdate ord rank
>   1      ab 2011-10-01   3    2
>   2      ab 2011-11-02   1    3
>   3      ac 2010-10-01   1    1
>   4      ad 2010-03-15   1    1
>   5      ab 2010-12-01   2    1
>   6      ac 2011-01-05   2    2
>   7      aa 2010-10-01   1    1
>   8      ad 2011-05-04   2    2
>   9      ae 2011-06-03   1    1
>   10     af 2011-02-01   1    1
>
> The names "ab" occur in the order row 5, row 1, row 2, so
> row 1 should get index 2, row 2 index 3.
>
> If some of the dates repeat, then rank() by default computes
> the average index. In this case, the following function f()
> may be used
>
>   knames <-c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
>   kdate <- as.Date( c('20111001', '20111001', '20101001', '20100315',
>   '20101201', '20110105', '20101001', '20110504', '20110603', '20110201'),
>   format="%Y%m%d")
>   kdata <- data.frame (knames, kdate)
>
>   kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
>   f <- function(x) rank(x, ties.method="first")
>   kdata$f <- with(kdata, ave(unclass(kdate), knames, FUN=f))
>   kdata
>
>      knames      kdate rank f
>   1      ab 2011-10-01  2.5 2
>   2      ab 2011-10-01  2.5 3
>   3      ac 2010-10-01  1.0 1
>   4      ad 2010-03-15  1.0 1
>   5      ab 2010-12-01  1.0 1
>   6      ac 2011-01-05  2.0 2
>   7      aa 2010-10-01  1.0 1
>   8      ad 2011-05-04  2.0 2
>   9      ae 2011-06-03  1.0 1
>   10     af 2011-02-01  1.0 1
>
> Hope this helps.
>
> Petr Savicky.
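
For anyone reading this thread later, here is a minimal sketch of why order() happens to work for at most two occurrences while rank() is what is actually wanted. It reuses the last example data quoted above; the helper name seq_in_group and the column name occurrence are my own illustrative choices, not something from the earlier posts.

```r
# order() returns the indices that would sort x;
# rank() returns the position each element holds once sorted.
# For vectors of length 1 or 2 the two results coincide, which is why
# FUN=order looked correct on the first example.
x <- c(30, 10, 20)
ord <- order(x)   # indices of the smallest, next smallest, ... element
rnk <- rank(x)    # sorted position of each element, in original order

# Example data from the thread, with a repeated date for "ab".
knames <- c("ab", "ab", "ac", "ad", "ab", "ac", "aa", "ad", "ae", "af")
kdate <- as.Date(c("20111001", "20111001", "20101001", "20100315",
                   "20101201", "20110105", "20101001", "20110504",
                   "20110603", "20110201"), format = "%Y%m%d")
kdata <- data.frame(knames, kdate)

# Hypothetical helper: ties.method = "first" breaks ties by row order,
# so every row in a group gets a distinct integer position.
seq_in_group <- function(x) rank(x, ties.method = "first")

# ave() applies the helper within each name and puts the results back
# in the original row order.
kdata$occurrence <- with(kdata,
                         ave(unclass(kdate), knames, FUN = seq_in_group))
kdata
```

With this data the occurrence column should match the f column in the quoted output. Applying order() twice, as in order(order(x)), gives the same numbering as rank(x, ties.method = "first"), if you prefer to avoid the helper.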
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.