On Sat, Feb 11, 2012 at 04:05:25PM -0500, David Winsemius wrote:
>
> On Feb 11, 2012, at 1:17 PM, Kai Mx wrote:
>
> >Hi everybody,
> >I have a large dataframe similar to this one:
> >knames <-c('ab', 'aa', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad','ae', 'af')
> >kdate <- as.Date( c('20111001', '20111102', '20101001', '20100315',
> >'20101201', '20110105', '20101001', '20110504', '20110603',
> >'20110201'),
> >format="%Y%m%d")
> >kdata <- data.frame (knames, kdate)
>
> > ave(unclass(kdate), knames, FUN=order )
> [1] 2 2 1 1 1 2 1 2 1 1
>
>
> That was actually not using the dataframe values but you could also do
> this:
>
> > kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order ))
> > kdata
>    knames      kdate ord
> 1      ab 2011-10-01   2
> 2      aa 2011-11-02   2
> 3      ac 2010-10-01   1
> 4      ad 2010-03-15   1
> 5      ab 2010-12-01   1
> 6      ac 2011-01-05   2
> 7      aa 2010-10-01   1
> 8      ad 2011-05-04   2
> 9      ae 2011-06-03   1
> 10     af 2011-02-01   1

Hi.

This is a good solution if there are at most two occurrences of
each name. If there are more occurrences, then the function "order"
should be replaced by "rank". Replacing the name "aa" at row 2
by "ab", we get

  knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
  kdate <- as.Date(c('20111001', '20111102', '20101001', '20100315',
                     '20101201', '20110105', '20101001', '20110504',
                     '20110603', '20110201'),
                   format="%Y%m%d")
  kdata <- data.frame(knames, kdate)
  kdata$ord <- with(kdata, ave(unclass(kdate), knames, FUN=order))
  kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
  kdata

     knames      kdate ord rank
  1      ab 2011-10-01   3    2
  2      ab 2011-11-02   1    3
  3      ac 2010-10-01   1    1
  4      ad 2010-03-15   1    1
  5      ab 2010-12-01   2    1
  6      ac 2011-01-05   2    2
  7      aa 2010-10-01   1    1
  8      ad 2011-05-04   2    2
  9      ae 2011-06-03   1    1
  10     af 2011-02-01   1    1

Sorted by date, the rows with name "ab" occur in the order row 5,
row 1, row 2, so row 1 should get index 2 and row 2 index 3. This is
what the "rank" column gives and the "ord" column does not.

If some of the dates repeat within a name, then rank() by default
assigns the average index to the tied rows. In this case, the
following function f() may be used

  knames <- c('ab', 'ab', 'ac', 'ad', 'ab', 'ac', 'aa', 'ad', 'ae', 'af')
  kdate <- as.Date(c('20111001', '20111001', '20101001', '20100315',
                     '20101201', '20110105', '20101001', '20110504',
                     '20110603', '20110201'),
                   format="%Y%m%d")
  kdata <- data.frame(knames, kdate)
  kdata$rank <- with(kdata, ave(unclass(kdate), knames, FUN=rank))
  f <- function(x) rank(x, ties.method="first")
  kdata$f <- with(kdata, ave(unclass(kdate), knames, FUN=f))
  kdata

     knames      kdate rank f
  1      ab 2011-10-01  2.5 2
  2      ab 2011-10-01  2.5 3
  3      ac 2010-10-01  1.0 1
  4      ad 2010-03-15  1.0 1
  5      ab 2010-12-01  1.0 1
  6      ac 2011-01-05  2.0 2
  7      aa 2010-10-01  1.0 1
  8      ad 2011-05-04  2.0 2
  9      ae 2011-06-03  1.0 1
  10     af 2011-02-01  1.0 1

Hope this helps.

Petr Savicky.
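
P.S. A minimal sketch of why order() and rank() differ, using only base R and the three "ab" dates from the first modified example. The variable names ab_dates, ord, rnk and tied are introduced here for illustration only. order() returns the indices that would sort the vector, while rank() returns the position each element takes after sorting, so the two are inverse permutations of each other.

```r
# Dates of the three "ab" rows (rows 1, 2 and 5), as plain numbers.
ab_dates <- unclass(as.Date(c("2011-10-01", "2011-11-02", "2010-12-01")))

# order(): which element comes first, second, third when sorted?
ord <- order(ab_dates)
print(ord)                 # 3 1 2

# rank(): what position does each element take after sorting?
rnk <- rank(ab_dates)
print(rnk)                 # 2 3 1

# Applying order() twice inverts the permutation and recovers the ranks.
print(order(ord))          # 2 3 1

# With tied values, order(order(x)) matches rank(x, ties.method = "first").
tied <- c(5, 5, 1)
print(rank(tied))                          # 2.5 2.5 1.0
print(rank(tied, ties.method = "first"))   # 2 3 1
print(order(order(tied)))                  # 2 3 1
```

For groups of one or two distinct dates, the sorting permutation is its own inverse, which is why FUN=order happened to give the right answer in the original data.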