Thank you David - it took me awhile to get back to this and dig into it. It's clever to imitate gtools::mixedorder() as far as possible. A few comments:
1. It took me a while to understand why you picked 3899 in your Roman-to-integer table; it's because roman(x) is NA for x > 3899. (BTW, in 'utils', there's utils:::.roman2numeric() which could be utilized, but it's currently internal.) 2. I think you forgot D=500 and M=1000. 3. There was a typo in your code; I think you meant rank.roman instead of rank.numeric in one place. 4. The idea behind nonnumeric() is to identify non-numeric substrings by is.na(as.numeric()). Unfortunately, for romans that does not work. Instead, we need to use is.na(numeric(x)) here, i.e. nonnumeric <- function(x) { suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), NA)) } Actually, gtools::mixedorder() could use the same. 5. I undid your ".numeric" to ".roman" to minimize any differences to gtools::mixedorder(). With the above fixes, we now have: mixedorderRoman <- function (x) { if (length(x) < 1) return(NULL) else if (length(x) == 1) return(1) if (is.numeric(x)) return(order(x)) delim = "\\$\\@\\$" # NOTE: Note that as.roman(x) is NA for x > 3899 romanC <- as.character( as.roman(1:3899) ) numeric <- function(x) { suppressWarnings(match(x, romanC)) } nonnumeric <- function(x) { suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), NA)) } x <- as.character(x) which.nas <- which(is.na(x)) which.blanks <- which(x == "") if (length(which.blanks) > 0) x[which.blanks] <- -Inf if (length(which.nas) > 0) x[which.nas] <- Inf delimited <- gsub("([IVXCLM]+)", paste(delim, "\\1", delim, sep = ""), x) step1 <- strsplit(delimited, delim) step1 <- lapply(step1, function(x) x[x > ""]) step1.numeric <- lapply(step1, numeric) step1.character <- lapply(step1, nonnumeric) maxelem <- max(sapply(step1, length)) step1.numeric.t <- lapply(1:maxelem, function(i) sapply(step1.numeric, function(x) x[i])) step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, function(x) x[i])) rank.numeric <- sapply(step1.numeric.t, rank) rank.character <- sapply(step1.character.t, function(x) as.numeric(factor(x))) rank.numeric[!is.na(rank.character)] <- 0 rank.character <- t(t(rank.character) + apply(matrix(rank.numeric), 2, max, na.rm = TRUE)) rank.overall <- ifelse(is.na(rank.character), rank.numeric, rank.character) order.frame <- as.data.frame(rank.overall) if (length(which.nas) > 0) order.frame[which.nas, ] <- Inf retval <- do.call("order", order.frame) return(retval) } The difference to gtools::mixedorder() is minimal: < romanC <- as.character( as.roman(1:3899) ) 21c11 < suppressWarnings(match(x, romanC)) --- > suppressWarnings(as.numeric(x)) 24c14 < suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), --- > suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), 34c24 < delimited <- gsub("([IVXCLDM]+)", --- > delimited <- > gsub("([+-]{0,1}[0-9]+\\.{0,1}[0-9]*([eE][\\+\\-]{0,1}[0-9]+\\.{0,1}[0-9]*){0,1})", 59,62d48 This difference is so small that the above could now be an option to mixedorder() with minimal overhead added, e.g. mixedorder(y, type=c("decimal", "roman")). One could even imagine adding support for "binary", "octal" and "hexadecimal" (not done). Greg (maintainer of gtools; cc:ed), is this something you would consider adding to gtools? I've modified the gtools source code available on CRAN (that's the only source I found), added package tests, updated the Rd and verified it passes R CMD check. If interested, please find the updates at: https://github.com/HenrikBengtsson/gtools/compare/cran:master...master Thanks Henrik On Tue, Aug 26, 2014 at 6:46 PM, David Winsemius <dwinsem...@comcast.net> wrote: > > On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote: > >> Hi, >> >> does anyone know of an implementation/function that sorts strings that >> *contain* roman numerals (I, II, III, IV, V, ...) which are treated as >> numbers. In 'gtools' there is mixedsort() which does this for strings >> that contains (decimal) numbers. I'm looking for a "mixedsortroman()" >> function that does the same but with roman numbers, e.g. > > It's pretty easy to sort something you know to be congruent with the existing > roman class: > > romanC <- as.character( as.roman(1:3899) ) > match(c("I", "II", "III","X","V"), romanC) > #[1] 1 2 3 10 5 > > But I guess you already know that, so you want a regex approach to parsing. > Looking at the path taken by Warnes, it would involve doing something like > his regex based insertion of a delimiter for "Roman numeral" but simpler > because he needed to deal with decimal points and signs and exponent > notation, none of which you appear to need. If you only need to consider > character and Roman, then this hack of Warnes tools succeeds: > > mixedorderRoman <- function (x) > { > if (length(x) < 1) > return(NULL) > else if (length(x) == 1) > return(1) > if (is.numeric(x)) > return(order(x)) > delim = "\\$\\@\\$" > roman <- function(x) { > suppressWarnings(match(x, romanC)) > } > nonnumeric <- function(x) { > suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), > NA)) > } > x <- as.character(x) > which.nas <- which(is.na(x)) > which.blanks <- which(x == "") > if (length(which.blanks) > 0) > x[which.blanks] <- -Inf > if (length(which.nas) > 0) > x[which.nas] <- Inf > delimited <- gsub("([IVXCL]+)", > paste(delim, "\\1", delim, sep = ""), x) > step1 <- strsplit(delimited, delim) > step1 <- lapply(step1, function(x) x[x > ""]) > step1.roman <- lapply(step1, roman) > step1.character <- lapply(step1, nonnumeric) > maxelem <- max(sapply(step1, length)) > step1.roman.t <- lapply(1:maxelem, function(i) sapply(step1.roman, > function(x) x[i])) > step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, > function(x) x[i])) > rank.roman <- sapply(step1.roman.t, rank) > rank.character <- sapply(step1.character.t, function(x) > as.numeric(factor(x))) > rank.roman[!is.na(rank.character)] <- 0 > rank.character <- t(t(rank.character) + apply(matrix(rank.roman), > 2, max, na.rm = TRUE)) > rank.overall <- ifelse(is.na(rank.character), rank.numeric, > rank.character) > order.frame <- as.data.frame(rank.overall) > if (length(which.nas) > 0) > order.frame[which.nas, ] <- Inf > retval <- do.call("order", order.frame) > return(retval) > } > > y[mixedorderRoman(y)] > [1] "chr I" "chr II" "chr III" "chr IV" "chr IX" > [6] "chr V" "chr VI" "chr VII" "chr VIII" "chr X" > [11] "chr XI" "chr XII" > > > -- > David. >> >> ## DECIMAL NUMBERS >>> x <- sprintf("chr %d", 12:1) >>> x >> [1] "chr 12" "chr 11" "chr 10" "chr 9" "chr 8" >> [6] "chr 7" "chr 6" "chr 5" "chr 4" "chr 3" >> [11] "chr 2" "chr 1" >> >>> sort(x) >> [1] "chr 1" "chr 10" "chr 11" "chr 12" "chr 2" >> [6] "chr 3" "chr 4" "chr 5" "chr 6" "chr 7" >> [11] "chr 8" "chr 9" >> >>> gtools::mixedsort(x) >> [1] "chr 1" "chr 2" "chr 3" "chr 4" "chr 5" >> [6] "chr 6" "chr 7" "chr 8" "chr 9" "chr 10" >> [11] "chr 11" "chr 12" >> >> >> ## ROMAN NUMBERS >>> y <- sprintf("chr %s", as.roman(12:1)) >>> y >> [1] "chr XII" "chr XI" "chr X" "chr IX" >> [5] "chr VIII" "chr VII" "chr VI" "chr V" >> [9] "chr IV" "chr III" "chr II" "chr I" >> >>> sort(y) >> [1] "chr I" "chr II" "chr III" "chr IV" >> [5] "chr IX" "chr V" "chr VI" "chr VII" >> [9] "chr VIII" "chr X" "chr XI" "chr XII" >> >>> mixedsortroman(y) >> [1] "chr I" "chr II" "chr III" "chr IV" >> [5] "chr V" "chr VI" "chr VII" "chr VIII" >> [9] "chr IX" "chr X" "chr XI" "chr XII" >> >> The latter is what I'm looking for. >> >> Before hacking together something myself (e.g. identify roman numerals >> substrings, translate them to decimal numbers, use gtools::mixedsort() >> to sort them and then translate them back to roman numbers), I'd like >> to hear if someone already has this implemented/know of a package that >> does this. >> >> Thanks, >> >> Henrik >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius > Alameda, CA, USA > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.