> For regular expression aficionados, I'd like a cleverer solution to
> the following problem (my solution works just fine for my needs; I'm
> just trying to improve my regex skills):
>
> Given the string (entered, say, at a readline prompt):
>
> "1 2 -5, 3- 6 4 8 5-7 10"  ## only integers will be entered
>
> parse it to produce the numeric vector:
>
> c(1, 2, 3, 4, 5, 3, 4, 5, 6, 4, 8, 5, 6, 7, 10)
>
> Note that "-" in the expression is used to indicate a range of values
> instead of ":"
>
> Here's my UNclever solution:
>
> First convert more than one space to a single space and then replace
> "<any spaces>-<any spaces>" by ":" by:
>
> > x1 <- gsub(" *- *",":",gsub(" +"," ",resp)) #giving
> > x1
> [1] "1 2:5, 3:6 4 8 5:7 10" ## Note that the comma remains
>
> Next convert the single string into a character vector via strsplit by
> splitting on anything but ":" or a digit:
>
> > x2 <- strsplit(x1,split="[^:[:digit:]]+")[[1]] #giving
> > x2
> [1] "1" "2:5" "3:6" "4" "8" "5:7" "10"
>
> Finally, parse() the vector, eval() each element, and unlist() the
> resulting list of numeric vectors:
>
> > unlist(lapply(parse(text=x2),eval)) #giving, as desired,
> [1] 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
>
> This seems far too clumsy and circumlocutory not to have a more
> elegant solution from a true regex expert.
>
> (Special note to Thomas Lumley: This seems one of the few instances
> where eval(parse(...)) may actually be appropriate.)

Howdy.

I don't know that I can produce anything less circumlocutory, but I
note that your "x2" form has a simple enough structure that it can be
parsed further with regular expressions instead of parse() and eval().
I don't know that this is an improvement, just a variation on the
theme. I've appended an example.

-- Mike

#### Original vector
x <- "1 2 -5, 3- 6 4 8 5-7 10"; x

#### Convert ranges to standard R form
x1 <- gsub("[ ]*-[ ]*", ":", x); x1

#### Get rid of the comma
x2 <- gsub(",", " ", x1); x2

#### Remove extra spaces
x3 <- gsub("[ ]+", " ", x2); x3

#### Split off elements, now in standard form
x4 <- unlist(strsplit(x3, " ")); x4

#### Use regular expression for simple parse of elements
#### (anchored, with "+", so endpoints may have more than one digit;
#### an element without ":" is left unchanged and yields itself)
x5 <- sapply(x4, function(a) {
  n1 <- gsub("^([[:digit:]]+):[[:digit:]]+$", "\\1", a)
  n2 <- gsub("^[[:digit:]]+:([[:digit:]]+)$", "\\1", a)
  as.integer(n1):as.integer(n2)}, USE.NAMES=FALSE); x5

x6 <- unlist(x5); x6

##########################################################

> #### Original vector
> x <- "1 2 -5, 3- 6 4 8 5-7 10"; x
[1] "1 2 -5, 3- 6 4 8 5-7 10"
>
> #### Convert ranges to standard R form
> x1 <- gsub("[ ]*-[ ]*", ":", x); x1
[1] "1 2:5, 3:6 4 8 5:7 10"
>
> #### Get rid of the comma
> x2 <- gsub(",", " ", x1); x2
[1] "1 2:5  3:6 4 8 5:7 10"
>
> #### Remove extra spaces
> x3 <- gsub("[ ]+", " ", x2); x3
[1] "1 2:5 3:6 4 8 5:7 10"
>
> #### Split off elements, now in standard form
> x4 <- unlist(strsplit(x3, " ")); x4
[1] "1"   "2:5" "3:6" "4"   "8"   "5:7" "10"
>
> #### Use regular expression for simple parse of elements
> x5 <- sapply(x4, function(a) {
+   n1 <- gsub("^([[:digit:]]+):[[:digit:]]+$", "\\1", a)
+   n2 <- gsub("^[[:digit:]]+:([[:digit:]]+)$", "\\1", a)
+   as.integer(n1):as.integer(n2)}, USE.NAMES=FALSE); x5
[[1]]
[1] 1

[[2]]
[1] 2 3 4 5

[[3]]
[1] 3 4 5 6

[[4]]
[1] 4

[[5]]
[1] 8

[[6]]
[1] 5 6 7

[[7]]
[1] 10

> x6 <- unlist(x5); x6
 [1]  1  2  3  4  5  3  4  5  6  4  8  5  6  7 10
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
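
For anyone who wants the same idea in one place, here is a minimal sketch that pulls the tokens out directly with gregexpr() and regmatches() and expands each one without parse() or eval(). The function name expand_ranges is hypothetical (it is not from the thread), and the sketch assumes the input contains only non-negative integers, with "-" used solely as a range separator.

```r
# Hypothetical helper (name not from the thread): expand "a-b" ranges in a
# string of non-negative integers without parse()/eval().
expand_ranges <- function(s) {
  # Normalise "a - b" (any spacing around the hyphen) to "a:b"
  s <- gsub("[ ]*-[ ]*", ":", s)
  # Extract every token that is a lone integer or an "a:b" pair;
  # commas and spaces between tokens are simply never matched
  tokens <- regmatches(s, gregexpr("[[:digit:]]+(:[[:digit:]]+)?", s))[[1]]
  # Split each token on ":" and convert the pieces to integers
  parts <- lapply(strsplit(tokens, ":", fixed = TRUE), as.integer)
  # Expand from the first piece to the last; a lone integer yields itself
  unlist(lapply(parts, function(p) seq(p[1], p[length(p)])))
}

expand_ranges("1 2 -5, 3- 6 4 8 5-7 10")
# 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
expand_ranges("9-11, 20")
# 9 10 11 20
```

Matching the tokens, rather than splitting on separators, means the comma and the extra spaces need no cleanup step of their own. A descending range such as "5-2" would count downwards, as 5:2 does in R.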