Thanks Michael: You are essentially doing the eval and parsing by hand instead of letting eval(parse()) do the work. I prefer the latter.
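Here is a minimal sketch comparing the two routes on the example string from the thread. Variable names follow the thread. The strsplit()/as.integer() version is an illustrative variant, not the code from either message. Unlike Michael's version, it converts the endpoints explicitly instead of relying on `:` to coerce character values.

```r
# Example input from the thread (as typed at a readline() prompt)
resp <- "1 2 -5, 3- 6 4 8 5-7 10"

# Normalize as in the original post: collapse runs of blanks, turn
# "<spaces>-<spaces>" into ":", then split on anything but ":" or a digit
x1 <- gsub(" *- *", ":", gsub(" +", " ", resp))
x2 <- strsplit(x1, split = "[^:[:digit:]]+")[[1]]

# Route 1: let the R parser do the work
via_eval <- unlist(lapply(parse(text = x2), eval))

# Route 2: "by hand". Split each token on ":" and convert the
# endpoints explicitly instead of relying on ":" to coerce strings
via_hand <- unlist(lapply(strsplit(x2, ":", fixed = TRUE), function(ends) {
  ends <- as.integer(ends)
  ends[1L]:ends[length(ends)]  # a lone number n becomes n:n
}))

via_eval  # 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
identical(as.integer(via_eval), via_hand)  # TRUE
```

Both routes return the same values. The second never hands the typed input to the R parser.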
However, your code did something that I did not expect and for which I can find no documentation; I would have thought it shouldn't work. Namely, the function in your sapply() returns n1:n2, where n1 and n2 are _character values_ (because that's what gsub() returns, of course). I would have thought this would give an error, but in fact it gives the "correct" result. That is, to my complete surprise:

> "3":"5"
[1] 3 4 5
> seq(from= "3", to= "5")
[1] 3 4 5
> seq.int( "3", "5")
[1] 3 4 5
> "3":5
[1] 3 4 5

all work! Is this behavior documented anywhere and I've missed it, or is it a secret "feature"? And to what extent does it work, noting that:

seq(from="3.5",to="5.5",by="1")
Error in to - from : non-numeric argument to binary operator

Cheers,
Bert

On Fri, Aug 20, 2010 at 4:39 PM, Michael Hannon <jm_han...@yahoo.com> wrote:
>> For regular expression aficionados, I'd like a cleverer solution to
>> the following problem (my solution works just fine for my needs; I'm
>> just trying to improve my regex skills):
>>
>> Given the string (entered, say, at a readline prompt):
>>
>> "1 2 -5, 3- 6 4 8 5-7 10" ## only integers will be entered
>>
>> parse it to produce the numeric vector:
>>
>> c(1, 2, 3, 4, 5, 3, 4, 5, 6, 4, 8, 5, 6, 7, 10)
>>
>> Note that "-" in the expression is used to indicate a range of values
>> instead of ":"
>>
>> Here's my UNclever solution:
>>
>> First convert more than one space to a single space and then replace
>> "<any spaces>-<any spaces>" by ":" by:
>>
>> > x1 <- gsub(" *- *",":",gsub(" +"," ",resp)) #giving
>> > x1
>> [1] "1 2:5, 3:6 4 8 5:7 10" ## Note that the comma remains
>>
>> Next convert the single string into a character vector via strsplit by
>> splitting on anything but ":" or a digit:
>>
>> > x2 <- strsplit(x1,split="[^:[:digit:]]+")[[1]] #giving
>> > x2
>> [1] "1" "2:5" "3:6" "4" "8" "5:7" "10"
>>
>> Finally, parse() the vector, eval() each element, and unlist() the
>> resulting list of numeric vectors:
>>
>> > unlist(lapply(parse(text=x2),eval)) #giving, as desired,
>> [1] 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10
>>
>>
>> This seems far too clumsy and circumlocutory not to have a more
>> elegant solution from a true regex expert.
>>
>> (Special note to Thomas Lumley: This seems one of the few instances
>> where eval(parse(..)) may actually be appropriate.)
>
> Howdy. I don't know that I can produce anything less circumlocutory, but I
> note that your "x2" form has a simple-enough structure that it can be further
> parsed with regular expressions, i.e., as opposed to using parse and eval. I
> don't know that this is an improvement; it's just a variation on the theme.
>
> I've appended an example.
>
> -- Mike
>
> #### Original vector
> x <- "1 2 -5, 3- 6 4 8 5-7 10"; x
>
> #### Convert ranges to standard R form
> x1 <- gsub("[ ]*-[ ]*", ":", x); x1
>
> #### Get rid of the comma
> x2 <- gsub(",", " ", x1); x2
>
> #### Remove extra spaces
> x3 <- gsub("[ ]+", " ", x2); x3
>
> #### Split off elements, now in standard form
> x4 <- unlist(strsplit(x3, " ")); x4
>
> #### Use regular expression for simple parse of elements
> x5 <- sapply(x4, function(a) {
>     n1 <- gsub("([[:digit:]]):[[:digit:]]", "\\1", a)
>     n2 <- gsub("[[:digit:]]:([[:digit:]])", "\\1", a)
>     n1:n2}, USE.NAMES=FALSE); x5
> x6 <- unlist(x5); x6
>
> ##########################################################
>
>> #### Original vector
>> x <- "1 2 -5, 3- 6 4 8 5-7 10"; x
> [1] "1 2 -5, 3- 6 4 8 5-7 10"
>>
>> #### Convert ranges to standard R form
>> x1 <- gsub("[ ]*-[ ]*", ":", x); x1
> [1] "1 2:5, 3:6 4 8 5:7 10"
>>
>> #### Get rid of the comma
>> x2 <- gsub(",", " ", x1); x2
> [1] "1 2:5  3:6 4 8 5:7 10"
>>
>> #### Remove extra spaces
>> x3 <- gsub("[ ]+", " ", x2); x3
> [1] "1 2:5 3:6 4 8 5:7 10"
>>
>> #### Split off elements, now in standard form
>> x4 <- unlist(strsplit(x3, " ")); x4
> [1] "1" "2:5" "3:6" "4" "8" "5:7" "10"
>>
>> #### Use regular expression for simple parse of elements
>> x5 <- sapply(x4, function(a) {
> +     n1 <- gsub("([[:digit:]]):[[:digit:]]", "\\1", a)
> +     n2 <- gsub("[[:digit:]]:([[:digit:]])", "\\1", a)
> +     n1:n2}, USE.NAMES=FALSE); x5
> [[1]]
> [1] 1
>
> [[2]]
> [1] 2 3 4 5
>
> [[3]]
> [1] 3 4 5 6
>
> [[4]]
> [1] 4
>
> [[5]]
> [1] 8
>
> [[6]]
> [1] 5 6 7
>
> [[7]]
> [1] 10
>
>> x6 <- unlist(x5); x6
> [1] 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10

--
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
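A caveat about the regex route in Michael's sapply(): each pattern allows only a single [[:digit:]] on either side of ":". That is enough for the example input, but it misparses a range with multi-digit endpoints. For instance, "10-12" becomes "10:12", whose endpoints come back as "102" and "112".

The sketch below shows the problem and one fix, the "+" quantifier. It again converts the endpoints explicitly with as.integer() rather than relying on `:` coercing character values. The helper names endpoints_single(), endpoints_multi() and expand_token() are hypothetical, for illustration only.

```r
# Michael's patterns as in the thread: one [[:digit:]] on each side of ":"
endpoints_single <- function(a) {
  c(gsub("([[:digit:]]):[[:digit:]]", "\\1", a),
    gsub("[[:digit:]]:([[:digit:]])", "\\1", a))
}

# Same idea with "+" so each endpoint may have several digits
endpoints_multi <- function(a) {
  c(sub("([[:digit:]]+):[[:digit:]]+", "\\1", a),
    sub("[[:digit:]]+:([[:digit:]]+)", "\\1", a))
}

endpoints_single("10:12")  # returns c("102", "112"), i.e. the range 102:112
endpoints_multi("10:12")   # returns c("10", "12")

# Expand one token, converting the character endpoints explicitly
expand_token <- function(a) {
  ends <- as.integer(endpoints_multi(a))
  ends[1L]:ends[2L]
}

x4 <- c("1", "2:5", "3:6", "4", "8", "5:7", "10:12")
x6 <- unlist(lapply(x4, expand_token))
x6  # 1 2 3 4 5 3 4 5 6 4 8 5 6 7 10 11 12
```

Because sub() leaves a token without ":" unchanged, a single number such as "4" comes back as c("4", "4") and expands to 4:4.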