On Fri, Jul 16, 2010 at 1:59 PM, Davis, Brian <brian.da...@uth.tmc.edu> wrote:
> I have a text processing problem I'm hoping someone can help me solve.  The 
> issue is this.
>
>  I have a character string in which I need to delete a variable number of 
> characters from the string.  The string itself contains the number of 
> characters to be deleted.  The number of characters to be deleted is 
> preceded by either a "+" or a "-".
>
> A toy example:
>
> Suppose I have
>
> x<-c("A-1CB-2GHX", "*+11gAgggTgtgggH")
>> x
> [1] "A-1CB-2GHX"       "*+11gAgggTgtgggH"
>
> What I need as output is
> "ABX" "*H"
>
> I know I can use gsub to remove the control character and the number portion 
> with
>
> gsub("(\\-|\\+)([0-9]+)", replacement="", x)
>
> However, I can't figure out how to delete the variable number of characters 
> after the number portion of the string.
>

Using gsubfn in the gsubfn package, we match

- the - or + via [-+],
- the digits via \\d+ and
- the run of characters up to the next - or + via [^-+]*

The digits and the trailing characters are parenthesized so that they
form back references, which gsubfn passes to the function as arguments
1 and 2 respectively.  gsubfn also accepts a formula in place of a
function: the left-hand side names the arguments, d and s, and the
right-hand side is the function body, which drops the first d
characters of s and returns the rest to be substituted back in.  Note
that this assumes the characters to be deleted never include a - or +:

   > library(gsubfn)
   > gsubfn("[-+](\\d+)([^-+]*)", d + s ~ substring(s, as.numeric(d) + 1), x)
   [1] "ABX" "*H"

See http://gsubfn.googlecode.com for more.
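
If installing gsubfn is not an option, the same match-then-trim idea can
be sketched in base R.  This is only an illustration and not part of
gsubfn: delete_counted is a hypothetical helper name, and, like the
pattern above, it assumes the characters to be deleted never contain a
- or +.

```r
# Hypothetical helper: a base-R sketch of the gsubfn call above.
delete_counted <- function(x) {
  # sign, then digits (group 1), then the run up to the next sign (group 2)
  pat <- "[-+]([0-9]+)([^-+]*)"
  m <- gregexpr(pat, x)
  # For every full match, drop the first d characters of the trailing run
  regmatches(x, m) <- lapply(regmatches(x, m), function(p) {
    d <- as.numeric(sub(pat, "\\1", p))  # how many characters to delete
    s <- sub(pat, "\\2", p)              # characters following the number
    substring(s, d + 1)                  # what is left after deletion
  })
  x
}

x <- c("A-1CB-2GHX", "*+11gAgggTgtgggH")
delete_counted(x)
```

On the toy vector from the question this gives the same "ABX" "*H" as
the gsubfn call; strings with no control sequence are returned
unchanged.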

