On Fri, Jul 16, 2010 at 1:59 PM, Davis, Brian <brian.da...@uth.tmc.edu> wrote:
> I have a text processing problem I'm hoping someone can help me solve.
> The issue is this.
>
> I have a character string from which I need to delete a variable number
> of characters. The string itself contains the number of characters to be
> deleted. That number is preceded by either a "+" or a "-".
>
> A toy example:
>
> Suppose I have
>
> x<-c("A-1CB-2GHX", "*+11gAgggTgtgggH")
>> x
> [1] "A-1CB-2GHX" "*+11gAgggTgtgggH"
>
> What I need as output is
> "ABX" "*H"
>
> I know I can use gsub to remove the control character and the number
> portion with
>
> gsub("(\\-|\\+)([0-9]+)", replacement="", x)
>
> However, I can't figure out how to delete the variable number of
> characters after the number portion of the string.
>
Using gsubfn in the gsubfn package we match

- the - or + via [-+],
- the digits via \\d+ and
- the remaining characters via [^-+]*

parenthesizing the digits and the remaining characters so that they form
back references, which are passed to the function as arguments 1 and 2
respectively. gsubfn supports a formula notation for functions. The
function specified here in that notation has arguments d and s, and its
body strips the first d characters from s and returns the rest to be
substituted back in:

> library(gsubfn)
> gsubfn("[-+](\\d+)([^-+]*)", d + s ~ substring(s, as.numeric(d) + 1), x)
[1] "ABX" "*H"

See http://gsubfn.googlecode.com for more.
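
For comparison, here is a rough base R sketch of the same idea, for cases where installing gsubfn is not an option. It assumes an R version that provides regmatches<-, and the helper name strip_counted is made up for this illustration (it is not part of gsubfn or base R). It uses the same three-part pattern: find every match, split each match into its digits and its trailing characters, and put back only what is left after dropping the counted characters.

```r
# Hypothetical helper (not from gsubfn): base R version of the same match.
strip_counted <- function(s) {
  # sign, then digits (group 1), then everything up to the next sign (group 2)
  pattern <- "[-+]([0-9]+)([^-+]*)"
  m <- gregexpr(pattern, s)
  # Replace each match by its trailing text minus the first n characters
  regmatches(s, m) <- lapply(regmatches(s, m), function(hits) {
    n    <- as.numeric(sub(pattern, "\\1", hits))  # number of chars to drop
    rest <- sub(pattern, "\\2", hits)              # chars following the number
    substring(rest, n + 1)
  })
  s
}

x <- c("A-1CB-2GHX", "*+11gAgggTgtgggH")
strip_counted(x)
# [1] "ABX" "*H"
```

The gsubfn one-liner is shorter because it hands the back references to the function directly; the sketch above has to re-extract them with sub for each match.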