[R] Regular Expression

Steven Kang Sun, 08 Aug 2010 18:44:29 -0700

Hi all,

From a list of strings, I would like to remove the following:
1. Digits at the beginning of each string
2. The characters "SPE" following the digits (if present)
3. The hyphen and any characters following it


The following produces the desired result, but I would like to know whether
it can be done more efficiently.

Any suggestions would be much appreciated.


dat <- c("2148 SPE MAR - CCC", "9843 SPE ANN - BBB", "56748 LIF - AA",
         "3489 SPE GEN - CC", "4752473 MAR - AA", "980843 SPE PEN - CC")
> dat
[1] "2148 SPE MAR - CCC"  "9843 SPE ANN - BBB"  "56748 LIF - AA"      "3489 SPE GEN - CC"   "4752473 MAR - AA"    "980843 SPE PEN - CC"

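# 1. drop the leading digits and the blank that follows them
dd <- sub(pattern = "^[0-9]+[[:blank:]]", "", dat)
# 2. drop "SPE " where it occurs
dd <- sub(pattern = "SPE ", "", dd)
# 3. keep the text up to (not including) the space before the first hyphen
dd <- substr(x = dd, start = 1, stop = regexpr("-", dd) - 2)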
> dd
[1] "MAR" "ANN" "LIF" "GEN" "MAR" "PEN"
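
For comparison, the three steps above could perhaps be collapsed into a single substitution. This is only a sketch: it assumes every string has the form digits, a blank, an optional "SPE ", the code to keep, and then " -" followed by the rest. The names pat and dd2 are made up for illustration, and perl = TRUE is used so that the non-greedy ".*?" stops at the first " -".

```r
dat <- c("2148 SPE MAR - CCC", "9843 SPE ANN - BBB", "56748 LIF - AA",
         "3489 SPE GEN - CC", "4752473 MAR - AA", "980843 SPE PEN - CC")

# Assumed layout: <digits><blank>[SPE ]<code> - <rest>
# Group 1 is the optional "SPE "; group 2 is the part to keep.
pat <- "^[0-9]+[[:blank:]](SPE )?(.*?) -.*$"

# Replace the whole string with the second capture group.
dd2 <- sub(pat, "\\2", dat, perl = TRUE)
dd2
```

On the data above this gives the same six values as dd. Whether a single call is actually faster than three would need timing on a realistic vector.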


-- 
Steven


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
