You could also try:

    library(gsubfn)
    # Drop numbers followed by "<" or preceded by ">", then convert each
    # remaining run of digits to numeric and combine the results with c().
    strapply(gsub("\\d+<|>\\d+", "", vec1), "([0-9]+)", as.numeric, simplify = c)

A.K.

On Thursday, February 6, 2014 1:55 PM, arun <smartpink...@yahoo.com> wrote:

Hi,

One way would be:

    vec1 <- c("CDS 3300..4037",
              "CDS complement(3300..4037)",
              "CDS 3300<..4037",
              "CDS join(21467..26641,27577..28890)",
              "CDS complement(join(30708..31700,31931..31984))",
              "CDS 3300<..>4037")
    library(stringr)
    # Inner gsub() removes the flagged numbers; outer gsub() turns every run
    # of non-digits into a single space so the rest can be split on " ".
    as.numeric(unlist(strsplit(str_trim(gsub("\\D+", " ", gsub("\\d+<|>\\d+", "", vec1))), " ")))
    #  [1]  3300  4037  3300  4037  4037 21467 26641 27577 28890 30708 31700 31931
    # [13] 31984

A.K.

Hi,

I have been using R for the past 1.5 years and have usually found topics relatively easy to learn on my own, but I am finding the learning curve with regular expressions a little steep, especially since I haven't found any good tutorials. While I intend to spend more time systematically learning the proper way to write regular expressions, I have a project that is coming due and can't wait for that, so I was hoping to get some direct help.

I need to extract all the numbers in lines with the following formats:

"CDS 3300..4037"
or
"CDS complement(3300..4037)"
or
"CDS join(21467..26641,27577..28890)"
or
"CDS complement(join(30708..31700,31931..31984))"

but not if any of the numbers are preceded by "<" or followed by ">".

Many thanks in advance!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
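For comparison, here is a minimal base R sketch of the same idea that needs neither gsubfn nor stringr. It is not from the thread: the helper name extract_numbers and the two result names are made up for illustration. The pattern "\\d+<|>\\d+" is copied from the replies and is tied to their sample data, where "<" follows a number and ">" precedes one; the question words it the other way round, so it may need adjusting for real input. The second variant assumes a stricter reading of the question, in which any line containing "<" or ">" is skipped entirely.

```r
# Sample input copied from the thread.
vec1 <- c("CDS 3300..4037",
          "CDS complement(3300..4037)",
          "CDS 3300<..4037",
          "CDS join(21467..26641,27577..28890)",
          "CDS complement(join(30708..31700,31931..31984))",
          "CDS 3300<..>4037")

# Hypothetical helper: return every run of digits in x as one numeric vector.
# gregexpr() locates the matches and regmatches() pulls them out as a list.
extract_numbers <- function(x) {
  as.numeric(unlist(regmatches(x, gregexpr("[0-9]+", x))))
}

# Reading 1 (what the replies do): remove only the numbers that touch
# "<" or ">", then keep whatever numbers remain on the line.
cleaned <- gsub("\\d+<|>\\d+", "", vec1)
nums_drop_flagged <- extract_numbers(cleaned)

# Reading 2 (an assumption about the question): skip any line that
# contains "<" or ">" at all, wherever the symbol sits.
nums_drop_lines <- extract_numbers(vec1[!grepl("[<>]", vec1)])

print(nums_drop_flagged)
print(nums_drop_lines)
```

The first reading keeps the 4037 from "CDS 3300<..4037", as in the output posted in the thread; the second drops that line completely. Which one is right depends on what the flagged coordinates mean in your data.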