This does not answer your question directly, but note that strapply in the gsubfn package can be used to select strings by content:
> library(gsubfn)
> (x <- strapply(txt, "Variation_....", simplify = c))
[1] "Variation_0001" "Variation_5452" "Variation_4192" "Variation_4193"
[5] "Variation_8246" "Variation_8246"
> paste(x, collapse = ";")
[1] "Variation_0001;Variation_5452;Variation_4192;Variation_4193;Variation_8246;Variation_8246"

On Feb 2, 2008 1:40 PM, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
> That actually reminds me of a problem I had to tackle a while ago.
>
> Say I have the following:
>
> txt <- c("Variation_0001 // chr1:1083805-1283805 // Array CGH //
> 15286789 // Iafrate et al. (2004) // CopyNumber /// Variation_5452 //
> chr1:1142956-1147823 // Computational mapping of resequencing
> traces // 16902084 // Mills et al. (2006) // CopyNumber",
> "Variation_4192 // chr1:2062347-2242269 // Array CGH // 17160897 //
> Wong et al. (2007) // CopyNumber /// Variation_4193 //
> chr1:2145626-2314237 // Array CGH // 17160897 // Wong et al. (2007) //
> CopyNumber /// Variation_8246 // chr1:2224111-3755284 // Affymetrix
> 500K and 100K SNP Mapping Arrays // 17638019 // Zogopoulos et al.
> (2007) // CopyNumber", "Variation_8246 // chr1:2224111-3755284 //
> Affymetrix 500K and 100K SNP Mapping Arrays // 17638019 // Zogopoulos
> et al. (2007) // CopyNumber")
>
> For each record, I'm interested in keeping the following:
>
> results <- c("Variation_0001;Variation_5452",
>              "Variation_4192;Variation_4193;Variation_8246", "Variation_8246")
>
> My solution was:
>
> theNames <- function(tmp)
>   sapply(strsplit(tmp, " /+ "),
>          function(y)
>            paste(y[grep("Variation_", y)],
>                  collapse=";"))
>
> But my wish was to know the regular expression that I needed to select
> everything but "Variation_\\d+"... For example, something like:
>
> gsub( NOT "Variation_\\d+", ";", txt, perl=TRUE)
>
> Suggestions?
>
> b
>
> On Feb 2, 2008, at 1:03 PM, Peter Dalgaard wrote:
>
> > Benilton Carvalho wrote:
> >> help("strsplit")
> >> b
> >>
> > Yes, but...
> >
> > The postprocessing gets a bit awkward.
> > It might be easier to use
> > sub() to get rid of the first/last bit of the string i.e.
> >
> > C2 <- sub("^.*:", "", Col)
> > C1 <- sub(":.*$", "", Col)
> >
> > An orthogonal idea is
> >
> > con <- textConnection(as.character(Col))
> > read.table(con, sep=":")
> > close(con)
> >
> >> On Feb 2, 2008, at 12:43 PM, joseph wrote:
> >>
> >>> Hello
> >>>
> >>> I have a data frame and one of its columns is as follows:
> >>>
> >>> Col
> >>> chr1:71310034
> >>> chr15:37759058
> >>> chr22:18262638
> >>> chrUn:31337214
> >>> chr10_random:4369261
> >>> chrUn:3545097
> >>>
> >>> I would like to get rid of colon (:) and replace this column
> >>> with two new columns containing the terms on each side of the
> >>> colon. The new columns should look as follows:
> >>>
> >>> Col_a         Col_b
> >>> chr1          71310034
> >>> chr14         23354088
> >>> chr15         37759058
> >>> chr22         18262638
> >>> chrUn         31337214
> >>> chr10_random  4369261
> >>> chrUn         3545097
> >>>
> >>> Any help will be much appreciated
> >>>
> >>> Joseph
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > --
> >    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
> > ~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
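
P.S. The strapply call at the top pools the matches from all records into one vector, whereas the quoted message asks for one string per record. A minimal base-R sketch of the per-record version follows. It swaps in regmatches() and gregexpr() (base R, but only in versions newer than this thread), and the txt below is a shortened stand-in for the vector in the quoted message, so treat it as an illustration rather than a tested drop-in:

```r
# Shortened stand-in for the txt vector in the quoted message (assumption:
# only the "Variation_<digits>" tokens matter for this illustration).
txt <- c(
  "Variation_0001 // chr1:1083805-1283805 // Array CGH /// Variation_5452 // chr1:1142956-1147823",
  "Variation_4192 // chr1:2062347-2242269 /// Variation_4193 // chr1:2145626-2314237 /// Variation_8246 // chr1:2224111-3755284",
  "Variation_8246 // chr1:2224111-3755284"
)

# gregexpr() locates every match within each element of txt;
# regmatches() extracts them, giving one character vector per record.
hits <- regmatches(txt, gregexpr("Variation_[0-9]+", txt))

# Collapse each record's matches into a single ";"-separated string.
res <- vapply(hits, paste, character(1), collapse = ";")
res
```

Selecting what you want to keep and pasting it back together sidesteps the "everything but" regular expression asked about in the quoted message.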