Hi: This is a bit of a roundabout approach; I'm sure that folks with regex expertise will trump this in a heartbeat. I modified the last piece of the string a bit to accommodate the approach below. Depending on where the strings have line breaks, you may have some odd '\n' characters inserted.
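One way to avoid the stray '\n' characters is to tidy up the line breaks before doing any splitting. The following is only a minimal sketch under stated assumptions: the `lines` vector is a shortened, hand-typed stand-in for what readLines() might return from the file, and a trailing '+' is assumed to always mean that a line was cut in the middle of a word. That assumption may not hold everywhere (for instance, a '+' that belongs to the data, as in '+H', would be lost if it fell at the end of a line).

```r
# Hypothetical input: wrapped lines as they might come back from readLines().
lines <- c("SpeciesCommon=(Human);SpeciesScientific=(Homo",
           "sapiens);CatalyticSwissProt=(P25006);Sp+",
           "eciesScientific=(Achromobacter",
           "cycloclastes);SpeciesCommon=(Bacteria)")

# Assumption: a trailing '+' marks a line cut mid-word, so the next line is
# glued on directly; every other line break stands for a single space.
cut_mid_word <- grepl("\\+$", lines)
pieces <- sub("\\+$", "", lines)

sep <- ifelse(cut_mid_word, "", " ")
sep[length(sep)] <- ""   # nothing follows the last line

# Paste each piece to its separator, then collapse into one string.
u_clean <- paste0(pieces, sep, collapse = "")
u_clean
```

The resulting single string contains no line breaks, so it can be split on ';' and '=' without picking up '\n' in the values.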
# Step 1: read the input as a single character string
u <- "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond=(255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);SpeciesScientific=(Achromobacter
cycloclastes);SpeciesCommon=(Bacteria);Reactive=(Ce+)"

# Step 2: Split input lines by the ';' delimiter and then use lapply() to
# split variable names from values.
# This results in a nested list for ulist2.
ulist <- strsplit(u, ';')
ulist2 <- lapply(ulist, function(s) strsplit(s, '='))

# Step 3: Break out the results into a matrix whose first column is the
# variable name and whose second column is the value (with parens included)
# This avoids dealing with nested lists
v <- matrix(unlist(ulist2), ncol = 2, byrow = TRUE)

# Step 4: Strip off the parens
w <- apply(v, 2, function(s) gsub('([\\(\\)])', '', s))
colnames(w) <- c('Name', 'Value')
w

      Name                 Value
 [1,] "SpeciesCommon"      "Human"
 [2,] "SpeciesScientific"  "Homo sapiens"
 [3,] "ReactiveCentres"    "N,C,C,C,+H,O,C,C,C,C,O,H"
 [4,] "BondInvolved"       "C-H"
 [5,] "EzCatDBID"          "S00343"
 [6,] "BondFormed"         "O-H,O-H"
 [7,] "Bond"               "255B"
 [8,] "Cofactors"          "CuII,CU,501,A,CuII,CU,502,A"
 [9,] "CatalyticSwissProt" "P25006"
[10,] "SpeciesScientific"  "Achromobacter\ncycloclastes"
[11,] "SpeciesCommon"      "Bacteria"
[12,] "Reactive"           "Ce+"

# Step 5: Subset out the values of the SpeciesScientific variables
subset(as.data.frame(w), Name == 'SpeciesScientific', select = 'Value')

                         Value
2                 Homo sapiens
10 Achromobacter\ncycloclastes

One possible 'advantage' of this approach is that if you have a number of string records of this type, you can create nested lists for each string and then manipulate the lists to get what you need. Hopefully you can use some of these ideas for other purposes as well.

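For comparison, here is a rough sketch of a more direct regex route. It is not a tested replacement for the steps above. `get_values()` is a hypothetical helper written for illustration, and it assumes that values never contain ';' and that the variable name has no regex metacharacters. It also strips only the outer pair of parentheses, so 'Cu(II)' is kept intact, whereas the gsub() in Step 4 removes every parenthesis (hence 'CuII' in the printed matrix).

```r
# Shortened, hand-typed example string with an embedded line break.
u <- "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);Cofactors=(Cu(II),CU,501,A);SpeciesScientific=(Achromobacter\ncycloclastes);SpeciesCommon=(Bacteria)"

# Hypothetical helper: return the value(s) stored under `name` in a
# 'Name=(value);Name=(value)' string.
get_values <- function(x, name) {
  # Turn any run of whitespace (including line breaks) into one space.
  x <- gsub("[[:space:]]+", " ", x)
  # A field starts at the beginning of the string or right after a ';'
  # and runs up to the last ')' before the next ';'.
  pattern <- paste0("(^|;)", name, "=\\([^;]*\\)")
  hits <- regmatches(x, gregexpr(pattern, x))[[1]]
  # Drop everything up to '=(' and the final ')', keeping inner parens.
  sub("^[^=]*=\\((.*)\\)$", "\\1", hits)
}

species <- get_values(u, "SpeciesScientific")
species     # "Homo sapiens" and "Achromobacter cycloclastes"

cofactors <- get_values(u, "Cofactors")
cofactors   # "Cu(II),CU,501,A"
```

A name that does not occur in the string gives an empty character vector rather than an error.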
Dennis

On Wed, Apr 20, 2011 at 10:17 AM, Neeti <nikkiha...@gmail.com> wrote:
> Hi ALL,
>
> I have very simple question regarding pattern matching. Could anyone tell me
> how to I can use R to retrieve string pattern from text file. for example
> my file contain following information
>
> SpeciesCommon=(Human);SpeciesScientific=(Homo
> sapiens);ReactiveCentres=(N,C,C,C,+
> H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+
> 255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+
> eciesScientific=(Achromobacter
> cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+
>
> and I want to extract “SpeciesScientific = (?)” information from this file.
> Problem is in 3rd line where SpeciesScientific word is divided with +.
>
> Could anyone help me please?
> Thank you
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Pattern-match-tp3463625p3463625.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>