Thank you for your message. Please see the attached file for a template/test dataset taken from my file.
On Thu, Apr 21, 2011 at 1:30 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Apr 21, 2011, at 5:27 AM, neetika nath wrote:
>
>> Thank you Dennis,
>>
>> yes the problem is the input file. i have .rdf file and the format is in
>> same way i have posted earlier. if i open that file in notepad++ the lines
>> are divided or broken with CR+LF character. so any suggestion to retrieve
>> SpeciesScientific information without changing the input file?
>>
>
> You might consider attaching the original file named with an extension of
> `.txt`, since your verbal description does not match your included example.
> What I see after the various servers have passed this around and inserted
> line-ends is the string `SpeciesScientific` in the first line, rather than
> in the third.
>
> --
> David
>
> --
>
>> Thank you
>>
>> On Wed, Apr 20, 2011 at 9:49 PM, Dennis Murphy <djmu...@gmail.com> wrote:
>>
>>> Hi:
>>>
>>> This is a bit of a roundabout approach; I'm sure that folks with regex
>>> expertise will trump this in a heartbeat. I modified the last piece of
>>> the string a bit to accommodate the approach below. Depending on where
>>> the strings have line breaks, you may have some odd '\n' characters
>>> inserted.
>>>
>>> # Step 1: read the input as a single character string
>>> u <- "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond=(255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);SpeciesScientific=(Achromobacter
>>> cycloclastes);SpeciesCommon=(Bacteria);Reactive=(Ce+)"
>>>
>>> # Step 2: Split input lines by the ';' delimiter and then use lapply()
>>> #         to split variable names from values.
>>> #         This results in a nested list for ulist2.
>>> ulist <- strsplit(u, ';')
>>> ulist2 <- lapply(ulist, function(s) strsplit(s, '='))
>>>
>>> # Step 3: Break out the results into a matrix whose first column is
>>> #         the variable name and whose second column is the value
>>> #         (with parens included)
>>> #         This avoids dealing with nested lists
>>> v <- matrix(unlist(ulist2), ncol = 2, byrow = TRUE)
>>>
>>> # Step 4: Strip off the parens
>>> w <- apply(v, 2, function(s) gsub('([\\(\\)])', '', s))
>>> colnames(w) <- c('Name', 'Value')
>>> w
>>>       Name                 Value
>>>  [1,] "SpeciesCommon"      "Human"
>>>  [2,] "SpeciesScientific"  "Homo sapiens"
>>>  [3,] "ReactiveCentres"    "N,C,C,C,+H,O,C,C,C,C,O,H"
>>>  [4,] "BondInvolved"       "C-H"
>>>  [5,] "EzCatDBID"          "S00343"
>>>  [6,] "BondFormed"         "O-H,O-H"
>>>  [7,] "Bond"               "255B"
>>>  [8,] "Cofactors"          "CuII,CU,501,A,CuII,CU,502,A"
>>>  [9,] "CatalyticSwissProt" "P25006"
>>> [10,] "SpeciesScientific"  "Achromobacter\ncycloclastes"
>>> [11,] "SpeciesCommon"      "Bacteria"
>>> [12,] "Reactive"           "Ce+"
>>>
>>> # Step 5: Subset out the values of the SpeciesScientific variables
>>> subset(as.data.frame(w), Name == 'SpeciesScientific', select = 'Value')
>>>                          Value
>>> 2                 Homo sapiens
>>> 10 Achromobacter\ncycloclastes
>>>
>>> One possible 'advantage' of this approach is that if you have a number
>>> of string records of this type, you can create nested lists for each
>>> string and then manipulate the lists to get what you need. Hopefully
>>> you can use some of these ideas for other purposes as well.
>>>
>>> Dennis
>>>
>>> On Wed, Apr 20, 2011 at 10:17 AM, Neeti <nikkiha...@gmail.com> wrote:
>>>
>>>> Hi ALL,
>>>>
>>>> I have very simple question regarding pattern matching. Could anyone tell me
>>>> how to I can use R to retrieve string pattern from text file. for example
>>>> my file contain following information
>>>>
>>>> SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+
>>>> H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+
>>>> 255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+
>>>> eciesScientific=(Achromobacter cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+
>>>>
>>>> and I want to extract “SpeciesScientific = (?)” information from this file.
>>>> Problem is in 3rd line where SpeciesScientific word is divided with +.
>>>>
>>>> Could anyone help me please?
>>>> Thank you
>>>>
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/Pattern-match-tp3463625p3463625.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>>         [[alternative HTML version deleted]]
>>
>
> David Winsemius, MD
> West Hartford, CT
>
--
$DTYPE ROOT:OVERALL REACTION(1):OVERALL REACTION ANNOTATION
lyticCATH=(3.40.50.360);BondOrderChanged=(C-N,1,C=N,2,C=C,2,C-C,1,C-C,1,C=C,2,C-+
C,1,C=C,2,C=C,2,C-C,1,C-C,1,C=C,2,C=O,2,C-O,1,C=O,2,C-O,1);CatalyticResidues=(Gl+
y149A,Tyr155A,His161A);Cofactors=(FAD,FAD,601,none);CatalyticSwissProt=(P15559);+
SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+
H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+
--
$DTYPE ROOT:OVERALL REACTION(1):OVERALL REACTION ANNOTATION
$DATUM CatalyticCATH=(2.60.40.420);CatalyticResidues=(Asp98A,His135A,Cys136A,His+
255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+
eciesScientific=(Achromobacter cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+
ntres=(N,O,H,Cu);BondFormed=(O-H);BondCleaved=(O-N);PreviousEC=(1.7.99.3,1.9.3.2+
);Return=(Yes);CreatedBy=(GLH,GJB,DEA);DLU=(24102008);MID=(M0004);KEGG=(R00785).
--
$DTYPE ROOT:OVERALL REACTION(1):OVERALL REACTION ANNOTATION
$DATUM OverallComment=(The reference states that this mechanism was elucidated a+
t low pH. This enzyme specifically removes basic or hydrophobic amino acid resid+
ues from the C-terminus of the peptide substrate.);CatalyticCATH=(3.40.50.1820);+
CatalyticResidues=(Gly53A,Ser146A,Tyr147A,Asp338B,His397B);CatalyticSwissProt=(P+
08819);SpeciesCommon=(Wheat);SpeciesScientific=(Triticum aestivum);ReactiveCentr+
es=(N,H,O,C);EzCatDBID=(S00374);BondFormed=(N-H,C-O);BondCleaved=(C-N,O-H);Retur+
n=(Yes);DLU=(24102008);MID=(M0005);CreatedBy=(GLH,GJB,DEA).
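
For records laid out like the ones above, here is a minimal sketch of one way to pull out the SpeciesScientific values without editing the input file. It rests on assumptions that should be checked against the real data: a trailing `+` on a line only ever marks a continuation, and species names never contain a `)`. The function name `extract_species` and the file name in the comment are made up for illustration. The idea is to strip the continuation marker, glue the lines back together so that a key split as `Sp+` / `eciesScientific` becomes whole again, and only then match the pattern.

```r
# Sample lines copied from the records above. With a real file one would
# use something like: lines <- readLines("reactions.rdf")  # hypothetical name
lines <- c(
  "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+",
  "H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+",
  "$DTYPE ROOT:OVERALL REACTION(1):OVERALL REACTION ANNOTATION",
  "$DATUM CatalyticCATH=(2.60.40.420);CatalyticResidues=(Asp98A,His135A,Cys136A,His+",
  "255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+",
  "eciesScientific=(Achromobacter cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+",
  "ntres=(N,O,H,Cu);BondFormed=(O-H);BondCleaved=(O-N);PreviousEC=(1.7.99.3,1.9.3.2+",
  ");Return=(Yes);CreatedBy=(GLH,GJB,DEA);DLU=(24102008);MID=(M0004);KEGG=(R00785)."
)

extract_species <- function(lines) {
  # Drop a stray carriage return, in case CR+LF line ends were not removed
  lines <- sub("\r$", "", lines)
  # Assumption: a '+' at the end of a line is only a continuation marker.
  # Remove it and paste all lines together with no separator.
  txt <- paste(sub("\\+$", "", lines), collapse = "")
  # Find every SpeciesScientific=(...) field (assumes no ')' inside the name)
  hits <- regmatches(txt, gregexpr("SpeciesScientific=\\([^)]*\\)", txt))[[1]]
  # Keep only the text between the parentheses
  sub("^SpeciesScientific=\\((.*)\\)$", "\\1", hits)
}

species <- extract_species(lines)
print(species)
```

Because everything is pasted into one string, the result is a flat vector of names in file order. If the names need to be tied to individual records, the lines could first be grouped by the `$DTYPE` headers and the function applied to each group.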