Re: [R] Pattern match

David Winsemius Thu, 21 Apr 2011 05:31:17 -0700


On Apr 21, 2011, at 5:27 AM, neetika nath wrote:

Thank you Dennis,
yes, the problem is the input file. I have an .rdf file, and the format is the same as I posted earlier. If I open that file in Notepad++, the lines are divided (broken) with CR+LF characters. Any suggestion on how to retrieve the
SpeciesScientific information without changing the input file?

You might consider attaching the original file, named with an extension of `.txt`, since your verbal description does not match your included example. What I see, after the various servers have passed this around and inserted line-ends, is the string `SpeciesScientific` in the first line rather than in the third.
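For the follow-up question (reading the .rdf file as-is), here is a minimal sketch in R. It assumes, and this is only a guess from the excerpt posted in this thread, that a line ending in "+" was hard-wrapped inside a word and should be joined to the next line with nothing in between, while any other line break stands for a space. `join_wrapped()` and `extract_field()` are hypothetical helper names, and the temporary file only stands in for the real input, which is read but never modified.

```r
# Hypothetical helper: rejoin physically wrapped lines into one record string.
# ASSUMPTION: a line ending in "+" was hard-wrapped inside a token, so the "+"
# is dropped and the next line is appended directly; any other line break is
# treated as a single space (as in "Homo" / "sapiens").
join_wrapped <- function(lines) {
  wrapped <- grepl("\\+$", lines)        # lines ending in the wrap marker
  body <- sub("\\+$", "", lines)         # remove the marker itself
  sep <- ifelse(wrapped, "", " ")        # what goes after each line
  sep[length(sep)] <- ""                 # nothing after the final line
  paste0(body, sep, collapse = "")
}

# Hypothetical helper: every value of `field` in a "Name=(value);..." string.
# The "(^|;)" prefix makes sure the whole field name is matched.
extract_field <- function(x, field) {
  pat <- paste0("(^|;)", field, "=\\(([^)]*)\\)")
  hits <- regmatches(x, gregexpr(pat, x))[[1]]
  sub(pat, "\\2", hits)
}

# Demo: write the sample from this thread to a temporary file with CR+LF
# line endings (a stand-in for the real .rdf file).
sample_lines <- c(
  "SpeciesCommon=(Human);SpeciesScientific=(Homo",
  "sapiens);ReactiveCentres=(N,C,C,C,+",
  "H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+",
  "255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+",
  "eciesScientific=(Achromobacter",
  "cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+"
)
path <- tempfile(fileext = ".rdf")
con <- file(path, "wb")                  # binary mode: write "\r\n" verbatim
writeLines(sample_lines, con, sep = "\r\n")
close(con)

# readLines() accepts LF, CRLF or CR as the end-of-line marker,
# so the file does not need to be converted first.
record <- join_wrapped(readLines(path))
species <- extract_field(record, "SpeciesScientific")
print(species)
```

Under this rule the "+" at the end of "(N,C,C,C,+" is dropped as well, whereas Dennis's reconstruction keeps it as "+H"; if that "+" is real data, the wrap rule needs adjusting.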


--
David

--


Thank you

On Wed, Apr 20, 2011 at 9:49 PM, Dennis Murphy <djmu...@gmail.com> wrote:

Hi:

This is a bit of a roundabout approach; I'm sure that folks with regex expertise will trump this in a heartbeat. I modified the last piece of the string a bit to accommodate the approach below. Depending on where the strings have line breaks, you may have some odd '\n' characters inserted.

# Step 1: read the input as a single character string
# (the line break in "Achromobacter\ncycloclastes" is kept on purpose;
# it mimics a wrapped line in the input)
u <- paste0(
  "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);",
  "ReactiveCentres=(N,C,C,C,+H,O,C,C,C,C,O,H);BondInvolved=(C-H);",
  "EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond=(255B);",
  "Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);",
  "SpeciesScientific=(Achromobacter\ncycloclastes);",
  "SpeciesCommon=(Bacteria);Reactive=(Ce+)"
)

# Step 2: Split the input by the ';' delimiter, then use lapply()
# to split variable names from values.
# This results in a nested list for ulist2.
ulist <- strsplit(u, ';')
ulist2 <- lapply(ulist, function(s) strsplit(s, '='))

# Step 3: Break out the results into a matrix whose first column is
# the variable name and whose second column is the value (with parens
# included). This avoids dealing with nested lists.
v <- matrix(unlist(ulist2), ncol = 2, byrow = TRUE)

# Step 4: Strip off the parens
w <- apply(v, 2, function(s) gsub('([\\(\\)])', '', s))
colnames(w) <- c('Name', 'Value')
w
      Name                 Value
 [1,] "SpeciesCommon"      "Human"
 [2,] "SpeciesScientific"  "Homo sapiens"
 [3,] "ReactiveCentres"    "N,C,C,C,+H,O,C,C,C,C,O,H"
 [4,] "BondInvolved"       "C-H"
 [5,] "EzCatDBID"          "S00343"
 [6,] "BondFormed"         "O-H,O-H"
 [7,] "Bond"               "255B"
 [8,] "Cofactors"          "CuII,CU,501,A,CuII,CU,502,A"
 [9,] "CatalyticSwissProt" "P25006"
[10,] "SpeciesScientific"  "Achromobacter\ncycloclastes"
[11,] "SpeciesCommon"      "Bacteria"
[12,] "Reactive"           "Ce+"
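Step 4's gsub() deletes every parenthesis, which is why row 8 reads "CuII" instead of "Cu(II)". If the inner parentheses matter, a minimal sketch (the name `strip_outer()` is hypothetical) is to remove only a leading "(" and a trailing ")":

```r
# Sketch: drop only the enclosing "(" ... ")" around each value, so inner
# parentheses such as Cu(II) survive. Values without them are left alone.
strip_outer <- function(s) sub("\\)$", "", sub("^\\(", "", s))

vals <- c("(Human)",
          "(Cu(II),CU,501,A,Cu(II),CU,502,A)",
          "(Achromobacter\ncycloclastes)")
strip_outer(vals)
```

Applied to the Value column, e.g. `strip_outer(v[, 2])`, it leaves the Cofactors entry as "Cu(II),CU,501,A,Cu(II),CU,502,A".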

# Step 5: Subset out the values of the SpeciesScientific variables

subset(as.data.frame(w), Name == 'SpeciesScientific', select = 'Value')

                       Value
2                 Homo sapiens
10 Achromobacter\ncycloclastes
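The `\n` in row 10 is the line break carried over from the input. A sketch that pulls the SpeciesScientific values straight out of the matrix (plain indexing instead of subset()) and then collapses any whitespace run, including such line breaks, into one space; the small `w` below is a hypothetical stand-in for three rows of the matrix above:

```r
# Stand-in for a few rows of the Name/Value matrix built in Step 4
w <- cbind(Name  = c("SpeciesCommon", "SpeciesScientific", "SpeciesScientific"),
           Value = c("Human", "Homo sapiens", "Achromobacter\ncycloclastes"))

# Select the Value column for the rows whose Name matches
found <- w[w[, "Name"] == "SpeciesScientific", "Value"]

# Replace every run of whitespace (spaces, "\n", "\r\n", tabs) with one space
clean <- gsub("[[:space:]]+", " ", found)
clean
```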

One possible 'advantage' of this approach is that if you have a number
of string records of this type, you can create nested lists for each
string and then manipulate the lists to get what you need. Hopefully
you can use some of these ideas for other purposes as well.
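A minimal sketch of that idea for several records, assuming each record has already been read into a single ';'-delimited string. The two records are hypothetical, shortened from the sample in this thread, and `parse_record()` is a hypothetical helper that reuses the same strsplit()/gsub() steps as above to return one named character vector per record:

```r
# Hypothetical records, each one ';'-delimited string like u above
records <- c(
  "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);EzCatDBID=(S00343)",
  "SpeciesScientific=(Achromobacter cycloclastes);SpeciesCommon=(Bacteria)"
)

# Split one record into fields, then each field into name and value,
# strip the parens and return a named character vector
parse_record <- function(rec) {
  pairs <- strsplit(strsplit(rec, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- vapply(pairs, `[`, character(1), 1)
  vals <- gsub("[()]", "", vapply(pairs, `[`, character(1), 2))
  setNames(vals, keys)
}

parsed <- lapply(records, parse_record)   # a list, one element per record

# Every SpeciesScientific value from every record (a record may have several)
species <- lapply(parsed, function(p) unname(p[names(p) == "SpeciesScientific"]))
unlist(species)
```

Keeping one named vector per record means duplicate names, such as the two SpeciesScientific fields in the original sample, are preserved rather than overwritten.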

Dennis



On Wed, Apr 20, 2011 at 10:17 AM, Neeti <nikkiha...@gmail.com> wrote:

Hi ALL,
I have a very simple question regarding pattern matching. Could anyone tell me
how I can use R to retrieve a string pattern from a text file? For example,
my file contains the following information:

SpeciesCommon=(Human);SpeciesScientific=(Homo
sapiens);ReactiveCentres=(N,C,C,C,+

H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+

255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+

eciesScientific=(Achromobacter
cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+
and I want to extract the SpeciesScientific = (?) information from this file.

The problem is in the 3rd line, where the word SpeciesScientific is divided with a +.
Could anyone help me please?
Thank you


--
View this message in context:
http://r.789695.n4.nabble.com/Pattern-match-tp3463625p3463625.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





David Winsemius, MD
West Hartford, CT
