On Apr 21, 2011, at 5:27 AM, neetika nath wrote:

Thank you Dennis,

Yes, the problem is the input file. I have an .rdf file, and its format is the same as what I posted earlier. If I open that file in Notepad++, the lines are broken with CR+LF characters. Any suggestions for retrieving the
SpeciesScientific information without changing the input file?

You might consider attaching the original file named with an extension of `.txt`, since your verbal description does not match your included example. What I see after the various servers have passed this around and inserted line-ends is the string `SpeciesScientific` in the first line, rather than in the third.
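In the meantime, here is a minimal sketch of reading such a file without modifying it. It rests on an assumption that cannot be checked without the real file: that a trailing `+` means "the record continues on the next line". The two-line `demo` vector and the temporary file are hypothetical stand-ins for the .rdf file, not the poster's data.

```r
# Hypothetical stand-in for the .rdf file. ASSUMPTION: a trailing '+' marks
# a record that continues on the next line.
demo <- c("SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);Bond=(255B);Sp+",
          "eciesScientific=(Achromobacter cycloclastes);SpeciesCommon=(Bacteria)")
tmp <- tempfile(fileext = ".rdf")
writeLines(demo, tmp)

# readLines() accepts LF, CRLF or CR line ends and only reads the file
lines <- readLines(tmp)

# Drop the assumed continuation marker, then glue the pieces back together
record <- paste(sub("\\+$", "", lines), collapse = "")

# Find every SpeciesScientific=(...) field, then strip the wrapper
hits <- regmatches(record, gregexpr("SpeciesScientific=\\([^)]*\\)", record))[[1]]
species <- gsub("^SpeciesScientific=\\(|\\)$", "", hits)
species

unlink(tmp)
```

If a `+` at the end of a line can also be real data, or if a line was wrapped at a space inside a value, the joining step would need to be adjusted once the actual file layout is known.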

--
David

--

Thank you

On Wed, Apr 20, 2011 at 9:49 PM, Dennis Murphy <djmu...@gmail.com> wrote:

Hi:

This is a bit of a roundabout approach; I'm sure that folks with regex expertise will trump this in a heartbeat. I modified the last piece of the string a bit to accommodate the approach below. Depending on where
the strings have line breaks, you may have some odd '\n' characters
inserted.

# Step 1: read the input as a single character string
u <- paste0(
  "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);",
  "ReactiveCentres=(N,C,C,C,+H,O,C,C,C,C,O,H);BondInvolved=(C-H);",
  "EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond=(255B);",
  "Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);",
  "SpeciesScientific=(Achromobacter\ncycloclastes);",
  "SpeciesCommon=(Bacteria);Reactive=(Ce+)")

# Step 2: Split input lines by the ';' delimiter and then use lapply()
# to split variable names from values.
# This results in a nested list for ulist2.
ulist <- strsplit(u, ';')
ulist2 <- lapply(ulist, function(s) strsplit(s, '='))

# Step 3: Break out the results into a matrix whose first column is
# the variable name and whose second column is the value (with parens
# included). This avoids dealing with nested lists.
v <- matrix(unlist(ulist2), ncol = 2, byrow = TRUE)

# Step 4: Strip off the parens
w <- apply(v, 2, function(s) gsub('([\\(\\)])', '', s))
colnames(w) <- c('Name', 'Value')
w
    Name                 Value
[1,] "SpeciesCommon"      "Human"
[2,] "SpeciesScientific"  "Homo sapiens"
[3,] "ReactiveCentres"    "N,C,C,C,+H,O,C,C,C,C,O,H"
[4,] "BondInvolved"       "C-H"
[5,] "EzCatDBID"          "S00343"
[6,] "BondFormed"         "O-H,O-H"
[7,] "Bond"               "255B"
[8,] "Cofactors"          "CuII,CU,501,A,CuII,CU,502,A"
[9,] "CatalyticSwissProt" "P25006"
[10,] "SpeciesScientific"  "Achromobacter\ncycloclastes"
[11,] "SpeciesCommon"      "Bacteria"
[12,] "Reactive"           "Ce+"

# Step 5: Subset out the values of the SpeciesScientific variables
subset(as.data.frame(w), Name == 'SpeciesScientific', select = 'Value')
                       Value
2                 Homo sapiens
10 Achromobacter\ncycloclastes


One possible 'advantage' of this approach is that if you have a number
of string records of this type, you can create nested lists for each
string and then manipulate the lists to get what you need. Hopefully
you can use some of these ideas for other purposes as well.
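For comparison, a shorter regex-based sketch of the same extraction, using only base R (`gsub()`, `gregexpr()`, `regmatches()`, `sub()`). The shortened test string is made up for illustration. The pattern assumes the value itself contains no `)`, which holds for the species names here but would not hold for a field like Cofactors with its nested `Cu(II)`.

```r
# Shortened, illustrative record with a line break inside one value
u <- paste0("SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);",
            "Bond=(255B);SpeciesScientific=(Achromobacter\ncycloclastes);",
            "SpeciesCommon=(Bacteria)")

# Collapse any line breaks (LF or CR+LF) inside the record into one space
flat <- gsub("[\r\n]+", " ", u)

# Match every 'SpeciesScientific=(...)' field; [^)]* stops at the first ')'
hits <- regmatches(flat, gregexpr("SpeciesScientific=\\([^)]*\\)", flat))[[1]]

# Keep only the text between the parentheses
species <- sub("^SpeciesScientific=\\((.*)\\)$", "\\1", hits)
species
```

Replacing the line breaks before matching also removes the stray '\n' that shows up in row 10 of the matrix above.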

Dennis



On Wed, Apr 20, 2011 at 10:17 AM, Neeti <nikkiha...@gmail.com> wrote:
Hi ALL,

I have a very simple question regarding pattern matching. Could anyone tell
me how I can use R to retrieve a string pattern from a text file? For
example, my file contains the following information:

SpeciesCommon=(Human);SpeciesScientific=(Homo
sapiens);ReactiveCentres=(N,C,C,C,+

H,O,C,C,C,C,O,H);BondInvolved=(C- H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+

255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU, 502,A);CatalyticSwissProt=(P25006);Sp+
eciesScientific=(Achromobacter
cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+

and I want to extract the “SpeciesScientific = (?)” information from this
file. The problem is in the 3rd line, where the word SpeciesScientific is
split by a +.

Could anyone help me please?
Thank you


--
View this message in context:
http://r.789695.n4.nabble.com/Pattern-match-tp3463625p3463625.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




David Winsemius, MD
West Hartford, CT
