On Mar 8, 2:02 pm, Nemesis <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > I have a large file that has many lines like this,
> >
> > <element tag="300a,0014" vr="CS" vm="1" len="4"
> > name="DoseReferenceStructureType">SITE</element>
> >
> > I would like to identify the line by the tag (300a,0014) and then grab
> > the name (DoseReferenceStructureType) and value (SITE).
>
> You should try with Regular Expressions or if it is something like xml there
> is for sure a library you can use to parse it ...

<snip>
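The "use a library" suggestion can be sketched with the standard library's xml.etree.ElementTree. This is a minimal sketch, assuming the input is well-formed XML wrapped in a single root element; the `<root>` wrapper, the `sample` data and the `find_elements` helper are hypothetical, for illustration only. `iterparse` streams the input, which suits a large file, but unlike a tolerant scanner it raises `ParseError` on malformed markup such as unclosed tags.

```python
import io
import xml.etree.ElementTree as ET


def find_elements(source, wanted_tag):
    """Yield (name, value) for each <element> whose "tag" attribute equals wanted_tag.

    Hypothetical helper. `source` is a filename or a binary file object holding
    well-formed XML; iterparse reads it incrementally instead of all at once.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "element" and elem.get("tag") == wanted_tag:
            yield elem.get("name"), elem.text
        # Release each finished element's content to keep memory use low.
        elem.clear()


# Hypothetical sample: the lines from the question, wrapped in a <root> element.
sample = b"""<root>
<element tag="300a,0014" vr="CS" vm="1" len="4"
 name="DoseReferenceStructureType">SITE</element>
<element tag="300Z,0019" vr="CS" vm="1" len="4"
 name="DoseReferenceStructureType">SITEXXX</element>
<element tag="300a,0014" vr="CS" vm="1" len="4"
 name="DoseReferenceStructureType">SITE2</element>
</root>"""

for name, value in find_elements(io.BytesIO(sample), "300a,0014"):
    print(name, value)
```

For a real file, pass the filename (or a file opened in binary mode) instead of the `BytesIO` wrapper.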
When it comes to parsing HTML or XML of uncontrolled origin, regular
expressions are an iffy proposition. You'd be amazed what kind of junk
shows up inside an XML (or worse, HTML) tag.

Pyparsing includes a built-in method for constructing tag-matching
parsing patterns, which you can then use to scan through the XML or
HTML source:

from pyparsing import makeXMLTags, withAttribute, SkipTo

testdata = """
<blah>
<element tag="300a,0014" vr="CS" vm="1" len="4"
name="DoseReferenceStructureType">SITE</element>
<element tag="300Z,0019" vr="CS" vm="1" len="4"
name="DoseReferenceStructureType">SITEXXX</element>
<element tag="300a,0014" vr="CS" vm="1" len="4"
name="DoseReferenceStructureType">SITE2</element>
<blahblah>
"""

# makeXMLTags returns expressions matching <element ...> and </element>;
# the attributes of a matched start tag are available as named results
elementStart, elementEnd = makeXMLTags("element")

# only accept <element> start tags whose "tag" attribute is "300a,0014"
elementStart.setParseAction(withAttribute(tag="300a,0014"))

# capture everything up to the closing </element> as "body"
search = elementStart + SkipTo(elementEnd)("body")

# searchString scans the whole input for matches of the search pattern
for t in search.searchString(testdata):
    print(t.name)
    print(t.body)

Prints:

DoseReferenceStructureType
SITE
DoseReferenceStructureType
SITE2

In this case, the parse action withAttribute filters the <element> tag
matches, accepting *only* those whose "tag" attribute has the value
"300a,0014". The search pattern adds on the body of the
<element>...</element> tag and gives it the results name "body", so it
is easily accessed after parsing is completed.

-- Paul
(More about pyparsing at http://pyparsing.wikispaces.com.)