[EMAIL PROTECTED] wrote:
> Situation is this:
> 1) I have inherited some python code that accepts a string object, the
> contents of which is an XML document, and produces a data structure
> that represents some of the content of the XML document
> 2) The inherited code is somewhat 'brittle' in that some well-formed
> XML documents are not correctly processed by the code; the brittleness
> is caused by how the parser portion of the code handles whitespace.
> 3) I would like to change the code to make it less brittle. Whatever
> changes I make must continue to produce the same data structure that is
> currently being produced.
> 4) Rather than attempt to fix the parser portion of the code, I would
> prefer to use ElementTree. ElementTree handles parsing XML documents
> flawlessly, so the brittle portion of the code goes away. In addition,
> the ElementTree model is very sweet to work with, so it is a relatively
> easy task using the information in ElementTree to produce the same data
> structure that is currently being produced.
> 5) The existing data structure--the structure that must be
> maintained--that gets produced does NOT include any {xmlns=<whatever>}
> information that may appear in the source XML document.
> 6) Based on a review of several posts in this group, I understand why
> ElementTree handles xmlns=<whatever> information the way it does. This
> is an oversimplification, but one of the things it does is to
> incorporate the {whatever} within the tag property of the element and
> of any descendant elements.
> 7) One of the pieces of information in the data structure that gets
> produced by this code is the tag...the tag in the data structure should
> not have any xmlns=<whatever> information.
>
> So, given that the goal is to produce the same data structure and given
> that I really want to use ElementTree, I need to find a way to remove
> the xmlns=<whatever> information. It seems like there are 2 general
> methods for accomplishing this:
> 1) before feeding the string object to the ElementTree.XML() method,
> remove the xmlns=<whatever> information from the string.
> 2) keep the xmlns=<whatever> information in the string that feeds
> ElementTree.XML(), but when building the data structure, ensure that
> the {whatever} information in the tag property of the element is
> NOT included in the data structure.
> [snip]
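
For method 2, a minimal sketch of dropping the {whatever} prefix while building the output might look like the following. It assumes the standard-library xml.etree.ElementTree module; strip_namespace and to_structure are hypothetical names, and the dict layout is only a stand-in for the inherited data structure, which the post does not show.

```python
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Return the local part of an ElementTree tag such as '{uri}name'."""
    # Namespaced tags look like '{http://example.com/ns}item'; tags without
    # a namespace have no leading brace and are returned unchanged.
    if isinstance(tag, str) and tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag


def to_structure(elem):
    """Hypothetical stand-in for the code that builds the existing structure."""
    return {
        "tag": strip_namespace(elem.tag),
        "text": (elem.text or "").strip(),
        "children": [to_structure(child) for child in elem],
    }


# Example document with a default namespace (the URI is made up).
doc = '<root xmlns="http://example.com/ns"><item>one</item><item>two</item></root>'
root = ET.XML(doc)
structure = to_structure(root)
```

The source string is left untouched, so no text munging is needed before parsing. One caveat: if a document mixes namespaces, two elements with the same local name become indistinguishable once the prefix is dropped.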

maybe transform the document with XSLT before processing?

google: xslt remove namespaces

eg. http://www.tei-c.org/wiki/index.php/Remove-Namespaces.xsl
eg. http://www.thescripts.com/forum/thread86057.html

hth
Gerard