Hello all, I have the unenviable task of turning about 20K strangely formatted XML documents from different sources into something resembling a clean, standard, uniform format. I like Elementtree and have been using it to step through the documents to get a feel for their structure. .getiterator() gives me a depth-first traversal that eliminates the hierarchy of the elements. What I'd like is to be able to traverse elements while keeping track of ancestors, and print out the full structure of all of an ancestor's nodes as I arrive at each node. So, for example, if I had a document that looked like this:
<a> <b att="atttag" content="b"> this is node b </b> <c> this is node c <d /> <e> this is node e </e> </c> <f> this is node f </f> </a> I would want to print the following: <a> <a> <b> <a> <b> text: this is node b <a> <c> <a> <c> text: this is node c <a> <c> <d> <a> <c> <e> <a> <c> <e> text: this is node e <a> <f> <a> <f> this is node f Is there a simple way to do this? Any help would be appreciated. Thanks.. -- http://mail.python.org/mailman/listinfo/python-list