[EMAIL PROTECTED] wrote:
> I have the unenviable task of turning about 20K strangely formatted
> XML documents from different sources into something resembling a
> clean, standard, uniform format. I like ElementTree and have been
> using it to step through the documents to get a feel for their
> structure. .getiterator() gives me a depth-first traversal that
> eliminates the hierarchy of the elements. What I'd like is to be able
> to traverse elements while keeping track of ancestors, and print out
> the full structure of all of an ancestor's nodes as I arrive at each
> node.

Try lxml.etree. It's an extended re-implementation of ElementTree based on libxml2. Among many other features, it gives its Elements a getparent() method, lets you iterate over an element's ancestors (and other XPath axes) with methods such as iterancestors(), and can walk an already-parsed document in an iterparse-like fashion (called iterwalk).

http://codespeak.net/lxml/

Stefan
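
A rough sketch of how those lxml features might be combined to keep track of ancestors while traversing. The sample document and the ancestor_path() helper are made up for illustration; only getparent(), iterancestors(), iter() and iterwalk() come from lxml itself.

```python
from lxml import etree  # third-party package: pip install lxml

# Made-up sample document, for illustration only.
SAMPLE = b"<root><a><b/><c>text</c></a><d/></root>"


def ancestor_path(elem):
    """Hypothetical helper: tag path from the root down to elem."""
    # iterancestors() yields the nearest parent first, so reverse it.
    tags = [anc.tag for anc in elem.iterancestors()]
    tags.reverse()
    tags.append(elem.tag)
    return "/".join(tags)


root = etree.fromstring(SAMPLE)

# Depth-first traversal in document order; "*" restricts it to elements
# (skipping comments and processing instructions).
paths = [ancestor_path(el) for el in root.iter("*")]
for path in paths:
    print(path)

# getparent() returns the enclosing element, or None for the root.
c = root.find("a/c")
parent_tag = c.getparent().tag

# iterwalk() replays start/end events over the already-parsed tree,
# which makes it easy to track the current depth.
outline = []
depth = 0
for event, el in etree.iterwalk(root, events=("start", "end")):
    if event == "start":
        outline.append("  " * depth + el.tag)
        depth += 1
    else:
        depth -= 1
print("\n".join(outline))
```

With roughly 20K heterogeneous documents, collecting the distinct ancestor paths per source (for example in a set) may be a quick way to get a feel for each format before writing the conversion.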