The output I was contemplating was a DOM "DNA": that is, the DOM without the element instances or their data. A bare prototype tree, based on what is actually in the document (rather than what is legal to include in the document).
Just enough data that, for an arbitrary element, I would know:

1) whether the element is in the document
2) where to find it (the chain of parents)

As I mentioned, I'm just starting to think about the subject, so maybe the best practice is something else, like loading the full DOM into memory. I'm still in "make one to throw away" mode.

EP
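P.S. For concreteness, here is a minimal sketch of the kind of bare tree described above. It assumes the standard-library xml.etree.ElementTree.iterparse is an acceptable parser; the names dom_skeleton, find_chains and the sample document are made up for illustration, and this is only one possible approach, not an established best practice. It records each distinct chain of parent tags once and discards element data as it goes.

```python
import io
import xml.etree.ElementTree as ET


def dom_skeleton(source):
    """Return the set of distinct parent chains (tuples of tag names)
    that actually occur in the document. Hypothetical helper."""
    paths = set()
    stack = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            paths.add(tuple(stack))
        else:
            stack.pop()
            # Discard this element's text, attributes and children so
            # instance data does not pile up while parsing.
            elem.clear()
    return paths


def find_chains(paths, tag):
    """Return every recorded chain that ends in the given tag."""
    return sorted(p for p in paths if p[-1] == tag)


# Made-up sample document, used only to exercise the sketch.
sample = (
    b"<lib><book><title>A</title></book>"
    b"<book><title>B</title><author>X</author></book></lib>"
)
skeleton = dom_skeleton(io.BytesIO(sample))
```

With this, an empty result from find_chains(skeleton, tag) answers question 1 (the element does not occur), and each returned tuple answers question 2 (one chain of parents leading to it). Storing chains in a set means repeated elements, such as the two book entries in the sample, cost nothing extra.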