On Sun, 31 Jan 2010 20:57:31 +0100, Robert wrote:

> I tried lxml, but after walking and making changes in the element
> tree, I'm forced to do a full serialization of the whole document
> (etree.tostring(tree)) - which destroys the "human edited" format
> of the original HTML code.
> makes it rather unreadable.
>
> is there an existing HTML parser which supports tracking/writing
> back particular changes in a cautious way by just making local
> changes? or a least tracks the tag start/end positions in the file?

HTMLParser, sgmllib.SGMLParser and htmllib.HTMLParser all let you retrieve the literal text of a start tag via get_starttag_text(), but there is no equivalent for end tags. Unfortunately, they're only tokenisers, not parsers, so you'll need to handle minimisation (omitted end tags and the like) yourself.

-- 
http://mail.python.org/mailman/listinfo/python-list
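
For illustration, here is a rough sketch of the start-tag approach described in the reply. It assumes Python 3, where the HTMLParser module is available as html.parser; the names StartTagRecorder and record_start_tags are made up for this example. It records each start tag's literal source text together with the line and column where the tokeniser found it. It does nothing for end tags or for omitted tags, so that part is still up to you.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Collect the literal text and position of every start tag.

    Hypothetical helper class, not part of the standard library.
    """

    def __init__(self):
        super().__init__()
        # Each entry: (lowercased tag name, literal tag text, line, column)
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports where the current token begins:
        # the line number is 1-based, the column offset is 0-based.
        line, col = self.getpos()
        # get_starttag_text() returns the tag exactly as written in the
        # source, including original case, spacing and line breaks.
        self.start_tags.append((tag, self.get_starttag_text(), line, col))


def record_start_tags(source):
    """Tokenise `source` and return the recorded start tags."""
    parser = StartTagRecorder()
    parser.feed(source)
    parser.close()
    return parser.start_tags


if __name__ == "__main__":
    html = '<p>Hi <A  HREF="x.html"\n   class=ext>link</a>\n<br>'
    for tag, text, line, col in record_start_tags(html):
        print(tag, repr(text), line, col)
```

With the positions and literal text in hand, a local edit can be made by splicing a replacement string into the original source at that location instead of re-serialising the whole tree, which leaves the rest of the hand-edited formatting alone.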