On Aug 1, 1:31 pm, [EMAIL PROTECTED] wrote:
<snip>
>
> I'm thinking maybe somehow have HTMLParser append each character it
> reads except for data inside tags in some kind of buffer? This way I
> can have the HTML contents read into a buffer, then when I do my own
> handle_* overrides, I can also append to that buffer with the
> transformed data. Once the HTML page is finished parsing, ideally I
> would be able to print the contents of the buffer and the HTML would
> be identical except for the string transformations.
>
> I also need to make sure that all newlines, tags, spacing, etc. are
> kept intact -- this part is a requirement for other reasons.
>
> Thanks!

What you describe is almost exactly how pyparsing implements
transformString. See below:

    from pyparsing import makeHTMLTags, replaceWith

    boldStart, boldEnd = makeHTMLTags("B")

    # convert <B> to <div class="emphatic"> and </B> to </div>
    boldStart.setParseAction(replaceWith('<div class="emphatic">'))
    boldEnd.setParseAction(replaceWith('</div>'))
    converter = boldStart | boldEnd

    html = "Display this in <b>bold</b>"
    print(converter.transformString(html))

Prints:

    Display this in <div class="emphatic">bold</div>

All text not matched by a pattern in the converter is left as-is.
(My CSS style/form may not be up to date, but I hope you get the
idea.)

-- Paul
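
P.S. If you would rather stay with the standard library, the buffer
idea from your message can be sketched roughly as follows. This is only
a sketch, assuming Python 3's html.parser module; EchoTransformer and
transform_html are names made up for illustration, not part of any
library. Each handler appends what it saw to a list, and only the text
between tags goes through your transformation.

```python
from html.parser import HTMLParser


class EchoTransformer(HTMLParser):
    """Echo markup into a buffer, transforming only the text between tags.

    Hypothetical helper for illustration; not part of the standard library.
    """

    def __init__(self, transform):
        # convert_charrefs=False reports entities such as &amp; as separate
        # events, so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written,
        # including attribute spacing and quoting.
        self.buffer.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: echo once, no separate end tag.
        self.buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Only the lowercased tag name is available here, so </B> is
        # written back as </b>.
        self.buffer.append("</%s>" % tag)

    def handle_data(self, data):
        # The only place where the content is changed.
        self.buffer.append(self.transform(data))

    def handle_entityref(self, name):
        self.buffer.append("&%s;" % name)

    def handle_charref(self, name):
        self.buffer.append("&#%s;" % name)

    def handle_comment(self, data):
        self.buffer.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.buffer.append("<!%s>" % decl)


def transform_html(source, transform):
    """Return source with transform applied to the text between tags."""
    parser = EchoTransformer(transform)
    parser.feed(source)
    parser.close()
    return "".join(parser.buffer)
```

For example, transform_html('<p>Hello <b>world</b></p>', str.upper)
returns '<p>HELLO <b>WORLD</b></p>'. Two caveats: end tags are rebuilt
from the lowercased tag name, so the output is not byte-for-byte
identical when the input has upper-case end tags, and a run of text may
reach handle_data in more than one piece, so a transformation that needs
to see whole phrases would have to collect the pieces first. Processing
instructions are not handled in this sketch.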