Hello, I am working on a project where I'm using Python to parse HTML pages and transform the data between certain tags. I currently use the HTMLParser class for this. In a nutshell, it's pretty simple: I feed the contents of the HTML page to HTMLParser and override the appropriate handle_ method to process the extracted data. In that method, I take the data that was found and transform it into another string based on some logic.
Now, what I would like to do is take that transformed string and put it "back into" the HTML document. Has anybody ever implemented something like this with HTMLParser? I'm thinking of somehow having HTMLParser append everything it reads, except for the data between tags, to some kind of buffer. That way the HTML contents are read into a buffer, and in my own handle_ overrides I can append the transformed data to that same buffer. Once the page has finished parsing, ideally I would be able to print the contents of the buffer and the HTML would be identical except for the string transformations. I also need to make sure that all newlines, tags, spacing, etc. are kept intact; this part is a requirement for other reasons. Thanks!
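For what it's worth, here is a rough sketch of the buffer idea described above, not a tested solution. It assumes Python 3's html.parser module and that the whole page is passed in as one string. The names TextRewriter, rewrite, and transform are made up for illustration. Instead of rebuilding tags from the handler arguments, it uses getpos() to find where each text chunk starts in the source and copies everything else verbatim, so tags, comments, entities, and whitespace are not touched.

```python
from html.parser import HTMLParser


class TextRewriter(HTMLParser):
    """Hypothetical helper: copy the source verbatim, replacing only text data."""

    def __init__(self, transform):
        # convert_charrefs=False keeps entities such as &amp; out of handle_data,
        # so every data chunk is an exact slice of the original source.
        super().__init__(convert_charrefs=False)
        self.transform = transform  # assumed to be a callable str -> str

    def rewrite(self, source):
        self.reset()
        self._source = source
        # Absolute index of the first character of each line (split on "\n" only,
        # which matches how the parser counts lines for getpos()).
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self._out = []
        self._copied = 0  # everything before this index is already in the buffer
        self.feed(source)
        self.close()
        # Copy whatever follows the last text chunk (for example a final end tag).
        self._out.append(source[self._copied:])
        return "".join(self._out)

    def handle_data(self, data):
        # getpos() reports the (line, column) where the current chunk starts.
        line, col = self.getpos()
        start = self._line_starts[line - 1] + col
        # Markup between the previous chunk and this one is copied unchanged.
        self._out.append(self._source[self._copied:start])
        self._out.append(self.transform(data))
        self._copied = start + len(data)


page = '<P class="x">Hello &amp; bye</P>\n<!-- note -->\n<div>tail</div>\n'
print(TextRewriter(str.upper).rewrite(page))
```

Note that handle_data also receives whitespace-only chunks and the contents of script and style elements, so a real transform would probably need to skip those. Restricting the change to "certain tags" would mean tracking the current tag in handle_starttag and handle_endtag, which is left out here.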