samwyse wrote:
> I'm processing some potentially large datasets stored as HTML. I've
> subclassed HTMLParser so that handle_endtag() accumulates data into a
> list, which I can then fetch when everything's done. I'd prefer,
> however, to have handle_endtag() somehow yield values while the input
> data is still streaming in. I'm sure someone's done something like
> this before, but I can't figure it out. Can anyone help? Thanks.

If you can afford stepping away from HTMLParser, you could give lxml a try.
Its iterparse() function supports HTML parsing.

http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk

Stefan
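
If HTMLParser has to stay, the streaming behaviour asked about can probably be approximated with the standard library alone. What follows is only a minimal sketch and is not the lxml approach suggested above. It assumes Python 3 (where the module is html.parser), that the values of interest are the text of <td> elements, and that the input arrives as an iterable of string chunks. The names CellCollector and iter_cells are made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser  # Python 3 name of the HTMLParser module


class CellCollector(HTMLParser):
    """Hypothetical subclass: queues the text of each completed <td>."""

    def __init__(self):
        super().__init__()
        self.ready = deque()  # finished values not yet handed to the caller
        self._buf = None      # text pieces of the <td> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        # Text may arrive in several pieces when a chunk boundary splits it.
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.ready.append("".join(self._buf))
            self._buf = None


def iter_cells(chunks):
    """Yield each cell value once the chunk holding its end tag has been fed."""
    parser = CellCollector()
    for chunk in chunks:
        parser.feed(chunk)       # callbacks run for whatever is complete so far
        while parser.ready:
            yield parser.ready.popleft()
    parser.close()               # flush anything still buffered in the parser
    while parser.ready:
        yield parser.ready.popleft()


if __name__ == "__main__":
    pieces = ["<table><tr><td>al", "pha</td><td>beta</td>", "</tr></table>"]
    for value in iter_cells(pieces):
        print(value)
```

The idea is that handle_endtag() cannot itself yield, because it is called from inside feed(), so the callback only queues values and a surrounding generator drains the queue between feed() calls. Memory use then depends on the chunk size rather than on the size of the whole document.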