Fredrik Lundh wrote:
> the only difference between the libs (*) is that HTMLParser is a bit
> stricter
*) "the libs" referring to htmllib and HTMLParser, not htmllib and sgmllib.
--
http://mail.python.org/mailman/listinfo/python-list
Kenneth McDonald wrote:
> The problem I'm having with HTMLParser is simple; I don't seem to be
> getting the actual text in the HTML document. I've implemented the
> do_data method of HTMLParser.HTMLParser in my HTMLParser subclass, but
> it never seems to receive any data. Is there another way
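try:
    from HTMLParser import HTMLParser   # Python 2 module name
except ImportError:
    from html.parser import HTMLParser  # Python 3 module name

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.TokenList = []

    # HTMLParser passes text content to handle_data(), not do_data().
    def handle_data(self, data):
        data = data.strip()
        if data:  # skip whitespace-only runs between tags
            self.TokenList.append(data)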
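A minimal usage sketch of the subclass above, assuming Python 3 (where the module is html.parser); the sample HTML string is made up for illustration, and TokenList is simply the attribute name used in the reply. The class is repeated so the snippet runs on its own: feed() the markup, call close() to flush any buffered text, then read the collected tokens.

```python
from html.parser import HTMLParser  # Python 3 name of the HTMLParser module


class MyHTMLParser(HTMLParser):
    """Collects the non-blank text runs found between tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.TokenList = []

    def handle_data(self, data):
        # Called with runs of text content; tags go to handle_starttag() etc.
        data = data.strip()
        if data:
            self.TokenList.append(data)


parser = MyHTMLParser()
parser.feed("<html><body><h1>Title</h1>\n<p>Hello, <b>world</b>!</p></body></html>")
parser.close()  # flush any text still held in the parser's buffer
print(parser.TokenList)  # prints ['Title', 'Hello,', 'world', '!']
```

Note that handle_data() sees text in pieces split at every tag, so "Hello, <b>world</b>!" arrives as three tokens; join them yourself if you want whole paragraphs.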