Re: Parsing HTML--looking for info/comparison of HTMLParser vs. htmllib modules.

2006-07-08 Thread Fredrik Lundh
Fredrik Lundh wrote:

> the only difference between the libs (*) is that HTMLParser is a bit
> stricter

*) "the libs" referring to htmllib and HTMLParser, not htmllib and sgmllib.

Re: Parsing HTML--looking for info/comparison of HTMLParser vs. htmllib modules.

2006-07-08 Thread Fredrik Lundh
Kenneth McDonald wrote:

> The problem I'm having with HTMLParser is simple; I don't seem to be
> getting the actual text in the HTML document. I've implemented the
> do_data method of HTMLParser.HTMLParser in my HTMLParser subclass, but
> it never seems to receive any data. Is there another way

Re: Parsing HTML--looking for info/comparison of HTMLParser vs. htmllib modules.

2006-07-07 Thread wes weston
    # Python 2 module name; in Python 3 the class lives in html.parser
    from HTMLParser import HTMLParser

    class MyHTMLParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.TokenList = []

        def handle_data(self, data):
            # called by the parser for each run of text between tags
            data = data.strip()
            if data:
                self.TokenList.append(data)
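
For what it's worth, here is a small sketch of how a subclass like the one above might be driven, and why a method named do_data would stay silent: as far as I can tell, HTMLParser only hands text to handle_data. The names TextCollector, do_data_calls, parse and sample are made up for this illustration, and the try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from html.parser import HTMLParser   # Python 3 location
except ImportError:
    from HTMLParser import HTMLParser    # Python 2 location, as in this thread


class TextCollector(HTMLParser):
    """Hypothetical subclass that records the text between tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tokens = []
        self.do_data_calls = 0

    def do_data(self, data):
        # Not an HTMLParser hook, so the parser never calls this method.
        self.do_data_calls += 1

    def handle_data(self, data):
        # The hook the parser actually calls for text content.
        data = data.strip()
        if data:
            self.tokens.append(data)


def parse(html):
    """Feed a complete document to a fresh parser and return the parser."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser


sample = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
result = parse(sample)
print(result.tokens)         # text chunks, whitespace-only chunks skipped
print(result.do_data_calls)  # stays at 0
```

Calling close() after feed() asks the parser to process anything still buffered, which matters mostly when the input arrives in several chunks.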