New submission from Dave Day <dayve...@gmail.com>:

When HTMLParser.HTMLParser encounters a malformed charref (for example &#bad;), it no longer parses the HTML that follows correctly.
For example, given

    <p>&#bad;</p>

the parser recognises the start tag "p" but treats everything after it as data.

To reproduce:

    import HTMLParser

    class MyParser(HTMLParser.HTMLParser):
        def handle_starttag(self, tag, attrs):
            print 'Start "%s"' % tag

        def handle_endtag(self, tag):
            print 'End "%s"' % tag

        def handle_charref(self, ref):
            print 'Charref "%s"' % ref

        def handle_data(self, data):
            print 'Data "%s"' % data

    parser = MyParser()
    parser.feed('<p>&#bad;</p>')
    parser.close()

Expected output:

    Start "p"
    Data "&#bad;"
    End "p"

Actual output:

    Start "p"
    Data "&#bad;</p>"

----------
components: Library (Lib)
messages: 91392
nosy: dayveday
severity: normal
status: open
title: HTMLParser.HTMLParser doesn't handle malformed charrefs
type: behavior
versions: Python 2.4, Python 2.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6662>
_______________________________________
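
For anyone who wants to check the same input on Python 3, here is a minimal sketch of the reproduction above. It assumes the parser is imported from html.parser (the Python 3 location of this module) and records callbacks in a list instead of printing them, so the result can be compared programmatically. EventRecorder, record_events and end_tag_seen are hypothetical names introduced for illustration only; they are not part of the library. The sketch deliberately makes no claim about how the data is split across handle_data calls, since that may differ between versions and parser settings.

```python
# Sketch only: a Python 3 port of the reproduction in this report.
# EventRecorder, record_events and end_tag_seen are illustrative names,
# not part of the standard library.
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Collect parser callbacks as (kind, value) tuples instead of printing."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_data(self, data):
        self.events.append(("data", data))


def record_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


def end_tag_seen(markup, tag):
    """Return True if the parser reported an end tag for `tag`."""
    return ("end", tag) in record_events(markup)


if __name__ == "__main__":
    # Print each event so the sequence can be compared with the
    # expected output listed in the report.
    for event in record_events("<p>&#bad;</p>"):
        print(event)
```

If end_tag_seen('<p>&#bad;</p>', 'p') returns False, the interpreter under test shows the behaviour described in this report.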