New submission from Stefan Schweizer <steve.schwei...@gmail.com>:

HTMLParser should only handle entity references that are terminated with a semicolon. I know that the semicolon can be omitted in some cases (http://www.w3.org/TR/html4/charset.html#h-5.3) and that some browsers are more tolerant, but the following example produces odd output:
>>> import HTMLParser
>>> class EntityrefParser(HTMLParser.HTMLParser):
...     def handle_data(self, data):
...         print "handle_data '%s'" % data
...     def handle_entityref(self, name):
...         print "handle_entityref '%s'" % name
...
>>> p = EntityrefParser()
>>> p.feed("<p>spam&eggs are delicious</p>")

Expected Result:
handle_data 'spam&eggs are delicious'

Actual Result:
handle_data 'spam'
handle_entityref 'eggs'
handle_data ' are delicious'

----------
components: Library (Lib)
messages: 97177
nosy: stefan.schweizer
severity: normal
status: open
title: Entity references without semicolon in HTMLParser
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7626>
_______________________________________
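A possible user-side workaround, sketched here only for illustration and not as the fix for this issue: a subclass can put unknown entity names back into the text instead of treating them as references. The sketch assumes the Python 3 module names (html.parser, html.entities) rather than the Python 2.6 HTMLParser module used in the report, and the names LenientTextParser and extract_text are hypothetical.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class LenientTextParser(HTMLParser):
    """Hypothetical helper: collects text and keeps unknown '&name' runs literal."""

    def __init__(self):
        # convert_charrefs=False so that handle_entityref() is called at all.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_entityref(self, name):
        if name in name2codepoint:
            # Known HTML 4 entity name: substitute the character it stands for.
            self.chunks.append(chr(name2codepoint[name]))
        else:
            # Unknown name such as 'eggs': restore the original text.
            # Limitation: a trailing ';' (if there was one) is not restored.
            self.chunks.append("&" + name)

    def text(self):
        return "".join(self.chunks)


def extract_text(markup):
    """Parse markup and return its text content as one string."""
    parser = LenientTextParser()
    parser.feed(markup)
    parser.close()
    return parser.text()


print(extract_text("<p>spam&eggs are delicious</p>"))  # spam&eggs are delicious
print(extract_text("<p>fish &amp; chips</p>"))         # fish & chips
```

This only changes what the subclass does with the callback; the parser itself may still split the data at the ampersand, which is the behavior the report is about.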