[issue7626] Entity references without semicolon in HTMLParser

Stefan Schweizer Sun, 03 Jan 2010 12:13:41 -0800

New submission from Stefan Schweizer <steve.schwei...@gmail.com>:

HTMLParser should only handle entity references that are terminated with a 
semicolon. I know that the semicolon can be omitted in some cases 
(http://www.w3.org/TR/html4/charset.html#h-5.3) and that some browsers are more 
tolerant, but in the following example an ordinary word after an ampersand is 
reported as an entity reference:


>>> import HTMLParser
>>> class EntityrefParser(HTMLParser.HTMLParser):
...     def handle_data(self, data):
...         print "handle_data '%s'" % data
...     def handle_entityref(self, name):
...         print "handle_entityref '%s'" % name
... 
>>> p = EntityrefParser()
>>> p.feed("<p>spam&eggs are delicious</p>")

Expected Result:
handle_data 'spam&eggs are delicious'

Actual Result:
handle_data 'spam'
handle_entityref 'eggs'
handle_data ' are delicious'
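
Until the parser itself changes, one possible workaround is to filter inside
the callback. The sketch below is only an illustration, not a proposed patch:
the class name LenientEntityParser and its events/text() helpers are
hypothetical, and it assumes that any name missing from the standard entity
table was meant as literal text. It cannot tell whether a semicolon was
actually present, so a word that happens to match a real entity name (such as
"&amp") is still reported as an entity.

```python
try:
    # Python 2 module names, as used in the report above.
    from HTMLParser import HTMLParser
    from htmlentitydefs import name2codepoint
except ImportError:
    # Python 3 equivalents.
    from html.parser import HTMLParser
    from html.entities import name2codepoint


class LenientEntityParser(HTMLParser):
    """Hypothetical workaround: unknown entity names become literal text."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []  # list of (kind, value) tuples in arrival order

    def handle_data(self, data):
        self.events.append(('data', data))

    def handle_entityref(self, name):
        if name in name2codepoint:
            # A defined HTML entity: keep reporting it as a reference.
            self.events.append(('entityref', name))
        else:
            # Not a defined entity: assume the ampersand was literal text.
            self.events.append(('data', '&' + name))

    def text(self):
        # Rebuild the text, writing kept references back as "&name;".
        parts = []
        for kind, value in self.events:
            parts.append(value if kind == 'data' else '&' + value + ';')
        return ''.join(parts)
```

With the callback sequence shown under "Actual Result" (data 'spam',
entityref 'eggs', data ' are delicious'), text() reassembles the string
'spam&eggs are delicious', while a reference such as 'amp' is still kept as an
entity.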

----------
components: Library (Lib)
messages: 97177
nosy: stefan.schweizer
severity: normal
status: open
title: Entity references without semicolon in HTMLParser
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7626>
_______________________________________