[issue6662] HTMLParser.HTMLParser doesn't handle malformed charrefs

Dave Day Thu, 06 Aug 2009 18:27:46 -0700

New submission from Dave Day <dayve...@gmail.com>:

When HTMLParser.HTMLParser encounters a malformed charref (for example 
&#bad;) it no longer parsers the following HTML correctly.


For example:
  <p>&#bad;</p>
Recognises the starttag "p" but considers the rest to be data.

To reproduce:
class MyParser(HTMLParser.HTMLParser):
  def handle_starttag(self, tag, attrs):
    print 'Start "%s"' % tag
  def handle_endtag(self,tag):
    print 'End "%s"' % tag
  def handle_charref(self, ref):
    print 'Charref "%s"' % ref
  def handle_data(self, data):
    print 'Data "%s"' % data
parser = MyParser()
parser.feed('<p>&#bad;</p>')
parser.close()

Expected output:
Start "p"
Data "&#bad;"
End "p"

Actual output:
Start "p"
Data "&#bad;</p>"

----------
components: Library (Lib)
messages: 91392
nosy: dayveday
severity: normal
status: open
title: HTMLParser.HTMLParser doesn't handle malformed charrefs
type: behavior
versions: Python 2.4, Python 2.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6662>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue6662] HTMLParser.HTMLParser doesn't handle malformed charrefs

Reply via email to