New submission from Martin Potthast:
The title says it all; try the minimal example.
--
components: Library (Lib)
files: parser-fail.py
messages: 124506
nosy: Martin.Potthast
priority: normal
severity: normal
status: open
title: HTMLParser.unescape() cannot handle HTML entities with incorrect syntax (e.g. &#hearts;)
Changes by Martin Potthast:
--
title: HTMLParser.unescape() cannot handle HTML entities with incorrect syntax
(e.g. &#hearts;) -> HTMLParser.unescape() fails on HTML entities with incorrect
syntax (e.g. &#hearts;)
___
Python tracker
Martin Potthast added the comment:
I'd suggest verifying the input more carefully and returning such strings unchanged.
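A minimal sketch of that behaviour in Python 3, assuming entity names are looked up in `html.entities.name2codepoint`. The helper `safe_unescape` and the pattern `_ENTITY_RE` are hypothetical names for illustration, not the code of HTMLParser or of any patch in this thread: numeric references that fail to parse, such as `&#hearts;`, are returned exactly as they appeared.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern: matches "&name;", "&#123;", "&#x1F;" and also
# malformed forms such as "&#hearts;" (a "#" followed by letters).
_ENTITY_RE = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")


def safe_unescape(s):
    """Replace HTML entities in s, leaving malformed references unchanged."""
    if "&" not in s:
        return s

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            digits = body[1:]
            try:
                if digits[:1] in ("x", "X"):
                    codepoint = int(digits[1:], 16)
                else:
                    codepoint = int(digits, 10)
                return chr(codepoint)
            except (ValueError, OverflowError):
                # Not a valid number or code point (e.g. "&#hearts;"):
                # keep the original text instead of raising.
                return match.group(0)
        if body in name2codepoint:
            return chr(name2codepoint[body])
        # Unknown named entity: also keep it as-is.
        return match.group(0)

    return _ENTITY_RE.sub(replace, s)
```

Catching `ValueError` inside the replacement callback keeps the check lazy: the regular expression stays permissive, and malformed input is simply passed through.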
--
type: -> behavior
___
Python tracker
<http://bugs.python.org
Martin Potthast added the comment:
Agreed. Here's a patch for HTMLParser. That was easy enough.
With regard to tests, there already seems to be one called
test_malformatted_charref in test_htmlparser.py. However, that test exercises
the whole parser rather than HTMLParser.unescape() alone.
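A test that targets unescaping directly, rather than the whole parser, could look like the following sketch. It is written against `html.unescape()` from the Python 3 standard library (3.4+) as an assumption; the class name is hypothetical, and this is not the existing test_malformatted_charref test from test_htmlparser.py.

```python
import html
import unittest


class UnescapeMalformedCharrefTest(unittest.TestCase):
    """Hypothetical unit test that calls the unescape function directly,
    instead of driving the whole parser."""

    def test_malformed_charref_is_left_unchanged(self):
        # "&#hearts;" is not a valid numeric reference, so it must survive as-is.
        self.assertEqual(html.unescape("&#hearts;"), "&#hearts;")

    def test_valid_references_are_decoded(self):
        # Named, decimal and hexadecimal forms of U+2665 BLACK HEART SUIT.
        self.assertEqual(html.unescape("&hearts;"), "\u2665")
        self.assertEqual(html.unescape("&#9829;"), "\u2665")
        self.assertEqual(html.unescape("&#x2665;"), "\u2665")
```

Run it with `python -m unittest` on the file containing the class.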
Martin Potthast added the comment:
Why not simply remove the additional check in line 168 and leave the
responsibility for checking the validity of its input to the unescape function
(be it explicitly or, as now, lazily)? That way, the code changes are minimal,
the existing test covers the