[issue10759] HTMLParser.unescape() cannot handle HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
New submission from Martin Potthast: The title says it all; try the minimal example. -- components: Library (Lib) files: parser-fail.py messages: 124506 nosy: Martin.Potthast priority: normal severity: normal status: open title: HTMLParser.unescape() cannot handle HTML entities with

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Changes by Martin Potthast: -- title: HTMLParser.unescape() cannot handle HTML entities with incorrect syntax (e.g. &#hearts;) -> HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;) ___ Python tra

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Martin Potthast added the comment: I'd suggest verifying the input more carefully and returning such strings unchanged. -- type: -> behavior ___ Python tracker <http://bugs.python.org
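
To illustrate the suggestion above (verify the input and return malformed references unchanged), here is a minimal sketch in Python 3. It is not the patch attached to the issue and not the HTMLParser code; lenient_unescape is a hypothetical helper name, and the only library pieces assumed are re and html.entities.name2codepoint from the standard library.

```python
import re
from html.entities import name2codepoint

# Candidate references: "&name;", "&#123;", "&#x1F;", and malformed
# ones such as "&#hearts;" (a "#" followed by non-digits).
_REFERENCE = re.compile(r"&(#?[0-9a-zA-Z]+);")


def lenient_unescape(text):
    """Hypothetical helper: unescape valid references, keep invalid ones as-is."""

    def replace(match):
        ref = match.group(1)
        try:
            if ref.startswith("#"):
                body = ref[1:]
                if body[:1] in ("x", "X"):
                    # Hexadecimal character reference, e.g. "&#x2665;"
                    return chr(int(body[1:], 16))
                # Decimal character reference, e.g. "&#9829;"
                return chr(int(body))
            # Named entity, e.g. "&hearts;"
            return chr(name2codepoint[ref])
        except (ValueError, KeyError, OverflowError):
            # Malformed or unknown reference: return the original text.
            return match.group(0)

    return _REFERENCE.sub(replace, text)


print(lenient_unescape("I &#hearts; Python, I &hearts; Python"))
```

With this approach "&#hearts;" passes through untouched because int("hearts") raises ValueError, while "&hearts;" and "&#9829;" are both converted to the heart character.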

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Martin Potthast added the comment: Agreed. Here's a patch for HTMLParser. That was easy enough. With regard to tests, there already seems to be one called test_malformatted_charref in test_htmlparser.py. However, that test exercises the whole parser, not just HTMLParser.unescape(). A

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Martin Potthast added the comment: Why not simply remove the additional check in line 168 and leave it to the unescape function to validate its own input (be it explicitly or, as now, lazily)? That way, the code changes are minimal, the existing test covers the