[issue25258] HtmlParser doesn't handle void element tags correctly

Chenyun Yang Fri, 02 Oct 2015 12:18:30 -0700

Chenyun Yang added the comment:

The example you give for <li> is a different case.


<img> and <link> are void elements, which are allowed to have no closing tag;
<li> without </li> is a browser implementation detail: most browsers
auto-complete the </li>.

Unless the parser calls handle_endtag(), client code that uses
HTMLParser cannot know whether the traversal of an element is finished.
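
To illustrate the current behaviour, here is a minimal sketch. The
EventRecorder class and the record() helper are hypothetical names used only
for this example; only html.parser.HTMLParser and its handler methods come
from the standard library.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records which tag callbacks HTMLParser makes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def record(markup):
    """Parse markup and return the list of recorded (kind, tag) events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


# HTML-style void element: only the start callback fires.
print(record('<img src="a.png">'))   # [('start', 'img')]

# XML-style empty element: the default handle_startendtag() calls both.
print(record('<img src="a.png"/>'))  # [('start', 'img'), ('end', 'img')]
```

With <img> written the HTML way, the client never sees an end event for it.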

Do you have a strong reason why we should include the knowledge of void
elements in HTMLParser at this line?

https://github.com/python/cpython/blob/bdfb14c688b873567d179881fc5bb67363a6074c/Lib/html/parser.py#L341

if end.endswith('/>') or (end.endswith('>') and tag in VOID_ELEMENTS):
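
For comparison, the same effect can be approximated in client code without
changing the stdlib. This is only a sketch under assumptions: VoidAwareParser
and parse_events() are hypothetical names, and the VOID_ELEMENTS set is my
reading of the void elements listed in the HTML specification, not something
HTMLParser defines.

```python
from html.parser import HTMLParser

# Assumed list of void elements, taken from the HTML specification.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class VoidAwareParser(HTMLParser):
    """Hypothetical subclass: reports an end tag for every void element."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._in_startend = False

    def handle_startendtag(self, tag, attrs):
        # For <tag/> the default implementation already calls both
        # handlers, so flag it to avoid reporting the end tag twice.
        self._in_startend = True
        try:
            super().handle_startendtag(tag, attrs)
        finally:
            self._in_startend = False

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        # Synthesize the end tag for void elements written without "/>".
        if tag in VOID_ELEMENTS and not self._in_startend:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parse_events(markup):
    """Parse markup and return the recorded (kind, tag) events."""
    parser = VoidAwareParser()
    parser.feed(markup)
    parser.close()
    return parser.events


print(parse_events('<p><img src="a.png"><br/></p>'))
```

A stray </img> in the input would still produce a second end event; the
sketch does not try to handle that case.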

On Wed, Sep 30, 2015 at 7:05 PM, Martin Panter <[email protected]>
wrote:

>
> Martin Panter added the comment:
>
> My thinking is that the knowledge that <img> does not have a closing tag
> is at a higher level than the current HTMLParser class. It is similar to
> knowing where the following HTML implicitly closes the <li> elements:
>
> <ul><li>Item A<li>Item B</ul>
>
> In both cases I would not expect the HTMLParser to report “virtual” empty
> or closing tags. I don’t think it should report an empty <img/> or closing
> </img> tag just because that is easy to do, because it would be
> inconsistent with other implied HTML tags. But maybe see what other people
> say.
>
> I don’t know your particular use case, but I would suggest if you need to
> parse non-XML HTML <img> tags, use the handle_starttag() method and don’t
> rely on the end tag :)
>
> ----------
>
> _______________________________________
> Python tracker <[email protected]>
> <http://bugs.python.org/issue25258>
> _______________________________________
>

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25258>
_______________________________________