New submission from Chenyun Yang:

Void elements such as <link> and <img> do not need the XHTML-style empty-element syntax (a trailing "/>"). HTMLParser relies on that syntax to report the end of an element, so it never calls handle_endtag for void elements written without it.
from HTMLParser import HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Encountered a start tag:", tag
    def handle_endtag(self, tag):
        print "Encountered an end tag :", tag
    def handle_data(self, data):
        print "Encountered some data :", data

>>> parser = MyHTMLParser()
>>> parser.feed('<link rel="import"><img src="som">')
Encountered a start tag: link
Encountered a start tag: img
>>> parser.feed('<link rel="import"/><img src="som"/>')
Encountered a start tag: link
Encountered an end tag : link
Encountered a start tag: img
Encountered an end tag : img

Reference:
https://github.com/python/cpython/blob/bdfb14c688b873567d179881fc5bb67363a6074c/Lib/html/parser.py
http://www.w3.org/TR/html5/syntax.html#void-elements

----------
components: Library (Lib)
messages: 251792
nosy: Chenyun Yang
priority: normal
severity: normal
status: open
title: HtmlParser doesn't handle void element tags correctly
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25258>
_______________________________________
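The behaviour reported above can be worked around on the caller's side. The following is a minimal sketch, not the fix proposed in the report: a subclass that synthesizes the end-tag callback for void elements itself. The names VoidAwareParser, VOID_ELEMENTS and the events list are hypothetical, and the element list is an assumption that should be checked against the void-elements section of the HTML5 specification referenced in the report.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

# Assumed list of void elements; verify against the HTML5 syntax reference.
VOID_ELEMENTS = frozenset([
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
])


class VoidAwareParser(HTMLParser):
    """Records (kind, tag) events and closes void elements itself."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        # A void element has no content, so report its end immediately.
        if tag in VOID_ELEMENTS:
            self.handle_endtag(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for the "<tag/>" form. handle_starttag has already
        # reported the end of a void element, so avoid reporting it twice.
        self.handle_starttag(tag, attrs)
        if tag not in VOID_ELEMENTS:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


plain = VoidAwareParser()
plain.feed('<link rel="import"><img src="som">')

xhtml = VoidAwareParser()
xhtml.feed('<link rel="import"/><img src="som"/>')

# Both spellings now produce the same sequence of events.
print(plain.events == xhtml.events)  # True
```

With this subclass both inputs from the report yield a start event followed by an end event for link and for img. The sketch does not suppress a stray explicit end tag such as </img>, which would still be reported as an extra end event.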