New submission from Waylan Limberg <waylan.limb...@icloud.com>:
When the `close` method of `HTMLParser` is called, any cached text data is generally flushed and passed to a `data` event, except when the parser is in CDATA mode (that is, when `self.cdata_elem` is set). Specifically, if an unclosed `script` or `style` tag has been encountered, a call to `close` does not flush the data. A simple test which demonstrates the issue is attached.

I see that in Lib/html/parser.py#L244-L249 there are two nested if statements which both check for `not self.cdata_elem`. If execution gets past the first check, the nested check can never fail, so it is redundant. This block of code needs a branch for the case where `self.cdata_elem` is set.

I should note that the input is invalid HTML. However, the existing behavior results in data loss. Within any other unclosed tag (other than `script` or `style`), any data is still flushed and passed to a `data` event. I would expect the same behavior here, although the data escaping behavior should perhaps be applied as it is with data within properly closed tags.

----------
components: Library (Lib)
files: test_html.py
messages: 378359
nosy: waylan
priority: normal
severity: normal
status: open
title: htmlparser unclosed script tag causes data loss
type: behavior
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

Added file: https://bugs.python.org/file49505/test_html.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41989>
_______________________________________
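For readers without access to the attachment, the behavior described in the report can be explored with a minimal sketch along the following lines. This is not the attached test_html.py; the `DataCollector` class and the `collect_data` helper are hypothetical names introduced here for illustration. It relies only on the documented `feed`, `close`, and `handle_data` methods of `html.parser.HTMLParser`. What the unclosed `script` case yields depends on the Python version in use: on the versions listed in the report the data is expected to be missing, while other releases may behave differently.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper: records every chunk passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect_data(html):
    """Feed `html` to a fresh parser, close it, and return the data chunks."""
    parser = DataCollector()
    parser.feed(html)
    parser.close()  # close() is where any remaining buffered text should be flushed
    return parser.chunks


if __name__ == "__main__":
    # Unclosed ordinary tag: the text is delivered to handle_data.
    print("p:     ", collect_data("<p>unclosed paragraph"))

    # Unclosed script tag: per the report, the text is dropped on affected
    # versions. The result is printed rather than asserted, because it
    # depends on the interpreter version.
    print("script:", collect_data("<script>unclosed script"))

    # Properly closed script tag, for comparison.
    print("closed:", collect_data("<script>closed</script>"))
```

Comparing the first two lines of output shows whether the interpreter in use drops the data inside the unclosed `script` element while still delivering the data inside the unclosed `p` element.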