New submission from Ezio Melotti <ezio.melo...@gmail.com>:

The attached patch adds a few tests for the handling of broken conditional comments (condcoms). A valid condcom looks like <!--[if ie 6]>...<![endif]-->. An invalid one looks like <![if ie 6]>...<![endif]>. This seems to be a common mistake, and it's found even on popular sites like Adobe, LinkedIn, and deviantART.

Currently, HTMLParser calls unknown_decl(), passing e.g. 'if ie 6', and if strict=True an error is raised. With strict=False no error is raised and the unknown declaration is ignored.

The HTML5 specs say:
"""
[After '<!',] If the next two characters are both U+002D HYPHEN-MINUS characters (-), consume those two characters, [...] Otherwise, this is a parse error. Switch to the bogus comment state.[0]

[Once in the bogus comment state,] Consume every character up to and including the first U+003E GREATER-THAN SIGN character (>) or the end of the file (EOF), whichever comes first. Emit a comment token whose data is the concatenation of all the characters starting from and including the character that caused the state machine to switch into the bogus comment state, up to and including the character immediately before the last consumed character (i.e. up to the character just before the U+003E or EOF character), but with any U+0000 NULL characters replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment was started by the end of the file (EOF), the token is empty.)[1]
"""

So, IIUC, '<![if ie 6]>...<![endif]>' should emit a '[if ie 6]' comment, parse the '...' normally, and emit a '[endif]' comment.
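
To make the difference between the two forms concrete, here is a minimal sketch (not part of the attached patch) that records which callback each construct reaches. The CondcomRecorder class and the record() helper are hypothetical names made up for this illustration, and the sketch assumes the Python 3 html.parser module with its default constructor arguments.

```python
from html.parser import HTMLParser  # Python 3 module name (assumption: not 2.7)


class CondcomRecorder(HTMLParser):
    """Hypothetical helper: logs which callback each construct reaches."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_comment(self, data):
        # Regular comments (and bogus comments, where the parser emits them).
        self.events.append(("comment", data))

    def unknown_decl(self, data):
        # Unrecognized '<![...]>' declarations, where the parser reports them.
        self.events.append(("decl", data))

    def handle_data(self, data):
        # Plain text between markup.
        self.events.append(("data", data))


def record(markup):
    """Feed one complete document and return the recorded events."""
    parser = CondcomRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


valid = record("<!--[if ie 6]>...<![endif]-->")
broken = record("<![if ie 6]>...<![endif]>")

print(valid)
print(broken)
```

The valid form arrives as a single comment whose data is '[if ie 6]>...<![endif]'. For the broken form the '...' is parsed as ordinary data in between two separate events. Depending on the Python version, those two events may reach unknown_decl() as 'if ie 6' and 'endif' (the behavior described above) or handle_comment() as '[if ie 6]' and '[endif]' (the HTML5 reading), so the sketch makes no assumption either way.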

However, I think it's fine to leave the current behavior for the following reasons:
1) backward compatibility;
2) handling broken condcoms in unknown_decl is easier than doing it in handle_comment, where all the other comments are sent;
3) probably no one cares about them anyway.

[0]: http://www.w3.org/TR/html5/tokenization.html#markup-declaration-open-state
[1]: http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state

----------
assignee: ezio.melotti
components: Library (Lib)
files: issue13576.diff
keywords: patch
messages: 149204
nosy: eric.araujo, ezio.melotti
priority: normal
severity: normal
stage: commit review
status: open
title: Handling of broken condcoms in HTMLParser
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file23909/issue13576.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13576>
_______________________________________