New submission from Mark Nottingham <m...@mnot.net>:

In markupbase.py's ParserBase.parse_declaration, an unexpected character is caught like this:

            else:
                self.error(
                    "unexpected %r char in declaration" % rawdata[j])

However, the position (j) isn't updated, which means that unless error() raises, the loop will look at the same character and call error() again as soon as it returns.

For example, this declaration:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" http://www.w3.org/TR/html4/loose.dtd>

(which I think is generated by MS Office) will trigger this behaviour.

Two possible resolutions:
 1) increment j and try the next character in this case
 2) document that error() is not recoverable; i.e., it MUST raise an exception.

My preference is strongly for #1 (as HTML parsing should be forgiving, and HTMLParser is based upon markupbase).

----------
components: Library (Lib)
messages: 106938
nosy: mnot
priority: normal
severity: normal
status: open
title: markerbase declaration errors aren't recoverable
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8885>
_______________________________________
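
For illustration only, here is a rough sketch of what resolution 1 might look like. It is a toy stand-in, not the real markupbase code: scan_declaration, NAME_CHARS and WHITESPACE are hypothetical names, the "[" / "]" handling of the real parser is left out, and errors are collected in a list instead of going through error().

```python
import string

# Hypothetical helpers for this sketch; not names from markupbase.py.
NAME_CHARS = string.ascii_letters + string.digits + "-_."
WHITESPACE = " \t\r\n"


def scan_declaration(rawdata, errors):
    """Toy scanner: return the index just past '>', or -1 if incomplete.

    Unexpected characters are recorded in `errors` and then skipped,
    which is resolution 1 from the report.
    """
    j = 2  # skip the leading "<!"
    n = len(rawdata)
    while j < n:
        c = rawdata[j]
        if c == ">":
            return j + 1
        if c in "\"'":
            end = rawdata.find(c, j + 1)
            if end < 0:
                return -1  # unterminated string literal
            j = end + 1
        elif c in string.ascii_letters:
            # consume a name
            while j < n and rawdata[j] in NAME_CHARS:
                j += 1
        elif c in WHITESPACE:
            j += 1
        else:
            errors.append("unexpected %r char in declaration" % c)
            j += 1  # advance, so the same character is not reported forever
    return -1


decl = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
        'http://www.w3.org/TR/html4/loose.dtd>')
errors = []
end = scan_declaration(decl, errors)
```

Without the `j += 1` in the final branch, a non-raising error handler would leave this loop stuck on the ":" after "http", which is the behaviour described in the report.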