Ezio Melotti <ezio.melo...@gmail.com> added the comment:

I think <x><y z=""o"" /></x> should be parsed as <x><y z="" /></x>, and the o"" 
should be ignored.
<x><y z="""" /></x> should be parsed as <x><y z="" /></x>, and the last two "" 
should be ignored.  This is what Firefox seems to do.

Currently the parser doesn't seem to handle extraneous data in the start tag 
too well, because the locatestarttagend_tolerant regex looks for (more or less) 
well-formed attributes.
Attached a patch for test_htmlparser with the two examples provided by Kevin.
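
For anyone who wants to check the behaviour locally, here is a minimal sketch 
(not the attached patch).  StartTagCollector and collect_start_tags are 
hypothetical helper names; only html.parser.HTMLParser is the real API.  It 
prints the start tags the parser reports for the well-formed input and for the 
two malformed examples.  What it reports for the malformed ones depends on the 
Python version, so no expected output is claimed for them.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs as reported by HTMLParser.
        # Self-closing tags such as <y /> also end up here, because the
        # default handle_startendtag() calls handle_starttag().
        self.start_tags.append((tag, attrs))


def collect_start_tags(markup):
    """Feed markup to a fresh parser and return the recorded start tags."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


if __name__ == "__main__":
    samples = (
        '<x><y z="" /></x>',      # well-formed reference
        '<x><y z=""o"" /></x>',   # first malformed example
        '<x><y z="""" /></x>',    # second malformed example
    )
    for markup in samples:
        print(markup, "->", collect_start_tags(markup))
```

If the proposal above were adopted, all three inputs would be expected to 
yield the same result as the well-formed one.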

----------
keywords: +patch
nosy: +ezio.melotti
stage:  -> needs patch
Added file: http://bugs.python.org/file23579/issue12629.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12629>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
