Ezio Melotti <ezio.melo...@gmail.com> added the comment:

With 3.2 the situation is more complicated because there is a strict and a non-strict mode. The strict mode uses:

    attrfind = re.compile(
        r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')

and the tolerant mode uses:

    attrfind_tolerant = re.compile(
        r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
        r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')

This means that the strict mode doesn't allow valid non-ASCII characters in unquoted attribute values, and that the tolerant mode is a little too permissive. The attached patch changes the strict regex to be more permissive and leaves the tolerant regex unchanged. The differences between the two are now so small that the tolerant version could be removed, except that re.search is used instead of re.match when the tolerant regex is used.

----------
nosy: +r.david.murray
Added file: http://bugs.python.org/file21545/issue7311-3.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7311>
_______________________________________
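
To illustrate the two points made in the comment, here is a minimal sketch that runs the two patterns quoted above (the pre-patch versions) against made-up input. The sample strings and the variable names sample, noisy, strict_value, tolerant_value, anchored and scanned are assumptions chosen for illustration; they are not taken from the patch or from html.parser itself.

```python
import re

# Patterns copied verbatim from the comment (pre-patch, Python 3.2).
attrfind = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?')
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')

# Hypothetical input: an unquoted value ending in a non-ASCII character.
sample = ' title=caf\u00e9>'

# The strict pattern stops at the first character outside its ASCII class.
strict_value = attrfind.match(sample).group(3)
# The tolerant pattern accepts anything up to '>' or whitespace.
tolerant_value = attrfind_tolerant.match(sample).group(3)

print(repr(strict_value))    # 'caf'
print(repr(tolerant_value))  # 'café'

# Hypothetical input with junk before the attribute, to show why
# re.match and re.search are not interchangeable here.
noisy = '?? href=x'
anchored = attrfind_tolerant.match(noisy)   # must match at position 0
scanned = attrfind_tolerant.search(noisy)   # may skip ahead to the attribute

print(anchored)                            # None
print(scanned.group(1), scanned.group(3))  # href x
```

The second half shows the remaining behavioural difference mentioned above: even if both modes used the same pattern, searching would skip over leading junk, whereas matching would fail on it.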