kxroberto <kxrobe...@users.sourceforge.net> added the comment:

The old patch already warned about the majority of real cases, except for missing whitespace between attributes.
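To make the missing-whitespace case concrete, here is a minimal sketch of how such attributes could be detected one at a time. The names TAG_NAME, ATTR and find_glued_attributes are hypothetical and chosen only for this illustration; they are not part of HTMLParser, and the sketch deliberately ignores the comma case.

```python
import re

# Hypothetical patterns for illustration only; they are not HTMLParser's own.
TAG_NAME = re.compile(r'<[a-zA-Z][-.a-zA-Z0-9:_]*')
ATTR = re.compile(
    r'(\s*)'                                  # group 1: whitespace before the name
    r'([a-zA-Z_][-.:a-zA-Z0-9_]*)'            # group 2: attribute name
    r'(?:\s*=\s*'                             # optional "= value" part
    r'''(?:'[^']*'|"[^"]*"|[^'">\s]+))?'''    # quoted or bare value
)


def find_glued_attributes(start_tag):
    """Return the names of attributes that are not preceded by whitespace."""
    m = TAG_NAME.match(start_tag)
    if m is None:
        return []
    pos = m.end()
    glued = []
    while True:
        m = ATTR.match(start_tag, pos)
        if m is None:
            break  # reached '>' or something that is not an attribute
        if m.group(1) == '':
            glued.append(m.group(2))  # attribute follows the previous token directly
        pos = m.end()
    return glued


print(find_glued_attributes('<abc a="b,+"c="d"e=f>text'))  # ['c', 'e']
```

A parser could emit one warning per returned name, which corresponds to warning "multiple times" while the attributes are being scanned.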
"The tolerant regex will match both":

locatestarttagend_tolerant: The main and most frequent issue on the web is missing whitespace between attributes (with enclosed values). There is also the newly tolerated comma between attributes, which I have not seen anywhere so far (the old warning mechanism and attrfind.match would already have raised it as a "junk chars ..." event). Both issues can easily be warned about, at almost no cost, with the slightly extended regex below, by checking the 2 new capturing groups against None in check_for_whole_start_tag. Alternatively, missing whitespace could be warned about (possibly multiple times) at attrfind time.

attrfind_tolerant: I see no point in keeping the old "strict" attrfind (the difference is, at a guess, 0.000% of real cases). attrfind_tolerant could become the only attrfind.

--

import re

locatestarttagend_tolerant = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  (?:(?:\s+|(\s*))                   # optional whitespace before attribute name
    (?:[a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
      (?:\s*=\s*                     # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |\"[^\"]*\"                # LIT-enclosed value
          |[^'\">\s]+                # bare value
         )
         (?:\s*(,))*                 # possibly followed by a comma
       )?
    )
  )*
  \s*                                # trailing whitespace
""", re.VERBOSE)
attrfind_tolerant = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[^>\s]*))?')

# Alternative test inputs:
#s='<abc a="b,+"c="d"e=f>text'
#s='<abc a="b,+" c="d"e=f>text'
s='<abc a="b,+",c="d" e=f>text'

m = locatestarttagend_tolerant.search(s)
print(m.group())
print(m.groups())
# Group 1 is the empty-whitespace alternative, group 2 the comma:
#if m.group(1) is not None: self.warning('space missing ...
#if m.group(2) is not None: self.warning('comma between attr...

# Search for the first attribute, starting after the tag name.
m = attrfind_tolerant.search(s, 5)
print(m.group())
print(m.groups())

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1486713>
_______________________________________