Vlastimil Brom <vlastimil.b...@gmail.com> added the comment:

I'd like to add some detail to the previous message, msg91473.
The current behaviour of the character properties is sometimes a bit surprising:

>>> regex.findall(ur"\p{UppercaseLetter}", u"QW\p{UppercaseLetter}as")
[u'Q', u'W', u'U', u'L']
>>> regex.findall(ur"\p{Uppercase Letter}", u"QW\p{Uppercase Letter}as")
[u'\\p{Uppercase Letter}']
>>> regex.findall(ur"\p{UppercaseÄÄÄLetter}", u"QW\p{UppercaseÄÄÄLetter}as")
[u'\\p{Uppercase\xc4\xc4\xc4Letter}']
>>> regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
  ...
  File "C:\Python26\lib\regex.py", line 1178, in _parse_property
    raise error("undefined property name '%s'" % name)
error: undefined property name 'UppercaseQQQLetter'
>>>

That is, potential property names consisting only of ASCII letters (plus "_" and "-") are looked up and either used or an error is raised. Other names (containing whitespace or non-ASCII letters) aren't treated as a special expression; hence they either match their literal value or simply don't match (without errors).

Is this the intended behaviour? I am not sure whether it is defined somewhere, or whether there are some de facto standards for this...
I guess the space in property names might be allowed (unless there are some implications for the parser...); otherwise, falling back to handling invalid property names as normal strings is probably the expected way.

vbr

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________
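For reference, the lookup rule observed in the session above could be approximated as follows. This is only a minimal Python 3 sketch built on the standard `re` module, not the regex module's actual `_parse_property`. The names `KNOWN_PROPERTIES`, `PROPERTY_ESCAPE` and `classify_property_escape` are hypothetical, and the property table is a tiny stand-in for illustration.

```python
import re

# Hypothetical stand-in for the real property table (an assumption,
# not the regex module's data).
KNOWN_PROPERTIES = {"UppercaseLetter", "Lu", "LowercaseLetter", "Ll"}

# A candidate name is made only of ASCII letters, "_" and "-",
# matching the behaviour observed in the session above.
PROPERTY_ESCAPE = re.compile(r"\\p\{([A-Za-z_-]+)\}")


def classify_property_escape(pattern):
    """Classify a \\p{...} escape as ("property", name) or ("literal", text).

    Raises ValueError for a well-formed but unknown name, mirroring the
    "undefined property name" error shown in the traceback.
    """
    match = PROPERTY_ESCAPE.fullmatch(pattern)
    if match is None:
        # Whitespace or non-ASCII letters: not treated as a property at all,
        # so the text is matched literally.
        return ("literal", pattern)
    name = match.group(1)
    if name not in KNOWN_PROPERTIES:
        raise ValueError("undefined property name '%s'" % name)
    return ("property", name)
```

Under these assumptions, `classify_property_escape(r"\p{Uppercase Letter}")` is classified as a literal, while `r"\p{UppercaseQQQLetter}"` raises an error. Allowing spaces in property names would amount to widening the character class in `PROPERTY_ESCAPE`.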