[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

Vlastimil Brom Mon, 24 Aug 2009 05:56:01 -0700

Vlastimil Brom <vlastimil.b...@gmail.com> added the comment:

I'd like to add some detail to the previous msg91473


The current behaviour of character properties is sometimes a bit
surprising:

>>> 
>>> regex.findall(ur"\p{UppercaseLetter}", u"QW\p{UppercaseLetter}as")
[u'Q', u'W', u'U', u'L']
>>> regex.findall(ur"\p{Uppercase Letter}", u"QW\p{Uppercase Letter}as")
[u'\\p{Uppercase Letter}']
>>> regex.findall(ur"\p{UppercaseÄÄÄLetter}", u"QW\p{UppercaseÄÄÄLetter}as")
[u'\\p{Uppercase\xc4\xc4\xc4Letter}']
>>> regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")

Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    regex.findall(ur"\p{UppercaseQQQLetter}", u"QW\p{UppercaseQQQLetter}as")
...
  File "C:\Python26\lib\regex.py", line 1178, in _parse_property
    raise error("undefined property name '%s'" % name)
error: undefined property name 'UppercaseQQQLetter'
>>> 

That is, potential property names consisting only of ASCII letters
(plus _ and -) are looked up and either used or rejected with an error.
Other names (containing whitespace or non-ASCII letters) aren't treated
as a special expression; hence, they either match their literal value
or simply don't match (without errors).
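
To make the observed rule concrete, here is a minimal sketch of the
decision as I understand it from the outside. It is not the code in
regex.py: the function name classify_property_name, the tiny
KNOWN_PROPERTIES table and the use of ValueError are all assumptions
made for illustration only.

```python
import re

# Hypothetical, tiny stand-in for the module's real property table.
KNOWN_PROPERTIES = {"UppercaseLetter", "LowercaseLetter"}

# A name is a lookup candidate only if it consists solely of
# ASCII letters, "_" and "-".
_CANDIDATE = re.compile(r"[A-Za-z_-]+\Z")


def classify_property_name(name):
    """Return "property" or "literal", or raise for an unknown candidate."""
    if not _CANDIDATE.match(name):
        # Whitespace or non-ASCII letters: treated as ordinary text.
        return "literal"
    if name in KNOWN_PROPERTIES:
        return "property"
    # Looks like a property name but is not defined.
    raise ValueError("undefined property name '%s'" % name)
```

With this sketch, "UppercaseLetter" is classified as a property,
"Uppercase Letter" and "UppercaseÄÄÄLetter" fall back to literal text,
and "UppercaseQQQLetter" raises an error, mirroring the session above.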

Is this the intended behaviour?
I am not sure whether it is defined somewhere, or whether there are
some de facto standards for this...
I guess spaces in property names could be allowed (unless that has
implications for the parser); otherwise, falling back to treating
invalid property names as ordinary strings is probably the expected
behaviour.
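
If spaces were to be allowed, one possible approach (only a sketch, not
a proposal for the actual parser) would be to normalize the name before
the lookup. The helper normalize_property_name below is hypothetical,
and folding case as well is my own assumption.

```python
import re


def normalize_property_name(name):
    """Drop whitespace, underscores and hyphens, then fold case."""
    return re.sub(r"[\s_-]+", "", name).lower()
```

Under this scheme "Uppercase Letter", "Uppercase_Letter" and
"uppercase-letter" would all be looked up under the same key.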
vbr

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________