MRAB wrote:
Robin Becker wrote:
Jean-Paul Calderone wrote:
.........
You have to give the re module an additional hint that you care about
unicode:
$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu
4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> print re.compile(r'\s').search(u'a\xa0b')
None
>>> print re.compile(r'\s', re.U).search(u'a\xa0b')
<_sre.SRE_Match object at 0xb7dbb3a0>
>>>
Jean-Paul
.......
So the default behaviour of the unicode type's own methods differs from
that of re working on unicode. I suppose that won't be true in Python 3.
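For what it's worth, here is a minimal sketch of how that supposition can be checked under Python 3. It assumes a Python 3 interpreter (not the 2.5.2 session quoted above), and the variable names are only for illustration. There, str patterns follow Unicode rules by default, and re.ASCII is the flag that asks for the ASCII-only behaviour instead:

```python
import re

text = 'a\xa0b'  # U+00A0 NO-BREAK SPACE between the two letters

# Python 3: str patterns use Unicode rules by default, so \s matches U+00A0.
default_match = re.compile(r'\s').search(text)

# re.ASCII restricts \s to ASCII whitespace, so U+00A0 no longer matches.
ascii_match = re.compile(r'\s', re.ASCII).search(text)

# str.split() with no arguments also treats U+00A0 as whitespace.
parts = text.split()

print(default_match is not None, ascii_match is None, parts)
# prints: True True ['a', 'b']
```

So under Python 3 the str methods and re agree by default, and the hint is only needed to opt out of Unicode matching.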
I'm not sure why the Unicode flag is needed in the API. I reckon that it
should just look at the text that the regular expression is being
applied to: if it's Unicode then follow the Unicode rules, if not then
don't.
I presume that is because \b is interpreted and replaced when the re is
compiled into its internal state machine form, before any target text has
been seen.
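A small sketch that is at least consistent with that presumption, assuming Python 3 (it says nothing about how \b is handled internally, and the names are only for illustration). The choice of rules is recorded on the compiled pattern object, and it follows the type of the pattern rather than the type of the text searched later:

```python
import re

# A str pattern is compiled with Unicode rules; the flag is stored on the
# compiled object before any target text is known.
str_pattern = re.compile(r'\s')
has_unicode_flag = bool(str_pattern.flags & re.UNICODE)

# A bytes pattern is compiled without the Unicode flag.
bytes_pattern = re.compile(br'\s')
bytes_has_unicode_flag = bool(bytes_pattern.flags & re.UNICODE)

# Python 3 refuses to mix the two rather than choosing rules per call.
try:
    bytes_pattern.search('a\xa0b')
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(has_unicode_flag, bytes_has_unicode_flag, mixed_ok)
# prints: True False False
```

In other words, Python 3 decides from the pattern's type at compile time, which is close to the suggestion above except that it looks at the pattern instead of the text.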
--
http://mail.python.org/mailman/listinfo/python-list