On Thu, 04 Dec 2008 14:27:49 +0000, Robin Becker <[EMAIL PROTECTED]> wrote:
Is Python of two minds about what counts as whitespace? I notice that split and strip seem to regard u'\xa0' (NO-BREAK SPACE) as whitespace, but that character is not matched by the \s pattern. If this difference is intended, can we rely on it continuing?


>>> import re
>>> u'a b'.split()
[u'a', u'b']
>>> u'a\xa0b'.split()
[u'a', u'b']
>>> re.compile(r'\s').search(u'a b')
<_sre.SRE_Match object at 0x00DBB2C0>
>>> re.compile(r'\s').search(u'a\xa0b')
>>>


You have to tell the re module that you care about Unicode by passing the
re.U (re.UNICODE) flag; without it, \s here matches only ASCII whitespace:

 [EMAIL PROTECTED]:~$ python
 Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import re
 >>> print re.compile(r'\s').search(u'a\xa0b')
 None
 >>> print re.compile(r'\s', re.U).search(u'a\xa0b')
 <_sre.SRE_Match object at 0xb7dbb3a0>
 >>>
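
If you want regex-based splitting to agree with what split() does on unicode strings, one option is to compile the pattern with re.UNICODE once and reuse it. The following is only a rough sketch: the names UNICODE_WS and split_on_whitespace are made up for illustration and are not part of the re module. (On Python 3, str patterns already match Unicode whitespace by default, so the flag is redundant there but harmless.)

```python
import re

# Hypothetical helper names, for illustration only.
# re.UNICODE makes \s follow the Unicode whitespace properties,
# so NO-BREAK SPACE (u'\xa0') is treated as whitespace.
UNICODE_WS = re.compile(r'\s+', re.UNICODE)

def split_on_whitespace(text):
    """Split text on runs of Unicode whitespace, dropping empty pieces."""
    return [piece for piece in UNICODE_WS.split(text) if piece]

text = u'a\xa0b'
print(split_on_whitespace(text))  # regex-based split with re.UNICODE
print(text.split())               # built-in split, for comparison
```

With the flag set, both calls should break the string at the no-break space; dropping the empty pieces mimics how split() with no arguments ignores leading and trailing whitespace.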

Jean-Paul
