Re: schizophrenic view of what is white space

2008-12-04 Thread Jean-Paul Calderone
On Thu, 04 Dec 2008 22:40:46 +0100, Hrvoje Niksic <[EMAIL PROTECTED]> wrote: [snip] Whitespace is probably not controversial, but many parsers tend to expect things like \d to match [0-9], not any Unicode character marked as "digit". For example, I'm not sure if this behavior would be a good de
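The point about \d can be checked with a small sketch. It assumes Python 3, where str patterns use Unicode rules by default and re.ASCII restores the narrow behaviour; the thread itself is about Python 2.5, where the default is the other way round. The sample character and variable names are chosen here for illustration only.

```python
import re

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode "digit" that is not in [0-9].
arabic_three = "\u0663"

# Python 3 default for str patterns: \d follows the Unicode digit category.
digit_default = re.fullmatch(r"\d", arabic_three)

# re.ASCII narrows \d to [0-9], which is what many parsers expect.
digit_ascii = re.fullmatch(r"\d", arabic_three, re.ASCII)

# Spelling the class out avoids depending on any flag.
digit_explicit = re.fullmatch(r"[0-9]", arabic_three)

print(digit_default is not None)   # True
print(digit_ascii is not None)     # False
print(digit_explicit is not None)  # False
```

A parser that must accept only ASCII digits can either pass re.ASCII or write [0-9] explicitly.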

Re: schizophrenic view of what is white space

2008-12-04 Thread Hrvoje Niksic
MRAB <[EMAIL PROTECTED]> writes: > I'm not sure why the Unicode flag is needed in the API. I reckon > that it should just look at the text that the regular expression is > being applied to: if it's Unicode then follow the Unicode rules, if > not then don't. It might be that using Unicode tables f

Re: schizophrenic view of what is white space

2008-12-04 Thread MRAB
Terry Reedy wrote: MRAB wrote: Robin Becker wrote: Jean-Paul Calderone wrote: . You have to give the re module an additional hint that you care about unicode: [EMAIL PROTECTED]:~$ python Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux

Re: schizophrenic view of what is white space

2008-12-04 Thread Terry Reedy
MRAB wrote: Robin Becker wrote: Jean-Paul Calderone wrote: . You have to give the re module an additional hint that you care about unicode: [EMAIL PROTECTED]:~$ python Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2 Type "help", "co

Re: schizophrenic view of what is white space

2008-12-04 Thread Terry Reedy
so the default behaviour differs for unicode and re working on unicode. I suppose that won't be true in Python 3. import re print(re.compile(r'\s').search('a b')) print(re.compile(r'\s').search('a\xa0b') ) >>> RESTART === >>> <_sre.SRE_Match object

Re: schizophrenic view of what is white space

2008-12-04 Thread MRAB
Robin Becker wrote: Jean-Paul Calderone wrote: . You have to give the re module an additional hint that you care about unicode: [EMAIL PROTECTED]:~$ python Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2 Type "help", "copyright", "cr

Re: schizophrenic view of what is white space

2008-12-04 Thread Robin Becker
Jean-Paul Calderone wrote: . You have to give the re module an additional hint that you care about unicode: [EMAIL PROTECTED]:~$ python Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2 Type "help", "copyright", "credits" or "license" f

Re: schizophrenic view of what is white space

2008-12-04 Thread Jean-Paul Calderone
On Thu, 04 Dec 2008 14:27:49 +, Robin Becker <[EMAIL PROTECTED]> wrote: Is Python of two minds about what is white space? I notice that split and strip seem to regard u'\xa0' (NO-BREAK SPACE) as white space, but that character is not matched by the \s pattern. If this difference is intended, can we rely on

schizophrenic view of what is white space

2008-12-04 Thread Robin Becker
Is Python of two minds about what is white space? I notice that split and strip seem to regard u'\xa0' (NO-BREAK SPACE) as white space, but that character is not matched by the \s pattern. If this difference is intended, can we rely on it continuing? >>> u'a b'.split() [u'a', u'b'] >>> u'a\xa0b'.split() [u'a
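
The difference described above can be reproduced with a short sketch. It assumes Python 3 rather than the Python 2.5 used in the thread: in Python 3, str patterns match Unicode white space by default, and re.ASCII gives the narrower behaviour that Python 2 showed without the re.UNICODE flag. The variable names are illustrative.

```python
import re

NBSP = "\xa0"  # U+00A0 NO-BREAK SPACE
text = "a" + NBSP + "b"

# str.split() with no argument splits on Unicode white space, NBSP included.
split_parts = text.split()

# Python 3 default for str patterns: \s uses the Unicode definition.
ws_unicode = re.search(r"\s", text)

# re.ASCII restricts \s to space, \t, \n, \r, \f and \v, so NBSP is missed.
ws_ascii = re.search(r"\s", text, re.ASCII)

print(split_parts)             # ['a', 'b']
print(ws_unicode is not None)  # True
print(ws_ascii is not None)    # False
```

Under these assumptions split() and the default \s agree in Python 3, and the mismatch reappears only when re.ASCII is requested.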