On Thu, 04 Dec 2008 22:40:46 +0100, Hrvoje Niksic <[EMAIL PROTECTED]> wrote:
[snip]
Whitespace is probably not controversial, but many parsers tend to
expect things like \d to match [0-9], not any Unicode character marked
as "digit". For example, I'm not sure if this behavior would be a
good de
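Hrvoje's \d concern is easy to reproduce. In Python 3, \d in a str pattern matches any Unicode decimal digit (category Nd), for example U+0663 ARABIC-INDIC DIGIT THREE, unless re.ASCII or an explicit [0-9] class is used. This is a minimal sketch assuming Python 3; the sample character is only an illustration:

```python
import re

# U+0663 ARABIC-INDIC DIGIT THREE has Unicode category Nd ("decimal digit").
arabic_three = "\u0663"

# A str pattern's \d matches any Unicode decimal digit in Python 3 ...
unicode_match = re.fullmatch(r"\d", arabic_three)

# ... while re.ASCII, or an explicit [0-9] class, restricts it to ASCII 0-9.
ascii_match = re.fullmatch(r"\d", arabic_three, re.ASCII)
class_match = re.fullmatch(r"[0-9]", arabic_three)

print(unicode_match is not None)  # True
print(ascii_match is None)        # True
print(class_match is None)        # True

# int() also accepts Unicode decimal digits, so a parser that validates
# with \d and then calls int() will accept this input without complaint.
print(int(arabic_three))          # 3
```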
MRAB <[EMAIL PROTECTED]> writes:
> I'm not sure why the Unicode flag is needed in the API. I reckon
> that it should just look at the text that the regular expression is
> being applied to: if it's Unicode then follow the Unicode rules, if
> not then don't.
It might be that using Unicode tables f
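MRAB's idea of picking the rules from the kind of text is close to what Python 3 does, except that the choice is tied to the type of the pattern as well: str patterns get Unicode rules, bytes patterns get ASCII rules, and mixing the two raises TypeError instead of guessing. A minimal sketch assuming Python 3; latin-1 is used here only to obtain a single 0xA0 byte for comparison:

```python
import re

text = "a\xa0b"                # str containing U+00A0 NO-BREAK SPACE
data = text.encode("latin-1")  # bytes containing the single byte 0xA0

# A str pattern on str uses Unicode rules: NO-BREAK SPACE counts as whitespace.
str_hit = re.search(r"\s", text)

# A bytes pattern on bytes uses ASCII rules: byte 0xA0 is not whitespace.
bytes_hit = re.search(rb"\s", data)

# Applying a str pattern to bytes is rejected outright.
mixed_error = None
try:
    re.search(r"\s", data)
except TypeError as exc:
    mixed_error = exc

print(str_hit.start())               # 1
print(bytes_hit)                     # None
print(type(mixed_error).__name__)    # TypeError
```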
Terry Reedy wrote:
MRAB wrote:
Robin Becker wrote:
Jean-Paul Calderone wrote:
.
You have to give the re module an additional hint that you care about
unicode:
[EMAIL PROTECTED]:~$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu
4.2.3-2ubuntu7)] on linux2
Type "help", "co
So the default behaviour differs between the unicode string methods and re working on unicode.
I suppose that won't be true in Python 3.
import re
print(re.compile(r'\s').search('a b'))
print(re.compile(r'\s').search('a\xa0b'))
>>> RESTART ===
>>>
<_sre.SRE_Match object
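MRAB's test can be extended into a side-by-side comparison. In Python 3 the Unicode behaviour that Python 2 only gives with the re.UNICODE flag is the default for str patterns, and re.ASCII is the hint for the narrower, ASCII-only behaviour. A sketch assuming Python 3:

```python
import re

nbsp = "a\xa0b"  # U+00A0 NO-BREAK SPACE between two letters

# Default for str patterns in Python 3: Unicode rules, so \s matches U+00A0.
default = re.compile(r"\s").search(nbsp)

# re.ASCII limits \s to the ASCII whitespace set [ \t\n\r\f\v].
ascii_only = re.compile(r"\s", re.ASCII).search(nbsp)

# re.UNICODE is still accepted for str patterns but changes nothing.
explicit = re.compile(r"\s", re.UNICODE).search(nbsp)

print(default.span())    # (1, 2)
print(ascii_only)        # None
print(explicit.span())   # (1, 2)
```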
On Thu, 04 Dec 2008 14:27:49 +, Robin Becker <[EMAIL PROTECTED]> wrote:
Is Python of two minds about what is whitespace? I notice that split and strip
seem to regard u'\xa0' (NO-BREAK SPACE) as whitespace, but that character is not matched
by the \s pattern. If this difference is intended, can we rely on it continuing?
>>> u'a b'.split()
[u'a', u'b']
>>> u'a\xa0b'.split()
[u'a
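For the original question, a short check under Python 3 (the strings are only illustrative) shows the string methods and a str-pattern \s agreeing on U+00A0, with re.ASCII still available when the narrower definition is wanted. It only tests this one character; it is not a proof that every code point is classified the same way by both:

```python
import re

s = "a\xa0b"

# The str methods treat U+00A0 as whitespace ...
nbsp_is_space = "\xa0".isspace()
method_parts = s.split()
stripped = " \xa0x\xa0 ".strip()

# ... and in Python 3 so does \s in a str pattern, so the two agree here.
regex_parts = re.split(r"\s+", s)

# With re.ASCII the regex no longer splits on U+00A0.
ascii_parts = re.split(r"\s+", s, flags=re.ASCII)

print(nbsp_is_space)          # True
print(method_parts)           # ['a', 'b']
print(stripped)               # x
print(regex_parts)            # ['a', 'b']
print(ascii_parts == [s])     # True
```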