Re: split on NO-BREAK SPACE

2007-07-22 Thread Ben Finney
Steve Holden <[EMAIL PROTECTED]> writes:
> Well, if you're going to start answering questions with FACTS, how
> can questioners rely on their prejudices to guide them any more?

You clearly underestimate the capacity for such people to choose only the particular facts that support those prejudice

Re: split on NO-BREAK SPACE

2007-07-22 Thread I V
On Sun, 22 Jul 2007 21:13:02 +0200, Peter Kleiweg wrote:
> Here is another "space":
>
> >>> u'\uFEFF'.isspace()
> False
>
> isspace() is inconsistent

Well, U+00A0 is in the category "Separator, Space" while U+FEFF is in the category "Other, Format", so it doesn't seem unreasonable that one i
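
The category distinction described above can be checked directly. This is a minimal sketch assuming Python 3 (the thread itself uses Python 2.4); the helper name `describe` is hypothetical, while `unicodedata.name` and `unicodedata.category` are standard-library calls.

```python
import unicodedata

def describe(ch):
    """Return (Unicode name, general category, isspace result) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

# U+00A0 has general category Zs ("Separator, Space");
# U+FEFF has general category Cf ("Other, Format").
for ch in ("\u00a0", "\ufeff"):
    name, category, is_space = describe(ch)
    print("U+%04X %-26s %s isspace=%s" % (ord(ch), name, category, is_space))
```

Under these assumptions, only the character in category Zs is reported as whitespace by `isspace()`.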

Re: split on NO-BREAK SPACE

2007-07-22 Thread Steve Holden
Jean-Paul Calderone wrote:
> On Sun, 22 Jul 2007 21:13:02 +0200, Peter Kleiweg <[EMAIL PROTECTED]> wrote:
>> Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:
>>
>>> On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote:
>>>>> It's a feature. See help(str.split): "If

Re: split on NO-BREAK SPACE

2007-07-22 Thread Jean-Paul Calderone
On Sun, 22 Jul 2007 21:13:02 +0200, Peter Kleiweg <[EMAIL PROTECTED]> wrote:
>Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:
>
>> On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote:
>> > > It's a feature. See help(str.split): "If sep is not specified or is
>> > > None,

Re: split on NO-BREAK SPACE

2007-07-22 Thread Wildemar Wildenburger
Peter Kleiweg wrote:
> > Define white space to isspace()
>
> Explain that phrase.
>
> Here is another "space":
>
> >>> u'\uFEFF'.isspace()
> False
>
> isspace() is inconsistent
>
I don't really know much about unicode, but google tells me that \uFEFF is a byte order mark. I thought we

Re: split on NO-BREAK SPACE

2007-07-22 Thread Peter Kleiweg
Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:
> On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote:
> > > It's a feature. See help(str.split): "If sep is not specified or is
> > > None, any whitespace string is a separator."
> >
> > Define "any whitespace".
>
> Any

Re: split on NO-BREAK SPACE

2007-07-22 Thread Carsten Haese
On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote:
> > It's a feature. See help(str.split): "If sep is not specified or is
> > None, any whitespace string is a separator."
>
> Define "any whitespace".

Any string for which isspace returns True.

> Why is it different in <type 'str'> and <type 'unicode'>?

>>> '\xa0'.is
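
The "any string for which isspace returns True" rule can be illustrated with a short sketch. It assumes Python 3, where `bytes` is only a rough stand-in for the Python 2 `str` type discussed in the thread and `str` corresponds to Python 2 `unicode`; the variable names are illustrative.

```python
text = "a b c\u00a0d e"        # Unicode text containing NO-BREAK SPACE (U+00A0)
raw = text.encode("latin-1")   # the same characters as bytes: b'a b c\xa0d e'

# With no separator, split() breaks on runs of characters that the type's
# own isspace() treats as whitespace.
print(raw[5:6].isspace(), raw.split())   # byte 0xA0 is not ASCII whitespace
print(text[5].isspace(), text.split())   # U+00A0 is Unicode whitespace
```

In this sketch the byte string keeps `c\xa0d` as one field, while the text string splits it into `c` and `d`, which mirrors the difference reported in the original post.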

Re: split on NO-BREAK SPACE

2007-07-22 Thread Peter Kleiweg
Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:
> On Sun, 2007-07-22 at 17:15 +0200, Peter Kleiweg wrote:
> > Is this a bug or a feature?
> >
> >
> > Python 2.4.4 (#1, Oct 19 2006, 11:55:22)
> > [GCC 2.95.3 20010315 (SuSE)] on linux2
> >
> > >>> a = 'a b c\2

Re: split on NO-BREAK SPACE

2007-07-22 Thread Carsten Haese
On Sun, 2007-07-22 at 17:15 +0200, Peter Kleiweg wrote:
> Is this a bug or a feature?
>
>
> Python 2.4.4 (#1, Oct 19 2006, 11:55:22)
> [GCC 2.95.3 20010315 (SuSE)] on linux2
>
> >>> a = 'a b c\240d e'
> >>> a
> 'a b c\xa0d e'
> >>> a.split()
> ['a', 'b', 'c\xa0d', 'e

split on NO-BREAK SPACE

2007-07-22 Thread Peter Kleiweg
Is this a bug or a feature?

Python 2.4.4 (#1, Oct 19 2006, 11:55:22)
[GCC 2.95.3 20010315 (SuSE)] on linux2

>>> a = 'a b c\240d e'
>>> a
'a b c\xa0d e'
>>> a.split()
['a', 'b', 'c\xa0d', 'e']
>>> a = a.decode('latin-1')
>>> a
u'a b c\xa0d e'
>>> a.sp