Re: split on NO-BREAK SPACE

Jean-Paul Calderone Sun, 22 Jul 2007 12:41:49 -0700

On Sun, 22 Jul 2007 21:13:02 +0200, Peter Kleiweg <[EMAIL PROTECTED]> wrote:
>Carsten Haese wrote on the 22nd day of July 2007:
>
>> On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote:
>> > > It's a feature. See help(str.split): "If sep is not specified or is
>> > > None, any whitespace string is a separator."
>> >
>> > Define "any whitespace".
>>
>> Any string for which isspace returns True.
>
>Define white space to isspace()
>
>> > Why is it different in <type 'str'> and <type 'unicode'>?
>>
>> >>> '\xa0'.isspace()
>> False
>> >>> u'\xa0'.isspace()
>> True
>
>Here is another "space":
>
>  >>> u'\uFEFF'.isspace()
>  False
>
>isspace() is inconsistent


It's only inconsistent if you think it should behave based on the
name of a Unicode code point.  It doesn't use the name, though.  It
uses the general category.  NO-BREAK SPACE (U+00A0) is in the Zs
category (Separator, Space).  ZERO WIDTH NO-BREAK SPACE (U+FEFF) is in
the Cf category (Other, Format).
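
If you want to check this yourself, here is a small sketch using the
standard unicodedata module.  It assumes Python 3, where every str is
Unicode, and the helper name describe() is made up for illustration:

```python
import unicodedata

def describe(ch):
    """Return (name, general category, isspace() result) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

# Both names contain "SPACE", but only the first is in category Zs.
for ch in ('\xa0', '\ufeff'):
    print(describe(ch))
```

The category is not quite the only input in current CPython: the
documentation for str.isspace() also counts characters whose
bidirectional class is WS, B, or S, which is why '\t' (category Cc)
is whitespace too.  Either way, the name is never consulted.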

Maybe that makes Unicode inconsistent (I won't try to argue either way),
but it's pretty clear that isspace is being consistent with the data
it has to work with.
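
To tie this back to split(), a rough sketch of the consequence, again
assuming Python 3 (the variable names are only for illustration, and
bytes stands in here for the old 8-bit str, which is an approximation):

```python
text = 'a\xa0b\ufeffc'
# NO-BREAK SPACE counts as whitespace for str.split();
# ZERO WIDTH NO-BREAK SPACE does not, so 'b' and 'c' stay joined.
parts = text.split()

# bytes.split() only treats ASCII whitespace as a separator,
# so the 0xA0 byte is left alone and only the plain space splits.
raw_parts = b'a\xa0b c'.split()

print(parts)
print(raw_parts)
```

So whether NO-BREAK SPACE splits a string depends on whether the
object doing the splitting has Unicode character data to consult.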

Jean-Paul
-- 
http://mail.python.org/mailman/listinfo/python-list
