Alexander Belopolsky added the comment:

It looks like str.isspace() is incorrect.  The proper definition of Unicode 
whitespace (the White_Space property) seems to include 26 code points:

# ================================================

0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
180E          ; White_Space # Zs       MONGOLIAN VOWEL SEPARATOR
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE

# Total code points: 26

http://www.unicode.org/Public/UNIDATA/PropList.txt

Python's str.isspace() uses the following definition: "Whitespace characters 
are those characters defined in the Unicode character database as “Other” or 
“Separator” and those with bidirectional property being one of “WS”, “B”, or 
“S”."

Information separators (U+001C..U+001F) are not in the White_Space list above, 
but are swept in because of their bidirectional property ("B" for 
U+001C..U+001E, "S" for U+001F):

>>> import unicodedata
>>> unicodedata.bidirectional('\N{RS}')
'B'
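
To see where the two definitions disagree, one can compare str.isspace() with the White_Space ranges quoted above. This is only a rough sketch: the range table is copied by hand from the PropList.txt excerpt in this message, the variable names are made up for illustration, and the result depends on the Unicode database bundled with the interpreter that runs it, so no output is shown.

```python
import sys
import unicodedata

# Inclusive code point ranges, copied by hand from the White_Space
# listing quoted in this message (not read from PropList.txt itself).
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x180E, 0x180E), (0x2000, 0x200A), (0x2028, 0x2028),
    (0x2029, 0x2029), (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]

# Expand the ranges into a set of individual code points.
white_space = {cp for lo, hi in WHITE_SPACE_RANGES for cp in range(lo, hi + 1)}

# Every code point this interpreter's str.isspace() accepts.
is_space = {cp for cp in range(sys.maxunicode + 1) if chr(cp).isspace()}

extra = sorted(is_space - white_space)    # isspace() is True, not in the list
missing = sorted(white_space - is_space)  # in the list, isspace() is False

for label, cps in (("isspace() only", extra), ("White_Space only", missing)):
    print(label)
    for cp in cps:
        ch = chr(cp)
        print(f"  U+{cp:04X} category={unicodedata.category(ch)} "
              f"bidi={unicodedata.bidirectional(ch)}")
```

Printing the general category and bidirectional class next to each mismatch shows which clause of the documented definition pulled the character in.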

See also #10587.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18236>
_______________________________________