Bugs item #1390608, was opened at 2005-12-26 16:03 Message generated for change (Comment added) made by doerwalter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1390608&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.4 Status: Open Resolution: None Priority: 5 Submitted By: MvR (maxim_razin) Assigned to: Nobody/Anonymous (nobody) Summary: split() breaks no-break spaces Initial Comment: string.split(), str.split() and unicode.split() without parameters break strings by the No-break space (U+00A0) character. This character is specially intended not to be a split border. >>> u"Hello\u00A0world".split() [u'Hello', u'world'] ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2005-12-30 13:35 Message: Logged In: YES user_id=89016 What's wrong with the following? import sys, unicodedata spaces = u"".join(unichr(c) for c in xrange(0, sys.maxunicode) if unicodedata.category(unichr(c))=="Zs" and c != 160) foo.split(spaces) ---------------------------------------------------------------------- Comment By: Hye-Shik Chang (perky) Date: 2005-12-30 01:30 Message: Logged In: YES user_id=55188 Python documentation says that it splits in "whitespace characters" not "breaking characters". So, current behavior is correct according to the documentation. And even rationale among string methods are heavily depends on ctype functions on libc. Therefore, we can't serve special treatment for the NBSP. However, I feel the need for the splitting function that awares what character is breaking or not. How about to add it as unicodedata.split()? ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2005-12-29 21:42 Message: Logged In: YES user_id=38376 split isn't a word-wrapping split, so I'm not sure that's the right place to fix this. ("no-break space" is white- space, according to the Unicode standard, and split breaks on whitespace). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1390608&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com