Dave Mankoff added the comment:
"Use regular expressions for more advanced stripping than what the .strip
method provides."
So I guess this brings me back to my original issue. I'm not looking for
particularly advanced stripping. I just want to remove all whitespace and othe
Dave Mankoff added the comment:
So I contacted the Unicode Technical Committee about the issue and promptly
received a response. They pointed out that the ZWSP was, once upon a time,
considered white space, but that was changed in Unicode 4.0.1
http://www.unicode.org/review
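
For anyone who wants to check this locally, here is a minimal sketch that asks Python's bundled Unicode database how it classifies the character. The helper name `describe` is made up for illustration; `unicodedata.name`, `unicodedata.category`, `unicodedata.bidirectional` and `str.isspace` are standard library calls. The results depend on the Unicode version your interpreter ships with.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def describe(ch):
    """Return (name, general category, bidi class, isspace) for one character.

    Hypothetical helper, only for inspecting how Python classifies a character.
    """
    return (
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        ch.isspace(),
    )


# Compare an ordinary space, a no-break space, and the zero width space.
for ch in (" ", "\u00a0", ZWSP):
    print("U+%04X" % ord(ch), describe(ch))
```

On a current Python 3 the ZWSP should report general category `Cf` (format) rather than `Zs` (space separator), which is consistent with what the committee described.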
Dave Mankoff added the comment:
But why are they not a space? I mean, they literally have the word space in
their name and are used as separators between words. I can't really see any
reason why you wouldn't want this behavior - there's no time when I would be
thankful tha
Dave Mankoff added the comment:
I appreciated the quick turnaround on this.
Perhaps I am misunderstanding the resolution. I understand that strip uses
_PyUnicode_IsWhitespace, and that _PyUnicode_IsWhitespace "Returns 1 for
Unicode characters having the bidirectional type 'WS'
New submission from Dave Mankoff:
Title pretty much says it all. Simple test case:
>>> len(u' \t\r\n\u200B'.strip())
1
Should be zero.
Same problem in Python3:
>>> len(' \t\r\n\u200B'.strip())
1
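
A possible workaround, sketched here under the assumption that you only care about U+200B in addition to whatever `\s` already matches: strip both from the ends of the string with a regular expression. The function name `strip_with_zwsp` is invented for this example and is not part of Python.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, not treated as whitespace by str.strip()

# Match a run of whitespace/ZWSP anchored at the start or at the end of the string.
_EDGES = re.compile(r"\A[\s" + ZWSP + r"]+|[\s" + ZWSP + r"]+\Z")


def strip_with_zwsp(text):
    """Hypothetical helper: like str.strip(), but also removes U+200B at the edges."""
    return _EDGES.sub("", text)


print(len(strip_with_zwsp(" \t\r\n\u200b")))
print(repr(strip_with_zwsp(" a\u200b b \u200b")))
```

Interior characters are left alone; only leading and trailing runs are removed. Other invisible format characters would need to be added to the character class explicitly.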
--
components: Unicode
messages: 147538
nosy: