On Jan 13, 9:49 am, Carl Banks <[EMAIL PROTECTED]> wrote:
> On Sat, 12 Jan 2008 13:51:18 -0800, John Machin wrote:
> > On Jan 12, 11:26 pm, Torsten Bronger <[EMAIL PROTECTED]>
> > wrote:
> >> Hi there!
> >
> >> Fredrik Lundh writes:
> >> > Robert Kern wrote:
> >
> >> >>> However it appears from your bug ticket that you have a much
> >> >>> narrower problem (case-shifting a small known list of English words
> >> >>> like VOID) and can work around it by writing your own
> >> >>> locale-independent casing functions. Do you still need to find out
> >> >>> whether Python unicode casings are locale-dependent?
> >
> >> >> I would still like to know. There are other places where .lower() is
> >> >> used in numpy, not to mention the rest of my code.
> >
> >> > "lower" uses the informative case mappings provided by the Unicode
> >> > character database; see
> >
> >> > http://www.unicode.org/Public/4.1.0/ucd/UCD.html
> >
> >> > afaik, changing the locale has no influence whatsoever on Python's
> >> > Unicode subsystem.
> >
> >> Slightly off-topic because it's not part of the Unicode subsystem, but
> >> I was once irritated that the non-breaking space (codepoint xa0 I
> >> think) was included in string.whitespace. I cannot reproduce it on
> >> my current system anymore, but I was pretty sure it occurred with a
> >> fr_FR.UTF-8 locale. Is this possible? And who is to blame, or must my
> >> program cope with such things?
> >
> > The NO-BREAK SPACE is treated as whitespace in the Python unicode
> > subsystem. As for str objects, the default "C" locale doesn't know it
> > exists; otherwise AFAIK if the character set for the locale has it, it
> > will be treated as whitespace.
> >
> > You were irritated because non-break SPACE was included in
> > string.whiteSPACE? Surely not! It seems eminently logical to me.
>
> To me it seems the point of a non-breaking space is to have something
> that's printed as whitespace but not treated as it.

To me it seems the point of a no-break space is that it's treated as a
space in all respects except that it doesn't "break".

> > Perhaps
> > you were irritated because str.split() ignored the "no-break"? If like
> > me you had been faced with removing trailing spaces from text columns in
> > databases, you surely would have been delighted that str.rstrip()
> > removed the trailing-padding-for-nicer-layout no-break spaces that the
> > users had copy/pasted from some clown's website :-)
> >
> > What was the *real* cause of your irritation?
>
> If you want to use str.split() to split words, you will foil the user who
> wants to not break at a certain point.

Which was exactly my point -- but this would happen only rarely or not
at all in my universe (names, addresses, product descriptions, etc in
databases).

> Your use of rstrip() is a lot more specialized, if you ask me.

Not very specialised at all in my universe -- a standard
transformation that one normally applies to database text is to remove
all leading and trailing whitespace, and compress runs of 1 or more
whitespace characters to a single normal space. Your comment seems to
imply that trailing non-break spaces are significant and should be
preserved ...

--
http://mail.python.org/mailman/listinfo/python-list
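
For illustration only, here is a minimal sketch of the "standard transformation" described above, next to a variant that leaves no-break spaces alone. It assumes Python 3, where str is Unicode and str.split() with no arguments splits on any Unicode whitespace, U+00A0 included. The thread itself concerns Python 2 byte strings, whose behaviour depended on the locale. The function names are hypothetical, not from any library.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

# ASCII whitespace only; deliberately excludes NBSP.
ASCII_WS = " \t\n\r\f\v"


def normalize_whitespace(text):
    """Strip leading/trailing whitespace and collapse runs to one space.

    str.split() with no arguments treats NBSP as whitespace, so any
    no-break spaces are removed or replaced by a normal space.
    """
    return " ".join(text.split())


def normalize_keep_nbsp(text):
    """Hypothetical variant: same idea, but NBSP is left untouched."""
    collapsed = re.sub("[" + ASCII_WS + "]+", " ", text)
    return collapsed.strip(ASCII_WS)


if __name__ == "__main__":
    padded = "Smith" + NBSP + NBSP + "  "
    print(repr(normalize_whitespace(padded)))
    print(repr(normalize_keep_nbsp("New" + NBSP + "York   City ")))
```

Which of the two is right depends on whether no-break spaces in the data are layout padding (the database case above) or deliberate "do not break here" markers (the word-splitting case).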