On Jan 12, 11:26 pm, Torsten Bronger <[EMAIL PROTECTED]> wrote:
> Hallöchen!
>
> Fredrik Lundh writes:
> > Robert Kern wrote:
>
> >>> However it appears from your bug ticket that you have a much
> >>> narrower problem (case-shifting a small known list of English
> >>> words like VOID) and can work around it by writing your own
> >>> locale-independent casing functions. Do you still need to find
> >>> out whether Python unicode casings are locale-dependent?
>
> >> I would still like to know. There are other places where .lower()
> >> is used in numpy, not to mention the rest of my code.
>
> > "lower" uses the informative case mappings provided by the Unicode
> > character database; see
>
> > http://www.unicode.org/Public/4.1.0/ucd/UCD.html
>
> > afaik, changing the locale has no influence whatsoever on Python's
> > Unicode subsystem.
>
> Slightly off-topic because it's not part of the Unicode subsystem,
> but I was once irritated that the non-breaking space (codepoint xa0
> I think) was included in string.whitespace. I cannot reproduce it
> on my current system anymore, but I was pretty sure it occurred with
> a fr_FR.UTF-8 locale. Is this possible? And who is to blame, or
> must my program cope with such things?
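Both Fredrik's point and Robert's suggested workaround can be checked directly. This is a minimal sketch assuming Python 3, where `str` is Unicode text; `ascii_lower` and `lower_under_locale` are hypothetical helper names (not numpy or standard-library functions), and `tr_TR.UTF-8` is only an example locale that may not be installed on a given machine.

```python
import locale
import string

# Hypothetical helper: lowercase only ASCII A-Z, so the result can never
# depend on the locale or on Unicode special-casing rules.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    """Lowercase ASCII letters; leave every other character unchanged."""
    return s.translate(_ASCII_LOWER)


def lower_under_locale(s: str, name: str) -> str:
    """Hypothetical helper: call str.lower() with LC_CTYPE set to `name`, if available."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        pass  # locale not installed here; lower() simply runs under the current one
    try:
        return s.lower()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting


print("VOID".lower())                              # void
print(lower_under_locale("VOID", "tr_TR.UTF-8"))   # void (no Turkish dotless i)
print(ascii_lower("VOID \u00c4"))                  # void Ä
```

Restricting the translation table to A-Z is deliberate: for a small, known list of ASCII keywords like VOID, the result is then fixed no matter what locale is active or what non-ASCII text surrounds the keyword.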
The NO-BREAK SPACE is treated as whitespace in the Python Unicode subsystem. As for str objects, the default "C" locale doesn't know it exists; otherwise, AFAIK, if the character set for the locale has it, it will be treated as whitespace.

You were irritated because non-break SPACE was included in string.whiteSPACE? Surely not! It seems eminently logical to me. Perhaps you were irritated because str.split() ignored the "no-break"?

If, like me, you had been faced with removing trailing spaces from text columns in databases, you surely would have been delighted that str.rstrip() removed the trailing-padding-for-nicer-layout no-break spaces that the users had copy/pasted from some clown's website :-)

What was the *real* cause of your irritation?
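The points above about NO-BREAK SPACE (U+00A0) can be checked in a few lines. This sketch assumes Python 3, where `str` is Unicode text; there the locale-dependent 8-bit behaviour discussed in this thread no longer applies to `str`, and `bytes` methods recognise ASCII whitespace only.

```python
import string
import unicodedata

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# The Unicode database classifies it as a space separator (category Zs),
# so str's whitespace-aware methods treat it as whitespace.
print(unicodedata.name(NBSP), unicodedata.category(NBSP))  # NO-BREAK SPACE Zs
print(NBSP.isspace())                                      # True

# split() with no argument splits on any whitespace, the "no-break" included ...
words = ("no" + NBSP + "break").split()
print(words)                                               # ['no', 'break']

# ... while an explicit separator only splits on that exact character.
kept = ("no" + NBSP + "break").split(" ")
print(len(kept))                                           # 1

# rstrip() with no argument removes trailing no-break padding as well.
cell = "VOID" + NBSP * 3
print(repr(cell.rstrip()))                                 # 'VOID'

# string.whitespace is a fixed ASCII constant, and bytes methods are ASCII-only.
print(NBSP in string.whitespace)                           # False
print(b"\xa0".isspace())                                   # False
```

When no-break spaces are meaningful and must survive, passing an explicit argument (`split(" ")`, `rstrip(" ")`) restricts the operation to the ordinary space character.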