[Long posting due to the examples, but pretty simple question.] I'm sitting here with a Debian Linux 'Woody' system with the default Python 2.2 installation, and I want the re module to understand that re.compile(r'\W+', re.LOCALE) doesn't match my national, accented characters.
I don't quite understand how the locale module reasons about these things, and Python doesn't seem to behave like other programs on my system. Bug or my mistake? Here's my environment:

    frailea> env |grep -e LC -e LANG
    LC_MESSAGES=C
    LC_TIME=C
    LANG=sv_SE
    LC_NUMERIC=C
    LC_MONETARY=C
    frailea> locale
    LANG=sv_SE
    LC_CTYPE="sv_SE"
    LC_NUMERIC=C
    LC_TIME=C
    LC_COLLATE="sv_SE"
    LC_MONETARY=C
    LC_MESSAGES=C
    LC_PAPER="sv_SE"
    LC_NAME="sv_SE"
    LC_ADDRESS="sv_SE"
    LC_TELEPHONE="sv_SE"
    LC_MEASUREMENT="sv_SE"
    LC_IDENTIFICATION="sv_SE"
    LC_ALL=

This seems to indicate that $LANG acts as a fallback when other things (e.g. LC_CTYPE) aren't defined, and that's also what the glibc setlocale(3) man page says. Works well for me in general, too.

However, consider this tiny Python program:

    frailea> cat foo
    import locale
    print locale.getlocale()
    locale.setlocale(locale.LC_CTYPE)
    print locale.getlocale()

When I paste it into an interactive Python session, the locale is already set up correctly (which is what I suppose interactive mode /should/ do):

    >>> import locale
    >>> print locale.getlocale()
    ['sv_SE', 'ISO8859-1']
    >>> locale.setlocale(locale.LC_CTYPE)
    'sv_SE'
    >>> print locale.getlocale()
    ['sv_SE', 'ISO8859-1']
    >>>

When I run it as a script it isn't, though, and the setlocale() call does not appear to fall back to looking at $LANG as it's supposed to(?), so my LC_CTYPE remains in the POSIX locale:

    frailea> python foo
    (None, None)
    (None, None)

The corresponding program written in C works as expected:

    frailea> cat foot.c
    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        printf("%s\n", setlocale(LC_CTYPE, 0));
        printf("%s\n", setlocale(LC_CTYPE, ""));
        printf("%s\n", setlocale(LC_CTYPE, 0));
        return 0;
    }
    frailea> ./foot
    C
    sv_SE
    sv_SE

So, is this my fault or Python's? I realize I could just adapt and set $LC_CTYPE explicitly in my environment, but I don't want to capitulate to a Python bug, if that's what this is.
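One difference between the two programs may be worth ruling out before calling this a bug. In Python's locale module, setlocale(category) with the second argument omitted only queries the current setting, which corresponds to setlocale(LC_CTYPE, 0) in the C program. The C program's middle call passes "", and that is the form that asks the C library to consult the environment. The sketch below is a closer Python counterpart to foot.c; the function name ctype_before_and_after is made up for illustration, and the code has not been checked on the Python 2.2 installation described here.

```python
import locale


def ctype_before_and_after():
    """Return (LC_CTYPE as queried first, LC_CTYPE after adopting the environment).

    Hypothetical helper for illustration; it is not part of the locale module.
    """
    # Second argument omitted: query only, nothing is changed.
    # This is the counterpart of setlocale(LC_CTYPE, 0) in C.
    before = locale.setlocale(locale.LC_CTYPE)
    try:
        # Empty string: adopt the user's default settings from the
        # environment, the counterpart of setlocale(LC_CTYPE, "") in C.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # the previous setting stays in effect.
        pass
    # Query again to see what is in effect now.
    after = locale.setlocale(locale.LC_CTYPE)
    return before, after


if __name__ == "__main__":
    before, after = ctype_before_and_after()
    print(before)
    print(after)
```

If the second value printed names the environment's locale (sv_SE here) when run as a script, the missing "" argument in foo would account for the (None, None) output. Why the interactive session starts out with the locale already set is a separate question that this sketch does not address.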
BR,
Jorgen

-- 
  // Jorgen Grahn <jgrahn@       Ph'nglui mglw'nafh Cthulhu
\X/              algonet.se>     R'lyeh wgah'nagl fhtagn!