New submission from Ned Deily <n...@acm.org>:

Potential Release Blocker

The default file encoding for 3.x file objects is the value of locale.getpreferredencoding(). Currently, the locale module behavior on OS X deviates from other Python POSIX platforms in a few unexpected and bad ways:

1. On OS X, locale.getpreferredencoding() returns "mac-roman", an obsolete encoding from the old "Classic" MacOS days.

2. Unlike other POSIX platforms (at least Debian Linux), the values returned by locale.getdefaultlocale() and locale.getpreferredencoding() on OS X are not influenced by the settings of the POSIX locale environment variables, i.e. LANG.

So, unlike on the other POSIX platforms, one can't override the (obsolete) encoding without explicitly setting the encoding argument to open().

Compare the results from Debian Linux:

$ unset LANG
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> open('blah','r').encoding
'ANSI_X3.4-1968'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, None)
>>>
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> open('blah','r').encoding
'UTF-8'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>>

... to OS X:

$ unset LANG
$ python3.1
Python 3.1rc1+ (py3k, Jun 3 2009, 14:31:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>>
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1rc1+ (py3k, Jun 3 2009, 14:31:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>>

A quick look at the code shows that part of the problem is in Modules/_localemodule.c where there is a #if defined(__APPLE__) version of PyLocale_getdefaultlocale which appears to have its origins in MacOS and should probably just be removed and locale.py modified to eliminate/minimize the special case mac/darwin code.

For the case of no locale, "UTF-8" would seem to be a reasonable default. In any case, "mac-roman" is not.

----------
assignee: ronaldoussoren
components: IO, Library (Lib), Macintosh
messages: 88929
nosy: benjamin.peterson, nad, ronaldoussoren
severity: normal
status: open
title: Obsolete default file encoding "mac-roman" on OS X, not influenced by locale env variables
type: behavior
versions: Python 3.1

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6202>
_______________________________________
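
For reference, the only workaround named in the report is to pass the encoding argument to open() explicitly. A minimal sketch of that workaround follows. The helper name write_and_read_utf8 and the scratch file are hypothetical and used only for illustration; nothing here is part of the proposed fix to _localemodule.c or locale.py.

```python
import locale
import os
import tempfile


def write_and_read_utf8(path, text):
    """Round-trip text through a file, naming the encoding explicitly.

    Hypothetical helper for illustration only.
    """
    # An explicit encoding does not depend on locale.getpreferredencoding().
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        return f.encoding, f.read()


# The value open() falls back to when no encoding argument is given,
# as described in the report.
default_encoding = locale.getpreferredencoding()

# Scratch file in a temporary directory (assumption: any writable path works).
scratch = os.path.join(tempfile.mkdtemp(), "blah")
used_encoding, roundtrip = write_and_read_utf8(scratch, "na\u00efve caf\u00e9")

print("locale default:", default_encoding)
print("encoding used: ", used_encoding)
```

With the encoding named at each open() call, the file's encoding is the same regardless of what locale.getpreferredencoding() reports on the platform.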