[issue16322] time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding

eryksun Mon, 21 Sep 2015 13:50:56 -0700

eryksun added the comment:

> local_encoding = locale.getdefaultlocale()[1]


Use locale.getpreferredencoding() instead.

> b = eval('b' + ascii(result))
> result = b.decode(local_encoding)

It's simpler and more reliable to encode the value as 'latin-1' and decode the 
resulting bytes as 'mbcs' (ANSI). For example:

    result = result.encode('latin-1').decode('mbcs')

If setlocale(LC_CTYPE, "") is called before importing the time module, then 
tzname is already correct. In this case, the above is either harmless or raises 
a UnicodeEncodeError that can be handled. OTOH, your approach silently corrupts 
the value:

    >>> result = 'Střední Evropa (běžný čas)'
    >>> b = eval('b' + ascii(result))
    >>> b.decode('1251')
    'St\\u0159ednн Evropa (b\\u011b\\u017enэ \\u010das)'
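
A minimal sketch of the 'latin-1' to 'mbcs' repair with the UnicodeEncodeError 
case handled. The function name fix_tzname and its ansi_codec parameter are 
hypothetical, added only for illustration. Since 'mbcs' exists only on Windows, 
the parameter lets the same logic be tried elsewhere with an explicit code page 
such as 'cp1251':

```python
def fix_tzname(name, ansi_codec='mbcs'):
    """Undo a Latin-1 mis-decoding of an ANSI-encoded time zone name.

    Hypothetical helper. 'mbcs' is only available on Windows; pass an
    explicit code page (e.g. 'cp1251') to try this on other platforms.
    """
    try:
        # Recover the original bytes. Latin-1 maps code points 0-255
        # one-to-one onto byte values.
        raw = name.encode('latin-1')
    except UnicodeEncodeError:
        # A character above U+00FF means the name did not come from a
        # Latin-1 decode, so it is presumably already correct.
        return name
    return raw.decode(ansi_codec)


# The mis-decoded Russian name from this issue, repaired with cp1251.
broken = '\xc2\xf0\xe5\xec\xff \xe2 \xf4\xee\xf0\xec\xe0\xf2\xe5 UTC'
fixed = fix_tzname(broken, 'cp1251')

# An already-correct Czech name is returned unchanged.
unchanged = fix_tzname('Střední Evropa (běžný čas)', 'cp1250')
```

Unlike the eval/ascii approach, a name that is already correct either 
round-trips unchanged or hits the except branch, so it isn't silently corrupted.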

Back to the issue. To review: on the initial import of the time module, if the 
CRT is still using the default "C" locale, there's an inconsistency. The CRT's 
time functions encode and decode tzname as ANSI, whereas mbstowcs decodes 
tzname as Latin-1. (Plus strftime in the new CRT calls wcsftime, which adds 
another transcoding layer to compound the mojibake goodness.)

If time.tzset were implemented on Windows, then at startup an application could 
set the locale (specifically LC_CTYPE for tzname, and LC_TIME for strftime) and 
then call time.tzset().

Example with Russian system locale:

Initially we're in the "C" locale and the CRT's tzname is in ANSI. time.tzname 
incorrectly decodes this as Latin-1 since that's what mbstowcs uses in the "C" 
locale:

    >>> time.tzname[0]
    '\xc2\xf0\xe5\xec\xff \xe2 \xf4\xee\xf0\xec\xe0\xf2\xe5 UTC'

The way the CRT's strftime is implemented compounds the problem:

    >>> time.strftime('%Z')
    'A?aiy a oi?iaoa UTC'

It's implemented by calling the wide-character function, wcsftime. Just like 
Python, this gets a wide-character string by calling mbstowcs on the ANSI 
tzname. Then the CRT's strftime encodes the wide-character string back as a 
best-fit ANSI string, and finally time.strftime decodes the result as Latin-1 
via mbstowcs. The result is mutated mojibake:

    >>> time.tzname[0].encode('mbcs', 'replace').decode('latin-1')
    'A?aiy a oi?iaoa UTC'

Ironically, Python stopped calling wcsftime on Windows because of these 
problems, but changes to the code since then, plus the new CRT, have brought 
the problem back, and worse. See my comment in issue 10653, msg243660.

Fix this by setting the locale and calling _tzset:

    >>> import ctypes, locale
    >>> locale.setlocale(locale.LC_ALL, '')
    'Russian_Russia.1251'
    >>> ctypes.cdll.ucrtbase._tzset()
    0
    >>> time.strftime('%Z')
    'Время в формате UTC'

If time.tzset were implemented on Windows, calling it would reload the 
time.tzname tuple.
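
As a rough sketch, the startup sequence could be wrapped in one function. The 
name init_time_locale is hypothetical. The Windows branch assumes the universal 
CRT (ucrtbase) that ships with Python 3.5+, and the fallback to time.tzset() 
covers platforms where it already exists:

```python
import locale
import sys
import time


def init_time_locale():
    """Hypothetical startup helper: set the locale, then reload CRT tz data."""
    try:
        # Adopt the user's default locale (covers LC_CTYPE and LC_TIME).
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # Unsupported locale settings: stay in the current locale.
        pass
    if sys.platform == 'win32':
        import ctypes
        # Assumes the universal CRT (ucrtbase) used by Python 3.5+.
        ctypes.cdll.ucrtbase._tzset()
        # Note: this does not reload the time.tzname tuple; only
        # calls such as time.strftime('%Z') pick up the change.
    else:
        # time.tzset() is already available on Unix.
        time.tzset()
    return time.strftime('%Z')
```

Call it once at startup, before any code that formats time zone names.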

----------
versions: +Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16322>
_______________________________________