New submission from STINNER Victor <vstin...@redhat.com>:

Currently, the Python filesystem encoding is determined in the middle of 
Python initialization: initfsencoding() is only called once the import 
machinery is ready. Technically, only getting the Python codec name requires a 
working import machinery.

We can get the locale encoding even before Py_Initialize(), and then update 
the encoding to the Python codec name (e.g. replace "ANSI_X3.4-1968" with 
"ascii") once the import machinery is ready.
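
As a small illustration of the name normalization described above, here is a 
sketch using the public codecs module. It is not the C code path used during 
initialization, and python_codec_name() is a hypothetical helper name chosen 
for this example only:

```python
import codecs
import locale


def python_codec_name(encoding_name):
    """Return the canonical Python codec name for an encoding name.

    Hypothetical helper for illustration; not part of CPython.
    """
    # codecs.lookup() imports from the "encodings" package, so it needs a
    # working import machinery. The locale encoding itself does not.
    return codecs.lookup(encoding_name).name


# glibc reports ASCII as "ANSI_X3.4-1968"; Python calls the codec "ascii".
print(python_codec_name("ANSI_X3.4-1968"))

# The locale encoding of the current process (platform dependent).
locale_encoding = locale.getpreferredencoding(False)
print(locale_encoding, "->", python_codec_name(locale_encoding))
```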

With my change, for an application embedding Python, the filesystem encoding 
can now be forced using _PyCoreConfig.filesystem_encoding. If it's set, Python 
will use it (and ignore the locale encoding).

Attached PR implements this idea and adds _PyCoreConfig.filesystem_encoding.


The change moves all the code that selects the encoding and error handler into 
config_init_fs_encoding(), rather than relying on the 
Py_FileSystemDefaultEncoding constant (whose default value is set using 
#ifdef) and initfsencoding() (to get the locale encoding at runtime).


With this change, it becomes more obvious that the interpreter core 
configuration is mutable. initfsencoding() modifies the encoding/errors during 
Python initialization, and sys._enablelegacywindowsfsencoding() even modifies 
them at runtime. Previously, I wanted to expose the full core_config at the 
Python level, as something like sys.flags. But since it's mutable, I'm no 
longer sure that it's a good idea. To *get* the filesystem encoding/errors, 
there are already sys.getfilesystemencoding() and 
sys.getfilesystemencodeerrors(). Modifying the filesystem encoding at runtime 
is a very bad idea: it was tried in the past and caused many issues.
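
For reference, a minimal sketch of reading the filesystem encoding and error 
handler from Python with the two existing functions; the printed values depend 
on the platform and locale, and the file name is an arbitrary example:

```python
import os
import sys

# Read-only access to the filesystem encoding and its error handler.
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()
print(encoding, errors)

# os.fsencode() and os.fsdecode() use that encoding and error handler.
name = "example.txt"  # arbitrary ASCII file name for the round trip
raw = os.fsencode(name)
print(raw, os.fsdecode(raw) == name)
```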


In the long term, I would like to remove Py_HasFileSystemDefaultEncoding, 
since it's really an implementation detail and there is no need to expose it. 
More generally, I don't see the purpose of directly exposing the encoding and 
error handler at the C level: Py_FileSystemDefaultEncoding and 
Py_FileSystemDefaultEncodeErrors. These symbols are constants: they cannot be 
modified when Python is embedded. But well, it's just a remark: technically, 
these 3 symbols don't cause any kind of trouble.


Commit message:
---
bpo-34485: Add _PyCoreConfig.filesystem_encoding

_PyCoreConfig_Read() is now responsible for choosing the filesystem
encoding and error handler. When using Py_Main(), the encoding is now
selected even before calling Py_Initialize().

_PyCoreConfig.filesystem_encoding is now the reference instead of
Py_FileSystemDefaultEncoding for the Python filesystem encoding.

Changes:

* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
  system encoding. Temporarily coerce the locale or temporarily set the
  LC_CTYPE locale to the user preferred locale.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
  now use the interpreter configuration rather than
  Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
  global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
  private functions to only modify Py_FileSystemDefaultEncoding and
  Py_FileSystemDefaultEncodeErrors in coreconfig.c.
---

----------
components: Interpreter Core
messages: 324206
nosy: eric.snow, inada.naoki, ncoghlan, vstinner
priority: normal
severity: normal
status: open
title: Choose the filesystem encoding earlier in Python initialization (add 
_PyCoreConfig.filesystem_encoding)
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34523>
_______________________________________