Hi, all.
I updated the PEP 597 yesterday.
Please review it to move it forward.
PEP: https://www.python.org/dev/peps/pep-0597/
Previous thread: https://discuss.python.org/t/3880
Main difference from the previous version:
* Added new warning category; EncodingWarning
* Added dedicated option to enable the warning instead of using dev mode.
Abstract
Add a new warning category ``EncodingWarning``. It is emitted when
``encoding`` option is omitted and the default encoding is a locale
encoding.
The warning is disabled by default. New ``-X warn_encoding``
command-line option and ``PYTHONWARNENCODING`` environment variable
are used to enable the warnings.
Motivation
==
Using the default encoding is a common mistake
--
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.
For example, ``long_description = open("README.md").read()`` in
``setup.py`` is a common mistake. Many Windows users can not install
the package if there is at least one non-ASCII character (e.g. emoji)
in the ``README.md`` file which is encoded in UTF-8.
For example, 489 packages of the 4000 most downloaded packages from
PyPI used non-ASCII characters in README. And 82 packages of them
can not be installed from source package when locale encoding is
ASCII. [1_] They used the default encoding to read README or TOML
file.
Another example is ``logging.basicConfig(filename="log.txt")``.
Some users expect UTF-8 is used by default, but locale encoding is
used actually. [2_]
Even Python experts assume that default encoding is UTF-8.
It creates bugs that happen only on Windows. See [3_] and [4_].
Emitting a warning when the ``encoding`` option is omitted will help
to find such mistakes.
Prepare to change the default encoding to UTF-8
---
We had chosen to use locale encoding for the default text encoding in
Python 3.0. But UTF-8 has been adopted very widely since then.
We might change the default text encoding to UTF-8 in the future.
But this change will affect many applications and libraries.
Many ``DeprecationWarning`` will be emitted if we start emitting the
warning by default. It will be too noisy.
Although this PEP doesn't propose to change the default encoding,
this PEP will help to reduce the warning in the future if we decide
to change the default encoding.
Specification
=
``EncodingWarning``
Add new ``EncodingWarning`` warning class which is a subclass of
``Warning``. It is used to warn when the ``encoding`` option is
omitted and the default encoding is locale-specific.
Options to enable the warning
--
``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
environment variable are added. They are used to enable the
``EncodingWarning``.
``sys.flags.encoding_warning`` is also added. The flag represents
``EncodingWarning`` is enabled.
When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
other modules using them will emit ``EncodingWarning`` when
``encoding`` is omitted.
``encoding="locale"`` option
``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means
same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
be used to avoid confusing ``LookupError: unknown encoding: locale``
error when the code is run in old Python accidentally.
The constant can be used to test that ``encoding="locale"`` option is
supported too. For example,
.. code-block::
# Want to suppress an EncodingWarning but still need support
# old Python versions.
locale_encoding = getattr(io, "LOCALE_ENCODING", None)
with open(filename, encoding=locale_encoding) as f:
...
``io.text_encoding()``
---
``io.text_encoding()`` is a helper function for functions having
``encoding=None`` option and passing it to ``io.TextIOWrapper()`` or
``open()``.
Pure Python implementation will be like this::
def text_encoding(encoding, stacklevel=1):
"""Helper function to choose the text encoding.
When *encoding* is not None, just return it.
Otherwise, return the default text encoding (i.e., "locale").
This function emits EncodingWarning if *encoding* is None and
sys.flags.encoding_warning is true.
This function can be used in APIs having encoding=None option
and pass it to TextIOWrapper or open.
But please consider using encoding="utf-8" for new APIs.
"""
if encoding is None:
if sys.flags.encoding_warning:
import warnings
warnings.warn("'encoding' option is omitted",
EncodingWarning, stacklevel + 2)
encoding = LOCALE_ENCODING
return encoding
For example, ``pathlib.