date:20210319

[issue43551] [Subinterpreters]: PyImport_Import use static silly_list under building Python with --with-experimental-isolated-subinterpreters share silly_list in multi subinterpreters cause crash.

2021-03-19 Thread junyixie



New submission from junyixie :

PyImport_Import() uses a static silly_list. When Python is built with
--with-experimental-isolated-subinterpreters, this list is shared by multiple
subinterpreters, which causes a crash.

When subinterpreters run in parallel, PyObject_CallFunction() cleans up its
argument stack with Py_DECREF(stack[i]). Calling Py_DECREF() on the shared
silly_list from several threads at once is not thread-safe and causes the
crash:
```
PyObject *
PyImport_Import(PyObject *module_name)
{
    PyThreadState *tstate = _PyThreadState_GET();
    /* One list object shared by every (sub)interpreter in the process */
    static PyObject *silly_list = NULL;
    ...
    /* Initialize constant string objects */
    if (silly_list == NULL) {
        import_str = PyUnicode_InternFromString("__import__");
        if (import_str == NULL)
            return NULL;
        builtins_str = PyUnicode_InternFromString("__builtins__");
        if (builtins_str == NULL)
            return NULL;
        silly_list = PyList_New(0);
        if (silly_list == NULL)
            return NULL;
    }
    ...
    /* Call the __import__ function with the proper argument list
       Always use absolute import here.
       Calling for side-effect of import. */
    r = PyObject_CallFunction(import, "OOOOi", module_name, globals,
                              globals, silly_list, 0, NULL);
    ...
}
```
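At the Python level, the C call above corresponds roughly to `__import__(name, globals, globals, fromlist, 0)`. The sketch below is only an illustrative analogue (the helper name `import_like_pyimport_import` is hypothetical, not a CPython API): it builds a fresh empty fromlist on every call, so no list object is shared between callers, which is exactly the property the static `silly_list` lacks. Looking the module up in `sys.modules` afterwards is this sketch's own choice, needed because `__import__` with an empty fromlist returns the top-level package.

```python
import builtins
import sys


def import_like_pyimport_import(module_name):
    """Hypothetical Python analogue of the PyImport_Import() call above.

    Not CPython's implementation: a sketch of the argument list passed
    to __import__ (absolute import, level=0).
    """
    globals_dict = {"__builtins__": builtins}
    # A new empty fromlist per call: nothing is shared between callers,
    # unlike the static silly_list in the C excerpt.
    fromlist = []
    # Called for its side effect of importing; with an empty fromlist,
    # __import__("a.b") returns the top-level package "a".
    builtins.__import__(module_name, globals_dict, globals_dict, fromlist, 0)
    # Return the (possibly dotted) module itself from the module cache.
    return sys.modules[module_name]
```

For example, `import_like_pyimport_import("os.path")` returns the `os.path` module rather than `os`.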

--
messages: 389056
nosy: JunyiXie, vstinner
priority: normal
severity: normal
status: open
title: [Subinterpreters]: PyImport_Import use static silly_list under building 
Python with --with-experimental-isolated-subinterpreters share silly_list in 
multi subinterpreters  cause crash.

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue43551] [Subinterpreters]: PyImport_Import use static silly_list under building Python with --with-experimental-isolated-subinterpreters share silly_list in multi subinterpreters cause crash.

2021-03-19 Thread junyixie



Change by junyixie :


--
keywords: +patch
pull_requests: +23691
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24929


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



New submission from STINNER Victor :

I propose to add two new functions:

* locale.get_locale_encoding(): it's exactly the same as
locale.getpreferredencoding(False).

* locale.get_current_locale_encoding(): always get the current locale encoding. 
Read the ANSI code page on Windows, or nl_langinfo(CODESET) on other platforms. 
Ignore the UTF-8 Mode. Don't always return "UTF-8" on macOS, Android, VxWorks.


Technically, locale.get_locale_encoding() would simply expose 
_locale.get_locale_encoding() that I added recently. It calls the new private 
_Py_GetLocaleEncoding() function (which has no argument).

By the way, Python requires nl_langinfo(CODESET) in order to build. It's not a
new requirement in Python 3.10; I just wanted to note it, since I noticed it
when I implemented _locale.get_locale_encoding() :-)


Python has a bad habit of lying to the user: locale.getpreferredencoding(False) 
is *NOT* the current locale encoding in multiple cases.

* locale.getpreferredencoding(False) always returns "UTF-8" on macOS, Android
and VxWorks
* locale.getpreferredencoding(False) always returns "UTF-8" if the UTF-8 Mode is
enabled
* otherwise, it returns the current locale encoding: the ANSI code page on
Windows, or nl_langinfo(CODESET) on other platforms
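A quick way to see these cases on a given machine is to print the encodings side by side. This is a sketch using only existing `locale` and `sys` APIs; the helper name `compare_locale_encodings` is made up for illustration, and `locale.nl_langinfo()` does not exist on Windows, so that entry is skipped there.

```python
import locale
import sys


def compare_locale_encodings():
    """Hypothetical helper: the encodings contrasted in the list above."""
    result = {
        # What open() uses when encoding=None; per the list above it is
        # forced to UTF-8 in UTF-8 Mode and on some platforms.
        "getpreferredencoding(False)": locale.getpreferredencoding(False),
        # 1 if the Python UTF-8 Mode is enabled, else 0.
        "utf8_mode": sys.flags.utf8_mode,
    }
    if hasattr(locale, "nl_langinfo"):
        # Encoding of the *current* LC_CTYPE locale, read from libc;
        # this ignores the UTF-8 Mode.
        result["nl_langinfo(CODESET)"] = locale.nl_langinfo(locale.CODESET)
    return result


for name, value in compare_locale_encodings().items():
    print(f"{name}: {value!r}")
```

Running it with `python -X utf8` and then without it should make the difference between the first and last entries visible whenever the LC_CTYPE locale is not UTF-8.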


Even though locale.getpreferredencoding(False) already exists, I propose to add
locale.get_locale_encoding() because I dislike the locale.getpreferredencoding()
API. By default, this function temporarily sets LC_CTYPE to the user's preferred
locale. It can cause mojibake in other threads since setlocale(LC_CTYPE, "")
affects all threads :-( Calling locale.getpreferredencoding(), rather than
locale.getpreferredencoding(False), does not do what most people expect. This
API can be misused.

On the other hand, locale.get_locale_encoding() does exactly what it says: it
only *gets* the encoding; it doesn't temporarily *set* the locale to something
else.

By the way, the locale.localeconv() function can temporarily change the LC_CTYPE
locale to the LC_MONETARY locale, which can cause other threads to use the wrong
LC_CTYPE locale! But this is a different issue.

--
components: Library (Lib)
messages: 389057
nosy: vstinner
priority: normal
severity: normal
status: open
title: Add  locale.get_locale_encoding() and 
locale.get_current_locale_encoding()
versions: Python 3.10


[issue43466] ssl/hashlib: Add configure option to set or auto-detect rpath to OpenSSL libs

2021-03-19 Thread Christian Heimes



Christian Heimes  added the comment:


New changeset 32eba61ea431c76f15a910c0a4eded7f5f8b9b34 by Christian Heimes in 
branch 'master':
bpo-43466: Add --with-openssl-rpath configure option (GH-24820)
https://github.com/python/cpython/commit/32eba61ea431c76f15a910c0a4eded7f5f8b9b34


--


[issue43466] ssl/hashlib: Add configure option to set or auto-detect rpath to OpenSSL libs

2021-03-19 Thread Christian Heimes



Christian Heimes  added the comment:

I'm leaving the ticket open as a reminder for me to update whatsnew.

--
components: +Documentation


[issue43553] [sqlite3] Improve test coverage

2021-03-19 Thread Erlend Egeberg Aasland



New submission from Erlend Egeberg Aasland :

Attached patch improves the code coverage of the sqlite3 module. I've used 
llvm-cov for coverage measurement.

I'll create a PR for this, if you're fine with this, Berker/Serhiy.

Filename            Regions  Missed Regions   Cover  Functions  Missed Functions  Executed  Lines  Missed Lines   Cover
-----------------------------------------------------------------------------------------------------------------------
prepare_protocol.c       10               7  30.00%          3                 2    33.33%     16            11  31.25%
util.c                   65              21  67.69%          3                 0   100.00%     78            26  66.67%
module.c                306              59  80.72%         10                 1    90.00%    236            45  80.93%
row.c                   173              16  90.75%         11                 0   100.00%    146            13  91.10%
microprotocols.c         81               9  88.89%          3                 0   100.00%     98            15  84.69%
connection.c           1113             155  86.07%         43                 0   100.00%   1366           179  86.90%
cache.c                 136              38  72.06%          7                 1    85.71%    227            59  74.01%
cursor.c                758             116  84.70%         19                 0   100.00%    794           122  84.63%
statement.c             340              22  93.53%         10                 0   100.00%    392            29  92.60%
-----------------------------------------------------------------------------------------------------------------------
TOTAL                  2982             443  85.14%        109                 4    96.33%   3353           499  85.12%
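Each Cover/Executed column is (total - missed) / total, and the TOTAL row is the column-wise sum of the per-file rows. The short sketch below (hypothetical helper names; numbers copied from the report above) re-derives the TOTAL row as a consistency check.

```python
# Per-file rows copied from the llvm-cov report above:
# (regions, missed regions, functions, missed functions, lines, missed lines)
ROWS = {
    "prepare_protocol.c": (10, 7, 3, 2, 16, 11),
    "util.c": (65, 21, 3, 0, 78, 26),
    "module.c": (306, 59, 10, 1, 236, 45),
    "row.c": (173, 16, 11, 0, 146, 13),
    "microprotocols.c": (81, 9, 3, 0, 98, 15),
    "connection.c": (1113, 155, 43, 0, 1366, 179),
    "cache.c": (136, 38, 7, 1, 227, 59),
    "cursor.c": (758, 116, 19, 0, 794, 122),
    "statement.c": (340, 22, 10, 0, 392, 29),
}


def cover(total, missed):
    """Covered share in percent, rounded to two decimals."""
    return round(100 * (total - missed) / total, 2)


def totals(rows):
    """Column-wise sums over all files, i.e. the TOTAL row."""
    return tuple(sum(column) for column in zip(*rows.values()))


t = totals(ROWS)
print(t, cover(t[0], t[1]), cover(t[2], t[3]), cover(t[4], t[5]))
```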

--
components: Library (Lib)
files: improve-sqlite3-coverage.diff
keywords: patch
messages: 389060
nosy: berker.peksag, erlendaasland, serhiy.storchaka
priority: normal
severity: normal
status: open
title: [sqlite3] Improve test coverage
type: enhancement
versions: Python 3.10
Added file: https://bugs.python.org/file49892/improve-sqlite3-coverage.diff


[issue43554] email: encoded headers lose their quoting when refolded

2021-03-19 Thread Emil Styrke



New submission from Emil Styrke :

When a header with an encoded (QP or Base64) display_name is refolded, it may 
lose (some of) its encoding.  If it then contains illegal "atext" tokens, an 
invalid header will result.

For example, `From: =?utf-8?Q?a=2C=20123456789012345678901234567890123456?= 
` will become `From: a, 123456789012345678901234567890123456 
`  This contains a comma character which needs to be quoted: 
correct rendering would be `From: "a, 123456789012345678901234567890123456" 
`. Note that this example isn't even folded to multiple 
lines, since the decoded text is short enough to fit in one line.

This can be triggered by `BytesParser(policy=policy.default).parsebytes("From: 
=?utf-8?Q?a=2C=20123456789012345678901234567890123456?= 
").as_bytes()`, but the offending code seems to be in or 
below `email.policy.EmailPolicy.fold`.  See attached file for examples with and 
without folding.
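A self-contained version of the reproducer above might look like the following sketch. The archive dropped the original address, so `a@example.com` is a hypothetical stand-in; note also that `BytesParser.parsebytes()` expects `bytes`. The sketch prints the re-serialized message rather than asserting a particular output, since the exact folding result may vary between Python versions.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical address: the archived report lost the original one.
DIGITS = "123456789012345678901234567890123456"
RAW = (
    b"From: =?utf-8?Q?a=2C=20" + DIGITS.encode("ascii")
    + b"?= <a@example.com>\n\n"
)

msg = BytesParser(policy=policy.default).parsebytes(RAW)
address = msg["From"].addresses[0]

# The decoded display name contains a comma, so a correct serializer must
# quote it (or keep it encoded) when the header is written out again.
print(repr(address.display_name))
print(msg.as_bytes())
```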

--
components: Library (Lib)
files: test_folding_bug.py
messages: 389061
nosy: Emil.Styrke
priority: normal
severity: normal
status: open
title: email: encoded headers lose their quoting when refolded
type: behavior
versions: Python 3.9
Added file: https://bugs.python.org/file49893/test_folding_bug.py


Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg

On 19.03.2021 10:17, STINNER Victor wrote:
> 
> New submission from STINNER Victor :
> 
> I propose to add two new functions:
> 
> * locale.get_locale_encoding(): it's exactly the same than 
> locale.getpreferredencoding(False).
> 
> * locale.get_current_locale_encoding(): always get the current locale 
> encoding. Read the ANSI code page on Windows, or nl_langinfo(CODESET) on 
> other platforms. Ignore the UTF-8 Mode. Don't always return "UTF-8" on macOS, 
> Android, VxWorks.

I'm not sure whether this would improve the situation much.

The problem is that the locale module is meant to expose the lib C
locale settings, but many of the recent additions actually do something
completely different: they look into the process and user environment
and try to determine external settings, which are not reflected in
the lib C locale settings.

I had added locale.getdefaultlocale() to give applications a chance
to determine the locale setting defined by the process environment
*without* calling setlocale(LC_ALL, '') and causing problems
in other threads. I used the X11 database for locale encodings,
which was the closest you could get to in terms of a standard for
encodings at the time (around 2000).

Part of the return value is the encoding that such a setlocale() call would set.

Martin later added locale.getpreferredencoding(), which tries to
determine the encoding in a different, new way, based on
nl_langinfo(CODESET). As you mentioned, this intention was broken
on several platforms by forcing UTF-8 as output. And in many cases,
the API had to call setlocale() as well, causing the thread problems.

However, the problem with nl_langinfo(CODESET) is the same as
with setlocale(): it returns the current state of the lib C
settings, which may well be the 'C' locale, not the settings
the user has configured in the OS environment. So while you get
an encoding defined by lib C for the current locale settings
(without guessing it as with locale.getdefaultlocale()), you
still don't get what the user really wants to use.

Unfortunately, lib C does not provide a way to query the locale
database without changing the locale settings at the same time.
This is the main issue we're facing.

Now, the correct way in all this would be to just call
setlocale(LC_ALL, '') at the start of the application and
not try to apply all the magic to get around this. But this
has to be done by the application and not Python (which may
well be embedded into some other application).

I'd suggest to add a single new API:

locale.getencoding()

which interfaces to nl_langinfo(CODESET) or the Windows code
page and does not try to do any magic, i.e. does *not* call
setlocale(). It needs to return what the lib C currently
knows and uses as encoding.

locale.getpreferredencoding() should then be deprecated.

It does not make sense to pretend to query information which is
not really directly available from the lib C locale system.

And the documentation should point out that applications should
call setlocale(LC_ALL, '') when they start up, if they want to
get the lib C locale, and thus Python locale module, setup to
work based on what the user really wants -- instead of just
guessing at this.
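As a sketch of the start-up step recommended above (the function name is hypothetical): the application, not Python, adopts the user's locale once, ideally before starting any threads, and afterwards only *queries* the locale, which `locale.setlocale(category)` does when no locale argument is given.

```python
import locale


def init_locale_from_environment():
    """Hypothetical application start-up helper (sketch).

    Call once, early, before other threads exist: setlocale() changes
    process-wide state.
    """
    try:
        # "" means: take the settings from LC_ALL / LC_* / LANG.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # keep the locale that is currently active.
        pass
    # Passing no locale only queries the current setting; nothing changes.
    return locale.setlocale(locale.LC_CTYPE)


print(init_locale_from_environment())
```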

PS: The locale module normally does not use underscores in
function names, so it's not a good idea to add more.

-- 
Marc-Andre Lemburg
eGenix.com


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



Change by STINNER Victor :


--
keywords: +patch
pull_requests: +23693
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24931


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

I created this issue while reviewing the implementation of PEP 597: PR
19481.

Copy of my comments on the PR related to this issue.


_locale.get_locale_encoding() calls _Py_GetLocaleEncoding() which returns UTF-8 
if the Python UTF-8 Mode is enabled.

Maybe the function could have a flag: please don't lie to me and return the 
current locale encoding ;-)

Or we could add a function to get the *current* locale encoding: 
**locale.get_current_locale_encoding()**.

This one would ignore the UTF-8 Mode and call nl_langinfo(CODESET). There are 
APIs to use the *current* locale encoding: 
PyUnicode_EncodeLocale/PyUnicode_DecodeLocale and 
_Py_EncodeLocaleEx/_Py_DecodeLocaleEx with current_locale=1. You can see which 
functions use it:

* decode tm_zone field of localtime_r() and gmtime()
* decode tzname[0] and tzname[1] strings
* decode setlocale() result
* decode some localeconv() fields (this function has to switch to a
different locale encoding, which is bad!)
* decode nl_langinfo() result
* decode gettext(), dgettext(), dcgettext(), textdomain(), bindtextdomain(), 
bind_textdomain_codeset() result
* decode strerror() and dlerror() result
* encode/decode in the readline module
* encode format string for strftime() in time.strftime() (only used on Windows, 
Unix provides wcsftime) and then decode strftime() result


> encoding="locale" : Uses locale encoding regardless UTF-8 mode.

Currently, open(encoding=None) doesn't work like that. For example, on macOS,
Android and VxWorks, it always uses UTF-8, and it also uses UTF-8 if the UTF-8
Mode is enabled.

In PEP 597, I read that encoding="locale" is the same as encoding=None but
doesn't emit an EncodingWarning. Where does PEP 597 change the chosen encoding
for the encoding=None case? The PEP says "locale encoding" without specifying
exactly what it is. In Python, it means different things depending on the
context. There is a subtle difference between the **current** locale encoding
and "the locale encoding". I agree that it needs some clarification :-)

While we are discussing encodings: I never understood why open() gets the
current locale encoding from nl_langinfo(CODESET), an encoding which can change
while Python is running. For example, if thread A calls open(filename,
encoding=None), thread B calls locale.localeconv(), and the LC_MONETARY locale
uses a different encoding than the LC_CTYPE locale, thread A can get the
LC_MONETARY encoding because of how locale.localeconv() is currently
implemented: it temporarily changes LC_CTYPE to LC_MONETARY to decode the
monetary fields of the localeconv() result.

I would prefer that Python use the same encoding for the whole lifetime of the
process, from start to finish. The Python filesystem encoding is a good choice
for that. It's the same as locale.getpreferredencoding(False) (currently used
by open() and friends), but it becomes different if the LC_CTYPE locale is
changed (temporarily or permanently).

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

I created PR 24931 to add locale.get_current_locale_encoding(). I tried to
clarify the differences between the "current locale encoding" and the "locale
encoding".

Maybe we should rename the "locale encoding" to the "Python locale encoding", 
since it's not what most Unix developers would expect. What do you think?

While most locale functions have no underscore in their names, the current
trend seems to be to allow underscores in the names of *new* functions. For
example, the sys module has these functions without underscores:

* sys.getallocatedblocks()
* sys.getdefaultencoding()
* sys.getfilesystemencodeerrors()
* ...

But it got new functions with underscores:

* sys.set_asyncgen_hooks()
* sys.set_coroutine_origin_tracking_depth()

... and there are some old functions with underscores:

* sys.exc_info()
* sys.call_tracing()
* sys._clear_type_cache()
* sys._current_frames()

In the locale module, there is one existing function with an underscore:

* locale.format_string()

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

> Now, the correct way in all this would be to just call setlocale(LC_ALL, '') 
> at the start of the application

Python now does that during its initialization on all platforms. So 
getpreferredencoding(False) is what its documentation says: the user preferred 
encoding, the LC_CTYPE locale encoding.

On Python 3.7, _Py_SetLocaleFromEnv(LC_CTYPE) was called in 
_Py_InitializeCore() on Unix, but not on Windows.

Since Python 3.8, _PyPreConfig_Write() calls _Py_SetLocaleFromEnv(LC_CTYPE) on 
all platforms including Windows. See bpo-34485 and my article for more details 
("C locale on Windows" section):
https://vstinner.github.io/python3-locales-encodings.html

_Py_SetLocaleFromEnv(LC_CTYPE) calls setlocale(LC_CTYPE, ""), but has more 
complex code on Android.

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

> locale.getencoding()
>
> which interfaces to nl_langinfo(CODESET) or the Windows code
> page and does not try to do any magic, ie. does *not* call
> setlocale(). It needs to return what the lib C currently
> knows and uses as encoding.

This is locale.get_current_locale_encoding(). I would like to put "current" in
the name, because there is a lot of confusion between the
get_current_locale_encoding() encoding and the
locale.getpreferredencoding(False) encoding. In
locale.getpreferredencoding(False), Python ignores the locale in some cases,
which is counterintuitive.

I propose to add new functions to reduce confusion and better document the 
subtle differences between the different "locale encodings".

That's also why I propose to rename the "locale encoding" to the "Python locale
encoding" in the documentation: to clarify that Python sometimes ignores the
locale.

PEP 538 (coerce the C locale) and PEP 540 (Python UTF-8 Mode) introduced this
confusion.

--


[issue43546] "Impossible" KeyError from importlib._bootstrap acquire line 110

2021-03-19 Thread Anentropic



Anentropic  added the comment:

Thank you for your explanation.

I am baffled why this has never happened to us before, and why a specific 
number of test cases should trigger it (comment one out or add an extra one, it 
goes away).

I've pasted our full stack trace below. It always fails in this place.

My original thought when getting this error was to check for code that executes 
at import time. The module being imported when it breaks seems totally 
innocuous though:
https://github.com/django/django/blob/2.2.19/django/db/migrations/optimizer.py

There are two instances of:

  File "", line 1007, in _find_and_load
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
    return exec_and_call_hooks(module_name, wrapped, args, kwargs)

...in the traceback, so https://github.com/DataDog/dd-trace-py starts to look 
suspicious.


Traceback (most recent call last):
  File "/work/manage.py", line 26, in 
    execute_from_command_line(sys.argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv
    super().run_from_argv(argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 53, in handle
    failures = test_runner.run_tests(test_labels)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 308, in run_tests
    result = self.run_suite(nose_argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 244, in run_suite
    nose.core.TestProgram(argv=nose_argv, exit=False,
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 118, in __init__
    unittest.TestProgram.__init__(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 207, in runTests
    result = self.testRunner.run(self.test)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 50, in run
    wrapper = self.config.plugins.prepareTest(test)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/plugin.py", line 82, in prepareTest
    self.old_names = self.runner.setup_databases()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 495, in setup_databases
    return super(NoseTestSuiteRunner, self).setup_databases()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/runner.py", line 552, in setup_databases
    return _setup_databases(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/utils.py", line 170, in setup_databases
    connection.creation.create_test_db(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/backends/base/creation.py", line 67, in create_test_db
    call_command(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 110, in call_command
    command = load_command_class(app_name, command_name)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 36, in load_command_class
    module = import_module('%s.management.commands.%s' % (app_name, name))
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1030, in _gcd_import
  File "", line 1007, in _find_and_load
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
    return exec_and_call_hooks(module_name, wrapped, args, kwargs)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks

Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg

On 19.03.2021 11:36, STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> locale.getencoding()
>>
>> which interfaces to nl_langinfo(CODESET) or the Windows code
>> page and does not try to do any magic, ie. does *not* call
>> setlocale(). It needs to return what the lib C currently
>> knows and uses as encoding.
> 
> This is locale.get_current_locale_encoding(). I would like to put "current" 
> in the name, because there is a lot of confusion between 
> get_current_locale_encoding() encoding and locale.getpreferredencoding(False) 
> encoding. In locale.getpreferredencoding(False), Python ignores the locale in 
> some cases which is counter intuitive.

These attempts have resulted in much of the confusion around the locale
module. It's better not to create more of it.

- "locale" in the name is unnecessary, since this is the locale module.

- If you add "current", people will rightly ask: then what do all the
other APIs in the locale module return ? Of course, they all return
the current state of settings :-) So this is unnecessary as well.

locale.getencoding() works in the same way as locale.getlocale().
It interfaces to the lib C and returns the current encoding setting
as known by the lib C. It's just a more intuitive name than
locale.nl_langinfo(CODESET) and works on Windows as well.

And, again, locale.getpreferredencoding() should be deprecated.
The API has been misused in too many ways and is completely broken
by now. It was a good idea at the time, when Martin added it,
even though I never liked the name.

-- 
Marc-Andre Lemburg
eGenix.com


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

Attached encodings.py lists the different "locale encodings" used by Python. 
Example:
---
$ LANG=fr_FR ./python -X utf8 encodings.py fr_FR@euro
Set LC_CTYPE to 'fr_FR@euro'

LC_ALL env var: ''
LC_CTYPE env var: ''
LANG env var: 'fr_FR'
LC_CTYPE locale: 'fr_FR@euro'
Coerce C locale: 0
Python UTF-8 Mode: 1

(1) Python FS encoding
sys.getfilesystemencoding(): 'utf-8'

(2) Python locale encoding
_locale._get_locale_encoding(): 'UTF-8'
locale.getpreferredencoding(False): 'UTF-8'

(3) Current locale encoding
locale.get_current_locale_encoding(): 'ISO-8859-15'

(4) And more encodings for more fun!
locale.getdefaultlocale()[1]: 'ISO8859-1'
locale.getpreferredencoding(True): 'UTF-8'
---

Python starts with the LC_CTYPE locale set to fr_FR (ISO8859-1), then the script
sets the LC_CTYPE locale to fr_FR@euro (ISO-8859-15). The Python UTF-8 Mode is
enabled explicitly. We get a funny combination of no fewer than 3 encodings!

* UTF-8
* ISO-8859-1
* ISO-8859-15

Which one is the correct one? Well... It depends :-)

(1) The Python filesystem encoding is used to call almost all operating system 
functions: encode to the OS and decode from the OS. Filenames, environment 
variables, command line options, etc.

(2) The "Python" locale encoding is used by open() when no encoding is
specified.

(3) The current locale encoding is used for a limited number of functions that
I listed in msg389063. Most users should not use it.

(4) locale.getpreferredencoding(True) is a weird beast. It is the Python locale
encoding until setlocale(LC_CTYPE, locale) is called for the first time. But it
can stay the same if the Python UTF-8 Mode is enabled. I'm not sure in which
category we should put this function :-(

(4 bis) locale.getdefaultlocale()[1] is the only function returning the 
ISO-8859-1 encoding. This encoding is not used by any function. I'm not sure of 
the purpose of this function. It sounds confusing.


I suggest to deprecate locale.getpreferredencoding(True).

I'm not sure what to do with locale.getdefaultlocale(). Should we deprecate it? 
I never used this function. How is it used? For which purpose?

I understand that in 2000, locale.getdefaultlocale() was a way to avoid
calling setlocale(LC_CTYPE, ""). But Python 3 has called setlocale(LC_CTYPE, "")
by default at startup since its early versions, and it has been called on all
platforms since Python 3.8. Moreover, its internal database seems to be
outdated and is painful to maintain (especially if we consider all platforms
supported by Python, not only Linux; there are many issues on macOS).

--
Added file: https://bugs.python.org/file49894/encodings.py


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

> Martin later added locale.getpreferredencoding(), which tries to
> determine the encoding in a different, new way, based on
> nl_langinfo(CODESET). As you mentioned, this intention was broken
> on several platforms by forcing UTF-8 as output.

When I designed and implemented PEP 540 (Python UTF-8 Mode), I tried to
leave getpreferredencoding() unchanged. The problem was that I quickly got
mojibake because too many functions call getpreferredencoding(False):

* open() and _pyio.open() -- in Python 3.10, open() now calls the C
_Py_GetLocaleEncoding() function to fix issues during Python shutdown; it also
avoids issues at startup.
* Many gettext functions
* cgi to decode the query string from the QUERY_STRING env var or sys.argv[1]
* xml.etree.ElementTree.write(encoding="unicode") in some cases

The Python UTF-8 Mode ignores the locale *on purpose*. But I agree that it's 
surprising and can lead to confusion. That's what I'm trying to fix here :-)

--


[issue43555] Location of SyntaxError with new parser missing (after continuation character)

2021-03-19 Thread Andre Roberge



New submission from Andre Roberge :

Normally, for SyntaxErrors, the location of the error is indicated by a ^.
There is at least one case where the location is missing in 3.9 and 3.10.0a6,
although it was shown before. With the old parser in 3.9, or with previous
versions of Python, the location is shown.

Python 3.10.0a6 ... on win32
>>> a = 3 \ 4
  File "", line 1
SyntaxError: unexpected character after line continuation character
>>>


Python 3.9.0 ... on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 3 \ 4
  File "", line 1
SyntaxError: unexpected character after line continuation character
>>>

Using the old parser with Python 3.9, the location of the error is shown 
*after* the unexpected character.

> python -X oldparser
Python 3.9.0 ... on win32
>>> a = 3 \ 4
  File "", line 1
a = 3 \ 4
 ^
SyntaxError: unexpected character after line continuation character
>>>

Using Python 3.8 (and 3.7, 3.6), the location is pointing at the unexpected 
character.


Python 3.8.4 ... on win32
>>> a = 3 \ 4
  File "<stdin>", line 1
a = 3 \ 4
^
SyntaxError: unexpected character after line continuation character
>>>
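To check whether a given interpreter records a location for this error, one can compile the source and inspect the raised `SyntaxError`. A minimal sketch (the helper name is made up): the `^` marker in the traceback is derived from the exception's `offset` and `text` attributes, so a missing marker likely means one of them was not set.

```python
def syntax_error_location(source):
    """Compile *source*; return (lineno, offset, text) of the SyntaxError, or None."""
    try:
        compile(source, "<stdin>", "exec")
    except SyntaxError as exc:
        # offset is 1-based; it may be None if no column was recorded
        return exc.lineno, exc.offset, exc.text
    return None

loc = syntax_error_location("a = 3 \\ 4\n")
print(loc)
```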

--
components: Interpreter Core
messages: 389071
nosy: aroberge
priority: normal
severity: normal
status: open
title: Location of SyntaxError with new parser missing (after continuation 
character)
type: behavior
versions: Python 3.10, Python 3.9


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

Recently, I spent a few days properly documenting the encodings used by Python.

Python filesystem encoding:
https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_encoding

Python filesystem errors:
https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_errors

stdio encoding and errors:
https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.stdio_encoding

Glossary: "Locale encoding"
https://docs.python.org/dev/glossary.html#term-locale-encoding

Glossary: "filesystem encoding and error handler"
https://docs.python.org/dev/glossary.html#term-filesystem-encoding-and-error-handler

Python UTF-8 Mode:
https://docs.python.org/dev/library/os.html#utf8-mode

--


[issue43529] pathlib.Path.glob causes OSError encountering symlinks to long filenames

2021-03-19 Thread Eric Frederich



Eric Frederich  added the comment:

I'm happy to create a pull request, but I would need some help.

That routine has changed over time, so I cannot simply create a single patch 
against 3.6 and have it merge cleanly into newer versions.

I'd need someone to explain the process.

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



Change by STINNER Victor :


--
nosy: +methane


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Eryk Sun



Eryk Sun  added the comment:

> Read the ANSI code page on Windows,

I don't see why the Windows implementation is inconsistent with POSIX here. If 
it were changed to be consistent, the default encoding at startup would remain 
the same, since setlocale(LC_CTYPE, "") uses the process code page from 
GetACP(). In many if not most cases, no one would be the wiser. But it seems to 
me that if a script calls setlocale(LC_CTYPE, "el_GR"), then it clearly wants 
to encode Greek text (code page 1253). open() with encoding passed as None or 
"locale" should respect this. Similarly if it calls setlocale(LC_CTYPE, 
".UTF-8"), then it wants the default locale (language/region), but with UTF-8 
encoding.

The following is a snippet to get the current locale encoding with ucrt in 
Windows:

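    #include <locale.h>   /* _get_current_locale(), _free_locale(), _locale_t */
    #include <windows.h>  /* GetACP() */

    /* The wrapper function name is illustrative only. */
    static int
    current_locale_code_page(void)
    {
        int cp = 0;
        __crt_locale_data_public *locale_data;

        _locale_t locale = _get_current_locale();
        if (locale) {
            locale_data = (__crt_locale_data_public *)locale->locinfo;
            cp = locale_data->_locale_lc_codepage;
            _free_locale(locale);
        }

        if (cp == 0) {
            /* "C" locale. The CRT in effect uses Latin-1 (cp28591), but
               Windows Python prefers the process code page. */
            cp = GetACP();
        }
        return cp;
    }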

With ucrt, the C runtime was changed to hide most of the locale definition that 
was previously public, but it intentionally defines __crt_locale_data_public, 
so I'm assuming it's there for programs to use. That said, the fact that we 
have to cast locinfo seems suspect to me. Steve Dower could maybe check with 
the ucrt devs to ensure that this is supported. 

There's also ___lc_codepage_func() to get the same value more simply, and also 
more efficiently, since the current locale data doesn't have to be copied and 
freed. However, it's documented as internal and could be removed (unlikely as that is).

--
nosy: +eryksun


[issue43244] Move PyArena C API to the internal C API

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:


New changeset 28ad12f8fe889a741661eb99daacebd9243cc1ba by Victor Stinner in 
branch 'master':
bpo-43244: Remove symtable.h header file (GH-24910)
https://github.com/python/cpython/commit/28ad12f8fe889a741661eb99daacebd9243cc1ba


--


Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg

On 19.03.2021 12:05, STINNER Victor wrote:
> I'm not sure what to do with locale.getdefaultlocale(). Should we deprecate 
> it? I never used this function. How is it used? For which purpose?
>
> I understand that in 2000, locale.getdefaultlocale() was interesting to avoid 
> calling setlocale(LC_CTYPE, ""). But Python 3 calls setlocale(LC_CTYPE, "") 
> by default at startup since the early versions, and it's now called on all 
> platforms since Python 3.8. Moreover, its internal database seems to be 
> outdated and is painful to maintain (especially if we consider all platforms 
> supported by Python, not only Linux, there are many issues on macOS).

Yes, deprecate it as well. If Python calls setlocale() per default now,
it has served its purpose.

The alias database is needed by the normalization engine. We may be
able to drop the encoding part, but this would have to be checked.

-- 
Marc-Andre Lemburg
eGenix.com


[issue43556] fix attr names for ast.expr and ast.stmt

2021-03-19 Thread Samwyse



New submission from Samwyse :

In Doc/library/ast.rst, the lineno and col_offset attributes are repeated; the 
second pair should be prefixed with 'end_'. Also, there's a minor indentation 
error in the RST file.

# diff ast.rst ast.rst~ 
78c78
<   col_offset
---
> col_offset
83c83
<   :attr:`lineno`, :attr:`col_offset`, :attr:`end_lineno`, and :attr:`end_col_offset`
---
>   :attr:`lineno`, :attr:`col_offset`, :attr:`lineno`, and :attr:`col_offset`
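For reference, the four location attributes that the corrected text should list can be inspected directly on any `ast.stmt` or `ast.expr` node (Python 3.8+, where `end_lineno` and `end_col_offset` exist). A small sketch on an arbitrary one-line assignment:

```python
import ast

tree = ast.parse("x = 1 + 2")
stmt = tree.body[0]   # ast.Assign, an ast.stmt
expr = stmt.value     # ast.BinOp, an ast.expr

for node in (stmt, expr):
    # col_offset / end_col_offset are UTF-8 byte offsets into the source line
    print(type(node).__name__, node.lineno, node.col_offset,
          node.end_lineno, node.end_col_offset)
```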

--
assignee: docs@python
components: Documentation
messages: 389077
nosy: docs@python, samwyse
priority: normal
severity: normal
status: open
title: fix attr names for ast.expr and ast.stmt
type: enhancement
versions: Python 3.9


[issue43502] [C-API] Convert obvious unsafe macros to static inline functions

2021-03-19 Thread Erlend Egeberg Aasland



Erlend Egeberg Aasland  added the comment:

FYI, thread started on 
https://discuss.python.org/t/what-to-do-with-unsafe-macros/7771?u=erlendaasland

--


Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg

On 19.03.2021 12:26, STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
> Recently, I spent some days to document properly encodings used by Python.

Thanks for documenting this.

I would prefer to keep the locale module as really just an interface
to the lib C locale logic, and not add encoding details which are
specific to Python's view on I/O (sys or io) or the file system (os).

Hopefully, in a few years, we can get rid of all this and standardize
on UTF-8 everywhere.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg

On 19.03.2021 12:35, Eryk Sun wrote:
> 
> Eryk Sun  added the comment:
> 
>> Read the ANSI code page on Windows,
> 
> I don't see why the Windows implementation is inconsistent with POSIX here. 
> If it were changed to be consistent, the default encoding at startup would 
> remain the same, since setlocale(LC_CTYPE, "") uses the process code page 
> from GetACP().

I'm not sure I understand what you're saying (but then, I have little
experience with locales on Windows).

My assumption is that nl_langinfo(CODESET) does not work on Windows
or gives wrong results. Is that incorrect ?

If it does work, getencoding() could just be a shim over
nl_langinfo(CODESET) on all platforms.

-- 
Marc-Andre Lemburg
eGenix.com


[issue43492] Upgrade to SQLite 3.35.2 in macOS and Windows

2021-03-19 Thread Erlend Egeberg Aasland



Erlend Egeberg Aasland  added the comment:

SQLite 3.35.3 is upcoming: https://sqlite.org/forum/forumpost/6e2b05ad62?t=h
Seems like we'll have to wait a little bit for 3.35 to stabilise.

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Eryk Sun



Eryk Sun  added the comment:

> If Python calls setlocale() per default now, it has served its purpose.

Except not for embedding applications if configure_locale [1] isn't set. But in 
that case determining the default locale isn't Python's problem to solve.

> My assumption is that nl_langinfo(CODESET) does not work on Windows
> or gives wrong results. Is that incorrect ?

There is no such function for CRT locales. I provided two alternatives that 
would allow implementing this consistent with POSIX, and thus avoid all of the 
"except on Windows..." disclaimers that have to explain (apologize) that only 
the process ANSI code page is used in Windows, and, for no good reason as far 
as I can tell, the LC_CTYPE locale encoding is completely ignored.

---

[1] 
https://docs.python.org/3/c-api/init_config.html#c.PyPreConfig.configure_locale

--


[issue43244] Move PyArena C API to the internal C API

2021-03-19 Thread STINNER Victor



Change by STINNER Victor :


--
pull_requests: +23694
pull_request: https://github.com/python/cpython/pull/24933


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Marc-Andre Lemburg



Marc-Andre Lemburg  added the comment:

On 19.03.2021 13:25, Eryk Sun wrote:
>> My assumption is that nl_langinfo(CODESET) does not work on Windows
>> or gives wrong results. Is that incorrect ?
> 
> There is no such function for CRT locales. I provided two alternatives that 
> would allow implementing this consistent with POSIX, and thus avoid all of 
> the "except on Windows..." disclaimers that have to explain (apologize) that 
> only the process ANSI code page is used in Windows, and, for no good reason 
> as far as I can tell, the LC_CTYPE locale encoding is completely ignored.

Sounds good. If we can get consistent behavior on Windows as well,
all the better :-)

--


[issue43539] test_asyncio: test_sendfile_close_peer_in_the_middle_of_receiving() fails randomly

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

It failed again:
https://github.com/python/cpython/pull/24933/checks?check_run_id=2148450378

--


[issue43546] "Impossible" KeyError from importlib._bootstrap acquire line 110

2021-03-19 Thread Anentropic



Anentropic  added the comment:

Upgrading ddtrace lib to latest version did not help.

Disabling ddtrace patching of django module does make the error go away.

Thanks again for your help, I will move my bug report over to ddtrace.

--


[issue43557] Deprecate getdefaultlocale(), getlocale() and normalize() functions

2021-03-19 Thread STINNER Victor



New submission from STINNER Victor :

I propose to deprecate getdefaultlocale(), getlocale() and normalize() 
functions since they have multiple issues, and remove them in Python 3.12.


The normalize() function uses the locale.locale_alias dictionary which was 
copied from the X11 locale database in 2000. It's hard to keep this dictionary 
up to date and to support all locales of all platforms supported by Python. 
There are multiple issues on macOS for example.

getdefaultlocale() and getlocale() use heuristics to get an encoding from the 
locale name. These heuristics are not reliable.

getdefaultlocale() relies only on environment variables. When setlocale() is 
called, the environment variables are not updated, so the encoding returned by 
getdefaultlocale() is not the effective LC_CTYPE locale encoding. Example:
https://bugs.python.org/issue43552#msg389069
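The mismatch can be illustrated with a small sketch: changing an environment variable after startup does not change the effective LC_CTYPE locale (only `setlocale()` does), so anything that reads the environment, as `getdefaultlocale()` does on POSIX, can disagree with the locale actually in effect. `LC_ALL=C` is an arbitrary value picked for illustration.

```python
import locale
import os

def effective_ctype():
    # setlocale() with None as the locale only queries; it never changes anything
    return locale.setlocale(locale.LC_CTYPE, None)

before = effective_ctype()
saved = os.environ.get("LC_ALL")
os.environ["LC_ALL"] = "C"   # environment change only, no setlocale() call
try:
    after = effective_ctype()
finally:
    # restore the original environment
    if saved is None:
        del os.environ["LC_ALL"]
    else:
        os.environ["LC_ALL"] = saved

print(before == after)  # True: the effective locale did not follow the env var
```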

getlocale() open issues:

* bpo-20088: locale.getlocale() fails if locale name doesn't include encoding
* bpo-23425: Windows getlocale unix-like with french, german, portuguese, spanish
* bpo-33934: locale.getlocale() seems wrong when the locale is yet unset (python3 on linux)
* bpo-38805: locale.getlocale() returns a non RFC1766 language code
* bpo-43115: locale.getlocale fails if locale is set

getdefaultlocale() open issues:

* bpo-6981: locale.getdefaultlocale() envvars default code and documentation mismatch
* bpo-30755: locale.normalize() and getdefaultlocale() convert C.UTF-8 to en_US.UTF-8


Replacements:

* getdefaultlocale()[1] => getpreferredencoding(False) or 
get_current_locale_encoding(), see bpo-43552
* getlocale(loc) => setlocale(loc) or setlocale(loc, None)
* normalize() => no replacement. There is no standard way to normalize a locale 
name.
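The first two replacements can be written with long-standing `locale` functions only; a minimal sketch (the `get_current_locale_encoding()` alternative is only a proposal in bpo-43552, so it is not used here):

```python
import locale

# getdefaultlocale()[1]  ->  encoding used by default for text I/O
encoding = locale.getpreferredencoding(False)

# getlocale(category)  ->  query the category without changing it
ctype_name = locale.setlocale(locale.LC_CTYPE, None)

print(encoding, ctype_name)
```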

--
components: Library (Lib)
messages: 389086
nosy: vstinner
priority: normal
severity: normal
status: open
title: Deprecate getdefaultlocale(), getlocale() and normalize() functions
versions: Python 3.10


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

I created bpo-43557 "Deprecate getdefaultlocale(), getlocale() and normalize() 
functions". Let's discuss deprecating getdefaultlocale() there.

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

> - If you add "current", people will rightly ask: then what do all the
> other APIs in the locale module return ? Of course, they all return
> the current state of settings :-) So this is unnecessary as well.

The problem is that there are two different "locale encodings", what I call:

* "current locale encoding": nl_langinfo(CODESET) in short
* "Python locale encoding": "UTF-8" in some cases, nl_langinfo(CODESET) otherwise

It is unfortunate that the Python UTF-8 Mode, which "ignores the locale", changes 
the behavior of the locale module, namely of the locale.getpreferredencoding() 
function. But the ship has sailed.

People are used to looking in the "locale" module to get the "locale" encoding, 
so I prefer to put the function to get the "Python locale encoding" in the 
locale module.

I propose to add "current" to the name, since this encoding is usually not the 
one you are looking for.

An alternative is to have a single function with an optional parameter. Example:

* get_locale_encoding() or get_locale_encoding(True) returns the locale encoding
* get_locale_encoding(False) returns the current locale encoding

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Inada Naoki



Inada Naoki  added the comment:

> I created this issue while reviewing the implementation of the PEP 597: PR 
> 19481.

What I want is the same as `locale.getpreferredencoding(False)`, but ignoring 
UTF-8 mode.

Background: PEP 597 adds a new `encoding="locale"` option to open() and 
TextIOWrapper(). It is the same as `encoding=None` for now, but it means using 
the "locale encoding" explicitly.

But this is wrong in UTF-8 mode.

In UTF-8 mode, it's fine for `open(filename)` to use UTF-8. But I want to use 
the "locale encoding" for `open(filename, encoding="locale")`, because the 
"locale" encoding is explicitly specified.

I don't want to add a new meaning here. It should be the same as 
`locale.getpreferredencoding(False)` without UTF-8 mode. So I need "cp%d" % 
GetACP() on Windows, not the CRT locale encoding.

I don't care about its name. Both sys.locale_encoding() and 
locale.get_encoding() are OK.
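What is being asked for can be sketched as a helper that returns the locale encoding even when UTF-8 Mode is enabled: `"cp%d" % GetACP()` on Windows, as stated above, and `nl_langinfo(CODESET)` elsewhere. The helper name is hypothetical, and reaching `GetACP()` through `ctypes` is an assumption about how to call it from Python.

```python
import locale
import sys

def locale_encoding_ignoring_utf8_mode():
    """Hypothetical helper: the locale encoding, regardless of sys.flags.utf8_mode."""
    if sys.platform == "win32":
        import ctypes
        # ANSI code page of the process, e.g. 1252 -> "cp1252"
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    # POSIX: ask the C library for the LC_CTYPE codeset
    return locale.nl_langinfo(locale.CODESET)

print(locale_encoding_ignoring_utf8_mode())
```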

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

> In UTF-8 mode, it's fine to `open(filename)` uses UTF-8. But I want to use 
> "locale encoding" for `open(filename, encoding="locale")` because "locale" 
> encoding is specified.

Is it about the current implementation of PEP 597, or are you thinking of 
the future Python, which would use UTF-8 by default?

Currently, getpreferredencoding(False) respects the behavior that you 
described, no?

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Inada Naoki



Inada Naoki  added the comment:

> Is it about the current implementation of the PEP 597, or are you thinking at 
> the future Python which would use UTF-8 by default?

I had forgotten to consider UTF-8 mode while finishing PEP 597. If possible, 
I want to ignore UTF-8 mode when `encoding="locale"` is specified, starting with 
Python 3.10.
Otherwise, the behavior will change between Python 3.10 and 3.11.

> Currently, getpreferredencoding(False) respects the behavior that you 
> described, no?

getpreferredencoding(False) respects UTF-8 mode. That's what PEP 597 says 
(because the PEP doesn't define the behavior in UTF-8 mode) and what GH-19481 
implements.

But it is not what I want for now. I want to ignore UTF-8 mode when 
`encoding="locale"` is specified.

This is almost a Windows-only issue, and users can use `encoding="mbcs"` in 
Windows-only scripts.

But `encoding="locale"` is the new and recommended way to specify the "locale" 
encoding explicitly. When users specify the "locale" encoding explicitly, I 
think we should respect it regardless of UTF-8 mode.

--


[issue43555] Location of SyntaxError with new parser missing (after continuation character)

2021-03-19 Thread Karthikeyan Singaravelan



Change by Karthikeyan Singaravelan :


--
nosy: +gvanrossum, lys.nikolaou, pablogsal


[issue43510] PEP 597: Implemente encoding="locale" option and EncodingWarning

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

I replied to INADA-san's message on bpo-43552:
https://bugs.python.org/issue43552#msg389091

> I had forgot to consider about UTF-8 mode while finishing PEP 597. If 
> possible, I want to ignore UTF-8 mode when `encoding="locale"` is specified 
> from Python 3.10.

In this case, the PEP 597 statement that open(filename, encoding="locale") is 
the same as open(filename) is wrong. It would mean that users who have the 
UTF-8 Mode enabled (implicitly or explicitly) would switch to a legacy encoding 
like latin1, rather than using the UTF-8 encoding, if they add encoding="locale" 
to their open() calls?

Since the final goal is to move everybody towards UTF-8, I'm not sure that 
it's a good thing.

--
nosy: +vstinner


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

Hmm, the latest messages are specific to the PEP 597 implementation.

> I had forgot to consider about UTF-8 mode while finishing PEP 597.

I propose to continue the discussion about PEP 597 in bpo-43510. I replied 
there.

I prefer to keep this issue for discussing the locale module.

--


[issue43510] PEP 597: Implemente encoding="locale" option and EncodingWarning

2021-03-19 Thread Inada Naoki



Inada Naoki  added the comment:

> Since the final goal is to move everybody towards to UTF-8, I'm not sure how 
> it's a good thing.

The final goal (the third motivation of PEP 597) is changing the default 
encoding (i.e. the encoding used when none is specified) to UTF-8.

But forcing people to use UTF-8 even when they specify the locale encoding 
explicitly is not the goal. That's why I want to ignore UTF-8 mode when 
`encoding="locale"` is specified.

I think this is almost a Windows-only issue, and "mbcs" can already be used on 
Windows. It is documented in 
https://docs.python.org/3/using/windows.html#utf-8-mode

So this is not a blocker. Just my preference.

--


[issue43510] PEP 597: Implemente encoding="locale" option and EncodingWarning

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

I see different cases when open() is called with no encoding argument:

(A) User wants to use UTF-8: add encoding="utf-8"

(B) Windows user wants to use the ANSI code page of their computer, local file 
not intended to be shared with other computers: add encoding="mbcs". This makes 
the code specific to Windows ("mbcs" alias doesn't exist on Unix).

(C) User wants to use the locale encoding and is fine with the UTF-8 Mode: add 
encoding=getpreferredencoding(False)

(D) Unix user wants to use the locale encoding but not the UTF-8 Mode: 
encoding=get_current_locale_encoding() (function proposed in bpo-43552) or 
nl_langinfo(CODESET) (should work on any Python version). I don't know if 
nl_langinfo(CODESET) is available on Windows.

(E) User has no idea of what they are doing and doesn't understand anything 
about Unicode: please trust us and explicitly specify UTF-8 :-)

Apart from the encoding="utf-8" case, I understand that there are two main 
complex cases:

(1) "UTF-8" in the UTF-8 Mode, or the locale encoding
(2) Always use the locale encoding, ignore the UTF-8 Mode

What I don't expect is the current behavior, before PEP 597. Who uses open() 
without specifying an encoding but always wants to use the locale encoding? 
(case 2) So this use case is already broken when the UTF-8 Mode is enabled 
explicitly?
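Cases (A) to (D) can be summarized as a small lookup; a sketch only, with hypothetical helper names. It also checks whether a codec is known on the current platform, since "mbcs" (case B) is only registered on Windows.

```python
import codecs
import locale
import sys

def encoding_for_case(case):
    """Hypothetical mapping of cases (A)-(D) to an explicit encoding argument."""
    if case == "A":
        return "utf-8"                               # always UTF-8
    if case == "B":
        return "mbcs"                                # ANSI code page, Windows only
    if case == "C":
        return locale.getpreferredencoding(False)    # locale, or UTF-8 in UTF-8 Mode
    if case == "D":
        if not hasattr(locale, "nl_langinfo"):
            raise NotImplementedError(f"nl_langinfo() is not available on {sys.platform}")
        return locale.nl_langinfo(locale.CODESET)    # locale, ignores UTF-8 Mode
    raise ValueError(f"unknown case: {case!r}")

def codec_available(name):
    """True if a codec with this name is registered on this platform."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(codec_available(encoding_for_case("A")), codec_available(encoding_for_case("B")))
```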

--


[issue43557] Deprecate getdefaultlocale(), getlocale() and normalize() functions

2021-03-19 Thread Marc-Andre Lemburg



Marc-Andre Lemburg  added the comment:

+1 on deprecating getdefaultlocale(), as mentioned in https://bugs.python.org/issue43552

However, -1 on getlocale() and normalize().

Those two are needed to access and successfully set the locale on
Linux: the lib C setlocale() is very picky about locale names and
so normalization helps in finding the right form and getting
usable results across platforms.

The issues open for these should be addressed and fixed.
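To illustrate the point about normalization: `locale.normalize()` maps a short or loosely written locale name to the fuller form found in the `locale_alias` table, which the C library is more likely to accept. A minimal sketch with a hypothetical helper; which names `setlocale()` accepts depends on the locales installed on the system, so failures are caught rather than assumed.

```python
import locale

def try_set_ctype(name):
    """Try the name as given, then its normalized form; return what worked or None."""
    for candidate in (name, locale.normalize(name)):
        try:
            return locale.setlocale(locale.LC_CTYPE, candidate)
        except locale.Error:
            continue  # this spelling is not installed/accepted on this system
    return None

print(locale.normalize("en_US"))  # 'en_US.ISO8859-1' via the locale_alias table
saved = locale.setlocale(locale.LC_CTYPE, None)
try:
    print(try_set_ctype("en_US"))
finally:
    locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous LC_CTYPE
```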

--
nosy: +lemburg


[issue43558] The dataclasses documentation should mention how to call super().__init__

2021-03-19 Thread Eric V. Smith



New submission from Eric V. Smith :

https://docs.python.org/3/library/dataclasses.html#post-init-processing should 
mention that if you need to call super().__init__, you should do it in 
__post_init__. Dataclasses cannot know what parameters to pass to the super 
class's __init__, so you'll need to do it yourself manually in __post_init__.
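A minimal sketch of the pattern the documentation could show; `Base`, `Child` and their fields are made up for illustration. The generated `__init__` does not call the base class's `__init__`, so `__post_init__` does it with whatever arguments make sense.

```python
from dataclasses import dataclass

class Base:
    """A plain (non-dataclass) base class whose __init__ needs an argument."""
    def __init__(self, name):
        self.name = name

@dataclass
class Child(Base):
    label: str
    size: int = 0

    def __post_init__(self):
        # The dataclass-generated __init__ only sets label and size;
        # the base class initializer must be called explicitly here.
        super().__init__(self.label.upper())

c = Child("widget", 3)
print(c, c.name)
```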

--
assignee: eric.smith
components: Documentation
messages: 389097
nosy: eric.smith
priority: low
severity: normal
stage: needs patch
status: open
title: The dataclasses documentation should mention how to call super().__init__
versions: Python 3.10, Python 3.8, Python 3.9


Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg

On 19.03.2021 14:47, STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
>> - If you add "current", people will rightly ask: then what do all the
>> other APIs in the locale module return ? Of course, they all return
>> the current state of settings :-) So this is unnecessary as well.
> 
> The problem is that there are two different "locale encodings", what I call:
> 
> * "current locale encoding": nl_langinfo(CODESET) in short
> * "Python locale encoding": "UTF-8" in some cases, nl_langinfo(CODESET) 
> otherwise

The UTF-8 mode is a Python invention. It doesn't have anything to
do with the lib C locale functions, which this module addresses and
interfaces to.

Please don't mix the two.

In fact, in order to avoid issues, Python should probably set the locale
encoding to UTF-8 as well, when run in UTF-8 mode. It's dangerous to
have Python and the lib C use different assumptions about the encoding,
esp. in embedded applications.

> It is unfortunate that the Python UTF-8 Mode which "ignores the locale" 
> changes the behavior of the locale module, of the 
> locale.getpreferredencoding() function. But the ship has sailed.
> 
> People are used to look into the "locale" module to get the "locale" 
> encoding. So I prefer to put  the function to get the "Python locale 
> encoding" in the locale module.
> 
> I propose to add "current" in the name since this encoding is not the one you 
> are looking for usually.
> 
> An alternative is to have a single function with an optional parameter. 
> Example:
> 
> * get_locale_encoding() or get_locale_encoding(True) returns the locale 
> encoding
> * get_locale_encoding(False) returns the current locale encoding

-1, both on the names and the idea to again add parameters which change
their meaning. We should have one function per meaning and really
only need the interface getencoding(), since the UTF-8 mode
doesn't fit into the locale module scope.

-- 
Marc-Andre Lemburg
eGenix.com


[issue43510] PEP 597: Implemente encoding="locale" option and EncodingWarning

2021-03-19 Thread Inada Naoki



Inada Naoki  added the comment:

> (1) "UTF-8" in the UTF-8 Mode, or the locale encoding
> (2) Always use the locale encoding, ignore the UTF-8 Mode
>
> What I don't expect is the current behavior, before PEP 597. Who uses open() 
> without specifying an encoding but always want to use the locale encoding? 
> (case 2) So this use case is already broken when the UTF-8 Mode is enabled 
> explicitly?

Yes, it is broken already. So they cannot use UTF-8 mode.

If `encoding="locale"` ignores UTF-8 mode, it saves that use case. They can add 
`encoding="locale"` where they need to use the locale/GetACP encoding, and enable 
UTF-8 mode.

That's why it is important if we enable UTF-8 mode by default in the future.

--


Re: [issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread M.-A. Lemburg

On 19.03.2021 14:57, Inada Naoki wrote:
> 
> Background: PEP 597 adds new `encoding="locale"`option to open() and 
> TextIOWrapper(). It is same to `encoding=None` for now, but it means using 
> "locale encoding" explicitly.
> 
> But this is wrong in UTF-8 mode.

Please address UTF-8 mode explicitly in open() or elsewhere. The locale
module is about the state of the lib C, not what Python enforces via
options in its own I/O layers.

As mentioned, both should ideally be synchronized, though, so
UTF-8 mode in Python should trigger setting a UTF-8 encoding
via setlocale().

-- 
Marc-Andre Lemburg
eGenix.com


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Inada Naoki



Inada Naoki  added the comment:

> Please address UTF-8 mode explicitly in open() or elsewhere. The locale
> module is about the state of the lib C, not what Python enforces via
> options in its own I/O layers.

I agree with you. APIs in the locale module shouldn't be aware of UTF-8 mode.

`locale.getpreferredencoding()` is special, because it is documented to "Return 
the encoding used for text data, according to user preferences. User preferences 
are expressed differently on different systems, and might not be available 
programmatically on some systems, so this function only returns a guess."


> As mentioned, both should ideally be synchronized, though, so
> UTF-8 mode in Python should trigger setting a UTF-8 encoding
> via setlocale().

There is PEP 538 already :)

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

On 19.03.2021 16:15, Inada Naoki wrote:
> 
> `locale.getpreferredencoding()` is special, because it "Return the encoding 
> used for text data, according to user preferences. User preferences are 
> expressed differently on different systems, and might not be available 
> programmatically on some systems, so this function only returns a guess."

I already wrote earlier that we should deprecate this API, since the
overloading with different meanings in the past has turned it into
an unreliable source of information. At this point, it returns
"some encoding, which may or may not be what you want" :-)

We need to get things separated out clearly again: the locale
module is for the lib C locale state. What Python does in the
I/O layers has to be defined and queried at the appropriate
places elsewhere (e.g. os, sys or io modules).
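A minimal Python sketch of that separation, assuming Python 3.7 or later: `sys.flags.utf8_mode` reports what Python's own I/O layer enforces, while `locale.setlocale(locale.LC_CTYPE)` called without a locale argument only queries the libc state. `locale.getencoding()` exists only from Python 3.11, so it is probed with `getattr`. The function name `encoding_report` is illustrative only.

```python
import locale
import sys


def encoding_report():
    """Collect the different 'locale encoding' answers side by side."""
    getencoding = getattr(locale, "getencoding", None)  # Python 3.11+ only
    return {
        # What Python's own I/O layer enforces (PEP 540 UTF-8 mode).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The overloaded API: reports UTF-8 whenever UTF-8 mode is enabled.
        "preferred": locale.getpreferredencoding(False),
        # libc state: with no locale argument, setlocale() only queries.
        "lc_ctype": locale.setlocale(locale.LC_CTYPE),
        # Locale encoding ignoring UTF-8 mode, where the interpreter offers it.
        "libc_encoding": getencoding() if getencoding else None,
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

Running it once normally and once with `python -X utf8` shows which answers follow UTF-8 mode and which follow the C library.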

>> As mentioned, both should ideally be synchronized, though, so
>> UTF-8 mode in Python should trigger setting a UTF-8 encoding
>> via setlocale().
> 
> There is PEP 538 already :)

Great :-)

--


[issue43179] Remove 31/32-bit s390 Linux support (s390-linux-gnu triplet)

2021-03-19 Thread STINNER Victor



Change by STINNER Victor :


--
nosy:  -vstinner


[issue43179] Remove 31/32-bit s390 Linux support (s390-linux-gnu triplet)

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

The discussions at https://bugs.python.org/issue43179 and
https://mail.python.org/archives/list/python-...@python.org/thread/F5BXISYP7RAINXUMYJSEYG7GCFRFAENF/
did not reach any kind of consensus. I'm tired of these discussions, so
I just closed my PR 24534. If someone wants to drop support for the 31-bit Linux
s390 platform, please go ahead; feel free to copy my patch.

--
nosy: +vstinner


[issue43179] Remove 31/32-bit s390 Linux support (s390-linux-gnu triplet)

2021-03-19 Thread John Paul Adrian Glaubitz



John Paul Adrian Glaubitz  added the comment:

I think there is one productive result of this discussion which is this patch 
by Jessica Clark which gets rid of architecture-specific alignment code:

> https://github.com/python/cpython/pull/24624

Unfortunately, it has not seen any positive reviews yet. Getting this merged 
would remove some maintenance burden from the CPython maintainers.

--


[issue41718] test.regrtest has way too many imports

2021-03-19 Thread STINNER Victor



Change by STINNER Victor :


--
pull_requests: +23695
pull_request: https://github.com/python/cpython/pull/24934


[issue41718] test.regrtest has way too many imports

2021-03-19 Thread STINNER Victor



STINNER Victor  added the comment:

Serhiy: "You could save/restore this data only when the corresponding modules were
imported, like it was done in clear_caches() in refleak.py."

That's a very good idea! I implemented it in PR 24934. But I modified runtest()
to use *two* saved_test_environment instances: one before the test module is
imported, one after. The one before is needed to check if the import itself has
side effects, for example if the module body has side effects. The second is to
check if running tests has side effects. The second one is more likely to have
modules imported. The first one may miss some bugs, but IMO it's an acceptable
trade-off.

--


[issue43535] Make str.join auto-convert inputs to strings.

2021-03-19 Thread Matthew Barnett



Matthew Barnett  added the comment:

I'm also -1, for the same reason as Serhiy gave. However, if it was opt-in, 
then I'd be OK with it.

--
nosy: +mrabarnett


[issue43559] ctypes: Heap Pointer is damaged between C and Python

2021-03-19 Thread Canberk Sönmez


New submission from Canberk Sönmez :

Please see the SO post:

https://stackoverflow.com/questions/66713071/ctypes-heap-pointer-is-damaged-between-c-and-python-linux-x86-64

In summary, when I return a pointer to a heap-allocated memory location from a 
C function, its most significant 32 bits are chopped off for some reason.

I observed this behavior in Python 3.7 and Python 3.8, on Ubuntu 18.04 and 
Centos 7 (x86_64).

--
messages: 389107
nosy: canberk.sonmez.409
priority: normal
severity: normal
status: open
title: ctypes: Heap Pointer is damaged between C and Python


[issue43554] email: encoded headers lose their quoting when refolded

2021-03-19 Thread hai shi



Change by hai shi :


--
nosy: +barry, maxking, r.david.murray


[issue43560] Modify SAX/expat parsing to avoid fragmentation of already-tiny content chunks

2021-03-19 Thread Larry Trammell



New submission from Larry Trammell :

Issue 43483 was posted as a "bug" but retracted.  Though the problem is real, 
it is tricky to declare an UNSPECIFIED behavior to be a bug.  See that issue 
page for more discussion and a test case.  A brief overview is repeated here.

SCENARIO - XML PARSING LOSES DATA (or not)

The parsing attempts to capture text consisting of very tiny quoted strings. A 
typical content line reads something like this: 

    <p>Colchuck</p>

The parser implements a scheme presented at various tutorial Web sites, using 
two member functions. 

    # Note the name attribute of the current tag group
    def element_handler(self, tagname, attrs):
        self.CurrentTag = tagname

    # Record the content from each "p" tag when encountered
    def characters(self, content):
        if self.CurrentTag == "p":
            self.name = content

    ...

    > print(parser.name)
    "Colchuck"

But then, after successfully extracting content from perhaps hundreds of 
thousands of XML tag sets in this way, the parsing suddenly "drops" a few 
characters of content. 

    > print(parser.name)
    "lchuck"

While this problem was observed with a SAX parser, it can affect expat parsers 
as well.  It affects 32-bit and 64-bit implementations the same, over several 
major releases of the Python 3 system.  

SPECIFIED BEHAVIOR (or not) 

The "xml.sax.handler" page in the Python 3.9.2 Documentation for the Python 
Standard Library (and many prior versions) states:

---
ContentHandler.characters(content) -- The Parser will call this method to 
report each chunk of character data.  SAX parsers may return all contiguous 
character data in a single chunk, or they may split it into several chunks...
---

If it happens that the content is delivered in two chunks instead of one, the 
characters() method shown above overwrites the first part of the text with the 
second part, and some content seems lost.  This completely explains the 
observed behavior.  

EXPECTED BEHAVIOR (or not)

Even though the behavior is unspecified, users can have certain expectations 
about what a reasonable parser should do.  Among these:

  -- EFFICIENCY: the parser should do simple things simply, and complicated 
things as simply as possible
  -- CONSISTENCY: the parser behavior should be repeatable and dependable

The design can be considered "poor" if thorough testing cannot identify what 
the actual behaviors are going to be, because those behaviors are rare and 
unpredictable.

The obvious "simple thing," from the user perspective, is that the parser 
should return each tiny text string as one tiny text chunk.  In fact, this is 
precisely what it does... 99.999% of the time.  But then, suddenly, it doesn't. 
 

One hypothesis is that when the parsing scan of raw input text reaches the end 
of a large internal text buffer, it is easier from the implementer's 
perspective to flush any text remaining in the old buffer prior to fetching a 
new one, even if that produces a fragmented chunk with only a couple of 
characters.  

IMPROVEMENTS REQUIRED

Review the code to determine whether the text buffer scenario is in fact the 
primary cause of inconsistent behavior. Modify the data handling to defer 
delivery of content fragments that are small, carrying over a small amount of 
previously scanned text so that small contiguous text chunks are recombined 
rather than reported as multiple fragments. If the length of the content text 
to carry over is greater than some configurable 
xml.sax.handler.ContiguousChunkLength, the parser can go ahead and deliver it 
as a fragment.  
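There is no `xml.sax.handler.ContiguousChunkLength` today; it is the proposal's hypothetical name. As a rough user-side sketch of the same "carry over and recombine" idea, assuming the standard expat-backed `xml.sax.make_parser()`, a hypothetical wrapper handler can buffer consecutive `characters()` calls and forward them as one chunk at the next element boundary. `CoalescingHandler`, `LastTextHandler` and `parse_in_pieces` are illustrative names, not library APIs.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class CoalescingHandler(ContentHandler):
    """Hypothetical adapter: merges consecutive characters() calls and hands
    them to the wrapped handler as one chunk at the next element event.
    (A complete version would also flush on processing instructions,
    namespace events and so on.)"""

    def __init__(self, target):
        super().__init__()
        self._target = target
        self._pending = []

    def _flush(self):
        if self._pending:
            self._target.characters("".join(self._pending))
            self._pending = []

    def characters(self, content):
        self._pending.append(content)  # defer delivery until a boundary

    def startElement(self, name, attrs):
        self._flush()
        self._target.startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self._target.endElement(name)

    def endDocument(self):
        self._flush()
        self._target.endDocument()


class LastTextHandler(ContentHandler):
    """The assignment-based handler from the report (fragile on its own)."""

    def __init__(self):
        super().__init__()
        self.current_tag = None
        self.name = None

    def startElement(self, name, attrs):
        self.current_tag = name

    def characters(self, content):
        if self.current_tag == "p":
            self.name = content


def parse_in_pieces(pieces, handler):
    """Feed a document to an incremental SAX parser one piece at a time."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece.encode("utf-8"))
    parser.close()
    return handler
```

Feeding `["<r><p>Col", "chuck</p></r>"]` through `CoalescingHandler(LastTextHandler())` leaves `"Colchuck"` in the inner handler's `name`, however the parser chose to split the text.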

DOCUMENTING THE IMPROVEMENTS 

Strictly speaking:  none required.  Undefined behaviors are undefined, whether 
consistent or otherwise.  But after the improvements are implemented, it would 
be helpful to modify documentation to expose the new performance guarantees, 
making users more aware of the possible hazards.  For example, a new 
description in the "xml.sax.handler" page might read as follows: 

---
ContentHandler.characters(content) -- The Parser will call this method to
report chunks of character data.  In general, character data may be reported as
a single chunk or as a sequence of chunks; but character data sequences with
fewer than xml.sax.handler.ContiguousChunkLength characters, when
uninterrupted by any other xml.sax.handler.ContentHandler event, are guaranteed
to be delivered as a single chunk...
---

--
components: XML
messages: 389108
nosy: ridgerat1611
priority: normal
severity: normal
status: open
title: Modify SAX/expat parsing to avoid fragmentation of already-tiny content 
chunks
type: enhancement
versions: Python 3.7, Python 3.8, Python 3.9


[issue43559] ctypes: Heap Pointer is damaged between C and Python

2021-03-19 Thread Eric V. Smith



Eric V. Smith  added the comment:

Are you using a 64-bit version of python? What is sys.maxsize?

--
nosy: +eric.smith


[issue43559] ctypes: Heap Pointer is damaged between C and Python

2021-03-19 Thread Canberk Sönmez


Canberk Sönmez  added the comment:

Alright, I solved the problem. It was simply a typo: "restypes" instead of
"restype". It didn't cause a problem in Python 3.6 but obviously, something
was changed. A helpful error message would be nice when setting these attributes.
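A short sketch of why the typo produced exactly the reported symptom, assuming a POSIX system where `ctypes.CDLL(None)` exposes the already-loaded C library (`malloc`, `calloc`, `free`). When `restype` is not set, ctypes assumes the function returns a C `int`, which cannot hold a 64-bit heap address; a misspelled attribute is silently stored as an ordinary attribute. The helper names are illustrative only.

```python
import ctypes

# Assumption: POSIX (Linux, macOS), where CDLL(None) gives access to symbols
# such as malloc/free that are already loaded into the process.
libc = ctypes.CDLL(None)

# Declare the real prototypes. The attribute is spelled "restype"; without it
# ctypes treats the return value as a C int and truncates 64-bit pointers.
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def allocate_and_free(size):
    """Round-trip a heap pointer through Python, then release it."""
    ptr = libc.malloc(size)  # a Python int, or None for NULL
    if not ptr:
        raise MemoryError("malloc returned NULL")
    libc.free(ptr)
    return ptr


def misspelled_restype_is_silent():
    """A misspelling creates a plain attribute and raises no error."""
    func = libc.calloc                # a separate function object, not configured above
    func.restypes = ctypes.c_void_p   # typo: note the trailing "s"
    return func.restype               # still ctypes' default return type, c_int
```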

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed


[issue43561] Modify XML parsing library descriptions to forewarn of content loss hazard

2021-03-19 Thread Larry Trammell



New submission from Larry Trammell :

With reference to improvement issue 43560 :

If those improvements remain unimplemented, or are demoted to "don't fix", 
users are left in the tricky situation where XML parsing applications can fail, 
apparently "losing content" in a rare and unpredictable manner.  It would be 
useful to patch the documentation to give users fair warning of this hazard. 

For example: the "xml.sax.handler" page in the Python 3.9.2 Documentation for 
the Python Standard Library (and many prior versions) currently states:

---
ContentHandler.characters(content) -- The Parser will call this method to 
report each chunk of character data.  SAX parsers may return all contiguous 
character data in a single chunk, or they may split it into several chunks...
---
 
The modified documentation would read something like the following:

---
ContentHandler.characters(content) -- The Parser will call this method to 
report each chunk of character data.  SAX parsers may return all contiguous 
character data in a single chunk, or they may split it into several chunks... 
To avoid a situation in which one small content fragment unexpectedly 
overwrites another one, it is essential for the characters() method to collect 
content by appending, rather than by assignment.
---

To give a concrete example, suppose that a Python programming site recommends
the following code to preserve a small text chunk bracketed by "<p>" tags:

    # Note the name attribute of the current tag group
    def element_handler(self, tagname, attrs):
        self.CurrentTag = tagname

    # Record the content from each "p" tag when encountered
    def characters(self, content):
        if self.CurrentTag == "p":
            self.name = content

Even though that coding could be expected to work most of the time, it is 
exposed to the hazard that an unanticipated sequence of calls to the 
characters() function would overwrite data.

Instead, the coding should look something like this.

    # Note the name attribute of the current tag group
    def element_handler(self, tagname, attrs):
        self.CurrentTag = tagname
        self.name = ""

    # Accumulate the content from each "p" tag when encountered
    def characters(self, content):
        if self.CurrentTag == "p":
            self.name += content  # add this chunk to the text collected so far
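A runnable sketch of the append-based pattern using the standard `xml.sax` API (`startElement`, `characters`, `endElement`). It assumes text should be collected only between a `<p>` start tag and its end tag, so whitespace after `</p>` is not picked up, and joins the chunks once at the end tag. `NameCollector` and `collect_names` are illustrative names. Feeding the document in pieces mimics input that straddles the parser's internal buffer boundary.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NameCollector(ContentHandler):
    """Collects the full text of every <p> element, however it is chunked."""

    def __init__(self):
        super().__init__()
        self.names = []      # completed <p> texts, in document order
        self._chunks = None  # a list while inside <p>, otherwise None

    def startElement(self, name, attrs):
        if name == "p":
            self._chunks = []

    def characters(self, content):
        # One text node may arrive as several chunks, so append, never assign.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "p" and self._chunks is not None:
            self.names.append("".join(self._chunks))
            self._chunks = None


def collect_names(pieces):
    """Feed the document to an incremental SAX parser in the given pieces."""
    handler = NameCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece.encode("utf-8"))
    parser.close()
    return handler.names
```

For example, `collect_names(["<r><p>Col", "chuck</p></r>"])` returns `["Colchuck"]`, the same result as feeding the document in one piece.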

--
assignee: docs@python
components: Documentation
messages: 389111
nosy: docs@python, ridgerat1611
priority: normal
severity: normal
status: open
title: Modify XML parsing library descriptions to forewarn of content loss 
hazard
versions: Python 3.7, Python 3.8, Python 3.9


[issue43483] Loss of content in simple (but oversize) SAX parsing

2021-03-19 Thread Larry Trammell



Larry Trammell  added the comment:

Check out issues 

43560 (an enhancement issue to improve handling of small XML content chunks)

43561 (a documentation issue to give users warning about the hazard in the 
interim before the changes are implemented)

--


[issue43484] we can create valid datetime objects that become invalid if the timezone is changed

2021-03-19 Thread Éric Araujo


Change by Éric Araujo :


--
nosy: +eric.araujo
versions:  -Python 3.6, Python 3.7


[issue43562] test_ssl.NetworkedTests.test_timeout_connect_ex fails if network is unreachable

2021-03-19 Thread Carl Meyer



New submission from Carl Meyer :

In general it seems the CPython test suite takes care to not fail if the 
network is unreachable, but `test_timeout_connect_ex` fails because the result 
code of the connection is checked without any exception being raised that would 
reach `support.transient_internet`.
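`socket.connect_ex()` reports failure through its return value rather than by raising, so an exception-based helper never sees it. A minimal sketch of one way a test could turn an "unreachable" result code into an `OSError`. `UNREACHABLE_ERRNOS`, `check_connect_ex_result` and `connect_ex_checked` are hypothetical names, and this is not necessarily what the linked PR does.

```python
import errno
import os

# Hypothetical set of result codes treated as "no network" for this sketch.
UNREACHABLE_ERRNOS = {errno.ENETUNREACH, errno.EHOSTUNREACH}


def check_connect_ex_result(rc):
    """Raise OSError for an unreachable-network result code so that an
    exception-based helper can recognise it; otherwise return rc unchanged."""
    if rc in UNREACHABLE_ERRNOS:
        raise OSError(rc, os.strerror(rc))
    return rc


def connect_ex_checked(sock, address):
    # connect_ex() returns 0 or an error indicator instead of raising.
    return check_connect_ex_result(sock.connect_ex(address))
```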

--
components: Tests
messages: 389113
nosy: carljm
priority: normal
severity: normal
status: open
title: test_ssl.NetworkedTests.test_timeout_connect_ex fails if network is 
unreachable
type: behavior
versions: Python 3.10, Python 3.8, Python 3.9


[issue43563] Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver

2021-03-19 Thread Vladimir Matveev



New submission from Vladimir Matveev :

Calling methods and looking up attributes when the receiver is `super()` has extra
cost compared to a regular attribute lookup. It mainly comes from the need to
allocate and initialize an instance of `super`, which in the zero-argument
case also includes peeking into the frame/code object for the `__class__` cell and
the first argument. In addition, because `PySuper_Type` has a custom implementation
of tp_getattro, `_PyObject_GetMethod` always returns a bound method.
```
import timeit

setup = """
class A:
    def f(self): pass
class B(A):
    def f(self): super().f()
    def g(self): A.f(self)
b = B()
"""
print(timeit.timeit("b.f()", setup=setup, number=2000))
print(timeit.timeit("b.g()", setup=setup, number=2000))

# Reported output:
# 7.329449548968114
# 3.892987059080042
```
 
One option to improve it could be to make compiler/interpreter aware of super 
calls so they can be treated specially. Attached patch introduces two new 
opcodes LOAD_METHOD_SUPER and LOAD_ATTR_SUPER that are intended to be 
counterparts for LOAD_METHOD and LOAD_ATTR for cases when receiver is super 
with either zero or two arguments.

The immediate argument for both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER is a pair
that consists of:
 0: index of method/attribute in co_names
 1: Py_True if super was originally called with 0 arguments and Py_False 
otherwise.

Both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER expect 3 elements on the stack:
TOS3: global_super
TOS2: type
TOS1: self/cls

Result of LOAD_METHOD_SUPER is the same as LOAD_METHOD.
Result of LOAD_ATTR_SUPER is the same as LOAD_ATTR

At runtime, both LOAD_METHOD_SUPER and LOAD_ATTR_SUPER check whether
`global_super` is `PySuper_Type`, to handle situations where `super` is patched.
If `global_super` is `PySuper_Type`, they can use a dedicated routine to
perform the lookup for the provided `__class__` and `cls/self` without allocating
a new `super` instance. If `global_super` is different from `PySuper_Type`, the
runtime falls back to the original logic, using `global_super` and the original
number of arguments that was captured in the immediate argument.

Benchmark results with patch:
4.381768501014449
3.9492998640052974
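To see what a `super().f()` call compiles to on a given interpreter, the standard `dis` module can list the instructions for the two methods from the benchmark above. This is only an inspection aid: opcode names differ between CPython versions, so compare the two listings on your own interpreter rather than expecting a particular sequence. `opnames` is an illustrative helper name.

```python
import dis


class A:
    def f(self):
        pass


class B(A):
    def f(self):
        super().f()  # zero-argument super(): the case the proposed opcodes target

    def g(self):
        A.f(self)  # explicit base-class call, the baseline in the benchmark


def opnames(func):
    """Return the opcode names the running interpreter compiled func to."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    print("B.f:", opnames(B.f))
    print("B.g:", opnames(B.g))
```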

--
components: Interpreter Core
messages: 389114
nosy: v2m
priority: normal
severity: normal
status: open
title: Use dedicated opcodes to speed up calls/attribute lookups with super() 
as receiver
versions: Python 3.10


[issue43563] Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver

2021-03-19 Thread Vladimir Matveev



Change by Vladimir Matveev :


--
keywords: +patch
pull_requests: +23696
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24936


[issue43562] test_ssl.NetworkedTests.test_timeout_connect_ex fails if network is unreachable

2021-03-19 Thread Carl Meyer



Change by Carl Meyer :


--
keywords: +patch
pull_requests: +23697
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24937


[issue43563] Use dedicated opcodes to speed up calls/attribute lookups with super() as receiver

2021-03-19 Thread Batuhan Taskaya



Change by Batuhan Taskaya :


--
nosy: +BTaskaya


[issue43564] some tests in test_urllib2net fail instead of skipping on unreachable network

2021-03-19 Thread Carl Meyer



New submission from Carl Meyer :

In general it seems the CPython test suite takes care to skip instead of
failing networked tests when the network is unavailable (cf. the
`support.transient_internet` test helper).

In the case of the 5 FTP tests in `test_urllib2net` (that is, `test_ftp`,
`test_ftp_basic`, `test_ftp_default_timeout`, `test_ftp_no_timeout`, and
`test_ftp_timeout`), even though they use `support.transient_internet`, they
still fail if the network is unavailable.

The reason is that they make calls which end up raising an exception of the
form `URLError("ftp error: OSError(101, 'Network is unreachable')")` -- the
original OSError is flattened into the exception string message, but is
otherwise not in the exception args. This means that `transient_internet` does
not detect it as a suppressible exception.

It seems like many uses of `URLError` in urllib pass the original `OSError`
directly to `URLError.__init__()`, which means it ends up in `args` and the
unwrapping code in `transient_internet` is able to find the original `OSError`.
But the ftp code instead directly interpolates the `OSError` into a new message
string.
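A small sketch of the difference described above, using the real `urllib.error.URLError` (which stores its `reason` argument as `args[0]`). `unwrap_oserror` and `is_network_unreachable` are hypothetical helpers that only approximate the kind of unwrapping `transient_internet` performs; they are not its actual code.

```python
import errno
from urllib.error import URLError


def unwrap_oserror(err):
    """Hypothetical helper: follow OSError instances nested in err.args."""
    while err.args and isinstance(err.args[0], OSError):
        err = err.args[0]
    return err


def is_network_unreachable(err):
    return getattr(unwrap_oserror(err), "errno", None) == errno.ENETUNREACH


original = OSError(errno.ENETUNREACH, "Network is unreachable")

# Passing the OSError itself keeps it reachable through args...
wrapped = URLError(original)
# ...while interpolating it into a message string loses it.
flattened = URLError(f"ftp error: {original!r}")
```

Here `is_network_unreachable(wrapped)` is true and `is_network_unreachable(flattened)` is false, which mirrors why the FTP tests fail instead of being skipped.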

--
components: Tests
messages: 389115
nosy: carljm
priority: normal
severity: normal
status: open
title: some tests in test_urllib2net fail instead of skipping on unreachable 
network
type: behavior
versions: Python 3.10


[issue43564] ftp tests in test_urllib2net fail instead of skipping on unreachable network

2021-03-19 Thread Carl Meyer



Change by Carl Meyer :


--
title: some tests in test_urllib2net fail instead of skipping on unreachable 
network -> ftp tests in test_urllib2net fail instead of skipping on unreachable 
network


[issue43564] ftp tests in test_urllib2net fail instead of skipping on unreachable network

2021-03-19 Thread Carl Meyer



Change by Carl Meyer :


--
keywords: +patch
pull_requests: +23699
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24938


[issue43564] ftp tests in test_urllib2net fail instead of skipping on unreachable network

2021-03-19 Thread Carl Meyer



Carl Meyer  added the comment:

Created a PR that fixes this by being more consistent in how urllib wraps 
network errors. If there are backward-compatibility concerns with this change, 
another option could be some really ugly regex-matching code in 
`test.support.transient_internet`.

--


[issue43550] pip.exe is missing from the NuGet package

2021-03-19 Thread Steve Dower



Steve Dower  added the comment:

Unfortunately not, because we don't know where it will be installed to, and 
that executable embeds the full path to its matching python.exe.

I suggest running "python.exe -m pip" instead.

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Eryk Sun



Eryk Sun  added the comment:

> But it is not what I want for now. I want to ignore UTF-8 mode 
> when `encoding="locale"` is specified.
> This is almost "only in Windows" issue, and users can use 
> `encoding="mbcs"` in Windows-only script.

Why is it being specified that the current LC_CTYPE encoding should be ignored 
in Windows when a "locale" encoding is requested? Cross-platform C code would 
use mbstowcs() and wcstombs(), with the current LC_CTYPE encoding. That's 
Latin-1 in the initial "C" locale and defaults to GetACP() if 
setlocale(LC_CTYPE, "") is called, but otherwise it's whatever locale is 
requested by the program and supported by the system (all Windows installations 
support pretty much every locale).

--


[issue43550] pip.exe is missing from the NuGet package

2021-03-19 Thread Eryk Sun



Eryk Sun  added the comment:

I suppose if you really need a plain `pip` command, you could reinstall with 
`python.exe -m pip install --force-reinstall pip`.

--
nosy: +eryksun


[issue43481] PyEval_EvalCode() namespace issue not observed in Python 2.7.

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

I cannot reproduce in Python with either 3.8 or 3.10.  (Please try with latter 
if you can.)  I thought the issue might possibly be passing two different 
dicts, which results in the code being executed as if in a class statement, but 
it is not.  

code = '''
c=[1,2,3,4]
d={'list': [c[i] for i in range(len(c))]}
print(d)
'''
bcode = compile(code, '', 'exec')
gdict = globals()
ldict = {}
exec(bcode, gdict, gdict)
exec(bcode, gdict, ldict)
class C:
    c=[1,2,3,4]
    d={'list': [c[i] for i in range(len(c))]}
    print(d)

prints {'list': [1, 2, 3, 4]} 3 times.  Using 'eval' instead of 'exec' gives
the same.  I presume that code compiled with 'exec' is 'exec'ed even if passed
to eval().

--
nosy: +terry.reedy


[issue43484] valid datetimes can become invalid if the timezone is changed

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

This tracker is for patching CPython, including the docs.  Questions for 
discussion should be posted to python-list.  Perhaps such a discussion would 
lead to a concrete change proposal.  In the meanwhile, I think that this should 
be closed as 'not a bug' (as you admit).

--
nosy: +terry.reedy
title: we can create valid datetime objects that become invalid if the timezone 
is changed -> valid datetimes can become invalid if the timezone is changed


[issue43487] Rename unicode methods to str in 2to3 conversion

2021-03-19 Thread Terry J. Reedy



Change by Terry J. Reedy :


--
nosy: +benjamin.peterson


[issue43555] Location of SyntaxError with new parser missing (after continuation character)

2021-03-19 Thread Pablo Galindo Salgado



Change by Pablo Galindo Salgado :


--
keywords: +patch
pull_requests: +23700
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24939


[issue43490] IDLE freezes at random

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

How are you starting IDLE? How are you using the turtle module?  What does 
"can't get the window open" mean exactly?  Does the menu item Shell => Restart 
Shell work?

--
nosy: +terry.reedy


[issue43484] valid datetimes can become invalid if the timezone is changed

2021-03-19 Thread Eryk Sun



Eryk Sun  added the comment:

That it allows creating the datetime instance looks like a bug to me, i.e. a 
time before 0001-01-01 00:00 UTC is invalid. What am I misunderstanding?

--
nosy: +eryksun


[issue43494] Minor changes to Objects/lnotab_notes.txt

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:


New changeset 7cb033c423b65def1632d6c3c747111543b342a2 by Skip Montanaro in 
branch 'master':
bpo-43494: Make some minor changes to lnotab notes (GH-24861)
https://github.com/python/cpython/commit/7cb033c423b65def1632d6c3c747111543b342a2


--
nosy: +terry.reedy


[issue43494] Minor changes to Objects/lnotab_notes.txt

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

Skip, if you do not make a backport, you can close this.

--


[issue43494] Minor changes to Objects/lnotab_notes.txt

2021-03-19 Thread Skip Montanaro



Skip Montanaro  added the comment:

Closing, per Terry's comment.

--
stage: patch review -> resolved
status: open -> closed


[issue43504] Site linked in docs, effbot.org, down

2021-03-19 Thread Terry J. Reedy



Change by Terry J. Reedy :


--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
stage:  -> needs patch
title: effbot.org down -> Site linked in docs, effbot.org, down
versions: +Python 3.10, Python 3.8, Python 3.9


[issue43484] valid datetimes can become invalid if the timezone is changed

2021-03-19 Thread Paul Ganssle


Paul Ganssle  added the comment:

> That it allows creating the datetime instance looks like a bug to me, i.e. a 
> time before 0001-01-01 00:00 UTC is invalid. What am I misunderstanding?

`datetime.datetime(1, 1, 1, tzinfo=timezone(timedelta(hours=1)))` is a valid
datetime; it just cannot be converted to every other time zone, because in
some time zones the same absolute time is out of datetime's range.

`datetime.datetime` is a representation of an abstract datetime, and it can 
also be annotated with a time zone to basically tag the civil time with a 
function for converting it into other representations of the same *absolute* 
time. The range of valid `datetime.datetime` objects is based entirely on the 
naïve portion of the datetime, and has nothing to do with the absolute time. So 
this is indeed a natural consequence of the chosen design.
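A short illustration of that point, assuming CPython's behaviour of raising `OverflowError` when datetime arithmetic leaves the supported range (which is what `astimezone()` runs into here). `to_utc_or_none` and `PLUS_ONE` are hypothetical names used only for this sketch.

```python
from datetime import datetime, timedelta, timezone

PLUS_ONE = timezone(timedelta(hours=1))


def to_utc_or_none(dt):
    """Convert an aware datetime to UTC, or return None when the UTC
    equivalent falls outside datetime's supported range."""
    try:
        return dt.astimezone(timezone.utc)
    except OverflowError:
        return None


# Valid: the range check applies to the naive fields, and year 1 is allowed.
edge = datetime(1, 1, 1, tzinfo=PLUS_ONE)
# Its UTC equivalent would be 0000-12-31 23:00, which is before datetime.min,
# so to_utc_or_none(edge) returns None.
```

One hour later the conversion succeeds: `datetime(1, 1, 1, 1, tzinfo=PLUS_ONE)` maps to midnight UTC on 0001-01-01.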

If we wanted to change things, it would cause a number of problems, and the 
cure would be much worse than the "disease". For one thing, accessing UTC 
offsets is done lazily, so `.utcoffset()` is not currently called during 
`datetime` creation. The datetime documentation lays out that this is 
consistent with the raison d'être of `datetime`: "While date and time 
arithmetic is supported, the focus of the implementation is on efficient 
attribute extraction for output formatting and manipulation." In order to 
determine whether a given `datetime` can always be converted to an equivalent 
datetime in any time zone, we'd need to actively determine its UTC offset, 
which would be a major performance regression in creating aware datetimes. We 
could avoid this performance regression by only doing the `.utcoffset()` check 
when the datetime is within 2 days of `MINYEAR` or `MAXYEAR`, but while this 
would be a more minor performance regression, it would also add new edge cases 
where `.utcoffset()` is sometimes called during the constructor and sometimes 
not, which is not ideal. Not to mention if we were to ever open up the allowed 
return values for `.utcoffset()` the logic might get hairier (depending on the 
nature of the allowed values).
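To illustrate the laziness described above, here is a hypothetical `tzinfo` subclass (`CountingTZ`, not part of the standard library) that counts how often its `utcoffset()` is called. In this sketch, constructing the aware datetime does not call it; only asking for the offset does.

```python
from datetime import datetime, timedelta, tzinfo

class CountingTZ(tzinfo):
    """Hypothetical fixed-offset tzinfo that counts utcoffset() calls."""

    def __init__(self, hours):
        self.offset = timedelta(hours=hours)
        self.calls = 0

    def utcoffset(self, dt):
        self.calls += 1
        return self.offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "COUNTING"

tz = CountingTZ(1)
dt = datetime(1, 1, 1, tzinfo=tz)
print(tz.calls)  # 0: the constructor never asks for the offset
dt.utcoffset()
print(tz.calls)  # 1: the offset is computed only on demand
```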

Another issue with "fixing" this is that it would take currently-valid 
datetimes and turn them into invalid datetimes, which violates backwards 
compatibility. I imagine in most cases this is only done as part of test 
suites, since TZ-aware datetimes near 0 and 10,000 CE are anachronistic and not 
likely to be of much instrumental value, but the same can be said of these 
potentially "invalid" dates in the first place.

Worse still, even naïve datetimes can be converted to UTC 
or other time zones, and if we want to add a new constraint that 
`some_datetime.astimezone(some_timezone)` must always work, then you wouldn't 
even be able to *construct* `datetime.min` or `datetime.max`, since 
`datetime.min.astimezone(timezone(timedelta(hours=-23, minutes=-59)))` would fail 
everywhere, and worse, the minimum datetime value you could construct would 
depend on your system time zone! Again, the alternative would be to make an 
exception for naïve datetimes, but given that this change is of dubious value 
to start with, I don't think it is worth it.

> So I'm pretty sure this is "not a bug" but it's a bit of a problem and I have 
> a user suggesting the "security vulnerability" bell on this one, and to be 
> honest I don't even know what any library would do to "prevent" this.

I don't really know why it would be a "security vulnerability", but presumably 
a library could either convert their datetimes to UTC as soon as they get them 
from the user if they want to use them as UTC in the future, or they could 
simply refuse to accept any datetimes outside the range 
`datetime.datetime.min + timedelta(hours=48) < dt.replace(tzinfo=None) < 
datetime.datetime.max - timedelta(hours=48)`, or if the concern is only 
about UTC, then refuse datetimes outside the range 
`datetime.min.replace(tzinfo=timezone.utc) < dt < 
datetime.max.replace(tzinfo=timezone.utc)`.
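The two guards suggested above can be sketched as plain predicates, written with the class attributes `datetime.min` and `datetime.max`. The names `is_safely_convertible` and `is_representable_in_utc` are hypothetical, and the 48-hour margin is the conservative value from the suggestion (valid UTC offsets are strictly less than 24 hours).

```python
from datetime import datetime, timedelta, timezone

# Conservative margin from the suggestion: 48 hours inside the naive range.
MARGIN = timedelta(hours=48)
SAFE_MIN = datetime.min + MARGIN
SAFE_MAX = datetime.max - MARGIN

def is_safely_convertible(dt: datetime) -> bool:
    """True if dt's naive fields are far enough from the range edges
    that converting to any valid UTC offset cannot overflow."""
    return SAFE_MIN < dt.replace(tzinfo=None) < SAFE_MAX

UTC_MIN = datetime.min.replace(tzinfo=timezone.utc)
UTC_MAX = datetime.max.replace(tzinfo=timezone.utc)

def is_representable_in_utc(dt: datetime) -> bool:
    """True if an aware dt lies inside datetime's range once expressed in UTC."""
    return UTC_MIN < dt < UTC_MAX

plus_one = timezone(timedelta(hours=1))
print(is_safely_convertible(datetime(1, 1, 1, tzinfo=plus_one)))    # False
print(is_representable_in_utc(datetime(1, 1, 2, tzinfo=plus_one)))  # True
```

The second predicate expects an aware datetime; comparing a naïve one against the aware bounds raises `TypeError`.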

> Why's this a security problem (ish)? Because PostgreSQL has a data type 
> "TIMESTAMP WITH TIMEZONE" and if you take said date and INSERT it into your 
> database, then SELECT it back using any Python DBAPI that returns datetime() 
> objects like psycopg2, if your server is in a timezone with zero or negative 
> offset compared to the given date, you get an error.  So the mischievous user 
> can create that datetime for some reason and now they've broken your website 
> which can't SELECT that table anymore without crashing.

Can you clarify why this crashes? Is it because it always returns the datetime 
value in UTC?

> So, suppose you maintain the database library that helps people send data in 
> and out of psycopg2. We have the end user's application, we have the 
> database abstraction library, we have the psycopg2 driver, we have Python's 
> datetime() object with MIN_YEAR, and finally we have PostgreSQL with the 
> TIMEZON

[issue43484] valid datetimes can become invalid if the timezone is changed

2021-03-19 Thread mike bayer



mike bayer  added the comment:

> I don't really know why it would be a "security vulnerability", but presumably 
> a library could either convert their datetimes to UTC as soon as they get them 
> from the user if they want to use them as UTC in the future, or they could 
> simply refuse to accept any datetimes outside the range 
> `datetime.datetime.min + timedelta(hours=48) < dt.replace(tzinfo=None) < 
> datetime.datetime.max - timedelta(hours=48)`, 

this is absolutely correct, but I'm not sure if you're aware, there's kind of a 
whole subsection of the tech community that considers anything that a user 
might do without understanding all the consequences which could in any way 
allow untrusted input to affect things to be a "security risk".  In SQLAlchemy 
I had CVEs posted because we have a method called order_by() that accepted a 
string, and the notion was, someone will write a web app that takes an 
arbitrary string as input and sends it there!  CVE!   For you and me that would 
of course be crazy as this is obviously part of the SQL string being sent to 
the database,  but this is a particular programming subculture that has the 
ability to create a lot of havoc by filling up the CVE system with "Security 
Vulnerabilities" based on what many of us consider obviously wrong.

> Can you clarify why this crashes? Is it because it always returns the 
> datetime value in UTC?

it returns the datetime value in the default timezone set up for the server 
which could be UTC or a local timezone, but the idea is it's potentially 
different from the timezone that's been put in.

> I'll note that I don't actually understand the difference between 
> "abstraction layer" and "psycopg2 driver", so I may be conflating those two, 

from my POV I have always thought PostgreSQL's TIMESTAMP WITH TIMEZONE datatype 
is nuts, and that you should only be sending UTC timestamps to a database.   
But people want to use PG's type and IMO they need to understand what they're 
doing.  thanks for the response.

--


[issue43518] textwrap.shorten does not always respect word boundaries

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

Verified in 3.10.0a6 that the behavior changes at three '!'s.  I agree that it 
is a bug relative to the doc.

The issue is that 'world!!!' is 8 chars, and by default, wrap splits that into 
'w' and 'orld!!!' and adds ' w' to 'hello'.
>>> sh('hello world!!!', width=7)
['hello w', 'orld!!!']

A solution is to not break long words.
>>> sh('hello world!!!', width=7, placeholder='', break_long_words=False)
'hello'

Then

>>> sh('hello!!!', width=7, placeholder='', break_long_words=False)
''

versus

>>> sh('hello!!!', width=7, placeholder='')
'hello!!'

The docstring and doc say "enough words are dropped from the end so that the 
remaining words plus the placeholder fit within width:".  Taking this 
literally, '' is correct.  So a fix would be to add "break_long_words=False" to 
options if break_long_words not in options.
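The proposed fix can be sketched from the caller's side as a thin wrapper; `shorten_words` is a hypothetical name, not a `textwrap` function. It only changes the default of the existing `break_long_words` option, so an explicit value passed by the caller still wins.

```python
import textwrap

def shorten_words(text, width, **kwargs):
    """Hypothetical wrapper: like textwrap.shorten, but defaults to
    break_long_words=False so only whole words are kept."""
    kwargs.setdefault("break_long_words", False)
    return textwrap.shorten(text, width, **kwargs)

# Current behaviour: the long word is split and its first letter kept.
print(repr(textwrap.shorten("hello world!!!", width=7, placeholder="")))  # 'hello w'
# With the proposed default, the partial word is dropped.
print(repr(shorten_words("hello world!!!", width=7, placeholder="")))     # 'hello'
```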

Antoine, you last touched the shorten docstring.  Serhiy, you last touched its 
code.  What do you two think?

--
nosy: +pitrou, serhiy.storchaka, terry.reedy
stage:  -> needs patch
versions: +Python 3.10, Python 3.9


[issue43484] valid datetimes can become invalid if the timezone is changed

2021-03-19 Thread Eryk Sun



Eryk Sun  added the comment:

Thank you for the thoughtful and detailed answer, Paul.

--


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Inada Naoki


Inada Naoki  added the comment:

> Why is it being specified that the current LC_CTYPE encoding should be 
> ignored in Windows when a "locale" encoding is requested?

Because `encoding="locale"` must be a replacement for the current `encoding=None` 
(i.e. `locale.getpreferredencoding(False)`).

`encoding=None` behavior will change if we change the default encoding or 
enable UTF-8 mode by default, so we are adding an explicit name for the current 
behavior.

So it is not an option to assign another encoding. See PEP 597 for details.
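A minimal sketch of the equivalence described here, assuming only what is stated in this message: `encoding="locale"` names what `encoding=None` means today, namely `locale.getpreferredencoding(False)`. The helper `resolve_text_encoding` is hypothetical, and the real `open()` logic may differ between Python versions.

```python
import locale

def resolve_text_encoding(encoding=None):
    """Hypothetical helper: "locale" is an explicit spelling of the
    current encoding=None default, per the discussion above."""
    if encoding is None or encoding == "locale":
        return locale.getpreferredencoding(False)
    # Any other explicit encoding is passed through unchanged.
    return encoding

print(resolve_text_encoding("locale"))
print(resolve_text_encoding("utf-8"))  # utf-8
```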

I know you are proposing to use the CRT locale on Windows. If we change 
`locale.getpreferredencoding(False)` to use the CRT locale, `encoding="locale"` 
will follow it.
But please discuss it in another issue.

--


[issue43520] Make Fraction(string) handle non-ascii slashes

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

I agree with Raymond, at least for now.  I would expect the string argument to 
Fraction to be quoted legal Python code.  Without a lot of thought and 
discussion leading to a change in Python's design with respect to Unicode and 
operators, this limits '/' to ASCII '/'.

I believe that we accept non-ASCII digits in at least some places, but 
operators are a different case.
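If a caller does need to accept look-alike slashes, one option consistent with keeping `Fraction` unchanged is to normalize the string before parsing. This is a hypothetical user-side sketch: `parse_fraction` and the choice of U+2044 FRACTION SLASH and U+2215 DIVISION SLASH are illustrative assumptions, not part of the `fractions` module.

```python
from fractions import Fraction

# Hypothetical mapping of look-alike slashes to ASCII '/'.
_SLASHES = str.maketrans({
    "\u2044": "/",  # FRACTION SLASH
    "\u2215": "/",  # DIVISION SLASH
})

def parse_fraction(text: str) -> Fraction:
    """Normalize look-alike slashes, then parse with the stock Fraction."""
    return Fraction(text.translate(_SLASHES))

print(parse_fraction("3\u20444"))  # 3/4

try:
    Fraction("3\u20444")
except ValueError:
    print("Fraction itself rejects U+2044")
```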

--
nosy: +terry.reedy
title: Fraction only handles regular slashes ("/") and fails with other similar 
slashes -> Make Fraction(string) handle non-ascii slashes


[issue43526] Programmatic management of BytesWarning doesn't work for native triggers.

2021-03-19 Thread Terry J. Reedy



Change by Terry J. Reedy :


--
versions: +Python 3.10 -Python 3.6, Python 3.7


[issue43565] PyUnicode_KIND macro does not has specified return type

2021-03-19 Thread Max Bachmann



New submission from Max Bachmann :

The documentation states that the PyUnicode_KIND macro has the following 
interface:
- int PyUnicode_KIND(PyObject *o)
However, it actually returns a value of the underlying type of the 
PyUnicode_Kind enum, which could, for example, also be an unsigned int.

--
components: C API
messages: 389133
nosy: maxbachmann
priority: normal
severity: normal
status: open
title: PyUnicode_KIND macro does not has specified return type
type: behavior


[issue43452] Microoptimize PyType_Lookup for cache hits

2021-03-19 Thread Dino Viehland



Dino Viehland  added the comment:

Set up a micro-benchmark, foo.c:

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    wchar_t *program = Py_DecodeLocale(argv[0], NULL);
    if (program == NULL) {
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }
    Py_SetProgramName(program);  /* optional but recommended */
    Py_Initialize();
    PyObject *pName = PyUnicode_DecodeFSDefault("foo");
    if (pName == NULL) { printf("no foo\n"); PyErr_Print(); }
    PyObject *pModule = PyImport_Import(pName);
    if (pModule == NULL) { printf("no mod\n"); PyErr_Print(); return 0; }
    PyObject *cls = PyObject_GetAttrString(pModule, "C");
    if (cls == NULL) { printf("no cls\n"); PyErr_Print(); return 0; }
    /* Build the attribute names "f0".."f19" once, outside the lookup loop. */
    PyObject *fs[20];
    for (int i = 0; i < 20; i++) {
        char buf[4];
        sprintf(buf, "f%d", i);
        fs[i] = PyUnicode_DecodeFSDefault(buf);
    }
    /* Look up each name via _PyType_Lookup, which expects a PyTypeObject *. */
    for (int i = 0; i < 1; i++) {
        for (int j = 0; j < 20; j++) {
            if (_PyType_Lookup((PyTypeObject *)cls, fs[j]) == NULL) {
                printf("Uh oh\n");
            }
        }
    }

    if (Py_FinalizeEx() < 0) {
        exit(120);
    }
    PyMem_RawFree(program);
    return 0;
}


Lib/foo.py:
import time


class C:
    pass

for i in range(20):
    setattr(C, f"f{i}", lambda self: None)


obj hash: 0m6.222s
str hash: 0m6.327s
baseline: 0m6.784s

--


[issue43535] Make str.join auto-convert inputs to strings.

2021-03-19 Thread Terry J. Reedy



Terry J. Reedy  added the comment:

I am sympathetic to the 'hiding bugs' argument in general, but what bugs would 
this proposal hide?  What bugs does print hide by auto-converting non-strings 
to strings?

I recently had the same thought as Raymond's: "it would be nice if str.join 
converted inputs to strings when needed."

I have always known that print() is slower in IDLE than in a console.  A recent 
SO question 
https://stackoverflow.com/questions/66286367/why-is-my-function-faster-than-pythons-print-function-in-idle
 showed that it could be 20X slower and asked why?  It turns out that while

print(*values, sep=sep, end=end, file=file)  # is equivalent to
file.write(sep.join(map(str, values)) + end)

print must be implemented as the C equivalent of something like

first = True
for val in values:
    if first:
        first = False
    else:
        file.write(sep)
    file.write(str(val))
file.write(end)

When sys.stdout is a screen buffer, the multiple writes effectively implement a 
join.  But in IDLE, each write(s) results in a separate socket.send(s.encode()), 
a receive and .decode(), and a text.insert(s, tag).  I discovered that removing 
nearly all the overhead from the very slow example with sep.join and end.join 
made the example only trivially slower (5%) in IDLE than in the standard REPL.  In 
#43283 I added the option of speedups using .join and .format to the IDLE doc, 
but this workaround would be much more usable if map(str, x) were not needed.
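The difference described above can be observed with a small file-like object that records each `write()` call; `CountingWriter` is hypothetical, written for this illustration. In this sketch, `print()` issues one write per value, one per separator, and one for `end`, while the `join`-based form sends the same text in a single write. In IDLE, each of those writes is a separate round trip.

```python
class CountingWriter:
    """Hypothetical file-like object that records every write() call."""

    def __init__(self):
        self.writes = []

    def write(self, s):
        self.writes.append(s)
        return len(s)

values = [1, "two", 3.0]

# print() writes each value, each separator, and end separately.
w1 = CountingWriter()
print(*values, sep=", ", end="\n", file=w1)

# The join-based equivalent produces the same text in one write.
w2 = CountingWriter()
w2.write(", ".join(map(str, values)) + "\n")

print(len(w1.writes), len(w2.writes))      # 6 1
print("".join(w1.writes) == w2.writes[0])  # True
```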

--
nosy: +terry.reedy


[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

2021-03-19 Thread Eryk Sun



Eryk Sun  added the comment:

> But please discuss it in another issue.

What's returned by locale.get_locale_encoding() and 
locale.get_current_locale_encoding() is relevant to adding them as new 
functions and is a chance to implement this correctly in Windows. 

You're right that what open() does for encoding="locale" is a separate issue, 
with backwards compatibility problems. I think it was implemented badly and is 
needlessly inconsistent with POSIX. But we may be stuck with the behavior 
considering scripts are within their rights, per documented behavior, to expect 
that calling setlocale(LC_CTYPE, locale_name) in Windows has no effect on the 
result of locale.getpreferredencoding(False), unlike POSIX generally, except 
for some platforms such as macOS and Android.

--


[issue43544] mimetype default list make a wrong guess for illustrator file

2021-03-19 Thread Terry J. Reedy



Change by Terry J. Reedy :


--
versions:  -Python 3.6, Python 3.7

