Hi Jeroen,
> So, there's the "old style" cleandoc output generated by Python 3.12,
> a "new style" output for the same function in 3.13 (stripping more
> whitespace), and the patch then introduces a third option that
> produces the same output under both 3.12 and 3.13 but doesn't match
> the "native" output of either one.
That's roughly the story, yes, but let me be pedantic about the
wording "doesn't match". The output is indeed byte-for-byte different,
but that is not what happens when signatures are matched.
There's an extra part to the story, because the parser signature must be
both written (when the parser tables are built) and checked (at run time):
- the proposed patch outputs a version-independent signature, as you say,
- but more importantly, the proposed patch allows both "old style" and
"3.13 native" signatures to match the new, version-independent style by
normalising meaningless whitespace in both signatures. This broadens
signature matching so that bogus cache misses are avoided.
The latter property is the main point of this work as it means that the
parser signatures already in the archive continue to work.
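To illustrate why normalising *both* sides matters, here is a minimal sketch in Python. The names `normalize_signature` and `signature_matches` are hypothetical and this is not the `_normalize` function from the actual patch; it only assumes that whitespace within and around a line of a grammar docstring carries no meaning, while line breaks are kept as a conservative choice.

```python
def normalize_signature(sig: str) -> str:
    """Hypothetical helper: drop whitespace that carries no grammar meaning.

    Each line is stripped, runs of whitespace inside a line are squeezed to
    a single space, and blank lines are dropped. Line breaks are kept.
    """
    lines = []
    for line in sig.splitlines():
        words = line.split()
        if words:
            lines.append(" ".join(words))
    return "\n".join(lines)


def signature_matches(stored: str, current: str) -> bool:
    """Check-time comparison: normalise the stored and the fresh signature."""
    return normalize_signature(stored) == normalize_signature(current)


# Signature text as written by an unpatched ply under 3.12 (indent kept) ...
stored_312 = "expr : expr PLUS term\n            | term\n    "
# ... and the same grammar as seen under 3.13 (common indent stripped).
current_313 = "expr : expr PLUS term\n| term\n"

print(stored_312 == current_313)                   # False: a bogus cache miss
print(signature_matches(stored_312, current_313))  # True
```

Because the stored signature is normalised at check time as well, tables already written by an unpatched ply still match without being regenerated.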
> Short term, adding the patch eliminates the issue introduced by the
> changes to the docstring handling in 3.13. Unfortunately, it also
> seems to require keeping the patch around after the transition from
> 3.12 to 3.13 is complete, even though the problem is tied to the
> transition between these specific versions. Given that trixie is to
> ship with 3.13 as the sole supported Python version (and work on that
> seems to be progressing nicely), once 3.12 gets dropped and all
> Python packages rebuilt for 3.13-only, the performance issues would
> be history even if no action were taken on the ply end.
No, the problem is not confined to the transition period during which we
have two interpreters. There are numerous packages in the archive that
use the "old style" format and will continue to do so: many won't even
rebuild their parser signature at package build time, and so this
performance penalty never goes away. There's also no intention to
rebuild all packages with Python 3.13: the vast majority of Python
packages are arch:all and therefore go through the 3.13-only transition
without being touched.
> Looking at the cleandoc functions in inspect.py [1] and compile.c
> [2], the only documented difference between them seems to be the
> latter not removing leading and trailing empty lines.
No, that analysis is incorrect. The Python 3.13 release notes indicate
that common leading whitespace is also removed from every line of the
docstring. This has been observed in a few places and has caused other
bugs too.
https://docs.python.org/3/whatsnew/3.13.html#other-language-changes
(first item in the "Other language changes" section)
https://github.com/python/cpython/pull/106411/files
(check the use of `lstrip(" ")` in inspect.py)
This is pretty easy to see in the context of `phply.phpparse` which is
where this all started:
$ python3.11 -c "import phply.phpparse;
print(phply.phpparse.p_top_statement_namespace.__doc__)"
top_statement : NAMESPACE namespace_name SEMI
| NAMESPACE LBRACE top_statement_list RBRACE
| NAMESPACE namespace_name LBRACE
top_statement_list RBRACE
$ python3.13 -c "import phply.phpparse;
print(phply.phpparse.p_top_statement_namespace.__doc__)"
top_statement : NAMESPACE namespace_name SEMI
| NAMESPACE LBRACE top_statement_list RBRACE
| NAMESPACE namespace_name LBRACE top_statement_list RBRACE
(and no, email line-wrapping is not helping us here!)
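For illustration, the stripping visible in the 3.13 output can be approximated in pure Python. This is a sketch, not CPython's compile.c: the function name `strip_common_indent` is hypothetical, and it is modelled on `inspect.cleandoc` minus its removal of leading and trailing empty lines, which is the difference described in the quoted text. Exact behaviour on edge cases (tabs, whitespace-only lines) is an assumption.

```python
def strip_common_indent(doc: str) -> str:
    """Approximate 3.13's compile-time docstring cleaning (illustrative only)."""
    lines = doc.expandtabs().split("\n")
    # Smallest indentation among non-blank lines after the first line.
    indents = [len(line) - len(line.lstrip(" "))
               for line in lines[1:] if line.strip(" ")]
    margin = min(indents, default=0)
    # First line: drop its leading spaces; later lines: drop the common margin.
    # Unlike inspect.cleandoc, empty lines at the start and end are kept.
    cleaned = [lines[0].lstrip(" ")] + [line[margin:] for line in lines[1:]]
    return "\n".join(cleaned)


# A grammar docstring as Python 3.12 stores it (continuation indent kept).
doc_312 = (
    "top_statement : NAMESPACE namespace_name SEMI\n"
    "              | NAMESPACE LBRACE top_statement_list RBRACE\n"
    "    "
)
doc_313 = strip_common_indent(doc_312)
print(doc_313)
print(doc_312 == doc_313)  # False: same grammar, different bytes
```

A signature built by concatenating the raw docstrings therefore changes between the two interpreters even though the grammar is identical.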
> Is there any
> reason the _normalize function in your patch couldn't mimic 3.13's
> inspect.cleandoc [1], minus the removal of empty lines, to produce
> output compatible with 3.13's compile.c under both 3.12 and 3.13?
> That way, once 3.12 is dropped from the supported Python version in
> Debian, the patch in ply could be dropped as well.
No, the only way of doing that would be to vendor an awful lot of core
Python code, and that's an even worse option. There's also no way to
write a function that re-inserts the stripped whitespace, as there's no
way of knowing how much whitespace was removed. It's not possible to
unscramble an egg.
Going down that path would still require every single build-rdep of
python3-ply to be re-uploaded to rebuild its parser signature.
So, after the move to 3.13-only, we would still have the performance
problem...
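The "unscrambled egg" point can be seen with the standard library's `inspect.cleandoc`, used here as a stand-in because it removes common leading whitespace in the same way. The two docstrings below are hypothetical examples that differ only in how far the continuation line is indented, yet they clean to identical text, so no function of the cleaned text can tell which original it came from.

```python
import inspect

# Two docstrings that differ only in the indentation of the continuation line.
indented_9 = "expr : expr PLUS term\n         | term"
indented_4 = "expr : expr PLUS term\n    | term"

# cleandoc strips the common leading whitespace from lines after the first,
# so both collapse to the same cleaned text.
cleaned_9 = inspect.cleandoc(indented_9)
cleaned_4 = inspect.cleandoc(indented_4)

print(repr(cleaned_9))
print(cleaned_9 == cleaned_4)  # True: the original indentation is lost
```

Since two distinct inputs map to one output, the transformation cannot be inverted to regenerate an "old style" signature from a 3.13 docstring.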
- My preferred approach would be to patch python-ply so that only one
package needs touching and it's then fixed.
- As an alternative, are you planning to re-upload all build-rdeps of
python3-ply to rebuild the parser signatures (and check that they got
rebuilt)? There are 7 packages that include a parser using the
default name, and fortunately most of them are team-maintained if you
want to go down that route. I have no idea how many change the filename,
and those would have to be tracked down through other build-deps.
thanks
Stuart
--
Stuart Prescott http://www.nanonanonano.net/ stu...@nanonanonano.net
Debian Developer http://www.debian.org/ stu...@debian.org
GPG fingerprint 90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7