New submission from Pablo Galindo Salgado <pablog...@gmail.com>:

After https://bugs.python.org/issue42093 and https://bugs.python.org/issue26219, 
it is clear that we can leverage caches for different kinds of information in 
the evaluation loop to speed up CPython. This observation is also based on the 
fact that although Python is dynamic, there is plenty of code that does not 
exercise said dynamism, so factoring out the "dynamic" parts of the execution 
with a caching mechanism can yield excellent results. 

So far we have two big performance improvements from caching for LOAD_ATTR and 
LOAD_GLOBAL (up to 10-14% in some cases), but I think we can do much more. 
Here are some observations on what I think we can do:

* Instead of adding more caches using the current mechanism, which inlines some 
code into every affected opcode in the evaluation loop, we could formalize a 
caching mechanism with a better API that makes it easy to add more opcodes in 
the future. Having the code inline in ceval.c is going to become difficult to 
maintain if we keep adding more stuff directly there.
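
A rough sketch of what such an API could look like (every name here is 
hypothetical, invented for illustration; nothing below is existing CPython 
API). The idea is one cache slot per specializable instruction, keyed by its 
bytecode offset and guarded by a version tag, so the fast path is a single 
comparison instead of a full dynamic lookup:

```python
# Hypothetical sketch of a per-instruction cache API; none of these names
# exist in CPython today. Each specializable instruction gets a cache slot
# keyed by its bytecode offset and guarded by a version tag.

class SpecializationCache:
    def __init__(self):
        # bytecode offset -> (version tag, cached value)
        self._entries = {}

    def lookup(self, offset, version):
        """Return the cached value if still valid for `version`, else None."""
        entry = self._entries.get(offset)
        if entry is not None and entry[0] == version:
            return entry[1]          # cache hit: skip the generic lookup
        return None

    def store(self, offset, version, value):
        """Record the result of a generic (slow-path) lookup."""
        self._entries[offset] = (version, value)

    def invalidate(self, offset):
        """Deoptimize: forget the cached value for one instruction."""
        self._entries.pop(offset, None)
```

An opcode implementation would then call lookup() first and fall back to its 
generic path plus store() on a miss, keeping the caching logic out of the body 
of each opcode in ceval.c.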

* Instead of handling the specialization in the same opcode as the original 
one (LOAD_ATTR currently implements both the slow and the fast path), we could 
mutate the original code object, replacing the slow, generic opcodes with more 
specialized ones; the specialized opcodes would also be in charge of changing 
themselves back to the generic, slow ones if the assumptions that activated 
them no longer hold.
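
A toy Python model of that mutate-and-fall-back idea (the opcode names and the 
helper below are invented for illustration; real specialization would happen 
in C on the code object's bytecode):

```python
# Toy model of in-place opcode specialization; the opcode names and this
# helper are made up. The generic opcode rewrites itself into a specialized
# form after running once, and the specialized form rewrites itself back
# when the assumption it was built on stops holding.

def exec_load_attr(code, caches, i, obj):
    op, name = code[i]
    if op == "LOAD_ATTR_SPECIALIZED":
        cached_type, getter = caches[i]
        if type(obj) is cached_type:
            return getter(obj)                 # fast path: assumption holds
        code[i] = ("LOAD_ATTR", name)          # assumption broken: deoptimize
        del caches[i]
    value = getattr(obj, name)                 # generic slow path
    code[i] = ("LOAD_ATTR_SPECIALIZED", name)  # specialize for this type
    caches[i] = (type(obj), lambda o: getattr(o, name))
    return value
```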

Obviously, mutating code objects is scary, so we could instead keep a 
"specialized" version of the bytecode in the cache and use that if it is 
present. Ideas for what we could do with this cached bytecode:

- For binary operators, we can grab both operands, resolve the operator 
function (e.g. the addition function for BINARY_ADD) and cache it together 
with the operand types and their version tags; if the types and version tags 
match on the next execution, we can call the cached function directly instead 
of resolving it again.

- For loading methods, we could cache the bound method, as originally proposed 
by Yury here: 
https://mail.python.org/pipermail/python-dev/2016-January/142945.html.

- We could also do the same for subscript operations like some_container[index] 
when the container is a builtin: substitute/specialize the generic 
BINARY_SUBSCR opcode with one that uses the built-in operation directly.
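
Sketched in Python below (the dispatch table and function are hypothetical; in 
CPython the specialized opcode would call the concrete C-level implementation 
directly, with list.__getitem__/dict.__getitem__ standing in for that here):

```python
# Hypothetical BINARY_SUBSCR specialization: for exact builtin container
# types, call the concrete __getitem__ directly instead of going through
# the generic dispatch.

_SUBSCR_FAST = {list: list.__getitem__, dict: dict.__getitem__}

def binary_subscr(container, index):
    fast = _SUBSCR_FAST.get(type(container))   # exact type match only
    if fast is not None:
        return fast(container, index)          # specialized path
    return container[index]                    # generic fallback (subclasses etc.)
```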

The plan would be:

- Build some infrastructure/framework for the caching that allows us to 
optimize/deoptimize individual opcodes.
- Refactor the existing specialization of LOAD_GLOBAL/LOAD_ATTR to use said 
infrastructure.
- Think about which operations could benefit from specialization and start 
adding them one by one.

----------
components: C API
messages: 379272
nosy: Mark.Shannon, methane, nascheme, pablogsal, vstinner, yselivanov
priority: normal
severity: normal
status: open
title: Caching infrastructure for the evaluation loop: specialised opcodes
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42115>
_______________________________________