[Python-Dev] Micro-optimizations by adding special-case bytecodes?
Hi folks, I was looking at some `dis` output today, and I was wondering if anyone has investigated optimizing Python (slightly) by adding special-case bytecodes for common expressions or statements involving constants? For example, I (and, based on a quick grep of the stdlib, many others) write "x is None" and "x is not None" very often. Or "return True" or "return None" or "return 1" and things like that. These all expand into two bytecodes, which seems pretty non-optimal (LOAD_CONST + COMPARE_OP or LOAD_CONST + RETURN_VALUE). It seems we could get an easy speedup for these common cases by adding a peephole optimization and some new opcodes (maybe COMPARE_IS_SMALL_CONST and RETURN_SMALL_CONST for these cases). I'm not proposing to do this yet, as I'd need to benchmark to see how much of a gain (if any) it would amount to, but I'm just wondering if there's any previous work on this kind of thing. Or, if not, any other thoughts before I try it? Thanks, Ben ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
there is the Peephole for that, it's a small optimizer in the compiler. On 05/24/2017 08:07 PM, Ben Hoyt wrote: Hi folks, I was looking at some `dis` output today, and I was wondering if anyone has investigated optimizing Python (slightly) by adding special-case bytecodes for common expressions or statements involving constants? For example, I (and, based on a quick grep of the stdlib, many others) write "x is None" and "x is not None" very often. Or "return True" or "return None" or "return 1" and things like that. These all expand into two bytecodes, which seems pretty non-optimal (LOAD_CONST + COMPARE_OP or LOAD_CONST + RETURN_VALUE). It seems we could get an easy speedup for these common cases by adding a peephole optimization and some new opcodes (maybe COMPARE_IS_SMALL_CONST and RETURN_SMALL_CONST for these cases). I'm not proposing to do this yet, as I'd need to benchmark to see how much of a gain (if any) it would amount to, but I'm just wondering if there's any previous work on this kind of thing. Or, if not, any other thoughts before I try it? Thanks, Ben ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/stephane%40wirtel.be ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
> On 2017-05-24, at 20:07 , Ben Hoyt wrote: > > Hi folks, > > I was looking at some `dis` output today, and I was wondering if anyone has > investigated optimizing Python (slightly) by adding special-case bytecodes > for common expressions or statements involving constants? Python 3.6 added an opcode specifically for dicts with constant keys: https://bugs.python.org/issue27140. Though I guess that's a slightly different case as it's not a peephole-fused opcode. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
> On 2017-05-24, at 20:26 , Xavier Morel wrote: > >> On 2017-05-24, at 20:07 , Ben Hoyt wrote: >> >> Hi folks, >> >> I was looking at some `dis` output today, and I was wondering if anyone has >> investigated optimizing Python (slightly) by adding special-case bytecodes >> for common expressions or statements involving constants? > > Python 3.6 added an opcode specifically for dicts with constant keys: > https://bugs.python.org/issue27140. Though I guess that's a slightly > different case as it's not a peephole-fused opcode. And followup, Python 2.7 did add fused opcodes related to conditional: http://bugs.python.org/issue4715. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
Hi Ben, On 24/05/17 19:07, Ben Hoyt wrote: I'm not proposing to do this yet, as I'd need to benchmark to see how much of a gain (if any) it would amount to, but I'm just wondering if there's any previous work on this kind of thing. Or, if not, any other thoughts before I try it? This is exactly what I looked into just over a year ago. As Stephane suggests, I did this by adding new opcodes that the peephole optimizer generated and the interpreter loop understood (but the compiler itself did not need to know anything about these new opcodes, so that makes things much easier). Adding new opcodes like this at the time wasn't straightforward because of issues with the build process (see this thread: https://mail.python.org/pipermail/python-dev/2015-December/142600.html - it starts out as a question about the bytecode format but ended up with some very useful information on the build process). Note that since that thread, a couple of things have changed - the bytecode is now wordcode so some of my original questions aren't relevant, and some of the things I had a problem with in the build system are now auto-generated with a new 'make' target. So it _should_ be easier now than it was then. In terms of the results I got once I had things building and running, I didn't manage to find any particular magic bullets that gave me a significant enough speedup. Perhaps I just didn't pick the right opcode sequences or the right test cases (though what I was trying to do was quite successful in terms of doing things like replacing branches-to-RETURN into a single RETURN - so LOAD_CONST/RETURN_VALUE became RETURN_CONST and therefore if the target of an unconditional branch was to a RETURN_CONST op, the branch op could be replaced by the RETURN_CONST). I figured that one thing every function or method needs to do is return, so I tried to make that more efficient. I only had two weeks to spend on it though ... I was trying to do that by avoiding trips-around-the-interpreter-loop as that was historically something that would give speedups. However, with the new computed-goto version of the interpreter I came to the conclusion that it's not at important as it used to be. I was building with gcc though and what I *didn't* do was disable the computed-goto code (it's controlled by a #define) to see if my changes improved performance on platforms that can't use it. I have other opcode sequences that I identified that might be useful to look at further. I didn't (and still don't) have enough bandwidth to *drive* something like this through though, but if you want to do that I'd be more than happy to be kept in the loop on what you're doing and can possibly find time to write some code too. Regards, E. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
Interesting -- thanks very much, Erik. Yeah, I wonder if the wordcode will
change things too. But interesting to know you got not much of a speedup!
Also, the "x is None" and "x is not None" were the opcodes I started out
thinking would be helpful, so I'm interested to try those.
I'm wondering if the simplest way to test would be to add the new opcodes
and handle them in ceval.c, but then do all the bytecode manipulation in a
pure Python peephole optimizer ("bytecode munger"). Then if it worked we
could put those optimizations in the compiler or peephole.c.
-Ben
On Wed, May 24, 2017 at 4:14 PM, Erik wrote:
> Hi Ben,
>
> On 24/05/17 19:07, Ben Hoyt wrote:
>
>> I'm not proposing to do this yet, as I'd need to benchmark to see how
>> much of a gain (if any) it would amount to, but I'm just wondering if
>> there's any previous work on this kind of thing. Or, if not, any other
>> thoughts before I try it?
>>
>
> This is exactly what I looked into just over a year ago. As Stephane
> suggests, I did this by adding new opcodes that the peephole optimizer
> generated and the interpreter loop understood (but the compiler itself did
> not need to know anything about these new opcodes, so that makes things
> much easier).
>
>
>
> Adding new opcodes like this at the time wasn't straightforward because of
> issues with the build process (see this thread:
> https://mail.python.org/pipermail/python-dev/2015-December/142600.html -
> it starts out as a question about the bytecode format but ended up with
> some very useful information on the build process).
>
> Note that since that thread, a couple of things have changed - the
> bytecode is now wordcode so some of my original questions aren't relevant,
> and some of the things I had a problem with in the build system are now
> auto-generated with a new 'make' target. So it _should_ be easier now than
> it was then.
>
>
>
> In terms of the results I got once I had things building and running, I
> didn't manage to find any particular magic bullets that gave me a
> significant enough speedup. Perhaps I just didn't pick the right opcode
> sequences or the right test cases (though what I was trying to do was quite
> successful in terms of doing things like replacing branches-to-RETURN into
> a single RETURN - so LOAD_CONST/RETURN_VALUE became RETURN_CONST and
> therefore if the target of an unconditional branch was to a RETURN_CONST
> op, the branch op could be replaced by the RETURN_CONST).
>
> I figured that one thing every function or method needs to do is return,
> so I tried to make that more efficient. I only had two weeks to spend on it
> though ...
>
> I was trying to do that by avoiding trips-around-the-interpreter-loop as
> that was historically something that would give speedups. However, with the
> new computed-goto version of the interpreter I came to the conclusion that
> it's not at important as it used to be. I was building with gcc though and
> what I *didn't* do was disable the computed-goto code (it's controlled by a
> #define) to see if my changes improved performance on platforms that can't
> use it.
>
>
> I have other opcode sequences that I identified that might be useful to
> look at further.
>
> I didn't (and still don't) have enough bandwidth to *drive* something like
> this through though, but if you want to do that I'd be more than happy to
> be kept in the loop on what you're doing and can possibly find time to
> write some code too.
>
> Regards, E.
>
> ___
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/benhoyt%
> 40gmail.com
>
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
On 24/05/17 11:07, Ben Hoyt wrote: Hi folks, I was looking at some `dis` output today, and I was wondering if anyone has investigated optimizing Python (slightly) by adding special-case bytecodes for common expressions or statements involving constants? [snip] Hi Ben, What you are suggesting is an ad-hoc form of what is known as "static superinstructions". Take a look at http://www.complang.tuwien.ac.at/projects/interpreters.html Cheers, Mark. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
On 24/05/17 21:56, Ben Hoyt wrote:
But interesting to know you got not much of a
speedup!
I think that improvements at the hardware level in terms of
parallelizing instruction and data fetching (and branch prediction) in
even the cheapest processors these days have largely trivialized the
amount of time it take the interpreter loop to read another opcode and
branch to the code that executes it.
I think that the answer to speeding things up is better algorithms at a
higher level rather than micro-optimizations.
But it's still fun to _try_ this sort of thing isn't it? ;)
I'm wondering if the simplest way to test would be to add the new
opcodes and handle them in ceval.c, but then do all the bytecode
manipulation in a pure Python peephole optimizer ("bytecode munger").
There used to be just such a thing in pure Python (but not with _new_
opcodes, obviously) - Skip Montanaro wrote it IIRC. Probably about 1997
or so. I think that may be where the current peephole optimizer
originated (the way it works is certainly similar to the Python version
I remember experimenting with back then).
E.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP 544: Protocols - second round
Hi all, After collecting suggestions in the previous discussion on python-dev https://mail.python.org/pipermail/python-dev/2017-March/thread.html#147629 and playing with implementation, here is an updated version of PEP 544. -- Ivan A link for those who don't like reading long e-mails: https://www.python.org/dev/peps/pep-0544/ = PEP: 544 Title: Protocols Version: $Revision$ Last-Modified: $Date$ Author: Ivan Levkivskyi , Jukka Lehtosalo < [email protected]>, Ćukasz Langa Discussions-To: Python-Dev Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 05-Mar-2017 Python-Version: 3.7 Abstract Type hints introduced in PEP 484 can be used to specify type metadata for static type checkers and other third party tools. However, PEP 484 only specifies the semantics of *nominal* subtyping. In this PEP we specify static and runtime semantics of protocol classes that will provide a support for *structural* subtyping (static duck typing). .. _rationale: Rationale and Goals === Currently, PEP 484 and the ``typing`` module [typing]_ define abstract base classes for several common Python protocols such as ``Iterable`` and ``Sized``. The problem with them is that a class has to be explicitly marked to support them, which is unpythonic and unlike what one would normally do in idiomatic dynamically typed Python code. For example, this conforms to PEP 484:: from typing import Sized, Iterable, Iterator class Bucket(Sized, Iterable[int]): ... def __len__(self) -> int: ... def __iter__(self) -> Iterator[int]: ... The same problem appears with user-defined ABCs: they must be explicitly subclassed or registered. This is particularly difficult to do with library types as the type objects may be hidden deep in the implementation of the library. Also, extensive use of ABCs might impose additional runtime costs. The intention of this PEP is to solve all these problems by allowing users to write the above code without explicit base classes in the class definition, allowing ``Bucket`` to be implicitly considered a subtype of both ``Sized`` and ``Iterable[int]`` by static type checkers using structural [wiki-structural]_ subtyping:: from typing import Iterator, Iterable class Bucket: ... def __len__(self) -> int: ... def __iter__(self) -> Iterator[int]: ... def collect(items: Iterable[int]) -> int: ... result: int = collect(Bucket()) # Passes type check Note that ABCs in ``typing`` module already provide structural behavior at runtime, ``isinstance(Bucket(), Iterable)`` returns ``True``. The main goal of this proposal is to support such behavior statically. The same functionality will be provided for user-defined protocols, as specified below. The above code with a protocol class matches common Python conventions much better. It is also automatically extensible and works with additional, unrelated classes that happen to implement the required protocol. Nominal vs structural subtyping --- Structural subtyping is natural for Python programmers since it matches the runtime semantics of duck typing: an object that has certain properties is treated independently of its actual runtime class. However, as discussed in PEP 483, both nominal and structural subtyping have their strengths and weaknesses. Therefore, in this PEP we *do not propose* to replace the nominal subtyping described by PEP 484 with structural subtyping completely. Instead, protocol classes as specified in this PEP complement normal classes, and users are free to choose where to apply a particular solution. See section on `rejected`_ ideas at the end of this PEP for additional motivation. Non-goals - At runtime, protocol classes will be simple ABCs. There is no intent to provide sophisticated runtime instance and class checks against protocol classes. This would be difficult and error-prone and will contradict the logic of PEP 484. As well, following PEP 484 and PEP 526 we state that protocols are **completely optional**: * No runtime semantics will be imposed for variables or parameters annotated with a protocol class. * Any checks will be performed only by third-party type checkers and other tools. * Programmers are free to not use them even if they use type annotations. * There is no intent to make protocols non-optional in the future. Existing Approaches to Structural Subtyping === Before describing the actual specification, we review and comment on existing approaches related to structural subtyping in Python and other languages: * ``zope.interface`` [zope-interfaces]_ was one of the first widely used approaches to structural subtyping in Python. It is implemented by providing special classes to distinguish interface classes from normal classes, to mark interface attributes, and to explicitly declare implementation. For example::
Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?
When testing things like this, as well as testing whether it speeds up your target cases, remember to check that it doesn't slow everything else down due to the increased size of the eval code pushing something out of instruction cache or some such effect. -- Greg ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
