[Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Ben Hoyt
Hi folks,

I was looking at some `dis` output today, and I was wondering if anyone has
investigated optimizing Python (slightly) by adding special-case bytecodes
for common expressions or statements involving constants?

For example, I (and, based on a quick grep of the stdlib, many others)
write "x is None" and "x is not None" very often. Or "return True" or
"return None" or "return 1" and things like that. These all expand into two
bytecodes, which seems pretty non-optimal (LOAD_CONST + COMPARE_OP or
LOAD_CONST + RETURN_VALUE). It seems we could get an easy speedup for these
common cases by adding a peephole optimization and some new opcodes (maybe
COMPARE_IS_SMALL_CONST and RETURN_SMALL_CONST for these cases).

I'm not proposing to do this yet, as I'd need to benchmark to see how much
of a gain (if any) it would amount to, but I'm just wondering if there's
any previous work on this kind of thing. Or, if not, any other thoughts
before I try it?

Thanks,
Ben
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Stephane Wirtel via Python-Dev

there is the Peephole for that, it's a small optimizer in the compiler.

On 05/24/2017 08:07 PM, Ben Hoyt wrote:

Hi folks,

I was looking at some `dis` output today, and I was wondering if anyone has
investigated optimizing Python (slightly) by adding special-case bytecodes
for common expressions or statements involving constants?

For example, I (and, based on a quick grep of the stdlib, many others)
write "x is None" and "x is not None" very often. Or "return True" or
"return None" or "return 1" and things like that. These all expand into two
bytecodes, which seems pretty non-optimal (LOAD_CONST + COMPARE_OP or
LOAD_CONST + RETURN_VALUE). It seems we could get an easy speedup for these
common cases by adding a peephole optimization and some new opcodes (maybe
COMPARE_IS_SMALL_CONST and RETURN_SMALL_CONST for these cases).

I'm not proposing to do this yet, as I'd need to benchmark to see how much
of a gain (if any) it would amount to, but I'm just wondering if there's
any previous work on this kind of thing. Or, if not, any other thoughts
before I try it?

Thanks,
Ben



___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/stephane%40wirtel.be


___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Xavier Morel

> On 2017-05-24, at 20:07 , Ben Hoyt  wrote:
> 
> Hi folks,
> 
> I was looking at some `dis` output today, and I was wondering if anyone has 
> investigated optimizing Python (slightly) by adding special-case bytecodes 
> for common expressions or statements involving constants?

Python 3.6 added an opcode specifically for dicts with constant keys:
https://bugs.python.org/issue27140. Though I guess that's a slightly
different case as it's not a peephole-fused opcode.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Xavier Morel

> On 2017-05-24, at 20:26 , Xavier Morel  wrote:
> 
>> On 2017-05-24, at 20:07 , Ben Hoyt  wrote:
>> 
>> Hi folks,
>> 
>> I was looking at some `dis` output today, and I was wondering if anyone has 
>> investigated optimizing Python (slightly) by adding special-case bytecodes 
>> for common expressions or statements involving constants?
> 
> Python 3.6 added an opcode specifically for dicts with constant keys:
> https://bugs.python.org/issue27140. Though I guess that's a slightly
> different case as it's not a peephole-fused opcode.

And followup, Python 2.7 did add fused opcodes related to conditional: 
http://bugs.python.org/issue4715.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Erik

Hi Ben,

On 24/05/17 19:07, Ben Hoyt wrote:
I'm not proposing to do this yet, as I'd need to benchmark to see how 
much of a gain (if any) it would amount to, but I'm just wondering if 
there's any previous work on this kind of thing. Or, if not, any other 
thoughts before I try it?


This is exactly what I looked into just over a year ago. As Stephane 
suggests, I did this by adding new opcodes that the peephole optimizer 
generated and the interpreter loop understood (but the compiler itself 
did not need to know anything about these new opcodes, so that makes 
things much easier).




Adding new opcodes like this at the time wasn't straightforward because 
of issues with the build process (see this thread: 
https://mail.python.org/pipermail/python-dev/2015-December/142600.html - 
it starts out as a question about the bytecode format but ended up with 
some very useful information on the build process).


Note that since that thread, a couple of things have changed - the 
bytecode is now wordcode so some of my original questions aren't 
relevant, and some of the things I had a problem with in the build 
system are now auto-generated with a new 'make' target. So it _should_ 
be easier now than it was then.




In terms of the results I got once I had things building and running, I 
didn't manage to find any particular magic bullets that gave me a 
significant enough speedup. Perhaps I just didn't pick the right opcode 
sequences or the right test cases (though what I was trying to do was 
quite successful in terms of doing things like replacing 
branches-to-RETURN into a single RETURN - so LOAD_CONST/RETURN_VALUE 
became RETURN_CONST and therefore if the target of an unconditional 
branch was to a RETURN_CONST op, the branch op could be replaced by the 
RETURN_CONST).


I figured that one thing every function or method needs to do is return, 
so I tried to make that more efficient. I only had two weeks to spend on 
it though ...


I was trying to do that by avoiding trips-around-the-interpreter-loop as 
that was historically something that would give speedups. However, with 
the new computed-goto version of the interpreter I came to the 
conclusion that it's not at important as it used to be. I was building 
with gcc though and what I *didn't* do was disable the computed-goto 
code (it's controlled by a #define) to see if my changes improved 
performance on platforms that can't use it.



I have other opcode sequences that I identified that might be useful to 
look at further.


I didn't (and still don't) have enough bandwidth to *drive* something 
like this through though, but if you want to do that I'd be more than 
happy to be kept in the loop on what you're doing and can possibly find 
time to write some code too.


Regards, E.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Ben Hoyt
Interesting -- thanks very much, Erik. Yeah, I wonder if the wordcode will
change things too. But interesting to know you got not much of a speedup!
Also, the "x is None" and "x is not None" were the opcodes I started out
thinking would be helpful, so I'm interested to try those.

I'm wondering if the simplest way to test would be to add the new opcodes
and handle them in ceval.c, but then do all the bytecode manipulation in a
pure Python peephole optimizer ("bytecode munger"). Then if it worked we
could put those optimizations in the compiler or peephole.c.

-Ben

On Wed, May 24, 2017 at 4:14 PM, Erik  wrote:

> Hi Ben,
>
> On 24/05/17 19:07, Ben Hoyt wrote:
>
>> I'm not proposing to do this yet, as I'd need to benchmark to see how
>> much of a gain (if any) it would amount to, but I'm just wondering if
>> there's any previous work on this kind of thing. Or, if not, any other
>> thoughts before I try it?
>>
>
> This is exactly what I looked into just over a year ago. As Stephane
> suggests, I did this by adding new opcodes that the peephole optimizer
> generated and the interpreter loop understood (but the compiler itself did
> not need to know anything about these new opcodes, so that makes things
> much easier).
>
>
>
> Adding new opcodes like this at the time wasn't straightforward because of
> issues with the build process (see this thread:
> https://mail.python.org/pipermail/python-dev/2015-December/142600.html -
> it starts out as a question about the bytecode format but ended up with
> some very useful information on the build process).
>
> Note that since that thread, a couple of things have changed - the
> bytecode is now wordcode so some of my original questions aren't relevant,
> and some of the things I had a problem with in the build system are now
> auto-generated with a new 'make' target. So it _should_ be easier now than
> it was then.
>
>
>
> In terms of the results I got once I had things building and running, I
> didn't manage to find any particular magic bullets that gave me a
> significant enough speedup. Perhaps I just didn't pick the right opcode
> sequences or the right test cases (though what I was trying to do was quite
> successful in terms of doing things like replacing branches-to-RETURN into
> a single RETURN - so LOAD_CONST/RETURN_VALUE became RETURN_CONST and
> therefore if the target of an unconditional branch was to a RETURN_CONST
> op, the branch op could be replaced by the RETURN_CONST).
>
> I figured that one thing every function or method needs to do is return,
> so I tried to make that more efficient. I only had two weeks to spend on it
> though ...
>
> I was trying to do that by avoiding trips-around-the-interpreter-loop as
> that was historically something that would give speedups. However, with the
> new computed-goto version of the interpreter I came to the conclusion that
> it's not at important as it used to be. I was building with gcc though and
> what I *didn't* do was disable the computed-goto code (it's controlled by a
> #define) to see if my changes improved performance on platforms that can't
> use it.
>
>
> I have other opcode sequences that I identified that might be useful to
> look at further.
>
> I didn't (and still don't) have enough bandwidth to *drive* something like
> this through though, but if you want to do that I'd be more than happy to
> be kept in the loop on what you're doing and can possibly find time to
> write some code too.
>
> Regards, E.
>
> ___
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/benhoyt%
> 40gmail.com
>
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Mark Shannon

On 24/05/17 11:07, Ben Hoyt wrote:

Hi folks,

I was looking at some `dis` output today, and I was wondering if anyone
has investigated optimizing Python (slightly) by adding special-case
bytecodes for common expressions or statements involving constants?


[snip]

Hi Ben,

What you are suggesting is an ad-hoc form of what is known as "static 
superinstructions".

Take a look at http://www.complang.tuwien.ac.at/projects/interpreters.html

Cheers,
Mark.

___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Erik

On 24/05/17 21:56, Ben Hoyt wrote:
But interesting to know you got not much of a 
speedup! 


I think that improvements at the hardware level in terms of 
parallelizing instruction and data fetching (and branch prediction) in 
even the cheapest processors these days have largely trivialized the 
amount of time it take the interpreter loop to read another opcode and 
branch to the code that executes it.


I think that the answer to speeding things up is better algorithms at a 
higher level rather than micro-optimizations.


But it's still fun to _try_ this sort of thing isn't it? ;)

I'm wondering if the simplest way to test would be to add the new 
opcodes and handle them in ceval.c, but then do all the bytecode 
manipulation in a pure Python peephole optimizer ("bytecode munger").


There used to be just such a thing in pure Python (but not with _new_ 
opcodes, obviously) - Skip Montanaro wrote it IIRC. Probably about 1997 
or so. I think that may be where the current peephole optimizer 
originated (the way it works is certainly similar to the Python version 
I remember experimenting with back then).


E.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 544: Protocols - second round

2017-05-24 Thread Ivan Levkivskyi
Hi all,

After collecting suggestions in the previous discussion on python-dev
https://mail.python.org/pipermail/python-dev/2017-March/thread.html#147629
and playing with implementation, here is an updated version of PEP 544.

--
Ivan


A link for those who don't like reading long e-mails:
https://www.python.org/dev/peps/pep-0544/

=

PEP: 544
Title: Protocols
Version: $Revision$
Last-Modified: $Date$
Author: Ivan Levkivskyi , Jukka Lehtosalo <
[email protected]>, Ɓukasz Langa 
Discussions-To: Python-Dev 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Mar-2017
Python-Version: 3.7


Abstract


Type hints introduced in PEP 484 can be used to specify type metadata
for static type checkers and other third party tools. However, PEP 484
only specifies the semantics of *nominal* subtyping. In this PEP we specify
static and runtime semantics of protocol classes that will provide a support
for *structural* subtyping (static duck typing).


.. _rationale:

Rationale and Goals
===

Currently, PEP 484 and the ``typing`` module [typing]_ define abstract
base classes for several common Python protocols such as ``Iterable`` and
``Sized``. The problem with them is that a class has to be explicitly marked
to support them, which is unpythonic and unlike what one would
normally do in idiomatic dynamically typed Python code. For example,
this conforms to PEP 484::

  from typing import Sized, Iterable, Iterator

  class Bucket(Sized, Iterable[int]):
  ...
  def __len__(self) -> int: ...
  def __iter__(self) -> Iterator[int]: ...

The same problem appears with user-defined ABCs: they must be explicitly
subclassed or registered. This is particularly difficult to do with library
types as the type objects may be hidden deep in the implementation
of the library. Also, extensive use of ABCs might impose additional
runtime costs.

The intention of this PEP is to solve all these problems
by allowing users to write the above code without explicit base classes in
the class definition, allowing ``Bucket`` to be implicitly considered
a subtype of both ``Sized`` and ``Iterable[int]`` by static type checkers
using structural [wiki-structural]_ subtyping::

  from typing import Iterator, Iterable

  class Bucket:
  ...
  def __len__(self) -> int: ...
  def __iter__(self) -> Iterator[int]: ...

  def collect(items: Iterable[int]) -> int: ...
  result: int = collect(Bucket())  # Passes type check

Note that ABCs in ``typing`` module already provide structural behavior
at runtime, ``isinstance(Bucket(), Iterable)`` returns ``True``.
The main goal of this proposal is to support such behavior statically.
The same functionality will be provided for user-defined protocols, as
specified below. The above code with a protocol class matches common Python
conventions much better. It is also automatically extensible and works
with additional, unrelated classes that happen to implement
the required protocol.


Nominal vs structural subtyping
---

Structural subtyping is natural for Python programmers since it matches
the runtime semantics of duck typing: an object that has certain properties
is treated independently of its actual runtime class.
However, as discussed in PEP 483, both nominal and structural
subtyping have their strengths and weaknesses. Therefore, in this PEP we
*do not propose* to replace the nominal subtyping described by PEP 484 with
structural subtyping completely. Instead, protocol classes as specified in
this PEP complement normal classes, and users are free to choose
where to apply a particular solution. See section on `rejected`_ ideas at
the
end of this PEP for additional motivation.


Non-goals
-

At runtime, protocol classes will be simple ABCs. There is no intent to
provide sophisticated runtime instance and class checks against protocol
classes. This would be difficult and error-prone and will contradict the
logic
of PEP 484. As well, following PEP 484 and PEP 526 we state that protocols
are
**completely optional**:

* No runtime semantics will be imposed for variables or parameters annotated
  with a protocol class.
* Any checks will be performed only by third-party type checkers and
  other tools.
* Programmers are free to not use them even if they use type annotations.
* There is no intent to make protocols non-optional in the future.


Existing Approaches to Structural Subtyping
===

Before describing the actual specification, we review and comment on
existing
approaches related to structural subtyping in Python and other languages:

* ``zope.interface`` [zope-interfaces]_ was one of the first widely used
  approaches to structural subtyping in Python. It is implemented by
providing
  special classes to distinguish interface classes from normal classes,
  to mark interface attributes, and to explicitly declare implementation.
  For example::


Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-24 Thread Greg Ewing

When testing things like this, as well as testing whether it speeds
up your target cases, remember to check that it doesn't slow everything
else down due to the increased size of the eval code pushing something
out of instruction cache or some such effect.

--
Greg
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com