Re: [Python-Dev] C99
Ned Deily writes: > But the point I was trying to make is that, by changing the > language requirement, we will likely have an effect on what > platforms (across the board) and versions we support and we should > take that into account when making this decision. It may be the > right decision, in balance, to drop support for some of these but > we shouldn't do it by accident. Sure, you were clear enough about that. My point was simply that at least for older Macs it probably is not that big a problem (but I do have a Panther still running, at least as of March it did ;-). Similarly, for platforms where we build with GCC, many of these features have been available for a long time with switches. Adding it all up, we don't want to break anybody inadvertently and we should take care to fix what breakage we can in advance, but I think it's time to allow at least some of these features, and maybe move to C99 wholesale. Steve ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Review request: issue 10910, pyport.h causes trouble for C++ extensions on BSDs
Hi python-dev! I'm a maintainer for Homebrew, a third-party package manager for OS X, where I'm the resident parseltongue. Issue 10910 is related to problems building CPython extension modules with C++ code on BSDs. As I understand it, pyport.h has some code specific to BSD and friends, including OS X, which causes conflicts with the C++ stdlib. We've been carrying the patch Ronald Oussoren wrote in 2011 [2] against Python 2.7 since olden times. We were recently prompted to add the patch to our 3.5 package as well [3] because the bug was causing build problems in the wild. [4] We strive to apply as few patches as possible in Homebrew and we (I) would love to see a fix for this deployed upstream. Can I do anything to help code get checked in? Thanks, Tim [1] https://bugs.python.org/issue10910 [2] https://bugs.python.org/issue10910#msg135414 [3] https://github.com/Homebrew/homebrew-core/pull/3396 [4] https://github.com/IntelPNI/brainiak/pull/82 -- Tim Smith Freenode: tdsmith, #machomebrew https://tds.xyz, https://github.com/tdsmith ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [python-committers] Failed to build select
On Aug 8, 2016, at 02:45, Steven D'Aprano wrote: > On Mon, Aug 08, 2016 at 12:17:21AM -0400, Ned Deily wrote: >> Also, try without setting PYTHONHOME. I'm not sure what you're trying to do >> by setting that but you shouldn't need to. > I didn't think I needed to either, but when I try without it, I get: > > Could not find platform dependent libraries > Consider setting $PYTHONHOME to [:] > Could not find platform dependent libraries > Consider setting $PYTHONHOME to [:] On Aug 8, 2016, at 03:25, Chris Jerdonek wrote: > FWIW, I would be interested in learning more about the above warning > (its significance, how it can be addressed, whether it can be ignored, > etc). I also get this message when installing 3.5.2 from source on > Ubuntu 14.04. Those messages are harmless and are generated by the Makefile steps that update Importlib's bootstrap files, Python/importlib.h and Python/importlib_external.h. See http://bugs.python.org/issue14928 for the origins of this. It should be possible to fix the Makefile to suppress those messages. I suggest you open an issue about it. -- Ned Deily [email protected] -- [] ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Review request: issue 10910, pyport.h causes trouble for C++ extensions on BSDs
On Mon, 8 Aug 2016 at 08:10 tdsmith wrote: > Hi python-dev! I'm a maintainer for Homebrew, a third-party package > manager for OS X, where I'm the resident parseltongue. > > Issue 10910 is related to problems building CPython extension modules > with C++ code on BSDs. As I understand it, pyport.h has some code > specific to BSD and friends, including OS X, which causes conflicts > with the C++ stdlib. > > We've been carrying the patch Ronald Oussoren wrote in 2011 [2] > against Python 2.7 since olden times. We were recently prompted to add > the patch to our 3.5 package as well [3] because the bug was causing > build problems in the wild. [4] > > We strive to apply as few patches as possible in Homebrew and we (I) > would love to see a fix for this deployed upstream. Can I do anything > to help code get checked in? > The trick is someone feeling up to the task of knowing enough C, C++, and what's happening on OS X/BSD to validate the patch and apply it. Usually that's Ronald or Ned and Ronald never applied his patch, so I guess that leaves Ned. :) If Ned doesn't have the time to look then just ping the issue in a week and I will apply it since both you and FreeBSD are already carrying the patch forward. ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [python-committers] Failed to build select
On Mon, Aug 8, 2016 at 8:59 AM, Ned Deily wrote: > On Aug 8, 2016, at 02:45, Steven D'Aprano wrote: >> >> Could not find platform dependent libraries >> Consider setting $PYTHONHOME to [:] >> Could not find platform dependent libraries >> Consider setting $PYTHONHOME to [:] > > On Aug 8, 2016, at 03:25, Chris Jerdonek wrote: >> FWIW, I would be interested in learning more about the above warning >> (its significance, how it can be addressed, whether it can be ignored, >> etc). I also get this message when installing 3.5.2 from source on >> Ubuntu 14.04. > > Those messages are harmless and are generated by the Makefile steps that > update Importlib's bootstrap files, Python/importlib.h and > Python/importlib_external.h. See http://bugs.python.org/issue14928 for the > origins of this. It should be possible to fix the Makefile to suppress those > messages. I suggest you open an issue about it. I created an issue for this here: http://bugs.python.org/issue27713 --Chris ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Rewrite @contextlib.contextmanager in C
import timeit
import contextlib

@contextlib.contextmanager
def ctx1():
    yield

class ctx2:
    def __enter__(self):
        pass
    def __exit__(self, *args):
        pass

t1 = timeit.timeit("with ctx1(): pass", setup="from __main__ import ctx1")
t2 = timeit.timeit("with ctx2(): pass", setup="from __main__ import ctx2")
print("%.3f secs" % t1)
print("%.3f secs" % t2)
print("slowdown: -%.2fx" % (t1 / t2))
...with Python 3.5:
1.938 secs
0.443 secs
slowdown: -4.37x
I wanted to give it a try rewriting this in C but since @contextmanager has
a lot of magic I wanted to ask first whether this 1) is technically
possible 2) is desirable.
Thoughts?
--
Giampaolo - http://grodola.blogspot.com
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rewrite @contextlib.contextmanager in C
On 2016-08-08 3:33 PM, Giampaolo Rodola' wrote: I wanted to give it a try rewriting this in C but since @contextmanager has a lot of magic I wanted to ask first whether this 1) is technically possible 2) is desirable. It is definitely technologically possible. However, the C implementation will be quite complex, and will require a lot of time to review and later maintain. My question would be how critical is the performance of @contextmanager? I'd say that unless it's used in a tight loop it can't affect the performance too much. Yury ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rewrite @contextlib.contextmanager in C
I think Nick would be interested in understanding why this is the case. What does the decorator do that could be so expensive? On Mon, Aug 8, 2016 at 1:07 PM, Yury Selivanov wrote: > > > On 2016-08-08 3:33 PM, Giampaolo Rodola' wrote: > >> I wanted to give it a try rewriting this in C but since @contextmanager >> has a lot of magic I wanted to ask first whether this 1) is technically >> possible 2) is desirable. >> > > It is definitely technologically possible. However, the C implementation > will be quite complex, and will require a lot of time to review and later > maintain. My question would be how critical is the performance of > @contextmanager? I'd say that unless it's used in a tight loop it can't > affect the performance too much. > > Yury > ___ > Python-Dev mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido% > 40python.org > -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rewrite @contextlib.contextmanager in C
On Mon, Aug 8, 2016 at 10:07 PM, Yury Selivanov wrote: > > > On 2016-08-08 3:33 PM, Giampaolo Rodola' wrote: > >> I wanted to give it a try rewriting this in C but since @contextmanager >> has a lot of magic I wanted to ask first whether this 1) is technically >> possible 2) is desirable. >> > > It is definitely technologically possible. However, the C implementation > will be quite complex, and will require a lot of time to review and later > maintain. My question would be how critical is the performance of > @contextmanager? I'd say that unless it's used in a tight loop it can't > affect the performance too much. > Yeah, the (my) use case is exactly that (tight loops). -- Giampaolo - http://grodola.blogspot.com ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rewrite @contextlib.contextmanager in C
On 2016-08-08 4:18 PM, Guido van Rossum wrote: I think Nick would be interested in understanding why this is the case. What does the decorator do that could be so expensive? From the looks of it, it doesn't do anything special. Although with @contextlib.contextmanager we have to instantiate a generator (the decorated one) and advance it in __enter__. So it's an extra object instantiation + extra code in __enter__ and __exit__. Anyways, Nick knows much more about that code. Giampaolo, before experimenting with a C implementation, I suggest you try compiling contextlib.py with Cython. I'll be surprised if you can make it more than 30-40% faster. And you won't get much faster than Cython when you code contextmanager in C by hand. Also, we don't have slots for __enter__ and __exit__, so there is no way to avoid the attribute lookup. Yury ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rewrite @contextlib.contextmanager in C
On 8/8/2016 22:38, Yury Selivanov wrote:
> On 2016-08-08 4:18 PM, Guido van Rossum wrote:
>> I think Nick would be interested in understanding why this is the case.
>> What does the decorator do that could be so expensive?
> From the looks of it it doesn't do anything special. Although with
> @contextlib.contextmanager we have to instantiate a generator (the
> decorated one) and advance it in __enter__. So it's an extra object
> instantiation + extra code in __enter__ and __exit__. Anyways, Nick
> knows much more about that code.

Right, I think a fairer comparison would be to:

class ctx2:
    def __enter__(self):
        self.it = iter(self)
        return next(self.it)

    def __exit__(self, *args):
        try:
            next(self.it)
        except StopIteration:
            pass

    def __iter__(self):
        yield

With this change alone the slowdown diminishes to ~1.7x for me. The rest is probably the extra overhead for being able to pass exceptions raised inside the with block back into the generator and such.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Rewrite @contextlib.contextmanager in C
On Tue, Aug 9, 2016 at 7:14 AM, Wolfgang Maier
wrote:
> Right, I think a fairer comparison would be to:
>
> class ctx2:
>     def __enter__(self):
>         self.it = iter(self)
>         return next(self.it)
>
>     def __exit__(self, *args):
>         try:
>             next(self.it)
>         except StopIteration:
>             pass
>
>     def __iter__(self):
>         yield
>
> With this change alone the slowdown diminishes to ~ 1.7x for me. The rest is
> probably the extra overhead for being able to pass exceptions raised inside
> the with block back into the generator and such.
I played around with a few other variants to see where the slowdown
is. They all work out pretty much the same as the above; my two
examples are both used the same way as contextlib.contextmanager is,
but are restrictive on what you can do.
import timeit
import contextlib
import functools
class ctx1:
    def __enter__(self):
        pass
    def __exit__(self, *args):
        pass

@contextlib.contextmanager
def ctx2():
    yield
class SimplerContextManager:
    """Like contextlib._GeneratorContextManager but way simpler.

    * Doesn't reinstantiate itself - just reinvokes the generator
    * Doesn't allow yielded objects (returns self)
    * Lacks a lot of error checking. USE ONLY AS DIRECTED.
    """
    def __init__(self, func):
        self.func = func
        functools.update_wrapper(self, func)

    def __call__(self, *a, **kw):
        self.gen = self.func(*a, **kw)
        return self

    def __enter__(self):
        next(self.gen)
        return self

    def __exit__(self, type, value, traceback):
        if type is None:
            try: next(self.gen)
            except StopIteration: return
            else: raise RuntimeError("generator didn't stop")
        try: self.gen.throw(type, value, traceback)
        except StopIteration: return True
        # Assume any instance of the same exception type is a proper reraise.
        # This is way simpler than contextmanager normally does, and costs us
        # the ability to detect exception handlers that coincidentally raise
        # the same type of error (eg "except ZeroDivisionError: print(1/0)").
        except type: return False
# Check that it actually behaves correctly
@SimplerContextManager
def ctxdemo():
    print("Before yield")
    try:
        yield 123
    except ZeroDivisionError:
        print("Absorbing 1/0")
        return
    finally:
        print("Finalizing")
    print("After yield (no exception)")

with ctxdemo() as val:
    print("1/0 =", 1/0)
with ctxdemo() as val:
    print("1/1 =", 1/1)
#with ctxdemo() as val:
#    print("1/q =", 1/q)

@SimplerContextManager
def ctx3():
    yield
class TooSimpleContextManager:
    """Now this time you've gone too far."""
    def __init__(self, func):
        self.func = func

    def __call__(self):
        self.gen = self.func()
        return self

    def __enter__(self):
        next(self.gen)

    def __exit__(self, type, value, traceback):
        try: next(self.gen)
        except StopIteration: pass

@TooSimpleContextManager
def ctx4():
    yield
class ctx5:
    def __enter__(self):
        self.it = iter(self)
        return next(self.it)

    def __exit__(self, *args):
        try:
            next(self.it)
        except StopIteration:
            pass

    def __iter__(self):
        yield
t1 = timeit.timeit("with ctx1(): pass", setup="from __main__ import ctx1")
print("%.3f secs" % t1)
for i in range(2, 6):
    t2 = timeit.timeit("with ctx2(): pass",
                       setup="from __main__ import ctx%d as ctx2" % i)
    print("%.3f secs" % t2)
    print("slowdown: -%.2fx" % (t2 / t1))
My numbers are:
0.320 secs
1.354 secs
slowdown: -4.23x
0.899 secs
slowdown: -2.81x
0.831 secs
slowdown: -2.60x
0.868 secs
slowdown: -2.71x
So compared to the tight custom-written context manager class, all the
"pass it a generator function" varieties look pretty much the same.
The existing contextmanager factory has several levels of indirection,
and that's probably where the rest of the performance difference comes
from, but there is some cost to the simplicity of the gen-func
approach.
My guess is that a C-implemented version could replace piles of
error-handling code with simple pointer comparisons (hence my
elimination of it), and may or may not be able to remove some of the
indirection. I'd say it'd land about in the same range as the other
examples here. Is that worth it?
ChrisA
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] New calling convention to avoid temporary tuples when calling functions
Hi,

tl;dr I found a way to make CPython 3.6 faster and I validated that there is no performance regression. I'm requesting approval of core developers to start pushing changes.

In 2014, during a lunch at PyCon, Larry Hastings told me that he would like to get rid of temporary tuples to call functions in Python. In Python, positional arguments are passed as a tuple to C functions: "PyObject *args". Larry wrote Argument Clinic, which gives more control over how C functions are called. But I guess that Larry didn't have time to finish his implementation, since he didn't publish a patch.

While trying to optimize CPython 3.6, I wrote a proof-of-concept patch and results were promising:
https://bugs.python.org/issue26814#msg264003
https://bugs.python.org/issue26814#msg266359

C functions get a C array "PyObject **args, int nargs". Getting the nth argument becomes "arg = args[n];" at the C level. This format is not new; it's already used internally in Python/ceval.c. A Python function call made from a Python function already avoids a temporary tuple in most cases: we pass the stack of the first function as the list of arguments to the second function. My patch generalizes the idea to C functions. It works in all directions (C=>Python, Python=>C, C=>C, etc.).

Many function calls become not only faster than Python 3.5 with my full patch, but even faster than Python 2.7! For multiple reasons (not interesting here), tested functions are slower in Python 3.4 than Python 2.7. Python 3.5 is better than Python 3.4, but still slower than Python 2.7 in a few cases. Using my "FASTCALL" patch, all tested function calls become faster than or as fast as Python 2.7!

But when I ran the CPython benchmark suite, I found some major performance regressions. In fact, it took me 3 months to understand that I didn't run benchmarks correctly and that most benchmarks of the CPython benchmark suite are very unstable.
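The convention Victor describes (the callee receives a C array "PyObject **args, int nargs" instead of a freshly built tuple) can be illustrated with a toy, self-contained C sketch. `Obj`, `sum_args`, and the two `call_*` helpers below are hypothetical stand-ins for illustration only, not CPython's actual types or functions:

```c
#include <stdlib.h>

/* Toy stand-in for PyObject. */
typedef struct { long value; } Obj;

/* The callee: with an array convention, fetching the nth
 * argument is just "args[i]" -- no tuple unpacking. */
static long sum_args(Obj **args, int nargs) {
    long total = 0;
    for (int i = 0; i < nargs; i++)
        total += args[i]->value;
    return total;
}

/* Tuple-style call: the caller copies its evaluation stack into a
 * temporary heap-allocated pack (standing in for the argument tuple)... */
static long call_tuple_style(Obj **stack, int n) {
    Obj **packed = malloc(n * sizeof(Obj *));
    for (int i = 0; i < n; i++)
        packed[i] = stack[i];
    long result = sum_args(packed, n);
    free(packed);  /* ...and destroys it after the call. */
    return result;
}

/* FASTCALL-style call: the caller's stack is passed directly. */
static long call_fastcall_style(Obj **stack, int n) {
    return sum_args(stack, n);  /* no allocation, no copy, no free */
}
```

The fastcall-style caller skips the allocate/copy/free cycle entirely; in real CPython the work saved per call is the creation and destruction of an actual tuple object.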
I wrote articles explaining how benchmarks should be run (to be stable) and I patched all benchmarks to use my new perf module, which runs benchmarks in multiple processes and computes the average (to make benchmarks more stable). In the end, my minimum FASTCALL patch (issue #27128) doesn't show any major performance regression if you run benchmarks "correctly" :-)
https://bugs.python.org/issue27128#msg272197

Most benchmarks are not significant, 14 are faster, and only 4 are slower. According to benchmarks on the "full" FASTCALL patch, the slowdowns are temporary and should quickly turn into speedups (with further changes).

My question is now: can I push fastcall-2.patch of issue #27128? This patch only adds the infrastructure to start working on more useful optimizations; more patches will come, and I expect more exciting benchmark results.

Overview of the initial FASTCALL patch; see my first message on the issue:
https://bugs.python.org/issue27128#msg266422

--

Note: My full FASTCALL patch changes the C API; this is out of the scope of my first simple FASTCALL patch. I will open a new discussion to decide if it's worth it and, if yes, how it should be done.

Victor
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New calling convention to avoid temporary tuples when calling functions
On Mon, Aug 8, 2016 at 3:25 PM, Victor Stinner wrote: > tl;dr I found a way to make CPython 3.6 faster and I validated that > there is no performance regression. But is there a performance improvement? > I'm requesting approval of core > developers to start pushing changes. > > In 2014 during a lunch at Pycon, Larry Hasting told me that he would > like to get rid of temporary tuples to call functions in Python. In > Python, positional arguments are passed as a tuple to C functions: > "PyObject *args". Larry wrote Argument Clinic which gives more control > on how C functions are called. But I guess that Larry didn't have time > to finish his implementation, since he didn't publish a patch. > Hm, I agree that those tuples are probably expensive. I recall that IronPython boasted faster Python calls by doing something closer to the platform (in their case I'm guessing C# or the CLR :-). Is this perhaps something that could wait until the Core devs sprint in a few weeks? (I presume you're coming?!) -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New calling convention to avoid temporary tuples when calling functions
2016-08-09 0:40 GMT+02:00 Guido van Rossum :
>> tl;dr I found a way to make CPython 3.6 faster and I validated that
>> there is no performance regression.
>
> But is there a performance improvement?

Sure. On micro-benchmarks, you can see nice improvements:

* getattr(1, "real") becomes 44% faster
* list(filter(lambda x: x, list(range(1000)))) becomes 31% faster
* namedtuple.attr becomes 23% faster
* etc.

See https://bugs.python.org/issue26814#msg263999 for default => patch, or https://bugs.python.org/issue26814#msg264003 for a comparison of Python 2.7 / 3.4 / 3.5 / 3.6 / 3.6 patched.

On the CPython benchmark suite, I also saw many faster benchmarks. Faster (25):

- pickle_list: 1.29x faster
- etree_generate: 1.22x faster
- pickle_dict: 1.19x faster
- etree_process: 1.16x faster
- mako_v2: 1.13x faster
- telco: 1.09x faster
- raytrace: 1.08x faster
- etree_iterparse: 1.08x faster
(...)

See https://bugs.python.org/issue26814#msg266359

Victor
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New calling convention to avoid temporary tuples when calling functions
2016-08-09 0:40 GMT+02:00 Guido van Rossum :
> Hm, I agree that those tuples are probably expensive. I recall that
> IronPython boasted faster Python calls by doing something closer to the
> platform (in their case I'm guessing C# or the CLR :-).

To be honest, I didn't expect *any* speedup just by avoiding the temporary tuples. The C structure of tuples is simple and the allocation of tuples is already optimized by a free list. I still don't understand how the cost of tuple creation/destruction can have such a "large" impact on performance.

The discussion with Larry was not really my first motivation to work on FASTCALL. I worked on this topic because CPython already uses some "hidden" tuples to avoid the cost of tuple creation/destruction in various places, but using really ugly code, and this ugly code caused crashes and surprising behaviours... https://bugs.python.org/issue26811 is a recent crash related to the property_descr_get() optimization, whereas the optimization was already "fixed" once: https://hg.python.org/cpython/rev/5dbf3d932a59/

The fix is just another hack on top of the existing hack. Issue #26811 rewrote the optimization to avoid the crash using _PyObject_GC_UNTRACK(): https://hg.python.org/cpython/rev/a98ef122d73d

I tried to make this "optimization" the standard way to call functions, rather than a corner case, and to avoid hacks like PyTuple_SET_ITEM(args, 0, NULL) or _PyObject_GC_UNTRACK().

Victor
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New calling convention to avoid temporary tuples when calling functions
I just wanted to say I'm excited about this and I'm glad someone is taking advantage of what Argument Clinic allows for and what I know Larry had initially hoped AC would make happen! I should also point out that Serhiy has a patch for faster keyword argument parsing thanks to AC: http://bugs.python.org/issue27574 . Not sure how your two patches would intertwine (if at all). On Mon, 8 Aug 2016 at 15:26 Victor Stinner wrote: > Hi, > > tl;dr I found a way to make CPython 3.6 faster and I validated that > there is no performance regression. I'm requesting approval of core > developers to start pushing changes. > > In 2014 during a lunch at Pycon, Larry Hasting told me that he would > like to get rid of temporary tuples to call functions in Python. In > Python, positional arguments are passed as a tuple to C functions: > "PyObject *args". Larry wrote Argument Clinic which gives more control > on how C functions are called. But I guess that Larry didn't have time > to finish his implementation, since he didn't publish a patch. > > While trying to optimize CPython 3.6, I wrote a proof-of-concept patch > and results were promising: > https://bugs.python.org/issue26814#msg264003 > https://bugs.python.org/issue26814#msg266359 > > C functions get a C array "PyObject **args, int nargs". Getting the > nth argument become "arg = args[n];" at the C level. This format is > not new, it's already used internally in Python/ceval.c. A Python > function call made from a Python function already avoids a temporary > tuple in most cases: we pass the stack of the first function as the > list of arguments to the second function. My patch generalizes the > idea to C functions. It works in all directions (C=>Python, Python=>C, > C=>C, etc.). > > Many function calls become faster than Python 3.5 with my full patch, > but even faster than Python 2.7! For multiple reasons (not interesting > here), tested functions are slower in Python 3.4 than Python 2.7. 
> Python 3.5 is better than Python 3.4, but still slower than Python 2.7 > in a few cases. Using my "FASTCALL" patch, all tested function calls > become faster or as fast as Python 2.7! > > But when I ran the CPython benchmark suite, I found some major > performance regressions. In fact, it took me 3 months to understand > that I didn't run benchmarks correctly and that most benchmarks of the > CPython benchmark suite are very unstable. I wrote articles explaining > how benchmarks should be run (to be stable) and I patched all > benchmarks to use my new perf module which runs benchmarks in multiple > processes and computes the average (to make benchmarks more stable). > > At the end, my minimum FASTCALL patch (issue #27128) doesn't show any > major performance regression if you run "correctly" benchmarks :-) > https://bugs.python.org/issue27128#msg272197 > > Most benchmarks are not significant, 14 are faster, and only 4 are slower. > > According to benchmarks on the "full" FASTCALL patch, the slowdown are > temporary and should become quickly speedup (with further changes). > > My question is now: can I push fastcall-2.patch of the issue #27128? > This patch only adds the infrastructure to start working on more > useful optimizations, more patches will come, I expect more exciting > benchmark results. > > Overview of the initial FASTCALL patch, see my first message on the issue: > https://bugs.python.org/issue27128#msg266422 > > -- > > Note: My full FASTCALL patch changes the C API: this is out of the > scope of my first simple FASTCALL patch. I will open a new discussion > to decide if it's worth it and if yes, how it should be done. 
> > Victor > ___ > Python-Dev mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New calling convention to avoid temporary tuples when calling functions
2016-08-09 1:36 GMT+02:00 Brett Cannon :
> I just wanted to say I'm excited about this and I'm glad someone is taking
> advantage of what Argument Clinic allows for and what I know Larry had
> initially hoped AC would make happen!

To make "Python" faster, not only a few specific functions, "all" C code should be updated to use the new "FASTCALL" calling convention. But it's a pain to have to rewrite the code parsing arguments, and we all hate having to put #ifdef in the code (for backward compatibility). This is where the magic happens: if your code is written using Argument Clinic, you get the optimization (FASTCALL) for free: just run Argument Clinic again to regenerate the updated calling convention. It can be a very good motivation to rewrite your code using Argument Clinic: get better inline documentation (docstring, help(func) in the REPL) *and* performance ;-)

> I should also point out that Serhiy has a patch for faster keyword argument
> parsing thanks to AC: http://bugs.python.org/issue27574 . Not sure how your
> two patches would intertwine (if at all).

In a first implementation, I packed *all* arguments in the same C array: positional and keyword arguments. The problem is that all functions expect a dict to parse keyword arguments. A dict has an important property: O(1) lookup. Lookup becomes O(n) if you pass keyword arguments as a list of (key, value) tuples in a C array. So I chose not to touch keyword arguments at all: they continue to be passed as a dict. By the way, it's very rare to call a function using keyword arguments from C.

--

About http://bugs.python.org/issue27574 : it's really nice to see work done on this part! I recall a discussion of the performance of an operator versus a function call. In some cases, the overhead of "parsing" arguments is higher than the cost of the same feature implemented as an operator!
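On the keyword-argument point made earlier in this message: a dict keeps lookup O(1), while a flat C array of (key, value) pairs forces a linear scan. A toy sketch of such a flat layout (the names and layout here are hypothetical, not CPython code):

```c
#include <string.h>

/* Hypothetical flat keyword-argument layout: parallel arrays of
 * names and values. Every lookup is O(n): up to one strcmp per
 * stored keyword, instead of a dict's O(1) hashed lookup. */
static long kw_lookup_flat(const char **names, const long *values,
                           int n, const char *key)
{
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], key) == 0)
            return values[i];
    return -1;  /* "not found" sentinel for this toy */
}
```

Each lookup costs up to n string comparisons, which is why the patch keeps passing keyword arguments as a dict.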
Hum, it was probably this issue: https://bugs.python.org/issue17170 Extract of the issue: """ Some crude C benchmarking on this computer: - calling PyUnicode_Replace is 35 ns (per call) - calling "hundred".replace is 125 ns - calling PyArg_ParseTuple with the same signature as "hundred".replace is 80 ns """ Victor ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] New calling convention to avoid temporary tuples when calling functions
On 2016-08-08 6:53 PM, Victor Stinner wrote: 2016-08-09 0:40 GMT+02:00 Guido van Rossum : tl;dr I found a way to make CPython 3.6 faster and I validated that there is no performance regression. But is there a performance improvement? Sure. On micro-benchmarks, you can see nice improvements: * getattr(1, "real") becomes 44% faster * list(filter(lambda x: x, list(range(1000)))) becomes 31% faster * namedtuple.attr becomes 23% faster * etc. See https://bugs.python.org/issue26814#msg263999 for default => patch, or https://bugs.python.org/issue26814#msg264003 for comparison python 2.7 / 3.4 / 3.5 / 3.6 / 3.6 patched. On the CPython benchmark suite, I also saw many faster benchmarks: Faster (25): - pickle_list: 1.29x faster - etree_generate: 1.22x faster - pickle_dict: 1.19x faster - etree_process: 1.16x faster - mako_v2: 1.13x faster - telco: 1.09x faster - raytrace: 1.08x faster - etree_iterparse: 1.08x faster (...) Exceptional results, congrats Victor. Will be happy to help with code review. Yury ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
