[issue4753] Faster opcode dispatch on gcc

2015-06-02 Thread David Bolen
David Bolen added the comment: Oops, sorry, I had just followed the commit comment to this issue. For the record here, it looks like Benjamin has committed an update (5e8fa1b13516) that resolves the problem.

[issue4753] Faster opcode dispatch on gcc

2015-06-01 Thread R. David Murray
R. David Murray added the comment: Please open a new issue with the details about your problem. -- nosy: +r.david.murray

[issue4753] Faster opcode dispatch on gcc

2015-06-01 Thread David Bolen
David Bolen added the comment: I ran a few more tests, and the generated executable hangs in both release and debug builds. The closest I can get at the moment is that it's stuck importing errno from the "import sys, errno" line in os.py - at least no matter how long I wait after starting a p

[issue4753] Faster opcode dispatch on gcc

2015-06-01 Thread David Bolen
David Bolen added the comment: The 2.7 back-ported version of this patch appears to have broken compilation on the Windows XP buildbot, during the OpenSSL build process, when the newly built Python is used to execute the build_ssl.py script. After this patch, when that stage executes, and prio

[issue4753] Faster opcode dispatch on gcc

2015-05-28 Thread Roundup Robot
Roundup Robot added the comment: New changeset 17d3bbde60d2 by Benjamin Peterson in branch '2.7': backport computed gotos (#4753) https://hg.python.org/cpython/rev/17d3bbde60d2 -- nosy: +python-dev

[issue4753] Faster opcode dispatch on gcc

2015-05-27 Thread Ned Deily
Ned Deily added the comment: @Vamsi, could you please open a new issue and attach your patch there so it can be properly tracked for 2.7? This issue has been closed for five years and the code has been out in the field for a long time in Python 3. Thanks! -- nosy: +ned.deily

[issue4753] Faster opcode dispatch on gcc

2015-05-27 Thread Robert Collins
Robert Collins added the comment: FWIW I'm interested and willing to poke at this if more testers/reviewers are needed. -- nosy: +rbcollins

[issue4753] Faster opcode dispatch on gcc

2015-05-27 Thread Srinivas Vamsi Parasa
Srinivas Vamsi Parasa added the comment: Hi All, This is Vamsi from Server Scripting Languages Optimization team at Intel Corporation. Would like to submit a request to enable the computed goto based dispatch in Python 2.x (which happens to be enabled by default in Python 3 given its perform

[issue4753] Faster opcode dispatch on gcc

2010-07-19 Thread Antoine Pitrou
Antoine Pitrou added the comment: This is too late for 2.x now, closing. -- resolution: accepted -> fixed status: open -> closed

[issue4753] Faster opcode dispatch on gcc

2010-05-20 Thread Skip Montanaro
Changes by Skip Montanaro: -- nosy: -skip.montanaro

[issue4753] Faster opcode dispatch on gcc

2009-07-18 Thread Michele Dionisio
Michele Dionisio added the comment: I have patched the code of python3.1 to use the computed goto technique with Visual Studio as well. The performance result is not good (I really don't know why), but it is a good workaround for using computed gotos on Windows too. The only difference is that the opco

[issue4753] Faster opcode dispatch on gcc

2009-07-02 Thread Jesús Cea Avión
Changes by Jesús Cea Avión: -- nosy: +jcea

[issue4753] Faster opcode dispatch on gcc

2009-04-11 Thread Alexandre Vassalotti
Changes by Alexandre Vassalotti: -- nosy: -alexandre.vassalotti

[issue4753] Faster opcode dispatch on gcc

2009-04-11 Thread Mark Dickinson
Changes by Mark Dickinson: -- nosy: -marketdickinson

[issue4753] Faster opcode dispatch on gcc

2009-04-09 Thread Andrew I MacIntyre
Andrew I MacIntyre added the comment: Antoine, in my testing the "loss" of the HAS_ARG() optimisation in my patch appears to have negligible cost on i386, but starts to look significant on amd64. On an Intel E8200 cpu running FreeBSD 7.1 amd64, with gcc 4.2.1 and the 3.1a2 sources, the computed

[issue4753] Faster opcode dispatch on gcc

2009-03-31 Thread Antoine Pitrou
Antoine Pitrou added the comment: Andrew, your patch disables the optimization that HAS_ARG(op) is a constant when op is known by the compiler (that is, inside a "TARGET_##op" label), so I'd rather keep the version which is currently in SVN. -- versions: -Python 3.1

[issue4753] Faster opcode dispatch on gcc

2009-03-31 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-03-31 03:19, A.M. Kuchling wrote: > A.M. Kuchling added the comment: > > Is a backport to 2.7 still planned? I hope it is.

[issue4753] Faster opcode dispatch on gcc

2009-03-30 Thread A.M. Kuchling
A.M. Kuchling added the comment: Is a backport to 2.7 still planned? -- nosy: +akuchling

[issue4753] Faster opcode dispatch on gcc

2009-03-22 Thread Andrew I MacIntyre
Andrew I MacIntyre added the comment: Out of interest, the attached patch against the py3k branch at r70516 cleans up the threaded code changes a little: - gets rid of TARGET_WITH_IMPL macro; - TARGET(op) is followed by a colon, so that it looks like a label (for editors that make use of that).

[issue4753] Faster opcode dispatch on gcc

2009-02-20 Thread Joshua Bronson
Changes by Joshua Bronson: -- nosy: +jab

[issue4753] Faster opcode dispatch on gcc

2009-02-07 Thread Skip Montanaro
Skip Montanaro added the comment: Antoine> Skip, removing the colon doesn't work if the macro adds code Antoine> after the colon :) When I looked I thought both TARGET and TARGET_WITH_IMPL ended with a colon, but I see that's not the case. How about removing TARGET_WITH_IMPL and just inclu

[issue4753] Faster opcode dispatch on gcc

2009-02-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: Skip, removing the colon doesn't work if the macro adds code after the colon :)

[issue4753] Faster opcode dispatch on gcc

2009-02-04 Thread Gabriel Genellina
Gabriel Genellina added the comment: > Might I suggest that the TARGET and TARGET_WITH_IMPL macros not > include the trailing colon? Yes, please! -- nosy: +gagenellina

[issue4753] Faster opcode dispatch on gcc

2009-02-03 Thread Skip Montanaro
Skip Montanaro added the comment: This has been checked in, right? Might I suggest that the TARGET and TARGET_WITH_IMPL macros not include the trailing colon? I think that will make it more friendly toward "smart" editors such as Emacs' C mode. I definitely get better indentation with TA

[issue4753] Faster opcode dispatch on gcc

2009-01-31 Thread Mark Dickinson
Mark Dickinson added the comment: > The test failure also happens on trunk, it may be related to the recent > tk changes. Yes; sorry---I didn't mean to suggest that the test failures were in any way related to the opcode dispatch stuff. Apart from the ttk teething difficulties, there's a wei

[issue4753] Faster opcode dispatch on gcc

2009-01-31 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Square brackets added in r69133. The gentoo x86 3.x buildbot seems to be > passing the compile stage now. (Though not the test stage, of course: > one can't have everything!) The test failure also happens on trunk, it may be related to the recent tk chan

[issue4753] Faster opcode dispatch on gcc

2009-01-31 Thread Mark Dickinson
Mark Dickinson added the comment: Square brackets added in r69133. The gentoo x86 3.x buildbot seems to be passing the compile stage now. (Though not the test stage, of course: one can't have everything!)

[issue4753] Faster opcode dispatch on gcc

2009-01-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: Mark: """Are there any objections to me adding a couple of square brackets to this line to turn the argument of join into a list comprehension?""" No problems for me. You might also add to the top comments of the file that it is 2.3-compatible.

[issue4753] Faster opcode dispatch on gcc

2009-01-30 Thread Mark Dickinson
Mark Dickinson added the comment: Sorry: ignore that last. Python/opcode_targets.h is already part of the distribution. I don't know what I was doing wrong.

[issue4753] Faster opcode dispatch on gcc

2009-01-30 Thread Mark Dickinson
Mark Dickinson added the comment: One other thought: it seems that as a result of this change, the py3k build process now depends on having some version of Python already installed; before this, it didn't. Is this true, or am I misinterpreting something? Might it be worth adding the file

[issue4753] Faster opcode dispatch on gcc

2009-01-30 Thread Mark Dickinson
Mark Dickinson added the comment: The x86 gentoo buildbot is failing to compile, with error: /Python/makeopcodetargets.py ./Python/opcode_targets.h File "./Python/makeopcodetargets.py", line 28 f.write(",\n".join("\t&&%s" % s for s in targets)) ^ Synt
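
The line reported in this error passes a generator expression to join, a syntax that interpreters older than Python 2.4 cannot parse. A minimal sketch of the two spellings follows, assuming a made-up `targets` list; the real script builds its own list of label names, and the names below are placeholders only.

```python
# Hypothetical label names, for illustration only.
targets = ["TARGET_POP_TOP", "TARGET_ROT_TWO", "TARGET_NOP"]

# Generator expression: the form shown in the failing line.
with_generator = ",\n".join("\t&&%s" % s for s in targets)

# List comprehension: the same argument wrapped in square brackets,
# which older parsers also accept.
with_list = ",\n".join(["\t&&%s" % s for s in targets])

print(with_list)
```

Both spellings produce the same string on any interpreter that accepts them, so wrapping the argument in brackets changes only which Python versions can run the script.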

[issue4753] Faster opcode dispatch on gcc

2009-01-30 Thread Kevin Watters
Kevin Watters added the comment: Does anyone know the equivalent ICC command line option for GCC's -fno-gcse? (Or if it has one?) I can't find a related option in the docs. It looks like ICC hits the same combining goto problem, as was mentioned: without changing any options, I applied pitrou

[issue4753] Faster opcode dispatch on gcc

2009-01-28 Thread Antoine Pitrou
Antoine Pitrou added the comment: For the record, I've compiled py3k on an embarrassingly fast Core2-based server (Xeon E5410), and the computed gotos option gives a 16% speedup on pybench and pystone. (with gcc 4.3.2 in 64-bit mode)

[issue4753] Faster opcode dispatch on gcc

2009-01-27 Thread Gregory P. Smith
Gregory P. Smith added the comment: I'll take on the two remaining tasks for this: * add configure magic to detect when the compiler supports this so that it can default to --with-computed-gotos on modern systems. * commit the back port to 2.7 trunk. -- assignee: -> gregory.p.smith

[issue4753] Faster opcode dispatch on gcc

2009-01-26 Thread Kevin Watters
Changes by Kevin Watters: -- nosy: +kevinwatters

[issue4753] Faster opcode dispatch on gcc

2009-01-26 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: -fno-gcse is controversial. Even if it might avoid jumps sharing, the impact of that option has to be measured, since common subexpression elimination allows omitting some recalculations, so disabling global CSE might have a negative impact on ot

[issue4753] Faster opcode dispatch on gcc

2009-01-25 Thread Antoine Pitrou
Antoine Pitrou added the comment: Committed in py3k in r68924. I won't backport it to trunk myself but it should be easy enough, provided people are interested. -- resolution: -> accepted stage: patch review -> committed/rejected status: open -> pending versions: -Python 2.6, Python 3

[issue4753] Faster opcode dispatch on gcc

2009-01-24 Thread Jeffrey Yasskin
Jeffrey Yasskin added the comment: In the comment, you might mention both -fno-crossjumping and -fno-gcse. -fno-crossjumping's description looks like it ought to prevent combining computed gotos, but http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Optimize-Options.html says -fno-gcse actually does i

[issue4753] Faster opcode dispatch on gcc

2009-01-21 Thread Stefan Ring
Stefan Ring added the comment: Hi, I ported threadedceval6.patch to Python 2.5.4, in case anyone is interested... Note that you need to run autoconf and autoheader. -- nosy: +Ringding Added file: http://bugs.python.org/file12824/threadedceval6-py254.patch

[issue4753] Faster opcode dispatch on gcc

2009-01-16 Thread Antoine Pitrou
Changes by Antoine Pitrou: Removed file: http://bugs.python.org/file12767/threadedceval6.patch

[issue4753] Faster opcode dispatch on gcc

2009-01-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Thanks Skip, it makes sense... so here is a patch without the configure script. (I wonder however if those huge configure changes, when checked into the SVN, could break something silently somewhere) Added file: http://bugs.python.org/file12769/threadedceval6.

[issue4753] Faster opcode dispatch on gcc

2009-01-16 Thread Skip Montanaro
Skip Montanaro added the comment: Antoine> (sorry, the patch is very long because it seems running Antoine> autoconf changes a lot of things in the configure script) Normal practice is to not include the configure script in such patches and indicate to people that they will need to run auto

[issue4753] Faster opcode dispatch on gcc

2009-01-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is an updated patch with a dedicated configure option (--with-computed-gotos, disabled by default), rather than a compiler detection switch. (sorry, the patch is very long because it seems running autoconf changes a lot of things in the configure script)

[issue4753] Faster opcode dispatch on gcc

2009-01-13 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: #4715 is interesting, but is not really about superinstructions. Superinstructions are not created because they make sense; any common sequence of opcodes can become a superinstruction, just for the point of saving dispatches. And the creation ca
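
To make the "any common sequence of opcodes" idea concrete, here is a rough sketch that counts adjacent opcode pairs in a function's bytecode using the standard `dis` module. The `sample` function and the `adjacent_pairs` helper are invented for illustration and are not from any patch in this thread. The count is purely static and ignores jump targets, so some pairs it reports could not actually be fused.

```python
import dis
from collections import Counter

def adjacent_pairs(func):
    """Count each (opcode, next opcode) pair in func's bytecode."""
    names = [instr.opname for instr in dis.get_instructions(func)]
    return names, Counter(zip(names, names[1:]))

def sample(xs):
    # Made-up workload; any function would do.
    total = 0
    for x in xs:
        total = total + x * 2
    return total

names, pairs = adjacent_pairs(sample)
for pair, n in pairs.most_common(3):
    print(pair, n)
```

The exact opcode names printed depend on the interpreter version, which is why no expected output is shown.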

[issue4753] Faster opcode dispatch on gcc

2009-01-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: As for superinstructions, you can find an example here: #4715.

[issue4753] Faster opcode dispatch on gcc

2009-01-12 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: Ok, then vmgen adds almost just direct threading instead of indirect threading. Since the purpose of superinstructions is to eliminate dispatch overhead, and that's more important when little actual work is done, what about all ones which uncond

[issue4753] Faster opcode dispatch on gcc

2009-01-12 Thread Jeffrey Yasskin
Jeffrey Yasskin added the comment: @Paolo: I'm going to be looking into converting more common sequences into superinstructions. We only have LOAD_CONST+XXX so far. The others are difficult because vmgen doesn't provide easy ways to deal with error handling, but Jakob and I have come up with a c

[issue4753] Faster opcode dispatch on gcc

2009-01-12 Thread Jeffrey Yasskin
Jeffrey Yasskin added the comment: I've left some line-by-line comments at http://codereview.appspot.com/11905. Sorry if there was already a Rietveld thread; I didn't see one.

[issue4753] Faster opcode dispatch on gcc

2009-01-12 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: A couple percent maybe is not worth vmgen-ing. But even if I'm not a vmgen expert, I read many papers from Ertl about superinstructions and replication, so the expected speedup from vmgen'ing is much bigger. Is there some more advanced feature we

[issue4753] Faster opcode dispatch on gcc

2009-01-12 Thread Jeffrey Yasskin
Jeffrey Yasskin added the comment: Here's the vmgen-based patch for comparison. Again, it passes all the tests, but isn't complete outside of that and (unless consensus develops that a couple percent is worth requiring vmgen) shouldn't distract from reviewing Antoine's patch. I'll look over thre

[issue4753] Faster opcode dispatch on gcc

2009-01-11 Thread Gregory P. Smith
Gregory P. Smith added the comment: Benchmarking pitrou_dispatch_2.7.patch applied to trunk r68522 on a 32-bit Efficeon (x86) using gcc 4.2.4-1ubuntu3 yields a 10% pybench speedup.

[issue4753] Faster opcode dispatch on gcc

2009-01-11 Thread Jeffrey Yasskin
Jeffrey Yasskin added the comment: Here's a port of threadedceval5.patch to trunk. It passes the tests. I haven't benchmarked this exact patch, but on one Intel Core2, a similar patch got an 11%-14% speedup (on 2to3 and pybench). I've also cleaned up Jakob Sievers' vmgen patch (representing for

[issue4753] Faster opcode dispatch on gcc

2009-01-11 Thread Andrew Bennetts
Changes by Andrew Bennetts: -- nosy: +spiv

[issue4753] Faster opcode dispatch on gcc

2009-01-10 Thread Alexander Belopolsky
Changes by Alexander Belopolsky: -- nosy: +belopolsky

[issue4753] Faster opcode dispatch on gcc

2009-01-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: > (First culprit might > be license/compatibility problems I guess, but the speedup would be > worth the time to fix the troubles IMHO). That would be the obvious reason IMO. And Intel is the only one who can "fix the troubles".

[issue4753] Faster opcode dispatch on gcc

2009-01-10 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: > Same for CPU-specific tuning: I don't think we want to ship Python with compiler flags which depend on the particular CPU being used. I wasn't suggesting this - but since different CPUs have different optimization rules, something like "oh, 20

[issue4753] Faster opcode dispatch on gcc

2009-01-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-01-10 10:55, Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> It looks like we still didn't manage, and since ICC is the best >> compiler out there, this matters. > > Well, from the perspective of Python, what matters mostly is the

[issue4753] Faster opcode dispatch on gcc

2009-01-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: > It looks like we still didn't manage, and since ICC is the best > compiler out there, this matters. Well, from the perspective of Python, what matters mostly is the commonly used compilers (that is, gcc and MSVC). I doubt many people compile Python with icc,

[issue4753] Faster opcode dispatch on gcc

2009-01-10 Thread Antoine Pitrou
Antoine Pitrou added the comment: > @pitrou: > > The machine I got the 15% speedup on is in 64-bit mode with gcc > 4.3.2. > > Which is the processor? I guess the bigger speedups should be on > Pentium4, since it has the bigger mispredict penalties. Athlon X2 3600+.

[issue4753] Faster opcode dispatch on gcc

2009-01-10 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: The standing question is still: can we get ICC to produce the expected output? It looks like we still didn't manage, and since ICC is the best compiler out there, this matters. Some problems with SunCC, even if it doesn't do jump sharing, it se

[issue4753] Faster opcode dispatch on gcc

2009-01-09 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: @ ajaksu2 > Applying your patches makes no difference with gcc 4.2 and gives a > barely noticeable (~2%) slowdown with icc. "Your patches" is something quite unclear :-) Which are the patch sets you are comparing? And on 32 or 64 bits? But does Y

[issue4753] Faster opcode dispatch on gcc

2009-01-09 Thread Daniel Diniz
Daniel Diniz added the comment: Paolo, Applying your patches makes no difference with gcc 4.2 and gives a barely noticeable (~2%) slowdown with icc. These results are from a Celeron M 410 (Core Solo Yonah-based), so it's a rather old platform to run benchmarks on.

[issue4753] Faster opcode dispatch on gcc

2009-01-07 Thread Gregory P. Smith
Changes by Gregory P. Smith: -- nosy: +gregory.p.smith

[issue4753] Faster opcode dispatch on gcc

2009-01-07 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: @skip: In simple words, the x86 call: call 0x2000 placed at address 0x1000 becomes: call %rip + 0x1000 RIP holds the instruction pointer, which will be 0x1000 in this case (actually, I'm ignoring the detail that when executing the call, RIP
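
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the simplified model used in the comment, where RIP is taken to equal the address of the call itself; the function names are invented for illustration.

```python
def displacement(call_site, target):
    """Offset to encode so that call_site + offset reaches target."""
    return target - call_site

def resolve(call_site, offset):
    """Address a relative call at call_site lands on."""
    return call_site + offset

offset = displacement(0x1000, 0x2000)
print(hex(offset))                    # 0x1000
print(hex(resolve(0x1000, offset)))   # 0x2000

# The same encoded offset placed at a different address resolves elsewhere.
print(hex(resolve(0x5000, offset)))   # 0x6000
```

The last line shows a general property of relative addressing: the encoded offset is only meaningful together with the address it is executed from.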

[issue4753] Faster opcode dispatch on gcc

2009-01-07 Thread Skip Montanaro
Skip Montanaro added the comment: Paolo> Various techniques allow to create binary code from the Paolo> interpreter binary, by just pasting together the code for the Paolo> common interpreters cases and producing calls to the other. But, Paolo> guess what, on most platforms (except p

[issue4753] Faster opcode dispatch on gcc

2009-01-07 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: I finally implemented my suggestion for the switch elimination. On top of threadedceval5.patch, apply abstract-switch-reduced.diff and then restore-old-oparg-load.diff to test it. This way, only computed goto's are used. I would like who had mis

[issue4753] Faster opcode dispatch on gcc

2009-01-07 Thread Paolo 'Blaisorblade' Giarrusso
Changes by Paolo 'Blaisorblade' Giarrusso: Added file: http://bugs.python.org/file12634/restore-old-oparg-load.diff

[issue4753] Faster opcode dispatch on gcc

2009-01-07 Thread Paolo 'Blaisorblade' Giarrusso
Changes by Paolo 'Blaisorblade' Giarrusso: Added file: http://bugs.python.org/file12633/abstract-switch-reduced.diff

[issue4753] Faster opcode dispatch on gcc

2009-01-06 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: @pitrou: Argh, reference counting hinders even that? I just discovered another problem caused by refcounting. Various techniques allow to create binary code from the interpreter binary, by just pasting together the code for the common interpre

[issue4753] Faster opcode dispatch on gcc

2009-01-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: FWIW, I have made a quick attempt at removing the f->f_lasti assignment in the few places where it could be removed, but it didn't make a difference on my machine. The problem being that there are very few places where it is legitimate to remove the assignment (

[issue4753] Faster opcode dispatch on gcc

2009-01-05 Thread Jeffrey Yasskin
Changes by Jeffrey Yasskin: -- nosy: +collinwinter, jyasskin

[issue4753] Faster opcode dispatch on gcc

2009-01-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: On Monday 05 January 2009 at 02:39, Paolo 'Blaisorblade' Giarrusso wrote: > About f->last_i, when I have time I want to try optimizing it. Somewhere > you can be sure it's not going to be used. There are lots of places which can call into arbitrary Pytho

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: @alexandre: if you add two labels per opcode and two dispatch tables, one before (like now) and one after the parameter fetch (where we have the 'case'), you can keep the same speed. And under the hood we also had two dispatch tables before, with

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Alexandre Vassalotti
Alexandre Vassalotti added the comment: > I managed to remove switch pretty easily by moving opcode fetching > in the FAST_DISPATCH macro and abstracting the control flow of the > switch. Here is the diff against threadedceval5.patch. Added file: http://bugs.python.org/file12584/abstract-switch.

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Alexandre Vassalotti
Alexandre Vassalotti added the comment: > Removing the switch won't be possible unless we change the semantics > of EXTENDED_ARG. In addition, I doubt the improvement, if any, would be worth > the increased complexity. Never mind what I have said. I managed to remove switch pretty easily by moving opco

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: @Skip: if one decides to generate binary code, there is no need to use switches. Inline threading (also known as "code copying" in some research papers) is what you are probably looking for: http://blog.mozilla.com/dmandelin/2008/08/27/inline-th

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: @Alexandre: > > So, can you try dropping the switch altogether, using always computed > > goto and seeing how does the resulting code get compiled? > Removing the switch won't be possible unless we change the semantics > of EXTENDED_ARG. In addition

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Ralph Corderoy
Changes by Ralph Corderoy:

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Ralph Corderoy
Ralph Corderoy added the comment: Regarding compressing the opcode table to make better use of cache; what if the most frequently occurring opcodes were placed together, e.g. the opcodes were ordered by frequency, most frequent first. Just based on a one-off static analysis of a body of code.
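
A one-off static analysis of this kind could be sketched with the standard `dis` module as below. The `count_opcodes` helper and the sample source are invented for illustration; a real survey would walk a large body of code rather than one small function, and the resulting ranking depends on the interpreter version.

```python
import dis
from collections import Counter

def count_opcodes(code, counts=None):
    """Statically count opcode names in a code object and its nested code objects."""
    if counts is None:
        counts = Counter()
    for instr in dis.get_instructions(code):
        counts[instr.opname] += 1
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested function or class body
            count_opcodes(const, counts)
    return counts

# Made-up sample source, standing in for "a body of code".
source = """
def f(a, b):
    total = 0
    for i in range(a):
        total += i * b
    return total
"""

counts = count_opcodes(compile(source, "<sample>", "exec"))
ranking = [name for name, _ in counts.most_common()]
print(ranking[:5])
```

The `ranking` list is the frequency ordering the comment describes, most frequent first. A static count like this says nothing about how often each opcode actually executes, which a loop-heavy program can skew heavily.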

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Skip Montanaro
Skip Montanaro added the comment: I'm sure this is the wrong place to bring this up, but I had a thought about simple JIT compilation coupled with the opcode dispatch changes in this issue. Consider this silly function: >>> def f(a, b): ... result = 0 ... while b: ... r

[issue4753] Faster opcode dispatch on gcc

2009-01-04 Thread Facundo Batista
Changes by Facundo Batista: -- nosy: +facundobatista

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Alexandre Vassalotti
Alexandre Vassalotti added the comment: Paolo wrote: > So, can you try dropping the switch altogether, using always computed > goto and seeing how does the resulting code get compiled? Removing the switch won't be possible unless we change the semantics of EXTENDED_ARG. In addition, I doubt the imp

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Daniel Diniz
Daniel Diniz added the comment: Paolo 'Blaisorblade' Giarrusso wrote: > > 1st note: is that code from the threaded version? [...] It is vital to > this patch that the jump is not shared, something similar to > -fno-crossjumping should be found. Yes, threaded version by unconditionally defining

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: Daniel, I forgot to ask for the compilation command line you used, since they make a lot of difference. Can you post them? Thanks

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: 1st note: is that code from the threaded version? Note that you need to modify the source to make it accept also ICC to try that. In case you already did that, I guess the patch is not useful at all with ICC since, as far as I can see, the jump i

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Daniel Diniz
Daniel Diniz added the comment: IIUC, this is what gcc 4.2.4 generates on a Celeron M for the code Alexandre posted: movl -272(%ebp), %eax movl 8(%ebp), %edx subl -228(%ebp), %eax movl %eax, 60(%edx) movl -272(%ebp), %ecx movzbl (%e

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Benoit Boissinot
Changes by Benoit Boissinot: -- nosy: +bboissin

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Yann Ramin
Changes by Yann Ramin: -- nosy: +theatrus

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread djc
Changes by djc: -- nosy: +djc

[issue4753] Faster opcode dispatch on gcc

2009-01-03 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I'm not an expert in this kind of optimizations. Could we gain more > speed by making the dispatcher table more dense? Python has less than > 128 opcodes (len(opcode.opmap) == 113) so they can be squeezed in a > smaller table. I naively assume a smaller table
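
The figures in the quoted question can be reproduced, for whatever interpreter is at hand, with the standard `opcode` module. This sketch compares a table indexed directly by opcode number against one with opcodes renumbered densely. The variable names are invented, and the printed numbers will differ from the 113 cited here because the opcode set has changed across Python versions.

```python
import opcode

# Distinct opcode numbers defined by this interpreter.
defined = len(set(opcode.opmap.values()))
highest = max(opcode.opmap.values())

sparse_slots = highest + 1   # table indexed directly by opcode number
dense_slots = defined        # table if opcodes were renumbered 0..defined-1
unused_slots = sparse_slots - dense_slots

print(defined, sparse_slots, unused_slots)
```

Whether the unused slots matter for speed is exactly what the question asks; the sketch only shows how many there are.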

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: > I'm not an expert in this kind of optimizations. Could we gain more speed by making the dispatcher table more dense? Python has less than 128 opcodes (len(opcode.opmap) == 113) so they can be squeezed in a smaller table. I naively assume a smal

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Paolo 'Blaisorblade' Giarrusso
Paolo 'Blaisorblade' Giarrusso added the comment: About miscompilations: the current patch is a bit weird for GCC, because you keep both the switch and the computed goto. But actually, there is no case in which the switch is needed, and computed gotos give less room to GCC's choices. So, can yo

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Christian Heimes
Christian Heimes added the comment: > Alexandre Vassalotti added the comment: > The patch makes a huge difference on 64-bit Linux. I get a 20% speed-up > and the lowest run time so far. That is quite impressive! I'm really, REALLY impressed by the speed up. Good work! I'm not an expert in this

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Skip Montanaro
Changes by Skip Montanaro: Added file: http://bugs.python.org/file12555/ceval.i.threaded

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Skip Montanaro
Skip Montanaro added the comment: Alexandre's last comment reminded me I forgot to post the PPC assembler code. Next two files are the output as requested by Antoine. Added file: http://bugs.python.org/file12553/ceval.i.unthreaded

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Alexandre Vassalotti
Alexandre Vassalotti added the comment: One more thing, the patch causes the following warnings to be emitted by GCC when USE_COMPUTED_GOTOS is undefined. Python/ceval.c: In function ‘PyEval_EvalFrameEx’: Python/ceval.c:2420: warning: label ‘_make_function’ defined but not used Python/ceval.c:

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Alexandre Vassalotti
Alexandre Vassalotti added the comment: The patch makes a huge difference on 64-bit Linux. I get a 20% speed-up and the lowest run time so far. That is quite impressive! At first glance, it seems the extra registers of the x86-64 architecture permit GCC to avoid spilling registers onto the stack

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Skip Montanaro
Skip Montanaro added the comment: Antoine> Ok, so the threaded version is actually faster by 20% on your Antoine> PPC, and slower by 5% on your Core 2 Duo. Thanks for doing the Antoine> measurements! Confirmed by pystone runs as well. Sorry for the earlier misdirection. Skip

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Antoine Pitrou
Antoine Pitrou added the comment: > OK, I think I'm misreading the output of pybench. Let me reset. Ignore > anything I've written previously on this topic. Instead, I will just > post the output of my pybench comparison runs and let more expert people > interpret as appropriate. The first f

[issue4753] Faster opcode dispatch on gcc

2009-01-02 Thread Skip Montanaro
Skip Montanaro added the comment: The next is the result of running on my MacBook Pro (Intel Core 2 Duo). Added file: http://bugs.python.org/file12546/pybench.sum.Intel
