On 29/03/2015 08:57, Christian Gollwitzer wrote:
Am 29.03.15 um 05:06 schrieb Steven D'Aprano:
[OT: Competing with compiled C code]
I'm not one of those people who think that C is by definition the fastest language conceivable. (People who believe this sometimes make an exception for hand-crafted assembly, which is ironic since these days the best C optimizing compilers can generate faster, tighter code than human assembly programmers.)
Defeating a C compiler is possible in (rare) cases, I wouldn't really count the examples you've posted, since they don't actually compute anything useful. However, for instance the Intel C compiler is able to replace a loop like this: for (int i=0; i<N; i++) { for (int j=0; j<N; j++) { for (int k=0; k<N; k++) { z[i][k]+=x[i][j]*y[j][k]; } } } with a call to an optimized BLAS matrix multiplication in "some" (=impractical) cases.
I have an interpreter for my language which has an implementation in C, and one in my own static language. That latter can also optionally make use of a byte-code dispatcher written in assembly (which tries to deal with each byte-code if it can, if not it passes it on to high-level code).
This combination of HLL+ASM can be up to double the speed of the C/gcc/O3 version (and which has to use gcc extensions), when executing benchmarks. On real programs, it can still be up to 40-50% faster (ie. the C version takes 40-50% longer to execute).
(My own static language by itself is 30% slower than C/gcc/O3 at the minute, when used for the interpreter, but I'm working on it... For small, tight benchmarks though it is still totally outclassed by gcc.)
Just look at the contrived PyPy benchmarks, for a very tuned selected sample you can be almost twice as fast as C - for a random algorithm like the Sudoku solver, not specifically tuned, C is 30x faster. It still boils down to the classic rules: static unboxed types and static or preallocated memory makes your code fast. In many cases, though, programming in Python can free programmer time, which can lead in turn to better algorithms that outperform the wisdom of the C compiler.
I'm quite impressed with PyPy. While its implementation seems (to me) fantastically complicated compared with the very simple concept of a byte-code interpreter, it seems to do the job. For simple integer benchmarks, it can be double the speed of my HLL+ASM interpreter running the same algorithm (but not in Python).
However I believe that CPython could be a bit faster than it is, even with all the dynamic stuff that is what's supposed to keep it slow.
(If Python wasn't so huge, and if I understood it a lot more than I do, I'd be tempted to have a go myself. For example, I could change the syntax of my interpreted language so that it looks like Python. Then I can take a simple benchmark in this 'Python', and run it with both CPython and my interpreter, and mine is likely to be faster. Yet, it still uses a byte-code dispatch loop, and it is still dynamically typed.
So what extra stuff is going on in CPython?) -- Bartc -- https://mail.python.org/mailman/listinfo/python-list