Alan Gauld, 16.04.2010 10:29:
"Stefan Behnel" wrote
import cython
@cython.locals(result=cython.longlong, i=cython.longlong)
def add():
result = 0
for i in xrange(1000000000):
result += i
return result
print add()
This runs in less than half a second on my machine, including the time
to launch the CPython interpreter. I doubt that the JVM can even start
up in that time.
>
> I'm astonished at these results. What kind of C are you using? Even in
> assembler I'd expect the loop/sum to take at least 3s
> on a quad core 3GHz box.
>
> Or is cython doing the precalculation optimisations you mentioned?
Nothing surprising in the C code:
    __pyx_v_result = 0;
    for (__pyx_t_1 = 0; __pyx_t_1 < 1000000000; __pyx_t_1+=1) {
        __pyx_v_i = __pyx_t_1;
        __pyx_v_result += __pyx_v_i;
    }
> And if so when does it do them? Because surely, at some stage, it still
> has to crank the numbers?
Cython does a bit of constant folding (which it can benefit from on
internal optimisation decisions), but apart from that, the mantra is to
just show the C compiler explicit C code that it can understand well, and
let it do its job.
(We can of course do some fancy math to speed this particular sum up
since the result for any power of ten has a common pattern, but I
wouldn't expect the compiler optimiser to be that clever)
In this particular case, the C compiler actually stores the end result in the
binary module, so I assume that it simply applies the little Gauss formula
(sum = n*(n-1)/2) as an optimisation, in combination with loop variable
aliasing.
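To make that concrete, here is a minimal Python sketch (mine, not part of
the original exchange) comparing the explicit loop with the closed-form sum
that the compiler can effectively precompute:

```python
def loop_sum(n):
    # Direct loop, mirroring the Cython add() above (range stops at n - 1)
    result = 0
    for i in range(n):
        result += i
    return result

def gauss_sum(n):
    # Closed form: 0 + 1 + ... + (n - 1) == n * (n - 1) / 2
    return n * (n - 1) // 2

# The constant the compiler can fold the billion-iteration loop into:
print(gauss_sum(1000000000))  # -> 499999999500000000

# Sanity check against the explicit loop on a small input
assert loop_sum(10000) == gauss_sum(10000)
```

Since the loop bound is a compile-time constant and the loop body has no side
effects, the optimiser is free to replace the whole loop with that constant.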
Stefan
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor