Alan Gauld, 16.04.2010 10:29:
> "Stefan Behnel" wrote
>> import cython
>>
>> @cython.locals(result=cython.longlong, i=cython.longlong)
>> def add():
>>     result = 0
>>     for i in xrange(1000000000):
>>         result += i
>>     return result
>>
>> print add()
>>
>> This runs in less than half a second on my machine, including the
>> time to launch the CPython interpreter. I doubt that the JVM can
>> even start up in that time.
>
> I'm astonished at these results. What kind of C are you using? Even
> in assembler I'd expect the loop/sum to take at least 3s on a quad
> core 3GHz box.
>
> Or is cython doing the precalculation optimisations you mentioned?

Nothing surprising in the C code:

  __pyx_v_result = 0;
  for (__pyx_t_1 = 0; __pyx_t_1 < 1000000000; __pyx_t_1+=1) {
    __pyx_v_i = __pyx_t_1;
    __pyx_v_result += __pyx_v_i;
  }


> And if so when does it do them? Because surely, at some stage, it
> still has to crank the numbers?

Cython does a bit of constant folding (which helps with its internal
optimisation decisions), but apart from that the mantra is: just show
the C compiler explicit C code that it can understand well, and let it
do its job.
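
Here is a minimal standalone C sketch of what constant folding means;
the function and names are invented for illustration, not taken from
Cython's output. The constant arithmetic is evaluated at compile time,
so only a single multiplication survives into the binary:

  enum { SECONDS_PER_DAY = 60 * 60 * 24 };  /* folded to 86400 */

  long long seconds_in(long long days)
  {
      /* the only arithmetic left at run time */
      return days * SECONDS_PER_DAY;
  }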


> (We can of course do some fancy math to speed this particular sum up
> since the result for any power of ten has a common pattern, but I
> wouldn't expect the compiler optimiser to be that clever)

In this particular case, the C compiler actually stores the end result
in the binary module, so I assume that it simply applies the little
Gauß formula (the sum of 0..N-1 is N*(N-1)/2) as an optimisation, in
combination with loop variable aliasing.
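
To make that concrete, here is a standalone C sketch of the same loop
(my own illustration, not the generated module). With optimisation
enabled (e.g. gcc -O2), the compiler can apply final value replacement
and substitute the closed form, so both lines print
499999999500000000:

  #include <stdio.h>

  int main(void)
  {
      long long result = 0;
      long long i;
      for (i = 0; i < 1000000000LL; i++)
          result += i;
      printf("%lld\n", result);
      /* little Gauss: the sum of 0..N-1 is N*(N-1)/2 */
      printf("%lld\n", 1000000000LL * 999999999LL / 2);
      return 0;
  }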

Stefan
