Don's reply didn't reach me for some reason, but pulling it out of the
previous response:
On 21/08/07, Donald Bruce Stewart <[EMAIL PROTECTED]> wrote:
phil:
> The generated assembler suggests (if I've read it correctly) that gcc
> is spotting that it can replace the tail call with a jump in the C
> version, but for some reason it can't spot it for the Haskell version
> when compiling with -fvia-C (and neither does ghc itself using
> -fasm). So the Haskell version ends up pushing and popping values on
> and off the stack for every call to f, which is a bit sad.
>
That doesn't sound quite right. The C version should get a tail call
with gcc -O2; the Haskell version should be a tail call anyway.
Just to be clear: the Haskell version is a tail call, but it's pushing
the values to and from memory (well, cache really, of course) for every
call to f, which is killing the performance.
Let's see:
C
$ gcc -O t.c -o t
$ time ./t 1000000000
zsh: segmentation fault (core dumped) ./t 1000000000
./t 1000000000 0.02s user 0.22s system 5% cpu 4.640 total
Turning on -O2
$ time ./t 1000000000
-243309312
./t 1000000000 1.89s user 0.00s system 97% cpu 1.940 total
-O3 does better thanks to the loop unrolling; see the timings below.
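For readers without the earlier messages: t.c is not quoted here, so the
following is only a sketch of the kind of function being timed, assuming
an accumulator-style recursive sum of 1..n in 32-bit arithmetic. The name
f comes from the thread; sum_to and sum_closed are hypothetical helpers
added for illustration. The 32-bit wraparound is an assumption, chosen
because it reproduces the printed result -243309312 for n = 1000000000.

```c
#include <stdint.h>

/* Assumed shape of the benchmark function: add n into an accumulator
 * and recurse. Unsigned arithmetic is used so that wraparound modulo
 * 2^32 is well defined in C. */
static uint32_t f(uint32_t n, uint32_t acc)
{
    if (n == 0)
        return acc;
    return f(n - 1, acc + n);  /* call in tail position */
}

/* Hypothetical wrapper: sum of 1..n, reinterpreted as a signed 32-bit
 * value (implementation-defined conversion; wraps on gcc). */
int32_t sum_to(uint32_t n)
{
    return (int32_t)f(n, 0);
}

/* Closed form n(n+1)/2 computed in 64 bits, then truncated to 32 bits.
 * Useful for checking the wrapped result without recursing. */
int32_t sum_closed(uint32_t n)
{
    uint64_t s = (uint64_t)n * ((uint64_t)n + 1) / 2;
    return (int32_t)(uint32_t)s;
}
```

If f is compiled without sibling-call optimization, each call keeps its
own stack frame, which would be consistent with the segfault seen at -O
for 10^9 iterations. gcc enables -foptimize-sibling-calls at -O2, where
the call in tail position can become a jump. sum_closed(1000000000)
gives -243309312, matching the output shown.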
And GHC:
$ ghc -O2 A.hs -o A
$ time ./A 1000000000
-243309312
./A 1000000000 3.21s user 0.01s system 97% cpu 3.289 total
So, what, about 1.7x slower than gcc -O2 (3.21s vs 1.89s user).
Seems ok without any tuning.
You're getting much better timings than I am!
$ time -p ./sum-hs 1000000000
-243309312
real 3.75
user 3.70
$ time -p ./sum-c-O2 1000000000
-243309312
real 1.40
user 1.35
$ time -p ./sum-c-O3 1000000000
-243309312
real 1.21
user 1.18
(My box has an AMD Athlon64 3000+ CPU fwiw, but the PowerPC version is
even worse when compared to its respective C binary!)
Phil
--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe