On Sat, Feb 12, 2005 at 11:36:27AM -0800, Steven M. Schultz wrote:
> On Sat, 12 Feb 2005, Roine Gustafsson wrote:
> > It's an urban myth that 64bit is faster than 32bit, like people assume 
> > a 2GHz computer is twice as fast as a 1GHz computer.
> 
>       It's also an urban myth that 64bit is slower than 32bit :)

Not automatically, but on MacOS you can easily run into trouble.

Anecdote: I took a benchmark of my own (mostly loops of integer math
and bitwise logical operations) and put it on a G5 XServe.  This code
made use of some 64-bit integers.  I compiled it with gcc generically
and it ran quite nicely.  As I started adding flags to enable the use
of the G5's 64-bit instructions, it got slower.  The more optimization
flags I added, the worse it got.

Here's why; consider this trivial bit of code to add two 64-bit
integers:

  #include <stdint.h>
  uint64_t foo(uint64_t const x,uint64_t const y) { return x + y; }

First a generic 32-bit compilation and the resulting assembly:

gcc -O3
  _foo:
        addc r4,r4,r6
        adde r3,r3,r5
        blr
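
For anyone who doesn't read PowerPC assembly, here is a rough C sketch
of what that addc/adde pair does.  The u64_pair type and the add_pair
name are made up for illustration; the only thing taken from the
listing is that the high words are in r3/r5 and the low words in r4/r6.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value held as two 32-bit halves, the way
   the 32-bit code above holds it in a register pair (r3:r4, r5:r6). */
typedef struct { uint32_t hi, lo; } u64_pair;

/* Hypothetical helper mirroring the two-instruction add. */
u64_pair add_pair(u64_pair x, u64_pair y)
{
    u64_pair r;
    uint32_t carry;

    r.lo  = x.lo + y.lo;          /* addc r4,r4,r6: add low words       */
    carry = (r.lo < x.lo);        /* carry out of the low-word add      */
    r.hi  = x.hi + y.hi + carry;  /* adde r3,r3,r5: add high words plus
                                     the carry                          */
    return r;
}
```

Everything stays in registers: two adds and a return.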

Now let's turn on all of the options for G5 support and see what
happens:

gcc -O3 -mcpu=970 -mtune=970 -mpowerpc64 -mpowerpc-gpopt -force_cpusubtype_ALL
  _foo:
        stw r3,-32(r1)
        stw r4,-28(r1)
        stw r5,-24(r1)
        stw r6,-20(r1)
        ld r4,-32(r1)
        ld r3,-24(r1)
        add r2,r4,r3
        mr r4,r2
        srdi r3,r2,32
        blr

GAH!  The function's arguments are already in registers, but this code
writes them to the stack and then reads them back before using them.  I
don't know enough about the G5's pipelining and cache performance to
say how bad this will be, but it's certainly going to be noticeably
slower than the non-G5 version.

My guess is that since the current MacOS has no 64-bit support in the
ABI, all function arguments get broken into 32-bit values before being
passed.  gcc wants to get the values into 64-bit registers so that it
can do a single add instruction, but for whatever reason it believes
the best approach to doing this is to use 32-bit stores and 64-bit
loads.
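
If that guess is right, the G5 version is logically doing something
like the sketch below.  This is only a model: foo_g5_model and its
parameter names are hypothetical, and the shifts stand in for what the
listing actually does through memory (four stw, then two ld).

```c
#include <stdint.h>

/* Hypothetical model of the -mpowerpc64 output: the arguments still
   arrive as four 32-bit halves, get glued into 64-bit values, are added
   with one instruction, and the result is split into halves again. */
void foo_g5_model(uint32_t x_hi, uint32_t x_lo,
                  uint32_t y_hi, uint32_t y_lo,
                  uint32_t *r_hi, uint32_t *r_lo)
{
    /* The listing builds these with 32-bit stores and 64-bit loads;
       shifts are used here only to keep the sketch portable. */
    uint64_t x = ((uint64_t)x_hi << 32) | x_lo;
    uint64_t y = ((uint64_t)y_hi << 32) | y_lo;

    uint64_t sum = x + y;            /* add  r2,r4,r3  */

    *r_lo = (uint32_t)sum;           /* mr   r4,r2     */
    *r_hi = (uint32_t)(sum >> 32);   /* srdi r3,r2,32  */
}
```

The add itself is a single instruction; all of the extra work is in
packing the halves together and unpacking the result.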

So a combination of the limits in the Apple ABI and gcc's crazy
implementation leads to the resulting code being much worse when
the 64-bit optimizations are turned on.

This is with Apple's supplied/modified gcc 3.3.  I also tried a
vanilla gcc 3.4.1, but it generates even more instructions for the
64-bit case and additionally seemed to have some incompatibilities
with Apple's gcc wrt structure field layout.  Perhaps the commercial
compilers will do better.

Hopefully the next MacOS with a 64-bit userland will fix all this.

                                                  -Dave Dodge


_______________________________________________
Mjpeg-users mailing list
Mjpeg-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mjpeg-users