On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> On the other hand gcc did at the time a very poor job (quite an
> understatement) at bswapdi when compiling for 64 bit processors 
> (see the example).
> 
> But what do modern compilers generate for bswapdi these days? Do they
> still call the library or not?

Nope.

> After all, bswapdi on 32 bit processors only takes 6 instructions if the
> input and output registers don't overlap.

For this testcase:
===
typedef unsigned long long u64;
u64 bs(u64 x) { return __builtin_bswap64(x); }
===

we get with -m32:
===
bs:
        mr 9,3
        rotlwi 3,4,24
        rlwimi 3,4,8,8,15
        rlwimi 3,4,8,24,31
        rotlwi 4,9,24
        rlwimi 4,9,8,8,15
        rlwimi 4,9,8,24,31
        blr
===

and with -m64:
===
.L.bs:
        srdi 10,3,32
        mr 9,3
        rotlwi 3,3,24
        rotlwi 8,10,24
        rlwimi 3,9,8,8,15
        rlwimi 8,10,8,8,15
        rlwimi 3,9,8,24,31
        rlwimi 8,10,8,24,31
        sldi 3,3,32
        or 3,3,8
        blr
===

Neither is as tight as possible, but neither is horrible.
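
For readers who do not read rotlwi/rlwimi fluently, here is a rough C
rendering of what both listings appear to compute. It is a sketch, not
compiler output: the names rotl32, bswap32_rot, bs_split and bs_selfcheck
are made up for illustration, and the mapping of comments to instructions
is my reading of the assembly above. Each 32-bit word is byte-swapped with
one rotate plus two masked inserts, and the two swapped words then trade
places.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits; only used with n = 8 and n = 24. */
static inline uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* Byte-swap one 32-bit word the way the three-instruction sequence does. */
uint32_t bswap32_rot(uint32_t s)
{
    uint32_t d = rotl32(s, 24);                             /* rotlwi d,s,24      */
    d = (d & ~0x00FF0000u) | (rotl32(s, 8) & 0x00FF0000u);  /* rlwimi d,s,8,8,15  */
    d = (d & ~0x000000FFu) | (rotl32(s, 8) & 0x000000FFu);  /* rlwimi d,s,8,24,31 */
    return d;
}

/* 64-bit swap: swap each half, then exchange the halves.
 * With -m32 the halves already sit in r3/r4, so no shifts are needed;
 * with -m64 the split and merge are the srdi and sldi/or instructions. */
uint64_t bs_split(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);                      /* srdi 10,3,32 */
    uint32_t lo = (uint32_t)x;
    return ((uint64_t)bswap32_rot(lo) << 32)                /* sldi 3,3,32  */
           | bswap32_rot(hi);                               /* or 3,3,8     */
}

/* Small sanity check on known byte patterns. */
void bs_selfcheck(void)
{
    assert(bswap32_rot(0x01020304u) == 0x04030201u);
    assert(bs_split(0x0102030405060708ULL) == 0x0807060504030201ULL);
}
```

Seen this way, the -m64 code is the -m32 code plus one shift to split the
register and a shift and an or to put the halves back together.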


Segher
