On Thu, Aug 11, 2016 at 11:34:37PM +0200, Gabriel Paubert wrote:
> On the other hand gcc did at the time a very poor job (quite an
> understatement) at bswapdi when compiling for 64 bit processors
> (see the example).
>
> But what do modern compilers generate for bswapdi these days? Do they
> still call the library or not?

Nope.

> After all, bswapdi on 32 bit processors only takes 6 instructions if the
> input and output registers don't overlap.

For this testcase:

===
typedef unsigned long long u64;
u64 bs(u64 x)
{
	return __builtin_bswap64(x);
}
===

we get with -m32:

===
bs:
	mr 9,3
	rotlwi 3,4,24
	rlwimi 3,4,8,8,15
	rlwimi 3,4,8,24,31
	rotlwi 4,9,24
	rlwimi 4,9,8,8,15
	rlwimi 4,9,8,24,31
	blr
===

and with -m64:

===
.L.bs:
	srdi 10,3,32
	mr 9,3
	rotlwi 3,3,24
	rotlwi 8,10,24
	rlwimi 3,9,8,8,15
	rlwimi 8,10,8,8,15
	rlwimi 3,9,8,24,31
	rlwimi 8,10,8,24,31
	sldi 3,3,32
	or 3,3,8
	blr
===

Neither as tight as possible, but neither horrible either.


Segher
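
For readers who do not have the rotate-and-insert idiom in their head, here
is a rough C model of what the -m32 sequence above appears to compute. It is
a sketch, not compiler output. It assumes the usual 32-bit big-endian
PowerPC convention that a 64-bit argument arrives with its high word in r3
and its low word in r4 (the mail does not state the ABI), and the helper
names rotl32, bswap32_rot and bswap64_halves are made up for illustration.

```c
#include <stdint.h>

/* Rotate left by n bits; only valid for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * 32-bit byte swap in the shape of the three-instruction idiom:
 *   rotlwi rD,rS,24        rD = rotl(rS, 24)
 *   rlwimi rD,rS,8,8,15    insert rotl(rS, 8) under mask 0x00FF0000
 *   rlwimi rD,rS,8,24,31   insert rotl(rS, 8) under mask 0x000000FF
 * (PowerPC numbers bits from the most significant end, so bits 8..15
 * are the second-highest byte.)
 */
static uint32_t bswap32_rot(uint32_t x)
{
	uint32_t r = rotl32(x, 24);
	uint32_t t = rotl32(x, 8);

	r = (r & ~0x00FF0000u) | (t & 0x00FF0000u);
	r = (r & ~0x000000FFu) | (t & 0x000000FFu);
	return r;
}

/*
 * 64-bit byte swap from two 32-bit swaps: the swapped low word becomes
 * the new high word and the swapped high word becomes the new low word.
 */
static uint64_t bswap64_halves(uint64_t x)
{
	uint32_t hi = (uint32_t)(x >> 32);
	uint32_t lo = (uint32_t)x;

	return ((uint64_t)bswap32_rot(lo) << 32) | bswap32_rot(hi);
}
```

Read this way, the -m32 listing is two copies of the three-instruction swap,
one per word, plus the mr that saves r3 before it is overwritten.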