Quoting Christian Franke <[EMAIL PROTECTED]>:

#define grub_swap_bytes32(x) \
({ \
  grub_uint32_t _x = (x), _y; \
  asm ( \
       "xchgb %b0,%h0\n" \
       "roll $0x10,%0\n" \
       "xchgb %b0,%h0\n" \
       : "=q"(_y) : "0"(_x) \
   ); \
   _y; \
})

Results are much better for the inline implementation with gcc 4.3.1 (Fedora development AKA Rawhide).

I tried the idea I suggested (let gcc use any register and optimize in the assembler). The result was a true monstrosity, and it failed to beat your approach, where gcc is forced to use one of the first four registers (the "q" constraint). The size of all modules together came out 8 bytes larger than with your version.

However, this code beats everything in my tests by a wide margin:

#define grub_swap_lo(x) \
({ \
  grub_uint32_t _x = (x), _y; \
  asm ( \
       "xchgb %b0,%h0\n" \
       : "=q"(_y) : "0"(_x) \
   ); \
   _y; \
})
#define grub_swap_hi(x) \
({ \
  grub_uint32_t _x = (x), _y; \
  asm ( \
       "roll $0x10,%0\n" \
       : "=r"(_y) : "0"(_x) \
   ); \
   _y; \
})
#define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x)))

It needs some cleanups to avoid warnings and use better names, but I think it should go in. Telling gcc the little details really helps.
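
For anyone wondering why the three-step composition is a full byte swap, here is a minimal portable C sketch of the same steps. It only models what the instructions do to the value, not the code gcc generates, and the names (swap_lo, swap_hi, swap_bytes32) are hypothetical stand-ins for the macros above, not GRUB identifiers.

```c
#include <stdint.h>

/* Models "xchgb %b0,%h0": exchange the two low bytes, keep the upper half. */
static uint32_t swap_lo(uint32_t x)
{
  return (x & 0xffff0000u) | ((x & 0x000000ffu) << 8) | ((x >> 8) & 0x000000ffu);
}

/* Models "roll $0x10,%0": rotate by 16 bits, i.e. exchange the two halves. */
static uint32_t swap_hi(uint32_t x)
{
  return (x << 16) | (x >> 16);
}

/* Same composition as the grub_swap_bytes32 macro above. */
static uint32_t swap_bytes32(uint32_t x)
{
  return swap_lo(swap_hi(swap_lo(x)));
}
```

Tracing 0x11223344 through it: the first swap_lo gives 0x11224433, swap_hi gives 0x44331122, and the second swap_lo gives 0x44332211, which is the input with its bytes reversed. Splitting the operation like this is what lets the rotate take "=r" while only the byte exchanges need "=q".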

Sum of module sizes ("du -b -S"):

366976  original
364988  "=q", xchgb
365460  "=q", rolw $8,%w0
365420  "=r", rolw $8,%w0
364996  monstrosity ("=r", conditional xchgb/rolw)
364728  "little details"

With my compiler, every variant beats the code we have; the best one ("little details") saves 2248 bytes over the original.

--
Regards,
Pavel Roskin

