Quoting Christian Franke <[EMAIL PROTECTED]>:
#define grub_swap_bytes32(x) \
({ \
  grub_uint32_t _x = (x), _y; \
  asm ( \
    "xchgb %b0,%h0\n" \
    "roll $0x10,%0\n" \
    "xchgb %b0,%h0\n" \
    : "=q"(_y) : "0"(_x) \
  ); \
  _y; \
})
Results are much better for the inline implementation with gcc 4.3.1
(Fedora development AKA Rawhide).
I tried the idea I suggested (let gcc use any register and optimize in
the assembler). The result was a true monstrosity, and it failed to
beat your approach, where gcc is forced to use the first four
registers. The size of all modules together increased by 8 bytes.
However, this code beats everything in my tests by a wide margin:
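/* Swap the two low-order bytes.  "=q" restricts gcc to a register
   that has both %b and %h forms (a, b, c or d).  */
#define grub_swap_lo(x) \
({ \
  grub_uint32_t _x = (x), _y; \
  asm ( \
    "xchgb %b0,%h0\n" \
    : "=q"(_y) : "0"(_x) \
  ); \
  _y; \
})

/* Rotate by 16 bits, exchanging the upper and lower halves.
   Any general register ("=r") will do here.  */
#define grub_swap_hi(x) \
({ \
  grub_uint32_t _x = (x), _y; \
  asm ( \
    "roll $0x10,%0\n" \
    : "=r"(_y) : "0"(_x) \
  ); \
  _y; \
})

#define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x)))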
It needs some cleanups to avoid warnings and use better names, but I
think it should go in. Telling gcc the little details really helps.
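
For anyone who wants to check the decomposition without an x86 box, the
three steps (swap the two low bytes, rotate by 16 bits, swap the two low
bytes again) can be modelled in portable C with no inline assembly. This
is only an illustrative sketch: the names model_swap_lo, model_rot16,
model_swap_bytes32 and model_self_check are made up for this example and
are not part of GRUB, and the code says nothing about what gcc actually
generates for the macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "xchgb %b0,%h0": exchange bits 0-7 with
   bits 8-15 and leave the upper 16 bits untouched.  */
static uint32_t
model_swap_lo (uint32_t x)
{
  return (x & 0xffff0000u)
         | ((x & 0x000000ffu) << 8)
         | ((x & 0x0000ff00u) >> 8);
}

/* Hypothetical model of "roll $0x10,%0": rotate by 16 bits, which
   exchanges the upper and lower halves of the word.  */
static uint32_t
model_rot16 (uint32_t x)
{
  return (x << 16) | (x >> 16);
}

/* Same composition as the three-macro grub_swap_bytes32.  */
static uint32_t
model_swap_bytes32 (uint32_t x)
{
  return model_swap_lo (model_rot16 (model_swap_lo (x)));
}

/* The composition must reverse all four bytes, and applying it
   twice must give back the original value.  */
static void
model_self_check (void)
{
  assert (model_swap_bytes32 (0x12345678u) == 0x78563412u);
  assert (model_swap_bytes32 (model_swap_bytes32 (0xdeadbeefu))
          == 0xdeadbeefu);
}
```

Tracing 0x12345678 through the steps gives 0x12347856, then 0x78561234,
then 0x78563412, which is the fully byte-swapped value.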
Sum of module sizes ("du -b -S"):
  366976  original
  364988  "=q", xchgb
  365460  "=q", rolw $8,%w0
  365420  "=r", rolw $8,%w0
  364996  monstrosity ("=r", conditional xchgb/rolw)
  364728  "little details"
With my compiler, everything beats the code we have.
--
Regards,
Pavel Roskin