Pavel Roskin wrote:
> On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote:
>
> > With old gcc versions without the "rol" optimization, even the 16
> > bit swap should be a function:
> >
>
> Or better yet, an asm statement.
>
> We should consider optimized assembly vs function call.  Even the
> 32-bit swap could be shorter:
>
>    a:   86 c4       xchg   %al,%ah
>    c:   c1 c0 10    rol    $0x10,%eax
>    f:   86 c4       xchg   %al,%ah
>   11:
>
> That's 7 bytes! ...

But the function call in the 32-bit case requires only 5 bytes :-)
(The 16-bit case requires more due to short->int propagation.)

Overall, the size with inline asm would only be smaller if the compiler
gains something from additional optimizations around the inlined code.

Here is a possibly working draft for include/grub/i386/types.h
(requires a #ifdef in include/grub/types.h):

#define grub_swap_bytes32(x) \
({ \
   grub_uint32_t _x = (x), _y; \
   asm ( \
     "xchg %%al,%%ah\n" \
     "roll $0x10,%%eax\n" \
     "xchg %%al,%%ah\n" \
     : "=a"(_y) : "0"(_x) \
   ); \
   _y; \
})

Result with the test script from my last mail:

Debian gcc 4.1.2-7:
inline (portable)=357, inline (asm)=126, function=104

Cygwin gcc 3.4.4:
inline (portable)=340, inline (asm)=124, function=96

The function call is still better. The only candidate for inlining is
probably grub_swap_bytes16().

> .... And if written properly, it could work with any of
> the registers that allow access to the lower two bytes (%eax, %ebx,
> %ecx and %edx), thus giving more flexibility to the optimizer.
>

This would require a way to name the Rl and Rh parts of eRx for each R
in [a-d]. Something like:

   asm (
     "xchg %0:l,%0:h\n"
     "roll $0x10,%0\n"
     "xchg %0:l,%0:h\n"
     : "=r"(_y) : "0"(_x)
   );

Do gcc or gas provide a syntax to do this?

Christian

_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel
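
On the closing question: as far as I can tell, gcc's x86 operand
modifiers cover this. The "Q" constraint limits an operand to the a, b,
c or d register, and the modifiers %b0 and %h0 print the low-byte and
high-byte names of operand 0 (%k0 prints the 32-bit name). A minimal
sketch under those assumptions follows. The function names are made up
for illustration and are not from GRUB, the non-x86 fallback is only
there so the sketch compiles elsewhere, and I have not measured the
resulting code size.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference version (hypothetical name, not from GRUB). */
static inline uint32_t swap_bytes32_portable(uint32_t x)
{
  return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8)
       | ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/* Sketch of the register-agnostic asm variant (hypothetical name). */
static inline uint32_t swap_bytes32_sketch(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  /* "+Q": read-write operand in a, b, c or d, chosen by the compiler.
     %b0 / %h0: low / high byte register of operand 0 (e.g. %bl / %bh).
     %k0: 32-bit name of operand 0 (e.g. %ebx). */
  __asm__ ("xchg %b0,%h0\n\t"
           "roll $0x10,%k0\n\t"
           "xchg %b0,%h0"
           : "+Q" (x)
           :
           : "cc");
  return x;
#else
  /* Assumption: on other targets just fall back to the portable code. */
  return swap_bytes32_portable(x);
#endif
}

/* Quick self-check against the portable version; aborts on mismatch. */
static inline void swap_sketch_selfcheck(void)
{
  assert(swap_bytes32_sketch(0x12345678u)
         == swap_bytes32_portable(0x12345678u));
}
```

With this form the compiler is free to pick whichever of the four
registers is convenient, which is the flexibility Pavel asked for.
Whether that beats the 5-byte function call would still have to be
checked with the test script.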