On Fri, 11 Nov 2011, Andrew Pinski wrote:

> On Fri, Nov 11, 2011 at 12:41 PM, Chris Metcalf <cmetc...@tilera.com> wrote:
> > (The 16-bit swap would be done via __builtin_bswap32(x) >> 16.)
> > If it's no worse for any platform, and better for some, that's
> > probably sufficient reason to make the change in glibc to use it.
>
> It does produce worse code if the target does not implement the
> patterns because the function is not inlined by default. It produces
> a call to bswapsi and bswapdi.

Out-of-line does not necessarily mean "worse". A call will be slower in
isolation, but it probably results in smaller code size (for bswap32 and
bswap64, that is), and for most code in practice it seems that the speed
gains from reduced cache usage outweigh the local slowdown of the
smaller code.

-- 
Joseph S. Myers
jos...@codesourcery.com
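
For reference, a minimal sketch of the 16-bit swap discussed above, assuming a compiler that provides __builtin_bswap32 (GCC and Clang do). The helper names swap16_via_bswap32 and swap16_portable are made up for illustration and are not glibc's. Whether the builtin expands inline or becomes a library call depends on the target, as the thread notes.

```c
#include <stdint.h>

/* 16-bit swap built on the 32-bit builtin: the two bytes of x land in
 * the top half of the swapped word, so shifting right by 16 yields
 * them in reversed order.  (Hypothetical helper name.) */
static inline uint16_t swap16_via_bswap32(uint16_t x)
{
    return (uint16_t)(__builtin_bswap32((uint32_t)x) >> 16);
}

/* Plain shift-and-or version for comparison; needs no builtin.
 * (Hypothetical helper name.) */
static inline uint16_t swap16_portable(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```

Both helpers should return the same value for every 16-bit input, so the choice between them comes down to the code the compiler emits for a given target.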