http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46453
Summary: MIPS backend is not using special instructions for __builtin_bswap32
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: n...@chello.at
Host: i686
Target: mips-elf
Build: ../configure --disable-libssp --prefix=/usr/local --target=mips-elf

MIPS32 Release 2 introduced a special instruction called wsbh that can be used for 32- and 16-bit byte swaps. However, GCC never produces this instruction.

All code snippets are produced with "-march=mips32r2 -O3" (which should enable the wsbh, rotr and ins instructions and optimized code using them). The assembly assumes big-endian.

* __builtin_bswap32:
  e.g. "v0 = __builtin_bswap32(a0);" should result in

    wsbh v0, a0
    rotr v0, v0, 16

* 16-bit byte swaps:
  similarly, "v0 = ((a0 >> 8) | (a0 << 8));" (with a0 and v0 being 16-bit unsigned ints) should result in:

    wsbh v0, a0

  As it is now, __builtin_bswap32 always results in a function call (already at least 2 instructions) plus the implementation, which uses 9 instructions. A 16-bit bswap results in 4 instructions. So that would be a nice saving, especially in the 32-bit case.

* Unaligned loads:
  Less importantly, unaligned 16-bit loads could be optimized a bit as well if the ins instruction is available:

--Code sample (unaligned 16bit load):
#pragma pack(push,1)
union Unaligned
{
    unsigned char c[2];
    unsigned short u16;
};
#pragma pack(pop)

unsigned short readUnaligned16(const void *ptr)
{
    return ((const union Unaligned *)(ptr))->u16;
}
--

  The code sample results in this sequence:

    # a0 = ptr, v0 = return value
    lbu v0,0(a0)
    lbu v1,1(a0)
    sll v0,v0,0x8
    or  v0,v1,v0

  Better would be:

    # a0 = ptr, v0 = return value
    lbu v1,0(a0)
    lbu v0,1(a0)
    ins v0,v1,8,8

Generating this sequence for unaligned 16-bit loads would be a nice start.
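To illustrate why wsbh followed by rotr is enough for a 32-bit byte swap, here is a minimal C sketch that models the two instructions. The function names (model_wsbh, model_rotr, bswap32_via_wsbh, bswap16_via_wsbh) are hypothetical helpers made up for this illustration; they are not GCC or MIPS APIs, and the models only assume the documented behaviour of the instructions (wsbh swaps the bytes within each halfword, rotr rotates right).

```c
#include <stdint.h>

/* Hypothetical model of MIPS32r2 "wsbh": swap the two bytes
   inside each 16-bit halfword of a 32-bit word. */
uint32_t model_wsbh(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
}

/* Hypothetical model of "rotr": rotate right by n bits.
   Assumes 0 < n < 32 so that neither shift is undefined in C. */
uint32_t model_rotr(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* wsbh swaps bytes within the halfwords, rotr 16 then swaps the
   halfwords themselves: together a full 32-bit byte swap. */
uint32_t bswap32_via_wsbh(uint32_t x)
{
    return model_rotr(model_wsbh(x), 16);
}

/* For a zero-extended 16-bit value the upper halfword is zero,
   so wsbh alone yields the swapped 16-bit result. */
uint16_t bswap16_via_wsbh(uint16_t x)
{
    return (uint16_t)model_wsbh(x);
}
```

For example, bswap32_via_wsbh(0x11223344) yields 0x44332211 and bswap16_via_wsbh(0x1234) yields 0x3412, matching what the shift-and-or forms compute.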
Beyond that, there could be generic optimizations where sequences of left-shift and or are replaced with ins instructions, as long as it can be verified that the registers have enough explicitly zeroed bits so that they don't "overlap". Similarly, right-shift and masking could be replaced by ext instructions, e.g. v0 = ((a0 >> 8) & 0xFF) is equivalent to ext v0,a0,8,8.
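The "no overlap" condition can be made concrete with a small C sketch. The helpers below (model_ext, model_ins, merge16_shift_or, merge16_ins) are hypothetical names invented for this illustration. They assume the usual operand order "ext rt, rs, pos, size" and "ins rt, rs, pos, size", and that pos + size <= 32.

```c
#include <stdint.h>

/* Build a mask of 'size' low bits (assumes 1 <= size <= 32). */
static uint32_t low_mask(unsigned size)
{
    return (size >= 32u) ? 0xFFFFFFFFu : ((1u << size) - 1u);
}

/* Hypothetical model of "ext rt, rs, pos, size": return the
   'size' bits of rs starting at bit 'pos', zero-extended. */
uint32_t model_ext(uint32_t rs, unsigned pos, unsigned size)
{
    return (rs >> pos) & low_mask(size);
}

/* Hypothetical model of "ins rt, rs, pos, size": return rt with
   bits [pos, pos+size) replaced by the low 'size' bits of rs. */
uint32_t model_ins(uint32_t rt, uint32_t rs, unsigned pos, unsigned size)
{
    uint32_t mask = low_mask(size) << pos;
    return (rt & ~mask) | ((rs << pos) & mask);
}

/* Shift-and-or merge of two bytes, as in the unaligned load. */
uint32_t merge16_shift_or(uint32_t hi, uint32_t lo)
{
    return (hi << 8) | lo;
}

/* The same merge expressed with ins. It only matches the
   shift-and-or form when both inputs are known to fit in 8 bits. */
uint32_t merge16_ins(uint32_t hi, uint32_t lo)
{
    return model_ins(lo, hi, 8, 8);
}
```

With hi = 0x12 and lo = 0x34 both forms give 0x1234. With hi = 0x1FF the two disagree (0x1FF34 versus 0xFF34), which is why the replacement is only valid when the compiler can prove the upper bits are zero, as is the case after lbu.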