I was hoping I could ask an ARM backend maintainer to look over the following patch.
I was examining the code generated for the following C snippet on a
Raspberry Pi,

    static inline int popcount_lut8(unsigned *buf, int n)
    {
      int cnt=0;
      unsigned int i;
      do {
        i = *buf;
        cnt += lut[i&255];
        cnt += lut[i>>8&255];
        cnt += lut[i>>16&255];
        cnt += lut[i>>24];
        buf++;
      } while(--n);
      return cnt;
    }

and was surprised to see the following instruction sequence generated by
the compiler:

    mov   r5, r2, lsr #8
    uxtb  r5, r5

This sequence can be performed by a single ARM instruction:

    uxtb  r5, r2, ror #8

The attached patch allows GCC's combine pass to take advantage of ARM's
uxtb with rotate functionality to implement the above zero_extract, and
likewise to use sxtb with rotate to implement sign_extract. ARM's
extend instructions can only be used with rotates of 0, 8, 16 and 24,
and of these only 8 and 16 are useful [ror #0 is a nop, and extends with
ror #24 can be implemented using regular shifts], so the approach here is
to add the six missing but useful instructions as six different
define_insn patterns in arm.md, rather than try to be clever with new
predicates.

Alas, later ARM hardware has advanced bit field instructions, and earlier
ARM cores didn't support extend-with-rotate, so this appears to benefit
only ARMv6-era CPUs.

The following patch has been minimally tested by building cc1 of a
cross-compiler and confirming that the desired instructions appear in the
assembly output for the test case. Alas, my minimal Raspberry Pi hardware
is unlikely to be able to bootstrap GCC or run the testsuite, so I'm
hoping an ARM expert can check (and confirm) whether this change is safe
and suitable. [Thanks in advance and apologies for any inconvenience].

2018-01-14  Roger Sayle  <ro...@nextmovesoftware.com>

        * config/arm/arm.md (*arm_zeroextractsi2_8_8,
        *arm_signextractsi2_8_8, *arm_zeroextractsi2_8_16,
        *arm_signextractsi2_8_16, *arm_zeroextractsi2_16_8,
        *arm_signextractsi2_16_8): New.

2018-01-14  Roger Sayle  <ro...@nextmovesoftware.com>

        * gcc.target/arm/extend-ror.c: New test.
Cheers,
Roger
--
Roger Sayle, PhD.
NextMove Software Limited
Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY
arm_zext.log
arm_zext.patch
/* { dg-do compile } */
/* { dg-options "-O -march=armv6" } */
/* { dg-prune-output "switch .* conflicts with" } */

unsigned int zeroextractsi2_8_8(unsigned int x)
{
  return (unsigned char)(x>>8);
}

unsigned int zeroextractsi2_8_16(unsigned int x)
{
  return (unsigned char)(x>>16);
}

unsigned int signextractsi2_8_8(unsigned int x)
{
  return (int)(signed char)(x>>8);
}

unsigned int signextractsi2_8_16(unsigned int x)
{
  return (int)(signed char)(x>>16);
}

unsigned int zeroextractsi2_16_8(unsigned int x)
{
  return (unsigned short)(x>>8);
}

unsigned int signextractsi2_16_8(unsigned int x)
{
  return (int)(short)(x>>8);
}

/* { dg-final { scan-assembler-times ", ror #8" 4 } } */
/* { dg-final { scan-assembler-times ", ror #16" 2 } } */
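
The test above only checks that the rotate forms appear in the assembly. As a complement, the equivalence the six patterns rely on can be checked on any host with a small portable C model. This is only a sketch and is not part of the patch: ror32, uxtb_ror, uxth_ror, sxtb_ror, sxth_ror, zero_extract, sign_extract and count_mismatches are hypothetical helper names that model the instruction semantics in software, and the sample values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; models the ", ror #n" operand (n is 8 or 16 here). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);            /* keeps both shifts well defined */
    return (x >> n) | (x << (32 - n));
}

/* Hypothetical software models of the extend-with-rotate instructions. */
static uint32_t uxtb_ror(uint32_t x, unsigned n) { return ror32(x, n) & 0xffu; }
static uint32_t uxth_ror(uint32_t x, unsigned n) { return ror32(x, n) & 0xffffu; }

static uint32_t sxtb_ror(uint32_t x, unsigned n)
{
    uint32_t b = ror32(x, n) & 0xffu;
    return (b ^ 0x80u) - 0x80u;         /* sign-extend bit 7, unsigned arithmetic */
}

static uint32_t sxth_ror(uint32_t x, unsigned n)
{
    uint32_t h = ror32(x, n) & 0xffffu;
    return (h ^ 0x8000u) - 0x8000u;     /* sign-extend bit 15, unsigned arithmetic */
}

/* Reference: take `width` bits starting at bit `pos` using shift and mask. */
static uint32_t zero_extract(uint32_t x, unsigned width, unsigned pos)
{
    return (x >> pos) & ((1u << width) - 1u);
}

/* Reference: same field, with its top bit copied into the upper bits. */
static uint32_t sign_extract(uint32_t x, unsigned width, unsigned pos)
{
    uint32_t mask = (1u << width) - 1u;
    uint32_t field = (x >> pos) & mask;
    return (field & (1u << (width - 1))) ? (field | ~mask) : field;
}

/* Compare the six rotate forms against the references; returns the number
   of disagreements over a few arbitrary sample values. */
static int count_mismatches(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0xffffffffu, 0x12345678u, 0x80808080u,
        0x7f7f7f7fu, 0x00ff00ffu, 0xff00ff00u, 0xdeadbeefu
    };
    int bad = 0;
    unsigned i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];
        bad += uxtb_ror(x, 8)  != zero_extract(x, 8, 8);
        bad += uxtb_ror(x, 16) != zero_extract(x, 8, 16);
        bad += sxtb_ror(x, 8)  != sign_extract(x, 8, 8);
        bad += sxtb_ror(x, 16) != sign_extract(x, 8, 16);
        bad += uxth_ror(x, 8)  != zero_extract(x, 16, 8);
        bad += sxth_ror(x, 8)  != sign_extract(x, 16, 8);
    }
    return bad;
}
```

Calling count_mismatches() from a main function and checking that it returns 0 exercises all six cases. The model says nothing about what GCC actually emits; that remains the job of the scan-assembler-times directives.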