On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > All of the optimizations/transformations mentioned in bugzilla for > PR tree-optimization/40210 are already implemented in mainline GCC, > with one exception. In comment #5, there's a suggestion that > (bswap64(x)>>56)&0xff can be implemented without the bswap as > (unsigned char)x, or equivalently x&0xff. > > This patch implements the above optimization, and closely related > variants. For any single bit, (bswap(X)>>C1)&1 can be simplified > to (X>>C2)&1, where bit position C2 is the appropriate permutation > of C1. Similarly, the bswap can eliminated if the desired set of > bits all lie within the same byte, hence (bswap(x)>>8)&255 can > always be optimized, as can (bswap(x)>>8)&123. > > Previously, > > int foo(long long x) { > return (__builtin_bswap64(x) >> 56) & 0xff; > } > > compiled with -O2 to > foo: movq %rdi, %rax > bswap %rax > shrq $56, %rax > ret > > with this patch, it now compiles to > foo: movzbl %dil, %eax > ret > > This patch has been tested on x86_64-pc-linux-gnu with a "make > bootstrap" and "make -k check" with no new failures. > > Ok for mainline?
I don't like get_builtin_precision too much, did you consider simply using + (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1)) INTEGER_CST@2)) and TYPE_PRECISION (TREE_TYPE (@3))? I think while we'll see argument promotion and thus cannot use @1 to derive the type the return value will be the original type. Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you also use char_type_node. I wonder whether + /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & C2. */ + (simplify + (bit_and (convert1? (rshift@0 (convert2? (bswap @1)) INTEGER_CST@2)) + INTEGER_CST@3) and + /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2. */ + (simplify + (rshift (convert? (bswap @0)) INTEGER_CST@1) can build upon each other, for example by extending the latter to handle more cases, transforming to ((T)x >> C2) & C3? That might of course be only profitable when the bswap goes away. Thanks, Richard. > > > 2021-07-06 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR tree-optimization/40210 > * builtins.c (get_builtin_precision): Helper function to determine > the precision in bits of a built-in function. > * builtins.h (get_builtin_precision): Prototype here. > * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as > (x>>C3)&C2 when possible. Simplify bswap(x)>>C1 as ((T)x)>>C2 > when possible. Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255. > > gcc/testsuite/ChangeLog > PR tree-optimization/40210 > * gcc.dg/builtin-bswap-13.c: New test. > * gcc.dg/builtin-bswap-14.c: New test. > > Roger > -- > Roger Sayle > NextMove Software > Cambridge, UK >