On Thu, Jul 8, 2021 at 2:51 AM Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Thu, Jul 8, 2021 at 9:37 AM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > > > > Hi Richard, > > Thanks. Yep, you've correctly the diagnosed that the motivation for the > > get_builtin_precision helper function was that the TREE_TYPE of the > > argument is affected by argument promotion. Your suggestion to instead > > use the TREE_TYPE of the function result is a much nicer solution. > > > > I also agree that that all of these bswap optimizations make the assumption > > that BITS_PER_UNIT is 8 (i.e. that bytes are 8-bits), and some that the > > front-end supports an 8-bit type (i.e. that CHAR_TYPE_SIZE is 8), which > > can be checked explicitly. > > > > Both of these improvements are implemented in the attached revised patch, > > which has been tested on x86_64-pc-linux-gnu with a "make bootstrap" > > and "make -k check" with no new failures. > > > > Ok for mainline? > > OK. > > Thanks, > Richard. > > > 2021-07-08 Roger Sayle <ro...@nextmovesoftware.com> > > Richard Biener <rguent...@suse.de> > > > > gcc/ChangeLog > > PR tree-optimization/40210 > > * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as > > (x>>C3)&C2 when possible. Simplify bswap(x)>>C1 as ((T)x)>>C2 > > when possible. Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255. > > > > gcc/testsuite/ChangeLog > > PR tree-optimization/40210 > > * gcc.dg/builtin-bswap-13.c: New test. > > * gcc.dg/builtin-bswap-14.c: New test. > > > > Roger > > -- > > > > -----Original Message----- > > From: Richard Biener <richard.guent...@gmail.com> > > Sent: 07 July 2021 08:56 > > To: Roger Sayle <ro...@nextmovesoftware.com> > > Cc: GCC Patches <gcc-patches@gcc.gnu.org> > > Subject: Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to > > (X>>C3)&C2 in match.pd > > > > On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle <ro...@nextmovesoftware.com> > > wrote: > > > > > > > > > All of the optimizations/transformations mentioned in bugzilla for PR > > > tree-optimization/40210 are already implemented in mainline GCC, with > > > one exception. In comment #5, there's a suggestion that > > > (bswap64(x)>>56)&0xff can be implemented without the bswap as > > > (unsigned char)x, or equivalently x&0xff. > > > > > > This patch implements the above optimization, and closely related > > > variants. For any single bit, (bswap(X)>>C1)&1 can be simplified to > > > (X>>C2)&1, where bit position C2 is the appropriate permutation of C1. > > > Similarly, the bswap can eliminated if the desired set of bits all lie > > > within the same byte, hence (bswap(x)>>8)&255 can always be optimized, > > > as can (bswap(x)>>8)&123. > > > > > > Previously, > > > > > > int foo(long long x) { > > > return (__builtin_bswap64(x) >> 56) & 0xff; } > > > > > > compiled with -O2 to > > > foo: movq %rdi, %rax > > > bswap %rax > > > shrq $56, %rax > > > ret > > > > > > with this patch, it now compiles to > > > foo: movzbl %dil, %eax > > > ret > > > > > > This patch has been tested on x86_64-pc-linux-gnu with a "make > > > bootstrap" and "make -k check" with no new failures. > > > > > > Ok for mainline? > > > > I don't like get_builtin_precision too much, did you consider simply using > > > > + (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1)) > > + INTEGER_CST@2)) > > > > and TYPE_PRECISION (TREE_TYPE (@3))? I think while we'll see argument > > promotion and thus cannot use @1 to derive the type the return value will > > be the original type. > > > > Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you > > also use char_type_node. > > > > I wonder whether > > > > + /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) & > > + C2. */ (simplify (bit_and (convert1? (rshift@0 (convert2? (bswap > > + @1)) INTEGER_CST@2)) > > + INTEGER_CST@3) > > > > and > > > > + /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2. */ > > + (simplify (rshift (convert? (bswap @0)) INTEGER_CST@1) > > > > can build upon each other, for example by extending the latter to handle > > more cases, transforming to ((T)x >> C2) & C3? > > That might of course be only profitable when the bswap goes away. > > > > Thanks, > > Richard. > > > > > > > > > > > 2021-07-06 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > PR tree-optimization/40210 > > > * builtins.c (get_builtin_precision): Helper function to determine > > > the precision in bits of a built-in function. > > > * builtins.h (get_builtin_precision): Prototype here. > > > * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as > > > (x>>C3)&C2 when possible. Simplify bswap(x)>>C1 as ((T)x)>>C2 > > > when possible. Simplify bswap(x)&C1 as (x>>C2)&C1 when > > > 0<=C1<=255. > > > > > > gcc/testsuite/ChangeLog > > > PR tree-optimization/40210 > > > * gcc.dg/builtin-bswap-13.c: New test. > > > * gcc.dg/builtin-bswap-14.c: New test. > > > > > > Roger > > > -- > > > Roger Sayle > > > NextMove Software > > > Cambridge, UK > > >
This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101403 -- H.J.