On Thu, Jul 8, 2021 at 2:51 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jul 8, 2021 at 9:37 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> >
> >
> > Hi Richard,
> > Thanks. Yep, you've correctly the diagnosed that the motivation for the
> > get_builtin_precision helper function was that the TREE_TYPE of the
> > argument is affected by argument promotion.  Your suggestion to instead
> > use the TREE_TYPE of the function result is a much nicer solution.
> >
> > I also agree that that all of these bswap optimizations make the assumption
> > that BITS_PER_UNIT is 8 (i.e. that bytes are 8-bits), and some that the
> > front-end supports an 8-bit type (i.e. that CHAR_TYPE_SIZE is 8), which
> > can be checked explicitly.
> >
> > Both of these improvements are implemented in the attached revised patch,
> > which has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> > and "make -k check" with no new failures.
> >
> > Ok for mainline?
>
> OK.
>
> Thanks,
> Richard.
>
> > 2021-07-08  Roger Sayle  <ro...@nextmovesoftware.com>
> >             Richard Biener  <rguent...@suse.de>
> >
> > gcc/ChangeLog
> >         PR tree-optimization/40210
> >         * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
> >         (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
> >         when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
> >
> > gcc/testsuite/ChangeLog
> >         PR tree-optimization/40210
> >         * gcc.dg/builtin-bswap-13.c: New test.
> >         * gcc.dg/builtin-bswap-14.c: New test.
> >
> > Roger
> > --
> >
> > -----Original Message-----
> > From: Richard Biener <richard.guent...@gmail.com>
> > Sent: 07 July 2021 08:56
> > To: Roger Sayle <ro...@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to 
> > (X>>C3)&C2 in match.pd
> >
> > On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle <ro...@nextmovesoftware.com> 
> > wrote:
> > >
> > >
> > > All of the optimizations/transformations mentioned in bugzilla for PR
> > > tree-optimization/40210 are already implemented in mainline GCC, with
> > > one exception.  In comment #5, there's a suggestion that
> > > (bswap64(x)>>56)&0xff can be implemented without the bswap as
> > > (unsigned char)x, or equivalently x&0xff.
> > >
> > > This patch implements the above optimization, and closely related
> > > variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified to
> > > (X>>C2)&1, where bit position C2 is the appropriate permutation of C1.
> > > Similarly, the bswap can eliminated if the desired set of bits all lie
> > > within the same byte, hence (bswap(x)>>8)&255 can always be optimized,
> > > as can (bswap(x)>>8)&123.
> > >
> > > Previously,
> > >
> > > int foo(long long x) {
> > >   return (__builtin_bswap64(x) >> 56) & 0xff; }
> > >
> > > compiled with -O2 to
> > > foo:    movq    %rdi, %rax
> > >         bswap   %rax
> > >         shrq    $56, %rax
> > >         ret
> > >
> > > with this patch, it now compiles to
> > > foo:    movzbl  %dil, %eax
> > >         ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with a "make
> > > bootstrap" and "make -k check" with no new failures.
> > >
> > > Ok for mainline?
> >
> > I don't like get_builtin_precision too much, did you consider simply using
> >
> > +  (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1))
> > + INTEGER_CST@2))
> >
> > and TYPE_PRECISION (TREE_TYPE (@3))?  I think while we'll see argument 
> > promotion and thus cannot use @1 to derive the type the return value will 
> > be the original type.
> >
> > Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you 
> > also use char_type_node.
> >
> > I wonder whether
> >
> > + /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) &
> > + C2.  */ (simplify  (bit_and (convert1? (rshift@0 (convert2? (bswap
> > + @1)) INTEGER_CST@2))
> > +          INTEGER_CST@3)
> >
> > and
> >
> > + /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2.  */
> > + (simplify  (rshift (convert? (bswap @0)) INTEGER_CST@1)
> >
> > can build upon each other, for example by extending the latter to handle 
> > more cases, transforming to ((T)x >> C2) & C3?
> > That might of course be only profitable when the bswap goes away.
> >
> > Thanks,
> > Richard.
> >
> > >
> > >
> > > 2021-07-06  Roger Sayle  <ro...@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR tree-optimization/40210
> > >         * builtins.c (get_builtin_precision): Helper function to determine
> > >         the precision in bits of a built-in function.
> > >         * builtins.h (get_builtin_precision): Prototype here.
> > >         * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
> > >         (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
> > >         when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 
> > > 0<=C1<=255.
> > >
> > > gcc/testsuite/ChangeLog
> > >         PR tree-optimization/40210
> > >         * gcc.dg/builtin-bswap-13.c: New test.
> > >         * gcc.dg/builtin-bswap-14.c: New test.
> > >
> > > Roger
> > > --
> > > Roger Sayle
> > > NextMove Software
> > > Cambridge, UK
> > >

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101403

-- 
H.J.

Reply via email to