On Thu, Jul 8, 2021 at 9:37 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Richard,
> Thanks. Yep, you've correctly the diagnosed that the motivation for the
> get_builtin_precision helper function was that the TREE_TYPE of the
> argument is affected by argument promotion.  Your suggestion to instead
> use the TREE_TYPE of the function result is a much nicer solution.
>
> I also agree that that all of these bswap optimizations make the assumption
> that BITS_PER_UNIT is 8 (i.e. that bytes are 8-bits), and some that the
> front-end supports an 8-bit type (i.e. that CHAR_TYPE_SIZE is 8), which
> can be checked explicitly.
>
> Both of these improvements are implemented in the attached revised patch,
> which has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

OK.

Thanks,
Richard.

> 2021-07-08  Roger Sayle  <ro...@nextmovesoftware.com>
>             Richard Biener  <rguent...@suse.de>
>
> gcc/ChangeLog
>         PR tree-optimization/40210
>         * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
>         (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
>         when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/40210
>         * gcc.dg/builtin-bswap-13.c: New test.
>         * gcc.dg/builtin-bswap-14.c: New test.
>
> Roger
> --
>
> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: 07 July 2021 08:56
> To: Roger Sayle <ro...@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] PR tree-opt/40210: Fold (bswap(X)>>C1)&C2 to (X>>C3)&C2 
> in match.pd
>
> On Tue, Jul 6, 2021 at 9:01 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> >
> >
> > All of the optimizations/transformations mentioned in bugzilla for PR
> > tree-optimization/40210 are already implemented in mainline GCC, with
> > one exception.  In comment #5, there's a suggestion that
> > (bswap64(x)>>56)&0xff can be implemented without the bswap as
> > (unsigned char)x, or equivalently x&0xff.
> >
> > This patch implements the above optimization, and closely related
> > variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified to
> > (X>>C2)&1, where bit position C2 is the appropriate permutation of C1.
> > Similarly, the bswap can eliminated if the desired set of bits all lie
> > within the same byte, hence (bswap(x)>>8)&255 can always be optimized,
> > as can (bswap(x)>>8)&123.
> >
> > Previously,
> >
> > int foo(long long x) {
> >   return (__builtin_bswap64(x) >> 56) & 0xff; }
> >
> > compiled with -O2 to
> > foo:    movq    %rdi, %rax
> >         bswap   %rax
> >         shrq    $56, %rax
> >         ret
> >
> > with this patch, it now compiles to
> > foo:    movzbl  %dil, %eax
> >         ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with a "make
> > bootstrap" and "make -k check" with no new failures.
> >
> > Ok for mainline?
>
> I don't like get_builtin_precision too much, did you consider simply using
>
> +  (bit_and (convert1? (rshift@0 (convert2? (bswap@3 @1))
> + INTEGER_CST@2))
>
> and TYPE_PRECISION (TREE_TYPE (@3))?  I think while we'll see argument 
> promotion and thus cannot use @1 to derive the type the return value will be 
> the original type.
>
> Now, I see '8' being used which likely should be CHAR_TYPE_SIZE since you 
> also use char_type_node.
>
> I wonder whether
>
> + /* (bswap(x) >> C1) & C2 can sometimes be simplified to (x >> C3) &
> + C2.  */ (simplify  (bit_and (convert1? (rshift@0 (convert2? (bswap
> + @1)) INTEGER_CST@2))
> +          INTEGER_CST@3)
>
> and
>
> + /* bswap(x) >> C1 can sometimes be simplified to (T)x >> C2.  */
> + (simplify  (rshift (convert? (bswap @0)) INTEGER_CST@1)
>
> can build upon each other, for example by extending the latter to handle more 
> cases, transforming to ((T)x >> C2) & C3?
> That might of course be only profitable when the bswap goes away.
>
> Thanks,
> Richard.
>
> >
> >
> > 2021-07-06  Roger Sayle  <ro...@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR tree-optimization/40210
> >         * builtins.c (get_builtin_precision): Helper function to determine
> >         the precision in bits of a built-in function.
> >         * builtins.h (get_builtin_precision): Prototype here.
> >         * match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
> >         (x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
> >         when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.
> >
> > gcc/testsuite/ChangeLog
> >         PR tree-optimization/40210
> >         * gcc.dg/builtin-bswap-13.c: New test.
> >         * gcc.dg/builtin-bswap-14.c: New test.
> >
> > Roger
> > --
> > Roger Sayle
> > NextMove Software
> > Cambridge, UK
> >

Reply via email to