Jakub,

On Wed, 11 Nov 2020 at 11:31, Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Wed, Nov 11, 2020 at 11:17:32AM +0100, Philipp Tomsich wrote:
> > From: Philipp Tomsich <p...@gnu.org>
> >
> > The function
> >     long f(long a)
> >     {
> >       return (a & 0xFFFFFFFFull) << 3;
> >     }
> > is folded into
> >     _1 = a_2(D) << 3;
> >     _3 = _1 & 34359738360;
> > whereas the construction
> >     return (a & 0xFFFFFFFFull) * 8;
> > results in
> >     _1 = a_2(D) & 4294967295;
> >     _3 = _1 * 8;
> >
> > This leads to suboptimal code generation for RISC-V (-march=rv64g): the
> > shifted constant needs to be expanded into 3 RTX, plus 2 RTX (one each
> > for the LSHIFT_EXPR and the BIT_AND_EXPR), which overwhelms the combine
> > pass (a sequence of 5 RTX is not considered):
> >       li      a5,1            # tmp78,        # 23    [c=4 l=4]  *movdi_64bit/1
> >       slli    a5,a5,35        #, tmp79, tmp78 # 24    [c=4 l=4]  ashldi3
> >       addi    a5,a5,-8        #, tmp77, tmp79 # 9     [c=4 l=4]  adddi3/1
> >       slli    a0,a0,3         #, tmp76, tmp80 # 6     [c=4 l=4]  ashldi3
> >       and     a0,a0,a5        # tmp77,, tmp76 # 15    [c=4 l=4]  anddi3/0
> >       ret                     # 28    [c=0 l=4]  simple_return
> > instead of:
> >       slli    a0,a0,32        #, tmp76, tmp79 # 26    [c=4 l=4]  ashldi3
> >       srli    a0,a0,29        #,, tmp76       # 27    [c=4 l=4]  lshrdi3
> >       ret                                     # 24    [c=0 l=4]  simple_return
> >
> > We address this by adding a simplification for
> >    (a << s) & M, where ((M >> s) << s) == M
> > to
> >    (a & M_unshifted) << s, where M_unshifted := (M >> s)
> > which undistributes the LSHIFT.
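
(Interjecting a quick sanity check here, for reference: the side condition
above is easy to confirm with the concrete constants from the example.  This
is a minimal, self-contained C check, purely illustrative and not part of
the patch.)

    #include <assert.h>

    int main(void)
    {
      /* M = 34359738360 = 0xFFFFFFFFull << 3: the low s bits of M are
         clear, so ((M >> s) << s) == M and the AND can be moved below
         the shift.  */
      const unsigned long M = 34359738360ul;
      const unsigned s = 3;
      assert (((M >> s) << s) == M);

      /* (a << s) & M  ==  (a & (M >> s)) << s for arbitrary a,
         including values with bits above the mask.  */
      assert (((~0ul << s) & M) == ((~0ul & (M >> s)) << s));
      for (unsigned long a = 0; a < (1ul << 40); a += 0xFFFFFFul)
        assert (((a << s) & M) == ((a & (M >> s)) << s));
      return 0;
    }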
>
> This is problematic; we have another rule that goes against it:
> /* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
>    (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
> (for shift (lshift rshift)
>  (for bit_op (bit_and bit_xor bit_ior)
>   (simplify
>    (shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1)
>    (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>     (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
>      (bit_op (shift (convert @0) @1) { mask; }))))))
> and we don't want the two rules to keep fighting against each other.
> It is better to have one form as canonical and to decide only right before
> expansion (in the isel pass) or during expansion, e.g. based on target
> costs, whether (X << C1) & (C2 << C1) is better expanded like that, or as
> (X & C2) << C1, or as (X << C3) >> C4.
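
To make those three shapes concrete for the example above (C1 = 3,
C2 = 0xFFFFFFFF, and C3 = 32, C4 = 29 on a 64-bit target, matching the
slli/srli pair in the dump), here is a small illustrative C check that all
three agree; the form* names are mine, not from the patch:

    #include <assert.h>

    _Static_assert (sizeof (unsigned long) == 8, "example assumes LP64");

    /* (X << C1) & (C2 << C1) */
    static unsigned long form1 (unsigned long x)
    { return (x << 3) & (0xFFFFFFFFul << 3); }

    /* (X & C2) << C1 */
    static unsigned long form2 (unsigned long x)
    { return (x & 0xFFFFFFFFul) << 3; }

    /* (X << C3) >> C4, logical shifts via unsigned arithmetic */
    static unsigned long form3 (unsigned long x)
    { return (x << 32) >> 29; }

    int main(void)
    {
      const unsigned long tests[] = { 0, 1, 0xFFFFFFFFul, ~0ul,
                                      0x0123456789ABCDEFul };
      for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
        {
          assert (form1 (tests[i]) == form2 (tests[i]));
          assert (form2 (tests[i]) == form3 (tests[i]));
        }
      return 0;
    }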


The patch addresses this by disallowing that rule if an exact power of 2 is
seen as C1.  The reason I would prefer to have this canonicalised the same
way (X & C1) * C2 is canonicalised is that cleaning this up during combine
is more difficult on architectures that require multiple insns to represent
the shifted constant (i.e. C1 << C2): in the dump above, materialising
34359738360 as (1 << 35) - 8 already takes the li/slli/addi triple.
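
(And for completeness, the multiply and shift spellings above really are the
same computation, which is the premise for giving them one canonical form; a
trivial illustrative check, not part of the patch:)

    #include <assert.h>

    /* Kept as (X & C1) * C2 by the current folders.  */
    static unsigned long via_mul (unsigned long a)
    { return (a & 0xFFFFFFFFul) * 8; }

    /* Currently re-distributed into (X << 3) & (C1 << 3).  */
    static unsigned long via_shift (unsigned long a)
    { return (a & 0xFFFFFFFFul) << 3; }

    int main(void)
    {
      const unsigned long tests[] = { 0, 1, 0xFFFFFFFFul, ~0ul };
      for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
        assert (via_mul (tests[i]) == via_shift (tests[i]));
      return 0;
    }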

Given that this maps back to another, already-used canonical form, it seems
straightforward enough, assuming the additional complexity (one extra rule
plus one extra condition on one of the existing rules) is acceptable.

Philipp.
