Jakub, On Wed, 11 Nov 2020 at 11:31, Jakub Jelinek <ja...@redhat.com> wrote: > > On Wed, Nov 11, 2020 at 11:17:32AM +0100, Philipp Tomsich wrote: > > From: Philipp Tomsich <p...@gnu.org> > > > > The function > > long f(long a) > > { > > return(a & 0xFFFFFFFFull) << 3; > > } > > is folded into > > _1 = a_2(D) << 3; > > _3 = _1 & 34359738360; > > wheras the construction > > return (a & 0xFFFFFFFFull) * 8; > > results in > > _1 = a_2(D) & 4294967295; > > _3 = _1 * 8; > > > > This leads to suboptimal code-generation for RISC-V (march=rv64g), as > > the shifted constant needs to be expanded into 3 RTX and 2 RTX (one > > each for the LSHIFT_EXPR and the BIT_AND_EXPR) which will overwhelm > > the combine pass (a sequence of 5 RTX are not considered): > > li a5,1 # tmp78, # 23 [c=4 l=4] > > *movdi_64bit/1 > > slli a5,a5,35 #, tmp79, tmp78 # 24 [c=4 l=4] ashldi3 > > addi a5,a5,-8 #, tmp77, tmp79 # 9 [c=4 l=4] adddi3/1 > > slli a0,a0,3 #, tmp76, tmp80 # 6 [c=4 l=4] ashldi3 > > and a0,a0,a5 # tmp77,, tmp76 # 15 [c=4 l=4] anddi3/0 > > ret # 28 [c=0 l=4] simple_return > > instead of: > > slli a0,a0,32 #, tmp76, tmp79 # 26 [c=4 l=4] ashldi3 > > srli a0,a0,29 #,, tmp76 # 27 [c=4 l=4] lshrdi3 > > ret # 24 [c=0 l=4] > > simple_return > > > > We address this by adding a simplification for > > (a << s) & M, where ((M >> s) << s) == M > > to > > (a & M_unshifted) << s, where M_unshifted := (M >> s) > > which undistributes the LSHIFT. > > This is problematic, we have another rule that goes against this: > /* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1) > (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1). */ > (for shift (lshift rshift) > (for bit_op (bit_and bit_xor bit_ior) > (simplify > (shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1) > (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) > (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); > } > (bit_op (shift (convert @0) @1) { mask; })))))) > and we don't want the two rules to keep fighting against each other. > It is better to have one form as canonical and only right before expansion > (isel pass) or during expansion decide e.g. based on target costs > whether that (X << C1) & (C2 << C1) is better expanded like that, > or as (X & C2) << C1, or as (X << C3) >> C4.
The patch addresses this by disallowing that rule, if an exact power-of-2 is seen as C1. The reason why I would prefer to have this canonicalised the same way the (X & C1) * C2 is canonicalised, is that cleaning this up during combine is more difficult on some architectures that require multiple insns to represent the shifted constant (i.e. C1 << C2). Given that this maps back to another (already used) canonical form, it seems straightforward enough — assuming that the additional complexity (i.e. an additional rule and an additional condition for one of the other rules) is acceptable. Philipp.