On Wed, Nov 11, 2020 at 2:55 AM Jakub Jelinek via Gcc-patches < gcc-patches@gcc.gnu.org> wrote:
> On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote: > > The patch addresses this by disallowing that rule, if an exact > power-of-2 is > > seen as C1. The reason why I would prefer to have this canonicalised the > > same way the (X & C1) * C2 is canonicalised, is that cleaning this up > during > > combine is more difficult on some architectures that require multiple > insns > > to represent the shifted constant (i.e. C1 << C2). > > As I said, it is better to decide which one is better before or during > expansion based on target costs, sure, combine can't catch everything. > it could be fixed in combine if we allowed 4 instructions to split into 3. We allow combinations of 4 insns, but we only allow splits into 2 insns as far as I know. Trying 7, 8, 9 -> 10: 7: r80:DI=0x1 8: r81:DI=r80:DI<<0x23 REG_DEAD r80:DI REG_EQUAL 0x800000000 9: r79:DI=r81:DI-0x8 REG_DEAD r81:DI REG_EQUAL 0x7fffffff8 10: r77:DI=r78:DI&r79:DI REG_DEAD r79:DI REG_DEAD r78:DI Failed to match this instruction: (set (reg:DI 77) (and:DI (reg:DI 78) (const_int 34359738360 [0x7fffffff8]))) The AND operation can be implemented with 3 shifts, a left shift to clear the upper bits, a right shift to clear the lower bits, and then another left shift to shift it back to position. We are then left with 4 shifts, and we can have a combiner pattern to match those 4 shifts and reduce to 2. But this would require combine.c changes to work. Unless maybe we don't split the pattern into 3 insns and accept it as is, but there is risk that this could result in worse code. Or alternatively, maybe we could have an ANDDI3 expander which accepts mask constants like this and emits the 3 shifts directly instead of forcing the constant to a register. Then we just need to be able to recognize that these 3 shifts plus the fourth one can be combined into 2 shifts which might work already, and if not it should be a simple combiner pattern. This doesn't help if the same RTL can be created after initial RTL expansion. With the draft B extension we do this operation with a single instruction, but we should get this right for systems without the B extension also. Jim