On Tue, Aug 11, 2020 at 9:34 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> The recent fix for mul_widen_cost revealed an interesting
> quirk of ira/reload register allocation on x86_64.  As shown in
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551648.html
> for gcc.target/i386/pr71321.c we generate the following code that
> performs unnecessary register shuffling.
>
>         movl    $-51, %edx
>         movl    %edx, %eax
>         mulb    %dil
>
> which is caused by reload generating the following instructions
> (notice the set of the first register is dead in the 2nd insn):
>
> (insn 7 4 36 2 (set (reg:QI 1 dx [94])
>         (const_int -51 [0xffffffffffffffcd])) {*movqi_internal}
>      (expr_list:REG_EQUIV (const_int -51 [0xffffffffffffffcd])
>         (nil)))
> (insn 36 7 8 2 (set (reg:QI 0 ax [93])
>         (reg:QI 1 dx [94])) {*movqi_internal}
>      (expr_list:REG_DEAD (reg:QI 1 dx [94])
>         (nil)))
>
> Various discussions in bugzilla seem to point to reload preferring
> not to load constants directly into CLASS_LIKELY_SPILLED_P registers.
This can extend the lifetime of a register over the instruction that
needs one of the CLASS_LIKELY_SPILLED_P registers.  Various MUL, DIV
and even shift insns were able to choke the allocator for x86 targets,
so this is a small price to pay to avoid regalloc failure.

> Whatever the cause, one solution (workaround) that doesn't involve
> rewriting a register allocator is to use peephole2 to spot this
> weirdness and eliminate it.  In fact, this use case is (probably)
> the reason peephole optimizers were originally developed, but it's
> a little disappointing this application of them is still required
> today.  On a positive note, this clean-up is cheap, as we're already
> traversing the instruction stream with liveness (REG_DEAD notes)
> already calculated.
>
> With this peephole2 the above three instructions (from pr71321.c)
> are replaced with:
>
>         movl    $-51, %eax
>         mulb    %dil
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  This peephole triggers
> 1435 times during stage2 and stage3 of a bootstrap, and a further
> 1274 times during "make check".  The most common case is DX_REG->AX_REG
> (as above), which occurs 421 times.  I've restricted this pattern to
> immediate constant loads into general operand registers, which fixes
> this particular problem, but broader predicates may help similar cases.
>
> Ok for mainline?
>
> 2020-08-11  Roger Sayle  <ro...@nextmovesoftware.com>
>
>         * config/i386/i386.md (peephole2): Reduce unnecessary
>         register shuffling produced by register allocation.

LGTM, but I wonder if the allocator is also too conservative with
memory operands.  Perhaps x86_64_general_operand can be used here.

Uros.

> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
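
For readers unfamiliar with define_peephole2, a minimal sketch of a
pattern of the shape described above follows.  It is illustrative only:
the actual pattern from the patch is not quoted in this message, and the
choice of predicates (immediate_operand, general_reg_operand) and the
explicit mode check are assumptions, not the submitted code.

    ;; Illustrative sketch, not the submitted patch.
    ;; Collapse "load an immediate into reg A; copy A to reg B; A dies"
    ;; into a single "load the immediate into reg B", provided both
    ;; registers are general registers of the same mode.
    (define_peephole2
      [(set (match_operand 0 "general_reg_operand")
            (match_operand 1 "immediate_operand"))
       (set (match_operand 2 "general_reg_operand")
            (match_dup 0))]
      "peep2_reg_dead_p (2, operands[0])
       && GET_MODE (operands[0]) == GET_MODE (operands[2])"
      [(set (match_dup 2) (match_dup 1))])

Because peephole2 runs after register allocation with liveness notes
available, peep2_reg_dead_p can confirm that the intermediate register
(dx in the pr71321.c example) is dead after the copy, so replacing the
two instructions with a single immediate load is safe.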