On Tue, Aug 11, 2020 at 9:34 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> The recent fix for mul_widen_cost revealed an interesting
> quirk of ira/reload register allocation on x86_64.  As shown in
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551648.html
> for gcc.target/i386/pr71321.c we generate the following code that
> performs unnecessary register shuffling.
>
>         movl    $-51, %edx
>         movl    %edx, %eax
>         mulb    %dil
>
> which is caused by reload generating the following instructions
> (notice that the register set by the first insn dies in the 2nd insn):
>
> (insn 7 4 36 2 (set (reg:QI 1 dx [94])
>         (const_int -51 [0xffffffffffffffcd])) {*movqi_internal}
>      (expr_list:REG_EQUIV (const_int -51 [0xffffffffffffffcd])
>         (nil)))
> (insn 36 7 8 2 (set (reg:QI 0 ax [93])
>         (reg:QI 1 dx [94])) {*movqi_internal}
>      (expr_list:REG_DEAD (reg:QI 1 dx [94])
>         (nil)))
>
> Various discussions in bugzilla seem to point to reload preferring
> not to load constants directly into CLASS_LIKELY_SPILLED_P registers.

Loading a constant directly into such a register can extend the
register's lifetime over an instruction that needs one of the
CLASS_LIKELY_SPILLED_P registers. Various MUL, DIV and even shift
insns were able to choke the allocator for x86 targets, so the extra
move is a small price to pay to avoid regalloc failure.

> Whatever the cause, one solution (workaround), that doesn't involve
> rewriting a register allocator, is to use peephole2 to spot this
> weirdness and eliminate it.  In fact, this use case is (probably)
> the reason peephole optimizers were originally developed, but it's
> a little disappointing this application of them is still required
> today.  On a positive note, this clean-up is cheap, as we're already
> traversing the instruction stream with liveness (REG_DEAD notes)
> calculated.
>
> With this peephole2 the above three instructions (from pr71321.c)
> are replaced with:
>
>         movl    $-51, %eax
>         mulb    %dil
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  This peephole triggers
> 1435 times during stage2 and stage3 of a bootstrap, and a further 1274
> times during "make check".  The most common case is DX_REG->AX_REG
> (as above) which occurs 421 times.  I've restricted this pattern to
> immediate constant loads into general operand registers, which fixes
> this particular problem, but broader predicates may help similar cases.
> Ok for mainline?
>
> 2020-08-11  Roger Sayle  <ro...@nextmovesoftware.com>
>
>         * config/i386/i386.md (peephole2): Reduce unnecessary
>         register shuffling produced by register allocation.

LGTM, but I wonder if the allocator is also too conservative with
memory operands. Perhaps x86_64_general_operand can be used here.
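
To make the transformation concrete, here is a toy Python model of it.
It is only an illustration, not the i386.md pattern from the patch:
the Insn tuple, the "load_imm"/"mov" opcode names and the peephole()
function are all made up for this sketch, and operand widths and
predicates are ignored. It folds a constant load followed by a
register copy into one load, but only when the copy's source register
is marked dead in the copy (the REG_DEAD condition described above).

```python
from typing import List, NamedTuple, Union


class Insn(NamedTuple):
    """Hypothetical, simplified instruction record (not GCC's RTL)."""
    op: str                      # e.g. "load_imm", "mov", "mulb"
    dst: str                     # destination register name
    src: Union[int, str]         # immediate value or source register
    dead: frozenset = frozenset()  # registers that die in this insn


def peephole(insns: List[Insn]) -> List[Insn]:
    """Fold 'load_imm rA, C; mov rB, rA' into 'load_imm rB, C'
    when rA is dead after the mov."""
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (nxt is not None
                and cur.op == "load_imm"
                and nxt.op == "mov"
                and nxt.src == cur.dst      # the copy reads the loaded reg
                and cur.dst in nxt.dead):   # and that reg dies in the copy
            # Load the constant straight into the copy's destination.
            out.append(Insn("load_imm", nxt.dst, cur.src))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


# The pr71321.c sequence from above, in this toy notation.
before = [
    Insn("load_imm", "dx", -51),
    Insn("mov", "ax", "dx", frozenset({"dx"})),
    Insn("mulb", "ax", "di"),
]
after = peephole(before)
for insn in after:
    print(insn.op, insn.dst, insn.src)
```

If the copied register were still live after the move, the pair would
be left alone, since the first load would still be needed.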

Uros.
>
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>
