On 2020-03-02 10:17 a.m., Jeff Law wrote:
On Mon, 2020-03-02 at 15:37 +0100, Christophe Lyon wrote:
On Fri, 28 Feb 2020 at 17:39, Vladimir Makarov <vmaka...@redhat.com> wrote:
   The following patch deals with the arm failures that appeared after
the original patch for PR93564 was submitted.

    Changing heuristics in the original patch resulted in a different order
of allocation and created gaps in the hard reg file which were not big enough
for pseudos requiring double regs.  So RA started to use callee-saved
regs and additional store/load insns in the function prologue. That is the
reason for some arm failures.

    The patch was successfully bootstrapped and benchmarked on x86-64.
On x86-64 SPEC2000 the patch generates slightly smaller and, on average,
faster code.

Hi,

This is causing another set of regressions on arm.
For instance on arm-linux-gnueabihf --with-cpu cortex-a9
--with-fpu neon-fp16:
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-not vmov\\.f16
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fuse-caller-save.c scan-assembler-times mov\tr3, r0 1
FAIL: gcc.target/arm/unaligned-argument-2.c scan-assembler-times stm 1
I suspect at least some of these are likely just register assignments changing.

  It is a generation of unexpected but still correct code. Changing heuristics can create small gaps in the hard reg file which are not big enough to fit multi-reg pseudos, so it becomes more likely that callee-saved regs are used, which means loads/stores in the prologue/epilogue.
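
  As a rough illustration of the gap effect described above, here is a toy model in Python. It is not IRA's actual algorithm. Everything in it is an assumption made for the example: an 8-register file in which regs 0-3 can be used freely and regs 4-7 need a prologue save, multi-reg pseudos that must start at an aligned register, an optional preferred register per pseudo, and the names `find_slot` and `allocate`.

```python
# Toy model only: hypothetical 8-register file, not a real target description.
NUM_REGS = 8
FIRST_CALLEE_SAVED = 4  # assumption: regs 4-7 must be saved in the prologue if used


def find_slot(free, nregs, preferred=None):
    """Return the first aligned run of `nregs` free regs, trying `preferred` first."""
    starts = list(range(0, NUM_REGS - nregs + 1, nregs))  # aligned start positions
    if preferred is not None and preferred in starts:
        starts.remove(preferred)
        starts.insert(0, preferred)
    for start in starts:
        if all(free[start:start + nregs]):
            return start
    return None


def allocate(pseudos):
    """Assign pseudos in the given order.

    Each pseudo is (name, nregs, preferred_reg_or_None).
    Returns (assignment, sorted list of callee-saved regs that were used).
    """
    free = [True] * NUM_REGS
    assignment = {}
    for name, nregs, preferred in pseudos:
        start = find_slot(free, nregs, preferred)
        if start is None:
            raise RuntimeError(f"{name} would have to be spilled")
        assignment[name] = start
        for reg in range(start, start + nregs):
            free[reg] = False
    saved = [r for r in range(FIRST_CALLEE_SAVED, NUM_REGS) if not free[r]]
    return assignment, saved


# Same three pseudos, two different assignment orders.
singles_first = [("a", 1, 0), ("b", 1, 2), ("d", 2, None)]
double_first = [("d", 2, None), ("a", 1, 0), ("b", 1, 2)]

print(allocate(singles_first))
print(allocate(double_first))
```

  In this sketch, assigning the two single-reg pseudos first leaves regs 1 and 3 free, but neither forms an aligned pair, so the double-reg pseudo ends up in regs 4-5 and forces a save. Assigning the double-reg pseudo first keeps everything in regs 0-3.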

  As assigning to multi-reg pseudos first was never the highest priority in the assignment (execution frequency has a higher priority), we were simply lucky to generate the expected code before.  In general, these kinds of failures occur in very small functions without loops, where even the stack is not used.  The more important cases are RA for big functions (as we have aggressive inlining) with loops, and for these cases the latest patch decreases SPEC2000 code size and visibly improves performance, at least for x86-64.
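
  The priority relationship described above can be sketched as a sort key. This is a minimal sketch and not IRA's real comparison function: the `Pseudo` type, its fields, and `assignment_order` are hypothetical names introduced for the example, and real allocators weigh more factors than these two.

```python
from typing import NamedTuple


class Pseudo(NamedTuple):
    name: str
    freq: int   # estimated execution frequency (hypothetical field)
    nregs: int  # number of hard registers the pseudo needs


def assignment_order(pseudos):
    """Order by execution frequency first; register count only breaks ties."""
    return sorted(pseudos, key=lambda p: (-p.freq, -p.nregs))


pseudos = [
    Pseudo("a", freq=10, nregs=1),
    Pseudo("d", freq=10, nregs=2),
    Pseudo("h", freq=50, nregs=1),
]
print([p.name for p in assignment_order(pseudos)])
```

  With this key, the hot single-reg pseudo `h` is assigned before the double-reg pseudo `d`, which only wins against `a` because their frequencies tie. Whether the multi-reg pseudo finds a contiguous slot therefore depends on what the hotter pseudos took first.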

  In any case, I'll look at these tests, but fixing all RA performance issues and the tests checking them might be just chasing a rainbow.

