Hi all,

One issue that we keep encountering on aarch64 is GCC not making good use of
the flag-setting arithmetic instructions such as ADDS, SUBS and ANDS, which
perform an arithmetic operation and compare the result against zero.
They are represented in a fairly standard way in the backend as PARALLEL
patterns:
(parallel [(set (reg x1) (op (reg x2) (reg x3)))
           (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])

GCC isn't forming these from separate arithmetic and comparison instructions as 
aggressively as it could.
A particular pain point is when the result of the arithmetic insn is also used
by another instruction that sits between it and the comparison instruction.
The testcase in this patch is one such example where we have:
(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
        (plus:SI (reg:SI 0 x0 [ x ])
            (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
     (nil))
(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
        (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
            (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
     (nil))
(insn 34 33 17 2 (set (reg:CC 66 cc)
        (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
            (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
     (nil))

This scares combine away, as x0 is used in insn 33 as well as in the
comparison in insn 34.
I think the compare-elim pass can help us here.
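
For readers without the attachment, here is a guess at the kind of source that
leads to RTL of this shape. It is not the testcase from the patch; the function
name and body are hypothetical. The sum is used by another addition before it
is tested against zero:

```c
/* Hypothetical reconstruction, NOT the testcase from the patch.
   The sum feeds a second add and is also compared against zero,
   so the add and the compare are separated by another use.  */
int
add_then_test (int x, int y)
{
  int sum = x + y;      /* candidate for a flag-setting ADDS */
  if (sum != 0)         /* comparison of the sum against zero */
    return sum + 2;     /* extra use of the sum */
  return sum;
}
```

Ideally the first addition would be emitted as ADDS, making the separate
compare unnecessary.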

This patch extends the compare-elim pass to handle comparisons against zero.
It finds the instruction that defines the compared register and merges the
comparison with that defining instruction into a PARALLEL that will hopefully
match the form described above. If between the comparison and the defining
insn we find an instruction that uses the condition-code register, or any
control flow, we bail out. We also don't cross basic blocks.
This simple technique allows us to catch cases such as this example.
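
The backward walk described above can be sketched on a toy instruction model.
This is only an illustration, not the code in the patch: struct toy_insn,
its fields and find_merge_candidate are made-up names, and the real pass works
on GCC's RTL and dataflow information rather than on an array like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an insn; NOT GCC's rtx_insn.  */
enum toy_kind { TOY_ARITH, TOY_COMPARE_ZERO, TOY_JUMP, TOY_OTHER };

struct toy_insn
{
  enum toy_kind kind;
  int dest;          /* register written, or -1 if none */
  int cmp_src;       /* for TOY_COMPARE_ZERO: register compared with 0 */
  bool touches_cc;   /* reads or writes the condition-code register */
};

/* BB holds the N insns of one basic block in program order.
   Return the index of the arithmetic insn that the compare at CMP_IDX
   could be merged with, or -1 if we have to bail out.  */
int
find_merge_candidate (const struct toy_insn *bb, size_t n, size_t cmp_idx)
{
  if (cmp_idx >= n || bb[cmp_idx].kind != TOY_COMPARE_ZERO)
    return -1;

  int reg = bb[cmp_idx].cmp_src;

  /* Walk backwards from the insn just before the compare.  */
  for (size_t i = cmp_idx; i-- > 0; )
    {
      const struct toy_insn *insn = &bb[i];

      /* Control flow or any use of the flags in between: give up.  */
      if (insn->kind == TOY_JUMP || insn->touches_cc)
        return -1;

      /* First writer of the compared register is the definition.  */
      if (insn->dest == reg)
        return insn->kind == TOY_ARITH ? (int) i : -1;
    }

  /* Definition is not in this block; we don't cross basic blocks.  */
  return -1;
}
```

On a three-insn block modelled on insns 7, 33 and 34 above, the walk skips the
second add (it neither touches the flags nor writes x0) and returns the index
of the first add.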

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and 
x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison <michael.colli...@arm.com>

        * compare-elim.c: Include emit-rtl.h.
        (can_merge_compare_into_arith): New function.
        (try_validate_parallel): Likewise.
        (try_merge_compare): Likewise.
        (try_eliminate_compare): Call the above when no previous clobber
        is available.
        (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
        dataflow problems.

2017-08-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
            Michael Collison <michael.colli...@arm.com>
            
        * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.

Attachment: pr5198v1.patch