Hi all,
One issue that we keep encountering on aarch64 is GCC not making good use of
the flag-setting arithmetic instructions such as ADDS, SUBS and ANDS, which
perform an arithmetic operation and compare the result against zero.
They are represented in a fairly standard way in the backend as PARALLEL
patterns:
(parallel [(set (reg x1) (op (reg x2) (reg x3)))
           (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])
GCC isn't forming these from separate arithmetic and comparison instructions as
aggressively as it could.
A particular pain point is when the result of the arithmetic insn is used
before the comparison instruction.
The testcase in this patch is one such example where we have:
(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
        (plus:SI (reg:SI 0 x0 [ x ])
            (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
     (nil))
(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
        (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
            (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
     (nil))
(insn 34 33 17 2 (set (reg:CC 66 cc)
        (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
            (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
     (nil))
This scares combine away because x0 is used in insn 33 as well as in the
comparison in insn 34.
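
For readers without the patch at hand, a C function of roughly this shape
could plausibly lead to RTL like the dump above: the sum is both used in a
later addition and compared against zero. This is only an illustrative guess,
not the testcase from the patch, and the name foo is hypothetical.

```c
/* Hypothetical example: the result of x + y feeds both the "+ 2" and the
   comparison against zero, so the add has more than one use.  */
int
foo (int x, int y)
{
  x += y;
  if (x != 0) x = x + 2;
  return x;
}
```

Ideally the first addition would become an ADDS so that no separate compare
against zero is needed.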
I think the compare-elim pass can help us here.
This patch extends it by handling comparisons against zero: it finds the
instruction that defines the compared register and merges the comparison with
that defining instruction into a PARALLEL that will hopefully match the form
described above. If we find an instruction that uses the condition register,
or any control flow, between the defining insn and the comparison, we bail
out. We also don't cross basic blocks.
This simple technique allows us to catch cases such as this example.
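
The bail-out conditions described above can be sketched on a toy model. This
is not the code in the patch and does not use GCC's RTL or dataflow
interfaces (the ChangeLog below indicates the patch relies on DF chains); all
types and names here are hypothetical and only illustrate the conditions.

```c
#include <stdbool.h>

/* Toy stand-in for an instruction; NOT GCC's rtx_insn.  */
enum toy_kind { TOY_ARITH, TOY_COMPARE_ZERO, TOY_JUMP, TOY_OTHER };

struct toy_insn
{
  enum toy_kind kind;
  int dest;         /* Register written, or -1 if none.  */
  int cmp_reg;      /* For TOY_COMPARE_ZERO: register compared with zero.  */
  bool touches_cc;  /* Reads or writes the condition register.  */
};

/* BLOCK[0..CMP] is a single basic block and BLOCK[CMP] is a compare against
   zero.  Walk backwards to the insn defining the compared register.  Return
   its index if it is an arithmetic insn and nothing in between uses the
   condition register or is control flow; otherwise return -1.  */
int
toy_find_mergeable_def (const struct toy_insn *block, int cmp)
{
  if (block[cmp].kind != TOY_COMPARE_ZERO)
    return -1;

  int reg = block[cmp].cmp_reg;
  for (int i = cmp - 1; i >= 0; i--)
    {
      const struct toy_insn *insn = &block[i];
      if (insn->dest == reg)
        return insn->kind == TOY_ARITH ? i : -1;
      if (insn->touches_cc || insn->kind == TOY_JUMP)
        return -1;
    }

  /* The definition is not in this block: do not cross basic blocks.  */
  return -1;
}
```

With the three insns from the example (add into x0, add into x1, compare x0
with zero), the intervening add into x1 does not block the merge in this
model, which is the case combine misses.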
Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and
x86_64.
Ok for trunk?
2017-08-05  Kyrylo Tkachov  <[email protected]>
	    Michael Collison  <[email protected]>

	* compare-elim.c: Include emit-rtl.h.
	(can_merge_compare_into_arith): New function.
	(try_validate_parallel): Likewise.
	(try_merge_compare): Likewise.
	(try_eliminate_compare): Call the above when no previous clobber
	is available.
	(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
	dataflow problems.

2017-08-05  Kyrylo Tkachov  <[email protected]>
	    Michael Collison  <[email protected]>

	* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
pr5198v1.patch
