On 09/28/2016 10:33 AM, Kyrill Tkachov wrote:
FWIW, the simpler form of the transformation is already done just
prior to leaving SSA form in tree-ssa-uncprop.c. So there may not be
a ton of opportunities for the simpler form in the RTL optimizers.
Indeed there weren't that many places, but they were some. They helped
in avoiding materialising the floating point 0.0
in particular in some cases.
Yea, that makes perfect sense. I'm pretty sure we don't do the
transformation on any floating point constants.
My only concern here is this transformation isn't significantly
different than the simpler form, which is apparently causing some kind
of combine problem.
I'd like to understand the combine problem better before giving final
approval to this patch.
So it's performance/codesize issue, not a correctness issue. For some
reason I thought it was a correctness issue and thus needed
investigation prior to this patch going forward.
Sure, I'm attaching the ifcvt patch that does the aforementioned simple
transform.
The testcase it regresses is:
int
foo (float a, float b, float c, int d)
{
return a > 5 && a < 10 ? 6 : 0;
}
As far as I got is:
The relevant pre-combine (aarch64) RTL without the patch is:
(insn 21 19 45 2 (set (reg:SI 87)
(zero_extend:SI (subreg:QI (reg:SI 86) 0))) fbad.c:4 90
{*zero_extendqisi2_aarch64}
(expr_list:REG_DEAD (reg:SI 86)
(nil)))
(insn 47 45 48 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 87)
(const_int 0 [0]))) fbad.c:4 392 {cmpsi}
(expr_list:REG_DEAD (reg:SI 87)
(nil)))
(insn 48 47 30 2 (set (reg:SI 76 [ <retval> ])
(if_then_else:SI (ne (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 89)
(const_int 0 [0]))) fbad.c:4 444 {*cmovsi_insn}
(expr_list:REG_DEAD (reg:SI 89)
(expr_list:REG_DEAD (reg:CC 66 cc)
(nil))))
and with the patch it is:
(insn 21 19 45 2 (set (reg:SI 87)
(zero_extend:SI (subreg:QI (reg:SI 86) 0))) fbad.c:4 90
{*zero_extendqisi2_aarch64}
(expr_list:REG_DEAD (reg:SI 86)
(nil)))
(insn 46 45 47 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 87)
(const_int 0 [0]))) fbad.c:4 392 {cmpsi}
(nil))
(insn 47 46 30 2 (set (reg:SI 76 [ <retval> ])
(if_then_else:SI (eq (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 87)
(reg:SI 89))) fbad.c:4 444 {*cmovsi_insn}
(expr_list:REG_DEAD (reg:SI 89)
(expr_list:REG_DEAD (reg:SI 87)
(expr_list:REG_DEAD (reg:CC 66 cc)
(nil)))))
Combine is trying to combine 21 -> 47 in the first case (21 -> 46 in the
second) and without the
transformation ends up looking into the uses of CC as well, the
IF_THEN_ELSE in insn 48.
But I think it only does that if reg 87 is dead in insn 47.
Yea, I see what you're saying.
In the end what happens is it fails to merge the comparison and the use
of the comparison
and ends up emitting extra code.
So before:
fmov s2, 5.0e+0
fmov s1, 1.0e+1
mov w0, 6
fcmpe s0, s2
fccmpe s0, s1, 0, gt
csel w0, w0, wzr, mi //conditional select
ret
after:
foo:
fmov s2, 5.0e+0
fmov s1, 1.0e+1
mov w1, 6
fcmpe s0, s2
fccmpe s0, s1, 0, gt
cset w0, mi // conditional set
cmp w0, 0
csel w0, w0, w1, eq
ret
I think it has to do with the logic to set and use added_sets_2 in
try_combine
but I got lost trying to debug it, so I sort of attributed it to
"extending live ranges
of registers is bad"
Well, extending the live range restricts what combine can do in some
cases. Essentially combine has to preserve the assignment to (reg 87).
It'll end up trying to match a more complex parallel pattern in that
case -- which likely fails to be recognized as a valid pattern.
These can sometimes be addressed with 3->2 splitters.
Anyway, since the combine issue is strictly a performance question
rather than a correctness issue, your patch is OK for the trunk.
Jeff