On 09/28/2016 10:33 AM, Kyrill Tkachov wrote:
FWIW, the simpler form of the transformation is already done just
prior to leaving SSA form in tree-ssa-uncprop.c.  So there may not be
a ton of opportunities for the simpler form in the RTL optimizers.


Indeed there weren't that many places, but they were some. They helped
in avoiding materialising the floating point 0.0
in particular in some cases.
Yea, that makes perfect sense. I'm pretty sure we don't do the transformation on any floating point constants.


My only concern here is this transformation isn't significantly
different than the simpler form, which is apparently causing some kind
of combine problem.

I'd like to understand the combine problem better before giving final
approval to this patch.
So it's performance/codesize issue, not a correctness issue. For some reason I thought it was a correctness issue and thus needed investigation prior to this patch going forward.




Sure, I'm attaching the ifcvt patch that does the aforementioned simple
transform.
The testcase it regresses is:
int
foo (float a, float b, float c, int d)
{
  return a > 5 && a < 10 ? 6 : 0;
}

As far as I got is:
The relevant pre-combine (aarch64) RTL without the patch is:
(insn 21 19 45 2 (set (reg:SI 87)
        (zero_extend:SI (subreg:QI (reg:SI 86) 0))) fbad.c:4 90
{*zero_extendqisi2_aarch64}
     (expr_list:REG_DEAD (reg:SI 86)
        (nil)))
(insn 47 45 48 2 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 87)
            (const_int 0 [0]))) fbad.c:4 392 {cmpsi}
     (expr_list:REG_DEAD (reg:SI 87)
        (nil)))
(insn 48 47 30 2 (set (reg:SI 76 [ <retval> ])
        (if_then_else:SI (ne (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:SI 89)
            (const_int 0 [0]))) fbad.c:4 444 {*cmovsi_insn}
     (expr_list:REG_DEAD (reg:SI 89)
        (expr_list:REG_DEAD (reg:CC 66 cc)
            (nil))))


and with the patch it is:
(insn 21 19 45 2 (set (reg:SI 87)
        (zero_extend:SI (subreg:QI (reg:SI 86) 0))) fbad.c:4 90
{*zero_extendqisi2_aarch64}
     (expr_list:REG_DEAD (reg:SI 86)
        (nil)))
(insn 46 45 47 2 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 87)
            (const_int 0 [0]))) fbad.c:4 392 {cmpsi}
     (nil))
(insn 47 46 30 2 (set (reg:SI 76 [ <retval> ])
        (if_then_else:SI (eq (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:SI 87)
            (reg:SI 89))) fbad.c:4 444 {*cmovsi_insn}
     (expr_list:REG_DEAD (reg:SI 89)
        (expr_list:REG_DEAD (reg:SI 87)
            (expr_list:REG_DEAD (reg:CC 66 cc)
                (nil)))))

Combine is trying to combine 21 -> 47 in the first case (21 -> 46 in the
second) and without the
transformation ends up looking into the uses of CC as well, the
IF_THEN_ELSE in insn 48.
But I think it only does that if reg 87 is dead in insn 47.
Yea, I see what you're saying.

In the end what happens is it fails to merge the comparison and the use
of the comparison
and ends up emitting extra code.
So before:
    fmov    s2, 5.0e+0
    fmov    s1, 1.0e+1
    mov    w0, 6
    fcmpe    s0, s2
    fccmpe    s0, s1, 0, gt
    csel    w0, w0, wzr, mi //conditional select
    ret

after:
foo:
    fmov    s2, 5.0e+0
    fmov    s1, 1.0e+1
    mov    w1, 6
    fcmpe    s0, s2
    fccmpe    s0, s1, 0, gt
    cset    w0, mi // conditional set
    cmp    w0, 0
    csel    w0, w0, w1, eq
    ret

I think it has to do with the logic to set and use added_sets_2 in
try_combine
but I got lost trying to debug it, so I sort of attributed it to
"extending live ranges
of registers is bad"
Well, extending the live range restricts what combine can do in some cases. Essentially combine has to preserve the assignment to (reg 87). It'll end up trying to match a more complex parallel pattern in that case -- which likely fails to be recognized as a valid pattern.

These can sometimes be addressed with 3->2 splitters.

Anyway, since the combine issue is strictly a performance question rather than a correctness issue, your patch is OK for the trunk.

Jeff



Reply via email to