Hongyu Wang <wwwhhhyyy...@gmail.com> writes: > CC'd Richard for ccmp part as previously it is added only for aarch64. > The original logic will not interrupted since if > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the > prepare_operand will fixup the input that cmp supports but ccmp not, > so ret/ret2 will all be valid when comparing cost. > Thanks in advance.
Sorry for the slow review. > Hongyu Wang <hongyu.w...@intel.com> 于2024年5月15日周三 16:22写道: >> >> For general ccmp scenario, the tree sequence is like >> >> _1 = (a < b) >> _2 = (c < d) >> _3 = _1 & _2 >> >> current ccmp expanding will try to swap compare order for _1 and _2, >> compare the cost/cost2 between compare _1 and _2 first, then return the >> sequence with lower cost. >> >> For x86 ccmp, we don't support FP compare as ccmp operand, but we >> support fp comi + int ccmp sequence. With current cost comparison >> model, the fp comi + int ccmp can never be generated since it doesn't >> check whether expand_ccmp_next returns available result and the rtl >> cost for the empty ccmp sequence is always smaller. >> >> Check the expand_ccmp_next result ret and ret2, returns the valid one >> before cost comparison. >> >> gcc/ChangeLog: >> >> * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of >> expand_ccmp_next, returns the valid one first before >> comparing cost. >> --- >> gcc/ccmp.cc | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc >> index 7cb525addf4..4b424220068 100644 >> --- a/gcc/ccmp.cc >> +++ b/gcc/ccmp.cc >> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, >> rtx_insn **gen_seq) >> cost2 = seq_cost (prep_seq_2, speed_p); >> cost2 += seq_cost (gen_seq_2, speed_p); >> } >> - if (cost2 < cost1) >> + >> + /* For x86 target the ccmp does not support fp operands, but >> + have fcomi insn that can produce eflags and then do int >> + ccmp. So if one of the op is fp compare, ret1 or ret2 can >> + fail, and the cost of the corresponding empty seq will >> + always be smaller, then the NULL sequence will be returned. >> + Add check for ret and ret2, returns the available one if >> + the other is NULL. */ I think the more fundamental point is that the cost of a failed expansion isn't meaningful. So how about: /* It's possible that one expansion succeeds and the other fails. For example, x86 has int ccmp but not fp ccmp, and so a combined fp and int comparison must be ordered such that the fp comparison happens first. The costs are not meaningful for failed expansions. */ >> + if ((!ret && ret2) >> + || (!(ret && !ret2) >> + && cost2 < cost1)) I think this simplifies to: if (ret2 && (!ret1 || cost2 < cost1)) OK with those changes, thanks. Richard >> { >> *prep_seq = prep_seq_2; >> *gen_seq = gen_seq_2; >> -- >> 2.31.1 >>