Hongyu Wang <wwwhhhyyy...@gmail.com> writes:
> CC'd Richard for ccmp part as previously it is added only for aarch64.
> The original logic will not interrupted since if
> aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
> success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the
> prepare_operand will fixup the input that cmp supports but ccmp not,
> so ret/ret2 will all be valid when comparing cost.
> Thanks in advance.

Sorry for the slow review.

> Hongyu Wang <hongyu.w...@intel.com> 于2024年5月15日周三 16:22写道:
>>
>> For general ccmp scenario, the tree sequence is like
>>
>> _1 = (a < b)
>> _2 = (c < d)
>> _3 = _1 & _2
>>
>> current ccmp expanding will try to swap compare order for _1 and _2,
>> compare the cost/cost2 between compare _1 and _2 first, then return the
>> sequence with lower cost.
>>
>> For x86 ccmp, we don't support FP compare as ccmp operand, but we
>> support fp comi + int ccmp sequence. With current cost comparison
>> model, the fp comi + int ccmp can never be generated since it doesn't
>> check whether expand_ccmp_next returns available result and the rtl
>> cost for the empty ccmp sequence is always smaller.
>>
>> Check the expand_ccmp_next result ret and ret2, returns the valid one
>> before cost comparison.
>>
>> gcc/ChangeLog:
>>
>>         * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
>>         expand_ccmp_next, returns the valid one first before
>>         comparing cost.
>> ---
>>  gcc/ccmp.cc | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
>> index 7cb525addf4..4b424220068 100644
>> --- a/gcc/ccmp.cc
>> +++ b/gcc/ccmp.cc
>> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, 
>> rtx_insn **gen_seq)
>>               cost2 = seq_cost (prep_seq_2, speed_p);
>>               cost2 += seq_cost (gen_seq_2, speed_p);
>>             }
>> -         if (cost2 < cost1)
>> +
>> +         /* For x86 target the ccmp does not support fp operands, but
>> +            have fcomi insn that can produce eflags and then do int
>> +            ccmp. So if one of the op is fp compare, ret1 or ret2 can
>> +            fail, and the cost of the corresponding empty seq will
>> +            always be smaller, then the NULL sequence will be returned.
>> +            Add check for ret and ret2, returns the available one if
>> +            the other is NULL.  */

I think the more fundamental point is that the cost of a failed
expansion isn't meaningful.  So how about:

          /* It's possible that one expansion succeeds and the other fails.
             For example, x86 has int ccmp but not fp ccmp, and so a combined
             fp and int comparison must be ordered such that the fp comparison
             happens first.  The costs are not meaningful for failed
             expansions.  */

>> +         if ((!ret && ret2)
>> +             || (!(ret && !ret2)
>> +                 && cost2 < cost1))

I think this simplifies to:

          if (ret2 && (!ret1 || cost2 < cost1))

OK with those changes, thanks.

Richard

>>             {
>>               *prep_seq = prep_seq_2;
>>               *gen_seq = gen_seq_2;
>> --
>> 2.31.1
>>

Reply via email to