Richard Biener <rguent...@suse.de> writes:
> On Wed, 16 Aug 2023, Juzhe-Zhong wrote:
>
>> Hi, Richard and Richi.
>> 
>> Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math.
>> It's supported in tree-ssa-math-opts.cc. However, GCC failed to support 
>> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>> 
>> Consider this following case:
>> #define TEST_TYPE(TYPE)                                                      
>>   \
>>   __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,          
>>   \
>>                                            TYPE *__restrict a,              \
>>                                            TYPE *__restrict b, int n)       \
>>   {                                                                          
>>   \
>>     for (int i = 0; i < n; i++)                                              
>>   \
>>       dst[i] -= a[i] * b[i];                                           \
>>   }
>> 
>> #define TEST_ALL()                                                           
>>   \
>>   TEST_TYPE (float)                                                          
>>   \
>> 
>> TEST_ALL ()
>> 
>> Gimple IR for RVV:
>> 
>> ...
>> _39 = -vect__8.14_26;
>> vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, 
>> vect__4.8_34, vect__4.8_34, _46, 0);
>> ...
>> 
>> This is because this following piece of codes in tree-ssa-math-opts.cc:
>> 
>>       if (len)
>>      fma_stmt
>>        = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
>>                                      addop, else_value, len, bias);
>>       else if (cond)
>>      fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
>>                                             op2, addop, else_value);
>>       else
>>      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>>       gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
>>       gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
>>                                                                 use_stmt));
>>       gsi_replace (&gsi, fma_stmt, true);
>>       /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
>>       regardless of where the negation occurs.  */
>>       gimple *orig_stmt = gsi_stmt (gsi);
>>       if (fold_stmt (&gsi, follow_all_ssa_edges))
>>      {
>>        if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
>>          gcc_unreachable ();
>>        update_stmt (gsi_stmt (gsi));
>>      }
>> 
>> 'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA ====> COND_LEN_FNMA.
>> 
>> This patch support STMT fold into:
>> 
>> vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, 
>> vect__4.8_34, { 0.0, ... }, _46, 0);
>> 
>> Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
>> 
>> Extend maximum num ops:
>> -  static const unsigned int MAX_NUM_OPS = 5;
>> +  static const unsigned int MAX_NUM_OPS = 7;
>> 
>> Bootstrap and Regtest on X86 passed.
>> 
>> Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.
>> 
>> Testing on aarch64 is on progress.
>> 
>> gcc/ChangeLog:
>> 
>>         * genmatch.cc (decision_tree::gen): Support 
>> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
>>         * gimple-match-exports.cc (gimple_simplify): Ditto.
>>         (gimple_resimplify6): New function.
>>         (gimple_resimplify7): New function.
>>         (gimple_match_op::resimplify): Support 
>> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
>>         (convert_conditional_op): Ditto.
>>         (build_call_internal): Ditto.
>>         (try_conditional_simplification): Ditto.
>>         (gimple_extract): Ditto.
>>         * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
>>         * internal-fn.cc (CASE): Ditto.
>> 
>> ---
>>  gcc/genmatch.cc             |   2 +-
>>  gcc/gimple-match-exports.cc | 124 ++++++++++++++++++++++++++++++++++--
>>  gcc/gimple-match.h          |  19 +++++-
>>  gcc/internal-fn.cc          |  11 ++--
>>  4 files changed, 144 insertions(+), 12 deletions(-)
>> 
>> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
>> index f46d2e1520d..a1925a747a7 100644
>> --- a/gcc/genmatch.cc
>> +++ b/gcc/genmatch.cc
>> @@ -4052,7 +4052,7 @@ decision_tree::gen (vec <FILE *> &files, bool gimple)
>>      }
>>    fprintf (stderr, "removed %u duplicate tails\n", rcnt);
>>  
>> -  for (unsigned n = 1; n <= 5; ++n)
>> +  for (unsigned n = 1; n <= 7; ++n)
>>      {
>>        bool has_kids_p = false;
>>  
>> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
>> index 7aeb4ddb152..895950309b7 100644
>> --- a/gcc/gimple-match-exports.cc
>> +++ b/gcc/gimple-match-exports.cc
>> @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, 
>> gimple_seq *, tree (*)(tree),
>>                           code_helper, tree, tree, tree, tree, tree);
>>  extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree 
>> (*)(tree),
>>                           code_helper, tree, tree, tree, tree, tree, tree);
>> +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree 
>> (*)(tree),
>> +                         code_helper, tree, tree, tree, tree, tree, tree,
>> +                         tree);
>> +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree 
>> (*)(tree),
>> +                         code_helper, tree, tree, tree, tree, tree, tree,
>> +                         tree, tree);
>>  
>>  /* Functions that are needed by gimple-match but that are exported and used 
>> in
>>     other places in the compiler.  */
>> @@ -89,6 +95,8 @@ static bool gimple_resimplify2 (gimple_seq *, 
>> gimple_match_op *, tree (*)(tree))
>>  static bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree 
>> (*)(tree));
>>  static bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree 
>> (*)(tree));
>>  static bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree 
>> (*)(tree));
>> +static bool gimple_resimplify6 (gimple_seq *, gimple_match_op *, tree 
>> (*)(tree));
>> +static bool gimple_resimplify7 (gimple_seq *, gimple_match_op *, tree 
>> (*)(tree));
>>  
>>  /* Match and simplify the toplevel valueized operation THIS.
>>     Replaces THIS with a simplified and/or canonicalized result and
>> @@ -109,6 +117,10 @@ gimple_match_op::resimplify (gimple_seq *seq, tree 
>> (*valueize)(tree))
>>        return gimple_resimplify4 (seq, this, valueize);
>>      case 5:
>>        return gimple_resimplify5 (seq, this, valueize);
>> +    case 6:
>> +      return gimple_resimplify6 (seq, this, valueize);
>> +    case 7:
>> +      return gimple_resimplify7 (seq, this, valueize);
>>      default:
>>        gcc_unreachable ();
>>      }
>> @@ -146,7 +158,14 @@ convert_conditional_op (gimple_match_op *orig_op,
>>    if (ifn == IFN_LAST)
>>      return false;
>>    unsigned int num_ops = orig_op->num_ops;
>> -  new_op->set_op (as_combined_fn (ifn), orig_op->type, num_ops + 2);
>> +  unsigned int num_cond_ops = 2;
>> +  if (orig_op->cond.len)
>> +    {
>> +      /* Convert COND_FNMA to COND_LEN_FNMA if it has LEN and BIAS.  */
>> +      ifn = get_len_internal_fn (ifn);
>> +      num_cond_ops = 4;
>> +    }
>
> This should definitely be placed before the ifn == IFN_LAST check,
> and possibly not doing the get_conditional_internal_fn thing at all.

FWIW, I think it probably works out neater the current way: i.e. first
get the conditional function without length and bias parameters, then add
the length and bias parameters where needed.  It should be a safe operation
now that there is a mechanical link between COND_FOO and COND_LEN_FOO
(i.e. we can't have one without the other).

Thanks,
Richard

Reply via email to