Hao Liu OS <h...@os.amperecomputing.com> writes:
> Hi Richard,
>
> Update the patch with a simple case (see below case and comments).  It shows 
> a live stmt may not have reduction def, which introduce the ICE.
>
> Is it OK for trunk?

OK, thanks.

Richard

> ----
> Fix the assertion failure on empty reduction define in info_for_reduction.
> Even a stmt is live, it may still have empty reduction define.  Check the
> reduction definition instead of live info before calling info_for_reduction.
>
> gcc/ChangeLog:
>
>         PR target/110625
>         * config/aarch64/aarch64.cc (aarch64_force_single_cycle): check
>         STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/pr110625_3.c: New testcase.
> ---
>  gcc/config/aarch64/aarch64.cc                 |  2 +-
>  gcc/testsuite/gcc.target/aarch64/pr110625_3.c | 34 +++++++++++++++++++
>  2 files changed, 35 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_3.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d4d76025545..5b8d8fa8e2d 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
> stmt_vec_info stmt_info,
>  static bool
>  aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
>  {
> -  if (!STMT_VINFO_LIVE_P (stmt_info))
> +  if (!STMT_VINFO_REDUC_DEF (stmt_info))
>      return false;
>
>    auto reduc_info = info_for_reduction (vinfo, stmt_info);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_3.c 
> b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
> new file mode 100644
> index 00000000000..35a50290cb0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mcpu=neoverse-n2" } */
> +
> +/* Avoid ICE on empty reduction def in single_defuse_cycle.
> +
> +   E.g.
> +     <bb 3> [local count: 858993456]:
> +     # sum_18 = PHI <sum_15(5), 0(2)>
> +     sum.0_5 = (unsigned int) sum_18;
> +     _6 = _4 + sum.0_5;     <-- it is "live" but doesn't have reduction def
> +     sum_15 = (int) _6;
> +     ...
> +     if (ivtmp_29 != 0)
> +       goto <bb 5>; [75.00%]
> +     else
> +       goto <bb 4>; [25.00%]
> +
> +     <bb 5> [local count: 644245086]:
> +     goto <bb 3>; [100.00%]
> +
> +     <bb 4> [local count: 214748368]:
> +     # _31 = PHI <_6(3)>
> +     _8 = _31 >> 1;
> +*/
> +
> +int
> +f (unsigned int *tmp)
> +{
> +  int sum = 0;
> +  for (int i = 0; i < 4; i++)
> +    sum += tmp[i];
> +
> +  return (unsigned int) sum >> 1;
> +}
> --
> 2.34.1
>
> ________________________________________
> From: Hao Liu OS <h...@os.amperecomputing.com>
> Sent: Tuesday, August 1, 2023 17:43
> To: Richard Sandiford
> Cc: Richard Biener; GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
> multiplying count [PR110625]
>
> Hi Richard,
>
> This is a quick fix to the several ICEs.  It seems even STMT_VINFO_LIVE_P is 
> true, some reduct stmts still don't have REDUC_DEF.  So I change the check to 
> STMT_VINFO_REDUC_DEF.
>
> Is it OK for trunk?
>
> ---
> Fix the ICEs on empty reduction define.  Even STMT_VINFO_LIVE_P is true, some 
> reduct stmts
> still don't have definition.
>
> gcc/ChangeLog:
>
>         PR target/110625
>         * config/aarch64/aarch64.cc (aarch64_force_single_cycle): check
>         STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction
> ---
>  gcc/config/aarch64/aarch64.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d4d76025545..5b8d8fa8e2d 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
> stmt_vec_info stmt_info,
>  static bool
>  aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
>  {
> -  if (!STMT_VINFO_LIVE_P (stmt_info))
> +  if (!STMT_VINFO_REDUC_DEF (stmt_info))
>      return false;
>
>    auto reduc_info = info_for_reduction (vinfo, stmt_info);
> --
> 2.40.0
>
>
> ________________________________________
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Monday, July 31, 2023 17:11
> To: Hao Liu OS
> Cc: Richard Biener; GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
> multiplying count [PR110625]
>
> Hao Liu OS <h...@os.amperecomputing.com> writes:
>>> Which test case do you see this for?  The two tests in the patch still
>>> seem to report correct latencies for me if I make the change above.
>>
>> Not the newly added tests.  It is still the existing case causing the 
>> previous ICE (i.e. assertion problem): 
>> gcc.target/aarch64/sve/cost_model_13.c.
>>
>> It's not the test case itself failed, but the dump message of vect says the 
>> "reduction latency" is 0:
>>
>> Before the change:
>> cost_model_13.c:7:21: note:  Original vector body cost = 6
>> cost_model_13.c:7:21: note:  Scalar issue estimate:
>> cost_model_13.c:7:21: note:    load operations = 1
>> cost_model_13.c:7:21: note:    store operations = 0
>> cost_model_13.c:7:21: note:    general operations = 1
>> cost_model_13.c:7:21: note:    reduction latency = 1
>> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.000000
>> cost_model_13.c:7:21: note:    estimated cycles per vector iteration (for VF 
>> 8) = 8.000000
>> cost_model_13.c:7:21: note:  Vector issue estimate:
>> cost_model_13.c:7:21: note:    load operations = 1
>> cost_model_13.c:7:21: note:    store operations = 0
>> cost_model_13.c:7:21: note:    general operations = 1
>> cost_model_13.c:7:21: note:    reduction latency = 2
>> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 2.000000
>>
>> After the change:
>> cost_model_13.c:7:21: note:  Original vector body cost = 6
>> cost_model_13.c:7:21: note:  Scalar issue estimate:
>> cost_model_13.c:7:21: note:    load operations = 1
>> cost_model_13.c:7:21: note:    store operations = 0
>> cost_model_13.c:7:21: note:    general operations = 1
>> cost_model_13.c:7:21: note:    reduction latency = 0         <--- seems not 
>> consistent with above result
>> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.000000
>> cost_model_13.c:7:21: note:    estimated cycles per vector iteration (for VF 
>> 8) = 8.000000
>> cost_model_13.c:7:21: note:  Vector issue estimate:
>> cost_model_13.c:7:21: note:    load operations = 1
>> cost_model_13.c:7:21: note:    store operations = 0
>> cost_model_13.c:7:21: note:    general operations = 1
>> cost_model_13.c:7:21: note:    reduction latency = 0         <--- seems not 
>> consistent with above result
>> cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.000000 
>>         <--- seems not consistent with above result
>>
>> BTW. this should be caused by the reduction stmt is not live, which 
>> indicates whether this stmts is part of a computation whose result is used 
>> outside the loop (tree-vectorized.h:1204):
>>   <bb 3>:
>>   # res_18 = PHI <res_15(7), 0(6)>
>>   # i_20 = PHI <i_16(7), 0(6)>
>>   _1 = (long unsigned int) i_20;
>>   _2 = _1 * 2;
>>   _3 = x_14(D) + _2;
>>   _4 = *_3;
>>   _5 = (unsigned short) _4;
>>   res.0_6 = (unsigned short) res_18;
>>   _7 = _5 + res.0_6;                             <-- This is not live, may 
>> be caused by the below type cast stmt.
>>   res_15 = (short int) _7;
>>   i_16 = i_20 + 1;
>>   if (n_11(D) > i_16)
>>     goto <bb 7>;
>>   else
>>     goto <bb 4>;
>>
>>   <bb 7>:
>>   goto <bb 3>;
>
> Ah, I see, thanks.  My concern was: if requiring !STMT_VINFO_LIVE_P stmts
> can cause "normal" reductions to have a latency of 0, could the same thing
> happen for single-cycle reductions?  But I suppose the answer is "no".
> Introducing a cast like the above would cause reduc_chain_length > 1,
> and so:
>
>   if (ncopies > 1
>       && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
>       && reduc_chain_length == 1
>       && loop_vinfo->suggested_unroll_factor == 1)
>     single_defuse_cycle = true;
>
> wouldn't trigger.  Which makes the single-cycle thing a bit hit-and-miss...
>
> So yeah, I agree the patch is safe after all.
>
> Please split the check out into a helper though, to avoid the awkward
> formatting:
>
> /* Return true if STMT_INFO is part of a reduction that has the form:
>
>       r = r op ...;
>       r = r op ...;
>
>    with the single accumulator being read and written multiple times.  */
> static bool
> aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
> {
>   if (!STMT_VINFO_LIVE_P (stmt_info))
>     return false;
>
>   auto reduc_info = info_for_reduction (vinfo, stmt_info);
>   return STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
> }
>
> OK with that change, thanks.
>
> Richard

Reply via email to