Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> Hi,
> The attached patch tries to fix PR91272.
> Does it look OK?
>
> With the patch, I see the following failures for aarch64-sve.exp:
> FAIL: gcc.target/aarch64/sve/clastb_1.c -march=armv8.2-a+sve
> scan-assembler \\tclastb\\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\\.s
> FAIL: gcc.target/aarch64/sve/clastb_2.c -march=armv8.2-a+sve
> scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.s
> FAIL: gcc.target/aarch64/sve/clastb_3.c -march=armv8.2-a+sve
> scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.b
> FAIL: gcc.target/aarch64/sve/clastb_5.c -march=armv8.2-a+sve
> scan-assembler \\tclastb\\tx[0-9]+, p[0-7], x[0-9]+, z[0-9]+\\.d
>
> For instance, in clastb_1.c, it now emits:
>         clastb  s1, p1, s1, z0.s
> while using a fully predicated loop.
> Should I adjust the tests?

Yeah, that's an improvement really.
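The tests could simply be adjusted to expect the FP view of the
register, e.g. for clastb_1.c something like the following (untested;
the exact regex is only illustrative, based on the new output above):

    /* { dg-final { scan-assembler {\tclastb\ts[0-9]+, p[0-7], s[0-9]+, z[0-9]+\.s} } } */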

> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index acdd90784dc..2cad2cb94c8 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -10016,7 +10016,8 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>        /* See whether another part of the vectorized code applies a loop
>        mask to the condition, or to its inverse.  */
>  
> -      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +       && reduction_type != EXTRACT_LAST_REDUCTION)
>       {
>         scalar_cond_masked_key cond (cond_expr, ncopies);
>         if (loop_vinfo->scalar_cond_masked_set.contains (cond))

The context here is:

          if (loop_vinfo->scalar_cond_masked_set.contains (cond))
            {
              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
              loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
            }
          else
            {
              bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
              cond.code = invert_tree_comparison (cond.code, honor_nans);
              if (loop_vinfo->scalar_cond_masked_set.contains (cond))
                {
                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
                                                  vectype, j);
                  cond_code = cond.code;
                  swap_cond_operands = true;
                }
            }

Rather than have another instance of vect_get_loop_mask, I think
it would be cleaner to use a nonnull "masks" to decide whether to apply
the loop mask:

      vec_loop_masks *masks = NULL;
      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
        {
          if (reduction_type == EXTRACT_LAST_REDUCTION
              || loop_vinfo->scalar_cond_masked_set.contains (cond))
            masks = &LOOP_VINFO_MASKS (loop_vinfo);
          else
            {
              ...
            }
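
where the "..." keeps the inverted-comparison fallback from the context
quoted above, minus the immediate vect_get_loop_mask call -- i.e.
something like (untested):

              bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
              cond.code = invert_tree_comparison (cond.code, honor_nans);
              if (loop_vinfo->scalar_cond_masked_set.contains (cond))
                {
                  masks = &LOOP_VINFO_MASKS (loop_vinfo);
                  cond_code = cond.code;
                  swap_cond_operands = true;
                }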

Then:

> @@ -10116,6 +10117,15 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>            vec_then_clause = vec_oprnds2[i];
>            vec_else_clause = vec_oprnds3[i];
>  
> +          if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +           && reduction_type == EXTRACT_LAST_REDUCTION)
> +         {
> +           vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +           unsigned vec_num = vec_oprnds0.length ();
> +           loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> +                                           vectype, vec_num * j + i);
> +         }
> +

...do this vect_get_loop_mask under the condition of "if (masks)".
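I.e. something like (untested):

          if (masks)
            {
              unsigned vec_num = vec_oprnds0.length ();
              loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
                                              vectype, vec_num * j + i);
            }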

>         if (swap_cond_operands)
>           std::swap (vec_then_clause, vec_else_clause);
>  
> @@ -10180,7 +10190,7 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>            vec != { 0, ... } (masked in the MASK_LOAD,
>            unmasked in the VEC_COND_EXPR).  */
>  
> -       if (loop_mask)
> +       if (loop_mask && reduction_type != EXTRACT_LAST_REDUCTION)
>           {
>             if (COMPARISON_CLASS_P (vec_compare))
>               {
> @@ -10220,6 +10230,16 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>                 vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
>                 vec_compare = vec_compare_name;
>               }

The code above here:

              if (!is_gimple_val (vec_compare))
                {
                  tree vec_compare_name = make_ssa_name (vec_cmp_type);
                  gassign *new_stmt = gimple_build_assign (vec_compare_name,
                                                           vec_compare);
                  vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
                  vec_compare = vec_compare_name;
                }

is doing something similar to the new COND_EXPR handling:

              if (COMPARISON_CLASS_P (vec_compare))
                {
                  tree tmp = make_ssa_name (vec_cmp_type);
                  tree op0 = TREE_OPERAND (vec_compare, 0);
                  tree op1 = TREE_OPERAND (vec_compare, 1);
                  gassign *g = gimple_build_assign (tmp,
                                                    TREE_CODE (vec_compare),
                                                    op0, op1);
                  vect_finish_stmt_generation (stmt_info, g, gsi);
                  vec_compare = tmp;
                }

There's then an added case:

              if (must_invert_cmp_result)
                {
                  tree vec_compare_name = make_ssa_name (vec_cmp_type);
                  gassign *new_stmt = gimple_build_assign (vec_compare_name,
                                                           BIT_NOT_EXPR,
                                                           vec_compare);
                  vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
                  vec_compare = vec_compare_name;
                }

that would be a safe no-op for the normal COND_EXPR path (because
must_invert_cmp_result is always false then).  Then this:

> +
> +           if (loop_mask)
> +             {
> +               tree tmp = make_ssa_name (vec_cmp_type);
> +               gassign *g = gimple_build_assign (tmp, BIT_AND_EXPR,
> +                                                 vec_compare, loop_mask);
> +               vect_finish_stmt_generation (stmt_info, g, gsi);
> +               vec_compare = tmp;
> +             }
> +

is the equivalent of the above:

              tree tmp2 = make_ssa_name (vec_cmp_type);
              gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
                                                vec_compare, loop_mask);
              vect_finish_stmt_generation (stmt_info, g, gsi);
              vec_compare = tmp2;

So with this patch, I think the EXTRACT_LAST_REDUCTION and the normal
COND_EXPR paths should be able to share the mask generation code.
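
Concretely, the shared tail could look something like this (untested;
just the fragments quoted above stitched together in order):

              if (COMPARISON_CLASS_P (vec_compare))
                {
                  /* Lower the comparison to an SSA name first.  */
                  tree tmp = make_ssa_name (vec_cmp_type);
                  gassign *g
                    = gimple_build_assign (tmp, TREE_CODE (vec_compare),
                                           TREE_OPERAND (vec_compare, 0),
                                           TREE_OPERAND (vec_compare, 1));
                  vect_finish_stmt_generation (stmt_info, g, gsi);
                  vec_compare = tmp;
                }

              /* Always false on the normal COND_EXPR path, so a no-op
                 there.  */
              if (must_invert_cmp_result)
                {
                  tree tmp = make_ssa_name (vec_cmp_type);
                  gassign *g = gimple_build_assign (tmp, BIT_NOT_EXPR,
                                                    vec_compare);
                  vect_finish_stmt_generation (stmt_info, g, gsi);
                  vec_compare = tmp;
                }

              /* Apply the loop mask on both paths.  */
              if (loop_mask)
                {
                  tree tmp = make_ssa_name (vec_cmp_type);
                  gassign *g = gimple_build_assign (tmp, BIT_AND_EXPR,
                                                    vec_compare, loop_mask);
                  vect_finish_stmt_generation (stmt_info, g, gsi);
                  vec_compare = tmp;
                }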

Thanks,
Richard
