On Fri, 11 Aug 2023, Alexander Monakov wrote:

> 
> On Fri, 11 Aug 2023, Richard Biener wrote:
> 
> > When we vectorize fold-left reductions with partial vectors but
> > no target operation available we use a vector conditional to force
> > excess elements to zero.  But that doesn't correctly preserve
> > the sign of zero.  The following patch disables partial vector
> > support in that case.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Does this look OK?  With -frounding-math -fno-signed-zeros we are
> > happily using the masking again, but that's OK, right?  An additional
> > + 0.0 shouldn't do anything here.
> 
> I think it converts SNan to QNan (when the partial vector has just one
> element which is SNan), so is a test for -fsignaling-nans missing?

Hm, I guess that's a corner case that could happen when there's no
runtime profitability check on more than one element and when the
element accumulated is directly loaded from memory.  OTOH the
loop vectorizer always expects an initial value for the reduction
and thus we perform either no add (when the loop isn't entered)
or at least a single add (when it is).  So I think this particular
situation cannot occur?

> In the defaut -fno-rounding-math -fno-signaling-nans mode I think we
> can do the reduction by substituting negative zero for masked-off
> elements ? maybe it's worth diagnosing that case separately (i.e.
> as "not yet implemented", not an incorrect transform)?

Ah, that's interesting.  So the only case we can't handle is
-frounding-math -fsigned-zeros then.  I'll see to adjust the patch
accordingly, like the following incremental patch:

> (note that in avx512 it's possible to materialize negative zeroes
> by mask with a single vpternlog instruction, which is cheap)

It ends up loading the { -0.0, ... } constant from memory, the
{ 0.0, ... } mask is handled by using a zero-masked load, so
indeed cheaper.

Richard.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 741b5c20389..bc3063c3615 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6905,7 +6905,17 @@ vectorize_fold_left_reduction (loop_vec_info 
loop_vinfo,
 
   tree vector_identity = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-    vector_identity = build_zero_cst (vectype_out);
+    {
+      vector_identity = build_zero_cst (vectype_out);
+      if (!HONOR_SIGNED_ZEROS (vectype_out))
+       ;
+      else
+       {
+         gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
+         vector_identity = const_unop (NEGATE_EXPR, vectype_out,
+                                       vector_identity);
+       }
+    }
 
   tree scalar_dest_var = vect_create_destination_var (scalar_dest, NULL);
   int i;
@@ -8040,12 +8050,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       else if (reduction_type == FOLD_LEFT_REDUCTION
               && reduc_fn == IFN_LAST
               && FLOAT_TYPE_P (vectype_in)
-              && HONOR_SIGNED_ZEROS (vectype_in))
+              && HONOR_SIGNED_ZEROS (vectype_in)
+              && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "can't operate on partial vectors because"
-                            " signed zeros need to be handled.\n");
+                            " signed zeros cannot be preserved.\n");
          LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
        }
       else

Reply via email to