Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

Alan Lawrence Wed, 24 Sep 2014 08:03:09 -0700

So it looks like patches 1-6 (reduc_foo) are relatively close to final, andgiven these fix PR/61114, I'm gonna try to land these while working on a respinof the second half (vec_shr)...(summary: yes I like the vec_perm idea too, butthe devil is in the detail!)

However my CompileFarm account is still pending, so to that end, if you wereable to test patch 2/14 (attached inc. Richie's s/VIEW_CONVERT_EXPR/NOP_EXPR/)on the CompileFarm PowerPC machine, that'd be great, many thanks indeed. Itshould apply on its own without patch 1. I'll aim to get an alternative patch 3back to the list shortly, and follow up with .md updates to the various backends.


Cheers, Alan


Richard Biener wrote:

On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:

This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.

These are presently documented as producing a vector with the result in
element 0, and this is inconsistent with their use in tree-vect-loop.c
(which on bigendian targets pulls the bits out of the wrong end of the
vector result). This leads to bugs on bigendian targets - see also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.

I discounted "fixing" the vectorizer (to read from element 0) and then
making bigendian targets (whose architectural insn produces the result in
lane N-1) permute the result vector, as optimization of vectors in RTL seems
unlikely to remove such a permute and would lead to a performance
regression.

Instead it seems more natural for the tree code to produce a scalar result
(producing a vector with the result in lane 0 has already caused confusion,
e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).

However, this patch preserves the meaning of the optab (producing a result
in lane 0 on little-endian architectures or N-1 on bigendian), thus
generally avoiding the need to change backends. Thus, expr.c extracts an
endianness-dependent element from the optab result to give the result
expected for the tree code.

Previously posted as an RFC
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra
VIEW_CONVERT_EXPR if the types of the reduction/result do not match.


Huh.  Does that ever happen?  Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.

Ok with that change.

Thanks,
Richard.

Testing:
        x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++
        aarch64-none-linux-gnu: bootstrap
        aarch64-none-elf:  check-gcc, check-g++
        arm-none-eabi: check-gcc

        aarch64_be-none-elf: check-gcc, showing
        FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
        FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
        Passes the (previously-failing) reduced testcase on
                https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

        Have also assembler/stage-1 tested that testcase on PowerPC, also
fixed.

gcc/ChangeLog:

        * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
        extract_bit_field around optab result.

        * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
produce
        scalar not vector.

        * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
type
        for REDUC_{MIN,MAX,PLUS}_EXPR.

        * tree-vect-loop.c (vect_analyze_loop): Update comment.
        (vect_create_epilog_for_reduction): For direct vector reduction, use
        result of tree code directly without extract_bit_field.

        * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
        comment.

commit a7b173d5efc6f08589b04fffeec9b3942b6282a0
Author: Alan Lawrence <alan.lawre...@arm.com>
Date:   Tue Jul 29 11:46:01 2014 +0100

    Make tree codes produce scalar, with NOP_EXPRs. (tree-vect-loop.c mess)

diff --git a/gcc/expr.c b/gcc/expr.c
index a6233f3..c792028 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9044,7 +9044,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
       {
         op0 = expand_normal (treeop0);
         this_optab = optab_for_tree_code (code, type, optab_default);
-        temp = expand_unop (mode, this_optab, op0, target, unsignedp);
+        enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+        temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+        gcc_assert (temp);
+        /* The tree code produces a scalar result, but (somewhat by convention)
+           the optab produces a vector with the result in element 0 if
+           little-endian, or element N-1 if big-endian.  So pull the scalar
+           result out of that element.  */
+        int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
+        int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+        temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
+				  target, mode, mode);
         gcc_assert (temp);
         return temp;
       }
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index d1b59a1..1773585 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -8246,12 +8246,13 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
     case REDUC_MAX_EXPR:
     case REDUC_PLUS_EXPR:
       {
-	unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i;
+	unsigned int nelts, i;
 	tree *elts;
 	enum tree_code subcode;
 
 	if (TREE_CODE (op0) != VECTOR_CST)
 	  return NULL_TREE;
+        nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0));
 
 	elts = XALLOCAVEC (tree, nelts);
 	if (!vec_cst_ctor_to_array (op0, elts))
@@ -8270,10 +8271,9 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
 	    elts[0] = const_binop (subcode, elts[0], elts[i]);
 	    if (elts[0] == NULL_TREE || !CONSTANT_CLASS_P (elts[0]))
 	      return NULL_TREE;
-	    elts[i] = build_zero_cst (TREE_TYPE (type));
 	  }
 
-	return build_vector (type, elts);
+	return elts[0];
       }
 
     default:
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e89d76a..6c6ff18 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3539,12 +3539,21 @@ verify_gimple_assign_unary (gimple stmt)
 
         return false;
       }
-
-    case VEC_UNPACK_HI_EXPR:
-    case VEC_UNPACK_LO_EXPR:
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+      if (!VECTOR_TYPE_P (rhs1_type)
+	  || !useless_type_conversion_p (lhs_type, TREE_TYPE (rhs1_type)))
+        {
+	  error ("reduction should convert from vector to element type");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  return true;
+	}
+      return false;
+
+    case VEC_UNPACK_HI_EXPR:
+    case VEC_UNPACK_LO_EXPR:
     case VEC_UNPACK_FLOAT_HI_EXPR:
     case VEC_UNPACK_FLOAT_LO_EXPR:
       /* FIXME.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fd1166f..8d97e17 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1892,9 +1892,9 @@ vect_analyze_loop (struct loop *loop)
 
    Output:
    REDUC_CODE - the corresponding tree-code to be used to reduce the
-      vector of partial results into a single scalar result (which
-      will also reside in a vector) or ERROR_MARK if the operation is
-      a supported reduction operation, but does not have such tree-code.
+      vector of partial results into a single scalar result, or ERROR_MARK
+      if the operation is a supported reduction operation, but does not have
+      such a tree-code.
 
    Return FALSE if CODE currently cannot be vectorized as reduction.  */
 
@@ -4168,6 +4168,7 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
   if (reduc_code != ERROR_MARK && !slp_reduc)
     {
       tree tmp;
+      tree vec_elem_type;
 
       /*** Case 1:  Create:
            v_out2 = reduc_expr <v_out1>  */
@@ -4176,14 +4177,26 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "Reduce using direct vector reduction.\n");
 
-      vec_dest = vect_create_destination_var (scalar_dest, vectype);
-      tmp = build1 (reduc_code, vectype, new_phi_result);
-      epilog_stmt = gimple_build_assign (vec_dest, tmp);
-      new_temp = make_ssa_name (vec_dest, epilog_stmt);
+      vec_elem_type = TREE_TYPE (TREE_TYPE (new_phi_result));
+      if (!useless_type_conversion_p (scalar_type, vec_elem_type))
+	{
+          tree tmp_dest =
+	      vect_create_destination_var (scalar_dest, vec_elem_type);
+	  tmp = build1 (reduc_code, vec_elem_type, new_phi_result);
+	  epilog_stmt = gimple_build_assign (tmp_dest, tmp);
+	  new_temp = make_ssa_name (tmp_dest, epilog_stmt);
+	  gimple_assign_set_lhs (epilog_stmt, new_temp);
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+
+	  tmp = build1 (NOP_EXPR, scalar_type, new_temp);
+	}
+      else
+	tmp = build1 (reduc_code, scalar_type, new_phi_result);
+      epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-
-      extract_scalar_result = true;
+      scalar_results.safe_push (new_temp);
     }
   else
     {
diff --git a/gcc/tree.def b/gcc/tree.def
index bd39e4b..c830e4b 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1161,10 +1161,9 @@ DEFTREECODE (TRANSACTION_EXPR, "transaction_expr", tcc_expression, 1)
    result (e.g. summing the elements of the vector, finding the minimum over
    the vector elements, etc).
    Operand 0 is a vector.
-   The expression returns a vector of the same type, with the first
-   element in the vector holding the result of the reduction of all elements
-   of the operand.  The content of the other elements in the returned vector
-   is undefined.  */
+   The expression returns a scalar, with type the same as the elements of the
+   vector, holding the result of the reduction of all elements of the operand.
+   */
 DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)

Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

Reply via email to