Hi Jakub,

Thank you for reviewing the patch.
I have incorporated your comments in the attached patch.

> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, July 29, 2015 1:24 AM
> To: Kumar, Venkataramanan
> Cc: Richard Biener (richard.guent...@gmail.com); gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with
> power 2 integer constant.
>
> Hi!
>
> > This is just an initial patch and tries to optimize integer type power
> > 2 constants.  I wanted to get feedback on this.  I bootstrapped and
> > reg tested on aarch64-none-linux-gnu.
>
> Thanks for working on it.
> ChangeLog entry for the patch is missing, probably also some testcases.
>
> > @@ -90,6 +94,7 @@ static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
> >      vect_recog_rotate_pattern,
> >      vect_recog_vector_vector_shift_pattern,
> >      vect_recog_divmod_pattern,
> > +    vect_recog_multconst_pattern,
> >      vect_recog_mixed_size_cond_pattern,
> >      vect_recog_bool_pattern};
>
> Please watch formatting, the other lines are tab indented, so please use a
> tab rather than 8 spaces.
>
> > @@ -2147,6 +2152,87 @@ vect_recog_vector_vector_shift_pattern (vec<gimple> *stmts,
> >    return pattern_stmt;
> >  }
> >
>
> Function comment is missing here.
>
> > +static gimple
> > +vect_recog_multconst_pattern (vec<gimple> *stmts,
> > +                              tree *type_in, tree *type_out)
>
> About the function name, wonder if just vect_recog_mult_pattern wouldn't
> be enough.
>
> > +  rhs_code = gimple_assign_rhs_code (last_stmt);
> > +  switch (rhs_code)
> > +    {
> > +    case MULT_EXPR:
> > +      break;
> > +    default:
> > +      return NULL;
> > +    }
>
> This looks too weird, I'd just do
>   if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR)
>     return NULL;
> (you handle just one pattern).
>
> > +  /* If the target can handle vectorized multiplication natively,
> > +     don't attempt to optimize this.  */
> > +  optab = optab_for_tree_code (rhs_code, vectype, optab_default);
>
> Supposedly you can use MULT_EXPR directly here.
>
> > +  /* If target cannot handle vector left shift then we cannot
> > +     optimize and bail out.  */
> > +  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
> > +  if (!optab
> > +      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
> > +    return NULL;
> > +
> > +  if (integer_pow2p (oprnd1))
> > +    {
> > +      /* Pattern detected.  */
> > +      if (dump_enabled_p ())
> > +        dump_printf_loc (MSG_NOTE, vect_location,
> > +                         "vect_recog_multconst_pattern: detected:\n");
> > +
> > +      tree shift;
> > +      shift = build_int_cst (itype, tree_log2 (oprnd1));
> > +      pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
> > +                                          LSHIFT_EXPR, oprnd0, shift);
> > +      if (dump_enabled_p ())
> > +        dump_gimple_stmt_loc (MSG_NOTE, vect_location, TDF_SLIM, pattern_stmt,
> > +                              0);
> > +      stmts->safe_push (last_stmt);
> > +      *type_in = vectype;
> > +      *type_out = vectype;
> > +      return pattern_stmt;
> > +    }
>
> Trailing whitespace.
> The integer_pow2p case (have you checked signed multiply by INT_MIN?) is
> only one of the cases you can actually handle, you can look at expand_mult
> for many other cases - e.g. multiplication by negated powers of 2, or call
> choose_mult_variant and handle whatever it returns.

I have added patterns that detect mults with power of 2 constants and
convert them to shifts in this patch.  I will do a follow-up patch for the
other variants handled in "expand_mult" and "choose_mult_variant".

I have not yet checked the INT_MIN case.  I thought of adding a check like
this:

  if (wi::eq_p (oprnd1, HOST_WIDE_INT_MIN))
    return NULL;

But I wanted to confirm with you, since I don't understand this clearly:
is this because Var * INT_MIN != Var << 31?
I tried this case for an "int" or "long int" type array:

void test_vector_shifts ()
{
  for (i = 0; i <= 99; i++)
    arr[i] = arr[i] * INT_MIN;
}

For the "int" type, without my patch aarch64 already vectorizes it using
shifts:

	ldr	q0, [x0]
	shl	v0.4s, v0.4s, 31
	str	q0, [x0], 16

For the "long int" type, without my patch the multiplication is converted
to a shift, but vectorization did not happen:

	ldr	x1, [x0]
	neg	x1, x1, lsl 31
	str	x1, [x0], 8

> 	Jakub

gcc/ChangeLog

2015-08-02  Venkataramanan Kumar  <venkataramanan.ku...@amd.com>

	* tree-vect-patterns.c (vect_recog_mult_pattern): New function
	for vectorizing multiplication patterns.
	* tree-vectorizer.h: Adjust the number of patterns.

gcc/testsuite/ChangeLog

2015-08-02  Venkataramanan Kumar  <venkataramanan.ku...@amd.com>

	* gcc.dg/vect/vect-mult-patterns.c: New test.

Regards,
Venkat.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
new file mode 100644
index 0000000..ad1760c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+unsigned long int __attribute__ ((aligned (64))) arr[100];
+int i;
+
+void test_vector_shifts ()
+{
+  for (i = 0; i <= 99; i++)
+    arr[i] = arr[i] * 4;
+}
+
+void test_vectorshift_via_mul ()
+{
+  for (i = 0; i <= 99; i++)
+    arr[i] = arr[i] * (-4);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target { ! vect_int_mult } } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f034635..5cbb49e 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -76,6 +76,10 @@ static gimple vect_recog_vector_vector_shift_pattern (vec<gimple> *,
 						      tree *, tree *);
 static gimple vect_recog_divmod_pattern (vec<gimple> *,
 					 tree *, tree *);
+
+static gimple vect_recog_mult_pattern (vec<gimple> *,
+				       tree *, tree *);
+
 static gimple vect_recog_mixed_size_cond_pattern (vec<gimple> *,
 						  tree *, tree *);
 static gimple vect_recog_bool_pattern (vec<gimple> *, tree *, tree *);
@@ -90,6 +94,7 @@ static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
 	vect_recog_rotate_pattern,
 	vect_recog_vector_vector_shift_pattern,
 	vect_recog_divmod_pattern,
+	vect_recog_mult_pattern,
 	vect_recog_mixed_size_cond_pattern,
 	vect_recog_bool_pattern};
@@ -2147,6 +2152,140 @@ vect_recog_vector_vector_shift_pattern (vec<gimple> *stmts,
   return pattern_stmt;
 }
 
+/* Detect multiplication by constants which are positive or negative powers
+   of 2, and convert them to shift patterns.
+
+   Mult with a constant that is a positive power of two:
+     type a_t;
+     type b_t;
+     S1: b_t = a_t * n
+
+   or
+
+   Mult with a constant that is a negative power of two:
+     S2: b_t = a_t * -n
+
+   Input/Output:
+
+   STMTS: Contains a stmt from which the pattern search begins,
+   i.e. the mult stmt.  Convert the mult operation to LSHIFT if the
+   constant operand is a power of 2:
+     type a_t, b_t;
+     S1': b_t = a_t << log2 (n)
+
+   Convert the mult operation to LSHIFT followed by a NEGATE if the
+   constant operand is a negative power of 2:
+     type a_t, b_t, res_T;
+     S2': b_t = a_t << log2 (n)
+     S3': res_T = - (b_t)
+
+   Output:
+
+   * TYPE_IN: The type of the input arguments to the pattern.
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the multiplication
+     S1 or S2 stmt.  */
+
+static gimple
+vect_recog_mult_pattern (vec<gimple> *stmts,
+			 tree *type_in, tree *type_out)
+{
+  gimple last_stmt = stmts->pop ();
+  tree oprnd0, oprnd1, vectype, itype;
+  gimple pattern_stmt, def_stmt;
+  optab optab;
+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR)
+    return NULL;
+
+  oprnd0 = gimple_assign_rhs1 (last_stmt);
+  oprnd1 = gimple_assign_rhs2 (last_stmt);
+  itype = TREE_TYPE (oprnd0);
+
+  if (TREE_CODE (oprnd0) != SSA_NAME
+      || TREE_CODE (oprnd1) != INTEGER_CST
+      || TREE_CODE (itype) != INTEGER_TYPE
+      || TYPE_PRECISION (itype) != GET_MODE_PRECISION (TYPE_MODE (itype)))
+    return NULL;
+
+  vectype = get_vectype_for_scalar_type (itype);
+  if (vectype == NULL_TREE)
+    return NULL;
+
+  /* If the target can handle vectorized multiplication natively,
+     don't attempt to optimize this.  */
+  optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+  if (optab != unknown_optab)
+    {
+      machine_mode vec_mode = TYPE_MODE (vectype);
+      int icode = (int) optab_handler (optab, vec_mode);
+      if (icode != CODE_FOR_nothing)
+	return NULL;
+    }
+
+  /* If the target cannot handle vector left shift then we cannot
+     optimize and bail out.  */
+  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
+  if (!optab
+      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
+    return NULL;
+
+  /* Handle constant operands that are positive or negative powers of 2.  */
+  if (wi::exact_log2 (oprnd1) != -1
+      || wi::exact_log2 (wi::neg (oprnd1)) != -1)
+    {
+      tree shift;
+
+      if (wi::exact_log2 (oprnd1) != -1)
+	{
+	  shift = build_int_cst (itype, wi::exact_log2 (oprnd1));
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
+				   LSHIFT_EXPR, oprnd0, shift);
+	}
+      else
+	{
+	  /* If the target cannot handle vector NEGATE then we cannot
+	     do the optimization.  */
+	  optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
+	  if (!optab
+	      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
+	    return NULL;
+
+	  shift = build_int_cst (itype, wi::exact_log2 (wi::neg (oprnd1)));
+	  def_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
+				   LSHIFT_EXPR, oprnd0, shift);
+	  new_pattern_def_seq (stmt_vinfo, def_stmt);
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
+				   NEGATE_EXPR, gimple_assign_lhs (def_stmt));
+	}
+
+      /* Pattern detected.  */
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "vect_recog_mult_pattern: detected:\n");
+	  dump_gimple_stmt_loc (MSG_NOTE, vect_location, TDF_SLIM,
+				pattern_stmt, 0);
+	}
+
+      stmts->safe_push (last_stmt);
+      *type_in = vectype;
+      *type_out = vectype;
+      return pattern_stmt;
+    }
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
    otherwise vectorized:
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index dfa8795..b490af4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1132,7 +1132,7 @@ extern void vect_slp_transform_bb (basic_block);
    Additional pattern recognition functions can (and will) be added
    in the future.  */
 typedef gimple (* vect_recog_func_ptr) (vec<gimple> *, tree *, tree *);
-#define NUM_PATTERNS 12
+#define NUM_PATTERNS 13
 void vect_pattern_recog (loop_vec_info, bb_vec_info);
 
 /* In tree-vectorizer.c.  */