On Mon, Jun 14, 2021 at 7:26 PM Stefan Schulze Frielinghaus
<stefa...@linux.ibm.com> wrote:
>
> On Thu, May 20, 2021 at 08:37:24PM +0200, Stefan Schulze Frielinghaus wrote:
> [...]
> > > but we won't ever arrive here because of the niters condition.  But
> > > yes, doing the pattern matching in the innermost loop processing code
> > > looks good to me - for the specific case it would be
> > >
> > >       /* Don't distribute loop if niters is unknown.  */
> > >       tree niters = number_of_latch_executions (loop);
> > >       if (niters == NULL_TREE || niters == chrec_dont_know)
> > > ---> here?
> > >         continue;
> >
> > Right, please find attached a new version of the patch where everything
> > is included in the loop distribution pass.  I will do a bootstrap and
> > regtest on IBM Z over night.  If you give me green light I will also do
> > the same on x86_64.
>
> Meanwhile I gave it a shot on x86_64 where the testsuite runs fine (at
> least the ldist-strlen testcase).  If you are Ok with the patch, then I
> would rebase and run the testsuites again and post a patch series
> including the rawmemchr implementation for IBM Z.

@@ -3257,6 +3261,464 @@ find_seed_stmts_for_distribution (class loop *loop, vec<gimple *> *work_list)
   return work_list->length () > 0;
 }

+static void
+generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
+                           data_reference_p store_dr, tree base, tree pattern,
+                           location_t loc)
+{

this new function needs a comment.  Applies to all of the new ones, btw.
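
For the one above, something along these lines would do (just a sketch going
by the parameter names, please adjust it to what the function actually does):

  /* Replace LOOP with a call to rawmemchr, searching from BASE for the
     constant PATTERN, and assign the result to REDUCTION_VAR (storing it
     through STORE_DR if present).  LOC is used for dumping.  */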

+  gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (base))
+                      && TREE_TYPE (TREE_TYPE (base)) == TREE_TYPE (pattern));

this looks fragile and is probably unnecessary as well.

+  gcc_checking_assert (TREE_TYPE (reduction_var) == TREE_TYPE (base));

in general you want types_compatible_p () checks, which for pointers means
that all pointers are considered compatible ...
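
I.e., something like (sketch):

  gcc_checking_assert (types_compatible_p (TREE_TYPE (reduction_var),
                                           TREE_TYPE (base)));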

(skipping stuff)

@@ -3321,10 +3783,20 @@ loop_distribution::execute (function *fun)
              && !optimize_loop_for_speed_p (loop)))
        continue;

-      /* Don't distribute loop if niters is unknown.  */
+      /* If niters is unknown don't distribute loop but rather try to transform
+        it to a call to a builtin.  */
       tree niters = number_of_latch_executions (loop);
       if (niters == NULL_TREE || niters == chrec_dont_know)
-       continue;
+       {
+         if (transform_reduction_loop (loop))
+           {
+             changed = true;
+             loops_to_be_destroyed.safe_push (loop);
+             if (dump_file)
+               fprintf (dump_file, "Loop %d transformed into a builtin.\n",
+                        loop->num);
+           }
+         continue;
+       }

please look at

          if (nb_generated_loops + nb_generated_calls > 0)
            {
              changed = true;
              if (dump_enabled_p ())
                dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
                                 loc, "Loop%s %d distributed: split to
%d loops "
                                 "and %d library calls.\n", str, loop->num,
                                 nb_generated_loops, nb_generated_calls);

and follow the use of dump_* and MSG_OPTIMIZED_LOCATIONS so the
transforms are reported with -fopt-info-loop
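
For the new transform that would look roughly like (a sketch; assuming loc
is obtained via find_loop_location (loop) the same way as for the existing
message):

          if (transform_reduction_loop (loop))
            {
              changed = true;
              loops_to_be_destroyed.safe_push (loop);
              if (dump_enabled_p ())
                dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
                                 "Loop %d transformed into a builtin.\n",
                                 loop->num);
            }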

+
+  return transform_reduction_loop_1 (loop, load_dr, store_dr, reduction_var);
+}

what's the point in tail-calling here and visually splitting the
function in half?

(sorry for picking random pieces now ;))

+      for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+          gsi_next (&bsi), ++ninsns)
+       {

this counts debug insns, I guess you want gsi_next_nondebug at least.
Not sure why you are counting PHIs at all, btw - for the loops you match
you are expecting at most two, one IV and possibly one for the virtual
operand of the store?
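
If you need the check at all, counting the PHIs directly would be clearer,
roughly (a sketch, matching the "at most two" expectation above):

      unsigned nphis = 0;
      for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
           gsi_next (&bsi))
        ++nphis;
      if (nphis > 2)
        return false;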

+         if (gimple_has_volatile_ops (phi))
+           return false;

PHIs never have volatile ops.

+         if (gimple_clobber_p (phi))
+           continue;

or are clobbers.

Btw, can you factor out a helper from find_single_drs working on a
stmt to reduce code duplication?

+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+    {
+    case GIMPLE_PHI:
+      reduction_var = gimple_phi_result (reduction_stmt);
+      break;
+    case GIMPLE_ASSIGN:
+      reduction_var = gimple_assign_lhs (reduction_stmt);
+      break;
+    default:
+      /* Bail out e.g. for GIMPLE_CALL.  */
+      return false;

gimple_get_lhs (reduction_stmt); would work for both PHIs
and assigns.
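
I.e. (sketch; if GIMPLE_CALLs etc. still need to be rejected explicitly,
keep a separate check on the stmt code):

  tree reduction_var = gimple_get_lhs (reduction_stmt);
  if (reduction_var == NULL_TREE)
    return false;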

+  if (reduction_var == NULL)
+    return false;

it can never be NULL here.

+  /* Bail out if this is a bitfield memory reference.  */
+  if (TREE_CODE (DR_REF (load_dr)) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (load_dr), 1)))
+    return false;
...

I see this is again quite some code copied from find_single_drs, please
see how to avoid this much duplication by splitting out helpers.

+static bool
+transform_reduction_loop_1 (loop_p loop,
+                           data_reference_p load_dr,
+                           data_reference_p store_dr,
+                           tree reduction_var)
+{
+  tree load_ref = DR_REF (load_dr);
+  tree load_type = TREE_TYPE (load_ref);
+  tree load_access_base = build_fold_addr_expr (load_ref);
+  tree load_access_size = TYPE_SIZE_UNIT (load_type);
+  affine_iv load_iv, reduction_iv;
+  tree pattern;
+
+  /* A limitation of the current implementation is that we only support
+     constant patterns.  */
+  edge e = single_exit (loop);
+  gcond *cond_stmt = safe_dyn_cast <gcond *> (last_stmt (e->src));
+  if (!cond_stmt)
+    return false;

these look like checks that should be done at the start of
transform_reduction_loop, not this late.

+  if (gimple_cond_code (cond_stmt) != NE_EXPR
+      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (DR_STMT (load_dr))
+      || TREE_CODE (pattern) != INTEGER_CST)
+    return false;

half of this as well.  Btw, there's no canonicalization for
the tests, so you have to verify that the false edge actually exits
the loop and allow for EQ_EXPR in case it is the true edge that exits.
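
I.e., something along these lines (sketch):

  edge exit = single_exit (loop);
  if (!((gimple_cond_code (cond_stmt) == NE_EXPR
         && (exit->flags & EDGE_FALSE_VALUE))
        || (gimple_cond_code (cond_stmt) == EQ_EXPR
            && (exit->flags & EDGE_TRUE_VALUE))))
    return false;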

+  /* Handle strlen like loops.  */
+  if (store_dr == NULL
+      && integer_zerop (pattern)
+      && TREE_CODE (reduction_iv.base) == INTEGER_CST
+      && TREE_CODE (reduction_iv.step) == INTEGER_CST
+      && integer_onep (reduction_iv.step)
+      && (types_compatible_p (TREE_TYPE (reduction_var), size_type_node)
+         || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))))
+    {

I wonder what goes wrong with a larger or smaller wrapping IV type?
The iteration
only stops when you load a NUL and the increments just wrap along (you're
using the pointer IVs to compute the strlen result).  Can't you simply truncate?
For larger than size_type_node (actually larger than ptr_type_node would matter
I guess), the argument is that since pointer wrapping would be undefined anyway
the IV cannot wrap either.  Now, the correct check here would IMHO be

      TYPE_PRECISION (TREE_TYPE (reduction_var)) < TYPE_PRECISION (ptr_type_node)
      || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (pointer-iv-var))

?

+      if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var)))
+       {
+         const char *msg = G_("assuming signed overflow does not occur "
+                              "when optimizing strlen like loop");
+         fold_overflow_warning (msg, WARN_STRICT_OVERFLOW_MISC);
+       }

no, please don't add any new strict-overflow warnings ;)

The generate_*_builtin routines need some factoring - if you code-generate
into a gimple_seq you could use gimple_build () which would do the fold_stmt
for you (not sure why you do that - you should aim to fold the call, not
necessarily the rest).  The replacement of reduction_var and the dumping
could be shared.  There's also GET_MODE_NAME for the printing.
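
E.g. (a sketch; fn, mem_base and gsi are placeholders here, and the details
of the call depend on how the rawmemchr builtin ends up being represented):

  gimple_seq seq = NULL;
  /* Build the call and let gimple_build take care of folding.  */
  tree result = gimple_build (&seq, loc, fn, TREE_TYPE (reduction_var),
                              mem_base, pattern);
  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);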

I think overall the approach is sound now but the details still need work.

Thanks,
Richard.



> Cheers,
> Stefan
