On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote:
> 
> There has been support for generating "inbranch" SIMD clones for a long time,
> but nothing actually uses them (as far as I can see).

Thanks for working on this.

Note, there is another case where the inbranch SIMD clones could be used
and I even thought it is implemented, but apparently it isn't or it doesn't
work:
#ifndef TYPE
#define TYPE int
#endif

/* A simple function that will be cloned.  */
#pragma omp declare simd inbranch
TYPE __attribute__((noinline))
foo (TYPE a)
{
  return a + 1;
}

/* Check that "inbranch" clones are called correctly.  */

void __attribute__((noinline))
masked (TYPE * __restrict a, TYPE * __restrict b, int size)
{
  #pragma omp simd
  for (int i = 0; i < size; i++)
    b[i] = foo(a[i]);
}

Here, IMHO we should use the inbranch clone for vectorization (better
than not vectorizing it, worse than when we'd have a notinbranch clone)
and just use mask of all ones.
But sure, it can be done incrementally, just mentioning it for completeness.

> This patch add supports for a sub-set of possible cases (those using
> mask_mode == VOIDmode).  The other cases fail to vectorize, just as before,
> so there should be no regressions.
> 
> The sub-set of support should cover all cases needed by amdgcn, at present.
> 
> gcc/ChangeLog:
> 
>       * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
>       for mask arguments also.
>       * tree-if-conv.cc: Include cgraph.h.
>       (if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
>       (predicate_statements): Pass the predicate to SIMD functions.
>       * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
>       to clones with mask arguments, in some cases.
>       Generate the mask vector arguments.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/vect/vect-simd-clone-16.c: New test.
>       * gcc.dg/vect/vect-simd-clone-16b.c: New test.
>       * gcc.dg/vect/vect-simd-clone-16c.c: New test.
>       * gcc.dg/vect/vect-simd-clone-16d.c: New test.
>       * gcc.dg/vect/vect-simd-clone-16e.c: New test.
>       * gcc.dg/vect/vect-simd-clone-16f.c: New test.
>       * gcc.dg/vect/vect-simd-clone-17.c: New test.
>       * gcc.dg/vect/vect-simd-clone-17b.c: New test.
>       * gcc.dg/vect/vect-simd-clone-17c.c: New test.
>       * gcc.dg/vect/vect-simd-clone-17d.c: New test.
>       * gcc.dg/vect/vect-simd-clone-17e.c: New test.
>       * gcc.dg/vect/vect-simd-clone-17f.c: New test.
>       * gcc.dg/vect/vect-simd-clone-18.c: New test.
>       * gcc.dg/vect/vect-simd-clone-18b.c: New test.
>       * gcc.dg/vect/vect-simd-clone-18c.c: New test.
>       * gcc.dg/vect/vect-simd-clone-18d.c: New test.
>       * gcc.dg/vect/vect-simd-clone-18e.c: New test.
>       * gcc.dg/vect/vect-simd-clone-18f.c: New test.

> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, 
> vec<data_reference_p> refs)
>       tree fndecl = gimple_call_fndecl (stmt);
>       if (fndecl)
>         {
> +         /* We can vectorize some builtins and functions with SIMD
> +            clones.  */
>           int flags = gimple_call_flags (stmt);
> +         struct cgraph_node *node = cgraph_node::get (fndecl);
>           if ((flags & ECF_CONST)
>               && !(flags & ECF_LOOPING_CONST_OR_PURE)
> -             /* We can only vectorize some builtins at the moment,
> -                so restrict if-conversion to those.  */
>               && fndecl_built_in_p (fndecl))
>             return true;
> +         else if (node && node->simd_clones != NULL)
> +           {
> +             need_to_predicate = true;

I think it would be worth it to check that at least one of the
node->simd_clones clones has ->inbranch set, because if all calls
are declare simd notinbranch, then predicating the loop will be just a
wasted effort.

> +             return true;
> +           }
>         }
>       return false;
>        }
> @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
>             gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
>             update_stmt (stmt);
>           }
> +
> +       /* Add a predicate parameter to functions that have a SIMD clone.
> +          This will cause the vectorizer to match the "in branch" clone
> +          variants because they also have the extra parameter, and serves
> +          to build the mask vector in a natural way.  */
> +       gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> +       if (call && !gimple_call_internal_p (call))
> +         {
> +           tree orig_fndecl = gimple_call_fndecl (call);
> +           int orig_nargs = gimple_call_num_args (call);
> +           auto_vec<tree> args;
> +           for (int i=0; i < orig_nargs; i++)
> +             args.safe_push (gimple_call_arg (call, i));
> +           args.safe_push (cond);
> +
> +           /* Replace the call with a new one that has the extra
> +              parameter.  The FUNCTION_DECL remains unchanged so that
> +              the vectorizer can find the SIMD clones.  This call will
> +              either be deleted or replaced at that time, so the
> +              mismatch is short-lived and we can live with it.  */
> +           gcall *new_call = gimple_build_call_vec (orig_fndecl, args);
> +           gimple_call_set_lhs (new_call, gimple_call_lhs (call));
> +           gsi_replace (&gsi, new_call, true);

I think this is way too dangerous to represent conditional calls that way,
there is nothing to distinguish those from non-conditional calls.
I think I'd prefer (but please see what Richi thinks too) to represent
the conditional calls as a call to a new internal function, say
IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original
call had, plus 2 extra ones first (or 3?), one that would be saved copy of
original gimple_call_fn (i.e. usually &fndecl), another one that would be the
condition (and dunno about whether we need also something to represent
gimple_call_fntype, or whether we simply should punt during ifcvt
on conditional calls where gimple_call_fntype is incompatible with
the function type of fndecl.  Another question is about
gimple_call_chain.  Punt or copy it over to the ifn and back.

        Jakub

Reply via email to