On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote:
>
> There has been support for generating "inbranch" SIMD clones for a long time,
> but nothing actually uses them (as far as I can see).
Thanks for working on this.
Note, there is another case where the inbranch SIMD clones could be used;
I even thought it was implemented, but apparently it isn't, or it doesn't
work:
#ifndef TYPE
#define TYPE int
#endif
/* A simple function that will be cloned. */
#pragma omp declare simd inbranch
TYPE __attribute__((noinline))
foo (TYPE a)
{
return a + 1;
}
/* Check that "inbranch" clones are called correctly. */
void __attribute__((noinline))
masked (TYPE * __restrict a, TYPE * __restrict b, int size)
{
#pragma omp simd
for (int i = 0; i < size; i++)
b[i] = foo(a[i]);
}
Here, IMHO we should use the inbranch clone for vectorization (better
than not vectorizing at all, worse than if we had a notinbranch clone)
and just use a mask of all ones.
But sure, it can be done incrementally, just mentioning it for completeness.
> This patch adds support for a sub-set of possible cases (those using
> mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
> so there should be no regressions.
>
> The sub-set of support should cover all cases needed by amdgcn, at present.
>
> gcc/ChangeLog:
>
> * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
> for mask arguments also.
> * tree-if-conv.cc: Include cgraph.h.
> (if_convertible_stmt_p): Do if-conversion for calls to SIMD functions.
> (predicate_statements): Pass the predicate to SIMD functions.
> * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls
> to clones with mask arguments, in some cases.
> Generate the mask vector arguments.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-simd-clone-16.c: New test.
> * gcc.dg/vect/vect-simd-clone-16b.c: New test.
> * gcc.dg/vect/vect-simd-clone-16c.c: New test.
> * gcc.dg/vect/vect-simd-clone-16d.c: New test.
> * gcc.dg/vect/vect-simd-clone-16e.c: New test.
> * gcc.dg/vect/vect-simd-clone-16f.c: New test.
> * gcc.dg/vect/vect-simd-clone-17.c: New test.
> * gcc.dg/vect/vect-simd-clone-17b.c: New test.
> * gcc.dg/vect/vect-simd-clone-17c.c: New test.
> * gcc.dg/vect/vect-simd-clone-17d.c: New test.
> * gcc.dg/vect/vect-simd-clone-17e.c: New test.
> * gcc.dg/vect/vect-simd-clone-17f.c: New test.
> * gcc.dg/vect/vect-simd-clone-18.c: New test.
> * gcc.dg/vect/vect-simd-clone-18b.c: New test.
> * gcc.dg/vect/vect-simd-clone-18c.c: New test.
> * gcc.dg/vect/vect-simd-clone-18d.c: New test.
> * gcc.dg/vect/vect-simd-clone-18e.c: New test.
> * gcc.dg/vect/vect-simd-clone-18f.c: New test.
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt,
> vec<data_reference_p> refs)
> tree fndecl = gimple_call_fndecl (stmt);
> if (fndecl)
> {
> + /* We can vectorize some builtins and functions with SIMD
> + clones. */
> int flags = gimple_call_flags (stmt);
> + struct cgraph_node *node = cgraph_node::get (fndecl);
> if ((flags & ECF_CONST)
> && !(flags & ECF_LOOPING_CONST_OR_PURE)
> - /* We can only vectorize some builtins at the moment,
> - so restrict if-conversion to those. */
> && fndecl_built_in_p (fndecl))
> return true;
> + else if (node && node->simd_clones != NULL)
> + {
> + need_to_predicate = true;
I think it would be worth checking that at least one of the
node->simd_clones clones has ->inbranch set, because if all the clones
are declare simd notinbranch, then predicating the loop would just be
wasted effort.
> + return true;
> + }
> }
> return false;
> }
> @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop)
> gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
> update_stmt (stmt);
> }
> +
> + /* Add a predicate parameter to functions that have a SIMD clone.
> + This will cause the vectorizer to match the "in branch" clone
> + variants because they also have the extra parameter, and serves
> + to build the mask vector in a natural way. */
> + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> + if (call && !gimple_call_internal_p (call))
> + {
> + tree orig_fndecl = gimple_call_fndecl (call);
> + int orig_nargs = gimple_call_num_args (call);
> + auto_vec<tree> args;
> + for (int i=0; i < orig_nargs; i++)
> + args.safe_push (gimple_call_arg (call, i));
> + args.safe_push (cond);
> +
> + /* Replace the call with a new one that has the extra
> + parameter. The FUNCTION_DECL remains unchanged so that
> + the vectorizer can find the SIMD clones. This call will
> + either be deleted or replaced at that time, so the
> + mismatch is short-lived and we can live with it. */
> + gcall *new_call = gimple_build_call_vec (orig_fndecl, args);
> + gimple_call_set_lhs (new_call, gimple_call_lhs (call));
> + gsi_replace (&gsi, new_call, true);
I think it is way too dangerous to represent conditional calls that way;
there is nothing to distinguish them from non-conditional calls.
I think I'd prefer (but please see what Richi thinks too) to represent
the conditional calls as calls to a new internal function, say
IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original
call had, plus 2 extra ones first (or 3?): one that would be a saved copy of
the original gimple_call_fn (i.e. usually &fndecl), and another that would be
the condition.  I don't know whether we'd also need something to represent
gimple_call_fntype, or whether we should simply punt during ifcvt
on conditional calls where gimple_call_fntype is incompatible with
the function type of fndecl.  Another question is about
gimple_call_chain: punt, or copy it over to the ifn and back.
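A hedged sketch of what that internal-function representation might look like in the if-converted GIMPLE; the name .MASK_CALL and the argument order (saved function address first, mask last) are illustrative, not decided:

```
/* Original, executed only under condition _c:  */
if (_c)
  _r = foo (_a);

/* After if-conversion, as a self-describing masked call that cannot
   be mistaken for an ordinary unconditional call:  */
_r = .MASK_CALL (&foo, _a, _c);
```

The vectorizer would then match the internal function explicitly, recover the original fndecl from the first argument, and build the mask vector from the last one, instead of relying on a call whose argument count silently disagrees with its FUNCTION_DECL.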
Jakub