On Tue, Aug 09, 2022 at 02:23:50PM +0100, Andrew Stubbs wrote: > > There has been support for generating "inbranch" SIMD clones for a long time, > but nothing actually uses them (as far as I can see).
Thanks for working on this. Note, there is another case where the inbranch SIMD clones could be used and I even thought it is implemented, but apparently it isn't or it doesn't work: #ifndef TYPE #define TYPE int #endif /* A simple function that will be cloned. */ #pragma omp declare simd inbranch TYPE __attribute__((noinline)) foo (TYPE a) { return a + 1; } /* Check that "inbranch" clones are called correctly. */ void __attribute__((noinline)) masked (TYPE * __restrict a, TYPE * __restrict b, int size) { #pragma omp simd for (int i = 0; i < size; i++) b[i] = foo(a[i]); } Here, IMHO we should use the inbranch clone for vectorization (better than not vectorizing it, worse than when we'd have a notinbranch clone) and just use mask of all ones. But sure, it can be done incrementally, just mentioning it for completeness. > This patch add supports for a sub-set of possible cases (those using > mask_mode == VOIDmode). The other cases fail to vectorize, just as before, > so there should be no regressions. > > The sub-set of support should cover all cases needed by amdgcn, at present. > > gcc/ChangeLog: > > * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type > for mask arguments also. > * tree-if-conv.cc: Include cgraph.h. > (if_convertible_stmt_p): Do if conversions for calls to SIMD calls. > (predicate_statements): Pass the predicate to SIMD functions. > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls > to clones with mask arguments, in some cases. > Generate the mask vector arguments. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-simd-clone-16.c: New test. > * gcc.dg/vect/vect-simd-clone-16b.c: New test. > * gcc.dg/vect/vect-simd-clone-16c.c: New test. > * gcc.dg/vect/vect-simd-clone-16d.c: New test. > * gcc.dg/vect/vect-simd-clone-16e.c: New test. > * gcc.dg/vect/vect-simd-clone-16f.c: New test. > * gcc.dg/vect/vect-simd-clone-17.c: New test. > * gcc.dg/vect/vect-simd-clone-17b.c: New test. > * gcc.dg/vect/vect-simd-clone-17c.c: New test. > * gcc.dg/vect/vect-simd-clone-17d.c: New test. > * gcc.dg/vect/vect-simd-clone-17e.c: New test. > * gcc.dg/vect/vect-simd-clone-17f.c: New test. > * gcc.dg/vect/vect-simd-clone-18.c: New test. > * gcc.dg/vect/vect-simd-clone-18b.c: New test. > * gcc.dg/vect/vect-simd-clone-18c.c: New test. > * gcc.dg/vect/vect-simd-clone-18d.c: New test. > * gcc.dg/vect/vect-simd-clone-18e.c: New test. > * gcc.dg/vect/vect-simd-clone-18f.c: New test. > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, > vec<data_reference_p> refs) > tree fndecl = gimple_call_fndecl (stmt); > if (fndecl) > { > + /* We can vectorize some builtins and functions with SIMD > + clones. */ > int flags = gimple_call_flags (stmt); > + struct cgraph_node *node = cgraph_node::get (fndecl); > if ((flags & ECF_CONST) > && !(flags & ECF_LOOPING_CONST_OR_PURE) > - /* We can only vectorize some builtins at the moment, > - so restrict if-conversion to those. */ > && fndecl_built_in_p (fndecl)) > return true; > + else if (node && node->simd_clones != NULL) > + { > + need_to_predicate = true; I think it would be worth it to check that at least one of the node->simd_clones clones has ->inbranch set, because if all calls are declare simd notinbranch, then predicating the loop will be just a wasted effort. > + return true; > + } > } > return false; > } > @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop) > gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi)); > update_stmt (stmt); > } > + > + /* Add a predicate parameter to functions that have a SIMD clone. > + This will cause the vectorizer to match the "in branch" clone > + variants because they also have the extra parameter, and serves > + to build the mask vector in a natural way. */ > + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi)); > + if (call && !gimple_call_internal_p (call)) > + { > + tree orig_fndecl = gimple_call_fndecl (call); > + int orig_nargs = gimple_call_num_args (call); > + auto_vec<tree> args; > + for (int i=0; i < orig_nargs; i++) > + args.safe_push (gimple_call_arg (call, i)); > + args.safe_push (cond); > + > + /* Replace the call with a new one that has the extra > + parameter. The FUNCTION_DECL remains unchanged so that > + the vectorizer can find the SIMD clones. This call will > + either be deleted or replaced at that time, so the > + mismatch is short-lived and we can live with it. */ > + gcall *new_call = gimple_build_call_vec (orig_fndecl, args); > + gimple_call_set_lhs (new_call, gimple_call_lhs (call)); > + gsi_replace (&gsi, new_call, true); I think this is way too dangerous to represent conditional calls that way, there is nothing to distinguish those from non-conditional calls. I think I'd prefer (but please see what Richi thinks too) to represent the conditional calls as a call to a new internal function, say IFN_COND_CALL or IFN_MASK_CALL, which would have the arguments the original call had, plus 2 extra ones first (or 3?), one that would be saved copy of original gimple_call_fn (i.e. usually &fndecl), another one that would be the condition (and dunno about whether we need also something to represent gimple_call_fntype, or whether we simply should punt during ifcvt on conditional calls where gimple_call_fntype is incompatible with the function type of fndecl. Another question is about gimple_call_chain. Punt or copy it over to the ifn and back. Jakub