On Mon, Oct 30, 2023 at 1:23 PM <pan2...@intel.com> wrote:
>
> From: Pan Li <pan2...@intel.com>
>
> Update in v3:
>
> * Add func to predicate type size is legal or not for vectorizer call.
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
>
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
>
> The below test are passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
> func impl to predicate the type size is legal or not.
> (vectorizable_call): Leverage vectorizable_type_size_legal_p.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 51 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..24b3448d961 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>    return IFN_LAST;
>  }
>
> +/* Return TRUE when the type size is legal for the call vectorizer,
> +   or FALSE.
> +   The type size of both the vectype_in and vectype_out should be
> +   exactly the same when vectype_out isn't participating the optab.
> +   While there is no restriction for type size when vectype_out
> +   is part of the optab query.
> + */
> +static bool
> +vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
> +                                tree vectype_in)
> +{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> +
> +  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
> +    return same_size_p;
> +
> +  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
> +
> +  if (!difn_info.vectorizable)
> +    return same_size_p;
> +
> +  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
> +     the vectype_out participating the optable selection. Aka the type size
> +     check can be skipped here. */
> +  if (difn_info.type0 < 0 || difn_info.type1 < 0)
> +    return true;

Can you instead amend vectorizable_internal_function to contain the check,
returning IFN_LAST if it doesn't hold?

> +
> +  return same_size_p;
> +}
>
>  static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
>                                    gimple_stmt_iterator *);
> @@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector. We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs. */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                         "mismatched vector sizes %T and %T\n",
> -                         vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> @@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
>    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>                                          vectype_in);
>
> +  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "mismatched vector sizes %T and %T\n",
> +                         vectype_in, vectype_out);
> +      return false;
> +    }
> +
>    /* If that fails, try asking for a target-specific built-in function. */
>    if (ifn == IFN_LAST)
>      {
> --
> 2.34.1
>
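
To make the suggestion concrete, here is a minimal, self-contained C++ sketch of the shape it could take. It is a toy model and not GCC code: internal_fn_model, vectype_model, fn_info_model, info_for and vectorizable_fn_model are hypothetical stand-ins, the table entries are invented for illustration, and the target support query that the real lookup performs is left out. The only point is that the size check moves into the lookup, which reports "no internal function" when the check fails.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC internals; not the real definitions.
enum internal_fn_model { FN_LRINT, FN_CTZ, FN_LAST };

struct vectype_model { unsigned size_bits; };

struct fn_info_model {
  int type0;          // < 0: vectype_out takes part in optab selection
  int type1;          // >= 0: vectype_in is used instead
  bool vectorizable;
};

// Invented table: FN_LRINT is keyed on output and input types,
// FN_CTZ on the input type only.
static fn_info_model info_for (internal_fn_model fn)
{
  switch (fn)
    {
    case FN_LRINT: return {-1, 0, true};
    case FN_CTZ:   return {0, 0, true};
    default:       return {0, 0, false};
    }
}

// Model of the lookup with the size check folded in.  Returns FN_LAST
// when the vector sizes differ and the output type is not part of the
// optab query.  The target support query is deliberately omitted.
static internal_fn_model
vectorizable_fn_model (internal_fn_model fn, vectype_model out,
                       vectype_model in)
{
  if (fn == FN_LAST)
    return FN_LAST;

  fn_info_model info = info_for (fn);
  if (!info.vectorizable)
    return FN_LAST;

  bool out_in_query = info.type0 < 0 || info.type1 < 0;
  if (!out_in_query && out.size_bits != in.size_bits)
    return FN_LAST;

  return fn;
}

// Small self-check exercising both branches of the size rule.
static void check_examples ()
{
  // Output type is part of the query: differing sizes are accepted.
  assert (vectorizable_fn_model (FN_LRINT, {256}, {128}) == FN_LRINT);
  // Output type is not part of the query: differing sizes are rejected.
  assert (vectorizable_fn_model (FN_CTZ, {128}, {256}) == FN_LAST);
}
```

With this shape the caller only has to test the result against the "last" sentinel. The sketch does not model the target-specific built-in fallback in vectorizable_call, so how the size restriction applies on that path is left open here.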