"Andre Vieira (lists) via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > On 22/11/2021 11:41, Richard Biener wrote: >> >>> On 18/11/2021 11:05, Richard Biener wrote: >>>> This is a good shout and made me think about something I hadn't before... I >>>> thought I could handle the vector forms later, but the problem is if I add >>>> support for the scalar, it will stop the vectorizer. It seems >>>> vectorizable_call expects all arguments to have the same type, which >>>> doesn't >>>> work with passing the integer type as an operand work around. >> We already special case some IFNs there (masked load/store and gather) >> to ignore some args, so that would just add to this set. >> >> Richard. > Hi, > > Reworked it to add support of the new IFN to the vectorizer. Was > initially trying to make vectorizable_call and > vectorizable_internal_function handle IFNs with different inputs more > generically, using the information we have in the <IFN>_direct structs > regarding what operands to get the modes from. Unfortunately, that > wasn't straightforward because of how vectorizable_call assumes operands > have the same type and uses the type of the DEF_STMT_INFO of the > non-constant operands (either output operand or non-constant inputs) to > determine the type of constants. I assume there is some reason why we > use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on > the argument types. That is why I ended up with this sort of half-way > mix of both, which still allows room to add more IFNs that don't take > inputs of the same type, but require adding a bit of special casing > similar to the IFN_FTRUNC_INT and masking ones. > > Bootstrapped on aarch64-none-linux.
Still leaving the match.pd stuff to Richard, but otherwise:

> index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
>                                SF DF])
>
> ;; Scalar and vetor modes for SF, DF.
> -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
> +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")

Nit: excess space between [ and (.

> +                              (V4SF "TARGET_SIMD")
> +                              (V2DF "TARGET_SIMD")
> +                              (DF "TARGET_FLOAT")
> +                              (SF "TARGET_FLOAT")])
>
> ;; Advanced SIMD single Float modes.
> (define_mode_iterator VDQSF [V2SF V4SF])
> […]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0.  Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  Exception must be thrown if operand 1 does not fit

Maybe “An exception must be raised”?  “thrown” makes it sound like a
signal must be raised or a C++ exception thrown.

> +in a @var{n} mode signed integer as it would have if the truncation happened
> +through separate floating point to integer conversion.
> +
> +
> @cindex @code{round@var{m}2} instruction pattern
> @item @samp{round@var{m}2}
> Round operand 1 to the nearest integer, rounding away from zero in the
> […]
> @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>            != CODE_FOR_nothing);
> }
>
> +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
> +                                                 tree_pair types,
> +                                                 optimization_type opt_type)

Formatting nit: there should be a line break after “bool”.

> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +                                 TYPE_MODE (element_type (types.second)),
> +                                 opt_type) != CODE_FOR_nothing);
> +}
> +
> #define direct_unary_optab_supported_p direct_optab_supported_p
> #define direct_binary_optab_supported_p direct_optab_supported_p
> #define direct_ternary_optab_supported_p direct_optab_supported_p
> […]
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee51d77ff23fee68ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)                                  \
> +void                                                                    \
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
> +{                                                                       \
> +  for (int i = 0; i < n; ++i)                                           \
> +    {                                                                   \
> +      int_type x_i = x[i];                                              \
> +      y[i] = (float_type) x_i;                                          \
> +    }                                                                   \
> +}
> +
> +/*
> +** f1:
> +**      ...
> +**      frint32z        v0.4s, v0.4s

I don't think we can rely on v0 being used here.  v[0-9]+\.4s would be
safer.
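For instance, a sketch of the adjusted f1 body check, following that
suggestion:

  /*
  ** f1:
  **      ...
  **      frint32z        v[0-9]+\.4s, v[0-9]+\.4s
  **      ...
  */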
> +**      ...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**      ...
> +**      frint64z        v0.4s, v0.4s
> +**      ...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**      ...
> +**      frint32z        v0.2d, v0.2d
> +**      ...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**      ...
> +**      frint64z        v0.2d, v0.2d
> +**      ...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70ce0ac9da891dbe62 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>
> static internal_fn
> vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -                               tree vectype_out, tree vectype_in)
> +                               tree vectype_out, tree vectype_in,
> +                               tree *vectypes)
> {
>   internal_fn ifn;
>   if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>       const direct_internal_fn_info &info = direct_internal_fn (ifn);
>       if (info.vectorizable)
>         {
> -         tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -         tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +         tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +         if (!type0)
> +           type0 = vectype_in;
> +         tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +         if (!type1)
> +           type1 = vectype_in;
>           if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>                                               OPTIMIZE_FOR_SPEED))
>             return ifn;
> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>       rhs_type = unsigned_type_node;
>     }
>
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>   if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +       /* For FTRUNC this represents the argument that carries the type of
> +          the intermediate signed integer.  */
> +       diff_opno = 1;
> +      else
> +       {
> +         /* For masked operations this represents the argument that carries
> +            the mask.  */
> +         diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +         masked = diff_opno >= 0;
> +       }
> +    }

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.
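One possible shape for that, in the argument loop quoted below (an
untested sketch of the suggestion, not code from the patch):

      /* Untested sketch: IFN_FTRUNC_INT's second argument merely encodes
         the intermediate integer type, so skip it here and leave it in
         its scalar form, instead of computing a vector type for it.  */
      if (cfn == CFN_FTRUNC_INT && (int) i == 1)
        continue;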
>   for (i = 0; i < nargs; i++)
>     {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno && masked)
>         {
> -         if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -                                      &op, &slp_op[i], &dt[i], &vectypes[i]))
> +         if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +                                      diff_opno, &op, &slp_op[i], &dt[i],
> +                                      &vectypes[i]))
>             return false;
>           continue;
>         }
> […]
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>   return true;
> }
>
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (const_tree type)
> {
>   if (!TYPE_P (type))
>     type = TREE_TYPE (type);
> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>     type = TREE_TYPE (type);
>
> -  return TYPE_PRECISION (type);
> +  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,
Richard

> +}
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +
> +unsigned int
> +element_precision (const_tree type)
> +{
> +  return TYPE_PRECISION (element_type (type));
> +}
>
> /* Return true if CODE represents an associative tree code.  Otherwise
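A minimal sketch of the arrangement suggested above, assuming GCC's
CONST_CAST_TREE macro (an illustration, not part of the posted patch):

  /* element_type takes a plain "tree", so it no longer launders away
     constness for its callers.  */
  tree
  element_type (tree type)
  {
    if (!TYPE_P (type))
      type = TREE_TYPE (type);
    enum tree_code code = TREE_CODE (type);
    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
      type = TREE_TYPE (type);
    return type;
  }

  unsigned int
  element_precision (const_tree type)
  {
    /* Keep the const_tree interface here and confine the cast to this
       one caller, rather than making element_type an implicit
       const_cast.  */
    return TYPE_PRECISION (element_type (CONST_CAST_TREE (type)));
  }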