"yangyang (ET)" <yangyang...@huawei.com> writes: > Although Richard mentioned in the PR that poly_uint64 will naturally > decay to a uint64_t in i386 target files, it seems that operation /= is not > supported yet, so I change "clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE > (base_type));" into "clonei->simdlen = clonei->simdlen / GET_MODE_BITSIZE > (TYPE_MODE (base_type));".
Ah, don't remember encountering that one. But yeah, expanding the /=
seems like the best approach for now.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a8cc545c370..c630c0c7f81 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>  					tree base_type, int num)
>  {
>    tree t, ret_type, arg_type;
> -  unsigned int elt_bits, vec_bits, count;
> +  unsigned int elt_bits, count;
> +  unsigned HOST_WIDE_INT const_simdlen;
> +  poly_uint64 vec_bits;
>
>    if (!TARGET_SIMD)
>      return 0;
>
> -  if (clonei->simdlen
> -      && (clonei->simdlen < 2
> -	  || clonei->simdlen > 1024
> -	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> +  /* For now, SVE simdclones won't produce illegal simdlen, So only check
> +     const simdlens here.  */
> +  if (maybe_ne (clonei->simdlen, 0U)
> +      && (clonei->simdlen.is_constant (&const_simdlen))

Very minor, but GCC style is (mostly!) not to wrap a condition like this
in parentheses if it fits on a single line, so just:

      && clonei->simdlen.is_constant (&const_simdlen)

>    else
>      {
>        count = 1;
>        vec_bits = clonei->simdlen * elt_bits;
> -      if (vec_bits != 64 && vec_bits != 128)
> +      /* For now, SVE simdclones won't produce illegal simdlen, So only check
> +	 const simdlens here.  */
> +      if (clonei->simdlen.is_constant (&const_simdlen)
> +	  && known_ne (vec_bits, 64U) && known_ne (vec_bits, 128U))

Although it won't make a difference in context due to the is_constant
check, in principle this should be “maybe_ne” rather than “known_ne”.
E.g. when testing SVE conditions, known_ne (2 + 2 * (VQ - 1), 2) is
false but maybe_ne (2 + 2 * (VQ - 1), 2) is true.  Alternatively:

  !(known_eq (vec_bits, 64U) || known_eq (vec_bits, 128U))

if that seems more natural (either's fine).
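To spell out the quantifier difference, here's a toy, self-contained
model of a poly value A + B*x, with x standing for VQ - 1 (so x >= 0).
This is only meant to illustrate the semantics; toy_poly and the
functions below are made up for the example and bear no relation to
how poly-int.h really implements the predicates:

  #include <stdint.h>
  #include <stdio.h>

  struct toy_poly { uint64_t a, b; };  /* value = a + b*x, x >= 0 */

  /* known_ne: the values differ for *every* x >= 0.  */
  static bool
  toy_known_ne (toy_poly p, uint64_t c)
  {
    if (p.b == 0)
      return p.a != c;
    /* Some x solves a + b*x == c iff c >= a and b divides c - a.  */
    return !(c >= p.a && (c - p.a) % p.b == 0);
  }

  /* maybe_ne: the values differ for *some* x >= 0.  */
  static bool
  toy_maybe_ne (toy_poly p, uint64_t c)
  {
    return p.b != 0 || p.a != c;
  }

  int
  main ()
  {
    toy_poly sve_lanes = { 2, 2 };  /* 2 + 2 * (VQ - 1) lanes */
    printf ("known_ne: %d\n", toy_known_ne (sve_lanes, 2));  /* 0 */
    printf ("maybe_ne: %d\n", toy_maybe_ne (sve_lanes, 2));  /* 1 */
  }

At VQ == 1 the two counts are equal, so known_ne is false; any larger
VQ makes them differ, so maybe_ne is true.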
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54c2cdaf060..0ef037e5e55 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>        || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
>      {
>        warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -		  "unsupported simdlen %d", clonei->simdlen);
> +		  "unsupported simdlen %ld", clonei->simdlen.to_constant ());

I think this should be %wd instead.

> @@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>    if (cnt > (TARGET_64BIT ? 16 : 8))
>      {
>        warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -		  "unsupported simdlen %d", clonei->simdlen);
> +		  "unsupported simdlen %ld",
> +		  clonei->simdlen.to_constant ());

Same here.

> @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>      veclen = node->simdclone->vecsize_int;
>    else
>      veclen = node->simdclone->vecsize_float;
> -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
> -  if (veclen > node->simdclone->simdlen)
> +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> +  if (known_gt (veclen, node->simdclone->simdlen))
>      veclen = node->simdclone->simdlen;

Although again it probably doesn't make a difference in practice,
the known/maybe situation is similar here.

When comparing:

- an SVE vector of 2 + 2 * (VQ - 1) doubles and
- an Advanced SIMD vector of 2 doubles

the Advanced SIMD version is conceptually ordered <= the SVE one, in
the sense that the SVE vector always contains a whole number of
Advanced SIMD vectors whereas the Advanced SIMD vector might not
contain a whole number of SVE vectors.

In other words, the number of lanes in the Advanced SIMD vector is
known_le the number of lanes in the SVE vector, and the number of
lanes in the SVE vector is known_ge and maybe_gt (but not known_gt)
the number of lanes in the Advanced SIMD vector.  So for these kinds
of comparison, known_gt can look a bit unexpected, even if (as here)
it's probably fine in practice.

There's currently a hard-coded assumption in this code and in the
vectoriser that both constant-length software vectors and
constant-length hardware vectors are a power of 2 in size.  This means
that the > above is effectively testing whether veclen contains a
whole number of node->simdclone->simdlens, not just whether the veclen
is bigger.  So when adding the initial autovec support for SVE, we
generally rewrote these kinds of test to use multiple_p instead of
comparisons.  I think that makes the intent more obvious (but others
might disagree :-)).

That was a long-winded way of saying that I suggest we use:

  if (multiple_p (veclen, node->simdclone->simdlen))
    veclen = node->simdclone->simdlen;

instead.
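To make the multiple_p intent concrete, reusing the toy_poly model
from above (again just an illustration, not the real implementation):
A + B*x is a multiple of a constant f for every x >= 0 exactly when f
divides both coefficients, since x = 0 forces f | A and x = 1 then
forces f | B:

  static bool
  toy_multiple_p (toy_poly p, uint64_t f)
  {
    return p.a % f == 0 && p.b % f == 0;
  }

So toy_multiple_p on 2 + 2 * (VQ - 1) lanes with f == 2 returns true:
the SVE vector always holds a whole number of 2-double Advanced SIMD
vectors, whereas the 2-double vector contains a whole number of SVE
vectors only when VQ == 1.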
>    if (POINTER_TYPE_P (t))
>      t = pointer_sized_int_node;
> -  if (veclen == node->simdclone->simdlen)
> +  if (known_eq (veclen, node->simdclone->simdlen))
>      t = build_vector_type (t, node->simdclone->simdlen);
>    else
>      {
>        t = build_vector_type (t, veclen);
> -      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
> +      t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
> +						veclen));

Another minor formatting thing, but the “veclen” should be indented
under “node->simdclone->simdlen”.

>      }
>    TREE_TYPE (TREE_TYPE (fndecl)) = t;
>    if (!node->definition)
> @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	veclen = sc->vecsize_int;
>        else
>  	veclen = sc->vecsize_float;
> -      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> -      if (veclen > sc->simdlen)
> +      veclen = exact_div (veclen,
> +			  GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
> +      if (known_gt (veclen, sc->simdlen))
>  	veclen = sc->simdlen;

Similarly here.

>        if (sc->mask_mode != VOIDmode)
>  	adj.type
> @@ -675,7 +682,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	adj.type = build_vector_type (base_type, veclen);
>        vec_safe_push (new_params, adj);
>
> -      for (j = veclen; j < sc->simdlen; j += veclen)
> +      k = vector_unroll_factor (sc->simdlen, veclen);
> +      for (j = 1; j < k; j++)
>  	vec_safe_push (new_params, adj);
>
>        /* We have previously allocated one extra entry for the mask.  Use
> @@ -690,9 +698,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>        if (sc->mask_mode == VOIDmode)
>  	sc->args[i].simd_array
>  	  = create_tmp_simd_array ("mask", base_type, sc->simdlen);
> -      else if (veclen < sc->simdlen)
> +      else if (known_lt (veclen, sc->simdlen))

I think k > 1 might be clearer, in the sense that it avoids a choice
between “maybe” and “known”.

>  	sc->args[i].simd_array
> -	  = create_tmp_simd_array ("mask", adj.type, sc->simdlen / veclen);
> +	  = create_tmp_simd_array ("mask", adj.type, k);
>        else
>  	sc->args[i].simd_array = NULL_TREE;
>      }

> […]

> @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
>  				 iter, NULL_TREE, NULL_TREE);
>        adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
>
> -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> -	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> +      if (known_lt (simd_clone_subparts (vectype), node->simdclone->simdlen))
> +	{
> +	  j += vector_unroll_factor (node->simdclone->simdlen,
> +				     simd_clone_subparts (vectype)) - 1;
> +	}

And similarly here:

      if (multiple_p (node->simdclone->simdlen,
		      simd_clone_subparts (vectype)))

> diff --git a/gcc/poly-int-types.h b/gcc/poly-int-types.h
> index 5e04e63ebf2..78083098baa 100644
> --- a/gcc/poly-int-types.h
> +++ b/gcc/poly-int-types.h
> @@ -81,6 +81,14 @@ typedef poly_int<NUM_POLY_INT_COEFFS, widest_int> poly_widest_int;
>  #define vector_element_size(SIZE, NELTS) \
>    (exact_div (SIZE, NELTS).to_constant ())
>
> +/* Return the number of unroll times when a vector has NELTS1 elements

s/vector has/vector that has/

> +   is unrolled to vectors has NELTS2 elements.

s/vectors has/vectors that have/.

> +
> +   to_constant () is safe in this situation because the multiples of the
> +   NELTS of two vectors are always constant-size scalars.  */
> +#define vector_unroll_factor(NELTS1, NELTS2) \
> +  (exact_div (NELTS1, NELTS2).to_constant ())
> +
>  /* Wrapper for poly_int arguments to target macros, so that if a target
>     doesn't need polynomial-sized modes, its header file can continue to
>     treat the argument as a normal constant.  This should go away once
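In terms of the toy model from earlier, the reason the to_constant ()
in these macros is safe is that the quotient of two element counts
that scale with the same runtime parameter is itself a compile-time
constant; the variable parts must divide in the same ratio as the
constant parts.  A toy version (toy_constant_multiple_p is made up for
the example, not the poly-int.h routine):

  /* (A1 + B1*x) == q * (A2 + B2*x) for all x >= 0 iff A1 == q * A2
     and B1 == q * B2.  Element counts are nonzero, so don't bother
     about A2 == 0 here.  */
  static bool
  toy_constant_multiple_p (toy_poly p1, toy_poly p2, uint64_t *q)
  {
    if (p1.a % p2.a != 0)
      return false;
    uint64_t cand = p1.a / p2.a;
    if (p1.b != cand * p2.b)
      return false;
    *q = cand;
    return true;
  }

E.g. unrolling a (4 + 4*x)-lane simdlen into (2 + 2*x)-lane vectors
needs exactly 2 copies whatever x turns out to be, so the constant
quotient is well-defined even though neither count is constant.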
> @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	   n = n->simdclone->next_clone)
>  	{
>  	  unsigned int this_badness = 0;
> -	  if (n->simdclone->simdlen > vf
> +	  if (known_gt (n->simdclone->simdlen, vf)

I think !constant_multiple_p rather than known_gt here.  Until we add
support for vectorising simd clones with partial vectors, a whole
number of simdlens must fit within VF.  So maybe:

	  unsigned int num_calls;
	  if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)

and then…

>  	      || n->simdclone->nargs != nargs)
>  	    continue;
> -	  if (n->simdclone->simdlen < vf)
> -	    this_badness += (exact_log2 (vf)
> -			     - exact_log2 (n->simdclone->simdlen)) * 1024;
> +	  if (known_lt (n->simdclone->simdlen, vf))
> +	    this_badness += exact_log2
> +	      (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;

…use num_calls instead of vector_unroll_factor here.
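Put together, the suggested shape would be something like the below
(an untested sketch of the two changes above in context, not the final
wording of the patch):

	  unsigned int num_calls;
	  if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
	      || n->simdclone->nargs != nargs)
	    continue;
	  if (num_calls != 1)
	    this_badness += exact_log2 (num_calls) * 1024;

That keeps the badness term as the log of the number of calls needed
to cover one vector iteration, which is what the difference of the two
exact_log2s was computing before.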
>  	  if (n->simdclone->inbranch)
>  	    this_badness += 2048;
>  	  int target_badness = targetm.simd_clone.usable (n);
> @@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	  arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
>  							    slp_node);
>  	  if (arginfo[i].vectype == NULL
> -	      || (simd_clone_subparts (arginfo[i].vectype)
> -		  > bestn->simdclone->simdlen))
> +	      || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> +			    bestn->simdclone->simdlen)))

Here too I think we want constant_multiple_p:

	      || !constant_multiple_p (bestn->simdclone->simdlen,
				       simd_clone_subparts (arginfo[i].vectype))

>  	    return false;
>  	}
>
>        fndecl = bestn->decl;
>        nunits = bestn->simdclone->simdlen;
> -      ncopies = vf / nunits;
> +      ncopies = vector_unroll_factor (vf, nunits);
>
>        /* If the function isn't const, only allow it in simd loops where user
>  	 has asserted that at least nunits consecutive iterations can be
>  	 performed using SIMD instructions.  */
> -      if ((loop == NULL || (unsigned) loop->safelen < nunits)
> +      if ((loop == NULL || known_lt ((unsigned) loop->safelen, nunits))

This should be maybe_lt, because we can only safely optimise if we
know that the simdlen <= safelen.

>  	  && gimple_vuse (stmt))
>  	return false;
>
> @@ -4078,15 +4078,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	    {
>  	    case SIMD_CLONE_ARG_TYPE_VECTOR:
>  	      atype = bestn->simdclone->args[i].vector_type;
> -	      o = nunits / simd_clone_subparts (atype);
> +	      o = vector_unroll_factor (nunits,
> +					simd_clone_subparts (atype));
>  	      for (m = j * o; m < (j + 1) * o; m++)
>  		{
> -		  if (simd_clone_subparts (atype)
> -		      < simd_clone_subparts (arginfo[i].vectype))
> +		  if (known_lt (simd_clone_subparts (atype),
> +				simd_clone_subparts (arginfo[i].vectype)))

I think we should leave this and…

>  		    {
>  		      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
> -		      k = (simd_clone_subparts (arginfo[i].vectype)
> -			   / simd_clone_subparts (atype));
> +		      k = simd_clone_subparts (arginfo[i].vectype)
> +			  / simd_clone_subparts (atype);

…this until the simd_clone_subparts calls are removed.  FWIW, the
original formatting of the division is the preferred one (with the
justification that it helps emacs to indent the second line
correctly :-)).

>  		      gcc_assert ((k & (k - 1)) == 0);
>  		      if (m == 0)
>  			{
> @@ -4116,8 +4117,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  		    }
>  		  else
>  		    {
> -		      k = (simd_clone_subparts (atype)
> -			   / simd_clone_subparts (arginfo[i].vectype));
> +		      k = simd_clone_subparts (atype)
> +			  / simd_clone_subparts (arginfo[i].vectype);

Similarly here, the original formatting was correct.

>  		      gcc_assert ((k & (k - 1)) == 0);
>  		      vec<constructor_elt, va_gc> *ctor_elts;
>  		      if (k != 1)
> @@ -4250,7 +4251,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>        gcall *new_call = gimple_build_call_vec (fndecl, vargs);
>        if (vec_dest)
>  	{
> -	  gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
> +	  gcc_assert (ratype || known_eq (simd_clone_subparts (rtype),
> +					  nunits));

Another formatting nit, sorry, but now that it needs to span multiple
lines, the preferred formatting is:

	  gcc_assert (ratype
		      || known_eq (simd_clone_subparts (rtype), nunits));

>  	  if (ratype)
>  	    new_temp = create_tmp_var (ratype);
>  	  else if (useless_type_conversion_p (vectype, rtype))
> @@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>
>        if (vec_dest)
>  	{
> -	  if (simd_clone_subparts (vectype) < nunits)
> +	  if (known_lt (simd_clone_subparts (vectype), nunits))

multiple_p (nunits, …) here too.  It would also be possible to do:

	  if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)

and avoid the separate vector_unroll_factor.

>  	    {
>  	      unsigned int k, l;
>  	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
>  	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
> -	      k = nunits / simd_clone_subparts (vectype);
> +	      k = vector_unroll_factor (nunits,
> +					simd_clone_subparts (vectype));
>  	      gcc_assert ((k & (k - 1)) == 0);
>  	      for (l = 0; l < k; l++)
>  		{
> @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  	      vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
>  	      continue;
>  	    }
> -	  else if (simd_clone_subparts (vectype) > nunits)
> +	  else if (known_gt (simd_clone_subparts (vectype), nunits))

multiple_p here too.

>  	    {
> -	      unsigned int k = (simd_clone_subparts (vectype)
> -				/ simd_clone_subparts (rtype));
> +	      unsigned int k = simd_clone_subparts (vectype)
> +			       / simd_clone_subparts (rtype);

Same comment about formatting.

The patch looks really good though, thanks.  The above comments are
only really repetitions of a small number of minor things.

Richard