"yangyang (ET)" <yangyang...@huawei.com> writes:
>         Although Richard mentioned in the PR that poly_uint64 will naturally
> decay to a uint64_t in i386 target files, it seems that operator /= is not
> supported yet, so I changed "clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE
> (base_type));" into "clonei->simdlen = clonei->simdlen / GET_MODE_BITSIZE
> (TYPE_MODE (base_type));".
Ah, don't remember encountering that one.  But yeah, expanding the
/= seems like the best approach for now.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a8cc545c370..c630c0c7f81 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>                                       tree base_type, int num)
>  {
>    tree t, ret_type, arg_type;
> -  unsigned int elt_bits, vec_bits, count;
> +  unsigned int elt_bits, count;
> +  unsigned HOST_WIDE_INT const_simdlen;
> +  poly_uint64 vec_bits;
>  
>    if (!TARGET_SIMD)
>      return 0;
>  
> -  if (clonei->simdlen
> -      && (clonei->simdlen < 2
> -       || clonei->simdlen > 1024
> -       || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> +  /* For now, SVE simdclones won't produce illegal simdlen, so only check
> +     const simdlens here.  */
> +  if (maybe_ne (clonei->simdlen, 0U)
> +      && (clonei->simdlen.is_constant (&const_simdlen))

Very minor, but GCC style is (mostly!) not to wrap a condition like this
in parentheses if it fits on a single line, so just:

      && clonei->simdlen.is_constant (&const_simdlen)

>    else
>      {
>        count = 1;
>        vec_bits = clonei->simdlen * elt_bits;
> -      if (vec_bits != 64 && vec_bits != 128)
> +      /* For now, SVE simdclones won't produce illegal simdlen, so only check
> +      const simdlens here.  */
> +      if (clonei->simdlen.is_constant (&const_simdlen)
> +       && known_ne (vec_bits, 64U) && known_ne (vec_bits, 128U))

Although it won't make a difference in context due to the is_constant
check, in principle this should be “maybe_ne” rather than “known_ne”.
E.g. when testing SVE conditions, known_ne (2 + 2 * (VQ - 1), 2)
is false but maybe_ne (2 + 2 * (VQ - 1), 2) is true.

Alternatively:

   !(known_eq (vec_bits, 64U) || known_eq (vec_bits, 128U))

if that seems more natural (either's fine).

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54c2cdaf060..0ef037e5e55 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>         || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
>      {
>        warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -               "unsupported simdlen %d", clonei->simdlen);
> +               "unsupported simdlen %ld", clonei->simdlen.to_constant ());

I think this should be %wd instead.  (%wd is the diagnostic format code
for HOST_WIDE_INT; %ld only happens to work on hosts where long is
64 bits.)

> @@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
>        if (cnt > (TARGET_64BIT ? 16 : 8))
>       {
>         warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -                   "unsupported simdlen %d", clonei->simdlen);
> +                   "unsupported simdlen %ld",
> +                   clonei->simdlen.to_constant ());

Same here.

> @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>      veclen = node->simdclone->vecsize_int;
>    else
>      veclen = node->simdclone->vecsize_float;
> -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
> -  if (veclen > node->simdclone->simdlen)
> +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> +  if (known_gt (veclen, node->simdclone->simdlen))
>      veclen = node->simdclone->simdlen;

Although again it probably doesn't make a difference in practice,
the known/maybe situation is similar here.  When comparing:

- an SVE vector of 2 + 2 * (VQ - 1) doubles and
- an Advanced SIMD vector of 2 doubles

the Advanced SIMD version is conceptually ordered <= the SVE one,
in the sense that the SVE vector always contains a whole number of
Advanced SIMD vectors whereas the Advanced SIMD vector might not
contain a whole number of SVE vectors.

In other words, the number of lanes in the Advanced SIMD vector
is known_le the number of lanes in the SVE vector, and the number
of lanes in the SVE vector is known_ge and maybe_gt (but not known_gt)
the number of lanes in the Advanced SIMD vector.  So for these kinds of
comparison, known_gt can look a bit unexpected, even if (as here) it's
probably fine in practice.

There's currently a hard-coded assumption in this code and in the
vectoriser that both constant-length software vectors and constant-length
hardware vectors are a power of 2 in size.  This means that the > above
is effectively testing whether veclen contains a whole number of
node->simdclone->simdlens, not just whether the veclen is bigger.

So when adding the initial autovec support for SVE, we generally rewrote
these kinds of test to use multiple_p instead of comparisons.  I think
that makes the intent more obvious (but others might disagree :-)).

That was a long-winded way of saying that I suggest we use:

  if (multiple_p (veclen, node->simdclone->simdlen))
    veclen = node->simdclone->simdlen;

instead.

>    if (POINTER_TYPE_P (t))
>      t = pointer_sized_int_node;
> -  if (veclen == node->simdclone->simdlen)
> +  if (known_eq (veclen, node->simdclone->simdlen))
>      t = build_vector_type (t, node->simdclone->simdlen);
>    else
>      {
>        t = build_vector_type (t, veclen);
> -      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
> +      t = build_array_type_nelts (t, exact_div (node->simdclone->simdlen,
> +                               veclen));

Another minor formatting thing, but the “veclen” should be indented under
“node->simdclone->simdlen”.

>      }
>    TREE_TYPE (TREE_TYPE (fndecl)) = t;
>    if (!node->definition)
> @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>       veclen = sc->vecsize_int;
>        else
>       veclen = sc->vecsize_float;
> -      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> -      if (veclen > sc->simdlen)
> +      veclen = exact_div (veclen,
> +                       GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
> +      if (known_gt (veclen, sc->simdlen))
>       veclen = sc->simdlen;

Similarly here.

>        if (sc->mask_mode != VOIDmode)
>       adj.type
> @@ -675,7 +682,8 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>       adj.type = build_vector_type (base_type, veclen);
>        vec_safe_push (new_params, adj);
>  
> -      for (j = veclen; j < sc->simdlen; j += veclen)
> +      k = vector_unroll_factor (sc->simdlen, veclen);
> +      for (j = 1; j < k; j++)
>       vec_safe_push (new_params, adj);
>  
>        /* We have previously allocated one extra entry for the mask.  Use
> @@ -690,9 +698,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>         if (sc->mask_mode == VOIDmode)
>           sc->args[i].simd_array
>             = create_tmp_simd_array ("mask", base_type, sc->simdlen);
> -       else if (veclen < sc->simdlen)
> +       else if (known_lt (veclen, sc->simdlen))

I think k > 1 might be clearer, in the sense that it avoids a choice
between “maybe” and “known”.

>           sc->args[i].simd_array
> -           = create_tmp_simd_array ("mask", adj.type, sc->simdlen / veclen);
> +           = create_tmp_simd_array ("mask", adj.type, k);
>         else
>           sc->args[i].simd_array = NULL_TREE;
>       }
> […]
> @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
>                 iter, NULL_TREE, NULL_TREE);
>        adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
>  
> -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> -     j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> +      if (known_lt (simd_clone_subparts (vectype), node->simdclone->simdlen))
> +     {
> +       j += vector_unroll_factor (node->simdclone->simdlen,
> +                                  simd_clone_subparts (vectype)) - 1;
> +     }

And similarly here:

      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))

> diff --git a/gcc/poly-int-types.h b/gcc/poly-int-types.h
> index 5e04e63ebf2..78083098baa 100644
> --- a/gcc/poly-int-types.h
> +++ b/gcc/poly-int-types.h
> @@ -81,6 +81,14 @@ typedef poly_int<NUM_POLY_INT_COEFFS, widest_int> poly_widest_int;
>  #define vector_element_size(SIZE, NELTS) \
>    (exact_div (SIZE, NELTS).to_constant ())
>  
> +/* Return the number of unroll times when a vector has NELTS1 elements

s/vector has/vector that has/

> +   is unrolled to vectors has NELTS2 elements.

s/vectors has/vectors that have/.

> +
> +   to_constant () is safe in this situation because the multiples of the
> +   NELTS of two vectors are always constant-size scalars.  */
> +#define vector_unroll_factor(NELTS1, NELTS2) \
> +  (exact_div (NELTS1, NELTS2).to_constant ())
> +
>  /* Wrapper for poly_int arguments to target macros, so that if a target
>     doesn't need polynomial-sized modes, its header file can continue to
>     treat the argument as a normal constant.  This should go away once
> @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>        n = n->simdclone->next_clone)
>        {
>       unsigned int this_badness = 0;
> -     if (n->simdclone->simdlen > vf
> +     if (known_gt (n->simdclone->simdlen, vf)

I think !constant_multiple_p rather than known_gt here.  Until we add
support for vectorising simd clones with partial vectors, a whole number
of simdlens must fit within VF.  So maybe:

     unsigned int num_calls;
     if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)

and then…

>           || n->simdclone->nargs != nargs)
>         continue;
> -     if (n->simdclone->simdlen < vf)
> -       this_badness += (exact_log2 (vf)
> -                        - exact_log2 (n->simdclone->simdlen)) * 1024;
> +     if (known_lt (n->simdclone->simdlen, vf))
> +       this_badness += exact_log2
> +         (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;

…use num_calls instead of vector_unroll_factor here.


>       if (n->simdclone->inbranch)
>         this_badness += 2048;
>       int target_badness = targetm.simd_clone.usable (n);
> @@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>       arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
>                                                         slp_node);
>       if (arginfo[i].vectype == NULL
> -         || (simd_clone_subparts (arginfo[i].vectype)
> -             > bestn->simdclone->simdlen))
> +         || (known_gt (simd_clone_subparts (arginfo[i].vectype),
> +                       bestn->simdclone->simdlen)))

Here too I think we want constant_multiple_p:

            || !constant_multiple_p (bestn->simdclone->simdlen,
                                     simd_clone_subparts (arginfo[i].vectype))

>         return false;
>        }
>  
>    fndecl = bestn->decl;
>    nunits = bestn->simdclone->simdlen;
> -  ncopies = vf / nunits;
> +  ncopies = vector_unroll_factor (vf, nunits);
>  
>    /* If the function isn't const, only allow it in simd loops where user
>       has asserted that at least nunits consecutive iterations can be
>       performed using SIMD instructions.  */
> -  if ((loop == NULL || (unsigned) loop->safelen < nunits)
> +  if ((loop == NULL || known_lt ((unsigned) loop->safelen, nunits))

This should be maybe_lt, because we can only safely optimise when we
know that safelen >= the simdlen.

>        && gimple_vuse (stmt))
>      return false;
>  
> @@ -4078,15 +4078,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>           {
>           case SIMD_CLONE_ARG_TYPE_VECTOR:
>             atype = bestn->simdclone->args[i].vector_type;
> -           o = nunits / simd_clone_subparts (atype);
> +           o = vector_unroll_factor (nunits,
> +                                     simd_clone_subparts (atype));
>             for (m = j * o; m < (j + 1) * o; m++)
>               {
> -               if (simd_clone_subparts (atype)
> -                   < simd_clone_subparts (arginfo[i].vectype))
> +               if (known_lt (simd_clone_subparts (atype),
> +                             simd_clone_subparts (arginfo[i].vectype)))

I think we should leave this and…

>                   {
>                     poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
> -                   k = (simd_clone_subparts (arginfo[i].vectype)
> -                        / simd_clone_subparts (atype));
> +                   k = simd_clone_subparts (arginfo[i].vectype)
> +                       / simd_clone_subparts (atype);

…this until the simd_clone_subparts calls are removed.  FWIW, the original
formatting of the division is the preferred one (with the justification
that it helps emacs to indent the second line correctly :-)).

>                     gcc_assert ((k & (k - 1)) == 0);
>                     if (m == 0)
>                       {
> @@ -4116,8 +4117,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>                   }
>                 else
>                   {
> -                   k = (simd_clone_subparts (atype)
> -                        / simd_clone_subparts (arginfo[i].vectype));
> +                   k = simd_clone_subparts (atype)
> +                       / simd_clone_subparts (arginfo[i].vectype);

Similarly here, the original formatting was correct.

>                     gcc_assert ((k & (k - 1)) == 0);
>                     vec<constructor_elt, va_gc> *ctor_elts;
>                     if (k != 1)
> @@ -4250,7 +4251,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>        gcall *new_call = gimple_build_call_vec (fndecl, vargs);
>        if (vec_dest)
>       {
> -       gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
> +       gcc_assert (ratype || known_eq (simd_clone_subparts (rtype),
> +                                       nunits));

Another formatting nit, sorry, but now that it needs to span multiple lines,
the preferred formatting is:

          gcc_assert (ratype
                      || known_eq (simd_clone_subparts (rtype), nunits));

>         if (ratype)
>           new_temp = create_tmp_var (ratype);
>         else if (useless_type_conversion_p (vectype, rtype))
> @@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>  
>        if (vec_dest)
>       {
> -       if (simd_clone_subparts (vectype) < nunits)
> +       if (known_lt (simd_clone_subparts (vectype), nunits))

multiple_p (nunits, …) here too.  It would also be possible to do:

          if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)

and avoid the separate vector_unroll_factor.

>           {
>             unsigned int k, l;
>             poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
>             poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
> -           k = nunits / simd_clone_subparts (vectype);
> +           k = vector_unroll_factor (nunits,
> +                                     simd_clone_subparts (vectype));
>             gcc_assert ((k & (k - 1)) == 0);
>             for (l = 0; l < k; l++)
>               {
> @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>               vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
>             continue;
>           }
> -       else if (simd_clone_subparts (vectype) > nunits)
> +       else if (known_gt (simd_clone_subparts (vectype), nunits))

multiple_p here too.

>           {
> -           unsigned int k = (simd_clone_subparts (vectype)
> -                             / simd_clone_subparts (rtype));
> +           unsigned int k = simd_clone_subparts (vectype)
> +                            / simd_clone_subparts (rtype);

Same comment about formatting.

The patch looks really good though, thanks.  The above comments are only
really repetitions of a small number of minor things.

Richard
