"Kewen.Lin" <li...@linux.ibm.com> writes:
> Hi Richard,
>
> on 2022/11/24 17:24, Richard Sandiford wrote:
>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>> Hi,
>>>
>>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
>>> STORE} into normal vector load/store if the given length is known
>>> to be equal to the length of the whole vector.  It would help to
>>> improve overall cycles as normally the latency of vector access
>>> with length in bytes is bigger than normal vector access, and it
>>> also saves the preparation for length if constant length can not
>>> be encoded into instruction (such as on power).
>>>
>>> Bootstrapped and regtested on x86_64-redhat-linux,
>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -----
>>>     PR tree-optimization/107412
>>>
>>> gcc/ChangeLog:
>>>
>>>     * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>>>     (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>>>     mask_p indicating it's for mask or length, and add some handlings for
>>>     IFN LEN_{LOAD,STORE}.
>>>     (gimple_fold_mask_load): Rename to ...
>>>     (gimple_fold_partial_load): ... this, add one parameter mask_p.
>>>     (gimple_fold_mask_store): Rename to ...
>>>     (gimple_fold_partial_store): ... this, add one parameter mask_p.
>>>     (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
>>>     and adjust calls on gimple_fold_mask_load_store_mem_ref to
>>>     gimple_fold_partial_load_store_mem_ref.
>> 
>> Sorry to reply to late (still catching up on email), but:
>> 
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     * gcc.target/powerpc/pr107412.c: New test.
>>>     * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>>>     folded LEN_LOAD.
>>> ---
>>>  gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
>>>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>>>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
>>>  3 files changed, 64 insertions(+), 14 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>>>
>>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>>> index a1704784bc9..e3a087defa6 100644
>>> --- a/gcc/gimple-fold.cc
>>> +++ b/gcc/gimple-fold.cc
>>> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree 
>>> type,
>>>    return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>>>  }
>>>
>>> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
>>> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a 
>>> MEM_REF
>>>     for the memory it references, otherwise return null.  VECTYPE is the
>>> -   type of the memory vector.  */
>>> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
>>> +   otherwise it's for LEN.  */
>>>
>>>  static tree
>>> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>>> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool 
>>> mask_p)
>>>  {
>>>    tree ptr = gimple_call_arg (call, 0);
>>>    tree alias_align = gimple_call_arg (call, 1);
>>> -  tree mask = gimple_call_arg (call, 2);
>>> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
>>> +  if (!tree_fits_uhwi_p (alias_align))
>>>      return NULL_TREE;
>>>
>>> +  if (mask_p)
>>> +    {
>>> +      tree mask = gimple_call_arg (call, 2);
>>> +      if (!integer_all_onesp (mask))
>>> +   return NULL_TREE;
>>> +    } else {
>> 
>> Minor nit: }, else, and { should be on separate lines.  But the thing
>> I actually wanted to say was...
>
> Thanks for catching, I must have forgotten to reformat these lines.
>
>> 
>>> +      tree basic_len = gimple_call_arg (call, 2);
>>> +      if (!tree_fits_uhwi_p (basic_len))
>>> +   return NULL_TREE;
>>> +      unsigned int nargs = gimple_call_num_args (call);
>>> +      tree bias = gimple_call_arg (call, nargs - 1);
>>> +      gcc_assert (tree_fits_uhwi_p (bias));
>>> +      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
>>> +      unsigned int len = tree_to_uhwi (biased_len);
>>> +      unsigned int vect_len
>>> +   = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
>>> +      if (vect_len != len)
>>> +   return NULL_TREE;
>> 
>> Using "unsigned int" truncates the value.  I realise that's probably
>> safe in this context, since large values have undefined behaviour.
>> But it still seems better to use an untruncated type, so that it
>> looks less like an oversight.  (I think this is one case where "auto"
>> can help, since it gets the type right automatically.)
>> 
>> It would also be better to avoid the to_constant, since we haven't
>> proven is_constant.  How about:
>> 
>>       tree basic_len = gimple_call_arg (call, 2);
>>       if (!poly_int_tree_p (basic_len))
>>      return NULL_TREE;
>>       unsigned int nargs = gimple_call_num_args (call);
>>       tree bias = gimple_call_arg (call, nargs - 1);
>>       gcc_assert (TREE_CODE (bias) == INTEGER_CST);
>>       if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
>>                  GET_MODE_SIZE (TYPE_MODE (vectype))))
>>      return NULL_TREE;
>> 
>> which also avoids using tree arithmetic for the subtraction?
>
> I agree your proposed code has better robustness, thanks!
>
> Sorry that the original patch was committed, I made a patch as attached.
> It's bootstrapped and regresss-tested on powerpc64-linux-gnu P8, and
> powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?

OK, thanks.

Richard

> BR,
> Kewen
>
> From 3984a7f86a35d13e1fd40bc0c12ed5ad5b234047 Mon Sep 17 00:00:00 2001
> From: Kewen Lin <li...@linux.ibm.com>
> Date: Sun, 27 Nov 2022 20:29:57 -0600
> Subject: [PATCH] gimple-fold: Refine gimple_fold_partial_load_store_mem_ref
>
> Following Richard's review comments, this patch is to use
> untruncated type for the length used for IFN_LEN_{LOAD,STORE}
> instead of "unsigned int" for better robustness.  It also
> avoid to use to_constant and tree arithmetic for subtraction.
>
> Co-authored-by: Richard Sandiford  <richard.sandif...@arm.com>
>
> gcc/ChangeLog:
>
>       * gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Use
>       untruncated type for the length, and avoid to_constant and tree
>       arithmetic for subtraction.
> ---
>  gcc/gimple-fold.cc | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index c2d9c806aee..88d14c7adcc 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -5387,18 +5387,17 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, 
> tree vectype, bool mask_p)
>        tree mask = gimple_call_arg (call, 2);
>        if (!integer_all_onesp (mask))
>       return NULL_TREE;
> -    } else {
> +    }
> +  else
> +    {
>        tree basic_len = gimple_call_arg (call, 2);
> -      if (!tree_fits_uhwi_p (basic_len))
> +      if (!poly_int_tree_p (basic_len))
>       return NULL_TREE;
>        unsigned int nargs = gimple_call_num_args (call);
>        tree bias = gimple_call_arg (call, nargs - 1);
> -      gcc_assert (tree_fits_shwi_p (bias));
> -      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
> -      unsigned int len = tree_to_uhwi (biased_len);
> -      unsigned int vect_len
> -     = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
> -      if (vect_len != len)
> +      gcc_assert (TREE_CODE (bias) == INTEGER_CST);
> +      if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
> +                 GET_MODE_SIZE (TYPE_MODE (vectype))))
>       return NULL_TREE;
>      }

Reply via email to