Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-31 Thread Lehua Ding

Hi Andrew,

On 2023/10/31 14:48, Andrew Pinski wrote:

+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+ code_helper code_in, tree type_in,
+ tree op0, tree op1, tree op2, tree op3,
+ tree op4, tree op5)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+num_ops (6)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}

Hmm, does it make sense to start to use variadic templates for these
constructors instead of writing them out?
And we can even add a static_assert to make sure the number of
arguments is <= MAX_NUM_OPS to make sure they are correct. And use
std::is_same to make sure we are only passing tree types.


You mean something like this?:

template <typename... op_types>
inline
gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
  code_helper code_in, tree type_in,
  op_types... ops)
  : cond (cond_in), code (code_in), type (type_in), reverse (false),
num_ops (sizeof...(ops))
{
  static_assert (sizeof...(ops) <= MAX_NUM_OPS);
  auto op_list[] = {ops...};
  for (int i = 0; i < sizeof...(ops); i++)
this->ops[i] = op_list[i];
}

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-31 Thread Andrew Pinski
On Tue, Oct 31, 2023 at 12:08 AM Lehua Ding  wrote:
>
> Hi Andrew,
>
> On 2023/10/31 14:48, Andrew Pinski wrote:
> >> +inline
> >> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> >> + code_helper code_in, tree type_in,
> >> + tree op0, tree op1, tree op2, tree op3,
> >> + tree op4, tree op5)
> >> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> >> +num_ops (6)
> >> +{
> >> +  ops[0] = op0;
> >> +  ops[1] = op1;
> >> +  ops[2] = op2;
> >> +  ops[3] = op3;
> >> +  ops[4] = op4;
> >> +  ops[5] = op5;
> >> +}
> > Hmm, does it make sense to start to use variadic templates for these
> > constructors instead of writing them out?
> > And we can even add a static_assert to make sure the number of
> > arguments is <= MAX_NUM_OPS to make sure they are correct. And use
> > std::is_same to make sure we are only passing tree types.
>
> You mean something like this?:
>
> template <typename... op_types>
> inline
> gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
>code_helper code_in, tree type_in,
>   op_types... ops)
>: cond (cond_in), code (code_in), type (type_in), reverse (false),
>  num_ops (sizeof...(ops))
> {
>static_assert (sizeof...(ops) <= MAX_NUM_OPS);
>auto op_list[] = {ops...};
>for (int i = 0; i < sizeof...(ops); i++)
>  this->ops[i] = op_list[i];
> }

Yes and maybe use tree for the type of op_list instead of auto.
I suspect this code was originally written before GCC was written in C++11.
Maybe if this code is being compiled with C++20 we could do something like:
#include <concepts>
template <std::same_as<tree>... op_types>

To get a decent error message earlier ...

Thanks,
Andrew

>
> --
> Best,
> Lehua (RiVAI)
> lehua.d...@rivai.ai


Re: [PATCH6/8] omp: Reorder call for TARGET_SIMD_CLONE_ADJUST (was Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM)

2023-10-31 Thread Richard Biener
On Wed, 18 Oct 2023, Andre Vieira (lists) wrote:

> This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the
> arguments and return types have been transformed into vector types.  It also
> constructs the adjustments and retval modifications after this call, allowing
> targets to alter the types of the arguments and return of the clone prior to
> the modifications to the function definition.
> 
> Is this OK?

OK (I was hoping for Jakub to have a look).

Thanks,
Richard.

> gcc/ChangeLog:
> 
> * omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out
> code to create return array and don't return new type.
> (simd_clone_adjust_argument_types): Hoist out code that creates
> ipa_param_body_adjustments and don't return them.
> (simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return
> and argument types have been vectorized, create adjustments and
> return array after the hook.
> (expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return
> and argument types have been vectorized.
> 
> On 04/10/2023 13:40, Andre Vieira (lists) wrote:
> > 
> > 
> > On 04/10/2023 11:41, Richard Biener wrote:
> >> On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:
> >>
> >>>
> >>>
> >>> On 30/08/2023 14:04, Richard Biener wrote:
>  On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> 
> > This patch adds a new target hook to enable us to adapt the types of
> > return
> > and parameters of simd clones.  We use this in two ways, the first one
> > is
> > to
> > make sure we can create valid SVE types, including the SVE type
> > attribute,
> > when creating a SVE simd clone, even when the target options do not
> > support
> > SVE.  We are following the same behaviour seen with x86 that creates
> > simd
> > clones according to the ABI rules when no simdlen is provided, even if
> > that
> > simdlen is not supported by the current target options.  Note that this
> > doesn't mean the simd clone will be used in auto-vectorization.
> 
>  You are not documenting the bool parameter of the new hook.
> 
>  What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?
> >>>
> >>> simd_clone_adjust_argument_types is called after that hook, so by the time
> >>> we
> >>> call TARGET_SIMD_CLONE_ADJUST the types are still in scalar, not vector. 
> >>> The
> >>> same is true for the return type one.
> >>>
> >>> Also the changes to the types need to be taken into consideration in
> >>> 'adjustments' I think.
> >>
> >> Nothing in the three existing implementations of TARGET_SIMD_CLONE_ADJUST
> >> relies on this ordering I think, how about moving the hook invocation
> >> after simd_clone_adjust_argument_types?
> >>
> > 
> > But that wouldn't change the 'ipa_param_body_adjustments' for when we have a
> > function definition and we need to redo the body.
> >> Richard.
> >>
> >>> PS: I hope the subject line survived, my email client is having a bit of a
> >>> wobble this morning... it's what you get for updating software :(
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] Fix PR ada/111909 On Darwin, determine filesystem case sensitivity at runtime

2023-10-31 Thread Iain Sandoe
Hi Simon,

(please cc me on Darwin-related patches)

> On 29 Oct 2023, at 11:51, Simon Wright  wrote:
> 
> This change affects only Ada.
> 
> In gcc/ada/adaint.c(__gnat_get_file_names_case_sensitive), the
> assumption for __APPLE__ is that file names are case-insensitive
> unless __arm__ or __arm64__ are defined, in which case file names
> are declared case-sensitive.
> 
> The associated comment is
>  "By default, we suppose filesystems aren't case sensitive on
>  Windows and Darwin (but they are on arm-darwin)."
> 
> This means that on aarch64-apple-darwin, file names are declared
> case-sensitive, which is not normally the case (but users can set
> up case-sensitive volumes).
> 
> It's understood that GCC does not currently support iOS/tvOS/watchOS,
> so we assume macOS.
> 
> Bootstrapped on x86_64-apple-darwin with languages c,c++,ada and regression 
> tested (check-gnat).
> Also, tested with the example from PR ada/81114, extracted into 4 volumes 
> (APFS, APFS-case-sensitive,
> HFS, HFS-case-sensitive; the example code built successfully on the 
> case-sensitive volumes.
> Setting GNAT_FILE_NAME_CASE_SENSITIVE successfully overrode the choices made 
> by the
> new code.

This does not yet work on at least Darwin17 and Darwin9, even though the
`getattrlist()` call and the `VOL_CAP_FMT_CASE_SENSITIVE` capability should
exist on both.  So we need to figure out why the current code is not working
(so, not yet OK from a Darwin
perspective).

thanks
Iain

> 
> gcc/ada/Changelog:
> 
> 2023-10-29 Simon Wright 
> 
> PR ada/111909
> 
> * gcc/ada/adaint.c
>  (__gnat_get_file_names_case_sensitive): Remove the checks for
>  __arm__, __arm64__.
>  Split out the check for __APPLE__; remove the checks for __arm__,
>  __arm64__, and use getattrlist(2) to determine whether the current
>  working directory is on a case-sensitive filesystem.
> 
> Signed-off-by: Simon Wright 
> ---
> gcc/ada/adaint.c | 46 ++
> 1 file changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
> index 2a193efc002..43d166824b0 100644
> --- a/gcc/ada/adaint.c
> +++ b/gcc/ada/adaint.c
> @@ -85,6 +85,7 @@
> 
> #if defined (__APPLE__)
> #include 
> +#include <sys/attr.h>
> #endif
> 
> #if defined (__hpux__)
> @@ -613,11 +614,48 @@ __gnat_get_file_names_case_sensitive (void)
>   else
>   {
> /* By default, we suppose filesystems aren't case sensitive on
> -  Windows and Darwin (but they are on arm-darwin).  */
> -#if defined (WINNT) || defined (__DJGPP__) \
> -  || (defined (__APPLE__) && !(defined (__arm__) || defined (__arm64__)))
> +  Windows or DOS.  */
> +#if defined (WINNT) || defined (__DJGPP__)
> file_names_case_sensitive_cache = 0;
> -#else
> +#elif defined (__APPLE__)
> +   /* Determine whether the current volume is case-sensitive.  */
> +   {
> + /* Formulate a query for the volume capabilities.  */
> + struct attrlist attrList
> +   = {ATTR_BIT_MAP_COUNT,
> +  0,   /* reserved.  */
> +  0,   /* commonattr.  */
> +  ATTR_VOL_INFO | ATTR_VOL_CAPABILITIES, /* volattr.  */
> +  0,   /* dirattr.  */
> +  0,   /* fileattr.  */
> +  0/* forkattr.  */
> + };
> +
> + /* A buffer to contain just the volume capabilities.  */
> + struct returnBuf {
> +   u_int32_t length;
> +   vol_capabilities_attr_t caps;
> + } __attribute__ ((aligned (4), packed)) retBuf;
> +
> + /* Default to case-insensitive.  */
> + file_names_case_sensitive_cache = 0;
> +
> + /* Query the current working directory.  */
> + if (getattrlist (".",
> +  &attrList,
> +  &retBuf,
> +  sizeof (retBuf),
> +  0) == 0)
> +   /* The call succeeded.  */
> +   if ((retBuf.caps.valid[VOL_CAPABILITIES_FORMAT]
> +& VOL_CAP_FMT_CASE_SENSITIVE))
> + /* The volume could be case-sensitive.  */
> + if (retBuf.caps.capabilities[VOL_CAPABILITIES_FORMAT]
> + & VOL_CAP_FMT_CASE_SENSITIVE)
> +   /* The volume is case-sensitive.  */
> +   file_names_case_sensitive_cache = 1;
> +   }
> +#else /* Neither Windows nor Apple.  */
> file_names_case_sensitive_cache = 1;
> #endif
>   }
> -- 
> 2.39.3 (Apple Git-145)
> 



Re: [PATCH 1/2] match.pd: Support combine cond_len_op + vec_cond similar to cond_op

2023-10-31 Thread Lehua Ding

Hi Andrew,


Yes and maybe use tree for the type of op_list instead of auto.
I suspect this code was originally written before GCC was written in C++11.
Maybe if this code is being compiled with C++20 we could do something like:
#include <concepts>
template <std::same_as<tree>... op_types>

To get a decent error message earlier ...


Or I think it's easier to understand without using a template by 
changing it to the following:


inline
gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
  code_helper code_in, tree type_in,
  tree ops[], int num_op)
  : cond (cond_in), code (code_in), type (type_in), reverse (false),
    num_ops (num_op)
{
  for (int i = 0; i < num_op; i++)
    this->ops[i] = ops[i];
}

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-10-31 Thread Robin Dapp
Hi Juzhe,

> +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}@var{n}}
> +Load several separate memory locations into a vector of mode m.
> +Operand 1 is a scalar base address and operand 2 is mode @var{n}
> +specifying each uniform stride between consecutive element.
How about:

"into a destination vector of mode @var{m} (operand 0).  Operand 1
is a scalar base address.  Operand 2 is a scalar stride of mode @var{n},
such that element @var{i} of the destination is loaded from
(operand 1) + @var{i} * (operand 2).  The instruction can be seen
as a special case of @code{mask_len_gather_load@var{m}@var{n}} with
an offset vector that is a @code{vec_series} with (operand 1) as base
and (operand 2) as step."

> +operand 3 is mask operand, operand 4 is length operand and operand 5 is
> +bias operand.  

Maybe: Similar to mask_len_load, operand 3 contains the mask, operand 4
the length and operand 5 the bias.  The instruction loads...

> +@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}@var{n}}
> +Store a vector of mode @var{m} into several distinct memory locations.
> +Operand 0 is a scalar base address, operand 2 is the vector to be stored,
> +and operand 1 is mode @var{n} specifying each uniform stride between 
> consecutive element.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is
> +bias operand.  Similar to mask_len_store, the instruction stores at most
> +(operand 4 + operand 5) elements to memory.  Bit @var{i} of the mask is set
> +if element @var{i} of the result should be stored.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.

Same here.

Regards
 Robin



[PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-10-31 Thread Li Xu
From: xuli 

Update in v6:
* Rename maybe_require_frm_p to may_require_frm_p.
* Rename maybe_require_vxrm_p to may_require_vxrm_p.
* Move may_require_frm_p and may_require_vxrm_p to function_base.

Update in v5:
* Split has_vxrm_or_frm_p into maybe_require_frm_p and
  maybe_require_vxrm_p.
* Adjust comments.

Update in v4:
* Remove class function_resolver.
* Remove function get_non_overloaded_instance.
* Add overloaded hash traits for non-overloaded intrinsic.
* All overloaded intrinsics are implemented, and the tests pass.

Update in v3:

* Rewrite comment for overloaded function add.
* Move get_non_overloaded_instance to function_base.

Update in v2:

* Add get_non_overloaded_instance for function instance.
* Fix overload check for policy function.
* Enrich the test cases check.

Original log:

This patch adds the framework to support the RVV overloaded
intrinsic API in riscv-xxx-xxx-gcc, as riscv-xxx-xxx-g++ already does.

It mostly leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN
with the steps below.

* Register overloaded functions.
* Add function_resolver for overloaded function resolving.
* Add resolve API for function shape with default implementation.
* Implement HOOK for navigating the overloaded API to non-overloaded API.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New 
function for the hook.
(riscv_register_pragmas): Register the hook.
* config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
* config/riscv/riscv-vector-builtins-bases.cc: New function impl.
* config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register 
overloaded function.
* config/riscv/riscv-vector-builtins.cc (struct 
non_overloaded_registered_function_hasher): New hash table.
(function_builder::add_function): Add overloaded arg.
(function_builder::add_unique_function): Map overloaded function to 
non-overloaded function.
(function_builder::add_overloaded_function): New API impl.
(registered_function::overloaded_hash): Calculate hash value.
(has_vxrm_or_frm_p): New function impl.
(non_overloaded_registered_function_hasher::hash): Ditto.
(non_overloaded_registered_function_hasher::equal): Ditto.
(handle_pragma_vector): Allocate space for hash table.
(resolve_overloaded_builtin): New function impl.
* config/riscv/riscv-vector-builtins.h 
(function_base::may_require_frm_p): Ditto.
(function_base::may_require_vxrm_p): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/overloaded_rv32_vadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vfadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vget_vset.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vloxseg2ei16.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vmv.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv32_vreinterpret.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vfadd.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vget_vset.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vloxseg2ei16.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vmv.c: New test.
* gcc.target/riscv/rvv/base/overloaded_rv64_vreinterpret.c: New test.
* gcc.target/riscv/rvv/base/overloaded_vadd.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vfadd.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vget_vset.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vloxseg2ei16.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vmv.h: New test.
* gcc.target/riscv/rvv/base/overloaded_vreinterpret.h: New test.

Signed-off-by: Li Xu 
Co-Authored-By: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   |  36 ++-
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-bases.cc  |  69 +-
 .../riscv/riscv-vector-builtins-shapes.cc |   1 +
 gcc/config/riscv/riscv-vector-builtins.cc | 226 +-
 gcc/config/riscv/riscv-vector-builtins.h  |  27 ++-
 .../riscv/rvv/base/overloaded_rv32_vadd.c |  12 +
 .../riscv/rvv/base/overloaded_rv32_vfadd.c|  12 +
 .../rvv/base/overloaded_rv32_vget_vset.c  |   7 +
 .../rvv/base/overloaded_rv32_vloxseg2ei16.c   |  11 +
 .../riscv/rvv/base/overloaded_rv32_vmv.c  |  10 +
 .../rvv/base/overloaded_rv32_vreinterpret.c   |  10 +
 .../riscv/rvv/base/overloaded_rv64_vadd.c |  11 +
 .../riscv/rvv/base/overloaded_rv64_vfadd.c|  11 +
 .../rvv/base/overloaded_rv64_vget_vset.c  |   6 +
 .../rvv/base/overloaded_rv64_vloxseg2ei16.c   |  10 +
 .../riscv/rvv/base/overloaded_rv64_vmv.c  |  10 +
 .../rvv/base/overloaded_rv64_vreinterpret.c   |   9 +
 .../riscv/rvv/base/overloaded_vadd.h  |  59 +
 .../ri

Re: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-10-31 Thread juzhe.zh...@rivai.ai
LGTM from my side.

Give kito one more day to review it.

Thanks for support this feature !

juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-10-31 17:03
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
intrinsic

Re: Re: [PATCH] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-10-31 Thread juzhe.zh...@rivai.ai
Thanks Robin. Will address the comments in V2.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-31 16:45
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; richard.sandiford; rguenther; jeffreyalaw
Subject: Re: [PATCH] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN


[PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-10-31 Thread Juzhe-Zhong
As Richard previously suggested, we should support strided load/store in the
loop vectorizer instead of hacking the RISC-V backend.

This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.

The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store, but
with the vector offset changed into a scalar stride.

We don't add strided_load/strided_store or
mask_strided_load/mask_strided_store since it's unlikely RVV will have such
optabs, and we can't add patterns that we can't test.


gcc/ChangeLog:

* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
* internal-fn.cc (internal_load_fn_p): Ditto.
(internal_strided_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_strided_fn_supported_p): Ditto.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
(MASK_LEN_STRIDED_STORE): Ditto.
* internal-fn.h (internal_strided_fn_p): Ditto.
(internal_strided_fn_supported_p): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 51 +
 gcc/internal-fn.cc  | 44 ++
 gcc/internal-fn.def |  4 
 gcc/internal-fn.h   |  2 ++
 gcc/optabs.def  |  2 ++
 5 files changed, 103 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..5bac713a0dd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}@var{n}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of mode 
@var{n}.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 3 is 1 and sign extension if operand 3 is zero;
+@item
+multiply the extended stride by operand 4;
+@item
+add the result to the base; and
+@item
+load the value at that address (operand 1 + @var{i} * multiplied and extended 
stride) into element @var{i} of operand 0.
+@end itemize
+
+Similar to mask_len_load, the instruction loads at most (operand 6 + operand 
7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5157,31 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}@var{n}}
+Store a vector of mode @var{m} into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of mode 
@var{n}.
+Operand 2 is the vector of values that should be stored, which is of mode 
@var{m}.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 2 is 1 and sign extension if operand 2 is zero;
+@item
+multiply the extended stride by operand 3;
+@item
+add the result to the base; and
+@item
+store element @var{i} of operand 4 to that address (operand 1 + @var{i} * 
multiplied and extended stride).
+@end itemize
+
+Similar to mask_len_store, the instruction stores at most (operand 6 + operand 
7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
+Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e7451b96353..f7f85aa7dde 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4

[PATCH] VECT: Support mask_len_strided_load/mask_len_strided_store in loop vectorize

2023-10-31 Thread Juzhe-Zhong
This patch supports the loop vectorizer generating direct strided load/store
IFNs if the target enables them.

Note that this patch allows targets that enable strided load/store but lack
gather/scatter to vectorize strided memory accesses.

gcc/ChangeLog:

* optabs-query.cc (supports_vec_gather_load_p): Support strided 
load/store.
(supports_vec_scatter_store_p): Ditto.
* optabs-query.h (supports_vec_gather_load_p): Ditto.
(supports_vec_scatter_store_p): Ditto.
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
(vect_check_gather_scatter): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vect_truncate_gather_scatter_offset): Ditto.
(vect_use_strided_gather_scatters_p): Ditto.
(vect_get_strided_load_store_ops): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Ditto.

---
 gcc/optabs-query.cc| 27 ++-
 gcc/optabs-query.h |  4 +--
 gcc/tree-vect-data-refs.cc | 71 --
 gcc/tree-vect-stmts.cc | 46 +---
 gcc/tree-vectorizer.h  |  3 +-
 5 files changed, 109 insertions(+), 42 deletions(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 947ccef218c..ea594baf15d 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -670,14 +670,19 @@ supports_vec_convert_optab_p (optab op, machine_mode mode)
for at least one vector mode.  */
 
 bool
-supports_vec_gather_load_p (machine_mode mode)
+supports_vec_gather_load_p (machine_mode mode, bool strided_p)
 {
   if (!this_fn_optabs->supports_vec_gather_load[mode])
 this_fn_optabs->supports_vec_gather_load[mode]
   = (supports_vec_convert_optab_p (gather_load_optab, mode)
-|| supports_vec_convert_optab_p (mask_gather_load_optab, mode)
-|| supports_vec_convert_optab_p (mask_len_gather_load_optab, mode)
-? 1 : -1);
+|| supports_vec_convert_optab_p (mask_gather_load_optab, mode)
+|| supports_vec_convert_optab_p (mask_len_gather_load_optab, mode)
+|| (strided_p
+&& convert_optab_handler (mask_len_strided_load_optab, mode,
+  Pmode)
+ != CODE_FOR_nothing)
+  ? 1
+  : -1);
 
   return this_fn_optabs->supports_vec_gather_load[mode] > 0;
 }
@@ -687,14 +692,20 @@ supports_vec_gather_load_p (machine_mode mode)
for at least one vector mode.  */
 
 bool
-supports_vec_scatter_store_p (machine_mode mode)
+supports_vec_scatter_store_p (machine_mode mode, bool strided_p)
 {
   if (!this_fn_optabs->supports_vec_scatter_store[mode])
 this_fn_optabs->supports_vec_scatter_store[mode]
   = (supports_vec_convert_optab_p (scatter_store_optab, mode)
-|| supports_vec_convert_optab_p (mask_scatter_store_optab, mode)
-|| supports_vec_convert_optab_p (mask_len_scatter_store_optab, mode)
-? 1 : -1);
+|| supports_vec_convert_optab_p (mask_scatter_store_optab, mode)
+|| supports_vec_convert_optab_p (mask_len_scatter_store_optab,
+ mode)
+|| (strided_p
+&& convert_optab_handler (mask_len_strided_store_optab, mode,
+  Pmode)
+ != CODE_FOR_nothing)
+  ? 1
+  : -1);
 
   return this_fn_optabs->supports_vec_scatter_store[mode] > 0;
 }
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 920eb6a1b67..7c22edc5a78 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -191,8 +191,8 @@ bool can_compare_and_swap_p (machine_mode, bool);
 bool can_atomic_exchange_p (machine_mode, bool);
 bool can_atomic_load_p (machine_mode);
 bool lshift_cheap_p (bool);
-bool supports_vec_gather_load_p (machine_mode = E_VOIDmode);
-bool supports_vec_scatter_store_p (machine_mode = E_VOIDmode);
+bool supports_vec_gather_load_p (machine_mode = E_VOIDmode, bool = false);
+bool supports_vec_scatter_store_p (machine_mode = E_VOIDmode, bool = false);
 bool can_vec_extract (machine_mode, machine_mode);
 
 /* Version of find_widening_optab_handler_and_mode that operates on
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2..d374849b0a7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3913,9 +3913,9 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
*IFN_OUT and the vector type for the offset in *OFFSET_VECTYPE_OUT.  */
 
 bool
-vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
- tree vectype, tree memory_type, tree offset_type,
- int scale, internal_fn *ifn_out,
+vect_gather_scatter_fn_p (vec_info *vinfo, bool strided_p, bool read_p,
+ bool masked_p, tree vectype, tree memory_type,
+

[PATCH] RISC-V: Support strided load/store

2023-10-31 Thread Juzhe-Zhong
This patch depends on middle-end patches which are under review,
but we can pre-review this patch before the middle-end patches land.

Consider this following case:
void foo (int * __restrict a, int * __restrict b, int stride, int n)
{
for (int i = 0; i < n; i++)
  a[i*stride] = b[i*stride] + 100;
}

Before this patch:

slli    a6,a2,2
vid.v   v1
vmul.vx v1,v1,a2
vsetvli zero,zero,e64,m2,ta,ma
vsext.vf2   v4,v1
vsll.vi v4,v4,2
.L4:
vsetvli a5,a3,e32,m1,ta,ma
mul a4,a6,a5
vluxei64.v  v1,(a1),v4
sub a3,a3,a5
vadd.vv v1,v1,v2
vsuxei64.v  v1,(a0),v4
add a1,a1,a4
add a0,a0,a4
bne a3,zero,.L4
ret

After this patch:

slli    a6,a2,2
mv  a4,a6
.L4:
vsetvli a5,a3,e32,m1,ta,ma
mul a2,a6,a5
vlse32.v    v1,0(a1),a4
sub a3,a3,a5
vadd.vv v1,v1,v2
vsse32.v    v1,0(a0),a4
add a1,a1,a2
add a0,a0,a2
bne a3,zero,.L4
ret

gcc/ChangeLog:

* config/riscv/autovec.md (mask_len_strided_load): 
New pattern.
(mask_len_strided_store): Ditto.
* config/riscv/predicates.md (vector_stride_extension_operand): New 
predicate.
* config/riscv/riscv-protos.h (expand_strided_load_store): New function.
* config/riscv/riscv-v.cc (expand_strided_load_store): Ditto.
* config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: Adapt 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c: 
New test.
* 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c: 
New test.
* 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c: New 
test.

---
 gcc/config/riscv/autovec.md   | 34 +++
 gcc/config/riscv/predicates.md|  6 ++
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 76 +++
 gcc/config/riscv/vector-iterators.md  |  5 +
 .../gather-scatter/mask_strided_load-1.c  | 47 +
 .../gather-scatter/mask_strided_load_run-1.c  | 96 +++
 .../gather-scatter/mask_strided_store-1.c | 48 ++
 .../gather-scatter/mask_strided_store_run-1.c | 88 +
 .../autovec/gather-scatter/strided_load-1.c   |  2 +-
 .../autovec/gather-scatter/strided_load-2.c   |  2 +-
 .../autovec/gather-scatter/strided_load-3.c   | 45 +
 .../gather-scatter/strided_load_run-3.c   | 84 
 .../autovec/gather-scatter/strided_store-1.c  |  2 +-
 .../autovec/gather-scatter/strided_store-2.c  |  2 +-
 .../autovec/gather-scatter/strided_store-3.c  | 45 +
 16 files changed, 579 insertions(+), 4 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 5f49d73be44..69a64f444ef 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -272,6 +272,40 @@
   DONE;
 })
 
+;; =
+;; == Strided Load/Store
+;; =
+
+(define_expand "mask_len_strided_load"
+  [(match_operand:V 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:ANYI 2 "register_operand")
+   (match_operand 3 "")
+   (match_operand 4 "")
+   (match_operand: 5 "vector_mask_operand")
+   (match_operand 6 "autovec_length_operand")
+   (match_operand 7 "const_0_operand")]
+  

Re: [PATCH 2/2] tree-optimization/111131 - SLP for non-IFN gathers

2023-10-31 Thread Thomas Schwinge
Hi!

On 2023-10-19T11:47:14+, Richard Biener  wrote:
> The following implements SLP vectorization support for gathers
> without relying on IFNs being pattern detected (and supported by
> the target).  That includes support for emulated gathers but also
> the legacy x86 builtin path.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

For GCN (tested '-march=gfx90a'), I see:

 PASS: gcc.dg/vect/vect-gather-2.c (test for excess errors)
+FAIL: gcc.dg/vect/vect-gather-2.c scan-tree-dump vect "different gather base"
+FAIL: gcc.dg/vect/vect-gather-2.c scan-tree-dump vect "different gather scale"
+PASS: gcc.dg/vect/vect-gather-2.c scan-tree-dump-not vect "Loop contains only SLP stmts"


Regards
 Thomas


>   PR tree-optimization/111131
>   * tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
>   sure to update all gather/scatter stmt DRs, not only those
>   that eventually got VMAT_GATHER_SCATTER set.
>   * tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
>   (vect_get_and_check_slp_defs): Handle gathers/scatters,
>   adding the offset as SLP operand and comparing base and scale.
>   (vect_build_slp_tree_1): Handle gathers.
>   (vect_build_slp_tree_2): Likewise.
>
>   * gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
>   everywhere.
>   * gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
>   Massage the scale case to more reliably produce a different
>   one.  Scan for the specific messages.
>   * gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
>   for AVX2, but not emulated.
>   * gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
>   Massage to more properly ensure this.
>   * gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
>   everywhere.
> ---
>  .../gcc.dg/vect/tsvc/vect-tsvc-s353.c |  2 +-
>  gcc/testsuite/gcc.dg/vect/vect-gather-1.c |  2 +-
>  gcc/testsuite/gcc.dg/vect/vect-gather-2.c | 13 --
>  gcc/testsuite/gcc.dg/vect/vect-gather-3.c |  2 +-
>  gcc/testsuite/gcc.dg/vect/vect-gather-4.c |  6 +--
>  gcc/tree-vect-loop.cc |  6 ++-
>  gcc/tree-vect-slp.cc  | 45 +--
>  7 files changed, 61 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> index 98ba7522471..2c4fa3f5991 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> @@ -44,4 +44,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
> riscv_v } } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> index e3bbf5c0bf8..5f6640d9ab6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> @@ -58,4 +58,4 @@ main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target 
> vect_gather_load_ifn } } } */
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c 
> b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> index a1f6ba458a9..4c23b808333 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> @@ -8,6 +8,7 @@ f1 (int *restrict y, int *restrict x1, int *restrict x2,
>  {
>for (int i = 0; i < N; ++i)
>  {
> +  /* Different base.  */
>y[i * 2] = x1[indices[i * 2]] + 1;
>y[i * 2 + 1] = x2[indices[i * 2 + 1]] + 2;
>  }
> @@ -18,8 +19,9 @@ f2 (int *restrict y, int *restrict x, int *restrict indices)
>  {
>for (int i = 0; i < N; ++i)
>  {
> -  y[i * 2] = x[indices[i * 2]] + 1;
> -  y[i * 2 + 1] = x[indices[i * 2 + 1] * 2] + 2;
> +  /* Different scale.  */
> +  y[i * 2] = *(int *)((char *)x + (__UINTPTR_TYPE__)indices[i * 2] * 4) + 1;
> +  y[i * 2 + 1] = *(int *)((char *)x + (__UINTPTR_TYPE__)indices[i * 2 + 1] * 2) + 2;
>  }
>  }
>
> @@ -28,9 +30,12 @@ f3 (int *restrict y, int *restrict indices)
>  {
>for (int i = 0; i < N; ++i)
>  {
> +  /* Different type.  */
>y[i * 2] = x[indices[i * 2]] + 1;
> -  y[i * 2 + 1] = x[(unsigned int) indices[i * 2 + 1]] + 2;
> +  y[i * 2 + 1] = x[((unsigned int *) indices)[i * 2 + 1]] + 2;
>  }
>  }
>
> -/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
> +/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */
> +/* { dg-final { scan-tree-dump "different gather base" vect { target { ! vect_gather_load_ifn } } } } */
> +/* { dg-final { scan-tree-dump "different gather scale" 

[committed] d: Clean-up unused variable assignments after interface change

2023-10-31 Thread Iain Buclaw
Hi,

The lowering done for invoking `new' on a single dimension array was
moved from the code generator to the front-end semantic pass in 
r14-4996.  This removes the detritus left behind in the code generator
from that deletion.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (NewExp *)): Remove unused assignments.
---
 gcc/d/expr.cc | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index ef4ea60ffed..17801a3bd1e 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -2357,9 +2357,6 @@ public:
 else if (tb->ty == TY::Tarray)
   {
/* Allocating memory for a new D array.  */
-   tb = e->newtype->toBasetype ();
-   TypeDArray *tarray = tb->isTypeDArray ();
-
gcc_assert (e->arguments && e->arguments->length >= 1);
 
if (e->arguments->length == 1)
@@ -2403,7 +2400,8 @@ public:
   size_int (e->arguments->length),
   build_address (var));
 
-   result = build_libcall (libcall, tb, 2, tinfo, dims);
+   result = build_libcall (libcall, e->newtype->toBasetype (), 2,
+   tinfo, dims);
  }
 
if (e->argprefix)
-- 
2.39.2



Re: Re: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-10-31 Thread Li Xu
All overload and non-overload intrinsics have been tested successfully on gcc 
and g++.

Thanks.


> -----Original Message-----
> From: "juzhe.zh...@rivai.ai" 
> Sent: 2023-10-31 17:07:11 (Tuesday)
> To: "Li Xu" , gcc-patches 
> Cc: "kito.cheng" , palmer , "Li Xu" 
> Subject: Re: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
> 
> LGTM from my side.
> 
> Give kito one more day to review it.
> 
> Thanks for support this feature !
> 
> juzhe.zh...@rivai.ai
>  
> From: Li Xu
> Date: 2023-10-31 17:03
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; xuli
> Subject: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
> intrinsic
> From: xuli 
>  
> Update in v6:
> * Rename maybe_require_frm_p to may_require_frm_p.
> * Rename maybe_require_vxrm_p to may_require_vxrm_p.
> * Move may_require_frm_p and may_require_vxrm_p to function_base.
>  
> Update in v5:
> * Split has_vxrm_or_frm_p into maybe_require_frm_p and
>   maybe_require_vxrm_p.
> * Adjust comments.
>  
> Update in v4:
> * Remove class function_resolver.
> * Remove function get_non_overloaded_instance.
> * Add overloaded hash traits for non-overloaded intrinsic.
> * All overloaded intrinsics are implemented, and the tests pass.
>  
> Update in v3:
>  
> * Rewrite comment for overloaded function add.
> * Move get_non_overloaded_instance to function_base.
>  
> Update in v2:
>  
> * Add get_non_overloaded_instance for function instance.
> * Fix overload check for policy function.
> * Enrich the test cases check.
>  
> Original log:
>  
> This patch would like to add the framework to support the RVV overloaded
> intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ did.
>  
> However, it largely leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN
> with the steps below.
>  
> * Register overloaded functions.
> * Add function_resolver for overloaded function resolving.
> * Add resolve API for function shape with default implementation.
> * Implement HOOK for navigating the overloaded API to non-overloaded API.
>  
> gcc/ChangeLog:
>  
>     * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New 
> function for the hook.
>     (riscv_register_pragmas): Register the hook.
>     * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
>     * config/riscv/riscv-vector-builtins-bases.cc: New function impl.
>     * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register 
> overloaded function.
>     * config/riscv/riscv-vector-builtins.cc (struct 
> non_overloaded_registered_function_hasher): New hash table.
>     (function_builder::add_function): Add overloaded arg.
>     (function_builder::add_unique_function): Map overloaded function to 
> non-overloaded function.
>     (function_builder::add_overloaded_function): New API impl.
>     (registered_function::overloaded_hash): Calculate hash value.
>     (has_vxrm_or_frm_p): New function impl.
>     (non_overloaded_registered_function_hasher::hash): Ditto.
>     (non_overloaded_registered_function_hasher::equal): Ditto.
>     (handle_pragma_vector): Allocate space for hash table.
>     (resolve_overloaded_builtin): New function impl.
>     * config/riscv/riscv-vector-builtins.h 
> (function_base::may_require_frm_p): Ditto.
>     (function_base::may_require_vxrm_p): Ditto.
>  
> gcc/testsuite/ChangeLog:
>  
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vfadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vget_vset.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vloxseg2ei16.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vmv.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vreinterpret.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vfadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vget_vset.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vloxseg2ei16.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vmv.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vreinterpret.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vadd.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vfadd.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vget_vset.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vloxseg2ei16.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vmv.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vreinterpret.h: New test.
>  
> Signed-off-by: Li Xu 
> Co-Authored-By: Pan Li 
> ---
> gcc/config/riscv/riscv-c.cc   |  36 ++-
> gcc/config/riscv/riscv-protos.h   |   1 +
> .../riscv/riscv-vector-builtins-bases.cc  |  69 +-
> .../riscv/riscv-vector-builtins-shapes.cc |   1 +

Re: [PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-10-31 Thread Surya Kumari Jangala
Hi Segher,
My replies are inlined:

On 29/10/23 10:16 am, Segher Boessenkool wrote:
> Hi!
> 
> Please say "rs6000/p8swap:" in the subject, not "swap:" :-)
> 
> On Sun, Sep 10, 2023 at 10:58:32PM +0530, Surya Kumari Jangala wrote:
>> Another issue with always handling swappable instructions is that it is
>> incorrect to do so in webs where loads/stores on quad word aligned
>> addresses are changed to lvx/stvx.
> 
> Why?  Please say why in the commit message (the message you send with
> your patch should be the exact eventual commit message!)

ok, I will add more explanation.

> 
>> gcc/
>>  PR rtl-optimization/106770
>>  * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>>  function.
> 
> Please don't break commit message / changelog lines early unnecessarily.
> Lines are 80 chars, the leading tab counts as 8.

ok.

> 
>> +  /* Set if the swappable insns in the web represented by this entry
>> + have to be fixed. Swappable insns have to be fixed in :
> 
> (no space before colon)

ok.

> 
>> +static bool
>> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>> +{
>> +  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>> +  insn_entry[i].special_handling == SH_NOSWAP_ST);
>> +}
> 
> "return" is not a function, you don't need parens here.

ok.

> 
>> +/* Convert a non-permuting load/store insn to a permuting one.  */
>> +static void
>> +handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> 
> A better name would be good, "handle" is a weaselword :-)  It is a
> static, so a shorter name is completely acceptable (say, one that
> wouldn't be acceptable with bigger than file scope).

Ok. How does convert_mem_insn() sound?
Note: "handle" is used as a prefix for other functions in rs6000-p8swap.cc 
(such as handle_special_swappables()).
> 
>> +  rtx_insn *insn = insn_entry[i].insn;
>> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
>> +permute_load (insn);
>> +  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
>> +permute_store (insn);
> 
> Lose the "else"?  The compiler can do micro-optimisations a million
> times better than any user could.  Simpler, more readable (better
> understandable!) code is much preferred.
> 
>> +  /* Perform special handling for swappable insns that require it.
> 
> That is a completely contentless sentence :-(
> 

This line was present in the original code. This is not something I added.
Let me try to add some more comments to make the explanation better.

>> + Note that special handling should be done only for those
>> + swappable insns that are present in webs marked as requiring
>> + special handling.  */
> 
> This one isn't much better.
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
> 
> This is the default, you do not need this.

ok.

> 
>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O2 " } */
>> +/* The 2 xxpermdi instructions are generated by the two
>> +   calls to vec_promote() */
>> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> 
> Please enclose in {}.  Use double quotes in Tcl only when you want the
> interpolation they cause.  Default to using {} instead.

ok.

Regards,
Surya

> 
> So please fix those things, and write a better commit message.  Ideally
> the commit messsage will tell everything needed to understand the patch
> (so also to review the patch).  Maybe add examples where needed.  So
> reviewing the code in the patch should be an easy thing to do, after
> reading the commit message :-)
> 
> 
> Segher


Re: [PATCH] RISC-V: Add vector fmin/fmax expanders.

2023-10-31 Thread Robin Dapp
Thanks, going to commit the attached.

Regards
 Robin

This patch adds expanders for fmin and fmax.  As per RISC-V V Spec 1.0,
vfmin/vfmax are IEEE 754-2019 compliant, which differs from the IEEE
754-2008 semantics that fmin/fmax require (particularly in the
signaling-NaN handling).
Therefore the pattern conditions include a !HONOR_SNANS check.
gcc/ChangeLog:

* config/riscv/autovec.md (3): fmax/fmin
expanders.
(cond_): Ditto.
(cond_len_): Ditto.
(reduc_fmax_scal_): Ditto.
(reduc_fmin_scal_): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add fmin/fmax.
* config/riscv/vector-iterators.md (fmin): New UNSPEC.
(UNSPEC_VFMIN): Ditto.
* config/riscv/vector.md (@pred_): Add
UNSPEC insn patterns.
(@pred__scalar): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Remove
-ffast-math.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/fmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh-10.c: New test.
---
 gcc/config/riscv/autovec.md   | 72 +++
 gcc/config/riscv/riscv-v.cc   |  2 +
 gcc/config/riscv/vector-iterators.md  |  8 +++
 gcc/config/riscv/vector.md| 43 +++
 .../riscv/rvv/autovec/binop/fmax-1.c  | 24 +++
 .../riscv/rvv/autovec/binop/fmax_run-1.c  | 47 
 .../riscv/rvv/autovec/binop/fmax_zvfh-1.c | 23 ++
 .../riscv/rvv/autovec/binop/fmax_zvfh_run-1.c | 48 +
 .../riscv/rvv/autovec/binop/fmin-1.c  | 10 +++
 .../riscv/rvv/autovec/binop/fmin_run-1.c  |  5 ++
 .../riscv/rvv/autovec/binop/fmin_zvfh-1.c | 10 +++
 .../riscv/rvv/autovec/binop/fmin_zvfh_run-1.c |  5 ++
 .../riscv/rvv/autovec/cond/cond_fmax-1.c  |  6 +-
 .../riscv/rvv/autovec/cond/cond_fmax-2.c  |  3 +-
 .../riscv/rvv/autovec/cond/cond_fmax-3.c  |  6 +-
 .../riscv/rvv/autovec/cond/cond_fmax-4.c  |  6 +-
 .../riscv/rvv/autovec/cond/cond_fmax_run-1.c  |  3 +-
 ..

[PATCH] tree-optimization/112305 - SCEV cprop and conditional undefined overflow

2023-10-31 Thread Richard Biener
The following adjusts final value replacement to also rewrite the
replacement to defined overflow behavior if there are conditionally
evaluated stmts (with possibly undefined overflow), not only when
we "folded casts".  The patch hooks into expression_expensive for
this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112305
* tree-scalar-evolution.h (expression_expensive): Adjust.
* tree-scalar-evolution.cc (expression_expensive): Record
when we see a COND_EXPR.
(final_value_replacement_loop): When the replacement contains
a COND_EXPR, rewrite it to defined overflow.
* tree-ssa-loop-ivopts.cc (may_eliminate_iv): Adjust.

* gcc.dg/torture/pr112305.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr112305.c | 18 
 gcc/tree-scalar-evolution.cc| 59 +++--
 gcc/tree-scalar-evolution.h |  2 +-
 gcc/tree-ssa-loop-ivopts.cc |  3 +-
 4 files changed, 56 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr112305.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr112305.c 
b/gcc/testsuite/gcc.dg/torture/pr112305.c
new file mode 100644
index 000..9d363aaac9d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr112305.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
+
+int a;
+void b()
+{
+  long c = 3;
+  unsigned int d = 50253292;
+  int e = 2147483648;
+  for (; a < 5; a++)
+do {
+  e += 4;
+  d -= c;
+} while (e < 20);
+  if (d != -1560359471u)
+__builtin_abort ();
+}
+int main() { b(); }
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 95a15fe0988..a6524de7b92 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3353,11 +3353,13 @@ scev_finalize (void)
 }
 
 /* Returns true if the expression EXPR is considered to be too expensive
-   for scev_const_prop.  */
+   for scev_const_prop.  Sets *COND_OVERFLOW_P to true when the
+   expression might contain a sub-expression that is subject to undefined
+   overflow behavior and conditionally evaluated.  */
 
 static bool
-expression_expensive_p (tree expr, hash_map &cache,
-   uint64_t &cost)
+expression_expensive_p (tree expr, bool *cond_overflow_p,
+   hash_map &cache, uint64_t &cost)
 {
   enum tree_code code;
 
@@ -3444,7 +3446,7 @@ bitcount_call:
}
 
   FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
-   if (expression_expensive_p (arg, cache, op_cost))
+   if (expression_expensive_p (arg, cond_overflow_p, cache, op_cost))
  return true;
   *cache.get (expr) += op_cost;
   cost += op_cost + 1;
@@ -3453,7 +3455,8 @@ bitcount_call:
 
   if (code == COND_EXPR)
 {
-  if (expression_expensive_p (TREE_OPERAND (expr, 0), cache, op_cost)
+  if (expression_expensive_p (TREE_OPERAND (expr, 0), cond_overflow_p,
+ cache, op_cost)
  || (EXPR_P (TREE_OPERAND (expr, 1))
  && EXPR_P (TREE_OPERAND (expr, 2)))
  /* If either branch has side effects or could trap.  */
@@ -3461,11 +3464,13 @@ bitcount_call:
  || generic_expr_could_trap_p (TREE_OPERAND (expr, 1))
  || TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 0))
  || generic_expr_could_trap_p (TREE_OPERAND (expr, 0))
- || expression_expensive_p (TREE_OPERAND (expr, 1),
+ || expression_expensive_p (TREE_OPERAND (expr, 1), cond_overflow_p,
 cache, op_cost)
- || expression_expensive_p (TREE_OPERAND (expr, 2),
+ || expression_expensive_p (TREE_OPERAND (expr, 2), cond_overflow_p,
 cache, op_cost))
return true;
+  /* Conservatively assume there's overflow for now.  */
+  *cond_overflow_p = true;
   *cache.get (expr) += op_cost;
   cost += op_cost + 1;
   return false;
@@ -3475,12 +3480,14 @@ bitcount_call:
 {
 case tcc_binary:
 case tcc_comparison:
-  if (expression_expensive_p (TREE_OPERAND (expr, 1), cache, op_cost))
+  if (expression_expensive_p (TREE_OPERAND (expr, 1), cond_overflow_p,
+ cache, op_cost))
return true;
 
   /* Fallthru.  */
 case tcc_unary:
-  if (expression_expensive_p (TREE_OPERAND (expr, 0), cache, op_cost))
+  if (expression_expensive_p (TREE_OPERAND (expr, 0), cond_overflow_p,
+ cache, op_cost))
return true;
   *cache.get (expr) += op_cost;
   cost += op_cost + 1;
@@ -3492,11 +3499,12 @@ bitcount_call:
 }
 
 bool
-expression_expensive_p (tree expr)
+expression_expensive_p (tree expr, bool *cond_overflow_p)
 {
   hash_map cache;
   uint64_t expanded_size = 0;
-  return (expression_expensive_p (expr, cache, expanded_size)
+  *cond_overflow_p = false;
+  return (expression_expensive_p (expr, cond_overf

RE: Re: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic

2023-10-31 Thread Li, Pan2
Thanks xuli for enabling this feature. We can update the CI of
rvv-intrinsic-doc for the overloaded API(s) after it is committed.

Pan

-Original Message-
From: Li Xu  
Sent: Tuesday, October 31, 2023 7:37 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches ; kito.cheng ; 
palmer 
Subject: Re: Re: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for 
RVV intrinsic

All overload and non-overload intrinsics have been tested successfully on gcc 
and g++.

Thanks.


> -----Original Message-----
> From: "juzhe.zh...@rivai.ai" 
> Sent: 2023-10-31 17:07:11 (Tuesday)
> To: "Li Xu" , gcc-patches 
> Cc: "kito.cheng" , palmer , "Li Xu" 
> Subject: Re: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV intrinsic
> 
> LGTM from my side.
> 
> Give kito one more day to review it.
> 
> Thanks for support this feature !
> 
> juzhe.zh...@rivai.ai
>  
> From: Li Xu
> Date: 2023-10-31 17:03
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; xuli
> Subject: [PATCH v6] RISC-V: Implement RESOLVE_OVERLOADED_BUILTIN for RVV 
> intrinsic
> From: xuli 
>  
> Update in v6:
> * Rename maybe_require_frm_p to may_require_frm_p.
> * Rename maybe_require_vxrm_p to may_require_vxrm_p.
> * Move may_require_frm_p and may_require_vxrm_p to function_base.
>  
> Update in v5:
> * Split has_vxrm_or_frm_p into maybe_require_frm_p and
>   maybe_require_vxrm_p.
> * Adjust comments.
>  
> Update in v4:
> * Remove class function_resolver.
> * Remove function get_non_overloaded_instance.
> * Add overloaded hash traits for non-overloaded intrinsic.
> * All overloaded intrinsics are implemented, and the tests pass.
>  
> Update in v3:
>  
> * Rewrite comment for overloaded function add.
> * Move get_non_overloaded_instance to function_base.
>  
> Update in v2:
>  
> * Add get_non_overloaded_instance for function instance.
> * Fix overload check for policy function.
> * Enrich the test cases check.
>  
> Original log:
>  
> This patch would like to add the framework to support the RVV overloaded
> intrinsic API in riscv-xxx-xxx-gcc, like riscv-xxx-xxx-g++ already does.
>  
> It largely leverages the hook TARGET_RESOLVE_OVERLOADED_BUILTIN
> with the steps below.
>  
> * Register overloaded functions.
> * Add function_resolver for overloaded function resolving.
> * Add resolve API for function shape with default implementation.
> * Implement HOOK for navigating the overloaded API to non-overloaded API.
>  
> gcc/ChangeLog:
>  
>     * config/riscv/riscv-c.cc (riscv_resolve_overloaded_builtin): New 
> function for the hook.
>     (riscv_register_pragmas): Register the hook.
>     * config/riscv/riscv-protos.h (resolve_overloaded_builtin): New decl.
>     * config/riscv/riscv-vector-builtins-bases.cc: New function impl.
>     * config/riscv/riscv-vector-builtins-shapes.cc (build_one): Register 
> overloaded function.
>     * config/riscv/riscv-vector-builtins.cc (struct 
> non_overloaded_registered_function_hasher): New hash table.
>     (function_builder::add_function): Add overloaded arg.
>     (function_builder::add_unique_function): Map overloaded function to 
> non-overloaded function.
>     (function_builder::add_overloaded_function): New API impl.
>     (registered_function::overloaded_hash): Calculate hash value.
>     (has_vxrm_or_frm_p): New function impl.
>     (non_overloaded_registered_function_hasher::hash): Ditto.
>     (non_overloaded_registered_function_hasher::equal): Ditto.
>     (handle_pragma_vector): Allocate space for hash table.
>     (resolve_overloaded_builtin): New function impl.
>     * config/riscv/riscv-vector-builtins.h 
> (function_base::may_require_frm_p): Ditto.
>     (function_base::may_require_vxrm_p): Ditto.
>  
> gcc/testsuite/ChangeLog:
>  
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vfadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vget_vset.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vloxseg2ei16.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vmv.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv32_vreinterpret.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vfadd.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vget_vset.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vloxseg2ei16.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vmv.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_rv64_vreinterpret.c: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vadd.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vfadd.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vget_vset.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vloxseg2ei16.h: New test.
>     * gcc.target/riscv/rvv/base/overloaded_vmv.h: New test.

Re: [PATCH] VECT: Support SLP MASK_LEN_GATHER_LOAD with conditional mask

2023-10-31 Thread Richard Biener
On Thu, 26 Oct 2023, Juzhe-Zhong wrote:

> This patch leverages the current MASK_GATHER_LOAD support to handle SLP 
> MASK_LEN_GATHER_LOAD with a conditional mask.
> 
> Unconditional MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) SLP is not 
> included in this patch
> since it seems that we can't support it in the middle-end (due to PR44306).

That bug number is off I believe.

> Maybe we should support GATHER_LOAD explicitly in the RISC-V backend to work 
> around this issue.
> 
> I am going to support GATHER_LOAD explicitly in the RISC-V backend as a workaround.
> 
> This patch also adds conditional gather load test since there is no 
> conditional gather load test.
> 
> Ok for trunk ? 
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
>   (vect_build_slp_tree_1): Ditto.
>   (vect_build_slp_tree_2): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-gather-6.c: New test.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
>  gcc/tree-vect-slp.cc  |  8 ++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c 
> b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> new file mode 100644
> index 000..ff55f321854
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +
> +void
> +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict 
> cond, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +{
> +  if (cond[i * 2])
> + y[i * 2] = x[indices[i * 2]] + 1;
> +  if (cond[i * 2 + 1])
> + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target 
> vect_gather_load_ifn } } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 5eb310eceaf..0c197b50054 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -564,6 +564,7 @@ vect_get_operand_map (const gimple *stmt, bool 
> gather_scatter_p = false,
>   return arg1_map;
>  
> case IFN_MASK_GATHER_LOAD:
> +   case IFN_MASK_LEN_GATHER_LOAD:
>   return arg1_arg4_map;
>  
> case IFN_MASK_STORE:
> @@ -1158,7 +1159,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
>  
> if (cfn == CFN_MASK_LOAD
> || cfn == CFN_GATHER_LOAD
> -   || cfn == CFN_MASK_GATHER_LOAD)
> +   || cfn == CFN_MASK_GATHER_LOAD
> + || cfn == CFN_MASK_LEN_GATHER_LOAD)

somehow whitespace is mangled here and below, please fix.

Otherwise OK.

Thanks,
Richard.

>   ldst_p = true;
> else if (cfn == CFN_MASK_STORE)
>   {
> @@ -1425,6 +1427,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
> if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> && rhs_code != CFN_GATHER_LOAD
> && rhs_code != CFN_MASK_GATHER_LOAD
> + && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)
> /* Not grouped loads are handled as externals for BB
>vectorization.  For loop vectorization we can handle
> @@ -1927,7 +1930,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> >if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
>   gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
>   || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> + || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> + || gimple_call_internal_p (stmt, 
> IFN_MASK_LEN_GATHER_LOAD));
>else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
>   gcc_assert (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)));
>else
> 


Re: hardcfr: support checking at abnormal edges [PR111943]

2023-10-31 Thread Alexandre Oliva
[adding list]

On Oct 27, 2023, rep.dot@gmail.com wrote:

> +   from E and FALES from other preds, split the whole block, add a

> s/FALES/FALSE/

Thanks, I've just installed the patch including the typo fix.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH v3] VECT: Refine the type size restriction of call vectorizer

2023-10-31 Thread Richard Biener
On Mon, Oct 30, 2023 at 1:23 PM  wrote:
>
> From: Pan Li 
>
> Update in v3:
>
> * Add func to predicate type size is legal or not for vectorizer call.
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
> out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
>
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
>
> The below test are passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
> func impl to predicate the type size is legal or not.
> (vectorizable_call): Leverage vectorizable_type_size_legal_p.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-stmts.cc | 51 +++---
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..24b3448d961 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree 
> fndecl,
>return IFN_LAST;
>  }
>
> +/* Return TRUE when the type size is legal for the call vectorizer,
> +   or FALSE.
> +   The type size of both the vectype_in and vectype_out should be
> +   exactly the same when vectype_out isn't participating the optab.
> +   While there is no restriction for type size when vectype_out
> +   is part of the optab query.
> + */
> +static bool
> +vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
> +   tree vectype_in)
> +{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> +
> +  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
> +return same_size_p;
> +
> +  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
> +
> +  if (!difn_info.vectorizable)
> +return same_size_p;
> +
> +  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
> + the vectype_out participating the optable selection.  Aka the type size
> + check can be skipped here.  */
> +  if (difn_info.type0 < 0 || difn_info.type1 < 0)
> +return true;

can you instead amend vectorizable_internal_function to contain the check,
returning IFN_LAST if it doesn't hold?

> +
> +  return same_size_p;
> +}
>
>  static tree permute_vec_elements (vec_info *, tree, tree, tree, 
> stmt_vec_info,
>   gimple_stmt_iterator *);
> @@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
>
>return false;
>  }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> - just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> - by a pack of the two vectors into an SI vector.  We would need
> - separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -{
> -  if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"mismatched vector sizes %T and %T\n",
> -vectype_in, vectype_out);
> -  return false;
> -}
>
>if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>!= VECTOR_BOOLEAN_TYPE_P (vectype_in))
> @@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
>  ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>   vectype_in);
>
> +  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"mismatched vector sizes %T and %T\n",
> +vectype_in, vectype_out);
> +  return false;
> +}
> +
>/* If that fails, try asking for a target-specific built-in function.  */
>if (ifn == IFN_LAST)
>  {
> --
> 2.34.1
>


Re: [PATCH] RISC-V: Enable ztso tests on rv32

2023-10-31 Thread Jeff Law




On 10/30/23 18:47, Patrick O'Neill wrote:

This patch transitions the ztso testcases to use the testsuite infrastructure,
enabling the tests on both rv64 and rv32 targets.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Ztso extension to
dg-options for dg-do compile.
 * gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
 * lib/target-supports.exp: Add testing infrastructure to require the
Ztso extension or add it to an existing -march.

OK.

Jeff


Re: [PATCH 1/2] RISC-V: Let non-atomic targets use optimized amo loads/stores

2023-10-31 Thread Jeff Law




On 10/30/23 18:49, Patrick O'Neill wrote:

Non-atomic targets are currently prevented from using the optimized fencing for
seq_cst load/seq_cst store. This patch removes that constraint.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md (atomic_load_rvwmo): Remove
TARGET_ATOMIC constraint
(atomic_store_rvwmo): Ditto.
* config/riscv/sync-ztso.md (atomic_load_ztso): Ditto.
(atomic_store_ztso): Ditto.
* config/riscv/sync.md (atomic_load): Ditto.
(atomic_store): Ditto.

OK
jeff


Re: [PATCH 2/2] RISC-V: Require a extension for testcases with atomic insns

2023-10-31 Thread Jeff Law




On 10/30/23 18:49, Patrick O'Neill wrote:

Add testsuite infrastructure for the A extension and use it to require the A
extension for dg-do run and to add the A extension to dg-options for non-A
dg-do compile.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/amo-table-a-6-amo-add-1.c: Add A extension to
dg-options for dg-do compile.
 * gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
 * gcc.target/riscv/inline-atomics-2.c: Ditto.
 * gcc.target/riscv/inline-atomics-3.c: Require A extension for dg-do
run.
 * gcc.target/riscv/inline-atomics-4.c: Ditto.
 * gcc.target/riscv/inline-atomics-5.c: Ditto.
 * gcc.target/riscv/inline-atomics-6.c: Ditto.
 * gcc.target/riscv/inline-atomics-7.c: Ditto.
 * gcc.target/riscv/inline-atomics-8.c: Ditto.
 * lib/target-supports.exp: Add testing infrastructure to require the A
extension or add it to an existing -march.

OK
jeff


Re: [PATCH 2/2] tree-optimization/111131 - SLP for non-IFN gathers

2023-10-31 Thread Richard Biener
On Tue, 31 Oct 2023, Thomas Schwinge wrote:

> Hi!
> 
> On 2023-10-19T11:47:14+, Richard Biener  wrote:
> > The following implements SLP vectorization support for gathers
> > without relying on IFNs being pattern detected (and supported by
> > the target).  That includes support for emulated gathers but also
> > the legacy x86 builtin path.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.
> 
> For GCN (tested '-march=gfx90a'), I see:
> 
>  PASS: gcc.dg/vect/vect-gather-2.c (test for excess errors)
> +FAIL: gcc.dg/vect/vect-gather-2.c scan-tree-dump vect "different gather 
> base"
> +FAIL: gcc.dg/vect/vect-gather-2.c scan-tree-dump vect "different gather 
> scale"
> +PASS: gcc.dg/vect/vect-gather-2.c scan-tree-dump-not vect "Loop contains 
> only SLP stmts"

Ah, for gather IFNs pattern matched it will instead have

Build SLP failed: different calls in patt_55 = .GATHER_LOAD ((sizetype) 
x2_29(D), _15, 4, 0);

but then I have put in

/* { dg-final { scan-tree-dump "different gather base" vect { target { ! 
vect_gather_load_ifn } } } } */
/* { dg-final { scan-tree-dump "different gather scale" vect { target { ! 
vect_gather_load_ifn } } } } */

and expected gcn to have vect_gather_load_ifn ... but that is

proc check_effective_target_vect_gather_load_ifn { } {
return [expr { [check_effective_target_aarch64_sve]
   || [check_effective_target_riscv_v] }]
} 

probably add

   || [istarget amdgcn*-*-*]

there?  Can you do that (after checking it doesn't break other
tests)?

Richard.

> 
> Grüße
>  Thomas
> 
> 
> >   PR tree-optimization/111131
> >   * tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
> >   sure to update all gather/scatter stmt DRs, not only those
> >   that eventually got VMAT_GATHER_SCATTER set.
> >   * tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
> >   (vect_get_and_check_slp_defs): Handle gathers/scatters,
> >   adding the offset as SLP operand and comparing base and scale.
> >   (vect_build_slp_tree_1): Handle gathers.
> >   (vect_build_slp_tree_2): Likewise.
> >
> >   * gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
> >   everywhere.
> >   * gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
> >   Massage the scale case to more reliably produce a different
> >   one.  Scan for the specific messages.
> >   * gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
> >   for AVX2, but not emulated.
> >   * gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
> >   Massage to more properly ensure this.
> >   * gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
> >   everywhere.
> > ---
> >  .../gcc.dg/vect/tsvc/vect-tsvc-s353.c |  2 +-
> >  gcc/testsuite/gcc.dg/vect/vect-gather-1.c |  2 +-
> >  gcc/testsuite/gcc.dg/vect/vect-gather-2.c | 13 --
> >  gcc/testsuite/gcc.dg/vect/vect-gather-3.c |  2 +-
> >  gcc/testsuite/gcc.dg/vect/vect-gather-4.c |  6 +--
> >  gcc/tree-vect-loop.cc |  6 ++-
> >  gcc/tree-vect-slp.cc  | 45 +--
> >  7 files changed, 61 insertions(+), 15 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c 
> > b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> > index 98ba7522471..2c4fa3f5991 100644
> > --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> > +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> > @@ -44,4 +44,4 @@ int main (int argc, char **argv)
> >return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
> > riscv_v } } } } */
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> > index e3bbf5c0bf8..5f6640d9ab6 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> > @@ -58,4 +58,4 @@ main (void)
> >return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { 
> > target vect_gather_load_ifn } } } */
> > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> > index a1f6ba458a9..4c23b808333 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
> > @@ -8,6 +8,7 @@ f1 (int *restrict y, int *restrict x1, int *restrict x2,
> >  {
> >for (int i = 0; i < N; ++i)
> >  {
> > +  /* Different base.  */
> >y[i * 2] = x1[indices[i * 2]] + 1;
> >y[i * 2 + 1] = x2[indices[i * 2 + 1]] + 2;
> >  }
> > @@ -18,8 +19,9 @@ f2 (int *restrict y, int *restrict x, int *restrict 
> > indices)
> >  {
> >for (int i = 0; i < N; ++i)
> >  {
> > -  y[i * 2] = x[indices[i * 2]] + 1

[OG13][committed] OpenMP/Fortran: Fix parsing of metadirectives with BLOCK

2023-10-31 Thread Tobias Burnus

This is an OG13-only patch, as metadirectives are not yet on mainline.
I think it is a side effect of my mainline patch that fixed strictly
structured blocks, which was backported to OG13 on Oct 8, 2023 in
OG13 commit 36e5f02e64bd4b5b1eaf89993a63c56b01cd4e7c.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 46feedaddf8e4d82b7a24557c7bb4c8c3ee287a0
Author: Tobias Burnus 
Date:   Tue Oct 31 12:03:44 2023 +0100

OpenMP/Fortran: Fix parsing of metadirectives with BLOCK

Probably a fallout of the backport of r14-4471-g6a8edd50a149f1
  Fortran/OpenMP: Fix handling of strictly structured blocks
This showed up as parsing error/fail with
 libgomp.fortran/metadirective-1.f90
 libgomp.fortran/metadirective-6.f90

gcc/fortran/

* decl.cc (gfc_match_end): Handle unnamed END BLOCK with
metadirectives.
---
 gcc/fortran/ChangeLog.omp | 5 +
 gcc/fortran/decl.cc   | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index 6b30302428f..e20a88b8740 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,3 +1,8 @@
+2023-10-31  Tobias Burnus  
+
+	* decl.cc (gfc_match_end): Handle unnamed END BLOCK with
+	metadirectives.
+
 2023-10-30  Tobias Burnus  
 
 	* trans-openmp.cc (gfc_trans_omp_clauses): Avoid gfc_evaluate_now
diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index 783c39438e8..4c04e64d37c 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -8409,6 +8409,9 @@ gfc_match_end (gfc_statement *st)
 		&& state_data->sym->abr_modproc_decl;
 	}
 	while (state == COMP_OMP_METADIRECTIVE);
+
+	if (startswith (block_name, "block@"))
+	  block_name = NULL;
   }
   break;
 default:
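For reference, the shape that tripped the parser is roughly of this form (a hypothetical reduction of the metadirective tests, not the exact testcase):

```fortran
!$omp metadirective when(user={condition(.true.)}: parallel)
block
  ! strictly structured block: no "!$omp end ..." directive follows,
  ! so this unnamed END BLOCK must close the construct
end block
```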


Re: [PATCH v2 1/2] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-10-31 Thread Christoph Müllner
On Sun, Oct 29, 2023 at 10:44 PM Jeff Law  wrote:
>
>
>
> On 10/20/23 03:53, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > The XTheadMemIdx ISA extension provides additional load and store
> > instructions with new addressing modes.
> >
> > The following memory accesses types are supported:
> > * load: b,bu,h,hu,w,wu,d
> > * store: b,h,w,d
> >
> > The following addressing modes are supported:
> > * immediate offset with PRE_MODIFY or POST_MODIFY (22 instructions):
> >l.ia, l.ib, s.ia, s.ib
> > * register offset with additional immediate offset (11 instructions):
> >lr, sr
> > * zero-extended register offset with additional immediate offset
> >(11 instructions): lur, sur
> >
> > The RISC-V base ISA does not support index registers, so the changes
> > are kept separate from the RISC-V standard support as much as possible.
> >
> > To combine the shift/multiply instructions into the memory access
> > instructions, this patch comes with a few insn_and_split optimizations
> > that allow the combiner to do this task.
> >
> > Handling the different cases of extensions results in a couple of INSNs
> > that look redundant at first view, but they are simply the equivalent
> > of what we already have for Zbb. The only difference is that
> > we have many more load instructions.
> >
> > We already have a constraint with the name 'th_f_fmv', therefore,
> > the new constraints follow this pattern and have the same length
> > as required ('th_m_mia', 'th_m_mib', 'th_m_mir', 'th_m_miu').
> >
> > The added tests ensure that this feature won't regress without notice.
> > Testing: GCC regression test suite, GCC bootstrap build, and
> > SPEC CPU 2017 intrate (base&peak) on C920.
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/constraints.md (th_m_mia): New constraint.
> >   (th_m_mib): Likewise.
> >   (th_m_mir): Likewise.
> >   (th_m_miu): Likewise.
> >   * config/riscv/riscv-protos.h (enum riscv_address_type):
> >   Add new address types ADDRESS_REG_REG, ADDRESS_REG_UREG,
> >   and ADDRESS_REG_WB and their documentation.
> >   (struct riscv_address_info): Add new field 'shift' and
> >   document the field usage for the new address types.
> >   (riscv_valid_base_register_p): New prototype.
> >   (th_memidx_legitimate_modify_p): Likewise.
> >   (th_memidx_legitimate_index_p): Likewise.
> >   (th_classify_address): Likewise.
> >   (th_output_move): Likewise.
> >   (th_print_operand_address): Likewise.
> >   * config/riscv/riscv.cc (riscv_index_reg_class):
> >   Return GR_REGS for XTheadMemIdx.
> >   (riscv_regno_ok_for_index_p): Add support for XTheadMemIdx.
> >   (riscv_classify_address): Call th_classify_address() on top.
> >   (riscv_output_move): Call th_output_move() on top.
> >   (riscv_print_operand_address): Call th_print_operand_address()
> >   on top.
> >   * config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): New macro.
> >   (HAVE_PRE_MODIFY_DISP): Likewise.
> >   * config/riscv/riscv.md (zero_extendqi2): Disable
> >   for XTheadMemIdx.
> >   (*zero_extendqi2_internal): Convert to expand,
> >   create INSN with same name and disable it for XTheadMemIdx.
> >   (extendsidi2): Likewise.
> >   (*extendsidi2_internal): Disable for XTheadMemIdx.
> >   * config/riscv/thead.cc (valid_signed_immediate): New helper
> >   function.
> >   (th_memidx_classify_address_modify): New function.
> >   (th_memidx_legitimate_modify_p): Likewise.
> >   (th_memidx_output_modify): Likewise.
> >   (is_memidx_mode): Likewise.
> >   (th_memidx_classify_address_index): Likewise.
> >   (th_memidx_legitimate_index_p): Likewise.
> >   (th_memidx_output_index): Likewise.
> >   (th_classify_address): Likewise.
> >   (th_output_move): Likewise.
> >   (th_print_operand_address): Likewise.
> >   * config/riscv/thead.md (*th_memidx_operand): New splitter.
> >   (*th_memidx_zero_extendqi2): New INSN.
> >   (*th_memidx_extendsidi2): Likewise.
> >   (*th_memidx_zero_extendsidi2): Likewise.
> >   (*th_memidx_zero_extendhi2): Likewise.
> >   (*th_memidx_extend2): Likewise.
> >   (*th_memidx_bb_zero_extendsidi2): Likewise.
> >   (*th_memidx_bb_zero_extendhi2): Likewise.
> >   (*th_memidx_bb_extendhi2): Likewise.
> >   (*th_memidx_bb_extendqi2): Likewise.
> >   (TH_M_ANYI): New mode iterator.
> >   (TH_M_NOEXTI): Likewise.
> >   (*th_memidx_I_a): New combiner optimization.
> >   (*th_memidx_I_b): Likewise.
> >   (*th_memidx_I_c): Likewise.
> >   (*th_memidx_US_a): Likewise.
> >   (*th_memidx_US_b): Likewise.
> >   (*th_memidx_US_c): Likewise.
> >   (*th_memidx_UZ_a): Likewise.
> >   (*th_memidx_UZ_b): Likewise.
> >   (*th_memidx_UZ_c): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadmemidx-helpers.h: New t

Re: [PATCH v2 2/2] riscv: thead: Add support for the XTheadFMemIdx ISA extension

2023-10-31 Thread Christoph Müllner
On Sun, Oct 29, 2023 at 11:25 PM Jeff Law  wrote:
>
>
>
> On 10/20/23 03:53, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > The XTheadFMemIdx ISA extension provides additional load and store
> > instructions for floating-point registers with new addressing modes.
> >
> > The following memory accesses types are supported:
> > * load/store: [w,d] (single-precision FP, double-precision FP)
> >
> > The following addressing modes are supported:
> > * register offset with additional immediate offset (4 instructions):
> >flr, fsr
> > * zero-extended register offset with additional immediate offset
> >(4 instructions): flur, fsur
> >
> > These addressing modes are also part of the similar XTheadMemIdx
> > ISA extension support, whose code is reused and extended to support
> > floating-point registers.
> >
> > One challenge that this patch needs to solve are GP registers in FP-mode
> > (e.g. "(reg:DF a2)"), which cannot be handled by the XTheadFMemIdx
> > instructions. Such registers are the result of independent
> > optimizations, which can happen after register allocation.
> > This patch uses a simple but efficient method to address this:
> > add a dependency on XTheadMemIdx for the XTheadFMemIdx optimizations.
> > This allows using the XTheadMemIdx instructions in case
> > of such registers.
> Or alternately define secondary reloads so that you can get a scratch
> register to reload the address into a GPR.  Your call on whether or not
> to try to implement that.  I guess it largely depends on how likely it
> is you'll have one extension defined, but not the other.

I started doing this, but I thought it was not worth the effort,
given that all cores that implement one extension also support the other.


> > The added tests ensure that this feature won't regress without notice.
> > Testing: GCC regression test suite and SPEC CPU 2017 intrate (base&peak).
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (riscv_index_reg_class):
> >   Return GR_REGS for XTheadFMemIdx.
> >   (riscv_regno_ok_for_index_p): Add support for XTheadFMemIdx.
> >   * config/riscv/riscv.h (HARDFP_REG_P): New macro.
> >   * config/riscv/thead.cc (is_fmemidx_mode): New function.
> >   (th_memidx_classify_address_index): Add support for XTheadFMemIdx.
> >   (th_fmemidx_output_index): New function.
> >   (th_output_move): Add support for XTheadFMemIdx.
> >   * config/riscv/thead.md (TH_M_ANYF): New mode iterator.
> >   (TH_M_NOEXTF): Likewise.
> >   (*th_fmemidx_movsf_hardfloat): New INSN.
> >   (*th_fmemidx_movdf_hardfloat_rv64): Likewise.
> >   (*th_fmemidx_I_a): Likewise.
> >   (*th_fmemidx_I_c): Likewise.
> >   (*th_fmemidx_US_a): Likewise.
> >   (*th_fmemidx_US_c): Likewise.
> >   (*th_fmemidx_UZ_a): Likewise.
> >   (*th_fmemidx_UZ_c): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadfmemidx-index-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-index.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c: New test.
> >   * gcc.target/riscv/xtheadfmemidx-uindex.c: New test.
> > ---
> Same note as with the prior patch WRT wrapping assembly instructions
> when using scan-assembler.

Will do.

>
>
>
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index eb162abcb92..1e9813b4f39 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -372,6 +372,8 @@ ASM_MISA_SPEC
> > ((unsigned int) ((int) (REGNO) - GP_REG_FIRST) < GP_REG_NUM)
> >   #define FP_REG_P(REGNO)  \
> > ((unsigned int) ((int) (REGNO) - FP_REG_FIRST) < FP_REG_NUM)
> > +#define HARDFP_REG_P(REGNO)  \
> > +  ((REGNO) >= FP_REG_FIRST && (REGNO) <= FP_REG_LAST)
> >   #define V_REG_P(REGNO)  \
> > ((unsigned int) ((int) (REGNO) - V_REG_FIRST) < V_REG_NUM)
> >   #define VL_REG_P(REGNO) ((REGNO) == VL_REGNUM)
>
> > @@ -755,6 +768,40 @@ th_memidx_output_index (rtx x, machine_mode mode, bool 
> > load)
> > return buf;
> >   }
> >
> > +/* Provide a buffer for a th.flX/th.fluX/th.fsX/th.fsuX instruction
> > +   for the given MODE. If LOAD is true, a load instruction will be
> > +   provided (otherwise, a store instruction). If X is not suitable
> > +   return NULL.  */
> > +
> > +static const char *
> > +th_fmemidx_output_index (rtx x, machine_mode mode, bool load)
> > +{
> > +  struct riscv_address_info info;
> > +  static char buf[128] = {0};
> Same comment WRT static buffers as in the previous patch.
>
> OK for the trunk after fixing the testcases and potentially adjusting
> the static buffer.  No need to get another review ro
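
For readers following the review: the hazard with returning a static buffer (as th_fmemidx_output_index does above) is that every call hands back the same storage, so a second call silently clobbers the first result. A minimal Python analogy of that aliasing (a shared buffer standing in for C static storage; the instruction strings are invented for illustration):

```python
def format_index(regno, _buf=[]):
    """Format an instruction string into a shared buffer.

    The mutable default argument plays the role of the C
    'static char buf[128]': every call returns the very same object.
    """
    _buf.clear()
    _buf.append("th.flrd\tfa0,(a%d)" % regno)
    return _buf


first = format_index(1)
second = format_index(2)

# Both names alias one buffer, so the first result has been clobbered.
assert first is second
assert first[0] == "th.flrd\tfa0,(a2)"
```

In the GCC context this is safe only as long as each caller consumes the string before the next call; holding on to two results at once is the failure mode being pointed at.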

Re: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-10-31 Thread Richard Biener
On Sun, Oct 8, 2023 at 6:40 PM Di Zhao OS  wrote:
>
> Attached is a new version of the patch.
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, October 6, 2023 5:33 PM
> > To: Di Zhao OS 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > get_reassociation_width
> >
> > On Thu, Sep 14, 2023 at 2:43 PM Di Zhao OS
> >  wrote:
> > >
> > > This is a new version of the patch on "nested FMA".
> > > Sorry for updating this after so long, I've been studying and
> > > writing micro cases to sort out the cause of the regression.
> >
> > Sorry for taking so long to reply.
> >
> > > First, following previous discussion:
> > > (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html)
> > >
> > > 1. From testing more altered cases, I don't think the
> > > problem is that reassociation works locally. In that:
> > >
> > >   1) On the example with multiplications:
> > >
> > > tmp1 = a + c * c + d * d + x * y;
> > > tmp2 = x * tmp1;
> > > result += (a + c + d + tmp2);
> > >
> > >   Given "result" rewritten by width=2, the performance is
> > >   worse if we rewrite "tmp1" with width=2. In contrast, if we
> > >   remove the multiplications from the example (and make "tmp1"
> > >   not single-used), and still rewrite "result" by width=2, then
> > >   rewriting "tmp1" with width=2 is better. (This makes sense because
> > >   the tree's depth at "result" is still smaller if we rewrite
> > >   "tmp1".)
> > >
> > >   2) I tried to modify the assembly code of the example without
> > >   FMA, so the width of "result" is 4. On Ampere1 there's no
> > >   obvious improvement. So although this is an interesting
> > >   problem, it doesn't seem like the cause of the regression.
> >
> > OK, I see.
> >
> > > 2. From the assembly code of the case with FMA, one problem is
> > > that rewriting "tmp1" to parallel didn't decrease the
> > > minimum CPU cycles (taking MULT_EXPRs into account), but
> > > increased code size, so the overhead is increased.
> > >
> > >a) When "tmp1" is not re-written to parallel:
> > > fmadd d31, d2, d2, d30
> > > fmadd d31, d3, d3, d31
> > > fmadd d31, d4, d5, d31  //"tmp1"
> > > fmadd d31, d31, d4, d3
> > >
> > >b) When "tmp1" is re-written to parallel:
> > > fmul  d31, d4, d5
> > > fmadd d27, d2, d2, d30
> > > fmadd d31, d3, d3, d31
> > > fadd  d31, d31, d27 //"tmp1"
> > > fmadd d31, d31, d4, d3
> > >
> > > For version a), there are 3 dependent FMAs to calculate "tmp1".
> > > For version b), there are also 3 dependent instructions in the
> > > longer path: the 1st, 3rd and 4th.
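
The dependent-instruction counting above can be sketched as a dependence-chain computation (a toy model; instruction names and dependencies are transcribed from the assembly just quoted, and uniform per-instruction latency is assumed). It reproduces the observation: the rewritten form has the same critical-path length but one more instruction.

```python
def chain_depth(instrs):
    """Length, in instructions, of the longest dependence chain.

    instrs is a list of (name, deps) pairs in issue order.
    """
    depth = {}
    for name, deps in instrs:
        depth[name] = 1 + max((depth[d] for d in deps), default=0)
    return max(depth.values())

# Version a): "tmp1" computed by three dependent FMAs, then the final FMA.
version_a = [
    ("fmadd1", []),
    ("fmadd2", ["fmadd1"]),
    ("fmadd3", ["fmadd2"]),            # "tmp1"
    ("fmadd4", ["fmadd3"]),
]

# Version b): "tmp1" rewritten to the parallel form.
version_b = [
    ("fmul",   []),
    ("fmadd1", []),
    ("fmadd2", ["fmul"]),
    ("fadd",   ["fmadd2", "fmadd1"]),  # "tmp1"
    ("fmadd3", ["fadd"]),
]

# Same critical-path length, one more instruction: latency is not
# reduced, but code size (and hence overhead) grows.
assert chain_depth(version_a) == 4
assert chain_depth(version_b) == 4
assert len(version_b) == len(version_a) + 1
```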
> >
> > Yes, it doesn't really change anything.  The patch has
> >
> > +  /* If there's code like "acc = a * b + c * d + acc" in a tight loop, some
> > + uarchs can execute results like:
> > +
> > +   _1 = a * b;
> > +   _2 = .FMA (c, d, _1);
> > +   acc_1 = acc_0 + _2;
> > +
> > + in parallel, while turning it into
> > +
> > +   _1 = .FMA(a, b, acc_0);
> > +   acc_1 = .FMA(c, d, _1);
> > +
> > + hinders that, because then the first FMA depends on the result
> > of preceding
> > + iteration.  */
> >
> > I can't see what can be run in parallel for the first case.  The .FMA
> > depends on the multiplication a * b.  Iff the uarch somehow decomposes
> > .FMA into multiply + add then the c * d multiply could run in parallel
> > with the a * b multiply which _might_ be able to hide some of the
> > latency of the full .FMA.  Like on x86 Zen FMA has a latency of 4
> > cycles but a multiply only 3.  But I never got confirmation from any
> > of the CPU designers that .FMAs are issued when the multiply
> > operands are ready and the add operand can be forwarded.
> >
> > I also wonder why the multiplications of the two-FMA sequence
> > then cannot be executed at the same time?  So I have some doubt
> > of the theory above.
>
> The parallel execution for the code snippet above was the other
> issue (previously discussed here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628960.html).
> Sorry, it's a bit confusing to include that here, but these 2 fixes
> need to be combined to avoid new regressions. Since considering
> FMA in get_reassociation_width produces more results of width=1,
> there would be more loop-dependent FMA chains.
>
> > Iff this really is the reason for the sequence to execute with lower
> > overall latency and we want to attack this on GIMPLE then I think
> > we need a target hook telling us this fact (I also wonder if such
> > behavior can be modeled in the scheduler pipeline description at all?)
> >
> > > So it seems to me the current get_reassociation_width algorithm
> > > isn't optimal in the presence of FMA. So I modified the patch to
> > > improve get_reassociation_width, rather than check for code
> > > patterns. (Although there could be some other complicated
> > > factors so the regressi

Add OpenACC 'acc_map_data' variant to 'libgomp.oacc-c-c++-common/deep-copy-8.c' (was: [PATCH 11/13] OpenACC 2.6 deep copy: C and C++ execution tests)

2023-10-31 Thread Thomas Schwinge
Hi!

On 2019-12-17T22:04:54-0800, Julian Brown  wrote:
> This patch has been broken out of the "OpenACC 2.6 manual deep copy
> support" patch, [...]

> This part adds C and C++ execution tests to libgomp.

Pushed to master branch commit 3e888f94624294d2b9b34ebfee0916768e5d9c3f
"Add OpenACC 'acc_map_data' variant to 
'libgomp.oacc-c-c++-common/deep-copy-8.c'",
see attached.

This will be helpful to detect (and then avoid) a regression due to a
libgomp patch elsewhere.


Grüße
 Thomas


>  .../testsuite/libgomp.oacc-c++/deep-copy-12.C | 72 +++
>  .../testsuite/libgomp.oacc-c++/deep-copy-13.C | 72 +++
>  .../libgomp.oacc-c-c++-common/deep-copy-1.c   | 24 +
>  .../libgomp.oacc-c-c++-common/deep-copy-10.c  | 53 +++
>  .../libgomp.oacc-c-c++-common/deep-copy-11.c  | 72 +++
>  .../libgomp.oacc-c-c++-common/deep-copy-14.c  | 63 ++
>  .../libgomp.oacc-c-c++-common/deep-copy-2.c   | 29 +++
>  .../libgomp.oacc-c-c++-common/deep-copy-4.c   | 87 +++
>  .../libgomp.oacc-c-c++-common/deep-copy-6.c   | 59 +
>  .../libgomp.oacc-c-c++-common/deep-copy-7.c   | 45 ++
>  .../libgomp.oacc-c-c++-common/deep-copy-8.c   | 54 
>  .../libgomp.oacc-c-c++-common/deep-copy-9.c   | 53 +++


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 3e888f94624294d2b9b34ebfee0916768e5d9c3f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 30 Oct 2023 17:11:40 +0100
Subject: [PATCH] Add OpenACC 'acc_map_data' variant to
 'libgomp.oacc-c-c++-common/deep-copy-8.c'

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: Add OpenACC
	'acc_map_data' variant.
---
 .../libgomp.oacc-c-c++-common/deep-copy-8.c   | 29 +--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c
index 1b4cf2fb684..e705f78c311 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c
@@ -12,8 +12,8 @@ struct dc
   int *d;
 };
 
-int
-main ()
+static void
+test (unsigned variant)
 {
   int n = 100, i, j, k;
   struct dc v = { .a = 3 };
@@ -22,7 +22,16 @@ main ()
   v.c = (int *) malloc (sizeof (int) * n);
   v.d = (int *) malloc (sizeof (int) * n);
 
+  if (variant & 1)
+{
 #pragma acc enter data copyin(v)
+}
+  else
+{
+  void *v_d = acc_malloc (sizeof v);
+  acc_map_data (&v, v_d, sizeof v);
+  acc_memcpy_to_device (v_d, &v, sizeof v);
+}
 
   for (k = 0; k < 16; k++)
 {
@@ -46,9 +55,25 @@ main ()
   assert (!acc_is_present (v.d, sizeof (int) * n));
 }
 
+  if (variant & 1)
+{
 #pragma acc exit data copyout(v)
+}
+  else
+{
+  void *v_d = acc_deviceptr (&v);
+  acc_unmap_data (&v);
+  acc_free (v_d);
+}
 
   assert (!acc_is_present (&v, sizeof (v)));
+}
+
+int
+main ()
+{
+  for (unsigned variant = 0; variant < 2; ++variant)
+test (variant);
 
   return 0;
 }
-- 
2.34.1



Re: [PATCH, OpenACC 2.7] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2023-10-31 Thread Thomas Schwinge
Hi Chung-Lin!

On 2023-06-22T18:03:37+0800, Chung-Lin Tang via Gcc-patches 
 wrote:
> This patch adjusts the implementation of acc_map_data/acc_unmap_data API 
> library
> routines to more fit the description in the OpenACC 2.7 specification.

Thanks!

> Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
> special value to mark acc_map_data-created mappings, and allow adjustment of
> dynamic_refcount of such mappings by other constructs. Enforcing of an initial
> value of 1 for such mappings, and only allowing acc_unmap_data to delete such
> mappings, is implemented as specified.
>
> Actually, there is no real change (or improvement) in behavior of the API
> (thus
> no new tests). I've looked at the related OpenACC spec issues, and it seems
> that
> this part of the 2.7 spec change is mostly a clarification (see no downside in
> current REFCOUNT_INFINITY based implementation either).
> But this patch does make the internals more close to the spec description.

ACK, thanks.

> Tested without regressions using powerpc64le-linux/nvptx, okay for trunk?

A few comments, should be easy to work in:

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -1166,6 +1166,8 @@ struct target_mem_desc;
>  /* Special value for refcount - tgt_offset contains target address of the
> artificial pointer to "omp declare target link" object.  */
>  #define REFCOUNT_LINK (REFCOUNT_SPECIAL | 1)
> +/* Special value for refcount - created through acc_map_data.  */
> +#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2)
>
>  /* Special value for refcount - structure element sibling list items.
> All such key refounts have REFCOUNT_STRUCTELEM bits set, with _FLAG_FIRST

> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c
> @@ -411,7 +411,8 @@ acc_map_data (void *h, void *d, size_t s)
>assert (n->refcount == 1);
>assert (n->dynamic_refcount == 0);
>/* Special reference counting behavior.  */
> -  n->refcount = REFCOUNT_INFINITY;
> +  n->refcount = REFCOUNT_ACC_MAP_DATA;
> +  n->dynamic_refcount = 1;
>
>if (profiling_p)
>   {
> @@ -460,7 +461,7 @@ acc_unmap_data (void *h)
>   the different 'REFCOUNT_INFINITY' cases, or simply separate
>   'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA'
>   etc.)?  */
> -  else if (n->refcount != REFCOUNT_INFINITY)
> +  else if (n->refcount != REFCOUNT_ACC_MAP_DATA)
>  {
>gomp_mutex_unlock (&acc_dev->lock);
>gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped"

Thus remove the TODO comment before this 'else if' block?  :-)

We should add a comment here that we're unmapping without consideration
of 'n->dynamic_refcount' (that is, 'acc_unmap_data' has implicit
'finalize' semantics -- at least per my reading of the specification; do
you agree?), that is:

acc_map_data([var]); // 'dynamic_refcount = 1'
acc_copyin([var]); // 'dynamic_refcount++'
acc_unmap_data([var]); // does un-map, despite 'dynamic_refcount == 2'?
assert (!acc_is_present([var]));

Do we have such a test case?  If not, please add one.
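
A toy model of the reference counting under discussion may help make the intended semantics concrete (plain Python, not libgomp code; it assumes the implicit-'finalize' reading of acc_unmap_data described above, and the method names are loose stand-ins for the libgomp internals):

```python
REFCOUNT_ACC_MAP_DATA = object()  # stand-in for the special refcount value


class Mapping:
    """Minimal model of a mapping created by acc_map_data."""

    def __init__(self):
        self.refcount = REFCOUNT_ACC_MAP_DATA
        self.dynamic_refcount = 1  # initial value per the patch

    def copyin(self):  # roughly goacc_map_var_existing
        if self.refcount is not REFCOUNT_ACC_MAP_DATA:
            self.refcount += 1
        self.dynamic_refcount += 1

    def exit_data(self, finalize=False):  # roughly goacc_exit_datum_1
        if finalize:
            # acc_map_data mappings return to dynamic_refcount == 1;
            # only acc_unmap_data may delete them.
            self.dynamic_refcount = 1
        elif self.dynamic_refcount > 1:
            self.dynamic_refcount -= 1

    def unmap(self):  # acc_unmap_data with implicit 'finalize' semantics
        assert self.dynamic_refcount >= 1
        return "unmapped"


m = Mapping()
m.copyin()                      # dynamic_refcount -> 2
assert m.dynamic_refcount == 2
assert m.unmap() == "unmapped"  # un-maps despite dynamic_refcount == 2
```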

To complement 'goacc_exit_datum_1' (see below), we should add here:

assert (n->dynamic_refcount >= 1);

The subsequent code:

if (tgt->refcount == REFCOUNT_INFINITY)
  {
gomp_mutex_unlock (&acc_dev->lock);
gomp_fatal ("cannot unmap target block");
  }

... is now unreachable, I think, and may thus be removed -- and any
inconsistency is caught by the subsequent:

/* Above, we've verified that the mapping must have been set up by
   'acc_map_data'.  */
assert (tgt->refcount == 1);

> @@ -519,7 +520,8 @@ goacc_map_var_existing (struct gomp_device_descr 
> *acc_dev, void *hostaddr,
>  }
>
>assert (n->refcount != REFCOUNT_LINK);
> -  if (n->refcount != REFCOUNT_INFINITY)
> +  if (n->refcount != REFCOUNT_INFINITY
> +  && n->refcount != REFCOUNT_ACC_MAP_DATA)
>  n->refcount++;
>n->dynamic_refcount++;
>
> @@ -683,6 +685,7 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, 
> void *h, size_t s,
>
>assert (n->refcount != REFCOUNT_LINK);
>if (n->refcount != REFCOUNT_INFINITY
> +  && n->refcount != REFCOUNT_ACC_MAP_DATA
>&& n->refcount < n->dynamic_refcount)
>  {
>gomp_mutex_unlock (&acc_dev->lock);
> @@ -691,15 +694,27 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, 
> void *h, size_t s,
>
>if (finalize)
>  {
> -  if (n->refcount != REFCOUNT_INFINITY)
> +  if (n->refcount != REFCOUNT_INFINITY
> +   && n->refcount != REFCOUNT_ACC_MAP_DATA)
>   n->refcount -= n->dynamic_refcount;
> -  n->dynamic_refcount = 0;
> +
> +  if (n->refcount == REFCOUNT_ACC_MAP_DATA)
> + /* Mappings created by acc_map_data are returned to initial
> +dynamic_refcount of 1. Can only be deleted by acc_unmap_data.  */
> + n->dynamic_refcount = 1;
> +  else
> + n->dynamic_refcount = 0;
>  }
>else if

RE: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-10-31 Thread Li, Pan2
Passed the x86 bootstrap and regression tests.

Pan

-Original Message-
From: Juzhe-Zhong  
Sent: Tuesday, October 31, 2023 5:59 PM
To: gcc-patches@gcc.gnu.org
Cc: rguent...@suse.de; jeffreya...@gmail.com; richard.sandif...@arm.com; 
rdapp@gmail.com; Juzhe-Zhong 
Subject: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN

As Richard previously suggested, we should support strided load/store in
the loop vectorizer instead of hacking the RISC-V backend.

This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.

The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store, but
with the vector offset changed into a scalar stride.

We don't have strided_load/strided_store and
mask_strided_load/mask_strided_store since it is unlikely RVV will have such
optabs, and we can't add patterns that we are unable to test.
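
The relationship to gather/scatter described above can be sketched numerically: a strided access is a gather whose offset vector is a vec_series with base 0 and step equal to the stride. A small Python model of the element addressing (mask/len handling omitted; function names are illustrative):

```python
def gather_load(memory, base, offsets, scale=1):
    """Load memory[base + offsets[i] * scale] for each element i."""
    return [memory[base + off * scale] for off in offsets]


def strided_load(memory, base, stride, n, scale=1):
    """Strided load as the vec_series special case of a gather."""
    offsets = [i * stride for i in range(n)]  # vec_series (0, stride)
    return gather_load(memory, base, offsets, scale)


memory = list(range(100))
# Loading 4 elements with stride 3 from base 10 is the same as a
# gather with offsets [0, 3, 6, 9].
assert strided_load(memory, 10, 3, 4) == [10, 13, 16, 19]
assert strided_load(memory, 10, 3, 4) == gather_load(memory, 10, [0, 3, 6, 9])
```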


gcc/ChangeLog:

* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
* internal-fn.cc (internal_load_fn_p): Ditto.
(internal_strided_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_strided_fn_supported_p): Ditto.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
(MASK_LEN_STRIDED_STORE): Ditto.
* internal-fn.h (internal_strided_fn_p): Ditto.
(internal_strided_fn_supported_p): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 51 +
 gcc/internal-fn.cc  | 44 ++
 gcc/internal-fn.def |  4 
 gcc/internal-fn.h   |  2 ++
 gcc/optabs.def  |  2 ++
 5 files changed, 103 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..5bac713a0dd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}@var{n}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of mode 
@var{n}.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 3 is 1 and sign extension if operand 3 is zero;
+@item
+multiply the extended stride by operand 4;
+@item
+add the result to the base; and
+@item
+load the value at that address (operand 1 + @var{i} * multiplied and extended 
stride) into element @var{i} of operand 0.
+@end itemize
+
+Similar to mask_len_load, the instruction loads at most (operand 6 + operand 
7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5157,31 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}@var{n}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of mode 
@var{n}.
+Operand 2 is the vector of values that should be stored, which is of mode 
@var{m}.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 2 is 1 and sign extension if operand 2 is zero;
+@item
+multiply the extended stride by operand 3;
+@item
+add the result to the base; and
+@item
+store element @var{i} of operand 4 to that address (operand 1 + @var{i} * 
multiplied and extended stride).
+@end itemize
+
+Similar to mask_len_store, the instruction stores at most (operand 6 + operand 
7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
+Mask e

RE: [PATCH] VECT: Support mask_len_strided_load/mask_len_strided_store in loop vectorize

2023-10-31 Thread Li, Pan2
Passed the x86 bootstrap and regression tests.

Pan

-Original Message-
From: Juzhe-Zhong  
Sent: Tuesday, October 31, 2023 6:08 PM
To: gcc-patches@gcc.gnu.org
Cc: richard.sandif...@arm.com; rguent...@suse.de; jeffreya...@gmail.com; 
Juzhe-Zhong 
Subject: [PATCH] VECT: Support mask_len_strided_load/mask_len_strided_store in 
loop vectorize

This patch support loop vectorizer generate direct strided load/store IFN
if targets enable it.

Note that this patch provides the ability for targets that enable strided
load/store, but lack gather/scatter, to vectorize strided memory accesses.

gcc/ChangeLog:

* optabs-query.cc (supports_vec_gather_load_p): Support strided 
load/store.
(supports_vec_scatter_store_p): Ditto.
* optabs-query.h (supports_vec_gather_load_p): Ditto.
(supports_vec_scatter_store_p): Ditto.
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
(vect_check_gather_scatter): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vect_truncate_gather_scatter_offset): Ditto.
(vect_use_strided_gather_scatters_p): Ditto.
(vect_get_strided_load_store_ops): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Ditto.

---
 gcc/optabs-query.cc| 27 ++-
 gcc/optabs-query.h |  4 +--
 gcc/tree-vect-data-refs.cc | 71 --
 gcc/tree-vect-stmts.cc | 46 +---
 gcc/tree-vectorizer.h  |  3 +-
 5 files changed, 109 insertions(+), 42 deletions(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 947ccef218c..ea594baf15d 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -670,14 +670,19 @@ supports_vec_convert_optab_p (optab op, machine_mode mode)
for at least one vector mode.  */
 
 bool
-supports_vec_gather_load_p (machine_mode mode)
+supports_vec_gather_load_p (machine_mode mode, bool strided_p)
 {
   if (!this_fn_optabs->supports_vec_gather_load[mode])
 this_fn_optabs->supports_vec_gather_load[mode]
   = (supports_vec_convert_optab_p (gather_load_optab, mode)
-|| supports_vec_convert_optab_p (mask_gather_load_optab, mode)
-|| supports_vec_convert_optab_p (mask_len_gather_load_optab, mode)
-? 1 : -1);
+|| supports_vec_convert_optab_p (mask_gather_load_optab, mode)
+|| supports_vec_convert_optab_p (mask_len_gather_load_optab, mode)
+|| (strided_p
+&& convert_optab_handler (mask_len_strided_load_optab, mode,
+  Pmode)
+ != CODE_FOR_nothing)
+  ? 1
+  : -1);
 
   return this_fn_optabs->supports_vec_gather_load[mode] > 0;
 }
@@ -687,14 +692,20 @@ supports_vec_gather_load_p (machine_mode mode)
for at least one vector mode.  */
 
 bool
-supports_vec_scatter_store_p (machine_mode mode)
+supports_vec_scatter_store_p (machine_mode mode, bool strided_p)
 {
   if (!this_fn_optabs->supports_vec_scatter_store[mode])
 this_fn_optabs->supports_vec_scatter_store[mode]
   = (supports_vec_convert_optab_p (scatter_store_optab, mode)
-|| supports_vec_convert_optab_p (mask_scatter_store_optab, mode)
-|| supports_vec_convert_optab_p (mask_len_scatter_store_optab, mode)
-? 1 : -1);
+|| supports_vec_convert_optab_p (mask_scatter_store_optab, mode)
+|| supports_vec_convert_optab_p (mask_len_scatter_store_optab,
+ mode)
+|| (strided_p
+&& convert_optab_handler (mask_len_strided_store_optab, mode,
+  Pmode)
+ != CODE_FOR_nothing)
+  ? 1
+  : -1);
 
   return this_fn_optabs->supports_vec_scatter_store[mode] > 0;
 }
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 920eb6a1b67..7c22edc5a78 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -191,8 +191,8 @@ bool can_compare_and_swap_p (machine_mode, bool);
 bool can_atomic_exchange_p (machine_mode, bool);
 bool can_atomic_load_p (machine_mode);
 bool lshift_cheap_p (bool);
-bool supports_vec_gather_load_p (machine_mode = E_VOIDmode);
-bool supports_vec_scatter_store_p (machine_mode = E_VOIDmode);
+bool supports_vec_gather_load_p (machine_mode = E_VOIDmode, bool = false);
+bool supports_vec_scatter_store_p (machine_mode = E_VOIDmode, bool = false);
 bool can_vec_extract (machine_mode, machine_mode);
 
 /* Version of find_widening_optab_handler_and_mode that operates on
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2..d374849b0a7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3913,9 +3913,9 @@ vect_prune_runtime_alias_test_list (loop_vec_info 
loop_vinfo)
*IFN_OUT and the vector type for the offset in *OFFSET_VECTYPE_OUT.  */
 
 bool
-vect_g

RE: [PATCH v3] VECT: Refine the type size restriction of call vectorizer

2023-10-31 Thread Li, Pan2
> can you instead amend vectorizable_internal_function to contain the check,
> returning IFN_LAST if it doesn't hold?

Sure, will send v4 for this.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, October 31, 2023 8:58 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com; Liu, Hongtao 

Subject: Re: [PATCH v3] VECT: Refine the type size restriction of call 
vectorizer

On Mon, Oct 30, 2023 at 1:23 PM  wrote:
>
> From: Pan Li 
>
> Update in v3:
>
> * Add func to predicate type size is legal or not for vectorizer call.
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectorizable_call function has one restriction on the size of the data
> type. Aka, DF to DI is allowed but SF to DI isn't. You may see the below
> message when trying to vectorize a function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
> out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
>
> The type size of vectype_out needs to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
>
> The below tests passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
> func impl to predicate the type size is legal or not.
> (vectorizable_call): Leverage vectorizable_type_size_legal_p.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-stmts.cc | 51 +++---
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..24b3448d961 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree 
> fndecl,
>return IFN_LAST;
>  }
>
> +/* Return TRUE when the type size is legal for the call vectorizer,
> +   or FALSE.
> +   The type size of both the vectype_in and vectype_out should be
> +   exactly the same when vectype_out isn't participating the optab.
> +   While there is no restriction for type size when vectype_out
> +   is part of the optab query.
> + */
> +static bool
> +vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
> +   tree vectype_in)
> +{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> +
> +  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
> +return same_size_p;
> +
> +  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
> +
> +  if (!difn_info.vectorizable)
> +return same_size_p;
> +
> +  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
> + the vectype_out participating the optable selection.  Aka the type size
> + check can be skipped here.  */
> +  if (difn_info.type0 < 0 || difn_info.type1 < 0)
> +return true;

can you instead amend vectorizable_internal_function to contain the check,
returning IFN_LAST if it doesn't hold?

> +
> +  return same_size_p;
> +}
>
>  static tree permute_vec_elements (vec_info *, tree, tree, tree, 
> stmt_vec_info,
>   gimple_stmt_iterator *);
> @@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
>
>return false;
>  }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> - just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> - by a pack of the two vectors into an SI vector.  We would need
> - separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -{
> -  if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"mismatched vector sizes %T and %T\n",
> -vectype_in, vectype_out);
> -  return false;
> -}
>
>if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>!= VECTOR_BOOLEAN_TYPE_P (vectype_in))
> @@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
>  ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>   vectype_in);
>
> +  if (!vectorizable_type_size_

[Patch, fortran] PR64120

2023-10-31 Thread Paul Richard Thomas
I found this 'obvious' fix while going through PRs assigned to me.

Regtests. OK for mainline?

Cheers

Paul


Fortran: Allocatable automatic charlen must not be saved [PR64120].

2023-10-31  Paul Thomas  

gcc/fortran
PR fortran/64120
* trans-decl.cc (gfc_trans_deferred_vars): Detect automatic
character length and allow allocatable variants to be nullified
on scope entry and freed on scope exit. Remove trailing white
space.

gcc/testsuite/
PR fortran/64120
* gfortran.dg/pr64120_2.f90: New test.
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index a3f037bd07b..5e0e78ace40 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -4689,9 +4689,14 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 && (sym->ts.u.derived->attr.alloc_comp
 || gfc_is_finalizable (sym->ts.u.derived,
 			   NULL));
+  bool automatic_char_len;
   if (sym->assoc)
 	continue;
 
+  automatic_char_len = sym->ts.type == BT_CHARACTER
+			   && sym->ts.u.cl && sym->ts.u.cl->length
+			   && sym->ts.u.cl->length->expr_type == EXPR_VARIABLE;
+
   /* Set the vptr of unlimited polymorphic pointer variables so that
 	 they do not cause segfaults in select type, when the selector
 	 is an intrinsic type.  */
@@ -4951,7 +4956,8 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 		|| (sym->ts.type == BT_CLASS
 			&& CLASS_DATA (sym)->attr.allocatable)))
 	{
-	  if (!sym->attr.save && flag_max_stack_var_size != 0)
+	  if ((!sym->attr.save || automatic_char_len)
+	   && flag_max_stack_var_size != 0)
 	{
 	  tree descriptor = NULL_TREE;
 
@@ -5210,8 +5216,8 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 	tree tmp = lookup_attribute ("omp allocate",
  DECL_ATTRIBUTES (n->sym->backend_decl));
 	tmp = TREE_VALUE (tmp);
-	TREE_PURPOSE (tmp) = se.expr;	
-	TREE_VALUE (tmp) = align;	
+	TREE_PURPOSE (tmp) = se.expr;
+	TREE_VALUE (tmp) = align;
 	TREE_PURPOSE (TREE_CHAIN (tmp)) = init_stmtlist;
 	TREE_VALUE (TREE_CHAIN (tmp)) = cleanup_stmtlist;
   }
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
!
! Test fix of second testcase in PR64120.
! The first testcase is allocatable_scalar_14.f90.
!
! Contributed by Francois-Xavier Coudert  
!
program test
   logical :: L
   L = g(1)
   write(*,*) L
   L = g(2)
   write(*,*) L
contains
  logical function g(x)
  integer :: x
  character(len=x), allocatable :: s
  save
  if(.NOT.allocated(s)) then
allocate(s)
g = .FALSE.
  else
g = .TRUE.
  end if
  write(*,*) len(s)
  end function g
end
! { dg-final { scan-tree-dump-times "s = 0B;" 2 "original" } }
! { dg-final { scan-tree-dump-times "__builtin_free" 1 "original" } }


[PATCH v4] VECT: Refine the type size restriction of call vectorizer

2023-10-31 Thread pan2 . li
From: Pan Li 

Update in v4:

* Append the check to vectorizable_internal_function.

Update in v3:

* Add a function to predicate whether the type size is legal for a vectorizer call.

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectorizable_call function has one restriction on the size of the data
type. Aka, DF to DI is allowed but SF to DI isn't. You may see the below
message when trying to vectorize a function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

As a result, a standard name pattern like lrintmn2 cannot work for
mismatched data type sizes like SF => DI. This patch refines the data
type size check and unblocks standard names like lrintmn2 under certain
conditions.

The type size of vectype_out needs to be exactly the same as the type
size of vectype_in when the vectype_out size isn't participating in
the optab selection, while there is no such restriction when
vectype_out is part of the optab query.

The below tests have passed for this patch.

* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.

The below tests are ongoing.

* The x86 bootstrap and regression test.
* The aarch64 regression test.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_internal_function): Add type
size check for vectype_out doesn't participating for optab query.
(vectorizable_call): Remove the type size check.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-stmts.cc | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..799b4ab10c7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree 
fndecl,
   const direct_internal_fn_info &info = direct_internal_fn (ifn);
   if (info.vectorizable)
{
+ bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+
+ /* The type size of both the vectype_in and vectype_out should be
+exactly the same when vectype_out isn't participating the optab.
+While there is no restriction for type size when vectype_out
+is part of the optab query.  */
+ if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
+   return IFN_LAST;
+
  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
  OPTIMIZE_FOR_SPEED))
return ifn;
@@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
 
   return false;
 }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
- just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
- are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
- by a pack of the two vectors into an SI vector.  We would need
- separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"mismatched vector sizes %T and %T\n",
-vectype_in, vectype_out);
-  return false;
-}
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
   != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.34.1



Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Carl Love
On Tue, 2023-10-31 at 10:34 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/10/31 08:08, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch adds tests for two of the rs6000 overloaded
> > built-
> > ins that do not have tests.  Additionally the GCC documentation
> > file
> 
> I just found that actually they have the test coverage, because we
> have
> 
> #define __builtin_bcdcmpeq(a,b)   __builtin_vec_bcdsub_eq(a,b,0)
> #define __builtin_bcdcmpgt(a,b)   __builtin_vec_bcdsub_gt(a,b,0)
> #define __builtin_bcdcmplt(a,b)   __builtin_vec_bcdsub_lt(a,b,0)
> #define __builtin_bcdcmpge(a,b)   __builtin_vec_bcdsub_ge(a,b,0)
> #define __builtin_bcdcmple(a,b)   __builtin_vec_bcdsub_le(a,b,0)
> 
> in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all
> these

OK, my simple scripts are not going to pick up the stuff in altivec.h.
They were just grepping for the built-in name in the test file
directory.

> __builtin_bcdcmp* ...
> 
> > doc/extend.texi is updated to include the built-in definitions as
> > they
> > were missing.
> 
> ... since we already document __builtin_vec_bcdsub_{eq,gt,lt}, I
> think
> it's still good to supplement the documentation and add the explicit
> testing cases.
> 
> > The patch has been tested on a Power 10 system with no
> > regressions. 
> > Please let me know if this patch is acceptable for mainline.
> > 
> >  Carl
> > 
> > ---
> > rs6000, Add missing overloaded bcd builtin tests
> > 
> > The two BCD overloaded built-ins __builtin_bcdsub_ge and
> > __builtin_bcdsub_le
> > do not have a corresponding test.  Add tests to existing test file
> > and update
> > the documentation with the built-in definitions.
> 
> As above, this commit log doesn't describe the actuality well, please
> update
> it with something like:
> 
> Currently we have the documentation for
> __builtin_vec_bcdsub_{eq,gt,lt} but
> not for __builtin_bcdsub_[gl]e, this patch is to supplement the
> descriptions
> for them.  Although they are mainly for __builtin_bcdcmp{ge,le}, we
> already
> have some testing coverage for __builtin_vec_bcdsub_{eq,gt,lt}, this
> patch
> adds the corresponding explicit test cases as well.
> 

OK, replaced the commit log with the suggestion.

> > gcc/ChangeLog:
> > * doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge):
> > Add
> > documentation for the builti-ins.
> > 
> > gcc/testsuite/ChangeLog:
> > * bcd-3.c (do_sub_ge, do_suble): Add functions to test builtins
> > __builtin_bcdsub_ge and __builtin_bcdsub_le).
> 
> 1) Unexpected ")" at the end.
> 
> 2) I supposed git gcc-verify would complain on this changelog entry.
> 
> Should be starting with:
> 
>   * gcc.target/powerpc/bcd-3.c (
> 
> , no?
> 

Yes, I meant to run the commit check but obviously got distracted and
didn't.  Sorry about that.

> OK for trunk with the above comments addressed, thanks!
> 
OK, thanks.

Carl 

> BR,
> Kewen
> 
> > ---
> >  gcc/doc/extend.texi  |  4 
> >  gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22
> > +-
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index cf0d0c63cce..fa7402813e7 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned
> > char, vector unsigned char, const int);
> >  vector __int128 __builtin_bcdsub (vector __int128, vector
> > __int128, const int);
> >  vector unsigned char __builtin_bcdsub (vector unsigned char,
> > vector unsigned char,
> > const int);
> > +int __builtin_bcdsub_le (vector __int128, vector __int128, const
> > int);
> > +int __builtin_bcdsub_le (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_lt (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_lt (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_eq (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_eq (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_gt (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_gt (vector unsigned char, vector unsigned
> > char, const int);
> > +int __builtin_bcdsub_ge (vector __int128, vector __int128, const
> > int);
> > +int __builtin_bcdsub_ge (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_ov (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_ov (vector unsigned char, vector unsigned
> > char, const int);
> >  @end smallexample
> > diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-3.c
> > b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
> > index 7948a0c95e2..9891f4ff08e 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/bcd-3.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
> 

[pushed][PR111971][RA]: Fixing LRA cycling for multi-reg variable containing a fixed reg

2023-10-31 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971

Successfully bootstrapped and tested on x86-64, aarch64, pp64le.

commit df111406b4ea1fe2890e94d51655e571cf260d29
Author: Vladimir N. Makarov 
Date:   Tue Oct 31 10:54:43 2023 -0400

[RA]: Fixing LRA cycling for multi-reg variable containing a fixed reg

The PR111971 test case uses a multi-reg variable containing a fixed reg.  LRA
rejects such a multi-reg value when matching the constraint for
an asm insn; the rejection results in LRA cycling.  The patch fixes this issue.

gcc/ChangeLog:

PR rtl-optimization/111971
* lra-constraints.cc: (process_alt_operands): Don't check start
hard regs for regs originated from register variables.

gcc/testsuite/ChangeLog:

PR rtl-optimization/111971
* gcc.target/powerpc/pr111971.c: New test.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index d10a2a3dc51..0607c8be7cb 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2609,12 +2609,15 @@ process_alt_operands (int only_alternative)
 		  winreg = true;
 		  if (REG_P (op))
 		{
+		  tree decl;
 		  if (hard_regno[nop] >= 0
 			  && in_hard_reg_set_p (this_alternative_set,
 		mode, hard_regno[nop])
-			  && !TEST_HARD_REG_BIT
-			  (this_alternative_exclude_start_hard_regs,
-			   hard_regno[nop]))
+			  && ((REG_ATTRS (op) && (decl = REG_EXPR (op)) != NULL
+			   && VAR_P (decl) && DECL_HARD_REGISTER (decl))
+			  || !(TEST_HARD_REG_BIT
+   (this_alternative_exclude_start_hard_regs,
+hard_regno[nop]
 			win = true;
 		  else if (hard_regno[nop] < 0
 			   && in_class_p (op, this_alternative, NULL))
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111971.c b/gcc/testsuite/gcc.target/powerpc/pr111971.c
new file mode 100644
index 000..7f058bd4820
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111971.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+void
+foo (unsigned long long *a)
+{
+  register long long d asm ("r0") = 0x24;
+  long long n;
+  asm ("mr %0, %1" : "=r"(n) : "r"(d));
+  *a++ = n;
+}


Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Segher Boessenkool
On Tue, Oct 31, 2023 at 08:31:25AM -0700, Carl Love wrote:
> > I just found that actually they have the test coverage, because we
> > have
> > 
> > #define __builtin_bcdcmpeq(a,b)   __builtin_vec_bcdsub_eq(a,b,0)
> > #define __builtin_bcdcmpgt(a,b)   __builtin_vec_bcdsub_gt(a,b,0)
> > #define __builtin_bcdcmplt(a,b)   __builtin_vec_bcdsub_lt(a,b,0)
> > #define __builtin_bcdcmpge(a,b)   __builtin_vec_bcdsub_ge(a,b,0)
> > #define __builtin_bcdcmple(a,b)   __builtin_vec_bcdsub_le(a,b,0)
> > 
> > in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all
> > these
> 
> OK, my simple scripts are not going to pick up the stuff in altivec.h.
> They were just grepping for the built-in name in the test file
> directory.

You could use gcov to see which rs6000 builtins are not exercised by
anything in the testsuite, maybe.  This probably can be automated pretty
nicely.


Segher


RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Qing Zhao
Hi, 

I wrote a summary based on our extensive discussion; hopefully it can
serve as an informal proposal.

Please take a look at it and let me know any comment or suggestion.

There are some (???) marks in sections 3.2 and 3.6; those are my questions
seeking help.  -:)

Thanks again for all the help.

Qing.


Represent the missing dependence for the "counted_by" attribute and its 
consumers 

Qing Zhao

10/30/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html

1. The problem

There is a data dependency between the size assignment and the implicit use of
the size information in __builtin_dynamic_object_size that is missing from
the IL (lines 11 and 13 in the example below).  This missing information
can result in incorrect code reordering and other invalid code transformations.

  1 struct A
  2 {
  3  size_t size;
  4  char buf[] __attribute__((counted_by(size)));
  5 };
  6 
  7 size_t 
  8 foo (size_t sz)
  9 {
 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 11  obj->size = sz;
 12  obj->buf[0] = 2;
 13  return __builtin_dynamic_object_size (obj->buf, 1);
 14 }
  
Please see a more complicated example in Appendix 1.

We need to represent such data dependency correctly in the IL. 

2. The solution:

2.1 Summary

* Add a new internal function "ACCESS_WITH_SIZE" to carry the size information 
for every FAM field access;
* In C FE, Replace every FAM field access whose TYPE has the "counted_by" 
attribute with the new internal function "ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When the size information and the "ACCESS_MODE" information are not used 
anymore, possibly at the 2nd object size phase, replace the internal function 
with the actual FAM field access; 
* Some adjustments to the inlining heuristic and some SSA passes to mitigate the 
impact on the optimizer and code generation. 

2.2 The new internal function 

  .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

which returns "PTR", the same as its 1st argument;

1st argument "PTR": Pointer to the object;
2nd argument "SIZE": The size of the pointed-to object; 
  if the pointee of "PTR" has a
* real type, it is the number of elements of the type;
* void type, it is the number of bytes; 
3rd argument "ACCESS_MODE": 
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write

NOTEs: 
  A. This new internal function is intended for more general use by all three 
attributes, "access", "alloc_size", and the new "counted_by", to encode the 
"size" and "access_mode" information for the corresponding pointer (in order to 
resolve PR96503, etc.; https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503).
  B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1. 
  
  C. In this write-up, we focus on the implementation details for the 
"counted_by" attribute.  However, this function should be ready to be used by 
"access" and "alloc_size" without issue. 

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

  struct A
  {
   size_t size;
   char buf[] __attribute__((counted_by(size)));
  };

for any object with such type:

  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The size field should be set before the first reference to the 
FAM field.

Such a requirement on the user will guarantee that the first reference to the 
FAM knows the size of the FAM.

We need to add this additional requirement to the user documentation.
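A sketch of the documented ordering in plain C (the counted_by attribute itself is the proposed extension and is shown only as a comment):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The size field is assigned before the first reference to the FAM,
   as the proposed user requirement demands.  */
struct counted
{
  size_t size;
  char buf[];   /* __attribute__((counted_by (size))) under the proposal */
};

struct counted *
make_counted (size_t sz)
{
  struct counted *obj = malloc (sizeof (struct counted) + sz);
  if (!obj)
    return NULL;
  obj->size = sz;            /* 1. set the count first ...  */
  memset (obj->buf, 0, sz);  /* 2. ... then touch the FAM   */
  return obj;
}
```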

2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
  if YES, replace the reference to "obj->buf" with a call to
  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 

2.5 Query the size info 

There are multiple consumers of the size info (and ACCESS_MODE info):

  * __builtin_dynamic_object_size;
  * array bound sanitizer;

in these consumers, get the size info from the 2nd argument of the call to
ACCESS_WITH_SIZE (PTR, SIZE, -1)

2.6 Eliminate the internal function when not useful anymore

After the last consumer of the size information in ACCESS_WITH_SIZE, we 
should replace the internal call with its first argument.

Do it in the 2nd object size phase. 

2.7 Adjustment to inlining heuristic and other IPA analysis

the FE changes:

obj->buf

to

_1 = obj->buf;
_2 = obj->size;
.ACCESS_WITH_SIZE (_1, _2, -1)

there are two major changes:

  A. the # of LOADs, the # 

Re: [PATCH v6 1/1] gcc: config: microblaze: fix cpu version check

2023-10-31 Thread Michael Eager

On 10/30/23 10:02, Neal Frager wrote:

The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options.  By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.

Signed-off-by: Neal Frager 


Added to commit message;
Fix incorrect warning with -mcpu=10.0:
  warning: '-mxl-multiply-high' can be used only with
  '-mcpu=v6.00.a' or greater


---
V1->V2:
  - No need to create a new microblaze specific version check
routine as strverscmp is the correct solution.
V2->V3:
  - Changed mcpu define for microblaze isa testsuite examples.
V3->V4:
  - Added ChangeLog
V4->V5:
  - Added testsuite ChangeLog
V5->V6:
  - Updated testsuite ChangeLog to include all files
---
  gcc/ChangeLog |  4 
  gcc/config/microblaze/microblaze.cc   |  2 +-
  gcc/testsuite/ChangeLog   | 22 +++
  .../gcc.target/microblaze/isa/bshift.c|  2 +-
  gcc/testsuite/gcc.target/microblaze/isa/div.c |  2 +-
  .../gcc.target/microblaze/isa/fcmp1.c |  2 +-
  .../gcc.target/microblaze/isa/fcmp2.c |  2 +-
  .../gcc.target/microblaze/isa/fcmp3.c |  2 +-
  .../gcc.target/microblaze/isa/fcmp4.c |  2 +-
  .../gcc.target/microblaze/isa/fcvt.c  |  2 +-
  .../gcc.target/microblaze/isa/float.c |  2 +-
  .../gcc.target/microblaze/isa/fsqrt.c |  2 +-
  .../microblaze/isa/mul-bshift-pcmp.c  |  2 +-
  .../gcc.target/microblaze/isa/mul-bshift.c|  2 +-
  gcc/testsuite/gcc.target/microblaze/isa/mul.c |  2 +-
  .../microblaze/isa/mulh-bshift-pcmp.c |  2 +-
  .../gcc.target/microblaze/isa/mulh.c  |  2 +-
  .../gcc.target/microblaze/isa/nofcmp.c|  2 +-
  .../gcc.target/microblaze/isa/nofloat.c   |  2 +-
  .../gcc.target/microblaze/isa/pcmp.c  |  2 +-
  .../gcc.target/microblaze/isa/vanilla.c   |  2 +-
  .../gcc.target/microblaze/microblaze.exp  |  2 +-
  22 files changed, 46 insertions(+), 20 deletions(-)


Committed.

--
Michael Eager


Re: [Patch, fortran] PR64120

2023-10-31 Thread Steve Kargl
On Tue, Oct 31, 2023 at 02:11:08PM +, Paul Richard Thomas wrote:
> I found this 'obvious' fix, while going through PRs assigned to me.
> 
> Regtests. OK for mainline?
> 

Yes.  Fell free to backport if you have time and desire.

-- 
Steve


RE: [PATCH v6 1/1] gcc: config: microblaze: fix cpu version check

2023-10-31 Thread Frager, Neal
Hi Michael,

> The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp 
> instead of strverscmp to check the mcpu version against feature 
> options.  By simply changing the define to use strverscmp, the new 
> version 10.0 is treated correctly as a higher version than previous 
> versions.
> 
> Signed-off-by: Neal Frager 

> Added to commit message;
> Fix incorrect warning with -mcpu=10.0:
>   warning: '-mxl-multiply-high' can be used only with
>   '-mcpu=v6.00.a' or greater

> ---
> V1->V2:
>   - No need to create a new microblaze specific version check
> routine as strverscmp is the correct solution.
> V2->V3:
>   - Changed mcpu define for microblaze isa testsuite examples.
> V3->V4:
>   - Added ChangeLog
> V4->V5:
>   - Added testsuite ChangeLog
> V5->V6:
>   - Updated testsuite ChangeLog to include all files
> ---
>   gcc/ChangeLog |  4 
>   gcc/config/microblaze/microblaze.cc   |  2 +-
>   gcc/testsuite/ChangeLog   | 22 +++
>   .../gcc.target/microblaze/isa/bshift.c|  2 +-
>   gcc/testsuite/gcc.target/microblaze/isa/div.c |  2 +-
>   .../gcc.target/microblaze/isa/fcmp1.c |  2 +-
>   .../gcc.target/microblaze/isa/fcmp2.c |  2 +-
>   .../gcc.target/microblaze/isa/fcmp3.c |  2 +-
>   .../gcc.target/microblaze/isa/fcmp4.c |  2 +-
>   .../gcc.target/microblaze/isa/fcvt.c  |  2 +-
>   .../gcc.target/microblaze/isa/float.c |  2 +-
>   .../gcc.target/microblaze/isa/fsqrt.c |  2 +-
>   .../microblaze/isa/mul-bshift-pcmp.c  |  2 +-
>   .../gcc.target/microblaze/isa/mul-bshift.c|  2 +-
>   gcc/testsuite/gcc.target/microblaze/isa/mul.c |  2 +-
>   .../microblaze/isa/mulh-bshift-pcmp.c |  2 +-
>   .../gcc.target/microblaze/isa/mulh.c  |  2 +-
>   .../gcc.target/microblaze/isa/nofcmp.c|  2 +-
>   .../gcc.target/microblaze/isa/nofloat.c   |  2 +-
>   .../gcc.target/microblaze/isa/pcmp.c  |  2 +-
>   .../gcc.target/microblaze/isa/vanilla.c   |  2 +-
>   .../gcc.target/microblaze/microblaze.exp  |  2 +-
>   22 files changed, 46 insertions(+), 20 deletions(-)

> Committed.

Did you commit this patch?  I only see the ChangeLog files have been
updated by your commit.

Am I missing something?

Best regards,
Neal Frager
AMD


Re: [PATCH v5] bpf: Improvements in CO-RE builtins implementation.

2023-10-31 Thread David Faust
Hi Cupertino,

On 10/30/23 12:39, Cupertino Miranda wrote:
> 
> Hi everyone,
> 
> Please find a new version for the review as inline attachment.
> 
> Best regards,
> Cupertino
> 

This version LGTM.
Thanks!

> 
> Changes from v4:
>  - Implemented TARGET_DELEGITIMIZE_ADDRESS target hook as the proper
>  solution to the the warning for UNSPEC_CORE_RELOC being
>  non-delegitimize.
> 




[committed 2/2] riscv: thead: Add support for the XTheadFMemIdx ISA extension

2023-10-31 Thread Christoph Muellner
From: Christoph Müllner 

The XTheadFMemIdx ISA extension provides additional load and store
instructions for floating-point registers with new addressing modes.

The following memory accesses types are supported:
* load/store: [w,d] (single-precision FP, double-precision FP)

The following addressing modes are supported:
* register offset with additional immediate offset (4 instructions):
  flr, fsr
* zero-extended register offset with additional immediate offset
  (4 instructions): flur, fsur

These addressing modes are also part of the similar XTheadMemIdx
ISA extension support, whose code is reused and extended to support
floating-point registers.

One challenge that this patch needs to solve is GP registers in FP mode
(e.g. "(reg:DF a2)"), which cannot be handled by the XTheadFMemIdx
instructions. Such registers are the result of independent
optimizations, which can happen after register allocation.
This patch uses a simple but efficient method to address this:
add a dependency for XTheadMemIdx to XTheadFMemIdx optimizations.
This allows to use the instructions from XTheadMemIdx in case
of such registers.

The added tests ensure that this feature won't regress without notice.
Testing: GCC regression test suite and SPEC CPU 2017 intrate (base&peak).

Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_index_reg_class):
Return GR_REGS for XTheadFMemIdx.
(riscv_regno_ok_for_index_p): Add support for XTheadFMemIdx.
* config/riscv/riscv.h (HARDFP_REG_P): New macro.
* config/riscv/thead.cc (is_fmemidx_mode): New function.
(th_memidx_classify_address_index): Add support for XTheadFMemIdx.
(th_fmemidx_output_index): New function.
(th_output_move): Add support for XTheadFMemIdx.
* config/riscv/thead.md (TH_M_ANYF): New mode iterator.
(TH_M_NOEXTF): Likewise.
(*th_fmemidx_movsf_hardfloat): New INSN.
(*th_fmemidx_movdf_hardfloat_rv64): Likewise.
(*th_fmemidx_I_a): Likewise.
(*th_fmemidx_I_c): Likewise.
(*th_fmemidx_US_a): Likewise.
(*th_fmemidx_US_c): Likewise.
(*th_fmemidx_UZ_a): Likewise.
(*th_fmemidx_UZ_c): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-index.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc |   4 +-
 gcc/config/riscv/riscv.h  |   2 +
 gcc/config/riscv/thead.cc |  69 +++-
 gcc/config/riscv/thead.md | 161 ++
 .../riscv/xtheadfmemidx-index-update.c|  20 +++
 .../xtheadfmemidx-index-xtheadbb-update.c |  20 +++
 .../riscv/xtheadfmemidx-index-xtheadbb.c  |  22 +++
 .../gcc.target/riscv/xtheadfmemidx-index.c|  22 +++
 .../riscv/xtheadfmemidx-uindex-update.c   |  20 +++
 .../xtheadfmemidx-uindex-xtheadbb-update.c|  20 +++
 .../riscv/xtheadfmemidx-uindex-xtheadbb.c |  24 +++
 .../gcc.target/riscv/xtheadfmemidx-uindex.c   |  25 +++
 12 files changed, 404 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-update.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-index.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-update.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx-uindex.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index de6d9734da0..0148a4f2e43 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1084,7 +1084,7 @@ riscv_regno_mode_ok_for_base_p (int regno,
 enum reg_class
 riscv_index_reg_class ()
 {
-  if (TARGET_XTHEADMEMIDX)
+  if (TARGET_XTHEADMEMIDX || TARGET_XTHEADFMEMIDX)
 return GR_REGS;
 
   return NO_REGS;
@@ -1097,7 +1097,7 @@ riscv_index_reg_class ()
 int
 riscv_regno_ok_for_index_p (int regno)
 {
-  if (TARGET_XTHEADMEMIDX)
+  if (TARGET_XTHEADMEMIDX || TARGET_XTHEADFMEMIDX)
 return riscv_regno_mode_ok_for_base_p (regno, VOIDmode, 1);
 
   return 0;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index eb162abcb92..1e9813b4f

[committed 1/2] riscv: thead: Add support for the XTheadMemIdx ISA extension

2023-10-31 Thread Christoph Muellner
From: Christoph Müllner 

The XTheadMemIdx ISA extension provides additional load and store
instructions with new addressing modes.

The following memory accesses types are supported:
* load: b,bu,h,hu,w,wu,d
* store: b,h,w,d

The following addressing modes are supported:
* immediate offset with PRE_MODIFY or POST_MODIFY (22 instructions):
  l.ia, l.ib, s.ia, s.ib
* register offset with additional immediate offset (11 instructions):
  lr, sr
* zero-extended register offset with additional immediate offset
  (11 instructions): lur, sur

The RISC-V base ISA does not support index registers, so the changes
are kept separate from the RISC-V standard support as much as possible.

To combine the shift/multiply instructions into the memory access
instructions, this patch comes with a few insn_and_split optimizations
that allow the combiner to do this task.

Handling the different extension cases results in a couple of INSNs
that look redundant at first glance, but they are simply the equivalent
of what we already have for Zbb.  The only difference is that
we have many more load instructions.
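A sketch of the kind of source the combiner patterns target (assumed flags: -O2 -march=rv64gc_xtheadmemidx; the C itself is target-independent):

```c
#include <assert.h>

/* The shift implied by scaling the index by sizeof (long) is expected
   to be folded into a single indexed load (e.g. "th.lrd a0,a0,a1,3")
   instead of the base-ISA slli+add+ld sequence.  */
long
indexed_load (long *base, long idx)
{
  return base[idx];
}
```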

We already have a constraint with the name 'th_f_fmv', therefore,
the new constraints follow this pattern and have the same length
as required ('th_m_mia', 'th_m_mib', 'th_m_mir', 'th_m_miu').

The added tests ensure that this feature won't regress without notice.
Testing: GCC regression test suite, GCC bootstrap build, and
SPEC CPU 2017 intrate (base&peak) on C920.

Signed-off-by: Christoph Müllner 

gcc/ChangeLog:

* config/riscv/constraints.md (th_m_mia): New constraint.
(th_m_mib): Likewise.
(th_m_mir): Likewise.
(th_m_miu): Likewise.
* config/riscv/riscv-protos.h (enum riscv_address_type):
Add new address types ADDRESS_REG_REG, ADDRESS_REG_UREG,
and ADDRESS_REG_WB and their documentation.
(struct riscv_address_info): Add new field 'shift' and
document the field usage for the new address types.
(riscv_valid_base_register_p): New prototype.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/riscv.cc (riscv_index_reg_class):
Return GR_REGS for XTheadMemIdx.
(riscv_regno_ok_for_index_p): Add support for XTheadMemIdx.
(riscv_classify_address): Call th_classify_address() on top.
(riscv_output_move): Call th_output_move() on top.
(riscv_print_operand_address): Call th_print_operand_address()
on top.
* config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): New macro.
(HAVE_PRE_MODIFY_DISP): Likewise.
* config/riscv/riscv.md (zero_extendqi2): Disable
for XTheadMemIdx.
(*zero_extendqi2_internal): Convert to expand,
create INSN with same name and disable it for XTheadMemIdx.
(extendsidi2): Likewise.
(*extendsidi2_internal): Disable for XTheadMemIdx.
* config/riscv/thead.cc (valid_signed_immediate): New helper
function.
(th_memidx_classify_address_modify): New function.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_output_modify): Likewise.
(is_memidx_mode): Likewise.
(th_memidx_classify_address_index): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_memidx_output_index): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/thead.md (*th_memidx_operand): New splitter.
(*th_memidx_zero_extendqi2): New INSN.
(*th_memidx_extendsidi2): Likewise.
(*th_memidx_zero_extendsidi2): Likewise.
(*th_memidx_zero_extendhi2): Likewise.
(*th_memidx_extend2): Likewise.
(*th_memidx_bb_zero_extendsidi2): Likewise.
(*th_memidx_bb_zero_extendhi2): Likewise.
(*th_memidx_bb_extendhi2): Likewise.
(*th_memidx_bb_extendqi2): Likewise.
(TH_M_ANYI): New mode iterator.
(TH_M_NOEXTI): Likewise.
(*th_memidx_I_a): New combiner optimization.
(*th_memidx_I_b): Likewise.
(*th_memidx_I_c): Likewise.
(*th_memidx_US_a): Likewise.
(*th_memidx_US_b): Likewise.
(*th_memidx_US_c): Likewise.
(*th_memidx_UZ_a): Likewise.
(*th_memidx_UZ_b): Likewise.
(*th_memidx_UZ_c): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadmemidx-helpers.h: New test.
* gcc.target/riscv/xtheadmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-index.c: New test.
* gcc.target/riscv/xtheadmemidx-modify-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-modify.c: New t

[Committed 1/2] RISC-V: Let non-atomic targets use optimized amo loads/stores

2023-10-31 Thread Patrick O'Neill



On 10/31/23 06:05, Jeff Law wrote:



On 10/30/23 18:49, Patrick O'Neill wrote:
Non-atomic targets are currently prevented from using the optimized 
fencing for

seq_cst load/seq_cst store. This patch removes that constraint.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md (atomic_load_rvwmo): Remove
TARGET_ATOMIC constraint
(atomic_store_rvwmo): Ditto.
* config/riscv/sync-ztso.md (atomic_load_ztso): Ditto.
(atomic_store_ztso): Ditto.
* config/riscv/sync.md (atomic_load): Ditto.
(atomic_store): Ditto.
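For reference, a minimal sketch of the access pattern affected. Under the recommended RVWMO mapping, a seq_cst store is a plain store preceded by "fence rw,w" and a seq_cst load is a plain load followed by "fence r,rw", so no A-extension AMOs are needed (the mapping description is an assumption from the ISA manual, not GCC source):

```c
#include <assert.h>
#include <stdatomic.h>

/* A seq_cst store/load pair: the fence-based mapping lets even
   non-atomic RISC-V targets implement these without AMO
   instructions.  */
static _Atomic int flag;

void publish (int v) { atomic_store (&flag, v); }
int  observe (void)  { return atomic_load (&flag); }
```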

OK
jeff

Committed

Patrick


[Committed 2/2] RISC-V: Require a extension for testcases with atomic insns

2023-10-31 Thread Patrick O'Neill



On 10/31/23 06:07, Jeff Law wrote:



On 10/30/23 18:49, Patrick O'Neill wrote:
Add testsuite infrastructure for the A extension and use it to
require the A extension for dg-do run tests, and add the A extension
to dg-options for non-A dg-do compile tests.


gcc/testsuite/ChangeLog:

 * gcc.target/riscv/amo-table-a-6-amo-add-1.c: Add A 
extension to

dg-options for dg-do compile.
 * gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
 * gcc.target/riscv/inline-atomics-2.c: Ditto.
 * gcc.target/riscv/inline-atomics-3.c: Require A extension for
 dg-do run.
 * gcc.target/riscv/inline-atomics-4.c: Ditto.
 * gcc.target/riscv/inline-atomics-5.c: Ditto.
 * gcc.target/riscv/inline-atomics-6.c: Ditto.
 * gcc.target/riscv/inline-atomics-7.c: Ditto.
 * gcc.target/riscv/inline-atomics-8.c: Ditto.
 * lib/target-supports.exp: Add testing infrastructure to require
 the A extension or add it to an existing -march.

OK
jeff

Committed

Patrick


Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Carl Love
Segher:

On Tue, 2023-10-31 at 11:17 -0500, Segher Boessenkool wrote:


> 
> You could use gcov to see which rs6000 builtins are not exercised by
> anything in the testsuite, maybe.  This probably can be automated
> pretty
> nicely.

I will take a look at gcov.  I just did some relatively simple scripts
to go look for test cases.  For the non-overloaded built-ins, the
scripts had to exclude built-ins referenced by the overloaded built-ins.

This patch is just the first of a series of patches that I am working
on to try and clean up the built-in stuff per some comments in a PR. 
The internal LTC issue is
 
https://github.ibm.com/ltc-toolchain/power-gcc/issues/1288

The goal is to make sure there are test cases and documentation for all
of the overloaded and non-overloaded built-in definitions.  Just a low
priority project to fill any spare cycles.  :-)

  Carl 




Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Siddhesh Poyarekar

On 2023-10-31 12:26, Qing Zhao wrote:

Hi,

I wrote a summary based on our extensive discussion, hopefully this can be 
served as an informal proposal.

Please take a look at it and let me know any comment or suggestion.

There are some (???) in the section 3.2 and 3.6, those are my questions seeking 
for help.  -:)

Thanks again for all the help.

Qing.


Represent the missing dependence for the "counted_by" attribute and its 
consumers

Qing Zhao

10/30/2023
==

The whole discussion is at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html

1. The problem

There is a data dependency between the size assignment and the implicit use of
the size information in the __builtin_dynamic_object_size call that is missing
in the IL (line 11 and line 13 in the example below). Missing this information
will result in incorrect code reordering and other invalid code transformations.

   1 struct A
   2 {
   3  size_t size;
   4  char buf[] __attribute__((counted_by(size)));
   5 };
   6
   7 size_t
   8 foo (size_t sz)
   9 {
  10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
  11  obj->size = sz;
  12  obj->buf[0] = 2;
  13  return __builtin_dynamic_object_size (obj->buf, 1);
  14 }
   
Please see a more complicated example in Appendix 1.


We need to represent such data dependency correctly in the IL.

2. The solution:

2.1 Summary

* Add a new internal function "ACCESS_WITH_SIZE" to carry the size information 
for every FAM field access;
* In C FE, Replace every FAM field access whose TYPE has the "counted_by" attribute with 
the new internal function "ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When the size information and the "ACCESS_MODE" information are not used 
anymore, possibly at the 2nd object size phase, replace the internal function with the 
actual FAM field access;
* Some adjustment to inlining heuristic and some SSA passes to mitigate the 
impact to the optimizer and code generation.

2.2 The new internal function

   .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "PTR" same as the 1st argument;

1st argument "PTR": Pointer to the object;
2nd argument "SIZE": The size of the pointed object,
   if the pointee of the "PTR" has a
 * real type, it's the number of the elements of the type;
 * void type, it's the number of bytes;
3rd argument "ACCESS_MODE":
   -1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write

NOTEs,
   A. This new internal function is intended for more general use by all three
attributes, "access", "alloc_size", and the new "counted_by", to encode the
"size" and "access_mode" information into the corresponding pointer (in order
to resolve PR96503, etc.: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503).
   B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.
   C. In this writeup, we focus on the implementation details for the
"counted_by" attribute. However, this function should be ready to be used by
"access" and "alloc_size" without issue.

2.3 A new semantic requirement in the user documentation of "counted_by"

For the following structure including a FAM with a counted_by attribute:

   struct A
   {
size_t size;
char buf[] __attribute__((counted_by(size)));
   };

for any object with such type:

   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));

The size field should be set before the first reference to the FAM field.


A more flexible specification could be stating that validation for a 
reference to the FAM field will use the latest value assigned to the 
size field before that reference.  That will allow for situations like:


  o->size = val1;
  deref (o->buf);
  o->size = val2;

making it clear that deref will see val1 and not val2.



Such a requirement guarantees that the first reference to the FAM knows the
size of the FAM.

We need to add this additional requirement to the user document.

2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE

In C FE:

for every reference to a FAM, for example, "obj->buf" in the small example,
   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
   if YES, replace the reference to "obj->buf" with a call to
   .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);

2.5 Query the size info

There are multiple consumers of the size info (and ACCESS_MODE info):

   * __builtin_dynamic_object_size;
   * array bound sanitizer;

in these consumers, get the size info from the 2nd argument of the call to
ACCESS_WITH_SIZE (PTR, SIZE, -1)

2.6 Eliminate the internal function when not useful anymore

Aft

Re: [PATCH v6 1/1] gcc: config: microblaze: fix cpu version check

2023-10-31 Thread Michael Eager

On 10/31/23 09:41, Frager, Neal wrote:

Hi Michael,


The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options.  By simply changing the define to use strverscmp, the new
version 10.0 is treated correctly as a higher version than previous
versions.

Signed-off-by: Neal Frager 



Added to commit message;
 Fix incorrect warning with -mcpu=10.0:
   warning: '-mxl-multiply-high' can be used only with
   '-mcpu=v6.00.a' or greater



---
V1->V2:
   - No need to create a new microblaze specific version check
 routine as strverscmp is the correct solution.
V2->V3:
   - Changed mcpu define for microblaze isa testsuite examples.
V3->V4:
   - Added ChangeLog
V4->V5:
   - Added testsuite ChangeLog
V5->V6:
   - Updated testsuite ChangeLog to include all files
---
   gcc/ChangeLog |  4 
   gcc/config/microblaze/microblaze.cc   |  2 +-
   gcc/testsuite/ChangeLog   | 22 +++
   .../gcc.target/microblaze/isa/bshift.c|  2 +-
   gcc/testsuite/gcc.target/microblaze/isa/div.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp1.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp2.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp3.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp4.c |  2 +-
   .../gcc.target/microblaze/isa/fcvt.c  |  2 +-
   .../gcc.target/microblaze/isa/float.c |  2 +-
   .../gcc.target/microblaze/isa/fsqrt.c |  2 +-
   .../microblaze/isa/mul-bshift-pcmp.c  |  2 +-
   .../gcc.target/microblaze/isa/mul-bshift.c|  2 +-
   gcc/testsuite/gcc.target/microblaze/isa/mul.c |  2 +-
   .../microblaze/isa/mulh-bshift-pcmp.c |  2 +-
   .../gcc.target/microblaze/isa/mulh.c  |  2 +-
   .../gcc.target/microblaze/isa/nofcmp.c|  2 +-
   .../gcc.target/microblaze/isa/nofloat.c   |  2 +-
   .../gcc.target/microblaze/isa/pcmp.c  |  2 +-
   .../gcc.target/microblaze/isa/vanilla.c   |  2 +-
   .../gcc.target/microblaze/microblaze.exp  |  2 +-
   22 files changed, 46 insertions(+), 20 deletions(-)



Committed.


Did you commit this patch?  I only see the ChangeLog files have been
updated by your commit.

Am I missing something?


Somehow only the ChangeLogs, which required manual editing, were
marked to be added.  I'll add the other files.

--
Michael Eager


Re: [PATCH v5] bpf: Improvements in CO-RE builtins implementation.

2023-10-31 Thread David Faust



On 10/31/23 09:58, David Faust wrote:
> Hi Cupertino,
> 
> On 10/30/23 12:39, Cupertino Miranda wrote:
>>
>> Hi everyone,
>>
>> Please find a new version for the review as inline attachment.
>>
>> Best regards,
>> Cupertino
>>
> 
> This version LGTM.
> Thanks!

OK for trunk.
Thanks.

> 
>>
>> Changes from v4:
>>  - Implemented TARGET_DELEGITIMIZE_ADDRESS target hook as the proper
>>  solution to the warning for UNSPEC_CORE_RELOC being
>>  non-delegitimize.
>>
> 
> 


Re: [PATCH v6 1/1] gcc: config: microblaze: fix cpu version check

2023-10-31 Thread Michael Eager

On 10/31/23 09:41, Frager, Neal wrote:

Hi Michael,


The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options.  By simply changing the define to use strverscmp, the new
version 10.0 is treated correctly as a higher version than previous
versions.

Signed-off-by: Neal Frager 



Added to commit message;
 Fix incorrect warning with -mcpu=10.0:
   warning: '-mxl-multiply-high' can be used only with
   '-mcpu=v6.00.a' or greater



---
V1->V2:
   - No need to create a new microblaze specific version check
 routine as strverscmp is the correct solution.
V2->V3:
   - Changed mcpu define for microblaze isa testsuite examples.
V3->V4:
   - Added ChangeLog
V4->V5:
   - Added testsuite ChangeLog
V5->V6:
   - Updated testsuite ChangeLog to include all files
---
   gcc/ChangeLog |  4 
   gcc/config/microblaze/microblaze.cc   |  2 +-
   gcc/testsuite/ChangeLog   | 22 +++
   .../gcc.target/microblaze/isa/bshift.c|  2 +-
   gcc/testsuite/gcc.target/microblaze/isa/div.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp1.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp2.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp3.c |  2 +-
   .../gcc.target/microblaze/isa/fcmp4.c |  2 +-
   .../gcc.target/microblaze/isa/fcvt.c  |  2 +-
   .../gcc.target/microblaze/isa/float.c |  2 +-
   .../gcc.target/microblaze/isa/fsqrt.c |  2 +-
   .../microblaze/isa/mul-bshift-pcmp.c  |  2 +-
   .../gcc.target/microblaze/isa/mul-bshift.c|  2 +-
   gcc/testsuite/gcc.target/microblaze/isa/mul.c |  2 +-
   .../microblaze/isa/mulh-bshift-pcmp.c |  2 +-
   .../gcc.target/microblaze/isa/mulh.c  |  2 +-
   .../gcc.target/microblaze/isa/nofcmp.c|  2 +-
   .../gcc.target/microblaze/isa/nofloat.c   |  2 +-
   .../gcc.target/microblaze/isa/pcmp.c  |  2 +-
   .../gcc.target/microblaze/isa/vanilla.c   |  2 +-
   .../gcc.target/microblaze/microblaze.exp  |  2 +-
   22 files changed, 46 insertions(+), 20 deletions(-)



Committed.


Did you commit this patch?  I only see the ChangeLog files have been
updated by your commit.

Am I missing something?



Updated.

--
Michael Eager


RE: [PATCH v6 1/1] gcc: config: microblaze: fix cpu version check

2023-10-31 Thread Frager, Neal
> Hi Michael,
> 
>> The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp 
>> instead of strverscmp to check the mcpu version against feature 
>> options.  By simply changing the define to use strverscmp, the new 
>> version 10.0 is treated correctly as a higher version than previous 
>> versions.
>>
>> Signed-off-by: Neal Frager 
> 
>> Added to commit message;
>>  Fix incorrect warning with -mcpu=10.0:
>>warning: '-mxl-multiply-high' can be used only with
>>'-mcpu=v6.00.a' or greater
> 
>> ---
>> V1->V2:
>>- No need to create a new microblaze specific version check
>>  routine as strverscmp is the correct solution.
>> V2->V3:
>>- Changed mcpu define for microblaze isa testsuite examples.
>> V3->V4:
>>- Added ChangeLog
>> V4->V5:
>>- Added testsuite ChangeLog
>> V5->V6:
>>- Updated testsuite ChangeLog to include all files
>> ---
>>gcc/ChangeLog |  4 
>>gcc/config/microblaze/microblaze.cc   |  2 +-
>>gcc/testsuite/ChangeLog   | 22 +++
>>.../gcc.target/microblaze/isa/bshift.c|  2 +-
>>gcc/testsuite/gcc.target/microblaze/isa/div.c |  2 +-
>>.../gcc.target/microblaze/isa/fcmp1.c |  2 +-
>>.../gcc.target/microblaze/isa/fcmp2.c |  2 +-
>>.../gcc.target/microblaze/isa/fcmp3.c |  2 +-
>>.../gcc.target/microblaze/isa/fcmp4.c |  2 +-
>>.../gcc.target/microblaze/isa/fcvt.c  |  2 +-
>>.../gcc.target/microblaze/isa/float.c |  2 +-
>>.../gcc.target/microblaze/isa/fsqrt.c |  2 +-
>>.../microblaze/isa/mul-bshift-pcmp.c  |  2 +-
>>.../gcc.target/microblaze/isa/mul-bshift.c|  2 +-
>>gcc/testsuite/gcc.target/microblaze/isa/mul.c |  2 +-
>>.../microblaze/isa/mulh-bshift-pcmp.c |  2 +-
>>.../gcc.target/microblaze/isa/mulh.c  |  2 +-
>>.../gcc.target/microblaze/isa/nofcmp.c|  2 +-
>>.../gcc.target/microblaze/isa/nofloat.c   |  2 +-
>>.../gcc.target/microblaze/isa/pcmp.c  |  2 +-
>>.../gcc.target/microblaze/isa/vanilla.c   |  2 +-
>>.../gcc.target/microblaze/microblaze.exp  |  2 +-
>>22 files changed, 46 insertions(+), 20 deletions(-)
> 
>> Committed.
> 
> Did you commit this patch?  I only see the ChangeLog files have been 
> updated by your commit.
> 
> Am I missing something?
> 

> Updated.

Thanks!

Best regards,
Neal Frager
AMD


[PATCH] c++: constantness of local var in constexpr fn [PR111703, PR112269]

2023-10-31 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Does it look OK for release branches as well for sake of PR111703?

-- >8 --

potential_constant_expression was incorrectly treating most local
variables from a constexpr function as (potentially) constant because it
wasn't considering the 'now' parameter.  This patch fixes this by
relaxing some var_in_maybe_constexpr_fn checks accordingly, which turns
out to partially fix two recently reported regressions:

PR111703 is a regression caused by r11-550-gf65a3299a521a4 for
restricting constexpr evaluation during warning-dependent folding.
The mechanism is intended to restrict only constant evaluation of the
instantiated non-dependent expression, but it also ends up restricting
constant evaluation (as part of satisfaction) during instantiation of
the expression, in particular when resolving the ck_rvalue conversion of
the 'x' argument into a copy constructor call.  This seems like a bug in
the mechanism[1], though I don't know if we want to refine the mechanism
or get rid of it completely since the original testcases which motivated
the mechanism are fixed more simply by r13-1225-gb00b95198e6720.  In any
case, this patch partially fixes this by making us correctly treat 'x'
and therefore 'f(x)' in the below testcase as non-constant, which
prevents the problematic warning-dependent folding from occurring at
all.  If this bug crops up again then I figure we could decide what to
do with the mechanism then.

PR112269 is caused by r14-4796-g3e3d73ed5e85e7 for merging tsubst_copy
into tsubst_copy_and_build.  tsubst_copy used to exit early when 'args'
was empty, behavior which that commit deliberately didn't preserve.
This early exit masked the fact that COMPLEX_EXPR wasn't handled by
tsubst at all, and is a tree code that apparently we could see during
warning-dependent folding on some targets.  A complete fix is to add
handling for this tree code in tsubst_expr, but this patch should fix
the reported testsuite failures since the situations where COMPLEX_EXPR
crops up in  turn out to not be constant expressions in the
first place after this patch.

[1]: The mechanism incorrectly assumes that instantiation of the
non-dependent expression shouldn't induce any template instantiation
since ahead of time checking of the expression should've already induced
whatever template instantiation was needed, but in this case although
overload resolution was performed ahead of time, a ck_rvalue conversion
gets resolved to a copy constructor call only at instantiation time.

PR c++/111703

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) :
Only consider var_in_maybe_constexpr_fn if 'now' is false.
: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-fn8.C: New test.
---
 gcc/cp/constexpr.cc   |  4 ++--
 gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C | 24 +++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index c05760e6789..8a6b210144a 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9623,7 +9623,7 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
  return RECUR (DECL_VALUE_EXPR (t), rval);
}
   if (want_rval
- && !var_in_maybe_constexpr_fn (t)
+ && (now || !var_in_maybe_constexpr_fn (t))
  && !type_dependent_expression_p (t)
  && !decl_maybe_constant_var_p (t)
  && (strict
@@ -9737,7 +9737,7 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
 STRIP_NOPS (x);
 if (is_this_parameter (x) && !is_capture_proxy (x))
  {
-   if (!var_in_maybe_constexpr_fn (x))
+   if (now || !var_in_maybe_constexpr_fn (x))
  {
if (flags & tf_error)
  constexpr_error (loc, fundef_p, "use of % in a "
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
new file mode 100644
index 000..3f63a5b28d7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
@@ -0,0 +1,24 @@
+// PR c++/111703
+// { dg-do compile { target c++20 } }
+
+template
+constexpr bool always_true() { return true; }
+
+struct P {
+  P() = default;
+
+  template
+requires (always_true()) // { dg-bogus "used before its definition" }
+  constexpr P(const T&) { }
+
+  int n, m;
+};
+
+void (*f)(P);
+
+template
+constexpr bool g() {
+  P x;
+  f(x); // { dg-bogus "from here" }
+  return true;
+}
-- 
2.42.0.526.g3130c155df



__hardcfr_check_fail and BPF

2023-10-31 Thread Jose E. Marchesi


Hi Alex.

As you may know, in BPF we have to live (for now) with the constant pain
of being limited to functions whose arguments fit in five or fewer
registers.

The recently introduced __hardcfr_check_fail in the run-time component
of hardcfr breaks the bpf-unknown-none build:

  ../../../libgcc/hardcfr.c: In function ‘__hardcfr_check_fail’:
  ../../../libgcc/hardcfr.c:210:1: error: too many function arguments for eBPF
210 | __hardcfr_check_fail (size_t const blocks ATTRIBUTE_UNUSED,
| ^~~~

It seems to me that __hardcfr_check_fail is only called from
__hardcfr_check, and compiled code is not instrumented with direct
calls to it.

If so, would it be possible to modify that function so it gets one less
argument? :)

Alternatively, we would need to disable the hardcfr from the BPF backend
and being able to define something in tm.h to inhibit building the
corresponding runtime in libgcc.  Would you be ok with having an #ifndef
DISABLE_LIBGCC_HARDCFR wrapping the stuff in that file?

Thanks.


[PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Vineet Gupta
riscv_promote_function_mode doesn't promote SImode to DImode for the
libcall case.

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3e27897d6d30..7b8e9af0a5af 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8630,9 +8630,10 @@ riscv_promote_function_mode (const_tree type 
ATTRIBUTE_UNUSED,
 return promote_mode (type, mode, punsignedp);
 
   unsignedp = *punsignedp;
-  PROMOTE_MODE (as_a  (mode), unsignedp, type);
+  scalar_mode smode = as_a  (mode);
+  PROMOTE_MODE (smode, unsignedp, type);
   *punsignedp = unsignedp;
-  return mode;
+  return smode;
 }
 
 /* Implement TARGET_MACHINE_DEPENDENT_REORG.  */
-- 
2.34.1



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Qing Zhao


> On Oct 31, 2023, at 1:35 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-10-31 12:26, Qing Zhao wrote:
>> Hi,
>> I wrote a summary based on our extensive discussion, hopefully this can be 
>> served as an informal proposal.
>> Please take a look at it and let me know any comment or suggestion.
>> There are some (???) in the section 3.2 and 3.6, those are my questions 
>> seeking for help.  -:)
>> Thanks again for all the help.
>> Qing.
>> 
>> Represent the missing dependence for the "counted_by" attribute and its 
>> consumers
>> Qing Zhao
>> 10/30/2023
>> ==
>> The whole discussion is at:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
>> 1. The problem
>> There is a data dependency between the size assignment and the implicit use 
>> of the size information in the __builtin_dynamic_object_size that is missing 
>> in the IL (line 11 and line 13 in the below example). Such information 
>> missing will result incorrect code reordering and other code transformations.
>>   1 struct A
>>   2 {
>>   3  size_t size;
>>   4  char buf[] __attribute__((counted_by(size)));
>>   5 };
>>   6
>>   7 size_t
>>   8 foo (size_t sz)
>>   9 {
>>  10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>>  11  obj->size = sz;
>>  12  obj->buf[0] = 2;
>>  13  return __builtin_dynamic_object_size (obj->buf, 1);
>>  14 }
>>   Please see a more complicate example in the Appendex 1.
>> We need to represent such data dependency correctly in the IL.
>> 2. The solution:
>> 2.1 Summary
>> * Add a new internal function "ACCESS_WITH_SIZE" to carry the size 
>> information for every FAM field access;
>> * In C FE, Replace every FAM field access whose TYPE has the "counted_by" 
>> attribute with the new internal function "ACCESS_WITH_SIZE";
>> * In every consumer of the size information, for example, BDOS or array 
>> bound sanitizer, query the size information or ACCESS_MODE information from 
>> the new internal function;
>> * When the size information and the "ACCESS_MODE" information are not used 
>> anymore, possibly at the 2nd object size phase, replace the internal 
>> function with the actual FAM field access;
>> * Some adjustment to inlining heuristic and some SSA passes to mitigate the 
>> impact to the optimizer and code generation.
>> 2.2 The new internal function
>>   .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>> which returns the "PTR" same as the 1st argument;
>> 1st argument "PTR": Pointer to the object;
>> 2nd argument "SIZE": The size of the pointed object,
>>   if the pointee of the "PTR" has a
>> * real type, it's the number of the elements of the type;
>> * void type, it's the number of bytes;
>> 3rd argument "ACCESS_MODE":
>>   -1: Unknown access semantics
>>0: none
>>1: read_only
>>2: write_only
>>3: read_write
>> NOTEs,
>>   A. This new internal function is intended for a more general use from all 
>> the 3 attributes, "access", "alloc_size", and the new "counted_by", to 
>> encode the "size" and "access_mode" information to the corresponding 
>> pointer. (in order to resolve PR96503, etc. 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
>>   B. For "counted_by" and "alloc_size" attributes, the 3rd argument will be 
>> -1.
>>   C. In this wrieup, we focus on the implementation details for the 
>> "counted_by" attribute. However, this function should be ready to be used by 
>> "access" and "alloc_size" without issue.
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> For the following structure including a FAM with a counted_by attribute:
>>   struct A
>>   {
>>size_t size;
>>char buf[] __attribute__((counted_by(size)));
>>   };
>> for any object with such type:
>>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> The setting to the size field should be done before the first reference to 
>> the FAM field.
> 
> A more flexible specification could be stating that validation for a 
> reference to the FAM field will use the latest value assigned to the size 
> field before that reference.  That will allow for situations like:
> 
>  o->size = val1;
>  deref (o->buf);
>  o->size = val2;
> 
> making it clear that deref will see val1 and not val2.

Good point! Yes, with the current design, this is reasonable. 
Will update the proposal with this.
> 
>> Such requirement to the user will guarantee that the first reference to the 
>> FAM knows the size of the FAM.
>> We need to add this additional requirement to the user document.
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> In C FE:
>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>   if YES, replace the reference to "obj->buf" wi

[PING] [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-10-31 Thread Martin Uecker
On Monday, 2023-09-18 at 23:26 +0200, Martin Uecker wrote:
> 
> Compared to the previous version I changed the name of the
> warning to "Walloc-size" which matches "Wanalyzer-allocation-size"
> but is still in line with the other -Walloc-something warnings
> we have. I also added it to Wextra.
> 
> I found PR71219, which requests the warning and points out that
> it is recommended by the C secure coding guidelines, and added
> the PR to the commit log (although the version with a cast is not
> diagnosed so far).
> 
> I did not have time to implement the extensions suggested
> on the list, i.e. warn when the size is not a multiple
> of the size of the type and warn if the size is not
> suitable for a flexible array member (this is also a bit
> more complicated than it seems).
> 
> Bootstrapped and regression tested on x86_64.
> 
> 
> Martin
> 
> 
> Add option Walloc-size that warns about allocations that have
> insufficient storage for the target type of the pointer the
> storage is assigned to.
> 
>   PR c/71219
> gcc:
>   * doc/invoke.texi: Document -Walloc-size option.
> 
> gcc/c-family:
> 
>   * c.opt (Walloc-size): New option.
> 
> gcc/c:
>   * c-typeck.cc (convert_for_assignment): Add warning.
> 
> gcc/testsuite:
> 
>   * gcc.dg/Walloc-size-1.c: New test.
> ---
>  gcc/c-family/c.opt   |  4 
>  gcc/c/c-typeck.cc| 27 +
>  gcc/doc/invoke.texi  | 10 
>  gcc/testsuite/gcc.dg/Walloc-size-1.c | 36 
>  4 files changed, 77 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/Walloc-size-1.c
> 
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 7348ad42ee0..9ba08a1fb6d 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -319,6 +319,10 @@ Walloca
>  C ObjC C++ ObjC++ Var(warn_alloca) Warning
>  Warn on any use of alloca.
>  
> +Walloc-size
> +C ObjC Var(warn_alloc_size) Warning
> +Warn when allocating insufficient storage for the target type of the 
> assigned pointer.
> +
>  Walloc-size-larger-than=
>  C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int 
> ByteSize Warning Init(HOST_WIDE_INT_MAX)
>  -Walloc-size-larger-than= Warn for calls to allocation functions 
> that
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index e2bfd2caf85..c759c6245ed 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -7384,6 +7384,33 @@ convert_for_assignment (location_t location, 
> location_t expr_loc, tree type,
>   "request for implicit conversion "
>   "from %qT to %qT not permitted in C++", rhstype, type);
>  
> +  /* Warn of new allocations that are not big enough for the target
> +  type.  */
> +  tree fndecl;
> +  if (warn_alloc_size
> +   && TREE_CODE (rhs) == CALL_EXPR
> +   && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
> +   && DECL_IS_MALLOC (fndecl))
> + {
> +   tree fntype = TREE_TYPE (fndecl);
> +   tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
> +   tree alloc_size = lookup_attribute ("alloc_size", fntypeattrs);
> +   if (alloc_size)
> + {
> +   tree args = TREE_VALUE (alloc_size);
> +   int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
> +   /* For calloc only use the second argument.  */
> +   if (TREE_CHAIN (args))
> + idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN (args))) - 1;
> +   tree arg = CALL_EXPR_ARG (rhs, idx);
> +   if (TREE_CODE (arg) == INTEGER_CST
> +   && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
> +  warning_at (location, OPT_Walloc_size, "allocation of "
> +  "insufficient size %qE for type %qT with "
> +  "size %qE", arg, ttl, TYPE_SIZE_UNIT (ttl));
> + }
> + }
> +
>/* See if the pointers point to incompatible address spaces.  */
>asl = TYPE_ADDR_SPACE (ttl);
>asr = TYPE_ADDR_SPACE (ttr);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 33befee7d6b..a4fbcf5e1b5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8086,6 +8086,16 @@ always leads to a call to another @code{cold} function 
> such as wrappers of
>  C++ @code{throw} or fatal error reporting functions leading to @code{abort}.
>  @end table
>  
> +@opindex Wno-alloc-size
> +@opindex Walloc-size
> +@item -Walloc-size
> +Warn about calls to allocation functions decorated with attribute
> +@code{alloc_size} that specify insufficient size for the target type of
> +the pointer the result is assigned to, including those to the built-in
> +forms of the functions @code{aligned_alloc}, @code{alloca},
> +@code{calloc},
> +@code{malloc}, and @code{realloc}.
> +
>  @opindex Wno-alloc-zero
>  @opindex Walloc-zero
>  @item -Walloc-zero
> diff --git a/gcc/testsuite/gcc.dg/Walloc-size-1.c 
> b/gcc/testsuite/gcc.dg/Walloc-size-1.c
> new file mode 10

Re: [PATCH v5] bpf: Improvements in CO-RE builtins implementation.

2023-10-31 Thread Cupertino Miranda



> On 10/31/23 09:58, David Faust wrote:
>> Hi Cupertino,
>>
>> On 10/30/23 12:39, Cupertino Miranda wrote:
>>>
>>> Hi everyone,
>>>
>>> Please find a new version for the review as inline attachment.
>>>
>>> Best regards,
>>> Cupertino
>>>
>>
>> This version LGTM.
>> Thanks!
>
> OK for trunk.
Pushed !

> Thanks.
Thanks,
Cupertino

>
>>
>>>
>>> Changes from v4:
>>>  - Implemented TARGET_DELEGITIMIZE_ADDRESS target hook as the proper
>>>  solution to the warning for UNSPEC_CORE_RELOC being
>>>  non-delegitimize.
>>>
>>
>>


[PATCH] Reduce false positives for -Wnonnull for VLA parameters [PR98541]

2023-10-31 Thread Martin Uecker


This is a revised part of a previously posted patch which
I split up.  The C FE changes, which fixed another false
positive, were already merged, but I still need approval for
this middle-end change.  It would be nice to get this in,
because it fixes some rather annoying (for me at least)
false positive warnings with no easy workaround.

In the following example,

int foo(int n, float matrix[n], float opt[n]);
foo(n, matrix, NULL);

GCC warns about NULL iff n > 0.  This is problematic for
several reasons:
1. It causes false positives (and I turn off -Wnonnull
in one of my projects for this reason)
2. It is inconsistent with regular arrays where there is no
warning in this case.
3. The size parameter is sometimes shared (as in this example)
so passing zero to avoid the warning is only possible by
making the code more complex.
4. Passing zero as a workaround is technically UB.


(The original author of the warning code, Martin S, seemed to
agree with this change according to the discussion in Bugzilla.)



Reduce false positives for -Wnonnull for VLA parameters [PR98541]

This patch limits the warning about NULL arguments to VLA
parameters declared [static n].

PR c/98541

gcc/
* gimple-ssa-warn-access.cc
(pass_waccess::maybe_check_access_sizes): For VLA bounds
in parameters, only warn about null pointers with 'static'.

gcc/testsuite:
* gcc.dg/Wnonnull-4: Adapt test.
* gcc.dg/Wstringop-overflow-40.c: Adapt test.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index e439d1b9b68..8b734295f09 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3477,27 +3477,14 @@ pass_waccess::maybe_check_access_sizes (rdwr_map *rwm, 
tree fndecl, tree fntype,
 
   if (integer_zerop (ptr))
{
- if (sizidx >= 0 && tree_int_cst_sgn (sizrng[0]) > 0)
+ if (!access.second.internal_p
+ && sizidx >= 0 && tree_int_cst_sgn (sizrng[0]) > 0)
{
  /* Warn about null pointers with positive sizes.  This is
 different from also declaring the pointer argument with
 attribute nonnull when the function accepts null pointers
 only when the corresponding size is zero.  */
- if (access.second.internal_p)
-   {
- const std::string argtypestr
-   = access.second.array_as_string (ptrtype);
-
- if (warning_at (loc, OPT_Wnonnull,
- "argument %i of variable length "
- "array %s is null but "
- "the corresponding bound argument "
- "%i value is %s",
- ptridx + 1, argtypestr.c_str (),
- sizidx + 1, sizstr))
-   arg_warned = OPT_Wnonnull;
-   }
- else if (warning_at (loc, OPT_Wnonnull,
+ if (warning_at (loc, OPT_Wnonnull,
   "argument %i is null but "
   "the corresponding size argument "
   "%i value is %s",
diff --git a/gcc/testsuite/gcc.dg/Wnonnull-4.c 
b/gcc/testsuite/gcc.dg/Wnonnull-4.c
index 2c1c45a9856..1f14fbba45d 100644
--- a/gcc/testsuite/gcc.dg/Wnonnull-4.c
+++ b/gcc/testsuite/gcc.dg/Wnonnull-4.c
@@ -27,9 +27,9 @@ void test_fca_n (int r_m1)
   T (  0);
 
   // Verify positive bounds.
-  T (  1);  // { dg-warning "argument 2 of variable length array 
'char\\\[n]' is null but the corresponding bound argument 1 value is 1" }
-  T (  9);  // { dg-warning "argument 2 of variable length array 
'char\\\[n]' is null but the corresponding bound argument 1 value is 9" }
-  T (max);  // { dg-warning "argument 2 of variable length array 
'char\\\[n]' is null but the corresponding bound argument 1 value is \\d+" }
+  T (  1);  // { dg-bogus "argument 2 of variable length array 
'char\\\[n]' is null but the corresponding bound argument 1 value is 1" }
+  T (  9);  // { dg-bogus "argument 2 of variable length array 
'char\\\[n]' is null but the corresponding bound argument 1 value is 9" }
+  T (max);  // { dg-bogus "argument 2 of variable length array 
'char\\\[n]' is null but the corresponding bound argument 1 value is \\d+" }
 }
 
 
@@ -55,9 +55,9 @@ void test_fsa_x_n (int r_m1)
   T (  0);
 
   // Verify positive bounds.
-  T (  1);  // { dg-warning "argument 2 of variable length array 
'short int\\\[]\\\[n]' is null but the corresponding bound argument 1 value is 
1" }
-  T (  9);  // { dg-warning "argument 2 of variable length array 
'short int\\\[]\\\[n]' is null but the corresponding bound argument 1 value is 
9" }
-  T (max);  // { dg-warning "argument 2 of variable length array 
'short int\\\[]\\\[n]' is null

Re: [PATCH] RISC-V: Support strided load/store

2023-10-31 Thread Robin Dapp
Hi Juzhe,

LGTM once the middle-end parts are in.  Just tiny nits.
Nothing that would warrant a V2, though.

> +;; =
> +;; == Stried Load/Store

missing a 'd' here.
 
> +(define_predicate "vector_stride_extension_operand"
> +  (ior (and (match_operand 0 "immediate_operand")
> +(match_test "Pmode == DImode"))
> +   (and (match_operand 0 "const_0_operand")
> +(match_test "Pmode == SImode"
> +

This could use a comment explaining why we allow only sign extension
for 32 bit.  Also the linter complains about spaces vs tabs.

Regards
 Robin



Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-31 Thread Richard Sandiford
Robin Dapp  writes:
> Changed as suggested.  The difference to v5 is thus:
>
> +   if (cond_fn_p)
> + {
> +   gcall *call = dyn_cast (use_stmt);
> +   unsigned else_pos
> + = internal_fn_else_index (internal_fn (op.code));
> +
> +   for (unsigned int j = 0; j < gimple_call_num_args (call); ++j)
> + {
> +   if (j == else_pos)
> + continue;
> +   if (gimple_call_arg (call, j) == op.ops[opi])
> + cnt++;
> + }
> + }
> +   else if (!is_gimple_debug (op_use_stmt)
>
> as well as internal_fn_else_index.
>
> Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done,
> aarch64 and x86 still running.
>
> Regards
>  Robin
>
> From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Wed, 13 Sep 2023 22:19:35 +0200
> Subject: [PATCH v6] ifcvt/vect: Emit COND_OP for conditional scalar reduction.
>
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD/COND_OP during
> ifcvt and adjusting some vectorizer code to handle it.
>
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
>
> gcc/ChangeLog:
>
>   PR middle-end/111401
>   * internal-fn.cc (internal_fn_else_index): New function.
>   * internal-fn.h (internal_fn_else_index): Define.
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (check_reduction_path): Don't count else operand in COND_OP.
>   (vect_is_simple_reduction): Ditto.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_OP handling.
>   (vectorizable_reduction): Don't count else operand in COND_OP.
>   (vect_transform_reduction): Add COND_OP handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
>   * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
>
> ---
>  gcc/internal-fn.cc|  58 ++
>  gcc/internal-fn.h |   1 +
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 +
>  .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
>  .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
>  gcc/tree-if-conv.cc   |  49 +++--
>  gcc/tree-vect-loop.cc | 193 ++
>  gcc/tree-vectorizer.h |   2 +-
>  9 files changed, 536 insertions(+), 55 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..018175261b9 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4697,6 +4697,64 @@ internal_fn_len_index (internal_fn fn)
>  }
>  }
>  
> +int
> +internal_fn_else_index (internal_fn fn)

The function needs a comment, maybe:

/* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of the
   argument that is used when the condition is false.  Return -1 otherwise.  */

OK for the internal-fn* and tree-if-conv.cc bits (which were the
parts I commented on earlier).  I'll look at cleaning up the
definition of conditional internal functions separately, so that
the list of functions isn't necessary.

Thanks,
Richard

> +{
> +  switch (fn)
> +{
> +case IFN_COND_NEG:
> +case IFN_COND_NOT:
> +case IFN_COND_LEN_NEG:
> +case IFN_COND_LEN_NOT:
> +  return 2;
> +
> +case IFN_COND_ADD:
> +case IFN_COND_SUB:
> +case IFN_COND_MUL:
> +case IFN_COND_DIV:
> +case IFN_COND_MOD:
> +case IFN_COND_MIN:
> +case IFN_COND_MAX:
> +case IFN_COND_FMIN:
> +case IFN_COND_FMAX:
> +case IFN_COND_AND:
> +case IFN_COND_IOR:
> +case IFN_COND_XOR:
> +case IFN_COND_SHL:
> +case IFN_COND_SHR:
> +case IFN_COND_LEN_ADD:
> +case IFN_COND_LEN_SUB:
> +case IFN_COND_LEN_MUL:
> +case IFN_COND_LEN_DIV:
> +case IFN_COND_LEN_MOD:
> +case IFN_COND_LEN_MIN:
> +case IFN_COND_LEN_MAX:
> +case IFN_COND_LEN_FMIN:
> +case IFN_COND_LEN_FMAX:
> +case IFN_COND_LEN_AND:
> +case IFN_COND_LEN_IOR:
> +

[pushed] pretty-print: gracefully handle null URLs

2023-10-31 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5047-gb9e2088d297744.

gcc/ChangeLog:
* pretty-print.cc (pretty_printer::pretty_printer): Initialize
m_skipping_null_url.
(pp_begin_url): Handle URL being null.
(pp_end_url): Likewise.
(selftest::test_null_urls): New.
(selftest::pretty_print_cc_tests): Call it.
* pretty-print.h (pretty_printer::m_skipping_null_url): New.
---
 gcc/pretty-print.cc | 57 +++--
 gcc/pretty-print.h  |  4 
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 75446cc73a1..80780cfd7b8 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -1664,7 +1664,8 @@ pretty_printer::pretty_printer (int maximum_length)
 need_newline (),
 translate_identifiers (true),
 show_color (),
-url_format (URL_FORMAT_NONE)
+url_format (URL_FORMAT_NONE),
+m_skipping_null_url (false)
 {
   pp_line_cutoff (this) = maximum_length;
   /* By default, we emit prefixes once per message.  */
@@ -1687,7 +1688,8 @@ pretty_printer::pretty_printer (const pretty_printer 
&other)
   need_newline (other.need_newline),
   translate_identifiers (other.translate_identifiers),
   show_color (other.show_color),
-  url_format (other.url_format)
+  url_format (other.url_format),
+  m_skipping_null_url (false)
 {
   pp_line_cutoff (this) = maximum_length;
   /* By default, we emit prefixes once per message.  */
@@ -2211,6 +2213,13 @@ identifier_to_locale (const char *ident)
 void
 pp_begin_url (pretty_printer *pp, const char *url)
 {
+  if (!url)
+{
+  /* Handle null URL by skipping all output here,
+and in the next pp_end_url.  */
+  pp->m_skipping_null_url = true;
+  return;
+}
   switch (pp->url_format)
 {
 case URL_FORMAT_NONE:
@@ -2254,6 +2263,13 @@ get_end_url_string (pretty_printer *pp)
 void
 pp_end_url (pretty_printer *pp)
 {
+  if (pp->m_skipping_null_url)
+{
+  /* We gracefully handle pp_begin_url (NULL) by omitting output for
+both begin and end.  Here we handle the latter.  */
+  pp->m_skipping_null_url = false;
+  return;
+}
   if (pp->url_format != URL_FORMAT_NONE)
 pp_string (pp, get_end_url_string (pp));
 }
@@ -2588,6 +2604,42 @@ test_urls ()
   }
 }
 
+/* Verify that we gracefully reject null URLs.  */
+
+void
+test_null_urls ()
+{
+  {
+pretty_printer pp;
+pp.url_format = URL_FORMAT_NONE;
+pp_begin_url (&pp, nullptr);
+pp_string (&pp, "This isn't a link");
+pp_end_url (&pp);
+ASSERT_STREQ ("This isn't a link",
+ pp_formatted_text (&pp));
+  }
+
+  {
+pretty_printer pp;
+pp.url_format = URL_FORMAT_ST;
+pp_begin_url (&pp, nullptr);
+pp_string (&pp, "This isn't a link");
+pp_end_url (&pp);
+ASSERT_STREQ ("This isn't a link",
+ pp_formatted_text (&pp));
+  }
+
+  {
+pretty_printer pp;
+pp.url_format = URL_FORMAT_BEL;
+pp_begin_url (&pp, nullptr);
+pp_string (&pp, "This isn't a link");
+pp_end_url (&pp);
+ASSERT_STREQ ("This isn't a link",
+ pp_formatted_text (&pp));
+  }
+}
+
 /* Test multibyte awareness.  */
 static void test_utf8 ()
 {
@@ -2637,6 +2689,7 @@ pretty_print_cc_tests ()
   test_pp_format ();
   test_prefixes_and_wrapping ();
   test_urls ();
+  test_null_urls ();
   test_utf8 ();
 }
 
diff --git a/gcc/pretty-print.h b/gcc/pretty-print.h
index 02658c8afad..8759f0def38 100644
--- a/gcc/pretty-print.h
+++ b/gcc/pretty-print.h
@@ -295,6 +295,10 @@ public:
 
   /* Whether URLs should be emitted, and which terminator to use.  */
   diagnostic_url_format url_format;
+
+  /* If true, then we've had a pp_begin_url (nullptr), and so the
+ next pp_end_url should be a no-op.  */
+  bool m_skipping_null_url;
 };
 
 inline const char *
-- 
2.26.3



[pushed] opts.cc: fix comment about DOCUMENTATION_ROOT_URL

2023-10-31 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5048-g8b4ac021cd1f63.

gcc/ChangeLog:
* opts.cc (get_option_url): Update comment; the requirement to
pass DOCUMENTATION_ROOT_URL's value via -D was removed in
r10-8065-ge33a1eae25b8a8.
---
 gcc/opts.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/opts.cc b/gcc/opts.cc
index 8015cb7556a..f54cf8305ca 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -3679,9 +3679,9 @@ char *
 get_option_url (diagnostic_context *, int option_index)
 {
   if (option_index)
-return concat (/* DOCUMENTATION_ROOT_URL should be supplied via -D by
- the Makefile (see --with-documentation-root-url), and
- should have a trailing slash.  */
+return concat (/* DOCUMENTATION_ROOT_URL should be supplied via
+ #include "config.h" (see --with-documentation-root-url),
+ and should have a trailing slash.  */
   DOCUMENTATION_ROOT_URL,
 
   /* get_option_html_page will return something like
-- 
2.26.3



[pushed] libcpp: eliminate MACRO_MAP_EXPANSION_POINT_LOCATION

2023-10-31 Thread David Malcolm
This patch eliminates the function "MACRO_MAP_EXPANSION_POINT_LOCATION"
(which hasn't been a macro since r6-739-g0501dbd932a7e9) in favor of
a new line_map_macro::get_expansion_point_location accessor.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5049-gb0f19336f247c6.

gcc/c-family/ChangeLog:
* c-warn.cc (warn_for_multistatement_macros): Update for removal
of MACRO_MAP_EXPANSION_POINT_LOCATION.

gcc/cp/ChangeLog:
* module.cc (ordinary_loc_of): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
(module_state::note_location): Update for renaming of field.
(module_state::write_macro_maps): Likewise.

gcc/ChangeLog:
* input.cc (dump_location_info): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc):
Likewise.

libcpp/ChangeLog:
* include/line-map.h
(line_map_macro::get_expansion_point_location): New accessor.
(line_map_macro::expansion): Rename field to...
(line_map_macro::m_expansion): ...this.
(MACRO_MAP_EXPANSION_POINT_LOCATION): Delete this function.
* line-map.cc (linemap_enter_macro): Update for renaming of field.
(linemap_macro_map_loc_to_exp_point): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
---
 gcc/c-family/c-warn.cc|  2 +-
 gcc/cp/module.cc  |  6 +++---
 gcc/input.cc  |  4 ++--
 gcc/tree-diagnostic.cc|  2 +-
 libcpp/include/line-map.h | 19 ++-
 libcpp/line-map.cc|  4 ++--
 6 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index 9ab83a9a84a..bc889cee6b9 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -2951,7 +2951,7 @@ warn_for_multistatement_macros (location_t body_loc, 
location_t next_loc,
   while (linemap_macro_expansion_map_p (guard_map))
 {
   const line_map_macro *mm = linemap_check_macro (guard_map);
-  guard_loc_exp = MACRO_MAP_EXPANSION_POINT_LOCATION (mm);
+  guard_loc_exp = mm->get_expansion_point_location ();
   guard_map = linemap_lookup (line_table, guard_loc_exp);
   if (guard_map == body_map)
return;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 539518d7923..c1c8c226bc1 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13937,7 +13937,7 @@ ordinary_loc_of (line_maps *lmaps, location_t from)
  /* Find the ordinary location nearest FROM.  */
  const line_map *map = linemap_lookup (lmaps, from);
  const line_map_macro *mac_map = linemap_check_macro (map);
- from = MACRO_MAP_EXPANSION_POINT_LOCATION (mac_map);
+ from = mac_map->get_expansion_point_location ();
}
 }
   return from;
@@ -15779,7 +15779,7 @@ module_state::note_location (location_t loc)
  slot->remap = 0;
  // Expansion locations could themselves be from a
  // macro, we need to note them all.
- note_location (mac_map->expansion);
+ note_location (mac_map->m_expansion);
  gcc_checking_assert (mac_map->n_tokens);
  location_t tloc = UNKNOWN_LOCATION;
  for (unsigned ix = mac_map->n_tokens * 2; ix--;)
@@ -16375,7 +16375,7 @@ module_state::write_macro_maps (elf_out *to, range_t 
&info, unsigned *crc_p)
   sec.u (iter->remap);
   sec.u (mac->n_tokens);
   sec.cpp_node (mac->macro);
-  write_location (sec, mac->expansion);
+  write_location (sec, mac->m_expansion);
   const location_t *locs = mac->macro_locations;
   /* There are lots of identical runs.  */
   location_t prev = UNKNOWN_LOCATION;
diff --git a/gcc/input.cc b/gcc/input.cc
index fd09fccb0e3..6256d81f531 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -1530,9 +1530,9 @@ dump_location_info (FILE *stream)
   map->start_location,
   (map->start_location
+ MACRO_MAP_NUM_MACRO_TOKENS (map)));
-  inform (MACRO_MAP_EXPANSION_POINT_LOCATION (map),
+  inform (map->get_expansion_point_location (),
  "expansion point is location %i",
- MACRO_MAP_EXPANSION_POINT_LOCATION (map));
+ map->get_expansion_point_location ());
   fprintf (stream, "  map->start_location: %u\n",
   map->start_location);
 
diff --git a/gcc/tree-diagnostic.cc b/gcc/tree-diagnostic.cc
index a600f0e9f64..cae400cf372 100644
--- a/gcc/tree-diagnostic.cc
+++ b/gcc/tree-diagnostic.cc
@@ -217,7 +217,7 @@ maybe_unwind_expanded_macro_loc (diagnostic_context 
*context,
This is the locus 2/ of the earlier comment.  */
 location_t resolved_exp_loc =
   linemap_resolve_location (line_table,
-MACRO_MAP_EXPANSION_POINT_LOCATION 
(iter->map),
+

[pushed] analyzer: move class record_layout to its own .h/.cc

2023-10-31 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5050-g37e1634ef1a6f1.

gcc/ChangeLog:
* Makefile.in (ANALYZER_OBJS): Add analyzer/record-layout.o.

gcc/analyzer/ChangeLog:
* record-layout.cc: New file, based on material in region-model.cc.
* record-layout.h: Likewise.
* region-model.cc: Include "analyzer/record-layout.h".
(class record_layout): Move to record-layout.cc and .h
---
 gcc/Makefile.in   |   1 +
 gcc/analyzer/record-layout.cc | 125 
 gcc/analyzer/record-layout.h  |  91 +++
 gcc/analyzer/region-model.cc  | 132 +-
 4 files changed, 218 insertions(+), 131 deletions(-)
 create mode 100644 gcc/analyzer/record-layout.cc
 create mode 100644 gcc/analyzer/record-layout.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 91d6bfbea4d..41ed8163cd8 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1324,6 +1324,7 @@ ANALYZER_OBJS = \
analyzer/program-point.o \
analyzer/program-state.o \
analyzer/ranges.o \
+   analyzer/record-layout.o \
analyzer/region.o \
analyzer/region-model.o \
analyzer/region-model-asm.o \
diff --git a/gcc/analyzer/record-layout.cc b/gcc/analyzer/record-layout.cc
new file mode 100644
index 000..1369bfb5eff
--- /dev/null
+++ b/gcc/analyzer/record-layout.cc
@@ -0,0 +1,125 @@
+/* Implementation of class record_layout.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#define INCLUDE_MEMORY
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "diagnostic.h"
+#include "tree-diagnostic.h"
+#include "analyzer/analyzer.h"
+#include "analyzer/record-layout.h"
+
+#if ENABLE_ANALYZER
+
+namespace ana {
+
+/* class record_layout.  */
+
+record_layout::record_layout (tree record_type)
+{
+  gcc_assert (TREE_CODE (record_type) == RECORD_TYPE);
+
+  for (tree iter = TYPE_FIELDS (record_type); iter != NULL_TREE;
+   iter = DECL_CHAIN (iter))
+{
+  if (TREE_CODE (iter) == FIELD_DECL)
+   {
+ int iter_field_offset = int_bit_position (iter);
+ bit_size_t size_in_bits;
+ if (!int_size_in_bits (TREE_TYPE (iter), &size_in_bits))
+   size_in_bits = 0;
+
+ maybe_pad_to (iter_field_offset);
+
+ /* Add field.  */
+ m_items.safe_push (item (bit_range (iter_field_offset,
+ size_in_bits),
+  iter, false));
+   }
+}
+
+  /* Add any trailing padding.  */
+  bit_size_t size_in_bits;
+  if (int_size_in_bits (record_type, &size_in_bits))
+maybe_pad_to (size_in_bits);
+}
+
+void
+record_layout::dump_to_pp (pretty_printer *pp) const
+{
+  unsigned i;
+  item *it;
+  FOR_EACH_VEC_ELT (m_items, i, it)
+{
+  it->dump_to_pp (pp);
+  pp_newline (pp);
+}
+}
+
+void
+record_layout::dump () const
+{
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp.buffer->stream = stderr;
+  dump_to_pp (&pp);
+  pp_flush (&pp);
+}
+
+const record_layout::item *
+record_layout::get_item_at (bit_offset_t offset) const
+{
+  unsigned i;
+  item *it;
+  FOR_EACH_VEC_ELT (m_items, i, it)
+if (it->contains_p (offset))
+  return it;
+  return NULL;
+}
+
+/* Subroutine of ctor.  Add padding item to NEXT_OFFSET if necessary.  */
+
+void
+record_layout::maybe_pad_to (bit_offset_t next_offset)
+{
+  if (m_items.length () > 0)
+{
+  const item &last_item = m_items[m_items.length () - 1];
+  bit_offset_t offset_after_last_item
+   = last_item.get_next_bit_offset ();
+  if (next_offset > offset_after_last_item)
+   {
+ bit_size_t padding_size
+   = next_offset - offset_after_last_item;
+ m_items.safe_push (item (bit_range (offset_after_last_item,
+ padding_size),
+  last_item.m_field, true));
+   }
+}
+}
+
+} // namespace ana
+
+#endif /* #if ENABLE_ANALYZER  */
diff --git a/gcc/analyzer/record-layout.h b/gcc/analyzer/record-layout.h
new file mode 100644
index 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-31 Thread Robin Dapp
>> +int
>> +internal_fn_else_index (internal_fn fn)
> 
> The function needs a comment, maybe:
> 
> /* If FN is an IFN_COND_* or IFN_COND_LEN_* function, return the index of the
>argument that is used when the condition is false.  Return -1 otherwise.  
> */
> 
> OK for the internal-fn* and tree-if-conv.cc bits (which were the
> parts I commented on earlier).  I'll look at cleaning up the
> definition of conditional internal functions separately, so that
> the list of functions isn't necessary.

Thank you, added the comment (shouldn't have forgotten it in the
first place...).  So there's the vectorizer part left that is not
yet OK'd.  

Regards
 Robin


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-10-31 Thread Joseph Myers
On Tue, 31 Oct 2023, Qing Zhao wrote:

> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
>   struct A
>   {
>size_t size;
>char buf[] __attribute__((counted_by(size)));
>   };
> 
> for any object with such type:
> 
>   struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The setting to the size field should be done before the first reference 
> to the FAM field.
> 
> Such requirement to the user will guarantee that the first reference to 
> the FAM knows the size of the FAM.
> 
> We need to add this additional requirement to the user document.

Make sure the manual is very specific about exactly when size is 
considered to be an accurate representation of the space available for buf 
(given that, after malloc or realloc, it's going to be temporarily 
inaccurate).  If the intent is that inaccurate size at such a time means 
undefined behavior, say so explicitly.

> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small example,
>   check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>   if YES, replace the reference to "obj->buf" with a call to
>   .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 

This seems plausible - but you should also consider the case of static 
initializers - remember the GNU extension for statically allocated objects 
with flexible array members (unless you're not allowing it with 
counted_by).

static struct A x = { sizeof "hello", "hello" };
static char *y = &x.buf;

I'd expect that to be valid - and unless you say such a usage is invalid, 
you should avoid the replacement in such a static initializer context when 
the FAM reference is to an object with a constant address (if 
.ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
expression; if it works fine as a constant-address lvalue, then the 
replacement would be OK).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING] [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-10-31 Thread Joseph Myers
On Tue, 31 Oct 2023, Martin Uecker wrote:

> > + if (TREE_CODE (arg) == INTEGER_CST
> > + && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))

What if TYPE_SIZE_UNIT (ttl) is not an INTEGER_CST?  I don't see any tests 
of the case of assigning to a pointer to a variably sized type.

-- 
Joseph S. Myers
jos...@codesourcery.com


[RFC] Make genautomata.cc output reflect insn-attr.h expectation:

2023-10-31 Thread Edwin Lu
genattr.cc currently generates insn-attr.h with the following structure:

#if CPU_UNITS_QUERY
extern int get_cpu_unit_code (const char *);
extern int cpu_unit_reservation_p (state_t, int);
#endif
extern bool insn_has_dfa_reservation_p (rtx_insn *);

however genautomata.cc generates insn-automata.cc with the following structure:
#if CPU_UNITS_QUERY
int get_cpu_unit_code (const char * ) { ... }
int cpu_unit_reservation_p (state_t, int) { ... }
bool insn_has_dfa_reservation_p (rtx_insn *) { ... }
#endif

I'm not sure if insn_has_dfa_reservation_p is supposed to be a part of the 
CPU_UNITS_QUERY conditional group or not. For consistency, I would like to 
move it outside of the group. 

This would move insn_has_dfa_reservation_p out of the #if CPU_UNITS_QUERY 
conditional inside of insn-automata.cc. This would allow us to see if the 
scheduler is trying to schedule an insn with a type which is not associated 
with a cpu unit or insn reservation through the TARGET_SCHED_VARIABLE_ISSUE 
hook.

If there is a reason for insn_has_dfa_reservation_p being within the 
conditional, please let me know!

gcc/Changelog:

* genautomata.cc (write_automata): Move #endif before the call to
output_insn_has_dfa_reservation_p.

Signed-off-by: Edwin Lu 
---
 gcc/genautomata.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genautomata.cc b/gcc/genautomata.cc
index 72f01686d6b..9dda25e5ba2 100644
--- a/gcc/genautomata.cc
+++ b/gcc/genautomata.cc
@@ -9503,9 +9503,9 @@ write_automata (void)
   fprintf (output_file, "\n#if %s\n\n", CPU_UNITS_QUERY_MACRO_NAME);
   output_get_cpu_unit_code_func ();
   output_cpu_unit_reservation_p ();
-  output_insn_has_dfa_reservation_p ();
   fprintf (output_file, "\n#endif /* #if %s */\n\n",
   CPU_UNITS_QUERY_MACRO_NAME);
+  output_insn_has_dfa_reservation_p ();
   output_dfa_clean_insn_cache_func ();
   output_dfa_start_func ();
   output_dfa_finish_func ();
-- 
2.34.1



Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Jeff Law




On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote a SI to DI in the libcall
case.

The fix is what the generic promote_mode () in explow.cc does.  I really
don't understand why the old code didn't work, but stepping through the
debugger shows that the old code didn't and the fixed one does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.
Hmm.  There may be dragons in here.  I'll need to find and review an old 
conversation in this space (libcalls and argument promotions).


Jeff


[PATCH v2] RISC-V: Enable ztso tests on rv32

2023-10-31 Thread Patrick O'Neill
This patch transitions the ztso testcases to use the testsuite infrastructure,
enabling the tests on both rv64 and rv32 targets.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Ztso extension to
dg-options for dg-do compile.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
* lib/target-supports.exp: Add testing infrastructure to require the
Ztso extension or add it to an existing -march.

Signed-off-by: Patrick O'Neill 
---
Before committing v1, I ran the full testsuite as a sanity check and found
failures that don't happen when running the testcases individually.  v2
resolves those failures with common-sense fixes.

Changelog:
v1 -> v2:
target-supports.exp
 - Fix typo `riscv_ext_a` -> `riscv_ext_ztso`
 - Add ztso to `check_effective_target_riscv_zvfh_ok`
---
 .../riscv/amo-table-ztso-amo-add-1.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-2.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-3.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-4.c  |  3 ++-
 .../riscv/amo-table-ztso-amo-add-5.c  |  3 ++-
 .../riscv/amo-table-ztso-compare-exchange-1.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-2.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-3.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-4.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-5.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-6.c |  2 +-
 .../riscv/amo-table-ztso-compare-exchange-7.c |  2 +-
 .../gcc.target/riscv/amo-table-ztso-fence-1.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-2.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-3.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-4.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-fence-5.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-load-1.c  |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-load-2.c  |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-load-3.c  |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-store-1.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-store-2.c |  3 ++-
 .../gcc.target/riscv/amo-table-ztso-store-3.c |  3 ++-
 .../riscv/amo-table-ztso-subword-amo-add-1.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-2.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-3.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-4.c  |  2 +-
 .../riscv/amo-table-ztso-subword-amo-add-5.c  |  2 +-
 gcc/testsuite/lib/target-supports.exp | 25 ++-
 29 files changed, 68 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
index a88d08eb3f4..65a4351025d 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
-/* { dg-options "-march=rv64id_ztso -mabi=lp64d -O3" } */
+/* { dg-options "-O3" } */
+/* { dg-add-options riscv_ztso } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c 
b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
index ebd24

RE: [PATCH v4] VECT: Refine the type size restriction of call vectorizer

2023-10-31 Thread Li, Pan2
The below tests passed for this patch.

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression tests.
* Ensure the lrintf standard name in RVV.

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, October 31, 2023 11:10 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Li, Pan2 ; Wang, Yanzhang 
; kito.ch...@gmail.com; Liu, Hongtao 
; richard.guent...@gmail.com
Subject: [PATCH v4] VECT: Refine the type size restriction of call vectorizer

From: Pan Li 

Update in v4:

* Append the check to vectorizable_internal_function.

Update in v3:

* Add a func to predicate whether the type size is legal for the vectorizer call.

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectorizable_call has one restriction on the size of the data type:
DF to DI is allowed but SF to DI isn't. You may see the below message
when trying to vectorize a function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

Then a standard name pattern like lrintmn2 cannot work for different
data type sizes like SF => DI. This patch refines this data
type size check and unblocks standard names like lrintmn2 under such conditions.

The type size of vectype_out needs to be exactly the same as the type
size of vectype_in when the vectype_out size isn't participating in
the optab selection, while there is no such restriction when
vectype_out is part of the optab query.
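As a rough sketch of the new rule, the decision can be modeled like this (a simplified, illustrative model with invented names and bit sizes, not the actual GCC code):

```c
#include <stdbool.h>

/* Hypothetical simplified model of the refined check: sizes are in
   bits, and out_in_optab_query says whether vectype_out is one of the
   two types used to query the optab.  */
static bool
call_vectorizable_p (int size_in_bits, int size_out_bits,
                     bool out_in_optab_query)
{
  /* When vectype_out does not participate in the optab selection,
     its size must exactly match vectype_in.  */
  if (!out_in_optab_query && size_in_bits != size_out_bits)
    return false;
  /* Otherwise (e.g. an SF -> DI lrint), no size restriction here.  */
  return true;
}
```

So a DF => DI lrint keeps working as before, while an SF => DI lrint is no longer rejected up front as long as the optab query itself covers vectype_out.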

The below tests passed for this patch.

* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.

The below tests are ongoing.

* The x86 bootstrap and regression test.
* The aarch64 regression test.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_internal_function): Add type
size check for vectype_out doesn't participating for optab query.
(vectorizable_call): Remove the type size check.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-stmts.cc | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..799b4ab10c7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree 
fndecl,
   const direct_internal_fn_info &info = direct_internal_fn (ifn);
   if (info.vectorizable)
{
+ bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+
+ /* The type size of both the vectype_in and vectype_out should be
+exactly the same when vectype_out isn't participating the optab.
+While there is no restriction for type size when vectype_out
+is part of the optab query.  */
+ if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
+   return IFN_LAST;
+
  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
  OPTIMIZE_FOR_SPEED))
return ifn;
@@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
 
   return false;
 }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
- just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
- are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
- by a pack of the two vectors into an SI vector.  We would need
- separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"mismatched vector sizes %T and %T\n",
-vectype_in, vectype_out);
-  return false;
-}
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
   != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.34.1



Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Palmer Dabbelt

On Tue, 31 Oct 2023 16:18:35 PDT (-0700), jeffreya...@gmail.com wrote:



On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote an SI to DI in the libcalls
case.

The fix is what the generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping through the
debugger shows the old code didn't and the fixed one does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.

Hmm.  There may be dragons in here.  I'll need to find and review an old
conversation in this space (libcalls and argument promotions).


We also have a non-orthogonality in the ABI sign extension rules between 
SI and DI, a few of us were talking about it on the internal slack 
(though the specifics were for a different patch, Vineet has a few in 
flight).


Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-31 Thread Vineet Gupta




On 10/30/23 16:21, Vineet Gupta wrote:
I don't guess you have data on how this impacts dynamic instruction 
counts on anything significant do you?


No, haven't run it yet. I can fire one though. I doubt if this is as 
significant as the prev one, even if this is the right thing to do. 


Very, very small improvement overall:

49,318,030,233,258 (w/o patch)
49,318,000,693,233 (w/ patch)

i.e. about 29.5 million fewer dynamic instructions (~0.00006%).






Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-31 Thread Vineet Gupta

On 10/30/23 13:33, Jeff Law wrote:



+/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0
+   just peel off the SUBREG to get DI, avoiding extraneous 
extension.  */

+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode


So I may have been partially wrong about v2 patch being wrong and 
needing this fixup ;-) [1]


It seems we don't have to limit this to SImode. I re-read the calling 
convention doc [2] and it says this


"When passed in registers or on the stack, integer scalars narrower than XLEN
 bits are widened according to the sign of their type up to 32 bits, then
 sign-extended to XLEN bits."

This essentially means signed short and signed char will already be
sign-extended at the caller site and need not be extended in the callee: Palmer
mentioned on the internal slack that an unadorned char is unsigned on RISC-V,
hence we don't see the compiler do extra work for, say,
gcc.dg/torture/pr75964.c. If the test is tweaked to use signed
chars (or shorts), however, it seems the caller is doing the work (adjusting the
constant being passed to be a sign-extended variant).
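A minimal C illustration of that caller-side widening (illustrative only, not from the patch; the function name is made up):

```c
/* Per the psABI wording above, a signed char argument is widened
   according to its sign by the *caller*, so on RISC-V the callee
   below needs no extra sext instruction of its own.  The C-level
   result is of course target-independent.  */
long
widen_signed_char (signed char c)
{
  return (long) c;   /* c already arrives sign-extended on RISC-V.  */
}
```

e.g. widen_signed_char (-5) must yield -5 on any target; the interesting part is only which side of the call emits the extension.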


This further validates Jeff's comment about checking for 
SUBREG_PROMOTED_SIGNED_P (it was anyhow the right thing to begin with 
anyways).


At this point I feel like I'm into splitting hairs (in vain) territory, 
as fixing this might not matter much in practice 


I'd suppose we go ahead with the v3 with changes Jeff asked for and 
maybe do a later fixup to relax SI.



+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+ == GET_MODE_SIZE (word_mode))
+    *op = XEXP (*op, 0);
+  else
+    *op = gen_rtx_SIGN_EXTEND (word_mode, *op);
So for the wrapped test GET_MODE_SIZE stuff), add parenthesis and 
indent the "==" clause.  ie


  && (GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
      == GET_MODE_SIZE (word_mode))

Don't you also need to verify that the subreg was sign extended? The 
PROMOTED_VAR_P just notes that it was promoted, not *how* it was 
promoted.  I think you just need to add a test like this:


  && SUBREG_PROMOTED_SIGNED_P (*op)


[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634327.html
[2] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc 






Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-31 Thread Jeff Law




On 10/31/23 18:05, Vineet Gupta wrote:

On 10/30/23 13:33, Jeff Law wrote:



+/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0
+   just peel off the SUBREG to get DI, avoiding extraneous 
extension.  */

+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode


So I may have been partially wrong about v2 patch being wrong and 
needing this fixup ;-) [1]


It seems we don't have to limit this to SImode. I re-read the calling 
convention doc [2] and it says this


"When passed in registers or on the stack, integer scalars narrower than XLEN
 bits are widened according to the sign of their type up to 32 bits, then
 sign-extended to XLEN bits."

This essentially means signed short and signed char will already be
sign-extended at the caller site and need not be extended in the callee: Palmer
mentioned on the internal slack that an unadorned char is unsigned on RISC-V,
hence we don't see the compiler do extra work for, say,
gcc.dg/torture/pr75964.c. If the test is tweaked to use signed
chars (or shorts), however, it seems the caller is doing the work (adjusting the
constant being passed to be a sign-extended variant).


This further validates Jeff's comment about checking for 
SUBREG_PROMOTED_SIGNED_P (it was anyhow the right thing to begin with 
anyways).


At this point I feel like I'm into splitting hairs (in vain) territory, 
as fixing this might not matter much in practice ...


I'd suppose we go ahead with the v3 with changes Jeff asked for and 
maybe do a later fixup to relax SI.
Consider any such fixup pre-approved.  I was thinking that the 8/16 bit 
sub-objects should probably be extended one way or another.  It wouldn't 
make much sense not to.


Not as much of a win as I'd hoped.  But I'll take it

jeff


[Committed] NFC: Fix whitespace

2023-10-31 Thread Juzhe-Zhong
Notice there is a whitespace issue in the previous commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f66b2fc122b8a17591afbb881d580b32e8ddb708

Sorry for missing fixing this whitespace.

Committed as it is obvious.

gcc/ChangeLog:

* tree-vect-slp.cc (vect_build_slp_tree_1): Fix whitespace.

---
 gcc/tree-vect-slp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b78133f204f..43d742e3c92 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1427,7 +1427,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
  && rhs_code != CFN_GATHER_LOAD
  && rhs_code != CFN_MASK_GATHER_LOAD
-   && rhs_code != CFN_MASK_LEN_GATHER_LOAD
+ && rhs_code != CFN_MASK_LEN_GATHER_LOAD
  && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)
  /* Not grouped loads are handled as externals for BB
 vectorization.  For loop vectorization we can handle
-- 
2.36.3



Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Jeff Law




On 10/31/23 17:41, Palmer Dabbelt wrote:

On Tue, 31 Oct 2023 16:18:35 PDT (-0700), jeffreya...@gmail.com wrote:



On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote an SI to DI in the libcalls
case.

The fix is what the generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping through the
debugger shows the old code didn't and the fixed one does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code 
path.


[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.

Hmm.  There may be dragons in here.  I'll need to find and review an old
conversation in this space (libcalls and argument promotions).


We also have a non-orthogonality in the ABI sign extension rules between 
SI and DI, a few of us were talking about it on the internal slack 
(though the specifics were for a different patch, Vineet has a few in 
flight).
So the old issue I was thinking of really only affects targets that push 
arguments on the stack and when a sub-word push actually allocates a 
full word on the stack (m68k, but !coldfire, h8 and probably others of 
that era).


Point being, those issues don't apply here.

jeff


Re: [PATCH 0/4] Fix no-evex512 function attribute

2023-10-31 Thread Hongtao Liu
On Tue, Oct 31, 2023 at 2:39 PM Haochen Jiang  wrote:
>
> Hi all,
>
> These four patches are going to fix the no-evex512 function attribute. The
> detail of the issue is as follows:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889
>
> My proposal for this problem is to also push "no-evex512" when defining
> 128/256 intrins in AVX512.
>
> Besides, I added some new intrins to support the current AVX512 intrins.
> The newly added  _mm{,256}_avx512* intrins are duplicated from their
> _mm{,256}_* forms from AVX2 or before. We need to add them to prevent target
> option mismatch when calling AVX512 intrins implemented with these intrins
> under no-evex512 function attribute. All AVX512 intrins calling those AVX2
> intrins or before will change their calls to these newly added AVX512 version.
>
> This will solve the problem when we are using no-evex512 attribute with
> AVX512 related intrins. But it will not solve target option mismatch when we
> are calling AVX2 intrins or before with no-evex512 function attribute since as
> mentioned in PR111889, it actually comes from a legacy issue. Therefore, we
> are not expecting that usage.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok, but please wait for 2 more days in case other folks have any comments.
>
> Thx,
> Haochen
>
>


-- 
BR,
Hongtao


Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Vineet Gupta




On 10/31/23 17:51, Jeff Law wrote:




We also have a non-orthogonality in the ABI sign extension rules 
between SI and DI, a few of us were talking about it on the internal 
slack (though the specifics were for a different patch, Vineet has a 
few in flight).
So the old issue I was thinking of really only affects targets that 
push arguments on the stack and when a sub-word push actually 
allocates a full word on the stack (m68k, but !coldfire, h8 and 
probably others of that era).


Point being, those issues don't apply here.


OK, I think Palmer was conflating this with the discussion in other 
thread/patch.


-Vineet


[Commit Pending V2] RISC-V: Support strided load/store

2023-10-31 Thread Juzhe-Zhong
This patch depends on middle-end patches which are under review.

I will commit it after the middle-end patches are approved.

Consider this following case:
void foo (int * __restrict a, int * __restrict b, int stride, int n)
{
for (int i = 0; i < n; i++)
  a[i*stride] = b[i*stride] + 100;
}

Before this patch:

sllia6,a2,2
vid.v   v1
vmul.vx v1,v1,a2
vsetvli zero,zero,e64,m2,ta,ma
vsext.vf2   v4,v1
vsll.vi v4,v4,2
.L4:
vsetvli a5,a3,e32,m1,ta,ma
mul a4,a6,a5
vluxei64.v  v1,(a1),v4
sub a3,a3,a5
vadd.vv v1,v1,v2
vsuxei64.v  v1,(a0),v4
add a1,a1,a4
add a0,a0,a4
bne a3,zero,.L4
ret

After this patch:

sllia6,a2,2
mv  a4,a6
.L4:
vsetvli a5,a3,e32,m1,ta,ma
mul a2,a6,a5
vlse32.vv1,0(a1),a4
sub a3,a3,a5
vadd.vv v1,v1,v2
vsse32.vv1,0(a0),a4
add a1,a1,a2
add a0,a0,a2
bne a3,zero,.L4
ret

gcc/ChangeLog:

* config/riscv/autovec.md (mask_len_strided_load): 
New pattern.
(mask_len_strided_store): Ditto.
* config/riscv/predicates.md (vector_stride_extension_operand): New 
predicate.
* config/riscv/riscv-protos.h (expand_strided_load_store): New function.
* config/riscv/riscv-v.cc (expand_strided_load_store): Ditto.
* config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: Adapt 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c: 
New test.
* 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c: 
New test.
* 
gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c: New 
test.

---
 gcc/config/riscv/autovec.md   | 34 +++
 gcc/config/riscv/predicates.md|  9 ++
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 76 +++
 gcc/config/riscv/vector-iterators.md  |  5 +
 .../gather-scatter/mask_strided_load-1.c  | 47 +
 .../gather-scatter/mask_strided_load_run-1.c  | 97 +++
 .../gather-scatter/mask_strided_store-1.c | 48 +
 .../gather-scatter/mask_strided_store_run-1.c | 89 +
 .../autovec/gather-scatter/strided_load-1.c   |  2 +-
 .../autovec/gather-scatter/strided_load-2.c   |  2 +-
 .../autovec/gather-scatter/strided_load-3.c   | 45 +
 .../gather-scatter/strided_load_run-3.c   | 84 
 .../autovec/gather-scatter/strided_store-1.c  |  2 +-
 .../autovec/gather-scatter/strided_store-2.c  |  2 +-
 .../autovec/gather-scatter/strided_store-3.c  | 45 +
 16 files changed, 584 insertions(+), 4 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f5e3e347ace..3e4493c42cc 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -272,6 +272,40 @@
   DONE;
 })
 
+;; =
+;; == Strided Load/Store
+;; =
+
+(define_expand "mask_len_strided_load"
+  [(match_operand:V 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:ANYI 2 "register_operand")
+   (match_operand 3 "")
+   (match_operand 4 "")
+   (match_operand: 5 "vector_mask_operand")
+   (match_operand 6 "autovec_length_operand")
+   (match_operand 7 "const_0_operand")]
+  "TARG

[PATCH] RISC-V: Use riscv_subword_address for atomic_test_and_set

2023-10-31 Thread Patrick O'Neill
Other subword atomic patterns use riscv_subword_address to calculate
the aligned address, shift amount, mask and !mask. atomic_test_and_set
was implemented before the common function was added. After this patch
all subword atomic patterns use riscv_subword_address.
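For reference, the arithmetic that riscv_subword_address encapsulates amounts to the following sketch (illustrative C with invented names, not the GCC internals):

```c
#include <stdint.h>

/* Split a byte address into the enclosing 4-byte-aligned word
   address, the bit shift of that byte within the word, and the
   mask/inverted mask covering the QImode subword.  */
static void
subword_address (uintptr_t addr, uintptr_t *aligned, unsigned *shift,
                 uint32_t *mask, uint32_t *not_mask)
{
  *aligned = addr & ~(uintptr_t) 3;      /* like: andi aN, aN, -4 */
  *shift = (unsigned) (addr & 3) * 8;    /* like: andi + slliw    */
  *mask = (uint32_t) 0xff << *shift;     /* byte mask within word */
  *not_mask = ~*mask;
}
```

For addr = 0x1003 this gives aligned = 0x1000, shift = 24 and mask = 0xff000000, matching the andi/slliw prologue visible in the generated code below.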

gcc/ChangeLog:

* config/riscv/sync.md:  Use riscv_subword_address function to
calculate the address and shift in atomic_test_and_set.

Signed-off-by: Patrick O'Neill 
---
Tested using r14-5040-g5dc2ba333f8.

This patch causes this codegen to regress (adds a mv) but *only* on -O0.

extern void abort();

short x;

int main()
{
  if ( __atomic_test_and_set(&x, __ATOMIC_SEQ_CST))
abort();
}

Baseline:

main:
addisp,sp,-16
sd  ra,8(sp)
sd  s0,0(sp)
addis0,sp,16
lui a5,%hi(x)
addia5,a5,%lo(x)
andia4,a5,-4
andia5,a5,3
li  a3,1
slliw   a5,a5,3
sllwa2,a3,a5
amoor.w.aqrla3,a2,0(a4)
srlwa5,a3,a5
andia5,a5,0xff
beq a5,zero,.L2
callabort
.L2:
li  a5,0
mv  a0,a5
ld  ra,8(sp)
ld  s0,0(sp)
addisp,sp,16
jr  ra

After patch there is an additional mv:

main:
addisp,sp,-16
sd  ra,8(sp)
sd  s0,0(sp)
addis0,sp,16
lui a5,%hi(x)
addia5,a5,%lo(x)
andia3,a5,-4
andia5,a5,3
slliw   a5,a5,3
li  a4,1
sllwa2,a4,a5
amoor.w.aqrla4,a2,0(a3)
srawa4,a4,a5
>   mv  a5,a4
andia5,a5,0xff
beq a5,zero,.L2
callabort
.L2:
li  a5,0
mv  a0,a5
ld  ra,8(sp)
ld  s0,0(sp)
addisp,sp,16
jr  ra

This can be fixed using:
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index ad4751febd2..a9539977321 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -530,10 +530,9 @@

   emit_insn (gen_atomic_fetch_orsi (old, aligned_mem, shifted_set, model));

-  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
-gen_lowpart (QImode, shift)));
-
-  emit_move_insn (operands[0], gen_lowpart (QImode, old));
+  emit_move_insn (gen_lowpart (SImode, operands[0]),
+ gen_rtx_ASHIFTRT (SImode, old,
+   gen_lowpart (QImode, shift)));

   DONE;
 })

But I think it hurts read/grokability of the .md sequence. If it's worth
changing for -O0 generated sequences, let me know and I'll send a follow
up patch.
---
 gcc/config/riscv/sync.md | 41 +---
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 6ff3493b5ce..ad4751febd2 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -504,43 +504,36 @@
(set (attr "length") (const_int 28))])

 (define_expand "atomic_test_and_set"
-  [(match_operand:QI 0 "register_operand" "") ;; bool output
+  [(match_operand:QI 0 "register_operand" "");; bool output
(match_operand:QI 1 "memory_operand" "+A");; memory
-   (match_operand:SI 2 "const_int_operand" "")]   ;; model
+   (match_operand:SI 2 "const_int_operand" "")]  ;; model
   "TARGET_ATOMIC"
 {
   /* We have no QImode atomics, so use the address LSBs to form a mask,
  then use an aligned SImode atomic.  */
-  rtx result = operands[0];
+  rtx old = gen_reg_rtx (SImode);
   rtx mem = operands[1];
   rtx model = operands[2];
-  rtx addr = force_reg (Pmode, XEXP (mem, 0));
-
-  rtx aligned_addr = gen_reg_rtx (Pmode);
-  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, GEN_INT (-4)));
+  rtx set = gen_reg_rtx (QImode);
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);

-  rtx aligned_mem = change_address (mem, SImode, aligned_addr);
-  set_mem_alias_set (aligned_mem, 0);
+  /* Unused.  */
+  rtx _mask = gen_reg_rtx (SImode);
+  rtx _not_mask = gen_reg_rtx (SImode);

-  rtx offset = gen_reg_rtx (SImode);
-  emit_move_insn (offset, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
-  GEN_INT (3)));
+  riscv_subword_address (mem, &aligned_mem, &shift, &_mask, &_not_mask);

-  rtx tmp = gen_reg_rtx (SImode);
-  emit_move_insn (tmp, GEN_INT (1));
+  emit_move_insn (set, GEN_INT (1));
+  rtx shifted_set = gen_reg_rtx (SImode);
+  riscv_lshift_subword (QImode, set, shift, &shifted_set);

-  rtx shmt = gen_reg_rtx (SImode);
-  emit_move_insn (shmt, gen_rtx_ASHIFT (SImode, offset, GEN_INT (3)));
+  emit_insn (gen_atomic_fetch_orsi (old, aligned_mem, shifted_set, model));

-  rtx word = gen_reg_rtx (SImode);
-  emit_move_insn (word, gen_rtx_ASHIFT (SImode, tmp,
-   gen_lowpart (QImode, shmt)));
+  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode,

Re: [PATCH] RISC-V: Support strided load/store

2023-10-31 Thread Patrick O'Neill

Hi Juzhe,

The pre-commit CI is seeing these new failures after applying this patch 
[1]:


FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 132
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 55
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 44
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 55


Example debug log:
Executing on host: /home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc -B/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/ /home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c -march=rv32gc_zba_zbb_zbc_zbs -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details -S -o strided_store-3.s (timeout = 600)
spawn -ignore SIGHUP /home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc -B/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/ /home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c -march=rv32gc_zba_zbb_zbc_zbs -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output -march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details -S -o strided_store-3.s
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c (test for excess errors)
gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c: pattern found 0 times
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 55
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c scan-tree-dump-not optimized " .SCATTER_STORE"
PASS: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c scan-tree-dump-not optimized " .MASK_SCATTER_STORE"


If these failures are due to the missing middle-end bits then feel free 
to ignore :-)


Thanks,
Patrick

[1]: 
https://github.com/ewlu/gcc-precommit-ci/issues/524#issuecomment-1787004837


On 10/31/23 13:26, Robin Dapp wrote:

Hi Juzhe,

LGTM once the middle-end parts are in.  Just tiny nits.
Nothing that would warrant a V2, though.


+;; =
+;; == Stried Load/Store

missing a 'd' here.
  

+(define_predicate "vector_stride_extension_operand"
+  (ior (and (match_operand 0 "immediate_operand")
+(match_test "Pmode == DImode"))
+   (and (match_operand 0 "const_0_operand")
+(match_test "Pmode == SImode"
+

This could use a comment why we allow only sign extension
for 32 bit.  Also the linter complains about spaces vs tabs.

Regards
  Robin


Re: Re: [PATCH] RISC-V: Support strided load/store

2023-10-31 Thread juzhe.zh...@rivai.ai
It is a new vectorization optimization which needs middle-end patches.

I believe you didn't apply the following 2 patches:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634812.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634813.html 




juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-11-01 13:23
To: Juzhe-Zhong
CC: kito.cheng; kito.cheng; jeffreyalaw; Robin Dapp; gcc-patches
Subject: Re: [PATCH] RISC-V: Support strided load/store
Hi Juzhe,

The pre-commit CI is seeing these new failures after applying this patch [1]:
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 132
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_LOAD" 55
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 66
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 44
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c 
scan-tree-dump-times optimized " .MASK_LEN_STRIDED_STORE" 55
If these failures are due to the missing middle-end bits then feel free to 
ignore :-)

Thanks,
Patrick

[1]: https://github.com/ewlu/gcc-precommit-ci/issues/524#issuecomment-1787004837
On 10/31/23 13:26, Robin Dapp wrote:
Hi Juzhe,
LGTM once the middle-end parts are in.  Just tiny nits.
Nothing that would warrant a V2, though.
+;; =
+;; == Stried Load/Store
missing a 'd' here.
 
+(define_predicate "vector_stride_extension_operand"
+  (ior (and (match_operand 0 "immediate_operand")
+            (match_test "Pmode == DImode"))
+       (and (match_operand 0 "const_0_operand")
+            (match_test "Pmode == SImode"))))
+
This could use a comment why we allow only sign extension
for 32 bit.  Also the linter complains about spaces vs tabs.
Regards
 Robin



[PATCH] RISC-V: Support vundefine intrinsics for tuple types

2023-10-31 Thread Li Xu
From: xuli 

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/288

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (vundefined): Add 
vundefine intrinsics for tuple types.
* config/riscv/riscv-vector-builtins.cc: Ditto.
* config/riscv/vector.md (@vundefined<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple_vundefined.c: New test.
---
 .../riscv/riscv-vector-builtins-functions.def |  1 +
 gcc/config/riscv/riscv-vector-builtins.cc |  8 ++
 gcc/config/riscv/vector.md|  7 ++
 .../riscv/rvv/base/tuple_vundefined.c | 73 +++
 4 files changed, 89 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple_vundefined.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 18ed2c2b8f6..911fd520195 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -622,6 +622,7 @@ DEF_RVV_FUNCTION (vget, vget, none_preds, 
all_v_vget_lmul4_x2_ops)
 DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops)
 DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_tuple_ops)
 DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_tuple_ops)
+DEF_RVV_FUNCTION (vundefined, vundefined, none_preds, all_none_void_tuple_ops)
 DEF_RVV_FUNCTION (vlseg, seg_loadstore, full_preds, 
tuple_v_scalar_const_ptr_ops)
 DEF_RVV_FUNCTION (vsseg, seg_loadstore, none_m_preds, tuple_v_scalar_ptr_ops)
 DEF_RVV_FUNCTION (vlsseg, seg_loadstore, full_preds, 
tuple_v_scalar_const_ptr_ptrdiff_ops)
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 5d4dc264fa6..2e33bf73549 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2509,6 +2509,14 @@ static CONSTEXPR const rvv_op_info 
all_v_vcreate_tuple_ops
  rvv_arg_type_info (RVV_BASE_vector), /* Return type */
  tuple_vcreate_args /* Args */};
 
+/* A static operand information for vector_type func () function registration.
+ */
+static CONSTEXPR const rvv_op_info all_none_void_tuple_ops
+  = {tuple_ops,  /* Types */
+ OP_TYPE_none,   /* Suffix */
+ rvv_arg_type_info (RVV_BASE_vector), /* Return type */
+ void_args /* Args */};
+
 /* A list of all RVV base function types.  */
 static CONSTEXPR const function_type_info function_types[] = {
 #define DEF_RVV_TYPE_INDEX(
\
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0297e4f0227..35bb6c3dc58 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -880,6 +880,13 @@
   ""
   [(set_attr "type" "vector")])
 
+(define_insn "@vundefined<mode>"
+  [(set (match_operand:VT 0 "register_operand" "=vr")
+   (unspec:VT [(reg:SI X0_REGNUM)] UNSPEC_VUNDEF))]
+  "TARGET_VECTOR"
+  ""
+  [(set_attr "type" "vector")])
+
(define_expand "@vreinterpret<mode>"
   [(set (match_operand:V 0 "register_operand")
(match_operand 1 "vector_any_register_operand"))]
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/tuple_vundefined.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/tuple_vundefined.c
new file mode 100644
index 000..174860de559
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/tuple_vundefined.c
@@ -0,0 +1,73 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+vfloat16mf4x2_t
+test_vundefined_f16mf4x2 ()
+{
+  return __riscv_vundefined_f16mf4x2 ();
+}
+
+vfloat32m1x3_t
+test_vundefined_f32m1x3 ()
+{
+  return __riscv_vundefined_f32m1x3 ();
+}
+
+vfloat64m1x5_t
+test_vundefined_f64m1x5 ()
+{
+  return __riscv_vundefined_f64m1x5 ();
+}
+
+vint8mf4x2_t
+test_vundefined_i8mf4x2 ()
+{
+  return __riscv_vundefined_i8mf4x2 ();
+}
+
+vint16mf4x8_t
+test_vundefined_i16mf4x8 ()
+{
+  return __riscv_vundefined_i16mf4x8 ();
+}
+
+vint32m1x7_t
+test_vundefined_i32m1x7 ()
+{
+  return __riscv_vundefined_i32m1x7 ();
+}
+
+vint64m1x4_t
+test_vundefined_i64m1x4 ()
+{
+  return __riscv_vundefined_i64m1x4 ();
+}
+
+vuint8mf8x2_t
+test_vundefined_u8mf8x2 ()
+{
+  return __riscv_vundefined_u8mf8x2 ();
+}
+
+vuint16mf4x4_t
+test_vundefined_u16mf4x4 ()
+{
+  return __riscv_vundefined_u16mf4x4 ();
+}
+
+vuint32m1x7_t
+test_vundefined_u32m1x7 ()
+{
+  return __riscv_vundefined_u32m1x7 ();
+}
+
+vuint64m4x2_t
+test_vundefined_u64m4x2 ()
+{
+  return __riscv_vundefined_u64m4x2 ();
+}
+
+/* { dg-final { scan-assembler-times {vse[0-9]+\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 
18 } } */
+/* { dg-final { scan-assembler-times 
{vs[0-9]+r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 28 } } */
-- 
2.17.1



Re: [PATCH] RISC-V: Support vundefine intrinsics for tuple types

2023-10-31 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-11-01 14:35
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Support vundefine intrinsics for tuple types
[quoted patch trimmed; see the original posting above]


[PATCH] RISC-V: Allow dest operand and accumulator operand overlap of widen reduction instruction [PR112327]

2023-10-31 Thread Juzhe-Zhong


Consider the following intrinsic code:

void rvv_dot_prod(int16_t *pSrcA, int16_t *pSrcB, uint32_t n, int64_t *result)
{
  size_t vl;
  vint16m4_t vSrcA, vSrcB;
  vint64m1_t vSum = __riscv_vmv_s_x_i64m1(0, 1);
  while (n > 0) {
    vl = __riscv_vsetvl_e16m4(n);
    vSrcA = __riscv_vle16_v_i16m4(pSrcA, vl);
    vSrcB = __riscv_vle16_v_i16m4(pSrcB, vl);
    vSum = __riscv_vwredsum_vs_i32m8_i64m1(__riscv_vwmul_vv_i32m8(vSrcA, vSrcB, vl),
                                           vSum, vl);
    pSrcA += vl;
    pSrcB += vl;
    n -= vl;
  }
  *result = __riscv_vmv_x_s_i64m1_i64(vSum);
}

https://godbolt.org/z/vWd35W7G6

Before this patch:

...
Loop:
...
vmv1r.v v2,v1
...
vwredsum.vs v1,v8,v2
...

After this patch:

...
Loop:
...
vwredsum.vs v1,v8,v1
...
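The mechanism, as I read the one-character change below: a GCC constraint string lists acceptable alternatives per operand, and the matching constraint "0" says this input may be allocated the same hard register as output operand 0. Adding it to the accumulator operand lets the register allocator tie the accumulator to the destination, so the vmv1r.v copy is no longer needed. A simplified fragment (illustrative only; the operand's mode is omitted here):

```lisp
;; "vr0" = any vector register, OR the register chosen for output
;; operand 0.  The latter is what allows vwredsum.vs to read and
;; write the same register (v1 in the example above).
(match_operand 4 "register_operand" "vr0, vr0")
```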

PR target/112327

gcc/ChangeLog:

* config/riscv/vector.md: Add '0'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112327-1.c: New test.
* gcc.target/riscv/rvv/base/pr112327-2.c: New test.

---
 gcc/config/riscv/vector.md|  4 +--
 .../gcc.target/riscv/rvv/base/pr112327-1.c| 27 +++
 .../gcc.target/riscv/rvv/base/pr112327-2.c| 27 +++
 3 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-2.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0297e4f0227..3577971fa33 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7765,7 +7765,7 @@
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
(unspec: [
 (match_operand:VI_QHS 3 "register_operand"  "   vr,   
vr")
-(match_operand:  4 "register_operand"  "   vr,   
vr")
+(match_operand:  4 "register_operand"  "  vr0,  
vr0")
] ANY_WREDUC)
   (match_operand:2 "vector_merge_operand"  "   vu,
0")] UNSPEC_REDUC))]
   "TARGET_VECTOR"
@@ -7834,7 +7834,7 @@
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
(unspec: [
 (match_operand:VF_HS  3 "register_operand"  "   vr,   
vr")
-(match_operand:  4 "register_operand"  "   vr,   
vr")
+(match_operand:  4 "register_operand"  "  vr0,  
vr0")
] ANY_FWREDUC_SUM)
   (match_operand:2 "vector_merge_operand"  "   vu,
0")] UNSPEC_REDUC))]
   "TARGET_VECTOR"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-1.c
new file mode 100644
index 000..20da23976f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+void
+foo (int16_t *pSrcA, int16_t *pSrcB, uint32_t n, int64_t *result)
+{
+  size_t vl;
+  vint16m4_t vSrcA, vSrcB;
+  vint64m1_t vSum = __riscv_vmv_s_x_i64m1 (0, 1);
+  while (n > 0)
+{
+  vl = __riscv_vsetvl_e16m4 (n);
+  vSrcA = __riscv_vle16_v_i16m4 (pSrcA, vl);
+  vSrcB = __riscv_vle16_v_i16m4 (pSrcB, vl);
+  vSum = __riscv_vwredsum_vs_i32m8_i64m1 (
+   __riscv_vwmul_vv_i32m8 (vSrcA, vSrcB, vl), vSum, vl);
+  pSrcA += vl;
+  pSrcB += vl;
+  n -= vl;
+}
+  *result = __riscv_vmv_x_s_i64m1_i64 (vSum);
+}
+
+/* { dg-final { scan-assembler-not {vmv1r} } } */
+/* { dg-final { scan-assembler-not {vmv\.v\.v} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-2.c
new file mode 100644
index 000..5ffde000fbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112327-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zfh -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+void
+foo (_Float16 *pSrcA, _Float16 *pSrcB, uint32_t n, double *result)
+{
+  size_t vl;
+  vfloat16m4_t vSrcA, vSrcB;
+  vfloat64m1_t vSum = __riscv_vfmv_s_f_f64m1 (0, 1);
+  while (n > 0)
+{
+  vl = __riscv_vsetvl_e16m4 (n);
+  vSrcA = __riscv_vle16_v_f16m4 (pSrcA, vl);
+  vSrcB = __riscv_vle16_v_f16m4 (pSrcB, vl);
+  vSum = __riscv_vfwredusum_vs_f32m8_f64m1 (
+   __riscv_vfwmul_vv_f32m8 (vSrcA, vSrcB, vl), vSum, vl);
+  pSrcA += vl;
+  pSrcB += vl;
+  n -= vl;
+}
+  *result = __riscv_vfmv_f_s_f64m1_f64 (vSum);
+}
+
+/* { dg-final { scan-assembler-not {vmv1r} } } */
+/* { dg-final { scan-assembler-not {vmv\.v\.v} } } */
-- 
2.36.3