Pushed: [PATCH v2] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-23 Thread Xi Ruoyao
On Tue, 2024-01-23 at 10:37 +0800, chenglulu wrote:
> LGTM!
> 
> Thanks!

Pushed v2 as attached.  The only change is in the comment: Qinggang told
me TLS LE relaxation actually *requires* explicit relocs.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From f12317306ee714aed0ca7ed01cafa520296b4c4c Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Mon, 22 Jan 2024 18:07:42 +0800
Subject: [PATCH v2] LoongArch: Disable explicit reloc for TLS LD/GD with
 -mexplicit-relocs=auto

Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler
macro.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
	If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false
	for SYMBOL_TLS_LDM and SYMBOL_TLS_GD.
	(loongarch_call_tls_get_addr): Do not split symbols of
	SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is
	EXPLICIT_RELOCS_AUTO.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check
	for la.tls.ld and la.tls.gd.
---
 gcc/config/loongarch/loongarch.cc  | 10 +-
 .../loongarch/explicit-relocs-auto-tls-ld-gd.c |  3 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 82467474288..072c68d97e3 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1970,11 +1970,11 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
 {
   case SYMBOL_TLS_IE:
   case SYMBOL_TLS_LE:
-  case SYMBOL_TLSGD:
-  case SYMBOL_TLSLDM:
   case SYMBOL_PCREL64:
-	/* The linker don't know how to relax TLS accesses or 64-bit
-	   pc-relative accesses.  */
+	/* TLS IE cannot be relaxed.  TLS LE relaxation is different from
+	   the normal R_LARCH_RELAX-based relaxation and it **requires**
+	   using the explicit %le_{lo12,hi20,add}_r relocs.  The linker
+	   does not relax 64-bit pc-relative accesses as at now.  */
 	return true;
   case SYMBOL_GOT_DISP:
 	/* The linker don't know how to relax GOT accesses in extreme
@@ -2789,7 +2789,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   start_sequence ();
 
-  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+  if (la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS)
 {
   /* Split tls symbol to high and low.  */
   rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
index 957ff98df62..ca55fcfc53e 100644
--- a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
@@ -6,4 +6,5 @@ extern __thread int b __attribute__((visibility("default")));
 
 int test() { return a + b; }
 
-/* { dg-final { scan-assembler-not "la.tls" { target tls_native } } } */
+/* { dg-final { scan-assembler "la\\.tls\\.ld" { target tls_native } } } */
+/* { dg-final { scan-assembler "la\\.tls\\.gd" { target tls_native } } } */
-- 
2.43.0



[PATCH] c: Call c_fully_fold on __atomic_* operands in atomic_bitint_fetch_using_cas_loop [PR113518]

2024-01-23 Thread Jakub Jelinek
Hi!

As the following testcase shows, I forgot to call c_fully_fold on the
operands of __atomic_*/__sync_* builtins called on a _BitInt address.
The expressions are then used inside of TARGET_EXPR initializers etc. and
are never fully folded later, which means we can ICE e.g. on
C_MAYBE_CONST_EXPR trees inside of those.

The following patch fixes it.  While the function is currently only called
in the C FE because C++ doesn't support BITINT_TYPE, I think guarding the
calls on !c_dialect_cxx () is safer.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-22  Jakub Jelinek  

PR c/113518
* c-common.cc (atomic_bitint_fetch_using_cas_loop): Call c_fully_fold
on lhs_addr, val and model for C.

* gcc.dg/bitint-77.c: New test.

--- gcc/c-family/c-common.cc.jj 2024-01-03 12:07:03.384717147 +0100
+++ gcc/c-family/c-common.cc	2024-01-22 14:24:44.209262552 +0100
@@ -8082,6 +8082,12 @@ atomic_bitint_fetch_using_cas_loop (loca
   tree lhs_addr = (*orig_params)[0];
   tree val = convert (nonatomic_lhs_type, (*orig_params)[1]);
   tree model = convert (integer_type_node, (*orig_params)[2]);
+  if (!c_dialect_cxx ())
+{
+  lhs_addr = c_fully_fold (lhs_addr, false, NULL);
+  val = c_fully_fold (val, false, NULL);
+  model = c_fully_fold (model, false, NULL);
+}
   if (TREE_SIDE_EFFECTS (lhs_addr))
 {
   tree var = create_tmp_var_raw (TREE_TYPE (lhs_addr));
--- gcc/testsuite/gcc.dg/bitint-77.c.jj 2024-01-22 14:40:22.505993511 +0100
+++ gcc/testsuite/gcc.dg/bitint-77.c	2024-01-22 14:40:02.445276390 +0100
@@ -0,0 +1,25 @@
+/* PR c/113518 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-O2 -std=c23" } */
+
+#if __BITINT_MAXWIDTH__ >= 607
+_BitInt(607) v;
+#else
+_BitInt(63) v;
+#endif
+
+void
+foo (void)
+{
+  __atomic_fetch_or (&v, 1 << 31, __ATOMIC_RELAXED);
+}
+
+#if __BITINT_MAXWIDTH__ >= 16321
+_BitInt(16321) w;
+
+void
+bar (void)
+{
+  __atomic_fetch_add (&w, 1 << 31, __ATOMIC_SEQ_CST);
+}
+#endif

Jakub



Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-01-23 Thread Alexandre Oliva
On Dec  5, 2023, Alexandre Oliva  wrote:

> arm: fix c23 0-named-args caller-side stdarg

Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639472.html

> The commit message doesn't name explicitly the fixed testsuite
> failures.  Here they are:

> FAIL: gcc.dg/c23-stdarg-4.c execution test
> FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O0  execution test
> FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O1  execution test
> FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2  execution test
> FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  execution test
> FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O3 -g  execution test
> FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -Os  execution test

> Tested on arm-eabi.  Ok to install?


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: Pushed: [PATCH v2] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-23 Thread chenglulu



在 2024/1/23 下午4:04, Xi Ruoyao 写道:

On Tue, 2024-01-23 at 10:37 +0800, chenglulu wrote:

LGTM!

Thanks!

Pushed v2 as attached.  The only change is in the comment: Qinggang told
me TLS LE relaxation actually *requires* explicit relocs.


I think one of the reasons is also that we cannot properly describe TLS LE
relaxation with an assembler macro.




Re: Repost [PATCH 1/6] Add -mcpu=future

2024-01-23 Thread Kewen.Lin
Hi Mike,

on 2024/1/6 07:35, Michael Meissner wrote:
> This patch implements support for a potential future PowerPC cpu.  Features
> added with -mcpu=future, may or may not be added to new PowerPC processors.
> 
> This patch adds support for the -mcpu=future option.  If you use -mcpu=future,
> the macro __ARCH_PWR_FUTURE__ is defined, and the assembler .machine directive
> "future" is used.  Future patches in this series will add support for new
> instructions that may be present in future PowerPC processors.
> 
> This particular patch does not add any new features.  It exists as groundwork
> for future patches to support a possible PowerPC processor in the future.
> 
> This patch does not implement any differences in tuning when -mcpu=future is
> used compared to -mcpu=power10.  If -mcpu=future is used, GCC will use power10
> tuning.  If you explicitly use -mtune=future, you will get a warning that
> -mtune=future is not supported, and default tuning will be set for power10.
> 
> The patches have been tested on both little and big endian systems.  Can I 
> check
> it into the master branch?
> 
> 2024-01-05   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
>   __ARCH_PWR_FUTURE__ if -mcpu=future.
>   * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
>   (POWERPC_MASKS): Add -mcpu=future support.
>   * config/rs6000/rs6000-opts.h (enum processor_type): Add
>   PROCESSOR_FUTURE.
>   * config/rs6000/rs6000-tables.opt: Regenerate.
>   * config/rs6000/rs6000.cc (rs600_cpu_index_lookup): New helper
>   function.
>   (rs6000_option_override_internal): Make -mcpu=future set
>   -mtune=power10.  If the user explicitly uses -mtune=future, give a
>   warning and reset the tuning to power10.
>   (rs6000_option_override_internal): Use power10 costs for future
>   machine.
>   (rs6000_machine_from_flags): Add support for -mcpu=future.
>   (rs6000_opt_masks): Likewise.
>   * config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
>   * config/rs6000/rs6000.md (cpu attribute): Likewise.
>   * config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
>   * doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document 
> -mcpu=future.
> ---
>  gcc/config/rs6000/rs6000-c.cc   |  2 +
>  gcc/config/rs6000/rs6000-cpus.def   |  6 +++
>  gcc/config/rs6000/rs6000-opts.h |  4 +-
>  gcc/config/rs6000/rs6000-tables.opt |  3 ++
>  gcc/config/rs6000/rs6000.cc | 58 -
>  gcc/config/rs6000/rs6000.h  |  1 +
>  gcc/config/rs6000/rs6000.md |  2 +-
>  gcc/config/rs6000/rs6000.opt|  4 ++
>  gcc/doc/invoke.texi |  2 +-
>  9 files changed, 69 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index ce0b14a8d37..f2fb5bef678 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
> flags)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
>if ((flags & OPTION_MASK_POWER10) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
> +  if ((flags & OPTION_MASK_FUTURE) != 0)
> +rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE");
>if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
>if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index d28cc87eb2a..8754635f3d9 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -88,6 +88,10 @@
>| OPTION_MASK_POWER10  \
>| OTHER_POWER10_MASKS)
>  
> +/* Flags for a potential future processor that may or may not be delivered.  
> */
> +#define ISA_FUTURE_MASKS (ISA_3_1_MASKS_SERVER   \
> +  | OPTION_MASK_FUTURE)
> +

Nit: Named as "ISA_FUTURE_MASKS_SERVER" seems more accurate as it's constituted
with ISA_3_1_MASKS_**SERVER** ...

>  /* Flags that need to be turned off if -mno-power9-vector.  */
>  #define OTHER_P9_VECTOR_MASKS(OPTION_MASK_FLOAT128_HW
> \
>| OPTION_MASK_P9_MINMAX)
> @@ -135,6 +139,7 @@
>| OPTION_MASK_LOAD_VECTOR_PAIR \
>| OPTION_MASK_POWER10  \
>| OPTION_MASK_P10_FUSION   \
> +  | OPTION_MASK_FUTURE   \
>| OPTION_MASK_HTM  \
>| OPTION_MASK_ISEL \
>| OPTION_MASK_MFCRF

Re: Repost [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2024-01-23 Thread Kewen.Lin
on 2024/1/6 07:37, Michael Meissner wrote:
> This patch re-enables generating load and store vector pair instructions when
> doing certain memory copy operations when -mcpu=future is used.
> 
> During power10 development, it was determined that using store vector pair
> instructions were problematical in a few cases, so we disabled generating load
> and store vector pair instructions for memory options by default.  This patch
> re-enables generating these instructions if -mcpu=future is used.
> 
> The patches have been tested on both little and big endian systems.  Can I 
> check
> it into the master branch?
> 
> 2024-01-05   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
>   -mblock-ops-vector-pair.

Nit: s/-mblock-ops-vector-pair/OPTION_MASK_BLOCK_OPS_VECTOR_PAIR/

>   (POWERPC_MASKS): Likewise.
> ---
>  gcc/config/rs6000/rs6000-cpus.def | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 8754635f3d9..b6cd6d8cc84 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -90,6 +90,7 @@
>  
>  /* Flags for a potential future processor that may or may not be delivered.  
> */
>  #define ISA_FUTURE_MASKS (ISA_3_1_MASKS_SERVER   \
> +  | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
>| OPTION_MASK_FUTURE)


OK with incorporating change s/ISA_FUTURE_MASKS/ISA_FUTURE_MASKS_SERVER/.  
Thanks!

BR,
Kewen

>  
>  /* Flags that need to be turned off if -mno-power9-vector.  */
> @@ -127,6 +128,7 @@
>  
>  /* Mask of all options to set the default isa flags based on -mcpu=.  */
>  #define POWERPC_MASKS(OPTION_MASK_ALTIVEC
> \
> +  | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
>| OPTION_MASK_CMPB \
>| OPTION_MASK_CRYPTO   \
>| OPTION_MASK_DFP  \



Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-23 Thread FX Coudert
Hi Steve,

Thanks for the patch. I’ll take time to do a proper review, but after a first 
read I had the following questions:

- "an OS's libm may/will contain cospi(), etc.”: do those functions conform to 
any standard? Are there plans to implement them outside FreeBSD at this point?

- On systems where libquadmath is used for _Float128, does libquadmath contain 
the implementation of the q() variants for those functions?

- If I get this right, to take one example, the Fortran front-end will emit a 
call to gfortran_acospi_r4(), libgfortran provides this as a wrapper calling 
acospif(), which is called either from libm or from libgfortran. This is 
different from other math library functions, like ACOS() where the acosf() call 
is generated directly from the front-end (and then the implementation comes 
either from libm or from libgfortran). Why not follow our usual way of doing 
things?

- On most targets, cospi() and friends are not available. Therefore, we end up 
doing the fallback (with limited precision as you noted) but with a lot of 
indirection. We could generate that code directly in the front-end, couldn’t we?


Best,
FX

[v2][patch] plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

2024-01-23 Thread Tobias Burnus

Slightly changed patch:

nvptx_attach_host_thread_to_device now fails again with an error for 
CUDA_ERROR_DEINITIALIZED, except for GOMP_OFFLOAD_fini_device.


I think it makes more sense that way.

Tobias Burnus wrote:

Testing showed that libgomp.c/target-52.c failed with:

libgomp: cuCtxGetDevice error: unknown cuda error

libgomp: device finalization failed

This testcase uses OMP_DISPLAY_ENV=true and 
OMP_TARGET_OFFLOAD=mandatory, and those env vars matter, i.e. it only 
fails if dg-set-target-env-var is honored.


If both env vars are set, the device initialization occurs earlier as 
OMP_DEFAULT_DEVICE is shown due to the display-env env var and its 
value (when target-offload-var is 'mandatory') might be either 
'omp_invalid_device' or '0'.


It turned out that this had an effect on device finalization, which 
caused CUDA to stop earlier than expected. This patch now handles this 
case gracefully. For details, see the commit log message in the 
attached patch and/or the PR.


Comments, remarks, suggestions?

Does this look sensible? (I would like to see some acknowledgement by 
someone who feels more comfortable with CUDA than me.)


Tobias

plugin/plugin-nvptx.c: Fix fini_device call when already shutdown [PR113513]

The following issue was found when running libgomp.c/target-52.c with
nvptx offloading when the dg-set-target-env-var was honored. The issue
occurred for both -foffload=disable and with offloading configured when
an nvidia device is available.

At the end of the program, the offloading parts are shutdown via two means:
The callback registered via 'atexit (gomp_target_fini)' and - via code
generated in mkoffload, the '__attribute__((destructor)) fini' function
that calls GOMP_offload_unregister_ver.

In normal processing, first gomp_target_fini is called - which then sets
GOMP_DEVICE_FINALIZED for the device - and later GOMP_offload_unregister_ver,
but the latter is a no-op by then because the state is GOMP_DEVICE_FINALIZED.
If both OMP_DISPLAY_ENV=true and OMP_TARGET_OFFLOAD="mandatory" are set,
the call omp_display_env already invokes gomp_init_targets_once, i.e. it
occurs earlier than usual and is invoked via __attribute__((constructor))
initialize_env.

For some unknown reason, while this does not have an effect on the
order of the called plugin functions for initialization, it changes the
order of function calls for shutting down. Namely, when the two environment
variables are set, GOMP_offload_unregister_ver is called now before
gomp_target_fini. - And it seems as if CUDA regards a call to cuModuleUnload
(or unloading the last module?) as indication that the device context should
be destroyed - or, at least, afterwards calling cuCtxGetDevice will return
CUDA_ERROR_DEINITIALIZED.

As the previous code in nvptx_attach_host_thread_to_device wasn't expecting
that result, it called
  GOMP_PLUGIN_error ("cuCtxGetDevice error: %s", cuda_error (r));
causing a fatal error of the program.

This commit handles now CUDA_ERROR_DEINITIALIZED in a special way such
that GOMP_OFFLOAD_fini_device just works.

When reading the code, the following was observed in addition:
When gomp_fini_device is called, it invokes goacc_fini_asyncqueues
to ensure that the queue is emptied.  It seems to make sense to do
likewise for GOMP_offload_unregister_ver, which this commit does in
addition.

libgomp/ChangeLog:

	PR libgomp/113513
	* target.c (GOMP_offload_unregister_ver): Call goacc_fini_asyncqueues
	before invoking GOMP_offload_unregister_ver.
	* plugin/plugin-nvptx.c (nvptx_attach_host_thread_to_device): Change
	return type to int and new bool arg, it true, return -1 for
	CUDA_ERROR_DEINITIALIZED.
	(GOMP_OFFLOAD_fini_device): Handle the deinitialized gracefully.
	(nvptx_init, GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_alloc,
	GOMP_OFFLOAD_host2dev, GOMP_OFFLOAD_dev2host, GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d, GOMP_OFFLOAD_openacc_async_host2dev,
	GOMP_OFFLOAD_openacc_async_dev2host): Update calls.

Signed-off-by: Tobias Burnus 

 libgomp/plugin/plugin-nvptx.c | 46 ++-
 libgomp/target.c  |  7 +--
 2 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index c04c3acd679..318d3d2aca6 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -382,10 +382,13 @@ nvptx_init (void)
 }
 
 /* Select the N'th PTX device for the current host thread.  The device must
-   have been previously opened before calling this function.  */
+   have been previously opened before calling this function.
+   Returns 1 if successful, 0 if an error occurred and a message has been
+   issued; if fini_okay, -1 is returned for CUDA_ERROR_DEINITIALIZED and
+   no error message is printed in that case.  */
 
-static bool
-nvptx_attach_host_thread_to_device (int n)
+static int
+nvptx_attach_host_thread_to_device (int n, bool fini_okay)
 {
   CUdevice dev;
   CUresult r;
@@ -393,15 +396,17 @@ nvptx_attac

[PATCH] gcn: Fix a warning

2024-01-23 Thread Jakub Jelinek
Hi!

I see
../../gcc/config/gcn/gcn.cc: In function ‘void 
gcn_hsa_declare_function_name(FILE*, const char*, tree)’:
../../gcc/config/gcn/gcn.cc:6568:67: warning: unused parameter ‘decl’ 
[-Wunused-parameter]
 6568 | gcn_hsa_declare_function_name (FILE *file, const char *name, tree decl)
  |  ~^~~~
warning presumably since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b
Previously, the argument was anonymous, but now it is passed to a macro
which ignores it, so I think we should go with ATTRIBUTE_UNUSED.

Ok for trunk?

2024-01-23  Jakub Jelinek  

* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Add
ATTRIBUTE_UNUSED to decl.

--- gcc/config/gcn/gcn.cc.jj2024-01-08 16:13:18.625940507 +0100
+++ gcc/config/gcn/gcn.cc   2024-01-23 10:47:33.945954494 +0100
@@ -6565,7 +6565,8 @@ output_file_start (void)
comments that pass information to mkoffload.  */
 
 void
-gcn_hsa_declare_function_name (FILE *file, const char *name, tree decl)
+gcn_hsa_declare_function_name (FILE *file, const char *name,
+  tree decl ATTRIBUTE_UNUSED)
 {
   int sgpr, vgpr, avgpr;
   bool xnack_enabled = TARGET_XNACK;

Jakub



Re: [PATCH] Mark ASM_OUTPUT_FUNCTION_LABEL ()'s DECL argument as used

2024-01-23 Thread Jakub Jelinek
On Tue, Jan 16, 2024 at 07:32:03PM -0700, Jeff Law wrote:
> 
> 
> On 1/15/24 02:22, Ilya Leoshkevich wrote:
> > Compile tested for the ia64-elf target; bootstrap and regtest running
> > on x86_64-redhat-linux.  Ok for trunk when successful?
> > 
> > 
> > 
> > ia64-elf build fails with the following warning:
> > 
> > [all 2024-01-12 16:32:34] ../../gcc/gcc/config/ia64/ia64.cc:3889:59: 
> > error: unused parameter 'decl' [-Werror=unused-parameter]
> > [all 2024-01-12 16:32:34]  3889 | ia64_start_function (FILE *file, 
> > const char *fnname, tree decl)
> > 
> > decl is passed to ASM_OUTPUT_FUNCTION_LABEL (), whose default
> > implementation does not use it.  Mark it as used in order to avoid the
> > warning.
> > 
> > Reported-by: Jan-Benedict Glaw 
> > Suggested-by: Jan-Benedict Glaw 
> > Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using 
> > ASM_OUTPUT_FUNCTION_LABEL")
> > Signed-off-by: Ilya Leoshkevich 
> > 
> > gcc/ChangeLog:
> > 
> > * defaults.h (ASM_OUTPUT_FUNCTION_LABEL): Mark DECL as used.
> So normally the way to go is to just remove the unused argument's name,
> leaving its type.  But in this case we're passing the unused argument into a
> generic macro.
> 
> So ISTM the better way to go here is decorate the argument with
> ATTRIBUTE_UNUSED in ia64.cc.  This has the ever-so-slight advantage that
> if/when the ia64 port goes away that the hack goes away too.
> 
> Something like the attached, which I'd consider pre-approved.
> 
> Jeff

> diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
> index 53ad03b..a2011a9a811 100644
> --- a/gcc/config/ia64/ia64.cc
> +++ b/gcc/config/ia64/ia64.cc
> @@ -3886,7 +3886,7 @@ ia64_expand_prologue (void)
>  /* Output the textual info surrounding the prologue.  */
>  
>  void
> -ia64_start_function (FILE *file, const char *fnname, tree decl)
> +ia64_start_function (FILE *file, const char *fnname, tree decl 
> ATTRIBUTE_UNUSED)

Except the too long line sure, that is what I'd go with as well.

Jakub



Re: [PATCH] gcn: Fix a warning

2024-01-23 Thread Andrew Stubbs
On Tue, 23 Jan 2024 at 10:01, Jakub Jelinek  wrote:

> Hi!
>
> I see
> ../../gcc/config/gcn/gcn.cc: In function ‘void
> gcn_hsa_declare_function_name(FILE*, const char*, tree)’:
> ../../gcc/config/gcn/gcn.cc:6568:67: warning: unused parameter ‘decl’
> [-Wunused-parameter]
>  6568 | gcn_hsa_declare_function_name (FILE *file, const char *name, tree
> decl)
>   |
> ~^~~~
> warning presumably since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b
> Previously, the argument was anonymous, but now it is passed to a macro
> which ignores it, so I think we should go with ATTRIBUTE_UNUSED.
>
> Ok for trunk?
>

OK.

Andrew


[PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

2024-01-23 Thread Juzhe-Zhong
The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL
pass: it consumes over 33 GB of memory, which makes it impossible to compile
SPEC 2017 wrf on a laptop.

The root cause is these memory-wasting bitmap vectors:

unsigned num_exprs = num_bbs * num_regs;
sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);

I found that compute_avl_def_data can be implemented with the RTL-SSA
framework, so this patch replaces the implementation with an RTL-SSA-based one.

After this patch, the memory-hog issue is fixed.

simple vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes 
--massif-out-file=massif.out)
is 1.673 GB.

lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes 
--massif-out-file=massif.out)
is 2.441 GB. 

Tested on both RV32 and RV64, no regression.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
(get_regno): Ditto.
(get_bb_index): Ditto.
(pre_vsetvl::compute_avl_def_data): Ditto.
(pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 233 --
 .../riscv/rvv/vsetvl/avl_single-107.c |   2 +-
 2 files changed, 52 insertions(+), 183 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 54c85ffb7d5..170fc7f003d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -617,22 +617,6 @@ same_equiv_note_p (set_info *set1, set_info *set2)
   return source_equal_p (insn1, insn2);
 }
 
-static unsigned
-get_expr_id (unsigned bb_index, unsigned regno, unsigned num_bbs)
-{
-  return regno * num_bbs + bb_index;
-}
-static unsigned
-get_regno (unsigned expr_id, unsigned num_bb)
-{
-  return expr_id / num_bb;
-}
-static unsigned
-get_bb_index (unsigned expr_id, unsigned num_bb)
-{
-  return expr_id % num_bb;
-}
-
 /* Return true if the SET result is not used by any instructions.  */
 static bool
 has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
@@ -1337,9 +1321,6 @@ public:
 class demand_system
 {
 private:
-  sbitmap *m_avl_def_in;
-  sbitmap *m_avl_def_out;
-
   /* predictors.  */
 
   inline bool always_true (const vsetvl_info &prev ATTRIBUTE_UNUSED,
@@ -1743,14 +1724,6 @@ private:
   }
 
 public:
-  demand_system () : m_avl_def_in (nullptr), m_avl_def_out (nullptr) {}
-
-  void set_avl_in_out_data (sbitmap *m_avl_def_in, sbitmap *m_avl_def_out)
-  {
-m_avl_def_in = m_avl_def_in;
-m_avl_def_out = m_avl_def_out;
-  }
-
   /* Can we move vsetvl info between prev_insn and next_insn safe? */
   bool avl_vl_unmodified_between_p (insn_info *prev_insn, insn_info *next_insn,
const vsetvl_info &info,
@@ -1778,32 +1751,66 @@ public:
   }
 else
   {
+   basic_block prev_cfg_bb = prev_insn->bb ()->cfg_bb ();
if (!ignore_vl && info.has_vl ())
  {
-   bitmap live_out = df_get_live_out (prev_insn->bb ()->cfg_bb ());
+   bitmap live_out = df_get_live_out (prev_cfg_bb);
if (bitmap_bit_p (live_out, REGNO (info.get_vl (
  return false;
  }
 
-   if (info.has_nonvlmax_reg_avl () && m_avl_def_in && m_avl_def_out)
+   /* Find set_info at location of PREV_INSN and NEXT_INSN, Return
+  false if those 2 set_info are different.
+
+PREV_INSN --- multiple nested blocks --- NEXT_INSN.
+
+  Return false if there is any modifications of AVL inside those
+  multiple nested blocks.  */
+   if (info.has_nonvlmax_reg_avl ())
  {
-   bool has_avl_out = false;
-   unsigned regno = REGNO (info.get_avl ());
-   unsigned expr_id;
-   sbitmap_iterator sbi;
-   EXECUTE_IF_SET_IN_BITMAP (m_avl_def_out[prev_insn->bb ()->index ()],
- 0, expr_id, sbi)
+   resource_info resource = full_register (REGNO (info.get_avl ()));
+   def_lookup dl1 = crtl->ssa->find_def (resource, prev_insn);
+   def_lookup dl2 = crtl->ssa->find_def (resource, next_insn);
+   if (dl2.matching_set ())
+ return false;
+
+   auto is_phi_or_real
+ = [&] (insn_info *h) { return h->is_real () || h->is_phi (); };
+
+   def_info *def1 = dl1.matching_set_or_last_def_of_prev_group ();
+   def_info *def2 = dl2.prev_def (next_insn);
+   set_info *set1 = safe_dyn_cast (def1);
+   set_info *set2 = safe_dyn_cast (def2);
+   if (!set1 || !set2)
+ return false;
+
+   auto is_same_ultimate_def = [&] (set_info *s1, set_info *s2) {
+ 

[PATCH] debug/107058 - gracefully handle unexpected DIE contexts

2024-01-23 Thread Richard Biener
While the bug persists that LTO streaming picks up a CONST_DECL from an
attribute argument on a VAR_DECL which, with -fdebug-types-section, refers
to a DIE in a type unit, we can handle this gracefully, at least with
-fno-checking.  Do so.  The C++ frontend should nevertheless resolve the
CONST_DECL attribute argument to a constant.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.

PR debug/107058
* dwarf2out.cc (dwarf2out_die_ref_for_decl): Gracefully
handle unexpected but bogus DIE contexts when not checking
enabled.

* c-c++-common/pr107058.c: New testcase.
---
 gcc/dwarf2out.cc  | 15 +++
 gcc/testsuite/c-c++-common/pr107058.c |  7 +++
 2 files changed, 18 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr107058.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 0b8a3002292..2b723210f34 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -6061,10 +6061,17 @@ dwarf2out_die_ref_for_decl (tree decl, const char **sym,
 die = die->die_parent;
   /* For the containing CU DIE we compute a die_symbol in
  compute_comp_unit_symbol.  */
-  gcc_assert (die->die_tag == DW_TAG_compile_unit
- && die->die_id.die_symbol != NULL);
-  *sym = die->die_id.die_symbol;
-  return true;
+  if (die->die_tag == DW_TAG_compile_unit)
+{
+  gcc_assert (die->die_id.die_symbol != NULL);
+  *sym = die->die_id.die_symbol;
+  return true;
+}
+  /* While we can gracefully handle running into say a type unit
+ we don't really want and consider this a bug.  */
+  if (flag_checking)
+gcc_unreachable ();
+  return false;
 }
 
 /* Add a reference of kind ATTR_KIND to a DIE at SYMBOL + OFFSET to DIE.  */
diff --git a/gcc/testsuite/c-c++-common/pr107058.c 
b/gcc/testsuite/c-c++-common/pr107058.c
new file mode 100644
index 000..5e625d6a6af
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr107058.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-g -fdebug-types-section -flto -fno-checking" } */
+
+/* We should handle the C++ FE issue gracefully with -fno-checking.  */
+
+#include "pr50459.c"
-- 
2.35.3


[pushed] aarch64: Avoid registering duplicate C++ overloads [PR112989]

2024-01-23 Thread Richard Sandiford
In the original fix for this PR, I'd made sure that
including  didn't reach the final return in
simulate_builtin_function_decl (which would indicate duplicate
function definitions).  But it seems I forgot to do the same
thing for C++, which defines all of its overloads directly.

This patch fixes a case where we still recorded duplicate
functions for C++.  Thanks to Iain for reporting the resulting
GC ICE and for help with reproducing it.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Skip
MODE_single variants of functions that don't take tuple arguments.
---
 gcc/config/aarch64/aarch64-sve-builtins-shapes.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 2692b086ac5..f190770250f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -338,6 +338,14 @@ build_one (function_builder &b, const char *signature,
   unsigned int ti, unsigned int gi, unsigned int pi,
   bool force_direct_overloads)
 {
+  /* For simplicity, function definitions are allowed to use the group
+ suffix lists vg2 and vg4 for shapes that have _single forms,
+ even though the _single form applies only to vgNx2 and vgNx4,
+ not to vgNx1.  */
+  if (mode_suffix_id == MODE_single
+  && group_suffixes[group.groups[gi]].vectors_per_tuple == 1)
+return;
+
   /* Byte forms of svdupq take 16 arguments.  */
   auto_vec argument_types;
   function_instance instance (group.base_name, *group.base, *group.shape,
-- 
2.25.1



Re: [Patch] libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy*

2024-01-23 Thread Tobias Burnus

Hi Sandra,

thanks for the comments and proposals! An updated version is enclosed.

Unless you find more issues, I intend to commit it soon.

Tobias

PS: I think besides filling gaps, some editing wouldn't harm; if you 
feel bored ...


https://gcc.gnu.org/onlinedocs/libgomp/
libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy*

libgomp/ChangeLog:

	* libgomp.texi (Runtime Library Routines): Document
	omp_pause_resource, omp_pause_resource_all and
	omp_target_memcpy{,_rect}{,_async}.

Co-authored-by: Sandra Loosemore 
Signed-off-by: Tobias Burnus 

 libgomp/libgomp.texi | 332 ---
 1 file changed, 317 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 74d4ef34c43..6ee923099b7 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -561,7 +561,7 @@ specification in version 5.2.
 * Thread Affinity Routines::
 * Teams Region Routines::
 * Tasking Routines::
-@c * Resource Relinquishing Routines::
+* Resource Relinquishing Routines::
 * Device Information Routines::
 * Device Memory Routines::
 * Lock Routines::
@@ -1504,16 +1504,78 @@ and @code{false} represent their language-specific counterparts.
 
 
 
-@c @node Resource Relinquishing Routines
-@c @section Resource Relinquishing Routines
-@c
-@c Routines releasing resources used by the OpenMP runtime.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_pause_resource:: 
-@c * omp_pause_resource_all:: 
-@c @end menu
+@node Resource Relinquishing Routines
+@section Resource Relinquishing Routines
+
+Routines releasing resources used by the OpenMP runtime.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_pause_resource:: Release OpenMP resources on a device
+* omp_pause_resource_all:: Release OpenMP resources on all devices
+@end menu
+
+
+
+@node omp_pause_resource
+@subsection @code{omp_pause_resource} -- Release OpenMP resources on a device
+@table @asis
+@item @emph{Description}:
+Free resources used by the OpenMP program and the runtime library on and for the
+device specified by @var{device_num}.  Zero is returned on success and non-zero
+otherwise.
+
+The value of @var{device_num} must be a conforming device number.  The routine
+may not be called from within any explicit region, and all explicit threads
+that do not bind to the implicit parallel region must have finalized execution.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind, int device_num);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind, device_num)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@item   @tab @code{integer device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.43.
+@end table
+
+
+
+@node omp_pause_resource_all
+@subsection @code{omp_pause_resource_all} -- Release OpenMP resources on all devices
+@table @asis
+@item @emph{Description}:
+Free resources used by the OpenMP program and the runtime library on all devices,
+including the host.  Zero is returned on success and non-zero otherwise.
+
+The routine may not be called from within any explicit region; all explicit
+threads not bound to the implicit parallel region must have finalized execution.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource_all(omp_pause_resource_t kind);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource_all(kind)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_pause_resource}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.44.
+@end table
+
+
 
 @node Device Information Routines
 @section Device Information Routines
@@ -1720,10 +1782,10 @@ pointers on devices. They have C linkage and do not throw exceptions.
 * omp_target_free:: Free device memory
 * omp_target_is_present:: Check whether storage is mapped
 * omp_target_is_accessible:: Check whether memory is device accessible
-@c * omp_target_memcpy:: 
-@c * omp_target_memcpy_rect:: 
-@c * omp_target_memcpy_async:: 
-@c * omp_target_memcpy_rect_async:: 
+* omp_target_memcpy:: Copy data between devices
+* omp_target_memcpy_rect:: Copy a subvolume of data between devices
+* omp_target_memcpy_async:: Copy data between devices asynchronously
+* omp_target_memcpy_rect_async:: Copy a subvolume of data between devices asynchronously
 @c * omp_target_memset:: /TR12
 @c * omp_target_memset_async:: /TR12
 * omp_target_associate_ptr:: Associate a device pointer with a host pointer
@@ -18

Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-23 Thread Janne Blomqvist
On Tue, Jan 23, 2024 at 11:09 AM FX Coudert  wrote:
>
> Hi Steve,

Hello, long time no see.

> Thanks for the patch. I’ll take time to do a proper review, but after a first 
> read I had the following questions:
>
> - "an OS's libm may/will contain cospi(), etc.”: do those functions conform 
> to any standard? Are there plans to implement them outside FreeBSD at this 
> point?

acospi, asinpi, atan2pi, atanpi, cospi, sinpi, tanpi are all in IEEE
754-2019, and are slated to be part of C23. I would assume actively
developed libm implementations will eventually catch up and implement them.

> - If I get this right, to take one example, the Fortran front-end will emit a 
> call to gfortran_acospi_r4(), libgfortran provides this as a wrapper calling 
> acospif(), which is called either from libm or from libgfortran. This is 
> different from other math library functions, like ACOS() where the acosf() 
> call is generated directly from the front-end (and then the implementation 
> comes either from libm or from libgfortran). Why not follow our usual way of 
> doing things?

Good point, that's what we've done in c99_functions.c in libgfortran.
We should probably do this so we can skip jumping via libgfortran when
libm implements these directly.

> - On most targets, cospi() and friends are not available. Therefore, we end 
> up doing the fallback (with limited precision as you noted) but with a lot of 
> indirection. We could generate that code directly in the front-end, couldn’t 
> we?

The frontend generally doesn't know what the target libm implements,
hence it's better to just generate the call, and if necessary have a
barebones implementation in libgfortran if libm doesn't implement it
properly.


-- 
Janne Blomqvist


[PATCH] LoongArch: testsuite: Disable stack protector for got-load.C

2024-01-23 Thread Xi Ruoyao
When building GCC with --enable-default-ssp, the stack protector is
enabled for got-load.C, causing additional GOT loads for
__stack_chk_guard.  So mem/u will be matched more than 2 times and the
test will fail.

Disable stack protector to fix this issue.

gcc/testsuite:

* g++.target/loongarch/got-load.C (dg-options): Add
-fno-stack-protector.
---

Ok for trunk?

 gcc/testsuite/g++.target/loongarch/got-load.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/loongarch/got-load.C 
b/gcc/testsuite/g++.target/loongarch/got-load.C
index 20924c73942..17870176ab4 100644
--- a/gcc/testsuite/g++.target/loongarch/got-load.C
+++ b/gcc/testsuite/g++.target/loongarch/got-load.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal 
-fdump-rtl-expand" } */
+/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal 
-fdump-rtl-expand -fno-stack-protector" } */
 /* { dg-final { scan-rtl-dump-times "mem/u" 2 "expand" } } */
 
 #include 
-- 
2.43.0



[PATCH] tree-optimization/113552 - fix num_call accounting in simd clone vectorization

2024-01-23 Thread Richard Biener
The following avoids using exact_log2 on the number of SIMD clone calls
to be emitted when vectorizing calls since that can easily be not
a power of two in which case it will return -1.  For different simd
clones the number of calls will differ by a multiply with a power of two
only so using floor_log2 is good enough here.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/113552
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
floor_log2 instead of exact_log2 on the number of calls.
---
 gcc/tree-vect-stmts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 09749ae3817..1dbe1115da4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4071,7 +4071,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
|| (nargs != simd_nargs))
  continue;
if (num_calls != 1)
- this_badness += exact_log2 (num_calls) * 4096;
+ this_badness += floor_log2 (num_calls) * 4096;
if (n->simdclone->inbranch)
  this_badness += 8192;
int target_badness = targetm.simd_clone.usable (n);
-- 
2.35.3


Re: [PATCH] tree-optimization/113552 - fix num_call accounting in simd clone vectorization

2024-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2024 at 12:56:52PM +0100, Richard Biener wrote:
> The following avoids using exact_log2 on the number of SIMD clone calls
> to be emitted when vectorizing calls since that can easily be not
> a power of two in which case it will return -1.  For different simd
> clones the number of calls will differ by a multiply with a power of two
> only so using floor_log2 is good enough here.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
>   PR tree-optimization/113552
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
>   floor_log2 instead of exact_log2 on the number of calls.

Is there any target which supports non-power-of-two simdlen?
If not, perhaps we should add !pow2p_hwi (num_calls) to the continue;
condition a few lines earlier?

> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 09749ae3817..1dbe1115da4 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -4071,7 +4071,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   || (nargs != simd_nargs))
> continue;
>   if (num_calls != 1)
> -   this_badness += exact_log2 (num_calls) * 4096;
> +   this_badness += floor_log2 (num_calls) * 4096;
>   if (n->simdclone->inbranch)
> this_badness += 8192;
>   int target_badness = targetm.simd_clone.usable (n);
> -- 
> 2.35.3

Jakub



Re: [PATCH] tree-optimization/113552 - fix num_call accounting in simd clone vectorization

2024-01-23 Thread Richard Biener
On Tue, 23 Jan 2024, Jakub Jelinek wrote:

> On Tue, Jan 23, 2024 at 12:56:52PM +0100, Richard Biener wrote:
> > The following avoids using exact_log2 on the number of SIMD clone calls
> > to be emitted when vectorizing calls since that can easily be not
> > a power of two in which case it will return -1.  For different simd
> > clones the number of calls will differ by a multiply with a power of two
> > only so using floor_log2 is good enough here.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > PR tree-optimization/113552
> > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
> > floor_log2 instead of exact_log2 on the number of calls.
> 
> Is there any target which supports non-power-of-two simdlen?
> If not, perhaps we should add !pow2p_hwi (num_calls) to the continue;
> condition a few lines earlier?

Is non-power-of-two simdlen a thing?  Note there's nothing wrong
with non-power-of-two num_calls, with VF == 4 and a group size
of 3 you get 12 lanes and either 3 (simdlen == 4) or 6 (simdlen == 2)
calls.

Iff non-power-of-two simdlen is a thing then we could bias
by + num_calls (no idea why we use *_log2 in the first place, but it
was that way since the beginning).

Richard.

> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 09749ae3817..1dbe1115da4 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -4071,7 +4071,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
> > stmt_vec_info stmt_info,
> > || (nargs != simd_nargs))
> >   continue;
> > if (num_calls != 1)
> > - this_badness += exact_log2 (num_calls) * 4096;
> > + this_badness += floor_log2 (num_calls) * 4096;
> > if (n->simdclone->inbranch)
> >   this_badness += 8192;
> > int target_badness = targetm.simd_clone.usable (n);
> > -- 
> > 2.35.3
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] testsuite: Make pr104992.c independent of target vector features [PR113418]

2024-01-23 Thread Xi Ruoyao
The vect_int_mod target selector is evaluated with the options in
DEFAULT_VECTCFLAGS in effect, but these options are not automatically
passed to tests out of the vect directories.  So this test fails on
targets where integer vector modulo operation is supported but requiring
an option to enable, for example LoongArch.

In this test case, the only expected optimization that does not happen in
the original dump is in corge, because it needs forward propagation.  So we
can scan the forwprop2 dump (where the vector operation has not yet been
expanded to scalars) instead of the optimized dump; then we don't need to
consider vect_int_mod at all.

gcc/testsuite/ChangeLog:

PR testsuite/113418
* gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
instead of -fdump-tree-optimized.
(dg-final): Scan forwprop2 dump instead of optimized, and remove
the use of vect_int_mod.
---

This fixes the test failure on loongarch64-linux-gnu, and I've also
tested it on x86_64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.dg/pr104992.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
index 82f8c75559c..6fd513d34b2 100644
--- a/gcc/testsuite/gcc.dg/pr104992.c
+++ b/gcc/testsuite/gcc.dg/pr104992.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/104992 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
 
 #define vector __attribute__((vector_size(4*sizeof(int
 
@@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x, unsigned 
y, unsigned z) {
 return x / y * z == x;
 }
 
-/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! 
vect_int_mod } } } } */
-/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target vect_int_mod 
} } } */
+/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
-- 
2.43.0



Re: [PATCH 3/4] rtl-ssa: Ensure new defs get inserted [PR113070]

2024-01-23 Thread Richard Sandiford
Alex Coplan  writes:
> On 22/01/2024 13:49, Richard Sandiford wrote:
>> Alex Coplan  writes:
>> > In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to
>> > RTL-SSA for inserting new insns, which included support for users
>> > creating new defs.
>> >
>> > However, I missed that apply_changes_to_insn needed updating to ensure
>> > that the new defs actually got inserted into the main def chain.  This
>> > meant that when the aarch64 ldp/stp pass inserted a new stp insn, the
>> > stp would just get skipped over during subsequent alias analysis, as its
>> > def never got inserted into the memory def chain.  This (unsurprisingly)
>> > led to wrong code.
>> >
>> > This patch fixes the issue by ensuring new user-created defs get
>> > inserted.  I would have preferred to have used a flag internal to the
>> > defs instead of a separate data structure to keep track of them, but since
>> > machine_mode increased to 16 bits we're already at 64 bits in access_info,
>> > and we can't really reuse m_is_temp as the logic in finalize_new_accesses
>> > requires it to get cleared.
>> >
>> > Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>> >
>> > Thanks,
>> > Alex
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/113070
>> >* rtl-ssa.h: Include hash-set.h.
>> >* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add
>> >new_sets parameter and use it to keep track of new user-created sets.
>> >(function_info::apply_changes_to_insn): Also call add_def on new sets.
>> >(function_info::change_insns): Add hash_set to keep track of new
>> >user-created defs.  Plumb it through.
>> >* rtl-ssa/functions.h: Add hash_set parameter to finalize_new_accesses 
>> > and
>> >apply_changes_to_insn.
>> > ---
>> >  gcc/rtl-ssa.h   |  1 +
>> >  gcc/rtl-ssa/changes.cc  | 28 +---
>> >  gcc/rtl-ssa/functions.h |  6 --
>> >  3 files changed, 26 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/gcc/rtl-ssa.h b/gcc/rtl-ssa.h
>> > index f0cf656f5ac..17337639ae8 100644
>> > --- a/gcc/rtl-ssa.h
>> > +++ b/gcc/rtl-ssa.h
>> > @@ -50,6 +50,7 @@
>> >  #include "mux-utils.h"
>> >  #include "rtlanal.h"
>> >  #include "cfgbuild.h"
>> > +#include "hash-set.h"
>> >  
>> >  // Provides the global crtl->ssa.
>> >  #include "memmodel.h"
>> > diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
>> > index ce51d6ccd8d..6119ec3535b 100644
>> > --- a/gcc/rtl-ssa/changes.cc
>> > +++ b/gcc/rtl-ssa/changes.cc
>> > @@ -429,7 +429,8 @@ update_insn_in_place (insn_change &change)
>> >  // POS gives the final position of INSN, which hasn't yet been moved into
>> >  // place.
>> 
>> The new parameter should be documented.  How about:
>> 
>>   // place.  NEW_SETS contains the new set_infos that are being added as part
>>   // of this change (as opposed to being moved or repurposed from existing
>>   // instructions).
>
> That comment looks appropriate for apply_changes_to_insn, where NEW_SETS has
> already been populated, but doesn't seem accurate for finalize_new_accesses.
> How about:
>
>   // place.  Keep track of any newly-created set_infos being added as
>   // part of this change by adding them to NEW_SETS.
>
> for finalize_new_accesses?  OK with that change (and using your suggestion for
> apply_changes_to_insn)?

Yeah, sounds good to me.

Richard


Re: [PATCH 3/3] aarch64: Fix up debug uses in ldp/stp pass [PR113089]

2024-01-23 Thread Richard Sandiford
Alex Coplan  writes:
>> > +  writeback_pats[i] = orig_rtl[i];
>> > +
>> > +  // Now that we've characterized the defs involved, go through the
>> > +  // debug uses and determine how to update them (if needed).
>> > +  for (auto use : set->debug_insn_uses ())
>> > +  {
>> > +if (*pair_dst < *use->insn () && defs[1])
>> > +  // We're re-ordering defs[1] above a previous use of the
>> > +  // same resource.
>> > +  update_debug_use (use, defs[1], writeback_pats[1]);
>> > +else if (*pair_dst >= *use->insn ())
>> > +  // We're re-ordering defs[0] below its use.
>> > +  update_debug_use (use, defs[0], writeback_pats[0]);
>> > +  }
>> > +}
>> > +
>> > +  // Now let's look at registers which are def'd by the second insn
>> > +  // but not by the first insn, there may still be debug uses of a
>> > +  // previous def which can be affected by moving the second insn up.
>> > +  for (auto def : insns[1]->defs ())
>> > +{
>> > +  // This should be M log N where N is the number of defs in
>> > +  // insns[0] and M is the number of defs in insns[1].
>> > +  if (def->is_mem () || find_access (insns[0]->defs (), def->regno 
>> > ()))
>> > +continue;
>> > +
>> > +  auto prev_set = safe_dyn_cast (def->prev_def ());
>> > +  if (!prev_set)
>> > +  continue;
>> > +
>> > +  rtx writeback_pat = NULL_RTX;
>> > +  if (def->regno () == base_regno && (writeback & 2))
>> > +  writeback_pat = orig_rtl[1];
>> > +
>> > +  // We have a def in insns[1] which isn't def'd by the first insn.
>> > +  // Look to the previous def and see if it has any debug uses.
>> > +  for (auto use : prev_set->debug_insn_uses ())
>> > +  if (*pair_dst < *use->insn ())
>> > +// We're ordering DEF above a previous use of the same register.
>> > +update_debug_use (use, def, writeback_pat);
>> 
>> This might be more efficient as a reverse walk that breaks as soon as
>> *pair_dst >= *use->insn (), since the previous def could be arbitrarily
>> far away and have arbitrarily many leading uses.  But it probably doesn't
>> matter much in practice.
>
> Agreed, provided it's easy to get at the last debug use in constant time

Yeah, sorry, I should have checked whether that was possible.  Like you
pointed out off-list, it isn't possible as things stand, so never mind. :)

Richard

> (if so I guess 1/3 might need updating to add a last_debug_insn_use
> accessor).  I'll look into that tomorrow.
>
>> 
>> > +}
>> > +
>> > +  if ((writeback & 2) && !writeback_effect)
>> > +{
>> > +  // If the second insn initially had writeback but the final
>> > +  // pair does not, then there may be trailing debug uses of the
>> > +  // second writeback def which need re-parenting: do that.
>> > +  auto def = find_access (insns[1]->defs (), base_regno);
>> > +  gcc_assert (def);
>> > +  for (auto use : as_a (def)->debug_insn_uses ())
>> > +  {
>> > +insn_change change (use->insn ());
>> > +change.new_uses = check_remove_regno_access (attempt,
>> > + change.new_uses,
>> > + base_regno);
>> > +auto new_use = find_access (insns[0]->uses (), base_regno);
>> > +
>> > +// N.B. insns must have already shared a common base due to writeback.
>> > +gcc_assert (new_use);
>> > +
>> > +if (dump_file)
>> > +  fprintf (dump_file,
>> > +   "  i%d: cancelling wb, re-parenting trailing debug use\n",
>> > +   use->insn ()->uid ());
>> > +
>> > +change.new_uses = insert_access (attempt, new_use, change.new_uses);
>> > +crtl->ssa->change_insn (change);
>> > +  }
>> > +}
>> > +  else if (trailing_add)
>> > +fixup_debug_uses_trailing_add (attempt, pair_dst, trailing_add,
>> > + writeback_effect);
>> > +}
>> > +
>> >  // Try and actually fuse the pair given by insns I1 and I2.
>> >  //
>> >  // Here we've done enough analysis to know this is safe, we only
>> > @@ -1378,6 +1681,9 @@ ldp_bb_info::fuse_pair (bool load_p,
>> >insn_info *first = (*i1 < *i2) ? i1 : i2;
>> >insn_info *second = (first == i1) ? i2 : i1;
>> >  
>> > +  insn_info *pair_dst = move_range.singleton ();
>> > +  gcc_assert (pair_dst);
>> > +
>> >insn_info *insns[2] = { first, second };
>> >  
>> >auto_vec changes;
>> > @@ -1388,6 +1694,13 @@ ldp_bb_info::fuse_pair (bool load_p,
>> >  PATTERN (second->rtl ())
>> >};
>> >  
>> > +  // Make copies of the patterns as we might need to refer to the 
>> > original RTL
>> > +  // later, for example when updating debug uses (which is after we've 
>> > updated
>> > +  // one or both of the patterns in the candidate insns).
>> > +  rtx orig_rtl[2];
>> > +  for (int i = 0; i < 2; i++)
>> > +orig_rtl[i] = copy_rtx (pats[i]);
>> > +
>> 
>> FWIW, an alternative (that avoids RTL allocation) would be to use
>> temporarily_undo_changes and redo_changes.  But this i

Re: [PATCH] tree-optimization/113552 - fix num_call accounting in simd clone vectorization

2024-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2024 at 01:03:46PM +0100, Richard Biener wrote:
> On Tue, 23 Jan 2024, Jakub Jelinek wrote:
> 
> > On Tue, Jan 23, 2024 at 12:56:52PM +0100, Richard Biener wrote:
> > > The following avoids using exact_log2 on the number of SIMD clone calls
> > > to be emitted when vectorizing calls since that can easily be not
> > > a power of two in which case it will return -1.  For different simd
> > > clones the number of calls will differ by a multiply with a power of two
> > > only so using floor_log2 is good enough here.
> > > 
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > > 
> > >   PR tree-optimization/113552
> > >   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
> > >   floor_log2 instead of exact_log2 on the number of calls.
> > 
> > Is there any target which supports non-power-of-two simdlen?
> > If not, perhaps we should add !pow2p_hwi (num_calls) to the continue;
> > condition a few lines earlier?
> 
> Is non-power-of-two simdlen a thing?  Note there's nothing wrong
> with non-power-of-two num_calls, with VF == 4 and a group size
> of 3 you get 12 lanes and either 3 (simdlen == 4) or 6 (simdlen == 2)
> calls.
> 
> Iff non-power-of-two simdlen is a thing then we could bias
> by + num_calls (no idea why we use *_log2 in the first place, but it
> was that way since the beginning).

Ah, with SLP I can understand it doesn't have to be a power of two,
the original
+   if (n->simdclone->simdlen
+   < (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+ this_badness += (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+  - exact_log2 (n->simdclone->simdlen)) * 1024;
was written for loop vectorization only and I think correctly assumed
power of 2 loop vectorization factors as well as simdlens.

I admit I don't remember why log2 rather than the count has been used,
the first version of the patch is
https://gcc.gnu.org/pipermail/gcc-patches/2013-November/374728.html
But if we keep using log2, perhaps ceil_log2 would be better, because a
num_calls of 3 is certainly more expensive than 2.

Jakub



Re: [PATCH] aarch64: enforce lane checking for intrinsics

2024-01-23 Thread Richard Sandiford
Alexandre Oliva  writes:
> Calling arm_neon.h functions that take lanes as arguments may fail to
> report malformed values if the intrinsic happens to be optimized away,
> e.g. because it is pure or const and the result is unused.
>
> Adding __AARCH64_LANE_CHECK calls to the always_inline functions would
> duplicate errors in case the intrinsics are not optimized away; using
> another preprocessor macro to call either the intrinsic or
> __builtin_aarch64_im_lane_boundsi moves the error messages to the
> arm_neon.h header, and may add warnings if we fall off the end of the
> functions; duplicating the code to avoid the undesirable effect of the
> macros doesn't seem appealing; separating the checking from alternate
> no-error-checking const/pure (invisible?) intrinsics in e.g. folding of
> non-const/pure (user-callable) intrinsics seems ugly and risky.
>
> So I propose dropping the pure/const attribute from the intrinsics and
> builtin declarations, so that gimple passes won't optimize them away.
> After expand (when errors are detected and reported), we get plain
> insns rather than calls, and those are dropped if the outputs are
> unused.  It's not ideal, it could be improved, but it's safe enough
> for this stage.
>
> Regstrapped on x86_64-linux-gnu, along with other patches; also tested
> on aarch64-elf with gcc-13.  This addresses the issue first reported at
> .
> Ok to install?

Interesting idea. :)  But I don't think we should sacrifice any
performance gain (however slight) for the sake of these error messages.

Performing the check in expand is itself wrong, since the requirement
is for the arguments to be integer constant expressions.  E.g.:

#include 

float32x4_t f(float32x4_t x, float32x4_t y) {
int lane = 0;
lane += 1;
return vmulq_laneq_f32(x, y, lane);
}

is correctly rejected at -O0 but accepted when optimisation is enabled.
Clang (again correctly) rejects the code at all optimisation levels.

So I think we should enforce the immediate range within the frontend
instead, via TARGET_CHECK_BUILTIN_CALL.  We already do that for SVE
and for the recently added system register builtins.

Unfortunately that isn't suitable for stage 4 though.

Thanks,
Richard

> for  gcc/ChangeLog
>
>   * config/aarch64/aarch64-builtins.cc (aarch64_get_attributes):
>   Add lane_check parm, to rule out pure and const.
>   (aarch64_init_simd_intrinsics): Pass lane_check if any arg has
>   lane index qualifiers.
>   (aarch64_init_simd_builtin_functions): Likewise.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc |   24 
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 9b23b6b8c33f1..1268deea28e6c 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1258,11 +1258,12 @@ aarch64_add_attribute (const char *name, tree attrs)
>  /* Return the appropriate attributes for a function that has
> flags F and mode MODE.  */
>  static tree
> -aarch64_get_attributes (unsigned int f, machine_mode mode)
> +aarch64_get_attributes (unsigned int f, machine_mode mode,
> + bool lane_check = false)
>  {
>tree attrs = NULL_TREE;
>  
> -  if (!aarch64_modifies_global_state_p (f, mode))
> +  if (!lane_check && !aarch64_modifies_global_state_p (f, mode))
>  {
>if (aarch64_reads_global_state_p (f, mode))
>   attrs = aarch64_add_attribute ("pure", attrs);
> @@ -1318,6 +1319,7 @@ aarch64_init_simd_intrinsics (void)
>  
>tree return_type = void_type_node;
>tree args = void_list_node;
> +  bool lane_check = false;
>  
>for (int op_num = d->op_count - 1; op_num >= 0; op_num--)
>   {
> @@ -1330,10 +1332,17 @@ aarch64_init_simd_intrinsics (void)
>   return_type = eltype;
> else
>   args = tree_cons (NULL_TREE, eltype, args);
> +
> +   if (qualifiers & (qualifier_lane_index
> + | qualifier_struct_load_store_lane_index
> + | qualifier_lane_pair_index
> + | qualifier_lane_quadtup_index))
> + lane_check = true;
>   }
>  
>tree ftype = build_function_type (return_type, args);
> -  tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0]);
> +  tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0],
> +lane_check);
>unsigned int code
> = (d->fcode << AARCH64_BUILTIN_SHIFT | AARCH64_BUILTIN_GENERAL);
>tree fndecl = simulate_builtin_function_decl (input_location, d->name,
> @@ -1400,6 +1409,7 @@ aarch64_init_simd_builtin_functions (bool 
> called_from_pragma)
> || (!called_from_pragma && struct_mode_args > 0))
>   continue;
>  
> +  bool lane_check = false;
>/* Build a function

Re: [PATCH] tree-optimization/113552 - fix num_call accounting in simd clone vectorization

2024-01-23 Thread Richard Biener
On Tue, 23 Jan 2024, Jakub Jelinek wrote:

> On Tue, Jan 23, 2024 at 01:03:46PM +0100, Richard Biener wrote:
> > On Tue, 23 Jan 2024, Jakub Jelinek wrote:
> > 
> > > On Tue, Jan 23, 2024 at 12:56:52PM +0100, Richard Biener wrote:
> > > > The following avoids using exact_log2 on the number of SIMD clone calls
> > > > to be emitted when vectorizing calls since that can easily be not
> > > > a power of two in which case it will return -1.  For different simd
> > > > clones the number of calls will differ by a multiply with a power of two
> > > > only so using floor_log2 is good enough here.
> > > > 
> > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > > > 
> > > > PR tree-optimization/113552
> > > > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
> > > > floor_log2 instead of exact_log2 on the number of calls.
> > > 
> > > Is there any target which supports non-power-of-two simdlen?
> > > If not, perhaps we should add !pow2p_hwi (num_calls) to the continue;
> > > condition a few lines earlier?
> > 
> > Is non-power-of-two simdlen a thing?  Note there's nothing wrong
> > with non-power-of-two num_calls, with VF == 4 and a group size
> > of 3 you get 12 lanes and either 3 (simdlen == 4) or 6 (simdlen == 2)
> > calls.
> > 
> > Iff non-power-of-two simdlen is a thing then we could bias
> > by + num_calls (no idea why we use *_log2 in the first place, but it
> > was that way since the beginning).
> 
> Ah, with SLP I can understand it doesn't have to be a power of two,
> the original
> +   if (n->simdclone->simdlen
> +   < (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> + this_badness += (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> +  - exact_log2 (n->simdclone->simdlen)) * 1024;
> was written for loop vectorization only and I think correctly assumed
> power of 2 loop vectorization factors as well as simdlens.

Note even without SLP we could vectorize a group size of 3 with
interleaving.  But yes, VF for interleaving is a power of two and
so is simdlen so the above should have worked.

> I admit I don't remember why log2 rather than the count has been used,
> the first version of the patch is
> https://gcc.gnu.org/pipermail/gcc-patches/2013-November/374728.html
> But if we keep using log2, perhaps better ceil_log2 because num_calls of
> 3 is certainly more expensive than 2.

I think only the relative accounting to the other this_badness
increments matter since as said, at least with power-of-two simdlen
we'd get different {ceil,floor}_log2 values for different simdlen.

It's never going to be 3 vs 2 but 3 * 2^n vs. 3 * 2^m so floor or
ceil doesn't matter.  In fact we could have just used
some inverse of exact_log2 (n->simdclone->simdlen).  That is,
it's only simdlen that's varying in this addend to this_badness.

For the exact_log2 case the behavior is unchanged and we now
get a sensible number for the others.  Maybe log2 was for the
fear of overflow or over-accounting compared to the other increments.
When overflow is not an issue we could also use
floor_log2 (num_calls * 4096) to be more "precise".  I don't know
why we have all these different weights and why they are the way
they are - but it's a lot of apples and oranges we put together
in a magic number to compare ;)

I prefer to do a minimal change now, but both floor and ceil work
for me, so if you prefer I can change (but as said, it would only
matter with the mixing with the other cost factors, and in unknown
ways since the desire is not spelled out).

Richard.


Re: [PATCH] tree-optimization/113552 - fix num_call accounting in simd clone vectorization

2024-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2024 at 01:40:15PM +0100, Richard Biener wrote:
> It's never going to be 3 vs 2 but 3 * 2^n vs. 3 * 2^m so floor or
> ceil doesn't matter.  In fact we could have just used
> some inverse of exact_log2 (n->simdclone->simdlen).  That is,
> it's only simdlen that's varying in this addend to this_badness.
> 
> For the exact_log2 case the behavior is unchanged and we now
> get a sensible number for the others.  Maybe log2 was for the
> fear of overflow or over-accounting compared to the other increments.
> When overflow is not an issue we could also use
> floor_log2 (num_calls * 4096) to be more "precise".  I don't know
> why we have all these different weights and why they are the way
> they are - but it's a lot of apples and oranges we put together
> in a magic number to compare ;)
> 
> I prefer to do a minimal change now, but both floor and ceil work
> for me, so if you prefer I can change (but as said, it would only
> matter with the mixing with the other cost factors, and in unknown
> ways since the desire is not spelled out).

Ok as is then.

Jakub



Re: [PATCH] LoongArch: testsuite: Disable stack protector for got-load.C

2024-01-23 Thread chenglulu

LGTM!

Thanks!

在 2024/1/23 下午7:35, Xi Ruoyao 写道:

When building GCC with --enable-default-ssp, the stack protector is
enabled for got-load.C, causing additional GOT loads for
__stack_chk_guard.  So mem/u will be matched more than 2 times and the
test will fail.

Disable stack protector to fix this issue.

gcc/testsuite:

* g++.target/loongarch/got-load.C (dg-options): Add
-fno-stack-protector.
---

Ok for trunk?

  gcc/testsuite/g++.target/loongarch/got-load.C | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/loongarch/got-load.C 
b/gcc/testsuite/g++.target/loongarch/got-load.C
index 20924c73942..17870176ab4 100644
--- a/gcc/testsuite/g++.target/loongarch/got-load.C
+++ b/gcc/testsuite/g++.target/loongarch/got-load.C
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal 
-fdump-rtl-expand" } */
+/* { dg-options "-mabi=lp64d -O2 -mexplicit-relocs -mcmodel=normal 
-fdump-rtl-expand -fno-stack-protector" } */
  /* { dg-final { scan-rtl-dump-times "mem/u" 2 "expand" } } */
  
  #include 




Re: [PATCH] Mark ASM_OUTPUT_FUNCTION_LABEL ()'s DECL argument as used

2024-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2024 at 11:10:05AM +0100, Jakub Jelinek wrote:
> > --- a/gcc/config/ia64/ia64.cc
> > +++ b/gcc/config/ia64/ia64.cc
> > @@ -3886,7 +3886,7 @@ ia64_expand_prologue (void)
> >  /* Output the textual info surrounding the prologue.  */
> >  
> >  void
> > -ia64_start_function (FILE *file, const char *fnname, tree decl)
> > +ia64_start_function (FILE *file, const char *fnname, tree decl 
> > ATTRIBUTE_UNUSED)
> 
> Except the too long line sure, that is what I'd go with as well.

Tested with a cross to ia64-linux (both without the patch and with the
patch), committed to trunk as obvious.

Jakub



Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-23 Thread Alex Coplan
On 22/01/2024 21:50, Alex Coplan wrote:
> On 22/01/2024 15:59, Richard Sandiford wrote:
> > Alex Coplan  writes:
> > > As the PR shows (specifically #c7) we are missing updating uses of mem
> > > when inserting an stp in the aarch64 load/store pair fusion pass.  This
> > > patch fixes that.
> > >
> > > RTL-SSA has a simple view of memory and by default doesn't allow stores
> > > to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> > > own alias analysis and so can re-order stores over other accesses when
> > > we deem this is safe.  If neither store can be re-purposed (moved into
> > > the required position to form the stp while respecting the RTL-SSA
> > > constraints), then we turn both the candidate stores into "tombstone"
> > > insns (logically delete them) and insert a new stp insn.
> > >
> > > As it stands, we implement the insert case separately (after dealing
> > > with the candidate stores) in fuse_pair by inserting into the middle of
> > > the vector of changes.  This is OK when we only have to insert one
> > > change, but with this fix we would need to insert the change for the new
> > > stp plus multiple changes to fix up uses of mem (note the number of
> > > fix-ups is naturally bounded by the alias limit param to prevent
> > > quadratic behaviour).  If we kept the code structured as is and inserted
> > > into the middle of the vector, that would lead to repeated moving of
> > > elements in the vector which seems inefficient.  The structure of the
> > > code would also be a little unwieldy.
> > >
> > > To improve on that situation, this patch introduces a helper class,
> > > stp_change_builder, which implements a state machine that helps to build
> > > the required changes directly in program order.  That state machine is
> > > responsible for deciding what changes need to be made in what order, and
> > > the code in fuse_pair then simply follows those steps.
> > >
> > > Together with the fix in the previous patch for installing new defs
> > > correctly in RTL-SSA, this fixes PR113070.
> > >
> > > We take the opportunity to rename the function decide_stp_strategy to
> > > try_repurpose_store, as that seems more descriptive of what it actually
> > > does, since stp_change_builder is now responsible for the overall change
> > > strategy.
> > >
> > > Bootstrapped/regtested as a series with/without the passes enabled on
> > > aarch64-linux-gnu, OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR target/113070
> > >   * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> > >   (decide_stp_strategy): Rename to ...
> > >   (try_repurpose_store): ... this.
> > >   (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> > >   construct stp changes.  Fix up uses when inserting new stp insns.
> > > ---
> > >  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
> > >  1 file changed, 194 insertions(+), 54 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> > > b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > index 689a8c884bd..703cfb1228c 100644
> > > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> > >return range;
> > >  }
> > >  
> > > +// Class that implements a state machine for building the changes needed 
> > > to form
> > > +// a store pair instruction.  This allows us to easily build the changes 
> > > in
> > > +// program order, as required by rtl-ssa.
> > > +struct stp_change_builder
> > > +{
> > > +  enum class state
> > > +  {
> > > +FIRST,
> > > +INSERT,
> > > +FIXUP_USE,
> > > +LAST,
> > > +DONE
> > > +  };
> > > +
> > > +  enum class action
> > > +  {
> > > +TOMBSTONE,
> > > +CHANGE,
> > > +INSERT,
> > > +FIXUP_USE
> > > +  };
> > > +
> > > +  struct change
> > > +  {
> > > +action type;
> > > +insn_info *insn;
> > > +  };
> > > +
> > > +  bool done () const { return m_state == state::DONE; }
> > > +
> > > +  stp_change_builder (insn_info *insns[2],
> > > +   insn_info *repurpose,
> > > +   insn_info *dest)
> > > +: m_state (state::FIRST), m_insns { insns[0], insns[1] },
> > > +  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> > 
> > Just to make sure I understand: is it the case that
> > 
> >   *insns[0] <= *dest <= *insns[1]
> > 
> > ?
> 
> Yes, that is my understanding.  I thought about asserting it somewhere in
> stp_change_builder, but it seemed a bit gratuitous.
> 
> > 
> > > +
> > > +  change get_change () const
> > > +  {
> > > +switch (m_state)
> > > +  {
> > > +  case state::FIRST:
> > > + return {
> > > +   m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > > +   m_insns[0]
> > > + };
> > > +  case state::LAST:
> > > + return {
> > > +   m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,

[PATCH] m2: Use time_t in time and don't redefine alloca

2024-01-23 Thread H.J. Lu
Fix the m2 build warning and error:

[...]
../../src/gcc/m2/mc/mc.flex:32:9: warning: "alloca" redefined
   32 | #define alloca __builtin_alloca
  | ^~
In file included from /usr/include/stdlib.h:587,
 from :22:
/usr/include/alloca.h:35:10: note: this is the location of the previous 
definition
   35 | # define alloca(size)   __builtin_alloca (size)
  |  ^~
../../src/gcc/m2/mc/mc.flex: In function 'handleDate':
../../src/gcc/m2/mc/mc.flex:333:25: error: passing argument 1 of 'time' from
incompatible pointer type [-Wincompatible-pointer-types]
  333 |   time_t  clock = time ((long *)0);
  | ^
  | |
  | long int *
In file included from ../../src/gcc/m2/mc/mc.flex:28:
/usr/include/time.h:76:29: note: expected 'time_t *' {aka 'long long int *'}
but argument is of type 'long int *'
   76 | extern time_t time (time_t *__timer) __THROW;

PR bootstrap/113554
* mc/mc.flex (alloca): Don't redefine.
(handleDate): Replace (long *)0 with (time_t *)0 when calling
time.
---
 gcc/m2/mc/mc.flex | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/m2/mc/mc.flex b/gcc/m2/mc/mc.flex
index bd37d5ad100..7c841bf8d63 100644
--- a/gcc/m2/mc/mc.flex
+++ b/gcc/m2/mc/mc.flex
@@ -28,9 +28,11 @@ along with GNU Modula-2; see the file COPYING3.  If not see
 #include 
 #include 
 
+#ifndef alloca
 #ifdef __GNUC__
 #define alloca __builtin_alloca
 #endif
+#endif
 
 #if !defined(TRUE)
 #  define TRUE (1==1)
@@ -330,7 +332,7 @@ handleColumn (void)
 static void
 handleDate (void)
 {
-  time_t  clock = time ((long *)0);
+  time_t  clock = time ((time_t *)0);
   char   *sdate = ctime (&clock);
   char   *s = (char *)alloca (strlen (sdate)+2+1);
   char   *p = strchr(sdate, '\n');
-- 
2.43.0



Re: [PATCH] m2: Use time_t in time and don't redefine alloca

2024-01-23 Thread Gaius Mulley
"H.J. Lu"  writes:

> Fix the m2 build warning and error:
>
> [...]
> ../../src/gcc/m2/mc/mc.flex:32:9: warning: "alloca" redefined
>32 | #define alloca __builtin_alloca
>   | ^~
> In file included from /usr/include/stdlib.h:587,
>  from :22:
> /usr/include/alloca.h:35:10: note: this is the location of the previous 
> definition
>35 | # define alloca(size)   __builtin_alloca (size)
>   |  ^~
> ../../src/gcc/m2/mc/mc.flex: In function 'handleDate':
> ../../src/gcc/m2/mc/mc.flex:333:25: error: passing argument 1 of 'time' from
> incompatible pointer type [-Wincompatible-pointer-types]
>   333 |   time_t  clock = time ((long *)0);
>   | ^
>   | |
>   | long int *
> In file included from ../../src/gcc/m2/mc/mc.flex:28:
> /usr/include/time.h:76:29: note: expected 'time_t *' {aka 'long long int *'}
> but argument is of type 'long int *'
>76 | extern time_t time (time_t *__timer) __THROW;
>
>   PR bootstrap/113554
>   * mc/mc.flex (alloca): Don't redefine.
>   (handleDate): Replace (long *)0 with (time_t *)0 when calling
>   time.
> ---
>  gcc/m2/mc/mc.flex | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/m2/mc/mc.flex b/gcc/m2/mc/mc.flex
> index bd37d5ad100..7c841bf8d63 100644
> --- a/gcc/m2/mc/mc.flex
> +++ b/gcc/m2/mc/mc.flex
> @@ -28,9 +28,11 @@ along with GNU Modula-2; see the file COPYING3.  If not see
>  #include 
>  #include 
>  
> +#ifndef alloca
>  #ifdef __GNUC__
>  #define alloca __builtin_alloca
>  #endif
> +#endif
>  
>  #if !defined(TRUE)
>  #  define TRUE (1==1)
> @@ -330,7 +332,7 @@ handleColumn (void)
>  static void
>  handleDate (void)
>  {
> -  time_t  clock = time ((long *)0);
> +  time_t  clock = time ((time_t *)0);
>char   *sdate = ctime (&clock);
>char   *s = (char *)alloca (strlen (sdate)+2+1);
>char   *p = strchr(sdate, '\n');

many thanks - and lgtm,

regards,
Gaius


Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference

2024-01-23 Thread H.J. Lu
On Mon, Jan 22, 2024 at 11:10 PM Richard Biener  wrote:
>
> On Mon, 22 Jan 2024, Jeff Law wrote:
>
> >
> >
> > On 1/15/24 06:34, Richard Biener wrote:
> > > When the x86 backend generates code for cpymem with the rep_8byte
> > > strategy for the 8 byte aligned main rep movq it needs to compute
> > > an adjusted pointer to the source after doing a prologue aligning
> > > the destination.  It computes that via
> > >
> > >src_ptr + (dest_ptr - orig_dest_ptr)
> > >
> > > which is perfectly fine.  On RTL this is then
> > >
> > >  8: r134:DI=const(`g'+0x44)
> > >  9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
> > >REG_UNUSED flags:CC
> > > 56: r129:DI=const(`g'+0x4c)
> > > 57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
> > >REG_UNUSED flags:CC
> > >REG_EQUAL const(`g'+0x4c)&0xfff8
> > > 58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
> > >REG_DEAD r134:DI
> > >REG_UNUSED flags:CC
> > >REG_EQUAL const(`g'+0x44)-r129:DI
> > > 59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
> > >REG_DEAD r133:DI
> > >REG_UNUSED flags:CC
> > >
> > > but as written find_base_term happily picks the first candidate
> > > it finds for the MINUS which means it picks const(`g') rather
> > > than the correct frame:DI.  This way find_base_term (but also
> > > the unfixed find_base_value used by init_alias_analysis to
> > > initialize REG_BASE_VALUE) performs pointer analysis isn't
> > > sound.  The following restricts the handling of multi-operand
> > > operations to the case we know only one can be a pointer.
> > >
> > > This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
> > > RTL PRE (I've opened PR113395 for this).  A more drastic patch,
> > > removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
> > > regressing (so testsuite coverage is bad).  I've looked at
> > > gcc.dg/tree-ssa tests and mostly scheduling changes are present,
> > > the cc1plus .text size is only 230 bytes worse.  With this
> > > less drastic patch below most scheduling changes are gone.
> > >
> > > x86_64 might not be the very best target to test for impact, but
> > > test coverage on other targets is unlikely to be very much better.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> > > with 2/2).  Jeff, can you maybe throw this on your tester?
> > > Jakub, you did the PR64025 fix which was for a similar issue.
> > No issues across the cross compilers with those two patches.
>
> Thanks, pushed.  I'm probably going to revert when bigger issues
> appear (and hopefully we'd get some test coverage then).
>
> Richard.

The test failed with -m32:

FAIL: gcc.dg/torture/pr113255.c   -O1  (test for excess errors)
Excess errors:
cc1: error: '-mstringop-strategy=rep_8byte' not supported for 32-bit code


-- 
H.J.


Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference

2024-01-23 Thread H.J. Lu
On Tue, Jan 23, 2024 at 6:15 AM H.J. Lu  wrote:
>
> On Mon, Jan 22, 2024 at 11:10 PM Richard Biener  wrote:
> >
> > On Mon, 22 Jan 2024, Jeff Law wrote:
> >
> > >
> > >
> > > On 1/15/24 06:34, Richard Biener wrote:
> > > > When the x86 backend generates code for cpymem with the rep_8byte
> > > > strategy for the 8 byte aligned main rep movq it needs to compute
> > > > an adjusted pointer to the source after doing a prologue aligning
> > > > the destination.  It computes that via
> > > >
> > > >src_ptr + (dest_ptr - orig_dest_ptr)
> > > >
> > > > which is perfectly fine.  On RTL this is then
> > > >
> > > >  8: r134:DI=const(`g'+0x44)
> > > >  9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
> > > >REG_UNUSED flags:CC
> > > > 56: r129:DI=const(`g'+0x4c)
> > > > 57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
> > > >REG_UNUSED flags:CC
> > > >REG_EQUAL const(`g'+0x4c)&0xfff8
> > > > 58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
> > > >REG_DEAD r134:DI
> > > >REG_UNUSED flags:CC
> > > >REG_EQUAL const(`g'+0x44)-r129:DI
> > > > 59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
> > > >REG_DEAD r133:DI
> > > >REG_UNUSED flags:CC
> > > >
> > > > but as written find_base_term happily picks the first candidate
> > > > it finds for the MINUS which means it picks const(`g') rather
> > > > than the correct frame:DI.  This way find_base_term (but also
> > > > the unfixed find_base_value used by init_alias_analysis to
> > > > initialize REG_BASE_VALUE) performs pointer analysis isn't
> > > > sound.  The following restricts the handling of multi-operand
> > > > operations to the case we know only one can be a pointer.
> > > >
> > > > This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
> > > > RTL PRE (I've opened PR113395 for this).  A more drastic patch,
> > > > removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
> > > > regressing (so testsuite coverage is bad).  I've looked at
> > > > gcc.dg/tree-ssa tests and mostly scheduling changes are present,
> > > > the cc1plus .text size is only 230 bytes worse.  With this
> > > > less drastic patch below most scheduling changes are gone.
> > > >
> > > > x86_64 might not be the very best target to test for impact, but
> > > > test coverage on other targets is unlikely to be very much better.
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> > > > with 2/2).  Jeff, can you maybe throw this on your tester?
> > > > Jakub, you did the PR64025 fix which was for a similar issue.
> > > No issues across the cross compilers with those two patches.
> >
> > Thanks, pushed.  I'm probably going to revert when bigger issues
> > appear (and hopefully we'd get some test coverage then).
> >
> > Richard.
>
> The test failed with -m32:
>
> FAIL: gcc.dg/torture/pr113255.c   -O1  (test for excess errors)
> Excess errors:
> cc1: error: '-mstringop-strategy=rep_8byte' not supported for 32-bit code
>

I am checking in this:

diff --git a/gcc/testsuite/gcc.dg/torture/pr113255.c
b/gcc/testsuite/gcc.dg/torture/pr113255.c
index 2f009524c6b..78af6a5a563 100644
--- a/gcc/testsuite/gcc.dg/torture/pr113255.c
+++ b/gcc/testsuite/gcc.dg/torture/pr113255.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-mtune=k8 -mstringop-strategy=rep_8byte"
{ target { x86_64-*-* i?86-*-* } } } */
+/* { dg-additional-options "-mtune=k8 -mstringop-strategy=rep_8byte"
{ target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } */

 struct S { unsigned a[10]; unsigned y; unsigned b[6]; } g[2];


-- 
H.J.


Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-23 Thread Steve Kargl
On Tue, Jan 23, 2024 at 10:08:43AM +0100, FX Coudert wrote:
> Hi Steve,
> 
> Thanks for the patch. I’ll take time to do a proper review, but
> after a first read I had the following questions:
> 
> - "an OS's libm may/will contain cospi(), etc.”: do those functions
> conform to any standard? Are there plans to implement them outside
>FreeBSD at this point?

Yes.  cospi, sinpi, and tanpi are in IEEE754-2008.  These are
also in ISO/IEC TS 18661-4 along with acospi, asinpi, and atanpi,
which I believe were merged into C23 (see n3096.pdf).  I haven't
checked if atan2pi is in C23, but it is in F2023.

AFAIK, FreeBSD's libm is the only OS that contains cospi, sinpi,
and tanpi.  The arc functions haven't been written/committed,
because something like cospi(x) = cos(pi * x) with pi split
into hi-lo parts is good enough.

> - On systems where libquadmath is used for _Float128, does
> libquadmath contain the implementation of the q() variants for
> those functions?

AFAIK, libquadmath does not have the half-cycle functions.  I
wrote the function trig_fallback2.F90 to deal with REAL(16) (and
maybe REAL(17)).

> - If I get this right, to take one example, the Fortran front-end
> will emit a call to gfortran_acospi_r4(), libgfortran provides this
> as a wrapper calling acospif(), which is called either from libm
> or from libgfortran.

Yes, that is correct.  I tried to install __builtin_acospif in
trans-intrinsic to generate a direct call to acospif, but that
led to many hours/days of frustration.  I gave up.

> This is different from other math library functions, like ACOS()
> where the acosf() call is generated directly from the front-end
> (and then the implementation comes either from libm or from
> libgfortran). Why not follow our usual way of doing things?

I gave up on that approach when I got some real obscure error
about some math function which I did not touch. 


> - On most targets, cospi() and friends are not available. Therefore,
> we end up doing the fallback (with limited precision as you noted)
> but with a lot of indirection. We could generate that code directly
> in the front-end, couldn’t we?

The precision of the fallback routines isn't too bad.  More later
today.  Late for a doctor's appointment.

-- 
Steve


[PATCH] gcc.dg/torture/pr113255.c: Fix ia32 test failure

2024-01-23 Thread H.J. Lu
Fix ia32 test failure:

FAIL: gcc.dg/torture/pr113255.c   -O1  (test for excess errors)
Excess errors:
cc1: error: '-mstringop-strategy=rep_8byte' not supported for 32-bit code

PR rtl-optimization/113255
* gcc.dg/torture/pr113255.c (dg-additional-options): Add only
if not ia32.
---
 gcc/testsuite/gcc.dg/torture/pr113255.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr113255.c 
b/gcc/testsuite/gcc.dg/torture/pr113255.c
index 2f009524c6b..78af6a5a563 100644
--- a/gcc/testsuite/gcc.dg/torture/pr113255.c
+++ b/gcc/testsuite/gcc.dg/torture/pr113255.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-mtune=k8 -mstringop-strategy=rep_8byte" { target 
{ x86_64-*-* i?86-*-* } } } */
+/* { dg-additional-options "-mtune=k8 -mstringop-strategy=rep_8byte" { target 
{ { i?86-*-* x86_64-*-* } && { ! ia32 } } } } */
 
 struct S { unsigned a[10]; unsigned y; unsigned b[6]; } g[2];
 
-- 
2.43.0



[PATCH v3 2/2] x86: Don't save callee-saved registers in noreturn functions

2024-01-23 Thread H.J. Lu
There is no need to save callee-saved registers in noreturn functions
if they neither throw nor support exceptions.  We can treat them the
same as functions with the no_callee_saved_registers attribute.

Adjust stack-check-17.c for noreturn function which no longer saves any
registers.

With this change, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
endbr64
push   %r15
push   %r14
mov%rcx,%r14
push   %r13
push   %r12
push   %rbp
mov%esi,%ebp
push   %rbx
mov%rdx,%rbx
sub$0x28,%rsp
mov%rdi,(%rsp)
mov%fs:0x28,%rax
mov%rax,0x18(%rsp)
xor%eax,%eax
test   %r9,%r9

to

__libc_start_main:
endbr64
sub$0x28,%rsp
mov%esi,%ebp
mov%rdx,%rbx
mov%rcx,%r14
mov%rdi,(%rsp)
mov%fs:0x28,%rax
mov%rax,0x18(%rsp)
xor%eax,%eax
test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
endbr64
call   
push   %r15
push   %r14
push   %r13
push   %r12
mov%rdi,%r12
push   %rbp
push   %rbx
mov%gs:0x0,%rbx
sub$0x28,%rsp
mov%gs:0x28,%rax
mov%rax,0x20(%rsp)
xor%eax,%eax
call   *0x0(%rip)# 
test   $0x2,%ah
je 

to

do_exit:
endbr64
call   
sub$0x28,%rsp
mov%rdi,%r12
mov%gs:0x28,%rax
mov%rax,0x20(%rsp)
xor%eax,%eax
mov%gs:0x0,%rbx
call   *0x0(%rip)# 
test   $0x2,%ah
je 

I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.

GCC master branch build time in seconds:

before        after          improvement
30043.75user  30013.16user   0%
1274.85system 1243.72system  2.4%

GCC master branch test time in seconds (new tests added):

before        after          improvement
216035.90user 216547.51user  0%
27365.51system 26658.54system 2.6%

gcc/

PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Don't
save and restore callee saved registers for a noreturn function
with nothrow or compiled with -fno-exceptions.

gcc/testsuite/

PR target/38534
* gcc.target/i386/pr38534-1.c: New file.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/stack-check-17.c: Updated.
---
 gcc/config/i386/i386-options.cc   | 16 ++--
 gcc/testsuite/gcc.target/i386/pr38534-1.c | 26 +++
 gcc/testsuite/gcc.target/i386/pr38534-2.c | 18 +
 gcc/testsuite/gcc.target/i386/pr38534-3.c | 19 ++
 gcc/testsuite/gcc.target/i386/pr38534-4.c | 18 +
 .../gcc.target/i386/stack-check-17.c  | 19 +-
 6 files changed, 102 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-4.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 0cdea30599e..f965568947c 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3371,9 +3371,21 @@ ix86_simd_clone_adjust (struct cgraph_node *node)
 static void
 ix86_set_func_type (tree fndecl)
 {
+  /* No need to save and restore callee-saved registers for a noreturn
+ function with nothrow or compiled with -fno-exceptions.
+
+ NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
+ function.  The local-pure-const pass turns an interrupt function
+ into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
+ the local-pure-const pass is run after ix86_set_func_type is called.
+ When the local-pure-const pass is enabled for LTO, the interrupt
+ function is marked as noreturn in the IR output, which leads to an
+ incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
-= lookup_attribute ("no_callee_saved_registers",
-   TYPE_ATTRIBUTES (TREE_TYPE (fndecl)));
+= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
+   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
+   || lookup_attribute ("no_callee_sa

[PATCH v3 0/2] x86: Don't save callee-saved registers if not needed

2024-01-23 Thread H.J. Lu
Changes in v3:

1. Rebase against commit 02e68389494 
2. Don't add call_no_callee_saved_registers to machine_function since
all callee-saved registers are properly clobbered by callee with
no_callee_saved_registers attribute.

Changes in v2:

1. Rebase against commit f9df00340e3
2. Don't add redundant clobbered_registers check in ix86_expand_call.

In some cases, there is no need to save callee-saved registers:

1. If a noreturn function neither throws nor supports exceptions, it can
skip saving callee-saved registers.

2. When an interrupt handler is implemented by an assembly stub which does:

  1. Save all registers.
  2. Call a C function.
  3. Restore all registers.
  4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.

This patch set adds no_callee_saved_registers function attribute, which
is complementary to no_caller_saved_registers function attribute, to
classify x86 backend call-saved register handling type with

  1. Default call-saved registers.
  2. No caller-saved registers with no_caller_saved_registers attribute.
  3. No callee-saved registers with no_callee_saved_registers attribute.

Functions of the no callee-saved registers type won't save callee-saved
registers.  If a noreturn function neither throws nor supports
exceptions, it is classified as the no callee-saved registers type.

With these changes, __libc_start_main in glibc 2.39, which is a noreturn
function, is changed from

__libc_start_main:
endbr64
push   %r15
push   %r14
mov%rcx,%r14
push   %r13
push   %r12
push   %rbp
mov%esi,%ebp
push   %rbx
mov%rdx,%rbx
sub$0x28,%rsp
mov%rdi,(%rsp)
mov%fs:0x28,%rax
mov%rax,0x18(%rsp)
xor%eax,%eax
test   %r9,%r9

to

__libc_start_main:
endbr64
sub$0x28,%rsp
mov%esi,%ebp
mov%rdx,%rbx
mov%rcx,%r14
mov%rdi,(%rsp)
mov%fs:0x28,%rax
mov%rax,0x18(%rsp)
xor%eax,%eax
test   %r9,%r9

In Linux kernel 6.7.0 on x86-64, do_exit is changed from

do_exit:
endbr64
call   
push   %r15
push   %r14
push   %r13
push   %r12
mov%rdi,%r12
push   %rbp
push   %rbx
mov%gs:0x0,%rbx
sub$0x28,%rsp
mov%gs:0x28,%rax
mov%rax,0x20(%rsp)
xor%eax,%eax
call   *0x0(%rip)# 
test   $0x2,%ah
je 

to

do_exit:
endbr64
call   
sub$0x28,%rsp
mov%rdi,%r12
mov%gs:0x28,%rax
mov%rax,0x20(%rsp)
xor%eax,%eax
mov%gs:0x0,%rbx
call   *0x0(%rip)# 
test   $0x2,%ah
je 

I compared GCC master branch bootstrap and test times on a slow machine
with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
with the backported patch.  The performance data isn't precise since the
measurements were done on different days with different GCC sources under
different 6.6 kernel versions.

GCC master branch build time in seconds:

before        after          improvement
30043.75user  30013.16user   0%
1274.85system 1243.72system  2.4%

GCC master branch test time in seconds (new tests added):

before        after          improvement
216035.90user 216547.51user  0%
27365.51system 26658.54system 2.6%

Backported to GCC 13 to rebuild system glibc and kernel on Fedora 39.
Systems perform normally.


H.J. Lu (2):
  x86: Add no_callee_saved_registers function attribute
  x86: Don't save callee-saved registers in noreturn functions

 gcc/config/i386/i386-expand.cc| 52 +---
 gcc/config/i386/i386-options.cc   | 61 +++
 gcc/config/i386/i386.cc   | 57 +
 gcc/config/i386/i386.h| 16 -
 gcc/doc/extend.texi   |  8 +++
 .../gcc.dg/torture/no-callee-saved-run-1a.c   | 23 +++
 .../gcc.dg/torture/no-callee-saved-run-1b.c   | 59 ++
 .../gcc.target/i386/no-callee-saved-1.c   | 30 +
 .../gcc.target/i386/no-callee-saved-10.c  | 46 ++
 .../gcc.target/i386/no-callee-saved-11.c  | 11 
 .../gcc.target/i386/no-callee-saved-12.c  | 10 +++
 .../gcc.target/i386/no-callee-saved-13.c  | 16 +
 .../gcc.target/i386/no-callee-saved-14.c  | 16 +
 .../gcc.target/i386/no-callee-saved-15.c  | 17 ++
 .../gcc.target/i386/no-callee-saved-16.c  | 16 +
 .../gcc.target/i386/no-callee-saved-17.c  | 16 +
 .../gcc.target/i386/no-callee-saved-18.c  | 51 
 .../gcc.target/i386/

[PATCH v3 1/2] x86: Add no_callee_saved_registers function attribute

2024-01-23 Thread H.J. Lu
When an interrupt handler is implemented by an assembly stub which does:

1. Save all registers.
2. Call a C function.
3. Restore all registers.
4. Return from interrupt.

it is completely unnecessary to save and restore any registers in the C
function called by the assembly stub, even if they would normally be
callee-saved.

Add no_callee_saved_registers function attribute, which is complementary
to no_caller_saved_registers function attribute, to mark a function which
doesn't have any callee-saved registers.  Such a function won't save and
restore any registers.  Classify function call-saved register handling
type with:

1. Default call-saved registers.
2. No caller-saved registers with no_caller_saved_registers attribute.
3. No callee-saved registers with no_callee_saved_registers attribute.

Disallow sibcall if callee is a no_callee_saved_registers function
and caller isn't a no_callee_saved_registers function.  Otherwise,
callee-saved registers won't be preserved.

After a no_callee_saved_registers function is called, all registers may
be clobbered.  If the calling function isn't a no_callee_saved_registers
function, we need to preserve all registers which aren't used by function
calls.

gcc/

PR target/103503
PR target/113312
* config/i386/i386-expand.cc (ix86_expand_call): Replace
no_caller_saved_registers check with call_saved_registers check.
Clobber all registers that are not used by the callee with
no_callee_saved_registers attribute.
* config/i386/i386-options.cc (ix86_set_func_type): Set
call_saved_registers to TYPE_NO_CALLEE_SAVED_REGISTERS for
noreturn function.  Disallow no_callee_saved_registers with
interrupt or no_caller_saved_registers attributes together.
(ix86_set_current_function): Replace no_caller_saved_registers
check with call_saved_registers check.
(ix86_handle_no_caller_saved_registers_attribute): Renamed to ...
(ix86_handle_call_saved_registers_attribute): This.
(ix86_gnu_attributes): Add
ix86_handle_call_saved_registers_attribute.
* config/i386/i386.cc (ix86_conditional_register_usage): Replace
no_caller_saved_registers check with call_saved_registers check.
(ix86_function_ok_for_sibcall): Don't allow callee with
no_callee_saved_registers attribute when the calling function
has callee-saved registers.
(ix86_comp_type_attributes): Also check
no_callee_saved_registers.
(ix86_epilogue_uses): Replace no_caller_saved_registers check
with call_saved_registers check.
(ix86_hard_regno_scratch_ok): Likewise.
(ix86_save_reg): Replace no_caller_saved_registers check with
call_saved_registers check.  Don't save any registers for
TYPE_NO_CALLEE_SAVED_REGISTERS.  Save all registers with
TYPE_DEFAULT_CALL_SAVED_REGISTERS if function with
no_callee_saved_registers attribute is called.
(find_drap_reg): Replace no_caller_saved_registers check with
call_saved_registers check.
* config/i386/i386.h (call_saved_registers_type): New enum.
(machine_function): Replace no_caller_saved_registers with
call_saved_registers.
* doc/extend.texi: Document no_callee_saved_registers attribute.

gcc/testsuite/

PR target/103503
PR target/113312
* gcc.dg/torture/no-callee-saved-run-1a.c: New file.
* gcc.dg/torture/no-callee-saved-run-1b.c: Likewise.
* gcc.target/i386/no-callee-saved-1.c: Likewise.
* gcc.target/i386/no-callee-saved-2.c: Likewise.
* gcc.target/i386/no-callee-saved-3.c: Likewise.
* gcc.target/i386/no-callee-saved-4.c: Likewise.
* gcc.target/i386/no-callee-saved-5.c: Likewise.
* gcc.target/i386/no-callee-saved-6.c: Likewise.
* gcc.target/i386/no-callee-saved-7.c: Likewise.
* gcc.target/i386/no-callee-saved-8.c: Likewise.
* gcc.target/i386/no-callee-saved-9.c: Likewise.
* gcc.target/i386/no-callee-saved-10.c: Likewise.
* gcc.target/i386/no-callee-saved-11.c: Likewise.
* gcc.target/i386/no-callee-saved-12.c: Likewise.
* gcc.target/i386/no-callee-saved-13.c: Likewise.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-16.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/no-callee-saved-18.c: Likewise.
---
 gcc/config/i386/i386-expand.cc| 52 +---
 gcc/config/i386/i386-options.cc   | 49 +++
 gcc/config/i386/i386.cc   | 57 ++
 gcc/config/i386/i386.h| 16 -
 gcc/doc/extend.texi   |  8 +++
 .../gcc.dg/torture/no-callee-saved-run-1a.c   | 23 
 .../gcc.dg/torture/no-callee-saved-run-1b.c   

Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-23 Thread Richard Sandiford
Radek Barton  writes:
> Hello Richard.
>
> Thank you for your suggestion. I am sending a patch update according to it.
>
>> How about avoiding the clash by using the names HIDDEN, SYMBOL_TYPE and
>> SYMBOL_SIZE, with SYMBOL_TYPE taking the symbol type as argument?
>
> Yes, unless the symbol is explicitly exported using `__declspec(dllexport)`, 
> it will be effectively hidden.
>
>> What's the practical effect of not marking the symbols as hidden on
>> mingw32?  Will they still be local to the DLL/EXE, since they haven't
>>been explicitly exported?  (Sorry for the probably dumb question.)

Thanks for the updated patch and sorry for the slow reply -- I was
away last week.

I've now pushed the patch after testing on aarch64-linux-gnu.

Richard


[PATCH]middle-end: fix epilog reductions when vector iters peeled [PR113364]

2024-01-23 Thread Tamar Christina
Hi All,

This fixes a bug where vect_create_epilog_for_reduction does not handle the
case where all exits are early exits.  In this case we should do as the
induction handling code does and not have a main exit.

Bootstrapped Regtested on x86_64-pc-linux-gnu
with --enable-checking=release --enable-lto --with-arch=native
--with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.

This shows that some new miscompiles are happening (stage3 is likely 
miscompiled)
but that's unrelated to this patch and I'll look at it next.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/113364
* tree-vect-loop.cc (vect_create_epilog_for_reduction): If all exits are
early exits then we must reduce from the first offset for all of them.

gcc/testsuite/ChangeLog:

PR tree-optimization/113364
* gcc.dg/vect/vect-early-break_107-pr113364.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_107-pr113364.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_107-pr113364.c
new file mode 100644
index 
..f489265dbfe5eb8fe302dcc34901abaf6e6d5c14
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_107-pr113364.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-w" } */
+
+typedef const unsigned char *It;
+It DecodeSLEB128(It begin, It end, int *v) {
+  int value = 0;
+  unsigned shift = 0;
+  unsigned char byte;
+  do
+  {
+if (begin == end)
+  return begin;
+byte = *(begin++);
+int slice = byte & 0x7f;
+value |= slice << shift;
+  } while (byte >= 128);
+  *v = value;
+  return begin;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
fe631252dc2258e8ea42179b4ba068a480be9e38..4da1421c8f09746ef4b293573e4f861b642349e1
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5982,7 +5982,8 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  loop-closed PHI of the inner loop which we remember as
  def for the reduction PHI generation.  */
   bool double_reduc = false;
-  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
+  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
+&& !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
 {






Re: [PATCH]middle-end: fix epilog reductions when vector iters peeled [PR113364]

2024-01-23 Thread Richard Biener
On Tue, 23 Jan 2024, Tamar Christina wrote:

> Hi All,
> 
> This fixes a bug where vect_create_epilog_for_reduction does not handle the
> case where all exits are early exits.  In this case we should do as the
> induction handling code does and not have a main exit.
> 
> Bootstrapped Regtested on x86_64-pc-linux-gnu
> with --enable-checking=release --enable-lto --with-arch=native
> --with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.
> 
> This shows that some new miscompiles are happening (stage3 is likely 
> miscompiled)
> but that's unrelated to this patch and I'll look at it next.
> 
> Ok for master?

OK, but can you, as followup, change the name of main_exit_p which
doesn't reflect how it's computed now?

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/113364
>   * tree-vect-loop.cc (vect_create_epilog_for_reduction): If all exits are
>   early exits then we must reduce from the first offset for all of them.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/113364
>   * gcc.dg/vect/vect-early-break_107-pr113364.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_107-pr113364.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_107-pr113364.c
> new file mode 100644
> index 
> ..f489265dbfe5eb8fe302dcc34901abaf6e6d5c14
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_107-pr113364.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-w" } */
> +
> +typedef const unsigned char *It;
> +It DecodeSLEB128(It begin, It end, int *v) {
> +  int value = 0;
> +  unsigned shift = 0;
> +  unsigned char byte;
> +  do
> +  {
> +if (begin == end)
> +  return begin;
> +byte = *(begin++);
> +int slice = byte & 0x7f;
> +value |= slice << shift;
> +  } while (byte >= 128);
> +  *v = value;
> +  return begin;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> fe631252dc2258e8ea42179b4ba068a480be9e38..4da1421c8f09746ef4b293573e4f861b642349e1
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5982,7 +5982,8 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>   loop-closed PHI of the inner loop which we remember as
>   def for the reduction PHI generation.  */
>bool double_reduc = false;
> -  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
> +  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
> +  && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
>stmt_vec_info rdef_info = stmt_info;
>if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
>  {
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] arm: Fix missing bti instruction for virtual thunks

2024-01-23 Thread Richard Ball
Adds missing bti instruction at the beginning of a virtual
thunk, when bti is enabled.

gcc/ChangeLog:

* config/arm/arm.cc (arm_output_mi_thunk): Emit
insn for bti_c when bti is enabled.

gcc/testsuite/ChangeLog:

* g++.target/arm/bti_thunk.C: New test.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index e5a944486d7bd583627b0e22dfe8f95862e975bb..91eee8be7c1a59118fbf443557561fb3e0689d61 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -29257,6 +29257,8 @@ arm_output_mi_thunk (FILE *file, tree thunk, HOST_WIDE_INT delta,
   const char *fnname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk));
 
   assemble_start_function (thunk, fnname);
+  if (aarch_bti_enabled ())
+emit_insn (aarch_gen_bti_c());
   if (TARGET_32BIT)
 arm32_output_mi_thunk (file, thunk, delta, vcall_offset, function);
   else
diff --git a/gcc/testsuite/g++.target/arm/bti_thunk.C b/gcc/testsuite/g++.target/arm/bti_thunk.C
new file mode 100644
index ..5c4a8e5a8d74581eca2b877c000a5b34ddca0e9b
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/bti_thunk.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.1-m.main+pacbti -O1 -mbranch-protection=bti --save-temps" } */
+
+#include 
+
+struct C18 {
+  virtual void f7();
+};
+
+struct C19 : virtual C18 {
+  virtual void f7();
+};
+
+void C19::f7() {
+  printf("foo\n");
+}
+
+/* { dg-final { scan-assembler-times "\tbti" 2 } } */
\ No newline at end of file


Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference

2024-01-23 Thread H.J. Lu
On Mon, Jan 22, 2024 at 11:10 PM Richard Biener  wrote:
>
> On Mon, 22 Jan 2024, Jeff Law wrote:
>
> >
> >
> > On 1/15/24 06:34, Richard Biener wrote:
> > > When the x86 backend generates code for cpymem with the rep_8byte
> > > strategy for the 8 byte aligned main rep movq it needs to compute
> > > an adjusted pointer to the source after doing a prologue aligning
> > > the destination.  It computes that via
> > >
> > >src_ptr + (dest_ptr - orig_dest_ptr)
> > >
> > > which is perfectly fine.  On RTL this is then
> > >
> > >  8: r134:DI=const(`g'+0x44)
> > >  9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
> > >REG_UNUSED flags:CC
> > > 56: r129:DI=const(`g'+0x4c)
> > > 57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
> > >REG_UNUSED flags:CC
> > >REG_EQUAL const(`g'+0x4c)&0xfff8
> > > 58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
> > >REG_DEAD r134:DI
> > >REG_UNUSED flags:CC
> > >REG_EQUAL const(`g'+0x44)-r129:DI
> > > 59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
> > >REG_DEAD r133:DI
> > >REG_UNUSED flags:CC
> > >
> > > but as written find_base_term happily picks the first candidate
> > > it finds for the MINUS which means it picks const(`g') rather
> > > than the correct frame:DI.  This way the pointer analysis performed
> > > by find_base_term (but also by the unfixed find_base_value used by
> > > init_alias_analysis to initialize REG_BASE_VALUE) isn't sound.  The
> > > following restricts the handling of multi-operand
> > > operations to the case we know only one can be a pointer.
> > >
> > > This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
> > > RTL PRE (I've opened PR113395 for this).  A more drastic patch,
> > > removing base_alias_check results in only gcc.dg/guality/pr41447-1.c
> > > regressing (so testsuite coverage is bad).  I've looked at
> > > gcc.dg/tree-ssa tests and mostly scheduling changes are present,
> > > the cc1plus .text size is only 230 bytes worse.  With the this
> > > less drastic patch below most scheduling changes are gone.
> > >
> > > x86_64 might not the very best target to test for impact, but
> > > test coverage on other targets is unlikely to be very much better.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> > > with 2/2).  Jeff, can you maybe throw this on your tester?
> > > Jakub, you did the PR64025 fix which was for a similar issue.
> > No issues across the cross compilers with those two patches.
>
> Thanks, pushed.  I'm probably going to revert when bigger issues
> appear (and hopefully we'd get some test coverage then).
>
> Richard.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113562

-- 
H.J.


[PATCH]middle-end: rename main_exit_p in reduction code.

2024-01-23 Thread Tamar Christina
Hi All,

This renames main_exit_p to last_val_reduc_p to more accurately
reflect what the value represents.

Ok for master if bootstrap passes? Incremental build shows it's fine.

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.cc (vect_get_vect_def,
vect_create_epilog_for_reduction): Rename main_exit_p to
last_val_reduc_p.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
4da1421c8f09746ef4b293573e4f861b642349e1..21a997599f397ba6c2cd15c3b9c8b04513bc0c83
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5892,25 +5892,26 @@ vect_create_partial_epilog (tree vec_def, tree vectype, 
code_helper code,
 }
 
 /* Retrieves the definining statement to be used for a reduction.
-   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
-   the reduction definitions.  */
+   For LAST_VAL_REDUC_P we use the current VEC_STMTs which correspond to the
+   final value after vectorization and otherwise we look at the reduction
+   definitions to get the first.  */
 
 tree
 vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
-  slp_instance slp_node_instance, bool main_exit_p, unsigned i,
-  vec  &vec_stmts)
+  slp_instance slp_node_instance, bool last_val_reduc_p,
+  unsigned i, vec  &vec_stmts)
 {
   tree def;
 
   if (slp_node)
 {
-  if (!main_exit_p)
+  if (!last_val_reduc_p)
 slp_node = slp_node_instance->reduc_phis;
   def = vect_get_slp_vect_def (slp_node, i);
 }
   else
 {
-  if (!main_exit_p)
+  if (!last_val_reduc_p)
reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
   vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
   def = gimple_get_lhs (vec_stmts[0]);
@@ -5982,8 +5983,8 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  loop-closed PHI of the inner loop which we remember as
  def for the reduction PHI generation.  */
   bool double_reduc = false;
-  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
-&& !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
+  bool last_val_reduc_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
+ && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
 {
@@ -6233,7 +6234,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 {
   gimple_seq stmts = NULL;
   def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
-  main_exit_p, i, vec_stmts);
+  last_val_reduc_p, i, vec_stmts);
   for (j = 0; j < ncopies; j++)
{
  tree new_def = copy_ssa_name (def);





[COMMITTED] Update year in Gnatvsn

2024-01-23 Thread Marc Poulhiès
From: Ronan Desplanques 

gcc/ada/
* gnatvsn.ads: Update year.
---
 gcc/ada/gnatvsn.ads | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/gnatvsn.ads b/gcc/ada/gnatvsn.ads
index 934c22206f7..29238362cc0 100644
--- a/gcc/ada/gnatvsn.ads
+++ b/gcc/ada/gnatvsn.ads
@@ -39,7 +39,7 @@ package Gnatvsn is
--  Note: Makefile.in uses the library version string to construct the
--  soname value.
 
-   Current_Year : constant String := "2023";
+   Current_Year : constant String := "2024";
--  Used in printing copyright messages
 
Verbose_Library_Version : constant String := "GNAT Lib v" & Library_Version;
-- 
2.43.0



Re: [PATCH]middle-end: rename main_exit_p in reduction code.

2024-01-23 Thread Richard Biener



> On 23.01.2024 at 17:04, Tamar Christina wrote:
> 
> Hi All,
> 
> This renames main_exit_p to last_val_reduc_p to more accurately
> reflect what the value represents.
> 
> Ok for master if bootstrap passes? Incremental build shows it's fine.

Ok

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>* tree-vect-loop.cc (vect_get_vect_def,
>vect_create_epilog_for_reduction): Rename main_exit_p to
>last_val_reduc_p.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 4da1421c8f09746ef4b293573e4f861b642349e1..21a997599f397ba6c2cd15c3b9c8b04513bc0c83
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5892,25 +5892,26 @@ vect_create_partial_epilog (tree vec_def, tree 
> vectype, code_helper code,
> }
> 
> /* Retrieves the definining statement to be used for a reduction.
> -   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
> -   the reduction definitions.  */
> +   For LAST_VAL_REDUC_P we use the current VEC_STMTs which correspond to the
> +   final value after vectorization and otherwise we look at the reduction
> +   definitions to get the first.  */
> 
> tree
> vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> -   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
> -   vec  &vec_stmts)
> +   slp_instance slp_node_instance, bool last_val_reduc_p,
> +   unsigned i, vec  &vec_stmts)
> {
>   tree def;
> 
>   if (slp_node)
> {
> -  if (!main_exit_p)
> +  if (!last_val_reduc_p)
> slp_node = slp_node_instance->reduc_phis;
>   def = vect_get_slp_vect_def (slp_node, i);
> }
>   else
> {
> -  if (!main_exit_p)
> +  if (!last_val_reduc_p)
>reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
>   vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
>   def = gimple_get_lhs (vec_stmts[0]);
> @@ -5982,8 +5983,8 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>  loop-closed PHI of the inner loop which we remember as
>  def for the reduction PHI generation.  */
>   bool double_reduc = false;
> -  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
> - && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
> +  bool last_val_reduc_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
> +  && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
>   stmt_vec_info rdef_info = stmt_info;
>   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
> {
> @@ -6233,7 +6234,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
> {
>   gimple_seq stmts = NULL;
>   def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> -   main_exit_p, i, vec_stmts);
> +   last_val_reduc_p, i, vec_stmts);
>   for (j = 0; j < ncopies; j++)
>{
>  tree new_def = copy_ssa_name (def);
> 
> 
> 
> 
> --
> 


Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-23 Thread Steve Kargl
On Tue, Jan 23, 2024 at 06:40:27AM -0800, Steve Kargl wrote:
> On Tue, Jan 23, 2024 at 10:08:43AM +0100, FX Coudert wrote:
> > Hi Steve,
> > 
> > Thanks for the patch. I’ll take time to do a proper review, but
> > after a first read I had the following questions:
> > 
> > - "an OS's libm may/will contain cospi(), etc.”: do those functions
> > conform to any standard? Are there plans to implement them outside
> >FreeBSD at this point?
> 
> Yes.  cospi, sinpi, and tanpi are in IEEE754-2008.  These are
> also in ISO/IEC TS 18661-4 along with acospi, asinpi, and atanpi,
> which I believe where merged into C23 (see n3096.pdf).  I've

I haven't checked.

> checked if atan2pi is in C23, but it is in F2023.
> 
> AFAIK, FreeBSD's libm is the only one that contains cospi, sinpi,
> and tanpi.  The arc functions haven't been written/committed,
> because something like cospi(x) = cos(x) / pi with pi split
> into hi-lo parts is good enough.

Whoops. acospi(x) = acos(x) / pi.  The fallback implements this
as acospi(x) = acos(x) * invpihi + acos(x) * invpilo with invpihi
the leading digits(x)/2 bits of 1/pi and invpilo the trailing
digits(x) of 1/pi.

For most libm's, acos(x) with |x| <= 1 have good numerical accuracy
and handle subnormal, |x| > 1, +-inf, and NaN by setting appropriate
exceptions.  With FreeBSD and exhaustive testing in the interval, I see

% tlibm acos -f -x 0x1p-9 -X 1 -PED
Interval tested for acosf: [0.00195312,1]
   ulp <= 0.5:  96.432%  72803871 |  96.432%  72803871
0.5 <  ulp <  0.6:  3.110%   2348017 |  99.542%  75151888
0.6 <  ulp <  0.7:  0.419%316134 |  99.961%  75468022
0.7 <  ulp <  0.8:  0.039% 29450 | 100.000%  75497472
Max ulp: 0.796035 at 4.99814630e-01

Exhaustive testing of the fallback routine for acospi(x) gives

   program foo

   implicit none

   integer n
   real(4) f4, x, xm
   real(8) f8, u, v

   u = -1
   x = scale(1., -9)
   do
  f4 = acospi(x)
  f8 = acospi(real(x,8))
  n = exponent(f8)
  v = abs(f8 - f4)
  v = scale(v, digits(f4) - n)
  if (v > u) then
 u = v
 xm = x
  end if
  x = nearest(x, 2.)
  if (x > 1) exit
   end do

   print *, u, acospi(xm), acospi(real(xm,8))

   end program foo

% gfcx -o z -O a.f90 && ./z
   1.9551682807505131   0.337791681  0.33779173955822828 

so a max ulp of 1.955.  Hopefully, OS libm's will catch up and
provide an implementation that gets the extra 1+ bit of precision.

Admittedly, the fallback routines for cospi(), sinpi(), and
tanpi() could be better, but much slower.  For now, I do for
example,

float
cospif (float x)
{
   return cosf (x * pihi_f + x * pilo_f);
}

A better routine would be,

/* cospi(x) = cos(pi*x) = cos(pi*n+pi*r) = +-cos(pi*r) with 0 <= r < 1
   and n integer. */
float
cospif (float x)
{
   int s;
   float ax, n, r;
   ax = fabsf(x);
   if (ax > 0x1p23) {
  return 1;
   } else {
  /* FreeBSD uses bit twiddling.  See
 https://cgit.freebsd.org/src/tree/lib/msun/src/s_cospif.c  */
  n = floorf(ax); /* integer part */
  r = ax - n; /* remainder */
  s = (floorf(n/2) * 2 - n == 0) ? 1 : -1;  /* even n? */
  return s * cosf (r * pi);
   }
}

> > - On systems where libquadmath is used for _Float128, does
> > libquadmath contain the implementation of the q() variants for
> > those functions?
> 
> AFAIK, libquadmath does not have the half-cycle functions.  I
> wrote the function trig_fallback2.F90 to deal with REAL(16) (and
> maybe REAL17).
> 
> > - If I get this right, to take one example, the Fortran front-end
> > will emit a call to gfortran_acospi_r4(), libgfortran provides this
> > as a wrapper calling acospif(), which is called either from libm
> > or from libgfortran.
> 
> Yes, that is correct.  I tried to install __builtin_acospif in
> trans-intrinsic to generate a direct call to acospif, but that
> led to many hours/days of frustration.  I gave up.
> 
> > This is different from other math library functions, like ACOS()
> > where the acosf() call is generated directly from the front-end
> > (and then the implementation comes either from libm or from
> > libgfortran). Why not follow our usual way of doing things?
> 
> I gave up on that approach when I got some real obscure error
> about some math function which I did not touch. 
> 

I tried adding

DEFINE_MATH_BUILTIN_C (COSPI,  "cospi",   0)
or
DEFINE_MATH_BUILTIN (COSPI,  "cospi",   0)
or
OTHER_BUILTIN (COSPI, "cospi",  1, false)  (and variations).

to mathbuiltins.def but this led to some obscure error message
about some other unrelated intrinsic subprogram.

-- 
Steve


[PATCH] Update my email in MAINTAINERS

2024-01-23 Thread Andrew Stubbs
I've moved to BayLibre and don't have access to my codesourcery.com
address, at least for a while.

ChangeLog:

* MAINTAINERS: Update.

Signed-off-by:  Andrew Stubbs 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index cb5a42501dd..547237e0cf8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -55,7 +55,7 @@ aarch64 port  Marcus Shawcroft

 aarch64 port   Kyrylo Tkachov  
 alpha port Richard Henderson   
 amdgcn portJulian Brown
-amdgcn portAndrew Stubbs   
+amdgcn portAndrew Stubbs   
 arc port   Joern Rennecke  
 arc port   Claudiu Zissulescu  
 arm port   Nick Clifton
-- 
2.41.0



Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-23 Thread Steve Kargl
On Tue, Jan 23, 2024 at 01:37:54PM +0200, Janne Blomqvist wrote:
> On Tue, Jan 23, 2024 at 11:09 AM FX Coudert  wrote:
> >
> > Hi Steve,
> 
> Hello, long time no see.

Time is short and we're all busy with life, but it is nice to see
familiar names!

> 
> > Thanks for the patch. I’ll take time to do a proper review, but after a 
> > first read I had the following questions:
> >
> > - "an OS's libm may/will contain cospi(), etc.”: do those functions conform 
> > to any standard? Are there plans to implement them outside FreeBSD at this 
> > point?
> 
> acospi, asinpi, atan2pi, atanpi, cospi, sinpi, tanpi are all in IEEE
> 754-2019, and are slated to be part of C23. I would assume actively
> developed libm's will eventually catch up and implement them.
> 
> > - If I get this right, to take one example, the Fortran front-end will emit 
> > a call to gfortran_acospi_r4(), libgfortran provides this as a wrapper 
> > calling acospif(), which is called either from libm or from libgfortran. 
> > This is different from other math library functions, like ACOS() where the 
> > acosf() call is generated directly from the front-end (and then the 
> > implementation comes either from libm or from libgfortran). Why not follow 
> > our usual way of doing things?
> 
> Good point, that's what we've done in c99_functions.c in libgfortran.
> We should probably do this so we can skip jumping via libgfortran when
> libm implements these directly.
> 
> > - On most targets, cospi() and friends are not available. Therefore, we end 
> > up doing the fallback (with limited precision as you noted) but with a lot 
> > of indirection. We could generate that code directly in the front-end, 
> > couldn’t we?
> 
> The frontend generally doesn't know what the target libm implements,
> hence it's better to just generate the call, and if necessary have a
> barebones implementation in libgfortran if libm doesn't implement it
> properly.
> 

I suppose I could add a gfc_conv_trigpi() to trans-intrinsic.c,
which could be used to emit the function calls directly.  Give me
until Saturday to look at the issue.

I've finally understood what Harald has been trying to tell
me about the MPFR version issue in bugzilla, so I need to 
revamp simplification.

-- 
Steve


Re: [PATCH] c: Call c_fully_fold on __atomic_* operands in atomic_bitint_fetch_using_cas_loop [PR113518]

2024-01-23 Thread Joseph Myers
On Tue, 23 Jan 2024, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcase shows, I forgot to call c_fully_fold on the
> __atomic_*/__sync_* operands called on _BitInt address, the expressions
> are then used inside of TARGET_EXPR initializers etc. and are never fully
> folded later, which means we can ICE e.g. on C_MAYBE_CONST_EXPR trees
> inside of those.
> 
> The following patch fixes it, while the function currently is only called
> in the C FE because C++ doesn't support BITINT_TYPE, I think guarding the
> calls on !c_dialect_cxx () is safer.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] openmp, fortran: Add Fortran support for indirect clause on the declare target directive

2024-01-23 Thread Tobias Burnus

Kwok Cheung Yeung wrote:
This patch adds support for the indirect clause on the OpenMP 'declare 
target' directive in Fortran. As with the C and C++ front-ends, this 
applies the 'omp declare target indirect' attribute on affected 
function declarations. The C test cases have also been translated to 
Fortran where appropriate.


Okay for mainline?


LGTM – can you also update the following libgomp.texi entry?

@item @code{indirect} clause in @code{declare target} @tab P @tab Only C 
and C++


Thanks,

Tobias



Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-23 Thread Xi Ruoyao
On Mon, 2024-01-22 at 15:27 +0800, chenglulu wrote:
> > > The failure of this test case was because the compiler believes that two
> > > (UNSPEC_LA_PCREL_64_PART2 [(symbol)]) instances would always produce the
> > > same result, but this isn't true because the result depends on PC.  Thus
> > > (pc) needed to be included in the RTX, like:
> > > 
> > >    [(set (match_operand:DI 0 "register_operand" "=r")
> > >  (unspec:DI [(match_operand:DI 2 "") (pc)] 
> > > UNSPEC_LA_PCREL_64_PART1))
> > >     (set (match_operand:DI 1 "register_operand" "=r")
> > >  (unspec:DI [(match_dup 2) (pc)] UNSPEC_LA_PCREL_64_PART2))]
> > > 
> > > With this the buggy REG_UNUSED notes were gone.  But it then prevented
> > > the CSE when loading the address of __tls_get_addr (i.e. if we address
10 TLS LD symbols in a function it would emit 10 instances of "la.global
> > > __tls_get_addr") so I added an REG_EQUAL note for it.  For symbols other
> > > than __tls_get_addr such notes are added automatically by optimization
> > > passes.
> > > 
> > > Updated patch attached.
> > > 
> > I'm eliminating redundant la.global directives in my macro 
> > implementation.
> > 
> > I will be testing this patch.
> > 
> > 
> > 
> > 
> With this patch, spec2006 can pass the test, but spec2017 621 and 654 
> tests fail.
> I haven't debugged the specific cause of the problem yet.

Try removing the TARGET_DELEGITIMIZE_ADDRESS hook?  After eating some
unhealthy food at midnight I realized the hook only papers over the
same issue that caused the spec2006 failure.  I tried a bootstrap
with BOOT_CFLAGS=-O2 -g -mcmodel=extreme and TARGET_DELEGITIMIZE_ADDRESS
commented out, and there is no more spurious "note: non-delegitimized
UNSPEC UNSPEC_LA_PCREL_64_PART1 (42) found in variable location" things.
I feel that this hook is still written in a buggy way, so maybe removing
it will solve the spec2017 issue.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v2] c/c++: Tweak warning for 'always_inline function might not be inlinable'

2024-01-23 Thread Hans-Peter Nilsson
Change from v1: The message is changed as per the review.
The powerpc test-case is dropped from the changes as the
part quoted in a comment now does not change and so cannot
cause further confusion.  The commit message is tweaked.
It now also mentions clang.  I intend to commit this on
Thursday 2024-01-25, as per Richard Biener's approval.

-- >8 --
When you're not regularly exposed to this warning, it is
easy to be misled by its wording, believing that there's
something else in the function that stops it from being
inlined, something other than the lack of also being
*declared* inline.  Also, clang does not warn.

It's just a warning: without the inline directive, there has
to be a secondary reason for the function to be inlined,
other than the always_inline attribute, a reason that may be
in effect despite the warning.

Whenever the text is quoted in inline-related bugzilla
entries, there seems to often have been an initial step of
confusion that has to be cleared, for example in PR55830.
A file in the powerpc-specific parts of the test-suite,
gcc.target/powerpc/vec-extract-v16qiu-v2.h, has a comment
and seems to be another example, and I testify as the
first-hand third "experience".  The wording has been the
same since the warning was added.

Let's just tweak the wording, adding the cause, so that the
reason for the warning is clearer.  This hopefully stops the
user from immediately asking "'Might'?  Because why?"  and
then going off looking at the function body - or grepping
the gcc source or documentation, or entering a bug-report
subsequently closed as resolved/invalid.
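
To illustrate the scenario the warning covers, a minimal case might look like the following (the function name is invented for the example; only the attribute/keyword combination matters):

```c
#include <assert.h>

/* 'scale' carries the always_inline attribute but is not declared
   'inline', so with -Wattributes (on by default) GCC typically emits
   the warning discussed above.  The code still compiles and behaves
   normally -- the warning only flags that inlining is not guaranteed.  */
__attribute__ ((always_inline)) int
scale (int x)
{
  return 2 * x;
}
```

Adding the `inline` keyword to the definition is what silences the warning, which is exactly the hint the tweaked message now spells out.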

Since the message is only appended with additional
information, no test-case actually required adjustment.
I still changed them, so the message is covered.

gcc:
* cgraphunit.cc (process_function_and_variable_attributes): Tweak
the warning for an attribute-always_inline without inline declaration.

gcc/testsuite:
* g++.dg/Wattributes-3.C: Adjust expected warning.
* gcc.dg/fail_always_inline.c: Ditto.
---
 gcc/cgraphunit.cc | 3 ++-
 gcc/testsuite/g++.dg/Wattributes-3.C  | 4 ++--
 gcc/testsuite/gcc.dg/fail_always_inline.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index 38052674aaa5..5c405258ec30 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -918,7 +918,8 @@ process_function_and_variable_attributes (cgraph_node 
*first,
  /* redefining extern inline function makes it DECL_UNINLINABLE.  */
  && !DECL_UNINLINABLE (decl))
warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
-   "%<always_inline%> function might not be inlinable");
+   "%<always_inline%> function might not be inlinable"
+   " unless also declared %<inline%>");
 
   process_common_attributes (node, decl);
 }
diff --git a/gcc/testsuite/g++.dg/Wattributes-3.C 
b/gcc/testsuite/g++.dg/Wattributes-3.C
index 208ec6696551..dd9c2244900c 100644
--- a/gcc/testsuite/g++.dg/Wattributes-3.C
+++ b/gcc/testsuite/g++.dg/Wattributes-3.C
@@ -26,7 +26,7 @@ B::operator char () const { return 0; }
 
 ATTR ((__noinline__))
 B::operator int () const  // { dg-warning "ignoring attribute .noinline. 
because it conflicts with attribute .always_inline." }
-// { dg-warning "function might not be inlinable" "" { target *-*-* } .-1 }
+// { dg-warning "function might not be inlinable unless also declared 
.inline." "" { target *-*-* } .-1 }
 
 {
   return 0;
@@ -45,7 +45,7 @@ C::operator char () { return 0; }
 
 ATTR ((__noinline__))
 C::operator short ()   // { dg-warning "ignoring attribute .noinline. 
because it conflicts with attribute .always_inline." }
-// { dg-warning "function might not be inlinable" "" { target *-*-* } .-1 }
+// { dg-warning "function might not be inlinable unless also declared 
.inline." "" { target *-*-* } .-1 }
 { return 0; }
 
 inline ATTR ((__noinline__))
diff --git a/gcc/testsuite/gcc.dg/fail_always_inline.c 
b/gcc/testsuite/gcc.dg/fail_always_inline.c
index 86645b850de8..16a549ca0935 100644
--- a/gcc/testsuite/gcc.dg/fail_always_inline.c
+++ b/gcc/testsuite/gcc.dg/fail_always_inline.c
@@ -2,7 +2,7 @@
 /* { dg-add-options bind_pic_locally } */
 
 extern __attribute__ ((always_inline)) void
- bar() { } /* { dg-warning "function might not be inlinable" } */
+ bar() { } /* { dg-warning "function might not be inlinable unless also 
declared .inline." } */
 
 void
 f()
-- 
2.30.2



[PATCH] Fortran: passing of optional dummies to elemental procedures [PR113377]

2024-01-23 Thread Harald Anlauf
Dear all,

here's the second part of a series for the treatment of missing
optional arguments passed to optional dummies, now fixing the
case that the latter procedures are elemental.  Adjustments
were necessary when the missing dummy has the VALUE attribute.

I factored the code for the treatment of VALUE, hoping that the
monster loop in gfc_conv_procedure_call will become slightly
easier to overlook.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From bd97af4225bf596260610ea37241ee503842435e Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 23 Jan 2024 21:23:42 +0100
Subject: [PATCH] Fortran: passing of optional dummies to elemental procedures
 [PR113377]

gcc/fortran/ChangeLog:

	PR fortran/113377
	* trans-expr.cc (conv_dummy_value): New.
	(gfc_conv_procedure_call): Factor code for handling dummy arguments
	with the VALUE attribute in the scalar case into conv_dummy_value().
	Reuse and adjust for calling elemental procedures.

gcc/testsuite/ChangeLog:

	PR fortran/113377
	* gfortran.dg/optional_absent_10.f90: New test.
---
 gcc/fortran/trans-expr.cc | 198 +---
 .../gfortran.dg/optional_absent_10.f90| 219 ++
 2 files changed, 333 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/optional_absent_10.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 128add47516..0fac0523670 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6075,6 +6075,105 @@ conv_cond_temp (gfc_se * parmse, gfc_expr * e, tree cond)
 }


+/* Helper function for the handling of (currently) scalar dummy variables
+   with the VALUE attribute.  Argument parmse should already be set up.  */
+static void
+conv_dummy_value (gfc_se * parmse, gfc_expr * e, gfc_symbol * fsym,
+		  vec *& optionalargs)
+{
+  tree tmp;
+
+  gcc_assert (fsym && fsym->attr.value && !fsym->attr.dimension);
+
+  /* Absent actual argument for optional scalar dummy.  */
+  if (e == NULL && fsym->attr.optional && !fsym->attr.dimension)
+{
+  /* For scalar arguments with VALUE attribute which are passed by
+	 value, pass "0" and a hidden argument for the optional status.  */
+  if (fsym->ts.type == BT_CHARACTER)
+	{
+	  /* Pass a NULL pointer for an absent CHARACTER arg and a length of
+	 zero.  */
+	  parmse->expr = null_pointer_node;
+	  parmse->string_length = build_int_cst (gfc_charlen_type_node, 0);
+	}
+  else
+	parmse->expr = fold_convert (gfc_sym_type (fsym),
+ integer_zero_node);
+  vec_safe_push (optionalargs, boolean_false_node);
+
+  return;
+}
+
+  /* gfortran argument passing conventions:
+ actual arguments to CHARACTER(len=1),VALUE
+ dummy arguments are actually passed by value.
+ Strings are truncated to length 1.  */
+  if (gfc_length_one_character_type_p (&fsym->ts))
+{
+  if (e->expr_type == EXPR_CONSTANT
+	  && e->value.character.length > 1)
+	{
+	  e->value.character.length = 1;
+	  gfc_conv_expr (parmse, e);
+	}
+
+  tree slen1 = build_int_cst (gfc_charlen_type_node, 1);
+  gfc_conv_string_parameter (parmse);
+  parmse->expr = gfc_string_to_single_character (slen1, parmse->expr,
+		 e->ts.kind);
+  /* Truncate resulting string to length 1.  */
+  parmse->string_length = slen1;
+}
+
+  if (fsym->attr.optional
+  && fsym->ts.type != BT_CLASS
+  && fsym->ts.type != BT_DERIVED)
+{
+  /* F2018:15.5.2.12 Argument presence and
+	 restrictions on arguments not present.  */
+  if (e->expr_type == EXPR_VARIABLE
+	  && e->rank == 0
+	  && (gfc_expr_attr (e).allocatable
+	  || gfc_expr_attr (e).pointer))
+	{
+	  gfc_se argse;
+	  tree cond;
+	  gfc_init_se (&argse, NULL);
+	  argse.want_pointer = 1;
+	  gfc_conv_expr (&argse, e);
+	  cond = fold_convert (TREE_TYPE (argse.expr), null_pointer_node);
+	  cond = fold_build2_loc (input_location, NE_EXPR,
+  logical_type_node,
+  argse.expr, cond);
+	  vec_safe_push (optionalargs,
+			 fold_convert (boolean_type_node, cond));
+	  /* Create "conditional temporary".  */
+	  conv_cond_temp (parmse, e, cond);
+	}
+  else if (e->expr_type != EXPR_VARIABLE
+	   || !e->symtree->n.sym->attr.optional
+	   || (e->ref != NULL && e->ref->type != REF_ARRAY))
+	vec_safe_push (optionalargs, boolean_true_node);
+  else
+	{
+	  tmp = gfc_conv_expr_present (e->symtree->n.sym);
+	  if (e->ts.type != BT_CHARACTER && !e->symtree->n.sym->attr.value)
+	parmse->expr
+	  = fold_build3_loc (input_location, COND_EXPR,
+ TREE_TYPE (parmse->expr),
+ tmp, parmse->expr,
+ fold_convert (TREE_TYPE (parmse->expr),
+	   integer_zero_node));
+
+	  vec_safe_push (optionalargs,
+			 fold_convert (boolean_type_node, tmp));
+	}
+}
+}
+
+
+
 /* Generate code for a procedure call.  Note can return se->post != NULL.
If se->direct_byref is set then se->expr contains the return parameter.
Return nonzero, if the call has altern

Re: [PATCH] c++: ambiguous member lookup for rewritten cands [PR113529]

2024-01-23 Thread Jason Merrill

On 1/22/24 13:11, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13?

-- >8 --

Here we handle the operator expression u < v inconsistently: in a SFINAE
context (such as a requires-expression) we accept it, and in a
non-SFINAE context we reject it with

   error: request for member 'operator<=>' is ambiguous

as per [class.member.lookup]/6.  This inconsistency is ultimately
because we neglect to propagate error_mark_node after recursing in
add_operator_candidates, fixed like so.


Shouldn't we do the same with the other recursive call?

Please also add return value documentation to the comment before the 
function.


OK with those changes.

Jason



Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-01-23 Thread Richard Sandiford
Matthias Kretz  writes:
> On Sunday, 10 December 2023 14:29:45 CET Richard Sandiford wrote:
>> Thanks for the patch and sorry for the slow review.
>
> Sorry for my slow reaction. I needed a long vacation. For now I'll focus on 
> the design question wrt. multi-arch compilation.
>
>> I can only comment on the usage of SVE, rather than on the scaffolding
>> around it.  Hopefully Jonathan or others can comment on the rest.
>
> That's very useful!
>
>> The main thing that worries me is:
>> 
>> #if _GLIBCXX_SIMD_HAVE_SVE
>> constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS/8;
>> #else
>> constexpr inline int __sve_vectorized_size_bytes = 0;
>> #endif
>> 
>> Although -msve-vector-bits is currently a per-TU setting, that isn't
>> necessarily going to be the case in future.
>
> This is a topic that I care about a lot... as simd user, implementer, and 
> WG21 
> proposal author. Are you thinking of a plan to implement the target_clones 
> function attribute for different SVE lengths? Or does it go further than 
> that? 
> PR83875 is raising the same issue and solution ideas for x86. If I understand 
> your concern correctly, then the issue you're raising exists in the same form 
> for x86.
>
> If anyone is interested in working on a "translation phase 7 replacement" for 
> compiler flags macros I'd be happy to give some input of what I believe is 
> necessary to make target_clones work with std(x)::simd. This seems to be 
> about 
> constant expressions that return compiler-internal state - probably similar 
> to 
> how static reflection needs to work.
>
> For a sketch of a direction: what I'm already doing in 
> std::experimental::simd, is to tag all non-always_inline function names with 
> a 
> bitflag, representing a relevant subset of -f and -m flags. That way, I'm 
> guarding against surprises on linking TUs compiled with different flags.

Yeah, target_clones could be one use for switching lengths.  In that
case the choice would be based on a load-time check of the vector length.

However, we also support different vector lengths for streaming SVE
(running in "streaming" mode on SME) and non-streaming SVE (running
in "non-streaming" mode on the core).  Having two different lengths is
expected to be the common case, rather than a theoretical curiosity.

Software also provides a "streaming compatible" mode in which code can
run either in streaming or non-streaming mode and which restricts itself
to the common subset of non-streaming and streaming features (which is
large).

So it isn't really safe to treat the vector length as a program or
process invariant, or to assume that all active uses of std::simd
in a binary are for the same vector length.  It should in principle
be possible to use std::simd variants for non-streaming code and
std::simd variants for streaming code even if the lengths are
different.

So in the target_clones case, the clone would have to apply to a
non-streaming function and switch based on the non-streaming
vector length, or apply to a streaming function and switch based
on the streaming vector length.  (Neither of those are supported yet,
of course.)

I think there are probably also ODR problems with hard-coding a vector
length, instead of treating it as a template parameter.

Unfortunately it isn't currently possible to write:

template<int N>
using T = svint32_t __attribute__((arm_sve_vector_bits(N)));

due to PRs 58855/68703/88600.  But I think we should fix that. :)

>> Ideally it would be
>> possible to define different implementations of a function with
>> different (fixed) vector lengths within the same TU.  The value at
>> the point that the header file is included is then somewhat arbitrary.
>> 
>> So rather than have:
>> >  using __neon128 = _Neon<16>;
>> >  using __neon64 = _Neon<8>;
>> > 
>> > +using __sve = _Sve<>;
>> 
>> would it be possible instead to have:
>> 
>>   using __sve128 = _Sve<128>;
>>   using __sve256 = _Sve<256>;
>>   ...etc...
>> 
>> ?  Code specialised for 128-bit vectors could then use __sve128 and
>> code specialised for 256-bit vectors could use __sve256.
>
> Hmm, as things stand we'd need two numbers, IIUC:
> _Sve<NumberOfUsedBytes, SizeofRegister>
>
> On x86, "NumberOfUsedBytes" is sufficient, because 33-64 implies zmm 
> registers 
> (and -mavx512f), 17-32 implies ymm, and <=16 implies xmm (except where it 
> doesn't ;) ).

When would NumberOfUsedBytes < SizeofRegister be used for SVE?  Would it
be for storing narrower elements in wider containers?  If the interface
supports that then, yeah, two parameters would probably be safer.

Or were you thinking about emulating narrower vectors with wider registers
using predication?  I suppose that's possible too, and would be similar in
spirit to using SVE to optimise Advanced SIMD std::simd types.
But mightn't it cause confusion if sizeof applied to a "16-byte"
vector actually gives 32?

>> Perhaps that's not possible as things stand, but it'd be interesting
>> to know the exact failure mode if so.  Either way, it would be goo

Re: [PATCH] c++: -Wdangling-reference and lambda false warning [PR109640]

2024-01-23 Thread Jason Merrill

On 1/19/24 21:18, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.  A lambda could return a dangling reference, but it's unlikely.


-- >8 --
-Wdangling-reference checks if a function receives a temporary as its
argument, and only warns if any of the arguments was a temporary.  But
we should not warn when the temporary represents a lambda or we generate
false positives as in the attached testcases.

PR c++/113256
PR c++/111607
PR c++/109640

gcc/cp/ChangeLog:

* call.cc (do_warn_dangling_reference): Don't warn if the temporary
is of lambda type.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference14.C: New test.
* g++.dg/warn/Wdangling-reference15.C: New test.
* g++.dg/warn/Wdangling-reference16.C: New test.
---
  gcc/cp/call.cc|  9 --
  .../g++.dg/warn/Wdangling-reference14.C   | 22 +
  .../g++.dg/warn/Wdangling-reference15.C   | 31 +++
  .../g++.dg/warn/Wdangling-reference16.C   | 13 
  4 files changed, 72 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference14.C
  create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference15.C
  create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference16.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 1f5ff417c81..77f51bacce3 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14123,7 +14123,10 @@ do_warn_dangling_reference (tree expr, bool arg_p)
tree e = expr;
while (handled_component_p (e))
e = TREE_OPERAND (e, 0);
-  if (!reference_like_class_p (TREE_TYPE (e)))
+  tree type = TREE_TYPE (e);
+  /* If the temporary represents a lambda, we don't really know
+what's going on here.  */
+  if (!reference_like_class_p (type) && !LAMBDA_TYPE_P (type))
return expr;
  }
  
@@ -14180,10 +14183,10 @@ do_warn_dangling_reference (tree expr, bool arg_p)

   initializing this reference parameter.  */
if (do_warn_dangling_reference (arg, /*arg_p=*/true))
  return expr;
- /* Don't warn about member function like:
+ /* Don't warn about member functions like:
  std::any a(...);
  S& s = a.emplace({0}, 0);
-which constructs a new object and returns a reference to it, but
+which construct a new object and return a reference to it, but
 we still want to detect:
   struct S { const S& self () { return *this; } };
   const S& s = S().self();
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference14.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference14.C
new file mode 100644
index 000..92b38a965e0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference14.C
@@ -0,0 +1,22 @@
+// PR c++/113256
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wdangling-reference" }
+
+#include <utility>
+#include <cassert>
+
+template<class M, class T, class A> auto bind(M T::* pm, A)
+{
+return [=]( auto&& x ) -> M const& { return x.*pm; };
+}
+
+template<int N> struct arg {};
+
+arg<1> _1;
+
+int main()
+{
+std::pair<int, int> pair;
+int const& x = bind( &std::pair<int, int>::first, _1 )( pair ); // { dg-bogus 
"dangling reference" }
+assert( &x == &pair.first );
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference15.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference15.C
new file mode 100644
index 000..c39577db64a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference15.C
@@ -0,0 +1,31 @@
+// PR c++/111607
+// { dg-do compile { target c++20 } }
+// { dg-options "-Wdangling-reference" }
+
+#include <variant>
+
+struct S {
+   constexpr S(int i_) : i(i_) {}
+   S(S const &) = delete;
+   S & operator=(S const &) = delete;
+   S(S &&) = delete;
+   S & operator=(S &&) = delete;
+   int i;
+};
+
+struct A {
+   S s{0};
+};
+
+using V = std::variant<A>;
+
+consteval auto f(V const & v) {
+  auto const & s = std::visit([](auto const & v) -> S const & { return v.s; }, v); // { 
dg-bogus "dangling reference" }
+  return s.i;
+}
+
+int main() {
+   constexpr V a{std::in_place_type<A>};
+   constexpr auto i = f(a);
+   return i;
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference16.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference16.C
new file mode 100644
index 000..91996922291
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference16.C
@@ -0,0 +1,13 @@
+// PR c++/109640
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wdangling-reference" }
+
+bool
+fn0 ()
+{
+int a;
+int&& i = [](int& r) -> int&& { return static_cast<int&&>(r); }(a); // { dg-bogus 
"dangling reference" }
+auto const l = [](int& r) -> int&& { return static_cast<int&&>(r); };
+int&& j = l(a);
+return &i == &j;
+}

base-commit: 615e25c82de97acc17ab438f88d6788cf7ffe1d6




Re: [PATCH] testsuite: Disable test for PR113292 on targets without TLS support

2024-01-23 Thread Jason Merrill

On 1/19/24 02:41, Nathaniel Shead wrote:

Tested on x86_64-pc-linux-gnu using a cross-compiler to
arm-unknown-linux-gnueabihf with --enable-threads=0 that the link test
is correctly skipped. OK for trunk?


OK.


-- >8 --

This disables the new test added by r14-8168 on machines that don't have
TLS support, such as bare-metal ARM.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr113292_c.C: Require TLS.

Signed-off-by: Nathaniel Shead 
---
  gcc/testsuite/g++.dg/modules/pr113292_c.C | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.dg/modules/pr113292_c.C 
b/gcc/testsuite/g++.dg/modules/pr113292_c.C
index aa3f32ae818..c117c7cfcd4 100644
--- a/gcc/testsuite/g++.dg/modules/pr113292_c.C
+++ b/gcc/testsuite/g++.dg/modules/pr113292_c.C
@@ -1,6 +1,8 @@
  // PR c++/113292
  // { dg-module-do link }
+// { dg-add-options tls }
  // { dg-additional-options "-fmodules-ts" }
+// { dg-require-effective-target tls_runtime }
  
  import "pr113292_a.H";
  




Re: [PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

2024-01-23 Thread Robin Dapp
> SPEC 2017 wrf benchmark expose unreasonble memory usage of VSETVL PASS
> that is, VSETVL PASS consume over 33 GB memory which make use impossible
> to compile SPEC 2017 wrf in a laptop.
> 
> The root cause is wasting-memory variables:

LGTM.   The new code matches compute_lcm_local_properties more
closely which makes sense to me.

One separate thing, nothing to do with this patch - I find
bitmap_union_of_preds_with_entry not wrong but weirdly written.
Probably because it was copied from somewhere and slightly
adjusted?  If you touch more code anyway, would you mind fixing it?

  for (ix = 0; ix < EDGE_COUNT (b->preds); ix++)
{
  e = EDGE_PRED (b, ix);
  bitmap_copy (dst, src[e->src->index]);
  break;
}
  if (ix == EDGE_COUNT (b->preds))
bitmap_clear (dst);

The whole idea seems to _not_ skip the entry block.  So something
like if (EDGE_COUNT () == 0) { ... } else { bitmap_copy (...); } should
be sufficient?  If the input is assumed to be empty we could even
skip the copy.
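
The suggested restructuring could be modelled like this on plain arrays (a hypothetical sketch; the real function operates on GCC bitmaps and uses EDGE_COUNT/EDGE_PRED over CFG edges):

```c
#include <assert.h>
#include <string.h>

enum { WORDS = 4 };  /* toy stand-in for a bitmap: a few machine words */

/* Restructured form of the loop quoted above: instead of iterating,
   unconditionally breaking after the first predecessor, and then
   testing the loop index, handle the no-predecessor case (the entry
   block) explicitly and otherwise copy the first predecessor's set.  */
void
first_pred_or_empty (unsigned long dst[WORDS],
                     unsigned long src[][WORDS], int npreds)
{
  if (npreds == 0)
    memset (dst, 0, WORDS * sizeof (unsigned long));      /* entry: empty set */
  else
    memcpy (dst, src[0], WORDS * sizeof (unsigned long)); /* copy first pred */
}
```

The behavior is identical to the original loop-and-break version, just without the dead iteration and the post-loop index check.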

> -/* { dg-options "--param=riscv-autovec-preference=scalable -march=rv32gcv 
> -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2 -fno-tree-vectorize" } */
> +/* { dg-options "--param=riscv-autovec-preference=scalable -march=rv32gcv 
> -mabi=ilp32 -fno-tree-vectorize" } */

Why that change?  Was no-schedule necessary before and is not anymore?
Is it a result from the changes?  I'd hope not.

Regards
 Robin


[committed] MAINTAINERS: Update my email address

2024-01-23 Thread Tobias Burnus

Updated my email address.

Committed as r14-8374-ged4c7893de2cba.

Tobias
commit ed4c7893de2cbae0a07bb4984e408d57e6db06f3
Author: Tobias Burnus 
Date:   Tue Jan 23 22:18:57 2024 +0100

MAINTAINERS: Update my email address

ChangeLog:

* MAINTAINERS: Update my email address.

Signed-off-by: Tobias Burnus 

diff --git a/MAINTAINERS b/MAINTAINERS
index 547237e0cf8..ade5c9f0181 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -180,14 +180,14 @@ fp-bit			Ian Lance Taylor	
 libgcc			Ian Lance Taylor	
 libgo			Ian Lance Taylor	
 libgomp			Jakub Jelinek		
-libgomp			Tobias Burnus		
+libgomp			Tobias Burnus		
 libgomp (OpenACC)	Thomas Schwinge		
 libgrust		All Rust front end maintainers
 libiberty		Ian Lance Taylor	
 libitm			Torvald Riegel		
 libobjc			Nicola Pero		
 libobjc			Andrew Pinski		
-libquadmath		Tobias Burnus		
+libquadmath		Tobias Burnus		
 libquadmath		Jakub Jelinek		
 libvtv			Caroline Tice		
 libphobos		Iain Buclaw		
@@ -254,9 +254,9 @@ loop infrastructure	Zdenek Dvorak		
 loop ivopts		Bin Cheng		
 loop optimizer		Bin Cheng		
 OpenACC			Thomas Schwinge		
-OpenACC			Tobias Burnus		
+OpenACC			Tobias Burnus		
 OpenMP			Jakub Jelinek		
-OpenMP			Tobias Burnus		
+OpenMP			Tobias Burnus		
 testsuite		Rainer Orth		
 testsuite		Mike Stump		
 register allocation	Vladimir Makarov	
@@ -281,7 +281,7 @@ dataflow		Kenneth Zadeck		
 driver			Joseph Myers		
 Fortran			Harald Anlauf		
 Fortran			Janne Blomqvist		
-Fortran			Tobias Burnus		
+Fortran			Tobias Burnus		
 Fortran			François-Xavier Coudert	
 Fortran			Jerry DeLisle		
 Fortran			Erik Edelmann		


Re: [PATCH] c++: ambiguous member lookup for rewritten cands [PR113529]

2024-01-23 Thread Patrick Palka
On Tue, 23 Jan 2024, Jason Merrill wrote:

> On 1/22/24 13:11, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > OK for trunk/13?
> > 
> > -- >8 --
> > 
> > Here we handle the operator expression u < v inconsistently: in a SFINAE
> > context (such as a requires-expression) we accept it, and in a
> > non-SFINAE context we reject it with
> > 
> >error: request for member 'operator<=>' is ambiguous
> > 
> > as per [class.member.lookup]/6.  This inconsistency is ultimately
> > because we neglect to propagate error_mark_node after recursing in
> > add_operator_candidates, fixed like so.
> 
> Shouldn't we do the same with the other recursive call?

Oops, yes.  I thought that the function is symmetric with respect to
member lookup so it should suffice to check the first recursive call,
but member lookup is only performed for the first argument type, and
arguments are swapped for the second recursive call.

> 
> Please also add return value documentation to the comment before the function.

Will do, thanks!

> 
> OK with those changes.
> 
> Jason
> 
> 



Re: [PATCH 4/4] libbacktrace: get debug information for loaded dlls

2024-01-23 Thread Björn Schäpers

Am 03.01.2024 um 00:12 schrieb Björn Schäpers:

Am 30.11.2023 um 20:53 schrieb Ian Lance Taylor:

On Fri, Jan 20, 2023 at 2:55 AM Björn Schäpers  wrote:


From: Björn Schäpers 

Fixes https://github.com/ianlancetaylor/libbacktrace/issues/53, except
that libraries loaded after the backtrace_initialize are not handled.
But as far as I can see that's the same for elf.


Thanks, but I don't want a patch that loops using goto statements.
Please rewrite to avoid that.  It may be simpler to call a function.

Also starting with a module count of 1000 seems like a lot.  Do
typical Windows programs load that many modules?

Ian




Rewritten using a function.

If that is commited, could you attribute that commit to me (--author="Björn 
Schäpers ")?


Thanks and kind regards,
Björn.


A friendly ping.


[r14-8346 Regression] FAIL: gcc.dg/guality/pr54796.c -Os -DPREVENT_OPTIMIZATION line 17 c == 5 on Linux/x86_64

2024-01-23 Thread haochen.jiang
On Linux/x86_64,

a98d5130a6dcff2ed4db371e500550134777b8cf is the first bad commit
commit a98d5130a6dcff2ed4db371e500550134777b8cf
Author: Richard Biener 
Date:   Mon Jan 15 12:55:20 2024 +0100

rtl-optimization/113255 - base_alias_check vs. pointer difference

caused

FAIL: gcc.dg/guality/pr54796.c   -O1  -DPREVENT_OPTIMIZATION  line 17 a == 5
FAIL: gcc.dg/guality/pr54796.c   -O1  -DPREVENT_OPTIMIZATION  line 17 b == 6
FAIL: gcc.dg/guality/pr54796.c   -O1  -DPREVENT_OPTIMIZATION  line 17 c == 5
FAIL: gcc.dg/guality/pr54796.c   -O2  -DPREVENT_OPTIMIZATION  line 17 a == 5
FAIL: gcc.dg/guality/pr54796.c   -O2  -DPREVENT_OPTIMIZATION  line 17 b == 6
FAIL: gcc.dg/guality/pr54796.c   -O2  -DPREVENT_OPTIMIZATION  line 17 c == 5
FAIL: gcc.dg/guality/pr54796.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 17 a == 5
FAIL: gcc.dg/guality/pr54796.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 17 b == 6
FAIL: gcc.dg/guality/pr54796.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 17 c == 5
FAIL: gcc.dg/guality/pr54796.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 17 a == 5
FAIL: gcc.dg/guality/pr54796.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 17 b == 6
FAIL: gcc.dg/guality/pr54796.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 17 c == 5
FAIL: gcc.dg/guality/pr54796.c  -Og -DPREVENT_OPTIMIZATION  line 17 a == 5
FAIL: gcc.dg/guality/pr54796.c  -Og -DPREVENT_OPTIMIZATION  line 17 b == 6
FAIL: gcc.dg/guality/pr54796.c  -Og -DPREVENT_OPTIMIZATION  line 17 c == 5
FAIL: gcc.dg/guality/pr54796.c   -Os  -DPREVENT_OPTIMIZATION  line 17 a == 5
FAIL: gcc.dg/guality/pr54796.c   -Os  -DPREVENT_OPTIMIZATION  line 17 b == 6
FAIL: gcc.dg/guality/pr54796.c   -Os  -DPREVENT_OPTIMIZATION  line 17 c == 5

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-8346/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54796.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54796.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might work around them.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH 5/4] libbacktrace: improve getting debug information for loaded dlls

2024-01-23 Thread Ian Lance Taylor
On Thu, Jan 4, 2024 at 2:33 PM Björn Schäpers  wrote:
>
> Am 03.01.2024 um 00:12 schrieb Björn Schäpers:
> > Am 30.11.2023 um 20:53 schrieb Ian Lance Taylor:
> >> On Fri, Jan 20, 2023 at 2:55 AM Björn Schäpers  wrote:
> >>>
> >>> From: Björn Schäpers 
> >>>
> >>> Fixes https://github.com/ianlancetaylor/libbacktrace/issues/53, except
> >>> that libraries loaded after the backtrace_initialize are not handled.
> >>> But as far as I can see that's the same for elf.
> >>
> >> Thanks, but I don't want a patch that loops using goto statements.
> >> Please rewrite to avoid that.  It may be simpler to call a function.
> >>
> >> Also starting with a module count of 1000 seems like a lot.  Do
> >> typical Windows programs load that many modules?
> >>
> >> Ian
> >>
> >>
> >
> > Rewritten using a function.
> >
> > If that is committed, could you attribute that commit to me (--author="Björn
> > Schäpers ")?
> >
> > Thanks and kind regards,
> > Björn.
>
> I noticed that under 64 bit, libraries loaded with LoadLibrary were missing.
> EnumProcessModules reported the correct number of modules, but did not fill in
> the HMODULEs; it set them to 0. While investigating I noticed that if I do
> the very same thing from main (in C++) I got even fewer module HMODULEs.
>
> So I went a different way. This detects all libraries correctly, in 32 and 64
> bit. The question is whether it should be a patch on top of the previous one,
> or whether they should be merged, or whether only this solution should be
> kept and the EnumProcessModules variant dropped?

Is there any reason to use both patches?  Seems simpler to just use
this one if it works.  Thanks.

Ian


Re: Re: [PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

2024-01-23 Thread 钟居哲
>> Why that change?  Was no-schedule necessary before and is not anymore?
>> Is it a result from the changes?  I'd hope not.
Yes, but the new behavior is reasonable, so adapting the testcase is enough.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-01-24 05:12
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix large memory usage of VSETVL PASS [PR113495]
> The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL
> PASS; that is, the VSETVL PASS consumes over 33 GB of memory, which makes it
> impossible to compile SPEC 2017 wrf on a laptop.
> 
> The root cause is wasting-memory variables:
 
LGTM. The new code matches compute_lcm_local_properties more
closely which makes sense to me.
 
One separate thing, nothing to do with this patch - I find
bitmap_union_of_preds_with_entry not wrong but weirdly written.
Probably because it was copied from somewhere and slightly
adjusted?  If you touch more code anyway, would you mind fixing it?
 
  for (ix = 0; ix < EDGE_COUNT (b->preds); ix++)
{
  e = EDGE_PRED (b, ix);
  bitmap_copy (dst, src[e->src->index]);
  break;
}
  if (ix == EDGE_COUNT (b->preds))
bitmap_clear (dst);
 
The whole idea seems to be to _not_ skip the entry block.  So something
like if (EDGE_COUNT (b->preds) == 0) {...} else { bitmap_copy (...) }
should be sufficient?  If the input is assumed to be empty we could even
skip the copy.
 
> -/* { dg-options "--param=riscv-autovec-preference=scalable -march=rv32gcv 
> -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2 -fno-tree-vectorize" } */
> +/* { dg-options "--param=riscv-autovec-preference=scalable -march=rv32gcv 
> -mabi=ilp32 -fno-tree-vectorize" } */
 
Why that change?  Was no-schedule necessary before and is not anymore?
Is it a result from the changes?  I'd hope not.
 
Regards
Robin
 


[PATCH 1/2] libstdc++/pair: Define _S_const_assignable helper for C++20

2024-01-23 Thread Patrick Palka
This is consistent with std::tuple's __const_assignable member function,
and will be reused when implementing the new pair::operator= overloads
from P2165R4.
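The short-circuiting shape of such a helper can be illustrated standalone; `my_pair` and `const_assignable` below are illustrative stand-ins, not the libstdc++ internals:

```cpp
#include <type_traits>

// Mirrors the _S_const_assignable pattern: test the first trait with
// `if constexpr` so the second trait is only instantiated when the
// first one already holds.
template<typename T1, typename T2>
struct my_pair
{
  T1 first;
  T2 second;

  template<typename U1, typename U2>
  static constexpr bool const_assignable()
  {
    if constexpr (std::is_assignable_v<const T1&, U1>)
      return std::is_assignable_v<const T2&, U2>;
    return false;
  }
};

// A pair of lvalue references can be assigned through even when const
// (the references themselves don't change, only the referents do):
static_assert(my_pair<int&, int&>::const_assignable<const int&, const int&>());
// A pair of plain values cannot:
static_assert(!my_pair<int, int>::const_assignable<const int&, const int&>());
```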

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (pair::_S_const_assignable): Define,
factored out from ...
(pair::operator=): ... the constraints of the const overloads.
---
 libstdc++-v3/include/bits/stl_pair.h | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index acd0c7b58f9..b81b479ad43 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -404,6 +404,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return false;
}
 
+  template<typename _U1, typename _U2>
+	static constexpr bool
+	_S_const_assignable()
+	{
+	  if constexpr (is_assignable_v<const first_type&, _U1>)
+	    return is_assignable_v<const second_type&, _U2>;
+	  return false;
+	}
+
   template<typename _U1, typename _U2>
	static constexpr bool
	_S_nothrow_assignable()
@@ -468,8 +477,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Copy assignment operator (const)
   constexpr const pair&
   operator=(const pair& __p) const
-  requires is_copy_assignable_v<const first_type>
-	&& is_copy_assignable_v<const second_type>
+  requires (_S_const_assignable<const first_type&, const second_type&>())
   {
first = __p.first;
second = __p.second;
@@ -479,8 +487,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Move assignment operator (const)
   constexpr const pair&
   operator=(pair&& __p) const
-  requires is_assignable_v<const first_type&, first_type>
-	&& is_assignable_v<const second_type&, second_type>
+  requires (_S_const_assignable<first_type, second_type>())
   {
first = std::forward(__p.first);
second = std::forward(__p.second);
@@ -491,8 +498,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
constexpr const pair&
operator=(const pair<_U1, _U2>& __p) const
-	requires is_assignable_v<const first_type&, const _U1&>
-	  && is_assignable_v<const second_type&, const _U2&>
+	requires (_S_const_assignable<const _U1&, const _U2&>())
{
  first = __p.first;
  second = __p.second;
@@ -503,8 +509,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
constexpr const pair&
operator=(pair<_U1, _U2>&& __p) const
-	requires is_assignable_v<const first_type&, _U1>
-	  && is_assignable_v<const second_type&, _U2>
+   requires (_S_const_assignable<_U1, _U2>())
{
  first = std::forward<_U1>(__p.first);
  second = std::forward<_U2>(__p.second);
-- 
2.43.0.386.ge02ecfcc53



[PATCH 2/2] libstdc++: Implement P2165R4 changes to std::pair/tuple/etc

2024-01-23 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

This implements the C++23 paper P2165R4 Compatibility between tuple,
pair and tuple-like objects, which builds upon many changes from the
earlier C++23 paper P2321R2 zip.

Some declarations had to be moved around so that they're visible from
<bits/stl_pair.h> without introducing new includes.  In the end, the
only new include is for <bits/utility.h> from <bits/stl_iterator.h>,
for tuple_element_t.
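For context, part of this interop predates the paper: tuple has long been constructible from pair, and P2165R4 generalizes the conversions to arbitrary tuple-like types. A sketch using only the long-standing direction (the C++23-only additions are shown as comments, since they need a library implementing the paper):

```cpp
#include <tuple>
#include <utility>

// Long-standing direction: tuple is constructible from pair.
inline std::tuple<int, long> as_tuple(const std::pair<int, long>& p)
{ return p; }

// P2165R4 adds the reverse direction and more, e.g. (C++23 library only):
//   std::pair<int, long> q(std::tuple<int, long>{1, 2L});  // pair from tuple
//   std::pair<int, int>  r(std::array<int, 2>{3, 4});      // pair from array
```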

libstdc++-v3/ChangeLog:

* include/bits/ranges_util.h (__detail::__pair_like): Don't
define in C++23 mode.
(__detail::__pair_like_convertible_from): Adjust as per P2165R4.
(__detail::__is_subrange): Moved from <ranges>.
(__detail::__is_tuple_like_v): Likewise.
* include/bits/stl_iterator.h: Include <bits/utility.h> for
C++23.
(__different_from): Move to <concepts>.
(__iter_key_t): Adjust for C++23 as per P2165R4.
(__iter_val_t): Likewise.
* include/bits/stl_pair.h (pair, array): Forward declare.
(get): Forward declare all overloads relevant to P2165R4
tuple-like constructors.
(__is_tuple_v): Define for C++23.
(__is_tuple_like_v): Define for C++23.
(__tuple_like): Define for C++23 as per P2165R4.
(__pair_like): Define for C++23 as per P2165R4.
(__eligibile_tuple_like): Define for C++23.
(__eligibile_pair_like): Define for C++23.
(pair::_S_constructible_from_pair_like): Define for C++23.
(pair::_S_convertible_from_pair_like): Define for C++23.
(pair::pair): Define overload taking a tuple-like type for
C++23 as per P2165R4.
(pair::_S_assignable_from_tuple_like): Define for C++23.
(pair::_S_const_assignable_from_tuple_like): Define for C++23.
(pair::operator=): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
* include/bits/utility.h (ranges::__detail::__is_subrange):
Moved from <ranges>.
* include/bits/version.def (tuple_like): Define for C++23.
* include/bits/version.h: Regenerate.
* include/std/concepts (__different_from): Moved from
<bits/stl_iterator.h>.
(ranges::__swap::__adl_swap): Clarify which __detail namespace.
* include/std/map (__cpp_lib_tuple_like): Define C++23.
* include/std/ranges (__detail::__is_subrange): Moved to
.
(__detail::__is_subrange): Moved to 
(__detail::__has_tuple_element): Adjust for C++23 as per P2165R4.
(__detail::__tuple_or_pair): Remove as per P2165R4.  Replace all
uses with plain tuple as per P2165R4.
* include/std/tuple (__cpp_lib_tuple_like): Define for C++23.
(__tuple_like_tag_t): Define for C++23.
(__tuple_cmp): Forward declare for C++23.
(_Tuple_impl::_Tuple_impl): Define overloads taking
__tuple_like_tag_t and a tuple-like type for C++23.
(_Tuple_impl::_M_assign): Likewise.
(tuple::__constructible_from_tuple_like): Define for C++23.
(tuple::__convertible_from_tuple_like): Define for C++23.
(tuple::tuple): Define overloads taking a tuple-like type for
C++23 as per P2165R4.
(tuple::__assignable_from_tuple_like): Define for C++23.
(tuple::__const_assignable_from_tuple_like): Define for C++23.
(tuple::operator=): Define overloads taking a tuple-like type
for C++23 as per P2165R4.
(tuple::__tuple_like_common_comparison_category): Define for C++23.
(tuple::operator<=>): Define overload taking a tuple-like type
for C++23 as per P2165R4.
(array, get): Forward declarations moved to <bits/stl_pair.h>.
(tuple_cat): Constrain with __tuple_like for C++23 as per P2165R4.
(apply): Likewise.
(make_from_tuple): Likewise.
(__tuple_like_common_reference): Define for C++23.
(basic_common_reference): Adjust as per P2165R4.
(__tuple_like_common_type): Define for C++23.
(common_type): Adjust as per P2165R4.
* include/std/unordered_map (__cpp_lib_tuple_like): Define for
C++23.
* include/std/utility (__cpp_lib_tuple_like): Define for C++23.
* testsuite/std/ranges/zip/1.cc (test01): Adjust to handle pair
and 2-tuple interchangeably.
* testsuite/20_util/pair/p2165r4.cc: New test.
* testsuite/20_util/tuple/p2165r4.cc: New test.
---
 libstdc++-v3/include/bits/ranges_util.h   |  17 +-
 libstdc++-v3/include/bits/stl_iterator.h  |  16 +-
 libstdc++-v3/include/bits/stl_pair.h  | 167 +
 libstdc++-v3/include/bits/utility.h   |   8 +
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  11 +
 libstdc++-v3/include/std/concepts |  11 +-
 libstdc++-v3/include/std/map  |   1 +
 libstdc++-v3/include/std/ranges   |  48 +--
 libstdc++-v3/include/std/tuple| 299 +---
 libstdc++-v3/include/std/unordered_map|   1 +
 libstdc++-v3/includ

Re: [PATCH 1/2] libstdc++/pair: Define _S_const_assignable helper for C++20

2024-01-23 Thread Jonathan Wakely
On Tue, 23 Jan 2024, 23:53 Patrick Palka,  wrote:

> This is consistent with std::tuple's __const_assignable member function,
> and will be reused when implementing the new pair::operator= overloads
> from P2165R4.
>


OK



> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_pair.h (pair::_S_const_assignable): Define,
> factored out from ...
> (pair::operator=): ... the constraints of the const overloads.
> ---
>  libstdc++-v3/include/bits/stl_pair.h | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_pair.h
> b/libstdc++-v3/include/bits/stl_pair.h
> index acd0c7b58f9..b81b479ad43 100644
> --- a/libstdc++-v3/include/bits/stl_pair.h
> +++ b/libstdc++-v3/include/bits/stl_pair.h
> @@ -404,6 +404,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   return false;
> }
>
> +  template
> +   static constexpr bool
> +   _S_const_assignable()
> +   {
> + if constexpr (is_assignable_v)
> +   return is_assignable_v;
> + return false;
> +   }
> +
>template
> static constexpr bool
> _S_nothrow_assignable()
> @@ -468,8 +477,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>/// Copy assignment operator (const)
>constexpr const pair&
>operator=(const pair& __p) const
> -  requires is_copy_assignable_v<const first_type>
> -   && is_copy_assignable_v<const second_type>
> +  requires (_S_const_assignable<const first_type&, const second_type&>())
>{
> first = __p.first;
> second = __p.second;
> @@ -479,8 +487,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>/// Move assignment operator (const)
>constexpr const pair&
>operator=(pair&& __p) const
> -  requires is_assignable_v
> -   && is_assignable_v
> +  requires (_S_const_assignable())
>{
> first = std::forward(__p.first);
> second = std::forward(__p.second);
> @@ -491,8 +498,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
> constexpr const pair&
> operator=(const pair<_U1, _U2>& __p) const
> -   requires is_assignable_v
> - && is_assignable_v
> +   requires (_S_const_assignable())
> {
>   first = __p.first;
>   second = __p.second;
> @@ -503,8 +509,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template
> constexpr const pair&
> operator=(pair<_U1, _U2>&& __p) const
> -   requires is_assignable_v
> - && is_assignable_v
> +   requires (_S_const_assignable<_U1, _U2>())
> {
>   first = std::forward<_U1>(__p.first);
>   second = std::forward<_U2>(__p.second);
> --
> 2.43.0.386.ge02ecfcc53
>
>


[PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-23 Thread Qing Zhao
Hi,

This is the 4th version of the patch.

It based on the following proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its 
consumers

**The summary of the proposal is:

* Add a new internal function ".ACCESS_WITH_SIZE" to carry the size information 
for every reference to a FAM field;
* In C FE, Replace every reference to a FAM field whose TYPE has the 
"counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
* In every consumer of the size information, for example, BDOS or array bound 
sanitizer, query the size information or ACCESS_MODE information from the new 
internal function;
* When expanding to RTL, replace the internal function with the actual 
reference to the FAM field;
* Some adjustment to ipa alias analysis, and other SSA passes to mitigate the 
impact to the optimizer and code generation.


**The new internal function

  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, PRECISION_OF_SIZE, 
ACCESS_MODE, INDEX)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "REF_TO_OBJ" same as the 1st argument;

Both the return type and the type of the first argument of this function have 
been converted from the incomplete array type to the corresponding pointer type.

Please see the following link for why:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

1st argument "REF_TO_OBJ": The reference to the object;
2nd argument "REF_TO_SIZE": The reference to the size of the object,
3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE represents
   0: unknown;
   1: the number of the elements of the object type;
   2: the number of bytes;
4th argument "PRECISION_OF_SIZE": The precision of the integer that REF_TO_SIZE 
points to;
5th argument "ACCESS_MODE":
  -1: Unknown access semantics
   0: none
   1: read_only
   2: write_only
   3: read_write
6th argument "INDEX": the INDEX for the original array reference.
  -1: Unknown

NOTE: The 6th Argument is added for bound sanitizer instrumentation.
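As a mental model of the semantics above (illustrative names only, not GCC internals), the function is the identity on its first argument; the remaining arguments are metadata read by consumers such as __builtin_dynamic_object_size and the bound sanitizer, and the call folds back to its first argument at RTL expansion:

```cpp
#include <cstddef>

// Hypothetical model of .ACCESS_WITH_SIZE: return REF_TO_OBJ unchanged.
// The other parameters only describe where the element count lives and
// how accesses may use it.
template<typename T>
T* access_with_size(T* ref_to_obj, const std::size_t* ref_to_size,
                    int class_of_size,     // 0 unknown, 1 elements, 2 bytes
                    int precision_of_size, // bits of the count integer
                    int access_mode,       // -1 unknown .. 3 read_write
                    long index)            // original array index, -1 unknown
{
  (void)ref_to_size; (void)class_of_size; (void)precision_of_size;
  (void)access_mode; (void)index;
  return ref_to_obj;  // identity on the object reference
}
```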

** The Patch sets included:

1. Provide counted_by attribute to flexible array member field;
  which includes:
  * "counted_by" attribute documentation;
  * C FE handling of the new attribute;
syntax checking, error reporting;
  * testing cases;

2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
  which includes:
  * The definition of the new internal function .ACCESS_WITH_SIZE in 
internal-fn.def.
  * C FE converts every reference to a FAM with "counted_by" attribute to a 
call to the internal function .ACCESS_WITH_SIZE.
(build_component_ref in c_typeck.cc)
This includes the case when the object is statically allocated and 
initialized.
In order to make this work, we update 
initializer_constant_valid_p_1 and output_constant in varasm.cc to handle 
calls to .ACCESS_WITH_SIZE.

However, for the reference inside "offsetof", ignore the "counted_by" 
attribute since it's not useful at all. (c_parser_postfix_expression in 
c/c-parser.cc)

  * Convert every call to .ACCESS_WITH_SIZE to its first argument.
(expand_ACCESS_WITH_SIZE in internal-fn.cc)
  * adjust alias analysis to exclude the new internal from clobbering 
anything.
(ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
tree-ssa-alias.cc)
  * adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE 
when
its LHS is eliminated as dead code.
(eliminate_unnecessary_stmts in tree-ssa-dce.cc)
  * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
get the reference from the call to .ACCESS_WITH_SIZE.
(is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
  * testing cases. (for offsetof, static initialization, generation of 
calls to
.ACCESS_WITH_SIZE, code runs correctly after calls to .ACCESS_WITH_SIZE 
are
converted back)

3. Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
  which includes:
  * use the size info of the .ACCESS_WITH_SIZE for sub-object.
  * testing cases. 

4 Use the .ACCESS_WITH_SIZE in bound sanitizer
  Since the result type of the call to .ACCESS_WITH_SIZE is a pointer to
the element type. The original array_ref is converted to an 
indirect_ref,
which introduced two issues for the instrumenation of bound sanitizer:

A. The index for the original array_ref was mixed into the offset
expression for the new indirect_ref.

In order to make the instrumentation for the bound sanitizer easier, one
more argument for the function .ACCESS_WITH_SIZE is added to record this
original index for the array_ref.  Then later, during bound 
instrumentation,
get the index from the additi

[PATCH v4 2/4] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-01-23 Thread Qing Zhao
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c_typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  In order to make this work, the routines initializer_constant_valid_p_1
  and output_constant in varasm.cc are updated to handle calls to
  .ACCESS_WITH_SIZE.
  (initializer_constant_valid_p_1 and output_constant in varasm.c)

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)
* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Adjust alias analysis to exclude the new internal from clobbering anything.
  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
* Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
  its LHS is eliminated as dead code.
  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
IFN_ACCESS_WITH_SIZE.
(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
to .ACCESS_WITH_SIZE when its LHS is dead.
* tree.cc (process_call_operands): Adjust side effect for function
.ACCESS_WITH_SIZE.
(is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
* varasm.cc (initializer_constant_valid_p_1): Handle call to
.ACCESS_WITH_SIZE.
(output_constant): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 108 +-
 gcc/internal-fn.cc|  35 ++
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  |  94 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 294 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c31349dae2ff..a6ed5ac43bb1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || c_parser_next_token_is (parser,
@@ -10879,11 +10882,14 @@ c_parser_postfix_expression (c_parser *parser)
break;
  }
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful.  */

[PATCH v4 1/4] Provide counted_by attribute to flexible array member field (PR108896)

2024-01-23 Thread Qing Zhao
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field named "COUNT" in the
 same structure as the flexible array member.  GCC uses this
 information to improve the results of the array bound sanitizer and
 the '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
char other;
char array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler will report a warning and
 ignore the attribute.

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' should be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

 It's the user's responsibility to make sure the above requirements
 are kept at all times.  Otherwise the compiler will report
 warnings, and at the same time the results of the array bound
 sanitizer and '__builtin_dynamic_object_size' are undefined.

 One important feature of the attribute is that a reference to the
 flexible array member field uses the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' will use 'val1' as the number of the elements
 in 'p->array', and 'ref2' will use 'val2' as the number of elements
 in 'p->array'.

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
---
 gcc/c-family/c-attribs.cc| 54 -
 gcc/c-family/c-common.cc | 13 +++
 gcc/c-family/c-common.h  |  1 +
 gcc/c/c-decl.cc  | 85 
 gcc/c/c-tree.h   |  1 +
 gcc/c/c-typeck.cc|  3 +-
 gcc/doc/extend.texi  | 62 ++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c | 40 +
 8 files changed, 239 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 40a0cf90295d..4395c0656b14 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, 
tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
 int, bool *);
+static tree handle_counted_by_attribute (tree *, tree, tree,
+  int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
@@ -412,6 +414,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_warn_if_not_aligned_attribute, NULL },
   { "strict_flex_array",  1, 1, true, false, false, false,
  handle_strict_flex_array_attribute, NULL },
+  { "counted

[PATCH v4 4/4] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-01-23 Thread Qing Zhao
Since the result type of the call to .ACCESS_WITH_SIZE is a pointer to
the element type, the original array_ref is converted to an indirect_ref,
which introduced two issues for the instrumentation of the bound sanitizer:

A. The index for the original array_ref was mixed into the offset
expression for the new indirect_ref.

In order to make the instrumentation for the bound sanitizer easier, one
more argument for the function .ACCESS_WITH_SIZE is added to record this
original index for the array_ref.  Then later, during bound instrumentation,
get the index from the additional argument from the call to the function
.ACCESS_WITH_SIZE.

B. In the current bound sanitizer, no instrumentation will be inserted for
an indirect_ref.

In order to add instrumentation for an indirect_ref with a call to
.ACCESS_WITH_SIZE, we should specially handle the indirect_ref with a
call to .ACCESS_WITH_SIZE, and get the index and bound info from the
arguments of the call.

gcc/c-family/ChangeLog:

* c-gimplify.cc (ubsan_walk_array_refs_r): Instrument indirect_ref.
* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds_indirect_ref): New function.
(ubsan_indirect_ref_instrumented_p): New function.
(ubsan_maybe_instrument_indirect_ref): New function.
* c-ubsan.h (ubsan_maybe_instrument_indirect_ref): New prototype.

gcc/c/ChangeLog:

* c-typeck.cc (build_counted_by_ref): Minor style fix.
(build_access_with_size_for_counted_by): Add one more argument.
(build_array_ref): Set the 5th argument of a call to .ACCESS_WITH_SIZE
to the index.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-gimplify.cc|   2 +
 gcc/c-family/c-ubsan.cc   | 130 ++
 gcc/c-family/c-ubsan.h|   1 +
 gcc/c/c-typeck.cc |  14 +-
 .../ubsan/flex-array-counted-by-bounds-2.c|  45 ++
 .../ubsan/flex-array-counted-by-bounds.c  |  46 +++
 6 files changed, 235 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 494da49791d5..25a3ca1a9a99 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -108,6 +108,8 @@ ubsan_walk_array_refs_r (tree *tp, int *walk_subtrees, void 
*data)
 }
   else if (TREE_CODE (*tp) == ARRAY_REF)
 ubsan_maybe_instrument_array_ref (tp, false);
+  else if (TREE_CODE (*tp) == INDIRECT_REF)
+ubsan_maybe_instrument_indirect_ref (tp);
   else if (TREE_CODE (*tp) == MODIFY_EXPR)
 {
   /* Since r7-1900, we gimplify RHS before LHS.  Consider
diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..7bb6464eb5b5 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,7 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -501,6 +502,68 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
   *index, bound);
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  unsigned int prec_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 3));
+  tree type = build_nonstandard_integer_type (prec_of_size, 1);
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+ the object type, return the size.  */
+  if (type_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+/* Instrument array bounds for INDIRECT_REFs whose pointers are
+   POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE.  We create special
+   builtins that gets expanded in the sanopt pass, and make an array
+   dimension of it.  ARRAY is the pointer to the base of the array,
+   which is a call to .ACCESS_WITH_SIZE.
+   We get the INDEX from the 6th argument of the call to .ACCESS_WITH_SIZE
+   Retur

[PATCH v4 3/4] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-01-23 Thread Qing Zhao
gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 gcc/tree-object-size.cc   |  47 +
 4 files changed, 299 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);\
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* Test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* Test the attribute counted_by and its usage in
+   __builtin_dynamic_object_size: what's the correct behavior when the
+   allocation size mismatches the value of the counted_by attribute?
+   We should always use the latest value that is held by the counted_by
+   field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, due to type casting, the type of the pointee of a pointer
+   does not say anything about the object it points to,
+   so __builtin_object_size cannot directly use the type of the pointee
+   to decide the size of the object the pointer points to.
+
+   ther

[PATCH] RISC-V: Add regression test for vsetvl bug pr113429

2024-01-23 Thread Patrick O'Neill
The reduced testcase for pr113429 (cam4 failure) needed additional
modules so it wasn't committed.
The fuzzer found a c testcase that was also fixed with pr113429's fix.
Adding it as a regression test.

PR 113429

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr113429.c: New test.

Signed-off-by: Patrick O'Neill 
---
 .../gcc.target/riscv/rvv/vsetvl/pr113429.c| 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
new file mode 100644
index 000..05c3eeecb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+
+long a;
+int b, c, d, e, f, g;
+short h, i, j;
+static int k = 3;
+static int l = 6;
+int m[5][7];
+signed char n;
+int *const o = &c;
+
+signed char(p)(signed char p1, signed char q) {
+  return p1 / q;
+}
+
+void s(unsigned p1) {
+  b = (b ^ p1) & 255;
+}
+
+static long t() {
+  long u;
+  signed char v;
+  d = 1;
+  for (; d <= 4; d++) {
+j = 0;
+for (; j <= 4; j++) {
+  v = 0;
+  for (; v <= 4; v++) {
+if (m[v][v])
+  continue;
+c = 0;
+for (; c <= 4; c++) {
+  n = 0;
+  for (; n <= 4; n++) {
+int *w = &e;
+long r = v;
+u = r == 0 ? a : a % r;
+h |= u;
+*w = g;
+--m[n][c];
+f &= *o;
+  }
+}
+if (p((i < 3) ^ 9, k))
+  ;
+else if (v)
+  return 0;
+  }
+}
+  }
+  return 1;
+}
+
+static char x() {
+  for (;;) {
+t();
+if (l)
+  return 0;
+  }
+}
+
+int main() {
+  x();
+  s(e & 255);
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
-- 
2.34.1



Re: [PATCH] RISC-V: Add regression test for vsetvl bug pr113429

2024-01-23 Thread juzhe.zh...@rivai.ai
ok



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2024-01-24 08:50
To: gcc-patches
CC: juzhe.zhong; kito.cheng; law; rdapp; vineetg; Patrick O'Neill
Subject: [PATCH] RISC-V: Add regression test for vsetvl bug pr113429
The reduced testcase for pr113429 (cam4 failure) needed additional
modules so it wasn't committed.
The fuzzer found a c testcase that was also fixed with pr113429's fix.
Adding it as a regression test.
 
PR 113429
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/pr113429.c: New test.
 
Signed-off-by: Patrick O'Neill 
---
.../gcc.target/riscv/rvv/vsetvl/pr113429.c| 70 +++
1 file changed, 70 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
new file mode 100644
index 000..05c3eeecb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+
+long a;
+int b, c, d, e, f, g;
+short h, i, j;
+static int k = 3;
+static int l = 6;
+int m[5][7];
+signed char n;
+int *const o = &c;
+
+signed char(p)(signed char p1, signed char q) {
+  return p1 / q;
+}
+
+void s(unsigned p1) {
+  b = (b ^ p1) & 255;
+}
+
+static long t() {
+  long u;
+  signed char v;
+  d = 1;
+  for (; d <= 4; d++) {
+j = 0;
+for (; j <= 4; j++) {
+  v = 0;
+  for (; v <= 4; v++) {
+if (m[v][v])
+  continue;
+c = 0;
+for (; c <= 4; c++) {
+  n = 0;
+  for (; n <= 4; n++) {
+int *w = &e;
+long r = v;
+u = r == 0 ? a : a % r;
+h |= u;
+*w = g;
+--m[n][c];
+f &= *o;
+  }
+}
+if (p((i < 3) ^ 9, k))
+  ;
+else if (v)
+  return 0;
+  }
+}
+  }
+  return 1;
+}
+
+static char x() {
+  for (;;) {
+t();
+if (l)
+  return 0;
+  }
+}
+
+int main() {
+  x();
+  s(e & 255);
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
-- 
2.34.1
 
 


[Committed] RISC-V: Add regression test for vsetvl bug pr113429

2024-01-23 Thread Patrick O'Neill

The reduced testcase for pr113429 (cam4 failure) needed additional
modules so it wasn't committed.
The fuzzer found a c testcase that was also fixed with pr113429's fix.
Adding it as a regression test.

PR target/113429

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr113429.c: New test.

Signed-off-by: Patrick O'Neill
---
 .../gcc.target/riscv/rvv/vsetvl/pr113429.c| 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
new file mode 100644
index 000..05c3eeecb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+
+long a;
+int b, c, d, e, f, g;
+short h, i, j;
+static int k = 3;
+static int l = 6;
+int m[5][7];
+signed char n;
+int *const o = &c;
+
+signed char(p)(signed char p1, signed char q) {
+  return p1 / q;
+}
+
+void s(unsigned p1) {
+  b = (b ^ p1) & 255;
+}
+
+static long t() {
+  long u;
+  signed char v;
+  d = 1;
+  for (; d <= 4; d++) {
+j = 0;
+for (; j <= 4; j++) {
+  v = 0;
+  for (; v <= 4; v++) {
+if (m[v][v])
+  continue;
+c = 0;
+for (; c <= 4; c++) {
+  n = 0;
+  for (; n <= 4; n++) {
+int *w = &e;
+long r = v;
+u = r == 0 ? a : a % r;
+h |= u;
+*w = g;
+--m[n][c];
+f &= *o;
+  }
+}
+if (p((i < 3) ^ 9, k))
+  ;
+else if (v)
+  return 0;
+  }
+}
+  }
+  return 1;
+}
+
+static char x() {
+  for (;;) {
+t();
+if (l)
+  return 0;
+  }
+}
+
+int main() {
+  x();
+  s(e & 255);
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
--
2.34.1


Re: [PATCH, V2] PR target/112886, Add %S to print_operand for vector pair support.

2024-01-23 Thread Kewen.Lin
Hi Mike,

on 2024/1/12 01:29, Michael Meissner wrote:
> This is version 2 of the patch.  The only difference is I made the test case
> simpler to read.
> 
> In looking at support for load vector pair and store vector pair for the
> PowerPC in GCC, I noticed that we were missing a print_operand output modifier
> if you are dealing with vector pairs to print the 2nd register in the vector
> pair.
> 
> If the instruction inside of the asm used the Altivec encoding, then we could
> use the %L modifier:
> 
>   __vector_pair *p, *q, *r;
>   // ...
>   __asm__ ("vaddudm %0,%1,%2\n\tvaddudm %L0,%L1,%L2"
>: "=v" (*p)
>: "v" (*q), "v" (*r));
> 
> Likewise if we know the value to be in a traditional FPR register, %L will
> work for instructions that use the VSX encoding:
> 
>   __vector_pair *p, *q, *r;
>   // ...
>   __asm__ ("xvadddp %x0,%x1,%x2\n\txvadddp %L0,%L1,%L2"
>: "=f" (*p)
>: "f" (*q), "f" (*r));
> 
> But if you have a value that is in a traditional Altivec register, and the
> instruction uses the VSX encoding, %L will give a value between 0 and 31,
> when it should give a value between 32 and 63.
> 
> This patch adds %S that acts like %x, except that it adds 1 to the
> register number.
> 
> I have tested this on power10 and power9 little endian systems and on a power9
> big endian system.  There were no regressions in the patch.  Can I apply it to
> the trunk?
> 
> It would be nice if I could apply it to the open branches.  Can I backport it
> after a burn-in period?
> 
> 2024-01-10  Michael Meissner  
> 
> gcc/
> 
>   PR target/112886
>   * config/rs6000/rs6000.cc (print_operand): Add %S output modifier.
>   * doc/md.texi (Modifiers): Mention %S can be used like %x.
> 
> gcc/testsuite/
> 
>   PR target/112886
>   * /gcc.target/powerpc/pr112886.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc | 10 ---
>  gcc/doc/md.texi |  5 ++--
>  gcc/testsuite/gcc.target/powerpc/pr112886.c | 29 +
>  3 files changed, 39 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr112886.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 872df3140e3..6353a7ccfb2 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14504,13 +14504,17 @@ print_operand (FILE *file, rtx x, int code)
>   print_operand (file, x, 0);
>return;
>  
> +case 'S':
>  case 'x':
> -  /* X is a FPR or Altivec register used in a VSX context.  */
> +  /* X is a FPR or Altivec register used in a VSX context.  %x prints
> +  the VSX register number, %S prints the 2nd register number for
> +  vector pair, decimal 128-bit floating and IBM 128-bit binary floating
> +  values.  */
>if (!REG_P (x) || !VSX_REGNO_P (REGNO (x)))
> - output_operand_lossage ("invalid %%x value");
> + output_operand_lossage ("invalid %%%c value", (code == 'S' ? 'S' : 
> 'x'));

Nit: Seems simpler with
  
  output_operand_lossage ("invalid %%%c value", (char) code);

>else
>   {
> -   int reg = REGNO (x);
> +   int reg = REGNO (x) + (code == 'S' ? 1 : 0);
> int vsx_reg = (FP_REGNO_P (reg)
>? reg - 32
>: reg - FIRST_ALTIVEC_REGNO + 32);
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 47a87d6ceec..53ec957cb23 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -3386,8 +3386,9 @@ A VSX register (VSR), @code{vs0}@dots{}@code{vs63}.  
> This is either an
>  FPR (@code{vs0}@dots{}@code{vs31} are @code{f0}@dots{}@code{f31}) or a VR
>  (@code{vs32}@dots{}@code{vs63} are @code{v0}@dots{}@code{v31}).
>  
> -When using @code{wa}, you should use the @code{%x} output modifier, so that
> -the correct register number is printed.  For example:
> +When using @code{wa}, you should use either the @code{%x} or @code{%S}
> +output modifier, so that the correct register number is printed.  For
> +example:

As I questioned in [1], "It seems there is no Power specific documentation on
operand modifiers like this %L?", even if we document this new "%S" as above,
users can still be confused.

Does it sound better with something like " ... @code{%x} or @code{%S} (for the
next consecutive register number in the context like vector pair etc.) ... "?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642414.html

>  
>  @smallexample
>  asm ("xvadddp %x0,%x1,%x2"

And we can also extend this by adding one more example like your associated test
case (hope it can make it more clear).
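For reference, a sketch of what such an extra md.texi example might look like (illustrative only, not the patch's actual wording; the constraint and mnemonics simply mirror the %x example already in the manual):

```c
__vector_pair *p, *q, *r;

/* %x0 prints the first VSX register of the pair; %S0 prints the second.  */
asm ("xvadddp %x0,%x1,%x2\n\txvadddp %S0,%S1,%S2"
     : "=wa" (*p)
     : "wa" (*q), "wa" (*r));
```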

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr112886.c 
> b/gcc/testsuite/gcc.target/powerpc/pr112886.c
> new file mode 100644
> index 000..4e59dcda6ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr112886.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_

Re: [PATCH, V2] PR target/112886, Add %S to print_operand for vector pair support.

2024-01-23 Thread Peter Bergner
On 1/23/24 8:30 PM, Kewen.Lin wrote:
>> -output_operand_lossage ("invalid %%x value");
>> +output_operand_lossage ("invalid %%%c value", (code == 'S' ? 'S' : 
>> 'x'));
> 
> Nit: Seems simpler with
>   
>   output_operand_lossage ("invalid %%%c value", (char) code);

Agreed, good catch.




>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr112886.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr112886.c
>> new file mode 100644
>> index 000..4e59dcda6ea
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr112886.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target power10_ok } */
> 
> I think this needs one more:
> 
> /* { dg-require-effective-target powerpc_vsx_ok } */

I agree with this...



>> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> 
> ... and
> 
> /* { dg-options "-mdejagnu-cpu=power10 -O2 -mvsx" } */
> 
> , otherwise with explicit -mno-vsx, this test case would fail.

But not with this.  The -mdejagnu-cpu=power10 option already enables -mvsx.
If the user explicitly forces -mno-vsx via RUNTESTFLAGS, then let them.
The options set in RUNTESTFLAGS come after the options in the dg-options
line, so even adding -mvsx like the above won't help the test case PASS
if we didn't have the powerpc_vsx_ok test.  In other words, the -mvsx option
doesn't help with anything.

All we need is the new powerpc_vsx_ok check and that will guard against the FAIL
in the case the user forces -mno-vsx.  In that case, we'll just get an 
UNSUPPORTED
and that is fine.

Peter





[PATCH v1] RISC-V: Bugfix for vls integer mode calling convention

2024-01-23 Thread pan2 . li
From: Pan Li 

According to the issue below:

https://hub.fgit.cf/riscv-non-isa/riscv-elf-psabi-doc/pull/416

When the mode size of vls integer mode is less than 2 * XLEN, we will
take the gpr/fpr for both the args and the return values, instead of
passing them by reference. For example, the below code:

typedef short v8hi __attribute__ ((vector_size (16)));

v8hi __attribute__((noinline))
add (v8hi a, v8hi b)
{
  v8hi r = a + b;
  return r;
}

Before this patch:
add:
  vsetivli zero,8,e16,m1,ta,ma
  vle16.v  v1,0(a1) <== arg by reference
  vle16.v  v2,0(a2) <== arg by reference
  vadd.vv  v1,v1,v2
  vse16.v  v1,0(a0) <== return by reference
  ret

After this patch:
add:
  addi sp,sp,-32
  sd   a0,0(sp)  <== arg by register a0 - a3
  sd   a1,8(sp)
  sd   a2,16(sp)
  sd   a3,24(sp)
  addi a5,sp,16
  vsetivli zero,8,e16,m1,ta,ma
  vle16.v  v2,0(sp)
  vle16.v  v1,0(a5)
  vadd.vv  v1,v1,v2
  vse16.v  v1,0(sp)
  ld   a0,0(sp)  <== return by a0 - a1.
  ld   a1,8(sp)
  addi sp,sp,32
  jr   ra

For vls floating point, things get more complicated.  We follow the
rules below.

1. Vls element count <= 2 and vls size <= 2 * xlen, go fpr.
2. Vls size <= 2 * xlen, go gpr.
3. Vls size > 2 * xlen, go reference.

One exception is the V2DF mode: we treat the vls mode as aggregated and
we will have TFmode here.  Unfortunately, emit_move_multi_word cannot
take care of TFmode elegantly, so we go to gpr for V2DF mode.
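To make the three rules concrete, here is a small sketch (not part of the patch) assuming xlen == 64; the comments describe the intended RISC-V ABI behavior, while the code itself is plain GCC vector-extension C:

```c
/* Sketch of the three VLS floating-point cases, assuming xlen == 64.  */
typedef float v2sf __attribute__ ((vector_size (8)));   /* 2 elts, 8 bytes  */
typedef float v4sf __attribute__ ((vector_size (16)));  /* 4 elts, 16 bytes */
typedef float v8sf __attribute__ ((vector_size (32)));  /* 8 elts, 32 bytes */

/* Rule 1: element count <= 2 and size <= 2 * xlen -> FPRs (fa0/fa1).  */
v2sf add2 (v2sf a, v2sf b) { return a + b; }

/* Rule 2: size <= 2 * xlen -> GPRs (a0-a3).  */
v4sf add4 (v4sf a, v4sf b) { return a + b; }

/* Rule 3: size > 2 * xlen -> passed and returned by reference.  */
v8sf add8 (v8sf a, v8sf b) { return a + b; }
```

The register assignments in the comments are what the patch's rules would select; the arithmetic itself is target-independent.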

The riscv regression passed for this patch.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_v_ext_vector_or_tuple_mode_p):
New predicate function for vector or tuple vector.
(riscv_v_vls_mode_aggregate_reg_count): New function to
calculate the gpr/fpr count required by vls mode.
(riscv_gpr_unit_size): New function to get gpr in bytes.
(riscv_fpr_unit_size): New function to get fpr in bytes.
(riscv_v_vls_to_gpr_mode): New function convert vls mode to gpr mode.
(riscv_v_vls_to_fpr_mode): New function convert vls mode to fpr mode.
(riscv_pass_vls_aggregate_in_gpr_or_fpr): New function to return
the rtx of gpr/fpr for vls mode.
(riscv_mode_pass_by_reference_p): New predicate function to
indicate the mode will be passed by reference or not.
(riscv_get_arg_info): Add vls mode handling.
(riscv_pass_by_reference): Return false if arg info has no zero
gpr count.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add helper marcos.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-9.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-run-6.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 185 +-
 .../rvv/autovec/vls/calling-convention-1.c| 154 +++
 .../rvv/autovec/vls/calling-convention-10.c   |  51 +
 .../rvv/autovec/vls/calling-convention-2.c| 142 ++
 .../rvv/autovec/vls/calling-convention-3.c| 130 
 .../rvv/autovec/vls/calling-convention-4.c| 118 +++
 .../rvv/autovec/vls/calling-convention-5.c| 141 +
 .../rvv/autovec/vls/calling-convention-6.c| 129 
 .../rvv/autovec/vls/calling-convention-7.c| 120 
 .../rvv/autovec/vls/calling-convention-8.c|  43 
 .../rvv/autovec/vls/calling-convention-9.c|  51 +
 .../autovec/vls/calling-convention-run-1.c|  55 ++
 .../autovec/vls/calling-convention-run-2.c|  55 ++
 .../autovec/vls/calling-convention-run-3.c|  55 ++
 .../autovec/vls/calling-convention-run-4.c|  55 ++
 .../autovec/vls/calling-convention-run-5.c|  55 ++
 .../autovec/vls/calling-convention-run-6.c|  55 ++
 .../gcc.target/riscv/rvv/autovec/vls/def.h|  74 +++
 18 files changed, 1665 insertions(+), 3 de

[r14-8346 Regression] FAIL: gcc.dg/guality/pr54796.c -Os -DPREVENT_OPTIMIZATION line 17 c == 5 on Linux/x86_64

2024-01-23 Thread haochen.jiang
On Linux/x86_64,

a98d5130a6dcff2ed4db371e500550134777b8cf is the first bad commit
commit a98d5130a6dcff2ed4db371e500550134777b8cf
Author: Richard Biener 
Date:   Mon Jan 15 12:55:20 2024 +0100

rtl-optimization/113255 - base_alias_check vs. pointer difference

caused

FAIL: gcc.dg/torture/pr113255.c   -O0  (test for excess errors)
FAIL: gcc.dg/torture/pr113255.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/pr113255.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr113255.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/pr113255.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr113255.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/pr113255.c   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-8346/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr113255.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr113255.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: [PATCH] xtensa: Make full transition to LRA

2024-01-23 Thread Max Filippov
Hi Suwa-san,

I've finally processed the new issues introduced by this change.

On Wed, May 10, 2023 at 2:10 AM Max Filippov  wrote:
> On Mon, May 8, 2023 at 6:38 AM Takayuki 'January June' Suwa
>  wrote:
> >
> > gcc/ChangeLog:
> >
> > * config/xtensa/constraints.md (R, T, U):
> > Change define_constraint to define_memory_constraint.
> > * config/xtensa/xtensa.cc
> > (xtensa_lra_p, TARGET_LRA_P): Remove.
> > (xtensa_emit_move_sequence): Remove "if (reload_in_progress)"
> > clause as it can no longer be true.
> > (xtensa_output_integer_literal_parts): Consider 16-bit wide
> > constants.
> > (xtensa_legitimate_constant_p): Add short-circuit path for
> > integer load instructions.
> > * config/xtensa/xtensa.md (movsf): Use can_create_pseudo_p()
> > rather than reload_in_progress and reload_completed.
> > * config/xtensa/xtensa.opt (mlra): Remove.
> > ---
> >  gcc/config/xtensa/constraints.md | 26 --
> >  gcc/config/xtensa/xtensa.cc  | 26 +-
> >  gcc/config/xtensa/xtensa.md  |  2 +-
> >  gcc/config/xtensa/xtensa.opt |  4 
> >  4 files changed, 14 insertions(+), 44 deletions(-)
>
> That's impressive.
> This version introduces a few execution failures in the testsuite on
> little endian targets and a bunch more (but not all, some execution
> tests still pass) on big endian.
> I'm traveling this week and likely won't be able to take a deep look
> into it until 5/15.
>
> New LE failures:

All of the LE failures are related to zero-overhead loops. Dropping the
operand 2 from the doloop_end pattern fixes that (change 1).

> New BE failures:

All of the BE failures are related to loading HImode constants into
registers, which instead of

.literal .LCx value
...
l32r register, .LCx

now generates the following code:

.literal .LCx value
.literal .LCy .LCx
...
l32r register1, .LCy
l16ui register, register1, 0

I've fixed that by allowing HImode constants in the literal pool in the
'move_operand' predicate, making addresses of such constants
legitimate in the xtensa_legitimate_address_p and adding an
alternative with l32r opcode to the movhi_internal pattern (change 2).

With these additional changes there's no new regression failures
and the generated code looks mostly the same as with the reload.

--
Thanks.
-- Max
From 0fb9ddfd22d11579674ac4a95912d2bc5612deb7 Mon Sep 17 00:00:00 2001
From: Max Filippov 
Date: Sun, 21 Jan 2024 16:14:20 -0800
Subject: [PATCH 1/2] gcc: xtensa: drop operand 2 from doloop_end pattern

gcc/ChangeLog:
	* config/xtensa/xtensa.md (doloop_end): Drop operand 2.
---
 gcc/config/xtensa/xtensa.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 7aded86e244f..a9c37da48b81 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -2368,14 +2368,12 @@
 	  (set (match_dup 0)
 		   (plus:SI (match_dup 0)
 			(const_int -1)))
-	  (unspec [(const_int 0)] UNSPEC_LSETUP_END)
-	  (clobber (match_dup 2))])] ; match_scratch
+	  (unspec [(const_int 0)] UNSPEC_LSETUP_END)])]
   "TARGET_LOOPS && optimize"
 {
   /* The loop optimizer doesn't check the predicates... */
   if (GET_MODE (operands[0]) != SImode)
 FAIL;
-  operands[2] = gen_rtx_SCRATCH (SImode);
 })
 
 
-- 
2.39.2

From e5536a47e9f1ae856c2491919933d18866511991 Mon Sep 17 00:00:00 2001
From: Max Filippov 
Date: Tue, 23 Jan 2024 10:57:21 -0800
Subject: [PATCH 2/2] gcc: xtensa: fix HImode constant loads

gcc/ChangeLog:
	* config/xtensa/predicates.md (move_operand): Don't check that a
	constant pool operand size is a multiple of UNITS_PER_WORD.
	* config/xtensa/xtensa.cc (xtensa_legitimate_address_p): Don't
	check that mode size is at least UNITS_PER_WORD.
	* config/xtensa/xtensa.md (movhi_internal): Add alternative
	loading constant from a literal pool.
---
 gcc/config/xtensa/predicates.md | 4 +---
 gcc/config/xtensa/xtensa.cc | 2 +-
 gcc/config/xtensa/xtensa.md | 9 +
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index 672fb003a6c5..dd77911e3b70 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -143,9 +143,7 @@
 (define_predicate "move_operand"
   (ior
  (ior (match_operand 0 "register_operand")
-	  (and (match_operand 0 "memory_operand")
-	   (match_test "!constantpool_mem_p (op)
-			|| GET_MODE_SIZE (mode) % UNITS_PER_WORD == 0")))
+	  (match_operand 0 "memory_operand"))
  (ior (and (match_code "const_int")
 	   (match_test "(GET_MODE_CLASS (mode) == MODE_INT
 			 && xtensa_simm12b (INTVAL (op)))
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 39ef576ac4a9..3c2d21fe8e2e 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -2343,7 +2343,7 @@ xtensa_legiti

[PATCH v2] xtensa: Make full transition to LRA

2024-01-23 Thread Max Filippov
From: Takayuki 'January June' Suwa 

gcc/ChangeLog:

* config/xtensa/constraints.md (R, T, U):
Change define_constraint to define_memory_constraint.
* config/xtensa/predicates.md (move_operand): Don't check that a
constant pool operand size is a multiple of UNITS_PER_WORD.
* config/xtensa/xtensa.cc
(xtensa_lra_p, TARGET_LRA_P): Remove.
(xtensa_emit_move_sequence): Remove "if (reload_in_progress)"
clause as it can no longer be true.
(fixup_subreg_mem): Drop function.
(xtensa_output_integer_literal_parts): Consider 16-bit wide
constants.
(xtensa_legitimate_constant_p): Add short-circuit path for
integer load instructions.
(xtensa_legitimate_address_p): Don't check that mode size is
at least UNITS_PER_WORD.
* config/xtensa/xtensa.md (movsf): Use can_create_pseudo_p()
rather than reload_in_progress and reload_completed.
(doloop_end): Drop operand 2.
(movhi_internal): Add alternative loading constant from a
literal pool.
* config/xtensa/xtensa.opt (mlra): Change to no effect.
---
 gcc/config/xtensa/constraints.md | 26 ++
 gcc/config/xtensa/predicates.md  |  4 +--
 gcc/config/xtensa/xtensa.cc  | 46 +---
 gcc/config/xtensa/xtensa.md  | 15 +--
 gcc/config/xtensa/xtensa.opt |  4 +--
 5 files changed, 24 insertions(+), 71 deletions(-)

diff --git a/gcc/config/xtensa/constraints.md b/gcc/config/xtensa/constraints.md
index 5cade1db8ff1..dc6ffb5ba15c 100644
--- a/gcc/config/xtensa/constraints.md
+++ b/gcc/config/xtensa/constraints.md
@@ -123,29 +123,19 @@
   (and (match_code "const_int")
   (match_test "! xtensa_split1_finished_p ()"
 
-;; Memory constraints.  Do not use define_memory_constraint here.  Doing so
-;; causes reload to force some constants into the constant pool, but since
-;; the Xtensa constant pool can only be accessed with L32R instructions, it
-;; is always better to just copy a constant into a register.  Instead, use
-;; regular constraints but add a check to allow pseudos during reload.
+;; Memory constraints.
 
-(define_constraint "R"
+(define_memory_constraint "R"
  "Memory that can be accessed with a 4-bit unsigned offset from a register."
- (ior (and (match_code "mem")
-  (match_test "smalloffset_mem_p (op)"))
-  (and (match_code "reg")
-  (match_test "reload_in_progress
-   && REGNO (op) >= FIRST_PSEUDO_REGISTER"
+ (and (match_code "mem")
+  (match_test "smalloffset_mem_p (op)")))
 
-(define_constraint "T"
+(define_memory_constraint "T"
  "Memory in a literal pool (addressable with an L32R instruction)."
  (and (match_code "mem")
   (match_test "!TARGET_CONST16 && constantpool_mem_p (op)")))
 
-(define_constraint "U"
+(define_memory_constraint "U"
  "Memory that is not in a literal pool."
- (ior (and (match_code "mem")
-  (match_test "! constantpool_mem_p (op)"))
-  (and (match_code "reg")
-  (match_test "reload_in_progress
-   && REGNO (op) >= FIRST_PSEUDO_REGISTER"
+ (and (match_code "mem")
+  (match_test "! constantpool_mem_p (op)")))
diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index 672fb003a6c5..dd77911e3b70 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -143,9 +143,7 @@
 (define_predicate "move_operand"
   (ior
  (ior (match_operand 0 "register_operand")
- (and (match_operand 0 "memory_operand")
-  (match_test "!constantpool_mem_p (op)
-   || GET_MODE_SIZE (mode) % UNITS_PER_WORD == 0")))
+ (match_operand 0 "memory_operand"))
  (ior (and (match_code "const_int")
   (match_test "(GET_MODE_CLASS (mode) == MODE_INT
 && xtensa_simm12b (INTVAL (op)))
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index a4f8e3e49d06..22b4416f48e4 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -115,7 +115,6 @@ static enum internal_test map_test_to_internal_test (enum 
rtx_code);
 static rtx gen_int_relational (enum rtx_code, rtx, rtx);
 static rtx gen_float_relational (enum rtx_code, rtx, rtx);
 static rtx gen_conditional_move (enum rtx_code, machine_mode, rtx, rtx);
-static rtx fixup_subreg_mem (rtx);
 static struct machine_function * xtensa_init_machine_status (void);
 static rtx xtensa_legitimize_tls_address (rtx);
 static rtx xtensa_legitimize_address (rtx, rtx, machine_mode);
@@ -192,7 +191,6 @@ static void xtensa_output_mi_thunk (FILE *file, tree thunk 
ATTRIBUTE_UNUSED,
HOST_WIDE_INT delta,
HOST_WIDE_INT vcall_offset,
tree function);
-static bool xtensa_lra_p (void);
 
 static rtx xtensa_delegitimize_address (rtx);
 
@@ -286,9 +284,6 @@ static rtx xtensa_delegitimize_address (

[patch] PR 81271: gcc/cp/lex.c:116: wrong condition ?

2024-01-23 Thread Jasmine Tang
Change the operator from & to && so that boolean results are combined
with a boolean operation (instead of a bitwise operation).

David Binderman 2017-07-01 13:24:44 UTC

trunk/gcc/cp/lex.c:116]: (style) Boolean result is used in bitwise
operation. Clarify expression with parentheses.

Source code is

  gcc_checking_assert (!IDENTIFIER_KIND_BIT_2 (id)
   & !IDENTIFIER_KIND_BIT_1 (id)
   & !IDENTIFIER_KIND_BIT_0 (id));

Maybe better code

  gcc_checking_assert (!IDENTIFIER_KIND_BIT_2 (id)
   && !IDENTIFIER_KIND_BIT_1 (id)
   && !IDENTIFIER_KIND_BIT_0 (id));
From 10b501ffa8a11c7f10fd6e6ab5d9a876a321fe13 Mon Sep 17 00:00:00 2001
From: Jasmine 
Date: Tue, 23 Jan 2024 21:18:13 -0800
Subject: [PATCH] Fix compiler warning: Boolean result is used in bitwise
 operation

---
 gcc/cp/lex.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/lex.cc b/gcc/cp/lex.cc
index 1110db7f8d0..8d94ae1e7b1 100644
--- a/gcc/cp/lex.cc
+++ b/gcc/cp/lex.cc
@@ -136,8 +136,8 @@ void
 set_identifier_kind (tree id, cp_identifier_kind kind)
 {
   gcc_checking_assert (!IDENTIFIER_KIND_BIT_2 (id)
-		   & !IDENTIFIER_KIND_BIT_1 (id)
-		   & !IDENTIFIER_KIND_BIT_0 (id));
+		   && !IDENTIFIER_KIND_BIT_1 (id)
+		   && !IDENTIFIER_KIND_BIT_0 (id));
   IDENTIFIER_KIND_BIT_2 (id) |= (kind >> 2) & 1;
   IDENTIFIER_KIND_BIT_1 (id) |= (kind >> 1) & 1;
   IDENTIFIER_KIND_BIT_0 (id) |= (kind >> 0) & 1;
-- 
2.34.1



Re: [PATCH, V2] PR target/112886, Add %S to print_operand for vector pair support.

2024-01-23 Thread Kewen.Lin
on 2024/1/24 11:11, Peter Bergner wrote:
> On 1/23/24 8:30 PM, Kewen.Lin wrote:
>>> -   output_operand_lossage ("invalid %%x value");
>>> +   output_operand_lossage ("invalid %%%c value", (code == 'S' ? 'S' : 
>>> 'x'));
>>
>> Nit: Seems simpler with
>>   
>>   output_operand_lossage ("invalid %%%c value", (char) code);
> 
> Agreed, good catch.
> 
> 
> 
> 
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr112886.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr112886.c
>>> new file mode 100644
>>> index 000..4e59dcda6ea
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr112886.c
>>> @@ -0,0 +1,29 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target power10_ok } */
>>
>> I think this needs one more:
>>
>> /* { dg-require-effective-target powerpc_vsx_ok } */
> 
> I agree with this...
> 
> 
> 
>>> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
>>
>> ... and
>>
>> /* { dg-options "-mdejagnu-cpu=power10 -O2 -mvsx" } */
>>
>> , otherwise with explicit -mno-vsx, this test case would fail.
> 
> But not with this.  The -mdejagnu-cpu=power10 option already enables -mvsx.
> If the user explcitly forces -mno-vsx via RUNTESTFLAGS, then let them.
> The options set in RUNTESTFLAGS come after the options in the dg-options
> line, so even adding -mvsx like the above won't help the test case PASS

But this is NOT true, at least on one of our internal Power10 machines (ltcden2-lp1).

With the command below:
  
  make check-gcc-c RUNTESTFLAGS="--target_board=unix/-mno-vsx 
powerpc.exp=pr112886.c"

this test case fails without the explicit -mvsx in dg-options.

From the verbose dump, the compilation command looks like:

/home/linkw/gcc/build/gcc-test-debug/gcc/xgcc 
-B/home/linkw/gcc/build/gcc-test-debug/gcc/
/home/linkw/gcc/gcc-test/gcc/testsuite/gcc.target/powerpc/pr112886.c  -mno-vsx 
-fdiagnostics-plain-output  -mdejagnu-cpu=power10 -O2 -ffat-lto-objects 
-fno-ident -S
-o pr112886.s

"-mno-vsx" comes **before** "-mdejagnu-cpu=power10 -O2" rather than **after**.

I guess it might be due to differing behavior across versions of the runtest
framework.

So there are two cases when the user explicitly specifies -mno-vsx:

1) RUNTESTFLAGS comes after dg-options (assuming same order for -mvsx in 
powerpc_vsx_ok)

  powerpc_vsx_ok test fails, so UNSUPPORTED.

  // an explicit -mvsx does nothing here, as you said.

2) RUNTESTFLAGS comes before dg-options

  powerpc_vsx_ok test succeeds, but the test case FAILs.

  // with the suggested -mvsx, the options match the powerpc_vsx_ok
  // check and the case does not fail.
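
The ordering effect can be sketched with a toy model (this is not DejaGnu
itself, just GCC's last-option-wins handling of -mvsx/-mno-vsx):

```python
# Toy model of GCC flag processing: -mvsx/-mno-vsx are scanned left to
# right and the last occurrence wins.
def effective_vsx(flags):
    state = None
    for f in flags:
        if f == "-mvsx":
            state = True
        elif f == "-mno-vsx":
            state = False
    return state

# Case 1: board flags appended after dg-options -> -mno-vsx wins.
assert effective_vsx(["-mvsx", "-O2", "-mno-vsx"]) is False

# Case 2 (observed on ltcden2-lp1): board flags come first, so an
# explicit -mvsx in dg-options wins and the test case compiles.
assert effective_vsx(["-mno-vsx", "-mdejagnu-cpu=power10", "-mvsx"]) is True
```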

As above, I think we still need to append -mvsx explicitly.  As
tested/verified, it does keep the case from failing on ltcden2-lp1.

BR,
Kewen

> if we didn't have the powerpc_vsx_ok test.  In other words, the -mvsx option
> doesn't help with anything.
> 
> All we need is the new powerpc_vsx_ok check and that will guard against the 
> FAIL
> in the case the user forces -mno-vsx.  In that case, we'll just get an 
> UNSUPPORTED
> and that is fine.
> 
> Peter
> 
> 
> 



Re: [Fortran] half-cycle trig functions and atan[d] fixes

2024-01-23 Thread Steve Kargl
On Tue, Jan 23, 2024 at 01:37:54PM +0200, Janne Blomqvist wrote:
> 
> > - If I get this right, to take one example, the Fortran front-end will emit 
> > a call to gfortran_acospi_r4(), libgfortran provides this as a wrapper 
> > calling acospif(), which is called either from libm or from libgfortran. 
> > This is different from other math library functions, like ACOS() where the 
> > acosf() call is generated directly from the front-end (and then the 
> > implementation comes either from libm or from libgfortran). Why not follow 
> > our usual way of doing things?
> 
> Good point, that's what we've done in c99_functions.c in libgfortran.
> We should probably do this so we can skip jumping via libgfortran when
> libm implements these directly.
> 

Hopefully, FX sees this as my emails to gmail bounce.

I don't see how this can work or I don't understand symbol versioning
in libraries.

If an OS does not supply cospi() in its libm, then gfortran will
fallback to a cospi() in libgfortran.  With the indirection I 
currently implement, cospi() is not in the symbol map of libgfortran
and _gfortran_cospi_r4() will use the libgfortran version.  Now, if
the OS adds cospi() to libm and it's in libm's symbol map, then the
cospi() used by gfortran depends on the search order of the loaded
libraries.  Consider on FreeBSD, I have

% nm --dynamic /lib/libm.so.5 | grep cospi
00025b60 T cospi@@FBSD_1.7
00025fe0 T cospif@@FBSD_1.7
0002e230 T cospil@@FBSD_1.7

% nm --dynamic work/lib/libgfortran.so.5 | grep cospi
002b1e60 T _gfortran_acospi_r10@@GFORTRAN_14
002b4590 T _gfortran_acospi_r16@@GFORTRAN_14
002b1d80 T _gfortran_acospi_r4@@GFORTRAN_14
002b1df0 T _gfortran_acospi_r8@@GFORTRAN_14
002b1ea0 T _gfortran_cospi_r10@@GFORTRAN_14
002b46d0 T _gfortran_cospi_r16@@GFORTRAN_14
002b1dc0 T _gfortran_cospi_r4@@GFORTRAN_14
002b1e30 T _gfortran_cospi_r8@@GFORTRAN_14
 U cospi@FBSD_1.7
 U cospif@FBSD_1.7
 U cospil@FBSD_1.7

The FE generates code for _gfortran_cospi_rXX.  If FreeBSD adds,
say, acospif() and libgfortran also exports it, then how does the
linker choose between acospif@@FBSD_1.7 and acospif@@GFORTRAN_14?

-- 
Steve


[PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550]

2024-01-23 Thread Andrew Pinski
The split for movv8di is not ready to handle the case where the
destination register overlaps with the address of the memory being
loaded.  Rather than fixing the split, just mark the output constraint
of this alternative as an early clobber.  Fixing the split would first
require figuring out which register overlaps with the address and then
emitting that move last.

Built and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_movv8di): Mark the last
alternative's output constraint as an early clobber.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-simd.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 662ef696630..ba079298b84 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7985,7 +7985,7 @@ (define_insn "*aarch64_mov"
 )
 
 (define_insn "*aarch64_movv8di"
-  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,r")
+  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,&r")
(match_operand:V8DI 1 "general_operand" " r,r,m"))]
   "(register_operand (operands[0], V8DImode)
 || register_operand (operands[1], V8DImode))"
-- 
2.39.3



Re: [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls

2024-01-23 Thread rep . dot . nop
On 22 January 2024 21:33:17 CET, Kwok Cheung Yeung  
wrote:
>Hi
>
>There was a bug in the declare-target-indirect-2.c libgomp testcase (testing 
>indirect calls in offloaded target regions, spread over multiple 
>teams/threads) that, due to an errant fallthrough in a switch statement, 
>resulted in only one indirect function ever getting called:
>
>switch (i % 3)
>  {
>case 0: fn_ptr[i] = &foo;  // Missing break
>case 1: fn_ptr[i] = &bar;  // Missing break
>case 2: fn_ptr[i] = &baz;
>  }
>
>However, when the missing break statements are added, the testcase fails with 
>an invalid memory access. Upon investigation, this is due to the use of a 
>splay-tree as the lookup structure for indirect addresses, as the splay-tree 
>moves frequently accessed elements closer to the root node and so needs 
>locking when used from multiple threads. However, this would end up partially 
>serialising all the threads and kill performance. I have switched the lookup 
>structure from a splay tree to a hashtab instead to avoid locking during 
>lookup.
>
>I have also tidied up the initialisation of the lookup table by calling it 
>only from the first thread of the first team, instead of redundantly calling 
>it from every thread and only having the first one reached do the 
>initialisation. This removes the need for locking during initialisation.
>
>Tested with offloading to NVPTX and GCN with a x86_64 host. Okay for master?


Can you please also update the comments to talk about hashtab instead of splay?

TIA

>
>Thanks
>
>Kwok
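
The locking point above can be shown with a toy sketch (Python, not the
libgomp code): a splay tree rotates the accessed node toward the root on
every access, so even read-only queries mutate shared structure, while a
hash-table lookup leaves the structure untouched.

```python
# Toy splay-style lookup: a successful lookup of a child rotates it to
# the root, i.e. the tree is mutated by a read.
class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = self.right = None

def splay_lookup(root, key):
    """One-level 'move to root' lookup; returns the (possibly new) root."""
    if root is None or root.key == key:
        return root
    if key < root.key and root.left and root.left.key == key:
        node, root.left = root.left, root.left.right   # right rotation
        node.right = root
        return node
    if key > root.key and root.right and root.right.key == key:
        node, root.right = root.right, root.right.left  # left rotation
        node.left = root
        return node
    return root

a, b = Node(1, "host_fn1"), Node(2, "host_fn2")
a.right = b
root = splay_lookup(a, 2)
assert root is b and root.left is a  # tree shape changed by a lookup

# A hash-table lookup is read-only, so concurrent lookups need no lock.
table = {1: "host_fn1", 2: "host_fn2"}
assert table[2] == "host_fn2"
```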