date:20240614

[PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch

2024-06-14 Thread pan2 . li

From: Pan Li 

We have vec_extract pattern which takes ZVFHMIN as the mode
iterator of the V mode.  Aka VF_ZVFHMIN iterator.  But it will
expand to pred_extract_first pattern which takes the ZVFH as the mode
iterator of the V mode.  AKa VF.  The mismatch will result in one ICE
similar as below:

insn 30 29 31 2 (set (reg:HF 156 [ _2 ])
(unspec:HF [
(vec_select:HF (reg:RVVMF2HF 134 [ _1 ])
(parallel [
(const_int 0 [0])
]))
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)) "compress_run-2.c":22:3 -1
 (nil))
during RTL pass: vregs
compress_run-2.c:25:1: internal compiler error: in extract_insn, at
recog.cc:2812
0xb3bc47 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../../../gcc/gcc/rtl-error.cc:108
0xb3bc69 _fatal_insn_not_found(rtx_def const*, char const*, int, char
const*)
../../../gcc/gcc/rtl-error.cc:116
0xb3a545 extract_insn(rtx_insn*)
../../../gcc/gcc/recog.cc:2812
0x1010e9e instantiate_virtual_regs_in_insn
../../../gcc/gcc/function.cc:1612
0x1010e9e instantiate_virtual_regs
../../../gcc/gcc/function.cc:1995
0x1010e9e execute
../../../gcc/gcc/function.cc:2042

The below test suites are passed for this patch.
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc.

There may be other similar issue(s) for the mismatch,  we will take care
of them by test cases one by one.

PR target/115456

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Leverage V_ZVFH instead of V
which contains the VF_ZVFHMIN for alignment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr115456-2.c: New test.
* gcc.target/riscv/rvv/base/pr115456-3.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/vector-iterators.md  |  4 ++-
 .../gcc.target/riscv/rvv/base/pr115456-2.c| 31 +++
 .../gcc.target/riscv/rvv/base/pr115456-3.c| 31 +++
 3 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 47392d0da4c..43137a2a379 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1578,9 +1578,11 @@ (define_mode_iterator VLS_ZVFH [VLSI VLSF])
 
 (define_mode_iterator V [VI VF_ZVFHMIN])
 
+(define_mode_iterator V_ZVFH [VI VF])
+
 (define_mode_iterator V_VLS [V VLS])
 
-(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH])
+(define_mode_iterator V_VLS_ZVFH [V_ZVFH VLS_ZVFH])
 
 (define_mode_iterator V_VLSI [VI VLSI])
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
new file mode 100644
index 000..453e18b1c79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
@@ -0,0 +1,31 @@
+/* Test there is no ICE when compile.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfhmin -mrvv-vector-bits=zvl -mabi=lp64d -O3 
-ftree-vectorize" } */
+
+#include 
+#include 
+
+typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
+
+vnx4f __attribute__ ((noinline, noclone))
+test_5 (vnx4f x, vnx4f y)
+{
+  return __builtin_shufflevector (x, y, 1, 3, 6, 7);
+}
+
+int
+main (void)
+{
+  vnx4f test_5_x = {0, 1, 3, 4};
+  vnx4f test_5_y = {4, 5, 6, 7};
+  vnx4f test_5_except = {1, 4, 6, 7};
+  vnx4f test_5_real;
+  test_5_real = test_5 (test_5_x, test_5_y);
+
+  for (int i = 0; i < 4; i++)
+assert (test_5_real[i] == test_5_except[i]);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {call\s+__extendhfsf2} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
new file mode 100644
index 000..2c54f1d7538
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize" } */
+
+#include 
+#include 
+
+typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
+
+vnx4f __attribute__ ((noinline, noclone))
+test_5 (vnx4f x, vnx4f y)
+{
+  return __builtin_shufflevector (x, y, 1, 3, 6, 7);
+}
+
+int
+main (void)
+{
+  vnx4f test_5_x = {0, 1, 3, 4};
+  vnx4f test_5_y = {4, 5, 6, 7};
+  vnx4f test_5_except = {1, 4, 6, 7};
+  vnx4f test_5_real;
+  test_5_real = test_5 (test_5_x, test_5_y);
+
+  for (int i = 0; i < 4; i++)
+assert (test_5_real[i] == test_5_except[i]);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not {call\s+__extendhfsf2} } } */
+/* { dg-final { scan-assembler-times {vfmv\.f\.s\s+fa[0-9]+,\s*v[0-9]+} 4 } } 
*/
-- 
2.34.1

[PATCH] Fix fallout of peeling for gap improvements

2024-06-14 Thread Richard Biener

The following hopefully addresses an observed bootstrap issue on aarch64
where maybe-uninit diagnostics occur.  It also fixes bogus napkin math
from myself when I was confusing rounded up size of a single access
with rounded up size of the group accessed in a single scalar iteration.
So the following puts in a correctness check, leaving a set of peeling
for gaps as insufficient.  This could be rectified by splitting the
last load into multiple ones but I'm leaving this for a followup, better
quickly fix the reported wrong-code.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-stmts.cc (get_group_load_store_type): Do not
re-use poly-int remain but re-compute with non-poly values.
Verify the shortened load is good enough to be covered with
a single scalar gap iteration before accepting it.

* gcc.dg/vect/pr115385.c: Enable AVX2 if available.
---
 gcc/testsuite/gcc.dg/vect/pr115385.c |  1 +
 gcc/tree-vect-stmts.cc   | 12 +++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115385.c 
b/gcc/testsuite/gcc.dg/vect/pr115385.c
index a18cd665d7d..baea0b2473f 100644
--- a/gcc/testsuite/gcc.dg/vect/pr115385.c
+++ b/gcc/testsuite/gcc.dg/vect/pr115385.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target mmap } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
 
 #include 
 #include 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e32d44050e5..ca6052662a3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2148,15 +2148,17 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
{
  /* But peeling a single scalar iteration is enough if
 we can use the next power-of-two sized partial
-access.  */
+access and that is sufficiently small to be covered
+by the single scalar iteration.  */
  unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
  if (!nunits.is_constant (&cnunits)
  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
- || ((cremain = remain.to_constant (), true)
+ || (((cremain = group_size * cvf - gap % cnunits), true)
  && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
- && vector_vector_composition_type
-  (vectype, cnunits / cpart_size,
-   &half_vtype) == NULL_TREE))
+ && (cremain + group_size < cpart_size
+ || vector_vector_composition_type
+  (vectype, cnunits / cpart_size,
+   &half_vtype) == NULL_TREE)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-- 
2.35.3

Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-14 Thread Kewen.Lin

Hi!

on 2024/6/14 05:16, Carl Love wrote:
> Segher:
> 
> On 6/13/24 12:51, Segher Boessenkool wrote:
> 
> 
> 
>>
>>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>>> @@ -1,4 +1,4 @@
>>> -/* { dg-do compile { target powerpc*-*-* } } */
>>> +/* { dg-do run { target powerpc*-*-* } } */
>>>  /* { dg-options "-mvsx" } */
>>>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! 
>>> has_arch_pwr8 } } } */
>>>  /* { dg-require-effective-target powerpc_vsx } */
>>
>> Everything in gcc.target/powerpc/ is tested for "target powerpc*-*-*"
>> already, so you could remove that target clause even (after testing of
>> course :-) )
>>
>> Okay for trunk with or without that extra tweak.  Thank you!
> 
> I updated the patch by removing the target clause as suggested:
> 
> -/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-do run } */
>  /* { dg-options "-mvsx" } */
>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 
> } } } */
>  /* { dg-require-effective-target powerpc_vsx } */

Since you changed this for "run", I think you also want s/powerpc_vsx/vsx_hw/ .

BR,
Kewen

>  
> Retested on Power 10.  Reports 2 passes and no failures.  I will go ahead and 
> commit.
> 
> Thanks. 
> 
>Carl

Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs

2024-06-14 Thread Andre Vehreschild

Hi all,

I messed up renaming of the coarray_alloc_comp-test. This is fixed in the second
version of the patch. Sorry for the inconvenience.

Additionally I figured that this patch also fixed PR fortran/103112.

Regtests ok on x86_64 Fedora 39. Ok for mainline?

Regards,
Andre

On Tue, 11 Jun 2024 16:12:38 +0200
Andre Vehreschild  wrote:

> Hi all,
>
> attached patch has already been present in 2020, but lost my attention. It
> fixes an ICE in the testsuite. The old mails description is:
>
> attached patch fixes PR96418 where the code in the testsuite when compiled
> with -fcoarray=single  lead to an ICE. The reason was that the coarray object
> was derefed as an array, but it was no array. Introducing the test for the
> descriptor removes the ICE.
>
> Regtests ok on x86_64-linux/Fedora 39. Ok for mainline?
>
> Regards,
>   Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
From 50ed0af756a3eb6b9c95a39c379e1b3f2daf331d Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Tue, 11 Jun 2024 15:24:55 +0200
Subject: [PATCH] Fix ICE when compiling with -fcoarray=single, when derefing a
 non-array.

PR fortran/96418
PR fortran/103112

gcc/fortran/ChangeLog:

* trans.c (gfc_deallocate_with_status): Check that object to deref
is an array, before applying array deref.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_alloc_comp_3.f08: Moved to...
* gfortran.dg/coarray/alloc_comp_8.f90: ...here.
Should be tested for both -fcoarray=single and lib, resp.
* gfortran.dg/coarray_alloc_comp_4.f08: Fix program name.
---
 gcc/fortran/trans.cc   | 3 ++-
 .../{coarray_alloc_comp_3.f08 => coarray/alloc_comp_8.f90} | 3 +--
 gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)
 rename gcc/testsuite/gfortran.dg/{coarray_alloc_comp_3.f08 => coarray/alloc_comp_8.f90} (95%)

diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index a208afe90ab..1335b8cc48b 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -1838,7 +1838,8 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	  else
 	caf_dereg_type = (enum gfc_coarray_deregtype) coarray_dealloc_mode;
 	}
-  else if (flag_coarray == GFC_FCOARRAY_SINGLE)
+  else if (flag_coarray == GFC_FCOARRAY_SINGLE
+	   && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (pointer)))
 	pointer = gfc_conv_descriptor_data_get (pointer);
 }
   else if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (pointer)))
diff --git a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_8.f90
similarity index 95%
rename from gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08
rename to gcc/testsuite/gfortran.dg/coarray/alloc_comp_8.f90
index e2037aa5809..8b153925129 100644
--- a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_3.f08
+++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_8.f90
@@ -1,12 +1,11 @@
 ! { dg-do run }
-! { dg-options "-fcoarray=lib -lcaf_single" }
 ! { dg-additional-options "-latomic" { target libatomic_available } }
 !
 ! Contributed by Andre Vehreschild
 ! Check that manually freeing components does not lead to a runtime crash,
 ! when the auto-deallocation is taking care.

-program coarray_alloc_comp_3
+program alloc_comp_6
   implicit none

   type dt
diff --git a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08 b/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08
index 6586ec651dd..4c71a90af8f 100644
--- a/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08
+++ b/gcc/testsuite/gfortran.dg/coarray_alloc_comp_4.f08
@@ -5,7 +5,7 @@
 ! Contributed by Andre Vehreschild
 ! Check that sub-components are caf_deregistered and not freed.

-program coarray_alloc_comp_3
+program coarray_alloc_comp_4
   implicit none

   type dt
--
2.45.1

Re: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch

2024-06-14 Thread Kito Cheng

LGTM, thanks :)

On Fri, Jun 14, 2024 at 3:02 PM  wrote:
>
> From: Pan Li 
>
> We have vec_extract pattern which takes ZVFHMIN as the mode
> iterator of the V mode.  Aka VF_ZVFHMIN iterator.  But it will
> expand to pred_extract_first pattern which takes the ZVFH as the mode
> iterator of the V mode.  AKa VF.  The mismatch will result in one ICE
> similar as below:
>
> insn 30 29 31 2 (set (reg:HF 156 [ _2 ])
> (unspec:HF [
> (vec_select:HF (reg:RVVMF2HF 134 [ _1 ])
> (parallel [
> (const_int 0 [0])
> ]))
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)) "compress_run-2.c":22:3 -1
>  (nil))
> during RTL pass: vregs
> compress_run-2.c:25:1: internal compiler error: in extract_insn, at
> recog.cc:2812
> 0xb3bc47 _fatal_insn(char const*, rtx_def const*, char const*, int, char
> const*)
> ../../../gcc/gcc/rtl-error.cc:108
> 0xb3bc69 _fatal_insn_not_found(rtx_def const*, char const*, int, char
> const*)
> ../../../gcc/gcc/rtl-error.cc:116
> 0xb3a545 extract_insn(rtx_insn*)
> ../../../gcc/gcc/recog.cc:2812
> 0x1010e9e instantiate_virtual_regs_in_insn
> ../../../gcc/gcc/function.cc:1612
> 0x1010e9e instantiate_virtual_regs
> ../../../gcc/gcc/function.cc:1995
> 0x1010e9e execute
> ../../../gcc/gcc/function.cc:2042
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression test.
> 2. The rv64gcv build with glibc.
>
> There may be other similar issue(s) for the mismatch,  we will take care
> of them by test cases one by one.
>
> PR target/115456
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Leverage V_ZVFH instead of V
> which contains the VF_ZVFHMIN for alignment.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr115456-2.c: New test.
> * gcc.target/riscv/rvv/base/pr115456-3.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/vector-iterators.md  |  4 ++-
>  .../gcc.target/riscv/rvv/base/pr115456-2.c| 31 +++
>  .../gcc.target/riscv/rvv/base/pr115456-3.c| 31 +++
>  3 files changed, 65 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index 47392d0da4c..43137a2a379 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -1578,9 +1578,11 @@ (define_mode_iterator VLS_ZVFH [VLSI VLSF])
>
>  (define_mode_iterator V [VI VF_ZVFHMIN])
>
> +(define_mode_iterator V_ZVFH [VI VF])
> +
>  (define_mode_iterator V_VLS [V VLS])
>
> -(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH])
> +(define_mode_iterator V_VLS_ZVFH [V_ZVFH VLS_ZVFH])
>
>  (define_mode_iterator V_VLSI [VI VLSI])
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
> new file mode 100644
> index 000..453e18b1c79
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
> @@ -0,0 +1,31 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvfhmin -mrvv-vector-bits=zvl -mabi=lp64d 
> -O3 -ftree-vectorize" } */
> +
> +#include 
> +#include 
> +
> +typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
> +
> +vnx4f __attribute__ ((noinline, noclone))
> +test_5 (vnx4f x, vnx4f y)
> +{
> +  return __builtin_shufflevector (x, y, 1, 3, 6, 7);
> +}
> +
> +int
> +main (void)
> +{
> +  vnx4f test_5_x = {0, 1, 3, 4};
> +  vnx4f test_5_y = {4, 5, 6, 7};
> +  vnx4f test_5_except = {1, 4, 6, 7};
> +  vnx4f test_5_real;
> +  test_5_real = test_5 (test_5_x, test_5_y);
> +
> +  for (int i = 0; i < 4; i++)
> +assert (test_5_real[i] == test_5_except[i]);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {call\s+__extendhfsf2} 8 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
> new file mode 100644
> index 000..2c54f1d7538
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +#include 
> +#include 
> +
> +typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
> +
> +vnx4f __attribute__ ((noinline, noclone))
> +test_5 (vnx4f x, vnx4f y)
> +{
> +  return __builtin_shufflevector (x, y, 1, 3, 6, 7);
> +}
> +
> +int
> +main (void)
> +{
> +  vnx4f test_5_x = {0, 1, 3, 4};
> +  vnx4f test_5_y = {4, 5, 6, 7};
> +  vnx4f test_5_except = {1, 4, 6, 7};
> +  vnx4f test_5_real;
> +  test_5_real = test_5 (test_5_x, test_5_y);
> +
> +  for (int i = 0; i < 4; i++)
> +assert (test_5_real[

[pushed] wwwdocs: gcc-11: Fix grammar - template alias parameters

2024-06-14 Thread Gerald Pfeifer

Elsewhere, for example in the D documentation, template is used in 
singular, too.

Pushed.

Gerald
---
 htdocs/gcc-11/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index b4ecf3c0..3737af5b 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -585,7 +585,7 @@ You may also want to check out our
   User-defined attributes can now be used to annotate enum members,
alias declarations, and function parameters.
   
-  Templates alias parameters can now be instantiated with basic types
+  Template alias parameters can now be instantiated with basic types
such as int or void function().
   
   The mixin construct can now be used as types in the form
-- 
2.45.2

[COMMITTED 01/16] ada: Remove unused name of aspect from Snames

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

gcc/ada/

* snames.ads-tmpl (Name_Storage_Model): Delete.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/snames.ads-tmpl | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/ada/snames.ads-tmpl b/gcc/ada/snames.ads-tmpl
index 6cc66566907..699b8df5851 100644
--- a/gcc/ada/snames.ads-tmpl
+++ b/gcc/ada/snames.ads-tmpl
@@ -165,7 +165,6 @@ package Snames is
Name_Relaxed_Initialization : constant Name_Id := N + $;
Name_Stable_Properties  : constant Name_Id := N + $;
Name_Static_Predicate   : constant Name_Id := N + $;
-   Name_Storage_Model  : constant Name_Id := N + $;
Name_Storage_Model_Type : constant Name_Id := N + $;
Name_String_Literal : constant Name_Id := N + $;
Name_Synchronization: constant Name_Id := N + $;
-- 
2.45.1

[COMMITTED 04/16] ada: Missing initialization of multidimensional array using sliding

2024-06-14 Thread Marc Poulhiès

From: Javier Miranda 

When a multidimensional array is initialized with an array
aggregate, and inner dimensions of the array are initialized
with array subaggregates using sliding, the code generated
by the compiler does not initialize the inner dimensions
of the array.

gcc/ada/

* exp_aggr.adb (Must_Slide): Add missing support for
multidimensional arrays.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 54 +++-
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 796b0f1e0de..2686f5b3b82 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -154,8 +154,8 @@ package body Exp_Aggr is
--  case the aggregate must slide, and we must introduce an intermediate
--  temporary to hold it.
--
-   --  The same holds in an assignment to one-dimensional array of arrays,
-   --  when a component may be given with bounds that differ from those of the
+   --  The same holds in an assignment to multi-dimensional arrays, when
+   --  components may be given with bounds that differ from those of the
--  component type.
 
function Number_Of_Choices (N : Node_Id) return Nat;
@@ -9550,32 +9550,44 @@ package body Exp_Aggr is
   elsif Is_Others_Aggregate (Aggr) then
  return False;
 
-  else
- --  Sliding can only occur along the first dimension
- --  If any the bounds of non-static sliding is required
- --  to force potential range checks.
+  --  Check if sliding is required
 
+  else
  declare
-Bounds1 : constant Range_Nodes :=
-  Get_Index_Bounds (First_Index (Typ));
-Bounds2 : constant Range_Nodes :=
-  Get_Index_Bounds (First_Index (Obj_Type));
+Obj_Index  : Node_Id := First_Index (Obj_Type);
+Obj_Bounds : Range_Nodes;
+Typ_Index  : Node_Id := First_Index (Typ);
+Typ_Bounds : Range_Nodes;
 
  begin
-if not Is_OK_Static_Expression (Bounds1.First) or else
-   not Is_OK_Static_Expression (Bounds2.First) or else
-   not Is_OK_Static_Expression (Bounds1.Last) or else
-   not Is_OK_Static_Expression (Bounds2.Last)
-then
-   return True;
+while Present (Typ_Index) loop
+   pragma Assert (Present (Obj_Index));
 
-else
-   return Expr_Value (Bounds1.First) /= Expr_Value (Bounds2.First)
-or else
-  Expr_Value (Bounds1.Last) /= Expr_Value (Bounds2.Last);
-end if;
+   Typ_Bounds := Get_Index_Bounds (Typ_Index);
+   Obj_Bounds := Get_Index_Bounds (Obj_Index);
+
+   if not Is_OK_Static_Expression (Typ_Bounds.First) or else
+ not Is_OK_Static_Expression (Obj_Bounds.First) or else
+ not Is_OK_Static_Expression (Typ_Bounds.Last) or else
+ not Is_OK_Static_Expression (Obj_Bounds.Last)
+   then
+  return True;
+
+   elsif Expr_Value (Typ_Bounds.First)
+   /= Expr_Value (Obj_Bounds.First)
+ or else Expr_Value (Typ_Bounds.Last)
+   /= Expr_Value (Obj_Bounds.Last)
+   then
+  return True;
+   end if;
+
+   Next_Index (Typ_Index);
+   Next_Index (Obj_Index);
+end loop;
  end;
   end if;
+
+  return False;
end Must_Slide;
 
-
-- 
2.45.1

[COMMITTED 03/16] ada: Couple of small cleanups in semantic analysis of aspects

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

The first cleanup is to expose a consistent interface from Sem_Ch13 for the
analysis of aspects at various points of the program.  The second cleanup is
to fix the awkward implementation of the analysis of the specification for
the aspects Stable_Properties, Designated_Storage_Model, Storage_Model_Type
and Aggregate, which are always delayed, and the incorrect placement of that
of the aspect Local_Restrictions, which is never delayed.

gcc/ada/

* freeze.adb (Freeze_All): Call Check_Aspects_At_End_Of_Declarations
to perform the visibility check for aspects.
* sem_ch13.ads (Check_Aspects_At_End_Of_Declarations): Declare.
(Check_Aspect_At_Freeze_Point): Move to...
(Check_Aspect_At_End_Of_Declarations): Move to...
* sem_ch13.adb  (Check_Aspect_At_Freeze_Point): ...here.
(Check_Aspect_At_End_Of_Declarations): ...here.
(Analyze_Aspect_Specifications): Remove peculiar processing for
Stable_Properties, Designated_Storage_Model, Storage_Model_Type
and Aggregate.  Move that of Local_Restrictions around.  Reset
Aitem at the beginning of the loop for each aspect.
(Check_Aspects_At_End_Of_Declarations): New procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb   | 17 +
 gcc/ada/sem_ch13.adb | 87 ++--
 gcc/ada/sem_ch13.ads | 14 +++
 3 files changed, 58 insertions(+), 60 deletions(-)

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index c4c524f4685..523b026cc21 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -2645,22 +2645,7 @@ package body Freeze is
 --  for a description of how we handle aspect visibility).
 
 elsif Has_Delayed_Aspects (E) then
-   declare
-  Ritem : Node_Id;
-
-   begin
-  Ritem := First_Rep_Item (E);
-  while Present (Ritem) loop
- if Nkind (Ritem) = N_Aspect_Specification
-   and then Entity (Ritem) = E
-   and then Is_Delayed_Aspect (Ritem)
- then
-Check_Aspect_At_End_Of_Declarations (Ritem);
- end if;
-
- Next_Rep_Item (Ritem);
-  end loop;
-   end;
+   Check_Aspects_At_End_Of_Declarations (E);
 end if;
 
 --  If an incomplete type is still not frozen, this may be a
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index d065dd8dfda..46a359fd7d6 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -150,6 +150,15 @@ package body Sem_Ch13 is
--  is inserted before the freeze node, and the body of the function is
--  inserted after the freeze node.
 
+   procedure Check_Aspect_At_End_Of_Declarations (ASN : Node_Id);
+   --  Performs the processing of an aspect at the freeze all point and issues
+   --  appropriate error messages if the visibility has indeed changed. ASN is
+   --  the N_Aspect_Specification node for the aspect.
+
+   procedure Check_Aspect_At_Freeze_Point (ASN : Node_Id);
+   --  Performs the processing of an aspect at the freeze point. ASN is the
+   --  N_Aspect_Specification node for the aspect.
+
procedure Check_Pool_Size_Clash (Ent : Entity_Id; SP, SS : Node_Id);
--  Called if both Storage_Pool and Storage_Size attribute definition
--  clauses (SP and SS) are present for entity Ent. Issue error message.
@@ -1669,7 +1678,6 @@ package body Sem_Ch13 is
   --  Local variables
 
   Aspect : Node_Id;
-  Aitem  : Node_Id := Empty;
   Ent: Node_Id;
 
   L : constant List_Id := Aspect_Specifications (N);
@@ -1722,7 +1730,12 @@ package body Sem_Ch13 is
 Loc  : constant Source_Ptr := Sloc (Aspect);
 Nam  : constant Name_Id:= Chars (Id);
 A_Id : constant Aspect_Id  := Get_Aspect_Id (Nam);
+
+Aitem : Node_Id := Empty;
+--  The associated N_Pragma or N_Attribute_Definition_Clause
+
 Anod : Node_Id;
+--  An auxiliary node
 
 Delay_Required : Boolean;
 --  Set False if delay is not required
@@ -2949,19 +2962,6 @@ package body Sem_Ch13 is
   end if;
 end case;
 
-if Delay_Required
-   and then (A_Id = Aspect_Stable_Properties
-  or else A_Id = Aspect_Designated_Storage_Model
-  or else A_Id = Aspect_Storage_Model_Type
-  or else A_Id = Aspect_Aggregate)
-   --  ??? It seems like we should do this for all aspects, not
-   --  just these, but that causes as-yet-undiagnosed regressions.
-
-then
-   Set_Has_Delayed_Aspects (E);
-   Set_Is_Delayed_Aspect (Aspect);
-end if;
-
 --  Check 13.1(9.2/5): A represe

[COMMITTED 02/16] ada: Allow implicit dereferenced for uses of 'Super

2024-06-14 Thread Marc Poulhiès

From: Justin Squirek 

This patch modifies the experimental 'Super attribute to allow an access-valued
prefix to be equivalent to Prefix.all'Super.

gcc/ada/

* sem_attr.adb:
(Analyze_Attribute): Add check for dereference.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 22fbca45ac5..2563a92f2f0 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -6688,6 +6688,7 @@ package body Sem_Attr is
  Error_Msg_GNAT_Extension ("attribute %", Sloc (N));
 
  Check_E0;
+ Check_Dereference;
 
  --  Verify that we are looking at a type with ancestors
 
-- 
2.45.1

[COMMITTED 06/16] ada: Crash checking accessibility level on private type

2024-06-14 Thread Marc Poulhiès

From: Justin Squirek 

This patch fixes an issue in the compiler whereby calculating a static
accessibility level on a private type with an access discriminant resulted
in a compile time crash when No_Dynamic_Accessibility_Checks is enabled.

gcc/ada/

* accessibility.adb:
(Accessibility_Level): Replace call Get_Full_View with call to
Full_View since Get_Full_View only works with incomplete types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/accessibility.adb | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/accessibility.adb b/gcc/ada/accessibility.adb
index 47b3a7af10a..da4d1d9ce2e 100644
--- a/gcc/ada/accessibility.adb
+++ b/gcc/ada/accessibility.adb
@@ -2227,7 +2227,11 @@ package body Accessibility is
   --  that of the type.
 
   elsif Ekind (Def_Ent) = E_Discriminant then
- return Scope_Depth (Get_Full_View (Scope (Def_Ent)));
+ return Scope_Depth
+   (if Present (Full_View (Scope (Def_Ent))) then
+   Full_View (Scope (Def_Ent))
+else
+   Scope (Def_Ent));
   end if;
end if;
 
-- 
2.45.1

[COMMITTED 09/16] ada: Simplify handling of VxWorks-specific error codes for ENOENT

2024-06-14 Thread Marc Poulhiès

From: Jerome Guitton 

These error codes were defined on older versions of VxWorks (5, 6, 7
SR0540) and now they are either not defined or they fallback to
ENOENT. To handle these cases without using complex tests against
vxworks versions, leverage on __has_include and provide a fallback to
ENOENT if these error codes are not defined.

gcc/ada/

* sysdep.c (S_dosFsLib_FILE_NOT_FOUND, S_nfsLib_NFSERR_NOENT):
New macros, falback to ENOENT when not already defined.
(__gnat_is_file_not_found_error): Use these new macros to remove
tests against VxWorks flavors.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sysdep.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/sysdep.c b/gcc/ada/sysdep.c
index 443b11f4302..254c736bec4 100644
--- a/gcc/ada/sysdep.c
+++ b/gcc/ada/sysdep.c
@@ -35,18 +35,35 @@
 #ifdef __vxworks
 #include "vxWorks.h"
 #include "ioLib.h"
-#if ! defined (VTHREADS)
+/* VxWorks 5, 6 and 7 SR0540 expose error codes that need to be handled
+   as ENOENT. On later versions:
+   - either they are defined as ENOENT (vx7r2);
+   - or the corresponding system includes are not provided (Helix Cert).  */
+
+#if __has_include ("dosFsLib.h")
+/* On helix-cert, this include is only provided for RTPs.  */
 #include "dosFsLib.h"
 #endif
-#if ! defined (__RTP__) && (! defined (VTHREADS) || defined (__VXWORKSMILS__))
+
+#ifndef S_dosFsLib_FILE_NOT_FOUND
+#define S_dosFsLib_FILE_NOT_FOUND ENOENT
+#endif
+
+#if __has_include ("nfsLib.h")
+/* This include is not provided for RTPs or on helix-cert.  */
 # include "nfsLib.h"
 #endif
+
+#ifndef S_nfsLib_NFSERR_NOENT
+#define S_nfsLib_NFSERR_NOENT ENOENT
+#endif
+
 #include "selectLib.h"
 #include "version.h"
 #if defined (__RTP__)
 #  include "vwModNum.h"
 #endif /* __RTP__ */
-#endif
+#endif /* __vxworks */
 
 #ifdef __ANDROID__
 #undef __linux__
@@ -912,14 +929,10 @@ __gnat_is_file_not_found_error (int errno_val)
 /* In the case of VxWorks, we also have to take into account various
  * filesystem-specific variants of this error.
  */
-#if ! defined (VTHREADS) && (_WRS_VXWORKS_MAJOR < 7)
 else if (errno_val == S_dosFsLib_FILE_NOT_FOUND)
   return 1;
-#endif
-#if ! defined (__RTP__) && (! defined (VTHREADS) || defined (__VXWORKSMILS__))
 else if (errno_val ==  S_nfsLib_NFSERR_NOENT)
   return 1;
-#endif
 #if defined (__RTP__)
 /* An RTP can return an NFS file not found, and the NFS bits must
first be masked on to check the errno.  */
-- 
2.45.1

[COMMITTED 05/16] ada: Minor tweaks to processing of Aggregate aspect

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

The main one is to give the error for Aggregate applied to array types from
Analyze_Aspects_At_Freeze_Point instead of Check_Aspect_At_Freeze_Point, as
for the other aspects.  The message is also changed to be more direct.

gcc/ada/

* aspects.ads (Operational_Aspect): Alphabetize.
* sem_ch13.ads (Analyze_Aspects_At_Freeze_Point): Fix description.
* sem_ch13.adb (Analyze_Aspects_At_Freeze_Point) : Give
the error for array types here instead of...
(Analyze_Aspect_Specifications) : Adjust comment.
(Check_Aspect_At_Freeze_Point) : ...here.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/aspects.ads  |  4 ++--
 gcc/ada/sem_ch13.adb | 17 -
 gcc/ada/sem_ch13.ads |  9 +
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads
index 3cc62de3411..1acbec87824 100644
--- a/gcc/ada/aspects.ads
+++ b/gcc/ada/aspects.ads
@@ -325,12 +325,12 @@ package Aspects is
--  List is currently incomplete ???
 
Operational_Aspect : constant array (Aspect_Id) of Boolean :=
- (Aspect_Constant_Indexing  => True,
+ (Aspect_Aggregate  => True,
+  Aspect_Constant_Indexing  => True,
   Aspect_Default_Iterator   => True,
   Aspect_Iterator_Element   => True,
   Aspect_Iterable   => True,
   Aspect_Variable_Indexing  => True,
-  Aspect_Aggregate  => True,
   others=> False);
 
--  The following array indicates aspects for which multiple occurrences of
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 46a359fd7d6..caebe2e793e 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -1367,7 +1367,11 @@ package body Sem_Ch13 is
  Validate_Storage_Model_Type_Aspect (E, ASN);
 
   when Aspect_Aggregate =>
- null;
+ if Is_Array_Type (E) then
+Error_Msg_N
+  ("aspect Aggregate may not be applied to array type",
+   ASN);
+ end if;
 
   when others =>
  null;
@@ -1384,7 +1388,7 @@ package body Sem_Ch13 is
  Next_Rep_Item (ASN);
   end loop;
 
-  --  Make a second pass for a Full_Access_Only entry
+  --  Make a second pass for a Full_Access_Only entry, see above why
 
   ASN := First_Rep_Item (E);
   while Present (ASN) loop
@@ -4130,8 +4134,8 @@ package body Sem_Ch13 is
   end if;
 
when Aspect_Aggregate =>
-  --  We will be checking that the aspect is not specified on a
-  --  non-array type in Check_Aspect_At_Freeze_Point
+  --  We will be checking that the aspect is not specified on
+  --  an array type in Analyze_Aspects_At_Freeze_Point.
 
   Validate_Aspect_Aggregate (Expr);
 
@@ -11378,11 +11382,6 @@ package body Sem_Ch13 is
 return;
 
  when Aspect_Aggregate =>
-if Is_Array_Type (Entity (ASN)) then
-   Error_Msg_N
- ("aspect& can only be applied to non-array type",
-  Ident);
-end if;
 Resolve_Aspect_Aggregate (Entity (ASN), Expression (ASN));
 return;
 
diff --git a/gcc/ada/sem_ch13.ads b/gcc/ada/sem_ch13.ads
index 2bdca957826..aeacda833d1 100644
--- a/gcc/ada/sem_ch13.ads
+++ b/gcc/ada/sem_ch13.ads
@@ -312,10 +312,11 @@ package Sem_Ch13 is
--  Quite an awkward approach, but this is an awkard requirement
 
procedure Analyze_Aspects_At_Freeze_Point (E : Entity_Id);
-   --  Analyzes all the delayed aspects for entity E at freezing point. This
-   --  includes dealing with inheriting delayed aspects from the parent type
-   --  in the case where a derived type is frozen. Callers should check that
-   --  Has_Delayed_Aspects (E) is True before calling this routine.
+   --  Analyzes all the delayed aspects for entity E at the freeze point. Note
+   --  that this does not include dealing with inheriting delayed aspects from
+   --  the parent or base type in the case where a derived type or a subtype is
+   --  frozen. Callers should check that Has_Delayed_Aspects (E) is True before
+   --  calling this routine.
 
procedure Check_Aspects_At_End_Of_Declarations (E : Entity_Id);
--  Performs the processing described above at the freeze all point, and
-- 
2.45.1

[COMMITTED 08/16] ada: Minor tweak in Snames

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

gcc/ada/

* snames.ads-tmpl (Name_Present): Move to Repinfo section.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/snames.ads-tmpl | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/ada/snames.ads-tmpl b/gcc/ada/snames.ads-tmpl
index 699b8df5851..d2f724f86ca 100644
--- a/gcc/ada/snames.ads-tmpl
+++ b/gcc/ada/snames.ads-tmpl
@@ -903,10 +903,6 @@ package Snames is
Name_Warn   : constant Name_Id := N + $;
Name_Working_Storage: constant Name_Id := N + $;
 
-   --  used by Repinfo JSON I/O
-
-   Name_Present: constant Name_Id := N + $;
-
--  Names of recognized attributes. The entries with the comment "Ada 83"
--  are attributes that are defined in Ada 83, but not in Ada 95. These
--  attributes are implemented in all Ada modes in GNAT.
@@ -1372,6 +1368,7 @@ package Snames is
 
Name_Discriminant : constant Name_Id := N + $;
Name_Operands : constant Name_Id := N + $;
+   Name_Present  : constant Name_Id := N + $;
 
--  Other miscellaneous names used in front end
--  Note that the UP_ prefix means use the rest of the name in uppercase,
-- 
2.45.1

[COMMITTED 13/16] ada: Do not create null GCC thunks

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

This prevents Gigi from creating null GCC thunks, i.e. thunks that have all
their internal parameters set to zero, replacing them with aliases.  They
can arise in degenerate cases and null thunks would trip on an assertion in
former_thunk_p when they are later optimized.

gcc/ada/

PR ada/109817
* gcc-interface/trans.cc (maybe_make_gnu_thunk): Create an alias
instead of a null thunk.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 93978c0f0ba..5256095dfeb 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -11093,6 +11093,16 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree 
gnu_thunk)
   tree gnu_interface_offset
 = gnu_interface_tag ? byte_position (gnu_interface_tag) : NULL_TREE;
 
+  /* But we generate a call to the Thunk_Entity in the thunk.  */
+  tree gnu_target
+= gnat_to_gnu_entity (Thunk_Entity (gnat_thunk), NULL_TREE, false);
+
+  /* If the target is local, then thunk and target must have the same context
+ because cgraph_node::expand_thunk can only forward the static chain.  */
+  if (DECL_STATIC_CHAIN (gnu_target)
+  && DECL_CONTEXT (gnu_thunk) != DECL_CONTEXT (gnu_target))
+return false;
+
   /* There are three ways to retrieve the offset between the interface view
  and the base object.  Either the controlling type covers the interface
  type and the offset of the corresponding tag is fixed, in which case it
@@ -1,6 +11121,15 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree 
gnu_thunk)
   virtual_value = 0;
   virtual_offset = NULL_TREE;
   indirect_offset = 0;
+
+  /* Do not create a null thunk, instead make it an alias.  */
+  if (fixed_offset == 0)
+   {
+ SET_DECL_ASSEMBLER_NAME (gnu_thunk, DECL_ASSEMBLER_NAME (gnu_target));
+ (void) cgraph_node::get_create (gnu_target);
+ (void) cgraph_node::create_alias (gnu_thunk, gnu_target);
+ return true;
+   }
 }
   else if (!gnu_interface_offset
   && !Is_Variable_Size_Record (gnat_controlling_type))
@@ -11132,16 +11151,6 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree 
gnu_thunk)
   indirect_offset = (HOST_WIDE_INT) (POINTER_SIZE / BITS_PER_UNIT);
 }
 
-  /* But we generate a call to the Thunk_Entity in the thunk.  */
-  tree gnu_target
-= gnat_to_gnu_entity (Thunk_Entity (gnat_thunk), NULL_TREE, false);
-
-  /* If the target is local, then thunk and target must have the same context
- because cgraph_node::expand_thunk can only forward the static chain.  */
-  if (DECL_STATIC_CHAIN (gnu_target)
-  && DECL_CONTEXT (gnu_thunk) != DECL_CONTEXT (gnu_target))
-return false;
-
   /* If the target returns by invisible reference and is external, apply the
  same transformation as Subprogram_Body_to_gnu here.  */
   if (TREE_ADDRESSABLE (TREE_TYPE (gnu_target))
-- 
2.45.1

[COMMITTED 10/16] ada: Bad tree built for Obj.Discrim_Dep_Component'Loop_Entry in assertion

2024-06-14 Thread Marc Poulhiès

From: Steve Baird 

The Etype for an N_Selected_Component node usually should not match the Etype
of the referenced component if the component is subject to a
discriminant-dependent constraint. Instead Build_Actual_Subtype_Of_Component
should be called. Fix a case where this rule was not being followed (because
B_A_S_O_C is not called during preanalysis of a component selection), resulting
in a tree that confused CodePeer because the subtype was wrong.

gcc/ada/

* exp_attr.adb
(Expand_Loop_Entry_Attribute):
Ensure that Etype of the saved expression is set correctly.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_attr.adb | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 1396007a2d1..5c85b4912d2 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -1780,14 +1780,25 @@ package body Exp_Attr is
  begin
 Aux_Decl := Empty;
 
---  Generate a nominal type for the constant when the prefix is of
---  a constrained type. This is achieved by setting the Etype of
---  the relocated prefix to its base type. Since the prefix is now
---  the initialization expression of the constant, its freezing
---  will produce a proper nominal type.
-
 Temp_Expr := Relocate_Node (Pref);
-Set_Etype (Temp_Expr, Base_Typ);
+
+--  For Etype (Temp_Expr) in some cases we cannot use either
+--  Etype (Pref) or Base_Typ. So we set Etype (Temp_Expr) to null
+--  and mark Temp_Expr as requiring analysis. Rather than trying
+--  to sort out exactly when this is needed, we do it
+--  unconditionally.
+--  One case where this is needed is when
+-- 1) Pref is an N_Selected_Component name that
+--refers to a component which is subject to a
+--discriminant-dependent constraint; and
+-- 2) The prefix of that N_Selected_Component refers to a
+--formal parameter with an unconstrained subtype; and
+-- 3) Pref has only been preanalyzed (so that
+--Build_Actual_Subtype_Of_Component has not been called
+--and Etype (Pref) equals the Etype of the component).
+
+Set_Etype (Temp_Expr, Empty);
+Set_Analyzed (Temp_Expr, False);
 
 --  Generate:
 --Temp : constant Base_Typ := Pref;
-- 
2.45.1

[COMMITTED 12/16] ada: Typo and indentation fix

2024-06-14 Thread Marc Poulhiès

Fixes typo in comments and 2 instances of bad indentation.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity): Typo fix.
(gnat_to_gnu_component_type): Indent fix.
* gcc-interface/gigi.h (build_call_alloc_dealloc): Typo fix.
* gcc-interface/utils.cc (make_dummy_type): Typo fix.
* gcc-interface/utils2.cc (gnat_protect_expr): Indent fix.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc   | 8 
 gcc/ada/gcc-interface/gigi.h| 2 +-
 gcc/ada/gcc-interface/utils.cc  | 2 +-
 gcc/ada/gcc-interface/utils2.cc | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 8b72c96c439..23983742605 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -1384,7 +1384,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, 
bool definition)
volatile_flag = false;
gnu_size = NULL_TREE;
 
-   /* In case this was a aliased object whose nominal subtype is
+   /* In case this was an aliased object whose nominal subtype is
   unconstrained, the pointer above will be a thin pointer and
   build_allocator will automatically make the template.
 
@@ -2103,7 +2103,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, 
bool definition)
 
   1. the array type (suffix XUA) containing the actual data,
 
-  2. the template type (suffix XUB) containng the bounds,
+  2. the template type (suffix XUB) containing the bounds,
 
   3. the fat pointer type (suffix XUP) representing a pointer or a
  reference to the unconstrained array type:
@@ -5445,8 +5445,8 @@ gnat_to_gnu_component_type (Entity_Id gnat_array, bool 
definition,
  if (gnu_comp_align > TYPE_ALIGN (gnu_type))
gnu_comp_align = 0;
}
-   else
-gnu_comp_align = 0;
+  else
+   gnu_comp_align = 0;
 
   gnu_type = maybe_pad_type (gnu_type, gnu_comp_size, gnu_comp_align,
 gnat_array, true, definition, true);
diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h
index f3205a8a25d..6ed74d6879e 100644
--- a/gcc/ada/gcc-interface/gigi.h
+++ b/gcc/ada/gcc-interface/gigi.h
@@ -906,7 +906,7 @@ extern tree build_call_alloc_dealloc (tree gnu_obj, tree 
gnu_size,
  Entity_Id gnat_pool, Node_Id gnat_node);
 
 /* Build a GCC tree to correspond to allocating an object of TYPE whose
-   initial value if INIT, if INIT is nonzero.  Convert the expression to
+   initial value is INIT, if INIT is nonzero.  Convert the expression to
RESULT_TYPE, which must be some type of pointer.  Return the tree.
 
GNAT_PROC and GNAT_POOL optionally give the procedure to call and
diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index ae520542ace..771cb1a17ca 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -499,7 +499,7 @@ make_dummy_type (Entity_Id gnat_type)
   if (No (gnat_equiv))
 gnat_equiv = gnat_type;
 
-  /* If it there already a dummy type, use that one.  Else make one.  */
+  /* If there is already a dummy type, use that one.  Else make one.  */
   if (PRESENT_DUMMY_NODE (gnat_equiv))
 return GET_DUMMY_NODE (gnat_equiv);
 
diff --git a/gcc/ada/gcc-interface/utils2.cc b/gcc/ada/gcc-interface/utils2.cc
index 4b7e2739f6a..70271cf2836 100644
--- a/gcc/ada/gcc-interface/utils2.cc
+++ b/gcc/ada/gcc-interface/utils2.cc
@@ -2884,7 +2884,7 @@ gnat_protect_expr (tree exp)
   if (code == NON_LVALUE_EXPR
   || CONVERT_EXPR_CODE_P (code)
   || code == VIEW_CONVERT_EXPR)
-  return build1 (code, type, gnat_protect_expr (TREE_OPERAND (exp, 0)));
+return build1 (code, type, gnat_protect_expr (TREE_OPERAND (exp, 0)));
 
   /* If we're indirectly referencing something, we only need to protect the
  address since the data itself can't change in these situations.  */
-- 
2.45.1

[COMMITTED 11/16] ada: Fix parts of classification of aspects

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

Many aspects are (correctly) marked as GNAT-specific but nevertheless not
listed in the Implementation_Defined_Aspect array, so this aligns the two
sides and also removes Default_Initial_Condition and Object_Size from the
list, since they are defined in Ada 2022.

This also moves No_Controlled_Parts and No_Task_Parts to the subclass of
boolean aspects, and completes the list of nonoverridable aspects defined
in Ada 2022.

gcc/ada/

* aspects.ads (Aspect_Id): Alphabetize, remove the GNAT tag from
Default_Initial_Condition and Object_Size, move No_Controlled_Parts
and No_Task_Parts to boolean subclass.
(Nonoverridable_Aspect_Id): Add missing Ada 2022 aspects.
(Implementation_Defined_Aspect): Add all missing aspects, remove
Max_Entry_Queue_Length and Object_Size
(Aspect_Argument): Remove specific entries for No_Controlled_Parts
and No_Task_Parts, list boolean aspects last.
(Is_Representation_Aspect ): Move boolean aspects last.
(Aspect_Names): Alphabetize.
* sem_ch13.adb (Analyze_Aspect_Disable_Controlled): Adjust.
(Analyze_Aspect_Specifications): Move around processing for
No_Controlled_Parts and No_Task_Parts.
(Check_Aspect_At_Freeze_Point): Remove specific entries for
No_Controlled_Parts and No_Task_Parts

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/aspects.ads  | 94 
 gcc/ada/sem_ch13.adb | 69 +++-
 2 files changed, 101 insertions(+), 62 deletions(-)

diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads
index d4aafb1a4f1..202d42193d1 100644
--- a/gcc/ada/aspects.ads
+++ b/gcc/ada/aspects.ads
@@ -64,10 +64,14 @@ with Types;   use Types;
 
 package Aspects is
 
-   --  Type defining recognized aspects
+   --  Type enumerating the recognized aspects. The GNAT tag must be in keeping
+   --  with the Implementation_Defined_Aspect array below.
 
type Aspect_Id is
  (No_Aspect,-- Dummy entry for no aspect
+
+  --  The following aspects do not have a (static) boolean value
+
   Aspect_Abstract_State,-- GNAT
   Aspect_Address,
   Aspect_Aggregate,
@@ -81,7 +85,7 @@ package Aspects is
   Aspect_Convention,
   Aspect_CPU,
   Aspect_Default_Component_Value,
-  Aspect_Default_Initial_Condition, -- GNAT
+  Aspect_Default_Initial_Condition,
   Aspect_Default_Iterator,
   Aspect_Default_Storage_Pool,
   Aspect_Default_Value,
@@ -104,8 +108,8 @@ package Aspects is
   Aspect_Integer_Literal,
   Aspect_Interrupt_Priority,
   Aspect_Invariant, -- GNAT
-  Aspect_Iterator_Element,
   Aspect_Iterable,  -- GNAT
+  Aspect_Iterator_Element,
   Aspect_Link_Name,
   Aspect_Linker_Section,-- GNAT
   Aspect_Local_Restrictions,-- GNAT
@@ -113,9 +117,7 @@ package Aspects is
   Aspect_Max_Entry_Queue_Depth, -- GNAT
   Aspect_Max_Entry_Queue_Length,
   Aspect_Max_Queue_Length,  -- GNAT
-  Aspect_No_Controlled_Parts,
-  Aspect_No_Task_Parts, -- GNAT
-  Aspect_Object_Size,   -- GNAT
+  Aspect_Object_Size,
   Aspect_Obsolescent,   -- GNAT
   Aspect_Output,
   Aspect_Part_Of,   -- GNAT
@@ -186,10 +188,10 @@ package Aspects is
   Aspect_Atomic,
   Aspect_Atomic_Components,
   Aspect_Constant_After_Elaboration,-- GNAT
-  Aspect_Disable_Controlled,-- GNAT
-  Aspect_Discard_Names,
   Aspect_CUDA_Device,   -- GNAT
   Aspect_CUDA_Global,   -- GNAT
+  Aspect_Disable_Controlled,-- GNAT
+  Aspect_Discard_Names,
   Aspect_Effective_Reads,   -- GNAT
   Aspect_Effective_Writes,  -- GNAT
   Aspect_Exclusive_Functions,
@@ -206,9 +208,11 @@ package Aspects is
   Aspect_Interrupt_Handler,
   Aspect_Lock_Free, -- GNAT
   Aspect_No_Caching,-- GNAT
+  Aspect_No_Controlled_Parts,
   Aspect_No_Inline, -- GNAT
   Aspect_No_Return,
   Aspect_No_Tagged_Streams, -- GNAT
+  Aspect_No_Task_Parts, -- GNAT
   Aspect_Pack,
   Aspect_Persistent_BSS,-- GNAT
   Aspect_Preelaborable_Initialization,
@@ -242,12 +246,13 @@ package Aspects is
  | Aspect_Constant_Indexing
  | Aspect_Default_Iterator
  | Aspect_Implicit_Dereference
+ | Aspect_Integer_Literal
  | Aspect_Iterator_Element
  | Aspect_Max_Entry_Queue_Length
  | A

[COMMITTED 16/16] ada: Do not include target-specific makefile fragments

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

They are unused in this context.

gcc/ada/

* gcc-interface/Makefile.in (tmake_file): Remove all references.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/Makefile.in | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/ada/gcc-interface/Makefile.in 
b/gcc/ada/gcc-interface/Makefile.in
index 0666fc00bb8..29db89c6f52 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -148,7 +148,6 @@ host_vendor=@host_vendor@
 host_os=@host_os@
 target_cpu_default = @target_cpu_default@
 xmake_file = @xmake_file@
-tmake_file = @tmake_file@
 #version=`sed -e 's/.*\"\([^ \"]*\)[ \"].*/\1/' < $(srcdir)/version.c`
 #mainversion=`sed -e 's/.*\"\([0-9]*\.[0-9]*\).*/\1/' < $(srcdir)/version.c`
 
@@ -209,11 +208,6 @@ all: all.indirect
 # This tells GNU Make version 3 not to put all variables in the environment.
 .NOEXPORT:
 
-# target overrides
-ifneq ($(tmake_file),)
-include $(tmake_file)
-endif
-
 # host overrides
 ifneq ($(xmake_file),)
 include $(xmake_file)
-- 
2.45.1

[COMMITTED 07/16] ada: Add prototype for mutably tagged types

2024-06-14 Thread Marc Poulhiès

From: Justin Squirek 

This patch implements mutably tagged types via the new Size'Class aspect.

gcc/ada/

* doc/gnat_rm/gnat_language_extensions.rst: Add documentation for
mutably tagged type feature.
* aspects.ads: Add registration for 'Size'Class.
* einfo.ads: Add documentation for new components
Class_Wide_Equivalent_Type and Is_Mutably_Tagged_Type.
* exp_aggr.adb (Gen_Assign): Assume associated mutably tagged type
when class-wide equivalent type is encountered.
(Contains_Mutably_Tagged_Type): New subprogram.
(Convert_To_Positional): Assume associated mutably tagged type
when class-wide equivalent type is encountered.
(Is_Static_Element): Assume associated mutably tagged type when
class-wide equivalent type is encountered.
(Expand_Array_Aggregate): Assume associated mutably tagged type
when class-wide equivalent type is encountered.
(Expand_Record_Aggregate): Force mutably tagged records to be
expanded into assignments.
* exp_ch3.adb (Build_Array_Init_Proc): Assume associated mutably
tagged type when class-wide equivalent type is encountered.
(Simple_Initialization_OK): Disallow simple initialization for
class-wide equivalent types.
(Build_Init_Statements): Assume associated mutably tagged type
when class-wide equivalent type is encountered.
(Expand_Freeze_Array_Type): Ignore building of record init procs
for mutably tagged types.
(Expand_N_Full_Type_Declaration): Replace mutably tagged type
declarations with their associated class-wide equivalent types.
(Default_Initialize_Object): Add special handling for mutably
tagged types.
* exp_ch4.adb (Expand_N_Allocator): Add initialization for mutably
tagged types.
(Expand_Record_Equality): Generate mutably tagged unchecked
conversions.
* exp_ch5.adb (Expand_N_Assignment_Statement): Generate a special
assignment case for class-wide equivalent types which does tag
assignments and ignores certain checks.
* exp_ch6.adb (Expand_Call_Helper): Propagate constrained extra
formal actuals for mutably tagged types.
* exp_ch7.adb (Make_Init_Call): Handle mutably tagged type
initialization.
* exp_util.adb (Make_CW_Equivalent_Type): Modify to handle mutably
tagged objects which contain no initialization expression.
(Make_Subtype_From_Expr): Modify call to Make_CW_Equivalent_Type.
* exp_util.ads (Make_CW_Equivalent_Type): Move declaration from
body to spec.
* freeze.adb (Size_Known): No longer return false automatically
when a class-wide type is encountered.
(Freeze_Entity): Ignore error messages about size not being known
for mutably tagged types.
* gen_il-fields.ads: Register new fields
Class_Wide_Equivalent_Type and Is_Mutably_Tagged_Type.
* gen_il-gen-gen_entities.adb: Register new fields
Class_Wide_Equivalent_Type and Is_Mutably_Tagged_Type for type
entities.
* mutably_tagged.adb, mutably_tagged.ads
(Corresponding_Mutably_Tagged_Type): New subprogram.
(Depends_On_Mutably_Tagged_Ext_Comp): New subprogram.
(Get_Corresponding_Mutably_Tagged_Type_If_Present): New
subprogram.
(Get_Corresponding_Tagged_Type_If_Present): New subprogram.
(Is_Mutably_Tagged_Conversion): New subprogram.
(Is_Mutably_Tagged_CW_Equivalent_Type): New subprogram.
(Make_Mutably_Tagged_Conversion): New subprogram.
(Make_CW_Size_Compile_Check): New subprogram.
(Make_Mutably_Tagged_CW_Check): New subprogram.
* sem_aggr.adb (Resolve_Array_Aggregate): Skip tag checks for
class-wide equivalent types.
(Resolve_Aggr_Expr): Assume associated mutably tagged type when
class-wide equivalent type is encountered.
* sem_attr.adb (Analyze_Attribute): Allow 'Tag on mutably tagged
types.
(Resolve_Attribute): Detect errors for dependence of mutably
tagged extension type component.
* sem_ch12.adb (Instantiate_Object): Detect errors for dependence
of mutably tagged extension type component.
* sem_ch13.adb (Analyze_One_Aspect): Propagate 'Size'Class to
class-wide type.
(Analyze_Attribute_Definition_Clause): Add handling of 'Size'Class
by generating class-wide equivalent types and checking for illegal
uses.
* sem_ch2.adb (Analyze_Identifier): Generate unchecked conversion
for class-wide equivalent types.
* sem_ch3.adb (Analyze_Component_Declaration): Avoid unconstrained
errors on mutably tagged types.
(Analyze_Object_Declaration): Rewrite declarations of mutably
tagged types to use class-wide equivalent types.
(Array_Type_Declaration): Modify arra

[COMMITTED 15/16] ada: Fix return mechanism reported by -gnatRm

2024-06-14 Thread Marc Poulhiès

From: Eric Botcazou 

The return mechanism of functions is reported when the -gnatRm switch is
specified, but it is incorrect when the result type is not a by-reference
type in the language sense but is nevertheless returned by reference.

gcc/ada/

* gcc-interface/decl.cc: Include function.h.
(gnat_to_gnu_param): Minor comment tweaks.
(gnat_to_gnu_subprog_type): Take into account the default for the
computation of the return mechanism.  Give a warning if a by-copy
specified mechanism cannot be honored.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 23983742605..aa31a18 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -27,6 +27,7 @@
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
+#include "function.h"
 #include "tree.h"
 #include "gimple-expr.h"
 #include "stringpool.h"
@@ -5703,6 +5704,7 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree 
gnu_param_type, bool first,
 
   input_location = saved_location;
 
+  /* Warn if we are asked to pass by copy but cannot.  */
   if (mech == By_Copy && (by_ref || by_component_ptr))
 post_error ("??cannot pass & by copy", gnat_param);
 
@@ -5735,12 +5737,13 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree 
gnu_param_type, bool first,
   DECL_RESTRICTED_ALIASING_P (gnu_param) = restricted_aliasing_p;
   Sloc_to_locus (Sloc (gnat_param), &DECL_SOURCE_LOCATION (gnu_param));
 
-  /* If no Mechanism was specified, indicate what we're using, then
- back-annotate it.  */
+  /* If no Mechanism was specified, indicate what we will use.  */
   if (mech == Default)
 mech = (by_ref || by_component_ptr) ? By_Reference : By_Copy;
 
+  /* Back-annotate the mechanism in all cases.  */
   Set_Mechanism (gnat_param, mech);
+
   return gnu_param;
 }
 
@@ -6129,11 +6132,6 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool 
definition,
  associate_subprog_with_dummy_type (gnat_subprog, gnu_return_type);
  incomplete_profile_p = true;
}
-
-  if (kind == E_Function)
-   Set_Mechanism (gnat_subprog, return_by_direct_ref_p
-|| return_by_invisi_ref_p
-? By_Reference : By_Copy);
 }
 
   /* A procedure (something that doesn't return anything) shouldn't be
@@ -6636,6 +6634,28 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool 
definition,
  if (warn_shadow)
post_error ("'G'C'C builtin not found for&!??", gnat_subprog);
}
+
+  /* Finally deal with the return mechanism for a function.  */
+  if (kind == E_Function)
+   {
+ /* We return by reference either if this is required by the semantics
+of the language or if this is the default for the function.  */
+ const bool by_ref = return_by_direct_ref_p
+ || return_by_invisi_ref_p
+ || aggregate_value_p (gnu_return_type, gnu_type);
+ Mechanism_Type mech = Mechanism (gnat_subprog);
+
+ /* Warn if we are asked to return by copy but cannot.  */
+ if (mech == By_Copy && by_ref)
+   post_error ("??cannot return from & by copy", gnat_subprog);
+
+ /* If no mechanism was specified, indicate what we will use.  */
+ if (mech == Default)
+   mech = by_ref ? By_Reference : By_Copy;
+
+ /* Back-annotate the mechanism in all cases.  */
+ Set_Mechanism (gnat_subprog, mech);
+   }
 }
 
   *param_list = gnu_param_list;
-- 
2.45.1

[COMMITTED 14/16] ada: Skip subprogram body entities inside scopes

2024-06-14 Thread Marc Poulhiès

From: Yannick Moy 

Entities of kind E_Subprogram_Body, used on bodies of subprograms for
which there is a separate declaration, have been added in the entities
linked from a scope in order to get the representation information on
their enclosed object and type declarations. Skip these entities in gigi.

gcc/ada/

* gcc-interface/trans.cc (elaborate_all_entities_for_package)
(process_freeze_entity): Skip entities of kind E_Subprogram_Body.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 5256095dfeb..e68fb3fd776 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -9321,6 +9321,10 @@ elaborate_all_entities_for_package (Entity_Id 
gnat_package)
   if (kind == E_Package_Body)
continue;
 
+  /* Skip subprogram bodies.  */
+  if (kind == E_Subprogram_Body)
+   continue;
+
   /* Skip limited views that point back to the main unit.  */
   if (IN (kind, Incomplete_Kind)
  && From_Limited_With (gnat_entity)
@@ -9427,6 +9431,10 @@ process_freeze_entity (Node_Id gnat_node)
   if (Is_Subprogram (gnat_entity) && Present (Interface_Alias (gnat_entity)))
 return;
 
+  /* Skip subprogram bodies.  */
+  if (kind == E_Subprogram_Body)
+return;
+
   /* Check for an old definition if this isn't an object with address clause,
  since the saved GCC tree is the address expression in that case.  */
   gnu_old
-- 
2.45.1

Re: [Patch, fortran] PR59104

2024-06-14 Thread Andre Vehreschild

Hi Paul,

to me this looks fine. Thanks for the patch. Me having been away for some time
from gfortran, I recommend you wait for Harald's ok, too.

Regards,
Andre

On Thu, 13 Jun 2024 22:43:03 +0100
Paul Richard Thomas  wrote:

> Hi Both,
>
> Thanks for the highly constructive comments. I think that I have
> incorporated them fully in the attached.
>
> OK for mainline and ...?
>
> Paul
>
>
> On Mon, 10 Jun 2024 at 08:19, Andre Vehreschild  wrote:
>
> > Hi Paul,
> >
> > while looking at your patch I see calls to gfc_add_init_cleanup (...,
> > back),
> > while the function signature is gfc_add_init_cleanup (..., bool front).
> > This
> > slightly confuses me. I would at least expect to see
> > gfc_add_init_cleanup(...,
> > !back) calls. Just to get the semantics right.
> >
> > Then I wonder why not doing:
> >
> > diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc
> > index bafe8cbc5bc..97ace8c778e 100644
> > --- a/gcc/fortran/dependency.cc
> > +++ b/gcc/fortran/dependency.cc
> > @@ -2497,3 +2497,63 @@ gfc_omp_expr_prefix_same (gfc_expr *lexpr, gfc_expr
> > *rexpr)
> >return true;
> >  }
> > +
> > +
> > +/* gfc_function_dependency returns true for non-dummy symbols with
> > dependencies
> > +   on an old-fashioned function result (ie. proc_name =
> > proc_name->result).
> > +   This is used to ensure that initialization code appears after the
> > function
> > +   result is treated and that any mutual dependencies between these
> > symbols are
> > +   respected.  */
> > +
> > +static bool
> > +dependency_fcn (gfc_expr *e, gfc_symbol *sym,
> > +int *f ATTRIBUTE_UNUSED)
> > +{
> > +  return (e && e->expr_type == EXPR_VARIABLE
> > +  && e->symtree
> > +  && e->symtree->n.sym == sym);
> > +}
> >
> > Instead of the multiple if-statements?
> >
> > +
> > +bool
> > +gfc_function_dependency (gfc_symbol *sym, gfc_symbol *proc_name)
> > +{
> > +  bool front = false;
> > +
> > +  if (proc_name && proc_name->attr.function
> > +  && proc_name == proc_name->result
> > +  && !(sym->attr.dummy || sym->attr.result))
> > +{
> > +  if (sym->as && sym->as->type == AS_EXPLICIT)
> > +   {
> > + for (int dim = 0; dim < sym->as->rank; dim++)
> > +   {
> > + if (sym->as->lower[dim]
> > + && sym->as->lower[dim]->expr_type != EXPR_CONSTANT)
> > +   front = gfc_traverse_expr (sym->as->lower[dim], proc_name,
> > +  dependency_fcn, 0);
> > + if (front)
> > +   break;
> > + if (sym->as->upper[dim]
> > + && sym->as->upper[dim]->expr_type != EXPR_CONSTANT)
> > +   front = gfc_traverse_expr (sym->as->upper[dim], proc_name,
> > +  dependency_fcn, 0);
> > + if (front)
> > +   break;
> > +   }
> > +   }
> > +
> > +  if (sym->ts.type == BT_CHARACTER
> > + && sym->ts.u.cl && sym->ts.u.cl->length
> > + && sym->ts.u.cl->length->expr_type != EXPR_CONSTANT)
> > +   front = gfc_traverse_expr (sym->ts.u.cl->length, proc_name,
> > +  dependency_fcn, 0);
> >
> > This can overwrite a previous front == true, right? Is this intended?
> >
> > +}
> > +  return front;
> > + }
> >
> > The rest - besides the front-back confusion - looks fine to me. Thanks for
> > the
> > patch.
> >
> > Regards,
> > Andre
> >
> > On Sun, 9 Jun 2024 07:14:39 +0100
> > Paul Richard Thomas  wrote:
> >
> > > Hi All,
> > >
> > > The attached fixes a problem that, judging by the comments, has been
> > looked
> > > at periodically over the last ten years but just looked to be too
> > > fiendishly complicated to fix. This is not in small part because of the
> > > confusing ordering of dummies in the tlink chain and the unintuitive
> > > placement of all deferred initializations to the front of the init chain
> > in
> > > the wrapped block.
> > >
> > > The result of the existing ordering is that the initialization code for
> > > non-dummy variables that depends on the function result occurs before any
> > > initialization code for the function result itself. The fix ensures that:
> > > (i) These variables are placed correctly in the tlink chain, respecting
> > > inter-dependencies; and (ii) The dependent initializations are placed at
> > > the end of the wrapped block init chain.  The details appear in the
> > > comments in the patch. It is entirely possible that a less clunky fix
> > > exists but I failed to find it.
> > >
> > > OK for mainline?
> > >
> > > Regards
> > >
> > > Paul
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
> >


--
Andre Vehreschild * Email: vehre ad gmx dot de

[pushed] doc: Remove reference to Interix

2024-06-14 Thread Gerald Pfeifer

This apparently was missed when support for Interix was removed in 2016.

gcc:
PR target/69374
* doc/install.texi (Specific): Remove stale reference to Interix.
---
 gcc/doc/install.texi | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 0baba5e594d..38c93f067b5 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -5178,8 +5178,7 @@ SuperH (sh-wince-pe), and MIPS (mips-wince-pe).
 @subheading Other Windows Platforms
 GCC no longer supports Windows NT on the Alpha or PowerPC.
 
-GCC no longer supports the Windows POSIX subsystem.  However, it does
-support the Interix subsystem.  See above.
+GCC no longer supports the Windows POSIX subsystem.
 
 Old target names including *-*-winnt and *-*-windowsnt are no longer used.
 
-- 
2.45.2

RE: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction mismatch

2024-06-14 Thread Li, Pan2

Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, June 14, 2024 3:33 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix vec_extract v mode iterator restriction 
mismatch

LGTM, thanks :)

On Fri, Jun 14, 2024 at 3:02 PM  wrote:
>
> From: Pan Li 
>
> We have vec_extract pattern which takes ZVFHMIN as the mode
> iterator of the V mode.  Aka VF_ZVFHMIN iterator.  But it will
> expand to pred_extract_first pattern which takes the ZVFH as the mode
> iterator of the V mode.  AKa VF.  The mismatch will result in one ICE
> similar as below:
>
> insn 30 29 31 2 (set (reg:HF 156 [ _2 ])
> (unspec:HF [
> (vec_select:HF (reg:RVVMF2HF 134 [ _1 ])
> (parallel [
> (const_int 0 [0])
> ]))
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)) "compress_run-2.c":22:3 -1
>  (nil))
> during RTL pass: vregs
> compress_run-2.c:25:1: internal compiler error: in extract_insn, at
> recog.cc:2812
> 0xb3bc47 _fatal_insn(char const*, rtx_def const*, char const*, int, char
> const*)
> ../../../gcc/gcc/rtl-error.cc:108
> 0xb3bc69 _fatal_insn_not_found(rtx_def const*, char const*, int, char
> const*)
> ../../../gcc/gcc/rtl-error.cc:116
> 0xb3a545 extract_insn(rtx_insn*)
> ../../../gcc/gcc/recog.cc:2812
> 0x1010e9e instantiate_virtual_regs_in_insn
> ../../../gcc/gcc/function.cc:1612
> 0x1010e9e instantiate_virtual_regs
> ../../../gcc/gcc/function.cc:1995
> 0x1010e9e execute
> ../../../gcc/gcc/function.cc:2042
>
> The below test suites are passed for this patch.
> 1. The rv64gcv fully regression test.
> 2. The rv64gcv build with glibc.
>
> There may be other similar issue(s) for the mismatch,  we will take care
> of them by test cases one by one.
>
> PR target/115456
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Leverage V_ZVFH instead of V
> which contains the VF_ZVFHMIN for alignment.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr115456-2.c: New test.
> * gcc.target/riscv/rvv/base/pr115456-3.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/vector-iterators.md  |  4 ++-
>  .../gcc.target/riscv/rvv/base/pr115456-2.c| 31 +++
>  .../gcc.target/riscv/rvv/base/pr115456-3.c| 31 +++
>  3 files changed, 65 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index 47392d0da4c..43137a2a379 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -1578,9 +1578,11 @@ (define_mode_iterator VLS_ZVFH [VLSI VLSF])
>
>  (define_mode_iterator V [VI VF_ZVFHMIN])
>
> +(define_mode_iterator V_ZVFH [VI VF])
> +
>  (define_mode_iterator V_VLS [V VLS])
>
> -(define_mode_iterator V_VLS_ZVFH [V VLS_ZVFH])
> +(define_mode_iterator V_VLS_ZVFH [V_ZVFH VLS_ZVFH])
>
>  (define_mode_iterator V_VLSI [VI VLSI])
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
> new file mode 100644
> index 000..453e18b1c79
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-2.c
> @@ -0,0 +1,31 @@
> +/* Test there is no ICE when compile.  */
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvfhmin -mrvv-vector-bits=zvl -mabi=lp64d 
> -O3 -ftree-vectorize" } */
> +
> +#include 
> +#include 
> +
> +typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
> +
> +vnx4f __attribute__ ((noinline, noclone))
> +test_5 (vnx4f x, vnx4f y)
> +{
> +  return __builtin_shufflevector (x, y, 1, 3, 6, 7);
> +}
> +
> +int
> +main (void)
> +{
> +  vnx4f test_5_x = {0, 1, 3, 4};
> +  vnx4f test_5_y = {4, 5, 6, 7};
> +  vnx4f test_5_except = {1, 4, 6, 7};
> +  vnx4f test_5_real;
> +  test_5_real = test_5 (test_5_x, test_5_y);
> +
> +  for (int i = 0; i < 4; i++)
> +assert (test_5_real[i] == test_5_except[i]);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {call\s+__extendhfsf2} 8 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
> new file mode 100644
> index 000..2c54f1d7538
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115456-3.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize" } */
> +
> +#include 
> +#include 
> +
> +typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
> +
> +vnx4f __attribute__ ((noinline, noclone))
> +test_5 (vnx4f x, vnx4f y)
> +{
> +  return __builtin_shufflev

Re: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-14 Thread Richard Biener

On Wed, Jun 12, 2024 at 2:38 PM  wrote:
>
> From: Pan Li 
>
> After we support the scalar unsigned form 1 and 2,  we would like
> to introduce more forms include the branch and branchless.  There
> are forms 3-10 list as below:
>
> Form 3:
>   #define SAT_SUB_U_3(T) \
>   T sat_sub_u_3_##T (T x, T y) \
>   { \
> return x > y ? x - y : 0; \
>   }
>
> Form 4:
>   #define SAT_SUB_U_4(T) \
>   T sat_sub_u_4_##T (T x, T y) \
>   { \
> return x >= y ? x - y : 0; \
>   }
>
> Form 5:
>   #define SAT_SUB_U_5(T) \
>   T sat_sub_u_5_##T (T x, T y) \
>   { \
> return x < y ? 0 : x - y; \
>   }
>
> Form 6:
>   #define SAT_SUB_U_6(T) \
>   T sat_sub_u_6_##T (T x, T y) \
>   { \
> return x <= y ? 0 : x - y; \
>   }
>
> Form 7:
>   #define SAT_SUB_U_7(T) \
>   T sat_sub_u_7_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)(overflow - 1); \
>   }
>
> Form 8:
>   #define SAT_SUB_U_8(T) \
>   T sat_sub_u_8_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)-(!overflow); \
>   }
>
> Form 9:
>   #define SAT_SUB_U_9(T) \
>   T sat_sub_u_9_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Form 10:
>   #define SAT_SUB_U_10(T) \
>   T sat_sub_u_10_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return !overflow ? ret : 0; \
>   }
>
> Take form 10 as example:
>
> SAT_SUB_U_10(uint64_t);
>
> Before this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   unsigned char _2;
>   uint8_t _3;
>   __complex__ unsigned char _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 == 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   3
> ;;4
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;3
>   # _3 = PHI <0(2), _1(3)>
>   return _3;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   uint8_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
>   return _3;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add more match for unsigned sat_sub.
> * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> func impl to match phi node for .SAT_SUB.
> (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
> for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 25 +++--
>  gcc/tree-ssa-math-opts.cc | 33 +
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cfe81e80b3..66e411b3359 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond (ge @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> @@ -3165,6 +3165,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub, case 5 (branchless bit_and with .SUB_OVERFLOW.  
> */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (bit_and:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
> +  (plus:c (imagpart @2) integer_minus_onep))

:c shouldn't be necessary on the plus

> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
> +/* Unsigned saturation sub, case 6 (branchless mult with .SUB_OVERFLOW.  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (mult:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
> +  (bit_xor:c (imagpart @2) integer_onep))

or on the bit_xor

OK with those changes.

Richard.

> + (if (INTEGRAL_TYPE_P (

Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-14 Thread Richard Biener

On Fri, Jun 14, 2024 at 8:58 AM Liu, Hongtao  wrote:
>
>
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, June 14, 2024 2:52 PM
> > To: Kong, Lingling 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Uros
> > Bizjak 
> > Subject: Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV
> >
> > On Fri, Jun 14, 2024 at 5:12 AM Kong, Lingling 
> > wrote:
> > >
> > > APX CFCMOV[1] feature implements conditionally faulting which means
> > > that all memory faults are suppressed when the condition code
> > > evaluates to false and load or store a memory operand. Now we could load
> > or store a memory operand may trap or fault for conditional move.
> > >
> > > In middle-end, now we don't support a conditional move if we knew that a
> > load from A or B could trap or fault.
> >
> > What's the cost of suppressing a fault?  ISTR that for example fault 
> > suppression
> > for vector masked load/store is quite expensive, so when this is for example
> Yes, avx512 masked load/store, the cost is expensive when memory is invalid.
> > done in a loop where there's always a fault that's suppressed you can see
> > 1000-fold slowdown.  I would suspect this is similar for cfcmov?  So how is 
> > this
> > reflected in the decision to if-convert?
> But for APXF, we were told the cost of invalid memory is as cheap as valid 
> ones.
> (Why else would this instructions be designed?)

Well - I wondered about this for the AVX512 case, so this isn't a good reason to
expect it to be any better for APXF ;)  But if you have confirmation
this is to be
expected (I would expect the silicon design with APX is finished at
this point, even
if actual hardware is still 2-3 years out), then fine - consider me
positively surprised ;)

Richard.

> >
> > > To enable CFCMOV, we add a target HOOK
> > > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
> > > in if-conversion pass to allow convert to cmov.
> > >
> > > All the changes passed bootstrap & regtest x86-64-pc-linux-gnu.
> > > We also tested spec with SDE and passed the runtime test.
> > >
> > > Ok for trunk?
> > >
> > > [1].https://www.intel.com/content/www/us/en/developer/articles/technic
> > > al/advanced-performance-extensions-apx.html
> > >
> > > Lingling Kong (3):
> > >   [APX CFCMOV] Add a new target hook:
> > TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
> > >   [APX CFCMOV] Support APX CFCMOV in if_convert pass
> > >   [APX CFCMOV] Support APX CFCMOV in backend
> > >
> > >  gcc/config/i386/i386-expand.cc   |  63 +
> > >  gcc/config/i386/i386-opts.h  |   4 +-
> > >  gcc/config/i386/i386.cc  |  33 ++-
> > >  gcc/config/i386/i386.h   |   1 +
> > >  gcc/config/i386/i386.md  |  53 +++-
> > >  gcc/config/i386/i386.opt |   3 +
> > >  gcc/config/i386/predicates.md|   7 +
> > >  gcc/doc/tm.texi  |   6 +
> > >  gcc/doc/tm.texi.in   |   2 +
> > >  gcc/ifcvt.cc | 247 ++-
> > >  gcc/target.def   |  11 +
> > >  gcc/targhooks.cc |   8 +
> > >  gcc/targhooks.h  |   1 +
> > >  gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c |  73 ++
> > > gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c |  40 +++
> > >  15 files changed, 539 insertions(+), 13 deletions(-)  create mode
> > > 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c
> > >
> > > --
> > > 2.31.1
> > >

RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-14 Thread Li, Pan2

Thanks Richard for comments.

> :c shouldn't be necessary on the plus
> or on the bit_xor
> OK with those changes.

Will remove the :c and commit it if there is no surprise from test suites.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 14, 2024 4:05 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

On Wed, Jun 12, 2024 at 2:38 PM  wrote:
>
> From: Pan Li 
>
> After we support the scalar unsigned form 1 and 2,  we would like
> to introduce more forms include the branch and branchless.  There
> are forms 3-10 list as below:
>
> Form 3:
>   #define SAT_SUB_U_3(T) \
>   T sat_sub_u_3_##T (T x, T y) \
>   { \
> return x > y ? x - y : 0; \
>   }
>
> Form 4:
>   #define SAT_SUB_U_4(T) \
>   T sat_sub_u_4_##T (T x, T y) \
>   { \
> return x >= y ? x - y : 0; \
>   }
>
> Form 5:
>   #define SAT_SUB_U_5(T) \
>   T sat_sub_u_5_##T (T x, T y) \
>   { \
> return x < y ? 0 : x - y; \
>   }
>
> Form 6:
>   #define SAT_SUB_U_6(T) \
>   T sat_sub_u_6_##T (T x, T y) \
>   { \
> return x <= y ? 0 : x - y; \
>   }
>
> Form 7:
>   #define SAT_SUB_U_7(T) \
>   T sat_sub_u_7_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)(overflow - 1); \
>   }
>
> Form 8:
>   #define SAT_SUB_U_8(T) \
>   T sat_sub_u_8_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)-(!overflow); \
>   }
>
> Form 9:
>   #define SAT_SUB_U_9(T) \
>   T sat_sub_u_9_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Form 10:
>   #define SAT_SUB_U_10(T) \
>   T sat_sub_u_10_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return !overflow ? ret : 0; \
>   }
>
> Take form 10 as example:
>
> SAT_SUB_U_10(uint64_t);
>
> Before this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   unsigned char _2;
>   uint8_t _3;
>   __complex__ unsigned char _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 == 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   3
> ;;4
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;3
>   # _3 = PHI <0(2), _1(3)>
>   return _3;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   uint8_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
>   return _3;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add more match for unsigned sat_sub.
> * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> func impl to match phi node for .SAT_SUB.
> (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
> for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 25 +++--
>  gcc/tree-ssa-math-opts.cc | 33 +
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cfe81e80b3..66e411b3359 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond (ge @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> @@ -3165,6 +3165,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub, case 5 (branchless bit_and with .SUB_OVERFLOW.  
> */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (bit_and:c (realpart (IFN_SUB_OVERFLO

[pushed] doc: Consolidate duplicate MOVBE listings for Intel CPUs

2024-06-14 Thread Gerald Pfeifer

It might be a good idea to sort these lists alphabetically - easier to 
findi specific entries if one is looking for it, and that also avoids such 
cases?

Pushed for now.

Gerald

gcc:
* doc/invoke.texi (x86 Options): Consolidate duplicate MOVBE
listings for haswell, broadwell, skylake, skylake-avx512,
cannonlake, icelake-client, icelake-server, cascadelake,
cooperlake, tigerlake, sapphirerapids, rocketlake, graniterapids,
and graniterapids-d options to -march.
---
 gcc/doc/invoke.texi | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 26e6a349d51..5d7a87fde86 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -34476,18 +34476,18 @@ SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, 
PCLMUL, FSGSBASE, RDRND
 and F16C instruction set support.
 
 @item haswell
-Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+Intel Haswell CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND,
 F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE and HLE instruction set support.
 
 @item broadwell
-Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+Intel Broadwell CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND,
 F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX and PREFETCHW
 instruction set support.
 
 @item skylake
-Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
+Intel Skylake CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND,
 F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES,
 CLFLUSHOPT, XSAVEC, XSAVES and SGX instruction set support.
@@ -34548,14 +34548,14 @@ ENQCMD, UINTR, AVXIFMA, AVXVNNIINT8, AVXNECONVERT, 
CMPCCXADD, AVXVNNIINT16,
 SHA512, SM3, SM4, USER_MSR and PREFETCHI instruction set support.
 
 @item skylake-avx512
-Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
+Intel Skylake Server CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3,
 SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE,
 RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW,
 AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, CLWB, AVX512VL, AVX512BW,
 AVX512DQ and AVX512CD instruction set support.
 
 @item cannonlake
-Intel Cannonlake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2,
+Intel Cannonlake Server CPU with 64-bit extensions, MMX, SSE, SSE2,
 SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL,
 FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX,
 PREFETCHW, AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, AVX512VL, AVX512BW,
@@ -34563,7 +34563,7 @@ AVX512DQ, AVX512CD, PKU, AVX512VBMI, AVX512IFMA and SHA 
instruction set
 support.
 
 @item icelake-client
-Intel Icelake Client CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
+Intel Icelake Client CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3,
 SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE,
 RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW,
 AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, AVX512VL, AVX512BW, AVX512DQ,
@@ -34571,7 +34571,7 @@ AVX512CD, PKU, AVX512VBMI, AVX512IFMA, SHA, AVX512VNNI, 
GFNI, VAES, AVX512VBMI2
 , VPCLMULQDQ, AVX512BITALG, RDPID and AVX512VPOPCNTDQ instruction set support.
 
 @item icelake-server
-Intel Icelake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
+Intel Icelake Server CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3,
 SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE,
 RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW,
 AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, AVX512VL, AVX512BW, AVX512DQ,
@@ -34580,21 +34580,21 @@ AVX512CD, PKU, AVX512VBMI, AVX512IFMA, SHA, 
AVX512VNNI, GFNI, VAES, AVX512VBMI2
 instruction set support.
 
 @item cascadelake
-Intel Cascadelake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, 
SSSE3,
+Intel Cascadelake CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND,
 F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES,
 CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, CLWB, AVX512VL, AVX512BW, AVX512DQ,
 AVX512CD and AVX512VNNI instruction set support.
 
 @item cooperlake
-Intel cooperlake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, 
SSSE3,
+Intel cooperlake CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
 SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND,
 F16C, AVX2, BMI, BMI2, LZCN

Re: PING^1 [PATCH 44/52] s390: New hook implementation s390_c_mode_for_floating_type

2024-06-14 Thread Andreas Krebbel


On 6/13/24 09:43, Kewen.Lin wrote:

Hi,

Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653382.html

BR,
Kewen

on 2024/6/3 11:01, Kewen Lin wrote:

This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in s390 port, and add new port specific hook
implementation s390_c_mode_for_floating_type.

gcc/ChangeLog:

* config/s390/s390.cc (s390_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/s390/s390.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Ok. Thanks!


Andreas




---
  gcc/config/s390/s390.cc | 15 +++
  gcc/config/s390/s390.h  |  3 ---
  2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index fa517bd3e77..117da36b3c0 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -18066,6 +18066,18 @@ s390_noce_conversion_profitable_p (rtx_insn *seq, 
struct noce_if_info *if_info)
return default_noce_conversion_profitable_p (seq, if_info);
  }
  
+/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode or DFmode

+   for TI_LONG_DOUBLE_TYPE which is for long double type, go with the
+   default one for the others.  */
+
+static machine_mode
+s390_c_mode_for_floating_type (enum tree_index ti)
+{
+  if (ti == TI_LONG_DOUBLE_TYPE)
+return TARGET_LONG_DOUBLE_128 ? TFmode : DFmode;
+  return default_mode_for_floating_type (ti);
+}
+
  /* Initialize GCC target structure.  */
  
  #undef  TARGET_ASM_ALIGNED_HI_OP

@@ -18382,6 +18394,9 @@ s390_noce_conversion_profitable_p (rtx_insn *seq, 
struct noce_if_info *if_info)
  #undef TARGET_NOCE_CONVERSION_PROFITABLE_P
  #define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
  
+#undef TARGET_C_MODE_FOR_FLOATING_TYPE

+#define TARGET_C_MODE_FOR_FLOATING_TYPE s390_c_mode_for_floating_type
+
  struct gcc_target targetm = TARGET_INITIALIZER;
  
  #include "gt-s390.h"

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 0ea8802..4a4dde1a9ba 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -396,9 +396,6 @@ extern const char *s390_host_detect_local_cpu (int argc, 
const char **argv);
  #define INT_TYPE_SIZE 32
  #define LONG_TYPE_SIZE (TARGET_64BIT ? 64 : 32)
  #define LONG_LONG_TYPE_SIZE 64
-#define FLOAT_TYPE_SIZE 32
-#define DOUBLE_TYPE_SIZE 64
-#define LONG_DOUBLE_TYPE_SIZE (TARGET_LONG_DOUBLE_128 ? 128 : 64)
  
  /* Work around target_flags dependency in ada/targtyps.cc.  */

  #define WIDEST_HARDWARE_FP_SIZE 64

Re: [PATCH] s390: testsuite: Fix nobp-table-jump-*.c

2024-06-14 Thread Andreas Krebbel


On Mon, Jun 03, 2024 at 03:43:39PM +0200, Stefan Schulze Frielinghaus wrote:

Starting with r14-5628-g53ba8d669550d3 interprocedural VRP became strong
enough in order to render these tests useless.  Fixed by disabling IPA.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-table-jump-inline-z10.c: Do not perform
IPA.
* gcc.target/s390/nobp-table-jump-inline-z900.c: Dito.
* gcc.target/s390/nobp-table-jump-z10.c: Dito.
* gcc.target/s390/nobp-table-jump-z900.c: Dito.
---
  Ok for mainline?


Ok. Thanks!


Andreas




  .../s390/nobp-table-jump-inline-z10.c | 42 +--
  .../s390/nobp-table-jump-inline-z900.c| 42 +--
  .../gcc.target/s390/nobp-table-jump-z10.c | 42 +--
  .../gcc.target/s390/nobp-table-jump-z900.c| 42 +--
  4 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c 
b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
index 8dfd7e4c786..121751166d0 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
@@ -4,29 +4,29 @@
  /* case-values-threshold will be set to 20 by the back-end when jump
 thunk are requested.  */
  
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }

-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
-int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
-int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
-int __attribute__((noinline,noclone)) foo11 (void) { return 11; }
-int __attribute__((noinline,noclone)) foo12 (void) { return 12; }
-int __attribute__((noinline,noclone)) foo13 (void) { return 13; }
-int __attribute__((noinline,noclone)) foo14 (void) { return 14; }
-int __attribute__((noinline,noclone)) foo15 (void) { return 15; }
-int __attribute__((noinline,noclone)) foo16 (void) { return 16; }
-int __attribute__((noinline,noclone)) foo17 (void) { return 17; }
-int __attribute__((noinline,noclone)) foo18 (void) { return 18; }
-int __attribute__((noinline,noclone)) foo19 (void) { return 19; }
-int __attribute__((noinline,noclone)) foo20 (void) { return 20; }
+int __attribute__((noipa)) foo1 (void) { return 1; }
+int __attribute__((noipa)) foo2 (void) { return 2; }
+int __attribute__((noipa)) foo3 (void) { return 3; }
+int __attribute__((noipa)) foo4 (void) { return 4; }
+int __attribute__((noipa)) foo5 (void) { return 5; }
+int __attribute__((noipa)) foo6 (void) { return 6; }
+int __attribute__((noipa)) foo7 (void) { return 7; }
+int __attribute__((noipa)) foo8 (void) { return 8; }
+int __attribute__((noipa)) foo9 (void) { return 9; }
+int __attribute__((noipa)) foo10 (void) { return 10; }
+int __attribute__((noipa)) foo11 (void) { return 11; }
+int __attribute__((noipa)) foo12 (void) { return 12; }
+int __attribute__((noipa)) foo13 (void) { return 13; }
+int __attribute__((noipa)) foo14 (void) { return 14; }
+int __attribute__((noipa)) foo15 (void) { return 15; }
+int __attribute__((noipa)) foo16 (void) { return 16; }
+int __attribute__((noipa)) foo17 (void) { return 17; }
+int __attribute__((noipa)) foo18 (void) { return 18; }
+int __attribute__((noipa)) foo19 (void) { return 19; }
+int __attribute__((noipa)) foo20 (void) { return 20; }
  
  
-int __attribute__((noinline,noclone))

+int __attribute__((noipa))
  bar (int a)
  {
int ret = 0;
diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c 
b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
index 46d2c54bcff..5ad0c72afc3 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
@@ -4,29 +4,29 @@
  /* case-values-threshold will be set to 20 by the back-end when jump
 thunk are requested.  */
  
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }

-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
-int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
-int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
-int __attribute__((noinline,noclone)) foo11 (void) {

[PATCH] gccrs: configure.ac: Fix quoting around ldl/lpthread checks

2024-06-14 Thread Arthur Cohen

ChangeLog:

* configure.ac: Add quoting around both variable checks performed, which
will fix the noise reported when libc contains both ldl and lpthread. A
typo around `pthread_create` being typed `pthread_crate` is also fixed.
* configure: Regenerate.
---
 configure| 10 +-
 configure.ac | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/configure b/configure
index a9ea5258f0f..46f00b476c5 100755
--- a/configure
+++ b/configure
@@ -8827,7 +8827,7 @@ fi
 
 
 
-# Rust requires -ldl and -lpthread if you are using an old glibc that does not 
include them by
+# Rust requires -ldl and -lpthread if you are using a libc which does not 
include them by
 # default, so we check for them here
 
 missing_rust_dynlibs=none
@@ -8945,15 +8945,15 @@ if test "$ac_res" != no; then :
 fi
 
 
-if test $ac_cv_search_dlopen = -ldl; then
+if test "$ac_cv_search_dlopen" = -ldl; then
 CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test $ac_cv_search_dlopen = no; then
+elif test "$ac_cv_search_dlopen" = no; then
 missing_rust_dynlibs="libdl"
 fi
 
-if test $ac_cv_search_pthread_create = -lpthread; then
+if test "$ac_cv_search_pthread_create" = -lpthread; then
 CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test $ac_cv_search_pthread_crate = no; then
+elif test "$ac_cv_search_pthread_create" = no; then
 missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
 fi
 
diff --git a/configure.ac b/configure.ac
index adb738ac346..0befcc01cd6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2036,7 +2036,7 @@ fi
 
 AC_SUBST(PICFLAG)
 
-# Rust requires -ldl and -lpthread if you are using an old glibc that does not 
include them by
+# Rust requires -ldl and -lpthread if you are using a libc which does not 
include them by
 # default, so we check for them here
 
 missing_rust_dynlibs=none
@@ -2044,15 +2044,15 @@ missing_rust_dynlibs=none
 AC_SEARCH_LIBS([dlopen], [dl])
 AC_SEARCH_LIBS([pthread_create], [pthread])
 
-if test $ac_cv_search_dlopen = -ldl; then
+if test "$ac_cv_search_dlopen" = -ldl; then
 CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test $ac_cv_search_dlopen = no; then
+elif test "$ac_cv_search_dlopen" = no; then
 missing_rust_dynlibs="libdl"
 fi
 
-if test $ac_cv_search_pthread_create = -lpthread; then
+if test "$ac_cv_search_pthread_create" = -lpthread; then
 CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test $ac_cv_search_pthread_crate = no; then
+elif test "$ac_cv_search_pthread_create" = no; then
 missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
 fi
 
-- 
2.42.0

Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-14 Thread Richard Sandiford

HAO CHEN GUI  writes:
>> The:
>> 
>>   bool ok = recog (attempt, use_change);
>> 
>> should leave INSN_CODE set to the result of the successful recog.
>> Why isn't that true in the example you hit?
>> 
>> I wondered whether we might be trying to cost a NOOP_MOVE_INSN_CODE,
>> since I couldn't see anything in the current code to stop that.
>> But if so, that's a bug.  NOOP_MOVE_INSN_CODE should have zero cost,
>> and shouldn't go through insn_cost.
>
> Yes, the recog sets the recog data for the new rtl. But
> changes_are_worthwhile calls change->old_cost and finally calls
> calculate_cost to get the insn cost for old rtl. The undo/redo_changes
> don't restore/recover the recog_data, which causes insn_cost takes the
> wrong number of operands (recog_data.n_operands) and wrong operands
> (recog_data.operand[]). Note, rs6000_insn_cost checks if the recog data
> is cached. If so, it doesn't recog the insn again.
>
> void
> insn_info::calculate_cost () const
> {
>   basic_block cfg_bb = BLOCK_FOR_INSN (m_rtl);
>   temporarily_undo_changes (0);
>   m_cost_or_uid = insn_cost (m_rtl, optimize_bb_for_speed_p (cfg_bb));
>   redo_changes (0);
> }
>
> // A simple debug output for calculate_cost
> before undo
> INSN CODE 850
> n_operands 3
>
> after undo
> INSN CODE 685
> n_operands 3
>
> Actually insn pattern 850 has 3 operands and insn pattern 685 has 2
> operands.
>
> IMHO, we should invalidate recog data after each undo/redo or before
> calling the insn cost.

Ah, ok.  If the problem is stale recog_data information, I think we
should instead make recog.cc:swap_change conditionally invalidate it.
I.e. change:

  if (changes[num].object && !MEM_P (changes[num].object))
std::swap (INSN_CODE (changes[num].object), changes[num].old_code);

to something like:

  if (changes[num].object && !MEM_P (changes[num].object))
{
  std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
  if (recog_data.insn == changes[num].object)
recot_data.insn = nullptr;
}

(untested).  IMO this is better then resetting the INSN_CODE, since the
INSN_CODE should be correct and is relatively expensive to recompute.

Thanks,
Richard

Re: [PATCH] build: Fix missing variable quotes

2024-06-14 Thread Arthur Cohen


Hi Collin,

Sorry about the mess. As Sam pointed out, there was already a patch 
proposed in the bugzilla PR which I've posted on the ML as well for 
review. The only difference with your patch is that it also changes 
ac_cv_search_pthread_crate to ac_cv_search_pthread_create, as otherwise 
the check will always be false.


Best,

Arthur

On 6/14/24 02:53, Collin Funk wrote:

When dlopen and pthread_create are in libc the variable is
set to "none required", therefore running configure will show
the following errors:

./configure: line 8997: test: too many arguments
./configure: line 8999: test: too many arguments
./configure: line 9003: test: too many arguments
./configure: line 9005: test: =: unary operator expected

ChangeLog:

* configure.ac: Quote variable result of AC_SEARCH_LIBS.
 * configure: Regenerate.

Signed-off-by: Collin Funk 
---
  configure| 10 +-
  configure.ac |  8 
  2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 51576a41f30..6e95b27d9df 100755
--- a/configure
+++ b/configure
@@ -8994,15 +8994,15 @@ if test "$ac_res" != no; then :
  fi
  
  
-if test $ac_cv_search_dlopen = -ldl; then

+if test "$ac_cv_search_dlopen" = -ldl; then
  CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test $ac_cv_search_dlopen = no; then
+elif test "$ac_cv_search_dlopen" = no; then
  missing_rust_dynlibs="libdl"
  fi
  
-if test $ac_cv_search_pthread_create = -lpthread; then

+if test "$ac_cv_search_pthread_create" = -lpthread; then
  CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test $ac_cv_search_pthread_crate = no; then
+elif test "$ac_cv_search_pthread_crate" = no; then
  missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
  fi
  
@@ -19746,7 +19746,7 @@ config.status

  configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"
  
-Copyright (C) 2012 Free Software Foundation, Inc.

+Copyright (C)  Free Software Foundation, Inc.
  This config.status script is free software; the Free Software Foundation
  gives unlimited permission to copy, distribute and modify it."
  
diff --git a/configure.ac b/configure.ac

index 5eda8dcdbf7..88576b31bfc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2045,15 +2045,15 @@ missing_rust_dynlibs=none
  AC_SEARCH_LIBS([dlopen], [dl])
  AC_SEARCH_LIBS([pthread_create], [pthread])
  
-if test $ac_cv_search_dlopen = -ldl; then

+if test "$ac_cv_search_dlopen" = -ldl; then
  CRAB1_LIBS="$CRAB1_LIBS -ldl"
-elif test $ac_cv_search_dlopen = no; then
+elif test "$ac_cv_search_dlopen" = no; then
  missing_rust_dynlibs="libdl"
  fi
  
-if test $ac_cv_search_pthread_create = -lpthread; then

+if test "$ac_cv_search_pthread_create" = -lpthread; then
  CRAB1_LIBS="$CRAB1_LIBS -lpthread"
-elif test $ac_cv_search_pthread_crate = no; then
+elif test "$ac_cv_search_pthread_crate" = no; then
  missing_rust_dynlibs="$missing_rust_dynlibs, libpthread"
  fi

Re: [PATCH] build: Fix missing variable quotes

2024-06-14 Thread YunQiang Su

Sam James  于2024年6月14日周五 09:02写道：
>
> Collin Funk  writes:
>
> > When dlopen and pthread_create are in libc the variable is
> > set to "none required", therefore running configure will show
> > the following errors:
> >
> > ./configure: line 8997: test: too many arguments
> > ./configure: line 8999: test: too many arguments
> > ./configure: line 9003: test: too many arguments
> > ./configure: line 9005: test: =: unary operator expected
> >
> > ChangeLog:
> >
> >   * configure.ac: Quote variable result of AC_SEARCH_LIBS.
> > * configure: Regenerate.
>
> This is PR115453 (which also needs to address a 'crate' typo).
>

I noticed another similar problem. I guess that we can put them in a
single patch:
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5317,7 +5327,7 @@ x:

 AC_MSG_CHECKING(assembler and linker for explicit JALR relocation)
 gcc_cv_as_ld_jalr_reloc=no
-if test $gcc_cv_as_mips_explicit_relocs = yes; then
+if test x$gcc_cv_as_mips_explicit_relocs = xyes; then
   if test $in_tree_ld = yes ; then
 if test "$gcc_cv_gld_major_version" -eq 2 -a
"$gcc_cv_gld_minor_version" -ge 20 -o "$gcc_cv_gld_major_version" -gt
2 \
&& test $in_tree_ld_is_elf = yes; then

[PATCH][v2] Support single def-use cycle optimization for SLP reduction vectorization

2024-06-14 Thread Richard Biener

We can at least mimic single def-use cycle optimization when doing
single-lane SLP reductions and that's required to avoid regressing
compared to non-SLP.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-loop.cc (vectorizable_reduction): Allow
single-def-use cycles with SLP.
(vect_transform_reduction): Handle SLP single def-use cycles.
(vect_transform_cycle_phi): Likewise.

* gcc.dg/vect/slp-reduc-12.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-reduc-12.c | 18 ++
 gcc/tree-vect-loop.cc| 45 ++--
 2 files changed, 45 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-reduc-12.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c 
b/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c
new file mode 100644
index 000..625f8097c54
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_hw_misalign } */
+/* { dg-additional-options "-Ofast" } */
+
+double foo (double *x, int * __restrict a, int n)
+{
+  double r = 0.;
+  for (int i = 0; i < n; ++i)
+{
+  a[i] = a[i] + i;
+  r += x[i];
+}
+  return r;
+}
+
+/* { dg-final { scan-tree-dump "using single def-use cycle for reduction" 
"vect" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bbd5d261907..d9a2ad69484 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8320,7 +8320,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
participating.  When unrolling we want each unrolled iteration to have its
own reduction accumulator since one of the main goals of unrolling a
reduction is to reduce the aggregate loop-carried latency.  */
-  if (ncopies > 1
+  if ((ncopies > 1
+   || (slp_node
+  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
+  && SLP_TREE_LANES (slp_node) == 1
+  && vect_get_num_copies (loop_vinfo, vectype_in) > 1))
   && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
   && reduc_chain_length == 1
   && loop_vinfo->suggested_unroll_factor == 1)
@@ -8373,6 +8377,10 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
single_defuse_cycle = false;
}
 }
+  if (dump_enabled_p () && single_defuse_cycle)
+dump_printf_loc (MSG_NOTE, vect_location,
+"using single def-use cycle for reduction by reducing "
+"multiple vectors to one in the loop body\n");
   STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info) = single_defuse_cycle;
 
   /* If the reduction stmt is one of the patterns that have lane
@@ -8528,9 +8536,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 {
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  int i;
-  int ncopies;
-  int vec_num;
+  unsigned ncopies;
+  unsigned vec_num;
 
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -8577,7 +8584,6 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   auto_vec vec_oprnds0;
   auto_vec vec_oprnds1;
   auto_vec vec_oprnds2;
-  tree def0;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
@@ -8652,20 +8658,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
  definition.  */
   if (single_defuse_cycle)
 {
-  gcc_assert (!slp_node);
-  vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
-op.ops[reduc_index],
-reduc_index == 0 ? &vec_oprnds0
-: (reduc_index == 1 ? &vec_oprnds1
-   : &vec_oprnds2));
+  vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, 1,
+reduc_index == 0 ? op.ops[0] : NULL_TREE, &vec_oprnds0,
+reduc_index == 1 ? op.ops[1] : NULL_TREE, &vec_oprnds1,
+reduc_index == 2 ? op.ops[2] : NULL_TREE,
+&vec_oprnds2);
 }
 
   bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
 
-  FOR_EACH_VEC_ELT (vec_oprnds0, i, def0)
+  unsigned num = (reduc_index == 0
+ ? vec_oprnds1.length () : vec_oprnds0.length ());
+  for (unsigned i = 0; i < num; ++i)
 {
   gimple *new_stmt;
-  tree vop[3] = { def0, vec_oprnds1[i], NULL_TREE };
+  tree vop[3] = { vec_oprnds0[i], vec_oprnds1[i], NULL_TREE };
   if (masked_loop_p && !mask_by_cond_expr)
{
  /* No conditional ifns have been defined for dot-product yet.  */
@@ -8720,10 +8727,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt, gsi);
}
 
-  if (slp_n

[PATCH] Adjust gcc.target/i386/vect-strided-3.c

2024-06-14 Thread Richard Biener

The following disables SSE4 instead of just AVX to avoid
pextrq being used, confusing the assembler scanning.  This
avoids the reported failure with -march=cascadelake but adds
a FAIL for -march=cascadelake -m32 (I've opened PR115487 for that).

Tested on x86_64-unknown-linux-gnu, pushed.

* gcc.target/i386/vect-strided-3.c: Disable SSE4 instead of AVX.
---
 gcc/testsuite/gcc.target/i386/vect-strided-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-3.c 
b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
index b462701a0b2..f9c54a6f715 100644
--- a/gcc/testsuite/gcc.target/i386/vect-strided-3.c
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse2 -mno-avx -fno-tree-slp-vectorize" } */
+/* { dg-options "-O2 -msse2 -mno-sse4 -fno-tree-slp-vectorize" } */
 
 void foo (int * __restrict a, int *b, int s)
 {
-- 
2.35.3

Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.

2024-06-14 Thread Jonathan Wakely


On 07/06/24 19:40 +0100, Roger Sayle wrote:


This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.


It doesn't though.  The C++17 guaranteed copy elision rules are not
relevant here.  This is about lookup for the constructor used in the
return statement, and whether that lookup considers the variable to be
an lvalue or an rvalue.  C++11 already says this is valid:

i#include 

std::unique_ptr f()
{
std::unique_ptr m;
return m;
}

See C++11 12.8 [class.copy] p31:

  This elision of copy/move operations, called copy elision, is
  permitted in the following circumstances (which may be combined to
  eliminate multiple copies):

  - in a return statement in a function with a class return type, when
  the expression is the name of a non-volatile automatic object (other
  than a function or catch-clause parameter) with the same cv-
  unqualified type as the function return type, the copy/move
  operation can be omitted by constructing the automatic object
  directly into the function’s return value

and then p32:

  When the criteria for elision of a copy operation are met or would
  be met save for the fact that the source object is a function
  parameter, and the object to be copied is designated by an lvalue,
  overload resolution to select the constructor for the copy is first
  performed as if the object were designated by an rvalue.

The constructor isn't required to be elided in C++11, but the compiler
is required to use a move constructor instead of a copy constructor,
if a move constructor is available. So you don't need to use std::move
on the return value.

And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even
with 4.7.1).

So the std::move calls you've added are redundant, and will cause
-Wredundant-move warnings.

What's the error you were seeing?



Bootstrapped on x86_64-pc-linux-gnu using both gcc 4.8.5 (system) and
gcc 10.2.1 (using "scl enable devetoolset-10") as host compilers.
Ok for mainline?


2024-06-07  Roger Sayle  

gcc/analyzer/ChangeLog
   * constraint-manager.cc (equiv_class::make_dump_widget): Use
   std::move to return a std::unique_ptr.
   (bounded_ranges_constraint::make_dump_widget): Likewise.
   (constraint_manager::make_dump_widget): Likewise.
   * program_state.cc (sm_state_map::make_dump_widget): Likewise.
   (program_state::make_dump_widget): Likewise.
   * region-model.cc (region_to_value_map::make_dump_widget): Likewise.
   (region_model::make_dump_widget): Likewise.
   * region.cc (region::make_dump_widget): Likewise.
   * store.cc (binding_cluster::make_dump_widget): Likewise.
   (store::make_dump_widget): Likewise.
   * svalue.cc (svalue::make_dump_widget): Likewise.

Thanks in advance,
Roger
--




diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index 707385d..883f33b 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -1176,7 +1176,7 @@ equiv_class::make_dump_widget (const 
text_art::dump_widget_info &dwi,
  ec_widget->add_child (tree_widget::make (dwi, &pp));
}

-  return ec_widget;
+  return std::move (ec_widget);
}

/* Generate a hash value for this equiv_class.
@@ -1500,7 +1500,7 @@ make_dump_widget (const text_art::dump_widget_info &dwi) 
const
(tree_widget::from_fmt (dwi, nullptr,
"ec%i bounded ranges", m_ec_id.as_int ()));
  m_ranges->add_to_dump_widget (*brc_widget.get (), dwi);
-  return brc_widget;
+  return std::move (brc_widget);
}

bool
@@ -1853,7 +1853,7 @@ constraint_manager::make_dump_widget (const 
text_art::dump_widget_info &dwi) con
  if (cm_widget->get_num_children () == 0)
return nullptr;

-  return cm_widget;
+  return std::move (cm_widget);
}

/* Attempt to add the constraint LHS OP RHS to this constraint_manager.
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index dc2d4bd..efaf569 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -382,7 +382,7 @@ sm_state_map::make_dump_widget (const 
text_art::dump_widget_info &dwi,
  state_widget->add_child (tree_widget::make (dwi, pp));
}

-  return state_widget;
+  return std::move (state_widget);
}

/* Return true if no states have been set within this map
@@ -1247,7 +1247,7 @@ program_state::make_dump_widget (const 
text_art::dump_widget_info &dwi) const
state_widget->add_child (smap->make_dump_widget (dwi, m_region_model));
  }

-  return state_widget;
+  return std::move (state_widget);
}

/* Update this program_state to reflect a top-level call to FUN.
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d142d85..4fbc970 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -288,7 +288,7 @@ make_dump_widget (const text_art::dump_wi

Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.

2024-06-14 Thread Jonathan Wakely


On 14/06/24 10:36 +0100, Jonathan Wakely wrote:

On 07/06/24 19:40 +0100, Roger Sayle wrote:


This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.


It doesn't though.  The C++17 guaranteed copy elision rules are not
relevant here.  This is about lookup for the constructor used in the
return statement, and whether that lookup considers the variable to be
an lvalue or an rvalue.  C++11 already says this is valid:

i#include 

std::unique_ptr f()
{
   std::unique_ptr m;
   return m;
}

See C++11 12.8 [class.copy] p31:

 This elision of copy/move operations, called copy elision, is
 permitted in the following circumstances (which may be combined to
 eliminate multiple copies):

 - in a return statement in a function with a class return type, when
 the expression is the name of a non-volatile automatic object (other
 than a function or catch-clause parameter) with the same cv-
 unqualified type as the function return type, the copy/move
 operation can be omitted by constructing the automatic object
 directly into the function’s return value

and then p32:

 When the criteria for elision of a copy operation are met or would
 be met save for the fact that the source object is a function
 parameter, and the object to be copied is designated by an lvalue,
 overload resolution to select the constructor for the copy is first
 performed as if the object were designated by an rvalue.

The constructor isn't required to be elided in C++11, but the compiler
is required to use a move constructor instead of a copy constructor,
if a move constructor is available. So you don't need to use std::move
on the return value.

And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even
with 4.7.1).

So the std::move calls you've added are redundant, and will cause
-Wredundant-move warnings.

What's the error you were seeing?


I can reproduce it:

/home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function 
‘std::unique_ptr ana::equiv_class::make_dump_widget(const 
text_art::dump_widget_info&, unsigned int) const’:
/home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind 
‘std::unique_ptr’ lvalue to 
‘std::unique_ptr&&’
   return ec_widget;
  ^

Odd ... I'm looking into it ...

[PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Richard Biener

The following retires vcond{,u,eq} optabs by stopping to use them
from the middle-end.  Targets instead (should) implement vcond_mask
and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
possibly affected targets - those implementing these patterns,
and in particular it lists mips, sparc and ia64 as targets that
most definitely will regress while others might simply remove
their vcond{,u,eq} patterns.

I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
I know riscv doesn't implement any of the legacy optabs.  But less
maintained vector targets might need adjustments.

I want to get rid of those optabs for GCC 15.  If I don't hear from
you I will assume your target is fine.

Thanks,
Richard.

PR middle-end/114189
* optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing.
(get_vcond_eq_icode): Likewise.
---
 gcc/optabs-query.h | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 0cb2c21ba85..31fbce80175 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode 
mask_mode)
mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  */
 
 inline enum insn_code
-get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns)
+get_vcond_icode (machine_mode, machine_mode, bool)
 {
-  enum insn_code icode = CODE_FOR_nothing;
-  if (uns)
-icode = convert_optab_handler (vcondu_optab, vmode, cmode);
-  else
-icode = convert_optab_handler (vcond_optab, vmode, cmode);
-  return icode;
+  return CODE_FOR_nothing;
 }
 
 /* Return insn code for a conditional operator with a mask mode
@@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode 
mmode)
mode CMODE (only EQ/NE), resulting in a value of mode VMODE.  */
 
 inline enum insn_code
-get_vcond_eq_icode (machine_mode vmode, machine_mode cmode)
+get_vcond_eq_icode (machine_mode, machine_mode)
 {
-  return convert_optab_handler (vcondeq_optab, vmode, cmode);
+  return CODE_FOR_nothing;
 }
 
 /* Enumerates the possible extraction_insn operations.  */
-- 
2.35.3

Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.

2024-06-14 Thread Jonathan Wakely


On 14/06/24 11:26 +0100, Jonathan Wakely wrote:

On 14/06/24 10:36 +0100, Jonathan Wakely wrote:

On 07/06/24 19:40 +0100, Roger Sayle wrote:


This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.


It doesn't though.  The C++17 guaranteed copy elision rules are not
relevant here.  This is about lookup for the constructor used in the
return statement, and whether that lookup considers the variable to be
an lvalue or an rvalue.  C++11 already says this is valid:

i#include 

std::unique_ptr f()
{
  std::unique_ptr m;
  return m;
}

See C++11 12.8 [class.copy] p31:

This elision of copy/move operations, called copy elision, is
permitted in the following circumstances (which may be combined to
eliminate multiple copies):

- in a return statement in a function with a class return type, when
the expression is the name of a non-volatile automatic object (other
than a function or catch-clause parameter) with the same cv-
unqualified type as the function return type, the copy/move
operation can be omitted by constructing the automatic object
directly into the function’s return value

and then p32:

When the criteria for elision of a copy operation are met or would
be met save for the fact that the source object is a function
parameter, and the object to be copied is designated by an lvalue,
overload resolution to select the constructor for the copy is first
performed as if the object were designated by an rvalue.

The constructor isn't required to be elided in C++11, but the compiler
is required to use a move constructor instead of a copy constructor,
if a move constructor is available. So you don't need to use std::move
on the return value.

And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even
with 4.7.1).

So the std::move calls you've added are redundant, and will cause
-Wredundant-move warnings.

What's the error you were seeing?


I can reproduce it:

/home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function 
‘std::unique_ptr ana::equiv_class::make_dump_widget(const 
text_art::dump_widget_info&, unsigned int) const’:
/home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind 
‘std::unique_ptr’ lvalue to 
‘std::unique_ptr&&’
  return ec_widget;
 ^

Odd ... I'm looking into it ...


Ah, the problem here is that the function's return type is
std::unique_ptr but the return value is
std::unique_ptr and that requires the converting
constructor, not the move constructor.

That *did* change after C++11, and isn't supported by G++ until 5.1

Is there a reason that function has to return unique_ptr
instead of unique_ptr? The conversion to ptr-to-base
could happen at the call site (where you always have an rvalue)
instead of on the return value.

Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.

2024-06-14 Thread Jonathan Wakely


On 14/06/24 11:37 +0100, Jonathan Wakely wrote:

On 14/06/24 11:26 +0100, Jonathan Wakely wrote:

On 14/06/24 10:36 +0100, Jonathan Wakely wrote:

On 07/06/24 19:40 +0100, Roger Sayle wrote:


This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.


It doesn't though.  The C++17 guaranteed copy elision rules are not
relevant here.  This is about lookup for the constructor used in the
return statement, and whether that lookup considers the variable to be
an lvalue or an rvalue.  C++11 already says this is valid:

i#include 

std::unique_ptr f()
{
 std::unique_ptr m;
 return m;
}

See C++11 12.8 [class.copy] p31:

This elision of copy/move operations, called copy elision, is
permitted in the following circumstances (which may be combined to
eliminate multiple copies):

- in a return statement in a function with a class return type, when
the expression is the name of a non-volatile automatic object (other
than a function or catch-clause parameter) with the same cv-
unqualified type as the function return type, the copy/move
operation can be omitted by constructing the automatic object
directly into the function’s return value

and then p32:

When the criteria for elision of a copy operation are met or would
be met save for the fact that the source object is a function
parameter, and the object to be copied is designated by an lvalue,
overload resolution to select the constructor for the copy is first
performed as if the object were designated by an rvalue.

The constructor isn't required to be elided in C++11, but the compiler
is required to use a move constructor instead of a copy constructor,
if a move constructor is available. So you don't need to use std::move
on the return value.

And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even
with 4.7.1).

So the std::move calls you've added are redundant, and will cause
-Wredundant-move warnings.

What's the error you were seeing?


I can reproduce it:

/home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function 
‘std::unique_ptr ana::equiv_class::make_dump_widget(const 
text_art::dump_widget_info&, unsigned int) const’:
/home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind 
‘std::unique_ptr’ lvalue to 
‘std::unique_ptr&&’
 return ec_widget;
^

Odd ... I'm looking into it ...


Ah, the problem here is that the function's return type is
std::unique_ptr but the return value is
std::unique_ptr and that requires the converting
constructor, not the move constructor.

That *did* change after C++11, and isn't supported by G++ until 5.1


That changed with https://cplusplus.github.io/CWG/issues/1579.html
which is in C++14 and is a DR against C++11, but of course compilers
that implement the original C++11 rules don't support that DR.

Arguably, g++ should not give -Wredundant-move warnings here when
using the -std=c++11 dialect, because it's only redundant if the C++11
compiler implements that DR.


Is there a reason that function has to return unique_ptr
instead of unique_ptr? The conversion to ptr-to-base
could happen at the call site (where you always have an rvalue)
instead of on the return value.


Changing the return type to match the return value would avoid the
issue entirely though, as it would be valid with all C++11 compilers.

Re: [analyzer PATCH] Restore bootstrap with g++ 4.8.

2024-06-14 Thread Jonathan Wakely


On 14/06/24 11:42 +0100, Jonathan Wakely wrote:

On 14/06/24 11:37 +0100, Jonathan Wakely wrote:

On 14/06/24 11:26 +0100, Jonathan Wakely wrote:

On 14/06/24 10:36 +0100, Jonathan Wakely wrote:

On 07/06/24 19:40 +0100, Roger Sayle wrote:


This patch restores bootstrap when using g++ 4.8 as a host compiler.
Returning a std::unique_ptr requires a std::move on C++ compilers
(pre-C++17) that don't guarantee copy elision/return value optimization.


It doesn't though.  The C++17 guaranteed copy elision rules are not
relevant here.  This is about lookup for the constructor used in the
return statement, and whether that lookup considers the variable to be
an lvalue or an rvalue.  C++11 already says this is valid:

i#include 

std::unique_ptr f()
{
std::unique_ptr m;
return m;
}

See C++11 12.8 [class.copy] p31:

This elision of copy/move operations, called copy elision, is
permitted in the following circumstances (which may be combined to
eliminate multiple copies):

- in a return statement in a function with a class return type, when
the expression is the name of a non-volatile automatic object (other
than a function or catch-clause parameter) with the same cv-
unqualified type as the function return type, the copy/move
operation can be omitted by constructing the automatic object
directly into the function’s return value

and then p32:

When the criteria for elision of a copy operation are met or would
be met save for the fact that the source object is a function
parameter, and the object to be copied is designated by an lvalue,
overload resolution to select the constructor for the copy is first
performed as if the object were designated by an rvalue.

The constructor isn't required to be elided in C++11, but the compiler
is required to use a move constructor instead of a copy constructor,
if a move constructor is available. So you don't need to use std::move
on the return value.

And the code above compiles fine with gcc 4.8.5 on CentOS 7 (and even
with 4.7.1).

So the std::move calls you've added are redundant, and will cause
-Wredundant-move warnings.

What's the error you were seeing?


I can reproduce it:

/home/test/src/gcc/gcc/analyzer/constraint-manager.cc: In member function 
‘std::unique_ptr ana::equiv_class::make_dump_widget(const 
text_art::dump_widget_info&, unsigned int) const’:
/home/test/src/gcc/gcc/analyzer/constraint-manager.cc:1179:10: error: cannot bind 
‘std::unique_ptr’ lvalue to 
‘std::unique_ptr&&’
return ec_widget;
   ^

Odd ... I'm looking into it ...


Ah, the problem here is that the function's return type is
std::unique_ptr but the return value is
std::unique_ptr and that requires the converting
constructor, not the move constructor.

That *did* change after C++11, and isn't supported by G++ until 5.1


That changed with https://cplusplus.github.io/CWG/issues/1579.html


Which for the record was implemented in r5-1576-gfb682f9458c6cf (by
me, which I'd forgotten!)


which is in C++14 and is a DR against C++11, but of course compilers
that implement the original C++11 rules don't support that DR.

Arguably, g++ should not give -Wredundant-move warnings here when
using the -std=c++11 dialect, because it's only redundant if the C++11
compiler implements that DR.


Is there a reason that function has to return unique_ptr
instead of unique_ptr? The conversion to ptr-to-base
could happen at the call site (where you always have an rvalue)
instead of on the return value.


Changing the return type to match the return value would avoid the
issue entirely though, as it would be valid with all C++11 compilers.

Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-14 Thread HAO CHEN GUI




在 2024/6/14 16:36, Richard Sandiford 写道:
> Ah, ok.  If the problem is stale recog_data information, I think we
> should instead make recog.cc:swap_change conditionally invalidate it.
> I.e. change:
> 
>   if (changes[num].object && !MEM_P (changes[num].object))
> std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
> 
> to something like:
> 
>   if (changes[num].object && !MEM_P (changes[num].object))
> {
>   std::swap (INSN_CODE (changes[num].object), changes[num].old_code);
>   if (recog_data.insn == changes[num].object)
> recot_data.insn = nullptr;
> }
> 
> (untested).  IMO this is better then resetting the INSN_CODE, since the
> INSN_CODE should be correct and is relatively expensive to recompute.

It's better. I will modify the patch and test it.

Thanks
Gui Haochen

Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-14 Thread HAO CHEN GUI

Hi Richard,
  Thanks for your comments.

在 2024/6/12 15:51, Richard Sandiford 写道:
> It should only be necessary to call change_is_worthwhile once,
> with strict == !prop.likely_profitable_p ()
> 
> So something like:
> 
>   bool ok = recog (attempt, use_change);
>   if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
> {
>   bool strict_p = !prop.likely_profitable_p ();
>   if (!change_is_worthwhile (use_change, strict_p))
>   {
> if (dump_file)
>   fprintf (dump_file, "change not profitable");
> ok = false;
>   }
> }

I will modify the code. Thanks.

> The:
> 
>   bool ok = recog (attempt, use_change);
> 
> should leave INSN_CODE set to the result of the successful recog.
> Why isn't that true in the example you hit?
> 
> I wondered whether we might be trying to cost a NOOP_MOVE_INSN_CODE,
> since I couldn't see anything in the current code to stop that.
> But if so, that's a bug.  NOOP_MOVE_INSN_CODE should have zero cost,
> and shouldn't go through insn_cost.

Yes, the recog sets the recog data for the new rtl. But
changes_are_worthwhile calls change->old_cost and finally calls
calculate_cost to get the insn cost for old rtl. The undo/redo_changes
don't restore/recover the recog_data, which causes insn_cost takes the
wrong number of operands (recog_data.n_operands) and wrong operands
(recog_data.operand[]). Note, rs6000_insn_cost checks if the recog data
is cached. If so, it doesn't recog the insn again.

void
insn_info::calculate_cost () const
{
  basic_block cfg_bb = BLOCK_FOR_INSN (m_rtl);
  temporarily_undo_changes (0);
  m_cost_or_uid = insn_cost (m_rtl, optimize_bb_for_speed_p (cfg_bb));
  redo_changes (0);
}

// A simple debug output for calculate_cost
before undo
INSN CODE 850
n_operands 3

after undo
INSN CODE 685
n_operands 3

Actually insn pattern 850 has 3 operands and insn pattern 685 has 2
operands.

IMHO, we should invalidate recog data after each undo/redo or before
calling the insn cost.

Looking forward to your advice.

Thanks
Gui Haochen

Re: [Patch, rs6000, middle-end] v2: Add implementation for different targets for pair mem fusion

2024-06-14 Thread Richard Sandiford

Ajit Agarwal  writes:
> Hello Richard:
>
> All comments are addressed.

I don't think this addresses the following comments from the previous
reviews:

(1) It is not correct to mark existing insn uses as live-out.
The patch mustn't try to do this.

(2) To quote a previous review:

It's probably better to create a fresh OO register, rather than
change an existing 128-bit register to 256 bits.  If we do that,
and if reg:V16QI 125 is the destination of the second load
(which I assume it is from the 16 offset in the subreg),
then the new RTL should be:

  (vec_select:HI (subreg:V8HI (reg:OO NEW_REG) 16) ...)

It's possible to get this by using insn_propagation to replace
(reg:V16QI 125) with (subreg:V16QI (reg:OO NEW_REG) 16).
insn_propagation should then take care of the rest.

There are no existing rtl-ssa routines for handling new registers
though.  (The idea was to add things as the need arose.)

The reason for (2) is that changing the mode of an existing pseudo
invalidates all existing references to that pseudo.  Although the
patch tries to fix things up, it's doing that at a stage where
there is already "garbage in" (in the sense that the starting
RTL is invalid).  Just changing the mode would also invalidate
things like REG_EXPR, for example.

In contrast, the advantage of creating a new pseudo means that every
insn transformation is from structurally valid RTL to structurally
valid RTL.  It also prevents information being incorrectly carried
over from the old pseudo.
x
Thanks,
Richard

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Richard Sandiford

Richard Biener  writes:
> The following retires vcond{,u,eq} optabs by stopping to use them
> from the middle-end.  Targets instead (should) implement vcond_mask
> and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> possibly affected targets - those implementing these patterns,
> and in particular it lists mips, sparc and ia64 as targets that
> most definitely will regress while others might simply remove
> their vcond{,u,eq} patterns.
>
> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> I know riscv doesn't implement any of the legacy optabs.  But less
> maintained vector targets might need adjustments.
>
> I want to get rid of those optabs for GCC 15.  If I don't hear from
> you I will assume your target is fine.

Great!  Thanks for doing this.

Is there a plan for how we should handle vector comparisons that
have to be done as the inverse of the negated condition?  Should
targets simply not provide vec_cmp for such conditions and leave
the target-independent code to deal with the fallout?  (For a
standalone comparison, it would invert the result.  For a VEC_COND_EXPR
it would swap the true and false values.)

Richard

>
> Thanks,
> Richard.
>
>   PR middle-end/114189
>   * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing.
>   (get_vcond_eq_icode): Likewise.
> ---
>  gcc/optabs-query.h | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> index 0cb2c21ba85..31fbce80175 100644
> --- a/gcc/optabs-query.h
> +++ b/gcc/optabs-query.h
> @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode 
> mask_mode)
> mode CMODE, unsigned if UNS is true, resulting in a value of mode VMODE.  
> */
>  
>  inline enum insn_code
> -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns)
> +get_vcond_icode (machine_mode, machine_mode, bool)
>  {
> -  enum insn_code icode = CODE_FOR_nothing;
> -  if (uns)
> -icode = convert_optab_handler (vcondu_optab, vmode, cmode);
> -  else
> -icode = convert_optab_handler (vcond_optab, vmode, cmode);
> -  return icode;
> +  return CODE_FOR_nothing;
>  }
>  
>  /* Return insn code for a conditional operator with a mask mode
> @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode 
> mmode)
> mode CMODE (only EQ/NE), resulting in a value of mode VMODE.  */
>  
>  inline enum insn_code
> -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode)
> +get_vcond_eq_icode (machine_mode, machine_mode)
>  {
> -  return convert_optab_handler (vcondeq_optab, vmode, cmode);
> +  return CODE_FOR_nothing;
>  }
>  
>  /* Enumerates the possible extraction_insn operations.  */

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Richard Biener

On Fri, 14 Jun 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following retires vcond{,u,eq} optabs by stopping to use them
> > from the middle-end.  Targets instead (should) implement vcond_mask
> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> > possibly affected targets - those implementing these patterns,
> > and in particular it lists mips, sparc and ia64 as targets that
> > most definitely will regress while others might simply remove
> > their vcond{,u,eq} patterns.
> >
> > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> > I know riscv doesn't implement any of the legacy optabs.  But less
> > maintained vector targets might need adjustments.
> >
> > I want to get rid of those optabs for GCC 15.  If I don't hear from
> > you I will assume your target is fine.
> 
> Great!  Thanks for doing this.
> 
> Is there a plan for how we should handle vector comparisons that
> have to be done as the inverse of the negated condition?  Should
> targets simply not provide vec_cmp for such conditions and leave
> the target-independent code to deal with the fallout?  (For a
> standalone comparison, it would invert the result.  For a VEC_COND_EXPR
> it would swap the true and false values.)

I would expect that the ISEL pass which currently deals with finding
valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
So how do we deal with this right now?  I expect RTL expansion will
do the inverse trick, no?

Thanks,
Richard.

> Richard
> 
> >
> > Thanks,
> > Richard.
> >
> > PR middle-end/114189
> > * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing.
> > (get_vcond_eq_icode): Likewise.
> > ---
> >  gcc/optabs-query.h | 13 -
> >  1 file changed, 4 insertions(+), 9 deletions(-)
> >
> > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> > index 0cb2c21ba85..31fbce80175 100644
> > --- a/gcc/optabs-query.h
> > +++ b/gcc/optabs-query.h
> > @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, machine_mode 
> > mask_mode)
> > mode CMODE, unsigned if UNS is true, resulting in a value of mode 
> > VMODE.  */
> >  
> >  inline enum insn_code
> > -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns)
> > +get_vcond_icode (machine_mode, machine_mode, bool)
> >  {
> > -  enum insn_code icode = CODE_FOR_nothing;
> > -  if (uns)
> > -icode = convert_optab_handler (vcondu_optab, vmode, cmode);
> > -  else
> > -icode = convert_optab_handler (vcond_optab, vmode, cmode);
> > -  return icode;
> > +  return CODE_FOR_nothing;
> >  }
> >  
> >  /* Return insn code for a conditional operator with a mask mode
> > @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, machine_mode 
> > mmode)
> > mode CMODE (only EQ/NE), resulting in a value of mode VMODE.  */
> >  
> >  inline enum insn_code
> > -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode)
> > +get_vcond_eq_icode (machine_mode, machine_mode)
> >  {
> > -  return convert_optab_handler (vcondeq_optab, vmode, cmode);
> > +  return CODE_FOR_nothing;
> >  }
> >  
> >  /* Enumerates the possible extraction_insn operations.  */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-14 Thread Frank Scheiner


Dear Jonathan, Jeff,

On 13.06.24 12:33, Jonathan Wakely wrote:

On Wed, 12 Jun 2024 at 22:00, Frank Scheiner  wrote:

Ok, I posted the results as created by contrib/test_summary now:

1. non-LRA version on [1]

2. LRA version on [2]

[1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html

[2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html


Thanks!

These ones are probably due to non-reserved names in glibc or kernel headers:

FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)

The errors for all three are probably the same and should be
decipherable from libstdc++.log which will show which names defined as
macros in names.cc are clashing with names in system headers.


Thanks for the hints. I looked into the log now and searched for the
respective tests. And the log excerpts for all three failing tests (also
attached with more context):

```
FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
Excess errors:
/usr/include/bits/sigcontext.h:37: error: expected unqualified-id before
';' token
/usr/include/bits/sigcontext.h:37: error: expected ')' before ';' token
/usr/include/sys/ucontext.h:44: error: expected unqualified-id before
';' token
/usr/include/sys/ucontext.h:44: error: expected ')' before ';' token
```

...point to two headers which are part of glibc 2.39 (w/ia64 support
re-added):

* /usr/include/bits/sigcontext.h:32-38:
```
32 struct __ia64_fpreg
33   {
34 union
35   {
36 unsigned long bits[2];
37   } u;
38   } __attribute__ ((__aligned__ (16)));
```

* /usr/include/sys/ucontext.h:39-45:
```
 39 struct __ia64_fpreg_mcontext
 40   {
 41 union
 42   {
 43 unsigned long __ctx(bits)[2];
 44   } __ctx(u);
 45   } __attribute__ ((__aligned__ (16)));
```

Cheers,
Frank
# Log excerpts for failing tests in libstdc++v3 for 5.0.0 20240528 (experimental) [master revision 236116068151bbc72aaaf53d0f223fe06f7e3bac] (GCC) w/LRA #

## FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors) ##

```
extra_tool_flags are:  -std=gnu++17   -D__GLIBCXX__=
Executing on host: /dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc/xg++ -shared-libgcc -B/dev/shm/gcc-15-lra/src.gcc.ia64-to
olchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc -nostdinc++ -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux
-gnu/libstdc++-v3/src -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/src/.libs -L/dev/shm/gcc-15
-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/libsupc++/.libs -B/usr/ia64-t2-linux-gnu/bin/ -B/usr/ia64-t2-linux
-gnu/lib/ -isystem /usr/ia64-t2-linux-gnu/include -isystem /usr/ia64-t2-linux-gnu/sys-include -fchecking=1 -B/usr/src/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.12334
6.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/./libstdc++-v3/src/.libs -fmessage-length=0 -fno-show-column -ffunction-sections -fdata-sections -g -O2 -fPIC -D_GNU_SOUR
CE -DLOCALEDIR="." -nostdinc++ -I/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/include/ia64-t2-li
nux-gnu -I/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/include -I/dev/shm/gcc-15-lra/src.gcc.ia6
4-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/libsupc++ -I/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/include/backward -I
/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/libstdc++-v3/testsuite/util  /usr/src/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/g
cc/libstdc++-v3/testsuite/17_intro/names.cc-std=gnu++17   -D__GLIBCXX__= -fdiagnostics-plain-output -S -o names.s(timeout = 360)
spawn -ignore SIGHUP /dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc/xg++ -shared-libgcc -B/dev/shm/gcc-15-lra/src.gcc.ia64-
toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/./gcc -nostdinc++ -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/src -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/src/.libs -L/dev/shm/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/libstdc++-v3/libsupc++/.libs -B/usr/ia64-t2-linux-gnu/bin/ -B/usr/ia64-t2-linux-gnu/lib/ -isystem /usr/ia64-t2-linux-gnu/include -isystem /usr/ia64-t2-linux-gnu/sys-include -fchecking=1 -B/usr/src/gcc-15-lra/src.gcc.ia64-toolchain-3.240529.123346.921189/gcc/gcc.build.lnx/ia64-t2-linux-gnu/./libstdc++-v3/src/.libs -fmessage-length=0 -fno-show-column -ffunction-se

[PATCH] Improve optimizer to avoid stack spill across pure function call

2024-06-14 Thread user202729

This patch was inspired from PR 110137. It reduces the amount of stack spilling 
by ensuring that more values are constant across a pure function call.

It does not add any new flag; rather, it makes the optimizer generate more 
optimal code.

For the added test file, the change is the following. As can be seen, the 
number of memory operations is cut in half (almost, because rbx = rdi also need 
to be saved in the "after" version).

Before:

_Z2ggO7MyClass:
.LFB653:
.cfi_startproc
sub rsp, 72
.cfi_def_cfa_offset 80
movdqu  xmm1, XMMWORD PTR [rdi]
movdqu  xmm0, XMMWORD PTR [rdi+16]
movaps  XMMWORD PTR [rsp+16], xmm1
movaps  XMMWORD PTR [rsp], xmm0
call_Z1fv
movdqa  xmm1, XMMWORD PTR [rsp+16]
movdqa  xmm0, XMMWORD PTR [rsp]
lea rdx, [rsp+32]
movaps  XMMWORD PTR [rsp+32], xmm1
movaps  XMMWORD PTR [rsp+48], xmm0
add rsp, 72
.cfi_def_cfa_offset 8
ret
.cfi_endproc

After:

_Z2ggO7MyClass:
.LFB653:
.cfi_startproc
pushrbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
mov rbx, rdi
sub rsp, 32
.cfi_def_cfa_offset 48
call_Z1fv
movdqu  xmm0, XMMWORD PTR [rbx]
movaps  XMMWORD PTR [rsp], xmm0
movdqu  xmm0, XMMWORD PTR [rbx+16]
movaps  XMMWORD PTR [rsp+16], xmm0
add rsp, 32
.cfi_def_cfa_offset 16
pop rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc

As explained in PR 110137, the reason I modify the RTL pass instead of the 
GIMPLE pass is that currently the code that handle the optimization is in the 
IRA.

The optimization involved is: rewrite

definition: a = something;
...
use a;

to move the definition statement right before the use statement, provided none 
of the statements inbetween modifies "something".

The existing code only handle the case where "something" is a memory reference 
with a fixed address. The patch modifies the logic to also allow memory 
reference whose address is not changed by the statements inbetween.

In order to do that the only way I can think of is to modify 
"validate_equiv_mem" to also validate the equivalence of the address, which may 
consist of a pseudo-register.

Nevertheless, reviews and suggestions to improve the code/explain how to 
implement it in GIMPLE phase would be appreciated.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

I think the test passes but there are some spurious failures with some 
scan-assembler-* tests, looking through it it doesn't appear to pessimize the 
code.

I think this also fix PR 103541 as a side effect, but I'm not sure if the 
generated code is optimal (it loads from the global variable twice, but then 
there's no readily usable caller-saved register so you need an additional 
memory operation anyway)
From 3d8d04373716e031242678821c3e9904b6ac51cb Mon Sep 17 00:00:00 2001
From: user202729 
Date: Mon, 20 May 2024 18:12:58 +0800
Subject: [PATCH] Allow moving init from dereference across pure function call

gcc/ChangeLog:

	* ira.cc (struct equiv_mem_data): Rename to equiv_data.
	(validate_equiv_mem_from_store): Rename to validate_equiv_from_store.
	(uses_any_clobbered_hard_reg_p): New function.
	(validate_equiv_mem): Rename to valid_equiv.
	(equiv_init_movable_p): Relax several conditions.
	(update_equiv_regs): Relax condition that the source must be mem.
	(add_store_equivs): Update changed function name.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avoid-stack-spill-pure-call.C: New test.
---
 gcc/ira.cc| 157 --
 .../i386/avoid-stack-spill-pure-call.C|  32 
 2 files changed, 140 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avoid-stack-spill-pure-call.C

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 5642aea3caa..1a1718a18c5 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3054,28 +3054,80 @@ struct equivalence
 static struct equivalence *reg_equiv;
 
 /* Used for communication between the following two functions.  */
-struct equiv_mem_data
+struct equiv_data
 {
-  /* A MEM that we wish to ensure remains unchanged.  */
-  rtx equiv_mem;
+  /* A rtx that we wish to ensure remains unchanged.  */
+  rtx equiv;
 
-  /* Set true if EQUIV_MEM is modified.  */
-  bool equiv_mem_modified;
+  /* Set true if EQUIV is modified.  */
+  bool equiv_modified;
 };
 
-/* If EQUIV_MEM is modified by modifying DEST, indicate that it is modified.
+/* If EQUIV is modified by modifying DEST, indicate that it is modified.
Called via note_stores.  */
 stati

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Richard Sandiford

Richard Biener  writes:
> On Fri, 14 Jun 2024, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > The following retires vcond{,u,eq} optabs by stopping to use them
>> > from the middle-end.  Targets instead (should) implement vcond_mask
>> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
>> > possibly affected targets - those implementing these patterns,
>> > and in particular it lists mips, sparc and ia64 as targets that
>> > most definitely will regress while others might simply remove
>> > their vcond{,u,eq} patterns.
>> >
>> > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
>> > I know riscv doesn't implement any of the legacy optabs.  But less
>> > maintained vector targets might need adjustments.
>> >
>> > I want to get rid of those optabs for GCC 15.  If I don't hear from
>> > you I will assume your target is fine.
>> 
>> Great!  Thanks for doing this.
>> 
>> Is there a plan for how we should handle vector comparisons that
>> have to be done as the inverse of the negated condition?  Should
>> targets simply not provide vec_cmp for such conditions and leave
>> the target-independent code to deal with the fallout?  (For a
>> standalone comparison, it would invert the result.  For a VEC_COND_EXPR
>> it would swap the true and false values.)
>
> I would expect that the ISEL pass which currently deals with finding
> valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
> So how do we deal with this right now?  I expect RTL expansion will
> do the inverse trick, no?

I think in practice (at least for the targets I've worked on),
the target's vec_cmp handles the inversion itself.  Thus the
main optimisation done by targets' vcond patterns is to avoid
the inversion (and instead swap the true/false values) when the
"opposite" comparison is the native one.

Thanks,
Richard

[PATCH 1/2] Add function filtering to gcov

2024-06-14 Thread Jørgen Kvalsvik

Add the --include and --exclude flags to gcov to control what functions
to report on. This is meant to make gcov more practical as an when
writing test suites or performing other coverage experiments, which
tends to focus on a few functions at the time. This really shines in
combination with the -t/--stdout flag. With support for more expansive
metrics in gcov like modified condition/decision coverage (MC/DC) and
path coverage, output quickly gets overwhelming without filtering.

The approach is quite simple: filters are egrep regexes and are
evaluated left-to-right, and the last filter "wins", that is, if a
function matches an --include and a subsequent --exclude, it should not
be included in the output. All of the output machinery works on the
function table, so by optionally (not) adding function makes the even
the json output work as expected, and only minor changes are needed to
suppress the filtered-out functions.

Demo: math.c

int mul (int a, int b) {
return a * b;
}

int sub (int a, int b) {
return a - b;
}

int sum (int a, int b) {
return a + b;
}

Plain matches:

$ gcov -t math --include=sum
-:0:Source:filter.c
-:0:Graph:filter.gcno
-:0:Data:-
-:0:Runs:0
#:9:int sum (int a, int b) {
#:   10:return a + b;

$ gcov -t math --include=mul
-:0:Source:filter.c
-:0:Graph:filter.gcno
-:0:Data:-
-:0:Runs:0
#:1:int mul (int a, int b) {
#:2:return a * b;

Regex match:

$ gcov -t math --include=su
-:0:Source:filter.c
-:0:Graph:filter.gcno
-:0:Data:-
-:0:Runs:0
#:5:int sub (int a, int b) {
#:6:return a - b;
-:7:}
#:9:int sum (int a, int b) {
#:   10:return a + b;

And similar for exclude:

$ gcov -t math --exclude=sum
-:0:Source:filter.c
-:0:Graph:filter.gcno
-:0:Data:-
-:0:Runs:0
#:1:int mul (int a, int b) {
#:2:return a * b;
-:3:}
#:5:int sub (int a, int b) {
#:6:return a - b;

And json, for good measure:

$ gcov -t math --include=sum --json | jq ".files[].lines[]"
{
  "line_number": 9,
  "function_name": "sum",
  "count": 0,
  "unexecuted_block": true,
  "block_ids": [],
  "branches": [],
  "calls": []
}
{
  "line_number": 10,
  "function_name": "sum",
  "count": 0,
  "unexecuted_block": true,
  "block_ids": [
2
  ],
  "branches": [],
  "calls": []
}

Note that the last function gets "clipped" when lines are associated to
functions, which means the closing brace is dropped from the report. I
hope this can be fixed, but considering it is not really a part of the
function body, the gcov report is "complete".

Matching generally work well for mangled names, as the mangled names
also have the base symbol name in it. By default, functions are matched
by the mangled name, which means matching on base names always work as
expected. The -M flag makes the matching work on the demangled name
which is quite useful when you only want to report on specific
overloads and can use the full type names.

Why not just use grep? grep is not really sufficient as grep is very
line oriented, and the reports that benefit the most from filtering
often unpredictably span multiple lines based on the state of coverage.
For example, a condition coverage report for 3 terms/6 outcomes only
outputs 1 line when all conditions are covered, and 7 with no lines
covered.

gcc/ChangeLog:

* doc/gcov.texi: Add --include, --exclude, --match-on-demangled
documentation.
* gcov.cc (struct fnfilter): New.
(print_usage): Add --include, --exclude, -M,
--match-on-demangled.
(process_args): Likewise.
(release_structures): Release filters.
(read_graph_file): Only add function_infos matching filters.
(output_lines): Likewise.

gcc/testsuite/ChangeLog:

* lib/gcov.exp: Add filtering test function.
* g++.dg/gcov/gcov-19.C: New test.
* g++.dg/gcov/gcov-20.C: New test.
* g++.dg/gcov/gcov-21.C: New test.
* gcc.misc-tests/gcov-25.c: New test.
* gcc.misc-tests/gcov-26.c: New test.
* gcc.misc-tests/gcov-27.c: New test.
* gcc.misc-tests/gcov-28.c: New test.
---
 gcc/doc/gcov.texi  |  27 ++
 gcc/gcov.cc| 128 +++--
 gcc/testsuite/g++.dg/gcov/gcov-19.C|  35 +++
 gcc/testsuite/g++.dg/gcov/gcov-20.C|  38 
 gcc/testsuite/g++.dg/gcov/gcov-21.C|  32 +++
 gcc/testsuite/gcc.misc-tests/gcov-25.c |  25 +
 gcc/testsuite/gcc.misc-tests/gcov-26.c |  25 +
 gcc/testsuite/gcc.misc-tests/gcov-27.c |  24 +
 gcc/testsuite/gcc.misc-tests/gcov-28.c |  22 +
 gcc/testsuite/lib/gcov.exp |  53 +-
 10 fil

[PATCH 2/2] Add examples on filteirng in gcov tutorial

2024-06-14 Thread Jørgen Kvalsvik

Add a section with some examples on the --include and --exclude flags in
the gcov tutorial.

gcc/ChangeLog:

* doc/gcov.texi: Add tutorial on function filtering.
---
 gcc/doc/gcov.texi | 86 +++
 1 file changed, 86 insertions(+)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index a9690bf5a69..dc79bccb8cf 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -933,6 +933,92 @@ the file doesn't match the executable (differing number of 
basic block
 counts) it will ignore the contents of the file.  It then adds in the
 new execution counts and finally writes the data to the file.
 
+You can report on a subset of functions by using @option{--include}
+and @option{--exclude}.  This is very useful when combined with
+@option{--stdout} trying to understand behavior and coverage for a
+particular function by running a test, looking at @command{gcov}
+output, testing another input, and running @command{gcov} again.
+
+@smallexample
+$ gcov -m --stdout --include inc tmp
+-:0:Source:tmp.cpp
+-:0:Graph:tmp.gcno
+-:0:Data:tmp.gcda
+-:0:Runs:1
+   2*:8:  void inc () @{ b++; @}
+--
+Foo::inc():
+#:8:  void inc () @{ b++; @}
+--
+Foo::inc():
+2:8:  void inc () @{ b++; @}
+--
+@end smallexample
+
+@command{gcov} will match on mangled names by default, which you can
+control with the @option{-M} flag.  Note that matching and reporting
+are independent, so you can match on mangled names while printing
+demangled names, and vice versa.  To report on the @code{int}
+instantiation of @code{Foo} matching on mangled and demangled names:
+
+@smallexample
+$ gcov -t -m -M tmp --include 'Foo'
+-:0:Source:tmp.cpp
+-:0:Graph:tmp.gcno
+-:0:Data:tmp.gcda
+-:0:Runs:1
+1:7:  Foo(): b (1000) @{@}
+2:8:  void inc () @{ b++; @}
+@end smallexample
+
+@smallexample
+$ gcov -t -m tmp --include 'FooIi'
+-:0:Source:tmp.cpp
+-:0:Graph:tmp.gcno
+-:0:Data:tmp.gcda
+-:0:Runs:1
+1:7:  Foo(): b (1000) @{@}
+2:8:  void inc () @{ b++; @}
+@end smallexample
+
+The arguments to @option{--include} and @option{--exclude} are
+extended regular expressions (like @command{grep -E}), so the pattern
+@code{in.?} matches both @code{inc} and @code{main}.  If used with
+@option{-M} then all @code{int} instantiations of @code{Foo} would
+match too.  @option{--include} and @option{--exclude} can be used
+multiple times, and if a name matches multiple filters it is the last
+one to match which takes preference.  For example, to match
+@code{main} and the @code{int} instatiation of @code{inc}, while
+omitting the @code{Foo} constructor:
+
+@smallexample
+$ gcov -t -m -M --include in --exclude Foo --include '::inc' tmp
+-:0:Source:tmp.cpp
+-:0:Graph:tmp.gcno
+-:0:Data:tmp.gcda
+-:0:Runs:1
+2:8:  void inc () @{ b++; @}
+1:   18:main (void)
+-:   19:@{
+-:   20:  int i, total;
+1:   21:  Foo counter;
+-:   22:
+1:   23:  counter.inc();
+1:   24:  counter.inc();
+1:   25:  total = 0;
+-:   26:
+   11:   27:  for (i = 0; i < 10; i++)
+   10:   28:total += i;
+-:   29:
+   1*:   30:  int v = total > 100 ? 1 : 2;
+-:   31:
+1:   32:  if (total != 45)
+#:   33:printf ("Failure\n");
+-:   34:  else
+1:   35:printf ("Success\n");
+1:   36:  return 0;
+@end smallexample
+
 @node Gcov and Optimization
 @section Using @command{gcov} with GCC Optimization
 
-- 
2.30.2

[PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]

2024-06-14 Thread Tamar Christina

Hi All,

When the patch for PR114074 was applied we saw a good boost in exchange2.

This boost was partially caused by a simplification of the addressing modes.
With the patch applied IV opts saw the following form for the base addressing;

  Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) l0_19(D) *
324) + 36)

vs what we normally get:

  Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) l0_19(D)
* 81) + 9) * 4

This is because the patch promoted multiplies where one operand is a constant
from a signed multiply to an unsigned one, to attempt to fold away the constant.

This patch attempts the same but due to the various problems with SCEV and
niters not being able to analyze the resulting forms (i.e. PR114322) we can't
do it during SCEV or in the general form like in fold-const like extract_muldiv
attempts.

Instead this applies the simplification during IVopts initialization when we
create the IV.  Essentially when we know the IV won't overflow with regards to
niters then we perform an affine fold which gets it to simplify the internal
computation, even if this is signed because we know that for IVOPTs uses the
IV won't ever overflow.  This allows IV opts to see the simplified form
without influencing the rest of the compiler.

as mentioned in PR114074 it would be good to fix the missed optimization in the
other passes so we can perform this in general.

The reason this has a big impact on fortran code is that fortran doesn't seem to
have unsigned integer types.  As such all it's addressing are created with
signed types and folding does not happen on them due to the possible overflow.

concretely on AArch64 this changes the results from generation:

mov x27, -108
mov x24, -72
mov x23, -36
add x21, x1, x0, lsl 2
add x19, x20, x22
.L5:
add x0, x22, x19
add x19, x19, 324
ldr d1, [x0, x27]
add v1.2s, v1.2s, v15.2s
str d1, [x20, 216]
ldr d0, [x0, x24]
add v0.2s, v0.2s, v15.2s
str d0, [x20, 252]
ldr d31, [x0, x23]
add v31.2s, v31.2s, v15.2s
str d31, [x20, 288]
bl  digits_20_
cmp x21, x19
bne .L5

into:

.L5:
ldr d1, [x19, -108]
add v1.2s, v1.2s, v15.2s
str d1, [x20, 216]
ldr d0, [x19, -72]
add v0.2s, v0.2s, v15.2s
str d0, [x20, 252]
ldr d31, [x19, -36]
add x19, x19, 324
add v31.2s, v31.2s, v15.2s
str d31, [x20, 288]
bl  digits_20_
cmp x21, x19
bne .L5

The two patches together results in a 10% performance increase in exchange2 in
SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
time. There's also a 5% performance improvement in fotonik3d and similar
reduction in binary size.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/114932
* tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned fold.

gcc/testsuite/ChangeLog:

PR tree-optimization/114932
* gfortran.dg/addressing-modes_1.f90: New test.

---
diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_1.f90 
b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90
new file mode 100644
index 
..334d5bc47a16e53e9168bb1f90dfeff584b4e494
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_1.f90
@@ -0,0 +1,37 @@
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast" }
+
+  module brute_force
+integer, parameter :: r=9
+ integer  block(r, r, 0)
+contains
+  subroutine brute
+ do
+  do
+  do
+   do
+do
+ do
+ do i7 = l0, 1
+   select case(1 )
+   case(1)
+   block(:2, 7:, 1) = block(:2, 7:, i7) - 1
+   end select
+do i8 = 1, 1
+   do i9 = 1, 1
+if(1 == 1) then
+call digits_20
+end if
+end do
+  end do
+end do
+end do
+  end do
+  end do
+   end do
+ end do
+  end do
+ end
+  end
+
+! { dg-final { scan-assembler-not {ldr\s+d([0-9]+),\s+\[x[0-9]+, x[0-9]+\]} } }
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 
4338d7b64a6c2df6404b8d5e51c7f62c23006e72..f621e4ee924b930e1e1d68e35f3d3a0d52470811
 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -1216,6 +1216,18 @@ alloc_iv (struct ivopts_data *data, tree base, tree step,
   ba

[PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]

2024-06-14 Thread Tamar Christina

Hi All,

IVOPTS normally uses affine trees to perform comparisons between different IVs,
but these seem to have been missing in two key spots and instead normal tree
equivalencies used.

In some cases where we have a structural equivalence but not a signedness
equivalencies we end up generating both a signed and unsigned IV for the same
candidate.

This happens quite a lot with fortran but can also happen in C because this came
code is unable to figure out when one expression is a multiple of another.

As an example in the attached testcase we get:

Initial set of candidates:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

:
inv_expr 1: stride.3_27 * 4
inv_expr 2: (unsigned long) stride.3_27 * 4

These end up being used in the same group:

Group 1:
cand  costcompl.  inv.expr.   inv.vars
1 0   0   NIL;6
2 0   0   NIL;6
3 0   0   NIL;6

which ends up with IV opts picking the signed and unsigned IVs:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so generates the same IV as both signed and unsigned:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 
58.2545), maybe hot
;;prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
;;pred:   28 [always]  count:23622320 (estimated locally, freq 6.4080) 
(FALLTHRU,EXECUTABLE)
;;25 [always]  count:191126046 (estimated locally, freq 
51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI 
  # ivtmp.28_90 = PHI 

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 
58.2545), maybe hot
;;prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
;;pred:   22 [always]  count:95443719 (estimated locally, freq 25.8909) 
(FALLTHRU)  

 ;;21 [33.3% (guessed)]  
count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)

 ;; 
   31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) 
(TRUE_VALUE,EXECUTABLE) 

  # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>   


   ivtmp.22_82 = ivtmp.22_41 + 
1;  


 ivtmp.26_72 = ivtmp.26_51 + _80;   


  ivtmp.28_98 = ivtmp.28_90 + _39;  

These two IVs are always used as unsigned, so IV ops generates:

  _73 = stride.3_27 * 4;
  _80 = (unsigned long) _73;
  _54 = (unsigned long) stride.3_27;
  _39 = _54 * 4;

Which means that in e.g. exchange2 we generate a lot of duplicate code.

This is because candidate 6 and 8 are structurally equivalent but have different
signs.

This patch changes it so that if you have two IVs that are affine equivalent to
just pick one over the other.  IV already has code for this, so the patch just
uses affine trees instead of tree for the check.

With it we get:

:
inv_expr 1: stride.3_27 * 4

:
Group 0:
  cand  costcompl.  inv.expr.   inv.vars
  5 0   2   NIL;NIL;
  6 0   3   NIL;NIL;

Group 1:
  cand  costcompl.  inv.expr.   inv.vars
  1 0   0   NIL;6
  2 0   0   NIL;6
  3 0   0   NIL;6
  4 0   0   NIL;6

Initial set of candidates:
  cost: 16 (complexity 3)
  reg_cost: 6
  can

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Richard Biener

On Fri, 14 Jun 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 14 Jun 2024, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > The following retires vcond{,u,eq} optabs by stopping to use them
> >> > from the middle-end.  Targets instead (should) implement vcond_mask
> >> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> >> > possibly affected targets - those implementing these patterns,
> >> > and in particular it lists mips, sparc and ia64 as targets that
> >> > most definitely will regress while others might simply remove
> >> > their vcond{,u,eq} patterns.
> >> >
> >> > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> >> > I know riscv doesn't implement any of the legacy optabs.  But less
> >> > maintained vector targets might need adjustments.
> >> >
> >> > I want to get rid of those optabs for GCC 15.  If I don't hear from
> >> > you I will assume your target is fine.
> >> 
> >> Great!  Thanks for doing this.
> >> 
> >> Is there a plan for how we should handle vector comparisons that
> >> have to be done as the inverse of the negated condition?  Should
> >> targets simply not provide vec_cmp for such conditions and leave
> >> the target-independent code to deal with the fallout?  (For a
> >> standalone comparison, it would invert the result.  For a VEC_COND_EXPR
> >> it would swap the true and false values.)
> >
> > I would expect that the ISEL pass which currently deals with finding
> > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
> > So how do we deal with this right now?  I expect RTL expansion will
> > do the inverse trick, no?
> 
> I think in practice (at least for the targets I've worked on),
> the target's vec_cmp handles the inversion itself.  Thus the
> main optimisation done by targets' vcond patterns is to avoid
> the inversion (and instead swap the true/false values) when the
> "opposite" comparison is the native one.

I see.  I suppose whether or not vec_cmp is handled is determined
by a FAIL so it's somewhat difficult to determine this at ISEL time.

Meanwhile I've opened PR115490 with the x86 fallout from applying the
patch.

Richard.

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Richard Biener

On Fri, 14 Jun 2024, Richard Biener wrote:

> On Fri, 14 Jun 2024, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Fri, 14 Jun 2024, Richard Sandiford wrote:
> > >
> > >> Richard Biener  writes:
> > >> > The following retires vcond{,u,eq} optabs by stopping to use them
> > >> > from the middle-end.  Targets instead (should) implement vcond_mask
> > >> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> > >> > possibly affected targets - those implementing these patterns,
> > >> > and in particular it lists mips, sparc and ia64 as targets that
> > >> > most definitely will regress while others might simply remove
> > >> > their vcond{,u,eq} patterns.
> > >> >
> > >> > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> > >> > I know riscv doesn't implement any of the legacy optabs.  But less
> > >> > maintained vector targets might need adjustments.
> > >> >
> > >> > I want to get rid of those optabs for GCC 15.  If I don't hear from
> > >> > you I will assume your target is fine.
> > >> 
> > >> Great!  Thanks for doing this.
> > >> 
> > >> Is there a plan for how we should handle vector comparisons that
> > >> have to be done as the inverse of the negated condition?  Should
> > >> targets simply not provide vec_cmp for such conditions and leave
> > >> the target-independent code to deal with the fallout?  (For a
> > >> standalone comparison, it would invert the result.  For a VEC_COND_EXPR
> > >> it would swap the true and false values.)
> > >
> > > I would expect that the ISEL pass which currently deals with finding
> > > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
> > > So how do we deal with this right now?  I expect RTL expansion will
> > > do the inverse trick, no?
> > 
> > I think in practice (at least for the targets I've worked on),
> > the target's vec_cmp handles the inversion itself.  Thus the
> > main optimisation done by targets' vcond patterns is to avoid
> > the inversion (and instead swap the true/false values) when the
> > "opposite" comparison is the native one.
> 
> I see.  I suppose whether or not vec_cmp is handled is determined
> by a FAIL so it's somewhat difficult to determine this at ISEL time.

I'll also note that we document vec_cmp{,u,eq} as having all zeros,
all ones for the result while vcond_mask might only care for the MSB
(it's documented to work on the result of a pre-computed vector
comparison).

So this eventually asks for targets to work out the optimal sequence
via combine helpers and thus eventually splitters to fixup invalid
compare operators late?

Richard.

[PATCH] Build/Cross: Look for target headers from include if sys-include doesn't exist

2024-06-14 Thread YunQiang Su

PR 115416

When we build a cross toolchain, while without --with-sysroot,
target headers are expected in
  ${test_exec_prefix}/${target_noncanonical}/sys-include
while it is true only with --with-headers option is used. In other
cases, the path should be
  ${test_exec_prefix}/${target_noncanonical}/include
such as Debian's cross toolchain.

Debian's cross toolchain has directory structures like:
   /usr//lib
/include
/bin/

For this case, we cannot use "--prefix=/usr --with-sysroot=/", as
gcc/configure will use headers of build, aka in /usr/include to detect
features.  And fixinclude also uses the headers of build.

Let's use the `include` if `sys-include` doesn't exist.

For Makefile.in, the compare @includedir@ and $(prefix)/include is not
correct, as the --includedir option is used to set where the headers
should be installed.

gcc:
PR 115415.
configure.ac: Set target_header_dir and CROSS_SYSTEM_HEADER_DIR
to ${test_exec_prefix}/${target_noncanonical}/include when cross
and without --with-sysroot and without --with-headers.
configure: Regenerate.
Makefile.in: Set CROSS_SYSTEM_HEADER_DIR as configure, and don't
compare @includedir@ and $(prefix)/include.
---
 gcc/Makefile.in  | 6 +-
 gcc/configure| 8 ++--
 gcc/configure.ac | 4 
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f5adb647d3f..349f988dc08 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -560,11 +560,7 @@ LINKER_PLUGIN_API_H = $(srcdir)/../include/plugin-api.h
 # Default native SYSTEM_HEADER_DIR, to be overridden by targets.
 NATIVE_SYSTEM_HEADER_DIR = @NATIVE_SYSTEM_HEADER_DIR@
 # Default cross SYSTEM_HEADER_DIR, to be overridden by targets.
-ifeq (@includedir@,$(prefix)/include)
-  CROSS_SYSTEM_HEADER_DIR = @CROSS_SYSTEM_HEADER_DIR@
-else
-  CROSS_SYSTEM_HEADER_DIR = @includedir@
-endif
+CROSS_SYSTEM_HEADER_DIR = @CROSS_SYSTEM_HEADER_DIR@
 
 # autoconf sets SYSTEM_HEADER_DIR to one of the above.
 # Purge it of unnecessary internal relative paths
diff --git a/gcc/configure b/gcc/configure
index aaf5899cc03..d11e97d1758 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -15124,6 +15124,10 @@ if test x$host != x$target || test 
"x$TARGET_SYSTEM_ROOT" != x ||
 target_header_dir="${with_build_sysroot}${native_system_header_dir}"
   elif test "x$with_sysroot" = x; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
+if ! test -d ${target_header_dir};then
+  target_header_dir="${test_exec_prefix}/${target_noncanonical}/include"
+fi
+CROSS_SYSTEM_HEADER_DIR=${target_header_dir}
   elif test "x$with_sysroot" = xyes; then
 
target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-root${native_system_header_dir}"
   else
@@ -21410,7 +21414,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 21413 "configure"
+#line 21417 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -21516,7 +21520,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 21519 "configure"
+#line 21523 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index f8d67efeb98..54e6776747e 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2512,6 +2512,10 @@ if test x$host != x$target || test 
"x$TARGET_SYSTEM_ROOT" != x ||
 target_header_dir="${with_build_sysroot}${native_system_header_dir}"
   elif test "x$with_sysroot" = x; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
+if test -d ${target_header_dir};then
+  target_header_dir="${test_exec_prefix}/${target_noncanonical}/include"
+fi
+CROSS_SYSTEM_HEADER_DIR=${target_header_dir}
   elif test "x$with_sysroot" = xyes; then
 
target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-root${native_system_header_dir}"
   else
-- 
2.39.3 (Apple Git-146)

[PATCHSET 14.2] Add documentation rules for Rust frontend

2024-06-14 Thread Arthur Cohen

Hi everyone,

This is a quick and simple patchset to our Makefile in order to have proper
documentation rules for the Rust frontend. Let me know if these changes would
be accepted into 14.2, or if we should rather integrate them to trunk for 15.1
as part of our upstreaming process.

If the changes are accepted for 14.2, I will be following them up with .texi
changes to improve the documentation of the Rust frontend.

Best,

Arthur

[PATCH 14.2 1/3] gccrs: Add base documentation for using the Rust
[PATCH 14.2 2/3] rust: Add rust.install-dvi and rust.install-html
[PATCH 14.2 3/3] rust: Copy install-html rule from Ada frontend

[PATCH 14.2 2/3] rust: Add rust.install-dvi and rust.install-html rules

2024-06-14 Thread Arthur Cohen

From: Christophe Lyon 

rust has the (empty) rust.dvi and rust.html rules, but lacks the
(empty) rust.install-dvi and rust.install-html ones.

2024-04-04  Christophe Lyon  

gcc/rust/
* Make-lang.in (rust.install-dvi, rust.install-html): New rules.
---
 gcc/rust/Make-lang.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index f0fc582bd26..2395541425f 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -374,6 +374,8 @@ selftest-rust-valgrind: $(RUST_SELFTEST_DEPS)
 # should have dependencies on info files that should be installed.
 rust.install-info:
 
+rust.install-dvi:
+rust.install-html:
 rust.install-pdf:
 
 # Install man pages for the front end. This target should ignore errors.
-- 
2.42.0

[PATCH 14.2 1/3] gccrs: Add base documentation for using the Rust frontend.

2024-06-14 Thread Arthur Cohen

gcc/rust/ChangeLog:

* Make-lang.in: Add documentation targets.
* gccrs.texi: New file.
---
 gcc/rust/Make-lang.in |  25 +++--
 gcc/rust/gccrs.texi   | 207 ++
 2 files changed, 225 insertions(+), 7 deletions(-)
 create mode 100644 gcc/rust/gccrs.texi

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index dd94c9b5eab..f0fc582bd26 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -283,27 +283,38 @@ rust.tags: force
 
 # Build documentation hooks.
 
+RUST_TEXI = \
+   rust/rust.texi
+
 # Build info documentation for the front end, in the build directory. This 
target is only called by
 # ‘make bootstrap’ if a suitable version of makeinfo is available, so does not 
need to check for this,
 # and should fail if an error occurs.
 rust.info:
+   if [ x$(BUILD_INFO) = xinfo ]; then \
+ $(MAKEINFO) $(MAKEINFOFLAGS) -I $(abs_docdir) -I 
$(abs_docdir)/include -o $@ $<; \
+   else \
+   true; \
+   fi
 
 rust.srcinfo:
 
 # Build DVI documentation for the front end, in the build directory. This 
should be done using
 # $(TEXI2DVI), with appropriate -I arguments pointing to directories of 
included files.
-rust.dvi:
+rust.dvi: $(RUST_TEXI)
+   $(TEXI2DVI) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $<
 
 # Build PDF documentation for the front end, in the build directory. This 
should be done using
 # $(TEXI2PDF), with appropriate -I arguments pointing to directories of 
included files.
-rust.pdf:
-
-doc/rust.info:
-doc/rust.dvi:
-doc/rust.pdf:
+rust.pdf: $(RUST_TEXI)
+   $(TEXI2PDF) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $<
 
 # Build HTML documentation for the front end, in the build directory.
-rust.html:
+rust.html: $(RUST_TEXI)
+   $(TEXI2HTML) -I $(abs_docdir) -I $(abs_docdir)/include -o $@ $<
+
+doc/rust.info: rust.info
+doc/rust.dvi: rust.dvi
+doc/rust.pdf: rust.pdf
 
 # Install hooks.
 
diff --git a/gcc/rust/gccrs.texi b/gcc/rust/gccrs.texi
new file mode 100644
index 000..52f055eaad8
--- /dev/null
+++ b/gcc/rust/gccrs.texi
@@ -0,0 +1,207 @@
+\input texinfo @c -*-texinfo-*-
+@setfilename gccrs.info
+@settitle The GNU Rust Compiler
+
+@c Merge the standard indexes into a single one.
+@syncodeindex fn cp
+@syncodeindex vr cp
+@syncodeindex ky cp
+@syncodeindex pg cp
+@syncodeindex tp cp
+
+@include gcc-common.texi
+
+@c Copyright years for this manual.
+@set copyrights-gccrs 2024
+
+@copying
+@c man begin COPYRIGHT
+Copyright @copyright{} @value{copyrights-gccrs} Free Software Foundation, Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, the Front-Cover Texts being (a) (see below), and
+with the Back-Cover Texts being (b) (see below).
+A copy of the license is included in the
+@c man end
+section entitled ``GNU Free Documentation License''.
+@ignore
+@c man begin COPYRIGHT
+man page gfdl(7).
+@c man end
+@end ignore
+
+@c man begin COPYRIGHT
+
+(a) The FSF's Front-Cover Text is:
+
+ A GNU Manual
+
+(b) The FSF's Back-Cover Text is:
+
+ You have freedom to copy and modify this GNU Manual, like GNU
+ software.  Copies published by the Free Software Foundation raise
+ funds for GNU development.
+@c man end
+@end copying
+
+@ifinfo
+@format
+@dircategory Software development
+@direntry
+* gccrs: (gccrs).   A GCC-based compiler for the Rust programming 
language
+@end direntry
+@end format
+
+@insertcopying
+@end ifinfo
+
+@titlepage
+@title The GNU Rust Compiler
+@versionsubtitle
+@author The gccrs team
+
+@page
+@vskip 0pt plus 1filll
+Published by the Free Software Foundation @*
+51 Franklin Street, Fifth Floor@*
+Boston, MA 02110-1301, USA@*
+@sp 1
+@insertcopying
+@end titlepage
+@contents
+@page
+
+@node Top
+@top Introduction
+
+This manual describes how to use @command{gccrs}, the GNU compiler for
+the Rust programming language.  This manual is specifically about
+@command{gccrs}.  For more information about the Rust programming
+language, see @uref{https//rust-lang.org}.
+
+@menu
+* Copying:: The GNU General Public License.
+* GNU Free Documentation License::
+How you can share and copy this manual.
+* Invoking gccrs::  How to run gccrs.
+* Index::   Index.
+@end menu
+
+
+@include gpl_v3.texi
+
+@include fdl.texi
+
+@node Invoking gccrs
+@chapter Invoking gccrs
+
+@c man title gccrs A GCC-based compiler for the Rust programming language
+
+@c man begin DESCRIPTION gccrs
+
+The @command{gccrs} command is a frontend to @command{gcc} and
+supports many of the same options, @xref{Option Summary, , Option
+Summary, gcc, Using the GNU Compiler Collection (GCC)}, as well
+as some @command{gccrs} specific ones. This manual only documents
+the options that are specific to @com

[PATCH 14.2 3/3] rust: Copy install-html rule from Ada frontend

2024-06-14 Thread Arthur Cohen

gcc/rust/ChangeLog:

* Make-lang.in: Add proper rust.install-html rule.
---
 gcc/rust/Make-lang.in | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index 2395541425f..815a8b315c0 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -375,7 +375,24 @@ selftest-rust-valgrind: $(RUST_SELFTEST_DEPS)
 rust.install-info:
 
 rust.install-dvi:
-rust.install-html:
+
+rust.install-html: $(build_htmldir)/rust
+   @$(NORMAL_INSTALL)
+   test -z "$(htmldir)" || $(mkinstalldirs) "$(DESTDIR)$(htmldir)"
+   @for p in $(build_htmldir)/rust; do \
+ if test -f "$$p" || test -d "$$p"; then d=""; else d="$(srcdir)/"; 
fi; \
+ f=$(html__strip_dir) \
+ if test -d "$$d$$p"; then \
+   echo " $(mkinstalldirs) '$(DESTDIR)$(htmldir)/$$f'"; \
+   $(mkinstalldirs) "$(DESTDIR)$(htmldir)/$$f" || exit 1; \
+   echo " $(INSTALL_DATA) '$$d$$p'/* '$(DESTDIR)$(htmldir)/$$f'"; \
+   $(INSTALL_DATA) "$$d$$p"/* "$(DESTDIR)$(htmldir)/$$f"; \
+ else \
+   echo " $(INSTALL_DATA) '$$d$$p' '$(DESTDIR)$(htmldir)/$$f'"; \
+   $(INSTALL_DATA) "$$d$$p" "$(DESTDIR)$(htmldir)/$$f"; \
+ fi; \
+   done
+
 rust.install-pdf:
 
 # Install man pages for the front end. This target should ignore errors.
-- 
2.42.0

[PATCH] Enhance if-conversion for automatic arrays

2024-06-14 Thread Richard Biener

Automatic arrays that are not address-taken should not be subject to
store data races.  This applies to OMP SIMD in-branch lowered
functions result array which for the testcase otherwise prevents
vectorization with SSE and for AVX and AVX512 ends up with spurious
.MASK_STORE to the stack surviving.

This inefficiency was noted in PR111793.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.  Is my
idea of store data races correct?  At least phiopt uses the same
check but for example LIM doesn't special-case locals.

PR tree-optimization/111793
* tree-if-conv.cc (ifcvt_memrefs_wont_trap): For stores
that do not trap only consider -fstore-data-races when
the underlying object is not automatic or has its address
taken.

* gcc.dg/vect/vect-simd-clone-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c | 16 
 gcc/tree-if-conv.cc| 13 +++--
 2 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
new file mode 100644
index 000..49c52fb59bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+
+#pragma omp declare simd simdlen(4) inbranch
+__attribute__((noinline)) int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target 
i?86-*-* x86_64-*-* } } } */
+/* if-conversion shouldn't need to resort to masked stores for the result
+   array created by OMP lowering since that's automatic and does not have
+   its address taken.  */
+/* { dg-final { scan-tree-dump-not "MASK_STORE" "vect" } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index c4c3ed41a44..974c614edf3 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -934,14 +934,23 @@ ifcvt_memrefs_wont_trap (gimple *stmt, 
vec drs)
   if (DR_IS_READ (a))
return true;
 
+  bool ok = flag_store_data_races;
+  base = get_base_address (base);
+  if (DECL_P (base)
+ && auto_var_in_fn_p (base, cfun->decl)
+ && ! may_be_aliased (base))
+   /* Automatic variables not aliased are not subject to
+  data races.  */
+   ok = true;
+
   /* an unconditionaly write won't trap if the base is written
  to unconditionally.  */
   if (base_master_dr
  && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
-   return flag_store_data_races;
+   return ok;
   /* or the base is known to be not readonly.  */
   else if (base_object_writable (DR_REF (a)))
-   return flag_store_data_races;
+   return ok;
 }
 
   return false;
-- 
2.35.3

Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-14 Thread Jonathan Wakely

On Fri, 14 Jun 2024 at 12:07, Frank Scheiner  wrote:
>
> Dear Jonathan, Jeff,
>
> On 13.06.24 12:33, Jonathan Wakely wrote:
> > On Wed, 12 Jun 2024 at 22:00, Frank Scheiner  wrote:
> >> Ok, I posted the results as created by contrib/test_summary now:
> >>
> >> 1. non-LRA version on [1]
> >>
> >> 2. LRA version on [2]
> >>
> >> [1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817267.html
> >>
> >> [2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-June/817268.html
> >
> > Thanks!
> >
> > These ones are probably due to non-reserved names in glibc or kernel 
> > headers:
> >
> > FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
> > FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
> > FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)
> >
> > The errors for all three are probably the same and should be
> > decipherable from libstdc++.log which will show which names defined as
> > macros in names.cc are clashing with names in system headers.
>
> Thanks for the hints. I looked into the log now and searched for the
> respective tests. And the log excerpts for all three failing tests (also
> attached with more context):
>
> ```
> FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
> Excess errors:
> /usr/include/bits/sigcontext.h:37: error: expected unqualified-id before
> ';' token
> /usr/include/bits/sigcontext.h:37: error: expected ')' before ';' token
> /usr/include/sys/ucontext.h:44: error: expected unqualified-id before
> ';' token
> /usr/include/sys/ucontext.h:44: error: expected ')' before ';' token
> ```
>
> ...point to two headers which are part of glibc 2.39 (w/ia64 support
> re-added):
>
> * /usr/include/bits/sigcontext.h:32-38:
> ```
> 32 struct __ia64_fpreg
> 33   {
> 34 union
> 35   {
> 36 unsigned long bits[2];
> 37   } u;
> 38   } __attribute__ ((__aligned__ (16)));
> ```
>
> * /usr/include/sys/ucontext.h:39-45:
> ```
>   39 struct __ia64_fpreg_mcontext
>   40   {
>   41 union
>   42   {
>   43 unsigned long __ctx(bits)[2];
>   44   } __ctx(u);
>   45   } __attribute__ ((__aligned__ (16)));
> ```

This is an ia64-specific glibc bug, because it means the following
valid C program is rejected:

#define _GNU_SOURCE
#define bits 111
#define u no u
#include 
int main() { }

[PATCH] analyzer: Fix g++ 4.8 bootstrap without using std::move to return std::unique_ptr

2024-06-14 Thread Jonathan Wakely

This builds successfully using both gcc-4.8.5 and gcc-13.3.1 as host
compiler.

OK for trunk if testing succeeds?

-- >8 --

Revert the changes in r15--ge22b7f741ab54f and fix bootstrap with
GCC 4.8 a different way. The original problem is not related to C++17
guaranteed copy elision, it's related to Core DR 1579 [1], which was
part of C++14 but only implemented in G++ as a C++11 DR with
r5-1576-gfb682f9458c6cf (so GCC 4.8 doesn't implement it).

The original fix causes -Wredundant-move warnings with GCC trunk.

[1] https://cplusplus.github.io/CWG/issues/1579.html

gcc/analyzer/ChangeLog
* constraint-manager.cc (equiv_class::make_dump_widget): Change
return type to match return value and do not use std::move on
return value.
(bounded_ranges_constraint::make_dump_widget): Likewise.
(constraint_manager::make_dump_widget): Likewise.
* constraint-manager.h (equiv_class::make_dump_widget): Change
return type.
(bounded_ranges_constraint::make_dump_widget): Likewise.
(constraint_manager::make_dump_widget): Likewise.
* program-state.cc (sm_state_map::make_dump_widget): Likewise.
(program_state::make_dump_widget): Likewise.
* program-state.h (sm_state_map::make_dump_widget): Likewise.
(program_state::make_dump_widget): Likewise.
* region-model.cc (region_to_value_map::make_dump_widget): Likewise.
(region_model::make_dump_widget): Likewise.
* region-model.h (region_to_value_map::make_dump_widget): Likewise.
(region_model::make_dump_widget): Likewise.
* region.cc (region::make_dump_widget): Likewise.
* region.h (region::make_dump_widget): Likewise.
* store.cc (binding_cluster::make_dump_widget): Likewise.
(store::make_dump_widget): Likewise.
* store.h (binding_cluster::make_dump_widget): Likewise.
(store::make_dump_widget): Likewise.
* svalue.cc (svalue::make_dump_widget): Likewise.
* svalue.h (svalue::make_dump_widget): Likewise.
---
 gcc/analyzer/constraint-manager.cc | 12 ++--
 gcc/analyzer/constraint-manager.h  |  6 +++---
 gcc/analyzer/program-state.cc  |  8 
 gcc/analyzer/program-state.h   |  4 ++--
 gcc/analyzer/region-model.cc   |  8 
 gcc/analyzer/region-model.h|  4 ++--
 gcc/analyzer/region.cc |  4 ++--
 gcc/analyzer/region.h  |  2 +-
 gcc/analyzer/store.cc  |  8 
 gcc/analyzer/store.h   |  4 ++--
 gcc/analyzer/svalue.cc |  4 ++--
 gcc/analyzer/svalue.h  |  2 +-
 12 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/gcc/analyzer/constraint-manager.cc 
b/gcc/analyzer/constraint-manager.cc
index a9d58c9cdcf..29539060ebd 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -1146,7 +1146,7 @@ equiv_class::to_json () const
   return ec_obj;
 }
 
-std::unique_ptr
+std::unique_ptr
 equiv_class::make_dump_widget (const text_art::dump_widget_info &dwi,
   unsigned id) const
 {
@@ -1176,7 +1176,7 @@ equiv_class::make_dump_widget (const 
text_art::dump_widget_info &dwi,
   ec_widget->add_child (tree_widget::make (dwi, &pp));
 }
 
-  return std::move (ec_widget);
+  return ec_widget;
 }
 
 /* Generate a hash value for this equiv_class.
@@ -1491,7 +1491,7 @@ bounded_ranges_constraint::to_json () const
   return con_obj;
 }
 
-std::unique_ptr
+std::unique_ptr
 bounded_ranges_constraint::
 make_dump_widget (const text_art::dump_widget_info &dwi) const
 {
@@ -1500,7 +1500,7 @@ make_dump_widget (const text_art::dump_widget_info &dwi) 
const
 (tree_widget::from_fmt (dwi, nullptr,
"ec%i bounded ranges", m_ec_id.as_int ()));
   m_ranges->add_to_dump_widget (*brc_widget.get (), dwi);
-  return std::move (brc_widget);
+  return brc_widget;
 }
 
 bool
@@ -1829,7 +1829,7 @@ constraint_manager::to_json () const
   return cm_obj;
 }
 
-std::unique_ptr
+std::unique_ptr
 constraint_manager::make_dump_widget (const text_art::dump_widget_info &dwi) 
const
 {
   using text_art::tree_widget;
@@ -1853,7 +1853,7 @@ constraint_manager::make_dump_widget (const 
text_art::dump_widget_info &dwi) con
   if (cm_widget->get_num_children () == 0)
 return nullptr;
 
-  return std::move (cm_widget);
+  return cm_widget;
 }
 
 /* Attempt to add the constraint LHS OP RHS to this constraint_manager.
diff --git a/gcc/analyzer/constraint-manager.h 
b/gcc/analyzer/constraint-manager.h
index 31556aebc7a..81e9c7ec035 100644
--- a/gcc/analyzer/constraint-manager.h
+++ b/gcc/analyzer/constraint-manager.h
@@ -273,7 +273,7 @@ public:
 
   json::object *to_json () const;
 
-  std::unique_ptr
+  std::unique_ptr
   make_dump_widget (const text_art::dump_widget_info &dwi,
unsigned id) const;
 
@@ -408,7 +408,7 @@ public:
 
   void add_to_hash (inchash::hash *hstate) const;
 
-  std::unique_ptr
+  std::un

Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-14 Thread Frank Scheiner


On 14.06.24 14:53, Jonathan Wakely wrote:

On Fri, 14 Jun 2024 at 12:07, Frank Scheiner  wrote:

...point to two headers which are part of glibc 2.39 (w/ia64 support
re-added):

* /usr/include/bits/sigcontext.h:32-38:
```
32 struct __ia64_fpreg
33   {
34 union
35   {
36 unsigned long bits[2];
37   } u;
38   } __attribute__ ((__aligned__ (16)));
```

* /usr/include/sys/ucontext.h:39-45:
```
   39 struct __ia64_fpreg_mcontext
   40   {
   41 union
   42   {
   43 unsigned long __ctx(bits)[2];
   44   } __ctx(u);
   45   } __attribute__ ((__aligned__ (16)));
```


This is an ia64-specific glibc bug, because it means the following
valid C program is rejected:

#define _GNU_SOURCE
#define bits 111
#define u no u
#include 
int main() { }


Ok, I see. Should this then be "worked around" like in [1] with
something like:

```
#if defined (__linux__) && defined (__ia64__)
//  defines __ia64_fpreg_mcontext::u
#undef u
#endif
```

...in `libstdc++-v3/testsuite/17_intro/names.cc` or should we fix it in
glibc instead?

[1]:
https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=cf5f7791056b3ed993bc8024be767a86157514a9

Cheers,
Frank

RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-06-14 Thread Li, Pan2

Hi Richard,

Here is one PR related to this patch (by git bisect), details as below.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458

I am still trying to narrow down which change caused this failure or any hints 
here?

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 15, 2024 5:39 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

The following removes the profile based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location which
the profile based heuristic should have guaranteed as well.

To avoid regressing this moves the empty-latch check to cover all
sink cases.

It also stream-lines the resulting select_best_block a bit but avoids
adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
starts execute failing with this on x86_64 with -m32 because the
(float)i * 9....e-7 compute is sunk across a STOP causing it
to be no longer spilled and thus the compare failing due to excess
precision.  The patch adds -ffloat-store to avoid this, following
other similar testcases.

This change doesn't fix the testcase in the PR on itself.

Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.

PR tree-optimization/114589
* tree-ssa-sink.cc (select_best_block): Remove profile-based
heuristics.  Instead reject sink locations that sink
to post-dominators.  Move empty latch check here from
statement_sink_location.  Also consider early_bb for the
loop depth check.
(statement_sink_location): Remove superfluous check.  Remove
empty latch check.
(pass_sink_code::execute): Compute/release post-dominators.

* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
excess precision when not spilling.
---
 gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
 gcc/tree-ssa-sink.cc | 62 
 2 files changed, 20 insertions(+), 43 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 
b/gcc/testsuite/gfortran.dg/streamio_9.f90
index b6bddb973f8..f29ded6ba54 100644
--- a/gcc/testsuite/gfortran.dg/streamio_9.f90
+++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-options "-ffloat-store" }
 ! PR29053 Stream IO test 9.
 ! Contributed by Jerry DeLisle .
 ! Test case derived from that given in PR by Steve Kargl.
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2f90acb7ef4..2188b7523c7 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
 
We want the most control dependent block in the shallowest loop nest.
 
-   If the resulting block is in a shallower loop nest, then use it.  Else
-   only use the resulting block if it has significantly lower execution
-   frequency than EARLY_BB to avoid gratuitous statement movement.  We
-   consider statements with VOPS more desirable to move.
-
-   This pass would obviously benefit from PDO as it utilizes block
-   frequencies.  It would also benefit from recomputing frequencies
-   if profile data is not available since frequencies often get out
-   of sync with reality.  */
+   If the resulting block is in a shallower loop nest, then use it.  */
 
 static basic_block
 select_best_block (basic_block early_bb,
@@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
 {
+  /* Walk up the dominator tree, hopefully we'll find a shallower
+loop nest.  */
+  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+
   /* If we've moved into a lower loop nest, then that becomes
 our best block.  */
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
-
-  /* Walk up the dominator tree, hopefully we'll find a shallower
-loop nest.  */
-  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
 
   /* Placing a statement before a setjmp-like function would be invalid
@@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
 return best_bb;
 
+  /* Do not move stmts to post-dominating places on the same loop depth.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
+return early_bb;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+ something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+  && empty_block_p (best_bb))
+return early_bb;
+
   /* Avoid turning an unconditional read into a conditional one when we
  still might want to perform vectorization.  */
   if (best_bb->loop_father == early_bb->loop_father
@@ -233,28 +234,7 @@ se

RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-06-14 Thread Richard Biener

On Fri, 14 Jun 2024, Li, Pan2 wrote:

> Hi Richard,
> 
> Here is one PR related to this patch (by git bisect), details as below.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
> 
> I am still trying to narrow down which change caused this failure or any 
> hints here?

It definitely looks like a latent issue being triggered.  Either in LRA
or in how the target presents itself.

Richard.

> Pan
> 
> -Original Message-
> From: Richard Biener  
> Sent: Wednesday, May 15, 2024 5:39 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/114589 - remove profile based sink 
> heuristics
> 
> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9....e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.
> 
>   PR tree-optimization/114589
>   * tree-ssa-sink.cc (select_best_block): Remove profile-based
>   heuristics.  Instead reject sink locations that sink
> to post-dominators.  Move empty latch check here from
>   statement_sink_location.  Also consider early_bb for the
>   loop depth check.
>   (statement_sink_location): Remove superfluous check.  Remove
>   empty latch check.
>   (pass_sink_code::execute): Compute/release post-dominators.
> 
>   * gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
>   excess precision when not spilling.
> ---
>  gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
>  gcc/tree-ssa-sink.cc | 62 
>  2 files changed, 20 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 
> b/gcc/testsuite/gfortran.dg/streamio_9.f90
> index b6bddb973f8..f29ded6ba54 100644
> --- a/gcc/testsuite/gfortran.dg/streamio_9.f90
> +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-options "-ffloat-store" }
>  ! PR29053 Stream IO test 9.
>  ! Contributed by Jerry DeLisle .
>  ! Test case derived from that given in PR by Steve Kargl.
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2f90acb7ef4..2188b7523c7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>  
> We want the most control dependent block in the shallowest loop nest.
>  
> -   If the resulting block is in a shallower loop nest, then use it.  Else
> -   only use the resulting block if it has significantly lower execution
> -   frequency than EARLY_BB to avoid gratuitous statement movement.  We
> -   consider statements with VOPS more desirable to move.
> -
> -   This pass would obviously benefit from PDO as it utilizes block
> -   frequencies.  It would also benefit from recomputing frequencies
> -   if profile data is not available since frequencies often get out
> -   of sync with reality.  */
> +   If the resulting block is in a shallower loop nest, then use it.  */
>  
>  static basic_block
>  select_best_block (basic_block early_bb,
> @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
>  {
>basic_block best_bb = late_bb;
>basic_block temp_bb = late_bb;
> -  int threshold;
>  
>while (temp_bb != early_bb)
>  {
> +  /* Walk up the dominator tree, hopefully we'll find a shallower
> +  loop nest.  */
> +  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> +
>/* If we've moved into a lower loop nest, then that becomes
>our best block.  */
>if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>   best_bb = temp_bb;
> -
> -  /* Walk up the dominator tree, hopefully we'll find a shallower
> -  loop nest.  */
> -  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>  }
>  
>/* Placing a statement before a setjmp-like function would be invalid
> @@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
>if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
>  return best_bb;
>  
> +  /* Do not move stmts to post-dominating places on the same loop depth.  */
> +  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
> +return early_bb;
> +
> +  /* If the latch block is empty, do

RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-06-14 Thread Li, Pan2

> It definitely looks like a latent issue being triggered.  Either in LRA
> or in how the target presents itself.

Thanks Richard, will have a try and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 14, 2024 9:11 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH] tree-optimization/114589 - remove profile based sink 
heuristics

On Fri, 14 Jun 2024, Li, Pan2 wrote:

> Hi Richard,
> 
> Here is one PR related to this patch (by git bisect), details as below.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
> 
> I am still trying to narrow down which change caused this failure or any 
> hints here?

It definitely looks like a latent issue being triggered.  Either in LRA
or in how the target presents itself.

Richard.

> Pan
> 
> -Original Message-
> From: Richard Biener  
> Sent: Wednesday, May 15, 2024 5:39 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/114589 - remove profile based sink 
> heuristics
> 
> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9....e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.
> 
>   PR tree-optimization/114589
>   * tree-ssa-sink.cc (select_best_block): Remove profile-based
>   heuristics.  Instead reject sink locations that sink
> to post-dominators.  Move empty latch check here from
>   statement_sink_location.  Also consider early_bb for the
>   loop depth check.
>   (statement_sink_location): Remove superfluous check.  Remove
>   empty latch check.
>   (pass_sink_code::execute): Compute/release post-dominators.
> 
>   * gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
>   excess precision when not spilling.
> ---
>  gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
>  gcc/tree-ssa-sink.cc | 62 
>  2 files changed, 20 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 
> b/gcc/testsuite/gfortran.dg/streamio_9.f90
> index b6bddb973f8..f29ded6ba54 100644
> --- a/gcc/testsuite/gfortran.dg/streamio_9.f90
> +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-options "-ffloat-store" }
>  ! PR29053 Stream IO test 9.
>  ! Contributed by Jerry DeLisle .
>  ! Test case derived from that given in PR by Steve Kargl.
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2f90acb7ef4..2188b7523c7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>  
> We want the most control dependent block in the shallowest loop nest.
>  
> -   If the resulting block is in a shallower loop nest, then use it.  Else
> -   only use the resulting block if it has significantly lower execution
> -   frequency than EARLY_BB to avoid gratuitous statement movement.  We
> -   consider statements with VOPS more desirable to move.
> -
> -   This pass would obviously benefit from PDO as it utilizes block
> -   frequencies.  It would also benefit from recomputing frequencies
> -   if profile data is not available since frequencies often get out
> -   of sync with reality.  */
> +   If the resulting block is in a shallower loop nest, then use it.  */
>  
>  static basic_block
>  select_best_block (basic_block early_bb,
> @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
>  {
>basic_block best_bb = late_bb;
>basic_block temp_bb = late_bb;
> -  int threshold;
>  
>while (temp_bb != early_bb)
>  {
> +  /* Walk up the dominator tree, hopefully we'll find a shallower
> +  loop nest.  */
> +  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> +
>/* If we've moved into a lower loop nest, then that becomes
>our best block.  */
>if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>   best_bb = temp_bb;
> -
> -  /* Walk up the dominator tree, hopefully we'll find a shallower
> -  loop nest.  */
> -  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>  }
>  
>/* Placing a statement before a setjmp-like fu

Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-14 Thread Jonathan Wakely

On Fri, 14 Jun 2024 at 14:07, Frank Scheiner  wrote:
>
> On 14.06.24 14:53, Jonathan Wakely wrote:
> > On Fri, 14 Jun 2024 at 12:07, Frank Scheiner  wrote:
> >> ...point to two headers which are part of glibc 2.39 (w/ia64 support
> >> re-added):
> >>
> >> * /usr/include/bits/sigcontext.h:32-38:
> >> ```
> >> 32 struct __ia64_fpreg
> >> 33   {
> >> 34 union
> >> 35   {
> >> 36 unsigned long bits[2];
> >> 37   } u;
> >> 38   } __attribute__ ((__aligned__ (16)));
> >> ```
> >>
> >> * /usr/include/sys/ucontext.h:39-45:
> >> ```
> >>39 struct __ia64_fpreg_mcontext
> >>40   {
> >>41 union
> >>42   {
> >>43 unsigned long __ctx(bits)[2];
> >>44   } __ctx(u);
> >>45   } __attribute__ ((__aligned__ (16)));
> >> ```
> >
> > This is an ia64-specific glibc bug, because it means the following
> > valid C program is rejected:
> >
> > #define _GNU_SOURCE
> > #define bits 111
> > #define u no u
> > #include 
> > int main() { }
>
> Ok, I see. Should this then be "worked around" like in [1] with
> something like:
>
> ```
> #if defined (__linux__) && defined (__ia64__)
> //  defines __ia64_fpreg_mcontext::u
> #undef u
> #endif
> ```
>
> ...in `libstdc++-v3/testsuite/17_intro/names.cc` or should we fix it in
> glibc instead?

Both, ideally. The libstdc++ test should definitely be fixed because
it fails with released versions of glibc already in the wild. But
glibc should also be fixed because it's a standards conformance issue.


>
> [1]:
> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=cf5f7791056b3ed993bc8024be767a86157514a9
>
> Cheers,
> Frank
>

Re: [PING^2] Re: [PATCH v7 1/9] Improve must tail in RTL backend

2024-06-14 Thread Andi Kleen

Andi Kleen  writes:

PING^2

> Need reviewers for the tree and middle-end parts, as well as the C frontend.
>
> Thanks!
>
> -Andi

Re: [PATCH v4] c++: fix constained auto deduction in templ spec scopes [PR114915]

2024-06-14 Thread sska1377

On Wed, 2024-05-22 at 16:30 -0400, Jason Merrill wrote:
> OK, on the right patch this time I hope.
> 
> Looks like you still need either FSF copyright assignment or DCO 
> certification per https://gcc.gnu.org/contribute.html#legal
> 

Hi. Thanks for your patience. I now have the FSF copyright assignment.

> On 5/15/24 13:27, Seyed Sajad Kahani wrote:
> > This patch resolves PR114915 by replacing the logic that fills in
> > the
> > missing levels in do_auto_deduction in cp/pt.cc.
> 
> I miss the text in your original patch that explained the problem
> more.
> 

The patch that I will send in the upcoming email contains the
explanation from the v1 patch as well.

> > The new approach now trims targs if the depth of targs is deeper
> > than desired
> > (this will only happen in specific contexts), and still fills targs
> > with empty
> > layers if it has fewer depths than expected.
> > 
> > PR c++/114915
> 
> This line needs to start with a tab.
> 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (do_auto_deduction): Handle excess outer template
> > arguments during constrained auto satisfaction.
> 
> This one, too.  These issues are flagged by git gcc-verify, and are 
> easier to avoid with git gcc-commit-mklog.
> 

Thanks for guiding me. The commit is now verified.

> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-placeholder14.C: New test.
> > * g++.dg/cpp2a/concepts-placeholder15.C: New test.
> 
> This test still needs a variable template partial specialization.
> 

Regarding this, I have added a simple variable template parameter
specialization to the test, but due to PR c++/115030, which I will be
working on right after this patch, a more complex test will fail.

> A few coding style nits below.
> 
> > * g++.dg/cpp2a/concepts-placeholder16.C: New test.
> > ---
> >   gcc/cp/pt.cc  | 20 ---
> >   .../g++.dg/cpp2a/concepts-placeholder14.C | 19 +++
> >   .../g++.dg/cpp2a/concepts-placeholder15.C | 15 +
> >   .../g++.dg/cpp2a/concepts-placeholder16.C | 33
> > +++
> >   4 files changed, 83 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-
> > placeholder14.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-
> > placeholder15.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-
> > placeholder16.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 32640f8e9..ecfda67aa 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -31253,6 +31253,19 @@ do_auto_deduction (tree type, tree init,
> > tree auto_node,
> >     full_targs = add_outermost_template_args (tmpl,
> > full_targs);
> >     full_targs = add_to_template_args (full_targs, targs);
> >   
> > +  int want = TEMPLATE_TYPE_ORIG_LEVEL (auto_node);
> > +  int have = TMPL_ARGS_DEPTH (full_targs);
> > +
> > +  if (want < have)
> > +   {
> > +     // if a constrained auto is declared in an explicit
> > specialization
> 
> We generally use C-style /* */ comments, that start with a capital 
> letter and end with a period.
> 
> > +     gcc_assert (context == adc_variable_type || context ==
> > adc_return_type
> > +     || context == adc_decomp_type);
> 
> The || should line up with the 'c' on the previous line.
> 
> > +     tree trimmed_full_args = get_innermost_template_args
> > +       (full_targs, want);
> 
> We try to avoid having arguments to the left of the function name;
> here 
> I'd start the new line with the = instead.
> 
> > +     full_targs = trimmed_full_args;
> > +   }
> > +
> 
> Unnecessary tab on this line.
> 
> >     /* HACK: Compensate for callers not always communicating
> > all levels of
> >      outer template arguments by filling in the outermost
> > missing levels
> >      with dummy levels before checking satisfaction.  We'll
> > still crash
> > @@ -31260,11 +31273,10 @@ do_auto_deduction (tree type, tree init,
> > tree auto_node,
> >      these missing levels, but this hack otherwise allows us
> > to handle a
> >      large subset of possible constraints (including all non-
> > dependent
> >      constraints).  */
> > -  if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL
> > (auto_node)
> > -   - TMPL_ARGS_DEPTH (full_targs)))
> > +  if (want > have)
> >     {
> > -     tree dummy_levels = make_tree_vec (missing_levels);
> > -     for (int i = 0; i < missing_levels; ++i)
> > +     tree dummy_levels = make_tree_vec (want - have);
> > +     for (int i = 0; i < want - have; ++i)
> >         TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0);
> >       full_targs = add_to_template_args (dummy_levels,
> > full_targs);
> >     }
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
> > b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
> > new file mode 100644
> > index 0..fcdbd7608
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
> > @@ -0,0 +1

[PATCH v5] c++: fix constained auto deduction in templ spec scopes [PR114915]

2024-06-14 Thread Seyed Sajad Kahani

When deducing auto for `adc_return_type`, `adc_variable_type`, and
`adc_decomp_type` contexts (at the usage time), we try to resolve the outermost
template arguments to be used for satisfaction. This is done by one of the
following, depending on the scope:

1. Checking the `DECL_TEMPLATE_INFO` of the current function scope and
extracting DECL_TI_ARGS from it for function scope deductions (pt.cc:31236).
2. Checking the `DECL_TEMPLATE_INFO` of the declaration (alongside with other
conditions) for non-function scope variable declaration deductions
(decl.cc:8527).

Then, we do not retrieve the deeper layers of the template arguments; instead,
we fill the missing levels with dummy levels (pt.cc:31260).

The problem (that is shown in PR114915) is that we do not consider the case
where the deduction happens in a template specialization scope. In this case,
the type is not dependent on the outermost template arguments (which are
the specialization arguments). Yet, we still resolve the outermost template
arguments, and then the number of layers in the template arguments exceeds the
number of levels in the type. This causes the missing levels to be negative.
This leads to the rejection of valid code and ICEs (like segfault) in the
release mode. In the debug mode, it is possible to show as an assertion failure
(when creating a tree_vec with a negative size).

This patch resolves PR114915 by replacing the logic that fills in the
missing levels in do_auto_deduction in cp/pt.cc.
The new approach now trims targs if the depth of targs is deeper than desired
(this will only happen in specific contexts), and still fills targs with empty
layers if it has fewer depths than expected.

PR c++/114915

gcc/cp/ChangeLog:

* pt.cc (do_auto_deduction): Handle excess outer template
arguments during constrained auto satisfaction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-placeholder14.C: New test.
* g++.dg/cpp2a/concepts-placeholder15.C: New test.
* g++.dg/cpp2a/concepts-placeholder16.C: New test.
---
 gcc/cp/pt.cc  | 20 ---
 .../g++.dg/cpp2a/concepts-placeholder14.C | 19 +++
 .../g++.dg/cpp2a/concepts-placeholder15.C | 26 +++
 .../g++.dg/cpp2a/concepts-placeholder16.C | 33 +++
 4 files changed, 94 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder16.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 32640f8e9..2206d9ffe 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -31253,6 +31253,19 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
full_targs = add_outermost_template_args (tmpl, full_targs);
   full_targs = add_to_template_args (full_targs, targs);
 
+  int want = TEMPLATE_TYPE_ORIG_LEVEL (auto_node);
+  int have = TMPL_ARGS_DEPTH (full_targs);
+
+  if (want < have)
+   {
+ /* If a constrained auto is declared in an explicit specialization. */
+ gcc_assert (context == adc_variable_type || context == adc_return_type
+ || context == adc_decomp_type);
+ tree trimmed_full_args 
+   = get_innermost_template_args (full_targs, want);
+ full_targs = trimmed_full_args;
+   }
+
   /* HACK: Compensate for callers not always communicating all levels of
 outer template arguments by filling in the outermost missing levels
 with dummy levels before checking satisfaction.  We'll still crash
@@ -31260,11 +31273,10 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
 these missing levels, but this hack otherwise allows us to handle a
 large subset of possible constraints (including all non-dependent
 constraints).  */
-  if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL (auto_node)
-   - TMPL_ARGS_DEPTH (full_targs)))
+  if (want > have)
{
- tree dummy_levels = make_tree_vec (missing_levels);
- for (int i = 0; i < missing_levels; ++i)
+ tree dummy_levels = make_tree_vec (want - have);
+ for (int i = 0; i < want - have; ++i)
TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0);
  full_targs = add_to_template_args (dummy_levels, full_targs);
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
new file mode 100644
index 0..fcdbd7608
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
@@ -0,0 +1,19 @@
+// PR c++/114915
+// { dg-do compile { target c++20 } }
+
+template
+concept C = __is_same(T, int);
+
+template
+void f() {
+}
+
+template<>
+void f() {
+  C auto x = 1;
+}
+
+int main() {
+  f();
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholde

RE: [PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-14 Thread Li, Pan2

Committed with those changes and test suites passed.

Pan

-Original Message-
From: Li, Pan2 
Sent: Friday, June 14, 2024 4:15 PM
To: Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

Thanks Richard for comments.

> :c shouldn't be necessary on the plus
> or on the bit_xor
> OK with those changes.

Will remove the :c and commit it if there is no surprise from test suites.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, June 14, 2024 4:05 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support more forms for the scalar unsigned 
.SAT_SUB

On Wed, Jun 12, 2024 at 2:38 PM  wrote:
>
> From: Pan Li 
>
> After we support the scalar unsigned form 1 and 2,  we would like
> to introduce more forms include the branch and branchless.  There
> are forms 3-10 list as below:
>
> Form 3:
>   #define SAT_SUB_U_3(T) \
>   T sat_sub_u_3_##T (T x, T y) \
>   { \
> return x > y ? x - y : 0; \
>   }
>
> Form 4:
>   #define SAT_SUB_U_4(T) \
>   T sat_sub_u_4_##T (T x, T y) \
>   { \
> return x >= y ? x - y : 0; \
>   }
>
> Form 5:
>   #define SAT_SUB_U_5(T) \
>   T sat_sub_u_5_##T (T x, T y) \
>   { \
> return x < y ? 0 : x - y; \
>   }
>
> Form 6:
>   #define SAT_SUB_U_6(T) \
>   T sat_sub_u_6_##T (T x, T y) \
>   { \
> return x <= y ? 0 : x - y; \
>   }
>
> Form 7:
>   #define SAT_SUB_U_7(T) \
>   T sat_sub_u_7_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)(overflow - 1); \
>   }
>
> Form 8:
>   #define SAT_SUB_U_8(T) \
>   T sat_sub_u_8_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return ret & (T)-(!overflow); \
>   }
>
> Form 9:
>   #define SAT_SUB_U_9(T) \
>   T sat_sub_u_9_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Form 10:
>   #define SAT_SUB_U_10(T) \
>   T sat_sub_u_10_##T (T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_sub_overflow (x, y, &ret); \
> return !overflow ? ret : 0; \
>   }
>
> Take form 10 as example:
>
> SAT_SUB_U_10(uint64_t);
>
> Before this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   unsigned char _2;
>   uint8_t _3;
>   __complex__ unsigned char _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 == 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   3
> ;;4
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;3
>   # _3 = PHI <0(2), _1(3)>
>   return _3;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
> {
>   uint8_t _3;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
>   return _3;
> ;;succ:   EXIT
>
> }
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add more match for unsigned sat_sub.
> * tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
> func impl to match phi node for .SAT_SUB.
> (math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
> for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 25 +++--
>  gcc/tree-ssa-math-opts.cc | 33 +
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cfe81e80b3..66e411b3359 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
>  /* Unsigned saturation sub, case 2 (branch with ge):
> SAT_U_SUB = X >= Y ? X - Y : 0.  */
>  (match (unsigned_integer_sat_sub @0 @1)
> - (cond (ge @0 @1) (minus @0 @1) integer_zerop)
> + (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& typ

Re: [PATCH] analyzer: Fix g++ 4.8 bootstrap without using std::move to return std::unique_ptr

2024-06-14 Thread David Malcolm

On Fri, 2024-06-14 at 13:57 +0100, Jonathan Wakely wrote:
> This builds successfully using both gcc-4.8.5 and gcc-13.3.1 as host
> compiler.
> 
> OK for trunk if testing succeeds?

Yes, thanks!

Dave

[committed] libstdc++: Fix declaration of posix_memalign for freestanding

2024-06-14 Thread Jonathan Wakely

Tested x86_64-linux with --disable-hosted-libstdcxx where it is needed
to fix the build.

Pushed to trunk, backports needed too.

-- >8 --

Thanks to Jérôme Duval for noticing this.

libstdc++-v3/ChangeLog:

* libsupc++/new_opa.cc [!_GLIBCXX_HOSTED]: Fix declaration of
posix_memalign.
---
 libstdc++-v3/libsupc++/new_opa.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/libsupc++/new_opa.cc 
b/libstdc++-v3/libsupc++/new_opa.cc
index 35606e1c1b3..c7c7e794623 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -47,7 +47,7 @@ using std::size_t;
 extern "C"
 {
 # if _GLIBCXX_HAVE_POSIX_MEMALIGN
-  void *posix_memalign(void **, size_t alignment, size_t size);
+  int posix_memalign(void **, size_t alignment, size_t size);
 # elif _GLIBCXX_HAVE_ALIGNED_ALLOC
   void *aligned_alloc(size_t alignment, size_t size);
 # elif _GLIBCXX_HAVE__ALIGNED_MALLOC
-- 
2.45.1

[Fortran, Patch, PR 96992] Fix Class arrays of different ranks are rejected as storage association argument

2024-06-14 Thread Andre Vehreschild

Hi all,

I somehow got assigned to this PR so I fixed it. GFortran was ICEing because of
the ASSUME_RANK in a derived to class conversion. After fixing this, storage
association was producing segfaults. The "shape conversion" of the class array
as dummy argument was not initializing the dim 0 stride and with that grabbing
into the memory somewhere. This is now fixed and

regtests fine on x86_64 Fedora 39. Ok for mainline?

I assume this patch could be fixing some other PRs with class array's parameter
passing, too. If that sounds familiar, feel free to point me to them.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 86ac3179e1314ca1c41f52025c5a156ad7346dc1 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 14 Jun 2024 16:54:37 +0200
Subject: [PATCH] Fortran: [PR96992] Fix rejecting class arrays of different
 ranks as storage association argument.

Removing the assert in trans-expr, lead to initial strides not set
which is not fixed.

	PR fortran/96992

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_trans_array_bounds): Set a starting
	stride, when descriptor expects a variable for the stride.
	(gfc_trans_dummy_array_bias): Allow storage association for
	dummy class arrays, when they are not elemental.
	* trans-expr.cc (gfc_conv_derived_to_class): Remove assert to
	allow converting derived to class type arrays with assumend
	rank.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pr96992.f90: New test.
---
 gcc/fortran/trans-array.cc|  7 ++-
 gcc/fortran/trans-expr.cc |  2 -
 gcc/testsuite/gfortran.dg/pr96992.f90 | 61 +++
 3 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr96992.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index b3088a892c8..9fa8bad2f35 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -6798,6 +6798,9 @@ gfc_trans_array_bounds (tree type, gfc_symbol * sym, tree * poffset,

   size = gfc_index_one_node;
   offset = gfc_index_zero_node;
+  stride = GFC_TYPE_ARRAY_STRIDE (type, 0);
+  if (stride && VAR_P (stride))
+gfc_add_modify (pblock, stride, gfc_index_one_node);
   for (dim = 0; dim < as->rank; dim++)
 {
   /* Evaluate non-constant array bound expressions.
@@ -7134,7 +7137,9 @@ gfc_trans_dummy_array_bias (gfc_symbol * sym, tree tmpdesc,
   || (is_classarray && CLASS_DATA (sym)->attr.allocatable))
 return;

-  if (!is_classarray && sym->attr.dummy && gfc_is_nodesc_array (sym))
+  if ((!is_classarray
+   || (is_classarray && CLASS_DATA (sym)->as->type == AS_EXPLICIT))
+  && sym->attr.dummy && !sym->attr.elemental && gfc_is_nodesc_array (sym))
 {
   gfc_trans_g77_array (sym, block);
   return;
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 0796fb75505..4bb62cfb1ad 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -903,8 +903,6 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,

 	  if (e->rank != class_ts.u.derived->components->as->rank)
 	{
-	  gcc_assert (class_ts.u.derived->components->as->type
-			  == AS_ASSUMED_RANK);
 	  if (derived_array
 		  && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (parmse->expr)))
 		{
diff --git a/gcc/testsuite/gfortran.dg/pr96992.f90 b/gcc/testsuite/gfortran.dg/pr96992.f90
new file mode 100644
index 000..c56ed80f394
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96992.f90
@@ -0,0 +1,61 @@
+! { dg-do run }
+
+! PR fortran/96992
+
+! Contributed by Thomas Koenig  
+
+! From the standard:
+! An actual argument that represents an element sequence and
+! corresponds to a dummy argument that is an array is sequence
+! associated with the dummy argument. The rank and shape of the
+! actual argument need not agree with the rank and shape of the
+! dummy argument, but the number of elements in the dummy argument
+! shall not exceed the number of elements in the element sequence
+! of the actual argument. If the dummy argument is assumed-size,
+! the number of elements in the dummy argument is exactly
+! the number of elements in the element sequence.
+
+! Check that walking the sequence starts with an initialized stride
+! for dim == 0.
+
+module foo_mod
+  implicit none
+  type foo
+ integer :: i
+  end type foo
+contains
+  subroutine d1(x,n)
+integer, intent(in) :: n
+integer :: i
+class (foo), intent(out), dimension(n) :: x
+select type(x)
+class is(foo)
+   x(:)%i = (/ (42 + i, i = 1, n ) /)
+class default
+   stop 1
+end select
+  end subroutine d1
+  subroutine d2(x,n)
+integer, intent(in) :: n
+integer :: i
+class (foo), intent(in), dimension(n,n,n) :: x
+select type (x)
+class is (foo)
+   print *,x%i
+   if ( any( x%i /= reshape((/ (42 + i, i = 1, n ** 3 ) /), [n, n, n] ))) stop 2
+class default
+   stop 3
+end select
+  end subroutine d2
+end module foo_mod
+program main
+  use foo_mod
+  i

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Hongtao Liu

On Fri, Jun 14, 2024 at 6:31 PM Richard Biener  wrote:
>
> The following retires vcond{,u,eq} optabs by stopping to use them
> from the middle-end.  Targets instead (should) implement vcond_mask
> and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> possibly affected targets - those implementing these patterns,
> and in particular it lists mips, sparc and ia64 as targets that
> most definitely will regress while others might simply remove
> their vcond{,u,eq} patterns.
>
> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> I know riscv doesn't implement any of the legacy optabs.  But less
> maintained vector targets might need adjustments.
>
At GCC14, I tried to remove these expanders in the x86 backend, and it
regressed some testcases, mainly because of the optimizations we did
in ix86_expand_{int,fp}_vcond.
I've started testing your patch, it's possible that we still need to
move the ix86_expand_{int,fp}_vcond optimizations to the
middle-end(isel or match.pd)or add extra patterns to handle it at the
rtl pas_combine.

-- 
BR,
Hongtao

[committed] libstdc++: Make std::type_info::operator== always_inline for C++23 [PR110572]

2024-06-14 Thread Jonathan Wakely

Tested x86_64-linux, and lightly tested on x86_64-w64-mingw32 too, where
it fixes the linker error.

Pushed to trunk. Backports needed for 12, 13 and 14.

-- >8 --

Commit r12-6266-g3633cc54284450 implemented P1328 for C++23, making
std::type_info::operator== usable in constant expressions. For targets
such as mingw-w64 where that function was not previously inline, making
it constexpr required making it inline for C++23 and later. For
statically linked programs this can result in multiple definition
errors, because there's a non-inline definition in libstdc++.a as well.

For those targets make it always_inline for C++23, so that there is no
symbol generated for the inline definition, and the non-inline
definition in libstdc++.a will be the only definition.

libstdc++-v3/ChangeLog:

PR libstdc++/110572
* libsupc++/typeinfo (type_info::operator==): Add always_inline
attribute for targets where the ABI requries equality to be
non-inline.
* testsuite/18_support/type_info/110572.cc: New test.
---
 libstdc++-v3/libsupc++/typeinfo   |  3 +++
 libstdc++-v3/testsuite/18_support/type_info/110572.cc | 11 +++
 2 files changed, 14 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/18_support/type_info/110572.cc

diff --git a/libstdc++-v3/libsupc++/typeinfo b/libstdc++-v3/libsupc++/typeinfo
index fcc3077d060..35e72bb18ee 100644
--- a/libstdc++-v3/libsupc++/typeinfo
+++ b/libstdc++-v3/libsupc++/typeinfo
@@ -188,6 +188,9 @@ namespace std
 #endif
 
 #if __GXX_TYPEINFO_EQUALITY_INLINE || __cplusplus > 202002L
+# if ! __GXX_TYPEINFO_EQUALITY_INLINE
+  [[__gnu__::__always_inline__]]
+# endif
   _GLIBCXX23_CONSTEXPR inline bool
   type_info::operator==(const type_info& __arg) const _GLIBCXX_NOEXCEPT
   {
diff --git a/libstdc++-v3/testsuite/18_support/type_info/110572.cc 
b/libstdc++-v3/testsuite/18_support/type_info/110572.cc
new file mode 100644
index 000..64081879b77
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/type_info/110572.cc
@@ -0,0 +1,11 @@
+// { dg-options "-static-libstdc++" }
+// { dg-require-static-libstdcxx }
+// { dg-require-cpp-feature-test __cpp_rtti }
+// { dg-do link }
+
+#include 
+
+int main()
+{
+  return typeid(0) == typeid(0u);
+}
-- 
2.45.1

Re: [PATCH] Enhance if-conversion for automatic arrays

2024-06-14 Thread Andrew Pinski

On Fri, Jun 14, 2024 at 5:54 AM Richard Biener  wrote:
>
> Automatic arrays that are not address-taken should not be subject to
> store data races.

That seems conservative enough. Though I would think if the array
never escaped the function would be still correct and allow for more
arrays (but maybe that is not tracked).

Thanks,
Andrew Pinski

> This applies to OMP SIMD in-branch lowered
> functions result array which for the testcase otherwise prevents
> vectorization with SSE and for AVX and AVX512 ends up with spurious
> .MASK_STORE to the stack surviving.
>
> This inefficiency was noted in PR111793.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.  Is my
> idea of store data races correct?  At least phiopt uses the same
> check but for example LIM doesn't special-case locals.
>
> PR tree-optimization/111793
> * tree-if-conv.cc (ifcvt_memrefs_wont_trap): For stores
> that do not trap only consider -fstore-data-races when
> the underlying object is not automatic or has its address
> taken.
>
> * gcc.dg/vect/vect-simd-clone-21.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c | 16 
>  gcc/tree-if-conv.cc| 13 +++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c 
> b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
> new file mode 100644
> index 000..49c52fb59bd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-21.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_simd_clones } */
> +/* { dg-additional-options "-fopenmp-simd" } */
> +
> +#pragma omp declare simd simdlen(4) inbranch
> +__attribute__((noinline)) int
> +foo (int a, int b)
> +{
> +  return a + b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target 
> i?86-*-* x86_64-*-* } } } */
> +/* if-conversion shouldn't need to resort to masked stores for the result
> +   array created by OMP lowering since that's automatic and does not have
> +   its address taken.  */
> +/* { dg-final { scan-tree-dump-not "MASK_STORE" "vect" } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index c4c3ed41a44..974c614edf3 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -934,14 +934,23 @@ ifcvt_memrefs_wont_trap (gimple *stmt, 
> vec drs)
>if (DR_IS_READ (a))
> return true;
>
> +  bool ok = flag_store_data_races;
> +  base = get_base_address (base);
> +  if (DECL_P (base)
> + && auto_var_in_fn_p (base, cfun->decl)
> + && ! may_be_aliased (base))
> +   /* Automatic variables not aliased are not subject to
> +  data races.  */
> +   ok = true;
> +
>/* an unconditionaly write won't trap if the base is written
>   to unconditionally.  */
>if (base_master_dr
>   && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
> -   return flag_store_data_races;
> +   return ok;
>/* or the base is known to be not readonly.  */
>else if (base_object_writable (DR_REF (a)))
> -   return flag_store_data_races;
> +   return ok;
>  }
>
>return false;
> --
> 2.35.3

Re: [PATCH 30/52 v2] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-14 Thread Paul Koning

Ok, I understand better now.  But if those macros are supposed to be replaced 
by hook functions, could you make that replacement part of the proposed patch?

paul

> On Jun 13, 2024, at 11:22 PM, Kewen.Lin  wrote:
> 
> Hi Paul,
> 
> on 2024/6/14 04:07, Paul Koning wrote:
>> What is the effect of this change?  The original code intended to have 
>> "float" mean a 32 bit value, and "double" a 64 bit value.  There aren't any 
>> larger floats, so I defined the long double size as 64 also.  Is the right 
>> answer not to define it?
> 
> Since sub-patch 09/52 will poison {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE, 
> target code building will fail
> if it still has these macros.  As I'd like to squash these target changes 
> onto 09/52, so I didn't note
> the background/context here, sorry about that.
> 
>> 
>> That part I understand, but why does the patch also remove FLOAT_TYPE_SIZE 
>> and DOUBLE_TYPE_SIZE without explanation and without mention in the 
>> changelog?
> 
> Oops, thanks for catching!  I just noticed this sub-patch has inconsistent 
> subject & changelog, I should
> have noticed this as it has a quite different subject from the others. :(  
> With your finding, I just
> re-visited all the other sub-patches, luckily they are consistent.
> 
> The below is the updated revision, hope it looks good to you.  Thanks again.
> 
> BR,
> Kewen
> -
> 
> Subject: [PATCH] pdp11: Remove macro {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE
> 
> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> defines in pdp11 port, as we want to replace these macros
> with hook mode_for_floating_type and poison them.
> 
> gcc/ChangeLog:
> 
>* config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove.
>(DOUBLE_TYPE_SIZE): Likewise.
>(LONG_DOUBLE_TYPE_SIZE): Likewise.
> ---
> gcc/config/pdp11/pdp11.h | 11 ---
> 1 file changed, 11 deletions(-)
> 
> diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
> index 2446fea0b58..6c8e045bc57 100644
> --- a/gcc/config/pdp11/pdp11.h
> +++ b/gcc/config/pdp11/pdp11.h
> @@ -71,17 +71,6 @@ along with GCC; see the file COPYING3.  If not see
> #define LONG_TYPE_SIZE 32
> #define LONG_LONG_TYPE_SIZE64
> 
> -/* In earlier versions, FLOAT_TYPE_SIZE was selectable as 32 or 64,
> -   but that conflicts with Fortran language rules.  Since there is no
> -   obvious reason why we should have that feature -- other targets
> -   generally don't have float and double the same size -- I've removed
> -   it.  Note that it continues to be true (for now) that arithmetic is
> -   always done with 64-bit values, i.e., the FPU is always in "double"
> -   mode.  */
> -#define FLOAT_TYPE_SIZE32
> -#define DOUBLE_TYPE_SIZE   64
> -#define LONG_DOUBLE_TYPE_SIZE  64
> -
> /* machine types from ansi */
> #define SIZE_TYPE "short unsigned int" /* definition of size_t */
> #define WCHAR_TYPE "short int" /* or long int */
> --
> 2.43.0
> 
>

[r15-1330 Regression] FAIL: gcc.target/i386/vect-strided-3.c scan-assembler-times movq 4 on Linux/x86_64

2024-06-14 Thread haochen.jiang

On Linux/x86_64,

1438b15e5430f7fab3832c35d262d6b58caba469 is the first bad commit
commit 1438b15e5430f7fab3832c35d262d6b58caba469
Author: Richard Biener 
Date:   Fri Jun 14 11:31:53 2024 +0200

Adjust gcc.target/i386/vect-strided-3.c

caused

FAIL: gcc.target/i386/vect-strided-3.c scan-assembler-times movq 4

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1330/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-strided-3.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)

Re: [PATCH V2 1/2] RISC-V: Fix vwsll combine on rv32 targets

2024-06-14 Thread Robin Dapp

> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 6a2eabbd854..29916adb62b 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1517,8 +1517,7 @@ (define_insn_and_split "*vwsll_zext1_scalar_"
>"&& 1"
>[(const_int 0)]
>{
> -if (GET_CODE (operands[2]) == SUBREG)
> -  operands[2] = SUBREG_REG (operands[2]);
> +operands[2] = gen_lowpart (Pmode, operands[2]);
>  insn_code icode = code_for_pred_vwsll_scalar (mode);
>  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
>  DONE;

Please also correct it in the other expander (_trunc_).
OK with that change if the testsuite still looks ok :)

Regards
 Robin

Re: [PATCH V2 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn

2024-06-14 Thread Robin Dapp

OK.

Regards
 Robin

[patch,avr,applied] PR115419: round double to even LSB on tie breaks

2024-06-14 Thread Georg-Johann Lay


The convention in round-to-nearest floating-point
rounding is to round to even ("Banker's rounding").

Johann

AVR: target/115419 - Tie breaks are rounded-to-even.

libgcc/config/avr/libf7/
PR target/115419
* libf7.c (f7_get_double): Round tie breaks to even LSB.


diff --git a/libgcc/config/avr/libf7/libf7.c 
b/libgcc/config/avr/libf7/libf7.c

index 375becb854c..6fae4fc1a2d 100644
--- a/libgcc/config/avr/libf7/libf7.c
+++ b/libgcc/config/avr/libf7/libf7.c
@@ -440,11 +440,21 @@ f7_double_t f7_get_double (const f7_t *aa)

   mant &= 0x00ff;

-  // FIXME: For subnormals, rounding is premature and should be
-  //   done *after* the mantissa has been shifted into place
-  //   (or the round value be shifted left accordingly).
-  // Round.
-  mant += 1u << (F7_MANT_BITS - (1 + DBL_DIG_MANT) - 1);
+  // PR115419: The least significant nibble tells how to round:
+  // Tie breaks are rounded to even (Banker's rounding).
+  uint8_t lsn = mant & 0xff;
+  lsn &= 0xf;
+  // The LSB of the outgoing double is at bit 3.
+  if (lsn & (1 << 3))
+++lsn;
+  if (lsn > (1 << 2))
+{
+  // FIXME: For subnormals, rounding is premature and should be
+  //done *after* the mantissa has been shifted into place
+  //(or the round value be shifted left accordingly).
+  // Round.
+  mant += 1u << (F7_MANT_BITS - (1 + DBL_DIG_MANT) - 1);
+}

   uint8_t dex;
   register uint64_t r18 __asm ("r18") = mant;

Re: [PATCH] c++: remove Concepts TS code

2024-06-14 Thread Marek Polacek

On Wed, Jun 12, 2024 at 03:08:27PM -0400, Patrick Palka wrote:
> FWIW I think we can also remove DECL_DECLARED_CONCEPT_P,
> function_concept_p and variable_concept_p (and related code in
> grokfndecl and grokvardecl probably).  And e.g. code in constraint.cc
> that checks for FUNCTION_DECL, VAR_*, CALL_EXPR, OVERLOAD / OVL_* etc.

Thanks.  I'm leaving this as a follow-up; the patch is already quite large.

Marek

Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-14 Thread Alexander Monakov

On Fri, 14 Jun 2024, Kong, Lingling wrote:

> APX CFCMOV[1] feature implements conditionally faulting which means that all 
> memory faults are suppressed
> when the condition code evaluates to false and load or store a memory 
> operand. Now we could load or store a
> memory operand may trap or fault for conditional move.
> 
> In middle-end, now we don't support a conditional move if we knew that a load
> from A or B could trap or fault.

Predicated loads&stores on Itanium don't trap either. They are modeled via
COND_EXEC on RTL. The late if-conversion pass (the instance that runs after
reload) is capable of introducing them.

> To enable CFCMOV, we add a target HOOK TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
> in if-conversion pass to allow convert to cmov.

Considering the above, is the new hook really necessary? Can you model the new
instructions via (cond_exec () (set ...)) instead of (set (if_then_else ...)) ?

Alexander

Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-14 Thread Jeff Law





On 6/14/24 11:10 AM, Alexander Monakov wrote:


On Fri, 14 Jun 2024, Kong, Lingling wrote:


APX CFCMOV[1] feature implements conditionally faulting which means that all 
memory faults are suppressed
when the condition code evaluates to false and load or store a memory operand. 
Now we could load or store a
memory operand may trap or fault for conditional move.

In middle-end, now we don't support a conditional move if we knew that a load
from A or B could trap or fault.


Predicated loads&stores on Itanium don't trap either. They are modeled via
COND_EXEC on RTL. The late if-conversion pass (the instance that runs after
reload) is capable of introducing them.


To enable CFCMOV, we add a target HOOK TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
in if-conversion pass to allow convert to cmov.


Considering the above, is the new hook really necessary? Can you model the new
instructions via (cond_exec () (set ...)) instead of (set (if_then_else ...)) ?

Note that turning on cond_exec will turn off some of the cmove support.

But the general suggesting of trying to avoid a hook for this is a good 
one.  In fact, my first reaction to this thread was "do we really need a 
hook for this".


jeff

[PATCH] libcpp, c-family: Add (dumb) C23 N3017 #embed support [PR105863]

2024-06-14 Thread Jakub Jelinek

Hi!

The following patch implements the C23 N3017 "#embed - a scannable,
tooling-friendly binary resource inclusion mechanism" paper.

The implementation is intentionally dumb, in that it doesn't significantly
speed up compilation of larger initializers and doesn't make it possible
to use huge #embeds (like several gigabytes large, that is compile time
and memory still infeasible).
There are 2 reasons for this.  One is that I think like it is implemented
now in the patch is how we should use it for the smaller #embed sizes,
dunno with which boundary, whether 32 bytes or 64 or something like that,
certainly handling the single byte cases which is something that can appear
anywhere in the source where constant integer literal can appear is
desirable and I think for a few bytes it isn't worth it to come up with
something smarter and users would like to e.g. see it in -E readably as
well (perhaps the slow vs. fast boundary should be determined by command
line option).  And the other one is to be able to more easily find
regressions in behavior caused by the optimizations, so we have something
to get back in git to compare against.
I'm definitely willing to work on the optimizations (likely introduce a new
CPP_* token type to refer to a range of libcpp owned memory (start + size)
and similarly some tree which can do the same, and can be at any time e.g.
split into 2 subparts + say INTEGER_CST in between if needed say for
const unsigned char d[] = {
#embed "2GB.dat" prefix (0, 0, ) suffix (, [0x4000] = 42)
}; still without having to copy around huge amounts of data; STRING_CST
owns the memory it points to and can be only 2GB in size), but would
like to do that incrementally.
And would like to first include some extensions also not included in
this patch, like gnu::offset (off) parameter to allow to skip certain
constant amount of bytes at the start of the files, plus
gnu::base64 ("base64_encoded_data") parameter to add something which can
store more efficiently large amounts of the #embed data in preprocessed
source.

I've been cross-checking all the tests also against the LLVM implementation
https://github.com/llvm/llvm-project/pull/68620
which has been for a few hours even committed to LLVM trunk but reverted
afterwards.

The patch uses --embed-dir= option that clang plans to add above and doesn't
use other variants on the search directories yet, plus there are no
default directories at least for the time being where to search for embed
files.  So, #embed "..." works if it is found in the same directory (or
relative to the current file's directory) and #embed "/..." or #embed 
work always, but relative #embed <...> doesn't unless at least one
--embed-dir= is specified.  There is no reason to differentiate between
system and non-system directories, so we don't need -isystem like
counterpart, perhaps -iquote like counterpart could be useful in the future,
dunno what else.  Should we have --embed-directory= as alias to --embed-dir=?

There are some differences beyond clang ICEs, so I'd like to point them out
to make sure there is agreement on the choices in the patch.  They are also
mentioned in the comments of the llvm pull request.

The most important is that the GCC patch (as well as the original thephd.dev
LLVM branch on godbolt) expands #embed (or acts as if it is expanded) into
a mere sequence of numbers like 123,2,35,26 rather then what clang
effectively treats as (unsigned char)123,(unsigned char)2,(unsigned
char)35,(unsigned char)26 but only does that when using integrated
preprocessor, not when using -save-temps where it acts as GCC.
JeanHeyd as the original author agrees that is how it is currently worded in
C23.

Another difference (not tested in the testsuite, not sure how to check for
effective target /dev/urandom nor am sure it is desirable to check that
during testsuite) is how to treat character devices, named pipes etc.
(block devices are errored on).  The original paper uses /dev/urandom
in various examples and seems to assume that unlike regular files the
devices aren't really cached, so
#embed  limit(1) prefix(int a = ) suffix(;)
#embed  limit(1) prefix(int b = ) suffix(;)
usually results in a != b.  That is what the godbolt thephd.dev branch
implements too and what this patch does as well, but clang actually seems
to just go from st.st_size == 0, ergo it must be zero-sized resource and
so just copies over if_empty if present.  It is really questionable
what to do about the character devices/named pipes with __has_embed, for
regular files the patch doesn't read anything from them, relies on
st.st_size + limit for whether it is empty or non-empty.  But I don't know
of a way to check if read on say a character device would read anything
or not (the  limit (1) vs.  limit (1) cases), and
if we read something, that would be better cached for later because
#embed later if it reads again could read no further data even when it
first read something.  So, the patch currently for __has_embed just
a

[PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]

2024-06-14 Thread Detlef Vollmann


The attached patch fixes 115481 and 111639.
It was tested against avr-libc 2.0.0 (which has *f as macros),
2.1.0 (which has *f as functions, but no long double), and
2.2.0 (which has long double functions).

Only the libstdc++ build was tested, not the resulting library.

  Detlef
commit 9112ae579f73b249e9515d5ac9ddc3a76b067c39
Author: Detlef Vollmann 
Date:   Fri Jun 14 14:34:31 2024 +0200

libstdc++-v3: detect math functions when using avr-libc and handle macros

Different versions of avr-libc have different definitions of
math functions:
  - 2.0.0 and earlier have float versions as macros
  - 2.1.0 has float versions as proper functions, but no long double versions
  - 2.2.0 has long double versions

This commit tells configure to always check if these functions are available.
It also #undef's any macros.

libstdc++-v3/ChangeLog:

* crossconfig.m4 [avr*-*-*]: Add compile-checks for
float and long double functions.
* configure: Regenerate.
* src/c++98/math_stubs_float.cc: #undef any macros.

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5645e991af7..17dbae7bd87 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -5080,7 +5080,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -5126,7 +5126,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -5150,7 +5150,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -5195,7 +5195,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -5219,7 +5219,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -28628,51 +28628,2354 @@ case "${host}" in
 ;;
 
   avr*-*-*)
-$as_echo "#define HAVE_ACOSF 1" >>confdefs.h
 
-$as_echo "#define HAVE_ASINF 1" >>confdefs.h
 
-$as_echo "#define HAVE_ATAN2F 1" >>confdefs.h
 
-$as_echo "#define HAVE_ATANF 1" >>confdefs.h
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for acosf declaration" >&5
+$as_echo_n "checking for acosf declaration... " >&6; }
+if ${glibcxx_cv_func_acosf_use+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+
+
+  ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+#include 
+#ifdef HAVE_IEEEFP_H
+# include 
+#endif
+#undef acosf
+
+int
+main ()
+{
+
+  void (*f)(void) = (void (*)(void))acosf;
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  glibcxx_cv_func_acosf_use=yes
+
+else
+  glibcxx_cv_func_acosf_use=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $glibcxx_cv_func_acosf_use" >&5
+$as_echo "$glibcxx_cv_func_acosf_use" >&6; }
+

Re: [PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]

2024-06-14 Thread Xi Ruoyao

On Fri, 2024-06-14 at 19:37 +0200, Detlef Vollmann wrote:
> diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> index 5645e991af7..17dbae7bd87 100755
> --- a/libstdc++-v3/configure
> +++ b/libstdc++-v3/configure
> @@ -5080,7 +5080,7 @@ else
>  We can't simply define LARGE_OFF_T to be 9223372036854775807,
>  since some C++ compilers masquerading as C compilers
>  incorrectly reject 9223372036854775807.  */
> -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 
> 31))
>    int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
>  && LARGE_OFF_T % 2147483647 == 1)
>     ? 1 : -1];

This shouldn't happen.  Please regenerate using *vanilla* autoconf-2.69.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]

2024-06-14 Thread Jonathan Wakely

On Fri, 14 Jun 2024 at 18:45, Xi Ruoyao wrote:
>
> On Fri, 2024-06-14 at 19:37 +0200, Detlef Vollmann wrote:
> > diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> > index 5645e991af7..17dbae7bd87 100755
> > --- a/libstdc++-v3/configure
> > +++ b/libstdc++-v3/configure
> > @@ -5080,7 +5080,7 @@ else
> >  We can't simply define LARGE_OFF_T to be 9223372036854775807,
> >  since some C++ compilers masquerading as C compilers
> >  incorrectly reject 9223372036854775807.  */
> > -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> > +#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 
> > 31))
> >int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
> >  && LARGE_OFF_T % 2147483647 == 1)
> > ? 1 : -1];
>
> This shouldn't happen.  Please regenerate using *vanilla* autoconf-2.69.

Yes, please, if possible. But I can regenerate it locally before
pushing if needed.

[committed] testsuite: Add -Wno-psabi to vshuf-mem.C test

2024-06-14 Thread Jakub Jelinek

On Mon, Jun 10, 2024 at 10:06:31AM +0200, Andreas Krebbel wrote:
> The current implementation assumes to always be invoked with register
> operands. For memory operands we even have an instruction
> though (vlrep). With the patch we try this first and only if it fails
> force the input into a register and continue.
> 
> vec_splats generation fails for single element 128bit types which are
> allowed for vec_splat. This is something to sort out with another
> patch I guess.
> 
> Bootstrapped and regtested on IBM Z. Committed to mainline. Needs to
> be committed to GCC 14 branch as well.

The newly added test FAILs on i686-linux.
On x86_64-linux
make check-g++ 
RUNTESTFLAGS='--target_board=unix\{-m64,-m32/-msse2,-m32/-mno-sse/-mno-mmx\} 
dg-torture.exp=vshuf-mem.C'
shows that as well.

The problem is that without SSE2/MMX the vector is passed differently
than normally and so GCC warns about that.
-Wno-psabi is the usual way to shut it up.

Also wonder about the
// { dg-additional-options "-march=z14" { target s390*-*-* } }
line, doesn't that mean the test will FAIL on all pre-z14 HW?
Shouldn't it use some z14_runtime or similar effective target, or
check in main (in that case copied over to g++.target/s390) whether
z14 instructions can be actually used at runtime?

Tested on x86_64-linux, committed to trunk as obvious.

2024-06-14  Jakub Jelinek  

* g++.dg/torture/vshuf-mem.C: Add -Wno-psabi to dg-options.

--- gcc/testsuite/g++.dg/torture/vshuf-mem.C.jj 2024-06-14 19:45:09.116781920 
+0200
+++ gcc/testsuite/g++.dg/torture/vshuf-mem.C2024-06-14 19:56:08.744135867 
+0200
@@ -1,4 +1,4 @@
-// { dg-options "-std=c++11" }
+// { dg-options "-std=c++11 -Wno-psabi" }
 // { dg-do run }
 // { dg-additional-options "-march=z14" { target s390*-*-* } }

Jakub

Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-06-14 Thread Jeff Law





On 6/13/24 5:32 AM, Manolis Tsamis wrote:

This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:

  strbw2, [x1, 1]
  ldr x0, [x1]  # Expensive store forwarding to larger load.

To:

  ldr x0, [x1]
  strbw2, [x1]
  bfi x0, x2, 0, 8

Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.

   Neoverse-N1:  +29.4%
   Intel Coffeelake: +13.1%
   AMD 5950X:+17.5%

gcc/ChangeLog:

* Makefile.in: Add avoid-store-forwarding.o.
* common.opt: New option -favoid-store-forwarding.
* params.opt: New param store-forwarding-max-distance.
* doc/invoke.texi: Document new pass.
* doc/passes.texi: Document new pass.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
 * gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
Just a note for others.  I've sent Manolis's a few more failures spit 
out by my tester.  The crosses aren't quite all clean, but they're 
getting closer.  Once the crosses are clean we'd run the QEMU emulated 
targets which take dramatically longer (but which are also a deeper test 
for this kind of change).


Jeff

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Xi Ruoyao

On Fri, 2024-06-14 at 12:31 +0200, Richard Biener wrote:
> The following retires vcond{,u,eq} optabs by stopping to use them
> from the middle-end.  Targets instead (should) implement vcond_mask
> and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> possibly affected targets - those implementing these patterns,
> and in particular it lists mips, sparc and ia64 as targets that
> most definitely will regress while others might simply remove
> their vcond{,u,eq} patterns.
> 
> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> I know riscv doesn't implement any of the legacy optabs.  But less
> maintained vector targets might need adjustments.

No new test failures on LoongArch.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] libstdc++: Fix build for AVR [PR115481, PR111639]

2024-06-14 Thread Detlef Vollmann


On 6/14/24 19:45, Xi Ruoyao wrote:

On Fri, 2024-06-14 at 19:37 +0200, Detlef Vollmann wrote:

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5645e991af7..17dbae7bd87 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -5080,7 +5080,7 @@ else
  We can't simply define LARGE_OFF_T to be 9223372036854775807,
  since some C++ compilers masquerading as C compilers
  incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
    int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];


This shouldn't happen.  Please regenerate using *vanilla* autoconf-2.69.


Sorry, I was surprised myself, when I looked at the changes.
But after looking at the actual change I thought it doesn't matter.
I used Debian's version, I'll try to get a vanilla version working
tomorrow (I actually plan anyway to run a test with the built libstdc++
on simulavr).

  Detlef

[COMMITTED] Do not assume LHS of call is an ssa-name.

2024-06-14 Thread Andrew MacLeod

gimple_range_fold makes an assumption that if there is a LHS on a call 
that it is an ssa_name.  Especially later in compilation that may not be 
true.


This patch merely avoids calling routines that assume an ssa-name is 
being passed in.


Bootstraps on  x86_64-pc-linux-gnu with no regressions.   Pushed.

Andrew

From cef90b0558032b64b021054370ad5b96f8a5c5d8 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 12 Jun 2024 09:20:20 -0400
Subject: [PATCH 1/7] Do not assume LHS of call is an ssa-name.

gimple_range_fold makes an assumption that the LHS of a call is an
ssa_name, which later in compilation may not be true.

	* gimple-range-fold.cc (fold_using_range::range_of_call): Ensure
	LHS is an SSA_NAME before invoking gimple_range_global.
---
 gcc/gimple-range-fold.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 6037c29ce11..52fc3f2cb04 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1105,7 +1105,7 @@ fold_using_range::range_of_call (vrange &r, gcall *call, fur_source &)
 }
 
   // If there is an LHS, intersect that with what is known.
-  if (lhs)
+  if (gimple_range_ssa_p (lhs))
 {
   Value_Range def (TREE_TYPE (lhs));
   gimple_range_global (def, lhs);
-- 
2.45.0

[COMMITTED] Add merge facility to ssa_lazy_cache.

2024-06-14 Thread Andrew MacLeod

The ssa_lazy_cache object has a routine to merge a range for an ssa-name 
with an existing range in the cache.  This adds a method which will 
merge all elements of another ssa_lazy_cache


the ssa_lazy_cache is a reasonably efficient object for storing ranges 
associated a few ssa-names, and it is used in a few places by ranger.


Bootstraps on   x86_64-pc-linux-gnu with no regressions.  There 
shouldn't be, there are no callers yet :-)   Pushed.


Andrew

From e2679227829ffe2ff4fb974994aacd639219f855 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 13 Jun 2024 15:35:55 -0400
Subject: [PATCH 2/7] Add merge facility to ssa_lazy_cache.

The ssa_lazy_cache has a routine to merge a range for an ssa-name with
an existing range in the cache.  This adds a method which will merge all
elements of another ssa_lazy_cache.

	* gimple-range-cache.cc (ssa_lazy_cache::merge): New.
	* gimple-range-cache.h (ssa_lazy_cache::merge): New prototype.
---
 gcc/gimple-range-cache.cc | 18 ++
 gcc/gimple-range-cache.h  |  1 +
 2 files changed, 19 insertions(+)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index a511a2c3a4c..efaae2ed928 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -729,6 +729,24 @@ ssa_lazy_cache::merge_range (tree name, const vrange &r)
   return true;
 }
 
+// Merge all elements of CACHE with this cache.
+// Any names in CACHE that are not in this one are added.
+// Any names in both are merged via merge_range..
+
+void
+ssa_lazy_cache::merge (const ssa_lazy_cache &cache)
+{
+  unsigned x;
+  bitmap_iterator bi;
+  EXECUTE_IF_SET_IN_BITMAP (cache.active_p, 0, x, bi)
+{
+  tree name = ssa_name (x);
+  Value_Range r(TREE_TYPE (name));
+  cache.get_range (r, name);
+  merge_range (ssa_name (x), r);
+}
+}
+
 // Return TRUE if NAME has a range, and return it in R.
 
 bool
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index c7499f928a9..63410d5437e 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -87,6 +87,7 @@ public:
   virtual bool get_range (vrange &r, tree name) const;
   virtual void clear_range (tree name);
   virtual void clear ();
+  void merge (const ssa_lazy_cache &);
 protected:
   bitmap active_p;
 };
-- 
2.45.0

[COMMITTED] Dont add varying values to gori_on_edge mass, calculations.

2024-06-14 Thread Andrew MacLeod

The 'gori_on_edge' routine is an API which provides all contextual names 
that can be calculated on an outgoing edge at once.   It provides this 
via the ssa_lazy_cache object.


This patch recognizes that if the value produces VARYING, we should not 
put it in the list, nor continue processing the dependency chain.


Bootstraps on  x86_64-pc-linux-gnu with no regressions.   Pushed.

Andrew

From 2f90cb8b36affbb28dba643dd38068fc88c76e12 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 14 Jun 2024 11:01:08 -0400
Subject: [PATCH 3/7] Dont add varying values to gori_on_edge mass
 calculations.

gori_on_edge will return an ssa_lazy_cache with all contextual ranges
that can be generated by an edge.   This patch adjusts it so that
a VARYING range is never added.

	* gimple-range-gori.cc (gori_calc_operands): Do not continue nor
	add the range when VARYING is produced for an operand.
---
 gcc/gimple-range-gori.cc | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index d489aef312c..4f6073c715a 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1605,11 +1605,14 @@ gori_calc_operands (vrange &lhs, gimple *stmt, ssa_cache &r, range_query *q)
   tmp.set_type (TREE_TYPE (si.ssa1));
   if (si.calc_op1 (tmp, lhs, si.op2_range))
 	si.op1_range.intersect (tmp);
-  r.set_range (si.ssa1, si.op1_range);
-  gimple *src = SSA_NAME_DEF_STMT (si.ssa1);
-  // If defintion is in the same basic lock, evaluate it.
-  if (src && gimple_bb (src) == gimple_bb (stmt))
-	gori_calc_operands (si.op1_range, src, r, q);
+  if (!si.op1_range.varying_p ())
+	{
+	  r.set_range (si.ssa1, si.op1_range);
+	  gimple *src = SSA_NAME_DEF_STMT (si.ssa1);
+	  // If defintion is in the same basic lock, evaluate it.
+	  if (src && gimple_bb (src) == gimple_bb (stmt))
+	gori_calc_operands (si.op1_range, src, r, q);
+	}
 }
 
   if (si.ssa2 && !r.has_range (si.ssa2))
@@ -1617,10 +1620,13 @@ gori_calc_operands (vrange &lhs, gimple *stmt, ssa_cache &r, range_query *q)
   tmp.set_type (TREE_TYPE (si.ssa2));
   if (si.calc_op2 (tmp, lhs, si.op1_range))
 	si.op2_range.intersect (tmp);
-  r.set_range (si.ssa2, si.op2_range);
-  gimple *src = SSA_NAME_DEF_STMT (si.ssa2);
-  if (src && gimple_bb (src) == gimple_bb (stmt))
-	gori_calc_operands (si.op2_range, src, r, q);
+  if (!si.op2_range.varying_p ())
+	{
+	  r.set_range (si.ssa2, si.op2_range);
+	  gimple *src = SSA_NAME_DEF_STMT (si.ssa2);
+	  if (src && gimple_bb (src) == gimple_bb (stmt))
+	gori_calc_operands (si.op2_range, src, r, q);
+	}
 }
 }
 
-- 
2.45.0

Re: [Patch, Fortran, 96418] Fix Test coarray_alloc_comp_4.f08 ICEs

2024-06-14 Thread Harald Anlauf


Hi Andre,

the patch looks fairly simple and obvious, so OK from my side.

***

Regarding the testsuite: since you renamed one of the testcases
gfortran.dg/coarray_alloc_comp_* and moved it to gfortran.dg/coarray/,
I checked and noticed that there are other similar runtime tests for
coarrays (while some are compile-time only tests).

Do we plan to "clean" this up and move more/all related runtime
tests to the coarray/ subdirectory?  What is the general opinion on
this?

***

Thanks for the patch!

Harald


Am 14.06.24 um 09:22 schrieb Andre Vehreschild:

Hi all,

I messed up renaming of the coarray_alloc_comp-test. This is fixed in the second
version of the patch. Sorry for the inconvenience.

Additionally I figured that this patch also fixed PR fortran/103112.

Regtests ok on x86_64 Fedora 39. Ok for mainline?

Regards,
Andre

On Tue, 11 Jun 2024 16:12:38 +0200
Andre Vehreschild  wrote:


Hi all,

attached patch has already been present in 2020, but lost my attention. It
fixes an ICE in the testsuite. The old mails description is:

attached patch fixes PR96418 where the code in the testsuite when compiled
with -fcoarray=single  lead to an ICE. The reason was that the coarray object
was derefed as an array, but it was no array. Introducing the test for the
descriptor removes the ICE.

Regtests ok on x86_64-linux/Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de



--
Andre Vehreschild * Email: vehre ad gmx dot de

1 2 >

1 - 100 of 113 matches

Mail list logo