Re: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

2024-09-10 Thread Richard Biener
On Tue, Sep 10, 2024 at 1:05 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> >> +   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
> >> +(match (signed_integer_sat_add @0 @1)
> >> + (cond^ (ge (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
> >> + (nop_convert 
> >> @1
> >> +  (bit_not (bit_xor:c @0 @1)))
>
> >You only need one :c on either bit_xor.
>
> Sorry, I don't get the point here.  I can understand that swapping @0 and
> @1 can also act on the plus op.
> But the first xor has :c in order to allow both (@0 @2) and (@2 @0).
>
> Or, due to the commutativity of xor, swapping @0 and @1 is also valid for
> (@1 @2) in the first xor.  But I fail to see how @2 could become the first
> argument here.

Hmm, my logic was that there's a canonicalization rule for SSA operands,
which is to put SSA names with higher SSA_NAME_VERSION last.  That means we
get the 2nd bit_xor in a defined order; since we don't know the order of @0
wrt @2, we need to put :c on that one.
That should get us all interesting cases plus make sure the @0s match up?

But maybe I'm missing something.  It's just the number of patterns generated
is 2^number-of-:c, so it's good to prune known unnecessary combinations.

> >> +   integer_zerop)
> >> +   @2
> >> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value))
>
> >> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
> >> +  && types_match (type, @0, @1
>
> >I think the types_match is redundant as you have the bit_xor combining both.
>
> Got it.  Does that indicate that the bit_xor somehow has a similar type
> check already?  As well as other ops like and/or, etc.

Yes, all commutative binary operators require matching types on their operands.

>
> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Monday, September 9, 2024 8:19 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer 
> .SAT_ADD
>
> On Tue, Sep 3, 2024 at 2:34 PM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support form 2 of the scalar signed integer
> > .SAT_ADD, as in the example below:
> >
> > Form 2:
> >   #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
> >   T __attribute__((noinline))  \
> >   sat_s_add_##T##_fmt_2 (T x, T y) \
> >   {\
> > T sum = (UT)x + (UT)y; \
> >\
> > if ((x ^ y) < 0 || (sum ^ x) >= 0) \
> >   return sum;  \
> >\
> > return x < 0 ? MIN : MAX;  \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)
> >
> > We can tell the difference before and after this patch if the backend
> > implements the ssadd<mode>3 pattern similar to the one below.
> >
> > Before this patch:
> >4   │ __attribute__((noinline))
> >5   │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y)
> >6   │ {
> >7   │   int8_t sum;
> >8   │   unsigned char x.0_1;
> >9   │   unsigned char y.1_2;
> >   10   │   unsigned char _3;
> >   11   │   signed char _4;
> >   12   │   signed char _5;
> >   13   │   int8_t _6;
> >   14   │   _Bool _11;
> >   15   │   signed char _12;
> >   16   │   signed char _13;
> >   17   │   signed char _14;
> >   18   │   signed char _22;
> >   19   │   signed char _23;
> >   20   │
> >   21   │ ;;   basic block 2, loop depth 0
> >   22   │ ;;pred:   ENTRY
> >   23   │   x.0_1 = (unsigned char) x_7(D);
> >   24   │   y.1_2 = (unsigned char) y_8(D);
> >   25   │   _3 = x.0_1 + y.1_2;
> >   26   │   sum_9 = (int8_t) _3;
> >   27   │   _4 = x_7(D) ^ y_8(D);
> >   28   │   _5 = x_7(D) ^ sum_9;
> >   29   │   _23 = ~_4;
> >   30   │   _22 = _5 & _23;
> >   31   │   if (_22 >= 0)
> >   32   │ goto ; [42.57%]
> >   33   │   else
> >   34   │ goto ; [57.43%]
> >   35   │ ;;succ:   4
> >   36   │ ;;3
> >   37   │
> >   38   │ ;;   basic block 3, loop depth 0
> >   39   │ ;;pred:   2
> >   40   │   _11 = x_7(D) < 0;
> >   41   │   _12 = (signed char) _11;
> >   42   │   _13 = -_12;
> >   43   │   _14 = _13 ^ 127;
> >   44   │ ;;succ:   4
> >   45   │
> >   46   │ ;;   basic block 4, loop depth 0
> >   47   │ ;;pred:   2
> >   48   │ ;;3
> >   49   │   # _6 = PHI 
> >   50   │   return _6;
> >   51   │ ;;succ:   EXIT
> >   52   │
> >   53   │ }
> >
> > After this patch:
> >4   │ __attribute__((noinline))
> >5   │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y)
> >6   │ {
> >7   │   int8_t _6;
> >8   │
> >9   │ ;;   basic block 2, loop depth 0
> >   10   │ ;;pred:   ENTRY
> >   11   │   _6 = .SAT_ADD (x_7(D), y_8(D)); [tail call
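
As an illustrative aside (not part of the patch): the semantics being
matched is plain signed saturating addition, which can also be written with
a widening add.  A minimal reference sketch in C:

  #include <stdint.h>

  /* Illustrative reference only: signed saturating add for int8_t via a
     widening add, equivalent to what form 2 computes with unsigned
     arithmetic and sign-bit tests.  */
  static inline int8_t
  sat_s_add_int8_ref (int8_t x, int8_t y)
  {
    int16_t sum = (int16_t) x + (int16_t) y;

    if (sum > INT8_MAX)
      return INT8_MAX;
    if (sum < INT8_MIN)
      return INT8_MIN;

    return (int8_t) sum;
  }

The match.pd pattern quoted above recognizes the branch-free form of this
and maps it to the .SAT_ADD internal function.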

Re: [PATCH v2 1/2] RISC-V: Fix ICE caused by early ggc_free on DECL for RVV intrinsics in LTO.

2024-09-10 Thread Richard Biener
On Tue, Sep 10, 2024 at 7:56 AM Jin Ma  wrote:
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins.cc 
> (function_builder::add_function):
> Check the final DECL to make sure it is valid.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/bug-10.c: New test.
> ---
>  gcc/config/riscv/riscv-vector-builtins.cc   |  9 +++--
>  .../gcc.target/riscv/rvv/base/bug-10.c  | 17 +
>  2 files changed, 24 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 41730c483ee1..0176670fbdf2 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -79,7 +79,7 @@ public:
>function_instance GTY ((skip)) instance;
>
>/* The decl itself.  */
> -  tree GTY ((skip)) decl;
> +  tree decl;

While this looks obvious, ...

>/* The overload hash of non-overloaded intrinsic is determined by
>   the overload name and argument list. Adding the overload name to
> @@ -3771,7 +3771,6 @@ function_builder::add_function (const function_instance 
> &instance,
>  {
>unsigned int code = vec_safe_length (registered_functions);
>code = (code << RISCV_BUILTIN_SHIFT) + RISCV_BUILTIN_VECTOR;
> -
>/* We need to be able to generate placeholders to ensure that we have a
>   consistent numbering scheme for function codes between the C and C++
>   frontends, so that everything ties up in LTO.
> @@ -3790,6 +3789,12 @@ function_builder::add_function (const 
> function_instance &instance,
> : simulate_builtin_function_decl (input_location, name, 
> fntype,
>   code, NULL, attrs);
>
> +  /* If the code of DECL is ERROR_MARK or invalid code, usually "ggc_freed", 
> we
> + use integer_zero_node instead of it. This will be very helpful for the
> + ggc_free.  */
> +  if (TREE_CODE (decl) == ERROR_MARK || TREE_CODE (decl) >= MAX_TREE_CODES)
> +decl = integer_zero_node;
> +

... this one looks like a hack (note ggc_free only poisons memory when
checking is enabled).
I'm curious why you ever get a ggc_freed decl here.

>registered_function &rfn = *ggc_alloc ();
>rfn.instance = instance;
>rfn.decl = decl;
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
> new file mode 100644
> index ..f685792a2c65
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c
> @@ -0,0 +1,17 @@
> +/* Test that we do not have ice when compile */
> +/* { dg-do link } */
> +/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O2 -flto"  { target { rv64 
> } } } */
> +/* { dg-options "-march=rv32gcv_zvfh -mabi=ilp32d -O2 -flto"  { target { 
> rv32 } } } */
> +
> + #include <riscv_vector.h>
> +
> +int
> +main ()
> +{
> +  size_t vl = 8;
> +  vint32m1_t vs1 = {};
> +  vint32m1_t vs2 = {};
> +  vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> +
> +  return *(int*)&vd;
> +}
> --
> 2.17.1
>


RE: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

2024-09-10 Thread Li, Pan2
Thanks a lot.

> It's just the number of patterns generated
> is 2^number-of-:c, so it's good to prune known unnecessary combinations.

I see, I will make the changes as you suggested and commit them if there are
no surprises from the test suites.

> Yes, all commutative binary operators require matching types on their 
> operands.

Got it, I will revisit the matching I added before for possibly redundant
checking.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Tuesday, September 10, 2024 3:02 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer .SAT_ADD

On Tue, Sep 10, 2024 at 1:05 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> >> +   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
> >> +(match (signed_integer_sat_add @0 @1)
> >> + (cond^ (ge (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
> >> + (nop_convert 
> >> @1
> >> +  (bit_not (bit_xor:c @0 @1)))
>
> >You only need one :c on either bit_xor.
>
> Sorry, I don't get the point here.  I can understand that swapping @0 and
> @1 can also act on the plus op.
> But the first xor has :c in order to allow both (@0 @2) and (@2 @0).
>
> Or, due to the commutativity of xor, swapping @0 and @1 is also valid for
> (@1 @2) in the first xor.  But I fail to see how @2 could become the first
> argument here.

Hmm, my logic was that there's a canonicalization rule for SSA operands,
which is to put SSA names with higher SSA_NAME_VERSION last.  That means we
get the 2nd bit_xor in a defined order; since we don't know the order of @0
wrt @2, we need to put :c on that one.
That should get us all interesting cases plus make sure the @0s match up?

But maybe I'm missing something.  It's just the number of patterns generated
is 2^number-of-:c, so it's good to prune known unnecessary combinations.

> >> +   integer_zerop)
> >> +   @2
> >> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value))
>
> >> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
> >> +  && types_match (type, @0, @1
>
> >I think the types_match is redundant as you have the bit_xor combining both.
>
> Got it.  Does that indicate that the bit_xor somehow has a similar type
> check already?  As well as other ops like and/or, etc.

Yes, all commutative binary operators require matching types on their operands.

>
> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Monday, September 9, 2024 8:19 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Match: Support form 2 for scalar signed integer 
> .SAT_ADD
>
> On Tue, Sep 3, 2024 at 2:34 PM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support form 2 of the scalar signed integer
> > .SAT_ADD, as in the example below:
> >
> > Form 2:
> >   #define DEF_SAT_S_ADD_FMT_2(T, UT, MIN, MAX) \
> >   T __attribute__((noinline))  \
> >   sat_s_add_##T##_fmt_2 (T x, T y) \
> >   {\
> > T sum = (UT)x + (UT)y; \
> >\
> > if ((x ^ y) < 0 || (sum ^ x) >= 0) \
> >   return sum;  \
> >\
> > return x < 0 ? MIN : MAX;  \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_2(int8_t, uint8_t, INT8_MIN, INT8_MAX)
> >
> > We can tell the difference before and after this patch if the backend
> > implements the ssadd<mode>3 pattern similar to the one below.
> >
> > Before this patch:
> >4   │ __attribute__((noinline))
> >5   │ int8_t sat_s_add_int8_t_fmt_2 (int8_t x, int8_t y)
> >6   │ {
> >7   │   int8_t sum;
> >8   │   unsigned char x.0_1;
> >9   │   unsigned char y.1_2;
> >   10   │   unsigned char _3;
> >   11   │   signed char _4;
> >   12   │   signed char _5;
> >   13   │   int8_t _6;
> >   14   │   _Bool _11;
> >   15   │   signed char _12;
> >   16   │   signed char _13;
> >   17   │   signed char _14;
> >   18   │   signed char _22;
> >   19   │   signed char _23;
> >   20   │
> >   21   │ ;;   basic block 2, loop depth 0
> >   22   │ ;;pred:   ENTRY
> >   23   │   x.0_1 = (unsigned char) x_7(D);
> >   24   │   y.1_2 = (unsigned char) y_8(D);
> >   25   │   _3 = x.0_1 + y.1_2;
> >   26   │   sum_9 = (int8_t) _3;
> >   27   │   _4 = x_7(D) ^ y_8(D);
> >   28   │   _5 = x_7(D) ^ sum_9;
> >   29   │   _23 = ~_4;
> >   30   │   _22 = _5 & _23;
> >   31   │   if (_22 >= 0)
> >   32   │ goto ; [42.57%]
> >   33   │   else
> >   34   │ goto ; [57.43%]
> >   35   │ ;;succ:   4
> >   36   │ ;;3
> >   37   │
> >   38   │ ;;   basic block 3, loop depth 0
> >   39   │ ;;pred: 

[PATCH] x86: Refine V4BF/V2BF FMA Testcase

2024-09-10 Thread Levy Hsu
Simple testcase fix, ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit 
scan
and removed register checks in spill situations.
---
 .../i386/avx10_2-partial-bf-vector-fma-1.c   | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
index 72e17e99603..8a9096a300a 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
@@ -1,9 +1,13 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx10.2 -O2" } */
-/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 
{ target ia32 } } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 
{ target ia32 } } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 
{ target ia32 } } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 3 
{ target ia32 } } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 
{ target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 
{ target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 
{ target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 
{ target { ! ia32 } } } } */
 
 typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
 typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
-- 
2.31.1
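
For context, a hedged sketch (assumed, not the actual body of the test) of
the kind of function these dg-final directives count instructions for; it
relies on __bf16 vector arithmetic being enabled by -mavx10.2:

  typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));

  /* Hypothetical example: with -mavx10.2 -O2 the multiply-add below is
     expected to contract into vfmadd132nepbf16 on an xmm register.  */
  v4bf
  madd_v4bf (v4bf a, v4bf b, v4bf c)
  {
    return a * b + c;
  }

Per the ChangeLog above, the 32-bit scans get their own (higher) counts and
no longer check register operands, since spills can change which operands
appear.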



Re: [PATCH v2 1/2] RISC-V: Fix ICE caused by early ggc_free on DECL for RVV intrinsics in LTO.

2024-09-10 Thread Jin Ma
> > +  /* If the code of DECL is ERROR_MARK or invalid code, usually 
> > "ggc_freed", we
> > + use integer_zero_node instead of it. This will be very helpful for the
> > + ggc_free.  */
> > +  if (TREE_CODE (decl) == ERROR_MARK || TREE_CODE (decl) >= MAX_TREE_CODES)
> > +decl = integer_zero_node;
> > +
> 
> ... this one looks like a hack (note ggc_free only poisons memory when
> checking is enabled).

Hi, I just built the compiler with "--enable-multilib" from the
riscv-gnu-toolchain source, and it seems to have "CHECKING_P" /
"ENABLE_EXTRA_CHECKING" on by default, resulting in "flag_checking=2".

I haven't looked into it in detail, so I'm not sure whether that is the case.

> I'm curious why you ever get a ggc_freed decl here.

It seems that the overloaded RVV interface gets registered repeatedly; every
registration except the first is invalid, and those invalid registrations
have been ggc_freed. But anyway, I think it is necessary to do a check here.
I think using "integer_zero_node" meets the need, although returning
directly would be better.

BR
Jin


Re: [PATCH] x86: Refine V4BF/V2BF FMA Testcase

2024-09-10 Thread Hongtao Liu
On Tue, Sep 10, 2024 at 3:35 PM Levy Hsu  wrote:
>
> Simple testcase fix, ok for trunk?
Ok.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit 
> scan
> and removed register checks in spill situations.
> ---
>  .../i386/avx10_2-partial-bf-vector-fma-1.c   | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c 
> b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> index 72e17e99603..8a9096a300a 100644
> --- a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
> @@ -1,9 +1,13 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mavx10.2 -O2" } */
> -/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ 
> \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[
>  \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ 
> \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[
>  \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ 
> \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[
>  \\t\]+#)" 2 } } */
> -/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ 
> \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[
>  \\t\]+#)" 2 } } */
> +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 3 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 3 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 3 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 3 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 
> 2 { target { ! ia32 } } } } */
>
>  typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
>  typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
> --
> 2.31.1
>


-- 
BR,
Hongtao


[COMMITTED 1/7] ada: Whitespace cleanup in declaration of calendar-related routines

2024-09-10 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup.

gcc/ada/

* libgnat/s-os_lib.ads: Remove extra whitespace.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-os_lib.ads | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnat/s-os_lib.ads b/gcc/ada/libgnat/s-os_lib.ads
index 46e11f708a2..54e7205c3e6 100644
--- a/gcc/ada/libgnat/s-os_lib.ads
+++ b/gcc/ada/libgnat/s-os_lib.ads
@@ -130,12 +130,12 @@ package System.OS_Lib is
--  Returns current local time in the form YYYY-MM-DD HH:MM:SS. The result
--  has bounds 1 .. 19.
 
-   function GM_Year(Date : OS_Time) return Year_Type;
-   function GM_Month   (Date : OS_Time) return Month_Type;
-   function GM_Day (Date : OS_Time) return Day_Type;
-   function GM_Hour(Date : OS_Time) return Hour_Type;
-   function GM_Minute  (Date : OS_Time) return Minute_Type;
-   function GM_Second  (Date : OS_Time) return Second_Type;
+   function GM_Year   (Date : OS_Time) return Year_Type;
+   function GM_Month  (Date : OS_Time) return Month_Type;
+   function GM_Day(Date : OS_Time) return Day_Type;
+   function GM_Hour   (Date : OS_Time) return Hour_Type;
+   function GM_Minute (Date : OS_Time) return Minute_Type;
+   function GM_Second (Date : OS_Time) return Second_Type;
--  Functions to extract information from OS_Time value in GMT form
 
procedure GM_Split
-- 
2.43.0



Re: [PATCH v2 1/2] RISC-V: Fix ICE caused by early ggc_free on DECL for RVV intrinsics in LTO.

2024-09-10 Thread Richard Biener
On Tue, Sep 10, 2024 at 9:38 AM Jin Ma  wrote:
>
> > > +  /* If the code of DECL is ERROR_MARK or invalid code, usually 
> > > "ggc_freed", we
> > > + use integer_zero_node instead of it. This will be very helpful for 
> > > the
> > > + ggc_free.  */
> > > +  if (TREE_CODE (decl) == ERROR_MARK || TREE_CODE (decl) >= 
> > > MAX_TREE_CODES)
> > > +decl = integer_zero_node;
> > > +
> >
> > ... this one looks like a hack (note ggc_free only poisons memory when
> > checking is enabled).
>
> Hi, I just built the compiler with "--enable-multilib" from the
> riscv-gnu-toolchain source, and it seems to have "CHECKING_P" /
> "ENABLE_EXTRA_CHECKING" on by default, resulting in "flag_checking=2".
>
> I haven't looked into it in detail, so I'm not sure whether that is the case.

During development checking is enabled by default
(--enable-checking=yes), but once
released the compiler defaults to --enable-checking=release.

> > I'm curious why you ever get a ggc_freed decl here.
>
> It seems that the overloaded RVV interface gets registered repeatedly; every
> registration except the first is invalid, and those invalid registrations
> have been ggc_freed. But anyway, I think it is necessary to do a check here.
> I think using "integer_zero_node" meets the need, although returning
> directly would be better.

But there isn't any way to check whether 'decl' has been freed ...
just make sure it isn't - you
should not even have a reference to it.

Richard.

> BR
> Jin


[COMMITTED 3/7] ada: Evaluate calls to GNAT.Source_Info routines in semantic checking

2024-09-10 Thread Marc Poulhiès
From: Piotr Trojanek 

When semantic checking mode is active, i.e. when switch -gnatc is
present or when the frontend is operating in the GNATprove mode,
we now rewrite calls to GNAT.Source_Info routines in evaluation
and not expansion (which is disabled in these modes).

This is needed to recognize constants initialized with calls to
GNAT.Source_Info as static constants, regardless of expansion being
enabled.

gcc/ada/

* exp_intr.ads, exp_intr.adb (Expand_Source_Info): Move
declaration to package spec.
* sem_eval.adb (Eval_Intrinsic_Call): Evaluate calls to
GNAT.Source_Info where possible.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_intr.adb |  6 --
 gcc/ada/exp_intr.ads |  5 +
 gcc/ada/sem_eval.adb | 37 ++---
 3 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/exp_intr.adb b/gcc/ada/exp_intr.adb
index a076eb0eeb6..0db0a66ab1c 100644
--- a/gcc/ada/exp_intr.adb
+++ b/gcc/ada/exp_intr.adb
@@ -109,12 +109,6 @@ package body Exp_Intr is
--  Expand a call to corresponding function, declared in an instance of
--  System.Address_To_Access_Conversions.
 
-   procedure Expand_Source_Info (N : Node_Id; Nam : Name_Id);
-   --  Rewrite the node as the appropriate string literal or positive
-   --  constant. Nam is the name of one of the intrinsics declared in
-   --  GNAT.Source_Info; see g-souinf.ads for documentation of these
-   --  intrinsics.
-
---------------------
-- Add_Source_Info --
---------------------
diff --git a/gcc/ada/exp_intr.ads b/gcc/ada/exp_intr.ads
index 699d1c8164f..75f24bf5495 100644
--- a/gcc/ada/exp_intr.ads
+++ b/gcc/ada/exp_intr.ads
@@ -39,6 +39,11 @@ package Exp_Intr is
--  documentation of these intrinsics. Loc is passed to provide location
--  information where it is needed.
 
+   procedure Expand_Source_Info (N : Node_Id; Nam : Name_Id);
+   --  Rewrite the node as the appropriate string literal or positive constant.
+   --  Nam is the name of one of the intrinsics declared in GNAT.Source_Info;
+   --  see g-souinf.ads for documentation of these intrinsics.
+
procedure Expand_Intrinsic_Call (N : Node_Id; E : Entity_Id);
--  N is either a function call node, a procedure call statement node, or
--  an operator where the corresponding subprogram is intrinsic (i.e. was
diff --git a/gcc/ada/sem_eval.adb b/gcc/ada/sem_eval.adb
index aaf0a766dc3..de3f35e9a61 100644
--- a/gcc/ada/sem_eval.adb
+++ b/gcc/ada/sem_eval.adb
@@ -33,6 +33,7 @@ with Einfo.Utils;use Einfo.Utils;
 with Elists; use Elists;
 with Errout; use Errout;
 with Eval_Fat;   use Eval_Fat;
+with Exp_Intr;   use Exp_Intr;
 with Exp_Util;   use Exp_Util;
 with Freeze; use Freeze;
 with Lib;use Lib;
@@ -191,7 +192,7 @@ package body Sem_Eval is
--  (it is an error to make the call if these conditions are not met).
 
procedure Eval_Intrinsic_Call (N : Node_Id; E : Entity_Id);
-   --  Evaluate a call N to an intrinsic subprogram E.
+   --  Evaluate a call N to an intrinsic subprogram E
 
function Find_Universal_Operator_Type (N : Node_Id) return Entity_Id;
--  Check whether an arithmetic operation with universal operands which is a
@@ -2888,13 +2889,43 @@ package body Sem_Eval is
   end if;
 
   case Nam is
- when Name_Shift_Left  =>
+
+ --  Compilation date and time are the same for the entire compilation
+ --  unit, so we can replace them with static strings.
+
+ when Name_Compilation_ISO_Date
+| Name_Compilation_Date
+| Name_Compilation_Time
+ =>
+Expand_Source_Info (N, Nam);
+
+ --  Calls to other intrinsics from the GNAT.Source_Info package give
+ --  different results, depending on where they occur. In particular,
+ --  for generics their results depend on where those generics are
+ --  instantiated; same for default values of subprogram parameters.
+ --  Those calls will behave as nonstatic, and we postpone their
+ --  rewriting until expansion.
+
+ when Name_Enclosing_Entity
+| Name_File
+| Name_Line
+| Name_Source_Location
+ =>
+if Inside_A_Generic
+  or else In_Spec_Expression
+then
+   null;
+else
+   Expand_Source_Info (N, Nam);
+end if;
+
+ when Name_Shift_Left =>
 Eval_Shift (N, E, N_Op_Shift_Left);
  when Name_Shift_Right =>
 Eval_Shift (N, E, N_Op_Shift_Right);
  when Name_Shift_Right_Arithmetic =>
 Eval_Shift (N, E, N_Op_Shift_Right_Arithmetic);
- when others   =>
+ when others =>
 null;
   end case;
end Eval_Intrinsic_Call;
-- 
2.43.0



[COMMITTED 7/7] ada: Include missing associated header file

2024-09-10 Thread Marc Poulhiès
From: Eric Botcazou 

memmodel.h must be included alongside tm_p.h for the sake of the SPARC port.

gcc/ada/

* gcc-interface/misc.cc: Include memmodel.h before tm_p.h.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/misc.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index 645937f796c..2aa1bfd410b 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -28,6 +28,7 @@
 #include "coretypes.h"
 #include "target.h"
 #include "tree.h"
+#include "memmodel.h"
 #include "tm_p.h"
 #include "diagnostic.h"
 #include "opts.h"
-- 
2.43.0



[COMMITTED 2/7] ada: Simplify code for inserting checks into expressions

2024-09-10 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* checks.adb (Remove_Checks): Combine CASE alternatives.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 34 --
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 5d7f4cca70a..57307c3da1b 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -9712,10 +9712,6 @@ package body Checks is
  Set_Do_Range_Check (N, False);
 
  case Nkind (N) is
-when N_And_Then =>
-   Traverse (Left_Opnd (N));
-   return Skip;
-
 when N_Attribute_Reference =>
Set_Do_Overflow_Check (N, False);
 
@@ -9723,35 +9719,29 @@ package body Checks is
Set_Do_Overflow_Check (N, False);
 
case Nkind (N) is
-  when N_Op_Divide =>
- Set_Do_Division_Check (N, False);
-
-  when N_Op_And =>
- Set_Do_Length_Check (N, False);
-
-  when N_Op_Mod =>
- Set_Do_Division_Check (N, False);
-
-  when N_Op_Or =>
- Set_Do_Length_Check (N, False);
-
-  when N_Op_Rem =>
+  when N_Op_Divide
+ | N_Op_Mod
+ | N_Op_Rem
+  =>
  Set_Do_Division_Check (N, False);
 
-  when N_Op_Xor =>
+  when N_Op_And
+ | N_Op_Or
+ | N_Op_Xor
+  =>
  Set_Do_Length_Check (N, False);
 
   when others =>
  null;
end case;
 
-when N_Or_Else =>
-   Traverse (Left_Opnd (N));
-   return Skip;
-
 when N_Selected_Component =>
Set_Do_Discriminant_Check (N, False);
 
+when N_Short_Circuit =>
+   Traverse (Left_Opnd (N));
+   return Skip;
+
 when N_Type_Conversion =>
Set_Do_Length_Check   (N, False);
Set_Do_Overflow_Check (N, False);
-- 
2.43.0



[COMMITTED 4/7] ada: Normalize span generation on different platforms

2024-09-10 Thread Marc Poulhiès
From: Viljar Indus 

The total number of characters on a source code line
is different on Windows and Linux based systems
(CRLF vs LF endings). Use the last non line change
character to adjust printing the spans that go over
the end of line.

gcc/ada/

* diagnostics-pretty_emitter.adb (Get_Last_Line_Char): New. Get
the last non line change character. Write_Span_Labels use the
adjusted line end pointer to calculate the length of the span.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/diagnostics-pretty_emitter.adb | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/diagnostics-pretty_emitter.adb 
b/gcc/ada/diagnostics-pretty_emitter.adb
index 927e50578e9..389be8a6533 100644
--- a/gcc/ada/diagnostics-pretty_emitter.adb
+++ b/gcc/ada/diagnostics-pretty_emitter.adb
@@ -165,7 +165,7 @@ package body Diagnostics.Pretty_Emitter is
function Get_Line_End
   (Buf : Source_Buffer_Ptr;
Loc : Source_Ptr) return Source_Ptr;
-   --  Get the source location for the end of the line in Buf for Loc. If
+   --  Get the source location for the end of the line (LF) in Buf for Loc. If
--  Loc is past the end of Buf already, return Buf'Last.
 
function Get_Line_Start
@@ -177,6 +177,10 @@ package body Diagnostics.Pretty_Emitter is
  (Buf : Source_Buffer_Ptr; Loc : Source_Ptr) return Source_Ptr;
--  Get first non-space character in the line containing Loc
 
+   function Get_Last_Line_Char
+ (Buf : Source_Buffer_Ptr; Loc : Source_Ptr) return Source_Ptr;
+   --  Get last non line end [LF, CR] character in the line containing Loc
+
function Image (X : Positive; Width : Positive) return String;
--  Output number X over Width characters, with whitespace padding.
--  Only output the low-order Width digits of X, if X is larger than
@@ -314,6 +318,25 @@ package body Diagnostics.Pretty_Emitter is
   return Cur_Loc;
end Get_First_Line_Char;
 
+   ------------------------
+   -- Get_Last_Line_Char --
+   ------------------------
+
+   function Get_Last_Line_Char
+ (Buf : Source_Buffer_Ptr; Loc : Source_Ptr) return Source_Ptr
+   is
+  Cur_Loc : Source_Ptr := Get_Line_End (Buf, Loc);
+   begin
+
+  while Cur_Loc > Buf'First
+and then Buf (Cur_Loc) in ASCII.LF | ASCII.CR
+  loop
+ Cur_Loc := Cur_Loc - 1;
+  end loop;
+
+  return Cur_Loc;
+   end Get_Last_Line_Char;
+
   -----------
   -- Image --
   -----------
@@ -720,8 +743,9 @@ package body Diagnostics.Pretty_Emitter is
 Source_Text (Get_Source_File_Index (L.First));
 
   Col_L_Fst : constant Natural := Natural
- (Get_Column_Number (Get_First_Line_Char (Buf, L.First)));
-  Col_L_Lst : constant Natural := Natural (Get_Column_Number (L.Last));
+(Get_Column_Number (Get_First_Line_Char (Buf, L.First)));
+  Col_L_Lst : constant Natural := Natural
+(Get_Column_Number (Get_Last_Line_Char (Buf, L.Last)));
 
   --  Carret positions
   Ptr  : constant Source_Ptr := Loc.Span.Ptr;
-- 
2.43.0



[COMMITTED 5/7] ada: First controlling parameter: report error without Extensions allowed

2024-09-10 Thread Marc Poulhiès
From: Javier Miranda 

Enable reporting an error when this new aspect/pragma is set to
True, and the sources are compiled without language extensions
allowed.

gcc/ada/

* sem_ch13.adb (Analyze_One_Aspect): Call
Error_Msg_GNAT_Extension() to report an error when the aspect
First_Controlling_Parameter is set to True and the sources are
compiled without Core_Extensions_Allowed.
* sem_prag.adb (Pragma_First_Controlling_Parameter): Call
subprogram Error_Msg_GNAT_Extension() to report an error when the
aspect First_Controlling_Parameter is set to True and the sources
are compiled without Core_Extensions_Allowed. Report an error when
the aspect pragma does not confirm an inherited True value.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 28 ++-
 gcc/ada/sem_prag.adb | 53 +++-
 2 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index ab8cc1012c3..0770bafd231 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -4530,6 +4530,9 @@ package body Sem_Ch13 is
 if (No (Expr) or else Entity (Expr) = Standard_True)
   and then not Core_Extensions_Allowed
 then
+   Error_Msg_GNAT_Extension
+ ("'First_'Controlling_'Parameter", Sloc (Aspect),
+  Is_Core_Extension => True);
goto Continue;
 end if;
 
@@ -4545,19 +4548,24 @@ package body Sem_Ch13 is
goto Continue;
 end if;
 
---  If the aspect is specified for a derived type, the
---  specified value shall be confirming.
-
 if Present (Expr)
-  and then Is_Derived_Type (E)
-  and then
-Has_First_Controlling_Parameter_Aspect (Etype (E))
   and then Entity (Expr) = Standard_False
 then
-   Error_Msg_Name_1 := Nam;
-   Error_Msg_N
- ("specification of inherited aspect% can only "
-   & "confirm parent value", Id);
+   --  If the aspect is specified for a derived type,
+   --  the specified value shall be confirming.
+
+   if Is_Derived_Type (E)
+ and then Has_First_Controlling_Parameter_Aspect
+(Etype (E))
+   then
+  Error_Msg_Name_1 := Nam;
+  Error_Msg_N
+("specification of inherited True value for "
+   & "aspect% can only confirm parent value",
+ Id);
+   end if;
+
+   goto Continue;
 end if;
 
 --  Given that the aspect has been explicitly given,
diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index b139bd4cf4e..2d31c71f366 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -17761,22 +17761,55 @@ package body Sem_Prag is
  
 
  when Pragma_First_Controlling_Parameter => First_Ctrl_Param : declare
-Arg : Node_Id;
-E   : Entity_Id := Empty;
+Arg  : Node_Id;
+E: Entity_Id := Empty;
+Expr : Node_Id := Empty;
 
  begin
-if not Core_Extensions_Allowed then
-   return;
-end if;
-
 GNAT_Pragma;
-Check_Arg_Count (1);
+Check_At_Least_N_Arguments (1);
+Check_At_Most_N_Arguments  (2);
 
 Arg := Get_Pragma_Arg (Arg1);
+Check_Arg_Is_Identifier (Arg);
 
-if Nkind (Arg) = N_Identifier then
-   Analyze (Arg);
-   E := Entity (Arg);
+Analyze (Arg);
+E := Entity (Arg);
+
+if Present (Arg2) then
+   Check_Arg_Is_OK_Static_Expression (Arg2, Standard_Boolean);
+   Expr := Get_Pragma_Arg (Arg2);
+   Analyze_And_Resolve (Expr, Standard_Boolean);
+end if;
+
+if not Core_Extensions_Allowed then
+   if No (Expr)
+ or else
+   (Present (Expr)
+  and then Is_Entity_Name (Expr)
+  and then Entity (Expr) = Standard_True)
+   then
+  Error_Msg_GNAT_Extension
+("'First_'Controlling_'Parameter", Sloc (N),
+   

[COMMITTED 6/7] ada: Use the same warning character in continuations

2024-09-10 Thread Marc Poulhiès
From: Viljar Indus 

gcc/ada/

* gcc-interface/decl.cc: Use same warning characters in
continuation messages.
* gcc-interface/trans.cc: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc  |  8 
 gcc/ada/gcc-interface/trans.cc | 14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 655ba0b8a10..4252e627b0c 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -1430,7 +1430,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, 
bool definition)
post_error
  ("??too large object cannot be allocated statically",
   gnat_entity);
-   post_error ("\\?dynamic allocation will be used 
instead",
+   post_error ("\\??dynamic allocation will be used 
instead",
gnat_entity);
  }
 
@@ -6565,7 +6565,7 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool 
definition,
("??cannot import type-generic 'G'C'C builtin!",
 gnat_subprog);
  post_error
-   ("\\?use a supported result type",
+   ("\\??use a supported result type",
 gnat_subprog);
  gnu_builtin_decl = NULL_TREE;
}
@@ -6587,7 +6587,7 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool 
definition,
("??cannot import type-generic 'G'C'C builtin!",
 gnat_subprog);
  post_error
-   ("\\?use a supported second parameter type",
+   ("\\??use a supported second parameter type",
 gnat_subprog);
  gnu_builtin_decl = NULL_TREE;
}
@@ -6608,7 +6608,7 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool 
definition,
("??cannot import type-generic 'G'C'C builtin!",
 gnat_subprog);
  post_error
-   ("\\?use a supported third parameter type",
+   ("\\??use a supported third parameter type",
 gnat_subprog);
  gnu_builtin_decl = NULL_TREE;
}
diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 92e000686fb..710907bb999 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -4949,10 +4949,10 @@ Call_to_gnu (Node_Id gnat_node, tree 
*gnu_result_type_p, tree gnu_target,
("unchecked conversion implemented by copy??",
 gnat_actual);
  post_error
-   ("\\?use pragma Universal_Aliasing on either type",
+   ("\\??use pragma Universal_Aliasing on either type",
 gnat_actual);
  post_error
-   ("\\?to enable RM 13.9(12) implementation permission",
+   ("\\??to enable RM 13.9(12) implementation permission",
 gnat_actual);
}
 
@@ -4962,10 +4962,10 @@ Call_to_gnu (Node_Id gnat_node, tree 
*gnu_result_type_p, tree gnu_target,
("value conversion implemented by copy??",
 gnat_actual);
  post_error
-   ("\\?use pair of types with same root type",
+   ("\\??use pair of types with same root type",
 gnat_actual);
  post_error
-   ("\\?to avoid new object in RM 4.6(58.5/5)",
+   ("\\??to avoid new object in RM 4.6(58.5/5)",
 gnat_actual);
}
}
@@ -10644,9 +10644,9 @@ validate_unchecked_conversion (Node_Id gnat_node)
{
  post_error_ne ("??possible aliasing problem for type&",
 gnat_node, Target_Type (gnat_node));
- post_error ("\\?use -fno-strict-aliasing switch for references",
+ post_error ("\\??use -fno-strict-aliasing switch for references",
  gnat_node);
- post_error_ne ("\\?or use `pragma No_Strict_Aliasing (&);`",
+ post_error_ne ("\\??or use `pragma No_Strict_Aliasing (&);`",
 gnat_node, Target_Type (gnat_node));
}
 }
@@ -10670,7 +10670,7 @@ validate_unchecked_conversion (Node_Id gnat_node)
{
  post_error_ne ("??possible aliasing problem for type&",
 gnat_node, Target_Type (gnat_node));
- post_error ("\\?use -fno-strict-aliasing switch for references",
+ post_error ("\\??use -fno-st

[PATCH] Enable tune fuse_move_and_alu for GNR/GNR-D.

2024-09-10 Thread liuhongt
According to the Intel Software Optimization Manual[1], the Redwood Cove
microarchitecture supports LD+OP and MOV+OP macro fusions.

The patch enables MOV+OP tune for GNR.

[1] 
https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready push to trunk.

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_FUSE_MOV_AND_ALU): Enable
for GNR and GNR-D.
---
 gcc/config/i386/x86-tune.def | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index d7e2ad7fd25..3d123da95f0 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -153,7 +153,8 @@ DEF_TUNE (X86_TUNE_FUSE_ALU_AND_BRANCH, 
"fuse_alu_and_branch",
 /* X86_TUNE_FUSE_MOV_AND_ALU: mov and alu in case mov is reg-reg mov
and the destination is used by alu.  alu must be one of
ADD, ADC, AND, XOR, OR, SUB, SBB, INC, DEC, NOT, SAL, SHL, SHR, SAR.  */
-DEF_TUNE (X86_TUNE_FUSE_MOV_AND_ALU, "fuse_mov_and_alu", m_ZNVER5)
+DEF_TUNE (X86_TUNE_FUSE_MOV_AND_ALU, "fuse_mov_and_alu",
+m_ZNVER5 | m_GRANITERAPIDS | m_GRANITERAPIDS_D)
 
 /*/
 /* Function prologue, epilogue and function calling sequences.   */
-- 
2.31.1
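
As an illustrative aside (not part of the patch): a common code shape the
MOV+OP fusion targets is a reg-reg mov whose destination is immediately
consumed by an ALU instruction, as in the sketch below; the exact assembly
depends on the compiler version, so treat it only as a sketch.

  /* At -O2 on x86-64 this typically compiles to something like
     "movl %edi, %eax; andl %esi, %eax; ret" - a reg-reg mov feeding an
     ALU op, i.e. the shape X86_TUNE_FUSE_MOV_AND_ALU is about.  */
  int
  and_copy (int a, int b)
  {
    return a & b;
  }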



Re: [PATCH] ada: Fix gcc-interface/misc.cc compilation on SPARC

2024-09-10 Thread Marc Poulhiès
Eric Botcazou  writes:

>> commit 72c6938f29cbeddb3220720e68add4cf09ffd794
>> Author: Eric Botcazou 
>> Date:   Sun Aug 25 15:20:59 2024 +0200
>>
>> ada: Streamline handling of low-level peculiarities of record field
>> layout
>>
>> broke the Ada build on SPARC:
>>
>> In file included from ./tm_p.h:4,
>>  from
>> /vol/gcc/src/hg/master/local/gcc/ada/gcc-interface/misc.cc:31:
>> /vol/gcc/src/hg/master/local/gcc/config/sparc/sparc-protos.h:46:47: error:
>> use of enum ‘memmodel’ without previous declaration 46 | extern void
>> sparc_emit_membar_for_model (enum memmodel, int, int);
>>   |   ^~~~
>>
>> Fixed by including memmodel.h.
>
> Sorry about that, a small merge glitch, the fix is already in the pipeline.

Hello,

Éric's fix has been merged this morning as r15-3568-g73dc46f47be615.

Marc


Re: [PATCH] MIPS: Add some floating point instructions support for MIPSr6

2024-09-10 Thread 梅杰
On 2024/9/3 08:30, YunQiang Su wrote:
> On Fri, Jul 26, 2024 at 14:50, Jie Mei wrote:
>>
>> This patch adds some floating point instructiions from mips32r6,
>> for instance, MINA/MAXA.fmt, RINT.fmt, CLASS.fmt etc.
>>
>> Also add built-in functions to MIPSr6 to better handle tests
>> for MIPSr6.
>>
>> gcc/ChangeLog:
>>
>> * config/mips/i6400.md (i6400_fpu_minmax): Include
>> fclass type.
>> (i6400_fpu_fadd): Include frint type.
>> * config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry
>> for __builtin_mipsr6_xxx.
> 
> Since libc has fmaxmag/fminmag, maybe we should use __builtin_fmaxmag.
> And `rint/rintf` exist in libc.
> 
> 

There is no such built-in function `__builtin_fmaxmag` in GCC; the only
file in the tree that mentions `fmaxmag/fminmag`,
`fortran/trans-intrinsic.cc`, states:

> IEEE_MIN_NUM_MAG and IEEE_MAX_NUM_MAG translate to C 
> fminmag() and fmaxmag(), which do not exist as built-ins.

As for the function `__builtin_rint`: although it exists, even after
defining the instruction in `mips.md`, GCC still won't generate the
`RINT.fmt` instruction for MIPS; it generates the following code instead:

>   lui $28,%hi(__gnu_local_gp)
>   addiu   $28,$28,%lo(__gnu_local_gp)
>   lw  $25,%call16(rint)($28)
>   .reloc  1f,R_MIPS_JALR,rint

This patch tries to fix said problem with an extra built-in function
like the MSA solution for `FRINT.fmt`.

>> (MIPSR6_BUILTIN_PURE): Same as above.
>> (CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d)
>> (CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d)
>> (CODE_FOR_mipsr6_rint_s, CODE_FOR_mipsr6_rint_d)
>> (CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d):
>> New code_aliasing macros.
>> (mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s,
>> max_a_d, rint_s, rint_d, class_s, class_d builtins.
>> * config/mips/mips.h (ISA_HAS_FRINT): Define a new macro.
>> (ISA_HAS_FCLASS): Same as above.
>> * config/mips/mips.md (UNSPEC_FRINT): New unspec.
>> (UNSPEC_FCLASS): Same as above.
>> (type): Add frint and fclass.
>> (fmin_a_): Generates MINA.fmt instructions.
>> (fmax_a_): Generates MAXA.fmt instructions.
>> (frint_): Generates RINT.fmt instructions.
>> (fclass_): Generates CLASS.fmt instructions.
>> * config/mips/p6600.md (p6600_fpu_fadd): Include
>> frint type.
>> (p6600_fpu_fabs): Incled fclass type.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/mips/mips-class.c: New tests for MIPSr6
>> * gcc.target/mips/mips-minamaxa.c: Same as above.
>> * gcc.target/mips/mips-rint.c: Same as above.
>> ---
>>  gcc/config/mips/i6400.md  |  8 +--
>>  gcc/config/mips/mips.cc   | 28 ++
>>  gcc/config/mips/mips.h|  4 ++
>>  gcc/config/mips/mips.md   | 52 ++-
>>  gcc/config/mips/p6600.md  |  8 +--
>>  gcc/testsuite/gcc.target/mips/mips-class.c| 17 ++
>>  gcc/testsuite/gcc.target/mips/mips-minamaxa.c | 31 +++
>>  gcc/testsuite/gcc.target/mips/mips-rint.c | 17 ++
>>  8 files changed, 155 insertions(+), 10 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/mips/mips-class.c
>>  create mode 100644 gcc/testsuite/gcc.target/mips/mips-minamaxa.c
>>  create mode 100644 gcc/testsuite/gcc.target/mips/mips-rint.c
>>
>> diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md
>> index d6f691ee217..48ce980e1c2 100644
>> --- a/gcc/config/mips/i6400.md
>> +++ b/gcc/config/mips/i6400.md
>> @@ -219,16 +219,16 @@
>> (eq_attr "type" "fabs,fneg,fmove"))
>>"i6400_fpu_short, i6400_fpu_apu")
>>
>> -;; min, max
>> +;; min, max, fclass
>>  (define_insn_reservation "i6400_fpu_minmax" 2
>>(and (eq_attr "cpu" "i6400")
>> -   (eq_attr "type" "fminmax"))
>> +   (eq_attr "type" "fminmax,fclass"))
>>"i6400_fpu_short+i6400_fpu_logic")
>>
>> -;; fadd, fsub, fcvt
>> +;; fadd, fsub, fcvt, frint
>>  (define_insn_reservation "i6400_fpu_fadd" 4
>>(and (eq_attr "cpu" "i6400")
>> -   (eq_attr "type" "fadd,fcvt"))
>> +   (eq_attr "type" "fadd,fcvt,frint"))
>>"i6400_fpu_long, i6400_fpu_apu")
>>
>>  ;; fmul
>> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
>> index 6c797b62164..14a1f23eb70 100644
>> --- a/gcc/config/mips/mips.cc
>> +++ b/gcc/config/mips/mips.cc
>> @@ -15775,6 +15775,7 @@ AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && 
>> TARGET_DSPR2)
>>  AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_MMI)
>>  AVAIL_MIPS16E2_OR_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
>>  AVAIL_NON_MIPS16 (msa, TARGET_MSA)
>> +AVAIL_NON_MIPS16 (r6, mips_isa_rev >= 6)
>>
>>  /* Construct a mips_builtin_description from the given arguments.
>>
>> @@ -15940,6 +15941,14 @@ AVAIL_NON_MIPS16 (msa, TARGET_MSA)
>>  "__builtin_msa_" #INSN,  MIPS_BUILTIN_DIRECT_NO_TARGET,\
>>  

Re: [PATCH] MIPS: Add some floating point instructions support for MIPSr6

2024-09-10 Thread Xi Ruoyao
On Tue, 2024-09-10 at 16:50 +0800, 梅杰 wrote:
> As for the function `__builtin_rint`: although it exists, even after
> defining the instruction in `mips.md`, GCC still won't generate the
> `RINT.fmt` instruction for MIPS; it generates the following code instead:
> 
> > lui $28,%hi(__gnu_local_gp)
> > addiu   $28,$28,%lo(__gnu_local_gp)
> > lw  $25,%call16(rint)($28)
> > .reloc  1f,R_MIPS_JALR,rint

Why?

With this:

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index f147667d63a..0c1ef77a816 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -100,6 +100,7 @@ (define_c_enum "unspec" [
   ;; Floating-point unspecs.
   UNSPEC_FMIN
   UNSPEC_FMAX
+  UNSPEC_RINT
 
   ;; HI/LO moves.
   UNSPEC_MFHI
@@ -8025,6 +8026,14 @@ (define_peephole2
   (any_extend:SI (match_dup 3)))])]
   "")
 
+(define_insn "rint<mode>2"
+  [(set (match_operand:SCALARF 0 "register_operand" "=f")
+   (unspec:SCALARF [(match_operand:SCALARF 1 "register_operand" " f")]
+   UNSPEC_RINT))]
+  "mips_isa_rev >= 6"
+  "rint.<fmt>\t%0,%1")
+
+
 

 ;; Synchronization instructions.

it works for me:

$ cat t.c
double test(double x)
{
return __builtin_rint(x);
}
$ ./gcc/cc1 t.c -nostdinc -O2 -mips64r6 -mexplicit-relocs -mabi=64
$ grep rint t.s
rint.d  $f0,$f12

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-10 Thread Andrew Stubbs

On 06/09/2024 09:47, Robin Dapp wrote:

So we only found two instances of this problem and both were related to
_Bools.  In case you have more cases, it would be greatly appreciated
to verify the series with them.  If you don't mind, would it be possible
to comment out the zeroing, re-run the testsuite and check for FAILs?


I looked it up, and it was an execution failure in testcase
gfortran.dg/assumed_rank_1.f90 that prompted me to add the initialization.


Ah, I saw that one as well here.  Thanks, will have a look locally.



I ran the tests with the initialization removed, but I'm not too sure 
what to make of the results. There are 3 regressions and 8 progressions, 
but there's no consistency across the different devices (I tested 
gfx1100, gfx908, and gfx90a).


I'm going to rerun the tests and see if it does the same again, or if 
there's some randomness.


Andrew


[PATCH] tree-optimization/116658 - latent issue in vect_is_slp_load_node

2024-09-10 Thread Richard Biener
Permute nodes do not have a representative so we have to guard
vect_is_slp_load_node against those.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116658
* tree-vect-slp.cc (vect_is_slp_load_node): Make sure
node isn't a permute.

* g++.dg/vect/pr116658.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr116658.cc | 58 +++
 gcc/tree-vect-slp.cc  |  7 ++--
 2 files changed, 62 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr116658.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr116658.cc 
b/gcc/testsuite/g++.dg/vect/pr116658.cc
new file mode 100644
index 000..c3ff23a2b60
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr116658.cc
@@ -0,0 +1,58 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-additional-options "-O3" }
+// { dg-additional-options "-mavx512f" { target avx512f } }
+
+struct bb {
+  bb operator+=(bb bc) {
+  bd[0] += bc.bd[0];
+return *this;
+  }
+  bb operator-=(bb bc) {
+  bd[0] -= bc.bd[0];
+return *this;
+  }
+  bb operator*=(double be) {
+  bd[0] *= be;
+return *this;
+  }
+  double bd[1];
+};
+
+bb operator+(bb n, bb v) {
+  bb bf = n;
+  return bf += v;
+}
+
+bb operator-(bb n, bb v) {
+  bb bf = n;
+  return bf -= v;
+}
+bb operator*(double n, bb v) {
+  bb bf = v;
+  return bf *= n;
+}
+
+using az = bb;
+struct cc {
+  void apply(bb *ci) {
+  bb xm[1];
+  for (int cm = 0; cm < 2; ++cm) {
+az cn, co = cv[cm] * xm[0];
+ci[cm] = cn + co;
+ci[-1] = cn - co;
+  }
+  }
+  double *cu;
+  double *cv;
+};
+void dc(unsigned de, int di, az *dk, az *dl, cc dh) {
+  for (int c; c < 1024; ++c) {
+if (de & 1)
+  dh.apply(dk);
+if (de & 2)
+  dh.apply(dl);
+dk += di;
+dl += di;
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0fb17340bd3..31c7e20f8c9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3265,9 +3265,10 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned 
int group_size)
 static inline bool
 vect_is_slp_load_node  (slp_tree root)
 {
-  return SLP_TREE_DEF_TYPE (root) == vect_internal_def
-&& STMT_VINFO_GROUPED_ACCESS (SLP_TREE_REPRESENTATIVE (root))
-&& DR_IS_READ (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (root)));
+  return (SLP_TREE_CODE (root) != VEC_PERM_EXPR
+ && SLP_TREE_DEF_TYPE (root) == vect_internal_def
+ && STMT_VINFO_GROUPED_ACCESS (SLP_TREE_REPRESENTATIVE (root))
+ && DR_IS_READ (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (root))));
 }
 
 
-- 
2.43.0


Re: [PATCH v3] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-09-10 Thread Alex Coplan
On 27/08/2024 10:55, Alex Coplan wrote:
> Hi,
> 
> This is a v3 that hopefully addresses the feedback from both Jason and
> Jakub.  v2 was posted here:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660191.html

Gentle ping on this C++ patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661559.html

Jason, are you OK with this approach, or would you prefer to not make the
INTEGER_CST assumption and do something along the lines of your last suggestion
instead:

> Perhaps we want a recompute_expr_flags like the existing 
> recompute_constructor_flags, so we don't need to duplicate PROCESS_ARG 
> logic elsewhere.

?  Sorry, I'd missed that reply when I wrote the v3 patch.

Thanks,
Alex
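
For readers following along, the shape of code affected is roughly the
following; this is an assumed sketch, not the committed
g++.dg/ext/pragma-unroll-lambda.C test:

  // Assumed illustration only: a lambda predicate called in the loop
  // condition, with the unroll pragma on the loop (compile with -O2).
  short *
  find_val (short *first, short *last, short val)
  {
    auto pred = [val] (short x) { return x == val; };
  #pragma GCC unroll 4
    while (first != last && !pred (*first))
      ++first;
    return first;
  }

Before the fix, the ANNOTATE_EXPR produced for the pragma ends up wrapped
in a CLEANUP_POINT_EXPR in the loop condition, so the unroll annotation is
dropped with an "ignoring loop annotation" warning; with the patch it
survives and the loop can be unrolled.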

> 
> (Sorry for the delay in posting the re-spin, I was away last week.)
> 
> In this version we refactor to introudce a helper class (annotate_saver)
> which is much less invasive to the caller (maybe_convert_cond) and
> should (at least in theory) be reuseable elsewhere.
> 
> This version also relies on the assumption that operands 1 and 2 of
> ANNOTATE_EXPRs are INTEGER_CSTs, which simplifies the flag updates
> without having to rely on assumptions about the specific changes made
> in maybe_convert_cond.
> 
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> 
> Thanks,
> Alex
> 
> -- >8 --
> 
> For the testcase added with this patch, we would end up losing the:
> 
>   #pragma GCC unroll 4
> 
> and emitting "warning: ignoring loop annotation".  That warning comes
> from tree-cfg.cc:replace_loop_annotate, and means that we failed to
> process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
> That function walks backwards over the GIMPLE in an exiting BB for a
> loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
> immediately preceding the gcond.
> 
> The function documents the following pre-condition:
> 
>/* [...] We assume that the annotations come immediately before the
>   condition in BB, if any.  */
> 
> now looking at the exiting BB of the loop, we have:
> 
>:
>   D.4524 = .ANNOTATE (iftmp.1, 1, 4);
>   retval.0 = D.4524;
>   if (retval.0 != 0)
> goto ; [INV]
>   else
> goto ; [INV]
> 
> and crucially there is an intervening assignment between the gcond and
> the preceding .ANNOTATE ifn call.  To see where this comes from, we can
> look to the IR given by -fdump-tree-original:
> 
>   if (< int*)operator() (&pred, *first), unroll 4>>>)
> goto ;
>   else
> goto ;
> 
> here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
> ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
> expression in the condition.
> 
> The CLEANUP_POINT_EXPR gets added by the following call chain:
> 
> finish_while_stmt_cond
>  -> maybe_convert_cond
>  -> condition_conversion
>  -> fold_build_cleanup_point_expr
> 
> this patch chooses to fix the issue by first introducing a new helper
> class (annotate_saver) to save and restore outer chains of
> ANNOTATE_EXPRs and then using it in maybe_convert_cond.
> 
> With this patch, we don't get any such warning and the loop gets unrolled as
> expected at -O2.
> 
> gcc/cp/ChangeLog:
> 
> PR libstdc++/116140
> * semantics.cc (annotate_saver): New. Use it ...
> (maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
> remain the outermost expression(s) of the condition.
> 
> gcc/testsuite/ChangeLog:
> 
> PR libstdc++/116140
> * g++.dg/ext/pragma-unroll-lambda.C: New test.

> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 5ab2076b673..b1a49b14238 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -951,6 +951,86 @@ maybe_warn_unparenthesized_assignment (tree t, bool 
> nested_p,
>  }
>  }
>  
> +/* Helper class for saving/restoring ANNOTATE_EXPRs.  For a tree node t, 
> users
> +   can construct one of these like so:
> +
> + annotate_saver s (&t);
> +
> +   and t will be updated to have any annotations removed.  The user can then
> +   transform t, and later restore the ANNOTATE_EXPRs with:
> +
> + t = s.restore (t).
> +
> +   The intent is to ensure that any ANNOTATE_EXPRs remain the outermost
> +   expressions following any operations on t.  */
> +
> +class annotate_saver {
> +  /* The chain of saved annotations, if there were any.  Otherwise null.  */
> +  tree m_annotations;
> +
> +  /* If M_ANNOTATIONS is non-null, then M_INNER points to TREE_OPERAND (A, 0)
> + for the innermost annotation A.  */
> +  tree *m_inner;
> +
> +public:
> +  annotate_saver (tree *);
> +  tree restore (tree);
> +};
> +
> +/* If *COND is an ANNOTATE_EXPR, walk through the chain of annotations, and 
> set
> +   *COND equal to the first non-ANNOTATE_EXPR (saving a pointer to the
> +   original chain of annotations for later use in restore).  */
> +
> +annotate_saver::annotate_saver (tree *cond) : m_annotations (nullptr)
> +{
> +  tree *t = cond;
> +  while (TREE_CODE (*t) == ANNOTATE_EXPR)
> +t = &T

Re: [PATCH RFA] libstdc++: fix C header include guards

2024-09-10 Thread Jonathan Wakely
On Tue, 10 Sept 2024 at 05:10, Jason Merrill wrote:
>
> Tested x86_64-pc-linux-gnu, OK for trunk?

I'm going to have to do some digging ... I _think_ there's some
obscure reason for this. Maybe a weird bootstrap situation. Or maybe I
made that up as rationale for a scripting error.


>
> -- 8< --
>
> Ever since the c_global and c_compatibility directories were added in
> r122533, the include guards have been oddly late in the files, with no
> comment about why that might be either in the commit message or the files
> themselves.  I don't see any justification for this; it seems like a
> scripting error in creating these files based on the ones in include/c.
>
> libstdc++-v3/ChangeLog:
>
> * include/c_compatibility/ctype.h
> * include/c_compatibility/errno.h
> * include/c_compatibility/float.h
> * include/c_compatibility/iso646.h
> * include/c_compatibility/limits.h
> * include/c_compatibility/locale.h
> * include/c_compatibility/setjmp.h
> * include/c_compatibility/signal.h
> * include/c_compatibility/stdarg.h
> * include/c_compatibility/stdbool.h
> * include/c_compatibility/stddef.h
> * include/c_compatibility/stdio.h
> * include/c_compatibility/string.h
> * include/c_compatibility/tgmath.h
> * include/c_compatibility/time.h
> * include/c_compatibility/uchar.h
> * include/c_compatibility/wchar.h
> * include/c_compatibility/wctype.h
> * include/c_global/cctype
> * include/c_global/cerrno
> * include/c_global/cfloat
> * include/c_global/climits
> * include/c_global/clocale
> * include/c_global/cmath
> * include/c_global/csetjmp
> * include/c_global/csignal
> * include/c_global/cstdarg
> * include/c_global/cstdio
> * include/c_global/cstdlib
> * include/c_global/cstring
> * include/c_global/ctime
> * include/c_global/cwchar
> * include/c_global/cwctype: Move header guard before #includes.
> ---
>  libstdc++-v3/include/c_compatibility/ctype.h   | 4 ++--
>  libstdc++-v3/include/c_compatibility/errno.h   | 4 ++--
>  libstdc++-v3/include/c_compatibility/float.h   | 4 ++--
>  libstdc++-v3/include/c_compatibility/iso646.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/limits.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/locale.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/setjmp.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/signal.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/stdarg.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/stdbool.h | 6 +++---
>  libstdc++-v3/include/c_compatibility/stddef.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/stdio.h   | 4 ++--
>  libstdc++-v3/include/c_compatibility/string.h  | 4 ++--
>  libstdc++-v3/include/c_compatibility/tgmath.h  | 6 +++---
>  libstdc++-v3/include/c_compatibility/time.h| 4 ++--
>  libstdc++-v3/include/c_compatibility/uchar.h   | 4 ++--
>  libstdc++-v3/include/c_compatibility/wchar.h   | 4 ++--
>  libstdc++-v3/include/c_compatibility/wctype.h  | 4 ++--
>  libstdc++-v3/include/c_global/cctype   | 6 +++---
>  libstdc++-v3/include/c_global/cerrno   | 6 +++---
>  libstdc++-v3/include/c_global/cfloat   | 6 +++---
>  libstdc++-v3/include/c_global/climits  | 6 +++---
>  libstdc++-v3/include/c_global/clocale  | 6 +++---
>  libstdc++-v3/include/c_global/cmath| 6 +++---
>  libstdc++-v3/include/c_global/csetjmp  | 6 +++---
>  libstdc++-v3/include/c_global/csignal  | 6 +++---
>  libstdc++-v3/include/c_global/cstdarg  | 6 +++---
>  libstdc++-v3/include/c_global/cstdio   | 6 +++---
>  libstdc++-v3/include/c_global/cstdlib  | 6 +++---
>  libstdc++-v3/include/c_global/cstring  | 6 +++---
>  libstdc++-v3/include/c_global/ctime| 6 +++---
>  libstdc++-v3/include/c_global/cwchar   | 6 +++---
>  libstdc++-v3/include/c_global/cwctype  | 6 +++---
>  33 files changed, 83 insertions(+), 83 deletions(-)
>
> diff --git a/libstdc++-v3/include/c_compatibility/ctype.h 
> b/libstdc++-v3/include/c_compatibility/ctype.h
> index 7b57a4d118a..d975dff1529 100644
> --- a/libstdc++-v3/include/c_compatibility/ctype.h
> +++ b/libstdc++-v3/include/c_compatibility/ctype.h
> @@ -26,11 +26,11 @@
>   *  This is a Standard C++ Library header.
>   */
>
> -#include 
> -
>  #ifndef _GLIBCXX_CTYPE_H
>  #define _GLIBCXX_CTYPE_H 1
>
> +#include 
> +
>  #ifdef _GLIBCXX_NAMESPACE_C
>  using std::isalnum;
>  using std::isalpha;
> diff --git a/libstdc++-v3/include/c_compatibility/errno.h 
> b/libstdc++-v3/include/c_compatibility/errno.h
> index 06a7e5e1ee4..402810456a5 100644
> --- a/libstdc++-v3/include/c_compatibility/errno.h
> +++ b/libstdc++-v3/include/c_compatibility/errno.h
> @@ -26,9 +26,9 @@
>   *  This is a Standard C++ Library header.
>   */
>
> -#include 
> -
>  #ifndef _GLIBCXX_E

[Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

2024-09-10 Thread Tobias Burnus

Background: OpenMP states that for 'map(var)', all allocatable components
of 'var' will automatically also be mapped ('deep mapping').

Thus, for

type(t), allocatable :: var(:)

this leads to a pseudo code like:

  map(var, storage_size(var))
  do i = lbound(var), ubound(var)
    if (allocated(var(i)%comp1)) &
  map(var(i)%comp1, storage_size(var(i)%comp1))
  end do

and more complicated, e.g. var(1204)%comp1(395)%str might be
an allocatable scalar. Or var is an recursive type, e.g. it has
'type(t), allocatable :: self' as component such that
  var%self%self%self%self ...
might exist (and 'self' could also be an array …).

* * *

Approach:

The idea is to handle it in lower_omp_target as follows (semi-pseudocode):

  /* Obtain the number of additional mappings; in the example above, it would
     be size(var) * 2 for map + attach of 'comp1', assuming all 'var(:)%comp1'
     are allocated and no other alloc comp. exist.  */
  tree cnt = lang_hooks.decls.omp_deep_mapping_cnt (...)
  if (cnt)
    deep_map_cnt *= cnt;

if (cnt) → switch to pointer type + dynamically allocate addrs, kinds, sizes
→ add 'uintptr_t s[]' as trailing member to the addr struct.

(Thus, all automatically mapped items are added to the end.)

In the big map loop, additionally call:
  lang_hooks.decls.omp_deep_mapping

Additionally, in some cases, the only question that needs to be solved is:
does the decl have an allocatable component or not?  In that case,
lang_hooks.decls.omp_deep_mapping_p is sufficient.

* * *

RFC: Does this approach sound sensible?  Does the attached patch (middle-end
part) look reasonable?

One downside of the current approach is that for map(var), when 'var' is
already present, we still attempt to map all allocatable components instead
of stopping directly after finding 'var' in the splay table.  This can be
fixed by passing more attributes to libgomp, but as the items come last in
the list, it might not be straightforward.  (Maybe a starts-here + ends-here
flag, where the attach next to the starts-here flag could be used to do the
lookup?)  This might also lead to cases where an allocatable variable is
mapped that otherwise would not be mapped.  Albeit, as 'map(var%comp)' of a
later allocated 'comp' is only guaranteed to work with the 'always'
modifier, having it automapped for 'map(var)' should at least not affect
the values that were mapped.

* * *

The full patch has been applied to the OG14 (= devel/omp/gcc-14) branch.
The interesting bits are the hook entry points gfc_omp_deep_mapping_p,
gfc_omp_deep_mapping_cnt, and gfc_omp_deep_mapping →
https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-14/gcc/fortran/trans-openmp.cc#L3068-L3209

* * *

I have attached the middle-end part of the patch only:

https://gcc.gnu.org/g:92c3af3d4f8 Fortran/OpenMP: Support mapping of DT with
allocatable components

to focus on that part.

Tobias

PS: In TR13 and also after TR13, a couple of mapping features were added that 
permit
shallow mapping, unmapping of allocatable components etc. I have not tried to 
analyze
whether this affects this patch, but I think it remains largely as is.
Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

gcc/ChangeLog:

	* langhooks-def.h (lhd_omp_deep_mapping_p,
	lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
	(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
	LANG_HOOKS_OMP_DEEP_MAPPING): Define.
	(LANG_HOOKS_DECLS): Use it.
	* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
	lhd_omp_deep_mapping): New stubs.
	* langhooks.h (struct lang_hooks_for_decls): Add new hooks
	* omp-expand.cc (expand_omp_target): Handle dynamic-size
	addr/sizes/kinds arrays.
	* omp-low.cc (build_sender_ref, fixup_child_record_type,
	scan_sharing_clauses, lower_omp_target): Update to handle
	new hooks and dynamic-size addr/sizes/kinds arrays.
---
 gcc/langhooks-def.h |  10 +++
 gcc/langhooks.cc|  24 ++
 gcc/langhooks.h |  15 
 gcc/omp-expand.cc   |  18 -
 gcc/omp-low.cc  | 224 ++--
 5 files changed, 265 insertions(+), 26 deletions(-)

diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index f5c67b6823c..756714558e5 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -86,6 +86,10 @@ extern enum omp_clause_defaultmap_kind lhd_omp_predetermined_mapping (tree);
 extern tree lhd_omp_assignment (tree, tree, tree);
 extern void lhd_omp_finish_clause (tree, gimple_seq *, bool);
 extern tree lhd_omp_array_size (tree, gimple_seq *);
+extern bool lhd_omp_deep_mapping_p (const gimple *, tree);
+extern tree lhd_omp_deep_mapping_cnt (const gimple *, tree, gimple_seq *);
+extern void lhd_omp_deep_mapping (const gimple *, tree, unsigned HOST_WIDE_INT,
+  tree, tree, tree, tree, tree, gimple_seq *);
 struct gimplify_omp_ctx;
 extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
 	   tree);
@@ -272,6 +276,9 @@ extern tree lhd_unit_size_without_reusable_padding (tree)

Re: [Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

2024-09-10 Thread Jakub Jelinek
On Tue, Sep 10, 2024 at 12:19:33PM +0200, Tobias Burnus wrote:
> Background: OpenMP states that for 'map(var)', all allocatable components
> of 'var' will automatically also be mapped ('deep mapping').

Not a review, just a comment.  This kind of recursive mapping is also
what needs to happen for declare mapper, so wonder if that shouldn't be
solved together; and some way to merge mappings of one field after another
with the same way if consecutive fields (with possibly some padding bits
in between) are mapped the same way.

Jakub



Re: [PATCH] MATCH: add abs support for half float

2024-09-10 Thread Richard Biener
On Thu, Sep 5, 2024 at 3:19 AM Kugan Vivekanandarajah
 wrote:
>
> Thanks for the explanation.
>
>
> > On 2 Sep 2024, at 9:47 am, Andrew Pinski  wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Sun, Sep 1, 2024 at 4:27 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Andrew.
> >>
> >>> On 28 Aug 2024, at 2:23 pm, Andrew Pinski  wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Tue, Aug 27, 2024 at 8:54 PM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Hi Richard,
> 
>  Thanks for the reply.
> 
> > On 27 Aug 2024, at 7:05 pm, Richard Biener  
> > wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Aug 27, 2024 at 8:23 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >>> On 22 Aug 2024, at 10:34 pm, Richard Biener 
> >>>  wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Hi Richard,
> 
> > On 20 Aug 2024, at 6:09 pm, Richard Biener 
> >  wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Thanks for the comments.
> >>
> >>> On 2 Aug 2024, at 8:36 pm, Richard Biener 
> >>>  wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
> 
> 
> > On 1 Aug 2024, at 10:46 pm, Richard Biener 
> >  wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >>
> >> On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski 
> >>  wrote:
> >>>
> >>> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
>   wrote:
> >
> > On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
> >>  wrote:
> >>>
> >>> On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski 
>   wrote:
> >
> > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Revised based on the comment and moved it into 
> >> existing patterns as.
> >>
> >> gcc/ChangeLog:
> >>
> >> * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 
> >> ? A : -A.
> >> Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.dg/tree-ssa/absfloat16.c: New test.
> >
> > The testcase needs to make sure it runs only for 
> > targets that support
> > float16 so like:
> >
> > /* { dg-require-effective-target float16 } */
> > /* { dg-add-options float16 } */
>  Added in the attached version.
> >>>
> >>> + /* (type)A >=/> 0 ? A : -Asame as abs (A) */
> >>> (for cmp (ge gt)
> >>> (simplify
> >>> -   (cnd (cmp @0 zerop) @1 (negate @1))
> >>> -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> >>> -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> >>> -&& bitwise_equal_p (@0, @1))
> >>> +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> >>> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> >>> +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> >>> +&& ((VECTOR_TYPE_P (type)
> >>> + && tree_nop_conversion_p (TREE_TYPE (@0), 
> >>> TREE_TYPE (@1)))
> >>> +   || (!VECTOR_TYPE_P (type)
> >>> +

Re: [PATCH] match: Change (A * B) + (-C) to (B - C/A) * A, if C multiple of A [PR109393]

2024-09-10 Thread Richard Biener
On Fri, Sep 6, 2024 at 2:44 PM  wrote:
>
> From: kelefth 
>
> The following function:
>
> int foo(int *a, int j)
> {
>   int k = j - 1;
>   return a[j - 1] == a[k];
> }
>
> does not fold to `return 1;` using -O2 or higher. The cause of this is that
> the expression `4 * j + (-4)` for the index computation is not folded to
> `4 * (j - 1)`. Existing simplifications that handle similar cases are applied
> when A == C, which is not the case in this instance.
>
> A previous attempt to address this issue is
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html
>
> This patch adds the following simplification in match.pd:
> (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A
>
> which also handles cases where the index is j - 2, j - 3, etc.
>
> Bootstrapped for all languages and regression tested on x86-64 and aarch64.
>
> PR tree-optimization/109393
>
> gcc/ChangeLog:
>
> * match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr109393.c: New test.
>
> Tested-by: Christoph Müllner 
> Signed-off-by: Philipp Tomsich 
> Signed-off-by: Konstantinos Eleftheriou 
> ---
>  gcc/match.pd| 15 ++-
>  gcc/testsuite/gcc.dg/pr109393.c | 23 +++
>  2 files changed, 37 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 621306213e4..9d971b663c6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4216,7 +4216,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  ? wi::max_value (TYPE_PRECISION (type), SIGNED)
>  : wi::min_value (TYPE_PRECISION (type), SIGNED))
>  && single_use (@3))
> - (mult (plusminus @2 { build_one_cst (type); }) @0))
> + (mult (plusminus @2 { build_one_cst (type); }) @0)))

Please move the pattern outside of the enclosing (for plusminus (..) loop

> +   /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A.  */
> +   (simplify
> +(plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4)
> +  (if (TREE_CODE (type) == INTEGER_TYPE
> + && wi::neg_p (wi::to_wide (@4)))
> +   (with {
> + wide_int c1 = wi::to_wide (@0);
> + wide_int c2_abs = wi::abs (wi::to_wide (@4));

I think you need to exclude the most negative number for @4

> + /* Calculate @4 / @0 in order to factorize the expression.  */
> + wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN (type));
> + tree div_cst = wide_int_to_tree (type, div_res); }

Please build the tree node (and perform the division) only when
wi::multiple_of_p.

> +   (if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type)))
> + (mult (minus @2 { div_cst; }) @0

(minus @2 INTEGER_CST) isn't canonical, it's going to be
folded back to a plus with the negative constant.  Can't you
work with a signed division on the original possibly "signed" value?
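
(Concretely: B - C/A with a constant C/A is not kept as a minus; it is
re-canonicalized to B + (-C/A), so the transform would effectively produce
(B + (-C/A)) * A again.)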

You have to be careful to not introduce new undefined overflow
with the B - C/A expression which can happen for A == 1 and B == 0
for C == INT_MIN.  I think fold_plusminus_mult_expr documents all
problematical cases.
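
(To spell that overflow case out with numbers: with A == 1, B == 0 and the
matched constant equal to INT_MIN, the source 0 * 1 + INT_MIN is well
defined, but the rewritten form has to evaluate B - C/A, i.e. 0 - INT_MIN,
which overflows signed int.)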

Thanks,
Richard.

>
>  #if GIMPLE
>  /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and
> diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> new file mode 100644
> index 000..17bf9330796
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109393.c
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/109393 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int foo(int *a, int j)
> +{
> +  int k = j - 1;
> +  return a[j - 1] == a[k];
> +}
> +
> +int foo2(int *a, int j)
> +{
> +  int k = j - 5;
> +  return a[j - 5] == a[k];
> +}
> +
> +int bar(int *a, int j)
> +{
> +  int k = j - 1;
> +  return (&a[j + 1] - 2) == &a[k];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */
> \ No newline at end of file
> --
> 2.46.0
>


Re: [Patch][RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

2024-09-10 Thread Tobias Burnus

Hi Jakub,

Jakub Jelinek wrote:

On Tue, Sep 10, 2024 at 12:19:33PM +0200, Tobias Burnus wrote:

Background: OpenMP states that for 'map(var)', all allocatable components
of 'var' will automatically also be mapped ('deep mapping').

Not a review, just a comment.  This kind of recursive mapping is also
what needs to happen for declare mapper, so wonder if that shouldn't be
solved together; and some way to merge mappings of one field after another
with the same way if consecutive fields (with possibly some padding bits
in between) are mapped the same way.


In the case of mapping Fortran allocatable components, I do not see the padding 
part. For 'map(var)' all of var is mapped, including all array 
descriptors. We then need to map the allocated memory (fully, if an 
array: all array elements) + do a pointer attach. And we need to handle 
unallocated components.


That's different to 'mapper', which is more flexible on one hand - but 
also really explicit. There is no hidden 'only if allocated do', 
possibly except for zero-sized array sections or iterator steps.


The Fortran part also handles polymorphic variables, where it is only 
known at runtime which components exist – which means that the whole 
tree of mappings to do is unknown at compile time. For 'mapper' that 
part is known.


[Granted, TR13 now explicitly does not permit mapping of polymorphic 
variables as there are too many corner cases. But for 6.x it is planned 
to re-add it.]


In any case, the Fortran allocatable-component mapping also needs to be 
applied to the mapper (+ iterator) generated code — and it needs to come 
last after all implicit mappings and remove-mapping optimizations. It 
could also be done as part of the mapper expansion.


* * *

Having said this, there might well be a useful common approach that 
covers Fortran deep mapping, 'mapper' and 'iterator'.


But the current approaches don't use them. Namely, we have:

* The current Fortran deep mapper (as just posted) was ready in March 
2022, https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591075.html


* The mapper patch (latest version) is at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629363.html – 
albeit first bits date back to 
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591983.html


* There is also an 'iterator' patch at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662138.html – 
albeit it lacks the 'mapper' part, which is WIP and mainly needs the 
'mapper' patch of the previous bullet.


* * *

If we have a clear plan to implement things, I am somewhat willing to 
revise patches, if it makes sense.


But for that, a clear design is needed.

And, in any case, it would be good, if we could get all of the features 
above into GCC 15: Fortran deep mapping, 'mapper' (+ target_update with 
strides), 'iterator'  [and some other backlog].


Tobias



RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-10 Thread Prathamesh Kulkarni
> -Original Message-
> From: Thomas Schwinge 
> Sent: Monday, September 9, 2024 8:50 PM
> To: Prathamesh Kulkarni ; Richard Biener
> 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Jakub
> Jelinek 
> Subject: RE: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Prathamesh!
Hi Thomas,
> 
> On 2024-09-09T06:31:18+, Prathamesh Kulkarni
>  wrote:
> >> -Original Message-
> >> From: Thomas Schwinge 
> >> Sent: Friday, September 6, 2024 2:31 PM On 2024-08-
> 16T15:36:29+,
> >> Prathamesh Kulkarni  wrote:
> >> >> > Am 13.08.2024 um 17:48 schrieb Thomas Schwinge
> >> >> :
> >> >> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni
> >> >>  wrote:
> >> >> >> I added another option -foffload-abi-host-opts to specify
> host
> >> abi
> >> >> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit
> >> which
> >> >> >> mkoffload can use to enable/disable offloading (as before).
> 
> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> > @@ -18999,9 +18999,9 @@ static char *  aarch64_offload_options
> >> > (void)  {
> >> >if (TARGET_ILP32)
> >> > -return xstrdup ("-foffload-abi=ilp32");
> >> > +return xstrdup ("-foffload-abi=ilp32
> >> > + -foffload-abi-host-opts=-mabi=ilp32");
> >> >else
> >> > -return xstrdup ("-foffload-abi=lp64");
> >> > +return xstrdup ("-foffload-abi=lp64
> >> > + -foffload-abi-host-opts=-mabi=lp64");
> >> >  }
> >>
> >> As none of the current offload compilers is set up of ILP32, I
> >> suggest we continue to pass '-foffload-abi=ilp32' without
> >> '-foffload-abi-host- opts=[...]' -- the 'mkoffload's in that case
> >> should get to the point where the latter is used.
> 
> Oh...  I was wrong with the latter item: I failed to see that the
> 'mkoffload's still do 'compile_native' even if they don't create an
> actual offload image, sorry!
> 
> > Um, would that still possibly result in arch mismatch for host
> objects and xnvptx-none.o if we don't pass host ABI opts for ILP32 ?
> > For eg, if the host compiler defaults to 64-bit code-gen (and user
> > requests for 32-bit code gen on host), and we avoid passing host ABI
> opts for -foffload-abi=ilp32, it will generate 64-bit xnvptx-none.o
> (corresponding to empty ptx_cfile_name), while rest of the host
> objects will be 32-bit, or am I misunderstanding ?
> 
> You're quite right -- my fault.
> 
> > The attached patch avoids passing -foffload-abi-host-opts if -
> foffload-abi=ilp32.
> 
> So, sorry for the back and forth.  I think we now agree that we do
> need '-foffload-abi-host-opts=[...]' specified in call cases (as you
> originally had), and then again unconditionally use
> 'offload_abi_host_opts' in the 'mkoffload's' 'compile_native'
> functions.
Done in the attached patch, thanks.
> 
> > Could you please test the patch for gcn backend ?
> 
> I'll do that.
> 
> > [nvptx] Pass host specific ABI opts from mkoffload.
> >
> > The patch adds an option -foffload-abi-host-opts, which is set by
> host
> > in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes it's value
> 
> "its", by the way.  ;-)
Fixed 😊
> 
> > to host_compiler.
> 
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> 
> > +foffload-abi-host-opts=
> > +Common Driver Joined MissingArgError(option missing after %qs)
> > +-foffload-abi-host-opts= Specify host ABI options.
> > +
> 
> Still need TAB between '-foffload-abi-host-opts=' and its
> help text.
Done.
> 
> > --- a/gcc/config/gcn/mkoffload.cc
> > +++ b/gcc/config/gcn/mkoffload.cc
> 
> > @@ -998,6 +996,14 @@ main (int argc, char **argv)
> >"unrecognizable argument of option %<" STR
> "%>");
> >   }
> >  #undef STR
> > +  else if (startswith (argv[i], "-foffload-abi-host-opts="))
> > + {
> > +   if (offload_abi_host_opts)
> > + fatal_error (input_location,
> > +  "-foffload-abi-host-opts specified multiple
> > + times");
> 
> ACK, but again '%<-foffload-abi-host-opts%>', please.  (May also use
> another '#define STR "[...]"' for the duplicated string, but I don't
> care.)
Sorry, missed this earlier, fixed.
> 
> > --- a/gcc/config/nvptx/mkoffload.cc
> > +++ b/gcc/config/nvptx/mkoffload.cc
> 
> > @@ -721,6 +718,14 @@ main (int argc, char **argv)
> >"unrecognizable argument of option " STR);
> >   }
> >  #undef STR
> > +  else if (startswith (argv[i], "-foffload-abi-host-opts="))
> > + {
> > +   if (offload_abi_host_opts)
> > + fatal_error (input_location,
> > +  "-foffload-abi-host-opts specified multiple
> > + times");
> 
> Likewise.
> 
> > --- a/gcc/lto-wrapper.cc
> > +++ b/gcc/lto-wrapper.cc
> > @@ -484,6 +484,7 @@ merge_and_complain (vec
> > &decoded_options,
> >
> >
> >   case OPT_foffload_abi_:
> > + case OPT_foffload_abi_host_opts_:
> > if (existing_opt == -1)
> >   decoded_options.safe_push (*fopt

[PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Qing Zhao
Hi,

This is the 2nd version of the patch after the long discussion. We finally
decided to keep the previous design that returns a pointer to the counted_by
object. 

Compared to the first version, the major changes are:

1. change the name of the builtin from __builtin_get_counted_by to
   __builtin_counted_by_ref in order to reflect the fact that the returned
   value of it is a reference to the object.

2. make typeof(__builtin_counted_by_ref) work.

3. update the testing case to use the new builtin inside _Generic.

Bootstrapped and regression tested on both x86 and aarch64.  No issues.

Okay for the trunk?

thanks.

Qing.



With the addition of the 'counted_by' attribute and its wide roll-out
within the Linux kernel, a use case has been found that would be very
nice to have for object allocators: being able to set the counted_by
counter variable without knowing its name.

For example, given:

  struct foo {
...
int counter;
...
struct bar array[] __attribute__((counted_by (counter)));
  } *p;

The existing Linux object allocators are roughly:

  #define alloc(P, FAM, COUNT) ({ \
typeof(P) __p; \
size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
__p = kmalloc(__size, GFP); \
P = __p; \
  })

Right now, any addition of a counted_by annotation must also
include an open-coded assignment of the counter variable after
the allocation:

  p = alloc(p, array, how_many);
  p->counter = how_many;

In order to avoid the tedious and error-prone work of manually adding
the open-coded counted-by initializations everywhere in the Linux
kernel, a new GCC builtin __builtin_counted_by_ref will be very useful
to help the adoption of the counted-by attribute.

 -- Built-in Function: TYPE __builtin_counted_by_ref (PTR)
 The built-in function '__builtin_counted_by_ref' checks whether the
 array object pointed by the pointer PTR has another object
 associated with it that represents the number of elements in the
 array object through the 'counted_by' attribute (i.e.  the
 counted-by object).  If so, returns a pointer to the corresponding
 counted-by object.  If such counted-by object does not exist,
 returns a NULL pointer.

 This built-in function is only available in C for now.

 The argument PTR must be a pointer to an array.  The TYPE of the
 returned value must be a pointer type pointing to the corresponding
 type of the counted-by object or VOID pointer type in case of a
 NULL pointer being returned.

With this new builtin, the central allocator could be updated to:

  #define alloc(P, FAM, COUNT) ({ \
typeof(P) __p; \
size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
__p = kmalloc(__size, GFP); \
typeof(_Generic(__builtin_counted_by_ref(__p->FAM), \
void *: (size_t *)NULL, \
default: __builtin_counted_by_ref(__p->FAM))) \
  ret = __builtin_counted_by_ref(__p->FAM); \
if (ret) \
  *ret = COUNT; \
P = __p; \
  })

And then structs can gain the counted_by attribute without needing
additional open-coded counter assignments for each struct, and
unannotated structs could still use the same allocator.
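
A minimal stand-alone use, going by the struct foo above and the builtin's
documentation (a sketch only; 'n' and the malloc call are just for
illustration):

  struct foo *q = malloc (sizeof (*q) + sizeof (struct bar) * n);
  int *cnt = __builtin_counted_by_ref (q->array);  /* points to q->counter */
  if (cnt)
    *cnt = n;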

PR c/116016

gcc/c-family/ChangeLog:

* c-common.cc: Add new __builtin_counted_by_ref.
* c-common.h (enum rid): Add RID_BUILTIN_COUNTED_BY_REF.

gcc/c/ChangeLog:

* c-decl.cc (names_builtin_p): Add RID_BUILTIN_COUNTED_BY_REF.
* c-parser.cc (has_counted_by_object): New routine.
(get_counted_by_ref): New routine.
(c_parser_postfix_expression): Handle New RID_BUILTIN_COUNTED_BY_REF.
* c-tree.h: New global in_builtin_counted_by_ref.
* c-typeck.cc (build_component_ref): Enable generating
.ACCESS_WITH_SIZE inside typeof when inside builtin_counted_by_ref.

gcc/ChangeLog:

* doc/extend.texi: Add documentation for __builtin_counted_by_ref.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-counted-by-ref-1.c: New test.
* gcc.dg/builtin-counted-by-ref.c: New test.
---
 gcc/c-family/c-common.cc  |  1 +
 gcc/c-family/c-common.h   |  1 +
 gcc/c/c-decl.cc   |  1 +
 gcc/c/c-parser.cc | 77 +++
 gcc/c/c-tree.h|  1 +
 gcc/c/c-typeck.cc |  7 +-
 gcc/doc/extend.texi   | 55 +++
 .../gcc.dg/builtin-counted-by-ref-1.c | 97 +++
 gcc/testsuite/gcc.dg/builtin-counted-by-ref.c | 58 +++
 9 files changed, 297 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-counted-by-ref-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-counted-by-ref.c

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e7e371fd26f..15b90bae8b5 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-com

Re: [PING^4] [PATCH] Add a bootstrap-native build config

2024-09-10 Thread Andi Kleen
On Tue, Sep 10, 2024 at 03:29:08AM +, Ramana Radhakrishnan wrote:
> > diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
> > new file mode 100644
> > index ..a4a3d8594089
> > --- /dev/null
> > +++ b/config/bootstrap-native.mk
> > @@ -0,0 +1 @@
> > +BOOT_CFLAGS := -march=native -mtune=native $(BOOT_CFLAGS)
> 
> Does every port have a -march=native + -mtune=native that behaves the same 
> way as what is expected here ? 

Not all of them do, but it's getting somewhat common with popular ones.

> 
> > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> > index 4973f195daf9..29827c5106f8 100644
> > --- a/gcc/doc/install.texi
> > +++ b/gcc/doc/install.texi
> > @@ -3052,6 +3052,12 @@ Removes any @option{-O}-started option from 
> > @code{BOOT_CFLAGS}, and adds
> > @itemx @samp{bootstrap-Og}
> > Analogous to @code{bootstrap-O1}.
> > 
> > +@item @samp{bootstrap-native}
> > +@itemx @samp{bootstrap-native}
> > +Optimize the compiler code for the build host, if supported by the
> > +architecture. Note this only affects the compiler, not the targeted
> > +code. If you want the later use @samp{--with-cpu}.
> > +
> 
>  The defaults suitable for a port can be different , for instance on AArch32 
> additional options to specify float abi and floating point units might be 
> required.
>  I would suggest rewriting this to something like .  “If you want the later , 
> choose options suitable to the target you are looking for.  For e.g. 
> @samp{--with-cpu} would be a good starting point.” 

Ok.

-Andi


[RFC 0/4] Hard Register Constraints

2024-09-10 Thread Stefan Schulze Frielinghaus
This series introduces hard register constraints.  The first patch
enables hard register constraints for asm statements and for
machine descriptions.  The subsequent patch adds some basic error
handling for asm statements.  The third patch adds some verification of
register names used in machine description.  The fourth and last patch
adds the feature of rewriting local register asm into hard register
constraints.

This series was bootstrapped and regtested on s390.  Furthermore, the
new dg-compile tests were verified via cross compilers for the enabled
targets.  There is still some fallout if -fdemote-register-asm is used
since a couple of features are missing as e.g. erroring out during
gimplification if the clobber set of registers intersects with
input/output registers.

As a larger test vehicle I've compiled and regtested glibc on s390 using
-fdemote-register-asm without any fallout.  On x86_64 this fails due to
the limitation that fixed registers are currently not supported for hard
register constraints (see commit message of the first patch).  This is
also the reason why I'm posting this series already since I was hoping
to get some feedback about this limitation.

Furthermore, I've compiled the Linux kernel on s390 and x86_64 with
-fdemote-register-asm.  Interestingly, the Linux kernel for x86_64 makes
use of the following asm statement:

#define call_on_stack(stack, func, asm_call, argconstr...)  \
{   \
register void *tos asm("r11");  \
\
tos = ((void *)(stack));\
\
asm_inline volatile(\
"movq   %%rsp, (%[tos]) \n" \
"movq   %[tos], %%rsp   \n" \
\
asm_call\
\
"popq   %%rsp   \n" \
\
: "+r" (tos), ASM_CALL_CONSTRAINT   \
: [__func] "i" (func), [tos] "r" (tos) argconstr\
: "cc", "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10",   \
  "memory"  \
);  \
}

Note the output
  "+r" (tos)
and the input
  [tos] "r" (tos)
Currently I error out for this since I consider it two inputs using
the same hard register: once as an implicit input via '+' and a second
time via the explicit input.  Thus, I would actually expect a '='
instead of a '+' for the output constraint since the input is explicitly
mentioned, or to remove the input entirely and just use the inout operand
   [tos] "+r" (tos)
If you consider this valid asm I would have to adjust the error
handling.  Either way, this is just about error handling and doesn't
really affect code generation.
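
For reference, the second variant would make the asm inside the macro look
roughly like this (sketch only, line continuations of the #define omitted,
not taken from the kernel sources):

        asm_inline volatile(
                "movq   %%rsp, (%[tos]) \n"
                "movq   %[tos], %%rsp   \n"
                asm_call
                "popq   %%rsp   \n"
                : [tos] "+r" (tos), ASM_CALL_CONSTRAINT
                : [__func] "i" (func) argconstr
                : "cc", "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10",
                  "memory"
        );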

Stefan Schulze Frielinghaus (4):
  Hard register constraints
  Error handling for hard register constraints
  genoutput: Verify hard register constraints
  Rewrite register asm into hard register constraints

 gcc/cfgexpand.cc  |  42 ---
 gcc/common.opt|   4 +
 gcc/function.cc   | 116 
 gcc/genoutput.cc  |  60 
 gcc/genpreds.cc   |   4 +-
 gcc/gimplify.cc   | 151 +-
 gcc/gimplify_reg_info.h   | 130 +
 gcc/ira.cc|  79 +-
 gcc/lra-constraints.cc|  13 +
 gcc/output.h  |   2 +
 gcc/recog.cc  |  11 +-
 gcc/stmt.cc   | 268 +-
 gcc/stmt.h|   9 +-
 gcc/testsuite/gcc.dg/asm-hard-reg-1.c |  85 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-2.c |  33 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-3.c |  25 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-4.c |  50 
 gcc/testsuite/gcc.dg/asm-hard-reg-5.c |  36 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-6.c |  60 
 gcc/testsuite/gcc.dg/asm-hard-reg-7.c |  41 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-8.c |  49 
 .../gcc.dg/asm-hard-reg-demotion-1.c  |  19 ++
 .../gcc.dg/asm-hard-reg-demotion-2.c  |  19 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h  |  52 
 gcc/testsuite/

[RFC 2/4] Error handling for hard register constraints

2024-09-10 Thread Stefan Schulze Frielinghaus
This implements some basic error handling for hard register constraints
including potential conflicts with register asm operands.

In contrast to register asm operands, hard register constraints allow
more than just one register per operand.  Even more than just one
register per alternative.  For example, a valid constraint for an
operand is "{r0}{r1}m,{r2}".  However, this also means that we have to
make sure that each register is used at most once in each alternative
over all outputs and likewise over all inputs.  For asm statements this
is done by this patch during gimplification.  For hard register
constraints used in machine description, error handling is still a todo
and I haven't investigated this so far and consider this rather a low
priority.
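
To make the per-alternative rule concrete, a couple of constructed examples
(not taken from the patch's testsuite; register names are s390-style):

  int x, y;
  /* OK: r0 and r1 each appear only once among the outputs.  */
  asm ("foo %0,%1" : "={r0}" (x), "={r1}" (y));
  /* OK: one operand offering r0, r1 or memory in the first alternative
     and r2 in the second.  */
  asm ("bar %0" : "={r0}{r1}m,{r2}" (x));
  /* Rejected: r2 would be used twice among the inputs of the same
     alternative.  */
  asm ("baz %0,%1" : : "{r2}" (x), "{r2}" (y));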

There are 9/10 call sites for parse_{input,output}_constraint() which I
didn't dare to touch in the first run.  If this patch is about to be
accepted I could change those call sites and explicitly pass a null
pointer instead of overloading those functions as it is done right now.
I consider this an implementation nit and didn't want to clutter the
patch for reviewing.
---
 gcc/cfgexpand.cc  |  42 
 gcc/gimplify.cc   |  73 +-
 gcc/gimplify_reg_info.h   | 130 ++
 gcc/stmt.cc   | 229 +-
 gcc/stmt.h|   8 +-
 gcc/testsuite/gcc.dg/asm-hard-reg-error-1.c   |  83 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-2.c   |  20 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-3.c   |  21 ++
 gcc/testsuite/gcc.dg/pr87600-2.c  |  30 +--
 gcc/testsuite/gcc.dg/pr87600-3.c  |  35 +++
 gcc/testsuite/gcc.dg/pr87600-3.s  |   0
 .../gcc.target/s390/asm-hard-reg-1.c  | 103 
 .../gcc.target/s390/asm-hard-reg-2.c  |  43 
 .../gcc.target/s390/asm-hard-reg-3.c  |  42 
 gcc/testsuite/lib/scanasm.exp |   4 +
 15 files changed, 779 insertions(+), 84 deletions(-)
 create mode 100644 gcc/gimplify_reg_info.h
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-error-1.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-error-2.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-error-3.c
 create mode 100644 gcc/testsuite/gcc.dg/pr87600-3.c
 create mode 100644 gcc/testsuite/gcc.dg/pr87600-3.s
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-3.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 13f8c08d295..fdbbd93f1b5 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2966,44 +2966,6 @@ expand_asm_loc (tree string, int vol, location_t locus)
   emit_insn (body);
 }
 
-/* Return the number of times character C occurs in string S.  */
-static int
-n_occurrences (int c, const char *s)
-{
-  int n = 0;
-  while (*s)
-n += (*s++ == c);
-  return n;
-}
-
-/* A subroutine of expand_asm_operands.  Check that all operands have
-   the same number of alternatives.  Return true if so.  */
-
-static bool
-check_operand_nalternatives (const vec &constraints)
-{
-  unsigned len = constraints.length();
-  if (len > 0)
-{
-  int nalternatives = n_occurrences (',', constraints[0]);
-
-  if (nalternatives + 1 > MAX_RECOG_ALTERNATIVES)
-   {
- error ("too many alternatives in %");
- return false;
-   }
-
-  for (unsigned i = 1; i < len; ++i)
-   if (n_occurrences (',', constraints[i]) != nalternatives)
- {
-   error ("operand constraints for % differ "
-  "in number of alternatives");
-   return false;
- }
-}
-  return true;
-}
-
 /* Check for overlap between registers marked in CLOBBERED_REGS and
anything inappropriate in T.  Emit error and return the register
variable definition for error, NULL_TREE for ok.  */
@@ -3169,10 +3131,6 @@ expand_asm_stmt (gasm *stmt)
= TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE (t)));
 }
 
-  /* ??? Diagnose during gimplification?  */
-  if (! check_operand_nalternatives (constraints))
-return;
-
   /* Count the number of meaningful clobbered registers, ignoring what
  we would ignore later.  */
   auto_vec clobber_rvec;
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 26a216e151d..08e0b5d047b 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -70,6 +70,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-offload.h"
 #include "context.h"
 #include "tree-nested.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "output.h"
+#include "gimplify_reg_info.h"
 
 /* Identifier for a basic condition, mapping it to other basic conditions of
its Boolean expression.  Basic conditions given the same uid (in the same
@@ -7009,6 +7013,42 @@ gimplify_addr_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *

[RFC 4/4] Rewrite register asm into hard register constraints

2024-09-10 Thread Stefan Schulze Frielinghaus
Currently a register asm already materializes during expand.  This
means a hard register is allocated for the very first access of a
register asm, e.g. in an assignment.  As a consequence this might lead
to suboptimal register allocation if the assignment and the using asm
statement are spread far apart.  Even more problematic are function
calls in between register asm assignments and their using asm statement,
since hard registers may be clobbered by a call.  The former may be
solved by pulling register asm assignments and asm statements close by.
However, the latter is not easily solved since sometimes function calls
are implicit.  For example

int
foo (int *x)
{
  register int y asm ("0") = 42;
  register int z asm ("1") = *x;
  asm ("bar\t%0,%1" : "+r" (z) : "r" (y));
  return z;
}

If compiled with address sanitizer, then a function call is introduced
for the memory load which in turn may interfere with the initialization
of register asm y.  Likewise, for some targets and configurations even
an operation like an addition may lead to an implicit library call.

In contrast hard register constraints materialize during register
allocation and therefore do not suffer from this, i.e., asm operands are
kept in pseudos until RA.  This patch adds the feature of rewriting
local register asm into code which exploits hard register constraints.
For example

register int global asm ("r3");

int foo (int x0)
{
  register int x asm ("r4") = x0;
  register int y asm ("r5");

  asm ("bar\t%0,%1,%2" : "=r" (x) : "0" (x), "r" (global));
  x += 42;
  asm ("baz\t%0,%1" : "=r" (y) : "r" (x));

  return y;
}

is rewritten during gimplification into

register int global asm ("r3");

int foo (int x0)
{
  int x = x0;
  int y;

  asm ("bar\t%0,%1,%2" : "={r4}" (x) : "0" (x), "r" (global));
  x += 42;
  asm ("baz\t%0,%1" : "={r5}" (y) : "{r4}" (x));

  return y;
}

The resulting code solely relies on hard register constraints modulo
global register asm.

Since I consider this an experimental feature, it is hidden behind the new
flag -fdemote-register-asm (I'm open to other naming suggestions).
---
 gcc/common.opt|  4 +
 gcc/gimplify.cc   | 78 +++
 .../gcc.dg/asm-hard-reg-demotion-1.c  | 19 +
 .../gcc.dg/asm-hard-reg-demotion-2.c  | 19 +
 gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h  | 52 +
 5 files changed, 172 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-1.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion-2.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-demotion.h

diff --git a/gcc/common.opt b/gcc/common.opt
index ea39f87ae71..859a735a0b7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3422,6 +3422,10 @@ fverbose-asm
 Common Var(flag_verbose_asm)
 Add extra commentary to assembler output.
 
+fdemote-register-asm
+Common Var(flag_demote_register_asm) Init(0)
+Demote local register asm and use hard register constraints instead
+
 fvisibility=
 Common Joined RejectNegative Enum(symbol_visibility) Var(default_visibility) 
Init(VISIBILITY_DEFAULT)
 -fvisibility=[default|internal|hidden|protected]   Set the default symbol 
visibility.
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 08e0b5d047b..c9bd1769c28 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7049,6 +7049,73 @@ num_alternatives (const_tree link)
   return num + 1;
 }
 
+static hash_set demote_register_asm;
+
+static void
+gimplify_demote_register_asm (tree link)
+{
+  if (!flag_demote_register_asm)
+return;
+  tree op = TREE_VALUE (link);
+  if (!VAR_P (op) || !DECL_HARD_REGISTER (op) || is_global_var (op))
+return;
+  tree id = DECL_ASSEMBLER_NAME (op);
+  const char *regname = IDENTIFIER_POINTER (id);
+  ++regname;
+  int regno = decode_reg_name (regname);
+  if (regno < 0)
+/* This indicates an error and we error out later on.  */
+return;
+  const char *constraint = TREE_STRING_POINTER (TREE_VALUE (TREE_PURPOSE 
(link)));
+  auto_vec constraint_new;
+  for (const char *p = constraint; *p; )
+{
+  bool pushed = false;
+  switch (*p)
+   {
+   case '+': case '=': case '%': case '?': case '!': case '*': case '&':
+   case '#': case '$': case '^': case '{': case 'E': case 'F': case 'G':
+   case 'H': case 's': case 'i': case 'n': case 'I': case 'J': case 'K':
+   case 'L': case 'M': case 'N': case 'O': case 'P': case ',': case '0':
+   case '1': case '2': case '3': case '4': case '5': case '6': case '7':
+   case '8': case '9': case '[': case '<': case '>': case 'g': case 'X':
+ break;
+
+   default:
+ if (!ISALPHA (*p))
+   break;
+ enum constraint_num cn = lookup_constraint (p);
+ enum reg_class rclass = reg_class_for_constraint (cn);
+ if (rclass != NO_REGS || insn_extra_address_constraint (cn))
+   {
+ gcc_assert (reg_class_subset_p (REGNO_REG_CLASS (regno), r

[RFC 1/4] Hard register constraints

2024-09-10 Thread Stefan Schulze Frielinghaus
Implement hard register constraints of the form {regname} where regname
must be a valid register name for the target.  Such constraints may be
used in asm statements as a replacement for register asm and in machine
descriptions.

Due to optimizations it is not unexpected that two or more inputs which
require the same value also share a common pseudo.  However, this in
turn may lead to unsatisfiable asm where multiple inputs with different
hard register constraints share the same pseudo.  Therefore, we have to
introduce copies of such a pseudo and use these for conflicting inputs.
This is done prior to RA during asmcons in match_asm_constraints_2().
While IRA tries to reduce live ranges, it also replaces some
register-register moves.  That in turn might undo those copies of a
pseudo which we just introduced during asmcons.  Thus, check in
decrease_live_ranges_number() via valid_replacement_for_asm_input_p()
whether it is valid to perform a replacement.

The remainder of the patch mostly deals with parsing and decoding hard
register constraints.  The actual work is done by LRA in
process_alt_operands() where a register filter, according to the
constraint, is installed.

For the sake of "reviewability" and in order to show the beauty of LRA,
error handling (which gets pretty involved) is spread out into a
subsequent patch.

Limitation: Currently, a fixed register cannot be used as hard register
constraint.  For example, accessing the stack pointer on x86_64 via

void *
foo (void)
{
  void *y;
  __asm__ ("" : "={rsp}" (y));
  return y;
}

leads to an error.  This is unfortunate since register asm does not have
this limitation.  The culprit seems to be that during reload
ira_class_hard_regs_num[rclass] does not even include fixed registers
which is why lra_assign() ultimately fails.  Does anyone have an idea
how to lift this limitation?  Maybe there is even a shortcut in order to
force a pseudo into a hard reg?
---
 gcc/function.cc   | 116 ++
 gcc/genoutput.cc  |  14 
 gcc/genpreds.cc   |   4 +-
 gcc/ira.cc|  79 +-
 gcc/lra-constraints.cc|  13 +++
 gcc/recog.cc  |  11 ++-
 gcc/stmt.cc   |  39 +
 gcc/stmt.h|   1 +
 gcc/testsuite/gcc.dg/asm-hard-reg-1.c |  85 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-2.c |  33 
 gcc/testsuite/gcc.dg/asm-hard-reg-3.c |  25 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-4.c |  50 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-5.c |  36 
 gcc/testsuite/gcc.dg/asm-hard-reg-6.c |  60 +
 gcc/testsuite/gcc.dg/asm-hard-reg-7.c |  41 +
 gcc/testsuite/gcc.dg/asm-hard-reg-8.c |  49 +++
 16 files changed, 653 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-1.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-2.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-3.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-4.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-5.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-6.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-7.c
 create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-8.c

diff --git a/gcc/function.cc b/gcc/function.cc
index a6f6de34942..bf5992f2b06 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6974,6 +6974,115 @@ match_asm_constraints_1 (rtx_insn *insn, rtx *p_sets, 
int noutputs)
 df_insn_rescan (insn);
 }
 
+/* It is expected and desired that optimizations coalesce multiple pseudos into
+   one whenever possible.  However, in case of hard register constraints we may
+   have to undo this and introduce copies since otherwise we could constraint a
+   single pseudo to different hard registers.  For example, during register
+   allocation the following insn would be unsatisfiable since pseudo 60 is
+   constrained to hard register r5 and r6 at the same time.
+
+   (insn 7 5 0 2 (asm_operands/v ("foo") ("") 0 [
+  (reg:DI 60) repeated x2
+  ]
+   [
+  (asm_input:DI ("{r5}") t.c:4)
+  (asm_input:DI ("{r6}") t.c:4)
+  ]
+   [] t.c:4) "t.c":4:3 -1
+   (expr_list:REG_DEAD (reg:DI 60)
+  (nil)))
+
+   Therefore, introduce a copy of pseudo 60 and transform it into
+
+   (insn 10 5 7 2 (set (reg:DI 62)
+  (reg:DI 60)) "t.c":4:3 1503 {*movdi_64}
+   (nil))
+   (insn 7 10 11 2 (asm_operands/v ("foo") ("") 0 [
+  (reg:DI 60)
+  (reg:DI 62)
+  ]
+   [
+  (asm_input:DI ("{r5}") t.c:4)
+  (asm_input:DI ("{r6}") t.c:4)
+  ]
+   [] t.c:4) "t.c":4:3 -1
+   (expr_list:REG_DEAD (reg:DI 62)
+  (expr_list:REG_DEAD (reg:DI 60)
+  (nil
+
+   Now, LRA can assign pseudo 60 to r5, and pseudo 62 to r6.
+
+ 

[RFC 3/4] genoutput: Verify hard register constraints

2024-09-10 Thread Stefan Schulze Frielinghaus
Since genoutput has no information about hard register names we cannot
statically verify those names in constraints of the machine description.
Therefore, we have to do it at runtime.  Although verification shouldn't
be too expensive, restrict it to checking builds.  This should be
sufficient since hard register constraints in machine descriptions
probably change rarely, and each commit should be tested with checking
anyway, or at the very least before a release is taken.
---
 gcc/genoutput.cc | 46 ++
 gcc/output.h |  2 ++
 gcc/toplev.cc|  4 
 3 files changed, 52 insertions(+)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index 2ffb2fb28d2..4f4fde83608 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -200,6 +200,8 @@ static const char indep_constraints[] = ",=+%*?!^$#&g";
 static class constraint_data *
 constraints_by_letter_table[1 << CHAR_BIT];
 
+static hash_set used_reg_names;
+
 static int mdep_constraint_len (const char *, file_location, int);
 static void note_constraint (md_rtx_info *);
 
@@ -1156,6 +1158,45 @@ main (int argc, const char **argv)
   output_insn_data ();
   output_get_insn_name ();
 
+  /* Since genoutput has no information about hard register names we cannot
+ statically verify hard register names in constraints of the machine
+ description.  Therefore, we have to do it at runtime.  Although
+ verification shouldn't be too expensive, restrict it to checking builds.
+   */
+  printf ("\n\n#if CHECKING_P\n");
+  if (used_reg_names.is_empty ())
+printf ("void verify_reg_names_in_constraints () { }\n");
+  else
+{
+  size_t max_len = 0;
+  for (auto it = used_reg_names.begin (); it != used_reg_names.end (); 
++it)
+   {
+ size_t len = strlen (*it);
+ if (len > max_len)
+   max_len = len;
+   }
+  printf ("void\nverify_reg_names_in_constraints ()\n{\n");
+  printf ("  static const char hregnames[%zu][%zu] = {\n",
+ used_reg_names.elements (), max_len + 1);
+  auto it = used_reg_names.begin ();
+  while (it != used_reg_names.end ())
+   {
+ printf ("\"%s\"", *it);
+ ++it;
+ if (it != used_reg_names.end ())
+   printf (",");
+ printf ("\n");
+   }
+  printf ("  };\n");
+  printf ("  for (size_t i = 0; i < %zu; ++i)\n",
+ used_reg_names.elements ());
+  printf ("if (decode_reg_name (hregnames[i]) < 0)\n");
+  printf ("  internal_error (\"invalid register %%qs used in "
+ "constraint of machine description\", hregnames[i]);\n");
+  printf ("}\n");
+}
+  printf ("#endif\n");
+
   fflush (stdout);
   return (ferror (stdout) != 0 || have_error
? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE);
@@ -1294,6 +1335,11 @@ mdep_constraint_len (const char *s, file_location loc, 
int opno)
   ptrdiff_t len = end - s;
   if (*end == '}' && len > 1 && len < 31)
{
+ char *regname = new char[len];
+ memcpy (regname, s + 1, len - 1);
+ regname[len - 1] = '\0';
+ if (used_reg_names.add (regname))
+   delete[] regname;
  return len + 1;
}
 }
diff --git a/gcc/output.h b/gcc/output.h
index 46b0033b221..5f0f8a6098c 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -636,4 +636,6 @@ extern int default_address_cost (rtx, machine_mode, 
addr_space_t, bool);
 /* Stack usage.  */
 extern void output_stack_usage (void);
 
+extern void verify_reg_names_in_constraints ();
+
 #endif /* ! GCC_OUTPUT_H */
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index bc442a08c63..34c372ad1a2 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -1817,6 +1817,10 @@ backend_init_target (void)
 static void
 backend_init (void)
 {
+#if CHECKING_P
+  verify_reg_names_in_constraints ();
+#endif
+
   init_emit_once ();
 
   init_rtlanal ();
-- 
2.45.2



Re: [PATCH v3] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-09-10 Thread Jason Merrill

On 9/10/24 6:10 AM, Alex Coplan wrote:

On 27/08/2024 10:55, Alex Coplan wrote:

Hi,

This is a v3 that hopefully addresses the feedback from both Jason and
Jakub.  v2 was posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660191.html


Gentle ping on this C++ patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661559.html

Jason, are you OK with this approach, or would you prefer to not make the
INTEGER_CST assumption and do something along the lines of your last suggestion
instead:


Perhaps we want a recompute_expr_flags like the existing
recompute_constructor_flags, so we don't need to duplicate PROCESS_ARG
logic elsewhere.


?  Sorry, I'd missed that reply when I wrote the v3 patch.


I still think that function would be nice to have, but the patch is OK 
as is.



Thanks,
Alex



(Sorry for the delay in posting the re-spin, I was away last week.)

In this version we refactor to introduce a helper class (annotate_saver)
which is much less invasive to the caller (maybe_convert_cond) and
should (at least in theory) be reusable elsewhere.

This version also relies on the assumption that operands 1 and 2 of
ANNOTATE_EXPRs are INTEGER_CSTs, which simplifies the flag updates
without having to rely on assumptions about the specific changes made
in maybe_convert_cond.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

For the testcase added with this patch, we would end up losing the:

   #pragma GCC unroll 4

and emitting "warning: ignoring loop annotation".  That warning comes
from tree-cfg.cc:replace_loop_annotate, and means that we failed to
process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
That function walks backwards over the GIMPLE in an exiting BB for a
loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
immediately preceding the gcond.

The function documents the following pre-condition:

/* [...] We assume that the annotations come immediately before the
   condition in BB, if any.  */

now looking at the exiting BB of the loop, we have:

:
   D.4524 = .ANNOTATE (iftmp.1, 1, 4);
   retval.0 = D.4524;
   if (retval.0 != 0)
 goto ; [INV]
   else
 goto ; [INV]

and crucially there is an intervening assignment between the gcond and
the preceding .ANNOTATE ifn call.  To see where this comes from, we can
look to the IR given by -fdump-tree-original:

   if (<::operator() (&pred, *first), unroll 4>>>)
 goto ;
   else
 goto ;

here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
expression in the condition.

The CLEANUP_POINT_EXPR gets added by the following call chain:

finish_while_stmt_cond
  -> maybe_convert_cond
  -> condition_conversion
  -> fold_build_cleanup_point_expr

this patch chooses to fix the issue by first introducing a new helper
class (annotate_saver) to save and restore outer chains of
ANNOTATE_EXPRs and then using it in maybe_convert_cond.

With this patch, we don't get any such warning and the loop gets unrolled as
expected at -O2.

gcc/cp/ChangeLog:

 PR libstdc++/116140
 * semantics.cc (annotate_saver): New. Use it ...
 (maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
 remain the outermost expression(s) of the condition.

gcc/testsuite/ChangeLog:

 PR libstdc++/116140
 * g++.dg/ext/pragma-unroll-lambda.C: New test.



diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 5ab2076b673..b1a49b14238 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -951,6 +951,86 @@ maybe_warn_unparenthesized_assignment (tree t, bool 
nested_p,
  }
  }
  
+/* Helper class for saving/restoring ANNOTATE_EXPRs.  For a tree node t, users

+   can construct one of these like so:
+
+ annotate_saver s (&t);
+
+   and t will be updated to have any annotations removed.  The user can then
+   transform t, and later restore the ANNOTATE_EXPRs with:
+
+ t = s.restore (t).
+
+   The intent is to ensure that any ANNOTATE_EXPRs remain the outermost
+   expressions following any operations on t.  */
+
+class annotate_saver {
+  /* The chain of saved annotations, if there were any.  Otherwise null.  */
+  tree m_annotations;
+
+  /* If M_ANNOTATIONS is non-null, then M_INNER points to TREE_OPERAND (A, 0)
+ for the innermost annotation A.  */
+  tree *m_inner;
+
+public:
+  annotate_saver (tree *);
+  tree restore (tree);
+};
+
+/* If *COND is an ANNOTATE_EXPR, walk through the chain of annotations, and set
+   *COND equal to the first non-ANNOTATE_EXPR (saving a pointer to the
+   original chain of annotations for later use in restore).  */
+
+annotate_saver::annotate_saver (tree *cond) : m_annotations (nullptr)
+{
+  tree *t = cond;
+  while (TREE_CODE (*t) == ANNOTATE_EXPR)
+t = &TREE_OPERAND (*t, 0);
+
+  if (t != cond)
+{
+  m_annotations = *cond;
+  *cond = *t;
+  m_inner = t;
+}
+}
+
+/*

Re: [PATCH] c++: ICE with -Wtautological-compare in template [PR116534]

2024-09-10 Thread Jason Merrill

On 8/29/24 12:23 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?

-- >8 --
Pre r14-4793, we'd call warn_tautological_cmp -> operand_equal_p
with operands wrapped in NON_DEPENDENT_EXPR, which works, since
o_e_p bails for codes it doesn't know.  But now we pass operands
not encapsulated in NON_DEPENDENT_EXPR, and crash, because the
template tree for &a[x] has null DECL_FIELD_OFFSET.


Why are we trying to compare DECL_FIELD_OFFSET in C++, anyway?  I'd 
think we should limit that to C and in C++ rely on FIELD_DECL identity.



I don't think I meant this warning to be called in a template,
so let's avoid the problem this easy way.


Hmm, it seems potentially useful in a template, especially since we 
suppress it in instantiations.



PR c++/116534

gcc/cp/ChangeLog:

* call.cc (build_new_op) : Don't call
warn_tautological_cmp in a template.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wtautological-compare4.C: New test.
---
  gcc/cp/call.cc|  4 +++-
  .../g++.dg/warn/Wtautological-compare4.C  | 21 +++
  2 files changed, 24 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wtautological-compare4.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index fa7f05d76f6..d0d0c4b4d3c 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -7526,7 +7526,9 @@ build_new_op (const op_location_t &loc, enum tree_code 
code, int flags,
  && ((code_orig_arg1 == BOOLEAN_TYPE)
  ^ (code_orig_arg2 == BOOLEAN_TYPE)))
maybe_warn_bool_compare (loc, code, arg1, arg2);
-  if (complain & tf_warning && warn_tautological_compare)
+  if ((complain & tf_warning)
+ && warn_tautological_compare
+ && !processing_template_decl)
warn_tautological_cmp (loc, code, arg1, arg2);
/* Fall through.  */
  case SPACESHIP_EXPR:
diff --git a/gcc/testsuite/g++.dg/warn/Wtautological-compare4.C 
b/gcc/testsuite/g++.dg/warn/Wtautological-compare4.C
new file mode 100644
index 000..96308f49a42
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wtautological-compare4.C
@@ -0,0 +1,21 @@
+// PR c++/116534
+// { dg-do compile }
+// { dg-options "-Wall" }
+
+template 
+struct Test {
+bool foo(unsigned x, unsigned y) {
+bool test = &a[x] == &b[y];
+   return test;
+}
+unsigned *a;
+unsigned *b;
+};
+
+void
+g ()
+{
+  Test t;
+  t.foo (0u, 1u);
+  t.foo (0u, 0u);
+}

base-commit: cdd5dd2125ca850aa8599f76bed02509590541ef




Re: [PATCH] c++: mutable temps in rodata [PR116369]

2024-09-10 Thread Jason Merrill

On 8/29/24 4:15 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14/13?


OK.


-- >8 --
Here we wrongly mark the reference temporary for g TREE_READONLY,
so it's put in .rodata and so we can't modify its subobject even
when the subobject is marked mutable.  This is so since r9-869.
r14-1785 fixed a similar problem, but not in set_up_extended_ref_temp.

PR c++/116369

gcc/cp/ChangeLog:

* call.cc (set_up_extended_ref_temp): Don't mark a temporary
TREE_READONLY if its type is TYPE_HAS_MUTABLE_P.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/initlist-opt7.C: New test.
---
  gcc/cp/call.cc|  4 +++-
  gcc/testsuite/g++.dg/tree-ssa/initlist-opt7.C | 13 +
  2 files changed, 16 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/initlist-opt7.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index fa7f05d76f6..d30f36d49ff 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13964,7 +13964,9 @@ set_up_extended_ref_temp (tree decl, tree expr, vec **cleanups,
init = cp_fully_fold (init);
if (TREE_CONSTANT (init))
  {
-  if (literal_type_p (type) && CP_TYPE_CONST_NON_VOLATILE_P (type))
+  if (literal_type_p (type)
+ && CP_TYPE_CONST_NON_VOLATILE_P (type)
+ && !TYPE_HAS_MUTABLE_P (type))
{
  /* 5.19 says that a constant expression can include an
 lvalue-rvalue conversion applied to "a glvalue of literal type
diff --git a/gcc/testsuite/g++.dg/tree-ssa/initlist-opt7.C 
b/gcc/testsuite/g++.dg/tree-ssa/initlist-opt7.C
new file mode 100644
index 000..2420db502a6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/initlist-opt7.C
@@ -0,0 +1,13 @@
+// PR c++/116369
+// { dg-do run { target c++11 } }
+
+struct f{
+  mutable int t;
+};
+
+const f &g = {1};
+
+int main()
+{
+  g.t++;
+}

base-commit: 6bfeba12c86b4d0dae27d99b484f64774dd49398




RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-10 Thread Thomas Schwinge
Hi Prathamesh!

On 2024-09-10T13:22:10+, Prathamesh Kulkarni  wrote:
>> -Original Message-
>> From: Thomas Schwinge 
>> Sent: Monday, September 9, 2024 8:50 PM

>> > Could you please test the patch for gcn backend ?

I've successfully tested x86_64 host with GCN as well as nvptx
offloading, and also ppc64le host with nvptx offloading.

I just realized two more minor things:

> [nvptx] Pass host specific ABI opts from mkoffload.
>
> The patch adds an option -foffload-abi-host-opts, which
> is set by host in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes its value
> to host_compiler.
>

Please add here "   PR target/96265".

> gcc/ChangeLog:
>   * common.opt (foffload-abi-host-opts): New option.
>   * config/aarch64/aarch64.cc (aarch64_offload_options): Pass
>   -foffload-abi-host-opts.
>   * config/i386/i386-opts.cc (ix86_offload_options): Likewise.
>   * config/rs6000/rs6000.cc (rs6000_offload_options): Likewise.
>   * config/nvptx/mkoffload.cc (offload_abi_host_opts): Define.
>   (compile_native): Append offload_abi_host_opts to argv_obstack.
>   (main): Handle option -foffload-abi-host-opts.
>   * config/gcn/mkoffload.cc (offload_abi_host_opts): Define.
>   (compile_native): Append offload_abi_host_opts to argv_obstack.
>   (main): Handle option -foffload-abi-host-opts.
>   * lto-wrapper.cc (merge_and_complain): Handle
>   -foffload-abi-host-opts.
>   (append_compiler_options): Likewise.
>   * opts.cc (common_handle_option): Likewise.
>
> Signed-off-by: Prathamesh Kulkarni 

Given that we're adding a new option to 'gcc/common.opt', do we need to
update (regenerate?) 'gcc/common.opt.urls'?  (I've not yet had the need
myself, and therefore not yet looked up how to do that.)  Or maybe not,
given that '-foffload-abi-host-opts=[...]' isn't documented?

Otherwise looks good to me; OK to push (with these minor items addressed,
as necessary), thanks!


Grüße
 Thomas


> diff --git a/gcc/common.opt b/gcc/common.opt
> index ea39f87ae71..d270e524ff4 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2361,6 +2361,10 @@ Enum(offload_abi) String(ilp32) 
> Value(OFFLOAD_ABI_ILP32)
>  EnumValue
>  Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
>  
> +foffload-abi-host-opts=
> +Common Joined MissingArgError(option missing after %qs)
> +-foffload-abi-host-opts=Specify host ABI options.
> +
>  fomit-frame-pointer
>  Common Var(flag_omit_frame_pointer) Optimization
>  When possible do not generate stack frames.
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 6a3f1a23a9f..6ccf08d1cc0 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19000,9 +19000,9 @@ static char *
>  aarch64_offload_options (void)
>  {
>if (TARGET_ILP32)
> -return xstrdup ("-foffload-abi=ilp32");
> +return xstrdup ("-foffload-abi=ilp32 
> -foffload-abi-host-opts=-mabi=ilp32");
>else
> -return xstrdup ("-foffload-abi=lp64");
> +return xstrdup ("-foffload-abi=lp64 -foffload-abi-host-opts=-mabi=lp64");
>  }
>  
>  static struct machine_function *
> diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
> index b8d981878ed..345bbf7709c 100644
> --- a/gcc/config/gcn/mkoffload.cc
> +++ b/gcc/config/gcn/mkoffload.cc
> @@ -133,6 +133,8 @@ static const char *gcn_dumpbase;
>  static struct obstack files_to_cleanup;
>  
>  enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
> +const char *offload_abi_host_opts = NULL;
> +
>  uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU 
> architecture.
>  uint32_t elf_flags = EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4;
>  
> @@ -819,17 +821,10 @@ compile_native (const char *infile, const char 
> *outfile, const char *compiler,
>obstack_ptr_grow (&argv_obstack, gcn_dumpbase);
>obstack_ptr_grow (&argv_obstack, "-dumpbase-ext");
>obstack_ptr_grow (&argv_obstack, ".c");
> -  switch (offload_abi)
> -{
> -case OFFLOAD_ABI_LP64:
> -  obstack_ptr_grow (&argv_obstack, "-m64");
> -  break;
> -case OFFLOAD_ABI_ILP32:
> -  obstack_ptr_grow (&argv_obstack, "-m32");
> -  break;
> -default:
> -  gcc_unreachable ();
> -}
> +  if (!offload_abi_host_opts)
> +fatal_error (input_location,
> +  "%<-foffload-abi-host-opts%> not specified.");
> +  obstack_ptr_grow (&argv_obstack, offload_abi_host_opts);
>obstack_ptr_grow (&argv_obstack, infile);
>obstack_ptr_grow (&argv_obstack, "-c");
>obstack_ptr_grow (&argv_obstack, "-o");
> @@ -998,6 +993,15 @@ main (int argc, char **argv)
>"unrecognizable argument of option %<" STR "%>");
>   }
>  #undef STR
> +  else if (startswith (argv[i], "-foffload-abi-host-opts="))
> + {
> +   if (offload_abi_host_opts)
> + fatal_error (input_location,
> +  "%<-foffload-abi-host-opts%> specified "
> +  "multiple times");
> +   offlo

Re: [RFC 0/4] Hard Register Constraints

2024-09-10 Thread Joseph Myers
A new feature for asm statements definitely needs documenting in the GCC 
manual.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-10 Thread Jeff Law




On 9/8/24 3:28 PM, Christoph Müllner wrote:

On Sat, Sep 7, 2024 at 7:08 PM Jeff Law  wrote:




On 9/6/24 5:32 AM, Jin Ma wrote:

In the process of DF to SI, we generally use "unsigned_fix" rather than
"truncate" for conversion. Although this has no effect in general,
unexpected ICE often occurs when precise semantic analysis is required.

gcc/ChangeLog:

   * config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
  the Zfa extension on rv32.

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.

This doesn't look correct.

fmv.x.w just moves the bits from one place to another, no conversion.

So I don't see how the original pattern was correct.  Using truncate on
an FP mode source isn't defined.  But I also don't think the updated
pattern is correct either.
jeff


Having matching pattern for these Zfa moves seems pointless because
the optimization that utilizes the instructions in
riscv_split_doubleword_move() uses:
gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
In the XTheadFmv pattern, we use unspec, so the pattern won't match.
I think this should be done for Zfa as well.
But if the generated code is just moving bits, why can't we use the 
standard movXX patterns for the data movement?  Clearly there's 
something about this that I'm missing.


Jeff



Re: [PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Martin Uecker
Am Dienstag, dem 10.09.2024 um 13:51 + schrieb Qing Zhao:
>   #define alloc(P, FAM, COUNT) ({ \
>     typeof(P) __p; \
>     size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
>     __p = kmalloc(__size, GFP); \
>     typeof(_Generic(__builtin_counted_by_ref(__p->FAM), \
>       void *: (size_t *)NULL, \
>       default: __builtin_counted_by_ref(__p->FAM))) \
>   ret = __builtin_counted_by_ref(__p->FAM); \
>     if (ret) \
>   *ret = COUNT; \
>     P = __p; \
>   })

Maybe not too relevant for your patch, but I would use
something like this

#define alloc(P, FAM, COUNT) ({ \
  __auto_type __p = &(P); \
  __auto_type __c = (COUNT); \
  size_t __size = sizeof(*(*__p)) + sizeof(*(*__p)->FAM) * __c; \
  if ((*__p = malloc(__size))) { \ 
__auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
 *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \    
  } \
})

to have brackets around the macro arguments to avoid accidents,
to reduce compile time due to multiple evaluation of the macro
arguments, and to avoid warnings for the null pointer dereference
on clang.

(Actually, I would also cast the sizes to a signed type
immediately to catch overflows with UBSan, but I kept
size_t here.)

Martin




RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-10 Thread Prathamesh Kulkarni
> -Original Message-
> From: Thomas Schwinge 
> Sent: Tuesday, September 10, 2024 8:19 PM
> To: Prathamesh Kulkarni ; Richard Biener
> 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Jakub
> Jelinek 
> Subject: RE: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Prathamesh!
> 
> On 2024-09-10T13:22:10+, Prathamesh Kulkarni
>  wrote:
> >> -Original Message-
> >> From: Thomas Schwinge 
> >> Sent: Monday, September 9, 2024 8:50 PM
> 
> >> > Could you please test the patch for gcn backend ?
> 
> I've successfully tested x86_64 host with GCN as well as nvptx
> offloading, and also ppc64le host with nvptx offloading.
Thanks for the thorough testing!
> 
> I just realized two more minor things:
> 
> > [nvptx] Pass host specific ABI opts from mkoffload.
> >
> > The patch adds an option -foffload-abi-host-opts, which is set by
> host
> > in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes its value to
> > host_compiler.
> >
> 
> Please add here "   PR target/96265".
> 
> > gcc/ChangeLog:
> >   * common.opt (foffload-abi-host-opts): New option.
> >   * config/aarch64/aarch64.cc (aarch64_offload_options): Pass
> >   -foffload-abi-host-opts.
> >   * config/i386/i386-opts.cc (ix86_offload_options): Likewise.
> >   * config/rs6000/rs6000.cc (rs6000_offload_options): Likewise.
> >   * config/nvptx/mkoffload.cc (offload_abi_host_opts): Define.
> >   (compile_native): Append offload_abi_host_opts to
> argv_obstack.
> >   (main): Handle option -foffload-abi-host-opts.
> >   * config/gcn/mkoffload.cc (offload_abi_host_opts): Define.
> >   (compile_native): Append offload_abi_host_opts to
> argv_obstack.
> >   (main): Handle option -foffload-abi-host-opts.
> >   * lto-wrapper.cc (merge_and_complain): Handle
> >   -foffload-abi-host-opts.
> >   (append_compiler_options): Likewise.
> >   * opts.cc (common_handle_option): Likewise.
> >
> > Signed-off-by: Prathamesh Kulkarni 
> 
> Given that we're adding a new option to 'gcc/common.opt', do we need
> to update (regenerate?) 'gcc/common.opt.urls'?  (I've not yet had the
> need myself, and therefore not yet looked up how to do that.)  Or
> maybe not, given that '-foffload-abi-host-opts=[...]' isn't
> documented?
I checked that common.opt.urls doesn't seem to have an entry for -foffload-abi,
so I guess it's probably not necessary for -foffload-abi-host-opts either?
Or should we do it for both options?
> 
> Otherwise looks good to me; OK to push (with these minor items
> addressed, as necessary), thanks!
Thanks, I have committed the patch to trunk in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e783a4a683762487cb003ae48235f3d44875de1b
Will post a follow up patch to regenerate common.opt.urls for -foffload-abi and 
-foffload-abi-host-opts if required.

Thanks,
Prathamesh
> 
> 
> Grüße
>  Thomas
> 
> 
> > diff --git a/gcc/common.opt b/gcc/common.opt index
> > ea39f87ae71..d270e524ff4 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -2361,6 +2361,10 @@ Enum(offload_abi) String(ilp32)
> > Value(OFFLOAD_ABI_ILP32)  EnumValue
> >  Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
> >
> > +foffload-abi-host-opts=
> > +Common Joined MissingArgError(option missing after %qs)
> > +-foffload-abi-host-opts=Specify host ABI options.
> > +
> >  fomit-frame-pointer
> >  Common Var(flag_omit_frame_pointer) Optimization  When possible do
> > not generate stack frames.
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index 6a3f1a23a9f..6ccf08d1cc0
> 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -19000,9 +19000,9 @@ static char *
> >  aarch64_offload_options (void)
> >  {
> >if (TARGET_ILP32)
> > -return xstrdup ("-foffload-abi=ilp32");
> > +return xstrdup ("-foffload-abi=ilp32
> > + -foffload-abi-host-opts=-mabi=ilp32");
> >else
> > -return xstrdup ("-foffload-abi=lp64");
> > +return xstrdup ("-foffload-abi=lp64
> > + -foffload-abi-host-opts=-mabi=lp64");
> >  }
> >
> >  static struct machine_function *
> > diff --git a/gcc/config/gcn/mkoffload.cc
> b/gcc/config/gcn/mkoffload.cc
> > index b8d981878ed..345bbf7709c 100644
> > --- a/gcc/config/gcn/mkoffload.cc
> > +++ b/gcc/config/gcn/mkoffload.cc
> > @@ -133,6 +133,8 @@ static const char *gcn_dumpbase;  static struct
> > obstack files_to_cleanup;
> >
> >  enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
> > +const char *offload_abi_host_opts = NULL;
> > +
> >  uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU
> architecture.
> >  uint32_t elf_flags = EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4;
> >
> > @@ -819,17 +821,10 @@ compile_native (const char *infile, const char
> *outfile, const char *compiler,
> >obstack_ptr_grow (&argv_obstack, gcn_dumpbase);
> >obstack_ptr_grow (&argv_obstack, "-dumpbase-ext");
> >obstack_ptr_grow

[patch,avr] Tweak 32-bit comparisons.

2024-09-10 Thread Georg-Johann Lay

The order in which multi-byte EQ and NE comparisons are performing
the byte comparisons does not matter, and there are situations where
using SBIW on the high word can save an instruction.
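
As a rough illustration (hand-written, not taken from an actual compiler
dump): for a 32-bit equality test against zero of a value held in
R22..R25, the plain low-to-high byte-wise sequence needs four
instructions,

    cp   r22,__zero_reg__
    cpc  r23,__zero_reg__
    cpc  r24,__zero_reg__
    cpc  r25,__zero_reg__

whereas starting with the high word lets SBIW handle two bytes at once:

    sbiw r24,0
    cpc  r23,__zero_reg__
    cpc  r22,__zero_reg__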

This is for trunk.

Johann

--

AVR: Tweak 32-bit EQ and NE comparisons.

The order in which multi-byte EQ and NE comparisons are performing
the byte comparisons does not matter, and there are situations where
using SBIW on the high word can save an instruction.

gcc/
* config/avr/avr.cc (avr_out_compare): Tweak 32-bit EQ and NE
comparisons that can use SBIW for the hi16 part.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 1f809d8e1e3..99657911171 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -5647,6 +5647,31 @@ avr_out_compare (rtx_insn *insn, rtx *xop, int *plen)
 	}
 }
 
+  /* Comparisons == and != may change the order in which the sub-bytes are
+ being compared.  Start with the high 16 bits so we can use SBIW.  */
+
+  if (n_bytes == 4
+  && compare_eq_p (insn)
+  && AVR_HAVE_ADIW
+  && REGNO (xreg) >= REG_22)
+{
+  if (xval == const0_rtx)
+	return avr_asm_len ("sbiw %C0,0"   CR_TAB
+			"cpc %B0,__zero_reg__" CR_TAB
+			"cpc %A0,__zero_reg__", xop, plen, 3);
+
+  rtx xhi16 = simplify_gen_subreg (HImode, xval, mode, 2);
+  if (IN_RANGE (UINTVAL (xhi16) & GET_MODE_MASK (HImode), 0, 63)
+	  && reg_unused_after (insn, xreg))
+	{
+	  xop[1] = xhi16;
+	  avr_asm_len ("sbiw %C0,%1", xop, plen, 1);
+	  xop[1] = xval;
+	  return avr_asm_len ("sbci %B0,hi8(%1)" CR_TAB
+			  "sbci %A0,lo8(%1)", xop, plen, 2);
+	}
+}
+
   for (int i = 0; i < n_bytes; i++)
 {
   /* We compare byte-wise.  */


Re: [PATCH v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-10 Thread Christoph Müllner
On Tue, Sep 10, 2024 at 5:25 PM Jeff Law  wrote:
>
>
>
> On 9/8/24 3:28 PM, Christoph Müllner wrote:
> > On Sat, Sep 7, 2024 at 7:08 PM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 9/6/24 5:32 AM, Jin Ma wrote:
> >>> In the process of DF to SI, we generally use "unsigned_fix" rather than
> >>> "truncate" for conversion. Although this has no effect in general,
> >>> unexpected ICE often occurs when precise semantic analysis is required.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
> >>>   the Zfa extension on rv32.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>* gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
> >> This doesn't look correct.
> >>
> >> fmv.x.w just moves the bits from one place to another, no conversion.
> >>
> >> So I don't see how the original pattern was correct.  Using truncate on
> >> an FP mode source isn't defined.  But I also don't think the updated
> >> pattern is correct either.
> >> jeff
> >
> > Having matching pattern for these Zfa moves seems pointless because
> > the optimization that utilizes the instructions in
> > riscv_split_doubleword_move() uses:
> > gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
> > In the XTheadFmv pattern, we use unspec, so the pattern won't match.
> > I think this should be done for Zfa as well.
> But if the generated code is just moving bits, why can't we use the
> standard movXX patterns for the data movement?  Clearly there's
> something about this that I'm missing.

This is a special case for rv32 + D, where we have SI, SF and DF, but no DI.
This raises the question of how to model a 2x 32-bit register <-> DF
register transfer.
This is currently done using movdf_hardfloat_rv32, which could either go through
memory (no Zfa), or create an "artificial" 64-bit move using the
constraint "zmvr",
which will later be picked up by the doubleword-move splitter, which
converts the move
into a Zfa move via riscv_split_doubleword_move().


Re: [PATCH] c++, coroutines: Fix handling of bool await_suspend() [PR115905].

2024-09-10 Thread Jason Merrill

On 9/7/24 6:45 AM, Iain Sandoe wrote:

As noted in the PR the action of the existing implementation was to
treat a false value from await_suspend () as equivalent to "do not
suspend".  Actually it needs to be the equivalent of "resume" - and
we need to restart the dispatcher - since the await_suspend() body
could have already resumed the coroutine.
See also https://github.com/cplusplus/CWG/issues/601 (NAD) for more
discussion.


I'm having trouble wrapping my head around this stuff, so I'll take your 
word for it.  :)  It would be nice to have more full pseudocode.



+ /* Finish the destory dispatcher. */


Typo.


+  /* Build the dispatcher:
+ if (resume index is odd)
+   {
+ switch (resume index)
+  case 1:
+goto cleanup.
+  case ... odd suspension point number
+.CO_ACTOR (... odd suspension point number)
+break;
+  default:
+break;
+   }
+  else
+   {
+ coro.restart.dispatch:
+  case 0:
+goto start.
+  case ... even suspension point number
+.CO_ACTOR (... even suspension point number)
+break;
+  default:
+break;
+   }
+we should not get here unless something is broken badly.
+ __builtin_trap ();
+*/


Maybe mention in this that the odd case is destroy and the even case is 
resume?  Why is it useful to have two separate switches when they have 
the same default: behavior?



+  /* For the case of a boolean await_resume () that returns 'true' we should
+ restart the dispatch, since we cannot know if additional resumes were
+ executed from within the await_resume function.  */


Do you mean await_suspend here?

OK with at least the typo fixed.

Jason



Re: [PATCH] c++, v3: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]

2024-09-10 Thread Jason Merrill

On 9/6/24 2:38 PM, Jakub Jelinek wrote:

On Wed, Sep 04, 2024 at 10:31:48PM +0200, Franz Sirl wrote:

Hmm, it just occured to me, how about adding !NONVIRTUAL here? When
NONVIRTUAL is true, there is no conditional stmt at all, or?


Yeah, that makes sense, the problem doesn't happen in that case.

Here is an adjusted patch, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?


OK.


2024-09-06  Jakub Jelinek  

PR c++/116449
* typeck.cc (get_member_function_from_ptrfunc): Use save_expr
on instance_ptr and function even if it doesn't have side-effects,
as long as it isn't a decl.

* g++.dg/ubsan/pr116449.C: New test.

--- gcc/cp/typeck.cc.jj 2024-09-02 17:07:30.115098114 +0200
+++ gcc/cp/typeck.cc2024-09-04 19:08:24.127490242 +0200
@@ -4188,10 +4188,23 @@ get_member_function_from_ptrfunc (tree *
if (!nonvirtual && is_dummy_object (instance_ptr))
nonvirtual = true;
  
-  if (TREE_SIDE_EFFECTS (instance_ptr))

-   instance_ptr = instance_save_expr = save_expr (instance_ptr);
+  /* Use save_expr even when instance_ptr doesn't have side-effects,
+unless it is a simple decl (save_expr won't do anything on
+constants), so that we don't ubsan instrument the expression
+multiple times.  See PR116449.  */
+  if (TREE_SIDE_EFFECTS (instance_ptr)
+ || (!nonvirtual && !DECL_P (instance_ptr)))
+   {
+ instance_save_expr = save_expr (instance_ptr);
+ if (instance_save_expr == instance_ptr)
+   instance_save_expr = NULL_TREE;
+ else
+   instance_ptr = instance_save_expr;
+   }
  
-  if (TREE_SIDE_EFFECTS (function))

+  /* See above comment.  */
+  if (TREE_SIDE_EFFECTS (function)
+ || (!nonvirtual && !DECL_P (function)))
function = save_expr (function);
  
/* Start by extracting all the information from the PMF itself.  */

--- gcc/testsuite/g++.dg/ubsan/pr116449.C.jj2024-09-04 18:58:46.106764285 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr116449.C   2024-09-04 18:58:46.106764285 
+0200
@@ -0,0 +1,14 @@
+// PR c++/116449
+// { dg-do compile }
+// { dg-options "-O2 -Wall -fsanitize=undefined" }
+
+struct C { void foo (int); void bar (); int c[16]; };
+typedef void (C::*P) ();
+struct D { P d; };
+static D e[1] = { { &C::bar } };
+
+void
+C::foo (int x)
+{
+  (this->*e[c[x]].d) ();
+}


Jakub





[patch,avr] Reorder avr.cc so it requires less forward declarations.

2024-09-10 Thread Georg-Johann Lay

This patch reorders functions in avr.cc so that less
forward declarations are needed.

Johann

--

AVR: avr.cc - Reorder functions to require less forward decls.

gcc/
* config/avr/avr.cc (avr_init_machine_status): Move code to...
(avr_option_override) : ...lambda.
(avr_insn_has_reg_unused_note_p): Move up.
(_reg_unused_after, reg_unused_after): Move up.
(output_reload_in_const): Move up.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 1f809d8e1e3..f15d579ab1d 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -153,16 +153,6 @@ static const char *out_movqi_mr_r (rtx_insn *, rtx[], int *);
 static const char *out_movhi_mr_r (rtx_insn *, rtx[], int *);
 static const char *out_movsi_mr_r (rtx_insn *, rtx[], int *);
 
-static int get_sequence_length (rtx_insn *insns);
-static int sequent_regs_live (void);
-static const char *ptrreg_to_str (int);
-static int avr_num_arg_regs (machine_mode, const_tree);
-static int avr_operand_rtx_cost (rtx, machine_mode, enum rtx_code,
- int, bool);
-static void output_reload_in_const (rtx *, rtx, int *, bool);
-static struct machine_function *avr_init_machine_status (void);
-static bool _reg_unused_after (rtx_insn *insn, rtx reg, bool look_at_insn);
-
 
 /* Prototypes for hook implementors if needed before their implementation.  */
 
@@ -456,7 +446,10 @@ avr_option_override (void)
   avr_addr.sp_l = 0x3D + avr_arch->sfr_offset;
   avr_addr.sp_h = avr_addr.sp_l + 1;
 
-  init_machine_status = avr_init_machine_status;
+  init_machine_status = []()
+  {
+return ggc_cleared_alloc ();
+  };
 
   avr_log_set_avr_log();
 
@@ -473,14 +466,6 @@ avr_option_override (void)
   }
 }
 
-/* Function to set up the backend function structure.  */
-
-static struct machine_function *
-avr_init_machine_status (void)
-{
-  return ggc_cleared_alloc ();
-}
-
 
 /* Implement `INIT_EXPANDERS'.  */
 /* The function works like a singleton.  */
@@ -1179,7 +1164,7 @@ sequent_regs_live (void)
 
 /* Obtain the length sequence of insns.  */
 
-int
+static int
 get_sequence_length (rtx_insn *insns)
 {
   int length = 0;
@@ -2933,6 +2918,7 @@ avr_init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype, rtx libname,
   cfun->machine->sibcall_fails = 0;
 }
 
+
 /* Returns the number of registers to allocate for a function argument.  */
 
 static int
@@ -3099,6 +3085,152 @@ avr_xload_libgcc_p (machine_mode mode)
 }
 
 
+/* Return true when INSN has a REG_UNUSED note for hard reg REG.
+   rtlanal.cc::find_reg_note() uses == to compare XEXP (link, 0)
+   therefore use a custom function.  */
+
+static bool
+avr_insn_has_reg_unused_note_p (rtx_insn *insn, rtx reg)
+{
+  for (rtx link = REG_NOTES (insn); link; link = XEXP (link, 1))
+if (REG_NOTE_KIND (link) == REG_UNUSED
+	&& REG_P (XEXP (link, 0))
+	&& REGNO (reg) >= REGNO (XEXP (link, 0))
+	&& END_REGNO (reg) <= END_REGNO (XEXP (link, 0)))
+  return true;
+
+  return false;
+}
+
+
+/* A helper for the next function.
+   Return nonzero if REG is not used after INSN.
+   We assume REG is a reload reg, and therefore does
+   not live past labels.  It may live past calls or jumps though.  */
+
+static bool
+_reg_unused_after (rtx_insn *insn, rtx reg, bool look_at_insn)
+{
+  if (look_at_insn)
+{
+  /* If the reg is set by this instruction, then it is safe for our
+	 case.  Disregard the case where this is a store to memory, since
+	 we are checking a register used in the store address.  */
+  rtx set = single_set (insn);
+  if (set && !MEM_P (SET_DEST (set))
+	  && reg_overlap_mentioned_p (reg, SET_DEST (set)))
+	return 1;
+
+  /* This case occurs when fuse-add introduced a POST_INC addressing,
+	 but the address register is unused after.  */
+  if (set)
+	{
+	  rtx mem = MEM_P (SET_SRC (set)) ? SET_SRC (set) : SET_DEST (set);
+	  if (MEM_P (mem)
+	  && reg_overlap_mentioned_p (reg, XEXP (mem, 0))
+	  && avr_insn_has_reg_unused_note_p (insn, reg))
+	return 1;
+	}
+}
+
+  while ((insn = NEXT_INSN (insn)))
+{
+  rtx set;
+  enum rtx_code code = GET_CODE (insn);
+
+#if 0
+  /* If this is a label that existed before reload, then the register
+	 if dead here.  However, if this is a label added by reorg, then
+	 the register may still be live here.  We can't tell the difference,
+	 so we just ignore labels completely.  */
+  if (code == CODE_LABEL)
+	return 1;
+  /* else */
+#endif
+
+  if (!INSN_P (insn))
+	continue;
+
+  if (code == JUMP_INSN)
+	return 0;
+
+  /* If this is a sequence, we must handle them all at once.
+	 We could have for instance a cal

[committed] libstdc++: Add missing exception specifications in tests

2024-09-10 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Since r15-3532-g7cebc6384a0ad6 18_support/new_nothrow.cc fails in C++98 mode 
because G++
diagnoses missing exception specifications for the user-defined
(de)allocation functions. Add throw(std::bad_alloc) and throw() for
C++98 mode.

Similarly, 26_numerics/headers/numeric/synopsis.cc fails in C++20 mode
because the declarations of gcd and lcm are not noexcept.

libstdc++-v3/ChangeLog:

* testsuite/18_support/new_nothrow.cc (THROW_BAD_ALLOC): Define
macro to add exception specifications for C++98 mode.
(NOEXCEPT): Expand to throw() for C++98 mode.
* testsuite/26_numerics/headers/numeric/synopsis.cc (gcd, lcm):
Add noexcept.
---
 .../testsuite/18_support/new_nothrow.cc| 18 ++
 .../26_numerics/headers/numeric/synopsis.cc|  4 ++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/testsuite/18_support/new_nothrow.cc 
b/libstdc++-v3/testsuite/18_support/new_nothrow.cc
index a3251f5ad64..551b71dfa58 100644
--- a/libstdc++-v3/testsuite/18_support/new_nothrow.cc
+++ b/libstdc++-v3/testsuite/18_support/new_nothrow.cc
@@ -41,7 +41,15 @@ static void new_handler ()
 throw MyBadAlloc ();
 }
 
-void* operator new (size_t n)
+#if __cplusplus >= 201103L
+# define THROW_BAD_ALLOC noexcept(false)
+# define NOEXCEPT noexcept
+# else
+# define THROW_BAD_ALLOC throw(std::bad_alloc)
+# define NOEXCEPT throw()
+#endif
+
+void* operator new (size_t n) THROW_BAD_ALLOC
 {
 static size_t cntr;
 
@@ -64,12 +72,6 @@ void* operator new (size_t n)
 }
 }
 
-#if __cplusplus >= 201103L
-#define NOEXCEPT noexcept
-#else
-#define NOEXCEPT
-#endif
-
 void operator delete (void *p) NOEXCEPT
 {
 ++delete_called;
@@ -77,7 +79,7 @@ void operator delete (void *p) NOEXCEPT
 free (static_cast(p) - 1);
 }
 
-void* operator new[] (size_t n)
+void* operator new[] (size_t n) THROW_BAD_ALLOC
 {
 ++new_vec_called;
 return operator new(n);
diff --git a/libstdc++-v3/testsuite/26_numerics/headers/numeric/synopsis.cc 
b/libstdc++-v3/testsuite/26_numerics/headers/numeric/synopsis.cc
index 87670090f72..8c33974c2e9 100644
--- a/libstdc++-v3/testsuite/26_numerics/headers/numeric/synopsis.cc
+++ b/libstdc++-v3/testsuite/26_numerics/headers/numeric/synopsis.cc
@@ -161,10 +161,10 @@ namespace std {
 
 #if __cplusplus > 201703L
   template
-constexpr common_type_t gcd(M m, N n);
+constexpr common_type_t gcd(M m, N n) noexcept;
 
   template
-constexpr common_type_t lcm(M m, N n);
+constexpr common_type_t lcm(M m, N n) noexcept;
 
   template
 constexpr T midpoint(T a, T b) noexcept;
-- 
2.46.0



[PATCH v2] gimple ssa: Don't use __builtin_popcount in switch exp transform

2024-09-10 Thread Filip Kastl
Hi,

This is the second version of this patch.  See the previous version here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662462.html

In this verison I added a conversion to unsigned type to the bitmagic that I
generate as Andrew suggested.

Bootstrapped and regtested on x86_64-linux.  Ok to push?

Thanks,
Filip Kastl


--- 8< ---


Switch exponential transformation in the switch conversion pass
currently generates

tmp1 = __builtin_popcount (var);
tmp2 = tmp1 == 1;

when inserting code to determine if var is a power of two.  If the target
doesn't support expanding the builtin as special instructions, switch
conversion relies on this whole pattern being expanded as bitmagic.
However, it is possible that other GIMPLE optimizations move the two
statements of the pattern apart.  In that case the builtin becomes a
libgcc call in the final binary.  The call is slow and in case of
freestanding programs can result in linking error (this bug was
originally found while compiling Linux kernel).

This patch modifies switch conversion to insert the bitmagic
(var ^ (var - 1)) > (var - 1) instead of the builtin.
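
Spelled out in C, the emitted check is roughly the following (a sketch
for an unsigned operand; the pass of course builds the GIMPLE statements
directly rather than source code):

    /* True iff VAR is a (non-zero) power of two.  For a power of two,
       var ^ (var - 1) sets every bit up to and including the single set
       bit and therefore exceeds var - 1; otherwise it does not.  */
    static bool
    pow2p (unsigned var)
    {
      unsigned tmp = var - 1;
      return (var ^ tmp) > tmp;
    }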

gcc/ChangeLog:

PR tree-optimization/116616
* tree-switch-conversion.cc (can_pow2p): Remove this function.
(gen_pow2p): Generate bitmagic instead of a builtin.  Remove the
TYPE parameter.
(switch_conversion::is_exp_index_transform_viable): Don't call
can_pow2p.
(switch_conversion::exp_index_transform): Call gen_pow2p without
the TYPE parameter.
* tree-switch-conversion.h: Remove
m_exp_index_transform_pow2p_type.

gcc/testsuite/ChangeLog:

PR tree-optimization/116616
* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of the POPCOUNT internal fn call.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-1.c  |  7 +-
 gcc/tree-switch-conversion.cc | 84 +--
 gcc/tree-switch-conversion.h  |  6 +-
 3 files changed, 23 insertions(+), 74 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
index a8c9e03e515..4832f5b52c3 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -1,10 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt 
-mbmi" } */
+/* { dg-options "-O2 -fdump-tree-switchconv -mbmi" } */
 
 /* Checks that exponential index transform enables switch conversion to convert
-   this switch into an array lookup.  Also checks that the "index variable is a
-   power of two" check has been generated and that it has been later expanded
-   into an internal function.  */
+   this switch into an array lookup.  */
 
 int foo(unsigned bar)
 {
@@ -30,4 +28,3 @@ int foo(unsigned bar)
 }
 
 /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
-/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index c1332a26094..00426d4 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -133,75 +133,33 @@ gen_log2 (tree op, location_t loc, tree *result, tree 
type)
   return stmts;
 }
 
-/* Is it possible to efficiently check that a value of TYPE is a power of 2?
-
-   If yes, returns TYPE.  If no, returns NULL_TREE.  May also return another
-   type.  This indicates that logarithm of the variable can be computed but
-   only after it is converted to this type.
-
-   Also see gen_pow2p.  */
-
-static tree
-can_pow2p (tree type)
-{
-  /* __builtin_popcount supports the unsigned type or its long and long long
- variants.  Choose the smallest out of those that can still fit TYPE.  */
-  int prec = TYPE_PRECISION (type);
-  int i_prec = TYPE_PRECISION (unsigned_type_node);
-  int li_prec = TYPE_PRECISION (long_unsigned_type_node);
-  int lli_prec = TYPE_PRECISION (long_long_unsigned_type_node);
-
-  if (prec <= i_prec)
-return unsigned_type_node;
-  else if (prec <= li_prec)
-return long_unsigned_type_node;
-  else if (prec <= lli_prec)
-return long_long_unsigned_type_node;
-  else
-return NULL_TREE;
-}
-
-/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
-   special optabs if target supports them.  Return the result as a
-   boolean_type_node ssa name through RESULT.  Assumes that OP's value will
-   be non-negative.  The generated check may give arbitrary answer for negative
-   values.
-
-   Before computing the check, OP may have to be converted to another type.
-   This should be specified in TYPE.  Use can_pow2p to decide what this type
-   should be.
-
-   Should only be used if can_pow2p returns true for type of OP.  */
+/* Build a sequence of gimple statements checking that OP is a power of 2.
+   Return the result as a boolean_type_node s

[PATCH] RISC-V: Align vconfig for TARGET_SFB_ALU

2024-09-10 Thread Dusan Stojkovic
  This patch addresses a missed opportunity to fuse vsetvl_infos.
Instead of checking whether demands for merging configurations of
vsetvl_info are all met, the demands are checked individually.

  The case in question occurs because of the conditional move
instruction which sifive-7, sifive-p400 and sifive-p600 support.
Firstly, the conditional move generated rearranges the CFG.
Secondly, because the generated conditional move uses the same register
in the if_then_else pattern as the vsetvli before it, curr_info and
prev_info won't be merged.

  Tested for tune={sifive-7-series, sifive-p400-series, sifive-p600-series}
and arch={rv64gcv, rv32gcv} making no new regressions.
The fusion performed also makes the following tests pass:
* vsetvlmax-9.c
* vsetvlmax-10.c
* vsetvlmax-11.c
* vsetvlmax-15.c
for all tune configurations mentioned. Specifically, the
scan-assembler-times 
vsetvli\\s+[a-x0-9]+,\\s*zero,\\s*e32,\\s*m1,\\s*t[au],\\s*m[au] 1
tests in previously mentioned files are now passing.

2024-09-10 Dusan Stojkovic 

PR target/113035 - RISC-V: regression testsuite errors -mtune=sifive-7-series

PR target/113035

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
---
 gcc/config/riscv/riscv-vsetvl.cc | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 017efa8bc17..f93e0c313b6 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3052,6 +3052,22 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
  else
{
  vsetvl_info &prev_info = src_block_info.get_exit_info ();
+ if (TARGET_SFB_ALU
+ && prev_info.valid_p ()
+ && curr_info.valid_p ()
+ && prev_info.vl_used_by_non_rvv_insn_p ()
+ && !curr_info.vl_used_by_non_rvv_insn_p ())
+ {
+   // Try to merge each demand individually
+   if (m_dem.sew_lmul_compatible_p (prev_info, curr_info))
+   {
+ m_dem.merge_sew_lmul (prev_info, curr_info);
+   }
+   if (m_dem.policy_compatible_p (prev_info, curr_info))
+   {
+ m_dem.merge_policy (prev_info, curr_info);
+   }
+ }
  if (!prev_info.valid_p ()
  || m_dem.available_p (prev_info, curr_info)
  || !m_dem.compatible_p (prev_info, curr_info))
--
2.43.0



Re: [PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Qing Zhao
Thanks a lot for the tips.

I updated the 2 testing cases per your suggestion, they work well.

I will send the 3rd version of the patch soon.

Qing

> On Sep 10, 2024, at 11:42, Martin Uecker  wrote:
> 
> Am Dienstag, dem 10.09.2024 um 13:51 + schrieb Qing Zhao:
>>   #define alloc(P, FAM, COUNT) ({ \
>> typeof(P) __p; \
>> size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
>> __p = kmalloc(__size, GFP); \
>> typeof(_Generic(__builtin_counted_by_ref(__p->FAM), \
>> void *: (size_t *)NULL, \
>> default: __builtin_counted_by_ref(__p->FAM))) \
>>   ret = __builtin_counted_by_ref(__p->FAM); \
>> if (ret) \
>>   *ret = COUNT; \
>> P = __p; \
>>   })
> 
> Maybe not too relevant for your patch, but I would use
> something like this
> 
> #define alloc(P, FAM, COUNT) ({ \
>  __auto_type __p = &(P); \
>  __auto_type __c = (COUNT); \
>  size_t __size = sizeof(*(*__p)) + sizeof(*(*__p)->FAM) * __c; \
>  if ((*__p = malloc(__size))) { \ 
>__auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
> *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
>  } \
> })
> 
> to have brackets around the macro arguments to avoid accidents,
> to reduce compile time due to multiple evaluation of the macro
> arguments, and to avoid warnings for the null pointer dereference
> on clang.
> 
> (Actually, I would also cast the sizes to a signed type
> immediately to catch overflows with UBSan, but I kept
> size_t here.)
> 
> Martin
> 
> 



[PATCH] c++: Don't crash when mangling member with anonymous union type [PR100632]

2024-09-10 Thread Simon Martin
We currently crash upon the following valid code (the case from the PR
is invalid, but can be made valid by simply adding a definition for f at
line 2)

=== cut here ===
struct B { const int *p; };
template void f() {}
struct Nested { union { int k; }; } nested;
template void f();
=== cut here ===

The problem is that because of the anonymous union, nested.k is
represented as nested.$(decl_of_anon_union).k, and we run into an assert
in write_member_name just before calling write_unqualified_name, because
DECL_NAME ($decl_of_anon_union) is 0.

This patch fixes this by relaxing the assert to also accept members with
an ANON_AGGR_TYPE_P type, which are handled by write_unqualified_name
just fine.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/100632

gcc/cp/ChangeLog:

* mangle.cc (write_member_name): Relax assert to accept anonymous
unions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class67.C: New test.

---
 gcc/cp/mangle.cc | 3 ++-
 gcc/testsuite/g++.dg/cpp2a/nontype-class67.C | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class67.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 46dc6923add..11dc66c8d16 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -3255,7 +3255,8 @@ write_member_name (tree member)
 }
   else if (DECL_P (member))
 {
-  gcc_assert (!DECL_OVERLOADED_OPERATOR_P (member));
+  gcc_assert (ANON_AGGR_TYPE_P (TREE_TYPE (member))
+ || !DECL_OVERLOADED_OPERATOR_P (member));
   write_unqualified_name (member);
 }
   else if (TREE_CODE (member) == TEMPLATE_ID_EXPR)
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class67.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class67.C
new file mode 100644
index 000..accf4284883
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class67.C
@@ -0,0 +1,9 @@
+// PR c++/100632
+// { dg-do compile { target c++20 } }
+
+struct B { const int* p; };
+template void f() {}
+
+struct Nested { union { int k; }; } nested;
+
+template void f();
-- 
2.44.0




Re: [PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Jakub Jelinek
On Tue, Sep 10, 2024 at 06:02:45PM +, Qing Zhao wrote:
> > #define alloc(P, FAM, COUNT) ({ \
> >  __auto_type __p = &(P); \
> >  __auto_type __c = (COUNT); \
> >  size_t __size = sizeof(*(*__p)) + sizeof(*(*__p)->FAM) * __c; \

Shouldn't that be
  size_t __size = offsetof(__typeof(*__p), FAM) + sizeof(*(*__p)->FAM) * __c; \
?

> >  if ((*__p = malloc(__size))) { \ 
> >__auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
> > *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
> >  } \
> > })
> > 
> > to have brackets around the macro arguments to avoid accidents,
> > to reduce compile time due to multiple evaluation of the macro
> > arguments, and to avoid warnings for the null pointer dereference
> > on clang.

Jakub



[PATCH v3] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Qing Zhao
Hi, This is the 3rd version of the patch.

compared to the 2nd version, the only change is the update in testing
cases per Martin's suggestions.

when the 2nd version is compared to the first version, the major changes are:

1. change the name of the builtin from __builtin_get_counted_by to
  __builtin_counted_by_ref in order to reflect the fact that its return
  value is a reference to the counted-by object.

2. make typeof(__builtin_counted_by_ref) work.

3. update the testing case to use the new builtin inside _Generic.

bootstrapped and regress tested on both X86 and aarch64. no issue.

Okay for the trunk?

thanks.

Qing.

===

With the addition of the 'counted_by' attribute and its wide roll-out
within the Linux kernel, a use case has been found that would be very
nice to have for object allocators: being able to set the counted_by
counter variable without knowing its name.

For example, given:

  struct foo {
...
int counter;
...
struct bar array[] __attribute__((counted_by (counter)));
  } *p;

The existing Linux object allocators are roughly:

  #define alloc(P, FAM, COUNT) ({ \
size_t __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
__p = kmalloc(__size, GFP); \
  })

Right now, any addition of a counted_by annotation must also
include an open-coded assignment of the counter variable after
the allocation:

  p = alloc(p, array, how_many);
  p->counter = how_many;

In order to avoid the tedious and error-prone work of manually adding
the open-coded counted-by initializations everywhere in the Linux
kernel, it would be very useful to add a new GCC builtin,
__builtin_counted_by_ref, to help the adoption of the counted-by
attribute.

 -- Built-in Function: TYPE __builtin_counted_by_ref (PTR)
 The built-in function '__builtin_counted_by_ref' checks whether the
 array object pointed to by the pointer PTR has another object
 associated with it that represents the number of elements in the
 array object through the 'counted_by' attribute (i.e.  the
 counted-by object).  If so, returns a pointer to the corresponding
 counted-by object.  If such a counted-by object does not exist,
 returns a NULL pointer.

 This built-in function is only available in C for now.

 The argument PTR must be a pointer to an array.  The TYPE of the
 returned value must be a pointer type pointing to the corresponding
 type of the counted-by object or VOID pointer type in case of a
 NULL pointer being returned.

With this new builtin, the central allocator could be updated to:

  #define alloc(P, FAM, COUNT) ({ \
__auto_type __p = &(P); \
__auto_type __c = (COUNT); \
size_t __size = sizeof(*(*__p)) + sizeof(*((*__p)->FAM)) * __c; \
if ((*__p = kmalloc(__size))) { \
  __auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
  *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
} \
  })

And then structs can gain the counted_by attribute without needing
additional open-coded counter assignments for each struct, and
unannotated structs could still use the same allocator.

PR c/116016

gcc/c-family/ChangeLog:

* c-common.cc: Add new __builtin_counted_by_ref.
* c-common.h (enum rid): Add RID_BUILTIN_COUNTED_BY_REF.

gcc/c/ChangeLog:

* c-decl.cc (names_builtin_p): Add RID_BUILTIN_COUNTED_BY_REF.
* c-parser.cc (has_counted_by_object): New routine.
(get_counted_by_ref): New routine.
(c_parser_postfix_expression): Handle New RID_BUILTIN_COUNTED_BY_REF.
* c-tree.h: New global in_builtin_counted_by_ref.
* c-typeck.cc (build_component_ref): Enable generating
.ACCESS_WITH_SIZE inside typeof when inside builtin_counted_by_ref.

gcc/ChangeLog:

* doc/extend.texi: Add documentation for __builtin_counted_by_ref.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-counted-by-ref-1.c: New test.
* gcc.dg/builtin-counted-by-ref.c: New test.
---
 gcc/c-family/c-common.cc  |  1 +
 gcc/c-family/c-common.h   |  1 +
 gcc/c/c-decl.cc   |  1 +
 gcc/c/c-parser.cc | 77 +++
 gcc/c/c-tree.h|  1 +
 gcc/c/c-typeck.cc |  7 +-
 gcc/doc/extend.texi   | 55 +++
 .../gcc.dg/builtin-counted-by-ref-1.c | 94 +++
 gcc/testsuite/gcc.dg/builtin-counted-by-ref.c | 58 
 9 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-counted-by-ref-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-counted-by-ref.c

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e7e371fd26f..15b90bae8b5 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -430,6 +430,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__builtin_choose_expr", RID_C

[PATCH] c++: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-10 Thread Jakub Jelinek
Hi!

The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html
patch adds CWG 2867 support for namespace locals.

Those vars are just pushed into {static,tls}_aggregates chain, then
pruned from those lists, separated by priority and finally emitted into
the corresponding dynamic initialization functions.
The patch adds two flags used on the TREE_LIST nodes in those lists,
one marks the structured binding base variable and/or associated ref
extended temps, another marks the vars initialized using get methods.
The flags are preserved across the pruning, for splitting into by priority
all associated decls of a structured binding using tuple* are forced
into the same priority as the first one, and finally when actually emitting
code, CLEANUP_POINT_EXPRs are disabled in the base initializer(s) and
code from the bases and non-bases together is wrapped into a single
CLEANUP_POINT_EXPR.
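
For concreteness, the kind of namespace-scope declaration involved looks
like this (a hypothetical sketch, not one of the new dr2867-* tests):

    #include <tuple>

    std::tuple<int, int> make ();  // some initializer with side effects
    // The compiler introduces a hidden base variable initialized from
    // make (); x and y are then initialized via std::get<0>/std::get<1>
    // on it, so several entries end up in {static,tls}_aggregates.
    auto [x, y] = make ();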

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, I haven't touched the module handling; from what I can see,
prune_vars_needing_no_initialization is destructive to the
{static,tls}_aggregates lists (keeps the list NULL at the end or if there
are errors or it contains some DECL_EXTERNAL decls, keeps in there just
those, not the actual vars that need dynamic initialization) and
the module writing is done only afterwards, so I think it could work
reasonably only if header_module_p ().  Can namespace scope structured
bindings appear in header_module_p () or !header_module_p () modules?
How would a testcase using them look like?  Especially when structured
bindings can't be extern nor templates nor inline there can be just one
definition, so the module would need to be included in a single file, no?
In any case, the patch shouldn't make the modules case any worse, it
just adds TREE_LIST flags which will not be streamed for modules and so
if one can use structured bindings in modules, possibly CWG 2867 would be
not fixed for those but nothing worse than that.

2024-09-10  Jakub Jelinek  

PR c++/115769
gcc/cp/
* cp-tree.h (STATIC_INIT_DECOMP_BASE_P): Define.
(STATIC_INIT_DECOMP_NONBASE_P): Define.
* decl.cc (cp_finish_decl): Mark nodes in {static,tls}_aggregates
with 
* decl2.cc (decomp_handle_one_var, decomp_finalize_var_list): New
functions.
(emit_partial_init_fini_fn): Use them.
(prune_vars_needing_no_initialization): Clear
STATIC_INIT_DECOMP_*BASE_P flags if needed.
(partition_vars_for_init_fini): Use same priority for
consecutive STATIC_INIT_DECOMP_*BASE_P vars and propagate
those flags to new TREE_LISTs when possible.  Formatting fix.
(handle_tls_init): Use decomp_handle_one_var and
decomp_finalize_var_list functions.
gcc/testsuite/
* g++.dg/DRs/dr2867-5.C: New test.
* g++.dg/DRs/dr2867-6.C: New test.
* g++.dg/DRs/dr2867-7.C: New test.
* g++.dg/DRs/dr2867-8.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-09-07 09:31:20.601484156 +0200
+++ gcc/cp/cp-tree.h2024-09-09 15:53:44.924112247 +0200
@@ -470,6 +470,7 @@ extern GTY(()) tree cp_global_trees[CPTI
   BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P (in BASELINK)
   BIND_EXPR_VEC_DTOR (in BIND_EXPR)
   ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (in ATOMIC_CONSTR)
+  STATIC_INIT_DECOMP_BASE_P (in the TREE_LIST for {static,tls}_aggregates)
2: IDENTIFIER_KIND_BIT_2 (in IDENTIFIER_NODE)
   ICS_THIS_FLAG (in _CONV)
   DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (in VAR_DECL)
@@ -489,6 +490,8 @@ extern GTY(()) tree cp_global_trees[CPTI
   IMPLICIT_CONV_EXPR_BRACED_INIT (in IMPLICIT_CONV_EXPR)
   PACK_EXPANSION_AUTO_P (in *_PACK_EXPANSION)
   contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
+  STATIC_INIT_DECOMP_NONBASE_P (in the TREE_LIST
+   for {static,tls}_aggregates)
3: IMPLICIT_RVALUE_P (in NON_LVALUE_EXPR or STATIC_CAST_EXPR)
   ICS_BAD_FLAG (in _CONV)
   FN_TRY_BLOCK_P (in TRY_BLOCK)
@@ -5947,6 +5950,21 @@ extern bool defer_mangling_aliases;
 
 extern bool flag_noexcept_type;
 
+/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic
+   initialization of namespace scope structured binding base or related
+   extended ref init temps.  Temporaries from the initialization of
+   STATIC_INIT_DECOMP_BASE_P dynamic initializers should be destroyed only
+   after the last STATIC_INIT_DECOMP_NONBASE_P dynamic initializer following
+   it.  */
+#define STATIC_INIT_DECOMP_BASE_P(NODE) \
+  TREE_LANG_FLAG_1 (TREE_LIST_CHECK (NODE))
+
+/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic
+   initialization of namespace scope structured binding non-base
+   variable using get.  */
+#define STATIC_INIT_DECOMP_NONBASE_P(NODE) \
+  TREE_LANG_FLAG_2 (TREE_LIST_CHECK (NODE))
+
 /* A list of namespace-scope objects which have constructors or
destructors which

Re: [PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Qing Zhao



> On Sep 10, 2024, at 14:09, Jakub Jelinek  wrote:
> 
> On Tue, Sep 10, 2024 at 06:02:45PM +, Qing Zhao wrote:
>>> #define alloc(P, FAM, COUNT) ({ \
>>> __auto_type __p = &(P); \
>>> __auto_type __c = (COUNT); \
>>> size_t __size = sizeof(*(*__p)) + sizeof(*(*__p)->FAM) * __c; \
> 
> Shouldn't that be
>  size_t __size = offsetof(__typeof(*__p), FAM) + sizeof(*(*__p)->FAM) * __c; \
> ?

Yeah, I think that the correct size computation should be:

#define MAX(A, B) (A > B) ? (A) : (B)
size_t __size = MAX (sizeof (*(*__p)), offsetof(__typeof(*__p), FAM) + 
sizeof(*(*__p)->FAM) * __c); \

Qing
 
 
> 
>>> if ((*__p = malloc(__size))) { \ 
>>>   __auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
>>>*_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
>>> } \
>>> })
>>> 
>>> to have brackets around the macro arguments to avoid accidents,
>>> to reduce compile time due to multiple evaluation of the macro
>>> arguments, and to avoid warnings for the null pointer dereference
>>> on clang.
> 
> Jakub
> 



Re: [PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Jakub Jelinek
On Tue, Sep 10, 2024 at 06:31:23PM +, Qing Zhao wrote:
> 
> 
> > On Sep 10, 2024, at 14:09, Jakub Jelinek  wrote:
> > 
> > On Tue, Sep 10, 2024 at 06:02:45PM +, Qing Zhao wrote:
> >>> #define alloc(P, FAM, COUNT) ({ \
> >>> __auto_type __p = &(P); \
> >>> __auto_type __c = (COUNT); \
> >>> size_t __size = sizeof(*(*__p)) + sizeof(*(*__p)->FAM) * __c; \
> > 
> > Shouldn't that be
> >  size_t __size = offsetof(__typeof(*__p), FAM) + sizeof(*(*__p)->FAM) * 
> > __c; \
> > ?
> 
> Yeah, I think that the correct size computation should be:
> 
> #define MAX(A, B) (A > B) ? (A) : (B)
> size_t __size = MAX (sizeof (*(*__p)), offsetof(__typeof(*__p), FAM) + 
> sizeof(*(*__p)->FAM) * __c); \

No, why?  sizeof (*(*__p)) should be always >= offsetof(__typeof(*__p), FAM),
you can't have an offset outside of a structure (ok, except doing something
like use fld[100] as FAM).  offsetof + sizeof (elt) * count is the actually
needed size, say if it is
struct S { size_t a; char b; __attribute__((counted_by (a))) char c[]; };
then you don't really need 2 * sizeof (size_t) + N size of N elements
in the flexible array, just sizeof (size_t) + 1 + N is enough.

Or is counted_by attribute handling it in some weird way?

Jakub



Re: [PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Martin Uecker
Am Dienstag, dem 10.09.2024 um 20:36 +0200 schrieb Jakub Jelinek:
> On Tue, Sep 10, 2024 at 06:31:23PM +, Qing Zhao wrote:
> > 
> > 
> > > On Sep 10, 2024, at 14:09, Jakub Jelinek  wrote:
> > > 
> > > On Tue, Sep 10, 2024 at 06:02:45PM +, Qing Zhao wrote:
> > > > > #define alloc(P, FAM, COUNT) ({ \
> > > > > __auto_type __p = &(P); \
> > > > > __auto_type __c = (COUNT); \
> > > > > size_t __size = sizeof(*(*__p)) + sizeof(*(*__p)->FAM) * __c; \
> > > 
> > > Shouldn't that be
> > >  size_t __size = offsetof(__typeof(*__p), FAM) + sizeof(*(*__p)->FAM) * 
> > > __c; \
> > > ?
> > 
> > Yeah, I think that the correct size computation should be:
> > 
> > #define MAX(A, B) (A > B) ? (A) : (B)
> > size_t __size = MAX (sizeof (*(*__p)), offsetof(__typeof(*__p), FAM) + 
> > sizeof(*(*__p)->FAM) * __c); \
> 
> No, why?  sizeof (*(*__p)) should be always >= offsetof(__typeof(*__p), FAM),
> you can't have an offset outside of a structure (ok, except doing something
> like use fld[100] as FAM).  offsetof + sizeof (elt) * count is the actually
> needed size, say if it is

(offset + sizeof * c) could be smaller than sizeof (*(*__p)).

Martin


> struct S { size_t a; char b; __attribute__((counted_by (a))) char c[]; };
> then you don't really need 2 * sizeof (size_t) + N size of N elements
> in the flexible array, just sizeof (size_t) + 1 + N is enough.
> 
> Or is counted_by attribute handling it in some weird way?

> 
>   Jakub
> 



Re: [PATCH v2] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Qing Zhao


> On Sep 10, 2024, at 14:48, Martin Uecker  wrote:
> 
> Am Dienstag, dem 10.09.2024 um 20:36 +0200 schrieb Jakub Jelinek:
>> On Tue, Sep 10, 2024 at 06:31:23PM +, Qing Zhao wrote:
>>> 
>>> 
 On Sep 10, 2024, at 14:09, Jakub Jelinek  wrote:
 
 On Tue, Sep 10, 2024 at 06:02:45PM +, Qing Zhao wrote:
>> #define alloc(P, FAM, COUNT) ({ \
>> __auto_type __p = &(P); \
>> __auto_type __c = (COUNT); \
>> size_t __size = sizeof(*(*__p)) + sizeof(*(*__p)->FAM) * __c; \
 
 Shouldn't that be
 size_t __size = offsetof(__typeof(*__p), FAM) + sizeof(*(*__p)->FAM) * 
 __c; \
 ?
>>> 
>>> Yeah, I think that the correct size computation should be:
>>> 
>>> #define MAX(A, B) (A > B) ? (A) : (B)
>>> size_t __size = MAX (sizeof (*(*__p)), offsetof(__typeof(*__p), FAM) + 
>>> sizeof(*(*__p)->FAM) * __c); \
>> 
>> No, why?  sizeof (*(*__p)) should be always >= offsetof(__typeof(*__p), FAM),
>> you can't have an offset outside of a structure (ok, except doing something
>> like use fld[100] as FAM).  offsetof + sizeof (elt) * count is the actually
>> needed size, say if it is
> 
> (offset + sizeof * c) could be smaller than sizeof (*(*__p)).

Yes, that’s the reason.

[ ~]$ cat t.c
#include <stdio.h>
#include <stddef.h>
struct flex {
 int b;
 char other;
 char c[];
} *array_annotated;

int main ()
{
 printf ("the size of struct is %d \n", sizeof(struct flex));
 printf ("the offset of c is %d \n", offsetof(struct flex, c));
 return 0;
}

[ ~]$ gcc t.c; ./a.out
the size of struct is 8 
the offset of c is 5 

Then, if we only allocate 2 elements for the FAM “c”, offset + sizeof (char) * 2
= 5 + 2 = 7, which is smaller than sizeof (struct flex), 8.
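
Just to make the MAX-based computation concrete, here is a minimal userspace
sketch (illustrative only, the kernel allocator macro above is the real target):

#include <stddef.h>
#include <stdlib.h>

struct flex {
  int b;
  char other;
  char c[];
};

/* Room for COUNT elements of the FAM, but never less than the size of
   the struct itself: for count == 2 this is MAX (8, 7) == 8.  */
static struct flex *
alloc_flex (size_t count)
{
  size_t need = offsetof (struct flex, c) + count * sizeof (char);
  size_t size = need > sizeof (struct flex) ? need : sizeof (struct flex);
  return malloc (size);
}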

Qing

> 
> Martin
> 
> 
>> struct S { size_t a; char b; __attribute__((counted_by (a))) char c[]; };
>> then you don't really need 2 * sizeof (size_t) + N size of N elements
>> in the flexible array, just sizeof (size_t) + 1 + N is enough.
>> 
>> Or is counted_by attribute handling it in some weird way?
> 
>> 
>> Jakub
>> 
> 



RE: [PATCH 1/2]middle-end: refactor type to be explicit in operand_equal_p [PR114932]

2024-09-10 Thread Tamar Christina
ping

> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, August 20, 2024 2:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: [PATCH 1/2]middle-end: refactor type to be explicit in 
> operand_equal_p
> [PR114932]
> 
> Hi All,
> 
> This is a refactoring with no expected behavioral change.
> The goal with this is to make the type of the expressions being used explicit.
> 
> I did not change all the recursive calls to operand_equal_p () to recurse
> directly to the new function but instead this goes through the top level call
> which re-extracts the types.
> 
> This was done because in most of the cases where we recurse type == arg.
> The second patch makes use of this new flexibility to implement an overload
> of operand_equal_p which checks for equality under two's complement.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114932
>   * fold-const.cc (operand_compare::operand_equal_p): Split into one that
>   takes explicit type parameters and use that in public one.
>   * fold-const.h (class operand_compare): Add operand_equal_p private
>   overload.
> 
> ---
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index
> b82ef137e2f2096f86c20df3c7749747e604177e..878545b1148b839e8a8e866f
> 38e31161f0d116c8 100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -273,6 +273,12 @@ protected:
>   true is returned.  Then RET is set to corresponding comparsion result.  
> */
>bool verify_hash_value (const_tree arg0, const_tree arg1, unsigned int 
> flags,
> bool *ret);
> +
> +private:
> +  /* Return true if two operands are equal.  The flags fields can be used
> + to specify OEP flags described in tree-core.h.  */
> +  bool operand_equal_p (tree, const_tree, tree, const_tree,
> + unsigned int flags);
>  };
> 
>  #endif // GCC_FOLD_CONST_H
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index
> 8908e7381e72cbbf4a8fd96f18cbf4436aba8441..71e82b1d76d4106c7c23c54af
> 8b35905a1af9f1c 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -3156,6 +3156,17 @@ combine_comparisons (location_t loc,
>  bool
>  operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
> unsigned int flags)
> +{
> +  return operand_equal_p (TREE_TYPE (arg0), arg0, TREE_TYPE (arg1), arg1,
> flags);
> +}
> +
> +/* The same as operand_equal_p however the type of ARG0 and ARG1 are
> assumed to be
> +   the TYPE0 and TYPE1 respectively.  */
> +
> +bool
> +operand_compare::operand_equal_p (tree type0, const_tree arg0,
> +   tree type1, const_tree arg1,
> +   unsigned int flags)
>  {
>bool r;
>if (verify_hash_value (arg0, arg1, flags, &r))
> @@ -3166,25 +3177,25 @@ operand_compare::operand_equal_p (const_tree
> arg0, const_tree arg1,
> 
>/* If either is ERROR_MARK, they aren't equal.  */
>if (TREE_CODE (arg0) == ERROR_MARK || TREE_CODE (arg1) == ERROR_MARK
> -  || TREE_TYPE (arg0) == error_mark_node
> -  || TREE_TYPE (arg1) == error_mark_node)
> +  || type0 == error_mark_node
> +  || type1 == error_mark_node)
>  return false;
> 
>/* Similar, if either does not have a type (like a template id),
>   they aren't equal.  */
> -  if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
> +  if (!type0 || !type1)
>  return false;
> 
>/* Bitwise identity makes no sense if the values have different layouts.  
> */
>if ((flags & OEP_BITWISE)
> -  && !tree_nop_conversion_p (TREE_TYPE (arg0), TREE_TYPE (arg1)))
> +  && !tree_nop_conversion_p (type0, type1))
>  return false;
> 
>/* We cannot consider pointers to different address space equal.  */
> -  if (POINTER_TYPE_P (TREE_TYPE (arg0))
> -  && POINTER_TYPE_P (TREE_TYPE (arg1))
> -  && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
> -   != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
> +  if (POINTER_TYPE_P (type0)
> +  && POINTER_TYPE_P (type1)
> +  && (TYPE_ADDR_SPACE (TREE_TYPE (type0))
> +   != TYPE_ADDR_SPACE (TREE_TYPE (type1
>  return false;
> 
>/* Check equality of integer constants before bailing out due to
> @@ -3211,12 +3222,15 @@ operand_compare::operand_equal_p (const_tree
> arg0, const_tree arg1,
> 
>/* If both types don't have the same precision, then it is not safe
>to strip NOPs.  */
> -  if (element_precision (TREE_TYPE (arg0))
> -   != element_precision (TREE_TYPE (arg1)))
> +  if (element_precision (type0)
> +   != element_precision (type1))
>   return false;
> 
>STRIP_NOPS (arg0);
>STRIP_NOPS (arg1);
> +
> +  type0 = TREE_TYPE (arg0);
> +  type1 = TREE_TYPE (arg1);
>  }
>  #if 0
>/* FIXME: Fortran

RE: [PATCH 2/2]middle-end: use two's complement equality when comparing IVs during candidate selection [PR114932]

2024-09-10 Thread Tamar Christina
ping

> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, August 20, 2024 2:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: [PATCH 2/2]middle-end: use two's complement equality when comparing
> IVs during candidate selection [PR114932]
> 
> Hi All,
> 
> IVOPTS normally uses affine trees to perform comparisons between different
> IVs, but these seem to have been missing in two key spots and instead normal
> tree equivalences were used.
> 
> In some cases where we have a two's-complement equivalence but not a strict
> signedness equivalence, we end up generating both a signed and an unsigned IV
> for the same candidate.
> 
> This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag will
> check if the operands would produce the same bit values after the computations
> even if the final sign is different.
> 
> This happens quite a lot with Fortran but can also happen in C because the
> code is unable to figure out when one expression is a multiple of another.
> 
> As an example in the attached testcase we get:
> 
> Initial set of candidates:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>group:0 --> iv_cand:6, cost=(0,1)
>group:1 --> iv_cand:1, cost=(0,0)
>group:2 --> iv_cand:8, cost=(0,1)
>group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> :
> inv_expr 1: stride.3_27 * 4
> inv_expr 2: (unsigned long) stride.3_27 * 4
> 
> These end up being used in the same group:
> 
> Group 1:
> cand  costcompl.  inv.expr.   inv.vars
> 1 0   0   NIL;6
> 2 0   0   NIL;6
> 3 0   0   NIL;6
> 
> which ends up with IV opts picking the signed and unsigned IVs:
> 
> Improved to:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>group:0 --> iv_cand:6, cost=(0,1)
>group:1 --> iv_cand:1, cost=(0,0)
>group:2 --> iv_cand:8, cost=(0,1)
>group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> and so generates the same IV as both signed and unsigned:
> 
> ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> 58.2545), maybe hot
> ;;prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   28 [always]  count:23622320 (estimated locally, freq 6.4080)
> (FALLTHRU,EXECUTABLE)
> ;;25 [always]  count:191126046 (estimated locally, freq 
> 51.8465)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
>   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
>   # ivtmp.26_51 = PHI 
>   # ivtmp.28_90 = PHI 
> 
> ...
> 
> ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> 58.2545), maybe hot
> ;;prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;;pred:   22 [always]  count:95443719 (estimated locally, freq 
> 25.8909)
> (FALLTHRU)
> ;;21 [33.3% (guessed)]  count:71582790 (estimated locally, 
> freq 19.4182)
> (TRUE_VALUE,EXECUTABLE)
> ;;31 [33.3% (guessed)]  count:47721860 (estimated locally, 
> freq 12.9455)
> (TRUE_VALUE,EXECUTABLE)
> # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> ivtmp.22_82 = ivtmp.22_41 + 1;
> ivtmp.26_72 = ivtmp.26_51 + _80;
> ivtmp.28_98 = ivtmp.28_90 + _39;
> 
> These two IVs are always used as unsigned, so IV ops generates:
> 
>   _73 = stride.3_27 * 4;
>   _80 = (unsigned long) _73;
>   _54 = (unsigned long) stride.3_27;
>   _39 = _54 * 4;
> 
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
> 
> This is because candidate 6 and 8 are equivalent under two's complement but 
> have
> different signs.
> 
> This patch changes it so that if you have two IVs that are affine equivalent 
> to
> just pick one over the other.  IV already has code for this, so the patch just
> uses affine trees instead of tree for the check.
> 
> With it we get:
> 
> :
> inv_expr 1: stride.3_27 * 4
> 
> :
> Group 0:
>   cand  costcompl.  inv.expr.   inv.vars
>   5 0   2   NIL;NIL;
>   6 0   3   NIL;NIL;
> 
> Group 1:
>   cand  costcompl.  inv.expr.   inv.vars
>   1 0   0   NIL;6
>   2 0   0   NIL;6
>   3 0   0   NIL;6
>   4 0   0   NIL;6
> 
> Initial set of candidates:
>   cost: 16 (complexity 3)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6
>group:0 --> iv_cand:6, cost=(0,3)
>group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 6
>   invariant expressions: 1
> 
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> compile
> time. There's also a 5% 

[committed] libstdc++: std::string move assignment should not use POCCA trait [PR116641]

2024-09-10 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Backports to follow.

-- >8 --

The changes to implement LWG 2579 (r10-327-gdb33efde17932f) made
std::string::assign use the propagate_on_container_copy_assignment
(POCCA) trait, for consistency with operator=(const basic_string&).
However, this also unintentionally affected operator=(basic_string&&)
which calls assign(str) to make a deep copy when performing a move is
not possible. The fix is for the move assignment operator to call
_M_assign(str) instead of assign(str), as this just does the deep copy
and doesn't check the POCCA trait first.

The bug only affects the unlikely/useless combination of POCCA==true and
POCMA==false, but we should fix it for correctness anyway. It should
also make move assignment slightly cheaper to compile and execute,
because we skip the extra code in assign(const basic_string&).

libstdc++-v3/ChangeLog:

PR libstdc++/116641
* include/bits/basic_string.h (operator=(basic_string&&)): Call
_M_assign instead of assign.
* testsuite/21_strings/basic_string/allocator/116641.cc: New
test.
---
 libstdc++-v3/include/bits/basic_string.h  |  2 +-
 .../basic_string/allocator/116641.cc  | 53 +++
 2 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string/allocator/116641.cc

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 944bd230704..120c0bc9a17 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -915,7 +915,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  __str._M_data(__str._M_use_local_data());
  }
else // Need to do a deep copy
- assign(__str);
+ _M_assign(__str);
__str.clear();
return *this;
   }
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/116641.cc 
b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/116641.cc
new file mode 100644
index 000..a1a411b87fa
--- /dev/null
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/116641.cc
@@ -0,0 +1,53 @@
+// { dg-do run { target c++11 } }
+// { dg-require-effective-target cxx11_abi }
+
+// Bug 116641 - std::string move assignment incorrectly depends on POCCA
+
+#include <string>
+#include <testsuite_hooks.h>
+
+template<typename T>
+struct Alloc
+{
+  using value_type = T;
+  using propagate_on_container_swap = std::false_type;
+  using propagate_on_container_copy_assignment = std::true_type;
+  using propagate_on_container_move_assignment = std::false_type;
+
+  Alloc(int id) : id(id) { }
+
+  template<typename U>
+    Alloc(const Alloc<U>& a) : id(a.id) { }
+
+  T* allocate(unsigned long n)
+  { return std::allocator<T>().allocate(n); }
+
+  void deallocate(T* p, unsigned long n)
+  { std::allocator<T>().deallocate(p, n); }
+
+  Alloc& operator=(const Alloc&) { throw; }
+
+  bool operator==(const Alloc& a) const { return id == a.id; }
+  bool operator!=(const Alloc& a) const { return id != a.id; }
+
+  int id;
+};
+
+void
+test_pr116641()
+{
+  Alloc a1(1), a2(2);
+  std::basic_string<char, std::char_traits<char>, Alloc<char>> s1(a1), s2(a2);
+
+  s1 = "allocator should not propagate on move assignment";
+  VERIFY( s1.get_allocator() == a1 );
+  VERIFY( s2.get_allocator() == a2 );
+  s2 = std::move(s1);
+  VERIFY( s1.get_allocator() == a1 );
+  VERIFY( s2.get_allocator() == a2 );
+}
+
+int main()
+{
+  test_pr116641();
+}
-- 
2.46.0



[committed] libstdc++: Only use std::ios_base_library_init() for ELF [PR116159]

2024-09-10 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Backport to follow.

-- >8 --

The undefined std::ios_base_library_init() symbol that is referenced by
<iostream> is only supposed to be used for targets where symbol
versioning is supported.

The mingw-w64 target defaults to --enable-symvers=gnu due to using GNU
ld but doesn't actually support symbol versioning. This means it tries
to emit references to the std::ios_base_library_init() symbol, which
isn't really defined in the library. This causes problems when using lld
to link user binaries.

Disable the undefined symbol reference for non-ELF targets.

libstdc++-v3/ChangeLog:

PR libstdc++/116159
* include/std/iostream (ios_base_library_init): Only define for
ELF targets.
* src/c++98/ios_init.cc (ios_base_library_init): Likewise.
---
 libstdc++-v3/include/std/iostream  | 2 +-
 libstdc++-v3/src/c++98/ios_init.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/iostream 
b/libstdc++-v3/include/std/iostream
index 4f4fa6880d5..4a6dc584d38 100644
--- a/libstdc++-v3/include/std/iostream
+++ b/libstdc++-v3/include/std/iostream
@@ -78,7 +78,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if !(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE \
   && __has_attribute(__init_priority__))
   static ios_base::Init __ioinit;
-#elif defined(_GLIBCXX_SYMVER_GNU)
+#elif defined(_GLIBCXX_SYMVER_GNU) && defined(__ELF__)
   __extension__ __asm (".globl _ZSt21ios_base_library_initv");
 #endif
 
diff --git a/libstdc++-v3/src/c++98/ios_init.cc 
b/libstdc++-v3/src/c++98/ios_init.cc
index 1422e20d940..6e2e5014cf0 100644
--- a/libstdc++-v3/src/c++98/ios_init.cc
+++ b/libstdc++-v3/src/c++98/ios_init.cc
@@ -199,7 +199,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 return __ret;
   }
 
-#ifdef _GLIBCXX_SYMVER_GNU
+#if defined(_GLIBCXX_SYMVER_GNU) && defined(__ELF__)
 #pragma GCC diagnostic ignored "-Wattribute-alias"
 
   void ios_base_library_init (void)
-- 
2.46.0



[PATCH v4] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Qing Zhao
Hi, This is the 4th version of the patch.

Compared to the 3rd version, the only change is the size calculation in
the testing case.

Compared to the 2nd version, the major change in the 3rd version is the
update to the testing cases per Martin's suggestions.

Compared to the 1st version, the major changes in the 2nd version are:

1. change the name of the builtin from __builtin_get_counted_by to
 __builtin_counted_by_ref in order to reflect the fact that the returned
 value of it is a reference to the object.

2. make typeof(__builtin_counted_by_ref) work.

3. update the testing case to use the new builtin inside _Generic.

Bootstrapped and regression tested on both x86 and aarch64 with no issues.

Okay for the trunk?

thanks.

Qing.



With the addition of the 'counted_by' attribute and its wide roll-out
within the Linux kernel, a use case has been found that would be very
nice to have for object allocators: being able to set the counted_by
counter variable without knowing its name.

For example, given:

  struct foo {
...
int counter;
...
struct bar array[] __attribute__((counted_by (counter)));
  } *p;

The existing Linux object allocators are roughly:

  #define MAX(A, B) (A > B) ? (A) : (B)
  #define alloc(P, FAM, COUNT) ({ \
__auto_type __p = &(P); \
size_t __size = MAX (sizeof(*P),
 __builtin_offsetof (__typeof(*P), FAM)
 + sizeof (*(P->FAM)) * COUNT); \
*__p = kmalloc(__size); \
  })

Right now, any addition of a counted_by annotation must also
include an open-coded assignment of the counter variable after
the allocation:

  p = alloc(p, array, how_many);
  p->counter = how_many;

In order to avoid the tedious and error-prone work of manually adding
the open-coded counted-by initializations everywhere in the Linux
kernel, a new GCC builtin __builtin_counted_by_ref would be very
useful to help the adoption of the counted-by attribute.

 -- Built-in Function: TYPE __builtin_counted_by_ref (PTR)
 The built-in function '__builtin_counted_by_ref' checks whether the
 array object pointed to by the pointer PTR has another object
 associated with it that represents the number of elements in the
 array object through the 'counted_by' attribute (i.e. the
 counted-by object).  If so, it returns a pointer to the corresponding
 counted-by object.  If no such counted-by object exists, it
 returns a NULL pointer.

 This built-in function is only available in C for now.

 The argument PTR must be a pointer to an array.  The TYPE of the
 returned value must be a pointer type pointing to the corresponding
 type of the counted-by object, or a VOID pointer type in case a
 NULL pointer is returned.

With this new builtin, the central allocator could be updated to:

  #define MAX(A, B) (A > B) ? (A) : (B)
  #define alloc(P, FAM, COUNT) ({ \
__auto_type __p = &(P); \
__auto_type __c = (COUNT); \
size_t __size = MAX (sizeof (*(*__p)),\
 __builtin_offsetof (__typeof(*(*__p)),FAM) \
 + sizeof (*((*__p)->FAM)) * __c); \
if ((*__p = kmalloc(__size))) { \
  __auto_type ret = __builtin_counted_by_ref((*__p)->FAM); \
  *_Generic(ret, void *: &(size_t){0}, default: ret) = __c; \
} \
  })

And then structs can gain the counted_by attribute without needing
additional open-coded counter assignments for each struct, and
unannotated structs could still use the same allocator.
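
For example, a small sketch of the unannotated case (illustrative only; it
relies on the documented NULL return, the new builtin itself, and the same
_Generic selection used in the allocator above):

#include <stddef.h>
#include <stdlib.h>

struct plain {
  long n;          /* not named by any counted_by attribute */
  int data[];      /* unannotated flexible array member */
};

void demo (void)
{
  struct plain *q = malloc (sizeof (*q) + 16 * sizeof (int));

  /* No counted-by object exists, so the builtin returns a null void *.  */
  __auto_type ret = __builtin_counted_by_ref (q->data);

  /* The _Generic selection picks the dummy size_t, so the count store is
     harmlessly discarded for unannotated structs.  */
  *_Generic (ret, void *: &(size_t){0}, default: ret) = 16;

  free (q);
}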

PR c/116016

gcc/c-family/ChangeLog:

* c-common.cc: Add new __builtin_counted_by_ref.
* c-common.h (enum rid): Add RID_BUILTIN_COUNTED_BY_REF.

gcc/c/ChangeLog:

* c-decl.cc (names_builtin_p): Add RID_BUILTIN_COUNTED_BY_REF.
* c-parser.cc (has_counted_by_object): New routine.
(get_counted_by_ref): New routine.
(c_parser_postfix_expression): Handle New RID_BUILTIN_COUNTED_BY_REF.
* c-tree.h: New global in_builtin_counted_by_ref.
* c-typeck.cc (build_component_ref): Enable generating
.ACCESS_WITH_SIZE inside typeof when inside builtin_counted_by_ref.

gcc/ChangeLog:

* doc/extend.texi: Add documentation for __builtin_counted_by_ref.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-counted-by-ref-1.c: New test.
* gcc.dg/builtin-counted-by-ref.c: New test.
---
 gcc/c-family/c-common.cc  |  1 +
 gcc/c-family/c-common.h   |  1 +
 gcc/c/c-decl.cc   |  1 +
 gcc/c/c-parser.cc | 77 +++
 gcc/c/c-tree.h|  1 +
 gcc/c/c-typeck.cc |  7 +-
 gcc/doc/extend.texi   | 55 +++
 .../gcc.dg/builtin-counted-by-ref-1.c | 97 +++
 gcc/testsuite/gcc.dg/builtin-counted-by-ref.c | 61 
 9 files changed, 300 insertions(+),

Re: [PATCH v4] Provide new GCC builtin __builtin_counted_by_ref [PR116016]

2024-09-10 Thread Jakub Jelinek
On Tue, Sep 10, 2024 at 09:28:04PM +, Qing Zhao wrote:
> @@ -11741,6 +11770,54 @@ c_parser_postfix_expression (c_parser *parser)
>   set_c_expr_source_range (&expr, loc, close_paren_loc);
>   break;
> }
> + case RID_BUILTIN_COUNTED_BY_REF:
> +   {
> + vec<c_expr_t, va_gc> *cexpr_list;
> + c_expr_t *e_p;
> + location_t close_paren_loc;
> +
> + in_builtin_counted_by_ref = true;
> +
> + c_parser_consume_token (parser);
> + if (!c_parser_get_builtin_args (parser,
> + "__builtin_counted_by_ref",
> + &cexpr_list, false,
> + &close_paren_loc))
> +   {
> + expr.set_error ();
> + goto error_exit;

Up to Joseph or Marek as C maintainers/reviewers, but I think it is a bad
idea to use such a generic name for a label inside of handling of one
specific keyword.

Either use RAII and just break; instead of goto error_exit;, like
struct in_builtin_counted_by_ref_sentinel {
  ~in_builtin_counted_by_ref_sentinel ()
  { in_builtin_counted_by_ref = false; }
} ibcbr_sentinel;
or add those in_builtin_counted_by_ref = false; lines before each break;

Or set it just when parsing the args?

Anyway, I'm not even convinced a global variable like that is a good idea.
The argument can contain arbitrary expressions in there (e.g. comma
expression, statement expression, ...), I strongly doubt you want to
have that special handling in all the places in the grammar rather than just
for the last COMPONENT_REF in there.  And, there is no reason why
you couldn't have e.g. nested call inside of the argument:
__builtin_counted_by_ref (ptr[*__builtin_counted_by_ref 
(something->fam1)]->fam2)
and that on the other side clears in_builtin_counted_by_ref after parsing
the inner one.

Jakub



[PATCH v1] RISC-V: Fix asm check for Vector SAT_* due to middle-end change

2024-09-10 Thread pan2 . li
From: Pan Li 

The middle-end change affects the layout of the generated assembly
for vector SAT_*.  This patch would like to fix the asm checks and make them robust.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Adjust
asm check and make it robust.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-34.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-35.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-36.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/b

[PATCH] c++: decltype(auto) deduction of statement-expression [PR116418]

2024-09-10 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/backports?

-- >8 --

r8-7538 for PR84968 made strip_typedefs_expr diagnose seeing
STATEMENT_LIST, which effectively makes us reject statement-expressions
in noexcept-specifiers (we already diagnose them in template arguments
at parse time).

Later r11-7452 made decltype(auto) deduction do strip_typedefs_expr on
the expression before deducing (as an implementation detail) and so ever
since we inadvertently reject decltype(auto) deduction of a
statement-expression.

This patch just removes the diagnostic in strip_typedefs_expr; it doesn't
seem like the right place for it.  And it lets us accept more code using
statement-expressions in various contexts.

PR c++/116418
PR c++/84968

gcc/cp/ChangeLog:

* tree.cc (strip_typedefs_expr) <STATEMENT_LIST>: Replace
with ...
<STMT_EXPR>: ... this non-diagnosing early exit.

gcc/testsuite/ChangeLog:

* g++.dg/eh/pr84968.C: No longer expect an ahead of time diagnostic
for the statement-expression.  Instantiate the template and expect
an incomplete type error instead.
* g++.dg/ext/stmtexpr26.C: New test.
---
 gcc/cp/tree.cc|  5 ++---
 gcc/testsuite/g++.dg/eh/pr84968.C |  4 +++-
 gcc/testsuite/g++.dg/ext/stmtexpr26.C | 10 ++
 3 files changed, 15 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/stmtexpr26.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 31ecbb1ac79..a150a91f2fa 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -2011,9 +2011,8 @@ strip_typedefs_expr (tree t, bool *remove_attributes, 
unsigned int flags)
 case LAMBDA_EXPR:
   return t;
 
-case STATEMENT_LIST:
-  error ("statement-expression in a constant expression");
-  return error_mark_node;
+case STMT_EXPR:
+  return t;
 
 default:
   break;
diff --git a/gcc/testsuite/g++.dg/eh/pr84968.C 
b/gcc/testsuite/g++.dg/eh/pr84968.C
index 23c49f477a8..a6e21914eed 100644
--- a/gcc/testsuite/g++.dg/eh/pr84968.C
+++ b/gcc/testsuite/g++.dg/eh/pr84968.C
@@ -9,7 +9,9 @@ struct S {
   void a()
 try {
 } catch (int ()
-noexcept (({ union b a; true; }))) // { dg-error "constant" }
+noexcept (({ union b a; true; }))) // { dg-error "'b a' has 
incomplete type" }
   {
   }
 };
+
+template void S::a(); // { dg-message "required from here" }
diff --git a/gcc/testsuite/g++.dg/ext/stmtexpr26.C 
b/gcc/testsuite/g++.dg/ext/stmtexpr26.C
new file mode 100644
index 000..498dd12ef10
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/stmtexpr26.C
@@ -0,0 +1,10 @@
+// PR c++/116418
+// { dg-do compile { target c++14 } }
+
+void foo ();
+template 
+void bar ()
+{
+  decltype(auto) v = ({ foo (); 3; });
+}
+
-- 
2.46.0.540.g4c42d5ff28



RE: [PATCH v2 2/2] RISC-V: Fix ICE due to inconsistency of RVV intrinsic list in lto and cc1.

2024-09-10 Thread Li, Pan2
> * gcc.target/riscv/rvv/base/bug-11.c: New test.

Seems you missed this file in patch v2?

> +/* Helper for init_builtins in LTO.  */
> +static void
> +handle_pragma_vector_for_lto ()
> +{
> +  struct pragma_intrinsic_flags backup_flags;
> +
> +  riscv_pragma_intrinsic_flags_pollute (&backup_flags);
> +
> +  riscv_option_override ();
> +  init_adjust_machine_modes ();
> +
> +  register_builtin_types ();
> +
> +  handle_pragma_vector ();
> +  riscv_pragma_intrinsic_flags_restore (&backup_flags);
> +
> +  /* Re-initialize after the flags are restored.  */
> +  riscv_option_override ();
> +  init_adjust_machine_modes ();
> +}

Looks like this part is almost the same as most of riscv_pragma_intrinsic,
except for register_builtin_types ().
I wonder if we can wrap a helper to avoid the code duplication, something like
the sketch below, and IMO the _lto suffix should be removed since the body of
the function has nothing to do with lto.
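
A rough, untested sketch of what I mean (the helper name and its placement are
just placeholders):

static void
riscv_vector_handle_pragma_1 (bool register_types_p)
{
  struct pragma_intrinsic_flags backup_flags;

  riscv_pragma_intrinsic_flags_pollute (&backup_flags);

  riscv_option_override ();
  init_adjust_machine_modes ();

  /* Only the LTO path needs to register the builtin types here.  */
  if (register_types_p)
    register_builtin_types ();

  handle_pragma_vector ();
  riscv_pragma_intrinsic_flags_restore (&backup_flags);

  /* Re-initialize after the flags are restored.  */
  riscv_option_override ();
  init_adjust_machine_modes ();
}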

Otherwise no comments from my side, and I'd leave it to kito or juzhe.

Pan

-Original Message-
From: Jin Ma  
Sent: Tuesday, September 10, 2024 1:57 PM
To: gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; Li, Pan2 ; 
kito.ch...@gmail.com; richard.guent...@gmail.com; jinma.cont...@gmail.com; Jin 
Ma 
Subject: [PATCH v2 2/2] RISC-V: Fix ICE due to inconsistency of RVV intrinsic 
list in lto and cc1.

When we use -flto, the RVV function list is generated twice,
once in the cc1 phase and once in the lto phase.  However, due to
the different generation methods, the two lists are different.

For example, when there is no zvfh or zvfhmin in the arch string, the
list in cc1 is generated by calling "riscv_pragma_intrinsic".  Since
TARGET_VECTOR_ELEN_FP_16 is enabled before the RVV functions are
generated, a list of float16-related RVV functions is produced.  In
the lto phase, the RVV function list is generated only by calling
"riscv_init_builtins", where TARGET_VECTOR_ELEN_FP_16 is disabled, so
the float16-related RVV function list cannot be generated as in cc1.
This causes confusion, resulting in matching the wrong function due to
inconsistent fcodes in the lto phase, eventually leading to an ICE.

So I think we should be consistent with their generated lists, which
is exactly what this patch does.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (struct pragma_intrinsic_flags): Move
to riscv-protos.h.
(riscv_pragma_intrinsic_flags_pollute): Move to riscv-vector-builtins.cc.
(riscv_pragma_intrinsic_flags_restore): Likewise.
(riscv_pragma_intrinsic): Likewise.
* config/riscv/riscv-protos.h (struct pragma_intrinsic_flags):
New.
(riscv_pragma_intrinsic_flags_restore): New.
(riscv_pragma_intrinsic_flags_pollute): New.
* config/riscv/riscv-vector-builtins.cc 
(riscv_pragma_intrinsic_flags_pollute): New.
(riscv_pragma_intrinsic_flags_restore): New.
(handle_pragma_vector_for_lto): New.
(init_builtins): Correct the processing logic for lto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-11.c: New test.
---
 gcc/config/riscv/riscv-c.cc   | 70 +--
 gcc/config/riscv/riscv-protos.h   | 13 
 gcc/config/riscv/riscv-vector-builtins.cc | 83 ++-
 3 files changed, 96 insertions(+), 70 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 71112d9c66d7..7037ecc1268a 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -34,72 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define builtin_define(TXT) cpp_define (pfile, TXT)
 
-struct pragma_intrinsic_flags
-{
-  int intrinsic_target_flags;
-
-  int intrinsic_riscv_vector_elen_flags;
-  int intrinsic_riscv_zvl_flags;
-  int intrinsic_riscv_zvb_subext;
-  int intrinsic_riscv_zvk_subext;
-};
-
-static void
-riscv_pragma_intrinsic_flags_pollute (struct pragma_intrinsic_flags *flags)
-{
-  flags->intrinsic_target_flags = target_flags;
-  flags->intrinsic_riscv_vector_elen_flags = riscv_vector_elen_flags;
-  flags->intrinsic_riscv_zvl_flags = riscv_zvl_flags;
-  flags->intrinsic_riscv_zvb_subext = riscv_zvb_subext;
-  flags->intrinsic_riscv_zvk_subext = riscv_zvk_subext;
-
-  target_flags = target_flags
-| MASK_VECTOR;
-
-  riscv_zvl_flags = riscv_zvl_flags
-| MASK_ZVL32B
-| MASK_ZVL64B
-| MASK_ZVL128B;
-
-  riscv_vector_elen_flags = riscv_vector_elen_flags
-| MASK_VECTOR_ELEN_32
-| MASK_VECTOR_ELEN_64
-| MASK_VECTOR_ELEN_FP_16
-| MASK_VECTOR_ELEN_FP_32
-| MASK_VECTOR_ELEN_FP_64;
-
-  riscv_zvb_subext = riscv_zvb_subext
-| MASK_ZVBB
-| MASK_ZVBC
-| MASK_ZVKB;
-
-  riscv_zvk_subext = riscv_zvk_subext
-| MASK_ZVKG
-| MASK_ZVKNED
-| MASK_ZVKNHA
-| MASK_ZVKNHB
-| MASK_ZVKSED
-| MASK_ZVKSH
-| MASK_ZVKN
-| MASK_ZVKNC
-| MASK_ZVKNG
-| MASK_ZVKS
-| MASK_ZVKSC
-| MASK_ZVKSG
-| MASK_ZVKT;
-}
-
-static void
-riscv_pragma_intrinsic_flags_

[RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-10 Thread liuhongt
GCC 12 enables vectorization at O2 with the very cheap cost model, which is
restricted to constant tripcounts.  The vectorization capability is very
limited with consideration of the codesize impact.

The patch extends the very cheap cost model a little bit to support variable
tripcounts, but still disables peeling for gaps/alignment, runtime alias
checking and epilogue vectorization with consideration of codesize.

So there are at most 2 versions of a loop for O2 vectorization: one vectorized
main loop and one scalar/remainder loop.

I.e.

void
foo1 (int* __restrict a, int* b, int* c, int n)
{
 for (int i = 0; i != n; i++)
  a[i] = b[i] + c[i];
}

With -O2 -march=x86-64-v3, it will be vectorized to:

.L10:
vmovdqu (%r8,%rax), %ymm0
vpaddd  (%rsi,%rax), %ymm0, %ymm0
vmovdqu %ymm0, (%rdi,%rax)
addq    $32, %rax
cmpq    %rdx, %rax
jne     .L10
movl    %ecx, %eax
andl    $-8, %eax
cmpl    %eax, %ecx
je      .L21
vzeroupper
.L12:
movl    (%r8,%rax,4), %edx
addl    (%rsi,%rax,4), %edx
movl    %edx, (%rdi,%rax,4)
addq    $1, %rax
cmpl    %eax, %ecx
jne     .L12

As measured with SPEC2017 on EMR, the patch (N-Iter) improves performance by
4.11% with an extra 2.8% codesize, and the cheap cost model improves
performance by 5.74% with an extra 8.88% codesize.  The details are as below.

Performance measured with -march=x86-64-v3 -O2 on EMR

N-Iter  cheap cost model
500.perlbench_r -0.12%  -0.12%
502.gcc_r   0.44%   -0.11%  
505.mcf_r   0.17%   4.46%
520.omnetpp_r   0.28%   -0.27%
523.xalancbmk_r 0.00%   5.93%
525.x264_r  -0.09%  23.53%
531.deepsjeng_r 0.19%   0.00%
541.leela_r 0.22%   0.00%
548.exchange2_r -11.54% -22.34%
557.xz_r0.74%   0.49%
GEOMEAN INT -1.04%  0.60%

503.bwaves_r3.13%   4.72%
507.cactuBSSN_r 1.17%   0.29%
508.namd_r  0.39%   6.87%
510.parest_r3.14%   8.52%
511.povray_r0.10%   -0.20%
519.lbm_r   -0.68%  10.14%
521.wrf_r   68.20%  76.73%
526.blender_r   0.12%   0.12%
527.cam4_r  19.67%  23.21%
538.imagick_r   0.12%   0.24%
544.nab_r   0.63%   0.53%
549.fotonik3d_r 14.44%  9.43%
554.roms_r  12.39%  0.00%
GEOMEAN FP  8.26%   9.41%
GEOMEAN ALL 4.11%   5.74%

Code size impact
N-Iter  cheap cost model
500.perlbench_r 0.22%   1.03%
502.gcc_r   0.25%   0.60%   
505.mcf_r   0.00%   32.07%
520.omnetpp_r   0.09%   0.31%
523.xalancbmk_r 0.08%   1.86%
525.x264_r  0.75%   7.96%
531.deepsjeng_r 0.72%   3.28%
541.leela_r 0.18%   0.75%
548.exchange2_r 8.29%   12.19%
557.xz_r0.40%   0.60%
GEOMEAN INT 1.07%   5.71%

503.bwaves_r12.89%  21.59%
507.cactuBSSN_r 0.90%   20.19%
508.namd_r  0.77%   14.75%
510.parest_r0.91%   3.91%
511.povray_r0.45%   4.08%
519.lbm_r   0.00%   0.00%
521.wrf_r   5.97%   12.79%
526.blender_r   0.49%   3.84%
527.cam4_r  1.39%   3.28%
538.imagick_r   1.86%   7.78%
544.nab_r   0.41%   3.00%
549.fotonik3d_r 25.50%  47.47%
554.roms_r  5.17%   13.01%
GEOMEAN FP  4.14%   11.38%
GEOMEAN ALL 2.80%   8.88%


The only regression is from 548.exchange2_r: the vectorization of the inner
loop in each layer of the 9-layer loop nest increases register pressure and
causes more spills.
- block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10
  - block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
.
- block(rnext:9, 9, i9) = block(rnext:9, 9, i9) + 10
...
- block(rnext:9, 2, i2) = block(rnext:9, 2, i2) + 10
- block(rnext:9, 1, i1) = block(rnext:9, 1, i1) + 10

Looks like aarch64 doesn't have the issue because aarch64 has 32 GPRs, but x86
only has 16.  I have an extra patch to prevent loop vectorization in deeply
nested loops for the x86 backend which can bring the performance back.

For 503.bwaves_r/505.mcf_r/507.cactuBSSN_r/508.namd_r, the cheap cost model
increases codesize a lot but doesn't improve any performance, and N-Iter is
much better there for codesize.


Any comments?


gcc/ChangeLog:

* tree-vect-loop.cc (vect_analyze_loop_costing): Enable
vectorization for LOOP_VINFO_PEELING_FOR_NITER in very cheap
cost model.
(vect_analyze_loop): Disable epilogue vectorization in very
cheap cost model.
---
 gcc/tree-vect-loop.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 242d5e2d916..06afd8cae79 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2356,8 +2356,7 @@ vect_analyze

Re: [PATCH] i386: Fix incorrect avx512f-mask-type.h include

2024-09-10 Thread Hongtao Liu
On Thu, Sep 5, 2024 at 10:05 AM Haochen Jiang  wrote:
>
> Hi all,
>
> In avx512f-mask-type.h, we need SIZE to be defined to get
> MASK_TYPE defined correctly. Fix those testcases where
> SIZE is not defined before the include of avx512f-mask-type.h.
> 
> Note that the convert intrins in AVX10.2 will need more
> modifications because the current tests do not include mask ones.
> They will be in a separate patch.
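
(Just to illustrate the fix pattern; the exact SIZE expression varies per test
and element width, so the values below are only an example.)

#define AVX512F_LEN 512
#define SIZE (AVX512F_LEN / 16)   /* example only: 512-bit vector, 16-bit elements */
#include "avx512f-mask-type.h"    /* MASK_TYPE is derived from SIZE */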
>
> Tested on x86-64-pc-linux-gnu. Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx10-helper.h: Do not include
> avx512f-mask-type.h.
> * gcc.target/i386/avx10_2-512-vaddnepbf16-2.c:
> Define SIZE and include avx512f-mask-type.h.
> * gcc.target/i386/avx10_2-512-vcmppbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttpd2dqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttpd2qqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttpd2udqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttpd2uqqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttps2dqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttps2qqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttps2udqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vcvttps2uqqs-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vmulnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpbssds-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpbsud-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpbsuds-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpbuud-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpbuuds-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpwsud-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpwsuds-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpwusd-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpwusds-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpwuud-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vpdpwuuds-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Ditto.
> * gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Ditto.
> * gcc.target/i386/avx512fp16-vfpclassph-1b.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/avx10-helper.h  |  1 -
>  .../i386/avx10_2-512-vaddnepbf16-2.c  | 11 +-
>  .../gcc.target/i386/avx10_2-512-vcmppbf16-2.c |  5 +++--
>  .../i386/avx10_2-512-vcvtnebf162ibs-2.c   | 16 +++---
>  .../i386/avx10_2-512-vcvtnebf162iubs-2.c  | 16 +++---
>  .../i386/avx10_2-512-vcvtph2ibs-2.c   | 16 +++---
>  .../i386/avx10_2-512-vcvtph2iubs-2.c  | 16 +++---
>  .../i386/avx10_2-512-vcvtps2ibs-2.c   | 16 +++---
>  .../i386/avx10_2-

[PATCH] aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for SVE instructions

2024-09-10 Thread Soumya AR
On Neoverse V2, SVE ADD instructions have a throughput of 4, while shift
instructions like SHL have a throughput of 2. We can lean on that to emit code
like:
 add z31.b, z31.b, z31.b
instead of:
 lsl z31.b, z31.b, #1

The implementation of this change for SVE vectors is similar to a prior patch
that adds the above functionality for Neon vectors.

Here, the machine descriptor pattern is split up to separately accommodate left
and right shifts, so we can specifically emit an add for all left shifts by 1. 
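
For example, a simple doubling loop like the one below (illustrative only; the
exact code generation depends on the compile flags and cost model) now gets the
ADD form for its shift by one:

#include <stdint.h>

void
double_elements (int8_t *x, int n)
{
  for (int i = 0; i < n; i++)
    x[i] <<= 1;   /* SVE left shift by 1, now emitted as ADD Z, Z, Z */
}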

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR 

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (*post_ra_v3): Split
pattern to accommodate left and right shifts separately.
(*post_ra_v_ashl3): Matches left shifts with additional 
constraint to
check for shifts by 1.
(*post_ra_v_3): Matches right shifts.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Updated instances of lsl-1 
with
corresponding add
* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/adr_1.c: Likewise.
* gcc.target/aarch64/sve/adr_6.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_7.c: Likewise.
* gcc.target/aarch64/sve/cond_mla_8.c: Likewise.
* gcc.target/aarch64/sve/shift_2.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/rshl_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise.
* gcc.target/aarch64/sve/sve_shl_add.c: New test.



0001-aarch64-Emit-ADD-X-Y-Y-instead-of-SHL-X-Y-1-for-SVE-.patch
Description: 0001-aarch64-Emit-ADD-X-Y-Y-instead-of-SHL-X-Y-1-for-SVE-.patch


[PATCH v3 1/5] Genmatch: Add control flow graph match for case 0 and case 1

2024-09-10 Thread pan2 . li
From: Pan Li 

The gen_phi_on_cond has only supported the below control flow for cond
since day 1.  Aka:

+------+
| def  |
| ...  |   +-----+
| cond |-->| def |
+------+   | ... |
   |       +-----+
   |          |
   v          |
+-----+       |
| PHI |<------+
+-----+

Unfortunately, there are more control flow scenarios for the PHI.
For example, as below:

T __attribute__((noinline))\
sat_s_add_##T##_fmt_3 (T x, T y)   \
{  \
  T sum;   \
  bool overflow = __builtin_add_overflow (x, y, &sum); \
  return overflow ? x < 0 ? MIN : MAX : sum;   \
}

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)

With the expanded GIMPLE as below.
   3   │
   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
   6   │ {
   7   │   signed char _1;
   8   │   signed char _2;
   9   │   int8_t _3;
  10   │   __complex__ signed char _6;
  11   │   _Bool _8;
  12   │   signed char _9;
  13   │   signed char _10;
  14   │   signed char _11;
  15   │
  16   │ ;;   basic block 2, loop depth 0
  17   │ ;;pred:   ENTRY
  18   │   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  19   │   _2 = IMAGPART_EXPR <_6>;
  20   │   if (_2 != 0)
  21   │ goto ; [50.00%]
  22   │   else
  23   │ goto ; [50.00%]
  24   │ ;;succ:   4
  25   │ ;;3
  26   │
  27   │ ;;   basic block 3, loop depth 0
  28   │ ;;pred:   2
  29   │   _1 = REALPART_EXPR <_6>;
  30   │   goto ; [100.00%]
  31   │ ;;succ:   5
  32   │
  33   │ ;;   basic block 4, loop depth 0
  34   │ ;;pred:   2
  35   │   _8 = x_4(D) < 0;
  36   │   _9 = (signed char) _8;
  37   │   _10 = -_9;
  38   │   _11 = _10 ^ 127;
  39   │ ;;succ:   5
  40   │
  41   │ ;;   basic block 5, loop depth 0
  42   │ ;;pred:   3
  43   │ ;;4
  44   │   # _3 = PHI <_1(3), _11(4)>
  45   │   return _3;
  46   │ ;;succ:   EXIT
  47   │
  48   │ }

The above code has the below control flow, which is not supported by
the gen_phi_on_cond.

+------+
| def  |
| ...  |   +-----+
| cond |-->| def |
+------+   | ... |
   |       +-----+
   |          |
   v          |
+-----+       |
| def |       |
| ... |       |
+-----+       |
   |          |
   |          |
   v          |
+-----+       |
| PHI |<------+
+-----+

This patch would like to add support for matching the above control flow
in the gen_phi_on_cond.

The below testsuites are passed for this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* gimple-match-head.cc (match_control_flow_graph_case_0): Add
new func impl to match case 0 of cfg.
(match_control_flow_graph_case_1): Ditto but for case 1.

Signed-off-by: Pan Li 
---
 gcc/gimple-match-head.cc | 115 +++
 1 file changed, 115 insertions(+)

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 924d3f1e710..c51728ae742 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -375,3 +375,118 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, 
bool &wascmp, tree (*va
 return true;
   return false;
 }
+
+/*
+ * Return TRUE if the cfg matches the below layout by the given b2 in
+ * the first argument.  Or return FALSE.
+ *
+ * If return TRUE, the output argument b_out will be updated to the b0
+ * block as below example.
+ *
+ * If return FALSE, the output argument b_out will be NULL_BLOCK.
+ *
+ *|
+ *|
+ *v
+ * +--+
+ * | b0:  |
+ * | def  |   +-+
+ * | ...  |   | b1: |
+ * | cond |-->| def |
+ * +--+   | ... |
+ *|   +-+
+ *|  |
+ *v  |
+ * +-+   |
+ * | b2: |   |
+ * | def |<--+
+ * +-+
+ */
+static inline bool
+match_control_flow_graph_case_0 (basic_block b2, basic_block *b_out)
+{
+  *b_out = NULL;
+
+  if (EDGE_COUNT (b2->preds) != 2)
+return false;
+
+  basic_block pred_0 = EDGE_PRED (b2, 0)->src;
+  basic_block pred_1 = EDGE_PRED (b2, 1)->src;
+
+  if (pred_0 == NULL || pred_1 == NULL)
+return false;
+
+  if (!(EDGE_COUNT (pred_0->succs) == 2 && EDGE_COUNT (pred_1->succs) == 1)
+ && !(EDGE_COUNT (pred_0->succs) == 1 && EDGE_COUNT (pred_1->succs) == 2))
+return false;
+
+  basic_block b0 = EDGE_COUNT (pred_0->succs) == 2 ? pred_0 : pred_1;
+  basic_block b1 = EDGE_COUNT (pred_0->succs) == 1 ? pred_0 : pred_1;
+
+  if (EDGE_COUNT (b1->preds) != 1 || EDGE_PRED (b1, 0)->src != b0)
+return false;
+
+  *b_out = b0;
+  return true;
+}
+
+/*
+ * Return TRUE if the cfg matches the below layout by the given b3 in
+ * the first argument.  Or return FALSE.
+ *
+ * If return TRUE, the output argument b_out will be updated to the b0
+ 

[PATCH v3 3/5] Genmatch: Refine the gen_phi_on_cond by match_cond_with_binary_phi

2024-09-10 Thread pan2 . li
From: Pan Li 

This patch would like to leverage the match_cond_with_binary_phi to
match the phi on cond, and get the true/false arg if matched.  This
helps a lot to simplify the implementation of gen_phi_on_cond.

Before this patch:
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? 
_pb_0_1 : _pb_1_1;
basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) 
? _pb_1_1 : _pb_0_1;
gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
&& EDGE_COUNT (_other_db_1->succs) == 1
&& EDGE_PRED (_other_db_1, 0)->src == _db_1)
{
  tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
  tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
  tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
_cond_lhs_1, _cond_rhs_1);
  bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & 
EDGE_TRUE_VALUE;
  tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
  tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
...

After this patch:
basic_block _b1 = gimple_bb (_a1);
tree _p1, _p2;
gcond *_cond_1 = match_cond_with_binary_phi (_a1, &_p1, &_p2);
if (_cond_1 && _p1 && _p2)
  {
tree _cond_lhs_1 = gimple_cond_lhs (_cond_1);
tree _cond_rhs_1 = gimple_cond_rhs (_cond_1);
tree _p0 = build2 (gimple_cond_code (_cond_1), boolean_type_node, 
_cond_lhs_1, _cond_rhs_1);
...

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* genmatch.cc (dt_operand::gen_phi_on_cond): Leverage the
match_cond_with_binary_phi API to get cond gimple, true and
false TREE arg.

Signed-off-by: Pan Li 
---
 gcc/genmatch.cc | 67 +++--
 1 file changed, 15 insertions(+), 52 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index a56bd90cb2c..e3d2ecc6266 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3516,79 +3516,42 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int 
depth)
 void
 dt_operand::gen_phi_on_cond (FILE *f, int indent, int depth)
 {
-  fprintf_indent (f, indent,
-"basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
-
-  fprintf_indent (f, indent, "if (gimple_phi_num_args (_a%d) == 2)\n", depth);
+  char opname_0[20];
+  char opname_1[20];
+  char opname_2[20];
 
-  indent += 2;
-  fprintf_indent (f, indent, "{\n");
-  indent += 2;
+  gen_opname (opname_0, 0);
+  gen_opname (opname_1, 1);
+  gen_opname (opname_2, 2);
 
   fprintf_indent (f, indent,
-"basic_block _pb_0_%d = EDGE_PRED (_b%d, 0)->src;\n", depth, depth);
-  fprintf_indent (f, indent,
-"basic_block _pb_1_%d = EDGE_PRED (_b%d, 1)->src;\n", depth, depth);
-  fprintf_indent (f, indent,
-"basic_block _db_%d = safe_dyn_cast  (*gsi_last_bb (_pb_0_%d)) ? "
-"_pb_0_%d : _pb_1_%d;\n", depth, depth, depth, depth);
+"basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
+  fprintf_indent (f, indent, "tree %s, %s;\n", opname_1, opname_2);
   fprintf_indent (f, indent,
-"basic_block _other_db_%d = safe_dyn_cast  "
-"(*gsi_last_bb (_pb_0_%d)) ? _pb_1_%d : _pb_0_%d;\n",
-depth, depth, depth, depth);
+"gcond *_cond_%d = match_cond_with_binary_phi (_a%d, &%s, &%s);\n",
+depth, depth, opname_1, opname_2);
 
-  fprintf_indent (f, indent,
-"gcond *_ct_%d = safe_dyn_cast  (*gsi_last_bb (_db_%d));\n",
-depth, depth);
-  fprintf_indent (f, indent, "if (_ct_%d"
-" && EDGE_COUNT (_other_db_%d->preds) == 1\n", depth, depth);
-  fprintf_indent (f, indent,
-"  && EDGE_COUNT (_other_db_%d->succs) == 1\n", depth);
-  fprintf_indent (f, indent,
-"  && EDGE_PRED (_other_db_%d, 0)->src == _db_%d)\n", depth, depth);
+  fprintf_indent (f, indent, "if (_cond_%d && %s && %s)\n",
+depth, opname_1, opname_2);
 
   indent += 2;
   fprintf_indent (f, indent, "{\n");
   indent += 2;
 
   fprintf_indent (f, indent,
-"tree _cond_lhs_%d = gimple_cond_lhs (_ct_%d);\n", depth, depth);
+"tree _cond_lhs_%d = gimple_cond_lhs (_cond_%d);\n", depth, depth);
   fprintf_indent (f, indent,
-"tree _cond_rhs_%d = gimple_cond_rhs (_ct_%d);\n", depth, depth);
-
-  char opname_0[20];
-  char opname_1[20];
-  char opname_2[20];
-  gen_opname (opname_0, 0);
-
+"tree _cond_rhs_%d = gimple_cond_rhs (_cond_%d);\n", depth, depth);
   fprintf_indent (f, indent,
-"tree %s = build2 (gimple_cond_code (_ct_%d), "
+"tree %s = build2 (gimple_cond_code (_cond_%d), "
 "boolean_type_node, _cond_lhs_%d, _cond_rhs_%d);\n",
 opname_0, depth, depth, depth);
 
-  fprintf_indent (f, indent,
-"bool _arg_0_is_true_%d = gimple_phi_arg_edge (_a%d, 0)->flags"
-" & EDGE_TRUE_VALUE;\n", depth, depth);
-
-  ge

[PATCH v3 4/5] Match: Support form 3 for scalar signed integer .SAT_ADD

2024-09-10 Thread pan2 . li
From: Pan Li 

This patch would like to support the form 3 of the scalar signed
integer .SAT_ADD.  Aka below example:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)   \
  T __attribute__((noinline))\
  sat_s_add_##T##_fmt_3 (T x, T y)   \
  {  \
T sum;   \
bool overflow = __builtin_add_overflow (x, y, &sum); \
return overflow ? x < 0 ? MIN : MAX : sum;   \
  }

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)
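
As an aside (editor illustration, not part of the patch): on overflow the
saturated result depends only on the sign of x, via -(T)(x < 0) ^ MAX, which
is exactly the sequence visible in basic block 4 of the dump below.  The
small standalone C program here checks that; the helper name sat_bound_int8
is made up for illustration.

  #include <stdint.h>
  #include <stdio.h>

  /* x <  0:  -(int8_t)1 ^ INT8_MAX == -1 ^ 127 == -128 == INT8_MIN
     x >= 0:  -(int8_t)0 ^ INT8_MAX ==  0 ^ 127 ==  127 == INT8_MAX  */
  static int8_t
  sat_bound_int8 (int8_t x)
  {
    return (int8_t) (-(int8_t) (x < 0) ^ INT8_MAX);
  }

  int
  main (void)
  {
    printf ("%d %d\n", sat_bound_int8 (-100), sat_bound_int8 (100));
    /* Prints: -128 127 */
    return 0;
  }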

We can tell the difference before and after this patch if the backend
implemented the ssadd3 pattern similar to the one below.

Before this patch:
   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
   6   │ {
   7   │   signed char _1;
   8   │   signed char _2;
   9   │   int8_t _3;
  10   │   __complex__ signed char _6;
  11   │   _Bool _8;
  12   │   signed char _9;
  13   │   signed char _10;
  14   │   signed char _11;
  15   │
  16   │ ;;   basic block 2, loop depth 0
  17   │ ;;pred:   ENTRY
  18   │   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  19   │   _2 = IMAGPART_EXPR <_6>;
  20   │   if (_2 != 0)
  21   │     goto <bb 4>; [50.00%]
  22   │   else
  23   │     goto <bb 3>; [50.00%]
  24   │ ;;succ:   4
  25   │ ;;3
  26   │
  27   │ ;;   basic block 3, loop depth 0
  28   │ ;;pred:   2
  29   │   _1 = REALPART_EXPR <_6>;
  30   │   goto <bb 5>; [100.00%]
  31   │ ;;succ:   5
  32   │
  33   │ ;;   basic block 4, loop depth 0
  34   │ ;;pred:   2
  35   │   _8 = x_4(D) < 0;
  36   │   _9 = (signed char) _8;
  37   │   _10 = -_9;
  38   │   _11 = _10 ^ 127;
  39   │ ;;succ:   5
  40   │
  41   │ ;;   basic block 5, loop depth 0
  42   │ ;;pred:   3
  43   │ ;;4
  44   │   # _3 = PHI <_1(3), _11(4)>
  45   │   return _3;
  46   │ ;;succ:   EXIT
  47   │
  48   │ }

After this patch:
   4   │ __attribute__((noinline))
   5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
   6   │ {
   7   │   int8_t _3;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _3 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  12   │   return _3;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add the form 3 of signed .SAT_ADD matching.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4cef965c9c7..167b1b106dd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3237,6 +3237,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
@2)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type
 
+/* Signed saturation add, case 4:
+   Z = .ADD_OVERFLOW (X, Y)
+   SAT_S_ADD = IMAGPART_EXPR (Z) != 0 ? (-(T)(X < 0) ^ MAX) : sum;  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   (realpart @2))
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* Unsigned saturation sub, case 1 (branch with gt):
SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
-- 
2.43.0



[PATCH v3 2/5] Match: Add interface match_cond_with_binary_phi for true/false arg

2024-09-10 Thread pan2 . li
From: Pan Li 

When matching a cond with a 2-arg phi node, we need to figure out
which arg of the phi node comes from the true edge of the cond block
and which comes from the false edge.  This patch would like to add an
interface to perform that check and return the true and false args as trees.

There is some additional handling when one of the args is an INTEGER_CST,
because an INTEGER_CST arg may have no defining block and its edge
source then points to the condition block itself.  See line 31 in the
example below: the INTEGER_CST 255 has block 2 as its source.  Thus, we
need to look at the non-INTEGER_CST arg (aka _1) to tell which arg belongs
to the true/false edge.  For example, _1(3) takes block 3 as its source,
which is the destination of the false edge of the condition block.

   4   │ __attribute__((noinline))
   5   │ uint8_t sat_u_add_imm_type_check_uint8_t_fmt_2 (uint8_t x)
   6   │ {
   7   │   unsigned char _1;
   8   │   unsigned char _2;
   9   │   uint8_t _3;
  10   │   __complex__ unsigned char _5;
  11   │
  12   │ ;;   basic block 2, loop depth 0
  13   │ ;;pred:   ENTRY
  14   │   _5 = .ADD_OVERFLOW (x_4(D), 9);
  15   │   _2 = IMAGPART_EXPR <_5>;
  16   │   if (_2 != 0)
  17   │     goto <bb 4>; [35.00%]
  18   │   else
  19   │     goto <bb 3>; [65.00%]
  20   │ ;;succ:   3
  21   │ ;;4
  22   │
  23   │ ;;   basic block 3, loop depth 0
  24   │ ;;pred:   2
  25   │   _1 = REALPART_EXPR <_5>;
  26   │ ;;succ:   4
  27   │
  28   │ ;;   basic block 4, loop depth 0
  29   │ ;;pred:   2
  30   │ ;;3
  31   │   # _3 = PHI <255(2), _1(3)>
  32   │   return _3;
  33   │ ;;succ:   EXIT
  34   │
  35   │ }
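
As an editor illustration only (a minimal caller sketch, not part of the
patch, assuming the usual GCC internal headers; the wrapper name
phi_to_cond_triple is made up), a consumer such as the generated
gen_phi_on_cond code could use the new interface as follows:

  /* Rebuild the controlling compare of a 2-arg PHI as a GENERIC tree and
     hand back the values flowing in on the true and false edges.  */
  static bool
  phi_to_cond_triple (gphi *phi, tree *cond_p, tree *tval_p, tree *fval_p)
  {
    tree tval, fval;
    gcond *cond = match_cond_with_binary_phi (phi, &tval, &fval);
    if (!cond || !tval || !fval)
      return false;

    /* Same shape as the code emitted by gen_phi_on_cond.  */
    *cond_p = build2 (gimple_cond_code (cond), boolean_type_node,
                      gimple_cond_lhs (cond), gimple_cond_rhs (cond));
    *tval_p = tval;
    *fval_p = fval;
    return true;
  }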

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* gimple-match-head.cc (match_cond_with_binary_phi): Add new func
impl to match binary phi for true and false arg.

Signed-off-by: Pan Li 
---
 gcc/gimple-match-head.cc | 60 
 1 file changed, 60 insertions(+)

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index c51728ae742..64f4f28cc72 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -490,3 +490,63 @@ match_control_flow_graph_case_1 (basic_block b3, 
basic_block *b_out)
   *b_out = b0;
   return true;
 }
+
+/*
+ * Return the relevant gcond * of the given phi, together with the true
+ * and false TREE args of the phi, or return NULL.
+ *
+ * If the gcond * is matched, the output arguments true_arg and false_arg
+ * will be updated to the relevant args of the phi.
+ *
+ * If the match fails, NULL will be returned and both output arguments
+ * will be set to NULL_TREE.
+ */
+
+static inline gcond *
+match_cond_with_binary_phi (gphi *phi, tree *true_arg, tree *false_arg)
+{
+  basic_block cond_block;
+  *true_arg = *false_arg = NULL_TREE;
+
+  if (gimple_phi_num_args (phi) != 2)
+return NULL;
+
+  if (!match_control_flow_graph_case_0 (gimple_bb (phi), &cond_block)
+  && !match_control_flow_graph_case_1 (gimple_bb (phi), &cond_block))
+return NULL;
+
+  gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (cond_block));
+
+  if (!cond || EDGE_COUNT (cond_block->succs) != 2)
+return NULL;
+
+  tree t0 = gimple_phi_arg_def (phi, 0);
+  tree t1 = gimple_phi_arg_def (phi, 1);
+  edge e0 = gimple_phi_arg_edge (phi, 0);
+  edge e1 = gimple_phi_arg_edge (phi, 1);
+
+  if (TREE_CODE (t0) == INTEGER_CST && TREE_CODE (t1) == INTEGER_CST)
+return NULL;
+
+  bool arg_0_cst_p = TREE_CODE (t0) == INTEGER_CST;
+  edge arg_edge = arg_0_cst_p ? e1 : e0;
+  tree arg = arg_0_cst_p ? t1 : t0;
+  tree other_arg = arg_0_cst_p ? t0 : t1;
+
+  edge cond_e0 = EDGE_SUCC (cond_block, 0);
+  edge cond_e1 = EDGE_SUCC (cond_block, 1);
+  edge matched_edge = arg_edge->src == cond_e0->dest ? cond_e0 : cond_e1;
+
+  if (matched_edge->flags & EDGE_TRUE_VALUE)
+{
+  *true_arg = arg;
+  *false_arg = other_arg;
+}
+  else
+{
+  *false_arg = arg;
+  *true_arg = other_arg;
+}
+
+  return cond;
+}
-- 
2.43.0



[PATCH v3 5/5] RISC-V: Fix vector SAT_ADD dump check due to middle-end change

2024-09-10 Thread pan2 . li
From: Pan Li 

This patch would like to fix the dump check times of vector SAT_ADD.  The
middle-end change increases the match times from 2 to 4.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Adjust
the dump check times from 2 to 4.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c   | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c| 2 +-
 16 files changed, 16 insertions(+), 16 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c
index c525ba97c52..47dd5012cc6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint8_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c
index 41372d08e52..df8d5a8d275 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint16_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c
index dddebb54426..f286bd10e4b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint32_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c
index ad5162d10a0..307ff36cc35 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c
@@ -15,4 +15,4 @@
 */
 DEF_VEC_SAT_U_ADD_FMT_6(uint64_t)
 
-/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c
index 39c20b3cea6..3218962724c 10064