Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-16 Thread Richard Biener
On Thu, May 16, 2024 at 8:25 AM Hongyu Wang  wrote:
>
> Hi,
>
> In ix86_override_options_after_change, calls to ix86_default_align
> and ix86_recompute_optlev_based_flags will cause mismatched target
> opt_set when doing cl_optimization_restore. Move them back to
> ix86_option_override_internal to solve the issue.
>
> Bootstrapped & regtested on x86_64-pc-linux-gnu, and Rainer helped to
> test with i386-pc-solaris2.11 which also passed 32/64bit tests.

Since this is a tricky area, apparently without much test coverage, can
we have a testcase for this?

> Ok for trunk and backport down to gcc12?
>
> gcc/ChangeLog:
>
> PR target/113719
> * config/i386/i386-options.cc (ix86_override_options_after_change):
> Remove call to ix86_default_align and
> ix86_recompute_optlev_based_flags.
> (ix86_option_override_internal): Call ix86_default_align and
> ix86_recompute_optlev_based_flags.
> ---
>  gcc/config/i386/i386-options.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index ac48b5c61c4..d97464f2c74 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1930,11 +1930,6 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts,
>  void
>  ix86_override_options_after_change (void)
>  {
> -  /* Default align_* from the processor table.  */
> -  ix86_default_align (&global_options);
> -
> -  ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
> -
>    /* Disable unrolling small loops when there's explicit
>       -f{,no}unroll-loop.  */
>    if ((OPTION_SET_P (flag_unroll_loops))
> @@ -2530,6 +2525,8 @@ ix86_option_override_internal (bool main_args_p,
>
>    set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
>
> +  ix86_recompute_optlev_based_flags (opts, opts_set);
> +
>    ix86_override_options_after_change ();
>
>    ix86_tune_cost = processor_cost_table[ix86_tune];
> @@ -2565,6 +2562,9 @@ ix86_option_override_internal (bool main_args_p,
>|| TARGET_64BIT_P (opts->x_ix86_isa_flags))
>  opts->x_ix86_regparm = REGPARM_MAX;
>
> +  /* Default align_* from the processor table.  */
> +  ix86_default_align (&global_options);
> +
>    /* Provide default for -mbranch-cost= value.  */
>    SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost,
>                         ix86_tune_cost->branch_cost);
> --
> 2.31.1
>
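
The fix above revolves around GCC's two parallel option structures: the option values (opts) and the record of which options the user set explicitly (opts_set), which the quoted hunk consults through SET_OPTION_IF_UNSET for -mbranch-cost=. The following is a toy model of that default-if-unset idiom, not GCC code; the struct names, field names and the TOY_SET_OPTION_IF_UNSET macro are hypothetical simplifications.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for gcc_options and the parallel structure
   recording which options the user set explicitly.  */
struct toy_options
{
  int x_branch_cost;
  int x_align_loops;
};

struct toy_options_set
{
  bool x_branch_cost;
  bool x_align_loops;
};

/* Same shape as GCC's SET_OPTION_IF_UNSET: write VALUE only when the
   option was not set explicitly.  */
#define TOY_SET_OPTION_IF_UNSET(OPTS, OPTS_SET, OPTION, VALUE) \
  do                                                           \
    {                                                          \
      if (!(OPTS_SET)->x_##OPTION)                             \
        (OPTS)->x_##OPTION = (VALUE);                          \
    }                                                          \
  while (0)

/* Apply per-processor tuning defaults without clobbering user choices.  */
static void
toy_apply_tune_defaults (struct toy_options *opts,
                         const struct toy_options_set *opts_set,
                         int tune_branch_cost, int tune_align_loops)
{
  TOY_SET_OPTION_IF_UNSET (opts, opts_set, branch_cost, tune_branch_cost);
  TOY_SET_OPTION_IF_UNSET (opts, opts_set, align_loops, tune_align_loops);
}
```

With opts_set marking only the branch cost as user-set, an explicit branch cost of 7 survives while the unset alignment picks up the tuning default.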


[PATCH] RISC-V: testsuite: Drop march-string in cpymemsi-1.c

2024-05-16 Thread Christoph Müllner
The test cpymemsi-1.c is a "dg-do run" test, which does not have
any restrictions for the enabled extensions.
Let's drop the "gc" requirement, so that the test can also be
executed on non-f and non-d targets.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymemsi-1.c: Drop march-string.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.target/riscv/cpymemsi-1.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
index 983b564ccaf..aee54d9aa00 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-march=rv32gc -save-temps -g0 -fno-lto" { target { rv32 } } } */
-/* { dg-options "-march=rv64gc -save-temps -g0 -fno-lto" { target { rv64 } } } */
+/* { dg-options "-save-temps -g0 -fno-lto" } */
 /* { dg-additional-options "-DRUN_FRACTION=11" { target simulator } } */
 /* { dg-timeout-factor 2 } */
 
-- 
2.44.0



[PATCH][v2] tree-optimization/79958 - make DSE track multiple paths

2024-05-16 Thread Richard Biener
DSE currently gives up when the path we analyze forks.  This leads
to multiple missed dead store elimination PRs.  The following fixes
this by recursing for each path and maintaining the visited bitmap
to avoid visiting CFG re-merges multiple times.  The overall cost
is still limited by the same bound, it's just more likely we'll hit
the limit now.  The patch doesn't try to deal with byte tracking
once a path forks but drops that info on the floor and only handles
fully dead stores in that case.
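
The multi-path walk described above can be sketched on a toy CFG: recurse at each fork, share one visited bitmap so blocks where paths re-merge are walked only once, and charge every block against a single budget. This is a hedged illustration of the idea, not the code in tree-ssa-dse.cc; the toy_* types and functions are hypothetical, and, like the patch once a path forks, the sketch only decides whether the whole store is dead, without byte tracking.

```c
#include <stdbool.h>

/* Hypothetical toy CFG: each block uses the stored location, overwrites
   (kills) it, or does neither.  At most TOY_MAX_BLOCKS blocks.  */
enum toy_effect { TOY_NONE, TOY_USE, TOY_KILL };

#define TOY_MAX_BLOCKS 32

struct toy_block
{
  enum toy_effect effect;
  int nsucc;                    /* 0 means the block exits the function.  */
  int succ[2];
};

/* Return true if some path starting at block B reaches a use of the
   location before a kill.  At a fork we recurse into every successor.
   VISITED is shared by all paths, so a block where paths re-merge is
   walked only once; *BUDGET bounds the total number of blocks walked.  */
static bool
toy_reaches_use (const struct toy_block *cfg, int b, bool *visited,
                 int *budget)
{
  if (visited[b])
    return false;               /* Already walked (or a loop back edge).  */
  visited[b] = true;
  if (--*budget < 0)
    return true;                /* Limit hit: conservatively assume a use.  */
  if (cfg[b].effect == TOY_USE)
    return true;
  if (cfg[b].effect == TOY_KILL)
    return false;
  for (int i = 0; i < cfg[b].nsucc; i++)
    if (toy_reaches_use (cfg, cfg[b].succ[i], visited, budget))
      return true;
  return false;                 /* Every path from B exits or is killed.  */
}

/* A store at the end of block STORE_BB to a non-escaping local is dead
   if no path from the block's successors reaches a use.  */
static bool
toy_store_is_dead (const struct toy_block *cfg, int store_bb, int budget)
{
  bool visited[TOY_MAX_BLOCKS] = { false };
  /* Assume the store is the block's only relevant statement, so reaching
     the block again just re-executes (kills with) the store.  */
  visited[store_bb] = true;
  for (int i = 0; i < cfg[store_bb].nsucc; i++)
    if (toy_reaches_use (cfg, cfg[store_bb].succ[i], visited, &budget))
      return false;
  return true;
}
```

For a shape like ssa-dse-48.c (the store to b[0], then a fork on foo () whose arms re-merge before the return), the walk visits the join block once and reports the store as dead; with a budget of 1 it gives up and keeps the store.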

This version adds some testsuite adjustments to avoid regressions.
Will push after retesting completed.

Richard.

PR tree-optimization/79958
PR tree-optimization/109087
PR tree-optimization/100314
PR tree-optimization/114774
* tree-ssa-dse.cc (dse_classify_store): New forwarder.
(dse_classify_store): Add arguments cnt and visited, recurse
to track multiple paths when we end up with multiple defs.

* gcc.dg/tree-ssa/ssa-dse-48.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-49.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-50.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-51.c: Likewise.
* gcc.dg/graphite/pr80906.c: Avoid DSE of last data reference
in loop.
* g++.dg/ipa/devirt-24.C: Adjust for extra DSE.
* g++.dg/warn/Wuninitialized-pr107919-1.C: Use more important
-O2 optimization level, -O1 regresses.
---
 gcc/testsuite/g++.dg/ipa/devirt-24.C  |  4 ++-
 .../g++.dg/warn/Wuninitialized-pr107919-1.C   |  2 +-
 gcc/testsuite/gcc.dg/graphite/pr80906.c   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c| 17 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c| 18 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c| 25 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c| 24 ++
 gcc/tree-ssa-dse.cc   | 31 ---
 8 files changed, 116 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c

diff --git a/gcc/testsuite/g++.dg/ipa/devirt-24.C b/gcc/testsuite/g++.dg/ipa/devirt-24.C
index 7b5b806dd05..333c03cd8dd 100644
--- a/gcc/testsuite/g++.dg/ipa/devirt-24.C
+++ b/gcc/testsuite/g++.dg/ipa/devirt-24.C
@@ -37,4 +37,6 @@ C *b = new (C);
   }
 }
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target" 1 "inline" { xfail *-*-* } } } */
-/* { dg-final { scan-ipa-dump-times "Aggregate passed by reference" 2 "cp"  } } */
+/* We used to have IPA CP see two aggregates passed to sort() but as the
+   first argument is unused DSE now elides the vptr initialization.  */
+/* { dg-final { scan-ipa-dump-times "Aggregate passed by reference" 1 "cp"  } } */
diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C b/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
index dd631dc8bfe..067a44a462e 100644
--- a/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
@@ -1,6 +1,6 @@
 // { dg-do compile }
 // { dg-require-effective-target c++17 }
-// { dg-options "-O -Wuninitialized" }
+// { dg-options "-O2 -Wuninitialized" }
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/graphite/pr80906.c b/gcc/testsuite/gcc.dg/graphite/pr80906.c
index 59c7f59cadf..ec3840834fc 100644
--- a/gcc/testsuite/gcc.dg/graphite/pr80906.c
+++ b/gcc/testsuite/gcc.dg/graphite/pr80906.c
@@ -18,7 +18,7 @@ ec (int lh[][2])
  --bm;
if (bm != 0)
  --c5;
-   lh[0][0] = 0;
+   lh[hp][0] = 0;
m3 *= jv;
   }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
new file mode 100644
index 000..edfc62c7e4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+int a;
+int foo (void);
+int bar (void);
+
+void
+baz (void)
+{
+  int *b[6];
+  b[0] = &a;
+  if (foo ())
+    a |= bar ();
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store: b\\\[0\\\] = &a;" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
new file mode 100644
index 000..1eec284a415
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-dce -fdump-tree-dse1-details" } */
+
+struct X { int i; };
+void bar ();
+void foo (int b)
+{
+  struct X x;
+  x.i = 1;
+  if (b)
+    {
+      bar ();
+      __builtin_abort ();
+    }
+  bar ();
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store: x.i = 1;" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c
new file mode 100644
i

Re: [PATCH 1/2] RISC-V: Add tests for cpymemsi expansion

2024-05-16 Thread Christoph Müllner
On Wed, May 15, 2024 at 10:22 PM Patrick O'Neill  wrote:
>
>
> On 5/14/24 22:00, Christoph Müllner wrote:
>
> On Fri, May 10, 2024 at 6:01 AM Patrick O'Neill  wrote:
>
> Hi Christoph,
>
> cpymemsi-1.c fails on a subset of newlib targets.
>
> "UNRESOLVED: gcc.target/riscv/cpymemsi-1.c   -O0  compilation failed to
> produce executable"
>
> Full list of failing targets here (New Failures section):
> https://github.com/patrick-rivos/gcc-postcommit-ci/issues/906
>
> Thanks for reporting!
> I'm having a hard time figuring out what the issue is here, as I can't
> reproduce it locally.
> This test is an execution test ("dg-do run"), so I wonder if this
> might be the issue?
>
> riscv-gnu-toolchain configure command: ../configure --prefix=$(pwd) -with-arch=rv32imac_zba_zbb_zbc_zbs -with-abi=ilp32
>
> Here's the verbose logs:
>
> Executing on host: 
> /scratch/tc-testing/tc-upstream/build/build-gcc-newlib-stage2/gcc/xgcc 
> -B/scratch/tc-testing/tc-upstream/build/build-gcc-newlib-stage2/gcc/  
> /scratch/tc-testing/tc-upstream/gcc/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
>   -march=rv32imac_zba_zbb_zbc_zbs -mabi=ilp32 -mcmodel=medlow   
> -fdiagnostics-plain-output -O0  -march=rv32gc -save-temps -g0 -fno-lto
> -DRUN_FRACTION=11  -lm  -o ./cpymemsi-1.exe(timeout = 1200)
> spawn -ignore SIGHUP 
> /scratch/tc-testing/tc-upstream/build/build-gcc-newlib-stage2/gcc/xgcc 
> -B/scratch/tc-testing/tc-upstream/build/build-gcc-newlib-stage2/gcc/ 
> /scratch/tc-testing/tc-upstream/gcc/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
>  -march=rv32imac_zba_zbb_zbc_zbs -mabi=ilp32 -mcmodel=medlow 
> -fdiagnostics-plain-output -O0 -march=rv32gc -save-temps -g0 -fno-lto 
> -DRUN_FRACTION=11 -lm -o ./cpymemsi-1.exe
> xgcc: fatal error: Cannot find suitable multilib set for '-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32'
> compilation terminated.
> compiler exited with status 1
> FAIL: gcc.target/riscv/cpymemsi-1.c   -O0  (test for excess errors)
>
> Looks like it's only failing on targets without the 'f' extension so maybe we 
> need to add a riscv_f to avoid running on non-f targets (similar to what we 
> have for riscv_v)?

Ok, now I understand what's going on.
For "dg-do run" tests we should be more liberal with the provided
`-march` string in dg-options
(or be more restrictive using effective-target checks if necessary -
which is not the case here).
I've sent out the following patch, which should address this issue:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651841.html

BR
Christoph


Re: [PATCH] RISC-V: testsuite: Drop march-string in cpymemsi-1.c

2024-05-16 Thread Kito Cheng
Just one minor question

> diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> index 983b564ccaf..aee54d9aa00 100644
> --- a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> @@ -1,6 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-march=rv32gc -save-temps -g0 -fno-lto" { target { rv32 } } } */
> -/* { dg-options "-march=rv64gc -save-temps -g0 -fno-lto" { target { rv64 } } } */
> +/* { dg-options "-save-temps -g0 -fno-lto" } */

I know -save-temps -g0 already exists, but I am wondering why we need
those 2 options here?


Re: [PATCH] RISC-V: testsuite: Drop march-string in cpymemsi-1.c

2024-05-16 Thread Christoph Müllner
On Thu, May 16, 2024 at 10:03 AM Kito Cheng  wrote:
>
> Just one minor question
>
> > diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> > index 983b564ccaf..aee54d9aa00 100644
> > --- a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> > +++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> > @@ -1,6 +1,5 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-march=rv32gc -save-temps -g0 -fno-lto" { target { rv32 } } } */
> > -/* { dg-options "-march=rv64gc -save-temps -g0 -fno-lto" { target { rv64 } } } */
> > +/* { dg-options "-save-temps -g0 -fno-lto" } */
>
> I know -save-temps -g0 already exists, but I am wondering why we need
> those 2 options here?

I copied from gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c.
But you are right, we could drop these as well
(gcc/testsuite/gcc.target/powerpc/block-cmp-8.c also did so).
I'll retest/resend without the whole dg-options line.


Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Richard Biener
On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
>
> > LGTM but you'll need an OK from Richard,
> > Thanks for working on this!
>
> Thanks Tamar for the help and coaching; let's wait for Richard for a while 😊!

OK.

Thanks for the patience,
Richard.

> Pan
>
> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; Liu, Hongtao
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
>
> Hi Pan,
>
> Thanks!
>
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for saturating
> > addition, i.e. the result of the add is set to the maximum value when
> > it overflows.  It matches a pattern similar to the one below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Taking uint8_t as an example, we get:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given the example below for the unsigned scalar integer type uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > The following tests pass with this patch:
> > 1. The full riscv regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The full x86 regression tests.
> >
> >   PR target/51492
> >   PR target/112600
> >
> > gcc/ChangeLog:
> >
> >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> >   to the return true switch case(s).
> >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> >   * match.pd: Add unsigned SAT_ADD match(es).
> >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> >   us/ssadd.
> >   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> >   extern func decl generated in match.pd match.
> >   (match_saturation_arith): New func impl to match the saturation arith.
> >   (math_opts_dom_walker::after_dom_children): Try match saturation
> >   arith when IOR expr.
> >
>
>  LGTM but you'll need an OK from Richard,
>
> Thanks for working on this!
>
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/internal-fn.cc|  1 +
> >  gcc/internal-fn.def   |  2 ++
> >  gcc/match.pd  | 51 +++
> >  gcc/optabs.def|  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >  case IFN_UBSAN_CHECK_MUL:
> >  case IFN_ADD_OVERFLOW:
> >  case IFN_MUL_OVERFLOW:
> > +case IFN_SAT_ADD:
> >  case IFN_VEC_WIDEN_PLUS:
> >  case IFN_VEC_WIDEN_PLUS_LO:
> >  case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
> >                                smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
> > +
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gc
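
The saturating-add idiom described in the SAT_ADD patch can be checked in plain C. This sketch is an illustration, not part of the patch: sat_add_u8 and sat_add_u64 follow the quoted (x + y) | -((x + y) < x) form, while sat_add_u64_builtin spells the overflow test with GCC's __builtin_add_overflow, which is what the .ADD_OVERFLOW call in the "before" dump corresponds to. Whether the new match.pd pattern also recognizes that third spelling is not claimed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* uint8_t instance of the quoted idiom.  If x + y wraps, the truncated
   sum is smaller than x, the comparison yields 1, its negation is
   all-ones, and the OR saturates the result to 255.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | (uint8_t) -(sum < x));
}

/* uint64_t version, written as in the patch description.  */
static uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* Same semantics via GCC's __builtin_add_overflow, which yields the
   wrapped sum and an overflow flag, much like the REALPART_EXPR and
   IMAGPART_EXPR of .ADD_OVERFLOW in the "before" dump.  */
static uint64_t
sat_add_u64_builtin (uint64_t x, uint64_t y)
{
  uint64_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return sum | -(uint64_t) overflow;
}
```

For example, sat_add_u8 (2, 255) wraps to a sum of 1, which is below 2, so the result saturates to 255, matching the uint8_t table in the patch description.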

Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]

2024-05-16 Thread Richard Biener
On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis  wrote:
>
> New patch with the requested changes can be found below.
>
> I don't know how much this affects SCEV, but I do believe that we
> should incorporate this change somehow. I've seen various cases of
> suboptimal address calculation codegen that boil down to this.

This misses the ChangeLog (I assume it's unchanged) and the indentation
of the match.pd part is now off.

Please fix that, the patch is OK with that change.

Thanks,
Richard.

> gcc/match.pd | 31 +++
> gcc/testsuite/gcc.dg/pr109393.c | 16 
> 2 files changed, 47 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..1d642c205f0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (plus (convert @0) (op @2 (convert @1))
> #endif
> +/* ((T)(A + CST1)) * CST2 + CST3
> + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> + Where (A + CST1) doesn't need to have a single use. */
> +#if GIMPLE
> + (for op (plus minus)
> + (simplify
> + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> + INTEGER_CST@3)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_WRAPS (type))
> + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)
> +#endif
> +
> +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> +#if GIMPLE
> + (for op (plus minus)
> + (simplify
> + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_WRAPS (type))
> + (op (mult (convert @0) @2) (mult (convert @1) @2)
> +#endif
> +
> /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> to a simple value. */
> (for op (plus minus)
> diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> new file mode 100644
> index 000..e9051273672
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109393.c
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/109393 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +
> +int foo(int *a, int j)
> +{
> + int k = j - 1;
> + return a[j - 1] == a[k];
> +}
> +
> +int bar(int *a, int j)
> +{
> + int k = j - 1;
> + return (&a[j + 1] - 2) == &a[k];
> +}
> --
> 2.44.0
>
>
> On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis  wrote:
> >
> > The original motivation for this pattern was that the following function does
> > not fold to 'return 1':
> >
> > int foo(int *a, int j)
> > {
> >   int k = j - 1;
> >   return a[j - 1] == a[k];
> > }
> >
> > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > address calculations (e.g. arrays). These patterns help fold and simplify more
> > expressions.
> >
> > PR tree-optimization/109393
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> >   ((T)(A +- CST1)) * CST2 + CST3.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/pr109393.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/match.pd| 30 ++
> >  gcc/testsuite/gcc.dg/pr109393.c | 16 
> >  2 files changed, 46 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..13c828ba70d 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (plus (convert @0) (op @2 (convert @1))
> >  #endif
> >
> > +/* ((T)(A + CST1)) * CST2 + CST3
> > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > +   Where (A + CST1) doesn't need to have a single use.  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +(plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > + && TREE_CODE (type) == INTEGER_TYPE
> > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_WRAPS (type))
> > +   (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)
> > +#endif
> > +
> > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +(mult (
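
The PR109393 rewrite relies on the multiply distributing over the widened operands. A hedged C illustration, assuming int32_t as the narrow signed type of A and uint64_t as the wrapping wide type T; c1, c2 and c3 play the roles of CST1, CST2 and CST3, and the function names are hypothetical.

```c
#include <stdint.h>

/* Input shape: narrow signed add, widen to a wrapping type, then scale
   and offset: ((T)(A + CST1)) * CST2 + CST3.  */
static uint64_t
expr_before_fold (int32_t a, int32_t c1, uint64_t c2, uint64_t c3)
{
  return (uint64_t) (a + c1) * c2 + c3;
}

/* Output shape: ((T)A * CST2) + ((T)CST1 * CST2 + CST3).  In the
   match.pd pattern the parenthesized part involves only INTEGER_CSTs,
   so it folds to a single constant.  */
static uint64_t
expr_after_fold (int32_t a, int32_t c1, uint64_t c2, uint64_t c3)
{
  return (uint64_t) a * c2 + ((uint64_t) c1 * c2 + c3);
}
```

Both forms agree whenever a + c1 does not overflow int32_t, which the pattern may assume because signed overflow is undefined (the TYPE_OVERFLOW_UNDEFINED check). With a wrapping inner type the identity breaks: for uint32_t a = UINT32_MAX and c1 = 1, (uint64_t) (a + c1) is 0 while (uint64_t) a + c1 is 2^32. Passing c3 = 0 gives the pattern without CST3, and the minus variant covered by (for op (plus minus)) distributes the same way.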

Re: [PATCH v3] driver: Output to a temp file; rename upon success [PR80182]

2024-05-16 Thread Richard Biener
On Sun, May 12, 2024 at 3:40 PM Peter Damianov  wrote:
>
> Currently, commands like:
> gcc -o file.c -lm
> will delete the user's code.
>
> This patch makes the linker write executables to a temp file, and then renames
> the temp file if successful. This fixes the case above, but has limitations.
> The source file will still get overwritten if the link "succeeds", such as the
> case of: gcc -o file.c -lm -r
>
> It's not perfect, but it should hopefully stop some people from ruining their
> day.

Hmm.  When suggesting this I was originally hoping for this to be implemented
in the linker so that it delays opening (and truncating) of the output
file as much as possible.

If we want to do something in the compiler driver then I like the filename based
heuristics more.  v3 seems to only address the case of -o specifying the linker
output file but of course

gcc -c t.c -o t2.c

or

gcc -S t.c -o t2.c

happily overwrite a source file as well.  For these cases, heuristically
rejecting source file patterns would be better.  As we've shown, the rename
trick applied when the link was successful doesn't fully solve the issue.
And I bet some people will claim it isn't an issue at all ...

That is, I do think the linker itself, as a quality of implementation issue,
should avoid truncating the output early.  In fact the BFD linker seems to
unlink the output very early:

24937 stat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
24937 lstat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
24937 unlink("t.c") = 0
24937 openat(AT_FDCWD, "t.c", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3

before even opening other inputs or the default linker script.

Richard.

> gcc/ChangeLog:
> PR driver/80182
> * gcc.cc (output_file_temp): New global variable
> (driver_handle_option): Create temp file for executable output
> (driver::maybe_run_linker): Rename output_file_temp to output_file if
> the linker ran successfully
>
> Signed-off-by: Peter Damianov 
> ---
>
> v3: don't attempt to create temp files -> rename for -o /dev/null
>
>  gcc/gcc.cc | 53 +
>  1 file changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..5e38c6e578a 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -2138,6 +2138,11 @@ static int have_E = 0;
>  /* Pointer to output file name passed in with -o. */
>  static const char *output_file = 0;
>
> +/* We write the output file to a temp file, and rename it if linking
> +   is successful. This is to prevent mistakes like: gcc -o file.c -lm from
> +   deleting the user's code.  */
> +static const char *output_file_temp = 0;
> +
>  /* Pointer to input file name passed in with -truncate.
> This file should be truncated after linking. */
>  static const char *totruncate_file = 0;
> @@ -4610,10 +4615,18 @@ driver_handle_option (struct gcc_options *opts,
>  #if defined(HAVE_TARGET_EXECUTABLE_SUFFIX) || defined(HAVE_TARGET_OBJECT_SUFFIX)
>arg = convert_filename (arg, ! have_c, 0);
>  #endif
> -  output_file = arg;
> +  output_file_temp = output_file = arg;
> +  /* If creating an executable, create a temp file for the output, unless
> +     -o /dev/null was requested. This will later get renamed, if the linker
> +     succeeds.  */
> +  if (!have_c && strcmp (output_file, HOST_BIT_BUCKET) != 0)
> +    {
> +      output_file_temp = make_temp_file ("");
> +      record_temp_file (output_file_temp, false, true);
> +    }
>/* On some systems, ld cannot handle "-o" without a space.  So
>  split the option from its argument.  */
> -  save_switch ("-o", 1, &arg, validated, true);
> +  save_switch ("-o", 1, &output_file_temp, validated, true);
>return true;
>
>  case OPT_pie:
> @@ -9266,22 +9279,30 @@ driver::maybe_run_linker (const char *argv0) const
>linker_was_run = (tmp != execution_count);
>  }
>
> -  /* If options said don't run linker,
> - complain about input files to be given to the linker.  */
> -
> -  if (! linker_was_run && !seen_error ())
> -for (i = 0; (int) i < n_infiles; i++)
> -  if (explicit_link_files[i]
> - && !(infiles[i].language && infiles[i].language[0] == '*'))
> +  if (!seen_error ())
> +{
> +  if (linker_was_run)
> +   /* If the linker finished without errors, rename the output from the
> +  temporary file to the real output name.  */
> +   rename (output_file_temp, output_file);
> +  else
> {
> - warning (0, "%s: linker input file unused because linking not done",
> -  outfiles[i]);
> - if (access (outfiles[i], F_OK) < 0)
> -   /* This is can be an indication the user specifed an errorneous
> -  separated option value, (or used the wrong prefix for an
> -  option).  */
> -   error ("%s: linker input file not found: %m", outfiles[i]

Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Richard Biener
On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
>
> On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis  wrote:
> >
> > If we consider code like:
> >
> > if (bar1 == x)
> >   return foo();
> > if (bar2 != y)
> >   return foo();
> > return 0;
> >
> > We would like the ifcombine pass to convert this to:
> >
> > if (bar1 == x || bar2 != y)
> >   return foo();
> > return 0;
> >
> > The ifcombine pass can handle this transformation, but it is run very early
> > and it misses the opportunity because there are two separate blocks for foo().
> > The pre pass is good at removing duplicate code and blocks, and due to that,
> > running ifcombine again after it can increase the number of successful
> > conversions.
>
> I do think we should have something similar to re-running
> ssa-ifcombine but I think it should be much later, like after the loop
> optimizations are done.
> Maybe just a simplified version of it (that does the combining and not
> the optimizations part) included in isel or pass_optimize_widening_mul
> (which itself should most likely become part of isel or renamed since
> it handles more than just widening multiply these days).

I've long wished we had a (late?) pass that can also undo if-conversion
(basically do what RTL expansion would later do).  Maybe
gimple-predicate-analysis.cc (what's used by uninit analysis) can
represent mixed CFG + if-converted conditions so we can optimize
it and code-gen the condition in a more optimal manner much like
we have if-to-switch, switch-conversion and switch-expansion.

That said, I agree that re-running ifcombine should be later.  And there's
still the old task of splitting tail-merging from PRE (and possibly making
it more effective).

Richard.

>
> Thanks,
> Andrew Pinski
>
>
> >
> > PR 102793
> >
> > gcc/ChangeLog:
> >
> > * common.opt: -ftree-ifcombine option, enabled by default.
> > * doc/invoke.texi: Document.
> > * passes.def: Re-run ssa-ifcombine after pre.
> > * tree-ssa-ifcombine.cc: Make ifcombine cloneable. Add gate function.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/20030922-2.c: Change flag to -fno-tree-ifcombine.
> > * gcc.dg/uninit-pred-6_c.c: Remove inconsistent check.
> > * gcc.target/aarch64/pr102793.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/common.opt  |  4 +++
> >  gcc/doc/invoke.texi |  5 
> >  gcc/passes.def  |  1 +
> >  gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c  |  2 +-
> >  gcc/testsuite/gcc.dg/uninit-pred-6_c.c  |  4 ---
> >  gcc/testsuite/gcc.target/aarch64/pr102793.c | 30 +
> >  gcc/tree-ssa-ifcombine.cc   |  5 
> >  7 files changed, 46 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102793.c
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index ad348844775..e943202bcf1 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3163,6 +3163,10 @@ ftree-phiprop
> >  Common Var(flag_tree_phiprop) Init(1) Optimization
> >  Enable hoisting loads from conditional pointers.
> >
> > +ftree-ifcombine
> > +Common Var(flag_tree_ifcombine) Init(1) Optimization
> > +Merge some conditional branches to simplify control flow.
> > +
> >  ftree-pre
> >  Common Var(flag_tree_pre) Optimization
> >  Enable SSA-PRE optimization on trees.
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index e2edf7a6c13..8d2ff6b4512 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13454,6 +13454,11 @@ This flag is enabled by default at @option{-O1} and higher.
> >  Perform hoisting of loads from conditional pointers on trees.  This
> >  pass is enabled by default at @option{-O1} and higher.
> >
> > +@opindex ftree-ifcombine
> > +@item -ftree-ifcombine
> > +Merge some conditional branches to simplify control flow.  This pass
> > +is enabled by default at @option{-O1} and higher.
> > +
> >  @opindex fhoist-adjacent-loads
> >  @item -fhoist-adjacent-loads
> >  Speculatively hoist loads from both branches of an if-then-else if the
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 1cbbd413097..1765b476131 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -270,6 +270,7 @@ along with GCC; see the file COPYING3.  If not see
> >NEXT_PASS (pass_lim);
> >NEXT_PASS (pass_walloca, false);
> >NEXT_PASS (pass_pre);
> > +  NEXT_PASS (pass_tree_ifcombine);
> >NEXT_PASS (pass_sink_code, false /* unsplit edges */);
> >NEXT_PASS (pass_sancov);
> >NEXT_PASS (pass_asan);
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> > index 16c79da9521..66c9f481a2f 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> > @@ -1,5 +1,5 @@
> 

Re: [NOT CODE REVIEW] [PATCH v3 1/1] [RISC-V] Add support for _Bfloat16

2024-05-16 Thread Kito Cheng
Hi Xiao Zeng:

Just wondering: why use _Bfloat16 rather than __bf16? You mention
__bf16 in a comment, but the implementation uses _Bfloat16. I would like to
use __bf16 to keep it consistent with LLVM and the psABI if possible :)


[PATCH] RISC-V: testsuite: Drop march-string in cmpmemsi/cpymemsi tests

2024-05-16 Thread Christoph Müllner
The tests cmpmemsi-1.c and cpymemsi-1.c are execution ("dg-do run")
tests, which do not have any restrictions on the enabled extensions.
Further, none of the other listed options are required.
Let's drop the options, so that the tests can also be executed on
non-f and non-d targets.  However, we need to set options to the
defaults without '-ansi', because the included test file uses the
'asm' keyword, which is not part of ANSI C.
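For illustration, a minimal sketch of the keyword issue (not taken from memcmp-1.c; the function name is hypothetical). The GCC manual documents that -ansi, like the strict -std= modes, turns off the `asm` keyword, while the alternate spelling `__asm__` is accepted in every mode:

```c
/* Accepted by GCC in every -std mode, including -ansi -pedantic-errors,
   because it uses the double-underscore spelling.  The same statement
   written as `asm volatile ("" ::: "memory");` fails to compile under
   -ansi, where `asm` is not recognized as a keyword. */
static int add_with_barrier(int a, int b)
{
    /* Empty asm with a "memory" clobber: it emits no instructions and
       only prevents the compiler from moving memory accesses across it. */
    __asm__ __volatile__("" ::: "memory");
    return a + b;
}
```

A test source that spells the keyword as bare `asm` therefore needs options without -ansi, which is what the new dg-options line provides.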

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: Drop options.
* gcc.target/riscv/cpymemsi-1.c: Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c | 3 +--
 gcc/testsuite/gcc.target/riscv/cpymemsi-1.c | 4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c b/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
index d7e0bc47407..698f27d89fb 100644
--- a/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-march=rv32gc_zbb -save-temps -g0 -fno-lto" { target { rv32 } } } */
-/* { dg-options "-march=rv64gc_zbb -save-temps -g0 -fno-lto" { target { rv64 } } } */
+/* { dg-options "-pedantic-errors" } */
 /* { dg-timeout-factor 2 } */
 
 #include "../../gcc.dg/memcmp-1.c"
diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
index 983b564ccaf..30e9f119bed 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
@@ -1,7 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-march=rv32gc -save-temps -g0 -fno-lto" { target { rv32 } } } */
-/* { dg-options "-march=rv64gc -save-temps -g0 -fno-lto" { target { rv64 } } } */
-/* { dg-additional-options "-DRUN_FRACTION=11" { target simulator } } */
+/* { dg-options "-pedantic-errors" } */
 /* { dg-timeout-factor 2 } */
 
 #include "../../gcc.dg/memcmp-1.c"
-- 
2.44.0



Re: [PATCH] RISC-V: testsuite: Drop march-string in cmpmemsi/cpymemsi tests

2024-05-16 Thread Kito Cheng
LGTM

On Thu, May 16, 2024 at 5:09 PM Christoph Müllner
 wrote:
>
> The tests cmpmemsi-1.c and cpymemsi-1.c are execution ("dg-do run")
> tests, which do not have any restrictions on the enabled extensions.
> Further, none of the other listed options are required.
> Let's drop the options, so that the tests can also be executed on
> non-f and non-d targets.  However, we need to set options to the
> defaults without '-ansi', because the included test file uses the
> 'asm' keyword, which is not part of ANSI C.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/cmpmemsi-1.c: Drop options.
> * gcc.target/riscv/cpymemsi-1.c: Likewise.
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c | 3 +--
>  gcc/testsuite/gcc.target/riscv/cpymemsi-1.c | 4 +---
>  2 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c b/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
> index d7e0bc47407..698f27d89fb 100644
> --- a/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c
> @@ -1,6 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-march=rv32gc_zbb -save-temps -g0 -fno-lto" { target { rv32 } } } */
> -/* { dg-options "-march=rv64gc_zbb -save-temps -g0 -fno-lto" { target { rv64 } } } */
> +/* { dg-options "-pedantic-errors" } */
>  /* { dg-timeout-factor 2 } */
>
>  #include "../../gcc.dg/memcmp-1.c"
> diff --git a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> index 983b564ccaf..30e9f119bed 100644
> --- a/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/cpymemsi-1.c
> @@ -1,7 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-march=rv32gc -save-temps -g0 -fno-lto" { target { rv32 } } } */
> -/* { dg-options "-march=rv64gc -save-temps -g0 -fno-lto" { target { rv64 } } } */
> -/* { dg-additional-options "-DRUN_FRACTION=11" { target simulator } } */
> +/* { dg-options "-pedantic-errors" } */
>  /* { dg-timeout-factor 2 } */
>
>  #include "../../gcc.dg/memcmp-1.c"
> --
> 2.44.0
>


Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-16 Thread Hongyu Wang
Richard Biener wrote on Thu, May 16, 2024 at 15:05:

>
> On Thu, May 16, 2024 at 8:25 AM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > In ix86_override_options_after_change, calls to ix86_default_align
> > and ix86_recompute_optlev_based_flags will cause mismatched target
> > opt_set when doing cl_optimization_restore. Move them back to
> > ix86_option_override_internal to solve the issue.
> >
> > Bootstrapped & regtested on x86_64-pc-linux-gnu, and Rainer helped to
> > test with i386-pc-solaris2.11 which also passed 32/64bit tests.
>
> Since this is a tricky area apparently without too much test coverage can
> we have a testcase for this?

This fixes my previous change for PR 107692, which moved these two
functions into ix86_override_options_after_change and thereby caused the
PR 113719 regression. The PR 103696 test is the one that exposes the
issue. With the previous change, these two functions are also called from
cl_optimization_restore, which is redundant and incorrect. I could not
find another test that exposes any other functional regression.

>
> > Ok for trunk and backport down to gcc12?
> >
> > gcc/ChangeLog:
> >
> > PR target/113719
> > * config/i386/i386-options.cc (ix86_override_options_after_change):
> > Remove call to ix86_default_align and
> > ix86_recompute_optlev_based_flags.
> > (ix86_option_override_internal): Call ix86_default_align and
> > ix86_recompute_optlev_based_flags.
> > ---
> >  gcc/config/i386/i386-options.cc | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> > index ac48b5c61c4..d97464f2c74 100644
> > --- a/gcc/config/i386/i386-options.cc
> > +++ b/gcc/config/i386/i386-options.cc
> > @@ -1930,11 +1930,6 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts,
> >  void
> >  ix86_override_options_after_change (void)
> >  {
> > -  /* Default align_* from the processor table.  */
> > -  ix86_default_align (&global_options);
> > -
> > -  ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
> > -
> >/* Disable unrolling small loops when there's explicit
> >   -f{,no}unroll-loop.  */
> >if ((OPTION_SET_P (flag_unroll_loops))
> > @@ -2530,6 +2525,8 @@ ix86_option_override_internal (bool main_args_p,
> >
> >set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
> >
> > +  ix86_recompute_optlev_based_flags (opts, opts_set);
> > +
> >ix86_override_options_after_change ();
> >
> >ix86_tune_cost = processor_cost_table[ix86_tune];
> > @@ -2565,6 +2562,9 @@ ix86_option_override_internal (bool main_args_p,
> >|| TARGET_64BIT_P (opts->x_ix86_isa_flags))
> >  opts->x_ix86_regparm = REGPARM_MAX;
> >
> > +  /* Default align_* from the processor table.  */
> > +  ix86_default_align (&global_options);
> > +
> >/* Provide default for -mbranch-cost= value.  */
> >SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost,
> >ix86_tune_cost->branch_cost);
> > --
> > 2.31.1
> >


Re: Re: [NOT CODE REVIEW] [PATCH v3 1/1] [RISC-V] Add support for _Bfloat16

2024-05-16 Thread Xiao Zeng
2024-05-16 16:55  Kito Cheng  wrote:
>
>Hi Xiao Zeng:
>
>Just wondering: why use _Bfloat16 rather than __bf16? You mention
>__bf16 in a comment, but the implementation uses _Bfloat16.
Indeed, this is a mistake.
This patch sat in my local tree for a considerable amount of time.

I will submit a new patch to correct it.
> I would like to use __bf16 to keep it consistent with LLVM and the psABI if possible :)
Thanks, Kito, for pointing this out. Also, through my own oversight, I did not
see Andreas Schwab's email; he had already written to me earlier, pointing out
these issues.


By the way, if I don't reply to an email in a timely manner, the problem is
on my side. Please send me another email to remind me.
I will fix my email setup to avoid missing any more emails.

Thanks
Xiao Zeng



[COMMITTED 02/35] ada: Fix casing in error messages

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Error messages should not start with a capital letter.

gcc/ada/

* gnat_cuda.adb (Remove_CUDA_Device_Entities): Fix casing
(this is primarily a style fix, because the capitalization will
not be preserved by the error-reporting machinery anyway).
* sem_ch13.adb (Analyze_User_Aspect_Aspect_Specification): Fix
casing in error message.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gnat_cuda.adb | 2 +-
 gcc/ada/sem_ch13.adb  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gnat_cuda.adb b/gcc/ada/gnat_cuda.adb
index af47b728790..92576a4b397 100644
--- a/gcc/ada/gnat_cuda.adb
+++ b/gcc/ada/gnat_cuda.adb
@@ -270,7 +270,7 @@ package body GNAT_CUDA is
  and then Present (Corresponding_Stub (Parent (Bod)))
then
   Error_Msg_N
-("Cuda_Device not suported on separate subprograms",
+("CUDA_Device not suported on separate subprograms",
  Corresponding_Stub (Parent (Bod)));
else
   Remove (Bod);
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index eee2aa09cd5..8bc8e84ceb4 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -8754,7 +8754,7 @@ package body Sem_Ch13 is
  Arg : Node_Id;
   begin
  if No (UAD_Pragma) then
-Error_Msg_N ("No definition for user-defined aspect", Id);
+Error_Msg_N ("no definition for user-defined aspect", Id);
 return;
  end if;
 
-- 
2.43.2



[COMMITTED 01/35] ada: Fix docs and comments about pragmas for Boolean-valued aspects

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix various inconsistencies in documentation and comments of
Boolean-valued aspects.

gcc/ada/

* doc/gnat_rm/implementation_defined_pragmas.rst: Fix
documentation.
* sem_prag.adb: Fix comments.
* gnat_rm.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../implementation_defined_pragmas.rst| 57 +++
 gcc/ada/gnat_rm.texi  | 57 +++
 gcc/ada/sem_prag.adb  | 48 
 3 files changed, 89 insertions(+), 73 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
index 3426c34ebe8..7f221e32344 100644
--- a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
@@ -341,7 +341,7 @@ Syntax:
   pragma Always_Terminates [ (boolean_EXPRESSION) ];
 
 For the semantics of this pragma, see the entry for aspect ``Always_Terminates``
-in the SPARK 2014 Reference Manual, section 7.1.2.
+in the SPARK 2014 Reference Manual, section 6.1.10.
 
 .. _Pragma-Annotate:
 
@@ -2381,7 +2381,7 @@ Syntax:
 
 .. code-block:: ada
 
-  pragma Favor_Top_Level (type_NAME);
+  pragma Favor_Top_Level (type_LOCAL_NAME);
 
 
 The argument of pragma ``Favor_Top_Level`` must be a named access-to-subprogram
@@ -2838,7 +2838,7 @@ Syntax:
 
 .. code-block:: ada
 
-  pragma Independent (Local_NAME);
+  pragma Independent (component_LOCAL_NAME);
 
 
 This pragma is standard in Ada 2012 mode (which also provides an aspect
@@ -3537,6 +3537,11 @@ Pragma Lock_Free
 
 
 Syntax:
+
+.. code-block:: ada
+
+  pragma Lock_Free [ (static_boolean_EXPRESSION) ];
+
 This pragma may be specified for protected types or objects. It specifies that
 the implementation of protected operations must be implemented without locks.
 Compilation fails if the compiler cannot generate lock-free code for the
@@ -3850,7 +3855,7 @@ same name) that establishes the restriction ``No_Elaboration_Code`` for
 the current unit and any extended main source units (body and subunits).
 It also has the effect of enforcing a transitive application of this
 aspect, so that if any unit is implicitly or explicitly with'ed by the
-current unit, it must also have the No_Elaboration_Code_All aspect set.
+current unit, it must also have the `No_Elaboration_Code_All` aspect set.
 It may be applied to package or subprogram specs or their generic versions.
 
 Pragma No_Heap_Finalization
@@ -4508,7 +4513,7 @@ Syntax:
 
 ::
 
-  pragma Persistent_BSS [(LOCAL_NAME)]
+  pragma Persistent_BSS [(object_LOCAL_NAME)]
 
 
 This pragma allows selected objects to be placed in the ``.persistent_bss``
@@ -6500,12 +6505,12 @@ Syntax:
 
 ::
 
-  pragma Suppress_Initialization ([Entity =>] variable_or_subtype_Name);
+  pragma Suppress_Initialization ([Entity =>] variable_or_subtype_LOCAL_NAME);
 
 
-Here variable_or_subtype_Name is the name introduced by a type declaration
-or subtype declaration or the name of a variable introduced by an
-object declaration.
+Here variable_or_subtype_LOCAL_NAME is the name introduced by a type
+declaration or subtype declaration or the name of a variable introduced by
+an object declaration.
 
 In the case of a type or subtype
 this pragma suppresses any implicit or explicit initialization
@@ -6889,22 +6894,24 @@ Syntax:
 
 
 This configuration pragma defines a new aspect, making it available for
-subsequent use in a User_Aspect aspect specification. The first
-identifier is the name of the new aspect. Any subsequent arguments
-specify the names of other aspects. A subsequent name for which no parenthesized
-arguments are given shall denote either a Boolean-valued
-non-representation aspect or an aspect that has been defined by another
-User_Aspect_Definition pragma. A name for which one or more arguments are
-given shall be either Annotate or Local_Restrictions (and the arguments shall
-be appropriate for the named aspect). This pragma, together with the
-User_Aspect aspect, provides a mechanism for
-avoiding textual duplication if some set of aspect specifications is needed
-in multiple places. This is somewhat analogous to how profiles allow avoiding
-duplication of Restrictions pragmas. The visibility rules for an aspect
-defined by a User_Aspect_Definition pragma are the same as for a check name
-introduced by a Check_Name pragma. If multiple
-definitions are visible for some aspect at some point, then the
-definitions must agree. A predefined aspect cannot be redefined.
+subsequent use in a `User_Aspect` aspect specification. The first identifier
+is the name of the new aspect. Any subsequent arguments specify the names
+of other aspects. A subsequent name for which no parenthesized arguments
+are given shall denote either a Boolean-valued non-representation aspect
+or an aspect that has been defined by another `User_Aspect_Definition`
+pra

[COMMITTED 05/35] ada: Cleanup reporting locations for Ada 2022 and GNAT extension aspects

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_ch13.adb (Analyze_Aspect_Specification): Consistently
reuse existing constant where possible.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 8bc8e84ceb4..ce9f15c1491 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -2417,7 +2417,7 @@ package body Sem_Ch13 is
 
 begin
if Ada_Version < Ada_2022 then
-  Error_Msg_Ada_2022_Feature ("aspect %", Sloc (Aspect));
+  Error_Msg_Ada_2022_Feature ("aspect %", Loc);
   return;
end if;
 
@@ -2442,7 +2442,7 @@ package body Sem_Ch13 is
 
   elsif Is_Imported_Intrinsic then
  Error_Msg_GNAT_Extension
-   ("aspect % on intrinsic function", Sloc (Aspect),
+   ("aspect % on intrinsic function", Loc,
 Is_Core_Extension => True);
 
   else
@@ -4133,7 +4133,7 @@ package body Sem_Ch13 is
 
when Aspect_Designated_Storage_Model =>
   if not All_Extensions_Allowed then
- Error_Msg_GNAT_Extension ("aspect %", Sloc (Aspect));
+ Error_Msg_GNAT_Extension ("aspect %", Loc);
 
   elsif not Is_Type (E)
 or else Ekind (E) /= E_Access_Type
@@ -4148,7 +4148,7 @@ package body Sem_Ch13 is
 
when Aspect_Storage_Model_Type =>
   if not All_Extensions_Allowed then
- Error_Msg_GNAT_Extension ("aspect %", Sloc (Aspect));
+ Error_Msg_GNAT_Extension ("aspect %", Loc);
 
   elsif not Is_Type (E)
 or else not Is_Immutably_Limited_Type (E)
@@ -4479,7 +4479,7 @@ package body Sem_Ch13 is
   --  Ada 2022 (AI12-0363): Full_Access_Only
 
   elsif A_Id = Aspect_Full_Access_Only then
- Error_Msg_Ada_2022_Feature ("aspect %", Sloc (Aspect));
+ Error_Msg_Ada_2022_Feature ("aspect %", Loc);
 
   --  Ada 2022 (AI12-0075): static expression functions
 
-- 
2.43.2



[COMMITTED 04/35] ada: Fix alphabetic ordering of aspect identifiers

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup.

gcc/ada/

* aspects.ads (Aspect_Id): Fix ordering.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/aspects.ads | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads
index eb5ab1a85dd..ce393d4f602 100644
--- a/gcc/ada/aspects.ads
+++ b/gcc/ada/aspects.ads
@@ -198,9 +198,9 @@ package Aspects is
   Aspect_Favor_Top_Level,   -- GNAT
   Aspect_Full_Access_Only,
   Aspect_Ghost, -- GNAT
+  Aspect_Import,
   Aspect_Independent,
   Aspect_Independent_Components,
-  Aspect_Import,
   Aspect_Inline,
   Aspect_Inline_Always, -- GNAT
   Aspect_Interrupt_Handler,
@@ -971,6 +971,7 @@ package Aspects is
   Aspect_Shared_Passive   => Always_Delay,
   Aspect_Simple_Storage_Pool  => Always_Delay,
   Aspect_Simple_Storage_Pool_Type => Always_Delay,
+  Aspect_Stable_Properties=> Always_Delay,
   Aspect_Static_Predicate => Always_Delay,
   Aspect_Storage_Model_Type   => Always_Delay,
   Aspect_Storage_Pool => Always_Delay,
@@ -1032,7 +1033,6 @@ package Aspects is
   Aspect_Relaxed_Initialization   => Never_Delay,
   Aspect_Side_Effects => Never_Delay,
   Aspect_SPARK_Mode   => Never_Delay,
-  Aspect_Stable_Properties=> Always_Delay,
   Aspect_Static   => Never_Delay,
   Aspect_Subprogram_Variant   => Never_Delay,
   Aspect_Synchronization  => Never_Delay,
-- 
2.43.2



[COMMITTED 08/35] ada: Fix bug in maintaining dimension info

2024-05-16 Thread Marc Poulhiès
From: Steve Baird 

Copying a node does not automatically propagate its associated dimension
information (if any). This must be done explicitly.

gcc/ada/

* sem_util.adb (Copy_Node_With_Replacement): Add call to
Copy_Dimensions so that any dimension information associated with
the copied node is also associated with the resulting copy.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 1785931530f..68e131db606 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -53,6 +53,7 @@ with Sem_Cat;use Sem_Cat;
 with Sem_Ch6;use Sem_Ch6;
 with Sem_Ch8;use Sem_Ch8;
 with Sem_Ch13;   use Sem_Ch13;
+with Sem_Dim;use Sem_Dim;
 with Sem_Disp;   use Sem_Disp;
 with Sem_Elab;   use Sem_Elab;
 with Sem_Eval;   use Sem_Eval;
@@ -23447,6 +23448,8 @@ package body Sem_Util is
   Set_Chars (Result, Chars (Entity (Result)));
end if;
 end if;
+
+Copy_Dimensions (From => N, To => Result);
  end if;
 
  return Result;
-- 
2.43.2



[COMMITTED 03/35] ada: Fix ordering of code for pragma Preelaborable_Initialization

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup.

gcc/ada/

* sem_prag.adb (Analyze_Pragma): Move case alternative to match
to alphabetic order.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 160 +--
 1 file changed, 80 insertions(+), 80 deletions(-)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 105cc73eba3..2fc46ab0cd2 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -21889,86 +21889,6 @@ package body Sem_Prag is
Check_Arg_Is_One_Of (Arg1, Name_Semaphore, Name_No);
 end if;
 
- --
- -- Preelaborable_Initialization --
- --
-
- --  pragma Preelaborable_Initialization (DIRECT_NAME);
-
- when Pragma_Preelaborable_Initialization => Preelab_Init : declare
-Ent : Entity_Id;
-
- begin
-Ada_2005_Pragma;
-Check_Arg_Count (1);
-Check_No_Identifiers;
-Check_Arg_Is_Identifier (Arg1);
-Check_Arg_Is_Local_Name (Arg1);
-Check_First_Subtype (Arg1);
-Ent := Entity (Get_Pragma_Arg (Arg1));
-
---  A pragma that applies to a Ghost entity becomes Ghost for the
---  purposes of legality checks and removal of ignored Ghost code.
-
-Mark_Ghost_Pragma (N, Ent);
-
---  The pragma may come from an aspect on a private declaration,
---  even if the freeze point at which this is analyzed in the
---  private part after the full view.
-
-if Has_Private_Declaration (Ent)
-  and then From_Aspect_Specification (N)
-then
-   null;
-
---  Check appropriate type argument
-
-elsif Is_Private_Type (Ent)
-  or else Is_Protected_Type (Ent)
-  or else (Is_Generic_Type (Ent) and then Is_Derived_Type (Ent))
-
-  --  AI05-0028: The pragma applies to all composite types. Note
-  --  that we apply this binding interpretation to earlier versions
-  --  of Ada, so there is no Ada 2012 guard. Seems a reasonable
-  --  choice since there are other compilers that do the same.
-
-  or else Is_Composite_Type (Ent)
-then
-   null;
-
-else
-   Error_Pragma_Arg
- ("pragma % can only be applied to private, formal derived, "
-  & "protected, or composite type", Arg1);
-end if;
-
---  Give an error if the pragma is applied to a protected type that
---  does not qualify (due to having entries, or due to components
---  that do not qualify).
-
-if Is_Protected_Type (Ent)
-  and then not Has_Preelaborable_Initialization (Ent)
-then
-   Error_Msg_N
- ("protected type & does not have preelaborable "
-  & "initialization", Ent);
-
---  Otherwise mark the type as definitely having preelaborable
---  initialization.
-
-else
-   Set_Known_To_Have_Preelab_Init (Ent);
-end if;
-
-if Has_Pragma_Preelab_Init (Ent)
-  and then Warn_On_Redundant_Constructs
-then
-   Error_Pragma ("?r?duplicate pragma%!");
-else
-   Set_Has_Pragma_Preelab_Init (Ent);
-end if;
- end Preelab_Init;
-
  
  -- Persistent_BSS --
  
@@ -22057,6 +21977,86 @@ package body Sem_Prag is
 end if;
  end Persistent_BSS;
 
+ --
+ -- Preelaborable_Initialization --
+ --
+
+ --  pragma Preelaborable_Initialization (DIRECT_NAME);
+
+ when Pragma_Preelaborable_Initialization => Preelab_Init : declare
+Ent : Entity_Id;
+
+ begin
+Ada_2005_Pragma;
+Check_Arg_Count (1);
+Check_No_Identifiers;
+Check_Arg_Is_Identifier (Arg1);
+Check_Arg_Is_Local_Name (Arg1);
+Check_First_Subtype (Arg1);
+Ent := Entity (Get_Pragma_Arg (Arg1));
+
+--  A pragma that applies to a Ghost entity becomes Ghost for the
+--  purposes of legality checks and removal of ignored Ghost code.
+
+Mark_Ghost_Pragma (N, Ent);
+
+--  The pragma may come from an aspect on a private declaration,
+--  even if the freeze point at which this is analyzed in the
+--  private part after the full view.
+
+if Has_Private_Declaration (Ent)
+  and then From_Aspect_Specification (N)
+then
+   null;
+
+--  Check appropriate type argument

[COMMITTED 10/35] ada: Implement per-finalization-collection spinlocks

2024-05-16 Thread Marc Poulhiès
From: Eric Botcazou 

This changes the implementation of finalization collections from using the
global task lock to using per-collection spinlocks.  Spinlocks are a good
fit in this context because they are very cheap and therefore can be taken
with a fine granularity only around the portions of code implementing the
shuffling of pointers required by attachment and detachment actions.
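As a rough illustration of the idea (a hypothetical C11 model, not the Ada code in s-finpri.adb, which relies on System.Atomic_Primitives; all names below are made up for this sketch), each collection carries its own spinlock, taken only around the pointer shuffling in attach and detach, and each node remembers its enclosing collection so that detach can find the right lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct collection;

/* A node in a circular doubly linked list with a sentinel head. */
struct node {
    struct node *prev, *next;
    struct collection *enclosing;   /* set on attach, like Enclosing_Collection */
};

struct collection {
    struct node head;               /* sentinel node */
    atomic_flag lock;               /* one spinlock per collection */
};

static void collection_init(struct collection *c)
{
    c->head.prev = c->head.next = &c->head;
    c->head.enclosing = c;
    atomic_flag_clear(&c->lock);
}

static int collection_is_empty(const struct collection *c)
{
    return c->head.next == &c->head;
}

static void lock_collection(struct collection *c)
{
    /* Spin until this thread is the one that flips the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;
}

static void unlock_collection(struct collection *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Insert N right after the sentinel; only the pointer updates are locked. */
static void attach(struct collection *c, struct node *n)
{
    assert(n->next == NULL);        /* N must not already be attached */
    lock_collection(c);
    n->enclosing = c;
    n->prev = &c->head;
    n->next = c->head.next;
    c->head.next->prev = n;
    c->head.next = n;
    unlock_collection(c);
}

/* Unlink N from its collection; detaching an already detached node
   has no effect. */
static void detach(struct node *n)
{
    struct collection *c = n->enclosing;
    if (c == NULL)
        return;                     /* never attached */
    lock_collection(c);
    if (n->next != NULL) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->prev = n->next = NULL;
    }
    unlock_collection(c);
}
```

Because the lock is per collection, threads working on different collections never contend, and the critical sections stay short enough that spinning is cheaper than blocking.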

gcc/ada/

* libgnat/s-finpri.ads (Lock_Type): New modular type.
(Collection_Node): Add Enclosing_Collection component.
(Finalization_Collection): Add Lock component.
* libgnat/s-finpri.adb: Add clauses for System.Atomic_Primitives.
(Attach_Object_To_Collection): Lock and unlock the collection.
Save a pointer to the enclosing collection in the node.
(Detach_Object_From_Collection): Lock and unlock the collection.
(Finalize): Likewise.
(Initialize): Initialize the lock.
(Lock_Collection): New procedure.
(Unlock_Collection): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb | 79 +---
 gcc/ada/libgnat/s-finpri.ads | 12 +-
 2 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index 8026b3fb284..09f2761a5b9 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -32,7 +32,8 @@
 with Ada.Exceptions;   use Ada.Exceptions;
 with Ada.Unchecked_Conversion;
 
-with System.Soft_Links; use System.Soft_Links;
+with System.Atomic_Primitives; use System.Atomic_Primitives;
+with System.Soft_Links;use System.Soft_Links;
 
 package body System.Finalization_Primitives is
 
@@ -42,7 +43,21 @@ package body System.Finalization_Primitives is
  new Ada.Unchecked_Conversion (Address, Collection_Node_Ptr);
 
procedure Detach_Node_From_Collection (Node : not null Collection_Node_Ptr);
-   --  Removes a collection node from its associated finalization collection
+   --  Remove a collection node from its associated finalization collection.
+   --  Calls to the procedure with a Node that has already been detached have
+   --  no effects.
+
+   procedure Lock_Collection (Collection : in out Finalization_Collection);
+   --  Lock the finalization collection. Upon return, the caller owns the lock
+   --  to the collection and no other call with the same actual parameter will
+   --  return until a corresponding call to Unlock_Collection has been made by
+   --  the caller. This means that it is not possible to call Lock_Collection
+   --  more than once on a collection without a call to Unlock_Collection in
+   --  between.
+
+   procedure Unlock_Collection (Collection : in out Finalization_Collection);
+   --  Unlock the finalization collection, i.e. relinquish ownership of the
+   --  lock to the collection.
 
---
-- Add_Offset_To_Address --
@@ -69,7 +84,7 @@ package body System.Finalization_Primitives is
To_Collection_Node_Ptr (Object_Address - Header_Size);
 
begin
-  Lock_Task.all;
+  Lock_Collection (Collection);
 
   --  Do not allow the attachment of controlled objects while the
   --  associated collection is being finalized.
@@ -89,22 +104,23 @@ package body System.Finalization_Primitives is
   pragma Assert
 (Finalize_Address /= null, "primitive Finalize_Address not available");
 
-  Node.Finalize_Address := Finalize_Address;
-  Node.Prev := Collection.Head'Unchecked_Access;
-  Node.Next := Collection.Head.Next;
+  Node.Enclosing_Collection := Collection'Unrestricted_Access;
+  Node.Finalize_Address := Finalize_Address;
+  Node.Prev := Collection.Head'Unchecked_Access;
+  Node.Next := Collection.Head.Next;
 
   Collection.Head.Next.Prev := Node;
   Collection.Head.Next  := Node;
 
-  Unlock_Task.all;
+  Unlock_Collection (Collection);
 
exception
   when others =>
 
- --  Unlock the task in case the attachment failed and reraise the
- --  exception.
+ --  Unlock the collection in case the attachment failed and reraise
+ --  the exception.
 
- Unlock_Task.all;
+ Unlock_Collection (Collection);
  raise;
end Attach_Object_To_Collection;
 
@@ -180,11 +196,11 @@ package body System.Finalization_Primitives is
To_Collection_Node_Ptr (Object_Address - Header_Size);
 
begin
-  Lock_Task.all;
+  Lock_Collection (Node.Enclosing_Collection.all);
 
   Detach_Node_From_Collection (Node);
 
-  Unlock_Task.all;
+  Unlock_Collection (Node.Enclosing_Collection.all);
end Detach_Object_From_Collection;
 
--
@@ -213,14 +229,14 @@ package body System.Finalization_Primitives is
   end Is_Empty_List;
 
begin
-  Lock_Task.all;
+  Lock_Collection (Collection);

[COMMITTED 17/35] ada: Fix typo in CUDA error message

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix typo in error message; semantics is unaffected.

gcc/ada/

* gnat_cuda.adb (Remove_CUDA_Device_Entities): Fix typo.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gnat_cuda.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/gnat_cuda.adb b/gcc/ada/gnat_cuda.adb
index 92576a4b397..b531c15d380 100644
--- a/gcc/ada/gnat_cuda.adb
+++ b/gcc/ada/gnat_cuda.adb
@@ -270,7 +270,7 @@ package body GNAT_CUDA is
  and then Present (Corresponding_Stub (Parent (Bod)))
then
   Error_Msg_N
-("CUDA_Device not suported on separate subprograms",
+("CUDA_Device not supported on separate subprograms",
  Corresponding_Stub (Parent (Bod)));
else
   Remove (Bod);
-- 
2.43.2



[COMMITTED 13/35] ada: Fix casing of CUDA in error messages

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Error messages now capitalize CUDA.

gcc/ada/

* erroutc.adb (Set_Msg_Insertion_Reserved_Word): Fix casing for
CUDA appearing in error message strings.
(Set_Msg_Str): Likewise for CUDA being a part of a Name_Id.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/erroutc.adb | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
index be200e0016e..cef04d5daf2 100644
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -1475,12 +1475,17 @@ package body Erroutc is
   if Name_Len = 2 and then Name_Buffer (1 .. 2) = "RM" then
  Set_Msg_Name_Buffer;
 
+  --  We make a similar exception for CUDA
+
+  elsif Name_Len = 4 and then Name_Buffer (1 .. 4) = "CUDA" then
+ Set_Msg_Name_Buffer;
+
   --  We make a similar exception for SPARK
 
   elsif Name_Len = 5 and then Name_Buffer (1 .. 5) = "SPARK" then
  Set_Msg_Name_Buffer;
 
-  --  Neither RM nor SPARK: case appropriately and add surrounding quotes
+  --  Otherwise, case appropriately and add surrounding quotes
 
   else
  Set_Casing (Keyword_Casing (Flag_Source), All_Lower_Case);
@@ -1608,6 +1613,12 @@ package body Erroutc is
   elsif Text = "Cpp_Vtable" then
  Set_Msg_Str ("CPP_Vtable");
 
+  elsif Text = "Cuda_Device" then
+ Set_Msg_Str ("CUDA_Device");
+
+  elsif Text = "Cuda_Global" then
+ Set_Msg_Str ("CUDA_Global");
+
   elsif Text = "Persistent_Bss" then
  Set_Msg_Str ("Persistent_BSS");
 
-- 
2.43.2



[COMMITTED 09/35] ada: Formal_Derived_Type'Size is not static

2024-05-16 Thread Marc Poulhiès
From: Steve Baird 

In deciding whether a Size attribute reference is static, the compiler could
get confused about whether an implicitly-declared subtype of a generic formal
type is itself a generic formal type, possibly resulting in an assertion
failure and then a bugbox.

gcc/ada/

* sem_attr.adb (Eval_Attribute): Expand existing checks for
generic formal types for which Is_Generic_Type returns False. In
that case, mark the attribute reference as nonstatic.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 65442d45a85..2fa7d7d25d2 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -8685,10 +8685,20 @@ package body Sem_Attr is
   --  If the root type or base type is generic, then we cannot fold. This
   --  test is needed because subtypes of generic types are not always
   --  marked as being generic themselves (which seems odd???)
+  --
+  --  Should this situation be addressed instead by either
+  -- a) setting Is_Generic_Type in more cases
+  --  or b) replacing preceding calls to Is_Generic_Type with calls to
+  --Sem_Util.Some_New_Function
+  --  so that we wouldn't have to deal with these cases here ???
 
   if Is_Generic_Type (P_Root_Type)
 or else Is_Generic_Type (P_Base_Type)
+or else (Present (Associated_Node_For_Itype (P_Base_Type))
+  and then Is_Generic_Type (Defining_Identifier
+ (Associated_Node_For_Itype (P_Base_Type))))
   then
+ Set_Is_Static_Expression (N, False);
  return;
   end if;
 
-- 
2.43.2



[COMMITTED 06/35] ada: Reuse existing expression when rewriting aspects to pragmas

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_ch13.adb (Analyze_Aspect_Specification): Consistently
reuse existing constant where possible.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index ce9f15c1491..00392ae88eb 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -1838,7 +1838,7 @@ package body Sem_Ch13 is
Make_Pragma_Argument_Association (Loc,
  Expression => Conv),
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc))));
+ Expression => Ent)));
 
   Decorate (Aspect, Aitem);
   Insert_Pragma (Aitem);
@@ -3099,7 +3099,7 @@ package body Sem_Ch13 is
   Aitem := Make_Aitem_Pragma
 (Pragma_Argument_Associations => New_List (
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc)),
+ Expression => Ent),
Make_Pragma_Argument_Association (Sloc (Expr),
  Expression => Relocate_Node (Expr))),
  Pragma_Name  => Name_Linker_Section);
@@ -3120,7 +3120,7 @@ package body Sem_Ch13 is
   Aitem := Make_Aitem_Pragma
 (Pragma_Argument_Associations => New_List (
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc)),
+ Expression => Ent),
Make_Pragma_Argument_Association (Sloc (Expr),
  Expression => Relocate_Node (Expr))),
  Pragma_Name  => Name_Implemented);
@@ -3439,7 +3439,7 @@ package body Sem_Ch13 is
Make_Pragma_Argument_Association (Loc,
  Expression => Relocate_Node (Expr)),
Make_Pragma_Argument_Association (Sloc (Expr),
- Expression => New_Occurrence_Of (E, Loc))),
+ Expression => Ent)),
  Pragma_Name  => Nam);
 
   Delay_Required := False;
@@ -3452,7 +3452,7 @@ package body Sem_Ch13 is
Make_Pragma_Argument_Association (Sloc (Expr),
  Expression => Relocate_Node (Expr)),
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc))),
+ Expression => Ent)),
  Pragma_Name  => Name_Warnings);
 
   Decorate (Aspect, Aitem);
-- 
2.43.2



[COMMITTED 15/35] ada: Fix resolving tagged operations in array aggregates

2024-05-16 Thread Marc Poulhiès
From: Viljar Indus 

In Two_Pass_Aggregate_Expansion we were removing
all of the entity links in the Iterator_Specification
to avoid reusing the same Iterator_Definition in both
loops.

However, this approach also broke the links of calls
written in dot notation that had been transformed to
the regular call notation.

To avoid this, explicitly create new defining
identifiers when copying the Iterator_Specifications
for both of the loops.

gcc/ada/

* exp_aggr.adb (Two_Pass_Aggregate_Expansion):
Explicitly create new Defining_Iterators for both
of the loops.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index bdaca4aab58..f04dba719d9 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -5714,6 +5714,7 @@ package body Exp_Aggr is
  Iter : Node_Id;
  New_Comp : Node_Id;
  One_Loop : Node_Id;
+ Iter_Id  : Entity_Id;
 
  Size_Expr_Code : List_Id;
  Insertion_Code : List_Id := New_List;
@@ -5730,6 +5731,7 @@ package body Exp_Aggr is
 
  while Present (Assoc) loop
 Iter := Iterator_Specification (Assoc);
+Iter_Id := Defining_Identifier (Iter);
 Incr := Make_Assignment_Statement (Loc,
   Name => New_Occurrence_Of (Size_Id, Loc),
   Expression =>
@@ -5737,10 +5739,16 @@ package body Exp_Aggr is
  Left_Opnd  => New_Occurrence_Of (Size_Id, Loc),
  Right_Opnd => Make_Integer_Literal (Loc, 1)));
 
+--  Avoid using the same iterator definition in both loops by
+--  creating a new iterator for each loop and mapping it over the
+--  original iterator references.
+
 One_Loop := Make_Implicit_Loop_Statement (N,
   Iteration_Scheme =>
 Make_Iteration_Scheme (Loc,
-  Iterator_Specification => New_Copy_Tree (Iter)),
+  Iterator_Specification =>
+ New_Copy_Tree (Iter,
+Map => New_Elmt_List (Iter_Id, New_Copy (Iter_Id)))),
 Statements => New_List (Incr));
 
 Append (One_Loop, Size_Expr_Code);
@@ -5837,6 +5845,7 @@ package body Exp_Aggr is
 
  while Present (Assoc) loop
 Iter := Iterator_Specification (Assoc);
+Iter_Id := Defining_Identifier (Iter);
 New_Comp := Make_Assignment_Statement (Loc,
Name =>
  Make_Indexed_Component (Loc,
@@ -5869,10 +5878,16 @@ package body Exp_Aggr is
   Attribute_Name => Name_Last)),
Then_Statements => New_List (Incr));
 
+--  Avoid using the same iterator definition in both loops by
+--  creating a new iterator for each loop and mapping it over the
+--  original iterator references.
+
 One_Loop := Make_Implicit_Loop_Statement (N,
   Iteration_Scheme =>
 Make_Iteration_Scheme (Loc,
-  Iterator_Specification => Copy_Separate_Tree (Iter)),
+  Iterator_Specification =>
+ New_Copy_Tree (Iter,
+Map => New_Elmt_List (Iter_Id, New_Copy (Iter_Id)))),
 Statements => New_List (New_Comp, Incr));
 
 Append (One_Loop, Insertion_Code);
-- 
2.43.2



[COMMITTED 23/35] ada: Improve recovery from illegal occurrence of 'Old in if_expression

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix assertion failure in developer builds which happened when the THEN
expression contains an illegal occurrence of 'Old and the type of the
THEN expression is left as Any_Type, but there is no ELSE expression.

gcc/ada/

* sem_ch4.adb (Analyze_If_Expression): Add guard for
if_expression without an ELSE part.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch4.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index b4414a3f7ff..03364dade9f 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -2645,7 +2645,7 @@ package body Sem_Ch4 is
  ("\ELSE expression has}!", Else_Expr, Etype (Else_Expr));
 end if;
 
- else
+ elsif Present (Else_Expr) then
 if Is_Overloaded (Else_Expr) then
Error_Msg_N
  ("no interpretation compatible with type of THEN expression",
-- 
2.43.2



[COMMITTED 20/35] ada: Fix comments about Get_Ranged_Checks

2024-05-16 Thread Marc Poulhiès
From: Ronan Desplanques 

Checks.Get_Range_Checks was once named Range_Check, and before this
commit a few comments still referred to it by that name. To avoid
confusion with Types.Range_Check, this commit fixes those comments.

gcc/ada/

* checks.ads: Fix comments.
* checks.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 4 ++--
 gcc/ada/checks.ads | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index c81482a7b05..4e3eb502706 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -346,7 +346,7 @@ package body Checks is
   Warn_Node  : Node_Id) return Check_Result;
--  Like Apply_Selected_Length_Checks, except it doesn't modify
--  anything, just returns a list of nodes as described in the spec of
-   --  this package for the Range_Check function.
+   --  this package for the Get_Range_Checks function.
--  ??? In fact it does construct the test and insert it into the tree,
--  and insert actions in various ways (calling Insert_Action directly
--  in particular) so we do not call it in GNATprove mode, contrary to
@@ -359,7 +359,7 @@ package body Checks is
   Warn_Node  : Node_Id) return Check_Result;
--  Like Apply_Range_Check, except it does not modify anything, just
--  returns a list of nodes as described in the spec of this package
-   --  for the Range_Check function.
+   --  for the Get_Range_Checks function.
 
--
-- Access_Checks_Suppressed --
diff --git a/gcc/ada/checks.ads b/gcc/ada/checks.ads
index 36b5fa490fe..010627c3b03 100644
--- a/gcc/ada/checks.ads
+++ b/gcc/ada/checks.ads
@@ -980,7 +980,7 @@ package Checks is
 private
 
type Check_Result is array (Positive range 1 .. 2) of Node_Id;
-   --  There are two cases for the result returned by Range_Check:
+   --  There are two cases for the result returned by Get_Range_Checks:
--
--For the static case the result is one or two nodes that should cause
--a Constraint_Error. Typically these will include Expr itself or the
-- 
2.43.2



[COMMITTED 12/35] ada: Fix crash with -gnatdJ and -gnatw_q

2024-05-16 Thread Marc Poulhiès
From: Ronan Desplanques 

This commit makes the emission of -gnatw_q warnings pass node information
so as to handle the enclosing subprogram display of -gnatdJ instead of
crashing.

gcc/ada/

* exp_ch4.adb (Expand_Composite_Equality): Call Error_Msg_N
instead of Error_Msg.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 762e75616a7..7a2003691ec 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -2340,12 +2340,12 @@ package body Exp_Ch4 is
pragma Assert
  (Is_First_Subtype (Outer_Type)
or else Is_Generic_Actual_Type (Outer_Type));
-   Error_Msg_Node_1 := Outer_Type;
Error_Msg_Node_2 := Comp_Type;
-   Error_Msg
- ("?_q?""="" for type & uses predefined ""="" for }", Loc);
+   Error_Msg_N
+ ("?_q?""="" for type & uses predefined ""="" for }",
+  Outer_Type);
Error_Msg_Sloc := Sloc (Op);
-   Error_Msg ("\?_q?""="" # is ignored here", Loc);
+   Error_Msg_N ("\?_q?""="" # is ignored here", Outer_Type);
 end if;
  end;
 
-- 
2.43.2



[COMMITTED 07/35] ada: Remove Aspect_Specifications field from N_Procedure_Specification

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Sync Has_Aspect_Specifications_Flag with the actual flags in the AST.
Code cleanup; behavior is unaffected.

gcc/ada/

* gen_il-gen-gen_nodes.adb (N_Procedure_Specification): Remove
Aspect_Specifications field.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gen_il-gen-gen_nodes.adb | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
index f3dc215673a..a7021dc49bb 100644
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -736,7 +736,6 @@ begin -- Gen_IL.Gen.Gen_Nodes
 Sy (Null_Present, Flag),
 Sy (Must_Override, Flag),
 Sy (Must_Not_Override, Flag),
-Sy (Aspect_Specifications, List_Id, Default_No_List),
 Sm (Null_Statement, Node_Id)));
 
Ab (N_Access_To_Subprogram_Definition, Node_Kind);
-- 
2.43.2



[COMMITTED 21/35] ada: Fix detection of if_expressions that are known on entry

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix a small glitch in routine Is_Known_On_Entry, which returned False
for all if_expressions, regardless of whether their conditions or dependent
expressions are known on entry.

gcc/ada/

* sem_util.adb (Is_Known_On_Entry): Check whether condition and
dependent expressions of an if_expression are known on entry.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 68e131db606..766cabfc109 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -30784,9 +30784,7 @@ package body Sem_Util is
   return Is_Known_On_Entry (Expression (Expr));
 
when N_If_Expression =>
-  if not All_Exps_Known_On_Entry (Expressions (Expr)) then
- return False;
-  end if;
+  return All_Exps_Known_On_Entry (Expressions (Expr));
 
when N_Case_Expression =>
   if not Is_Known_On_Entry (Expression (Expr)) then
-- 
2.43.2



[COMMITTED 16/35] ada: Fix latent alignment issue for dynamically-allocated controlled objects

2024-05-16 Thread Marc Poulhiès
From: Eric Botcazou 

Dynamically-allocated controlled objects are attached to a finalization
collection by means of a hidden header placed right before the object,
which means that the size effectively allocated must naturally account
for the size of this header.  But the allocation must also account for
the alignment of this header in order to have it properly aligned.
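
The arithmetic behind this can be illustrated with a small, hypothetical
Ada sketch. It is not the s-stposu.adb code (which goes through
Header_Size_With_Padding); it only models a block laid out as
[padding][header][object], assuming power-of-two alignments and a header
size that is a multiple of the header alignment. All names in it
(Round_Up, Header_And_Padding, Block_Align, Check) are illustrative.

```ada
--  Hypothetical sketch, not the GNAT runtime code: models an allocated
--  block laid out as [padding][header][object] and checks that both the
--  header and the object end up aligned. Assumes power-of-two alignments
--  and a header size that is a multiple of the header alignment.

with Ada.Text_IO; use Ada.Text_IO;

procedure Header_Layout_Sketch is

   type Count is range 0 .. 2**31 - 1;

   --  Round Size up to the next multiple of Align (Align > 0)
   function Round_Up (Size, Align : Count) return Count is
     ((Size + Align - 1) / Align * Align);

   --  Distance from the start of the block to the object: the header plus
   --  enough leading padding to keep the object aligned on Obj_Align
   function Header_And_Padding (Header_Size, Obj_Align : Count) return Count
   is (Round_Up (Header_Size, Obj_Align));

   --  Alignment requested from the allocator: the larger of the two
   function Block_Align (Obj_Align, Header_Align : Count) return Count is
     (Count'Max (Obj_Align, Header_Align));

   procedure Check (Header_Size, Header_Align, Obj_Align : Count) is
      --  Model the allocator as returning the address Block_Align, i.e. a
      --  nonzero address that is a multiple of the requested alignment
      Block  : constant Count := Block_Align (Obj_Align, Header_Align);
      Object : constant Count :=
        Block + Header_And_Padding (Header_Size, Obj_Align);
      Header : constant Count := Object - Header_Size;
   begin
      if Object mod Obj_Align /= 0 or else Header mod Header_Align /= 0 then
         raise Program_Error with "misaligned layout";
      end if;
      Put_Line ("header at" & Count'Image (Header)
                & ", object at" & Count'Image (Object));
   end Check;

begin
   --  32-byte header aligned on 16 bytes, objects aligned on 8, 16, 64
   Check (Header_Size => 32, Header_Align => 16, Obj_Align => 8);
   Check (Header_Size => 32, Header_Align => 16, Obj_Align => 16);
   Check (Header_Size => 32, Header_Align => 16, Obj_Align => 64);

   --  Had the block been aligned on Obj_Align = 8 only, it could start at
   --  address 8: the object would land at 40 and the header at 8, which is
   --  not a multiple of 16. Requesting the larger alignment avoids this.
end Header_Layout_Sketch;
```

Running it prints "header at 16, object at 48" twice, then
"header at 96, object at 128". The closing comment shows why passing only
the object's alignment to the allocator is not enough once the header is
more strictly aligned than the object.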

gcc/ada/

* libgnat/s-finpri.ads (Header_Alignment): New function.
(Header_Size): Adjust description.
(Master_Node): Put Finalize_Address as first component.
(Collection_Node): Likewise.
* libgnat/s-finpri.adb (Header_Alignment): New function.
(Header_Size): Return the object size in storage units.
* libgnat/s-stposu.ads (Adjust_Controlled_Dereference): Replace
collection node with header in description.
* libgnat/s-stposu.adb (Adjust_Controlled_Dereference): Likewise.
(Allocate_Any_Controlled): Likewise.  Pass the maximum of the
specified alignment and that of the header to the allocator.
(Deallocate_Any_Controlled): Likewise to the deallocator.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb | 11 +-
 gcc/ada/libgnat/s-finpri.ads | 21 +++
 gcc/ada/libgnat/s-stposu.adb | 69 +---
 gcc/ada/libgnat/s-stposu.ads |  2 +-
 4 files changed, 66 insertions(+), 37 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index 09f2761a5b9..5bd8eeaea22 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -389,13 +389,22 @@ package body System.Finalization_Primitives is
   end if;
end Finalize_Object;
 
+   --
+   -- Header_Alignment --
+   --
+
+   function Header_Alignment return System.Storage_Elements.Storage_Count is
+   begin
+  return Collection_Node'Alignment;
+   end Header_Alignment;
+
-
-- Header_Size --
-
 
function Header_Size return System.Storage_Elements.Storage_Count is
begin
-  return Collection_Node'Size / Storage_Unit;
+  return Collection_Node'Object_Size / Storage_Unit;
end Header_Size;
 

diff --git a/gcc/ada/libgnat/s-finpri.ads b/gcc/ada/libgnat/s-finpri.ads
index 4ba13dadec0..468aa584958 100644
--- a/gcc/ada/libgnat/s-finpri.ads
+++ b/gcc/ada/libgnat/s-finpri.ads
@@ -168,8 +168,11 @@ package System.Finalization_Primitives with Preelaborate is
--  Calls to the procedure with an object that has already been detached
--  have no effects.
 
+   function Header_Alignment return System.Storage_Elements.Storage_Count;
+   --  Return the alignment of type Collection_Node as Storage_Count
+
function Header_Size return System.Storage_Elements.Storage_Count;
-   --  Return the size of type Collection_Node as Storage_Count
+   --  Return the object size of type Collection_Node as Storage_Count
 
 private
 
@@ -182,11 +185,13 @@ private
 
--  Finalization masters:
 
-   --  Master node type structure
+   --  Master node type structure. Finalize_Address comes first because it is
+   --  an access-to-subprogram and, therefore, might be twice as large and as
+   --  aligned as an access-to-object on some platforms.
 
type Master_Node is record
-  Object_Address   : System.Address   := System.Null_Address;
   Finalize_Address : Finalize_Address_Ptr := null;
+  Object_Address   : System.Address   := System.Null_Address;
   Next : Master_Node_Ptr  := null;
end record;
 
@@ -211,15 +216,17 @@ private
 
--  Finalization collections:
 
-   --  Collection node type structure
+   --  Collection node type structure. Finalize_Address comes first because it
+   --  is an access-to-subprogram and, therefore, might be twice as large and
+   --  as aligned as an access-to-object on some platforms.
 
type Collection_Node is record
-  Enclosing_Collection : Finalization_Collection_Ptr := null;
-  --  A pointer to the collection to which the node is attached
-
   Finalize_Address : Finalize_Address_Ptr := null;
   --  A pointer to the Finalize_Address procedure of the object
 
+  Enclosing_Collection : Finalization_Collection_Ptr := null;
+  --  A pointer to the collection to which the node is attached
+
   Prev : Collection_Node_Ptr := null;
   Next : Collection_Node_Ptr := null;
   --  Collection nodes are managed as a circular doubly-linked list
diff --git a/gcc/ada/libgnat/s-stposu.adb b/gcc/ada/libgnat/s-stposu.adb
index 38dc69f976a..84535d2a506 100644
--- a/gcc/ada/libgnat/s-stposu.adb
+++ b/gcc/ada/libgnat/s-stposu.adb
@@ -56,12 +56,12 @@ package body System.Storage_Pools.Subpools is
   Header_And_Padding : constant Storage_Offset :=
  Header_Size_With_Padding (Alignment);
begin
-  --  Expose the collection node and its padding by shifting 

[COMMITTED 14/35] ada: Fix bogus error on function returning noncontrolling result in private part

2024-05-16 Thread Marc Poulhiès
From: Eric Botcazou 

This occurs in the additional case of RM 3.9.3(10) introduced in Ada 2012,
namely the controlling access result, because the implementation does not
use the same (correct) conditions as in the original case.

This patch factors out these conditions and uses them in both cases, and
also adjusts the wording of the message in the first case.

gcc/ada/

* sem_ch6.adb (Check_Private_Overriding): Implement the second part
of RM 3.9.3(10) consistently in both cases.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch6.adb | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index c0bfe873111..0a8030cb923 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -11555,35 +11555,30 @@ package body Sem_Ch6 is
   Incomplete_Or_Partial_View (T);
 
   begin
- if not Overrides_Visible_Function (Partial_View) then
+ if not Overrides_Visible_Function (Partial_View)
+   and then
+ Is_Tagged_Type
+   (if Present (Partial_View) then Partial_View else T)
+ then
 
 --  Here, S is "function ... return T;" declared in
 --  the private part, not overriding some visible
 --  operation. That's illegal in the tagged case
 --  (but not if the private type is untagged).
 
-if ((Present (Partial_View)
-  and then Is_Tagged_Type (Partial_View))
-  or else (No (Partial_View)
-and then Is_Tagged_Type (T)))
-  and then T = Base_Type (Etype (S))
-then
+if T = Base_Type (Etype (S)) then
Error_Msg_N
- ("private function with tagged result must"
+ ("private function with controlling result must"
   & " override visible-part function", S);
Error_Msg_N
  ("\move subprogram to the visible part"
   & " (RM 3.9.3(10))", S);
 
 --  Ada 2012 (AI05-0073): Extend this check to the case
---  of a function whose result subtype is defined by an
---  access_definition designating specific tagged type.
+--  of a function with access result type.
 
 elsif Ekind (Etype (S)) = E_Anonymous_Access_Type
-  and then Is_Tagged_Type (Designated_Type (Etype (S)))
-  and then
-not Is_Class_Wide_Type
-  (Designated_Type (Etype (S)))
+  and then T = Base_Type (Designated_Type (Etype (S)))
   and then Ada_Version >= Ada_2012
 then
Error_Msg_N
-- 
2.43.2



[COMMITTED 11/35] ada: Follow up fixes for Put_Image/streaming regressions

2024-05-16 Thread Marc Poulhiès
From: Steve Baird 

A recent change to reduce duplication of compiler-generated Put_Image and
streaming subprograms introduced some regressions. The fix for one of them
was incomplete.

gcc/ada/

* exp_attr.adb (Build_And_Insert_Type_Attr_Subp): Further tweaking
of the point where a compiler-generated Put_Image or streaming
subprogram is to be inserted in the tree. If one such subprogram
calls another (as is often the case with, for example, Put_Image
procedures for composite type and for a component type thereof),
then we want to avoid use-before-definition problems that can
result from inserting the caller ahead of the callee.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_attr.adb | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index e12e8b4a439..03bf4cf329c 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -1954,6 +1954,44 @@ package body Exp_Attr is
 while Present (Ancestor) loop
if Is_List_Member (Ancestor) then
   Insertion_Point := First (List_Containing (Ancestor));
+
+  --  A hazard to avoid here is use-before-definition
+  --  errors that can result when we have two of these
+  --  subprograms where one calls the other (e.g., given
+  --  Put_Image procedures for a composite type and
+  --  for a component type, the former will often call
+  --  the latter). At the time a subprogram is inserted,
+  --  we know that the one and only call to it is
+  --  somewhere in the subtree rooted at Ancestor.
+  --  So that placement constraint is easy to satisfy.
+  --  But if we construct another subprogram later and
+  --  if that second subprogram calls the first one,
+  --  then we need to be careful not to place the
+  --  second one ahead of the first one. That is the goal
+  --  of this loop. This may need to be revised if it turns
+  --  out that other stuff is being inserted on the list,
+  --  so that the loop terminates too early.
+
+  --  On the other hand, it seems like inserting things
+  --  earlier offers more opportunities for sharing.
+  --  If Ancestor occurs in the statement list of a
+  --  subprogram body (ignore the HSS node for now),
+  --  then perhaps we should look for an insertion site
+  --  in the decl list of the subprogram body and only
+  --  look in the statement list if the decl list is empty.
+  --  Similarly if Ancestor occors in the private decls list
+  --  for a package spec that has a non-empty visible
+  --  decls list. No examples where this would result in more
+  --  sharing and less duplication have been observed, so this
+  --  is just speculation.
+
+  while Insertion_Point /= Ancestor
+and then Nkind (Insertion_Point) = N_Subprogram_Body
+and then not Comes_From_Source (Insertion_Point)
+  loop
+ Next (Insertion_Point);
+  end loop;
+
   pragma Assert (Present (Insertion_Point));
end if;
Ancestor := Parent (Ancestor);
-- 
2.43.2



[COMMITTED 25/35] ada: Fix reason code for length check

2024-05-16 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch fixes the reason code used by Apply_Selected_Length_Checks,
which was wrong in some cases when the check could be determined to
always fail at compile time.

gcc/ada/

* checks.adb (Apply_Selected_Length_Checks): Fix reason code.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 4e3eb502706..6af392eeda8 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -322,7 +322,8 @@ package body Checks is
--  that the access value is non-null, since the checks do not
--  not apply to null access values.
 
-   procedure Install_Static_Check (R_Cno : Node_Id; Loc : Source_Ptr);
+   procedure Install_Static_Check
+ (R_Cno : Node_Id; Loc : Source_Ptr; Reason : RT_Exception_Code);
--  Called by Apply_{Length,Range}_Checks to rewrite the tree with the
--  Constraint_Error node.
 
@@ -3001,7 +3002,7 @@ package body Checks is
 Insert_Action (Insert_Node, R_Cno);
 
  else
-Install_Static_Check (R_Cno, Loc);
+Install_Static_Check (R_Cno, Loc, CE_Range_Check_Failed);
  end if;
   end loop;
end Apply_Range_Check;
@@ -3469,7 +3470,7 @@ package body Checks is
 end if;
 
  else
-Install_Static_Check (R_Cno, Loc);
+Install_Static_Check (R_Cno, Loc, CE_Length_Check_Failed);
  end if;
   end loop;
end Apply_Selected_Length_Checks;
@@ -8692,14 +8693,16 @@ package body Checks is
-- Install_Static_Check --
--
 
-   procedure Install_Static_Check (R_Cno : Node_Id; Loc : Source_Ptr) is
+   procedure Install_Static_Check
+ (R_Cno : Node_Id; Loc : Source_Ptr; Reason : RT_Exception_Code)
+   is
   Stat : constant Boolean   := Is_OK_Static_Expression (R_Cno);
   Typ  : constant Entity_Id := Etype (R_Cno);
 
begin
   Rewrite (R_Cno,
 Make_Raise_Constraint_Error (Loc,
-  Reason => CE_Range_Check_Failed));
+  Reason => Reason));
   Set_Analyzed (R_Cno);
   Set_Etype (R_Cno, Typ);
   Set_Raises_Constraint_Error (R_Cno);
-- 
2.43.2



[COMMITTED 19/35] ada: Minor performance improvement for dynamically-allocated controlled objects

2024-05-16 Thread Marc Poulhiès
From: Eric Botcazou 

The values returned by Header_Alignment and Header_Size are known at compile
time and powers of two on almost all platforms, so inlining them by means of
an expression function improves the object code generated for alignment and
size calculations involving them.
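
As a hedged illustration of why power-of-two values known at compile time
help: rounding a size up to a multiple of a power-of-two alignment can be
done with an add and a mask rather than a division and a multiplication,
a rewrite a compiler can apply once the alignment is a visible constant.
The sketch below (hypothetical names, not GNAT runtime code) only checks
that the two formulations agree; it makes no claim about the exact code
GCC emits.

```ada
--  Hypothetical sketch: compares division-based and mask-based rounding
--  for power-of-two alignments. The mask form is only valid when Align
--  is a power of two.

with Ada.Text_IO; use Ada.Text_IO;

procedure Power_Of_Two_Rounding is

   type Offset is mod 2**32;

   --  General formulation, valid for any nonzero Align
   function Round_Up_Div (Size, Align : Offset) return Offset is
     ((Size + Align - 1) / Align * Align);

   --  Formulation valid only when Align is a power of two: clear the
   --  low-order bits after adding Align - 1
   function Round_Up_Mask (Size, Align : Offset) return Offset is
     ((Size + Align - 1) and not (Align - 1));

begin
   for Shift in 0 .. 6 loop
      declare
         Align : constant Offset := 2 ** Shift;
      begin
         for Size in Offset range 0 .. 200 loop
            if Round_Up_Div (Size, Align) /= Round_Up_Mask (Size, Align) then
               raise Program_Error with "mismatch";
            end if;
         end loop;
      end;
   end loop;
   Put_Line ("both formulations agree for alignments 1 .. 64");
end Power_Of_Two_Rounding;
```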

gcc/ada/

* libgnat/s-finpri.ads: Add use type clause for Storage_Offset.
(Header_Alignment): Turn into an expression function.
(Header_Size): Likewise.
* libgnat/s-finpri.adb: Remove use type clause for Storage_Offset.
(Header_Alignment): Delete.
(Header_Size): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb | 20 
 gcc/ada/libgnat/s-finpri.ads |  8 ++--
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index 5bd8eeaea22..bd70e582de3 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -37,8 +37,6 @@ with System.Soft_Links;use System.Soft_Links;
 
 package body System.Finalization_Primitives is
 
-   use type System.Storage_Elements.Storage_Offset;
-
function To_Collection_Node_Ptr is
  new Ada.Unchecked_Conversion (Address, Collection_Node_Ptr);
 
@@ -389,24 +387,6 @@ package body System.Finalization_Primitives is
   end if;
end Finalize_Object;
 
-   --
-   -- Header_Alignment --
-   --
-
-   function Header_Alignment return System.Storage_Elements.Storage_Count is
-   begin
-  return Collection_Node'Alignment;
-   end Header_Alignment;
-
-   -
-   -- Header_Size --
-   -
-
-   function Header_Size return System.Storage_Elements.Storage_Count is
-   begin
-  return Collection_Node'Object_Size / Storage_Unit;
-   end Header_Size;
-

-- Initialize --

diff --git a/gcc/ada/libgnat/s-finpri.ads b/gcc/ada/libgnat/s-finpri.ads
index 468aa584958..b0b662ca39c 100644
--- a/gcc/ada/libgnat/s-finpri.ads
+++ b/gcc/ada/libgnat/s-finpri.ads
@@ -39,6 +39,8 @@ with System.Storage_Elements;
 
 package System.Finalization_Primitives with Preelaborate is
 
+   use type System.Storage_Elements.Storage_Offset;
+
type Finalize_Address_Ptr is access procedure (Obj : System.Address);
--  Values of this type denote finalization procedures associated with
--  objects that have controlled parts. For convenience, such objects
@@ -168,10 +170,12 @@ package System.Finalization_Primitives with Preelaborate is
--  Calls to the procedure with an object that has already been detached
--  have no effects.
 
-   function Header_Alignment return System.Storage_Elements.Storage_Count;
+   function Header_Alignment return System.Storage_Elements.Storage_Count is
+ (Collection_Node'Alignment);
--  Return the alignment of type Collection_Node as Storage_Count
 
-   function Header_Size return System.Storage_Elements.Storage_Count;
+   function Header_Size return System.Storage_Elements.Storage_Count is
+ (Collection_Node'Object_Size / Storage_Unit);
--  Return the object size of type Collection_Node as Storage_Count
 
 private
-- 
2.43.2



[COMMITTED 26/35] ada: Ignore ghost nodes in call graph information for dispatching calls

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

When emitting call graph information, we already skipped calls to
ignored ghost entities, but this code was causing crashes (in production
builds) and assertion failures (in development builds), because the
ignored ghost entities are not fully decorated, e.g. when they come from
instances of generic units with default subprograms.

With this patch we skip call graph information for ignored ghost
entities when they are registered, both as explicit calls and as
tagged types that will come with internally generated dispatching
subprograms.

gcc/ada/

* exp_cg.adb (Generate_CG_Output): Remove code for ignored ghost
entities that applied to subprogram calls.
(Register_CG_Node): Skip ignored ghost entities, both calls
and tagged types, when they are registered.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_cg.adb | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/exp_cg.adb b/gcc/ada/exp_cg.adb
index addf1cae32a..91a6d40a6fa 100644
--- a/gcc/ada/exp_cg.adb
+++ b/gcc/ada/exp_cg.adb
@@ -125,14 +125,7 @@ package body Exp_CG is
   for J in Call_Graph_Nodes.First .. Call_Graph_Nodes.Last loop
  N := Call_Graph_Nodes.Table (J);
 
- --  No action needed for subprogram calls removed by the expander
- --  (for example, calls to ignored ghost entities).
-
- if Nkind (N) = N_Null_Statement then
-pragma Assert (Nkind (Original_Node (N)) in N_Subprogram_Call);
-null;
-
- elsif Nkind (N) in N_Subprogram_Call then
+ if Nkind (N) in N_Subprogram_Call then
 Write_Call_Info (N);
 
  else pragma Assert (Nkind (N) = N_Defining_Identifier);
@@ -358,7 +351,13 @@ package body Exp_CG is
 
procedure Register_CG_Node (N : Node_Id) is
begin
-  if Nkind (N) in N_Subprogram_Call then
+  --  Skip ignored ghost calls that will be removed by the expander
+
+  if Is_Ignored_Ghost_Node (N) then
+ null;
+
+  elsif Nkind (N) in N_Subprogram_Call then
+
  if Current_Scope = Main_Unit_Entity
or else Entity_Is_In_Main_Unit (Current_Scope)
  then
-- 
2.43.2



[COMMITTED 27/35] ada: Avoid checking parameters of protected procedures

2024-05-16 Thread Marc Poulhiès
From: Viljar Indus 

The compiler emits warnings on generated protected procedures when
the procedure does not have an explicit spec. When the spec is not
present, instead check whether the body was created for a protected
procedure.

gcc/ada/

* sem_ch6.adb (Analyze_Subprogram_Body_Helper):
If the spec is not present for a subprogram body then
check if the body definition was created for a protected
procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch6.adb | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index 0a8030cb923..ca40b5479e0 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -4971,8 +4971,11 @@ package body Sem_Ch6 is
  --  Skip the check for subprograms generated for protected subprograms
  --  because it is also done for the protected subprograms themselves.
 
- elsif Present (Spec_Id)
-   and then Present (Protected_Subprogram (Spec_Id))
+ elsif (Present (Spec_Id)
+ and then Present (Protected_Subprogram (Spec_Id)))
+   or else
+ (Acts_As_Spec (N)
+   and then Present (Protected_Subprogram (Body_Id)))
  then
 null;
 
-- 
2.43.2



[COMMITTED 18/35] ada: Fixup one more pattern of broken scope information

2024-05-16 Thread Marc Poulhiès
When an array's initialization contains an `others =>` clause with an
expression that involves finalization, the resulting scope information
is incorrect and can cause crashes with backends (e.g. gnat-llvm) that
also use unnesting. The observable symptom is a nested object
declaration (created by the compiler) within a loop wrapped in a
procedure created by the unnester that has incoherent scope information:
its Scope field points to the scope of the procedure (1 level too high),
while the object is contained in the entity chain of some entity nested
in the procedure (correct).

The correct solution would be to fix the scope information when it is
created, but this proved too large a task, with many interactions with
existing code.

This change adds another pattern to the Fixup_Inner_Scopes procedure to
detect the problematic case and fix the scope after the fact.

gcc/ada/

* exp_ch7.adb (Unnest_Loop::Fixup_Inner_Scopes): detect a new
problematic pattern and fixup the scope accordingly.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 66 ++---
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 25a7c0b2b46..6d76572f405 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -8809,8 +8809,11 @@ package body Exp_Ch7 is
 
procedure Unnest_Loop (Loop_Stmt : Node_Id) is
 
-  procedure Fixup_Inner_Scopes (Loop_Stmt : Node_Id);
-  --  The loops created by the compiler for array aggregates can have
+  procedure Fixup_Inner_Scopes (Loop_Or_Block : Node_Id);
+  --  This procedure fixes the scope for 2 identified cases of incorrect
+  --  scope information.
+  --
+  --  1) The loops created by the compiler for array aggregates can have
   --  nested finalization procedure when the type of the array components
   --  needs finalization. It has the following form:
 
@@ -8825,7 +8828,7 @@ package body Exp_Ch7 is
   --obj (J4b) := ...;
 
   --  When the compiler creates the N_Block_Statement, it sets its scope to
-  --  the upper scope (the one containing the loop).
+  --  the outer scope (the one containing the loop).
 
   --  The Unnest_Loop procedure moves the N_Loop_Statement inside a new
   --  procedure and correctly sets the scopes for both the new procedure
@@ -8833,25 +8836,68 @@ package body Exp_Ch7 is
   --  leaves the Tree in an incoherent state (i.e. the inner procedure must
   --  have its enclosing procedure in its scope ancestries).
 
-  --  This procedure fixes the scope links.
+  --  2) The second case happens when an object declaration is created
+  --  within a loop used to initialize the 'others' components of an
+  --  aggregate that is nested within a transient scope. When the transient
+  --  scope is removed, the object scope is set to the outer scope. For
+  --  example:
+
+  --  package pack
+  --   ...
+  -- L98s : for J90s in 2 .. 19 loop
+  --B101s : declare
+  --   R92s : aliased some_type;
+  --   ...
+
+  --  The loop L98s was initially wrapped in a transient scope B72s and
+  --  R92s was nested within it. Then the transient scope is removed and
+  --  the scope of R92s is set to 'pack'. And finally, when the unnester
+  --  moves the loop body in a new procedure, R92s's scope is still left
+  --  unchanged.
+
+  --  This procedure finds the two previous patterns and fixes the scope
+  --  information.
 
   --  Another (better) fix would be to have the block scope set to be the
   --  loop entity earlier (when the block is created or when the loop gets
   --  an actual entity set). But unfortunately this proved harder to
   --  implement ???
 
-  procedure Fixup_Inner_Scopes (Loop_Stmt : Node_Id) is
- Stmt  : Node_Id:= First (Statements (Loop_Stmt));
- Loop_Stmt_Ent : constant Entity_Id := Entity (Identifier (Loop_Stmt));
- Ent_To_Fix: Entity_Id;
+  procedure Fixup_Inner_Scopes (Loop_Or_Block : Node_Id) is
+ Stmt  : Node_Id;
+ Loop_Or_Block_Ent : Entity_Id;
+ Ent_To_Fix: Entity_Id;
+ Decl  : Node_Id := Empty;
   begin
+ pragma Assert (Nkind (Loop_Or_Block) in
+   N_Loop_Statement | N_Block_Statement);
+
+ Loop_Or_Block_Ent := Entity (Identifier (Loop_Or_Block));
+ if Nkind (Loop_Or_Block) = N_Loop_Statement then
+Stmt := First (Statements (Loop_Or_Block));
+ else -- N_Block_Statement
+Stmt := First
+  (Statements (Handled_Statement_Sequence (Loop_Or_Block)));
+Decl := First (Declarations (Loop_Or_Block));
+ end if;
+
+ --  Fix scopes for any object declaration found in the block
+ while Present (Decl) loop
+

[COMMITTED 24/35] ada: Propagate Program_Error from failed finalization of collection

2024-05-16 Thread Marc Poulhiès
From: Eric Botcazou 

This aligns finalization collections with finalization masters when it comes
to propagating an exception raised by the finalization of a specific object,
by always propagating Program_Error instead of the aforementioned exception.

gcc/ada/

* libgnat/s-finpri.adb (Raise_From_Controlled_Operation): New
declaration of imported procedure moved from...
(Finalize_Master): ...there.
(Finalize): Call Raise_From_Controlled_Operation instead of
Reraise_Occurrence to propagate the exception, if any.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index bd70e582de3..89f5f2952e4 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -37,6 +37,10 @@ with System.Soft_Links;use System.Soft_Links;
 
 package body System.Finalization_Primitives is
 
+   procedure Raise_From_Controlled_Operation (X : Exception_Occurrence);
+   pragma Import (Ada, Raise_From_Controlled_Operation,
+  "__gnat_raise_from_controlled_operation");
+
function To_Collection_Node_Ptr is
  new Ada.Unchecked_Conversion (Address, Collection_Node_Ptr);
 
@@ -297,7 +301,7 @@ package body System.Finalization_Primitives is
   --  If one of the finalization actions raised an exception, reraise it
 
   if Finalization_Exception_Raised then
- Reraise_Occurrence (Exc_Occur);
+ Raise_From_Controlled_Operation (Exc_Occur);
   end if;
end Finalize;
 
@@ -306,12 +310,8 @@ package body System.Finalization_Primitives is
-
 
procedure Finalize_Master (Master : in out Finalization_Master) is
-  procedure Raise_From_Controlled_Operation (X : Exception_Occurrence);
-  pragma Import (Ada, Raise_From_Controlled_Operation,
- "__gnat_raise_from_controlled_operation");
-
-  Finalization_Exception_Raised : Boolean := False;
   Exc_Occur : Exception_Occurrence;
+  Finalization_Exception_Raised : Boolean := False;
   Node  : Master_Node_Ptr;
 
begin
-- 
2.43.2
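
The propagation rule this patch adopts for collections (finalize every object, remember that one of the finalization actions failed, then signal Program_Error instead of re-raising the original occurrence) can be modeled outside Ada. The following C sketch is illustrative only: C has no exceptions, so status codes stand in for Ada exceptions, objects are finalized in list order for simplicity, and every name in it (fin_status, finalize_all, fin_ok, fin_user_error) is hypothetical rather than part of System.Finalization_Primitives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for Ada exceptions.  */
enum fin_status { FIN_OK, FIN_PROGRAM_ERROR, FIN_USER_ERROR };

typedef enum fin_status (*finalizer) (int *obj);

/* Example finalizers: both clear the object; the second also "fails".  */
static enum fin_status fin_ok (int *obj) { *obj = 0; return FIN_OK; }
static enum fin_status fin_user_error (int *obj) { *obj = 0; return FIN_USER_ERROR; }

/* Finalize all N objects, continuing past failures.  Whatever status an
   individual finalizer reported, the caller only ever sees
   FIN_PROGRAM_ERROR, analogous to calling Raise_From_Controlled_Operation
   instead of Reraise_Occurrence.  */
static enum fin_status
finalize_all (int *objs, const finalizer *fins, size_t n)
{
  int failed = 0;   /* plays the role of Finalization_Exception_Raised */

  assert (n == 0 || (objs != NULL && fins != NULL));
  for (size_t i = 0; i < n; i++)
    if (fins[i] (&objs[i]) != FIN_OK)
      failed = 1;   /* remember the failure, keep finalizing the rest */

  return failed ? FIN_PROGRAM_ERROR : FIN_OK;
}
```

The design point is that the specific failure is deliberately hidden from the caller: one uniform error is reported, and no object is skipped because an earlier one failed.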



[COMMITTED 28/35] ada: Fix standalone Windows builds of adaint.c

2024-05-16 Thread Marc Poulhiès
From: Sebastian Poeplau 

Define PATH_SEPARATOR and HOST_EXECUTABLE_SUFFIX in standalone MinGW
builds; the definitions normally come from GCC, and the defaults don't
work for native Windows.

gcc/ada/

* adaint.c: New defines for STANDALONE mode.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/adaint.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
index 74aa3c4128e..f26d69a1a2a 100644
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -242,6 +242,13 @@ UINT __gnat_current_ccs_encoding;
 #undef DIR_SEPARATOR
 #define DIR_SEPARATOR '\\'
 
+#ifdef STANDALONE
+#undef PATH_SEPARATOR
+#define PATH_SEPARATOR ';'
+#undef HOST_EXECUTABLE_SUFFIX
+#define HOST_EXECUTABLE_SUFFIX ".exe"
+#endif
+
 #else
 #include 
 #include 
-- 
2.43.2



[COMMITTED 22/35] ada: No need to follow New_Occurrence_Of with Set_Etype

2024-05-16 Thread Marc Poulhiès
From: Piotr Trojanek 

Routine New_Occurrence_Of itself sets the Etype of its result; there is
no need to set it explicitly afterwards.

Code cleanup related to fix for attribute 'Old; semantics is unaffected.

gcc/ada/

* exp_ch13.adb (Expand_N_Free_Statement): After analysis, the
new temporary has the type of its Object_Definition and the new
occurrence of this temporary has this type as well; simplify.
* sem_util.adb
(Indirect_Temp_Value): Remove redundant call to Set_Etype;
simplify.
(Is_Access_Type_For_Indirect_Temp): Add missing body header.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch13.adb |  9 ++---
 gcc/ada/sem_util.adb | 11 +++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/ada/exp_ch13.adb b/gcc/ada/exp_ch13.adb
index 2d5ee9b6e80..af8c925586c 100644
--- a/gcc/ada/exp_ch13.adb
+++ b/gcc/ada/exp_ch13.adb
@@ -358,21 +358,16 @@ package body Exp_Ch13 is
  declare
 Expr_Typ : constant Entity_Id  := Etype (Expr);
 Loc  : constant Source_Ptr := Sloc (N);
-New_Expr : Node_Id;
-Temp_Id  : Entity_Id;
+Temp_Id  : constant Entity_Id  := Make_Temporary (Loc, 'T');
 
  begin
-Temp_Id := Make_Temporary (Loc, 'T');
 Insert_Action (N,
   Make_Object_Declaration (Loc,
 Defining_Identifier => Temp_Id,
 Object_Definition   => New_Occurrence_Of (Expr_Typ, Loc),
 Expression  => Relocate_Node (Expr)));
 
-New_Expr := New_Occurrence_Of (Temp_Id, Loc);
-Set_Etype (New_Expr, Expr_Typ);
-
-Set_Expression (N, New_Expr);
+Set_Expression (N, New_Occurrence_Of (Temp_Id, Loc));
  end;
   end if;
 
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 766cabfc109..5ebb1319de7 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -31081,8 +31081,7 @@ package body Sem_Util is
  begin
 if Is_Anonymous_Access_Type (Typ) then
--  No indirection in this case; just evaluate the temp.
-   Result := New_Occurrence_Of (Temp, Loc);
-   Set_Etype (Result, Etype (Temp));
+   return New_Occurrence_Of (Temp, Loc);
 
 else
Result := Make_Explicit_Dereference (Loc,
@@ -31101,11 +31100,15 @@ package body Sem_Util is
 
   Set_Etype (Result, Typ);
end if;
-end if;
 
-return Result;
+   return Result;
+end if;
  end Indirect_Temp_Value;
 
+ --
+ -- Is_Access_Type_For_Indirect_Temp --
+ --
+
  function Is_Access_Type_For_Indirect_Temp
(T : Entity_Id) return Boolean is
  begin
-- 
2.43.2



[COMMITTED 30/35] ada: Fix reference to RM clause in comment

2024-05-16 Thread Marc Poulhiès
From: Ronan Desplanques 

gcc/ada/

* sem_util.ads (Check_Function_Writable_Actuals): Fix comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.ads | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
index 527b1075c3f..99c60ddf708 100644
--- a/gcc/ada/sem_util.ads
+++ b/gcc/ada/sem_util.ads
@@ -373,7 +373,7 @@ package Sem_Util is
--  call C2 (not including the construct N itself), there is no other name
--  anywhere within a direct constituent of the construct C other than
--  the one containing C2, that is known to refer to the same object (RM
-   --  6.4.1(6.17/3)).
+   --  6.4.1(6.18-6.19)).
 
procedure Check_Implicit_Dereference (N : Node_Id; Typ : Entity_Id);
--  AI05-139-2: Accessors and iterators for containers. This procedure
-- 
2.43.2



[COMMITTED 29/35] ada: Fix missing length checks with case expressions

2024-05-16 Thread Marc Poulhiès
From: Ronan Desplanques 

This fixes an issue where length checks were not generated when the
right-hand side of an assignment involved a case expression.

gcc/ada/

* sem_res.adb (Resolve_Case_Expression): Add length check
insertion.
* exp_ch4.adb (Expand_N_Case_Expression): Add handling of nodes
known to raise Constraint_Error.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 18 ++
 gcc/ada/sem_res.adb |  3 +++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 7a2003691ec..448cd5c82b6 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -5098,10 +5098,20 @@ package body Exp_Ch4 is
 
 else
if not Is_Copy_Type (Typ) then
-  Alt_Expr :=
-Make_Attribute_Reference (Alt_Loc,
-  Prefix => Relocate_Node (Alt_Expr),
-  Attribute_Name => Name_Unrestricted_Access);
+  --  It's possible that a call to Apply_Length_Check in
+  --  Resolve_Case_Expression rewrote the dependent expression
+  --  into a N_Raise_Constraint_Error. If that's the case, we
+  --  don't create a reference to Unrestricted_Access, but we
+  --  update the type of the N_Raise_Constraint_Error node.
+
+  if Nkind (Alt_Expr) in N_Raise_Constraint_Error then
+ Set_Etype (Alt_Expr, Target_Typ);
+  else
+ Alt_Expr :=
+   Make_Attribute_Reference (Alt_Loc,
+ Prefix => Relocate_Node (Alt_Expr),
+ Attribute_Name => Name_Unrestricted_Access);
+  end if;
end if;
 
LHS := New_Occurrence_Of (Target, Loc);
diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index 85795ba3a05..d2eca7c5459 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -7438,6 +7438,9 @@ package body Sem_Res is
  if Is_Scalar_Type (Alt_Typ) and then Alt_Typ /= Typ then
 Rewrite (Alt_Expr, Convert_To (Typ, Alt_Expr));
 Analyze_And_Resolve (Alt_Expr, Typ);
+
+ elsif Is_Array_Type (Typ) then
+Apply_Length_Check (Alt_Expr, Typ);
  end if;
 
  Next (Alt);
-- 
2.43.2



[COMMITTED 32/35] ada: Exception on Indefinite_Vector aggregate with loop_parameter_specification

2024-05-16 Thread Marc Poulhiès
From: Gary Dismukes 

Constraint_Error is raised on evaluation of a container aggregate with
a loop_parameter_specification for the type Indefinite_Vector. This
happens due to the Aggregate aspect for type Indefinite_Vector specifying
the Empty_Vector constant for the type's Empty operation rather than
using the type's primitive Empty function. This problem shows up as
a recent regression relative to earlier compilers, evidently due to
recent fixes in the container aggregate area, which uncovered this
issue of the wrong specification in Ada.Containers.Indefinite_Vectors.
The compiler incorrectly initializes the aggregate object using the
Empty_Vector constant rather than invoking the New_Vector function
to allocate the vector object with the appropriate number of elements,
and subsequent calls to Replace_Element fail because the vector object
is empty.

In addition to correcting the Indefinite_Vectors generic package,
checking is added to give an error for an attempt to specify the
Empty operation as a constant rather than a function. (Also note
that another AdaCore package that needs a similar correction is
the VSS.Vector_Strings package.)

gcc/ada/

* libgnat/a-coinve.ads (type Vector): In the Aggregate aspect for
this type, the Empty operation is changed to denote the Empty
function, rather than the Empty_Vector constant.
* exp_aggr.adb (Expand_Container_Aggregate): Remove code for
handling the case where the Empty_Subp denotes a constant object,
which should never happen (and add an assertion that Empty_Subp
must denote a function).
* sem_ch13.adb (Valid_Empty): No longer allow the entity to be an
E_Constant, and require the (optional) parameter of an Empty
function to be of a signed integer type (rather than any integer
type).

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 24 +---
 gcc/ada/libgnat/a-coinve.ads |  2 +-
 gcc/ada/sem_ch13.adb |  5 +
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index f04dba719d9..5d2b334722a 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -7119,10 +7119,12 @@ package body Exp_Aggr is
  Append (Init_Stat, Aggr_Code);
 
   --  The container will grow dynamically. Create a declaration for
-  --  the object, and initialize it either from a call to the Empty
-  --  function, or from the Empty constant.
+  --  the object, and initialize it from a call to the parameterless
+  --  Empty function.
 
   else
+ pragma Assert (Ekind (Entity (Empty_Subp)) = E_Function);
+
  Decl :=
Make_Object_Declaration (Loc,
  Defining_Identifier => Temp,
@@ -7130,20 +7132,12 @@ package body Exp_Aggr is
 
  Insert_Action (N, Decl);
 
- --  The Empty entity is either a parameterless function, or
- --  a constant.
-
- if Ekind (Entity (Empty_Subp)) = E_Function then
-Init_Stat := Make_Assignment_Statement (Loc,
-  Name => New_Occurrence_Of (Temp, Loc),
-  Expression => Make_Function_Call (Loc,
-Name => New_Occurrence_Of (Entity (Empty_Subp), Loc)));
+ --  The Empty entity is a parameterless function
 
- else
-Init_Stat := Make_Assignment_Statement (Loc,
-  Name => New_Occurrence_Of (Temp, Loc),
-  Expression => New_Occurrence_Of (Entity (Empty_Subp), Loc));
- end if;
+ Init_Stat := Make_Assignment_Statement (Loc,
+   Name => New_Occurrence_Of (Temp, Loc),
+   Expression => Make_Function_Call (Loc,
+ Name => New_Occurrence_Of (Entity (Empty_Subp), Loc)));
 
  Append (Init_Stat, Aggr_Code);
   end if;
diff --git a/gcc/ada/libgnat/a-coinve.ads b/gcc/ada/libgnat/a-coinve.ads
index 138ec3641c3..c51ec8aa06d 100644
--- a/gcc/ada/libgnat/a-coinve.ads
+++ b/gcc/ada/libgnat/a-coinve.ads
@@ -63,7 +63,7 @@ is
  Variable_Indexing => Reference,
  Default_Iterator  => Iterate,
  Iterator_Element  => Element_Type,
- Aggregate => (Empty  => Empty_Vector,
+ Aggregate => (Empty  => Empty,
Add_Unnamed=> Append,
New_Indexed=> New_Vector,
Assign_Indexed => Replace_Element);
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 00392ae88eb..13bf93ca548 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -16527,13 +16527,10 @@ package body Sem_Ch13 is
  if Etype (E) /= Typ or else Scope (E) /= Scope (Typ) then
 return False;
 
- elsif Ekind (E) = E_Constant then
-return True;
-
  elsif Ekind (E) = E_Function then
 return No (First_Formal (E))
   or else
-(Is_Integer_Type (Etype

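The failure mode described in this patch (starting an indexed aggregate from an empty vector, then calling Replace_Element on indices that do not exist yet) can be reproduced with a toy C vector. This is a hypothetical model only: struct vec, vec_empty, vec_new, vec_replace and build_indexed loosely mirror Empty_Vector, New_Vector and Replace_Element from the description, and a zero return stands in for Constraint_Error.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy vector; all names are hypothetical.  */
struct vec { int *data; size_t length; };

/* Like the Empty_Vector constant: no elements.  */
static struct vec vec_empty (void) { struct vec v = { NULL, 0 }; return v; }

/* Like New_Vector (n): n zero-initialized elements.  */
static int
vec_new (struct vec *v, size_t n)
{
  v->data = calloc (n ? n : 1, sizeof *v->data);
  v->length = v->data ? n : 0;
  return v->data != NULL;
}

/* Like Replace_Element: only valid for an existing index.  Returns 0
   where the Ada version would raise Constraint_Error.  */
static int
vec_replace (struct vec *v, size_t i, int value)
{
  assert (v != NULL);
  if (i >= v->length)
    return 0;
  v->data[i] = value;
  return 1;
}

static int square (size_t i) { return (int) (i * i); }

/* Build [f (0), ..., f (n - 1)] by replacing elements in place.  With
   USE_NEW the n slots are allocated first; starting from vec_empty ()
   instead makes the very first replacement fail.  */
static int
build_indexed (struct vec *v, size_t n, int (*f) (size_t), int use_new)
{
  if (use_new)
    {
      if (!vec_new (v, n))
        return 0;
    }
  else
    *v = vec_empty ();

  for (size_t i = 0; i < n; i++)
    if (!vec_replace (v, i, f (i)))
      return 0;
  return 1;
}
```

The point of the sketch is that replacement semantics require the storage to exist up front, which is why the initial object must come from a sized allocation rather than from an empty constant.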
[COMMITTED 33/35] ada: Redundant validity checks

2024-05-16 Thread Marc Poulhiès
From: Steve Baird 

In some cases with validity checking enabled via the -gnatVa option,
the compiler generates validity checks that can (obviously) never fail.
These include validity checks for (some) static expressions, and consecutive
identical checks generated for a single read of an object.

gcc/ada/

* checks.adb (Expr_Known_Valid): Return True for a static expression.
* exp_util.adb (Adjust_Condition): No validity check needed for a
condition if it is an expression for which a validity check has
already been generated.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb   | 3 +++
 gcc/ada/exp_util.adb | 5 +
 2 files changed, 8 insertions(+)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 6af392eeda8..bada3dffcbf 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -6839,6 +6839,9 @@ package body Checks is
   then
  return True;
 
+  elsif Is_Static_Expression (Expr) then
+ return True;
+
   --  If the expression is the value of an object that is known to be
   --  valid, then clearly the expression value itself is valid.
 
diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 057cf3ebc48..b71f7739481 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -416,6 +416,11 @@ package body Exp_Util is
  if Validity_Checks_On
and then
  (Validity_Check_Tests or else Is_Hardbool_Type (T))
+
+   --  no check needed here if validity has already been checked
+   and then not
+ (Validity_Check_Operands and then
+   (Nkind (N) in N_Op or else Nkind (Parent (N)) in N_Op))
  then
 Ensure_Valid (N);
  end if;
-- 
2.43.2



[COMMITTED 34/35] ada: Reset scope of top level object declaration during unnesting

2024-05-16 Thread Marc Poulhiès
When unnesting, the compiler gathers elaboration code and wraps it with
a new dedicated procedure. While doing so, it resets the scopes of
the wrapped entities to point to this new procedure. This change
also resets the scopes of entities declared by N_Object_Declaration and
N_Object_Renaming_Declaration nodes, but only when an elaboration
procedure is needed.

gcc/ada/

* exp_ch7.adb (Reset_Scopes_To_Block_Elab_Proc): also reset scope
for object declarations.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 6d76572f405..f9738e115f9 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -3646,9 +3646,10 @@ package body Exp_Ch7 is
   --  unnesting actions, which depend on proper setting of the Scope links
   --  to determine the nesting level of each subprogram.
 
-  ---
-  --  Find_Local_Scope --
-  ---
+  --
+  --  Reset_Scopes_To_Block_Elab_Proc --
+  --
+  Maybe_Reset_Scopes_For_Decl : constant Elist_Id := New_Elmt_List;
 
   procedure Reset_Scopes_To_Block_Elab_Proc (L : List_Id) is
  Id   : Entity_Id;
@@ -3707,7 +3708,8 @@ package body Exp_Ch7 is
  Next (Node);
   end loop;
 
-   --  Reset the Scope of a subprogram occurring at the top level
+   --  Reset the Scope of a subprogram and object declaration
+   --  occurring at the top level
 
when N_Subprogram_Body =>
   Id := Defining_Entity (Stat);
@@ -3715,12 +3717,33 @@ package body Exp_Ch7 is
   Set_Block_Elab_Proc;
   Set_Scope (Id, Block_Elab_Proc);
 
+   when N_Object_Declaration
+ | N_Object_Renaming_Declaration =>
+  Id := Defining_Entity (Stat);
+  if No (Block_Elab_Proc) then
+ Append_Elmt (Id, Maybe_Reset_Scopes_For_Decl);
+  else
+ Set_Scope (Id, Block_Elab_Proc);
+  end if;
+
when others =>
   null;
 end case;
 
 Next (Stat);
  end loop;
+
+ --  If we are creating an Elab procedure, move all the gathered
+ --  declarations in its scope.
+
+ if Present (Block_Elab_Proc) then
+while not Is_Empty_Elmt_List (Maybe_Reset_Scopes_For_Decl) loop
+   Set_Scope
+ (Elists.Node
+   (Last_Elmt (Maybe_Reset_Scopes_For_Decl)), Block_Elab_Proc);
+   Remove_Last_Elmt (Maybe_Reset_Scopes_For_Decl);
+end loop;
+ end if;
   end Reset_Scopes_To_Block_Elab_Proc;
 
   --  Local variables
-- 
2.43.2
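
The deferred bookkeeping added to Reset_Scopes_To_Block_Elab_Proc (park object entities until an elaboration procedure exists, then move them all into it) can be sketched in C. This is a simplified model under stated assumptions: the entity struct, the fixed-size pending array and every function name are hypothetical, and only Scope links are tracked, not the rest of the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiler entity: only the Scope link.  */
struct entity { const char *name; struct entity *scope; };

#define MAX_PENDING 64

/* Objects seen before we know whether an elaboration procedure is needed
   are parked in PENDING (cf. Maybe_Reset_Scopes_For_Decl).  */
struct scope_fixer
{
  struct entity *elab_proc;             /* NULL until created */
  struct entity *pending[MAX_PENDING];
  size_t n_pending;
};

/* Object declaration at the top level: reset now if the procedure
   exists, otherwise defer the decision.  */
static void
note_object_decl (struct scope_fixer *f, struct entity *obj)
{
  if (f->elab_proc != NULL)
    obj->scope = f->elab_proc;
  else
    {
      assert (f->n_pending < MAX_PENDING);
      f->pending[f->n_pending++] = obj;
    }
}

/* Subprogram body at the top level: this forces creation of the
   elaboration procedure (cf. Set_Block_Elab_Proc).  */
static void
note_subprogram_body (struct scope_fixer *f, struct entity *sub,
                      struct entity *new_proc)
{
  if (f->elab_proc == NULL)
    f->elab_proc = new_proc;
  sub->scope = f->elab_proc;
}

/* After the walk: if a procedure was created, move the parked objects
   into it; otherwise leave their scopes untouched.  */
static void
finish_scope_fixup (struct scope_fixer *f)
{
  if (f->elab_proc == NULL)
    return;
  while (f->n_pending > 0)
    f->pending[--f->n_pending]->scope = f->elab_proc;
}
```

Deferring the decision avoids reparenting objects into a procedure that may never be created, while still catching declarations that precede the statement that triggers its creation.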



[COMMITTED 35/35] ada: Remove obsolete reference in comment

2024-05-16 Thread Marc Poulhiès
From: Eric Botcazou 

gcc/ada/

* exp_ch7.adb (Attach_Object_To_Master_Node): Remove reference to a
transient object in comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index f9738e115f9..993c13c7318 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -798,10 +798,10 @@ package body Exp_Ch7 is
  return;
   end if;
 
-  --  When the transient object is initialized by an aggregate, the
-  --  attachment must occur after the last aggregate assignment takes
-  --  place. Only then is the object considered initialized. Likewise
-  --  if we have a build-in-place call: we must attach only after it.
+  --  When the object is initialized by an aggregate, the attachment must
+  --  occur after the last aggregate assignment takes place; only then is
+  --  the object considered initialized. Likewise if it is initialized by
+  --  a build-in-place call: we must attach only after the call.
 
   if Ekind (Obj_Id) in E_Constant | E_Variable then
  if Present (Last_Aggregate_Assignment (Obj_Id)) then
-- 
2.43.2



RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Li, Pan2
> OK.

Thanks Richard for the help and coaching. To confirm: is your OK for this
patch only, or for the whole SAT middle-end patch series?
Thanks again for reviewing and suggestions.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Thursday, May 16, 2024 4:10 PM
To: Li, Pan2 
Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
>
> > LGTM but you'll need an OK from Richard,
> > Thanks for working on this!
>
> Thanks Tamar for the help and coaching; let's wait for Richard for a while 😊!

OK.

Thanks for the patience,
Richard.

> Pan
>
> -----Original Message-----
> From: Tamar Christina 
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> Liu, Hongtao 
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> Hi Pan,
>
> Thanks!
>
> > -----Original Message-----
> > From: pan2...@intel.com 
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > scalar
> > int
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for the saturating
> > add, i.e. the result of the add is set to the maximum value on
> > overflow.  It matches a pattern similar to the one below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Taking uint8_t as an example, we have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Consider the following example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > The following tests pass with this patch:
> > 1. The riscv full regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The x86 full regression tests.
> >
> >   PR target/51492
> >   PR target/112600
> >
> > gcc/ChangeLog:
> >
> >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> >   to the return true switch case(s).
> >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> >   * match.pd: Add unsigned SAT_ADD match(es).
> >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> >   us/ssadd.
> >   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> >   extern func decl generated in match.pd match.
> >   (match_saturation_arith): New func impl to match the saturation arith.
> >   (math_opts_dom_walker::after_dom_children): Try match saturation
> >   arith when IOR expr.
> >
>
>  LGTM but you'll need an OK from Richard,
>
> Thanks for working on this!
>
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/internal-fn.cc|  1 +
> >  gcc/internal-fn.def   |  2 ++
> >  gcc/match.pd  | 51 +++
> >  gcc/optabs.def|  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >  case IFN_UBSAN_CHECK_MUL:
> >  case IFN_ADD_OVERFLOW:
> >  case IFN_MUL_OVERFLOW:
> > +case IFN_SAT_ADD:
> >  case IFN_VEC_WIDEN_PLUS:
> >  case IFN_VEC_WIDEN_PLUS_LO:
> >  case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED

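The unsigned saturating-add idiom quoted in the SAT_ADD patch above can be checked at the source level in plain C. This is a minimal sketch, not part of the patch: the helper names sat_add_u8, sat_add_u64 and check_sat_add_examples are hypothetical, and the block only evaluates the C idiom that the new match.pd patterns are meant to recognize; it does not exercise the .SAT_ADD internal function itself.

```c
#include <assert.h>
#include <stdint.h>

/* The idiom (x + y) | -(TYPE)((TYPE)(x + y) < x) for uint8_t.  The wrapped
   sum is smaller than x exactly when the add overflowed; the comparison
   then yields 1, and negating it in the unsigned type gives an all-ones
   mask that forces the result to the maximum value.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);            /* wraps modulo 256 */
  return (uint8_t) (sum | (uint8_t) -(uint8_t) (sum < x));
}

/* The same idiom for uint64_t, as in the sat_add_u64 example.  */
static uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;                       /* wraps modulo 2^64 */
  return sum | -(uint64_t) (sum < x);
}

/* The uint8_t cases listed in the patch description, plus one case
   without overflow.  */
static void
check_sat_add_examples (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
  assert (sat_add_u8 (3, 4) == 7);
}
```

The branchless form is what makes the pattern attractive to match: the comparison and negation compile to a mask rather than a conditional jump, and recognizing the whole expression lets targets with a native saturating-add instruction replace it with a single operation.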
Re: [PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-16 Thread Kewen.Lin
Hi,

on 2024/5/16 12:08, Andrew Pinski wrote:
> 
> On Thu, May 16, 2024, 4:09 AM Kewen.Lin  > wrote:
> 
> Hi,
> 
> As the associated test case in PR114846 shows, when
> eh_return is involved, restoring the EH RETURN DATA
> registers in the epilogue can currently clobber the
> register holding the return value.  Following the existing
> handling in some other targets, this patch makes the
> eh_return expander call a new define_insn_and_split,
> eh_return_internal, which directly calls rs6000_emit_epilogue
> with epilogue_type EPILOGUE_TYPE_EH_RETURN, instead of the
> previous approach of treating a normal return specially
> when crtl->calls_eh_return is set.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
> 
> I'm going to push this next week if no objections.
> 
> 
> 
> Thanks for fixing this for powerpc. I hope my patch for aarch64 gets reviewed 
> soon and it will contain many more testcases. Hopefully someone will fix the 
> arm target too.
> 

Looking forward to that!  Thanks for contributing those new eh-return c-torture
test cases; I just tested all of them on LE and they all passed. :)

BR,
Kewen

> Thanks,
> Andrew
> 
> 
> 
> BR,
> Kewen
> -
>         PR target/114846
> 
> gcc/ChangeLog:
> 
>         * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
>         EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
>         now, adjust the relevant handlings on it.
>         * config/rs6000/rs6000.md (eh_return expander): Append by calling
>         gen_eh_return_internal and emit_barrier.
>         (eh_return_internal): New define_insn_and_split, call function
>         rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/powerpc/pr114846.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc           |  7 +++
>  gcc/config/rs6000/rs6000.md                 | 15 +++
>  gcc/testsuite/gcc.target/powerpc/pr114846.c | 20 
>  3 files changed, 38 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114846.c
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..bd5d56ba002 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4308,9 +4308,6 @@ rs6000_emit_epilogue (enum epilogue_type 
> epilogue_type)
> 
>    rs6000_stack_t *info = rs6000_stack_info ();
> 
> -  if (epilogue_type == EPILOGUE_TYPE_NORMAL && crtl->calls_eh_return)
> -    epilogue_type = EPILOGUE_TYPE_EH_RETURN;
> -
>    int strategy = info->savres_strategy;
>    bool using_load_multiple = !!(strategy & REST_MULTIPLE);
>    bool restoring_GPRs_inline = !!(strategy & REST_INLINE_GPRS);
> @@ -4788,7 +4785,9 @@ rs6000_emit_epilogue (enum epilogue_type 
> epilogue_type)
> 
>    /* In the ELFv2 ABI we need to restore all call-saved CR fields from
>       *separate* slots if the routine calls __builtin_eh_return, so
> -     that they can be independently restored by the unwinder.  */
> +     that they can be independently restored by the unwinder.  Since
> +     it is for CR fields restoring, it should be done for any epilogue
> +     types (not EPILOGUE_TYPE_EH_RETURN specific).  */
>    if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
>      {
>        int i, cr_off = info->ehcr_offset;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ac5651d7420..d4120c3b9ce 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -14281,6 +14281,8 @@ (define_expand "eh_return"
>    ""
>  {
>    emit_insn (gen_eh_set_lr (Pmode, operands[0]));
> +  emit_jump_insn (gen_eh_return_internal ());
> +  emit_barrier ();
>    DONE;
>  })
> 
> @@ -14297,6 +14299,19 @@ (define_insn_and_split "@eh_set_lr_"
>    DONE;
>  })
> 
> +(define_insn_and_split "eh_return_internal"
> +  [(eh_return)]
> +  ""
> +  "#"
> +  "epilogue_completed"
> +  [(const_int 0)]
> +{
> +  if (!TARGET_SCHED_PROLOG)
> +    emit_insn (gen_blockage ());
> +  rs6000_emit_epilogue (EPILOGUE_TYPE_EH_RETURN);
> +  DONE;
> +})
> +
>  (define_insn "prefetch"
>    [(prefetch (match_operand 0 "indexed_or_indirect_address" "a")
>              (match_operand:SI 1 "const_int_operand" "n")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114846.c 
> b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> new file mode 100644
> index 000..efe2300b73a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +/* { dg-require-e

[COMMITTED] Cleanup prange sanity checks.

2024-05-16 Thread Aldy Hernandez
The pointers_handled_p() code was a temporary sanity check, and not
even a good one, since we have a cleaner way of checking type
mismatches with operand_check_p.  This patch removes all the code, and
adds an explicit type check for relational operators, which are the
main problem in PR114985.

Adding this check makes it clear where the type mismatch is happening
in IPA, even without prange.  I've added code to skip the range
folding if the types don't match what the operator expects.  In order
to reproduce the latent bug, just remove the operand_check_p calls.

Tested on x86-64 and ppc64le with and without prange support.

gcc/ChangeLog:

PR tree-optimization/114985
* gimple-range-op.cc: Remove pointers_handled_p.
* ipa-cp.cc (ipa_value_range_from_jfunc): Skip range folding if
operands don't match.
(propagate_vr_across_jump_function): Same.
* range-op-mixed.h: Remove pointers_handled_p and tweak
operand_check_p.
* range-op-ptr.cc (range_operator::pointers_handled_p): Remove.
(pointer_plus_operator::pointers_handled_p): Remove.
(class operator_pointer_diff): Remove pointers_handled_p.
(operator_pointer_diff::pointers_handled_p): Remove.
(operator_identity::pointers_handled_p): Remove.
(operator_cst::pointers_handled_p): Remove.
(operator_cast::pointers_handled_p): Remove.
(operator_min::pointers_handled_p): Remove.
(operator_max::pointers_handled_p): Remove.
(operator_addr_expr::pointers_handled_p): Remove.
(operator_bitwise_and::pointers_handled_p): Remove.
(operator_bitwise_or::pointers_handled_p): Remove.
(operator_equal::pointers_handled_p): Remove.
(operator_not_equal::pointers_handled_p): Remove.
(operator_lt::pointers_handled_p): Remove.
(operator_le::pointers_handled_p): Remove.
(operator_gt::pointers_handled_p): Remove.
(operator_ge::pointers_handled_p): Remove.
* range-op.cc (TRAP_ON_UNHANDLED_POINTER_OPERATORS): Remove.
(range_op_handler::lhs_op1_relation): Remove pointers_handled_p checks.
(range_op_handler::lhs_op2_relation): Same.
(range_op_handler::op1_op2_relation): Same.
* range-op.h: Remove RO_* declarations.
---
 gcc/gimple-range-op.cc |  24 
 gcc/ipa-cp.cc  |  12 ++
 gcc/range-op-mixed.h   |  38 ++
 gcc/range-op-ptr.cc| 259 -
 gcc/range-op.cc|  43 +--
 gcc/range-op.h |  17 ---
 6 files changed, 25 insertions(+), 368 deletions(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..7321342b00d 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -329,19 +329,6 @@ public:
 r = lhs;
 return true;
   }
-  virtual bool pointers_handled_p (range_op_dispatch_type type,
-  unsigned dispatch) const
-  {
-switch (type)
-  {
-  case DISPATCH_FOLD_RANGE:
-   return dispatch == RO_PPP;
-  case DISPATCH_OP1_RANGE:
-   return dispatch == RO_PPP;
-  default:
-   return true;
-  }
-  }
 } op_cfn_pass_through_arg1;
 
 // Implement range operator for CFN_BUILT_IN_SIGNBIT.
@@ -1132,17 +1119,6 @@ public:
 r.set (type, wi::zero (TYPE_PRECISION (type)), max - 2);
 return true;
   }
-  virtual bool pointers_handled_p (range_op_dispatch_type type,
-  unsigned dispatch) const
-  {
-switch (type)
-  {
-  case DISPATCH_FOLD_RANGE:
-   return dispatch == RO_IPI;
-  default:
-   return true;
-  }
-  }
 } op_cfn_strlen;
 
 
diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 5781f50c854..09cab761822 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1740,6 +1740,11 @@ ipa_value_range_from_jfunc (vrange &vr,
 
  if (!handler
  || !op_res.supports_type_p (vr_type)
+ /* Sometimes we try to fold comparison operators using a
+pointer type to hold the result instead of a boolean
+type.  Avoid trapping in the sanity check in
+fold_range until this is fixed.  */
+ || !handler.operand_check_p (vr_type, srcvr.type (), op_vr.type ())
  || !handler.fold_range (op_res, vr_type, srcvr, op_vr))
op_res.set_varying (vr_type);
 
@@ -2547,6 +2552,13 @@ propagate_vr_across_jump_function (cgraph_edge *cs, ipa_jump_func *jfunc,
 
  if (!handler
  || !ipa_supports_p (operand_type)
+ /* Sometimes we try to fold comparison operators using a
+pointer type to hold the result instead of a boolean
+type.  Avoid trapping in the sanity check in
+fold_range until this is fixed.  */
+ || !handler.operand_check_p (operand_type,
+  src_lats->m_value_range.m_vr.type (),
+  

[COMMITTED] Use a boolean type when folding conditionals in simplify_using_ranges.

2024-05-16 Thread Aldy Hernandez
In adding some traps for PR114985 I noticed that the conditional
folding code in simplify_using_ranges was using the wrong type.  This
cleans up the oversight.

gcc/ChangeLog:

PR tree-optimization/114985
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Use
boolean type when folding conditionals.
---
 gcc/vr-values.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 0572bf6c8c7..e6ea9592574 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -316,10 +316,9 @@ simplify_using_ranges::fold_cond_with_ops (enum tree_code code,
   || !query->range_of_expr (r1, op1, s))
 return NULL_TREE;
 
-  tree type = TREE_TYPE (op0);
   int_range<1> res;
   range_op_handler handler (code);
-  if (handler && handler.fold_range (res, type, r0, r1))
+  if (handler && handler.fold_range (res, boolean_type_node, r0, r1))
 {
   if (res == range_true ())
return boolean_true_node;
-- 
2.45.0



[COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Aldy Hernandez
This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables prange
support again.
---
 gcc/gimple-range-cache.cc |  4 ++--
 gcc/gimple-range-fold.cc  |  4 ++--
 gcc/gimple-range-fold.h   |  2 +-
 gcc/gimple-range-infer.cc |  2 +-
 gcc/gimple-range-op.cc|  2 +-
 gcc/gimple-range-path.cc  |  2 +-
 gcc/gimple-ssa-warn-access.cc |  2 +-
 gcc/ipa-cp.h  |  2 +-
 gcc/range-op-ptr.cc   |  4 
 gcc/range-op.cc   | 18 --
 gcc/tree-ssa-structalias.cc   |  2 +-
 gcc/value-range.cc|  1 +
 gcc/value-range.h |  4 ++--
 13 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 72ac2552311..bdd2832873a 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -274,10 +274,10 @@ sbr_sparse_bitmap::sbr_sparse_bitmap (tree t, vrange_allocator *allocator,
   // Pre-cache zero and non-zero values for pointers.
   if (POINTER_TYPE_P (t))
 {
-  int_range<2> nonzero;
+  prange nonzero;
   nonzero.set_nonzero (t);
   m_range[1] = m_range_allocator->clone (nonzero);
-  int_range<2> zero;
+  prange zero;
   zero.set_zero (t);
   m_range[2] = m_range_allocator->clone (zero);
 }
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 9c4ad1ee7b9..a9c8c4d03e6 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -597,7 +597,7 @@ fold_using_range::fold_stmt (vrange &r, gimple *s, fur_source &src, tree name)
   // Process addresses.
   if (gimple_code (s) == GIMPLE_ASSIGN
   && gimple_assign_rhs_code (s) == ADDR_EXPR)
-return range_of_address (as_a  (r), s, src);
+return range_of_address (as_a  (r), s, src);
 
   gimple_range_op_handler handler (s);
   if (handler)
@@ -757,7 +757,7 @@ fold_using_range::range_of_range_op (vrange &r,
 // If a range cannot be calculated, set it to VARYING and return true.
 
 bool
-fold_using_range::range_of_address (irange &r, gimple *stmt, fur_source &src)
+fold_using_range::range_of_address (prange &r, gimple *stmt, fur_source &src)
 {
   gcc_checking_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
   gcc_checking_assert (gimple_assign_rhs_code (stmt) == ADDR_EXPR);
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index 7cbe15d05e5..c7c599bfc93 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -157,7 +157,7 @@ protected:
  fur_source &src);
   bool range_of_call (vrange &r, gcall *call, fur_source &src);
   bool range_of_cond_expr (vrange &r, gassign* cond, fur_source &src);
-  bool range_of_address (irange &r, gimple *s, fur_source &src);
+  bool range_of_address (prange &r, gimple *s, fur_source &src);
   bool range_of_phi (vrange &r, gphi *phi, fur_source &src);
   void range_of_ssa_name_with_loop_info (vrange &, tree, class loop *, gphi *,
 fur_source &src);
diff --git a/gcc/gimple-range-infer.cc b/gcc/gimple-range-infer.cc
index c8e8b9b60ac..d5e1aa14275 100644
--- a/gcc/gimple-range-infer.cc
+++ b/gcc/gimple-range-infer.cc
@@ -123,7 +123,7 @@ gimple_infer_range::add_nonzero (tree name)
 {
   if (!gimple_range_ssa_p (name))
 return;
-  int_range<2> nz;
+  prange nz;
   nz.set_nonzero (TREE_TYPE (name));
   add_range (name, nz);
 }
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 7321342b00d..aec3f39ec0e 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1107,7 +1107,7 @@ class cfn_strlen : public range_operator
 {
 public:
   using range_operator::fold_range;
-  virtual bool fold_range (irange &r, tree type, const irange &,
+  virtual bool fold_range (irange &r, tree type, const prange &,
   const irange &, relation_trio) const
   {
 wide_int max = irange_val_max (ptrdiff_type_node);
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 96c6ac6b6a5..f1a12f76144 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -443,7 +443,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
 void
 path_range_query::adjust_for_non_null_uses (basic_block bb)
 {
-  int_range_max r;
+  prange r;
   bitmap_iterator bi;
   unsigned i;
 
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 2c10d19e7f3..0cd5b6d6ef4 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -4213,7 +4213,7 @@ pass_waccess::check_pointer_uses (gimple *stmt, tree ptr,
 where the realloc call is known to have failed are valid.
 Ignore pointers that nothing is known about.  Those could
 have escaped along with their nullness.  */
- value_range vr;
+ prange vr;
  if (m_ptr_qry.rvals->range_of_expr (vr, realloc_lhs, use_stmt))
{
  if (vr.zero_p ())
diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h
in

Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Jakub Jelinek
On Thu, May 16, 2024 at 12:01:01PM +0200, Aldy Hernandez wrote:
> This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables prange
> support again.

Please don't do this.
This breaks ChangeLog generation, will need to handle it tomorrow by hand again.
Both the amendments to the git (cherry-pick -x or revert) added message
lines
This reverts commit COMMITHASH.
and
(cherry picked from commit COMMITHASH)
and revert of revert.

Jakub



Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Aldy Hernandez
Wait, what's the preferred way of reverting a patch?  I followed what I saw in:

commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
Author: Jeff Law 
Date:   Mon May 13 21:42:38 2024 -0600

Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"

This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

and here:

commit 0c6dd4b0973738ce43e76b468a002ab5eb58aaf4
Author: YunQiang Su 
Date:   Mon May 13 14:15:38 2024 +0800

Revert "MIPS: Support constraint 'w' for MSA instruction"

This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8.

and here:

commit f6ce85502eb2e4e7bbd9b3c6c1c065a004f8f531
Author: Hans-Peter Nilsson 
Date:   Wed May 8 04:11:20 2024 +0200

Revert "Revert "testsuite/gcc.target/cris/pr93372-2.c: Handle
xpass from combine improvement""

This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.

etc etc.

Next time, would you like me to add manual changelog entries?

My apologies, I thought what I did was the blessed way of doing things.
Aldy

On Thu, May 16, 2024 at 12:08 PM Jakub Jelinek  wrote:
>
> On Thu, May 16, 2024 at 12:01:01PM +0200, Aldy Hernandez wrote:
> > This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables prange
> > support again.
>
> Please don't do this.
> This breaks ChangeLog generation, will need to handle it tomorrow by hand again.
> Both the amendments to the git (cherry-pick -x or revert) added message
> lines
> This reverts commit COMMITHASH.
> and
> (cherry picked from commit COMMITHASH)
> and revert of revert.
>
> Jakub
>



Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Xi Ruoyao
On Thu, 2024-05-16 at 12:14 +0200, Aldy Hernandez wrote:
> Wait, what's the preferred way of reverting a patch?  I followed what
> I saw in:
> 
> commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
> Author: Jeff Law 
> Date:   Mon May 13 21:42:38 2024 -0600
> 
>     Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"
> 
>     This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

A revert is OK, but a revert of a revert is not.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651144.html

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Jakub Jelinek
On Thu, May 16, 2024 at 12:14:09PM +0200, Aldy Hernandez wrote:
> Wait, what's the preferred way of reverting a patch?  I followed what I saw in:

To revert a patch (that isn't itself a reversion), just push what git revert
created.  The important part is not to modify the "This reverts commit" line
that git revert created.

> commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
> Author: Jeff Law 
> Date:   Mon May 13 21:42:38 2024 -0600
> 
> Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"
> 
> This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

So, this is just fine.

> and here:
> 
> commit 0c6dd4b0973738ce43e76b468a002ab5eb58aaf4
> Author: YunQiang Su 
> Date:   Mon May 13 14:15:38 2024 +0800
> 
> Revert "MIPS: Support constraint 'w' for MSA instruction"
> 
> This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8.

And this too.

What is not fine is hand-editing that line of the message, e.g.:
This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8 because
foo and bar.
You can do that separately, so
This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8.
The reversion is because of foo and bar.
Or being further creative:
This reverts commit r13-8390-g9de6ff5ec9a46951d2.
etc.
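As a rough illustration of this rule, the hypothetical helper below (not part of GCC's ChangeLog scripts) accepts only the exact trailer that git revert writes, assuming a SHA-1 repository: "This reverts commit ", a full 40-digit lowercase hash, and a period. Hand-edited variants are rejected.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only if LINE is exactly the trailer written by
   `git revert` in a SHA-1 repository: "This reverts commit <40 hex>."  */
static bool
is_pristine_revert_line (const char *line)
{
  static const char prefix[] = "This reverts commit ";
  const size_t plen = sizeof prefix - 1;

  if (strncmp (line, prefix, plen) != 0)
    return false;

  const char *hash = line + plen;
  for (int i = 0; i < 40; i++)
    {
      char ch = hash[i];   /* stops early at '\0', which is not a hex digit */
      if (!((ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f')))
        return false;
    }
  /* Nothing may follow the hash except the final period.  */
  return strcmp (hash + 40, ".") == 0;
}
```

A separate explanatory line such as "The reversion is because of foo and bar." is simply not a trailer, so it leaves the pristine line intact.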

> commit f6ce85502eb2e4e7bbd9b3c6c1c065a004f8f531
> Author: Hans-Peter Nilsson 
> Date:   Wed May 8 04:11:20 2024 +0200
> 
> Revert "Revert "testsuite/gcc.target/cris/pr93372-2.c: Handle
> xpass from combine improvement""
> 
> This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.

This one is not fine.  Our current infrastructure for ChangeLog
generation can't deal with that, and there is no agreement on what to
write in the ChangeLog for it anyway: should 2 reversions turn it into a
Reapply commit: line, or into 2 Revert: lines?  What happens on a 3rd reversion?
So, one needs to manually remove the
This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.
line and supply a ChangeLog entry.

For cases like this or the amended lines (or, say, if This reverts
commit or (cherry-picked from ) lines refer to an invalid commit),
the daily DATESTAMP update then fails, and I need to manually add
all problematic commits to IGNORED_COMMITS, rerun it by hand and
then write the ChangeLog entries by hand.
See
https://gcc.gnu.org/r13-8764
https://gcc.gnu.org/r15-426
https://gcc.gnu.org/r15-345
https://gcc.gnu.org/r15-344
https://gcc.gnu.org/r15-341
https://gcc.gnu.org/r14-9832
https://gcc.gnu.org/r14-9830
for what I had to do only in April/May for this.

Jakub
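The revert-of-revert case can be spotted mechanically. A minimal sketch, assuming subject lines have the form git revert produces (`Revert "<original subject>"`); the helper name is hypothetical and this is not how GCC's ChangeLog tooling is implemented. A depth of 2 or more is the case that, per the message above, needs the trailer removed and a hand-written ChangeLog entry.

```c
#include <string.h>

/* Hypothetical helper: count how many `Revert "` prefixes git revert has
   stacked onto SUBJECT.  0 = not a revert, 1 = plain revert,
   2 or more = revert of a revert.  */
static int
revert_depth (const char *subject)
{
  static const char prefix[] = "Revert \"";
  const size_t plen = sizeof prefix - 1;
  int depth = 0;

  while (strncmp (subject, prefix, plen) == 0)
    {
      depth++;
      subject += plen;
    }
  return depth;
}
```

Because this is a plain prefix match, a hand-edited subject such as `Revert "Revert: "Enable prange support.""` (note the colon) only counts as depth 1, so a real check would need to be more forgiving.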



[wwwdocs] Document reimplementation of GNU threads library on Windows

2024-05-16 Thread Eric Botcazou
... which happened in GCC 13.

Validated with W3C's Validator and applied.

-- 
Eric Botcazou
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index e324b782..3ab4a101 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -770,8 +770,17 @@ You may also want to check out our
 
 
 
-
-
+Windows
+
+  The GNU threads library used by the win32 thread model has
+  been reimplemented using direct Win32 API calls, except for the Objective-C
+  specific subset.  It requires Windows XP/Server 2003 or later.  The new
+  implementation also adds the support needed for the C++11 threads, using
+  again direct Win32 API calls; this additional layer requires Windows
+  Vista/Server 2008 or later.  It is recommended to use a recent version of
+  MinGW-W64 in conjunction with the win32 thread model.
+  
+
 
 
 


Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Oleg Endo


On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > 
> > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis  wrote:
> > > 
> > > If we consider code like:
> > > 
> > > if (bar1 == x)
> > >   return foo();
> > > if (bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > We would like the ifcombine pass to convert this to:
> > > 
> > > if (bar1 == x || bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > The ifcombine pass can handle this transformation but it is run very early and
> > > it misses the opportunity because there are two separate blocks for foo().
> > > The pre pass is good at removing duplicate code and blocks and due to that
> > > running ifcombine again after it can increase the number of successful
> > > conversions.
> > 
> > I do think we should have something similar to re-running
> > ssa-ifcombine but I think it should be much later, like after the loop
> > optimizations are done.
> > Maybe just a simplified version of it (that does the combining and not
> > the optimizations part) included in isel or pass_optimize_widening_mul
> > (which itself should most likely become part of isel or renamed since
> > it handles more than just widening multiply these days).
> 
> I've long wished we had a (late?) pass that can also undo if-conversion
> (basically do what RTL expansion would later do).  Maybe
> gimple-predicate-analysis.cc (what's used by uninit analysis) can
> represent mixed CFG + if-converted conditions so we can optimize
> it and code-gen the condition in a more optimal manner much like
> we have if-to-switch, switch-conversion and switch-expansion.
> 
> That said, I agree that re-running ifcombine should be later.  And there's
> still the old task of splitting tail-merging from PRE (and possibly making
> it more effective).

Sorry to butt in, but this might be a little bit relevant and it caught my
attention.

I've got this SH patch sitting around
https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543

The idea is basically to run an additional loop pass after combine and
split1.  The main purpose is to hoist constant loads out of loops. Such
constant loads might be formed (in this particular case) during combine
transformations.

The patch adds a new file gcc/config/sh/sh_loop.cc, which has some boilerplate
code copy-pasted from other places to get the loop pass set up and
going.

Any thoughts on this way of doing it?


Best regards,
Oleg Endo
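The source-level effect of the ifcombine transformation quoted in this thread can be checked with a small sketch. `before` and `after` transcribe the two forms from Manolis's example, with `bar1`, `x`, `bar2` and `y` turned into parameters and a hypothetical `foo` that counts its calls. Because `||` short-circuits, both versions return the same value and call `foo` the same number of times.

```c
/* Hypothetical stand-in for foo(): records how often it is called.  */
static int foo_calls;

static int
foo (void)
{
  foo_calls++;
  return 42;
}

/* Form before ifcombine: two tests, each with its own call to foo().  */
static int
before (int bar1, int x, int bar2, int y)
{
  if (bar1 == x)
    return foo ();
  if (bar2 != y)
    return foo ();
  return 0;
}

/* Form after ifcombine: one combined condition and one call site.  */
static int
after (int bar1, int x, int bar2, int y)
{
  if (bar1 == x || bar2 != y)
    return foo ();
  return 0;
}
```

In the compiler the two `return foo ();` blocks first have to be merged into one, which is why the thread discusses running ifcombine again after PRE has removed the duplicate blocks.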


Re: [PATCH v3] driver: Output to a temp file; rename upon success [PR80182]

2024-05-16 Thread Peter0x44

On 2024-05-16 01:29, Richard Biener wrote:
> On Sun, May 12, 2024 at 3:40 PM Peter Damianov  wrote:
> >
> > Currently, commands like:
> > gcc -o file.c -lm
> > will delete the user's code.
> >
> > This patch makes the linker write executables to a temp file, and then renames
> > the temp file if successful. This fixes the case above, but has limitations.
> > The source file will still get overwritten if the link "succeeds", such as the
> > case of: gcc -o file.c -lm -r
> >
> > It's not perfect, but it should hopefully stop some people from ruining their
> > day.
>
> Hmm.  When suggesting this I was originally hoping for this to be implemented
> in the linker so that it delays opening (and truncating) of the output
> file as much as possible.

Ah, okay, I assumed you wanted it in the driver since then it could still fix
the problem for older linker versions, but it could be a problem to sort the
linker too.

> If we want to do something in the compiler driver then I like the filename based
> heuristics more.  v3 seems to only address the case of -o specifying the linker
> output file but of course

The filename heuristics feel like too much of a hack for my liking, but maybe I
don't have a rational reason to feel that way.
I had some trouble figuring out exactly which suffixes to reject; obviously -S
should not reject .s as an output file, but I still don't think I got it all
correct. I'm also a little worried that perhaps there are some weird makefiles
or configure scripts out there that do depend on this behavior.

> gcc -c t.c -o t2.c
>
> or
>
> gcc -S t.c -o t2.c
>
> happily overwrite a source file as well.  For these cases heuristically rejecting
> source file patterns would be better.  As we've shown the rename trick when
> the link was successful doesn't fully solve the issue.  And I bet some people
> will claim it isn't an issue at all ...

I don't think there is any easy or nice way to "fully solve the issue",
especially if you want to consider -c, -E, -S, etc.

One other idea for -c could be refusing to write out the object file if there
is no elf/coff/pe/macho header, but I don't like it, sounds too complex.

> That is, I do think the linker itself, as a quality of implementation issue,
> should avoid truncating the output early.  In fact the BFD linker seems to
> unlink the output very early:

Agreed. I decided to try some other linkers: lld and mold both don't have this
issue. BFD and gold do. I suppose I should open a bug for that, or investigate
myself.

> 24937 stat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
> 24937 lstat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
> 24937 unlink("t.c") = 0
> 24937 openat(AT_FDCWD, "t.c", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
>
> before even opening other inputs or the default linker script.
>
> Richard.


gcc/ChangeLog:
PR driver/80182
* gcc.cc (output_file_temp): New global variable
(driver_handle_option): Create temp file for executable output
(driver::maybe_run_linker): Rename output_file_temp to output_file if
the linker ran successfully

Signed-off-by: Peter Damianov 
---

v3: don't attempt to create temp files -> rename for -o /dev/null

 gcc/gcc.cc | 53 +
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 830a4700a87..5e38c6e578a 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -2138,6 +2138,11 @@ static int have_E = 0;
 /* Pointer to output file name passed in with -o. */
 static const char *output_file = 0;

+/* We write the output file to a temp file, and rename it if linking
+   is successful. This is to prevent mistakes like: gcc -o file.c -lm from
+   deleting the user's code.  */
+static const char *output_file_temp = 0;
+
 /* Pointer to input file name passed in with -truncate.
This file should be truncated after linking. */
 static const char *totruncate_file = 0;
@@ -4610,10 +4615,18 @@ driver_handle_option (struct gcc_options *opts,
 #if defined(HAVE_TARGET_EXECUTABLE_SUFFIX) || defined(HAVE_TARGET_OBJECT_SUFFIX)
   arg = convert_filename (arg, ! have_c, 0);
 #endif
-  output_file = arg;
+  output_file_temp = output_file = arg;
+  /* If creating an executable, create a temp file for the output, unless
+ -o /dev/null was requested. This will later get renamed, if the linker
+ succeeds.  */
+  if (!have_c && strcmp (output_file, HOST_BIT_BUCKET) != 0)
+{
+  output_file_temp = make_temp_file ("");
+  record_temp_file (output_file_temp, false, true);
+}
   /* On some systems, ld cannot handle "-o" without a space.  So
 split the option from its argument.  */
-  save_switch ("-o", 1, &arg, validated, true);
+  save_switch ("-o", 1, &output_file_temp, validated, true);
   return true;

 case OPT_pie:
@@ -9266,22 +9279,30 @@ driver::maybe_run_linker (const char *argv0) const
   linker_was_run = (tmp != execution_count);
 }

-  /* If options s

Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 12:55 PM Oleg Endo  wrote:

>
> On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> > On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > >
> > > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis  wrote:
> > > >
> > > > If we consider code like:
> > > >
> > > > if (bar1 == x)
> > > >   return foo();
> > > > if (bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > We would like the ifcombine pass to convert this to:
> > > >
> > > > if (bar1 == x || bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > The ifcombine pass can handle this transformation but it is run very early and
> > > > it misses the opportunity because there are two separate blocks for foo().
> > > > The pre pass is good at removing duplicate code and blocks and due to that
> > > > running ifcombine again after it can increase the number of successful
> > > > conversions.
> > >
> > > I do think we should have something similar to re-running
> > > ssa-ifcombine but I think it should be much later, like after the loop
> > > optimizations are done.
> > > Maybe just a simplified version of it (that does the combining and not
> > > the optimizations part) included in isel or pass_optimize_widening_mul
> > > (which itself should most likely become part of isel or renamed since
> > > it handles more than just widening multiply these days).
> >
> > I've long wished we had a (late?) pass that can also undo if-conversion
> > (basically do what RTL expansion would later do).  Maybe
> > gimple-predicate-analysis.cc (what's used by uninit analysis) can
> > represent mixed CFG + if-converted conditions so we can optimize
> > it and code-gen the condition in a more optimal manner much like
> > we have if-to-switch, switch-conversion and switch-expansion.
> >
> > That said, I agree that re-running ifcombine should be later.  And there's
> > still the old task of splitting tail-merging from PRE (and possibly making
> > it more effective).
>
> Sorry to butt in, but this might be a little bit relevant and it caught my
> attention.
>
> I've got this SH patch sitting around
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543
>
> The idea is basically to run an additional loop pass after combine and
> split1.  The main purpose is to hoist constant loads out of loops. Such
> constant loads might be formed (in this particular case) during combine
> transformations.
>
> The patch adds a new file gcc/config/sh/sh_loop.cc, which has some boilerplate
> code copy-pasted from other places to get the loop pass set up and
> going.
>
> Any thoughts on this way of doing it?
>

I have been looking at a similar issue on aarch64 for a few cases: csinc
and nand. What I decided to do for nand was to not depend on combine in the
end, and to create new infrastructure to expand better from gimple to rtl and
maybe even have target-specific pattern matching at the gimple level, so
that the constant is not part of the other instruction.

I should have a write-up/first draft of an implementation by the August time
frame or so. The write-up will most likely be earlier.

Thanks,
Andrew



>
> Best regards,
> Oleg Endo
>


[PATCH] wrong code with points-to and volatile

2024-05-16 Thread Richard Biener
The following fixes points-to analysis, which ignored the fact that
volatile-qualified references can yield any pointer value.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

BTW, I noticed this while working on ptr-vs-ptr compare simplification
using points-to info, when running into gcc.c-torture/execute/pr64242.c.

* tree-ssa-structalias.cc (get_constraint_for_1): For
volatile referenced or decls use ANYTHING.

* gcc.dg/tree-ssa/alias-38.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/alias-38.c | 14 ++
 gcc/tree-ssa-structalias.cc  |  7 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/alias-38.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c
new file mode 100644
index 000..a5c41493473
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int x;
+int y;
+
+int main ()
+{
+  int *volatile p = &x;
+  return (p != &y);
+}
+
+/* { dg-final { scan-tree-dump " != &y" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "return 1;" "optimized" } } */
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 9c63305063c..f0454bea2ea 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -3575,6 +3575,10 @@ get_constraint_for_1 (tree t, vec *results, bool address_p,
   }
 case tcc_reference:
   {
+   if (TREE_THIS_VOLATILE (t))
+ /* Fall back to anything.  */
+ break;
+
switch (TREE_CODE (t))
  {
  case MEM_REF:
@@ -3676,6 +3680,9 @@ get_constraint_for_1 (tree t, vec *results, bool address_p,
   }
 case tcc_declaration:
   {
+   if (VAR_P (t) && TREE_THIS_VOLATILE (t))
+ /* Fall back to anything.  */
+ break;
get_constraint_for_ssa_var (t, results, address_p);
return;
   }
-- 
2.35.3


Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Richard Biener
On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:
>
> > OK.
>
> Thanks Richard for help and coaching. To double confirm, are you OK with this 
> patch only or for the series patch(es) of SAT middle-end?
> Thanks again for reviewing and suggestions.

For the series, the riscv specific part of course needs riscv approval.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, May 16, 2024 4:10 PM
> To: Li, Pan2 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
> 
> Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
> >
> > > LGTM but you'll need an OK from Richard,
> > > Thanks for working on this!
> >
> > Thanks Tamar for help and coaching, let's wait for Richard for a while 😊!
>
> OK.
>
> Thanks for the patience,
> Richard.
>
> > Pan
> >
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Wednesday, May 15, 2024 5:12 PM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> > Liu, Hongtao 
> > Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for 
> > unsigned scalar int
> >
> > Hi Pan,
> >
> > Thanks!
> >
> > > -Original Message-
> > > From: pan2...@intel.com 
> > > Sent: Wednesday, May 15, 2024 3:14 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > > ; richard.guent...@gmail.com;
> > > hongtao@intel.com; Pan Li 
> > > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > > scalar
> > > int
> > >
> > > From: Pan Li 
> > >
> > > This patch adds the middle-end representation for the saturating
> > > add, i.e. the result of the add is set to the maximum value on overflow.
> > > It matches a pattern similar to the one below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > Given the example below for the unsigned scalar integer type uint64_t:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;succ:   EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;succ:   EXIT
> > > }
> > >
> > > The following tests passed with this patch:
> > > 1. The riscv full regression tests.
> > > 2. The x86 bootstrap tests.
> > > 3. The x86 full regression tests.
> > >
> > >   PR target/51492
> > >   PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > >   to the return true switch case(s).
> > >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > >   * match.pd: Add unsigned SAT_ADD match(es).
> > >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > >   us/ssadd.
> > >   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > >   extern func decl generated in match.pd match.
> > >   (match_saturation_arith): New func impl to match the saturation 
> > > arith.
> > >   (math_opts_dom_walker::after_dom_children): Try to match saturation
> > >   arith for IOR exprs.
> > >
> >
> >  LGTM but you'll need an OK from Richard,
> >
> > Thanks for working on this!
> >
> > Tamar
> >
> > > Signed-off-by: Pan Li 
> > > ---
> > >  gcc/internal-fn.cc|  1 +
> > >  gcc/internal-fn.def   |  2 ++
> > >  gcc/match.pd  | 51 +++
> > >  gcc/optabs.def|  4 +--
> > >  gcc/tree-ssa-math-opts.cc | 32 
> > >  5 files changed, 88 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index 0a7053c2286..73045ca8c8c 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn
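The SAT_ADD pattern quoted above can be sanity-checked outside the compiler with a small standalone C sketch (not part of the patch; `sat_add_u8` and `sat_add_u64_ref` are made-up names). It spells out the uint8_t cases listed in the description and compares the uint64_t form against a `__builtin_add_overflow` reference, the source-level counterpart of `.ADD_OVERFLOW` in the "before" dump.

```c
#include <stdint.h>

/* Saturating unsigned add on uint8_t using the branchless pattern
   (x + y) | -(TYPE)((TYPE)(x + y) < x).  The sum is cast back to uint8_t
   first: uint8_t operands are promoted to int before '+', so without the
   cast the comparison could never observe a wrap.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);      /* wraps modulo 256 */
  uint8_t ovf = (uint8_t) -(sum < x);   /* 0xff on overflow, else 0 */
  return sum | ovf;
}

/* The uint64_t form from the description; no promotion issue here.  */
static uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;                 /* wraps modulo 2^64 */
  return sum | -(uint64_t) (sum < x);   /* all ones on overflow */
}

/* Reference version written with the overflow builtin.  */
static uint64_t
sat_add_u64_ref (uint64_t x, uint64_t y)
{
  uint64_t sum;
  return __builtin_add_overflow (x, y, &sum) ? UINT64_MAX : sum;
}
```

Both forms compute the same value; the point of the patch is that the middle end now recognizes the first shape and emits a single `.SAT_ADD` call for it.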

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Li, Pan2
> For the series, the riscv specific part of course needs riscv approval.

Thanks a lot, have a nice day!

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, May 16, 2024 7:59 PM
To: Li, Pan2 
Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:
>
> > OK.
>
> Thanks Richard for the help and coaching. To double-check: is your OK for this
> patch only, or for the whole series of SAT middle-end patches?
> Thanks again for reviewing and suggestions.

For the series, the riscv specific part of course needs riscv approval.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, May 16, 2024 4:10 PM
> To: Li, Pan2 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
> 
> Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
> >
> > > LGTM but you'll need an OK from Richard,
> > > Thanks for working on this!
> >
> > Thanks Tamar for the help and coaching; let's wait for Richard for a while 😊!
>
> OK.
>
> Thanks for the patience,
> Richard.
>
> > Pan
> >
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Wednesday, May 15, 2024 5:12 PM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> > Liu, Hongtao 
> > Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for 
> > unsigned scalar int
> >
> > Hi Pan,
> >
> > Thanks!
> >
> > > -Original Message-
> > > From: pan2...@intel.com 
> > > Sent: Wednesday, May 15, 2024 3:14 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > > ; richard.guent...@gmail.com;
> > > hongtao@intel.com; Pan Li 
> > > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > > scalar
> > > int
> > >

[PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Richard Biener
Now that we handle pt.null conservatively, we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
pointer-pointer compares using points-to info in ptrs_compare_unequal.

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Richard.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to keep the UB in this
ICE testcase from manifesting in transforms that would stop it
from being vectorized.
---
 gcc/gimple-pretty-print.cc   |  5 +++-
 gcc/testsuite/gcc.dg/tree-ssa/alias-39.c | 12 ++
 gcc/tree-ssa-alias.cc| 30 
 gcc/tree-ssa-alias.h |  5 
 gcc/tree-ssa-structalias.cc  |  6 ++---
 5 files changed, 50 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/alias-39.c

diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index abda8871f97..a71e1e0efc7 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -822,6 +822,8 @@ pp_points_to_solution (pretty_printer *buffer, const pt_solution *pt)
 pp_string (buffer, "unit-escaped ");
   if (pt->null)
 pp_string (buffer, "null ");
+  if (pt->const_pool)
+pp_string (buffer, "const-pool ");
   if (pt->vars
   && !bitmap_empty_p (pt->vars))
 {
@@ -838,7 +840,8 @@ pp_points_to_solution (pretty_printer *buffer, const pt_solution *pt)
   if (pt->vars_contains_nonlocal
  || pt->vars_contains_escaped
  || pt->vars_contains_escaped_heap
- || pt->vars_contains_restrict)
+ || pt->vars_contains_restrict
+ || pt->vars_contains_interposable)
{
  const char *comma = "";
  pp_string (buffer, " (");
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c
new file mode 100644
index 000..3b452893f6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-forwprop3" } */
+
+static int a, b;
+int foo (int n, int which)
+{
+  void *p = __builtin_malloc (n);
+  void *q = which ? &a : &b;
+  return p == q;
+}
+
+/* { dg-final { scan-tree-dump "return 0;" "forwprop3" } } */
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 96301bbde7f..6d31fc83691 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -484,9 +484,27 @@ ptrs_compare_unequal (tree ptr1, tree ptr2)
}
   return !pt_solution_includes (&pi->pt, obj1);
 }
-
-  /* ???  We'd like to handle ptr1 != NULL and ptr1 != ptr2
- but those require pt.null to be conservatively correct.  */
+  else if (TREE_CODE (ptr1) == SSA_NAME)
+    {
+      struct ptr_info_def *pi1 = SSA_NAME_PTR_INFO (ptr1);
+      if (!pi1
+          || pi1->pt.vars_contains_restrict
+          || pi1->pt.vars_contains_interposable)
+        return false;
+      if (integer_zerop (ptr2) && !pi1->pt.null)
+        return true;
+      if (TREE_CODE (ptr2) == SSA_NAME)
+        {
+          struct ptr_info_def *pi2 = SSA_NAME_PTR_INFO (ptr2);
+          if (!pi2
+              || pi2->pt.vars_contains_restrict
+              || pi2->pt.vars_contains_interposable)
+            return false;
+          if ((!pi1->pt.null || !pi2->pt.null)
+              && (!pi1->pt.const_pool || !pi2->pt.const_pool))
+            return !pt_solutions_intersect (&pi1->pt, &pi2->pt);
+        }
+    }
 
   return false;
 }
@@ -636,6 +654,9 @@ dump_points_to_solution (FILE *file, struct pt_solution *pt)
   if (pt->null)
 fprintf (file, ", points-to NULL");
 
+  if (pt->const_pool)
+fprintf (file, ", points-to const-pool");
+
   if (pt->vars)
 {
   fprintf (file, ", points-to vars: ");
@@ -643,7 +664,8 @@ dump_points_to_solution (FILE *file, struct pt_solution *pt)
   if (pt->vars_contains_nonlocal
  || pt->vars_contains_escaped
  || pt->vars_contains_escaped_heap
- || pt->vars_contains_restrict)
+ || pt->vars_contains_restrict
+ || pt->vars_contains_interposable)
{
  const char *comma = "";
  fprintf (file, " (");
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index b26fffeeb2d..e29dff58375 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -47,6 +47,11 @@ struct GTY(()) pt_solution
  includes memory at addr
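The new pointer-pointer case in `ptrs_compare_unequal` can be pictured with a deliberately simplified model. The sketch below is hypothetical: `struct pt_model` and `model_ptrs_compare_unequal` are made-up names, points-to sets are plain bitmasks, and the restrict/interposable bail-outs plus everything the real `pt_solutions_intersect` handles (ESCAPED, NONLOCAL, "anything") are left out. It only mirrors the guard in the tree-ssa-alias.cc hunk: disjoint sets prove inequality unless both pointers may be NULL, or both may point into the constant pool, which is tracked as a single flag rather than per object.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-simplified points-to solution.  */
struct pt_model
{
  uint32_t vars;    /* bit i set: may point to abstract object i */
  bool null;        /* may be NULL */
  bool const_pool;  /* may point into the constant pool (STRING_CST) */
};

/* Returns true only when the two pointers provably compare unequal.  */
static bool
model_ptrs_compare_unequal (const struct pt_model *p1,
                            const struct pt_model *p2)
{
  /* If both may be NULL they may be equal (both NULL).  If both may point
     into the constant pool, the single flag cannot tell the entries
     apart, so they may be equal as well.  */
  if ((!p1->null || !p2->null)
      && (!p1->const_pool || !p2->const_pool))
    return (p1->vars & p2->vars) == 0;  /* disjoint sets: never equal */
  return false;
}
```

With bit 2 standing for the malloc'd heap object and bits 0 and 1 for `a` and `b`, the `alias-39.c` case (p from `__builtin_malloc`, q either `&a` or `&b`) comes out as unequal even if the malloc result is assumed to possibly be NULL, since q cannot be; that is what allows `p == q` to fold to 0.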

Re: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-16 Thread Richard Biener
On Thu, May 16, 2024 at 8:50 AM Tamar Christina  wrote:
>
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Thursday, May 16, 2024 5:06 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com; Richard Sandiford
> > ; Pan Li 
> > Subject: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit
> >
> > From: Pan Li 
> >
> > This patch adds early break auto-vectorization support for targets that
> > use length-based partial vectorization.  Consider the following example:
> >
> > unsigned vect_a[802];
> > unsigned vect_b[802];
> >
> > void test (unsigned x, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> > vect_b[i] = x + i;
> >
> > if (vect_a[i] > x)
> >   break;
> >
> > vect_a[i] = x;
> >   }
> > }
> >
> > We use VCOND_MASK_LEN to generate the mask (mask && i < len + bias).
> > The resulting RVV IR looks like this:
> >
> >   ...
> >   _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
> >   _55 = (int) _87;
> >   ...
> >   mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
> >   vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
> > {0, ... }, _87, 0);
> >   if (vec_len_mask_72 != { 0, ... })
> > goto ; [5.50%]
> >   else
> > goto ; [94.50%]
> >
> > The following tests passed with this patch:
> > 1. The full RISC-V regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The full x86 regression tests.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len
> >   handling for one or multiple stmt.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-loop.cc (vect_gen_loop_len_mask): New func to gen
> >   the loop len mask.
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Invoke the
> >   vect_gen_loop_len_mask for 1 or more stmt(s).
> >   * tree-vectorizer.h (vect_gen_loop_len_mask): New func decl
> >   for vect_gen_loop_len_mask.
> >
>
> Thanks, this version looks good to me!
>
> You'll need Richi's review still.

OK.

Thanks,
Richard.

> Cheers,
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/tree-vect-loop.cc  | 27 +++
> >  gcc/tree-vect-stmts.cc | 17 +++--
> >  gcc/tree-vectorizer.h  |  4 
> >  3 files changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 361aec06488..83c0544b6aa 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -11416,6 +11416,33 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> >return loop_len;
> >  }
> >
> > +/* Generate the tree for the loop len mask and return it.  Given the lens,
> > +   nvectors, vectype, index and factor to gen the len mask as below.
> > +
> > +   tree len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias)
> > +*/
> > +tree
> > +vect_gen_loop_len_mask (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> > +                        gimple_stmt_iterator *cond_gsi, vec_loop_lens *lens,
> > +                        unsigned int nvectors, tree vectype, tree stmt,
> > +                        unsigned int index, unsigned int factor)
> > +{
> > +  tree all_one_mask = build_all_ones_cst (vectype);
> > +  tree all_zero_mask = build_zero_cst (vectype);
> > +  tree len = vect_get_loop_len (loop_vinfo, gsi, lens, nvectors, vectype, index,
> > +                                factor);
> > +  tree bias = build_int_cst (intQI_type_node,
> > +                             LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo));
> > +  tree len_mask = make_temp_ssa_name (TREE_TYPE (stmt), NULL, "vec_len_mask");
> > +  gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, stmt,
> > +                                            all_one_mask, all_zero_mask, len,
> > +                                            bias);
> > +  gimple_call_set_lhs (call, len_mask);
> > +  gsi_insert_before (cond_gsi, call, GSI_SAME_STMT);
> > +
> > +  return len_mask;
> > +}
> > +
> >  /* Scale profiling counters by estimation for LOOP which is vectorized
> > by factor VF.
> > If FLAT is true, the loop we started with had unrealistically flat
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index b8a71605f1b..672959501bb 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12895,7 +12895,9 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> >  ncopies = vect_get_num_copies (loop_vinfo, vectype);
> >
> >vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> >bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +  bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
> >
> >/* Now build the new conditional.  Pattern gimple_conds get dropped during
> >   codegen so we must replace the original insn.  */
> > @
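The length-masked early-exit test in the 1/3 patch can be modelled in scalar C to show why the compare mask is combined with `i < len + bias`: on the final, partial iteration the lanes past the active length hold stale data and must not trigger the break. The sketch below is hypothetical (`len_masked_any` is not a GCC function); the bias is 0 in the RVV dump above, and the model accepts a signed bias only because `LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS` is a signed quantity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of one vector chunk of the early-exit check:
   VCOND_MASK_LEN (mask, ones, zeros, len, bias) keeps lane i of MASK only
   when i < len + bias, and the loop exits if any resulting lane is set.  */
static bool
len_masked_any (const bool *mask, size_t nunits, size_t len, int bias)
{
  long active = (long) len + bias;       /* number of active lanes */
  for (size_t i = 0; i < nunits; i++)
    if ((long) i < active && mask[i])    /* lane i of vec_len_mask */
      return true;                       /* vec_len_mask != { 0, ... } */
  return false;
}
```

For example, with mask `{0, 0, 1, 0}` and `len == 2` the set lane is inactive and the loop continues, while `len == 3` takes the early exit.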

Re: [PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-05-16 Thread Richard Biener
On Fri, Apr 12, 2024 at 10:10 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 by a single instruction. It needs an optab so that it
> can be expanded to a target-specific instruction sequence.
>
>   The subsequent patches will implement the expander on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

Looks good, if the rs6000 part is approved.

> Thanks
> Gui Haochen
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
> for isnormal builtin.
> * optabs.def (isnormal_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 3174f52ebe8..defb39de95f 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>  case BUILT_IN_ISFINITE:
>builtin_optab = isfinite_optab; break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab; break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")
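As a reminder of what an `isnormal` expander has to compute, here is a bit-level C sketch for IEEE binary64, assuming `double` uses that format (as on x86 and powerpc64). The helper name is hypothetical and this is not the rs6000 expansion: a value is normal exactly when its biased exponent field is neither all zeros (zero or subnormal) nor all ones (Inf or NaN).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: isnormal for binary64 by inspecting the bits.  */
static bool
isnormal_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);                  /* type-pun safely */
  unsigned biased_exp = (unsigned) ((bits >> 52) & 0x7ff);
  return biased_exp != 0 && biased_exp != 0x7ff;    /* exclude 0/sub and Inf/NaN */
}
```

Two range checks on the exponent field are needed here, which is part of why a dedicated single-instruction expansion is attractive.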


Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-16 Thread Richard Biener
On Fri, Apr 12, 2024 at 5:07 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab so that it
> can be expanded to a target-specific instruction sequence.
>
>   The subsequent patches will implement the expander on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

OK if the rs6000 part is approved.

> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> for isfinite builtin.
> * optabs.def (isfinite_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index d2786f207b8..5262aa01660 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")
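For comparison with `isnormal`, the finite check needs only one boundary. The hypothetical C helper below (assuming `float` is IEEE binary32; not the rs6000 expansion) clears the sign bit and compares against the +Inf encoding: every finite value, including zeros and subnormals, encodes below `0x7f800000`, while Inf and NaN encode at or above it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: isfinite for binary32 with a single compare.  */
static bool
isfinite_bits_f (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);          /* type-pun safely */
  return (bits & 0x7fffffffu) < 0x7f800000u; /* |f| below +Inf encoding */
}
```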


Re: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-16 Thread juzhe.zh...@rivai.ai
RISC-V part LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; kito.cheng; tamar.christina; richard.guenther; 
Richard.Sandiford; Pan Li
Subject: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len
From: Pan Li 
 
Now that loop lens are supported for vectorizable early exit, we would like to
implement the feature for the RISC-V target.  Consider the following example:
 
unsigned vect_a[1923];
unsigned vect_b[1923];

unsigned test (unsigned limit, int n)
{
  unsigned ret = 0;

  for (int i = 0; i < n; i++)
    {
      vect_b[i] = limit + i;

      if (vect_a[i] > limit)
        {
          ret = vect_b[i];
          return ret;
        }

      vect_a[i] = limit;
    }

  return ret;
}
 
Before this patch:
  ...
.L8:
  sw    a3,0(a5)
  addiw a0,a0,1
  addi  a4,a4,4
  addi  a5,a5,4
  beq   a1,a0,.L2
.L4:
  sw    a0,0(a4)
  lw    a2,0(a5)
  bleu  a2,a3,.L8
  ret
 
After this patch:
  ...
.L5:
  vsetvli   a5,a3,e8,mf4,ta,ma
  vmv1r.v   v4,v2
  vsetvli   t4,zero,e32,m1,ta,ma
  vmv.v.x   v1,a5
  vadd.vv   v2,v2,v1
  vsetvli   zero,a5,e32,m1,ta,ma
  vadd.vv   v5,v4,v3
  slli  a6,a5,2
  vle32.v   v1,0(t1)
  vmsltu.vv v1,v3,v1
  vcpop.m   t4,v1
  beq   t4,zero,.L4
  vmv.x.s   a4,v4
.L3:
  ...
 
The following tests passed with this patch:
1. The full RISC-V regression tests.
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md
  (*vcond_mask_len_popcount_):
New pattern of vcond_mask_len_popcount for vector bool mode.
* config/riscv/autovec.md (vcond_mask_len_): New pattern
of vcond_mask_len for vector bool mode.
(cbranch4): New pattern for vector bool mode.
* config/riscv/vector-iterators.md: Add new unspec
  UNSPEC_SELECT_MASK.
* config/riscv/vector.md (@pred_popcount): Add
VLS mode to popcount pattern.
(@pred_popcount): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/early-break-1.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-2.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec-opt.md   | 33 ++
gcc/config/riscv/autovec.md   | 61 +++
gcc/config/riscv/vector-iterators.md  |  1 +
gcc/config/riscv/vector.md| 18 +++---
.../riscv/rvv/autovec/early-break-1.c | 34 +++
.../riscv/rvv/autovec/early-break-2.c | 37 +++
6 files changed, 175 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-2.c
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..04f85d8e455 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,36 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; Optimization pattern for early break auto-vectorization
+;; vcond_mask_len (mask, ones, zeros, len, bias) + vlmax popcount
+;; -> non vlmax popcount (mask, len)
+(define_insn_and_split "*vcond_mask_len_popcount_"
+  [(set (match_operand:P 0 "register_operand")
+(popcount:P
+ (unspec:VB_VLS [
+  (unspec:VB_VLS [
+   (match_operand:VB_VLS 1 "register_operand")
+   (match_operand:VB_VLS 2 "const_1_operand")
+   (match_operand:VB_VLS 3 "const_0_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK)
+  (match_operand 6 "autovec_length_operand")
+  (const_int 1)
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS (mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_nonvlmax_insn (
+ code_for_pred_popcount (mode, Pmode),
+ riscv_vector::CPOP_OP,
+ operands, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..1ee3c8052fb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,64 @@ (define_expand "rawmemchr"
 DONE;
   }
)
+
+;; =
+;; == Early break auto-vectorization patterns
+;; =
+
+;; vcond_mask_len (mask, 1s, 0s, len, bias)
+;; => mask[i] = mask[i] && i < len ? 1 : 0
+(define_insn_and_split "vcond_mask_len_"
+  [(set (match_operand:VB 0 "register_operand")
+(unspec: VB [
+ (match_operand:VB 1 "register_operand")
+ (match_operand:VB 2 "const_1_operand")
+ (match_operand:VB 3 "const_0_operand")
+ (match_operand 4 "autovec_length_operand")
+ (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS (mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+mac
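The `*vcond_mask_len_popcount_` pattern in this patch relies on the identity stated in its comment: a VLMAX popcount of `vcond_mask_len (mask, 1s, 0s, len, 0)` equals a popcount of `mask` over only the first `len` lanes, so the select can be dropped and `cpop` run with `vl = len`. A hypothetical scalar C model (a 64-lane mask held in a `uint64_t`; both function names are made up) makes the equivalence easy to check; bias 0 matches the `const_0_operand` the pattern requires.

```c
#include <stdint.h>

/* Left side: select lanes i < len (bias 0), then count over all lanes.  */
static unsigned
popcount_vlmax_of_len_mask (uint64_t mask, unsigned len)
{
  uint64_t len_mask = len >= 64 ? ~UINT64_C (0) : ((UINT64_C (1) << len) - 1);
  return (unsigned) __builtin_popcountll (mask & len_mask);
}

/* Right side: count set lanes of MASK, visiting only the first len lanes
   (what cpop does when vl = len).  */
static unsigned
popcount_first_len_lanes (uint64_t mask, unsigned len)
{
  unsigned count = 0;
  for (unsigned i = 0; i < len && i < 64; i++)
    count += (unsigned) ((mask >> i) & 1);
  return count;
}
```

The `len >= 64` guard avoids an undefined 64-bit shift when the whole vector is active.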

Re: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

2024-05-16 Thread juzhe.zh...@rivai.ai
RISC-V part LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; kito.cheng; tamar.christina; richard.guenther; 
Richard.Sandiford; Pan Li
Subject: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite
From: Pan Li 
 
Now that vectorizable early exit is supported on RISC-V, we would like to
enable the GCC vect tests for vectorizable early break.

vect-early-break_124-pr114403.c currently fails to vectorize because the
8-byte __builtin_memcpy is not folded into an int64 assignment during ccp1.
We will improve that first, and mark this test as xfail for RISC-V for now.
 
The following tests passed with this patch:
1. The full RISC-V regression tests.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/slp-mask-store-1.c: Add pragma novector, as otherwise
"LOOP VECTORIZED" appears twice in the dump on RISC-V.
* gcc.dg/vect/vect-early-break_124-pr114403.c: Xfail for the
riscv backend.
* lib/target-supports.exp: Add RISC-V backend.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c  | 2 ++
gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 2 ++
3 files changed, 5 insertions(+), 1 deletion(-)
 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
index fdd9032da98..2f80bf89e5e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -28,6 +28,8 @@ main ()
   if (__builtin_memcmp (x, res, sizeof (x)) != 0)
 abort ();
+
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
 if (flag[i] != 0 && flag[i] != 1)
   abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 51abf245ccb..101ae1e0eaa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -2,7 +2,7 @@
/* { dg-require-effective-target vect_early_break_hw } */
/* { dg-require-effective-target vect_long_long } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */
#include "tree-vect.h"
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 6f5d477b128..ec9baa4f32a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4099,6 +4099,7 @@ proc check_effective_target_vect_early_break { } {
|| [check_effective_target_arm_v8_neon_ok]
|| [check_effective_target_sse4]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v]
}}]
}
@@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } {
|| [check_effective_target_arm_v8_neon_hw]
|| [check_sse4_hw_available]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v_ok]
}}]
}
-- 
2.34.1
 
 


RE: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-16 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, May 16, 2024 8:13 PM
To: Tamar Christina 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Richard Sandiford 

Subject: Re: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-05-16 Thread Richard Biener
On Wed, 3 Apr 2024, Chung-Lin Tang wrote:

> Hi Richard, Thomas,
> 
> On 2023/10/30 8:46 PM, Richard Biener wrote:
> >>
> >> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> >> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> >> flag.
> >>
> >> The actual optimization then is done in this second patch.  Chung-Lin
> >> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> >> I don't have much experience with most of the following generic code, so
> >> would appreciate a helping hand, whether that conceptually makes sense as
> >> well as from the implementation point of view:
> 
> First of all, I have removed all of the gimplify-stage scanning and setting of
> DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so there are no
> changes to gimplify.cc now).
> 
> I remember this code was an artifact of earlier attempts to make struct-member
> pointer mappings work too (e.g. map(readonly:rec.ptr[:N])), but those attempts
> failed anyway.  I think the omp_data_* member accesses made when building the
> child-function-side receiver_refs are blocking points-to analysis from working
> (I didn't try digging deeper).
> 
> Also, during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for map
> clause decl reference building, so hoping that the variables "happen to be" single-use,
> and relaying DECL_POINTS_TO_READONLY into SSA_NAME_POINTS_TO_READONLY_MEMORY, does
> appear to be a little risky.
> 
> However, for firstprivate pointers processed during omp-low, it appears to be
> somewhat different (see the description below).
> 
> > No, I don't think you can use that flag on non-default-defs, nor
> > preserve it on copying.  So it also doesn't nicely extend to DECLs
> > as done by the patch.  We currently _only_ use it for incoming
> > parameters.  When used on arbitrary code you can get to for example
> > 
> > ptr1(points-to-readonly-memory) = &p->x;
> > ... access via ptr1 ...
> > ptr2 = &p->x;
> > ... access via ptr2 ...
> > 
> > where both are your OMP regions differently constrained (the constraint is on
> > the code in the region, _not_ on the actual protections of the pointed-to
> > data, much like for the fortran case).  But now CSE comes along and happily
> > replaces all ptr2 with ptr2 in the second region and ... oops!
> 
> Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in
> the second region"?
> 
> That doesn't happen, because during omp-lower/expand, OMP target regions
> (which are all that this currently applies to) are separated into individual
> child functions.
> 
> (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during
> omp-lower, where for firstprivate pointers (i.e. 'a' here) we set this bit
> when constructing the first load of this pointer.)
> 
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[8]);
>     r = a[8];
>   }
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[12]);
>     r = a[12];
>   }
> 
> After omp-expand (before SSA):
> 
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> {
>  ...
>:
>   D.2962 = .omp_data_i->D.2947;
>   a.8 = D.2962;

So 'readonly: a[:32]' is put in .omp_data_i->D.2947 in the caller
and extracted here.  And you arrange for 'a.8' to have
DECL_POINTS_TO_READONLY set by "magic"?  Looking at this I wonder
if it would be more useful to "const qualify" (but "really", not
in the C sense) .omp_data_i->D.2947 instead?  Thus have a
FIELD_POINTS_TO_READONLY_MEMORY flag on the FIELD_DECL.

Points-to analysis should then be able to handle this similarly to how
it handles loads of restrict-qualified pointers.  Well, of course it is
not as simple, since it now adds "qualifiers" to storage: I presume
the same object can be both readonly and not readonly, as in

 #pragma acc parallel copyin(readonly: a[:32], a[33:64]) copyout(r)

?  That is, currently there's only one "readonly" object kind in
points-to, that's STRING_CSTs which get all globbed to string_id
and "ignored" for alias purposes since you can't change them.

So possibly you want to combine this with restrict qualifying the
pointer so we know there's no other (read-write) access to the memory
possible.  But then you might get all the good stuff already by
_just_ doing that restrict qualification and ignoring the readonly-ness?
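A minimal C++ sketch of the restrict-qualification idea, assuming a hypothetical `region` function standing in for the outlined `main._omp_fn.1` body (the names are illustrative, not GCC output). Under C's `restrict` rules, which GCC's `__restrict` extension follows in C++, an object read through a const-qualified restrict pointer may not be modified at all while the pointer's block executes, so a compiler is allowed to reuse the first load of `a[12]` after the call. Whether a given GCC release actually performs that reuse is not claimed here.

```cpp
#include <cassert>

// Hypothetical opaque callee standing in for foo() in the OpenACC example;
// noinline keeps its body from being used to justify the optimization.
__attribute__((noinline)) void foo(const int *p, int v) {
  (void)p;
  (void)v;
}

// Sketch of the outlined region with 'a' const- and restrict-qualified:
// *a may not be modified while region() runs, so the second a[12] load
// may legally be replaced by the value already held in t.
void region(const int *__restrict a, int *r) {
  int t = a[12];
  foo(a, t);
  *r = a[12];
}
```

Richard's point is that this qualification alone might already give the desired effect without a separate readonly flag.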

>   r.1 = (*a.8)[12];
>   foo (a.8, r.1);
>   r.1 = (*a.8)[12];
>   D.2965 = .omp_data_i->r;
>   *D.2965 = r.1;
>   return;
> }
> 
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> {
>   ...
>:
>   D.2968 = .omp_data_i->D.2939;
>   a.4 = D.2968;
>   r.0 = (*a.4)[8];
>   foo (a.4, r.0);
>   r.0 = (*a.4)[8];
>   D.2971 = .omp_data_i->r;
>   *D.2971 = r.0;
>   return;
> }
> 
> So actually, the cr

Re: [PATCH] c++: represent all class non-dep assignments as CALL_EXPR

2024-05-16 Thread Jason Merrill

On 5/15/24 13:55, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?


OK.


-- >8 --

Non-dependent compound assignment expressions are currently represented
as CALL_EXPR to the selected operator@= overload.  Non-dependent simple
assignments on the other hand are still represented as MODOP_EXPR, which
doesn't hold on to the selected overload.

That we need to remember the selected operator@= overload ahead of time
is a correctness thing, because they can be declared at namespace scope
and we don't want to consider later-declared namespace scope overloads
at instantiation time.  This doesn't apply to simple operator= because
it can only be declared at class scope, so it's fine to repeat the name
lookup and overload resolution at instantiation time.  But it still
seems desirable for sake of QoI to also avoid this repeated name lookup
and overload resolution for simple assignments along the lines of
r12-6075-g2decd2cabe5a4f.
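The correctness rule referred to above can be sketched in C++, modeled on the standard's [temp.nondep] example (all names here are hypothetical). A non-dependent `a += 1` inside a template is resolved where the template is defined, so a better-matching namespace-scope `operator+=` declared afterwards must not be chosen at instantiation time. The sketch assumes a compiler that implements this rule for operators; older GCC releases may behave differently.

```cpp
#include <cassert>

struct A {
  char picked = '?';  // records which operator+= overload ran
};

// Visible at the template definition: takes long.
A &operator+=(A &x, long) {
  x.picked = 'L';
  return x;
}

// 'a += 1' does not depend on T, so it is bound here, where only
// operator+=(A&, long) has been declared.
template <class T>
void bump(A &a) {
  a += 1;
}

// Declared later at namespace scope and an exact match for an int
// argument, but not considered for the call inside bump<T>.
A &operator+=(A &x, int) {
  x.picked = 'I';
  return x;
}
```

A plain `b += 1` written after both declarations picks the `int` overload, which is the contrast the patch must preserve; simple `operator=` cannot be declared at namespace scope, so it has no such hazard.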

To that end, this patch makes us represent non-dependent simple
assignments as CALL_EXPR to the selected operator= overload rather than
as MODOP_EXPR.  In order for is_assignment_op_expr_p to recognize such
CALL_EXPR as an assignment expression, cp_get_fndecl_from_callee needs
to look through templated COMPONENT_REF callee corresponding to a member
function call, otherwise ahead of time -Wparentheses warnings stop
working (e.g. g++.dg/warn/Wparentheses-{32,33}.C).

gcc/cp/ChangeLog:

* call.cc (build_new_op): Pass 'overload' to
cp_build_modify_expr.
* cp-tree.h (cp_build_modify_expr): New overload that
takes a tree* out-parameter.
* pt.cc (tsubst_expr) : Propagate
OPT_Wparentheses warning suppression to the result.
* cvt.cc (cp_get_fndecl_from_callee): Use maybe_get_fns
to extract the FUNCTION_DECL from a callee.
* semantics.cc (is_assignment_op_expr_p): Also recognize
templated operator expressions represented as a CALL_EXPR
to operator=.
* typeck.cc (cp_build_modify_expr): Add 'overload'
out-parameter and pass it to build_new_op.
(build_x_modify_expr): Pass 'overload' to cp_build_modify_expr.
---
  gcc/cp/call.cc   |  2 +-
  gcc/cp/cp-tree.h |  3 +++
  gcc/cp/cvt.cc    |  5 +++--
  gcc/cp/pt.cc     |  2 ++
  gcc/cp/typeck.cc | 18 ++
  5 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index e058da7735f..e3d4cf8949d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -7473,7 +7473,7 @@ build_new_op (const op_location_t &loc, enum tree_code code, int flags,
switch (code)
  {
  case MODIFY_EXPR:
-  return cp_build_modify_expr (loc, arg1, code2, arg2, complain);
+  return cp_build_modify_expr (loc, arg1, code2, arg2, overload, complain);
  
   case INDIRECT_REF:
     return cp_build_indirect_ref (loc, arg1, RO_UNARY_STAR, complain);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 9a8c8659157..1e565086e80 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8267,6 +8267,9 @@ extern tree cp_build_c_cast           (location_t, tree, tree,
  extern cp_expr build_x_modify_expr(location_t, tree,
 enum tree_code, tree,
 tree, tsubst_flags_t);
+extern tree cp_build_modify_expr   (location_t, tree,
+enum tree_code, tree,
+tree *, tsubst_flags_t);
  extern tree cp_build_modify_expr  (location_t, tree,
 enum tree_code, tree,
 tsubst_flags_t);
diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index db086c017e8..2f4c0f88694 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -1015,8 +1015,9 @@ cp_get_fndecl_from_callee (tree fn, bool fold /* = true */)
return f;
  };
  
-  if (TREE_CODE (fn) == FUNCTION_DECL)
-    return fn_or_local_alias (fn);
+  if (tree f = maybe_get_fns (fn))
+if (TREE_CODE (f) == FUNCTION_DECL)
+  return fn_or_local_alias (f);
tree type = TREE_TYPE (fn);
if (type == NULL_TREE || !INDIRECT_TYPE_P (type))
  return NULL_TREE;
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 32640f8e946..d83f530ac8d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21093,6 +21093,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
if (warning_suppressed_p (t, OPT_Wpessimizing_move))
  /* This also suppresses -Wredundant-move.  */
  suppress_warning (ret, OPT_Wpessimizing_move);
+   if (warning_suppressed_p (t, OPT_Wparentheses))
+ suppress_warning (STRIP_REFERENCE_REF (ret), OPT_Wparentheses);
  }
  
  	RETURN (ret);

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 5f16994300f..75b696e32e0 10064

C++ Patch ping - Re: [PATCH] c++: Fix parsing of abstract-declarator starting with ... followed by [ or ( [PR115012]

2024-05-16 Thread Jakub Jelinek
Hi!

I'd like to ping the 
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651199.html
patch.

Thanks.

On Thu, May 09, 2024 at 08:12:30PM +0200, Jakub Jelinek wrote:
> The C++26 P2662R3 Pack indexing paper mentions that both GCC
> and MSVC don't handle T...[10] parameter declaration when T
> is a pack.  While that will change meaning in C++26, in C++11 .. C++23
> this ought to be valid.  The same goes for T...(args).
> 
> The following patch handles those in cp_parser_direct_declarator.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-05-09  Jakub Jelinek  
> 
>   PR c++/115012
>   * parser.cc (cp_parser_direct_declarator): Handle
>   abstract declarator starting with ... followed by [
>   or (.
> 
>   * g++.dg/cpp0x/variadic185.C: New test.
>   * g++.dg/cpp0x/variadic186.C: New test.

Jakub



Re: [PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Jeff Law




On 5/16/24 6:03 AM, Richard Biener wrote:

Now that we handle pt.null conservatively we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
ptr-ptr compares using points-to info in ptrs_compare_unequal.
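The kind of pointer-pointer comparison this targets can be sketched in C++: one pointer into a local array and one to a string literal (a STRING_CST, i.e. constant-pool storage). `same_object` is a hypothetical name; the comparison is always false by language rules, and points-to information that tracks both objects could let a compiler fold it. Whether a given GCC release folds this exact function is not claimed; gcc.dg/tree-ssa/alias-39.c is the actual testcase.

```cpp
#include <cassert>

// 'p' can only point into the local array 'buf'; 's' points to a string
// literal.  They are distinct objects, so p == s is always false.
bool same_object(int pick) {
  char buf[4] = {'a', 'b', 'c', '\0'};
  const char *s = "abc";
  const char *p = buf + (pick & 3);  // some element of buf
  return p == s;
}
```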

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Richard.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
to manifest in transforms no longer vectorizing this testcase
for an ICE.

You might want to test this against 92539 as well.  There's a nonzero
chance it'll resolve that one.


jeff



Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Jeff Law




On 5/16/24 5:58 AM, Richard Biener wrote:

On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:



OK.


Thanks Richard for the help and coaching. To double-confirm: are you OK with this
patch only, or with the whole series of SAT middle-end patches?
Thanks again for reviewing and suggestions.


For the series, the riscv specific part of course needs riscv approval.

Yea, we'll take a look at it.  Tons of stuff to go through, but this is
definitely on the list.


jeff



Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF

2024-05-16 Thread Jan Hubicka
Hi,
TARGET_MEM_REF can be used to offset constant base into a memory object (to
produce lea instruction).  This confuses points_to_local_or_readonly_memory_p
which treats the constant address as a base of the access.
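A toy C++ model of the guard this patch adds, under loudly stated assumptions: `Code`, `Node` and the helper functions are hypothetical stand-ins, not GCC's tree API, and the toy `refs_local_or_readonly` deliberately looks through a TARGET_MEM_REF to its base and classifies a constant base as readonly, to mimic the confusion described above.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for a few tree codes; not GCC's real types.
enum class Code { ADDR_EXPR, TARGET_MEM_REF, INTEGER_CST, VAR_DECL };

struct Node {
  Code code;
  std::vector<const Node *> ops;  // operands; ops[0] is the base
  bool local = false;             // VAR_DECL only: function-local object
};

// Toy reference query: looks through TARGET_MEM_REF to its base and
// (mis)classifies a constant base as readonly memory.
static bool refs_local_or_readonly(const Node *ref) {
  switch (ref->code) {
  case Code::VAR_DECL:
    return ref->local;
  case Code::INTEGER_CST:
    return true;
  case Code::TARGET_MEM_REF:
    return refs_local_or_readonly(ref->ops[0]);
  default:
    return false;
  }
}

// Before the fix: every ADDR_EXPR is analyzed through its operand.
bool points_to_local_before(const Node *t) {
  return t->code == Code::ADDR_EXPR && refs_local_or_readonly(t->ops[0]);
}

// After the fix: &TARGET_MEM_REF with a constant operand 0 is not looked
// into, and the answer is the conservative false.
bool points_to_local_after(const Node *t) {
  if (t->code == Code::ADDR_EXPR) {
    const Node *ref = t->ops[0];
    if (ref->code != Code::TARGET_MEM_REF
        || ref->ops[0]->code != Code::INTEGER_CST)
      return refs_local_or_readonly(ref);
  }
  return false;
}
```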

Bootstrapped/regtested x86_64-linux, committed.
Honza

gcc/ChangeLog:

PR ipa/113787
* ipa-fnsummary.cc (points_to_local_or_readonly_memory_p): Do not
look into TARGET_MEM_REFs with constant operand 0.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr113787.c: New test.

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index 07a853f78e3..2faf2389297 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -2648,7 +2648,9 @@ points_to_local_or_readonly_memory_p (tree t)
return true;
   return !ptr_deref_may_alias_global_p (t, false);
 }
-  if (TREE_CODE (t) == ADDR_EXPR)
+  if (TREE_CODE (t) == ADDR_EXPR
+  && (TREE_CODE (TREE_OPERAND (t, 0)) != TARGET_MEM_REF
+ || TREE_CODE (TREE_OPERAND (TREE_OPERAND (t, 0), 0)) != INTEGER_CST))
 return refs_local_or_readonly_memory_p (TREE_OPERAND (t, 0));
   return false;
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113787.c b/gcc/testsuite/gcc.c-torture/execute/pr113787.c
new file mode 100644
index 000..702b6c35fc6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr113787.c
@@ -0,0 +1,38 @@
+void foo(int x, int y, int z, int d, int *buf)
+{
+  for(int i = z; i < y-z; ++i)
+for(int j = 0; j < d; ++j)
+  /* buf[x(i+1) + j] = buf[x(i+1)-j-1] */
+  buf[i*x+(x-z+j)] = buf[i*x+(x-z-1-j)];
+}
+
+void bar(int x, int y, int z, int d, int *buf)
+{
+  for(int i = 0; i < d; ++i)
+for(int j = z; j < x-z; ++j)
+  /* buf[j+(y+i)*x] = buf[j+(y-1-i)*x] */
+  buf[j+(y-z+i)*x] = buf[j+(y-z-1-i)*x];
+}
+
+__attribute__((noipa))
+void baz(int x, int y, int d, int *buf)
+{
+  foo(x, y, 0, d, buf);
+  bar(x, y, 0, d, buf);
+}
+
+int main(void)
+{
+  int a[] = { 1, 2, 3 };
+  baz (1, 2, 1, a);
+  /* foo does:
+ buf[1] = buf[0];
+ buf[2] = buf[1];
+
+ bar does:
+ buf[2] = buf[1]; (no-op)
+ so we should have { 1, 1, 1 }.  */
+  for (int i = 0; i < 3; i++)
+if (a[i] != 1)
+  __builtin_abort ();
+}
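The expected result in the testcase comment can be checked by replaying the same index arithmetic on a plain array. This C++ sketch (the `replay` and `Copy` names are hypothetical) uses the values `main` passes through `baz (1, 2, 1, a)`: x = 1, y = 2, d = 1, with z = 0 hard-coded in `baz`. It records every `buf[dst] = buf[src]` copy in execution order.

```cpp
#include <cassert>

// One element copy performed by the loops: buf[dst] = buf[src].
struct Copy {
  int dst, src;
};

// Runs foo() then bar() from pr113787.c for x = 1, y = 2, z = 0, d = 1,
// applying each copy to buf and logging it; returns the number of copies.
int replay(int *buf, Copy *copies) {
  const int x = 1, y = 2, z = 0, d = 1;
  int n = 0;
  for (int i = z; i < y - z; ++i)  // foo
    for (int j = 0; j < d; ++j) {
      Copy c{i * x + (x - z + j), i * x + (x - z - 1 - j)};
      buf[c.dst] = buf[c.src];
      copies[n++] = c;
    }
  for (int i = 0; i < d; ++i)  // bar
    for (int j = z; j < x - z; ++j) {
      Copy c{j + (y - z + i) * x, j + (y - z - 1 - i) * x};
      buf[c.dst] = buf[c.src];
      copies[n++] = c;
    }
  return n;
}
```

The log comes out as buf[1] = buf[0], buf[2] = buf[1] (foo) and buf[2] = buf[1] (bar), leaving { 1, 1, 1 } as the comment states.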


Re: [PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Richard Biener
On Thu, 16 May 2024, Jeff Law wrote:

> 
> 
> On 5/16/24 6:03 AM, Richard Biener wrote:
> > Now that we handle pt.null conservatively we can implement the missing
> > tracking of constant pool entries (aka STRING_CST) and handle
> > ptr-ptr compares using points-to info in ptrs_compare_unequal.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.
> > 
> > Richard.
> > 
> >  PR tree-optimization/13962
> >  PR tree-optimization/96564
> >  * tree-ssa-alias.h (pt_solution::const_pool): New flag.
> >  * tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
> >  compares.
> >  (dump_points_to_solution): Dump the const_pool flag, fix guard
> >  of flag dumping.
> >  * gimple-pretty-print.cc (pp_points_to_solution): Likewise.
> >  * tree-ssa-structalias.cc (find_what_var_points_to): Set
> >  the const_pool flag for STRING.
> >  (pt_solution_ior_into): Handle the const_pool flag.
> >  (ipa_escaped_pt): Initialize it.
> > 
> >  * gcc.dg/tree-ssa/alias-39.c: New testcase.
> >  * g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
> >  to manifest in transforms no longer vectorizing this testcase
> >  for an ICE.
> You might want to test this against 92539 as well.  There's a nonzero chance
> it'll resolve that one.

Unfortunately it doesn't.

Richard.


[PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing

2024-05-16 Thread Victor Do Nascimento
Following improvements to the way ifuncs are selected based on
detected architectural features, we are able to do away with many of
the aliases that were previously needed for subsets of atomic
functions that were not implemented in a given extension.

An example may clarify this.  Previously, LSE128 functions carried the
suffix _i1 and LSE2 functions the suffix _i2.

Using a single ifunc selector for all atomic functions meant that if
LSE128 was detected, the _i1 function variant would be used
indiscriminately, irrespective of whether or not a function had an
LSE128-specific implementation.  Aliasing was thus needed to redirect
calls to these missing functions to their _i2 LSE2 alternatives.

The more architectural extensions were supported, the more complex the
aliasing chain became.

With the per-file configuration of ifuncs, we do away with the need
for such aliasing.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: Remove unnecessary
aliasing.
---
 libatomic/config/linux/aarch64/atomic_16.S | 41 --
 1 file changed, 41 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 1517e9e78df..16ff03057ab 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -732,47 +732,6 @@ ENTRY_ALIASED (test_and_set_16)
 END (test_and_set_16)
 
 
-/* Alias entry points which are the same in LSE2 and LSE128.  */
-
-#if HAVE_IFUNC
-# if !HAVE_FEAT_LSE128
-ALIAS (exchange_16, LSE128, LSE2)
-ALIAS (fetch_or_16, LSE128, LSE2)
-ALIAS (fetch_and_16, LSE128, LSE2)
-ALIAS (or_fetch_16, LSE128, LSE2)
-ALIAS (and_fetch_16, LSE128, LSE2)
-# endif
-ALIAS (load_16, LSE128, LSE2)
-ALIAS (store_16, LSE128, LSE2)
-ALIAS (compare_exchange_16, LSE128, LSE2)
-ALIAS (fetch_add_16, LSE128, LSE2)
-ALIAS (add_fetch_16, LSE128, LSE2)
-ALIAS (fetch_sub_16, LSE128, LSE2)
-ALIAS (sub_fetch_16, LSE128, LSE2)
-ALIAS (fetch_xor_16, LSE128, LSE2)
-ALIAS (xor_fetch_16, LSE128, LSE2)
-ALIAS (fetch_nand_16, LSE128, LSE2)
-ALIAS (nand_fetch_16, LSE128, LSE2)
-ALIAS (test_and_set_16, LSE128, LSE2)
-
-/* Alias entry points which are the same in baseline and LSE2.  */
-
-ALIAS (exchange_16, LSE2, CORE)
-ALIAS (fetch_add_16, LSE2, CORE)
-ALIAS (add_fetch_16, LSE2, CORE)
-ALIAS (fetch_sub_16, LSE2, CORE)
-ALIAS (sub_fetch_16, LSE2, CORE)
-ALIAS (fetch_or_16, LSE2, CORE)
-ALIAS (or_fetch_16, LSE2, CORE)
-ALIAS (fetch_and_16, LSE2, CORE)
-ALIAS (and_fetch_16, LSE2, CORE)
-ALIAS (fetch_xor_16, LSE2, CORE)
-ALIAS (xor_fetch_16, LSE2, CORE)
-ALIAS (fetch_nand_16, LSE2, CORE)
-ALIAS (nand_fetch_16, LSE2, CORE)
-ALIAS (test_and_set_16, LSE2, CORE)
-#endif
-
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
 #define FEATURE_1_AND 0xc000
 #define FEATURE_1_BTI 1
-- 
2.34.1



[PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-05-16 Thread Victor Do Nascimento
The recent introduction of the optional LSE128 and RCPC3 architectural
extensions to AArch64 has further increased the flexibility of atomic
support in the architecture, with many extensions providing support for
distinct atomic operations, each with different potential applications
in mind.

This has led to maintenance difficulties in Libatomic, in particular
regarding the way the ifunc selector is generated via a series of
macro expansions at compile-time.

Until now, irrespective of the atomic operation in question, all atomic
functions for a particular operand size were expected to have the same
number of ifunc alternatives, meaning that a one-size-fits-all
approach could reasonably be taken for the selector.

This meant that if, hypothetically, for a particular architecture and
operand size one particular atomic operation was to have 3 different
implementations associated with different extensions, libatomic would
likewise be required to present three ifunc alternatives for all other
atomic functions.

The consequence of this design choice was the unnecessary use of
function aliasing and the unwieldy code that resulted from it.

This patch series attempts to remediate this issue by making the
preprocessor macros defining the number of ifunc alternatives and
their respective selection functions dependent on the file importing
the ifunc selector-generating framework.

All files are given `LAT_'-prefixed identifier macros, defined at the beginning
and undef'd at the end of the file.  It is these macros that are
subsequently used to fine-tune the behaviors of `libatomic_i.h' and
`host-config.h'.

In particular, the definition of the `IFUNC_NCOND(N)' and
`IFUNC_COND_' macros in host-config.h can now be guarded behind
these new file-specific macros, which ultimately control what the
`GEN_SELECTOR(X)' macro in `libatomic_i.h' expands to.  As both of
these headers are imported once per file implementing some atomic
operation, fine-tuned control is now possible.
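The mechanism can be sketched in a single C++ translation unit: the implementation file defines its identifier macro, and a shared-header fragment picks a definition based on it. `LAT_CAS_N` is the identifier from the series; `IFUNC_NCOND_SKETCH` and its values are hypothetical and do not reproduce the real `host-config.h` definitions.

```cpp
#include <cassert>

// What an implementation file such as cas_n.c does: define its
// identifier before including the shared headers...
#define LAT_CAS_N

// ...and a shared-header fragment keyed on that identifier.
// Hypothetical values, for illustration only.
#if defined(LAT_CAS_N)
#  define IFUNC_NCOND_SKETCH(N) 1  /* this file: one extension variant */
#else
#  define IFUNC_NCOND_SKETCH(N) 2  /* other files: a different count */
#endif

constexpr int ncond_cas16 = IFUNC_NCOND_SKETCH(16);

// The file undefines its identifier at the end.
#undef LAT_CAS_N
```

Because each file includes the headers separately, each file sees its own expansion.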

Regtested with both `--enable-gnu-indirect-function' and
`--disable-gnu-indirect-function' configurations on armv9.4-a target
with LRCPC3 and LSE128 support and without.

Victor Do Nascimento (4):
  Libatomic: Define per-file identifier macros
  Libatomic: Make ifunc selector behavior contingent on importing file
  Libatomic: Clean up AArch64 ifunc aliasing
  Libatomic: Clean up AArch64 `atomic_16.S' implementation file

 libatomic/cas_n.c                            |   2 +
 libatomic/config/linux/aarch64/atomic_16.S   | 623 +--
 libatomic/config/linux/aarch64/host-config.h |  35 +-
 libatomic/exch_n.c                           |   2 +
 libatomic/fadd_n.c                           |   2 +
 libatomic/fand_n.c                           |   2 +
 libatomic/fence.c                            |   2 +
 libatomic/fenv.c                             |   2 +
 libatomic/fior_n.c                           |   2 +
 libatomic/flag.c                             |   2 +
 libatomic/fnand_n.c                          |   2 +
 libatomic/fop_n.c                            |   2 +
 libatomic/fsub_n.c                           |   2 +
 libatomic/fxor_n.c                           |   2 +
 libatomic/gcas.c                             |   2 +
 libatomic/gexch.c                            |   2 +
 libatomic/glfree.c                           |   2 +
 libatomic/gload.c                            |   2 +
 libatomic/gstore.c                           |   2 +
 libatomic/load_n.c                           |   2 +
 libatomic/store_n.c                          |   2 +
 libatomic/tas_n.c                            |   2 +
 22 files changed, 357 insertions(+), 341 deletions(-)

-- 
2.34.1



[PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-05-16 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h'
is able to get information about its environment and, based on this
information, select more appropriate function-specific ifunc
selectors.  This reduces the number of unnecessary feature tests that
need to be carried out in order to find the best atomic implementation
for a function at run-time.

An immediate benefit of this is that we can further fine-tune the
architectural requirements for each atomic function without incurring
the maintenance and run-time performance penalties of an ifunc selector
with a huge number of alternatives, most of which are irrelevant for any
particular function.  Consequently, for AArch64 targets, we relax the
architectural requirements of `compare_exchange_16', which now requires
only LSE as opposed to the newer LSE2.
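Per-function selection can be sketched in C++ with plain function pointers standing in for GNU ifunc resolvers. `Features`, the stand-in variants and the `select_*` names are hypothetical; the returned strings follow the patch's `CORE()`/`LSE()`/`LSE2()` naming, in which both extension variants use the `_i1` suffix.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical hwcap-derived feature flags.
struct Features {
  bool lse, lse2, lse128;
};

using Impl = const char *(*)();

// Stand-in variants that just report the symbol they represent.
static const char *cas16_core() { return "libat_compare_exchange_16"; }
static const char *cas16_lse() { return "libat_compare_exchange_16_i1"; }
static const char *load16_core() { return "libat_load_16"; }
static const char *load16_lse2() { return "libat_load_16_i1"; }

// One selector per function: each tests only the feature its own
// extension-specific variant needs, so no aliasing between suffixes
// is required.
Impl select_compare_exchange_16(const Features &f) {
  return f.lse ? cas16_lse : cas16_core;
}

Impl select_load_16(const Features &f) {
  return f.lse2 ? load16_lse2 : load16_core;
}
```

On an LSE-only machine, `compare_exchange_16` gets its LSE variant while `load_16` falls back to the core code. This mirrors the relaxed LSE requirement described above.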

The new flexibility provided by this approach also means that certain
functions can now be called directly, doing away with ifunc selectors
altogether when only a single implementation is available for it on a
given target.  As per the macro expansion framework laid out in
`libatomic_i.h', such functions should have their names prefixed with
`__atomic_' as opposed to `libat_'.  This is the same prefix applied
to function names when Libatomic is configured with
`--disable-gnu-indirect-function'.

To achieve this, these functions unconditionally apply the aliasing
rule that at present is conditionally applied only when libatomic is
built without ifunc support, which ensures that the default
`libat_##NAME' is accessible via the equivalent `__atomic_##NAME' too.
This is ensured by using the new `ENTRY_ALIASED' macro.
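The effect of `ENTRY_ALIASED`, one body reachable under two symbol names, can be sketched in C++ with GCC's `alias` function attribute (ELF targets). The real macro is an assembler macro in `atomic_16.S`. The `demo_` names are hypothetical so that they do not clash with libatomic's `libat_*` and `__atomic_*` symbols, and the `CORE`/`ATOMIC` macros only mimic the patch's token pasting.

```cpp
#include <cassert>

// Hypothetical name generators in the style of CORE()/ATOMIC().
#define CORE(NAME) demo_libat_##NAME
#define ATOMIC(NAME) demo_atomic_##NAME

// The single implementation, with an unmangled C name.
extern "C" int CORE(fetch_add_4)(int *p, int v) {
  return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

// Unconditional second entry point: an alias of the same code, so the
// function is reachable directly without an ifunc selector.
extern "C" int ATOMIC(fetch_add_4)(int *p, int v)
    __attribute__((alias("demo_libat_fetch_add_4")));
```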

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S (LSE): New.
(ENTRY_ALIASED): Likewise.
* config/linux/aarch64/host-config.h (LSE_ATOP): New.
(LSE2_ATOP): Likewise.
(LSE128_ATOP): Likewise.
(IFUNC_COND_1): Make its definition conditional on above 3
macros.
(IFUNC_NCOND): Likewise.
---
 libatomic/config/linux/aarch64/atomic_16.S   | 31 +
 libatomic/config/linux/aarch64/host-config.h | 35 
 2 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index b63e97ac5a2..1517e9e78df 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -54,17 +54,20 @@
 #endif
 
 #define LSE128(NAME)   libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i2
+#define LSE(NAME)  libat_##NAME##_i1
+#define LSE2(NAME) libat_##NAME##_i1
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
+/* Emit __atomic_* entrypoints if no ifuncs.  */
+#define ENTRY_ALIASED(NAME) ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+
 #if HAVE_IFUNC
 # define ENTRY(NAME)   ENTRY2 (CORE (NAME), )
 # define ENTRY_FEAT(NAME, FEAT) ENTRY2 (FEAT (NAME), )
 # define END_FEAT(NAME, FEAT)  END2 (FEAT (NAME))
 #else
-/* Emit __atomic_* entrypoints if no ifuncs.  */
-# define ENTRY(NAME)   ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+# define ENTRY(NAME)   ENTRY_ALIASED (NAME)
 #endif
 
 #define END(NAME)  END2 (CORE (NAME))
@@ -299,7 +302,7 @@ END (compare_exchange_16)
 
 
 #if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE2)
+ENTRY_FEAT (compare_exchange_16, LSE)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -332,11 +335,11 @@ ENTRY_FEAT (compare_exchange_16, LSE2)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END_FEAT (compare_exchange_16, LSE2)
+END_FEAT (compare_exchange_16, LSE)
 #endif
 
 
-ENTRY (fetch_add_16)
+ENTRY_ALIASED (fetch_add_16)
mov x5, x0
   cbnz w4, 2f
 
@@ -358,7 +361,7 @@ ENTRY (fetch_add_16)
 END (fetch_add_16)
 
 
-ENTRY (add_fetch_16)
+ENTRY_ALIASED (add_fetch_16)
mov x5, x0
   cbnz w4, 2f
 
@@ -380,7 +383,7 @@ ENTRY (add_fetch_16)
 END (add_fetch_16)
 
 
-ENTRY (fetch_sub_16)
+ENTRY_ALIASED (fetch_sub_16)
mov x5, x0
   cbnz w4, 2f
 
@@ -402,7 +405,7 @@ ENTRY (fetch_sub_16)
 END (fetch_sub_16)
 
 
-ENTRY (sub_fetch_16)
+ENTRY_ALIASED (sub_fetch_16)
mov x5, x0
   cbnz w4, 2f
 
@@ -624,7 +627,7 @@ END_FEAT (and_fetch_16, LSE128)
 #endif
 
 
-ENTRY (fetch_xor_16)
+ENTRY_ALIASED (fetch_xor_16)
mov x5, x0
   cbnz w4, 2f
 
@@ -646,7 +649,7 @@ ENTRY (fetch_xor_16)
 END (fetch_xor_16)
 
 
-ENTRY (xor_fetch_16)
+ENTRY_ALIASED (xor_fetch_16)
mov x5, x0
   cbnz w4, 2f
 
@@ -668,7 +671,7 @@ ENTRY (xor_fetch_16)
 END (xor_fetch_16)
 
 
-ENTRY (fetch_nand_16)
+ENTRY_ALIASED (fetch_nand_16)
mov x5, x0
mvn in0, in0
mvn in1, in1
@@ -692,7 +695,7 @@ ENTRY (fetch_nand_16)
 END (fetch_nand_16)
 
 
-ENTRY (nand_fetch_16)
+ENTRY_ALIASED (

[PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-05-16 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the
same functions together in the file.  Therefore, as an example,
the LSE128 implementation of `exchange_16' follows on immediately
from its core implementation, as does the `fetch_or_16' LSE128
implementation.

Such architectural extension-dependent implementations are dependent
both on ifunc and assembler support.  They may therefore conceivably
be guarded by 2 preprocessor macros, e.g. `#if HAVE_IFUNC' and `#if
HAVE_FEAT_LSE128'.

Having to apply these guards on a per-function basis adds unnecessary
clutter to the file and makes its maintenance more error-prone.

We therefore reorganize the layout of the file in such a way that all
core implementations needing no `#ifdef's are placed first, followed
by all ifunc-dependent implementations, which can all be guarded by a
single `#if HAVE_IFUNC'.  Within the guard, these are then subdivided
and organized according to architectural extension requirements such
that in the case of LSE128-specific functions, for example, they can
all be guarded by a single `#if HAVE_FEAT_LSE128', greatly reducing
the overall number of required `#ifdef' macros.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: Reshuffle functions.
---
 libatomic/config/linux/aarch64/atomic_16.S | 583 ++---
 1 file changed, 288 insertions(+), 295 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 16ff03057ab..27363f82b75 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,15 +40,12 @@
 
 #include "auto-config.h"
 
-#if !HAVE_IFUNC
-# undef HAVE_FEAT_LSE128
-# define HAVE_FEAT_LSE128 0
-#endif
-
-#define HAVE_FEAT_LSE2 HAVE_IFUNC
-
-#if HAVE_FEAT_LSE128
+#if HAVE_IFUNC
+# if HAVE_FEAT_LSE128
.arch   armv9-a+lse128
+# else
+   .arch   armv8-a+lse
+# endif
 #else
.arch   armv8-a+lse
 #endif
@@ -124,6 +121,8 @@ NAME:   \
 #define ACQ_REL 4
 #define SEQ_CST 5
 
+/* Core atomic operation implementations.  These are available irrespective of
+   ifunc support or the presence of additional architectural extensions.  */
 
 ENTRY (load_16)
mov x5, x0
@@ -143,31 +142,6 @@ ENTRY (load_16)
 END (load_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (load_16, LSE2)
-   cbnz w1, 1f
-
-   /* RELAXED.  */
-   ldp res0, res1, [x0]
-   ret
-1:
-   cmp w1, SEQ_CST
-   b.eq 2f
-
-   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-
-   /* SEQ_CST.  */
-2: ldar tmp0, [x0]  /* Block reordering with Store-Release instr.  */
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-END_FEAT (load_16, LSE2)
-#endif
-
-
 ENTRY (store_16)
   cbnz w4, 2f
 
@@ -185,23 +159,6 @@ ENTRY (store_16)
 END (store_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (store_16, LSE2)
-   cbnz w4, 1f
-
-   /* RELAXED.  */
-   stp in0, in1, [x0]
-   ret
-
-   /* RELEASE/SEQ_CST.  */
-1: ldxp xzr, tmp0, [x0]
-   stlxp   w4, in0, in1, [x0]
-   cbnz w4, 1b
-   ret
-END_FEAT (store_16, LSE2)
-#endif
-
-
 ENTRY (exchange_16)
mov x5, x0
   cbnz w4, 2f
@@ -229,31 +186,6 @@ ENTRY (exchange_16)
 END (exchange_16)
 
 
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (exchange_16, LSE128)
-   mov tmp0, x0
-   mov res0, in0
-   mov res1, in1
-   cbnz w4, 1f
-
-   /* RELAXED.  */
-   swpp res0, res1, [tmp0]
-   ret
-1:
-   cmp w4, ACQUIRE
-   b.hi 2f
-
-   /* ACQUIRE/CONSUME.  */
-   swppa   res0, res1, [tmp0]
-   ret
-
-   /* RELEASE/ACQ_REL/SEQ_CST.  */
-2: swppal  res0, res1, [tmp0]
-   ret
-END_FEAT (exchange_16, LSE128)
-#endif
-
-
 ENTRY (compare_exchange_16)
ldp exp0, exp1, [x1]
cbz w4, 3f
@@ -301,43 +233,97 @@ ENTRY (compare_exchange_16)
 END (compare_exchange_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE)
-   ldp exp0, exp1, [x1]
-   mov tmp0, exp0
-   mov tmp1, exp1
-   cbz w4, 2f
-   cmp w4, RELEASE
-   b.hs 3f
+ENTRY (fetch_or_16)
+   mov x5, x0
+   cbnz w4, 2f
 
-   /* ACQUIRE/CONSUME.  */
-   caspa   exp0, exp1, in0, in1, [x0]
-0:
-   cmp exp0, tmp0
-   ccmp exp1, tmp1, 0, eq
-   bne 1f
-   mov x0, 1
+   /* RELAXED.  */
+1: ldxp res0, res1, [x5]
+   orr tmp0, res0, in0
+   orr tmp1, res1, in1
+   stxp w4, tmp0, tmp1, [x5]
+   cbnz w4, 1b
ret
-1:
-   stp exp0, exp1, [x1]
-   mov x0, 0
+
+   /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2: ldaxp   res0, res1, [x5]
+   orr tmp0, res0, in0
+   orr tmp1, res1, in1
+   stlxp   w4, tmp0, tmp1, [x5]
+   cbnzw

[PATCH 1/4] Libatomic: Define per-file identifier macros

2024-05-16 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and
`host-config.h' headers are used by different atomic functions, we
define distinct identifier macros for each file which, in implementing
atomic operations, imports these headers.

The idea is that different parts of these headers could then be
conditionally defined depending on the macros set by the file that
`#include'd them.

Because some file names are generic enough that using them as-is for
macro names (e.g. flag.c -> FLAG) could lead to name clashes with other
macros, all file names first have LAT_ prepended to them such that, for
example, flag.c is assigned the LAT_FLAG macro.

libatomic/ChangeLog:

* cas_n.c (LAT_CAS_N): New.
* exch_n.c (LAT_EXCH_N): Likewise.
* fadd_n.c (LAT_FADD_N): Likewise.
* fand_n.c (LAT_FAND_N): Likewise.
* fence.c (LAT_FENCE): Likewise.
* fenv.c (LAT_FENV): Likewise.
* fior_n.c (LAT_FIOR_N): Likewise.
* flag.c (LAT_FLAG): Likewise.
* fnand_n.c (LAT_FNAND_N): Likewise.
* fop_n.c (LAT_FOP_N): Likewise.
* fsub_n.c (LAT_FSUB_N): Likewise.
* fxor_n.c (LAT_FXOR_N): Likewise.
* gcas.c (LAT_GCAS): Likewise.
* gexch.c (LAT_GEXCH): Likewise.
* glfree.c (LAT_GLFREE): Likewise.
* gload.c (LAT_GLOAD): Likewise.
* gstore.c (LAT_GSTORE): Likewise.
* load_n.c (LAT_LOAD_N): Likewise.
* store_n.c (LAT_STORE_N): Likewise.
* tas_n.c (LAT_TAS_N): Likewise.
---
 libatomic/cas_n.c   | 2 ++
 libatomic/exch_n.c  | 2 ++
 libatomic/fadd_n.c  | 2 ++
 libatomic/fand_n.c  | 2 ++
 libatomic/fence.c   | 2 ++
 libatomic/fenv.c| 2 ++
 libatomic/fior_n.c  | 2 ++
 libatomic/flag.c| 2 ++
 libatomic/fnand_n.c | 2 ++
 libatomic/fop_n.c   | 2 ++
 libatomic/fsub_n.c  | 2 ++
 libatomic/fxor_n.c  | 2 ++
 libatomic/gcas.c| 2 ++
 libatomic/gexch.c   | 2 ++
 libatomic/glfree.c  | 2 ++
 libatomic/gload.c   | 2 ++
 libatomic/gstore.c  | 2 ++
 libatomic/load_n.c  | 2 ++
 libatomic/store_n.c | 2 ++
 libatomic/tas_n.c   | 2 ++
 20 files changed, 40 insertions(+)

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index a080b990371..2a6357e48db 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_CAS_N
 #include "libatomic_i.h"
 
 
@@ -122,3 +123,4 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, UTYPE newval,
 #endif
 
 EXPORT_ALIAS (SIZE(compare_exchange));
+#undef LAT_CAS_N
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index e5ff80769b9..184d3de1009 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_EXCH_N
 #include "libatomic_i.h"
 
 
@@ -126,3 +127,4 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel UNUSED)
 #endif
 
 EXPORT_ALIAS (SIZE(exchange));
+#undef LAT_EXCH_N
diff --git a/libatomic/fadd_n.c b/libatomic/fadd_n.c
index bc15b8bc0e6..32b75cec654 100644
--- a/libatomic/fadd_n.c
+++ b/libatomic/fadd_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FADD_N
 #include 
 
 #define NAME   add
@@ -43,3 +44,4 @@
 #endif
 
 #include "fop_n.c"
+#undef LAT_FADD_N
diff --git a/libatomic/fand_n.c b/libatomic/fand_n.c
index ffe9ed8700f..9eab55bcd72 100644
--- a/libatomic/fand_n.c
+++ b/libatomic/fand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FAND_N
 #define NAME   and
 #define OP(X,Y)((X) & (Y))
 #include "fop_n.c"
+#undef LAT_FAND_N
diff --git a/libatomic/fence.c b/libatomic/fence.c
index a9b1e280c5a..4022194a57a 100644
--- a/libatomic/fence.c
+++ b/libatomic/fence.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENCE
 #include "libatomic_i.h"
 
 #include 
@@ -43,3 +44,4 @@ void
 {
   atomic_signal_fence (order);
 }
+#undef LAT_FENCE
diff --git a/libatomic/fenv.c b/libatomic/fenv.c
index 41f187c1f85..dccad356a31 100644
--- a/libatomic/fenv.c
+++ b/libatomic/fenv.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENV
 #include "libatomic_i.h"
 
 #ifdef HAVE_FENV_H
@@ -70,3 +71,4 @@ __atomic_feraiseexcept (int excepts __attribute__ ((unused)))
 }
 #endif
 }
+#undef LAT_FENV
diff --git a/libatomic/fior_n.c b/libatomic/fior_n.c
index 55d0d66b469..2b58d4805d6 100644
--- a/libatomic/fior_n.c
+++ b/libatomic/fior_n.c
@@ -1,3 +1,5 @@
+#define LAT_FIOR_N
 #define NAME   or
 #define OP(X,Y)((X) | (Y))
 #include "fop_n.c"
+#undef LAT_FIOR_N
diff --git a/libatomic/flag.c b/libatomic/flag.c
index e4a5a27819a..8afd80c9130 100644
--- a/libatomic/

RE: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-16 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, May 16, 2024 8:19 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; tamar.christina 
; Richard Biener ; 
richard.sandiford ; Li, Pan2 
Subject: Re: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len

RISC-V part LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; 
tamar.christina; 
richard.guenther; 
Richard.Sandiford; Pan 
Li
Subject: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len
From: Pan Li mailto:pan2...@intel.com>>

Now that the vectorizer supports loop lens for vectorizable early exits,
we would like to implement the feature for the RISC-V target.  Consider
the example below:

unsigned vect_a[1923];
unsigned vect_b[1923];

unsigned test (unsigned limit, int n)
{
  unsigned ret = 0;
  for (int i = 0; i < n; i++)
    {
      vect_b[i] = limit + i;

      if (vect_a[i] > limit)
        {
          ret = vect_b[i];
          return ret;
        }

      vect_a[i] = limit;
    }
  return ret;
}

Before this patch:
  ...
.L8:
  sw    a3,0(a5)
  addiw a0,a0,1
  addi  a4,a4,4
  addi  a5,a5,4
  beq   a1,a0,.L2
.L4:
  sw    a0,0(a4)
  lw    a2,0(a5)
  bleu  a2,a3,.L8
  ret

After this patch:
  ...
.L5:
  vsetvli   a5,a3,e8,mf4,ta,ma
  vmv1r.v   v4,v2
  vsetvli   t4,zero,e32,m1,ta,ma
  vmv.v.x   v1,a5
  vadd.vv   v2,v2,v1
  vsetvli   zero,a5,e32,m1,ta,ma
  vadd.vv   v5,v4,v3
  slli  a6,a5,2
  vle32.v   v1,0(t1)
  vmsltu.vv v1,v3,v1
  vcpop.m   t4,v1
  beq   t4,zero,.L4
  vmv.x.s   a4,v4
.L3:
  ...
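To make the role of the new patterns concrete, here is a scalar C model of the exit test the vectorized loop performs on each iteration. It is a sketch under stated assumptions, not GCC output: the vector length VLMAX = 4 is an assumption, the function first_exceeding is hypothetical, and only the early-exit check is modelled (the stores to vect_a and vect_b are omitted). The point it shows is why vcond_mask_len is needed before vcpop.m: lanes at or beyond the active length hold unspecified values, so they must be masked off before the population count decides whether to leave the loop.

```c
#define VLMAX 4u /* assumed hardware vector length, in elements */

/* Scalar model of a length-controlled early-exit loop.  Returns the
   index of the first i < n with a[i] > limit, or n if there is none.  */
static int
first_exceeding (const unsigned *a, int n, unsigned limit)
{
  for (int i = 0; i < n;)
    {
      /* vsetvli: number of active lanes in this iteration.  */
      unsigned len = (unsigned) (n - i) < VLMAX ? (unsigned) (n - i) : VLMAX;

      /* Vector compare.  Lanes >= len are tail lanes with unspecified
         contents; model the worst case by treating them as "true".  */
      unsigned raw = ~((1u << len) - 1);
      for (unsigned lane = 0; lane < len; lane++)
        if (a[i + lane] > limit)
          raw |= 1u << lane;

      /* vcond_mask_len (raw, ones, zeros, len, 0): keep active lanes only.  */
      unsigned mask = raw & ((1u << len) - 1);

      /* vcpop.m + branch: leave the loop if any active lane matched.  */
      if (__builtin_popcount (mask) != 0)
        return i + __builtin_ctz (mask);

      i += (int) len;
    }
  return n;
}
```

Without the masking step, the tail bits in raw would make the loop exit spuriously on the final, partial iteration.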

The following tests pass with this patch:
1. The full RISC-V regression test suite.

gcc/ChangeLog:

* config/riscv/autovec-opt.md
  (*vcond_mask_len_popcount_):
New pattern of vcond_mask_len_popcount for vector bool mode.
* config/riscv/autovec.md (vcond_mask_len_): New pattern
of vcond_mask_len for vector bool mode.
(cbranch4): New pattern for vector bool mode.
* config/riscv/vector-iterators.md: Add new unspec
  UNSPEC_SELECT_MASK.
* config/riscv/vector.md (@pred_popcount): Add
VLS mode to popcount pattern.
(@pred_popcount): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/early-break-1.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-2.c: New test.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>
---
gcc/config/riscv/autovec-opt.md   | 33 ++
gcc/config/riscv/autovec.md   | 61 +++
gcc/config/riscv/vector-iterators.md  |  1 +
gcc/config/riscv/vector.md| 18 +++---
.../riscv/rvv/autovec/early-break-1.c | 34 +++
.../riscv/rvv/autovec/early-break-2.c | 37 +++
6 files changed, 175 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..04f85d8e455 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,36 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; Optimization pattern for early break auto-vectorization
+;; vcond_mask_len (mask, ones, zeros, len, bias) + vlmax popcount
+;; -> non vlmax popcount (mask, len)
+(define_insn_and_split "*vcond_mask_len_popcount_"
+  [(set (match_operand:P 0 "register_operand")
+(popcount:P
+ (unspec:VB_VLS [
+  (unspec:VB_VLS [
+   (match_operand:VB_VLS 1 "register_operand")
+   (match_operand:VB_VLS 2 "const_1_operand")
+   (match_operand:VB_VLS 3 "const_0_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK)
+  (match_operand 6 "autovec_length_operand")
+  (const_int 1)
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS (mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_nonvlmax_insn (
+ code_for_pred_popcount (mode, Pmode),
+ riscv_vector::CPOP_OP,
+ operands, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..1ee3c8052fb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,64 @@ (define_expand "rawmemchr"
 DONE;
   }
)
+
+;; =
+;; == Early break auto-vectorization patterns
+;; ===

RE: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

2024-05-16 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, May 16, 2024 8:19 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; tamar.christina 
; Richard Biener ; 
richard.sandiford ; Li, Pan2 
Subject: Re: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

RISC-V part LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; 
tamar.christina; 
richard.guenther; 
Richard.Sandiford; Pan 
Li
Subject: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite
From: Pan Li mailto:pan2...@intel.com>>

Now that RISC-V supports vectorizable early exits, we would like to
enable the GCC vect tests for vectorizable early exits.

vect-early-break_124-pr114403.c currently fails to vectorize, because
the 8-byte __builtin_memcpy is not folded into an int64 assignment
during ccp1.  We will improve that first; until then, the test is
marked as xfail for RISC-V.
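For readers unfamiliar with the fold referred to above, the two functions below (hypothetical names) show the two forms. They compute the same value, which is what makes it legal for the compiler to replace the 8-byte memcpy with a single 64-bit assignment. When GCC actually performs that replacement (for example, how alignment affects it) is not modelled here.

```c
#include <stdint.h>
#include <string.h>

/* An 8-byte copy written with memcpy: the form that, per the message
   above, is not yet folded into a 64-bit assignment on RISC-V.  */
static uint64_t
load_u64_memcpy (const void *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* The equivalent plain 64-bit assignment, valid when P points to a
   suitably aligned uint64_t.  */
static uint64_t
load_u64_direct (const uint64_t *p)
{
  uint64_t v = *p;
  return v;
}
```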

The following tests pass with this patch:
1. The full RISC-V regression test suite.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-mask-store-1.c: Add pragma novector, as otherwise
"LOOP VECTORIZED" is reported twice on RISC-V.
* gcc.dg/vect/vect-early-break_124-pr114403.c: Xfail for the
riscv backend.
* lib/target-supports.exp: Add RISC-V backend.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>
---
gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c  | 2 ++
gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 2 ++
3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
index fdd9032da98..2f80bf89e5e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -28,6 +28,8 @@ main ()
   if (__builtin_memcmp (x, res, sizeof (x)) != 0)
 abort ();
+
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
 if (flag[i] != 0 && flag[i] != 1)
   abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 51abf245ccb..101ae1e0eaa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -2,7 +2,7 @@
/* { dg-require-effective-target vect_early_break_hw } */
/* { dg-require-effective-target vect_long_long } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */
#include "tree-vect.h"
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 6f5d477b128..ec9baa4f32a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4099,6 +4099,7 @@ proc check_effective_target_vect_early_break { } {
|| [check_effective_target_arm_v8_neon_ok]
|| [check_effective_target_sse4]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v]
}}]
}
@@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } {
|| [check_effective_target_arm_v8_neon_hw]
|| [check_sse4_hw_available]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v_ok]
}}]
}
--
2.34.1




[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-05-16 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.

libatomic/ChangeLog:

* configure.ac: Add call to LIBAT_TEST_FEAT_LRCPC3() test.
* configure: Regenerate.
* config/linux/aarch64/host-config.h (has_rcpc3): New.
(HWCAP2_LRCPC3): Likewise.
(LSE2_LRCPC3_ATOP): Likewise.
* config/linux/aarch64/atomic_16.S: New +rcpc3 .arch
directives.
* config/linux/aarch64/atomic_16.S (libat_load_16): Add LRCPC3
variant.
(libat_store_16): Likewise.
* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LRCPC3): New.
(HAVE_FEAT_LRCPC3): Likewise.
(ARCH_AARCH64_HAVE_LRCPC3): Likewise.
* auto-config.h.in (HAVE_FEAT_LRCPC3): New.
---
 libatomic/acinclude.m4   | 18 +++
 libatomic/auto-config.h.in   |  3 ++
 libatomic/config/linux/aarch64/atomic_16.S   | 55 +++-
 libatomic/config/linux/aarch64/host-config.h | 39 --
 libatomic/configure  | 41 +++
 libatomic/configure.ac   |  1 +
 6 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index 6d2e0b1c355..628275b9945 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -101,6 +101,24 @@ AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LSE128],[
[Have LSE128 support for 16 byte integers.])
 ])
 
+dnl
+dnl Test if the host assembler supports armv8.2-a RCPC3 insns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LRCPC3],[
+  AC_CACHE_CHECK([for armv8.2-a LRCPC3 insn support],
+[libat_cv_have_feat_lrcpc3],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv8.2-a+rcpc3")])])
+if AC_TRY_EVAL(ac_link); then
+  eval libat_cv_have_feat_lrcpc3=yes
+else
+  eval libat_cv_have_feat_lrcpc3=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LRCPC3], [$libat_cv_have_feat_lrcpc3],
+   [Have LRCPC3 support for 16 byte integers.])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index 7c78933b07d..a925686effa 100644
--- a/libatomic/auto-config.h.in
+++ b/libatomic/auto-config.h.in
@@ -108,6 +108,9 @@
 /* Have LSE128 support for 16 byte integers. */
 #undef HAVE_FEAT_LSE128
 
+/* Have LRCPC3 support for 16 byte integers. */
+#undef HAVE_FEAT_LRCPC3
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_FENV_H
 
diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 27363f82b75..47ceb7301c9 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -42,7 +42,13 @@
 
 #if HAVE_IFUNC
 # if HAVE_FEAT_LSE128
+#  if HAVE_FEAT_LRCPC3
+   .arch   armv9-a+lse128+rcpc3
+#  else
.arch   armv9-a+lse128
+#  endif
+# elif HAVE_FEAT_LRCPC3
+   .arch   armv8-a+lse+rcpc3
 # else
.arch   armv8-a+lse
 # endif
@@ -50,9 +56,20 @@
.arch   armv8-a+lse
 #endif
 
+/* There is overlap in some atomic instructions being implemented in both RCPC3
+   and LSE2 extensions, so both _i1 and _i2 suffixes are needed in such
+   situations.  Otherwise, all extension-specific implementations are mapped
+   to _i1.  */
+
+#if HAVE_FEAT_LRCPC3
+# define LRCPC3(NAME)  libat_##NAME##_i1
+# define LSE2(NAME)    libat_##NAME##_i2
+#else
+# define LSE2(NAME)    libat_##NAME##_i1
+#endif
+
 #define LSE128(NAME)   libat_##NAME##_i1
 #define LSE(NAME)  libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i1
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
@@ -722,6 +739,42 @@ ENTRY_FEAT (and_fetch_16, LSE128)
ret
 END_FEAT (and_fetch_16, LSE128)
 #endif /* HAVE_FEAT_LSE128 */
+
+
+#if HAVE_FEAT_LRCPC3
+ENTRY_FEAT (load_16, LRCPC3)
+   cbnz    w1, 1f
+
+   /* RELAXED.  */
+   ldp res0, res1, [x0]
+   ret
+1:
+   cmp w1, SEQ_CST
+   b.eq    2f
+
+   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
+   ldiapp  res0, res1, [x0]
+   ret
+
+   /* SEQ_CST.  */
+2: ldar    tmp0, [x0]  /* Block reordering with Store-Release instr.  */
+   ldiapp  res0, res1, [x0]
+   ret
+END_FEAT (load_16, LRCPC3)
+
+
+ENTRY_FEAT (store_16, LRCPC3)
+   cbnz    w4, 1f
+
+   /* RELAXED.  */
+   stp in0, in1, [x0]
+   ret
+
+   /* RELEASE/SEQ_CST.  */
+1: stilp   in0, in1, [x0]
+   ret
+END_FEAT (stor

[PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Victor Do Nascimento
At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i *references)
clobbers_memory = true;
break;
  }
+
+  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+   {
+ enum built_in_function fn_type = DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt), 0));
+ if (fn_type == BUILT_IN_PREFETCH)
+   clobbers_memory = false;
+ else
+   clobbers_memory = true;
+   }
   else
clobbers_memory = true;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 361aec06488..65e8b421d80 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   !gsi_end_p (si);)
{
  stmt = gsi_stmt (si);
- /* During vectorization remove existing clobber stmts.  */
+ /* During vectorization remove existing clobber stmts and
+prefetches.  */
  if (gimple_clobber_p (stmt))
{
  unlink_stmt_vdef (stmt);
  gsi_remove (&si, true);
  release_defs (stmt);
}
+ else if (gimple_call_builtin_p (stmt) &&
+  DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt),
+0)) == BUILT_IN_PREFETCH)
+   gsi_remove (&si, true);
  else
{
  /* Ignore vector stmts created in the outer loop.  */
-- 
2.34.1
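The premise of the patch, that a data prefetch only hints at memory and neither faults nor writes to it, means that dropping the call cannot change a loop's results. The sketch below makes that concrete with the loop from the description; the function names are hypothetical, and it does not model what the vectorizer does internally. Note that, independently of prefetch semantics, forming &b[i + 8] is only well-defined C if b has at least n + 8 elements.

```c
/* The loop from the description, with the prefetch hint.  */
static void
foo_with_prefetch (double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = a[i] + b[i];
      __builtin_prefetch (&b[i + 8]);  /* hint only: does not write memory */
    }
}

/* The same loop with the prefetch dropped, as the patch proposes.  */
static void
foo_without_prefetch (double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
```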



Re: [PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento <
victor.donascime...@arm.com> wrote:

> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such
> loop is given below:
>
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i<n; i++){
>     a[i] = a[i] + b[i];
>     __builtin_prefetch(&(b[i+8]));
>   }
> }
>
> The failure stems from two issues:
>
> 1. Given that it is typically not possible to fully reason about a
>function call due to the possibility of side effects, the
>autovectorizer does not attempt to vectorize loops which make such
>calls.
>
>Given the memory reference passed to `__builtin_prefetch', in the
>absence of assurances about its effect on the passed memory
>location the compiler deems the function unsafe to vectorize,
>marking it as clobbering memory in `vect_find_stmt_data_reference'.
>This leads to the failure in autovectorization.
>
> 2. Notwithstanding the above issue, though the prefetch statement
>would be classed as `vect_unused_in_scope', the induction variable
>used in the address of the prefetch is the scalar loop's IV and not
>the vector loop's IV. That is, it still uses `i' and not `vec_iv'
>because the instruction wasn't vectorized, causing DCE to think the
>value is live, such that we now have both the vector and scalar loop
>induction variables actively used in the loop.
>
> This patch addresses both of these:
>
> 1. About the issue regarding the memory clobber, data prefetch does
>not generate faults if its address argument is invalid and does not
>write to memory.  Therefore, it does not alter the internal state
>of the program or its control flow under any circumstance.  As
>such, it is reasonable that the function be marked as not affecting
>memory contents.
>
>To achieve this, we add the necessary logic to
>`get_references_in_stmt' to ensure that builtin functions are given
>the same treatment as internal functions.  If the gimple call
>is to a builtin function and its function code is
>`BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
>
> 2. Finding precedence in the way clobber statements are handled,
>whereby the vectorizer drops these from both the scalar and
>vectorized versions of a given loop, we choose to drop prefetch
>hints in a similar fashion.  This seems appropriate given how
>software prefetch hints are typically ignored by processors across
>architectures, as they seldom lead to performance gain over their
>hardware counterparts.
>
>PR target/114061
>

This should most likely be tree-optimization/114061, since it is a generic
vectorizer issue. Also, maybe reference the bug # in the summary next time,
just for easier reference.

Thanks,
Andrew


> gcc/ChangeLog:
>
> * tree-data-ref.cc (get_references_in_stmt): set
> `clobbers_memory' to false for __builtin_prefetch.
> * tree-vect-loop.cc (vect_transform_loop): Drop all
> __builtin_prefetch calls from loops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-prefetch-drop.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
>  gcc/tree-data-ref.cc   |  9 +
>  gcc/tree-vect-loop.cc  |  7 ++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 000..57723a8c972
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-O3 -march=armv9.2-a+sve -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +
> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; i++){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, z\[0-9\]+.d" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index f37734b5340..47bfec0f922 100644
> --- a/gcc/tree-data-ref.cc
> +++ b/gcc/tree-data-ref.cc
> @@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt, vec *references)
> clobbers_memory = true;
> break;
>   }
> +
> +  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> +   {
> + enum built_in_function fn_type = DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt), 0));
> + if (fn_type == BUILT_IN_PREFETCH)
> +   clobbers_memory = false;
> + else
> +   clobbers_memory = true

Re: Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-05-16 Thread 钟居哲
This patch (the fix for vfwadd.wf) LGTM.

And here is a simple case that reproduces the same bug for vwadd.wx:

https://compiler-explorer.com/z/4rP9Yvdq1

#include 
#include 

vint64m8_t test_vwadd_wx_i64m8_m(vbool8_t vm, vint64m8_t vs2, int rs1, size_t vl) {
  return __riscv_vwadd_wx_i64m8_m(vm, vs2, rs1, vl);
}

char global_memory[1024];
void *fake_memory = (void *)global_memory;

int main ()
{
  asm volatile("fence":::"memory");
  long x;
  asm volatile("":"=r"(x)::"memory");
  vint64m8_t vwadd_wx_i64m8_m_vd = test_vwadd_wx_i64m8_m(
    __riscv_vreinterpret_v_i8m1_b8(__riscv_vundefined_i8m1()),
    __riscv_vundefined_i64m8(), x, __riscv_vsetvlmax_e64m8());
  asm volatile(""::"vr"(vwadd_wx_i64m8_m_vd):"memory");

  return 0;
}

main:
fence
vsetvli a4,zero,e32,m4,ta,ma
vwadd.wx v0,v8,a5,v0.t   <-- vd and vm are both v0, which is wrong.
li  a0,0
ret
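The rule being violated here comes from the RISC-V vector specification: the destination register group of a masked instruction must not overlap the mask register v0. The spec exempts instructions that write a mask value (such as compares) and reduction scalar results; this sketch ignores those cases. It is a minimal checker with hypothetical names that models register groups as ranges [start, start + count); for vwadd.wx with an i64m8 result the destination group spans 8 registers, so vd = v0 covers v0-v7 and overlaps the mask.

```c
#include <stdbool.h>

/* Do register groups [a, a + a_len) and [b, b + b_len) share a register?  */
static bool
groups_overlap (unsigned a, unsigned a_len, unsigned b, unsigned b_len)
{
  return a < b + b_len && b < a + a_len;
}

/* Is VD (a group of EMUL registers) a legal destination?  Masked
   instructions read their mask from v0, so the destination group must
   not include v0.  The spec's exemptions (mask-producing instructions,
   reduction scalar results) are not modelled.  */
static bool
masked_dest_ok (unsigned vd, unsigned emul, bool masked)
{
  return !masked || !groups_overlap (vd, emul, 0, 1);
}
```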


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-16 03:31
To: 钟居哲; gcc-patches
CC: rdapp.gcc; palmer; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].
> I saw that vwadd/vwsub.wx have the same issue. Could you change them and add a test too?
 
Yes, will do.  At first I didn't manage to reproduce it because we
seem to be lacking a combine-opt pattern for it.  I'm going to post
it separately.
 
Regards
Robin
 
 


Re: [PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Victor Do Nascimento

On 5/16/24 15:16, Andrew Pinski wrote:



On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento 
mailto:victor.donascime...@arm.com>> wrote:


At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
   int i;
   for(i=0; iThis most likely be tree-optimization/114061 since it is a generic 
vectorizer issue. Oh maybe reference the bug # in summary next time just 
for easier reference.


Thanks,
Andrew


My bad.

You're right, it's tree-optimization/114061.  Thanks for catching this.

Cheers,
Victor



gcc/ChangeLog:

         * tree-data-ref.cc (get_references_in_stmt): set
         `clobbers_memory' to false for __builtin_prefetch.
         * tree-vect-loop.cc (vect_transform_loop): Drop all
         __builtin_prefetch calls from loops.

gcc/testsuite/ChangeLog:

         * gcc.dg/vect/vect-prefetch-drop.c: New test.
---
  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
  gcc/tree-data-ref.cc                           |  9 +
  gcc/tree-vect-loop.cc                          |  7 ++-
  3 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
new file mode 100644
index 000..57723a8c972
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-O3 -march=armv9.2-a+sve -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
+
+void foo(double * restrict a, double * restrict b, int n){
+  int i;
+  for(i=0; i+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" 
} } */

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index f37734b5340..47bfec0f922 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt, vec *references)
             clobbers_memory = true;
             break;
           }
+
+      else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+       {
+         enum built_in_function fn_type = DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt), 0));
+         if (fn_type == BUILT_IN_PREFETCH)
+           clobbers_memory = false;
+         else
+           clobbers_memory = true;
+       }
        else
         clobbers_memory = true;
      }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 361aec06488..65e8b421d80 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
            !gsi_end_p (si);)
         {
           stmt = gsi_stmt (si);
-         /* During vectorization remove existing clobber stmts.  */
+         /* During vectorization remove existing clobber stmts and
+            prefetches.  */
           if (gimple_clobber_p (stmt))
             {
               unlink_stmt_vdef (stmt);
               gsi_remove (&si, true);
               release_defs (stmt);
             }
+         else if (gimple_call_builtin_p (stmt) &&
+                  DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt),
+                                                    0)) == BUILT_IN_PREFETCH)
+               gsi_remove (&si, true);
           else
             {
               /* Ignore vector stmts created in the outer loop.  */
-- 
2.34.1




[PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Victor Do Nascimento
From: Victor Do Nascimento 

At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences.  The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype: either
that of the operands or of the accumulated value, never both.

With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct optab approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.

This means that the following example fails autovectorization:

uint32_t foo(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; ihttps://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
[2] 
https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
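To show what the 2-way instruction computes relative to the example loop, here is a scalar C model: each 32-bit accumulator lane adds the products of two 16-bit elements. The function names are hypothetical, the fixed width of four 32-bit lanes (one 128-bit vector) is an assumption since SVE vector length is scalable, and products are computed in uint32_t to avoid the signed overflow that integer promotion of uint16_t operands could otherwise cause. This models the instruction's arithmetic only, not the optab machinery.

```c
#include <stdint.h>

/* Scalar model of a 2-way unsigned dot product: each 32-bit accumulator
   lane J adds the products of the 16-bit elements 2*J and 2*J + 1.  */
static void
udot_2way (uint32_t *acc, const uint16_t *x, const uint16_t *y, int lanes)
{
  for (int j = 0; j < lanes; j++)
    acc[j] += (uint32_t) x[2 * j] * y[2 * j]
              + (uint32_t) x[2 * j + 1] * y[2 * j + 1];
}

/* A sum of squares in the style of the example loop, using four 32-bit
   lanes.  N must be a multiple of 8.  */
static uint32_t
sum_squares_2way (const uint16_t *data, int n)
{
  uint32_t acc[4] = {0, 0, 0, 0};
  for (int i = 0; i < n; i += 8)
    udot_2way (acc, data + i, data + i, 4);
  return acc[0] + acc[1] + acc[2] + acc[3];  /* final reduction */
}
```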

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
Rename to `dot_prod_twoway_vnx8hi'.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
Update icodes used in line with above rename.
* optabs-tree.cc (optab_for_tree_code_1): Renamed from
`optab_for_tree_code' and added new argument.
(optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
* optabs-tree.h (optab_for_tree_code_1): New.
* optabs.cc (expand_widen_pattern_expr): Expand support for
DOT_PROD_EXPR patterns.
* optabs.def (udot_prod_twoway_optab): New.
(sdot_prod_twoway_optab): Likewise.
* tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
support for misc optabs that use two modes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
 gcc/config/aarch64/aarch64-sve2.md|  2 +-
 gcc/optabs-tree.cc| 23 --
 gcc/optabs-tree.h |  2 ++
 gcc/optabs.cc |  2 +-
 gcc/optabs.def|  2 ++
 .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
 gcc/tree-vect-patterns.cc |  2 +-
 8 files changed, 54 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..e457db09f66 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -764,8 +764,8 @@ public:
   icode = (e.type_suffix (0).float_p
   ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
   : e.type_suffix (0).unsigned_p
-  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
-  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
+  ? CODE_FOR_udot_prod_twoway_vnx8hi
+  : CODE_FOR_sdot_prod_twoway_vnx8hi);
 return e.use_unpred_insn (icode);
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 934e57055d3..5677de7108d 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2021,7 +2021,7 @@ (define_insn "@aarch64_sve_qsub__lane_"
 )
 
 ;; Two-way dot-product.
-(define_insn "@aarch64_sve_dotvnx4sivnx8hi"
+(define_insn "dot_prod_twoway_vnx8hi"
   [(set (match_operand:VNx4SI 0 "register_operand")
(plus:VNx4SI
  (unspec:VNx4SI
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index b69a5bc3676..e3c5a618ea2 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
cannot give complete results for multiplication or division) but probably
ought to be relied on more widely throughout the expander.  */
 optab
-optab_for_tree_code (enum tree_code code, const_tree type,
-enum optab_subtype subtype)
+optab_for_tree_code_1 (enum tree_code code, const_tree type,
+  const_tree otype, enum optab_subtype subtype)
 {
   bool trapv;
   switch (code)
@@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 
 case DOT_PROD_EXPR:
   {
+   if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
+ == TYPE_PRECISION (TREE_TYPE (otype
+ {
+   if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
+ return udot_prod_twoway_optab;
+   if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
+ return sdot_prod_twoway_optab;
+ }
if (subtype == optab_vector_mi

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 4:40 PM Victor Do Nascimento <
victor.donascime...@arm.com> wrote:

> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype: either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct optab approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `otype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>
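
As a reference for the semantics under discussion, the scalar C below models what one 2-way unsigned dot product does to each 32-bit accumulator lane: two 16-bit products are summed and added to the lane, following the Arm descriptions cited as [1] in this message. `udot_2way` and `foo_ref` are hypothetical names; `foo_ref` is the loop from the patch description with the multiplication done in `uint32_t`, because `data[i] * data[i]` on `uint16_t` is promoted to `int` and can overflow for large inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 2-way unsigned dot product: each 32-bit lane ACC[i]
   gains A[2i]*B[2i] + A[2i+1]*B[2i+1].  N16 counts 16-bit elements and
   must be even.  Arithmetic wraps modulo 2^32 like the hardware lane.  */
static void
udot_2way (uint32_t *acc, const uint16_t *a, const uint16_t *b, size_t n16)
{
  assert (n16 % 2 == 0);
  for (size_t i = 0; i < n16 / 2; i++)
    acc[i] += (uint32_t) a[2 * i] * b[2 * i]
              + (uint32_t) a[2 * i + 1] * b[2 * i + 1];
}

/* The reduction from the patch description, multiplying in uint32_t to
   avoid signed int overflow after integer promotion.  */
static uint32_t
foo_ref (int n, const uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}
```

Summing the lanes produced by `udot_2way` gives the same value as `foo_ref`, which is roughly the property the vectorizer relies on when it reduces the accumulator lanes after the loop.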

Since you are adding an optab, please update the internals manual with the
documentation of the optab (the standard pattern names section).

Thanks,
Andrew


> [1]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>: e.type_suffix (0).unsigned_p
> -  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +  ? CODE_FOR_udot_prod_twoway_vnx8hi
> +  : CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-16 Thread Segher Boessenkool
Hi!

On Thu, May 16, 2024 at 02:56:49PM +0800, Jiufu Guo wrote:
> Jiufu Guo  writes:
> > Segher Boessenkool  writes:
> >> On Tue, May 14, 2024 at 05:53:56PM +0800, Jiufu Guo wrote:
> >>> Thanks so much for your great review!
> >>> Reference other messages, I'm wondering "invalid %%a value" may be
> >>> acceptable, or "invalid %%a address expression in TOC" maybe better.
> >>
> >> "%%a requires a memory operand"?  Maybe even print out the actual
> >> operand given, too.
> >
> > Thanks! I updated the code using:
> > "%%a requires a memory reference operand", since the actual operand
> > is treated as the address.
> 
> I am unsure about one thing here: whether "%%a requires memory" is more
> accurate than "%%a requires a memory reference".
> 
> Reference the words from doc:
> https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Generic-Operand-Modifiers
> a: Substitute a memory reference, with the actual operand treated as the
> address.
> 
> And for below code:
> '("#%a0" : :"m"(x))' is not accepted.

Yeah, it always confuses me.  Sorry.  The operand is the actual address.

> While '("#%a0" : :"r"(&x))' is ok.
> 
> So it may be more accurate to say that "%%a" requires a memory
> address.

That sounds good yes.


Segher


[PATCH] c++: paren aggr CTAD with base classes [PR115114]

2024-05-16 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk and perhaps 14?

-- >8 --

We're accidentally ignoring base classes during parenthesized aggregate
CTAD because the TYPE_FIELDS of a template type doesn't contain bases,
so we need to consider them separately.

PR c++/115114

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Consider base classes in the paren
init case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr15.C: New test.
---
 gcc/cp/pt.cc  |  7 ++
 .../g++.dg/cpp2a/class-deduction-aggr15.C | 23 +++
 2 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d83f530ac8d..54d74989903 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30202,6 +30202,13 @@ maybe_aggr_guide (tree tmpl, tree init, vec *args)
   else if (TREE_CODE (init) == TREE_LIST)
 {
   int len = list_length (init);
+  for (tree binfo : BINFO_BASE_BINFOS (TYPE_BINFO (template_type)))
+   {
+ if (!len)
+   break;
+ parms = tree_cons (NULL_TREE, BINFO_TYPE (binfo), parms);
+ --len;
+   }
   for (tree field = TYPE_FIELDS (template_type);
   len;
   --len, field = DECL_CHAIN (field))
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C
new file mode 100644
index 000..16dc0f52b64
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C
@@ -0,0 +1,23 @@
+// PR c++/115114
+// { dg-do compile { target c++20 } }
+
+struct X {} x;
+struct Y {} y;
+
+template
+struct A : T {
+  U m;
+};
+
+using ty1 = decltype(A{x, 42}); // OK
+using ty1 = decltype(A(x, 42)); // OK, used to fail
+using ty1 = A;
+
+template
+struct B : T, V {
+  U m = 42;
+};
+
+using ty2 = decltype(B{x, y}); // OK
+using ty2 = decltype(B(x, y)); // OK, used to fail
+using ty2 = B;
-- 
2.45.1.190.g19fe900cfc



[PATCH] attribs: Fix and refactor diag_attr_exclusions

2024-05-16 Thread Andrew Carlotti
The existing implementation of this function was convoluted, and had
multiple control flow errors that became apparent to me while reading
the code:

1. The initial early return only checked the properties of the first
exclusion in the list, when these properties could be different for
subsequent exclusions.

2. excl was not reset within the outer loop, so the inner loop body
would only execute during the first iteration of the outer loop.  This
effectively meant that the value of attrs[1] was ignored.

3. The function called itself recursively twice, with both last_decl and
TREE_TYPE (last_decl) as parameters. The second recursive call should
have been redundant, since attrs[1] = TREE_TYPE (last_decl) during the
first recursive call.

This patch eliminates the early return, and combines the checks with
those present within the inner loop.  It also fixes the inner loop
initialisation, and modifies the outer loop to iterate over nodes
instead of their attributes. This latter change allows the recursion to
be eliminated, by extending the new nodes array to include last_decl
(and its type) as well.
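
Bug 2 above is easy to miss, so here is a simplified C model of it. Everything here is hypothetical: `struct exclusion` stands in for `attribute_spec::exclusions`, and each attribute list is a NULL-terminated array of names rather than a TREE_LIST. The buggy version initialises the exclusion cursor once, outside the outer loop, as the original code did; the fixed version restarts it for every node, as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for attribute_spec::exclusions; the table ends
   with an entry whose name is NULL.  */
struct exclusion
{
  const char *name;
};

/* Bug 2: EXCL is set once, so once the first non-null list exhausts the
   table, the inner loop never runs for later lists.  */
static int
count_conflicts_buggy (const struct exclusion *spec_excl,
                       const char *const *lists[], size_t nlists)
{
  int found = 0;
  const struct exclusion *excl = spec_excl;
  for (size_t i = 0; i != nlists; ++i)
    {
      if (!lists[i])
        continue;
      for (; excl->name; ++excl)
        for (const char *const *a = lists[i]; *a; ++a)
          if (strcmp (*a, excl->name) == 0)
            ++found;
    }
  return found;
}

/* Fixed: restart the exclusion walk for every list.  */
static int
count_conflicts_fixed (const struct exclusion *spec_excl,
                       const char *const *lists[], size_t nlists)
{
  int found = 0;
  for (size_t i = 0; i != nlists; ++i)
    {
      if (!lists[i])
        continue;
      for (const struct exclusion *excl = spec_excl; excl->name; ++excl)
        for (const char *const *a = lists[i]; *a; ++a)
          if (strcmp (*a, excl->name) == 0)
            ++found;
    }
  return found;
}
```

With the conflicting attribute only on the second node (for example the type rather than the decl), the buggy version reports nothing while the fixed one finds it.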

This patch provides an alternative fix for PR114634, although I wasn't
aware of that issue until rebasing on top of Jakub's fix.

I am not aware of any other compiler bugs resulting from these issues.
However, if the exclusions for target_clones were listed in the opposite
order, then it would have broken detection of the always_inline
exclusion on aarch64 (where TARGET_HAS_FMV_TARGET_ATTRIBUTE is false).

Is this ok for master?

gcc/ChangeLog:

* attribs.cc (diag_attr_exclusions): Fix and refactor.


diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 3ab0b0fd87a4404a593b2de365ea5226e31fe24a..431dd4255e68e92dd8d10bbb21ea079e50811faa 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -433,84 +433,69 @@ get_attribute_namespace (const_tree attr)
or a TYPE.  */
 
 static bool
-diag_attr_exclusions (tree last_decl, tree node, tree attrname,
+diag_attr_exclusions (tree last_decl, tree base_node, tree attrname,
  const attribute_spec *spec)
 {
-  const attribute_spec::exclusions *excl = spec->exclude;
 
-  tree_code code = TREE_CODE (node);
+  /* BASE_NODE is either the current decl to which the attribute is being
+ applied, or its type.  For the former, consider the attributes on both the
+ decl and its type.  Check both LAST_DECL and its type as well.  */
 
-  if ((code == FUNCTION_DECL && !excl->function
-   && (!excl->type || !spec->affects_type_identity))
-  || (code == VAR_DECL && !excl->variable
- && (!excl->type || !spec->affects_type_identity))
-  || (((code == TYPE_DECL || RECORD_OR_UNION_TYPE_P (node)) && !excl->type)))
-return false;
+  tree nodes[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
 
-  /* True if an attribute that's mutually exclusive with ATTRNAME
- has been found.  */
-  bool found = false;
+  nodes[0] = base_node;
+  if (DECL_P (base_node))
+    nodes[1] = TREE_TYPE (base_node);
 
-  if (last_decl && last_decl != node && TREE_TYPE (last_decl) != node)
+  if (last_decl)
 {
-  /* Check both the last DECL and its type for conflicts with
-the attribute being added to the current decl or type.  */
-  found |= diag_attr_exclusions (last_decl, last_decl, attrname, spec);
-  tree decl_type = TREE_TYPE (last_decl);
-  found |= diag_attr_exclusions (last_decl, decl_type, attrname, spec);
+  nodes[2] = last_decl;
+  if (DECL_P (last_decl))
+ nodes[3] = TREE_TYPE (last_decl);
 }
 
-  /* NODE is either the current DECL to which the attribute is being
- applied or its TYPE.  For the former, consider the attributes on
- both the DECL and its type.  */
-  tree attrs[2];
-
-  if (DECL_P (node))
-{
-  attrs[0] = DECL_ATTRIBUTES (node);
-  if (TREE_TYPE (node))
-   attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
-  else
-   /* TREE_TYPE can be NULL e.g. while processing attributes on
-  enumerators.  */
-   attrs[1] = NULL_TREE;
-}
-  else
-{
-  attrs[0] = TYPE_ATTRIBUTES (node);
-  attrs[1] = NULL_TREE;
-}
+  /* True if an attribute that's mutually exclusive with ATTRNAME
+ has been found.  */
+  bool found = false;
 
   /* Iterate over the mutually exclusive attribute names and verify
  that the symbol doesn't contain it.  */
-  for (unsigned i = 0; i != ARRAY_SIZE (attrs); ++i)
+  for (unsigned i = 0; i != ARRAY_SIZE (nodes); ++i)
 {
-  if (!attrs[i])
+  tree node = nodes[i];
+
+  if (!node)
continue;
 
-  for ( ; excl->name; ++excl)
+  tree attr;
+  if (DECL_P (node))
+   attr = DECL_ATTRIBUTES (node);
+  else
+   attr = TYPE_ATTRIBUTES (node);
+
+  tree_code code = TREE_CODE (node);
+
+  for (auto excl = spec->exclude; excl->name; ++excl)
{
  /* Avoid checking the attribute against itself.  */
  if (is_attribute_p (excl->name, attrna

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-16 Thread Alex Coplan
Hi Ajit,

Thanks a lot for working through the review feedback.

The patch LGTM with the two minor suggested changes below.  I can't
approve the patch, though, so you'll need an OK from Richard S.

Also, I'm not sure if it makes sense to apply the patch in isolation, it
might make more sense to only apply it in series with follow-up patches to:
 - Finish renaming any bits of the generic code that need renaming (I
   guess we'll want to rename at least ldp_bb_info to something else,
   probably there are other bits too).
 - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
   middle-end.

I'll let Richard S make the final judgement on that.  I don't really
mind either way.

On 15/05/2024 15:06, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the generic code with pure virtual functions
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the generic code with pure virtual functions
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-05-15  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
>  1 file changed, 357 insertions(+), 176 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..429e532ea3b 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,225 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// When querying handle_writeback_opportunities, this enum is used to
> +// qualify which opportunities we are asking about.
> +enum class writeback {
> +  // Only those writeback opportunities that arise from existing
> +  // auto-increment accesses.
> +  EXISTING,

Very minor nit: I think an extra blank line here would be nice for readability
now that the enumerators have comments above.

> +  // All writeback opportunities including those that involve folding
> +  // base register updates into a non-writeback pair.
> +  ALL
> +};
> +

Can we have a block comment here which describes the purpose of the
class and how it fits together with the target?  Something like the
following would do:

// This class can be overridden by targets to give a pass that fuses
// adjacent loads and stores into load/store pair instructions.
//
// The target can override the various virtual functions to customize
// the behaviour of the pass as appropriate for the target.

> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };
> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +
> +  // Return true if we should consider forming pairs from memory
> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mode) = 0;
> +
> +  // Return alias check limit.
> +  // This is needed to avoi

Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread Segher Boessenkool
Hi!

On Fri, Apr 12, 2024 at 04:24:23PM +0800, HAO CHEN GUI wrote:
>   This patch implements optab_isnormal for SF/DF/TFmode using the rs6000
> test data class instructions.
> 
>   This patch relies on former patch which adds optab_isnormal.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isnormal2): New expand for SFmode and
>   DFmode.

* config/rs6000/vsx.md (isnormal2 for SFDF): New expand.
(isnormal2 for IEEE128): New expand.

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5357,6 +5357,30 @@ (define_expand "isfinite2"
>DONE;
>  })
> 
> +(define_expand "isnormal2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> + (use (match_operand:SFDF 1 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +   && TARGET_P9_VECTOR"

Please put the condition on just one line if it is as simple and short
as this.

Why is TARGET_P9_VECTOR the correct condition?

> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];

This is an expander.  can_create_pseudo_p always returns true.  Please
simplify the code, keeping that in mind :-)

> +(define_expand "isnormal2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> + (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +   && TARGET_P9_VECTOR"
> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> +  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
> +  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
> +  DONE;
> +})

Same issues here, of course.
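
To make the intent of both expanders concrete, here is a C model of the sequence they emit: a test data class with mask 0x7f, which selects every class except "normal", followed by an XOR of the 0/1 result with 1. The individual bit values in the enum are my assumption about the DCMX layout for xststdc[sd]p/xststdcqp and should be checked against the Power ISA; the conclusion only relies on 0x7f covering all seven non-normal classes. `test_data_class` and `isnormal_via_dc` are hypothetical helpers.

```c
#include <assert.h>
#include <math.h>

/* Assumed DCMX bit assignments; 0x7f is their union.  */
enum
{
  DC_NAN        = 0x40,
  DC_POS_INF    = 0x20,
  DC_NEG_INF    = 0x10,
  DC_POS_ZERO   = 0x08,
  DC_NEG_ZERO   = 0x04,
  DC_POS_DENORM = 0x02,
  DC_NEG_DENORM = 0x01
};

/* Software model of "test data class": 1 if X is in any class selected
   by MASK, else 0.  The mask field is 7 bits wide.  */
static int
test_data_class (double x, unsigned mask)
{
  unsigned cls;
  assert (mask <= 0x7f);
  switch (fpclassify (x))
    {
    case FP_NAN:       cls = DC_NAN; break;
    case FP_INFINITE:  cls = signbit (x) ? DC_NEG_INF : DC_POS_INF; break;
    case FP_ZERO:      cls = signbit (x) ? DC_NEG_ZERO : DC_POS_ZERO; break;
    case FP_SUBNORMAL: cls = signbit (x) ? DC_NEG_DENORM : DC_POS_DENORM;
                       break;
    default:           cls = 0; break;   /* FP_NORMAL.  */
    }
  return (cls & mask) != 0;
}

/* What the proposed expanders compute: "not any non-normal class".  */
static int
isnormal_via_dc (double x)
{
  return test_data_class (x, 0x7f) ^ 1;
}
```

The XOR is what turns "is some non-normal class" into the boolean isnormal result, so no separate compare or branch is needed.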

> +

Why add random white lines?  Please don't.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */

If you use a -mcpu=, don't use vsx_ok.

If you use a -mcpu=, don't use -mvsx.

> +int test1 (double x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */

Just \mfcmp please (so that it also catches fcmpo, if we ever generate
that).

> +/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */

Maybe you should test for one each of the s and d version?  So just
/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target lp64 } } */

Why run this on 64-bit systems only?  If there is a reason, document
that here (but is there a reason?)

> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */

Same comments here: If you have a -mcpu you do not want vsx_ok or -mvsx.

Please fix these things and resend.  Thanks!


Segher


RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Tamar Christina
Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Thursday, May 16, 2024 3:39 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Victor Do Nascimento
> 
> Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> 
> From: Victor Do Nascimento 
> 
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype: either
> that of the operands or of the accumulated value, never both.
> 
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct optab approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
> 
> This means that the following example fails autovectorization:
> 
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
> 
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
> 
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `otype' argument to `NULL_TREE'.
> 
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
> 
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
> 
> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
>   renamed to `dot_prod_twoway_vnx8hi'.
>   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>   update icodes used in line with above rename.

Please split the target specific bits from the target agnostic parts.
I.e. this patch series should be split in two.

>   * optabs-tree.cc (optab_for_tree_code_1): Renamed
>   `optab_for_tree_code' and added new argument.
>   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>   * optabs-tree.h (optab_for_tree_code_1): New.
>   * optabs.cc (expand_widen_pattern_expr): Expand support for
>   DOT_PROD_EXPR patterns.
>   * optabs.def (udot_prod_twoway_optab): New.
>   (sdot_prod_twoway_optab): Likewise.
>   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>   support for misc optabs that use two modes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>  : e.type_suffix (0).unsigned_p
> -? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -: CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +? CODE_FOR_udot_prod_twoway_vnx8hi
> +: CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarc

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 7:46 PM Tamar Christina wrote:

> Hi Victor,
>
> > -Original Message-
> > From: Victor Do Nascimento 
> > Sent: Thursday, May 16, 2024 3:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Richard Earnshaw
> > ; Victor Do Nascimento
> > 
> > Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> >
> > From: Victor Do Nascimento 
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype: either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct optab approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i++) {
> >     sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> >
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing interface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `otype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_dotvnx4sivnx8hi):
> >   renamed to `dot_prod_twoway_vnx8hi'.
> >   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> >   update icodes used in line with above rename.
>
> Please split the target specific bits from the target agnostic parts.
> I.e. this patch series should be split in two.
>
> >   * optabs-tree.cc (optab_for_tree_code_1): Renamed
> >   `optab_for_tree_code' and added new argument.
> >   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> >   * optabs-tree.h (optab_for_tree_code_1): New.
> >   * optabs.cc (expand_widen_pattern_expr): Expand support for
> >   DOT_PROD_EXPR patterns.
> >   * optabs.def (udot_prod_twoway_optab): New.
> >   (sdot_prod_twoway_optab): Likewise.
> >   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> >   support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
> >  gcc/config/aarch64/aarch64-sve2.md|  2 +-
> >  gcc/optabs-tree.cc| 23 --
> >  gcc/optabs-tree.h |  2 ++
> >  gcc/optabs.cc |  2 +-
> >  gcc/optabs.def|  2 ++
> >  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
> >  gcc/tree-vect-patterns.cc |  2 +-
> >  8 files changed, 54 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> >icode = (e.type_suffix (0).float_p
> >  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> >  : e.type_suffix (0).unsigned_p
> > -? CODE_FOR_aarch64_sve_udotvnx4sivnx8h
