Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

2024-06-21 Thread Richard Biener
On Fri, Jun 21, 2024 at 5:53 AM  wrote:
>
> From: Pan Li 
>
> The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but its
> result is truncated, as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
>     a = *--p;
>     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate the result of SAT_SUB
>   } while (--n);
> }
>
> After the ifcvt pass it has the gimple below, which cannot hit any
> SAT_SUB pattern and therefore cannot be vectorized to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch reconciles the above pattern so that it matches the SAT_SUB
> pattern.  The underlying vect pass is then able to vectorize the
> SAT_SUB.

Hmm.  I was thinking of allowing

/* Unsigned saturation sub, case 2 (branch with ge):
   SAT_U_SUB = X >= Y ? X - Y : 0.  */
(match (unsigned_integer_sat_sub @0 @1)
 (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1))))

to match this by changing it to

/* Unsigned saturation sub, case 2 (branch with ge):
   SAT_U_SUB = X >= Y ? X - Y : 0.  */
(match (unsigned_integer_sat_sub @0 @1)
 (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1))))

and when using the gimple_match_* function make sure to consider
that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
we matched?
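
For reference, a minimal scalar sketch of why matching through the
conversion looks safe in the truncating case.  The function names are
hypothetical (they are not GCC APIs) and the sketch assumes a 32-bit
unsigned operand type narrowed to uint16_t, as in the test above.  It
only compares the shape ifcvt produces with "saturating subtract in the
operand type, then convert to the type of the matched name".

```cpp
#include <cassert>
#include <cstdint>

// Shape produced by ifcvt: the narrowing conversion sits inside the cond.
//   iftmp = (uint16_t) (a - b);  result = a >= b ? iftmp : 0;
uint16_t truncated_inside_cond (unsigned a, unsigned b)
{
  uint16_t narrowed = static_cast<uint16_t> (a - b);
  return a >= b ? narrowed : uint16_t (0);
}

// Plain unsigned saturating subtraction in the operand type.
unsigned sat_sub_u32 (unsigned a, unsigned b)
{
  return a >= b ? a - b : 0u;
}

// Saturating subtraction first, conversion to the narrow type afterwards.
uint16_t convert_after_sat_sub (unsigned a, unsigned b)
{
  return static_cast<uint16_t> (sat_sub_u32 (a, b));
}

// Returns true when both shapes agree on a handful of edge cases.
bool shapes_agree ()
{
  const unsigned samples[]
    = { 0u, 1u, 0xFFFFu, 0x10000u, 0x12345u, 0xFFFFFFFFu };
  for (unsigned a : samples)
    for (unsigned b : samples)
      if (truncated_inside_cond (a, b) != convert_after_sat_sub (a, b))
        return false;
  return true;
}
```

A caller could check this with assert (shapes_agree ()).  The sample set
is small and only illustrative; it is not a proof.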

Richard.

> _2 = a_11 - b_12(D);
> _18 = a_11 >= b_12(D);
> _pattmp = _18 ? _2 : 0; // .SAT_SUB pattern
> iftmp.0_13 = (short unsigned int) _pattmp;
> iftmp.0_5 = iftmp.0_13;
>
> The tests below were run for this patch.
> 1. The rv64gcv full regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 full regression tests.
>
> gcc/ChangeLog:
>
> * match.pd: Add new match for truncated unsigned sat_sub.
> * tree-if-conv.cc (gimple_truncated_unsigned_integer_sat_sub):
> New external decl from match.pd.
> (tree_if_cond_reconcile_unsigned_integer_sat_sub): New func impl
> to reconcile the truncated sat_sub pattern.
> (tree_if_cond_reconcile): New func impl to reconcile.
> (pass_if_conversion::execute): Try to reconcile after ifcvt.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd|  9 +
>  gcc/tree-if-conv.cc | 83 +
>  2 files changed, 92 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..9617a5f9d5e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3210,6 +3210,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>        && types_match (type, @0, @1))))
>
> +/* Unsigned saturation sub and then truncated, aka:
> +   Truncated = X >= Y ? (Other Type) (X - Y) : 0.
> + */
> +(match (truncated_unsigned_integer_sat_sub @0 @1)
> + (cond (ge @0 @1) (convert (minus @0 @1)) integer_zerop)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (@0, @1)
> +  && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (@0))))))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 57992b6deca..535743130f2 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -3738,6 +3738,87 @@ bitfields_to_lower_p (class loop *loop,
>return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
>  }
>
> +extern bool gimple_truncated_unsigned_integer_sat_sub (tree, tree*,
> +  tree (*)(tree));
> +
> +/*
> + * Try to reconcile the stmt pattern as below to match the SAT_SUB
> + * in vectorization.  If and only if the related internal_fn has
> + * been implemented already.
> + *
> + * The reconcile will insert one new stmt named 'a' in below example,
> + * replace the stmt '4' by new added stmt 'b' as well.  Then the stmt
> + * pattern is able to hit the SAT_SUB pattern in the underlying pass.
> + *
> + * 1. _2 = a_11 - b_12(D);
> + * 2. iftmp.0_13 = (short unsigned int) _2;
> + * 3. _18 = a_11 >= b_12(D);
> + * 4. iftmp.0_5 = _18 ? iftmp.0_13 : 0;
> + * ==>
> + * 1. _2 = a_11 - b_12(D);
> + * 3. _18 = a_11 >= b_12(D);
> + * a. _pattmp = _18 ? _2 : 0; // New insertion
> + * 2. iftmp.0_13 = (short unsigned int) _pattmp; // Move before
> + * b. iftmp.0_5 = iftmp.0_13;
> + *== Replace ==> 4. iftmp.0_5 = _18 ? iftmp.0_13 : 0;
> + */
> +static void
> +tree_if_cond_reconcile_unsigned_integer_sat_sub (gimple_stmt_iterator *gsi,
> +gassign *stmt)
> +{
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +  bool supported_p = direct_internal_fn_supported_p (IFN_SAT_SUB,
> +  

Re: [PATCH v3 6/6] aarch64: Add DLL import/export to AArch64 target

2024-06-21 Thread Evgeny Karpov
Monday, June 10, 2024 7:03 PM
Richard Sandiford  wrote:

> Thanks for the update.  Parts 1-5 look good to me.  Some minor comments
> below about part 6:
> 
> If the TARGET_DLLIMPORT_DECL_ATTRIBUTES condition can be dropped, the
> series is OK from my POV with that change and with the changes above.
> Please get sign-off from an x86 maintainer too though.

Thank you for the review and suggestions. Here is the updated version of patch 
6, based on the comments.
The x86 and mingw maintainers have already approved the series.

Regards,
Evgeny 



This patch reuses the MinGW implementation to enable DLL import/export
functionality for the aarch64-w64-mingw32 target. It also modifies
environment configurations for MinGW.

gcc/ChangeLog:

* config.gcc: Add winnt-dll.o, which contains the DLL
import/export implementation.
* config/aarch64/aarch64.cc (aarch64_legitimize_pe_coff_symbol):
Add a conditional function that reuses the MinGW implementation
for COFF and does nothing otherwise.
(aarch64_expand_call): Add dllimport implementation.
(aarch64_legitimize_address): Likewise.
* config/aarch64/cygming.h (SYMBOL_FLAG_DLLIMPORT): Modify MinGW
environment to support DLL import/export.
(SYMBOL_FLAG_DLLEXPORT): Likewise.
(SYMBOL_REF_DLLIMPORT_P): Likewise.
(SYMBOL_FLAG_STUBVAR): Likewise.
(SYMBOL_REF_STUBVAR_P): Likewise.
(TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
(TARGET_ASM_FILE_END): Likewise.
(SUB_TARGET_RECORD_STUB): Likewise.
(GOT_ALIAS_SET): Likewise.
(PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Likewise.
(HAVE_64BIT_POINTERS): Likewise.
---
 gcc/config.gcc|  4 +++-
 gcc/config/aarch64/aarch64.cc | 26 ++
 gcc/config/aarch64/cygming.h  | 26 --
 3 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d053b98efa8..331285b7b6d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1276,10 +1276,12 @@ aarch64-*-mingw*)
tm_file="${tm_file} mingw/mingw32.h"
tm_file="${tm_file} mingw/mingw-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
tmake_file="${tmake_file} aarch64/t-aarch64"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
-   extra_objs="${extra_objs} winnt.o"
+   extra_objs="${extra_objs} winnt.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
d_target_objs="${d_target_objs} winnt-d.o"
tmake_file="${tmake_file} mingw/t-cygming"
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3418e57218f..32e31e08449 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -860,6 +860,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
   { "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
   { "SVE type",  3, 3, false, true,  false, true,  NULL, NULL },
   { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
+#if TARGET_DLLIMPORT_DECL_ATTRIBUTES
+  { "dllimport", 0, 0, false, false, false, false, handle_dll_attribute, NULL },
+  { "dllexport", 0, 0, false, false, false, false, handle_dll_attribute, NULL },
+#endif
 #ifdef SUBTARGET_ATTRIBUTE_TABLE
   SUBTARGET_ATTRIBUTE_TABLE
 #endif
@@ -2865,6 +2869,15 @@ static void
 aarch64_load_symref_appropriately (rtx dest, rtx imm,
   enum aarch64_symbol_type type)
 {
+#if TARGET_PECOFF
+  rtx tmp = legitimize_pe_coff_symbol (imm, true);
+  if (tmp)
+{
+  emit_insn (gen_rtx_SET (dest, tmp));
+  return;
+}
+#endif
+
   switch (type)
 {
 case SYMBOL_SMALL_ABSOLUTE:
@@ -11233,6 +11246,13 @@ aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall)
 
   gcc_assert (MEM_P (mem));
   callee = XEXP (mem, 0);
+
+#if TARGET_PECOFF
+  tmp = legitimize_pe_coff_symbol (callee, false);
+  if (tmp)
+callee = tmp;
+#endif
+
   mode = GET_MODE (callee);
   gcc_assert (mode == Pmode);
 
@@ -12709,6 +12729,12 @@ aarch64_anchor_offset (HOST_WIDE_INT offset, HOST_WIDE_INT size,
 static rtx
 aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
 {
+#if TARGET_PECOFF
+  rtx tmp = legitimize_pe_coff_symbol (x, true);
+  if (tmp)
+return tmp;
+#endif
+
   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
  where mask is selected by alignment and size of the offset.
  We try to pick as large a range for the offset as possible to
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 76623153080..e26488735db 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -28,12 +28,18 

[PATCH] MATCH: Simplify (vec CMP vec) eq/ne (vec CMP vec) [PR111150]

2024-06-21 Thread Eikansh Gupta
We can optimize (vec_cond eq/ne vec_cond) when vec_cond is a
result of (vec CMP vec). The optimization is because of the
observation that in vec_cond, (-1 != 0) is true. So, we can
generate vec_cond of xor of vec resulting in a single
VEC_COND_EXPR instead of 3.

The patch adds match pattern for vec a, b:
(a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0
(a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0
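
As a sanity check on the two identities, here is a per-lane scalar
sketch.  It models one lane of a vector mask as a bool and one lane of
a comparison result as all-ones or zero; the helper names are made up
for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// One lane of a mask-producing vector comparison: all ones or zero.
int32_t lane (bool cond) { return cond ? -1 : 0; }

// Left-hand sides: compare two vec_cond results lane by lane.
int32_t ne_original (bool a, bool b) { return lane (lane (a) != lane (b)); }
int32_t eq_original (bool a, bool b) { return lane (lane (a) == lane (b)); }

// Right-hand sides: a single select on the xor of the two masks.
int32_t ne_simplified (bool a, bool b) { return lane (a ^ b); }
int32_t eq_simplified (bool a, bool b) { return lane (!(a ^ b)); }

// Returns true when both identities hold for all four mask combinations.
bool identities_hold ()
{
  const bool values[] = { false, true };
  for (bool a : values)
    for (bool b : values)
      if (ne_original (a, b) != ne_simplified (a, b)
          || eq_original (a, b) != eq_simplified (a, b))
        return false;
  return true;
}
```

Since a lane has only two possible mask values, the four combinations
cover every case for the -1/0 form of the pattern.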

PR tree-optimization/111150

gcc/ChangeLog:

* match.pd: Optimization for above mentioned pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr111150.c: New test.

Signed-off-by: Eikansh Gupta 
---
 gcc/match.pd                             | 18 ++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c | 19 +++++++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 3d0689c9312..5cb78bd7ff9 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5522,6 +5522,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
 #endif
 
+/* (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0 */
+/* (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0 */
+(for eqne (eq ne)
+ (simplify
+  (eqne:c (vec_cond @0 uniform_integer_cst_p@2 uniform_integer_cst_p@3)
+ (vec_cond @1 @2 @3))
+  (with
+   {
+ tree newop1 = @2;
+ tree newop2 = @3;
+ if (eqne == NE_EXPR)
+   std::swap (newop1, newop2);
+   }
+   (if (integer_all_onesp (@2) && integer_zerop (@3))
+(vec_cond (bit_xor @0 @1) {newop1;} {newop2;})
+(if (integer_all_onesp (@3) && integer_zerop (@2))
+     (vec_cond (bit_xor @0 @1) {newop2;} {newop1;}))))))
+
 /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
types are compatible.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
new file mode 100644
index 00000000000..d10564fd722
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
@@ -0,0 +1,19 @@
+/* PR tree-optimization/111150 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1" } */
+
+typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
+
+v4si f1_(v4si a, v4si b, v4si c, v4si d) {
+  v4si X = a == b;
+  v4si Y = c == d;
+  return (X != Y);
+}
+
+v4si f2_(v4si a, v4si b, v4si c, v4si d) {
+  v4si X = a == b;
+  v4si Y = c == d;
+  return (X == Y);
+}
+
+/* { dg-final { scan-tree-dump-times " VEC_COND_EXPR " 2 "forwprop1" } } */
-- 
2.17.1



[PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]

2024-06-21 Thread Alexandre Oliva
On Jun 20, 2024, Christophe Lyon  wrote:

> Maybe using
> if ((unsigned)b[i] >= BITS) \
> would be clearer?

Heh.  Why make it simpler if we can make it unreadable, right? :-D

Thanks, here's another version I've just retested on x-arm-eabi.  Ok?

I'm not sure how to credit your suggestion.  It's not like you pretty
much wrote the entire patch, as in Richard's case, but it's still a
sizable chunk of this two-liner.  Any preferences?


The test was too optimistic, alas.  We used to vectorize shifts
involving 8-bit and 16-bit integral types by clamping the shift count
at the highest in-range shift count, but that was not correct: such
narrow shifts expect integral promotion, so larger shift counts should
be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
before the fix).
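
To make the difference concrete, here is a small scalar sketch.  It is
an illustration only: it uses uint16_t rather than int16_t so that no
sign extension is involved, it assumes int is 32 bits wide, and the
function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scalar C semantics: x is promoted to int before the shift, so a shift
// count of 16 is in range and shifts every bit of the 16-bit value out.
int shift_with_promotion (uint16_t x, unsigned count)
{
  return x >> count;
}

// Model of the clamping behaviour described above: the shift count is
// capped at the highest in-range count for a 16-bit element.
int shift_with_clamped_count (uint16_t x, unsigned count)
{
  unsigned clamped = count > 15u ? 15u : count;
  return x >> clamped;
}
```

With x = 32768 and a count of 16, the promoted shift gives 0 while the
clamped shift gives 1; for counts below 16 the two functions agree.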

Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.

So the test that expected the incorrect clamping we no longer perform
needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization.


Co-Authored-By: Richard Earnshaw 

for  gcc/testsuite/ChangeLog

PR tree-optimization/113281
* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
---
 gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
index 8c7adef9ed8f1..03078de49c65e 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
@@ -9,6 +9,8 @@
   void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
     int i; \
     for (i=0; i<NB; i++) { \
+      if ((unsigned)b[i] >= (unsigned)(BITS))  \
+   __builtin_unreachable();\
   dest[i] = a[i] OP b[i];  \
 }  \
 }


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] MATCH: Simplify (vec CMP vec) eq/ne (vec CMP vec) [PR111150]

2024-06-21 Thread Richard Biener
On Fri, Jun 21, 2024 at 9:12 AM Eikansh Gupta  wrote:
>
> We can optimize (vec_cond eq/ne vec_cond) when vec_cond is a
> result of (vec CMP vec). The optimization is because of the
> observation that in vec_cond, (-1 != 0) is true. So, we can
> generate vec_cond of xor of vec resulting in a single
> VEC_COND_EXPR instead of 3.
>
> The patch adds match pattern for vec a, b:
> (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0
> (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0

Why should this only work for uniform -1 and 0 vectors?
It seems to me it's valid for arbitrary values, thus

 (a ? x : y) != (b ? x : y) -> a^b ? x : y
 (a ? x : y) == (b ? x : y) -> a^b ? y : x

no?

> PR tree-optimization/111150
>
> gcc/ChangeLog:
>
> * match.pd: Optimization for above mentioned pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr111150.c: New test.
>
> Signed-off-by: Eikansh Gupta 
> ---
>  gcc/match.pd                             | 18 ++++++++++++++++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr111150.c | 19 +++++++++++++++++++
>  2 files changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..5cb78bd7ff9 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5522,6 +5522,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
>  #endif
>
> +/* (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0 */
> +/* (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0 */
> +(for eqne (eq ne)
> + (simplify
> +  (eqne:c (vec_cond @0 uniform_integer_cst_p@2 uniform_integer_cst_p@3)
> + (vec_cond @1 @2 @3))
> +  (with
> +   {
> + tree newop1 = @2;
> + tree newop2 = @3;
> + if (eqne == NE_EXPR)
> +   std::swap (newop1, newop2);
> +   }
> +   (if (integer_all_onesp (@2) && integer_zerop (@3))
> +(vec_cond (bit_xor @0 @1) {newop1;} {newop2;})
> +(if (integer_all_onesp (@3) && integer_zerop (@2))
> +     (vec_cond (bit_xor @0 @1) {newop2;} {newop1;}))))))
> +
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> types are compatible.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> new file mode 100644
> index 00000000000..d10564fd722
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/111150 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-forwprop1" } */
> +
> +typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
> +
> +v4si f1_(v4si a, v4si b, v4si c, v4si d) {
> +  v4si X = a == b;
> +  v4si Y = c == d;
> +  return (X != Y);
> +}
> +
> +v4si f2_(v4si a, v4si b, v4si c, v4si d) {
> +  v4si X = a == b;
> +  v4si Y = c == d;
> +  return (X == Y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times " VEC_COND_EXPR " 2 "forwprop1" } } */
> --
> 2.17.1
>


Re: [PATCH 6/6] Add a late-combine pass [PR106594]

2024-06-21 Thread Richard Sandiford
Oleg Endo  writes:
> On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
>> 
>> I tried compiling at least one target per CPU directory and comparing
>> the assembly output for parts of the GCC testsuite.  This is just a way
>> of getting a flavour of how the pass performs; it obviously isn't a
>> meaningful benchmark.  All targets seemed to improve on average:
>> 
>> Target                Tests   Good   Bad    %Good    Delta  Median
>> ======                =====   ====   ===    =====    =====  ======
>> aarch64-linux-gnu      2215   1975   240   89.16%    -4159      -1
>> aarch64_be-linux-gnu   1569   1483    86   94.52%   -10117      -1
>> alpha-linux-gnu        1454   1370    84   94.22%    -9502      -1
>> amdgcn-amdhsa          5122   4671   451   91.19%   -35737      -1
>> arc-elf                2166   1932   234   89.20%   -37742      -1
>> arm-linux-gnueabi      1953   1661   292   85.05%   -12415      -1
>> arm-linux-gnueabihf    1834   1549   285   84.46%   -11137      -1
>> avr-elf                4789   4330   459   90.42%  -441276      -4
>> bfin-elf               2795   2394   401   85.65%   -19252      -1
>> bpf-elf                3122   2928   194   93.79%    -8785      -1
>> c6x-elf                2227   1929   298   86.62%   -17339      -1
>> cris-elf               3464   3270   194   94.40%   -23263      -2
>> csky-elf               2915   2591   324   88.89%   -22146      -1
>> epiphany-elf           2399   2304    95   96.04%   -28698      -2
>> fr30-elf               7712   7299   413   94.64%   -99830      -2
>> frv-linux-gnu          3332   2877   455   86.34%   -25108      -1
>> ft32-elf               2775   2667   108   96.11%   -25029      -1
>> h8300-elf              3176   2862   314   90.11%   -29305      -2
>> hppa64-hp-hpux11.23    4287   4247    40   99.07%   -45963      -2
>> ia64-linux-gnu         2343   1946   397   83.06%    -9907      -2
>> iq2000-elf             9684   9637    47   99.51%  -126557      -2
>> lm32-elf               2681   2608    73   97.28%   -59884      -3
>> loongarch64-linux-gnu  1303   1218    85   93.48%   -13375      -2
>> m32r-elf               1626   1517   109   93.30%    -9323      -2
>> m68k-linux-gnu         3022   2620   402   86.70%   -21531      -1
>> mcore-elf              2315   2085   230   90.06%   -24160      -1
>> microblaze-elf         2782   2585   197   92.92%   -16530      -1
>> mipsel-linux-gnu       1958   1827   131   93.31%   -15462      -1
>> mipsisa64-linux-gnu    1655   1488   167   89.91%   -16592      -2
>> mmix                   4914   4814   100   97.96%   -63021      -1
>> mn10300-elf            3639   3320   319   91.23%   -34752      -2
>> moxie-rtems            3497   3252   245   92.99%   -87305      -3
>> msp430-elf             4353   3876   477   89.04%   -23780      -1
>> nds32le-elf            3042   2780   262   91.39%   -27320      -1
>> nios2-linux-gnu        1683   1355   328   80.51%    -8065      -1
>> nvptx-none             2114   1781   333   84.25%   -12589      -2
>> or1k-elf               3045   2699   346   88.64%   -14328      -2
>> pdp11                  4515   4146   369   91.83%   -26047      -2
>> pru-elf                1585   1245   340   78.55%    -5225      -1
>> riscv32-elf            2122   2000   122   94.25%  -101162      -2
>> riscv64-elf            1841   1726   115   93.75%   -49997      -2
>> rl78-elf               2823   2530   293   89.62%   -40742      -4
>> rx-elf                 2614   2480   134   94.87%   -18863      -1
>> s390-linux-gnu         1591   1393   198   87.55%   -16696      -1
>> s390x-linux-gnu        2015   1879   136   93.25%   -21134      -1
>> sh-linux-gnu           1870   1507   363   80.59%    -9491      -1
>> sparc-linux-gnu        1123   1075    48   95.73%   -14503      -1
>> sparc-wrs-vxworks      1121   1073    48   95.72%   -14578      -1
>> sparc64-linux-gnu      1096   1021    75   93.16%   -15003      -1
>> v850-elf               1897   1728   169   91.09%   -11078      -1
>> vax-netbsdelf          3035   2995    40   98.68%   -27642      -1
>> visium-elf             1392   1106   286   79.45%    -7984      -2
>> xstormy16-elf          2577   2071   506   80.36%   -13061      -1
>> 
>> 
>
> Since you have already briefly compared some of the code, can you share
> those cases which get worse and might require some potential follow up
> patches?

I think a lot of them are unpredictable secondary effects, such as on
register allocation, tail merging potential, and so on.  For sh, it also
includes whether delay slots are filled with useful work, or whether
they get a nop.  (Instruction combination tends to create more complex
instructions, so there will be fewer 2-byte instructions to act as delay
slot candidates.)

Also, this kind of combination can decrease the number of instructions
but increase the constant pool size.  The figures take that into account.
(The comparison is a bit ad-hoc, though, since I wasn't dedicated enough
to try to b

Re: [PATCH 1/6] rtl-ssa: Rework _ignoring interfaces

2024-06-21 Thread Richard Sandiford
Alex Coplan  writes:
> Hi Richard,
>
> I had a quick look through the patch and noticed a couple of minor typos.
> Otherwise looks like a nice cleanup!

Thanks for the review!  I've fixed the typos in my local copy.

Richard
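
For readers skimming the quoted description below, here is a toy
sketch of the idea of passing an exclusion set as an object with
predicate member functions.  Only the names should_ignore_insn,
should_ignore_def and ignore_nothing come from the patch description;
the toy_* types, the signatures and the scan function are assumptions
made up for illustration and are not the rtl-ssa interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for instructions and definitions.
struct toy_insn { int uid; };
struct toy_def { int regno; const toy_insn *insn; };

// Exclusion set that ignores nothing (plays the role of ignore_nothing).
struct toy_ignore_nothing
{
  bool should_ignore_insn (const toy_insn *) const { return false; }
  bool should_ignore_def (const toy_def *) const { return false; }
};

// Exclusion set that skips one entire definition.
struct toy_ignore_one_def
{
  const toy_def *skipped;
  bool should_ignore_insn (const toy_insn *) const { return false; }
  bool should_ignore_def (const toy_def *def) const { return def == skipped; }
};

// Backward scan: return the last definition that the exclusion set does
// not ask us to ignore, or nullptr if every definition is ignored.
template<typename IgnorePredicates>
const toy_def *
last_unignored_def (const std::vector<toy_def> &defs, IgnorePredicates ignore)
{
  for (auto it = defs.rbegin (); it != defs.rend (); ++it)
    if (!ignore.should_ignore_def (&*it)
        && !ignore.should_ignore_insn (it->insn))
      return &*it;
  return nullptr;
}
```

The point of the sketch is only that one scan routine can serve both
"ignore this insn" and "ignore this whole definition" by asking the
object, instead of taking a single callback.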

> On 20/06/2024 14:34, Richard Sandiford wrote:
>> rtl-ssa has routines for scanning forwards or backwards for something
>> under the control of an exclusion set.  These searches are currently
>> used for two main things:
>> 
>> - to work out where an instruction can be moved within its EBB
>> - to work out whether recog can add a new hard register clobber
>> 
>> The exclusion set was originally a callback function that returned
>> true for insns that should be ignored.  However, for the late-combine
>> work, I'd also like to be able to skip an entire definition, along
>> with all its uses.
>> 
>> This patch prepares for that by turning the exclusion set into an
>> object that provides predicate member functions.  Currently the
>> only two member functions are:
>> 
>> - should_ignore_insn: what the old callback did
>> - should_ignore_def: the new functionality
>> 
>> but more could be added later.
>> 
>> Doing this also makes it easy to remove some assymmetry that I think
>
> s/assymmetry/asymmetry/
>
>> in hindsight was a mistake: in forward scans, ignoring an insn meant
>> ignoring all definitions in that insn (ok) and all uses of those
>> definitions (non-obvious).  The new interface makes it possible
>> to select the required behaviour, with that behaviour being applied
>> consistently in both directions.
>> 
>> Now that the exclusion set is a dedicated object, rather than
>> just a "random" function, I think it makes sense to remove the
>> _ignoring suffix from the function names.  The suffix was originally
>> there to describe the callback, and in particular to emphasise that
>> a true return meant "ignore" rather than "heed".
>> 
>> gcc/
>>  * rtl-ssa.h: Include predicates.h.
>>  * rtl-ssa/predicates.h: New file.
>>  * rtl-ssa/access-utils.h (prev_call_clobbers_ignoring): Rename to...
>>  (prev_call_clobbers): ...this and treat the ignore parameter as an
>>  object with the same interface as ignore_nothing.
>>  (next_call_clobbers_ignoring): Rename to...
>>  (next_call_clobbers): ...this and treat the ignore parameter as an
>>  object with the same interface as ignore_nothing.
>>  (first_nondebug_insn_use_ignoring): Rename to...
>>  (first_nondebug_insn_use): ...this and treat the ignore parameter as
>>  an object with the same interface as ignore_nothing.
>>  (last_nondebug_insn_use_ignoring): Rename to...
>>  (last_nondebug_insn_use): ...this and treat the ignore parameter as
>>  an object with the same interface as ignore_nothing.
>>  (last_access_ignoring): Rename to...
>>  (last_access): ...this and treat the ignore parameter as an object
>>  with the same interface as ignore_nothing.  Conditionally skip
>>  definitions.
>>  (prev_access_ignoring): Rename to...
>>  (prev_access): ...this and treat the ignore parameter as an object
>>  with the same interface as ignore_nothing.
>>  (first_def_ignoring): Replace with...
>>  (first_access): ...this new function.
>>  (next_access_ignoring): Rename to...
>>  (next_access): ...this and treat the ignore parameter as an object
>>  with the same interface as ignore_nothing.  Conditionally skip
>>  definitions.
>>  * rtl-ssa/change-utils.h (insn_is_changing): Delete.
>>  (restrict_movement_ignoring): Rename to...
>>  (restrict_movement): ...this and treat the ignore parameter as an
>>  object with the same interface as ignore_nothing.
>>  (recog_ignoring): Rename to...
>>  (recog): ...this and treat the ignore parameter as an object with
>>  the same interface as ignore_nothing.
>>  * rtl-ssa/changes.h (insn_is_changing_closure): Delete.
>>  * rtl-ssa/functions.h (function_info::add_regno_clobber): Treat
>>  the ignore parameter as an object with the same interface as
>>  ignore_nothing.
>>  * rtl-ssa/insn-utils.h (insn_is): Delete.
>>  * rtl-ssa/insns.h (insn_is_closure): Delete.
>>  * rtl-ssa/member-fns.inl
>>  (insn_is_changing_closure::insn_is_changing_closure): Delete.
>>  (insn_is_changing_closure::operator()): Likewise.
>>  (function_info::add_regno_clobber): Treat the ignore parameter
>>  as an object with the same interface as ignore_nothing.
>>  (ignore_changing_insns::ignore_changing_insns): New function.
>>  (ignore_changing_insns::should_ignore_insn): Likewise.
>>  * rtl-ssa/movement.h (restrict_movement_for_dead_range): Treat
>>  the ignore parameter as an object with the same interface as
>>  ignore_nothing.
>>  (restrict_movement_for_defs_ignoring): Rename to...
>>  (restrict_movement_for_defs): ...this and treat the ignore parameter
>>  as an object with the same interface as ignore_nothing.
>>  (restric

Re: [PATCH 6/6] Add a late-combine pass [PR106594]

2024-06-21 Thread Richard Sandiford
Richard Biener  writes:
> [...]
> I wonder if you can amend doc/passes.texi, specifically noting differences
> between fwprop, combine and late-combine?

Ooh, we have a doc/passes.texi? :)  Somehow missed that.

How about the patch below?

Thanks,
Richard


diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 5746d3ec636..4ac7a2306a1 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -991,6 +991,25 @@ RTL expressions for the instructions by substitution, simplifies the
 result using algebra, and then attempts to match the result against
 the machine description.  The code is located in @file{combine.cc}.
 
+@item Late instruction combination
+
+This pass attempts to do further instruction combination, on top of
+that performed by @file{combine.cc}.  Its current purpose is to
+substitute definitions into all uses simultaneously, so that the
+definition can be removed.  This differs from the forward propagation
+pass, whose purpose is instead to simplify individual uses on the
+assumption that the definition will remain.  It differs from
+@file{combine.cc} in that there is no hard-coded limit on the number
+of instructions that can be combined at once.  It also differs from
+@file{combine.cc} in that it can move instructions, where necessary.
+
+However, the pass is not in principle limited to this form of
+combination.  It is intended to be a home for other, future
+combination approaches as well.
+
+The pass runs twice, once before register allocation and once after
+register allocation.  The code is located in @file{late-combine.cc}.
+
 @item Mode switching optimization
 
 This pass looks for instructions that require the processor to be in a


Re: [wwwdocs] [PATCH 1/4] branch-closing: Fix various typos

2024-06-21 Thread Gerald Pfeifer
On Tue, 22 Mar 2022, Pokechu22 via Gcc-patches wrote:
> --- a/htdocs/branch-closing.html
> +++ b/htdocs/branch-closing.html
> @@ -54,7 +54,7 @@ is listed in "Known to work" or "Known to fail" as applicable.
>  If the bug is a regression that is not fixed on all subsequent
>  release branches and on trunk then it needs to remain open.  Remove
>  the version number of the branch being closed from the summary (for
> -example, change "[7/8 Regression]" to "[8 Regression]".  If the
> +example, change "[7/8 Regression]" to "[8 Regression]").  If the
>  milestone is not set, or is set to a version from the branch being
>  closed, set it to the version number of the next release from the next
>  oldest release branch.

Thank you for pointing this out; I fixed this now.

Gerald


[pushed] wwwdocs: news: Unify hsafoundation.com URLs

2024-06-21 Thread Gerald Pfeifer
---
 htdocs/news.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/news.html b/htdocs/news.html
index 4a6c2ab3..471b31b7 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -168,7 +168,7 @@
 
 BRIG/HSAIL (Heterogeneous Systems Architecture Intermediate 
Language) front end added
  [2017-02-01] wwwdocs:
- http://hsafoundation.com";> Heterogeneous Systems
+ https://hsafoundation.com";>Heterogeneous Systems
  Architecture 1.0 BRIG (HSAIL)
  front end was added to GCC,
  enabling HSAIL finalization for gcc-supported
@@ -207,7 +207,7 @@
 
  Heterogeneous Systems Architecture support
  [2016-01-27] wwwdocs:
- http://www.hsafoundation.com/";> Heterogeneous Systems
+ https://hsafoundation.com";>Heterogeneous Systems
  Architecture 1.0 https://gcc.gnu.org/gcc-6/changes.html#hsa";>
  support was added to GCC, contributed by Martin Jambor, Martin Liška
  and Michael Matz from SUSE.
-- 
2.45.2


RE: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

2024-06-21 Thread Li, Pan2
Thanks Richard for comments.

> to match this by changing it to

> /* Unsigned saturation sub, case 2 (branch with ge):
>SAT_U_SUB = X >= Y ? X - Y : 0.  */
> (match (unsigned_integer_sat_sub @0 @1)
> (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
>  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>   && types_match (type, @0, @1))))

Do we need another name for this match?  Adding (convert? here may change the
semantics of .SAT_SUB.  When we call
gimple_unsigned_integer_sat_sub (lhs, ops, NULL), the converted value that is
returned may differ from (minus @0 @1).  Please correct me if my understanding
is wrong.
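
One way to read this concern, sketched in scalar code: once the result
type differs from the operand type, the matched value is a conversion
of the wide .SAT_SUB, and that is not the same as a saturating
subtraction done in the narrow type.  The function names below are
hypothetical and the sketch assumes 32-bit operands and a uint16_t
result.

```cpp
#include <cassert>
#include <cstdint>

// Saturating subtraction in the wide operand type.
unsigned sat_sub_u32 (unsigned a, unsigned b)
{
  return a >= b ? a - b : 0u;
}

// Saturating subtraction in the narrow result type.
uint16_t sat_sub_u16 (uint16_t a, uint16_t b)
{
  return a >= b ? static_cast<uint16_t> (a - b) : uint16_t (0);
}

// What the source loop computes: wide saturating subtract, then truncate.
uint16_t truncate_wide_result (unsigned a, unsigned b)
{
  return static_cast<uint16_t> (sat_sub_u32 (a, b));
}

// A different computation: truncate the operands first, then subtract.
uint16_t narrow_operands_first (unsigned a, unsigned b)
{
  return sat_sub_u16 (static_cast<uint16_t> (a), static_cast<uint16_t> (b));
}
```

For a = 0x10000 and b = 1 the first form yields 0xFFFF and the second
yields 0, so a caller of the match function has to treat the result as
a conversion of .SAT_SUB (@0, @1) rather than a .SAT_SUB in the type of
the matched SSA name.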

> and when using the gimple_match_* function make sure to consider
> that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
> we matched?

I guess this may be a problem for the vector part: it required some additional
changes to vectorize_convert when I tried this previously.  Let me double-check
and keep you posted.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Friday, June 21, 2024 3:00 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

On Fri, Jun 21, 2024 at 5:53 AM  wrote:
>
> From: Pan Li 
>
> The zip benchmark of coremark-pro has one SAT_SUB-like pattern, but its
> result is truncated, as below:
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
>     a = *--p;
>     *p = (uint16_t)(a >= b ? a - b : 0); // Truncate the result of SAT_SUB
>   } while (--n);
> }
>
> After the ifcvt pass it has the gimple below, which cannot hit any
> SAT_SUB pattern and therefore cannot be vectorized to SAT_SUB.
>
> _2 = a_11 - b_12(D);
> iftmp.0_13 = (short unsigned int) _2;
> _18 = a_11 >= b_12(D);
> iftmp.0_5 = _18 ? iftmp.0_13 : 0;
>
> This patch reconciles the above pattern so that it matches the SAT_SUB
> pattern.  The underlying vect pass is then able to vectorize the
> SAT_SUB.

Hmm.  I was thinking of allowing

/* Unsigned saturation sub, case 2 (branch with ge):
   SAT_U_SUB = X >= Y ? X - Y : 0.  */
(match (unsigned_integer_sat_sub @0 @1)
 (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1

to match this by changing it to

/* Unsigned saturation sub, case 2 (branch with ge):
   SAT_U_SUB = X >= Y ? X - Y : 0.  */
(match (unsigned_integer_sat_sub @0 @1)
 (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1

and when using the gimple_match_* function make sure to consider
that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
we matched?

Richard.

> _2 = a_11 - b_12(D);
> _18 = a_11 >= b_12(D);
> _pattmp = _18 ? _2 : 0; // .SAT_SUB pattern
> iftmp.0_13 = (short unsigned int) _pattmp;
> iftmp.0_5 = iftmp.0_13;
>
> The below tests are running for this patch.
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * match.pd: Add new match for truncated unsigned sat_sub.
> * tree-if-conv.cc (gimple_truncated_unsigned_integer_sat_sub):
> New external decl from match.pd.
> (tree_if_cond_reconcile_unsigned_integer_sat_sub): New func impl
> to reconcile the truncated sat_sub pattern.
> (tree_if_cond_reconcile): New func impl to reconcile.
> (pass_if_conversion::execute): Try to reconcile after ifcvt.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd|  9 +
>  gcc/tree-if-conv.cc | 83 +
>  2 files changed, 92 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..9617a5f9d5e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3210,6 +3210,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub and then truncated, aka:
> +   Truncated = X >= Y ? (Other Type) (X - Y) : 0.
> + */
> +(match (truncated_unsigned_integer_sat_sub @0 @1)
> + (cond (ge @0 @1) (convert (minus @0 @1)) integer_zerop)
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (@0, @1)
> +  && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (@0))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 57992b6deca..535743130f2 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -3738,6 +3738,87 @@ bitfields_to_lower_p (class loop *loop,
>return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
>  }
>
> +extern bool gimple_truncated_unsigne

[COMMITTED 01/22] ada: Spurious style error with mutiple square brackets

2024-06-21 Thread Marc Poulhiès
From: Justin Squirek 

This patch fixes a spurious style error reported by the token separation
check when two square brackets are next to each other.

gcc/ada/

* csets.ads (Identifier_Char): New function - replacing table.
* csets.adb (Identifier_Char): Rename and move table for static values.
(Initialize): Remove dynamic calculations.
(Identifier_Char): New function to calculate dynamic values.
* opt.adb (Set_Config_Switches): Remove setting of Identifier_Char.

Tested on x86_64-pc-linux-gnu, committed on master.
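
The shape of the change (a static table wrapped by a function that handles
the version-dependent characters at call time) can be sketched in C as
follows. This is only an illustration: the names mirror the Ada ones but are
hypothetical, the version enumeration is reduced to two values, and the table
is filled with plain ASCII identifier characters rather than from Fold_Upper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the Ada version setting.  */
enum ada_version_type { ADA_2012, ADA_2022 };

static enum ada_version_type ada_version = ADA_2012;

/* Statically known identifier characters only.  */
static bool identifier_char_table[256];
static bool table_initialized = false;

/* Fill the static table once; no version-dependent entries here.  */
static void initialize (void)
{
  for (int c = 0; c < 256; c++)
    identifier_char_table[c] =
      (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || (c >= '0' && c <= '9') || c == '_';
  table_initialized = true;
}

/* Dynamic cases are decided at call time, everything else by the table.  */
static bool identifier_char (unsigned char item)
{
  assert (table_initialized);
  switch (item)
    {
    case '[':
    case ']':
      /* Brackets notation is only accepted before Ada 2022.  */
      return ada_version < ADA_2022;
    default:
      return identifier_char_table[item];
    }
}
```

Because the bracket cases are evaluated on each call, a change of ada_version
between files takes effect without rebuilding the table.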

---
 gcc/ada/csets.adb | 46 --
 gcc/ada/csets.ads | 14 +++---
 gcc/ada/opt.adb   |  3 ---
 3 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/csets.adb b/gcc/ada/csets.adb
index 7e5af3ffa17..54ebdb46b6c 100644
--- a/gcc/ada/csets.adb
+++ b/gcc/ada/csets.adb
@@ -29,6 +29,12 @@ with System.WCh_Con; use System.WCh_Con;
 
 package body Csets is
 
+   Identifier_Char_Table : Char_Array_Flags;
+   --  This table contains all statically known characters which can appear in
+   --  identifiers, but excludes characters which need to be known dynamically,
+   --  for example like those that depend on the current Ada version which may
+   --  change from file to file.
+
X_80 : constant Character := Character'Val (16#80#);
X_81 : constant Character := Character'Val (16#81#);
X_82 : constant Character := Character'Val (16#82#);
@@ -1085,6 +1091,34 @@ package body Csets is
 
   others => ' ');
 
+   ---------------------
+   -- Identifier_Char --
+   ---------------------
+
+   function Identifier_Char (Item : Character) return Boolean is
+   begin
+  --  Handle explicit dynamic cases
+
+  case Item is
+
+ --  Add [ as an identifier character to deal with the brackets
+ --  notation for wide characters used in identifiers for versions up
+ --  to Ada 2012.
+
+ --  Note that if we are not allowing wide characters in identifiers,
+ --  then any use of this notation will be flagged as an error in
+ --  Scan_Identifier.
+
+ when '[' | ']' =>
+return Ada_Version < Ada_2022;
+
+ --  Otherwise, this is a static case - use the table
+
+ when others =>
+return Identifier_Char_Table (Item);
+  end case;
+   end Identifier_Char;
+

-- Initialize --

@@ -1144,24 +1178,16 @@ package body Csets is
   --  Build Identifier_Char table from used entries of Fold_Upper
 
   for J in Character loop
- Identifier_Char (J) := (Fold_Upper (J) /= ' ');
+ Identifier_Char_Table (J) := (Fold_Upper (J) /= ' ');
   end loop;
 
-  --  Add [ as an identifier character to deal with the brackets notation
-  --  for wide characters used in identifiers for versions up to Ada 2012.
-  --  Note that if we are not allowing wide characters in identifiers, then
-  --  any use of this notation will be flagged as an error in
-  --  Scan_Identifier.
-
-  Identifier_Char ('[') := Ada_Version < Ada_2022;
-
   --  Add entry for ESC if wide characters in use with a wide character
   --  encoding method active that uses the ESC code for encoding.
 
   if Identifier_Character_Set = 'w'
 and then Wide_Character_Encoding_Method in WC_ESC_Encoding_Method
   then
- Identifier_Char (ASCII.ESC) := True;
+ Identifier_Char_Table (ASCII.ESC) := True;
   end if;
end Initialize;
 
diff --git a/gcc/ada/csets.ads b/gcc/ada/csets.ads
index 9dc78ba10e8..f0930df47db 100644
--- a/gcc/ada/csets.ads
+++ b/gcc/ada/csets.ads
@@ -80,12 +80,12 @@ package Csets is
Fold_Lower : Translate_Table;
--  Table to fold upper case identifier letters to lower case
 
-   Identifier_Char : Char_Array_Flags;
-   --  This table has True entries for all characters that can legally appear
-   --  in identifiers, including digits, the underline character, all letters
-   --  including upper and lower case and extended letters (as controlled by
-   --  the setting of Opt.Identifier_Character_Set), left bracket for brackets
-   --  notation wide characters and also ESC if wide characters are permitted
-   --  in identifiers using escape sequences starting with ESC.
+   function Identifier_Char (Item : Character) return Boolean;
+   --  Return True for all characters that can legally appear in identifiers,
+   --  including digits, the underline character, all letters including upper
+   --  and lower case and extended letters (as controlled by the setting of
+   --  Opt.Identifier_Character_Set), left bracket for brackets notation wide
+   --  characters and also ESC if wide characters are permitted in identifiers
+   --  using escape sequences starting with ESC.
 
 end Csets;
diff --git a/gcc/ada/opt.adb b/gcc/ada/opt.adb
index 5427a95a3b6..8598ce234cc 100644
--- a/gcc/ada/opt.adb
+++ b/gcc/ada/opt.

[COMMITTED 02/22] ada: Fix for Default_Component_Value with declare expressions

2024-06-21 Thread Marc Poulhiès
From: Piotr Trojanek 

When the expression of aspect Default_Component_Value includes a declare
expression with current type instance, we attempted to recursively freeze
that type, which itself caused an infinite recursion, because we didn't
properly manage the scope of the declare expression.

This patch fixes both the detection of the current type instance and
analysis of the expression that caused recursive freezing.

gcc/ada/

* sem_attr.adb (In_Aspect_Specification): Use the standard
condition that works correctly with declare expressions.
* sem_ch13.adb (Analyze_Aspects_At_Freeze_Point): Replace
ordinary analysis with preanalysis of spec expressions.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb |  4 +++-
 gcc/ada/sem_ch13.adb | 12 ++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 72f5ab49175..d56c25a79cc 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -1843,7 +1843,9 @@ package body Sem_Attr is
if Nkind (P) = N_Aspect_Specification then
   return P_Type = Entity (P);
 
-   elsif Nkind (P) in N_Declaration then
+   --  Prevent the search from going too far
+
+   elsif Is_Body_Or_Package_Declaration (P) then
   return False;
end if;
 
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 4012932a6f2..a86f774018a 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -1037,11 +1037,19 @@ package body Sem_Ch13 is
 
  Parent_Type : Entity_Id;
 
+ Save_In_Spec_Expression : constant Boolean := In_Spec_Expression;
+
   begin
  --  Ensure Expr is analyzed so that e.g. all types are properly
- --  resolved for Find_Type_Reference.
+ --  resolved for Find_Type_Reference. We preanalyze this expression
+ --  as a spec expression (to avoid recursive freezing), while skipping
+ --  resolution (to not fold type self-references, e.g. T'Last).
 
- Analyze (Expr);
+ In_Spec_Expression := True;
+
+ Preanalyze (Expr);
+
+ In_Spec_Expression := Save_In_Spec_Expression;
 
  --  A self-referential aspect is illegal if it forces freezing the
  --  entity before the corresponding aspect has been analyzed.
-- 
2.45.1



[COMMITTED 05/22] ada: Fix gnatcheck violation reported after a recent cleanup

2024-06-21 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_ch3.adb (Add_Interface_Tag_Components): Simplify with No.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index eebaedc216b..a1112d7b44a 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -1618,7 +1618,7 @@ package body Sem_Ch3 is
 
   Last_Tag := Empty;
 
-  if not Present (Component_List (Ext)) then
+  if No (Component_List (Ext)) then
  Set_Null_Present (Ext, False);
  L := New_List;
  Set_Component_List (Ext,
-- 
2.45.1



[COMMITTED 07/22] ada: Fix incorrect handling of packed array with aliased composite components

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

The problem is that the handling of the interaction between packing and
aliased/atomic/independent components of an array type is tied to that of
the interaction between a component clause and aliased/atomic/independent
components, although the semantics are different: packing is a best effort
thing, whereas a component clause must be honored or else an error be given.

This decouples the two handlings, but retrofits the separate processing of
independent components done in both cases into the common code and changes
the error message from "minimum allowed is" to "minimum allowed value is"
for the sake of consistency with the aliased/atomic processing.

gcc/ada/

* freeze.adb (Freeze_Array_Type): Decouple the handling of the
interaction between packing and aliased/atomic components from
that of the interaction between a component clause and aliased/
atomic components, and retrofit the processing of the interaction
between the two characteristics and independent components into
the common processing.

gcc/testsuite/ChangeLog:

* gnat.dg/atomic10.adb: Adjust.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb | 190 ++---
 gcc/testsuite/gnat.dg/atomic10.adb |   4 +-
 2 files changed, 93 insertions(+), 101 deletions(-)

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index 1867880b314..29733a17a56 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -3634,7 +3634,9 @@ package body Freeze is
   procedure Freeze_Array_Type (Arr : Entity_Id) is
  FS : constant Entity_Id := First_Subtype (Arr);
  Ctyp   : constant Entity_Id := Component_Type (Arr);
- Clause : Entity_Id;
+
+ Clause : Node_Id;
+ --  Set to Component_Size clause or Atomic pragma, if any
 
  Non_Standard_Enum : Boolean := False;
  --  Set true if any of the index types is an enumeration type with a
@@ -3710,76 +3712,57 @@ package body Freeze is
end;
 end if;
 
---  Check for Aliased or Atomic_Components or Full Access with
---  unsuitable packing or explicit component size clause given.
-
-if (Has_Aliased_Components (Arr)
- or else Has_Atomic_Components (Arr)
- or else Is_Full_Access (Ctyp))
-  and then
-(Has_Component_Size_Clause (Arr) or else Is_Packed (Arr))
-then
-   Alias_Atomic_Check : declare
+--  Check for Aliased or Atomic or Full Access or Independent
+--  components with an unsuitable component size clause given.
+--  The main purpose is to give an error when bit packing would
+--  be required to honor the component size, because bit packing
+--  is incompatible with these aspects; when bit packing is not
+--  required, the final validation of the component size may be
+--  left to the back end.
 
-  procedure Complain_CS (T : String);
-  --  Outputs error messages for incorrect CS clause or pragma
-  --  Pack for aliased or full access components (T is either
-  --  "aliased" or "atomic" or "volatile full access");
+if Has_Component_Size_Clause (Arr) then
+   CS_Check : declare
+  procedure Complain_CS (T : String; Min : Boolean := False);
+  --  Output an error message for an unsuitable component size
+  --  clause for independent components (T is either "aliased"
+  --  or "atomic" or "volatile full access" or "independent").
 
  -----------------
  -- Complain_CS --
  -----------------
 
-  procedure Complain_CS (T : String) is
+  procedure Complain_CS (T : String; Min : Boolean := False) is
   begin
- if Has_Component_Size_Clause (Arr) then
-Clause :=
-  Get_Attribute_Definition_Clause
-(FS, Attribute_Component_Size);
+ Clause :=
+   Get_Attribute_Definition_Clause
+ (FS, Attribute_Component_Size);
 
-Error_Msg_N
-  ("incorrect component size for "
-   & T & " components", Clause);
-Error_Msg_Uint_1 := Esize (Ctyp);
-Error_Msg_N
-  ("\only allowed value is^", Clause);
+ Error_Msg_N
+   ("incorrect component size for " & T & " components",
+Clause);
 
+ if Known_Static_Esize (Ctyp) then
+Error_Msg_Uint_1 := Esize (Ctyp);
+

[COMMITTED 19/22] ada: Implement fast modulo reduction for nonbinary modular multiplication

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

This adds the missing guard to prevent the reduction from being used when
the target does not provide or cannot synthesize a high-part multiply.
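
For readers unfamiliar with the technique being guarded, here is a minimal C
sketch of a modulo computed through a high-part multiply. The modulus 10, the
multiplier 0xCCCCCCCD (that is, ceil (2^35 / 10)) and the function names are
chosen for illustration and do not come from the patch. The sketch assumes a
32-bit operand and a 64-bit type wide enough to form the full product; a
target without such a high-part multiply is the case the new guard excludes.

```c
#include <assert.h>
#include <stdint.h>

/* High part of a 32x32->64 bit unsigned multiply ("op h* multiplier").  */
static uint32_t mul_highpart_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* x mod 10 without a division:
   quotient  = (x h* 0xCCCCCCCD) >> 3
   remainder = x - quotient * 10.  */
static uint32_t mod10_highpart (uint32_t x)
{
  uint32_t q = mul_highpart_u32 (x, UINT32_C (0xCCCCCCCD)) >> 3;
  return x - q * 10u;
}

/* Hypothetical self-check helper against the plain % operator.  */
static void check_mod10 (uint32_t x)
{
  assert (mod10_highpart (x) == x % 10u);
}
```

The remainder is recovered with one multiply and one subtract after the
quotient, which is why the sequence is only worthwhile when the high-part
multiply itself is cheap on the target.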

gcc/ada/

* gcc-interface/trans.cc (gnat_to_gnu) : Fix formatting.
* gcc-interface/utils2.cc: Include optabs-query.h.
(fast_modulo_reduction): Call can_mult_highpart_p on the TYPE_MODE
before generating a high-part multiply.  Fix formatting.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc  |  2 +-
 gcc/ada/gcc-interface/utils2.cc | 12 +++-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 7c5282602b2..83ed17bff84 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -7323,7 +7323,7 @@ gnat_to_gnu (Node_Id gnat_node)
 pair in the needed precision up to the word size.  But not when
 optimizing for size, because it will be longer than a div+mul+sub
 sequence.  */
-else if (!optimize_size
+   else if (!optimize_size
 && (code == FLOOR_MOD_EXPR || code == TRUNC_MOD_EXPR)
 && TYPE_UNSIGNED (gnu_type)
 && TYPE_PRECISION (gnu_type) <= BITS_PER_WORD
diff --git a/gcc/ada/gcc-interface/utils2.cc b/gcc/ada/gcc-interface/utils2.cc
index a37eccc4cfb..d101d7729bf 100644
--- a/gcc/ada/gcc-interface/utils2.cc
+++ b/gcc/ada/gcc-interface/utils2.cc
@@ -35,6 +35,7 @@
 #include "builtins.h"
 #include "expmed.h"
 #include "fold-const.h"
+#include "optabs-query.h"
 #include "stor-layout.h"
 #include "stringpool.h"
 #include "varasm.h"
@@ -558,11 +559,11 @@ fast_modulo_reduction (tree op, tree modulus, unsigned int precision)
 
   op / d = (op * multiplier) >> shifter
 
- But choose_multiplier provides a slightly different interface:
+But choose_multiplier provides a slightly different interface:
 
-   op / d = (op h* multiplier) >> reduced_shifter
+ op / d = (op h* multiplier) >> reduced_shifter
 
- that makes things easier by using a high-part multiplication.  */
+that makes things easier by using a high-part multiplication.  */
   mh = choose_multiplier (d, type_precision, precision, &ml, &post_shift);
 
   /* If the suggested multiplier is more than TYPE_PRECISION bits, we can
@@ -577,8 +578,9 @@ fast_modulo_reduction (tree op, tree modulus, unsigned int precision)
pre_shift = 0;
 
   /* If the suggested multiplier is still more than TYPE_PRECISION bits,
-try again with a larger type up to the word size.  */
-  if (mh != 0)
+or the TYPE_MODE does not have a high-part multiply, try again with
+a larger type up to the word size.  */
+  if (mh != 0 || !can_mult_highpart_p (TYPE_MODE (type), true))
{
  if (type_precision < BITS_PER_WORD)
{
-- 
2.45.1



[COMMITTED 04/22] ada: Predefined arithmetic operators incorrectly treated as directly visible

2024-06-21 Thread Marc Poulhiès
From: Steve Baird 

In some cases, a predefined operator (e.g., the "+" operator for an
integer type) is incorrectly treated as being directly visible when
it is not. This can lead to both accepting operator uses that should
be rejected and also to incorrectly rejecting legal constructs as ambiguous
(for example, an expression "Foo + 1" where Foo is an overloaded function and
the "+" operator is directly visible for the result type of only one of
the possible callees).

gcc/ada/

* sem_ch4.adb (Is_Effectively_Visible_Operator): A new function.
(Check_Arithmetic_Pair): In paths where Add_One_Interp was
previously called unconditionally, instead call only if
Is_Effectively_Visible_Operator returns True.
(Check_Boolean_Pair): Likewise.
(Find_Unary_Types): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.
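
The logic of the new predicate can be approximated in C as below. This is a
rough sketch, not the GNAT data model: the node structure and its fields are
hypothetical, the Typ parameter is dropped, and a null original pointer stands
for a node that was never rewritten (where Original_Node would return the
node itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced tree node.  */
struct node
{
  const struct node *original;  /* Pre-rewrite node, or NULL if none.  */
  bool visible_operator;        /* Models Is_Visible_Operator.  */
  bool comes_from_source;       /* Models Comes_From_Source.  */
};

/* Models Original_Node: the node itself when it was not rewritten.  */
static const struct node *original_node (const struct node *n)
{
  return n->original != NULL ? n->original : n;
}

/* True if the operator is visible, or if the node is a rewritten form of
   a node for which that holds, or if the node is compiler-generated.  */
static bool is_effectively_visible_operator (const struct node *n)
{
  assert (n != NULL);
  return n->visible_operator
         || (original_node (n) != n
             && is_effectively_visible_operator (original_node (n)))
         || !n->comes_from_source;
}
```

A rewritten node whose original was visible is accepted through the recursive
arm, while a source node with no visible operator and no rewrite history is
rejected.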

---
 gcc/ada/sem_ch4.adb | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index 1175a34df21..dfeff02a011 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -270,6 +270,18 @@ package body Sem_Ch4 is
--  these aspects can be achieved without larger modifications to the
--  two-pass resolution algorithm.
 
+   function Is_Effectively_Visible_Operator
+ (N : Node_Id; Typ : Entity_Id) return Boolean
+   is (Is_Visible_Operator (N => N, Typ => Typ)
+ or else
+   --  test for a rewritten Foo."+" call
+   (N /= Original_Node (N)
+ and then Is_Effectively_Visible_Operator
+(N => Original_Node (N), Typ => Typ))
+ or else not Comes_From_Source (N));
+   --  Return True iff either Is_Visible_Operator returns True or if
+   --  there is a reason it is ok for Is_Visible_Operator to return False.
+
function Possible_Type_For_Conditional_Expression
  (T1, T2 : Entity_Id) return Entity_Id;
--  Given two types T1 and T2 that are _not_ compatible, return a type that
@@ -6641,6 +6653,8 @@ package body Sem_Ch4 is
and then (Covers (T1 => T1, T2 => T2)
or else
  Covers (T1 => T2, T2 => T1))
+   and then Is_Effectively_Visible_Operator
+  (N, Specific_Type (T1, T2))
  then
 Add_One_Interp (N, Op_Id, Specific_Type (T1, T2));
  end if;
@@ -6670,6 +6684,8 @@ package body Sem_Ch4 is
and then (Covers (T1 => T1, T2 => T2)
or else
  Covers (T1 => T2, T2 => T1))
+   and then Is_Effectively_Visible_Operator
+  (N, Specific_Type (T1, T2))
  then
 Add_One_Interp (N, Op_Id, Specific_Type (T1, T2));
 
@@ -6713,6 +6729,8 @@ package body Sem_Ch4 is
and then (Covers (T1 => T1, T2 => T2)
or else
  Covers (T1 => T2, T2 => T1))
+   and then Is_Effectively_Visible_Operator
+  (N, Specific_Type (T1, T2))
  then
 Add_One_Interp (N, Op_Id, Specific_Type (T1, T2));
  end if;
@@ -7086,6 +7104,7 @@ package body Sem_Ch4 is
T := Any_Modular;
 end if;
 
+--  test Is_Effectively_Visible_Operator here ???
 Add_One_Interp (N, Op_Id, T);
  end if;
   end Check_Boolean_Pair;
@@ -7615,7 +7634,8 @@ package body Sem_Ch4 is
then
   null;
 
-   else
+   elsif Is_Effectively_Visible_Operator (N, Base_Type (It.Typ))
+   then
   Add_One_Interp (N, Op_Id, Base_Type (It.Typ));
end if;
 end if;
-- 
2.45.1



[COMMITTED 03/22] ada: Fix assertion failure on predicate involving access parameter

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

The assertion fails because the Original_Node of the expression has no Etype,
since it is an unanalyzed identifier.

gcc/ada/

* accessibility.adb (Accessibility_Level): Apply the processing to
Expr when its Original_Node is an unanalyzed identifier.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/accessibility.adb | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/accessibility.adb b/gcc/ada/accessibility.adb
index da4d1d9ce2e..298103377a7 100644
--- a/gcc/ada/accessibility.adb
+++ b/gcc/ada/accessibility.adb
@@ -398,7 +398,7 @@ package body Accessibility is
 
   --  Local variables
 
-  E   : Node_Id := Original_Node (Expr);
+  E   : Node_Id;
   Pre : Node_Id;
 
--  Start of processing for Accessibility_Level
@@ -409,6 +409,17 @@ package body Accessibility is
 
   if Present (Param_Entity (Expr)) then
  E := Param_Entity (Expr);
+
+  --  Use the original node unless it is an unanalyzed identifier, as we
+  --  don't want to reason on unanalyzed expressions from predicates.
+
+  elsif Nkind (Original_Node (Expr)) /= N_Identifier
+or else Analyzed (Original_Node (Expr))
+  then
+ E := Original_Node (Expr);
+
+  else
+ E := Expr;
   end if;
 
   --  Extract the entity
-- 
2.45.1



[COMMITTED 10/22] ada: Cannot override inherited function with controlling result

2024-06-21 Thread Marc Poulhiès
From: Javier Miranda 

When a package has the declaration of a derived tagged
type T with private null extension that inherits a public
function F with controlling result, and a derivation of T
is declared in the public part of another package, overriding
function F may be rejected by the compiler.

gcc/ada/

* sem_disp.adb (Find_Hidden_Overridden_Primitive): Check
public dispatching primitives of ancestors; previously,
only immediately-visible primitives were checked.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_disp.adb | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_disp.adb b/gcc/ada/sem_disp.adb
index 9c498ee9a3f..fe822290e45 100644
--- a/gcc/ada/sem_disp.adb
+++ b/gcc/ada/sem_disp.adb
@@ -89,7 +89,9 @@ package body Sem_Disp is
--  to the found entity; otherwise return Empty.
--
--  This routine does not search for non-hidden primitives since they are
-   --  covered by the normal Ada 2005 rules.
+   --  covered by the normal Ada 2005 rules. Its name was motivated by an
+   --  intermediate version of AI05-0125 where this term was proposed to
+   --  name these entities in the RM.
 
function Is_Inherited_Public_Operation (Op : Entity_Id) return Boolean;
--  Check whether a primitive operation is inherited from an operation
@@ -2403,7 +2405,7 @@ package body Sem_Disp is
Orig_Prim := Original_Corresponding_Operation (Prim);
 
if Orig_Prim /= Prim
- and then Is_Immediately_Visible (Orig_Prim)
+ and then not Is_Hidden (Orig_Prim)
then
   Vis_Ancestor := First_Elmt (Vis_List);
   while Present (Vis_Ancestor) loop
-- 
2.45.1



[COMMITTED 09/22] ada: Fix missing index check with declare expression

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

The Do_Range_Check flag is properly set on the Expression of the EWA node
built for the declare expression, so this instructs Generate_Index_Checks
to look into this Expression.

gcc/ada/

* checks.adb (Generate_Index_Checks): Add specific treatment for
index expressions that are N_Expression_With_Actions nodes.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index bada3dffcbf..c8a0696be67 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -7248,7 +7248,8 @@ package body Checks is
   Loc   : constant Source_Ptr := Sloc (N);
   A : constant Node_Id:= Prefix (N);
   A_Ent : constant Entity_Id  := Entity_Of_Prefix;
-  Sub   : Node_Id;
+
+  Expr : Node_Id;
 
--  Start of processing for Generate_Index_Checks
 
@@ -7294,13 +7295,13 @@ package body Checks is
   --  us to omit the check have already been taken into account in the
   --  setting of the Do_Range_Check flag earlier on.
 
-  Sub := First (Expressions (N));
+  Expr := First (Expressions (N));
 
   --  Handle string literals
 
   if Ekind (Etype (A)) = E_String_Literal_Subtype then
- if Do_Range_Check (Sub) then
-Set_Do_Range_Check (Sub, False);
+ if Do_Range_Check (Expr) then
+Set_Do_Range_Check (Expr, False);
 
 --  For string literals we obtain the bounds of the string from the
 --  associated subtype.
@@ -7310,8 +7311,8 @@ package body Checks is
 Condition =>
Make_Not_In (Loc,
  Left_Opnd  =>
-   Convert_To (Base_Type (Etype (Sub)),
- Duplicate_Subexpr_Move_Checks (Sub)),
+   Convert_To (Base_Type (Etype (Expr)),
+ Duplicate_Subexpr_Move_Checks (Expr)),
  Right_Opnd =>
Make_Attribute_Reference (Loc,
  Prefix => New_Occurrence_Of (Etype (A), Loc),
@@ -7330,11 +7331,19 @@ package body Checks is
 Ind : Pos;
 Num : List_Id;
 Range_N : Node_Id;
+Stmt: Node_Id;
+Sub : Node_Id;
 
  begin
 A_Idx := First_Index (Etype (A));
 Ind   := 1;
-while Present (Sub) loop
+while Present (Expr) loop
+   if Nkind (Expr) = N_Expression_With_Actions then
+  Sub := Expression (Expr);
+   else
+  Sub := Expr;
+   end if;
+
if Do_Range_Check (Sub) then
   Set_Do_Range_Check (Sub, False);
 
@@ -7396,7 +7405,7 @@ package body Checks is
  Expressions=> Num);
   end if;
 
-  Insert_Action (N,
+  Stmt :=
 Make_Raise_Constraint_Error (Loc,
   Condition =>
  Make_Not_In (Loc,
@@ -7404,14 +7413,21 @@ package body Checks is
  Convert_To (Base_Type (Etype (Sub)),
Duplicate_Subexpr_Move_Checks (Sub)),
Right_Opnd => Range_N),
-  Reason => CE_Index_Check_Failed));
+  Reason => CE_Index_Check_Failed);
+
+  if Nkind (Expr) = N_Expression_With_Actions then
+ Append_To (Actions (Expr), Stmt);
+ Analyze (Stmt);
+  else
+ Insert_Action (Expr, Stmt);
+  end if;
 
   Checks_Generated.Elements (Ind) := True;
end if;
 
Next_Index (A_Idx);
Ind := Ind + 1;
-   Next (Sub);
+   Next (Expr);
 end loop;
  end;
   end if;
-- 
2.45.1



[COMMITTED 06/22] ada: Generic formal/actual matching -- misc cleanup

2024-06-21 Thread Marc Poulhiès
From: Bob Duff 

The only substantive change is to remove Activation_Chain_Entity
from N_Generic_Package_Declaration. The comment in sinfo.ads suggesting
this change was written in 1993!

Various pieces of missing documentation are added to Sinfo and Einfo.

Also other minor cleanups.

gcc/ada/

* gen_il-gen-gen_nodes.adb
(N_Generic_Package_Declaration): Remove Activation_Chain_Entity.
* sinfo.ads: Comment improvements. Add missing doc.
Remove obsolete comment about Activation_Chain_Entity.
* einfo.ads: Comment improvements. Add missing doc.
* einfo-utils.adb (Base_Type): Add Assert (disabled for now).
(Next_Index): Minor cleanup.
* aspects.ads: Minor comment fix.
* exp_ch6.adb: Likewise.
* sem_ch3.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/aspects.ads  |  2 +-
 gcc/ada/einfo-utils.adb  | 29 -
 gcc/ada/einfo.ads| 29 +++--
 gcc/ada/exp_ch6.adb  |  4 ++--
 gcc/ada/gen_il-gen-gen_nodes.adb |  3 +--
 gcc/ada/sem_ch3.adb  |  4 ++--
 gcc/ada/sinfo.ads| 32 ++--
 7 files changed, 59 insertions(+), 44 deletions(-)

diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads
index 140fb7c8fe1..cf992a89038 100644
--- a/gcc/ada/aspects.ads
+++ b/gcc/ada/aspects.ads
@@ -1176,7 +1176,7 @@ package Aspects is
  Class_Present : Boolean := False;
  Or_Rep_Item   : Boolean := False) return Node_Id;
--  Find the aspect specification of aspect A (or A'Class if Class_Present)
-   --  associated with entity I.
+   --  associated with entity Id.
--  If found, then return the aspect specification.
--  If not found and Or_Rep_Item is true, then look for a representation
--  item (as opposed to an N_Aspect_Specification node) which specifies
diff --git a/gcc/ada/einfo-utils.adb b/gcc/ada/einfo-utils.adb
index 438868ac757..4c86ba1c3b1 100644
--- a/gcc/ada/einfo-utils.adb
+++ b/gcc/ada/einfo-utils.adb
@@ -664,12 +664,22 @@ package body Einfo.Utils is
 
function Base_Type (Id : E) return E is
begin
-  if Is_Base_Type (Id) then
- return Id;
-  else
- pragma Assert (Is_Type (Id));
- return Etype (Id);
-  end if;
+  return Result : E do
+ if Is_Base_Type (Id) then
+Result := Id;
+ else
+pragma Assert (Is_Type (Id));
+Result := Etype (Id);
+if False then
+   pragma Assert (Is_Base_Type (Result));
+   --  ???It seems like Base_Type should return a base type,
+   --  but this assertion is disabled because it is not always
+   --  true. Hence the need to say "Base_Type (Base_Type (...))"
+   --  in some cases; Base_Type is not idempotent as one might
+   --  expect.
+end if;
+ end if;
+  end return;
end Base_Type;
 
--
@@ -2018,10 +2028,11 @@ package body Einfo.Utils is

 
function Next_Index (Id : N) return Node_Id is
-   begin
   pragma Assert (Nkind (Id) in N_Is_Index);
-  pragma Assert (No (Next (Id)) or else Nkind (Next (Id)) in N_Is_Index);
-  return Next (Id);
+  Result : constant Node_Id := Next (Id);
+  pragma Assert (No (Result) or else Nkind (Result) in N_Is_Index);
+   begin
+  return Result;
end Next_Index;
 
--
diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
index 8ee419b3e07..dd95ea051c1 100644
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -1334,7 +1334,7 @@ package Einfo is
 --First_Component (synthesized)
 --   Applies to incomplete, private, protected, record and task types.
 --   Returns the first component by following the chain of declared
---   entities for the type a component is found (one with an Ekind of
+--   entities for the type until a component is found (one with an Ekind of
 --   E_Component). The discriminants are skipped. If the record is null,
 --   then Empty is returned.
 
@@ -1342,6 +1342,10 @@ package Einfo is
 --   Similar to First_Component, but discriminants are not skipped, so will
 --   find the first discriminant if discriminants are present.
 
+--First_Discriminant (synthesized)
+--   Defined for types with discriminants or unknown discriminants.
+--   Returns the first in the Next_Discriminant chain; see Sem_Aux.
+
 --First_Entity
 --   Defined in all entities that act as scopes to which a list of
 --   associated entities is attached, and also in all [sub]types. Some
@@ -1375,12 +1379,11 @@ package Einfo is
 --First_Index
 --   Defined in array types and subtypes. By introducing implicit subtypes
 --   for the index constraints, we have the same structure for constrained
---   and un

[COMMITTED 08/22] ada: Fix internal error on case expression used as index of array component

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

This occurs when the bounds of the array component depend on a discriminant
and the component reference is not nested, that is to say the component is
not (referenced as) a subcomponent of a larger record.

In this case, Analyze_Selected_Component does not build the actual subtype
for the component, but it turns out to be required for constructs generated
during the analysis of the case expression.

The change causes this actual subtype to be built, and also renames a local
variable used to hold the prefix of the selected component.

gcc/ada/

* sem_ch4.adb (Analyze_Selected_Component): Rename Name into Pref
and use Sel local variable consistently.
(Is_Simple_Indexed_Component): New predicate.
Call Is_Simple_Indexed_Component to determine whether to build an
actual subtype for the component.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch4.adb | 108 ++--
 1 file changed, 73 insertions(+), 35 deletions(-)

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index dfeff02a011..4e1d1bc7ed7 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -4927,7 +4927,7 @@ package body Sem_Ch4 is
--  the selector must denote a visible entry.
 
procedure Analyze_Selected_Component (N : Node_Id) is
-  Name  : constant Node_Id := Prefix (N);
+  Pref  : constant Node_Id := Prefix (N);
   Sel   : constant Node_Id := Selector_Name (N);
   Act_Decl  : Node_Id;
   Comp  : Entity_Id := Empty;
@@ -4962,8 +4962,11 @@ package body Sem_Ch4 is
   --  indexed component rather than a function call.
 
   function Has_Dereference (Nod : Node_Id) return Boolean;
-  --  Check whether prefix includes a dereference, explicit or implicit,
-  --  at any recursive level.
+  --  Check whether Nod includes a dereference, explicit or implicit, at
+  --  any recursive level.
+
+  function Is_Simple_Indexed_Component (Nod : Node_Id) return Boolean;
+  --  Check whether Nod is a simple indexed component in the context
 
   function Try_By_Protected_Procedure_Prefixed_View return Boolean;
   --  Return True if N is an access attribute whose prefix is a prefixed
@@ -5107,6 +5110,40 @@ package body Sem_Ch4 is
  end if;
   end Has_Dereference;
 
+  ---------------------------------
+  -- Is_Simple_Indexed_Component --
+  ---------------------------------
+
+  function Is_Simple_Indexed_Component (Nod : Node_Id) return Boolean is
+ Expr : Node_Id;
+
+  begin
+ --  Nod must be an indexed component
+
+ if Nkind (Nod) /= N_Indexed_Component then
+return False;
+ end if;
+
+ --  The context must not be a nested selected component
+
+ if Nkind (Pref) = N_Selected_Component then
+return False;
+ end if;
+
+ --  The expressions must not be case expressions
+
+ Expr := First (Expressions (Nod));
+ while Present (Expr) loop
+if Nkind (Expr) = N_Case_Expression then
+   return False;
+end if;
+
+Next (Expr);
+ end loop;
+
+ return True;
+  end Is_Simple_Indexed_Component;
+
   ----------------------------------------------
   -- Try_By_Protected_Procedure_Prefixed_View --
   ----------------------------------------------
@@ -5292,17 +5329,17 @@ package body Sem_Ch4 is
begin
   Set_Etype (N, Any_Type);
 
-  if Is_Overloaded (Name) then
+  if Is_Overloaded (Pref) then
  Analyze_Overloaded_Selected_Component (N);
  return;
 
-  elsif Etype (Name) = Any_Type then
+  elsif Etype (Pref) = Any_Type then
  Set_Entity (Sel, Any_Id);
  Set_Etype (Sel, Any_Type);
  return;
 
   else
- Prefix_Type := Etype (Name);
+ Prefix_Type := Etype (Pref);
   end if;
 
   if Is_Access_Type (Prefix_Type) then
@@ -5345,8 +5382,8 @@ package body Sem_Ch4 is
   --  component prefixes because of the prefixed dispatching call case.
   --  Note that implicit dereferences are checked for this just above.
 
-  elsif Nkind (Name) = N_Explicit_Dereference
-and then Is_Remote_Access_To_Class_Wide_Type (Etype (Prefix (Name)))
+  elsif Nkind (Pref) = N_Explicit_Dereference
+and then Is_Remote_Access_To_Class_Wide_Type (Etype (Prefix (Pref)))
 and then Comes_From_Source (N)
   then
  if Try_Object_Operation (N) then
@@ -5397,7 +5434,7 @@ package body Sem_Ch4 is
 Is_Concurrent_Type (Prefix_Type)
   and then Is_Internal_Name (Chars (Prefix_Type))
   and then not Is_Derived_Type (Prefix_Type)
-  and then Is_Entity_Name (Name);
+  and then Is_Entity_Name (Pref);
 
   --  Avoid initializing Comp if that initialization is not needed
   --  (and, more importantly, if the ca

[COMMITTED 11/22] ada: Revert conditional installation of signal handlers on VxWorks

2024-06-21 Thread Marc Poulhiès
From: Doug Rupp 

The conditional installation resulted in a semantic change. Although
it is likely what is ultimately wanted (since HW interrupts are being
reworked on VxWorks), it must be done in concert with other
modifications for the new formulation of HW interrupts and not in
isolation.

gcc/ada/

* init.c [vxworks] (__gnat_install_handler): Revert to
installing signal handlers without regard to interrupt_state.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/init.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/init.c b/gcc/ada/init.c
index acb8c7cc57e..93e73f53c64 100644
--- a/gcc/ada/init.c
+++ b/gcc/ada/init.c
@@ -2100,14 +2100,10 @@ __gnat_install_handler (void)
 
   /* For VxWorks, install all signal handlers, since pragma Interrupt_State
  applies to vectored hardware interrupts, not signals.  */
-  if (__gnat_get_interrupt_state (SIGFPE) != 's')
- sigaction (SIGFPE,  &act, NULL);
-  if (__gnat_get_interrupt_state (SIGILL) != 's')
- sigaction (SIGILL,  &act, NULL);
-  if (__gnat_get_interrupt_state (SIGSEGV) != 's')
- sigaction (SIGSEGV, &act, NULL);
-  if (__gnat_get_interrupt_state (SIGBUS) != 's')
- sigaction (SIGBUS,  &act, NULL);
+  sigaction (SIGFPE,  &act, NULL);
+  sigaction (SIGILL,  &act, NULL);
+  sigaction (SIGSEGV, &act, NULL);
+  sigaction (SIGBUS,  &act, NULL);
 
 #if defined(__leon__) && defined(_WRS_KERNEL)
   /* Specific to the LEON VxWorks kernel run-time library */
-- 
2.45.1



[COMMITTED 18/22] ada: Implement fast modulo reduction for nonbinary modular multiplication

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

This implements modulo reduction for nonbinary modular multiplication with
small moduli by means of the standard division-free algorithm also used in
the optimizer, but with fewer constraints and therefore better results.

For the sake of consistency, it is also used for the 'Mod attribute of the
same modular types and, more generally, for the Mod (and Rem) operators of
unsigned types if the second operand is static and not a power of two.
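
As an illustration of the division-free idea described above, here is a
minimal sketch. It is not the code from this patch: it uses the textbook
multiply-and-shift scheme, restricted to 16-bit operands so that the
product fits in 64 bits, whereas the patch relies on choose_multiplier to
handle the general case. The names fastmod16, fastmod16_init and
fastmod16_reduce are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a precomputed (multiplier, shifter) pair
   for reducing 16-bit values modulo a fixed nonzero D.  */
typedef struct
{
  uint64_t mult;   /* floor (2^shift / d) + 1 */
  unsigned shift;  /* 16 + ceil (log2 (d)) */
  uint16_t d;
} fastmod16;

static fastmod16
fastmod16_init (uint16_t d)
{
  fastmod16 fm;
  unsigned l = 0;

  assert (d != 0);

  /* l = ceil (log2 (d)), i.e. the smallest l such that 2^l >= d.  */
  while ((1u << l) < d)
    l++;

  fm.shift = 16 + l;
  fm.mult = ((uint64_t) 1 << fm.shift) / d + 1;
  fm.d = d;
  return fm;
}

static uint16_t
fastmod16_reduce (fastmod16 fm, uint16_t n)
{
  /* n / d == (n * mult) >> shift for every 16-bit n.  */
  const uint64_t q = ((uint64_t) n * fm.mult) >> fm.shift;

  /* The remainder is recovered with one multiply and one subtract.  */
  return (uint16_t) (n - q * fm.d);
}
```

The pair is computed once per modulus, so each reduction costs two
multiplies, a shift and a subtract instead of a hardware division.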

gcc/ada/

* gcc-interface/gigi.h (fast_modulo_reduction): Declare.
* gcc-interface/trans.cc (gnat_to_gnu) : In the unsigned
case, call fast_modulo_reduction for {FLOOR,TRUNC}_MOD_EXPR if the
RHS is a constant and not a power of two, and the precision is not
larger than the word size.
* gcc-interface/utils2.cc: Include expmed.h.
(fast_modulo_reduction): New function.
(nonbinary_modular_operation): Call fast_modulo_reduction for the
multiplication if the precision is not larger than the word size.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/gigi.h|   5 ++
 gcc/ada/gcc-interface/trans.cc  |  17 ++
 gcc/ada/gcc-interface/utils2.cc | 102 +++-
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h
index 6ed74d6879e..40f3f0d3d13 100644
--- a/gcc/ada/gcc-interface/gigi.h
+++ b/gcc/ada/gcc-interface/gigi.h
@@ -1040,6 +1040,11 @@ extern bool simple_constant_p (Entity_Id gnat_entity);
 /* Return the size of TYPE, which must be a positive power of 2.  */
 extern unsigned int resolve_atomic_size (tree type);
 
+/* Try to compute the reduction of OP modulo MODULUS in PRECISION bits with a
+   division-free algorithm.  Return NULL_TREE if this is not easily doable.  */
+extern tree fast_modulo_reduction (tree op, tree modulus,
+  unsigned int precision);
+
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index e68fb3fd776..7c5282602b2 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -7317,6 +7317,23 @@ gnat_to_gnu (Node_Id gnat_node)
  gnu_result
= build_binary_op_trapv (code, gnu_type, gnu_lhs, gnu_rhs,
 gnat_node);
+
+ /* For an unsigned modulo operation with nonbinary constant modulus,
+we first try to do a reduction by means of a (multiplier, shifter)
+pair in the needed precision up to the word size.  But not when
+optimizing for size, because it will be longer than a div+mul+sub
+sequence.  */
+else if (!optimize_size
+&& (code == FLOOR_MOD_EXPR || code == TRUNC_MOD_EXPR)
+&& TYPE_UNSIGNED (gnu_type)
+&& TYPE_PRECISION (gnu_type) <= BITS_PER_WORD
+&& TREE_CODE (gnu_rhs) == INTEGER_CST
+&& !integer_pow2p (gnu_rhs)
+&& (gnu_expr
+= fast_modulo_reduction (gnu_lhs, gnu_rhs,
+ TYPE_PRECISION (gnu_type
+ gnu_result = gnu_expr;
+
else
  {
/* Some operations, e.g. comparisons of arrays, generate complex
diff --git a/gcc/ada/gcc-interface/utils2.cc b/gcc/ada/gcc-interface/utils2.cc
index 70271cf2836..a37eccc4cfb 100644
--- a/gcc/ada/gcc-interface/utils2.cc
+++ b/gcc/ada/gcc-interface/utils2.cc
@@ -33,6 +33,7 @@
 #include "tree.h"
 #include "inchash.h"
 #include "builtins.h"
+#include "expmed.h"
 #include "fold-const.h"
 #include "stor-layout.h"
 #include "stringpool.h"
@@ -534,6 +535,91 @@ compare_fat_pointers (location_t loc, tree result_type, tree p1, tree p2)
   p1_array_is_null, same_bounds));
 }
 
+/* Try to compute the reduction of OP modulo MODULUS in PRECISION bits with a
+   division-free algorithm.  Return NULL_TREE if this is not easily doable.  */
+
+tree
+fast_modulo_reduction (tree op, tree modulus, unsigned int precision)
+{
+  const tree type = TREE_TYPE (op);
+  const unsigned int type_precision = TYPE_PRECISION (type);
+
+  /* The implementation is host-dependent for the time being.  */
+  if (type_precision <= HOST_BITS_PER_WIDE_INT)
+{
+  const unsigned HOST_WIDE_INT d = tree_to_uhwi (modulus);
+  unsigned HOST_WIDE_INT ml, mh;
+  int pre_shift, post_shift;
+  tree t;
+
+  /* The trick is to replace the division by d with a multiply-and-shift
+sequence parameterized by a (multiplier, shifter) pair computed from
+d, the precision of the type and the needed precision:
+
+  op / d = (op * multiplier) >> shifter
+
+ But choose_multiplier provides a slightly different interface:
+
+   op / d = (op h* multiplier) >> reduced_shifter
+
+ that makes things easier by using a high-part 

[COMMITTED 16/22] ada: Apply fixes to Examine_Array_Bounds

2024-06-21 Thread Marc Poulhiès
From: Ronan Desplanques 

gcc/ada/

* sem_util.adb (Examine_Array_Bounds): Add missing return
statements. Fix criterion for a string literal being empty.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 4cdac9443e6..4dde5f3964e 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -8157,13 +8157,15 @@ package body Sem_Util is
   if not Is_Constrained (Typ) then
  All_Static := False;
  Has_Empty  := False;
+ return;
 
   --  A string literal has static bounds, and is not empty as long as it
   --  contains at least one character.
 
   elsif Ekind (Typ) = E_String_Literal_Subtype then
  All_Static := True;
- Has_Empty  := String_Literal_Length (Typ) > 0;
+ Has_Empty  := String_Literal_Length (Typ) = 0;
+ return;
   end if;
 
   --  Assume that all bounds are static and not empty
-- 
2.45.1



[COMMITTED 12/22] ada: Small cleanup in processing of primitive operations

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

The processing of primitive operations is now always uniform for tagged and
untagged types, but the code contains leftovers from the time when it was
specific to tagged types, in particular for the handling of subtypes.

gcc/ada/

* einfo.ads (Direct_Primitive_Operations): Mention concurrent types
as well as GNAT extensions instead of implementation details.
(Primitive_Operations): Document that Direct_Primitive_Operations is
also used for concurrent types as a fallback.
* einfo-utils.adb (Primitive_Operations): Tweak formatting.
* exp_util.ads (Find_Prim_Op): Adjust description.
* exp_util.adb (Make_Subtype_From_Expr): In the private case with
unknown discriminants, always copy Direct_Primitive_Operations and
do not overwrite the Class_Wide_Type of the expression's base type.
* sem_ch3.adb (Analyze_Incomplete_Type_Decl): Tweak comment.
(Analyze_Subtype_Declaration): Remove older and now dead calls to
Set_Direct_Primitive_Operations.  Tweak comment.
(Build_Derived_Private_Type): Likewise.
(Build_Derived_Record_Type): Likewise.
(Build_Discriminated_Subtype): Set Direct_Primitive_Operations in
all cases instead of just for tagged types.
(Complete_Private_Subtype): Likewise.
(Derived_Type_Declaration): Tweak comment.
* sem_ch4.ads (Try_Object_Operation): Adjust description.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/einfo-utils.adb |  4 +--
 gcc/ada/einfo.ads   | 34 ---
 gcc/ada/exp_util.adb|  8 ++
 gcc/ada/exp_util.ads| 10 +++
 gcc/ada/sem_ch3.adb | 61 ++---
 gcc/ada/sem_ch4.ads |  5 ++--
 6 files changed, 55 insertions(+), 67 deletions(-)

diff --git a/gcc/ada/einfo-utils.adb b/gcc/ada/einfo-utils.adb
index 4c86ba1c3b1..c0c79f92e13 100644
--- a/gcc/ada/einfo-utils.adb
+++ b/gcc/ada/einfo-utils.adb
@@ -2422,8 +2422,8 @@ package body Einfo.Utils is
begin
   if Is_Concurrent_Type (Id) then
  if Present (Corresponding_Record_Type (Id)) then
-return Direct_Primitive_Operations
-  (Corresponding_Record_Type (Id));
+return
+  Direct_Primitive_Operations (Corresponding_Record_Type (Id));
 
  --  When expansion is disabled, the corresponding record type is
  --  absent, but if this is a tagged type with ancestors, or if the
diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
index dd95ea051c1..de175310ee9 100644
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -932,18 +932,17 @@ package Einfo is
 --   subtypes. Contains the Digits value specified in the declaration.
 
 --Direct_Primitive_Operations
---   Defined in tagged types and subtypes (including synchronized types),
---   in tagged private types, and in tagged incomplete types. Moreover, it
---   is also defined for untagged types, both when Extensions_Allowed is
---   True (-gnatX) to support the extension feature of prefixed calls for
---   untagged types, and when Extensions_Allowed is False to get better
---   error messages. This field is an element list of entities for
---   primitive operations of the type. For incomplete types the list is
---   always empty. In order to follow the C++ ABI, entities of primitives
---   that come from source must be stored in this list in the order of
---   their occurrence in the sources. When expansion is disabled, the
---   corresponding record type of a synchronized type is not constructed.
---   In that case, such types carry this attribute directly.
+--   Defined in concurrent types, tagged record types and subtypes, tagged
+--   private types, and tagged incomplete types. Moreover, it is also
+--   defined in untagged types, both when GNAT extensions are allowed, to
+--   support prefixed calls for untagged types, and when GNAT extensions
+--   are not allowed, to give better error messages. Set to a list of
+--   entities for primitive operations of the type. For incomplete types
+--   the list is always empty. In order to follow the C++ ABI, entities of
+--   primitives that come from source must be stored in this list in the
+--   order of their occurrence in the sources. When expansion is disabled,
+--   the corresponding record type of concurrent types is not constructed;
+--   in this case, such types carry this attribute directly.
 
 --Directly_Designated_Type
 --   Defined in access types. This field points to the type that is
@@ -4066,10 +4065,13 @@ package Einfo is
 
 --Primitive_Operations (synthesized)
 --   Defined in concurrent types, tagged record types and subtypes, tagged
---   private types and tagged incomplete types. For concurrent types whose
---   Corresponding_Record_Type (CRT) is available, returns the list of

[COMMITTED 14/22] ada: Crash when using user defined string literals

2024-06-21 Thread Marc Poulhiès
From: Javier Miranda 

When a non-overridable aspect is explicitly specified for a
non-tagged derived type, the compiler crashes while processing
the declaration of an object of that type.

gcc/ada/

* sem_ch13.adb (Analyze_One_Aspect): Fix code locating the entity
of the parent type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index a86f774018a..90376f818a3 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -4801,8 +4801,14 @@ package body Sem_Ch13 is
   and then Nkind (Type_Definition (N)) = N_Derived_Type_Definition
   and then not In_Instance_Body
 then
+   --  In order to locate the parent type we must go first to its
+   --  base type because the frontend introduces an implicit base
+   --  type even if there is no constraint attached to it, since
+   --  this is closer to the Ada semantics.
+
declare
-  Parent_Type  : constant Entity_Id := Etype (E);
+  Parent_Type  : constant Entity_Id :=
+Etype (Base_Type (E));
   Inherited_Aspect : constant Node_Id :=
 Find_Aspect (Parent_Type, A_Id);
begin
-- 
2.45.1



[COMMITTED 21/22] ada: Fix bogus Address Sanitizer stack-buffer-overflow on packed array copy

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

The Address Sanitizer considers that the padding at the end of a justified
modular type may be accessed through the object, but it is never accessed
and therefore can always be reused.
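
As a rough illustration of the size computation involved (a sketch under
stated assumptions, not the hook itself): for a justified modular type,
the new hook in the diff below returns the Ada size of the type in bits,
rounded up to whole storage units, in place of the full TYPE_SIZE_UNIT.
The helper name bytes_holding_bits and the 8-bit storage unit are
assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BYTE 8  /* assumption: 8-bit storage units */

/* Hypothetical helper: number of storage units needed to hold
   SIZE_IN_BITS bits, i.e. a ceiling division by the unit size.  */
static size_t
bytes_holding_bits (size_t size_in_bits)
{
  /* Guard against wrap-around in the rounding addition.  */
  assert (size_in_bits <= SIZE_MAX - (BITS_PER_BYTE - 1));
  return (size_in_bits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}
```

With these assumptions, a 17-bit value held in a 4-byte object occupies
3 bytes, and the remaining byte is tail padding.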

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) : Set
the TYPE_JUSTIFIED_MODULAR_P flag earlier.
* gcc-interface/misc.cc (gnat_unit_size_without_reusable_padding):
New function.
(LANG_HOOKS_UNIT_SIZE_WITHOUT_REUSABLE_PADDING): Redefine to above
function.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc |  2 +-
 gcc/ada/gcc-interface/misc.cc | 17 -
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index aa31a18..5b3a3b4961b 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -1976,6 +1976,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
  gnu_type = make_node (RECORD_TYPE);
  TYPE_NAME (gnu_type) = create_concat_name (gnat_entity, "JM");
+ TYPE_JUSTIFIED_MODULAR_P (gnu_type) = 1;
  TYPE_PACKED (gnu_type) = 1;
  TYPE_SIZE (gnu_type) = TYPE_SIZE (gnu_field_type);
  TYPE_SIZE_UNIT (gnu_type) = TYPE_SIZE_UNIT (gnu_field_type);
@@ -2006,7 +2007,6 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
  /* We will output additional debug info manually below.  */
  finish_record_type (gnu_type, gnu_field, 2, false);
- TYPE_JUSTIFIED_MODULAR_P (gnu_type) = 1;
 
  /* Make the original array type a parallel/debug type.  Note that
 gnat_get_array_descr_info needs a TYPE_IMPL_PACKED_ARRAY_P type
diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index b703f00d3c0..4f6f6774fe7 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -760,6 +760,19 @@ gnat_type_max_size (const_tree gnu_type)
   return max_size_unit;
 }
 
+/* Return the unit size of TYPE without reusable tail padding.  */
+
+static tree
+gnat_unit_size_without_reusable_padding (tree type)
+{
+  /* The padding of justified modular types can always be reused.  */
+  if (TYPE_JUSTIFIED_MODULAR_P (type))
+return fold_convert (sizetype,
+size_binop (CEIL_DIV_EXPR,
+TYPE_ADA_SIZE (type), bitsize_unit_node));
+  return TYPE_SIZE_UNIT (type);
+}
+
 static tree get_array_bit_stride (tree);
 
 /* Provide information in INFO for debug output about the TYPE array type.
@@ -1407,6 +1420,8 @@ const struct scoped_attribute_specs *const gnat_attribute_table[] =
 #define LANG_HOOKS_TYPE_FOR_SIZE   gnat_type_for_size
 #undef  LANG_HOOKS_TYPES_COMPATIBLE_P
 #define LANG_HOOKS_TYPES_COMPATIBLE_P  gnat_types_compatible_p
+#undef  LANG_HOOKS_UNIT_SIZE_WITHOUT_REUSABLE_PADDING
+#define LANG_HOOKS_UNIT_SIZE_WITHOUT_REUSABLE_PADDING gnat_unit_size_without_reusable_padding
 #undef  LANG_HOOKS_GET_ARRAY_DESCR_INFO
 #define LANG_HOOKS_GET_ARRAY_DESCR_INFOgnat_get_array_descr_info
 #undef  LANG_HOOKS_GET_SUBRANGE_BOUNDS
@@ -1433,7 +1448,7 @@ const struct scoped_attribute_specs *const gnat_attribute_table[] =
 #define LANG_HOOKS_DEEP_UNSHARING  true
 #undef  LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
 #define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
-#undef LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
+#undef  LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE
 #define LANG_HOOKS_GET_SARIF_SOURCE_LANGUAGE gnat_get_sarif_source_language
 
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
-- 
2.45.1



Re: Re: [PATCH 2/3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-06-21 Thread wangf...@eswincomputing.com
On 2024-06-21 12:24  juzhe.zhong  wrote:
>
>+  if (*group.shape == shapes::loadstore
>+  || *group.shape == shapes::indexed_loadstore
>+  || *group.shape == shapes::vundefined
>+  || *group.shape == shapes::misc
>+  || *group.shape == shapes::vset
>+  || *group.shape == shapes::vget
>+  || *group.shape == shapes::vcreate
>+  || *group.shape == shapes::fault_load
>+  || *group.shape == shapes::seg_loadstore
>+  || *group.shape == shapes::seg_indexed_loadstore
>+  || *group.shape == shapes::seg_fault_load)
>+    return true;
>
>I prefer to use switch-case:
>
>switch
>case...
>return true
>default
>return false;
>
>
>juzhe.zh...@rivai.ai
> 

I tried your suggestion, but this type (function_shape) can't be used in a
switch-case structure; doing so would require adding more code.
If you have a better method, please don't hesitate to share it.
Thanks.
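
One possible alternative, sketched here in plain C: assuming the quoted
`*group.shape == shapes::...` tests compare pointers to shape objects, a
switch cannot be used because case labels must be integer constants, but
the chain of comparisons can be replaced by a table and a loop. The type
`struct shape`, the objects and the function shape_in_list are
hypothetical stand-ins, not names from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for function_shape; only object identity
   matters for the comparison.  */
struct shape
{
  const char *name;
};

static const struct shape loadstore = { "loadstore" };
static const struct shape vset = { "vset" };
static const struct shape vget = { "vget" };
static const struct shape alu = { "alu" };  /* deliberately not listed */

/* The shapes that the if-chain would otherwise test one by one.  */
static const struct shape *const listed_shapes[]
  = { &loadstore, &vset, &vget };

/* Return true if S is one of the listed shapes.  */
static bool
shape_in_list (const struct shape *s)
{
  assert (s != NULL);

  for (size_t i = 0; i < sizeof listed_shapes / sizeof listed_shapes[0]; i++)
    if (listed_shapes[i] == s)
      return true;

  return false;
}
```

Adding a shape then means adding one table entry instead of one more
`||` clause; whether that is clearer than the explicit chain is a matter
of taste.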

>From: Feng Wang
>Date: 2024-06-21 09:54
>To: gcc-patches
>CC: kito.cheng; juzhe.zhong; jinma.contrib; Feng Wang
>Subject: [PATCH 2/3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic
>According to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic
>functions are added by this patch.
>
>gcc/ChangeLog:
>
>* config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f):
>    Add 'Zvfbfmin' intrinsic in bases.
>(class vfwcvtbf16_f): Ditto.
>(class vfwmaccbf16): Add 'Zvfbfwma' intrinsic in bases.
>(BASE): Add BASE macro for 'Zvfbfmin' and 'Zvfbfwma'.
>* config/riscv/riscv-vector-builtins-bases.h: Add declaration for 'Zvfbfmin' 
>and 'Zvfbfwma'.
>* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
>    Add builtins def for 'Zvfbfmin' and 'Zvfbfwma'.
>(vfncvtbf16_f): Ditto.
>(vfncvtbf16_f_frm): Ditto.
>(vfwcvtbf16_f): Ditto.
>(vfwmaccbf16): Ditto.
>(vfwmaccbf16_frm): Ditto.
>* config/riscv/riscv-vector-builtins-shapes.cc (supports_vectype_p):
>    Add vector intrinsic build judgment for BFloat16.
>(build_all): Ditto.
>(BASE_NAME_MAX_LEN): Adjust max length.
>* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F32_OPS):
>    Add new operand type for BFloat16.
>(vfloat32mf2_t): Ditto.
>(vfloat32m1_t): Ditto.
>(vfloat32m2_t): Ditto.
>(vfloat32m4_t): Ditto.
>(vfloat32m8_t): Ditto.
>* config/riscv/riscv-vector-builtins.cc (DEF_RVV_F32_OPS): Ditto.
>(validate_instance_type_required_extensions):
>    Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
>* config/riscv/riscv-vector-builtins.h (enum required_ext):
>    Add required_ext declaration for 'Zvfbfmin' and 'Zvfbfwma'.
>(reqired_ext_to_isa_name): Ditto.
>(required_extensions_specified): Ditto.
>(struct function_group_info): Add match case for 'Zvfbfmin' and 'Zvfbfwma'.
>* config/riscv/riscv.cc (riscv_validate_vector_type):
>    Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
>
>---
>.../riscv/riscv-vector-builtins-bases.cc  | 69 +++
>.../riscv/riscv-vector-builtins-bases.h   |  7 ++
>.../riscv/riscv-vector-builtins-functions.def | 15 
>.../riscv/riscv-vector-builtins-shapes.cc | 37 --
>.../riscv/riscv-vector-builtins-types.def | 13 
>gcc/config/riscv/riscv-vector-builtins.cc | 64 +
>gcc/config/riscv/riscv-vector-builtins.h  | 14 
>gcc/config/riscv/riscv.cc |  5 +-
>8 files changed, 218 insertions(+), 6 deletions(-)
>
>diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>index b6f6e4ff37e..b10a83ab1fd 100644
>--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
>+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>@@ -2424,6 +2424,60 @@ public:
>   }
>};
>+/* Implements vfncvtbf16_f. */
>+template
>+class vfncvtbf16_f : public function_base
>+{
>+public:
>+  bool has_rounding_mode_operand_p () const override
>+  {
>+    return FRM_OP == HAS_FRM;
>+  }
>+
>+  bool may_require_frm_p () const override { return true; }
>+
>+  rtx expand (function_expander &e) const override
>+  {
>+    return e.use_exact_insn (code_for_pred_trunc_to_bf16 (e.vector_mode ()));
>+  }
>+};
>+
>+/* Implements vfwcvtbf16_f. */
>+class vfwcvtbf16_f : public function_base
>+{
>+public:
>+  rtx expand (function_expander &e) const override
>+  {
>+    return e.use_exact_insn (code_for_pred_extend_bf16_to (e.vector_mode ()));
>+  }
>+};
>+
>+/* Implements vfwmaccbf16. */
>+template
>+class vfwmaccbf16 : public function_base
>+{
>+public:
>+  bool has_rounding_mode_operand_p () const override
>+  {
>+    return FRM_OP == HAS_FRM;
>+  }
>+
>+  bool may_require_frm_p () const override { return true; }
>+
>+  bool has_merge_operand_p () const override { return false; }
>+
>+  rtx expand (function_expander &e) const override
>+  {
>+    if (e.op_info->op == OP_TYPE_vf)
>+  return e.use_widen_ternop_insn (
>+    code_for_pred_widen_bf16_mul_scalar (e.vector_mode ()));
>+    if (e.op_info->op == OP_TYPE_vv)
>+  return e.use_widen_ternop_insn (
>+    code_for_pred_widen_bf16_mul (e.vector_mode ()));
>+    gcc_un

[COMMITTED 15/22] ada: Fix crash in GNATbind during error reporting

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

This is the minimal fix to avoid the crash.

gcc/ada/

* bcheck.adb (Check_Consistency_Of_Sdep): Guard against path to ALI
file not found.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/bcheck.adb | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/bcheck.adb b/gcc/ada/bcheck.adb
index 56a417cc517..64a6734a330 100644
--- a/gcc/ada/bcheck.adb
+++ b/gcc/ada/bcheck.adb
@@ -162,10 +162,14 @@ package body Bcheck is
 end if;
 
  else
-ALI_Path_Id :=
-  Osint.Full_Lib_File_Name (A.Afile);
+ALI_Path_Id := Osint.Full_Lib_File_Name (A.Afile);
+
+--  Guard against Find_File not finding (again) the file because
+--  Primary_Directory has been clobbered in between.
 
-if Osint.Is_Readonly_Library (ALI_Path_Id) then
+if Present (ALI_Path_Id)
+  and then Osint.Is_Readonly_Library (ALI_Path_Id)
+then
if Tolerate_Consistency_Errors then
   Error_Msg ("?{ should be recompiled");
   Error_Msg_File_1 := ALI_Path_Id;
-- 
2.45.1



[COMMITTED 20/22] ada: Fix bogus Address Sanitizer stack-buffer-overflow on packed record equality

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

We set DECL_BIT_FIELD optimistically during the translation of record types
and clear it afterward if needed, but fail to clear other attributes in the
latter case, which fools the logic of the Address Sanitizer.

gcc/ada/

* gcc-interface/utils.cc (clear_decl_bit_field): New function.
(finish_record_type): Call clear_decl_bit_field instead of clearing
DECL_BIT_FIELD manually.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/utils.cc | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 771cb1a17ca..0eb9af8d4a2 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -2002,6 +2002,21 @@ finish_fat_pointer_type (tree record_type, tree field_list)
   TYPE_CONTAINS_PLACEHOLDER_INTERNAL (record_type) = 2;
 }
 
+/* Clear DECL_BIT_FIELD flag and associated markers on FIELD, which is a field
+   of aggregate type TYPE.  */
+
+static void
+clear_decl_bit_field (tree field, tree type)
+{
+  DECL_BIT_FIELD (field) = 0;
+  DECL_BIT_FIELD_TYPE (field) = NULL_TREE;
+
+  /* DECL_BIT_FIELD_REPRESENTATIVE is not defined for QUAL_UNION_TYPE since
+ it uses the same slot as DECL_QUALIFIER.  */
+  if (TREE_CODE (type) != QUAL_UNION_TYPE)
+DECL_BIT_FIELD_REPRESENTATIVE (field) = NULL_TREE;
+}
+
 /* Given a record type RECORD_TYPE and a list of FIELD_DECL nodes FIELD_LIST,
finish constructing the record or union type.  If REP_LEVEL is zero, this
record has no representation clause and so will be entirely laid out here.
@@ -2112,7 +2127,7 @@ finish_record_type (tree record_type, tree field_list, int rep_level,
  if (TYPE_ALIGN (record_type) >= align)
{
  SET_DECL_ALIGN (field, MAX (DECL_ALIGN (field), align));
- DECL_BIT_FIELD (field) = 0;
+ clear_decl_bit_field (field, record_type);
}
  else if (!had_align
   && rep_level == 0
@@ -2122,7 +2137,7 @@ finish_record_type (tree record_type, tree field_list, int rep_level,
{
  SET_TYPE_ALIGN (record_type, align);
  SET_DECL_ALIGN (field, MAX (DECL_ALIGN (field), align));
- DECL_BIT_FIELD (field) = 0;
+ clear_decl_bit_field (field, record_type);
}
}
 
@@ -2130,7 +2145,7 @@ finish_record_type (tree record_type, tree field_list, int rep_level,
  if (!STRICT_ALIGNMENT
  && DECL_BIT_FIELD (field)
  && value_factor_p (pos, BITS_PER_UNIT))
-   DECL_BIT_FIELD (field) = 0;
+   clear_decl_bit_field (field, record_type);
}
 
   /* Clear DECL_BIT_FIELD_TYPE for a variant part at offset 0, it's simply
@@ -2453,10 +2468,7 @@ rest_of_record_type_compilation (tree record_type)
 avoid generating useless attributes for the field in DWARF.  */
  if (DECL_SIZE (old_field) == TYPE_SIZE (field_type)
  && value_factor_p (pos, BITS_PER_UNIT))
-   {
- DECL_BIT_FIELD (new_field) = 0;
- DECL_BIT_FIELD_TYPE (new_field) = NULL_TREE;
-   }
+   clear_decl_bit_field (new_field, new_record_type);
  DECL_CHAIN (new_field) = TYPE_FIELDS (new_record_type);
  TYPE_FIELDS (new_record_type) = new_field;
 
-- 
2.45.1



[COMMITTED 17/22] ada: Reject ambiguous function calls in interpolated string expressions

2024-06-21 Thread Marc Poulhiès
From: Javier Miranda 

When the interpolated expression is an ambiguous function call, the
frontend does not reject it; it erroneously accepts the call and
generates code that calls one of the candidate functions.

gcc/ada/

* sem_ch2.adb (Analyze_Interpolated_String_Literal): Reject
ambiguous function calls.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch2.adb | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/ada/sem_ch2.adb b/gcc/ada/sem_ch2.adb
index aae9990eb4d..08cc75c9104 100644
--- a/gcc/ada/sem_ch2.adb
+++ b/gcc/ada/sem_ch2.adb
@@ -25,7 +25,9 @@
 
 with Atree;  use Atree;
 with Einfo;  use Einfo;
+with Einfo.Entities; use Einfo.Entities;
 with Einfo.Utils;use Einfo.Utils;
+with Errout; use Errout;
 with Ghost;  use Ghost;
 with Mutably_Tagged; use Mutably_Tagged;
 with Namet;  use Namet;
@@ -141,6 +143,14 @@ package body Sem_Ch2 is
   Str_Elem := First (Expressions (N));
   while Present (Str_Elem) loop
  Analyze (Str_Elem);
+
+ if Nkind (Str_Elem) = N_Identifier
+   and then Ekind (Entity (Str_Elem)) = E_Function
+   and then Is_Overloaded (Str_Elem)
+ then
+Error_Msg_NE ("ambiguous call to&", Str_Elem, Entity (Str_Elem));
+ end if;
+
  Next (Str_Elem);
   end loop;
end Analyze_Interpolated_String_Literal;
-- 
2.45.1



[COMMITTED 22/22] ada: Fix internal error on protected type with -gnatc -gnatR

2024-06-21 Thread Marc Poulhiès
From: Eric Botcazou 

It occurs when the body of a protected subprogram is processed, because the
references to the components of the type have not been properly expanded.

gcc/ada/

* gcc-interface/trans.cc (Subprogram_Body_to_gnu): Also return early
for a protected subprogram in -gnatc mode.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 83ed17bff84..3f2eadd7b2b 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -3934,6 +3934,12 @@ Subprogram_Body_to_gnu (Node_Id gnat_node)
   if (Is_Generic_Subprogram (gnat_subprog) || Is_Eliminated (gnat_subprog))
 return;
 
+  /* Likewise if this is a protected subprogram and we are only annotating
+ types, as the required expansion of references did not take place.  */
+  if (Convention (gnat_subprog) == Convention_Protected
+  && type_annotate_only)
+return;
+
   /* If this subprogram acts as its own spec, define it.  Otherwise, just get
  the already-elaborated tree node.  However, if this subprogram had its
  elaboration deferred, we will already have made a tree node for it.  So
-- 
2.45.1



[COMMITTED 13/22] ada: Change error message on invalid RTS path

2024-06-21 Thread Marc Poulhiès
Include the invalid path in the error message.

gcc/ada/

* make.adb (Scan_Make_Arg): Adjust error message.
* gnatls.adb (Search_RTS): Likewise.
* switch-b.adb (Scan_Debug_Switches): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gnatls.adb   | 11 ---
 gcc/ada/make.adb | 14 +-
 gcc/ada/switch-b.adb | 15 ++-
 3 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/gnatls.adb b/gcc/ada/gnatls.adb
index 2c26001743a..c52c1aea9c3 100644
--- a/gcc/ada/gnatls.adb
+++ b/gcc/ada/gnatls.adb
@@ -1673,9 +1673,13 @@ procedure Gnatls is
   end if;
 
   if Lib_Path /= null then
- Osint.Fail ("RTS path not valid: missing adainclude directory");
+ Osint.Fail
+   ("RTS path """ & Name
+& """ not valid: missing adainclude directory");
   elsif Src_Path /= null then
- Osint.Fail ("RTS path not valid: missing adalib directory");
+ Osint.Fail
+   ("RTS path """ & Name
+& """ not valid: missing adalib directory");
   end if;
 
   --  Try to find the RTS on the project path. First setup the project path
@@ -1710,7 +1714,8 @@ procedure Gnatls is
   end if;
 
   Osint.Fail
-("RTS path not valid: missing adainclude and adalib directories");
+("RTS path """ & Name
+  & """ not valid: missing adainclude and adalib directories");
end Search_RTS;
 
---
diff --git a/gcc/ada/make.adb b/gcc/ada/make.adb
index 24b2d099bfe..cef24341135 100644
--- a/gcc/ada/make.adb
+++ b/gcc/ada/make.adb
@@ -4478,13 +4478,14 @@ package body Make is
RTS_Switch := True;
 
declare
+  RTS_Arg_Path : constant String := Argv (7 .. Argv'Last);
   Src_Path_Name : constant String_Ptr :=
 Get_RTS_Search_Dir
-  (Argv (7 .. Argv'Last), Include);
+  (RTS_Arg_Path, Include);
 
   Lib_Path_Name : constant String_Ptr :=
 Get_RTS_Search_Dir
-  (Argv (7 .. Argv'Last), Objects);
+  (RTS_Arg_Path, Objects);
 
begin
   if Src_Path_Name /= null
@@ -4501,16 +4502,19 @@ package body Make is
 and then Lib_Path_Name = null
   then
  Make_Failed
-   ("RTS path not valid: missing adainclude and adalib "
+   ("RTS path """ & RTS_Arg_Path
+& """ not valid: missing adainclude and adalib "
 & "directories");
 
   elsif Src_Path_Name = null then
  Make_Failed
-   ("RTS path not valid: missing adainclude directory");
+   ("RTS path """ & RTS_Arg_Path
+& """ not valid: missing adainclude directory");
 
   else pragma Assert (Lib_Path_Name = null);
  Make_Failed
-   ("RTS path not valid: missing adalib directory");
+   ("RTS path """ & RTS_Arg_Path
+& """ not valid: missing adalib directory");
   end if;
end;
 end if;
diff --git a/gcc/ada/switch-b.adb b/gcc/ada/switch-b.adb
index 8d8dc58937c..2de516dba56 100644
--- a/gcc/ada/switch-b.adb
+++ b/gcc/ada/switch-b.adb
@@ -672,13 +672,15 @@ package body Switch.B is
   Opt.RTS_Switch := True;
 
   declare
+ RTS_Arg_Path : constant String :=
+   Switch_Chars (Ptr + 1 .. Max);
  Src_Path_Name : constant String_Ptr :=
Get_RTS_Search_Dir
- (Switch_Chars (Ptr + 1 .. Max),
+ (RTS_Arg_Path,
   Include);
  Lib_Path_Name : constant String_Ptr :=
Get_RTS_Search_Dir
- (Switch_Chars (Ptr + 1 .. Max),
+ (RTS_Arg_Path,
   Objects);
 
   begin
@@ -698,14 +700,17 @@ package body Switch.B is
and then Lib_Path_Name = null
  then
 Osint.Fail
-  ("RTS path not valid: missing adainclude and "
+  ("RTS path """ & RTS_Arg_Path
+   & """ not valid: missing adainclude and "
& "adalib directories");
  elsif Src_Path_Name = null then
 Osint.Fail
-  

Re: [PATCH][PR115565] cse: Don't use a valid regno for non-register in comparison_qty

2024-06-21 Thread Richard Sandiford
"Maciej W. Rozycki"  writes:
> Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not 
> with a register, because the value of -1 is actually a valid reference 
> to register 0 in the case where it has not been assigned a quantity.  
>
> Using -1 makes `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty' 
> comparison in `fold_rtx' to incorrectly trigger in rare circumstances 
> and return true for a memory reference making CSE consider a comparison 
> operation to evaluate to a constant expression and consequently make the 
> resulting code incorrectly execute or fail to execute conditional 
> blocks.
>
> This has caused a miscompilation of rwlock.c from LinuxThreads for the 
> `alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()' 
> expression (where `thread_self' returns the thread pointer via a PALcode 
> call) has been decided to be always true (with `ent->comparison_qty' 
> using -1 for a reference to `rwlock->__rw_writer', while register 0 
> holding the thread pointer retrieved by `thread_self') and code for the 
> false case has been optimized away where it mustn't have, causing 
> program lockups.
>
> The issue has been observed as a regression from commit 08a692679fb8 
> ("Undefined cse.c behaviour causes 3.4 regression on HPUX"), 
> , and up to 
> commit 932ad4d9b550 ("Make CSE path following use the CFG"), 
> , where CSE 
> has been restructured sufficiently for the issue not to trigger with the 
> original reproducer anymore.  However the original bug remains and can 
> trigger, because `comparison_qty' will still be assigned -1 for a memory 
> reference and the `reg_qty' member of a `cse_reg_info_table' entry will 
> still be assigned -1 for register 0 where the entry has not been 
> assigned a quantity, e.g. at initialization.
>
> Use INT_MIN then as noted above, so that the value remains negative, for 
> consistency with the REGNO_QTY_VALID_P macro (even though not used on 
> `comparison_qty'), and then so that it should not ever match a valid 
> negated register number, fixing the regression with commit 08a692679fb8.
>
>   gcc/
>   PR rtl-optimization/115565
>   * cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
>   `comparison_qty' if !REG_P.
> ---
> Hi,
>
>  Oh boy, this was hard to chase and debug!  See the PR referred for 
> details.  Sadly I have no reproducer for GCC 15, this bug seems too 
> elusive to make one easily.
>
>  This has passed verification in native `powerpc64le-linux-gnu' and 
> `x86_64-linux-gnu' regstraps, as well as with the `alpha-linux-gnu' 
> target.  OK to apply and backport to the release branches?

Huh!  Nice detective work.

The patch is OK for trunk, thanks.  I agree that it's a regression
from 08a692679fb8.  Since it's fixing such a hard-to-diagnose wrong
code bug, and since it seems very safe, I think it's worth backporting
to all active branches, after a grace period.

Thanks,
Richard

>
>   Maciej
> ---
>  gcc/cse.cc |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> gcc-cse-comparison-qty.diff
> Index: gcc/gcc/cse.cc
> ===
> --- gcc.orig/gcc/cse.cc
> +++ gcc/gcc/cse.cc
> @@ -239,7 +239,7 @@ static int next_qty;
> the constant being compared against, or zero if the comparison
> is not against a constant.  `comparison_qty' holds the quantity
> being compared against when the result is known.  If the comparison
> -   is not with a register, `comparison_qty' is -1.  */
> +   is not with a register, `comparison_qty' is INT_MIN.  */
>  
>  struct qty_table_elem
>  {
> @@ -4058,7 +4058,7 @@ record_jump_cond (enum rtx_code code, ma
>else
>   {
> ent->comparison_const = op1;
> -   ent->comparison_qty = -1;
> +   ent->comparison_qty = INT_MIN;
>   }
>  
>return;
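
A minimal sketch of the collision described in the commit message, assuming
(as the message states) that a register with no quantity assigned holds a
negated value, modelled here as -regno - 1. The names unassigned_qty and
sentinel_collides are hypothetical and are not taken from cse.cc; this only
illustrates why -1 is an unsafe "not a register" marker while INT_MIN is not.

```cpp
#include <climits>

// Hypothetical model, not GCC code: the value stored for a register that
// has not been assigned a quantity.  Register 0 therefore maps to -1.
int unassigned_qty(int regno)
{
  return -regno - 1;
}

// Returns true if `sentinel` equals the "unassigned" value of some register
// in [0, nregs), i.e. if a comparison against it could match by accident.
bool sentinel_collides(int sentinel, int nregs)
{
  for (int regno = 0; regno < nregs; ++regno)
    if (unassigned_qty(regno) == sentinel)
      return true;
  return false;
}
```

With this model, -1 collides with register 0, whereas INT_MIN cannot be
produced by -regno - 1 for any non-negative register number.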


[committed] libstdc++: Initialize base in test allocator's constructor

2024-06-21 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This fixes a warning from one of the test allocators:
warning: base class 'class std::allocator<__gnu_test::copy_tracker>' should be 
explicitly initialized in the copy constructor [-Wextra]

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_allocator.h (tracker_allocator):
Initialize base class in copy constructor.
---
 libstdc++-v3/testsuite/util/testsuite_allocator.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/util/testsuite_allocator.h 
b/libstdc++-v3/testsuite/util/testsuite_allocator.h
index b7739f13ca3..2f9c453cbd1 100644
--- a/libstdc++-v3/testsuite/util/testsuite_allocator.h
+++ b/libstdc++-v3/testsuite/util/testsuite_allocator.h
@@ -154,7 +154,7 @@ namespace __gnu_test
   tracker_allocator()
   { }
 
-  tracker_allocator(const tracker_allocator&)
+  tracker_allocator(const tracker_allocator& a) : Alloc(a)
   { }
 
   ~tracker_allocator()
-- 
2.45.2
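
A small illustration of what the warning is about, using hypothetical types
(Base, LosesState, KeepsState) rather than the testsuite's tracker_allocator.
A user-provided copy constructor that does not name the base class in its
initializer list default-constructs the base, which is the situation -Wextra
reports. The sketch spells out Base() explicitly so that it compiles without
the warning; omitting it has the same effect.

```cpp
// Hypothetical base class that carries some state.
struct Base
{
  int id;
  Base() : id(0) { }
  explicit Base(int i) : id(i) { }
  Base(const Base& other) : id(other.id) { }
};

// The copy constructor default-constructs the base, so the copy does not
// carry over the source object's base state.
struct LosesState : Base
{
  explicit LosesState(int i) : Base(i) { }
  LosesState(const LosesState&) : Base() { }
};

// The copy constructor initializes the base from the source object, as the
// patch does with `: Alloc(a)`.
struct KeepsState : Base
{
  explicit KeepsState(int i) : Base(i) { }
  KeepsState(const KeepsState& other) : Base(other) { }
};
```

For a stateless base such as std::allocator the two forms behave the same,
which is presumably why the fix is only about silencing the warning.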



Re: [PATCH] libstdc++: Fix __cpp_lib_chrono for old std::string ABI

2024-06-21 Thread Jonathan Wakely
Pushed to trunk now. Backport to gcc-14 needed too.

On Thu, 20 Jun 2024 at 16:27, Jonathan Wakely  wrote:
>
> This unfortunately means we can never increase __cpp_lib_chrono again
> for the old string ABI, but I don't see any alternative (except
> supporting chrono::tzdb for the old string, which will be a lot of work
> that I don't want to do!)
>
> -- >8 --
>
> The <chrono> header is incomplete for the old std::string ABI, because
> std::chrono::tzdb is only defined for the new ABI. The feature test
> macro advertising full C++20 support should not be defined for the old
> ABI.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/version.def (chrono): Add cxx11abi = yes.
> * include/bits/version.h: Regenerate.
> * testsuite/std/time/syn_c++20.cc: Adjust expected value for
> the feature test macro.
> ---
>  libstdc++-v3/include/bits/version.def|  1 +
>  libstdc++-v3/include/bits/version.h  |  2 +-
>  libstdc++-v3/testsuite/std/time/syn_c++20.cc | 11 +--
>  3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/version.def 
> b/libstdc++-v3/include/bits/version.def
> index 683b967d54b..42cdef2f526 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -574,6 +574,7 @@ ftms = {
>  v = 201907;
>  cxxmin = 20;
>  hosted = yes;
> +cxx11abi = yes; // std::chrono::tzdb requires cxx11 std::string
>};
>values = {
>  v = 201611;
> diff --git a/libstdc++-v3/include/bits/version.h 
> b/libstdc++-v3/include/bits/version.h
> index 4850041c0a3..1eaf3733bc2 100644
> --- a/libstdc++-v3/include/bits/version.h
> +++ b/libstdc++-v3/include/bits/version.h
> @@ -639,7 +639,7 @@
>  #undef __glibcxx_want_boyer_moore_searcher
>
>  #if !defined(__cpp_lib_chrono)
> -# if (__cplusplus >= 202002L) && _GLIBCXX_HOSTED
> +# if (__cplusplus >= 202002L) && _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_HOSTED
>  #  define __glibcxx_chrono 201907L
>  #  if defined(__glibcxx_want_all) || defined(__glibcxx_want_chrono)
>  #   define __cpp_lib_chrono 201907L
> diff --git a/libstdc++-v3/testsuite/std/time/syn_c++20.cc 
> b/libstdc++-v3/testsuite/std/time/syn_c++20.cc
> index f0b86199e9d..4a527262e9d 100644
> --- a/libstdc++-v3/testsuite/std/time/syn_c++20.cc
> +++ b/libstdc++-v3/testsuite/std/time/syn_c++20.cc
> @@ -20,9 +20,16 @@
>
>  #include <chrono>
>
> +// std::chrono::tzdb is not defined for the old std::string ABI.
> +#if _GLIBCXX_USE_CXX_ABI
> +# define EXPECTED_VALUE 201907L
> +#else
> +# define EXPECTED_VALUE 201611L
> +#endif
> +
>  #ifndef __cpp_lib_chrono
>  # error "Feature test macro for chrono is missing in <chrono>"
> -#elif __cpp_lib_chrono < 201907L
> +#elif __cpp_lib_chrono < EXPECTED_VALUE
>  # error "Feature test macro for chrono has wrong value in <chrono>"
>  #endif
>
> @@ -94,7 +101,7 @@ namespace __gnu_test
>using std::chrono::make12;
>using std::chrono::make24;
>
> -#if _GLIBCXX_USE_CXX11_ABI
> +#if __cpp_lib_chrono >= 201803L
>using std::chrono::tzdb;
>using std::chrono::tzdb_list;
>using std::chrono::get_tzdb;
> --
> 2.45.2
>
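
For user code, the practical consequence is that anything naming
std::chrono::tzdb is best guarded by the feature test macro rather than by
the language version alone. A hedged sketch follows; have_tzdb and tzdb_type
are hypothetical names chosen for illustration, and the threshold 201907L is
the value the patch ties to the new ABI.

```cpp
#include <chrono>

// Only name the time zone database when the library advertises full C++20
// chrono support.  With the old std::string ABI the macro stays at a lower
// value (or is undefined), so this branch is skipped.
#if defined(__cpp_lib_chrono) && __cpp_lib_chrono >= 201907L
using tzdb_type = std::chrono::tzdb;
constexpr bool have_tzdb = true;
#else
constexpr bool have_tzdb = false;
#endif
```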



Re: [PATCH] libstdc++: Fix std::to_array for trivial-ish types [PR115522]

2024-06-21 Thread Jonathan Wakely
Pushed to trunk now.

On Wed, 19 Jun 2024 at 17:38, Jonathan Wakely  wrote:
>
> Tested x86_64-linux. Not pushed yet. Backports will be needed too.
>
> -- >8 --
>
> Due to PR c++/85723 the std::is_trivial trait is true for types with a
> deleted default constructor, so the use of std::is_trivial in
> std::to_array is not sufficient to ensure the type can be trivially
> default constructed then filled using memcpy.
>
> I also forgot that a type with a deleted assignment operator can still
> be trivial, so we also need to check that it's assignable because the
> is_constant_evaluated() path can't use memcpy.
>
> Replace the uses of std::is_trivial with std::is_trivially_copyable
> (needed for memcpy), std::is_trivially_default_constructible (needed so
> that the default construction is valid and does no work) and
> std::is_copy_assignable (needed for the constant evaluation case).
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/115522
> * include/std/array (to_array): Workaround the fact that
> std::is_trivial is not sufficient to check that a type is
> trivially default constructible and assignable.
> * testsuite/23_containers/array/creation/115522.cc: New test.
> ---
>  libstdc++-v3/include/std/array|  8 +++--
>  .../23_containers/array/creation/115522.cc| 33 +++
>  2 files changed, 39 insertions(+), 2 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/23_containers/array/creation/115522.cc
>
> diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
> index 39695471e24..8710bf75924 100644
> --- a/libstdc++-v3/include/std/array
> +++ b/libstdc++-v3/include/std/array
> @@ -431,7 +431,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>static_assert(is_constructible_v<_Tp, _Tp&>);
>if constexpr (is_constructible_v<_Tp, _Tp&>)
> {
> - if constexpr (is_trivial_v<_Tp>)
> + if constexpr (is_trivially_copyable_v<_Tp>
> + && is_trivially_default_constructible_v<_Tp>
> + && is_copy_assignable_v<_Tp>)
> {
>   array<remove_cv_t<_Tp>, _Nm> __arr;
>   if (!__is_constant_evaluated() && _Nm != 0)
> @@ -460,7 +462,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>static_assert(is_move_constructible_v<_Tp>);
>if constexpr (is_move_constructible_v<_Tp>)
> {
> - if constexpr (is_trivial_v<_Tp>)
> + if constexpr (is_trivially_copyable_v<_Tp>
> + && is_trivially_default_constructible_v<_Tp>
> + && is_copy_assignable_v<_Tp>)
> {
>   array<remove_cv_t<_Tp>, _Nm> __arr;
>   if (!__is_constant_evaluated() && _Nm != 0)
> diff --git a/libstdc++-v3/testsuite/23_containers/array/creation/115522.cc 
> b/libstdc++-v3/testsuite/23_containers/array/creation/115522.cc
> new file mode 100644
> index 000..37073e002bd
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/array/creation/115522.cc
> @@ -0,0 +1,33 @@
> +// { dg-do compile { target c++20 } }
> +
> +// PR libstdc++/115522 std::to_array no longer works for struct which is
> +// trivial but not default constructible
> +
> +#include <array>
> +
> +void
> +test_deleted_ctor()
> +{
> +  struct S
> +  {
> +S() = delete;
> +S(int) { }
> +  };
> +
> +  S arr[1] = {{1}};
> +  auto arr1 = std::to_array(arr);
> +  auto arr2 = std::to_array(std::move(arr));
> +}
> +
> +void
> +test_deleted_assignment()
> +{
> +  struct S
> +  {
> +void operator=(const S&) = delete;
> +  };
> +
> +  S arr[1] = {};
> +  auto a1 = std::to_array(arr);
> +  auto a2 = std::to_array(std::move(arr));
> +}
> --
> 2.45.1
>
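
The three conditions named in the commit message can be sketched as one
trait. The name can_default_init_then_copy and the two test types are
hypothetical; the sketch only shows that each of the types from the new test
fails at least one of the conditions, so neither would take the
default-construct-then-copy path after the patch.

```cpp
#include <type_traits>

// Hypothetical helper combining the three checks from the patch:
// trivially copyable (memcpy is allowed), trivially default constructible
// (default construction is valid and does no work), and copy assignable
// (needed where memcpy cannot be used, e.g. during constant evaluation).
template<typename T>
struct can_default_init_then_copy
: std::integral_constant<bool,
    std::is_trivially_copyable<T>::value
    && std::is_trivially_default_constructible<T>::value
    && std::is_copy_assignable<T>::value>
{ };

// Deleted default constructor: cannot be default-constructed at all.
struct NoDefaultCtor
{
  NoDefaultCtor() = delete;
  NoDefaultCtor(int) { }
};

// Deleted copy assignment: cannot be filled by element-wise assignment.
struct NoAssignment
{
  void operator=(const NoAssignment&) = delete;
};

static_assert(can_default_init_then_copy<int>::value, "int qualifies");
static_assert(!can_default_init_then_copy<NoDefaultCtor>::value,
              "deleted default constructor must be rejected");
static_assert(!can_default_init_then_copy<NoAssignment>::value,
              "deleted assignment must be rejected");
```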



Re: [PATCH] Build: Set gcc_cv_as_mips_explicit_relocs if gcc_cv_as_mips_explicit_relocs_pcrel

2024-06-21 Thread Richard Sandiford
YunQiang Su  writes:
> We check gcc_cv_as_mips_explicit_relocs if 
> gcc_cv_as_mips_explicit_relocs_pcrel
> only, while gcc_cv_as_mips_explicit_relocs is used by later code.
>
> Maybe, it is time for us to set gcc_cv_as_mips_explicit_relocs always now,
> as it has been in Binutils for more than 20 years.

Yeah, agreed FWIW.  This was necessary while the feature was relatively
new, and while we still supported IRIX as, but I can't see any reasonable
justification for using such an ancient binutils with modern GCC.

Getting rid of -mno-explicit-relocs altogether might simplify things.

Richard

>
> gcc
>   * configure.ac: Set gcc_cv_as_mips_explicit_relocs if
>   gcc_cv_as_mips_explicit_relocs_pcrel.
>   * configure: Regenerate.
> ---
>  gcc/configure| 2 ++
>  gcc/configure.ac | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/gcc/configure b/gcc/configure
> index 9dc0b65dfaa..ad998105da3 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -30278,6 +30278,8 @@ $as_echo "#define MIPS_EXPLICIT_RELOCS 
> MIPS_EXPLICIT_RELOCS_BASE" >>confdefs.h
>  
>  fi
>  
> +else
> +  gcc_cv_as_mips_explicit_relocs=yes
>  fi
>  
>  if test x$gcc_cv_as_mips_explicit_relocs = xno; then \
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index b2243e9954a..c51d3ca5f1b 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -5255,6 +5255,8 @@ LCF0:
>  [lw $4,%gp_rel(foo)($4)],,
>[AC_DEFINE(MIPS_EXPLICIT_RELOCS, MIPS_EXPLICIT_RELOCS_BASE,
>[Define if assembler supports %reloc.])])
> +else
> +  gcc_cv_as_mips_explicit_relocs=yes
>  fi
>  
>  if test x$gcc_cv_as_mips_explicit_relocs = xno; then \


Re: [PATCH 2/3] libstdc++: Add deprecation warnings to types

2024-06-21 Thread Jonathan Wakely
Pushed to trunk now.

On Thu, 20 Jun 2024 at 16:38, Jonathan Wakely  wrote:
>
> Tested x86_64-linux.
>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/backward/backward_warning.h: Adjust comments to
> suggest <spanstream> as another alternative to <strstream>.
> * include/backward/strstream (strstreambuf, istrstream)
> (ostrstream, strstream): Add deprecated attribute.
> ---
>  .../include/backward/backward_warning.h   | 12 +++
>  libstdc++-v3/include/backward/strstream   | 20 +++
>  2 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/libstdc++-v3/include/backward/backward_warning.h 
> b/libstdc++-v3/include/backward/backward_warning.h
> index 3f3330327d4..834fc5680cc 100644
> --- a/libstdc++-v3/include/backward/backward_warning.h
> +++ b/libstdc++-v3/include/backward/backward_warning.h
> @@ -40,10 +40,14 @@
>A list of valid replacements is as follows:
>
>Use: Instead of:
> -  , basic_stringbuf   , strstreambuf
> -  , basic_istringstream   , istrstream
> -  , basic_ostringstream   , ostrstream
> -  , basic_stringstream, strstream
> +  , stringbuf
> +or , spanbuf   , strstreambuf
> +  , istringstream
> +or , ispanstream   , istrstream
> +  , ostringstream
> +or , ospanstream   , ostrstream
> +  , stringstream
> +or , spanstream, strstream
>, unordered_set   , hash_set
>, unordered_multiset  , hash_multiset
>, unordered_map   , hash_map
> diff --git a/libstdc++-v3/include/backward/strstream 
> b/libstdc++-v3/include/backward/strstream
> index 152e93767f6..5e421143385 100644
> --- a/libstdc++-v3/include/backward/strstream
> +++ b/libstdc++-v3/include/backward/strstream
> @@ -57,6 +57,12 @@ namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
> +#if __glibcxx_spanstream
> +# define _GLIBCXX_STRSTREAM_DEPR(A, B) _GLIBCXX_DEPRECATED_SUGGEST(A "' or 
> '" B)
> +#else
> +# define _GLIBCXX_STRSTREAM_DEPR(A, B) _GLIBCXX_DEPRECATED_SUGGEST(A)
> +#endif
> +
>// Class strstreambuf, a streambuf class that manages an array of char.
>// Note that this class is not a template.
>class strstreambuf : public basic_streambuf >
> @@ -151,7 +157,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  bool _M_dynamic  : 1;
>  bool _M_frozen   : 1;
>  bool _M_constant : 1;
> -  };
> +  } _GLIBCXX_STRSTREAM_DEPR("std::stringbuf", "std::spanbuf");
> +
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
>
>// Class istrstream, an istream that manages a strstreambuf.
>class istrstream : public basic_istream
> @@ -176,7 +185,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>private:
>  strstreambuf _M_buf;
> -  };
> +  } _GLIBCXX_STRSTREAM_DEPR("std::istringstream", "std::ispanstream");
>
>// Class ostrstream
>class ostrstream : public basic_ostream
> @@ -201,7 +210,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>private:
>  strstreambuf _M_buf;
> -  };
> +  } _GLIBCXX_STRSTREAM_DEPR("std::ostringstream", "std::ospanstream");
>
>// Class strstream
>class strstream : public basic_iostream
> @@ -231,7 +240,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>private:
>  strstreambuf _M_buf;
> -  };
> +  } _GLIBCXX_STRSTREAM_DEPR("std::stringstream", "std::spanstream");
> +
> +#undef _GLIBCXX_STRSTREAM_DEPR
> +#pragma GCC diagnostic pop
>
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace
> --
> 2.45.2
>



Re: [PATCH 1/3] libstdc++: Add [[deprecated]] to std::wstring_convert and std::wbuffer_convert

2024-06-21 Thread Jonathan Wakely
Pushed to trunk now.

On Thu, 20 Jun 2024 at 16:38, Jonathan Wakely  wrote:
>
> Tested x86_64-linux.
>
> -- >8 --
>
> These were deprecated in C++17 and std::wstring_convert is planned for
> removal in C++26.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/locale_conv.h (wstring_convert): Add deprecated
> attribute for C++17 and later.
> (wbuffer_convert): Likewise.
> * testsuite/22_locale/codecvt/codecvt_utf16/79980.cc: Disable
> deprecated warnings.
> * testsuite/22_locale/codecvt/codecvt_utf8/79980.cc: Likewise.
> * testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc:
> Likewise.
> * testsuite/22_locale/conversions/buffer/1.cc: Add dg-warning.
> * testsuite/22_locale/conversions/buffer/2.cc: Likewise.
> * testsuite/22_locale/conversions/buffer/3.cc: Likewise.
> * testsuite/22_locale/conversions/buffer/requirements/typedefs.cc:
> Likewise.
> * testsuite/22_locale/conversions/string/1.cc: Likewise.
> * testsuite/22_locale/conversions/string/2.cc: Likewise.
> * testsuite/22_locale/conversions/string/3.cc: Likewise.
> * testsuite/22_locale/conversions/string/66441.cc: Likewise.
> * testsuite/22_locale/conversions/string/requirements/typedefs-2.cc:
> Likewise.
> * testsuite/22_locale/conversions/string/requirements/typedefs.cc:
> Likewise.
> ---
>  libstdc++-v3/include/bits/locale_conv.h  | 5 +++--
>  .../testsuite/22_locale/codecvt/codecvt_utf16/79980.cc   | 1 +
>  .../testsuite/22_locale/codecvt/codecvt_utf8/79980.cc| 1 +
>  .../testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc  | 1 +
>  libstdc++-v3/testsuite/22_locale/conversions/buffer/1.cc | 1 +
>  libstdc++-v3/testsuite/22_locale/conversions/buffer/2.cc | 1 +
>  libstdc++-v3/testsuite/22_locale/conversions/buffer/3.cc | 2 ++
>  .../22_locale/conversions/buffer/requirements/typedefs.cc| 2 +-
>  libstdc++-v3/testsuite/22_locale/conversions/string/1.cc | 1 +
>  libstdc++-v3/testsuite/22_locale/conversions/string/2.cc | 1 +
>  libstdc++-v3/testsuite/22_locale/conversions/string/3.cc | 1 +
>  libstdc++-v3/testsuite/22_locale/conversions/string/66441.cc | 1 +
>  .../22_locale/conversions/string/requirements/typedefs-2.cc  | 1 +
>  .../22_locale/conversions/string/requirements/typedefs.cc| 1 +
>  14 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/locale_conv.h 
> b/libstdc++-v3/include/bits/locale_conv.h
> index 754c36b92b8..63dee1ac872 100644
> --- a/libstdc++-v3/include/bits/locale_conv.h
> +++ b/libstdc++-v3/include/bits/locale_conv.h
> @@ -259,7 +259,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>templatetypename _Wide_alloc = allocator<_Elem>,
>typename _Byte_alloc = allocator>
> -class wstring_convert
> +class _GLIBCXX17_DEPRECATED wstring_convert
>  {
>  public:
>typedef basic_string, _Byte_alloc>   
> byte_string;
> @@ -406,7 +406,8 @@ _GLIBCXX_END_NAMESPACE_CXX11
>/// Buffer conversions
>templatetypename _Tr = char_traits<_Elem>>
> -class wbuffer_convert : public basic_streambuf<_Elem, _Tr>
> +class _GLIBCXX17_DEPRECATED wbuffer_convert
> +: public basic_streambuf<_Elem, _Tr>
>  {
>typedef basic_streambuf<_Elem, _Tr> _Wide_streambuf;
>
> diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/79980.cc 
> b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/79980.cc
> index 1c6711b56db..90cb844bba3 100644
> --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/79980.cc
> +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/79980.cc
> @@ -16,6 +16,7 @@
>  // .
>
>  // { dg-do run { target c++11 } }
> +// { dg-additional-options "-Wno-deprecated-declarations" { target c++17 } }
>
>  #include 
>  #include 
> diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/79980.cc 
> b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/79980.cc
> index d6cd89ce420..66aadc1161d 100644
> --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/79980.cc
> +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/79980.cc
> @@ -16,6 +16,7 @@
>  // .
>
>  // { dg-do run { target c++11 } }
> +// { dg-additional-options "-Wno-deprecated-declarations" { target c++17 } }
>
>  #include 
>  #include 
> diff --git 
> a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc 
> b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc
> index d09aa41ee93..e6934818864 100644
> --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc
> +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/79511.cc
> @@ -16,6 +16,7 @@
>  // .
>
>  // { dg-do run { target c++11 } }
> +// { dg-additional-options "-Wno-deprecated-declarations

Re: [PATCH 3/3] libstdc++: Undeprecate std::pmr::polymorphic_allocator::destroy (P2875R4)

2024-06-21 Thread Jonathan Wakely
Pushed to trunk now.

On Thu, 20 Jun 2024 at 16:41, Jonathan Wakely  wrote:
>
> Tested x86_64-linux.
>
> -- >8 --
>
> This member function was previously deprecated, but that was reverted by
> P2875R4, approved earlier this year in Tokyo. Since it's not going to be
> deprecated in C++26, and so presumably not removed, there is no point in
> giving deprecated warnings for C++23 mode.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/memory_resource.h (polymorphic_allocator::destroy):
> Remove deprecated attribute.
> ---
>  libstdc++-v3/include/bits/memory_resource.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/memory_resource.h 
> b/libstdc++-v3/include/bits/memory_resource.h
> index 022371245c1..5f50b296df7 100644
> --- a/libstdc++-v3/include/bits/memory_resource.h
> +++ b/libstdc++-v3/include/bits/memory_resource.h
> @@ -305,7 +305,6 @@ namespace pmr
>  #endif
>
>template
> -   _GLIBCXX20_DEPRECATED_SUGGEST("allocator_traits::destroy")
> __attribute__((__nonnull__))
> void
> destroy(_Up* __p)
> --
> 2.45.2
>



Re: Re: [PATCH 2/3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-06-21 Thread juzhe.zh...@rivai.ai
I see, it's operator== overloaded.

LGTM.



juzhe.zh...@rivai.ai
 
From: wangf...@eswincomputing.com
Date: 2024-06-21 17:03
To: juzhe.zhong; gcc-patches
CC: kito.cheng; jinma.contrib
Subject: Re: Re: [PATCH 2/3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic
On 2024-06-21 12:24  juzhe.zhong  wrote:
>
>+  if (*group.shape == shapes::loadstore
>+  || *group.shape == shapes::indexed_loadstore
>+  || *group.shape == shapes::vundefined
>+  || *group.shape == shapes::misc
>+  || *group.shape == shapes::vset
>+  || *group.shape == shapes::vget
>+  || *group.shape == shapes::vcreate
>+  || *group.shape == shapes::fault_load
>+  || *group.shape == shapes::seg_loadstore
>+  || *group.shape == shapes::seg_indexed_loadstore
>+  || *group.shape == shapes::seg_fault_load)
>+return true;
>
>I prefer to use switch-case:
>
>switch
>case...
>return true
>default
>return false;
>
>
>juzhe.zh...@rivai.ai
> 
 
I tried your suggestion, but this type (function_shape) can't be used in a
switch-case statement; doing so would require adding more code.
If you have a better method, please don't hesitate to share it.
Thanks.
 
>From: Feng Wang
>Date: 2024-06-21 09:54
>To: gcc-patches
>CC: kito.cheng; juzhe.zhong; jinma.contrib; Feng Wang
>Subject: [PATCH 2/3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic
>According to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic
>functions are added by this patch.
>
>gcc/ChangeLog:
>
>* config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f):
>Add 'Zvfbfmin' intrinsic in bases.
>(class vfwcvtbf16_f): Ditto.
>(class vfwmaccbf16): Add 'Zvfbfwma' intrinsic in bases.
>(BASE): Add BASE macro for 'Zvfbfmin' and 'Zvfbfwma'.
>* config/riscv/riscv-vector-builtins-bases.h: Add declaration for 'Zvfbfmin' 
>and 'Zvfbfwma'.
>* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
>Add builtins def for 'Zvfbfmin' and 'Zvfbfwma'.
>(vfncvtbf16_f): Ditto.
>(vfncvtbf16_f_frm): Ditto.
>(vfwcvtbf16_f): Ditto.
>(vfwmaccbf16): Ditto.
>(vfwmaccbf16_frm): Ditto.
>* config/riscv/riscv-vector-builtins-shapes.cc (supports_vectype_p):
>Add vector intrinsic build judgment for BFloat16.
>(build_all): Ditto.
>(BASE_NAME_MAX_LEN): Adjust max length.
>* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F32_OPS):
>Add new operand type for BFloat16.
>(vfloat32mf2_t): Ditto.
>(vfloat32m1_t): Ditto.
>(vfloat32m2_t): Ditto.
>(vfloat32m4_t): Ditto.
>(vfloat32m8_t): Ditto.
>* config/riscv/riscv-vector-builtins.cc (DEF_RVV_F32_OPS): Ditto.
>(validate_instance_type_required_extensions):
>Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
>* config/riscv/riscv-vector-builtins.h (enum required_ext):
>Add required_ext declaration for 'Zvfbfmin' and 'Zvfbfwma'.
>(reqired_ext_to_isa_name): Ditto.
>(required_extensions_specified): Ditto.
>(struct function_group_info): Add match case for 'Zvfbfmin' and 'Zvfbfwma'.
>* config/riscv/riscv.cc (riscv_validate_vector_type):
>Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
>
>---
>.../riscv/riscv-vector-builtins-bases.cc  | 69 +++
>.../riscv/riscv-vector-builtins-bases.h   |  7 ++
>.../riscv/riscv-vector-builtins-functions.def | 15 
>.../riscv/riscv-vector-builtins-shapes.cc | 37 --
>.../riscv/riscv-vector-builtins-types.def | 13 
>gcc/config/riscv/riscv-vector-builtins.cc | 64 +
>gcc/config/riscv/riscv-vector-builtins.h  | 14 
>gcc/config/riscv/riscv.cc |  5 +-
>8 files changed, 218 insertions(+), 6 deletions(-)
>
>diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
>b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>index b6f6e4ff37e..b10a83ab1fd 100644
>--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
>+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>@@ -2424,6 +2424,60 @@ public:
>   }
>};
>+/* Implements vfncvtbf16_f. */
>+template
>+class vfncvtbf16_f : public function_base
>+{
>+public:
>+  bool has_rounding_mode_operand_p () const override
>+  {
>+return FRM_OP == HAS_FRM;
>+  }
>+
>+  bool may_require_frm_p () const override { return true; }
>+
>+  rtx expand (function_expander &e) const override
>+  {
>+return e.use_exact_insn (code_for_pred_trunc_to_bf16 (e.vector_mode ()));
>+  }
>+};
>+
>+/* Implements vfwcvtbf16_f. */
>+class vfwcvtbf16_f : public function_base
>+{
>+public:
>+  rtx expand (function_expander &e) const override
>+  {
>+return e.use_exact_insn (code_for_pred_extend_bf16_to (e.vector_mode ()));
>+  }
>+};
>+
>+/* Implements vfwmaccbf16. */
>+template
>+class vfwmaccbf16 : public function_base
>+{
>+public:
>+  bool has_rounding_mode_operand_p () const override
>+  {
>+return FRM_OP == HAS_FRM;
>+  }
>+
>+  bool may_require_frm_p () const override { return true; }
>+
>+  bool has_merge_operand_p () const override { return false; }
>+
>+  rtx expand (function_expander &e) const override
>+  {
>+if (e.op_info->op == OP_TYPE_vf)
>

[committed] libstdc++: Qualify calls in to prevent ADL

2024-06-21 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Probably worth backporting.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h (uninitialized_default_construct)
(uninitialized_default_construct_n, uninitialized_value_construct)
(uninitialized_value_construct_n): Qualify calls to prevent ADL.
---
 libstdc++-v3/include/bits/stl_uninitialized.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h 
b/libstdc++-v3/include/bits/stl_uninitialized.h
index 7f84da31578..3c405d8fbe8 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -975,7 +975,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 uninitialized_default_construct(_ForwardIterator __first,
_ForwardIterator __last)
 {
-  __uninitialized_default_novalue(__first, __last);
+  std::__uninitialized_default_novalue(__first, __last);
 }
 
   /**
@@ -989,7 +989,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline _ForwardIterator
 uninitialized_default_construct_n(_ForwardIterator __first, _Size __count)
 {
-  return __uninitialized_default_novalue_n(__first, __count);
+  return std::__uninitialized_default_novalue_n(__first, __count);
 }
 
   /**
@@ -1003,7 +1003,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 uninitialized_value_construct(_ForwardIterator __first,
  _ForwardIterator __last)
 {
-  return __uninitialized_default(__first, __last);
+  return std::__uninitialized_default(__first, __last);
 }
 
   /**
@@ -1017,7 +1017,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline _ForwardIterator
 uninitialized_value_construct_n(_ForwardIterator __first, _Size __count)
 {
-  return __uninitialized_default_n(__first, __count);
+  return std::__uninitialized_default_n(__first, __count);
 }
 
   /**
-- 
2.45.2
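
For readers following along, one hazard of unqualified calls to internal helpers can be sketched as below. This is only an illustration, not the libstdc++ code: the namespaces `lib` and `user` and the name `detail_helper` are made up, and the commit message does not say which ADL problem motivated the change.

```cpp
#include <cassert>

namespace lib {
  // Hypothetical internal helper, standing in for a reserved-name function.
  template <typename T> int detail_helper(T&) { return 1; }

  // Unqualified call: argument-dependent lookup also searches T's namespace.
  template <typename T> int construct_unqualified(T& t) { return detail_helper(t); }

  // Qualified call: only lib::detail_helper is considered.
  template <typename T> int construct_qualified(T& t) { return lib::detail_helper(t); }
}

namespace user {
  struct Widget {};
  // Non-template overload in the argument's namespace; overload resolution
  // prefers it over the template when ADL makes it visible.
  inline int detail_helper(Widget&) { return 2; }
}

inline void self_check() {
  user::Widget w;
  assert(lib::construct_unqualified(w) == 2);  // ADL picked user::detail_helper
  assert(lib::construct_qualified(w) == 1);    // qualification suppresses ADL
}
```

Qualifying the call, as the patch does with `std::`, keeps lookup confined to the library's own namespace regardless of what the argument types bring along.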



Re: [PATCH] libstdc++: Make std::any_cast<void> ill-formed (LWG 3305)

2024-06-21 Thread Jonathan Wakely
Pushed to trunk now. I might backport this later too.


On Thu, 20 Jun 2024 at 16:39, Jonathan Wakely  wrote:
>
> Tested x86_64-linux.
>
> -- >8 --
>
> LWG 3305 was approved earlier this year in Tokyo. We need to give an
> error if using std::any_cast<void>, but std::any_cast<void()> is valid
> (but always returns null).
>
> libstdc++-v3/ChangeLog:
>
> * include/std/any (any_cast(any*), any_cast(const any*)): Add
> static assertion to reject void types, as per LWG 3305.
> * testsuite/20_util/any/misc/lwg3305.cc: New test.
> ---
>  libstdc++-v3/include/std/any  |  8 
>  .../testsuite/20_util/any/misc/lwg3305.cc | 15 +++
>  2 files changed, 23 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/20_util/any/misc/lwg3305.cc
>
> diff --git a/libstdc++-v3/include/std/any b/libstdc++-v3/include/std/any
> index 690ddc2aa57..e4709b1ce04 100644
> --- a/libstdc++-v3/include/std/any
> +++ b/libstdc++-v3/include/std/any
> @@ -554,6 +554,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template<typename _ValueType>
>  inline const _ValueType* any_cast(const any* __any) noexcept
>  {
> +  // _GLIBCXX_RESOLVE_LIB_DEFECTS
> +  // 3305. any_cast<void>
> +  static_assert(!is_void_v<_ValueType>);
> +
> +  // As an optimization, don't bother instantiating __any_caster for
> +  // function types, since std::any can only hold objects.
>if constexpr (is_object_v<_ValueType>)
> if (__any)
>   return static_cast<_ValueType*>(__any_caster<_ValueType>(__any));
> @@ -563,6 +569,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template<typename _ValueType>
>  inline _ValueType* any_cast(any* __any) noexcept
>  {
> +  static_assert(!is_void_v<_ValueType>);
> +
>if constexpr (is_object_v<_ValueType>)
> if (__any)
>   return static_cast<_ValueType*>(__any_caster<_ValueType>(__any));
> diff --git a/libstdc++-v3/testsuite/20_util/any/misc/lwg3305.cc 
> b/libstdc++-v3/testsuite/20_util/any/misc/lwg3305.cc
> new file mode 100644
> index 000..49f5d747ab3
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/any/misc/lwg3305.cc
> @@ -0,0 +1,15 @@
> +// { dg-do compile { target c++17 } }
> +
> +// LWG 3305. any_cast<void>
> +
> +#include <any>
> +
> +void
> +test_lwg3305()
> +{
> +  std::any a;
> +  (void) std::any_cast<void>(&a); // { dg-error "here" }
> +  const std::any a2;
> +  (void) std::any_cast<void>(&a2); // { dg-error "here" }
> +}
> +// { dg-error "static assertion failed" "" { target *-*-* } 0 }
> --
> 2.45.2
>
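
A small sketch of the behaviour the patch leaves in place, assuming a C++17 compiler. The pointer form of any_cast on a function type stays well-formed but can only return null, since std::any holds objects; the void case is what the new static_assert on is_void_v rejects. The helper names here are made up for illustration.

```cpp
#include <any>
#include <cassert>

// A function type can never be the contained type, so this is always null.
inline bool function_type_cast_is_null() {
  std::any a = 42;
  return std::any_cast<void()>(&a) == nullptr;
}

// The ordinary object-type case for comparison.
inline bool object_type_cast_works() {
  std::any a = 42;
  const int* p = std::any_cast<int>(&a);
  return p != nullptr && *p == 42;
}

inline void self_check() {
  assert(function_type_cast_is_null());
  assert(object_type_cast_works());
  // A cast to void through the pointer overloads is the case that the
  // static_assert in the patch turns into a compile-time error, so it
  // cannot be exercised at run time here.
}
```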



Re: [PATCH 6/6] Add a late-combine pass [PR106594]

2024-06-21 Thread Richard Biener
On Fri, Jun 21, 2024 at 10:21 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > [...]
> > I wonder if you can amend doc/passes.texi, specifically noting differences
> > between fwprop, combine and late-combine?
>
> Ooh, we have a doc/passes.texi? :)  Somehow missed that.

Yeah, I also usually forget this.

> How about the patch below?

Thanks - looks good to me.

Richard.

> Thanks,
> Richard
>
>
> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> index 5746d3ec636..4ac7a2306a1 100644
> --- a/gcc/doc/passes.texi
> +++ b/gcc/doc/passes.texi
> @@ -991,6 +991,25 @@ RTL expressions for the instructions by substitution, 
> simplifies the
>  result using algebra, and then attempts to match the result against
>  the machine description.  The code is located in @file{combine.cc}.
>
> +@item Late instruction combination
> +
> +This pass attempts to do further instruction combination, on top of
> +that performed by @file{combine.cc}.  Its current purpose is to
> +substitute definitions into all uses simultaneously, so that the
> +definition can be removed.  This differs from the forward propagation
> +pass, whose purpose is instead to simplify individual uses on the
> +assumption that the definition will remain.  It differs from
> +@file{combine.cc} in that there is no hard-coded limit on the number
> +of instructions that can be combined at once.  It also differs from
> +@file{combine.cc} in that it can move instructions, where necessary.
> +
> +However, the pass is not in principle limited to this form of
> +combination.  It is intended to be a home for other, future
> +combination approaches as well.
> +
> +The pass runs twice, once before register allocation and once after
> +register allocation.  The code is located in @file{late-combine.cc}.
> +
>  @item Mode switching optimization
>
>  This pass looks for instructions that require the processor to be in a


Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

2024-06-21 Thread Richard Biener
On Fri, Jun 21, 2024 at 10:50 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > to match this by changing it to
>
> > /* Unsigned saturation sub, case 2 (branch with ge):
> >SAT_U_SUB = X >= Y ? X - Y : 0.  */
> > (match (unsigned_integer_sat_sub @0 @1)
> > (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
> >  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >   && types_match (type, @0, @1
>
> Do we need another name for this matching? Adding (convert? here may change the 
> semantics of .SAT_SUB.
> When we call gimple_unsigned_integer_sat_sub (lhs, ops, NULL), the converted 
> value returned may be different
> from the (minus @0 @1). Please correct me if my understanding is wrong.

I think gimple_unsigned_integer_sat_sub (lhs, ...) simply matches
(typeof LHS).SAT_SUB (ops[0], ops[1]) now; I don't think it's necessary to
handle the case where typeof LHS and typeof ops[0] are equal specially?

> > and when using the gimple_match_* function make sure to consider
> > that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
> > we matched?
>
> This may be a problem for the vector part, I guess; it required some additional 
> change in vectorize_convert when
> I tried to do that previously. Let me double check it, and keep you 
> posted.

You are using gimple_unsigned_integer_sat_sub from pattern recognition, the
thing to do is simply to add a conversion stmt to the pattern sequence in case
the types differ?

But maybe I'm missing something.

Richard.
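
To make the equivalence under discussion concrete, here is a scalar sketch (not GCC code; the function names are made up). It compares the shape left by ifcvt, where the truncation sits inside the selected arm, with a full-width saturating subtract followed by a conversion to the narrow type.

```cpp
#include <cassert>
#include <cstdint>

// Full-width unsigned saturating subtraction: X >= Y ? X - Y : 0.
constexpr std::uint32_t sat_sub_u32(std::uint32_t a, std::uint32_t b) {
  return a >= b ? a - b : 0u;
}

// Shape after ifcvt: the subtraction result is truncated first, then selected.
constexpr std::uint16_t truncated_in_arm(std::uint32_t a, std::uint32_t b) {
  const std::uint16_t narrowed = static_cast<std::uint16_t>(a - b);
  return a >= b ? narrowed : static_cast<std::uint16_t>(0);
}

// Suggested shape: saturating subtract at full width, then convert.
constexpr std::uint16_t truncated_after(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint16_t>(sat_sub_u32(a, b));
}

inline void self_check() {
  // 70000 - 1 = 69999, which truncates to 69999 - 65536 = 4463.
  assert(truncated_in_arm(70000u, 1u) == 4463);
  assert(truncated_after(70000u, 1u) == 4463);
  // Saturating case: both forms give 0.
  assert(truncated_in_arm(5u, 9u) == 0);
  assert(truncated_after(5u, 9u) == 0);
}
```

Both forms agree because truncating zero gives zero, so the conversion can be moved outside the select without changing the result.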

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 21, 2024 3:00 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB
>
> On Fri, Jun 21, 2024 at 5:53 AM  wrote:
> >
> > From: Pan Li 
> >
> > The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> > truncated as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> > a = *--p;
> > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate the result of SAT_SUB
> >   } while (--n);
> > }
> >
> > It will have gimple after ifcvt pass,  it cannot hit any pattern of
> > SAT_SUB and then cannot vectorize to SAT_SUB.
> >
> > _2 = a_11 - b_12(D);
> > iftmp.0_13 = (short unsigned int) _2;
> > _18 = a_11 >= b_12(D);
> > iftmp.0_5 = _18 ? iftmp.0_13 : 0;
> >
> > This patch would like to do some reconcile for above pattern to match
> > the SAT_SUB pattern.  Then the underlying vect pass is able to vectorize
> > the SAT_SUB.
>
> Hmm.  I was thinking of allowing
>
> /* Unsigned saturation sub, case 2 (branch with ge):
>SAT_U_SUB = X >= Y ? X - Y : 0.  */
> (match (unsigned_integer_sat_sub @0 @1)
>  (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
>  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>   && types_match (type, @0, @1
>
> to match this by changing it to
>
> /* Unsigned saturation sub, case 2 (branch with ge):
>SAT_U_SUB = X >= Y ? X - Y : 0.  */
> (match (unsigned_integer_sat_sub @0 @1)
>  (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
>  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>   && types_match (type, @0, @1
>
> and when using the gimple_match_* function make sure to consider
> that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
> we matched?
>
> Richard.
>
> > _2 = a_11 - b_12(D);
> > _18 = a_11 >= b_12(D);
> > _pattmp = _18 ? _2 : 0; // .SAT_SUB pattern
> > iftmp.0_13 = (short unsigned int) _pattmp;
> > iftmp.0_5 = iftmp.0_13;
> >
> > The below tests are running for this patch.
> > 1. The rv64gcv fully regression tests.
> > 2. The rv64gcv build with glibc.
> > 3. The x86 bootstrap tests.
> > 4. The x86 fully regression tests.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add new match for truncated unsigned sat_sub.
> > * tree-if-conv.cc (gimple_truncated_unsigned_integer_sat_sub):
> > New external decl from match.pd.
> > (tree_if_cond_reconcile_unsigned_integer_sat_sub): New func impl
> > to reconcile the truncated sat_sub pattern.
> > (tree_if_cond_reconcile): New func impl to reconcile.
> > (pass_if_conversion::execute): Try to reconcile after ifcvt.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/match.pd|  9 +
> >  gcc/tree-if-conv.cc | 83 +
> >  2 files changed, 92 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 3d0689c9312..9617a5f9d5e 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3210,6 +3210,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >&& types_match (type, @0, @1
> >
> > +/* Unsigned saturation sub and then truncated, aka:
> > +   Truncated = X >= Y ? (Other Type) (X - Y) : 0.
> > + */
> > +(mat

Re: [PATCH 4/4] libstdc++: Remove std::__is_pointer and std::__is_scalar [PR115497]

2024-06-21 Thread Jonathan Wakely
Oops, this patch series actually depends on
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655267.html which
was posted separately, but needs to be applied before 4/4 in this
series.

On Thu, 20 Jun 2024 at 16:35, Jonathan Wakely  wrote:
>
> We still have __is_arithmetic in  after this,
> but that needs a lot more work to remove its uses from  and
> .
>
> Tested x86_64-linux.
>
> -- >8 --
>
> This removes the std::__is_pointer and std::__is_scalar traits, as they
> conflict with a Clang built-in.
>
> Although Clang has a hack to make the class templates work despite using
> reserved names, removing these class templates will allow that hack to
> be dropped at some future date.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/115497
> * include/bits/cpp_type_traits.h (__is_pointer, __is_scalar):
> Remove.
> (__is_arithmetic): Do not use __is_pointer in the primary
> template. Add partial specialization for pointers.
> ---
>  libstdc++-v3/include/bits/cpp_type_traits.h | 33 -
>  1 file changed, 33 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
> b/libstdc++-v3/include/bits/cpp_type_traits.h
> index 4d83b9472e6..abe0c7603e3 100644
> --- a/libstdc++-v3/include/bits/cpp_type_traits.h
> +++ b/libstdc++-v3/include/bits/cpp_type_traits.h
> @@ -343,31 +343,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
>  };
>  #endif
>
> -  //
> -  // Pointer types
> -  //
> -#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> -  template
> -struct __is_pointer : __truth_type<_IsPtr>
> -{
> -  enum { __value = _IsPtr };
> -};
> -#else
> -  template
> -struct __is_pointer
> -{
> -  enum { __value = 0 };
> -  typedef __false_type __type;
> -};
> -
> -  template
> -struct __is_pointer<_Tp*>
> -{
> -  enum { __value = 1 };
> -  typedef __true_type __type;
> -};
> -#endif
> -
>//
>// An arithmetic type is an integer type or a floating point type
>//
> @@ -376,14 +351,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
>  : public __traitor<__is_integer<_Tp>, __is_floating<_Tp> >
>  { };
>
> -  //
> -  // A scalar type is an arithmetic type or a pointer type
> -  //
> -  template
> -struct __is_scalar
> -: public __traitor<__is_arithmetic<_Tp>, __is_pointer<_Tp> >
> -{ };
> -
>//
>// For use in std::copy and std::find overloads for streambuf iterators.
>//
> --
> 2.45.2
>



Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-21 Thread Richard Biener
On Thu, 20 Jun 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Mon, 17 Jun 2024, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Fri, 14 Jun 2024, Richard Biener wrote:
> >> >
> >> >> On Fri, 14 Jun 2024, Richard Sandiford wrote:
> >> >> 
> >> >> > Richard Biener  writes:
> >> >> > > On Fri, 14 Jun 2024, Richard Sandiford wrote:
> >> >> > >
> >> >> > >> Richard Biener  writes:
> >> >> > >> > The following retires vcond{,u,eq} optabs by stopping to use them
> >> >> > >> > from the middle-end.  Targets instead (should) implement 
> >> >> > >> > vcond_mask
> >> >> > >> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> >> >> > >> > possibly affected targets - those implementing these patterns,
> >> >> > >> > and in particular it lists mips, sparc and ia64 as targets that
> >> >> > >> > most definitely will regress while others might simply remove
> >> >> > >> > their vcond{,u,eq} patterns.
> >> >> > >> >
> >> >> > >> > I'd appreciate testing, I do not expect fallout for x86 or 
> >> >> > >> > arm/aarch64.
> >> >> > >> > I know riscv doesn't implement any of the legacy optabs.  But 
> >> >> > >> > less
> >> >> > >> > maintained vector targets might need adjustments.
> >> >> > >> >
> >> >> > >> > I want to get rid of those optabs for GCC 15.  If I don't hear 
> >> >> > >> > from
> >> >> > >> > you I will assume your target is fine.
> >> >> > >> 
> >> >> > >> Great!  Thanks for doing this.
> >> >> > >> 
> >> >> > >> Is there a plan for how we should handle vector comparisons that
> >> >> > >> have to be done as the inverse of the negated condition?  Should
> >> >> > >> targets simply not provide vec_cmp for such conditions and leave
> >> >> > >> the target-independent code to deal with the fallout?  (For a
> >> >> > >> standalone comparison, it would invert the result.  For a 
> >> >> > >> VEC_COND_EXPR
> >> >> > >> it would swap the true and false values.)
> >> >> > >
> >> >> > > I would expect that the ISEL pass which currently deals with finding
> >> >> > > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
> >> >> > > So how do we deal with this right now?  I expect RTL expansion will
> >> >> > > do the inverse trick, no?
> >> >> > 
> >> >> > I think in practice (at least for the targets I've worked on),
> >> >> > the target's vec_cmp handles the inversion itself.  Thus the
> >> >> > main optimisation done by targets' vcond patterns is to avoid
> >> >> > the inversion (and instead swap the true/false values) when the
> >> >> > "opposite" comparison is the native one.
> >> >> 
> >> >> I see.  I suppose whether or not vec_cmp is handled is determined
> >> >> by a FAIL so it's somewhat difficult to determine this at ISEL time.
> >> 
> >> In principle we could say that the predicates should accept only the
> >> conditions that can be done natively.  Then target-independent code
> >> can apply the usual approaches to generating other conditions
> >> (which tend to be replicated across targets anyway).
> >
> > Ah yeah, I suppose that would work.  So we'd update the docs
> > to say predicates are required to reject not handled compares
> > and otherwise the expander may not FAIL?
> >
> > I'll note that expand_vec_cmp_expr_p already looks at the insn
> > predicates, so adjusting vector lowering (and vectorization) to
> > emit only recognized compares (and requiring folding to keep it at that)
> > should be possible.
> >
> > ISEL would then mainly need to learn the trick of swapping vector
> > cond arms on inverted masks.  OTOH folding should also do that.
> 
> Yeah.
> 
> > Or do you suggest to allow all compares on GIMPLE and only fixup
> > during ISEL?  How do we handle vector lowering then?  Would it be
> > enough to require "any" condition code and thus we expect targets
> > to implement enough codes so all compares can be handled by
> > swapping/inversion?
> 
> I'm not sure TBH.  I can see the argument that "canonicalising"
> conditions for the target could be either vector lowering or ISEL.
> 
> If a target can only do == or != natively, for instance (is any target
> like that?), then I think it should be ok for the predicates to accept
> only that condition.  Then the opposite != or == could be done using
> vector lowering/ISEL, but ordered comparisons would need to be lowered
> as though vec_cmp wasn't implemented at all.
> 
> Something similar probably applies to FP comparisons if the handling
> of unordered comparisons is limited.
> 
> And if we do that, it might be easier for vector lowering to handle
> everything itself, rather than try to predict what ISEL is going to do.

I agree that as we have to handle completely unsupported cases in
vector lowering anyway it's reasonable to try to force only supported
ops after that.
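
As a scalar model of the "swap the arms" trick mentioned in the quoted discussion, the sketch below assumes a target whose only native compare is equality. The types and function names are invented for illustration and are not GCC internals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec = std::array<int, 4>;
using Mask = std::array<bool, 4>;

// Stand-in for the only compare the hypothetical target supports natively.
inline Mask cmp_eq(const Vec& a, const Vec& b) {
  Mask m{};
  for (std::size_t i = 0; i < a.size(); ++i) m[i] = (a[i] == b[i]);
  return m;
}

// Stand-in for a mask-driven vector select.
inline Vec vec_select(const Mask& m, const Vec& if_true, const Vec& if_false) {
  Vec r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = m[i] ? if_true[i] : if_false[i];
  return r;
}

// a != b ? t : f, done without a native NE: compute EQ and swap the arms
// instead of inverting the mask.
inline Vec select_ne_via_eq(const Vec& a, const Vec& b, const Vec& t, const Vec& f) {
  return vec_select(cmp_eq(a, b), f, t);
}

// Straightforward reference for comparison.
inline Vec select_ne_reference(const Vec& a, const Vec& b, const Vec& t, const Vec& f) {
  Vec r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = (a[i] != b[i]) ? t[i] : f[i];
  return r;
}

inline void self_check() {
  const Vec a{1, 2, 3, 4}, b{1, 0, 3, 0}, t{10, 20, 30, 40}, f{-1, -2, -3, -4};
  assert(select_ne_via_eq(a, b, t, f) == select_ne_reference(a, b, t, f));
}
```

A standalone compare has no arms to swap, which is why that case needs an explicit inversion of the mask instead.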

Note that when targets stop advertising unsupported compares, then
the vectorizer likely needs adjustments as well.  We can of course
put some common logic into the middle-end like making the
expand_vec_cmp_expr_p functi

Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]

2024-06-21 Thread Richard Earnshaw (lists)
On 21/06/2024 08:57, Alexandre Oliva wrote:
> On Jun 20, 2024, Christophe Lyon  wrote:
> 
>> Maybe using
>> if ((unsigned)b[i] >= BITS) \
>> would be clearer?
> 
> Heh.  Why make it simpler if we can make it unreadable, right? :-D
> 
> Thanks, here's another version I've just retested on x-arm-eabi.  Ok?
> 
> I'm not sure how to credit your suggestion.  It's not like you pretty
> much wrote the entire patch, as in Richard's case, but it's still a
> sizable chunk of this two-liner.  Any preferences?

How about mentioning Christophe's simplification in the commit log?
> 
> 
> The test was too optimistic, alas.  We used to vectorize shifts
> involving 8-bit and 16-bit integral types by clamping the shift count
> at the highest in-range shift count, but that was not correct: such
> narrow shifts expect integral promotion, so larger shift counts should
> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> before the fix).

This is OK, but you might wish to revisit this statement before committing.  I 
think the above is a mis-summary of the original bug report which had a test to 
pick between 0 and 1 as the result of a shift operation.

If I've understood what's going on here correctly, then we have 

(int16_t)32768 >> (int16_t) 16

but shift is always done at int precision, so this is (due to default 
promotions)

(int)(int16_t)32768 >> 16  // size/type of the shift amount does not matter.

which then simplifies to

-32768 >> 16;  // 0xffff8000 >> 16

= -1;

I think the original bug was that we were losing the cast to short (and hence 
the sign extension of the intermediate value), so effectively we simplified 
this to 

32768 >> 16; // 0x8000 >> 16

= 0;

And the other part of the observation was that it had to be done this way (and 
couldn't be narrowed for vectorization) because 16 is larger than the maximum 
shift for a short (actually you say that just below).

R.
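
The two results described above can be reproduced with a short sketch. It assumes a 32-bit int and an arithmetic right shift for negative values, which is what GCC provides (and what C++20 requires; earlier standards leave it implementation-defined). The function names are made up.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(int) >= 4, "sketch assumes int is at least 32 bits");

// What the language specifies: the int16_t operand is promoted to int
// (sign-extended) before the shift is performed.
inline int shift_with_promotion(std::int16_t value, int count) {
  return value >> count;
}

// What you get if the sign extension is lost and the 16 bits are treated
// as the unsigned pattern 0x8000.
inline int shift_without_sign_extension(std::int16_t value, int count) {
  return static_cast<int>(static_cast<std::uint16_t>(value)) >> count;
}

inline void self_check() {
  const std::int16_t x = INT16_MIN;                 // bit pattern 0x8000
  assert(shift_with_promotion(x, 16) == -1);        // -32768 >> 16
  assert(shift_without_sign_extension(x, 16) == 0); // 32768 >> 16
}
```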

> 
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
> 
> So the test that expected the incorrect clamping we no longer perform
> needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> suggested annotating the test with the expected ranges so as to enable
> the optimization.
> 
> 
> Co-Authored-By: Richard Earnshaw 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR tree-optimization/113281
>   * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..03078de49c65e 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -9,6 +9,8 @@
>    void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
>      int i; \
>      for (i=0; i<NB; i++) { \
> +      if ((unsigned)b[i] >= (unsigned)(BITS)) \
> +        __builtin_unreachable(); \
>        dest[i] = a[i] OP b[i]; \
>      } \
>  }
> 
> 



Re: [PATCH 05/52] rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-21 Thread Arthur Cohen

Hi,

Sorry about the delay in my answer! The patch looks good to me :) Will 
you push it as part of your patchset?


Kindly,

Arthur

On 6/3/24 05:00, Kewen Lin wrote:

Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type.  To be prepared for that, this
patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
in rust with TYPE_PRECISION of {float,{,long_}double}_type_node.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/rust/ChangeLog:

* rust-gcc.cc (float_type): Use TYPE_PRECISION of
{float,double,long_double}_type_node to replace
{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
---
  gcc/rust/rust-gcc.cc | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/rust/rust-gcc.cc b/gcc/rust/rust-gcc.cc
index f17e19a2dfc..38169c08985 100644
--- a/gcc/rust/rust-gcc.cc
+++ b/gcc/rust/rust-gcc.cc
@@ -411,11 +411,11 @@ tree
  float_type (int bits)
  {
tree type;
-  if (bits == FLOAT_TYPE_SIZE)
+  if (bits == TYPE_PRECISION (float_type_node))
  type = float_type_node;
-  else if (bits == DOUBLE_TYPE_SIZE)
+  else if (bits == TYPE_PRECISION (double_type_node))
  type = double_type_node;
-  else if (bits == LONG_DOUBLE_TYPE_SIZE)
+  else if (bits == TYPE_PRECISION (long_double_type_node))
  type = long_double_type_node;
else
  {


Re: [PATCH 05/52] rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-21 Thread Kewen.Lin
Hi Arthur,

on 2024/6/21 18:17, Arthur Cohen wrote:
> Hi,
> 
> Sorry about the delay in my answer! The patch looks good to me :) Will you 
> push it as part of your patchset?
> 

Thanks for the review!  Since this one doesn't necessarily depend on
"09/52 Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook
mode_for_floating_type", I'm going to push this before that (just like
the other FE changes, except for the jit one 10/52, which depends on
the new hook 09/52).  btw, everything after 09/52 will be merged into 09/52
when committing. :)

Does it sound good to you?

BR,
Kewen

> Kindly,
> 
> Arthur
> 
> On 6/3/24 05:00, Kewen Lin wrote:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> in rust with TYPE_PRECISION of {float,{,long_}double}_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/rust/ChangeLog:
>>
>> * rust-gcc.cc (float_type): Use TYPE_PRECISION of
>> {float,double,long_double}_type_node to replace
>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
>> ---
>>   gcc/rust/rust-gcc.cc | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/rust/rust-gcc.cc b/gcc/rust/rust-gcc.cc
>> index f17e19a2dfc..38169c08985 100644
>> --- a/gcc/rust/rust-gcc.cc
>> +++ b/gcc/rust/rust-gcc.cc
>> @@ -411,11 +411,11 @@ tree
>>   float_type (int bits)
>>   {
>>     tree type;
>> -  if (bits == FLOAT_TYPE_SIZE)
>> +  if (bits == TYPE_PRECISION (float_type_node))
>>   type = float_type_node;
>> -  else if (bits == DOUBLE_TYPE_SIZE)
>> +  else if (bits == TYPE_PRECISION (double_type_node))
>>   type = double_type_node;
>> -  else if (bits == LONG_DOUBLE_TYPE_SIZE)
>> +  else if (bits == TYPE_PRECISION (long_double_type_node))
>>   type = long_double_type_node;
>>     else
>>   {


Re: [PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-21 Thread Kewen.Lin
Hi Carl,

on 2024/6/20 00:13, Carl Love wrote:
> GCC maintainers:
> 
> version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
> Peter suggested the -mdejagnu-cpu= value must be power7.  
> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  Patch 
> has been retested on a Power 10 box, it succeeds
> with 2 passes and no fails.

IMHO Peter's suggestion on power7 (-mdejagnu-cpu=power7) is mainly for
altivec-1-runnable.c.  Both your testing and the comments in the test
case show this altivec-2-runnable.c requires at least power8.

> 
> Per the additional feedback after patch: 
> 
>   commit c892525813c94b018464d5a4edc17f79186606b7
>   Author: Carl Love 
>   Date:   Tue Jun 11 14:01:16 2024 -0400
> 
>   rs6000, altivec-2-runnable.c should be a runnable test
> 
>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>   test.  This patch changes the dg-do command argument to run.
> 
>   gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>   argument to run.
> 
> was approved and committed, I have updated the dg-require-effective-target
> and dg-options as requested so the test will compile with -O2 on a 
> machine that has a minimum support of Power 8 vector hardware.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> 
> rs6000, altivec-2-runnable.c update the require-effective-target
> 
> The test requires a minimum of Power8 vector HW and a compile level
> of -O2.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> index 17b23eb9d50..9e7ef89327b 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -1,7 +1,7 @@
> -/* { dg-do run } */
> -/* { dg-options "-mvsx" } */
> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 
> } } } */
> -/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-do run { target vsx_hw } } */

As this test case requires power8 and up, and dg-options specifies
-mdejagnu-cpu=power8, we should use p8vector_hw instead of vsx_hw here,
otherwise it will fail on power7 env.

> +/* { dg-do compile { target { ! vmx_hw } } } */

This condition should be ! , so ! p8vector_hw.

> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_altivec } */

This should be powerpc_vsx instead, otherwise this case can still be
tested with -mno-vsx -maltivec, then this test case would fail.

Besides, as the discussion on the name of this test case, could you also
rename this to p8vector-builtin-9.c instead?

BR,
Kewen



Re: [PATCH V5 1/2] split complicate 64bit constant to memory

2024-06-21 Thread Kewen.Lin
Hi Jeff,

on 2024/6/13 10:19, Jiufu Guo wrote:
> Hi,
> 
> Sometimes, a complicated constant is built via 3(or more)
> instructions.  Generally speaking, it would not be as fast
> as loading it from the constant pool (as the discussions in
> PR63281):
> "ld" is one instruction.  If consider "address/toc" adjust,
> we may count it as 2 instructions. And "pld" may need fewer
> cycles.
> 
> As testing(SPEC2017), it could get better/stable runtime
> if set the threshold as "> 2" (compare with "> 3").
> 
> As known, because the constant is load from memory by this
> patch,  so this functionality may affect the cache missing.

I wonder if it's a good idea to offer one rs6000 specific
parameter to control this threshold, since this change isn't
always a win: for example, 5 simple constant-building insns can be
well scheduled among the insns in their own bb, while an equivalent
load can suffer from a cache miss or insufficient LSU resources.
A parameter may be better for users in case they want to
fine tune further for some cases.

> While, IMHO, this patch would be still do the right thing.
> 
> Compare with the previous version:
> This version 1. allow assigning complicate constant to r0 before RA,
> 2. allow more condition beside TARGET_ELF,
> 3. updated test cases, and remove 2 test cases as the original test
> point is not used any more.

Can they be written with the proposed FORCE_CONST_INTO_REG used in
other test cases?

> 
> Bootstrap & regtest pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu Guo)
> 
>   PR target/63281
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_const): Split constant to
>   memory under -m64.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/const_anchors.c: Test final-rtl.
>   * gcc.target/powerpc/pr106550_1.c (FORCE_CONST_INTO_REG): New macro.
>   * gcc.target/powerpc/pr106550_1.c: Use macro FORCE_CONST_INTO_REG.

Curious if git gcc-verify will complain about this (the same changed file has
multiple lines).

>   * gcc.target/powerpc/pr87870.c: Update asm insn checking.
>   * gcc.target/powerpc/pr93012.c: Likewise.
>   * gcc.target/powerpc/parall_5insn_const.c: Removed.
>   * gcc.target/powerpc/pr106550.c: Removed.

Nit: s/Removed/Remove/

>   * gcc.target/powerpc/pr63281.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc   | 15 +++
>  .../gcc.target/powerpc/const_anchors.c|  5 ++--
>  .../gcc.target/powerpc/parall_5insn_const.c   | 27 ---
>  gcc/testsuite/gcc.target/powerpc/pr106550.c   | 14 --
>  gcc/testsuite/gcc.target/powerpc/pr106550_1.c | 16 ++-
>  gcc/testsuite/gcc.target/powerpc/pr63281.c| 11 
>  gcc/testsuite/gcc.target/powerpc/pr87870.c|  5 +++-
>  gcc/testsuite/gcc.target/powerpc/pr93012.c|  6 -
>  8 files changed, 47 insertions(+), 52 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc..bc9d6f5c34f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10240,6 +10240,21 @@ rs6000_emit_set_const (rtx dest, rtx source)
> c = sext_hwi (c, 32);
> emit_move_insn (lo, GEN_INT (c));
>   }
> +

Nit: Unexpected new line.

> +  else if ((can_create_pseudo_p () || base_reg_operand (dest, mode))

Nit: It's not obvious, maybe one comment on why we need base_reg_operand
restriction under !can_create_pseudo_p.

BR,
Kewen

> +&& TARGET_64BIT && num_insns_constant (source, mode) > 2)
> + {
> +   rtx sym = force_const_mem (mode, source);
> +   if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
> +   && use_toc_relative_ref (XEXP (sym, 0), mode))
> + {
> +   rtx toc = create_TOC_reference (XEXP (sym, 0), dest);
> +   sym = gen_const_mem (mode, toc);
> +   set_mem_alias_set (sym, get_TOC_alias_set ());
> + }
> +
> +   emit_move_insn (dest, sym);
> + }
>else
>   rs6000_emit_set_long_const (dest, c);
>break;
> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> index 542e2674b12..682e773d506 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target has_arch_ppc64 } } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
>  
>  #define C1 0x2351847027482577ULL
>  #define C2 0x2351847027482578ULL
> @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long 
> long b)
>  *a++ = C2;
>  }
>  
> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
> +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
> +
> di

Re: [PATCH] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-21 Thread Kewen.Lin
Hi Carl,

on 2024/6/20 00:18, Carl Love wrote:
> GCC maintainers:
> 
> The dg options for this test should be the same as for altivec-2-runnable.c.  
> This patch updates the dg options to match 
> the settings in altivec-2-runnable.c.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> --From
>  289e15d215161ad45ae1aae7a5dedd2374737ec4 rs6000, altivec-1-runnable.c update 
> the require-effective-target
> 
> The test requires a minimum of Power8 vector HW and a compile level
> of -O2.

This is not true: vec_unpackh and vec_unpackl don't require power8;
vupk[hl]s[hb]/vupk[hl]px are all ISA 2.03.

> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-1-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> index da8ebbc30ba..c113089c13a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> @@ -1,6 +1,7 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> -/* { dg-require-effective-target powerpc_altivec_ok } */
> -/* { dg-options "-maltivec" } */
> +/* { dg-do run { target vsx_hw } } */

So this line should check for vmx_hw.

> +/* { dg-do compile { target { ! vmx_hw } } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

On further thought, I think it's better to use
"-O2 -maltivec" to be consistent with the others.

As mentioned in the other thread, the powerpc_altivec
effective target check should guarantee altivec
feature support; if any default cpu type or user
specified option disables altivec, this test case
will not be tested.  If we specify one cpu type
specially here, it may cause confusion about why it's
different from the other existing ones.  So let's
go without a specified cpu type.

Besides, similar to the request for altivec-1-runnable.c,
could you also rename this to altivec-38.c?

BR,
Kewen

> +/* { dg-require-effective-target powerpc_altivec } */
>  
>  #include 
>  



Re: [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-06-21 Thread Kewen.Lin
Hi Haochen,

on 2024/5/24 14:02, HAO CHEN GUI wrote:
> Hi,
>   This patch implemented optab_isinf for SFDF and IEEE128 by test
> data class instructions.
> 
>   Compared with previous version, the main change is to narrow
> down the predict for float operand according to review's advice.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isinf for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isinf2 for SFDF): New expand.
>   (isinf2 for IEEE128): New expand.

I think we can add a new mode iterator, IEEE_FP, that includes both SFDF
and IEEE128, and then merge these two expanders into one.

> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-1.c: New test.
>   * gcc.target/powerpc/pr97786-2.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..08cce11da60 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
>operands[4] = CONST0_RTX (SImode);
>  })
> 
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));

Nit: It would be more readable if we created some macros
for the "Test Data Class" mask bits.
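
For illustration only, such macros might look like the sketch below.  The
macro names are hypothetical and not taken from the patch; the bit values
assume the 7-bit DCMX layout of the Power ISA test data class instructions
(NaN, +Inf, -Inf, +Zero, -Zero, +Denormal, -Denormal, from most to least
significant bit), under which the 0x30 used above selects both infinities.

```c
/* Hypothetical names for the "Test Data Class" (DCMX) mask bits.
   Assumed layout: bit 0x40 is NaN, down to bit 0x01 for -Denormal.  */
#define TDC_NAN          0x40
#define TDC_POS_INF      0x20
#define TDC_NEG_INF      0x10
#define TDC_POS_ZERO     0x08
#define TDC_NEG_ZERO     0x04
#define TDC_POS_DENORMAL 0x02
#define TDC_NEG_DENORMAL 0x01

/* Mask an isinf expander could pass instead of a bare 0x30.  */
#define TDC_INF (TDC_POS_INF | TDC_NEG_INF)

/* Return the mask that tests for either infinity.  */
static int
tdc_isinf_mask (void)
{
  return TDC_INF;
}
```

With names like these, the expander call would read GEN_INT (TDC_INF)
rather than GEN_INT (0x30).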

The rest looks good to me, thanks!

BR,
Kewen

> +  DONE;
> +})
> +
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT 
> (0x30)));
> +  DONE;
> +})
> +
>  ;; The VSX Scalar Test Negative Quad-Precision
>  (define_expand "xststdcnegqp_"
>[(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> new file mode 100644
> index 000..c1c4f64ee8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +
> +int test1 (double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test3 (float x)
> +{
> +  return __builtin_isinff (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> new file mode 100644
> index 000..ed305e8572e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } 
> */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (long double x)
> +{
> +  return __builtin_isinfl (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */



Re: [PATCH 4/7 v2] lto: Implement ltrans cache

2024-06-21 Thread Jan Hubicka
> Michal Jires  writes:
> 
> No performance data?

Michal has a bachelor's thesis on the topic, which has some statistics:
https://dspace.cuni.cz/handle/20.500.11956/183051?locale-attribute=en
> 
> > +
> > +static const md5_checksum_t INVALID_CHECKSUM = {
> > +  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > +};
> 
> There are much faster/optimized modern hashes for good collision detection 
> over
> MD5 especially when it's not needed to be cryptographically secure. Pick
> something from smhasher.
> 
> Also perhaps the check sum should be cached in the file? I assume it's
> cheap to compute while writing. It could be written at the tail of the
> file. Then it can be read by seeking to the end and you save that
> step.
> 
> The lockfiles scare me a bit. What happens when they get lost, e.g.
> due to a compiler crash? You may need some recovery for that.
> Perhaps it would be better to make the files self checking, so that
> partial files can be detected when reading, and get rid of the locks.

Those seem like good ideas. One problem is that the cached files are
object files, so we will likely need a simple-object extension to embed
the hash.

Overall, incremental LTO is quite a complex problem, hitting several areas
where the current implementation of LTO is lacking. The most important
problem seems to be the divergence of partition files, which happens due
to various global counters and serves no good purpose. Fixing those is
useful per se, since it improves reproducibility of builds. (Some of this
was merged to the last release, but not all.)

Another important issue is the debug info, which is slow and diverges
often.  We will also need to work on speeding up WPA.

Since it is relatively early stage1, I think it makes sense to merge the
code without adding extra complexity and to optimize it incrementally
(since it works relatively well already).

There are many things to do and I think it is better to do that in trunk
rather than accumulating relatively complex changes on a branch.
md5 is already supported by libiberty, so it is kind of an easy choice for
a first-cut implementation.

Honza
> 
> -Andi


[PATCH] RISC-V: Fix unresolved mcpu-[67].c tests

2024-06-21 Thread Craig Blackmore
These tests check the sched2 dump, so skip them for optimization levels
that do not enable sched2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-6.c: Skip for -O0, -O1, -Og.
* gcc.target/riscv/mcpu-7.c: Likewise.
---
 gcc/testsuite/gcc.target/riscv/mcpu-6.c | 1 +
 gcc/testsuite/gcc.target/riscv/mcpu-7.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/mcpu-6.c 
b/gcc/testsuite/gcc.target/riscv/mcpu-6.c
index 96faa01653e..0126011939f 100644
--- a/gcc/testsuite/gcc.target/riscv/mcpu-6.c
+++ b/gcc/testsuite/gcc.target/riscv/mcpu-6.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
 /* Verify -mtune has higher priority than -mcpu for pipeline model .  */
 /* { dg-options "-mcpu=sifive-u74 -mtune=rocket -fdump-rtl-sched2-details 
-march=rv32i -mabi=ilp32" } */
 /* { dg-final { scan-rtl-dump "simple_return\[ \]+:alu" "sched2" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/mcpu-7.c 
b/gcc/testsuite/gcc.target/riscv/mcpu-7.c
index 6832323e529..656436343bd 100644
--- a/gcc/testsuite/gcc.target/riscv/mcpu-7.c
+++ b/gcc/testsuite/gcc.target/riscv/mcpu-7.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
 /* Verify -mtune has higher priority than -mcpu for pipeline model .  */
 /* { dg-options "-mcpu=sifive-s21 -mtune=sifive-u74 -fdump-rtl-sched2-details 
-march=rv32i -mabi=ilp32" } */
 /* { dg-final { scan-rtl-dump "simple_return\[ \]+:sifive_7_B" "sched2" } } */
-- 
2.34.1



Re: [PATCH] RISC-V: Fix unresolved mcpu-[67].c tests

2024-06-21 Thread Kito Cheng
LGTM, thanks :)

On Fri, Jun 21, 2024 at 7:33 PM Craig Blackmore <
craig.blackm...@embecosm.com> wrote:

> These tests check the sched2 dump, so skip them for optimization levels
> that do not enable sched2.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/mcpu-6.c: Skip for -O0, -O1, -Og.
> * gcc.target/riscv/mcpu-7.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/riscv/mcpu-6.c | 1 +
>  gcc/testsuite/gcc.target/riscv/mcpu-7.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/mcpu-6.c
> b/gcc/testsuite/gcc.target/riscv/mcpu-6.c
> index 96faa01653e..0126011939f 100644
> --- a/gcc/testsuite/gcc.target/riscv/mcpu-6.c
> +++ b/gcc/testsuite/gcc.target/riscv/mcpu-6.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
>  /* Verify -mtune has higher priority than -mcpu for pipeline model .  */
>  /* { dg-options "-mcpu=sifive-u74 -mtune=rocket -fdump-rtl-sched2-details
> -march=rv32i -mabi=ilp32" } */
>  /* { dg-final { scan-rtl-dump "simple_return\[ \]+:alu" "sched2" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/mcpu-7.c
> b/gcc/testsuite/gcc.target/riscv/mcpu-7.c
> index 6832323e529..656436343bd 100644
> --- a/gcc/testsuite/gcc.target/riscv/mcpu-7.c
> +++ b/gcc/testsuite/gcc.target/riscv/mcpu-7.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
>  /* Verify -mtune has higher priority than -mcpu for pipeline model .  */
>  /* { dg-options "-mcpu=sifive-s21 -mtune=sifive-u74
> -fdump-rtl-sched2-details -march=rv32i -mabi=ilp32" } */
>  /* { dg-final { scan-rtl-dump "simple_return\[ \]+:sifive_7_B" "sched2" }
> } */
> --
> 2.34.1
>
>


[PATCH] tree-optimization/115528 - fix vect alignment analysis for outer loop vect

2024-06-21 Thread Richard Biener
For outer loop vectorization of a data reference in the inner loop
we have to look at both steps to see if they preserve alignment.

What is special about this testcase is that the outer loop step is
one element but the inner loop step is four, and that we now use SLP
and the vectorization factor is one.  But the issue looks latent
to me.
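
As a rough sketch of the check described above (not the code in
tree-vect-data-refs.cc): the function below is a hypothetical
simplification that takes plain integers for the step alignments, the
vectorization factor and the vector alignment.  The sample values
mentioned afterwards are assumptions chosen to resemble the testcase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplification of the alignment analysis.
   OUTER_STEP_ALIGN is the step alignment relative to the loop being
   vectorized, INNER_STEP_ALIGN the step alignment in the inner loop,
   both in bytes.  NESTED says the data reference sits in an inner loop
   of the vectorized loop.  */
static bool
step_preserves_misalignment (uint64_t outer_step_align,
                             uint64_t inner_step_align,
                             uint64_t vf, uint64_t vect_align,
                             bool nested)
{
  /* The step of the vectorized loop, scaled by VF, must be a multiple
     of the vector alignment.  */
  bool ok = (outer_step_align * vf) % vect_align == 0;

  /* For a nested data reference the inner step must preserve the
     misalignment as well; checking it alone is not enough.  */
  if (ok && nested)
    ok = inner_step_align % vect_align == 0;

  return ok;
}
```

Assuming 8-byte elements and a 16-byte vector alignment, an outer step of
one element (8), an inner step of four elements (32) and a vectorization
factor of one fail the first test even though the inner step alone
passes.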

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/115528
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Make sure to look at both the inner and outer loop step
behavior.

* gfortran.dg/vect/pr115528.f: New testcase.
---
 gcc/testsuite/gfortran.dg/vect/pr115528.f | 27 +++
 gcc/tree-vect-data-refs.cc| 57 ---
 2 files changed, 56 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/vect/pr115528.f

diff --git a/gcc/testsuite/gfortran.dg/vect/pr115528.f 
b/gcc/testsuite/gfortran.dg/vect/pr115528.f
new file mode 100644
index 000..764a4b92b3e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/pr115528.f
@@ -0,0 +1,27 @@
+! { dg-additional-options "-fno-inline" }
+
+  subroutine init(COEF1,FORM1,AA)
+  double precision COEF1,X
+  double complex FORM1
+  double precision AA(4,4)
+  COEF1=0
+  FORM1=0
+  AA=0
+  end
+  subroutine curr(HADCUR)
+  double precision COEF1
+  double complex HADCUR(4),FORM1
+  double precision AA(4,4)
+  call init(COEF1,FORM1,AA)
+  do i = 1,4
+ do j = 1,4
+HADCUR(I)=
+ $ HADCUR(I)+CMPLX(COEF1)*FORM1*AA(I,J)
+ end do
+  end do
+  end
+  program test
+double complex HADCUR(4)
+hadcur=0
+call curr(hadcur)
+  end
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ae237407672..959e127c385 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -1356,42 +1356,43 @@ vect_compute_data_ref_alignment (vec_info *vinfo, 
dr_vec_info *dr_info,
   step_preserves_misalignment_p = true;
 }
 
-  /* In case the dataref is in an inner-loop of the loop that is being
- vectorized (LOOP), we use the base and misalignment information
- relative to the outer-loop (LOOP).  This is ok only if the misalignment
- stays the same throughout the execution of the inner-loop, which is why
- we have to check that the stride of the dataref in the inner-loop evenly
- divides by the vector alignment.  */
-  else if (nested_in_vect_loop_p (loop, stmt_info))
-{
-  step_preserves_misalignment_p
-   = (DR_STEP_ALIGNMENT (dr_info->dr) % vect_align_c) == 0;
-
-  if (dump_enabled_p ())
-   {
- if (step_preserves_misalignment_p)
-   dump_printf_loc (MSG_NOTE, vect_location,
-"inner step divides the vector alignment.\n");
- else
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"inner step doesn't divide the vector"
-" alignment.\n");
-   }
-}
-
-  /* Similarly we can only use base and misalignment information relative to
- an innermost loop if the misalignment stays the same throughout the
- execution of the loop.  As above, this is the case if the stride of
- the dataref evenly divides by the alignment.  */
   else
 {
+  /* We can only use base and misalignment information relative to
+an innermost loop if the misalignment stays the same throughout the
+execution of the loop.  As above, this is the case if the stride of
+the dataref evenly divides by the alignment.  */
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   step_preserves_misalignment_p
-   = multiple_p (DR_STEP_ALIGNMENT (dr_info->dr) * vf, vect_align_c);
+   = multiple_p (drb->step_alignment * vf, vect_align_c);
 
   if (!step_preserves_misalignment_p && dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "step doesn't divide the vector alignment.\n");
+
+  /* In case the dataref is in an inner-loop of the loop that is being
+vectorized (LOOP), we use the base and misalignment information
+relative to the outer-loop (LOOP).  This is ok only if the
+misalignment stays the same throughout the execution of the
+inner-loop, which is why we have to check that the stride of the
+dataref in the inner-loop evenly divides by the vector alignment.  */
+  if (step_preserves_misalignment_p
+ && nested_in_vect_loop_p (loop, stmt_info))
+   {
+ step_preserves_misalignment_p
+   = (DR_STEP_ALIGNMENT (dr_info->dr) % vect_align_c) == 0;
+
+ if (dump_enabled_p ())
+   {
+ if (step_preserves_misalignment_p)
+   dump_printf_loc (MSG_NOTE, vect_location,
+"inner step

Re: [PATCH] testsuite/ubsan/overflow-div-3.c: Use SIGTRAP for MIPS

2024-06-21 Thread Maciej W. Rozycki
On Fri, 21 Jun 2024, YunQiang Su wrote:

> >  Then GCC emits the wrong trap instruction, wherever it comes from and
> > whatever has caused it.  The correct ones for integer division by zero
> 
> Thanks so much. It is not the bug of Linux kernel or GCC.
> It is a bug of me ;) and qemu.
> 
> Qemu didn't pass the code of TEQ correctly; and I haven't run this test on
> real hardware.

 QEMU is a simulator only and has bugs or other discrepancies from real 
hardware.  Especially the user emulation mode combines issues with actual 
instruction simulation *and* the OS interface.  Therefore you can use QEMU 
as a valuable tool in the course of making your changes, but you need to 
always verify the final result with real hardware before submitting it 
upstream.

 Also submissions need to include details as to how the problem addressed 
has been reproduced.

 Otherwise this is just producing noise and wasting people's time.

  Maciej


[WIP PATCH] libcpp, c-family: Add concatenated string support for #embed gnu::base64 argument

2024-06-21 Thread Jakub Jelinek
Hi!

Here is an incremental patch which adds support for string concatenation
in the parsing of gnu::base64 #embed parameter and emits it as well
in -E -fdirectives-only preprocessing, e.g.
cat embed-do.c; ./cc1 -quiet -E -fdirectives-only embed-do.c -nostdinc
#embed "/usr/src/gcc/gcc/tree-ssa-dce.h"
# 0 "embed-do.c"
...
# 0 ""
# 1 "embed-do.c"
 47,
# 1 "embed-do.c"
#embed "." __gnu__::__base64__( \
"KiBDb3B5cmlnaHQgKEMpIDIwMTctMjAyNCBGcmVlIFNvZnR3YXJlIEZvdW5kYXRpb24sIEluYy4K" \
"ClRoaXMgZmlsZSBpcyBwYXJ0IG9mIEdDQy4KCkdDQyBpcyBmcmVlIHNvZnR3YXJlOyB5b3UgY2Fu" \
"IHJlZGlzdHJpYnV0ZSBpdCBhbmQvb3IgbW9kaWZ5IGl0CnVuZGVyIHRoZSB0ZXJtcyBvZiB0aGUg" \
"R05VIEdlbmVyYWwgUHVibGljIExpY2Vuc2UgYXMgcHVibGlzaGVkIGJ5IHRoZQpGcmVlIFNvZnR3" \
"YXJlIEZvdW5kYXRpb247IGVpdGhlciB2ZXJzaW9uIDMsIG9yIChhdCB5b3VyIG9wdGlvbikgYW55" \
"CmxhdGVyIHZlcnNpb24uCgpHQ0MgaXMgZGlzdHJpYnV0ZWQgaW4gdGhlIGhvcGUgdGhhdCBpdCB3" \
"aWxsIGJlIHVzZWZ1bCwgYnV0IFdJVEhPVVQKQU5ZIFdBUlJBTlRZOyB3aXRob3V0IGV2ZW4gdGhl" \
"IGltcGxpZWQgd2FycmFudHkgb2YgTUVSQ0hBTlRBQklMSVRZIG9yCkZJVE5FU1MgRk9SIEEgUEFS" \
"VElDVUxBUiBQVVJQT1NFLiAgU2VlIHRoZSBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZQpmb3Ig" \
"bW9yZSBkZXRhaWxzLgoKWW91IHNob3VsZCBoYXZlIHJlY2VpdmVkIGEgY29weSBvZiB0aGUgR05V" \
"IEdlbmVyYWwgUHVibGljIExpY2Vuc2UKYWxvbmcgd2l0aCBHQ0M7IHNlZSB0aGUgZmlsZSBDT1BZ" \
"SU5HMy4gIElmIG5vdCBzZWUKPGh0dHA6Ly93d3cuZ251Lm9yZy9saWNlbnNlcy8+LiAgKi8KCiNp" \
"Zm5kZWYgVFJFRV9TU0FfRENFX0gKI2RlZmluZSBUUkVFX1NTQV9EQ0VfSApleHRlcm4gdm9pZCBz" \
"aW1wbGVfZGNlX2Zyb21fd29ya2xpc3QgKGJpdG1hcCwgYml0bWFwID0gbnVsbHB0cik7CiNlbmRp" \
"Zg==")
# 1 "embed-do.c"
,10

Is that what we want?  If so, I can incorporate it into the gnu::base64
patch (everything but c-ppoutput.cc) and into the WIP patch for CPP_EMBED
support (c-ppoutput.cc).
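
To make the output layout above concrete, here is a minimal sketch of the
line splitting, assuming at most 76 base64 characters per string literal
and a " \" continuation between literals.  The function split_base64 and
its buffer interface are hypothetical and are not the c-ppoutput.cc
implementation.

```c
#include <stddef.h>
#include <string.h>

enum { BASE64_CHUNK = 76 };

/* Write B64 into OUT as a sequence of quoted string literals of at most
   BASE64_CHUNK characters each, separated by " \" and a newline.
   Return the number of literals written, or 0 if B64 is empty or OUT
   (of OUT_SIZE bytes) is too small.  */
static size_t
split_base64 (const char *b64, char *out, size_t out_size)
{
  size_t len = strlen (b64), pos = 0, chunks = 0;

  if (out_size == 0)
    return 0;

  for (size_t i = 0; i < len; i += BASE64_CHUNK)
    {
      size_t n = len - i < BASE64_CHUNK ? len - i : BASE64_CHUNK;
      /* Optional 3-byte separator, two quotes and N characters.  */
      size_t need = (chunks ? 3 : 0) + n + 2;

      /* Keep one byte for the terminating NUL.  */
      if (pos + need + 1 > out_size)
        return 0;

      if (chunks)
        {
          memcpy (out + pos, " \\\n", 3);
          pos += 3;
        }
      out[pos++] = '"';
      memcpy (out + pos, b64 + i, n);
      pos += n;
      out[pos++] = '"';
      chunks++;
    }

  out[pos] = '\0';
  return chunks;
}
```

Adjacent literals produced this way rely on the string concatenation
support that the patch adds to the gnu::base64 parser.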

2024-06-21  Jakub Jelinek  

libcpp/
* internal.h (struct cpp_embed_params): Change base64 member
type from const cpp_token * to cpp_embed_params_token.
(_cpp_free_embed_params_tokens): Declare.
* directives.cc (save_token_for_embed, _cpp_free_embed_params_tokens):
New functions.
(skip_balanced_token_seq): Use save_token_for_embed.
(_cpp_parse_embed_params): Parse one or more consecutive CPP_STRING
tokens into params->base64 cpp_embed_params_token instead of saving
just one token.
(do_embed): Use _cpp_free_embed_params_tokens.
* files.cc (finish_embed): Don't try to move over some tokens from
previous CPP_EMBED if there is just one.
(finish_base64_embed): Rework to read base64 encoded characters from
one or more CPP_STRING tokens in cpp_embed_params_token instead of
just from a single token.
(_cpp_stack_embed): Adjust for the params->base64 member type change.
* macro.cc (builtin_has_embed): Use _cpp_free_embed_params_tokens.
gcc/
* doc/cpp.texi (Binary Resource Inclusion): Remove comment about
string concatenation not being supported, add comment about escape
sequences not supported.
gcc/c-family/
* c-ppoutput.cc (token_streamer::stream): Adjust formatting of
CPP_EMBED token, if longer than 30 bytes emit it on multiple lines
with at most 76 base64 characters per line.
gcc/testsuite/
* c-c++-common/cpp/embed-17.c: Add tests for concatenated string
literal arguments of gnu::base64.
* c-c++-common/cpp/embed-18.c: Remove them here.

--- libcpp/internal.h.jj2024-06-19 09:28:25.881760114 +0200
+++ libcpp/internal.h   2024-06-21 12:12:04.647654213 +0200
@@ -631,8 +631,7 @@ struct cpp_embed_params
   location_t loc;
   bool has_embed;
   cpp_num_part limit, offset;
-  const cpp_token *base64;
-  cpp_embed_params_tokens prefix, suffix, if_empty;
+  cpp_embed_params_tokens prefix, suffix, if_empty, base64;
 };
 
 /* Character classes.  Based on the more primitive macros in safe-ctype.h.
@@ -806,6 +805,7 @@ extern void _cpp_restore_pragma_names (c
 extern int _cpp_do__Pragma (cpp_reader *, location_t);
 extern void _cpp_init_directives (cpp_reader *);
 extern void _cpp_init_internal_pragmas (cpp_reader *);
+extern void _cpp_free_embed_params_tokens (cpp_embed_params_tokens *);
 extern bool _cpp_parse_embed_params (cpp_reader *, struct cpp_embed_params *);
 extern void _cpp_do_file_change (cpp_reader *, enum lc_reason, const char *,
 linenum_type, unsigned int);
--- libcpp/directives.cc.jj 2024-06-19 12:12:54.178141429 +0200
+++ libcpp/directives.cc2024-06-21 12:59:44.669537045 +0200
@@ -932,6 +932,50 @@ do_include_next (cpp_reader *pfile)
   do_include_common (pfile, type);
 }
 
+/* Helper function for skip_balanced_token_seq and _cpp_parse_embed_params.
+   Save one token *TOKEN into *SAVE.  */
+
+static void
+save_token_for_embed (cpp_embed_params_tokens *save, const cpp_token *token)
+{
+  if (save->count == 0)
+{
+  _cpp_init_tokenrun (&save->base_run, 4);
+  save->cur_run = &save->base_run;
+  save->cur_token = save->base_run.base;
+}
+ 

Re: [PATCH] Build: Set gcc_cv_as_mips_explicit_relocs if gcc_cv_as_mips_explicit_relocs_pcrel

2024-06-21 Thread Maciej W. Rozycki
On Fri, 21 Jun 2024, Richard Sandiford wrote:

> > We check gcc_cv_as_mips_explicit_relocs if 
> > gcc_cv_as_mips_explicit_relocs_pcrel
> > only, while gcc_cv_as_mips_explicit_relocs is used by later code.
> >
> > Maybe, it is time for use to set gcc_cv_as_mips_explicit_relocs always now,
> > as it has been in Binutils for more than 20 years.
> 
> Yeah, agreed FWIW.  This was necessary while the feature was relatively
> new, and while we still supported IRIX as, but I can't see any reasonable
> justification for using such an ancient binutils with modern GCC.
> 
> Getting rid of -mno-explicit-relocs altogether might simplify things.

 FWIW I tend to agree too, although I think the current mess has to be 
fixed first (and backported to the release branches) before going forward 
with the removal.

 And AFAICT the proposed change is the wrong one: we have to analyse 
how we arrived at the current breakage and then recreate the state in 
which it used to work.

 Perhaps we need to check for general explicit reloc support first, before 
following with PC-relative relocs.  It seems natural to me this way, 
because you can't have support for PC-relative relocs (narrower scope) 
unless you have general explicit reloc support (wider scope) in the first 
place, so I wonder why we came up with what we have now.

  Maciej


[pushed 1/2] diagnostics: fixes to SARIF output [PR109360]

2024-06-21 Thread David Malcolm
When adding validation of .sarif files against the schema
(PR testsuite/109360) I discovered various issues where we were
generating invalid .sarif files.

Specifically, in
  c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-1.c
the relatedLocations for the "note" diagnostics were missing column
numbers, leading to validation failure due to non-unique elements,
such as multiple:
"message": {"text": "invalid UTF-8 character "}},
on line 25 with no column information.

Root cause is that for some diagnostics in libcpp we have a location_t
representing the line as a whole, setting a column_override on the
rich_location (since the line hasn't been fully read yet).  We were
handling this column override for plain text output, but not for .sarif
output.

Similarly, in diagnostic-format-sarif-file-pr111700.c there is a warning
emitted on "line 0" of the file, whereas SARIF requires line numbers to
be positive.

We also use column == 0 internally to mean "the line as a whole",
whereas SARIF requires column numbers to be positive.

This patch fixes these various issues.
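
The rules can be sketched roughly as follows.  This is a simplified,
hypothetical stand-in (struct region and make_region are not names from
diagnostic-format-sarif.cc): a column of 0 falls back to the column
override, and a line or column is only recorded when it is positive.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a SARIF region object.  */
struct region
{
  bool has_start_line;
  bool has_start_column;
  int start_line;
  int start_column;
};

/* Build a region for LINE and COLUMN.  A COLUMN of 0 means "the line as
   a whole", in which case COLUMN_OVERRIDE is used instead.  Values that
   are not positive are left out, since SARIF requires positive line and
   column numbers.  */
static struct region
make_region (int line, int column, int column_override)
{
  struct region r = { false, false, 0, 0 };

  if (column == 0)
    column = column_override;

  if (line > 0)
    {
      r.has_start_line = true;
      r.start_line = line;
    }
  if (column > 0)
    {
      r.has_start_column = true;
      r.start_column = column;
    }
  return r;
}
```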

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1540-g9f4fdc3acebcf6.

gcc/ChangeLog:
PR testsuite/109360
* diagnostic-format-sarif.cc
(sarif_builder::make_location_object): Pass any column override
from rich_loc to maybe_make_physical_location_object.
(sarif_builder::maybe_make_physical_location_object): Add
"column_override" param and pass it to maybe_make_region_object.
(sarif_builder::maybe_make_region_object): Add "column_override"
param and use it when the location has 0 for a column.  Don't
add "startLine", "startColumn", "endLine", or "endColumn" if
the values aren't positive.
(sarif_builder::maybe_make_region_object_for_context): Don't
add "startLine" or "endLine" if the values aren't positive.

libcpp/ChangeLog:
PR testsuite/109360
* include/rich-location.h (rich_location::get_column_override):
New accessor.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-sarif.cc | 75 --
 libcpp/include/rich-location.h |  2 +
 2 files changed, 56 insertions(+), 21 deletions(-)

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 79116f051bc1..acf2aa875c48 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -243,11 +243,13 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *
   maybe_make_physical_location_object (location_t loc,
-  enum diagnostic_artifact_role role);
+  enum diagnostic_artifact_role role,
+  int column_override);
   json::object *make_artifact_location_object (location_t loc);
   json::object *make_artifact_location_object (const char *filename);
   json::object *make_artifact_location_object_for_pwd () const;
-  json::object *maybe_make_region_object (location_t loc) const;
+  json::object *maybe_make_region_object (location_t loc,
+ int column_override) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
   json::object *make_region_object_for_hint (const fixit_hint &hint) const;
   json::object *make_multiformat_message_string (const char *msg) const;
@@ -924,8 +926,9 @@ sarif_builder::make_location_object (const rich_location 
&rich_loc,
   location_t loc = rich_loc.get_loc ();
 
   /* "physicalLocation" property (SARIF v2.1.0 section 3.28.3).  */
-  if (json::object *phs_loc_obj = maybe_make_physical_location_object (loc,
-  role))
+  if (json::object *phs_loc_obj
+   = maybe_make_physical_location_object (loc, role,
+  rich_loc.get_column_override ()))
 location_obj->set ("physicalLocation", phs_loc_obj);
 
   /* "logicalLocations" property (SARIF v2.1.0 section 3.28.4).  */
@@ -946,7 +949,7 @@ sarif_builder::make_location_object (const diagnostic_event 
&event,
   /* "physicalLocation" property (SARIF v2.1.0 section 3.28.3).  */
   location_t loc = event.get_location ();
   if (json::object *phs_loc_obj
-   = maybe_make_physical_location_object (loc, role))
+   = maybe_make_physical_location_object (loc, role, 0))
 location_obj->set ("physicalLocation", phs_loc_obj);
 
   /* "logicalLocations" property (SARIF v2.1.0 section 3.28.4).  */
@@ -961,7 +964,10 @@ sarif_builder::make_location_object (const 
diagnostic_event &event,
   return location_obj;
 }
 
-/* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC,
+/* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC.
+
+   If COLUMN_OVERRIDE is non-z

[pushed 2/2] testsuite: check that generated .sarif files validate against the SARIF schema [PR109360]

2024-06-21 Thread David Malcolm
This patch extends the dg directive verify-sarif-file so that if
the "jsonschema" tool is available, it will be used to validate the
generated .sarif file.

Tested with jsonschema 3.2 and Python 3.8.

With the previous patch, all files generated by the DejaGnu testsuite
validate.

There were no validation failures in integration testing.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1541-ga84fe222029ff2.

gcc/ChangeLog:
PR testsuite/109360
* doc/install.texi: Mention optional usage of "jsonschema" tool.

gcc/testsuite/ChangeLog:
PR testsuite/109360
* lib/sarif-schema-2.1.0.json: New file, downloaded from

https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/schemas/sarif-schema-2.1.0.json
Licensing information can be seen at
https://github.com/oasis-tcs/sarif-spec/issues/583
which states "They are free to incorporate it into their
implementation. No need for special permission or paperwork from
OASIS."
* lib/scansarif.exp (verify-sarif-file): If "jsonschema" is
available, use it to verify that the .sarif file complies with the
SARIF schema.
* lib/target-supports.exp (check_effective_target_jsonschema):
New.

Signed-off-by: David Malcolm 
---
 gcc/doc/install.texi  |5 +
 gcc/testsuite/lib/sarif-schema-2.1.0.json | 3370 +
 gcc/testsuite/lib/scansarif.exp   |   23 +
 gcc/testsuite/lib/target-supports.exp |   12 +
 4 files changed, 3410 insertions(+)
 create mode 100644 gcc/testsuite/lib/sarif-schema-2.1.0.json

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 1774a010889a..0c7691651466 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -460,6 +460,11 @@ is shown below:
 @item g++ testsuite
 @code{gcov}, @code{gzip}, @code{json}, @code{os} and @code{pytest}.
 
+@item SARIF testsuite
+Tests of SARIF output will use the @code{jsonschema} program from the
+@code{jsonschema} module (if available) to validate generated .sarif files.
+If this tool is not found, the validation parts of those tests are skipped.
+
 @item c++ cxx api generation
 @code{csv}, @code{os}, @code{sys} and @code{time}.
 
diff --git a/gcc/testsuite/lib/sarif-schema-2.1.0.json 
b/gcc/testsuite/lib/sarif-schema-2.1.0.json
new file mode 100644
index ..534d35da2b84
--- /dev/null
+++ b/gcc/testsuite/lib/sarif-schema-2.1.0.json
@@ -0,0 +1,3370 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "title": "Static Analysis Results Format (SARIF) Version 2.1.0 JSON Schema",
+  "$id": 
"https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
+  "description": "Static Analysis Results Format (SARIF) Version 2.1.0 JSON 
Schema: a standard format for the output of static analysis tools.",
+  "additionalProperties": false,
+  "type": "object",
+  "properties": {
+
+"$schema": {
+  "description": "The URI of the JSON schema corresponding to the 
version.",
+  "type": "string",
+  "format": "uri"
+},
+
+"version": {
+  "description": "The SARIF format version of this log file.",
+  "enum": [ "2.1.0" ]
+},
+
+"runs": {
+  "description": "The set of runs contained in this log file.",
+  "type": "array",
+  "minItems": 0,
+  "uniqueItems": false,
+  "items": {
+"$ref": "#/definitions/run"
+  }
+},
+
+"inlineExternalProperties": {
+  "description": "References to external property files that share data 
between runs.",
+  "type": "array",
+  "minItems": 0,
+  "uniqueItems": true,
+  "items": {
+"$ref": "#/definitions/externalProperties"
+  }
+},
+
+"properties": {
+  "description": "Key/value pairs that provide additional information 
about the log file.",
+  "$ref": "#/definitions/propertyBag"
+}
+  },
+
+  "required": [ "version", "runs" ],
+
+  "definitions": {
+
+"address": {
+  "description": "A physical or virtual address, or a range of addresses, 
in an 'addressable region' (memory or a binary file).",
+  "additionalProperties": false,
+  "type": "object",
+  "properties": {
+
+"absoluteAddress": {
+  "description": "The address expressed as a byte offset from the 
start of the addressable region.",
+  "type": "integer",
+  "minimum": -1,
+  "default": -1
+
+},
+
+"relativeAddress": {
+  "description": "The address expressed as a byte offset from the 
absolute address of the top-most parent object.",
+  "type": "integer"
+
+},
+
+"length": {
+  "description": "The number of bytes in this range of addresses.",
+  "type": "integer"
+},
+
+"kind": {
+  "description": "An open-ended string that identifies the addres

[COMMITTED] Add builtin_unreachable processing for fast_vrp.

2024-06-21 Thread Andrew MacLeod
With the earlier rework of VRP which removed the array_bounds pass, it 
is now possible to invoke a different VRP for any pass. This patch adds 
the ability to call fast_vrp with the final_pass flag set, and to 
invoke the remove_unreachable object to remove __builtin_unreachable 
calls if it is the final pass.
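
For context, here is a minimal sketch of the kind of source this affects. It assumes a compiler that provides `__builtin_unreachable` (GCC or Clang); the function name `scale_index` is hypothetical and does not come from the patch. The guarded call states that `idx` is below 16 on every path that continues, which is the sort of range a VRP pass can record before the final pass deletes the check.

```cpp
// Hypothetical example; scale_index is not part of the patch.
// The branch tells the optimizer that idx < 16 whenever execution continues,
// so VRP may record the range [0, 15] for idx and later remove the check.
unsigned scale_index (unsigned idx)
{
  if (idx >= 16)
    __builtin_unreachable ();
  return idx * 4;
}
```

Calling `scale_index` with a value of 16 or more is undefined behavior, so callers have to honor the stated range.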


Bootstraps on  x86_64-pc-linux-gnu with no regressions.   Pushed.

Andrew
From 82c704c69fab610afcf4a1947577ed97dd72c429 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 17 Jun 2024 11:23:12 -0400
Subject: [PATCH 1/5] Add builtin_unreachable processing for fast_vrp.

Add a remove_unreachable object to fast vrp, and honor the final_p flag.

	* tree-vrp.cc (remove_unreachable::remove): Export global range
	if builtin_unreachable dominates all uses.
	(remove_unreachable::remove_and_update_globals): Do not reset SCEV.
	(execute_ranger_vrp): Reset SCEV here instead.
	(fvrp_folder::fvrp_folder): Take final pass flag
	and create a remove_unreachable object when specified.
	(fvrp_folder::pre_fold_stmt): Register GIMPLE_CONDs with
	the remove_unreachable object.
	(fvrp_folder::m_unreachable): New.
	(execute_fast_vrp): Process remove_unreachable object.
	(pass_vrp::execute): Add final_p flag to execute_fast_vrp.
---
 gcc/tree-vrp.cc | 52 ++---
 1 file changed, 41 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 5f5eb9b57e9..a3b1a5cd337 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -280,6 +280,25 @@ remove_unreachable::remove ()
   gimple *s = gimple_outgoing_range_stmt_p (e->src);
   gcc_checking_assert (gimple_code (s) == GIMPLE_COND);
 
+  tree name = gimple_range_ssa_p (gimple_cond_lhs (s));
+  if (!name)
+	name = gimple_range_ssa_p (gimple_cond_rhs (s));
+  // Check if global value can be set for NAME.
+  if (name && fully_replaceable (name, src))
+	{
+	  value_range r (TREE_TYPE (name));
+	  if (gori_name_on_edge (r, name, e, &m_ranger)
+	  && set_range_info (name, r) &&(dump_file))
+	{
+	  fprintf (dump_file, "Global Exported (via unreachable): ");
+	  print_generic_expr (dump_file, name, TDF_SLIM);
+	  fprintf (dump_file, " = ");
+	  gimple_range_global (r, name);
+	  r.dump (dump_file);
+	  fputc ('\n', dump_file);
+	}
+	}
+
   change = true;
   // Rewrite the condition.
   if (e->flags & EDGE_TRUE_VALUE)
@@ -305,14 +324,10 @@ remove_unreachable::remove_and_update_globals ()
   if (m_list.length () == 0)
 return false;
 
-  // If there is no import/export info, just remove unreachables if necessary.
+  // If there is no import/export info, Do basic removal.
   if (!m_ranger.gori_ssa ())
 return remove ();
 
-  // Ensure the cache in SCEV has been cleared before processing
-  // globals to be removed.
-  scev_reset ();
-
   bool change = false;
   tree name;
   unsigned i;
@@ -1107,6 +1122,9 @@ execute_ranger_vrp (struct function *fun, bool final_p)
   rvrp_folder folder (ranger, final_p);
   phi_analysis_initialize (ranger->const_query ());
   folder.substitute_and_fold ();
+  // Ensure the cache in SCEV has been cleared before processing
+  // globals to be removed.
+  scev_reset ();
   // Remove tagged builtin-unreachable and maybe update globals.
   folder.m_unreachable.remove_and_update_globals ();
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -1168,9 +1186,15 @@ execute_ranger_vrp (struct function *fun, bool final_p)
 class fvrp_folder : public substitute_and_fold_engine
 {
 public:
-  fvrp_folder (dom_ranger *dr) : substitute_and_fold_engine (),
- m_simplifier (dr)
-  { m_dom_ranger = dr; }
+  fvrp_folder (dom_ranger *dr, bool final_p) : substitute_and_fold_engine (),
+	   m_simplifier (dr)
+  {
+m_dom_ranger = dr;
+if (final_p)
+  m_unreachable = new remove_unreachable (*dr, final_p);
+else
+  m_unreachable = NULL;
+  }
 
   ~fvrp_folder () { }
 
@@ -1228,6 +1252,9 @@ public:
 	value_range vr(type);
 	m_dom_ranger->range_of_stmt (vr, s);
   }
+if (m_unreachable && gimple_code (s) == GIMPLE_COND)
+  m_unreachable->maybe_register (s);
+
   }
 
   bool fold_stmt (gimple_stmt_iterator *gsi) override
@@ -1238,6 +1265,7 @@ public:
 return ret;
   }
 
+  remove_unreachable *m_unreachable;
 private:
   DISABLE_COPY_AND_ASSIGN (fvrp_folder);
   simplify_using_ranges m_simplifier;
@@ -1248,16 +1276,18 @@ private:
 // Main entry point for a FAST VRP pass using a dom ranger.
 
 unsigned int
-execute_fast_vrp (struct function *fun)
+execute_fast_vrp (struct function *fun, bool final_p)
 {
   calculate_dominance_info (CDI_DOMINATORS);
   dom_ranger dr;
-  fvrp_folder folder (&dr);
+  fvrp_folder folder (&dr, final_p);
 
   gcc_checking_assert (!fun->x_range_query);
   fun->x_range_query = &dr;
 
   folder.substitute_and_fold ();
+  if (folder.m_unreachable)
+folder.m_unreachable->remove ();
 
   fun->x_range_query = NULL;
   return 0;
@@ -1325,7 +1355,7 @@ public:
 {
 

[COMMITTED] Print "Global Exported" to dump_file from set_range_info.

2024-06-21 Thread Andrew MacLeod
I found that I was frequently writing the same hunk of code which checks 
the result of set_range_info() and prints the global range to the 
dump_file when it is updated.


This routine only returns true if the value provided improves the 
range, i.e., 'old_value' intersect 'new_value' != 'old_value'.
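
The contract described above can be sketched as follows. This is an illustration only: `interval` and `update_range` are hypothetical names standing in for GCC's value_range and set_range_info, and the sketch assumes the two intervals overlap.

```cpp
#include <algorithm>

// Hypothetical closed integer interval; GCC's real class is value_range.
struct interval
{
  long lo;
  long hi;
  bool operator== (const interval &o) const
  { return lo == o.lo && hi == o.hi; }
};

// Intersection of two overlapping intervals.
interval intersect (const interval &a, const interval &b)
{
  return { std::max (a.lo, b.lo), std::min (a.hi, b.hi) };
}

// Store the narrowed range and return true only when it differs from the
// range that was already recorded, i.e. old intersect new != old.
bool update_range (interval &stored, const interval &incoming)
{
  interval narrowed = intersect (stored, incoming);
  if (narrowed == stored)
    return false;
  stored = narrowed;
  return true;
}
```

A caller that only wants to log real improvements can print when `update_range` returns true and stay silent otherwise.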


set_range_info is called from within other passes occasionally, but it 
seems to me that it is worthwhile to print out that it happened in those 
passes as well.  There are many times I don't know where the global 
range got updated outside of VRP, but this will make it easy to find 
with a grep of the listings.  It does not seem to interfere with any 
testcases I found.



Bootstrapped on  x86_64-pc-linux-gnu with no regressions.   Pushed.

Andrew

From b7cff112b4a3ee950b22abaa2218485140e945bd Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 17 Jun 2024 16:07:16 -0400
Subject: [PATCH 3/5] Print "Global Exported" to dump_file from set_range_info.

	* gimple-range.cc (gimple_ranger::register_inferred_ranges): Do not
	dump global range info after set_range_info.
	(gimple_ranger::register_transitive_inferred_ranges): Likewise.
	(dom_ranger::range_of_stmt): Likewise.
	* tree-ssanames.cc (set_range_info): If global range info
	changes, maybe print new range to dump_file.
	* tree-vrp.cc (remove_unreachable::handle_early): Do not
	dump global range info after set_range_info.
	(remove_unreachable::remove): Likewise.
	(remove_unreachable::remove_and_update_globals): Likewise.
	(pass_assumptions::execute): Likewise.
---
 gcc/gimple-range.cc  | 60 
 gcc/tree-ssanames.cc | 42 ---
 gcc/tree-vrp.cc  | 43 +++
 3 files changed, 47 insertions(+), 98 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 4e507485f5e..50448ef81a2 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -495,15 +495,8 @@ gimple_ranger::register_inferred_ranges (gimple *s)
   if (lhs)
 {
   value_range tmp (TREE_TYPE (lhs));
-  if (range_of_stmt (tmp, s, lhs) && !tmp.varying_p ()
-	  && set_range_info (lhs, tmp) && dump_file)
-	{
-	  fprintf (dump_file, "Global Exported: ");
-	  print_generic_expr (dump_file, lhs, TDF_SLIM);
-	  fprintf (dump_file, " = ");
-	  tmp.dump (dump_file);
-	  fputc ('\n', dump_file);
-	}
+  if (range_of_stmt (tmp, s, lhs) && !tmp.varying_p ())
+	set_range_info (lhs, tmp);
 }
   m_cache.apply_inferred_ranges (s);
 }
@@ -562,38 +555,25 @@ gimple_ranger::register_transitive_inferred_ranges (basic_block bb)
 void
 gimple_ranger::export_global_ranges ()
 {
-  /* Cleared after the table header has been printed.  */
-  bool print_header = true;
+  if (dump_file)
+{
+  /* Print the header only when there's something else
+	 to print below.  */
+  fprintf (dump_file, "Exporting new  global ranges:\n");
+  fprintf (dump_file, "\n");
+}
   for (unsigned x = 1; x < num_ssa_names; x++)
 {
   tree name = ssa_name (x);
   if (!name)
 	continue;
   value_range r (TREE_TYPE (name));
-  if (name && !SSA_NAME_IN_FREE_LIST (name)
-	  && gimple_range_ssa_p (name)
-	  && m_cache.get_global_range (r, name)
-	  && !r.varying_p())
-	{
-	  bool updated = set_range_info (name, r);
-	  if (!updated || !dump_file)
-	continue;
-
-	  if (print_header)
-	{
-	  /* Print the header only when there's something else
-		 to print below.  */
-	  fprintf (dump_file, "Exported global range table:\n");
-	  fprintf (dump_file, "\n");
-	  print_header = false;
-	}
-
-	  print_generic_expr (dump_file, name , TDF_SLIM);
-	  fprintf (dump_file, "  : ");
-	  r.dump (dump_file);
-	  fprintf (dump_file, "\n");
-	}
+  if (name && !SSA_NAME_IN_FREE_LIST (name) && gimple_range_ssa_p (name)
+	  && m_cache.get_global_range (r, name) && !r.varying_p())
+	set_range_info (name, r);
 }
+  if (dump_file)
+fprintf (dump_file, "= Done =\n");
 }
 
 // Print the known table values to file F.
@@ -1069,16 +1049,8 @@ dom_ranger::range_of_stmt (vrange &r, gimple *s, tree name)
   // If there is a new calculated range and it is not varying, set
   // a global range.
   if (ret && name && m_global.merge_range (name, r) && !r.varying_p ())
-{
-  if (set_range_info (name, r) && dump_file)
-	{
-	  fprintf (dump_file, "Global Exported: ");
-	  print_generic_expr (dump_file, name, TDF_SLIM);
-	  fprintf (dump_file, " = ");
-	  r.dump (dump_file);
-	  fputc ('\n', dump_file);
-	}
-}
+set_range_info (name, r);
+
   if (idx)
 tracer.trailer (idx, " ", ret, name, r);
   return ret;
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 615d522d0b1..411ea848c49 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "tree-pass.h"
 #include "ssa.h"

[COMMITTED] Change fast VRP algorithm.

2024-06-21 Thread Andrew MacLeod
This patch changes the algorithm for fast VRP.  When looking at PR 
114855, although VRP is not the main culprit, it is taking about 600 
seconds out of the 10,000, far more than it should. That test case has 
about 400,000 basic blocks, and when we get into that size, some of the 
tables and such are just too massive.


I experimented with using the fast_vrp pass I had created last year, and 
although it was better (dropped to 100 seconds), it was clear that the 
mechanism I had chosen for fast_vrp was still failing miserably in this 
case.  So I revamped it.  This is the new fast-vrp algorithm.  It's 
actually simpler anyway.


It still reuses components of ranger, so there shouldn't be any 
correctness issues (right? :-).


This is how the new mechanism works.

- Global ranges are simply stored in a global range cache.
- Upon entry to each basic block, we pick up any contextual ranges that 
were active from the immediate dominator. (The first block would have none.)
- If the block is a single predecessor block, we also pick up all 
contextual ranges generated from that edge.  gori_on_edge provides the 
complete set.
- This is combined with the global range (or existing contextual range) 
to produce the new set of contextual ranges for the block.
- The current basic block has an ssa_lazy_cache object which contains all 
contextual ranges that are active, so picking up the range for an 
ssa-name is now simple.  It's either the contextual range if present, or 
the global range.
- When post_bb is processed, we free the contextual range cache for the 
block.
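
The steps above can be sketched roughly as below. This is not the code from the patch: `range`, `range_map`, `lookup` and `enter_block` are hypothetical names, ranges are modeled as plain closed intervals keyed by SSA name strings, and the sketch assumes intersecting ranges.

```cpp
#include <algorithm>
#include <climits>
#include <map>
#include <string>

// Hypothetical closed interval; GCC uses value_range / vrange instead.
struct range
{
  long lo;
  long hi;
};

// Range meaning "nothing is known" in this sketch.
const range varying = { LONG_MIN, LONG_MAX };

range intersect (const range &a, const range &b)
{
  range r = { std::max (a.lo, b.lo), std::min (a.hi, b.hi) };
  return r;
}

typedef std::map<std::string, range> range_map;

// Contextual range if present, otherwise the global range, otherwise varying.
range lookup (const range_map &global, const range_map &ctx,
              const std::string &name)
{
  range_map::const_iterator it = ctx.find (name);
  if (it != ctx.end ())
    return it->second;
  it = global.find (name);
  return it != global.end () ? it->second : varying;
}

// Build the contextual ranges on entry to a block: start from the immediate
// dominator's set (null for the first block), then refine with the ranges
// generated by the single incoming edge (null when there are several
// predecessors).
range_map enter_block (const range_map &global, const range_map *idom_ctx,
                       const range_map *edge_ranges)
{
  range_map ctx;
  if (idom_ctx)
    ctx = *idom_ctx;
  if (edge_ranges)
    for (range_map::const_iterator it = edge_ranges->begin ();
         it != edge_ranges->end (); ++it)
      {
        // Combine with the existing contextual or global range first, then
        // store, so the lookup never sees a half-initialized entry.
        range r = intersect (lookup (global, ctx, it->first), it->second);
        ctx[it->first] = r;
      }
  return ctx;
}
```

Freeing the per-block map once the block's dominated children are done corresponds to the post_bb step; in this sketch that is simply the `range_map` going out of scope.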


This runs that testcase in 7 seconds now, so it's a big improvement, and 
should use a lot less memory as we don't build export and dependency tables.


This version also turns on the relation_oracle, which allows for a lot 
of relation processing to also happen.


I have tested it with a patch which uses it for all 3 passes of VRP 
always, and this bootstraps successfully.  The original fast vrp ran 
about 38% faster on a bootstrap of GCC; this runs about 32% faster, 
but also includes relation processing which the previous version did 
not.  It doesn't get everything normal VRP gets, but that is to be 
expected... it still gets a lot. Eventually I'll spend another week and 
see if we can add inferred ranges or get any other low hanging fruit 
that is missed from the testsuite.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  (Of course, it 
isn't actually called anywhere yet).  Pushed.


Andrew


From 6bfd8f1b0ac116fb2cb1edb4a9dff7069257adb7 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 17 Jun 2024 11:32:51 -0400
Subject: [PATCH 2/5] Change fast VRP algorithm

Change the fast VRP algorithm to track contextual ranges active within
each basic block.

	* gimple-range.cc (dom_ranger::dom_ranger): Create a block
	vector.
	(dom_ranger::~dom_ranger): Dispose of the block vector.
	(dom_ranger::edge_range): Delete.
	(dom_ranger::range_on_edge): Combine range in src BB with any
	range gori_name_on_edge returns.
	(dom_ranger::range_in_bb): Combine global range with any active
	contextual range for an ssa-name.
	(dom_ranger::range_of_stmt): Fix non-ssa LHS case, use
	fur_depend for folding so relations can be registered.
	(dom_ranger::maybe_push_edge): Delete.
	(dom_ranger::pre_bb): Create incoming contextual range vector.
	(dom_ranger::post_bb): Free contextual range vector.
	* gimple-range.h (dom_ranger::edge_range): Delete.
	(dom_ranger::m_e0): Delete.
	(dom_ranger::m_e1): Delete.
	(dom_ranger::m_bb): New.
	(dom_ranger::m_pop_list): Delete.
	* tree-vrp.cc (execute_fast_vrp): Enable relation oracle.
---
 gcc/gimple-range.cc | 232 
 gcc/gimple-range.h  |   8 +-
 gcc/tree-vrp.cc |   2 +
 3 files changed, 90 insertions(+), 152 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index f3e4ec2d249..4e507485f5e 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -918,7 +918,15 @@ assume_query::dump (FILE *f)
 }
 
 // ---
-
+//
+// The DOM based ranger assumes a single DOM walk through the IL, and is
+// used by the fvrp_folder as a fast VRP.
+// During the dom walk, the current block has an ssa_lazy_cache pointer
+// m_bb[bb->index] which represents all the cumulative contextual ranges
+// active in the block.
+// These ranges are pure static ranges generated by branches, and must be
+// combined with the equivalent global range to produce the final range.
+// A NULL pointer means there are no contextual ranges.
 
 // Create a DOM based ranger for use by a DOM walk pass.
 
@@ -926,11 +934,8 @@ dom_ranger::dom_ranger () : m_global ()
 {
   m_freelist.create (0);
   m_freelist.truncate (0);
-  m_e0.create (0);
-  m_e0.safe_grow_cleared (last_basic_block_for_fn (cfun));
-  m_e1.create (0);
-  m_e1.safe_grow_cleared (last_basic_block_for_fn (cfun));
-  m_pop_list = BITMAP_ALLOC (NULL);
+  m_bb.creat

[PATCH] Add param for bb limit to invoke fast_vrp.

2024-06-21 Thread Andrew MacLeod

This patch adds

    --param=vrp-block-limit=N

When the basic block count for a function exceeds 'N', VRP is 
invoked with the new fast_vrp algorithm instead.  This algorithm uses a 
lot less memory and processing power, although it does miss a few 
things.


Primary motivation is cases like 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3 VRP 
passes consume about 600 seconds of the compile time, and a lot of 
memory.  With fast_vrp, it spends less than 10 seconds total in the 
3 passes of VRP. This test case has about 400,000 basic blocks.


The default for N in this patch is 150,000, arbitrarily chosen.

This bootstraps (and I bootstrapped it with --param=vrp-block-limit=0 
as well) on x86_64-pc-linux-gnu, with no regressions.
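
As a rough illustration of the selection rule, the sketch below models the decision in isolation. The names `vrp_kind` and `choose_vrp` are hypothetical; the actual patch performs this comparison inline using last_basic_block_for_fn and param_vrp_block_limit.

```cpp
// Hypothetical stand-in for the check added to the VRP pass's execute hook.
enum class vrp_kind { ranger, fast };

// Use the fast algorithm when the pass is already the fast VRP pass or when
// the function's basic block count exceeds the configured limit.
vrp_kind choose_vrp (int block_count, int block_limit, bool is_fast_pass)
{
  if (block_count > block_limit || is_fast_pass)
    return vrp_kind::fast;
  return vrp_kind::ranger;
}
```

With a limit of 0, any function with at least one block selects the fast algorithm, which is why bootstrapping with --param=vrp-block-limit=0 exercises it everywhere.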


What do you think, OK for trunk?

Andrew

PS: sorry, it doesn't help the threader in that PR :-(

From 3bb9bd3ca8038676e45b0bddcda91cbed7e51662 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 17 Jun 2024 11:38:46 -0400
Subject: [PATCH 4/5] Add param for bb limit to invoke fast_vrp.

If the basic block count is too high, simply use fast_vrp for all
VRP passes.

	gcc/doc/
	* invoke.texi (vrp-block-limit): Document.

	gcc/
	* params.opt (-param=vrp-block-limit): New.
	* tree-vrp.cc (fvrp_folder::execute): Invoke fast_vrp if block
	count exceeds limit.
---
 gcc/doc/invoke.texi | 3 +++
 gcc/params.opt  | 4 
 gcc/tree-vrp.cc | 4 ++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5d7a87fde86..f2f8f6334dc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16840,6 +16840,9 @@ this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vrp-block-limit
+Maximum number of basic blocks before VRP switches to a lower memory algorithm.
+
 @item vrp-sparse-threshold
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..c17ba17b91b 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vrp-block-limit=
+Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
+Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
+
 -param=vrp-sparse-threshold=
 Common Joined UInteger Var(param_vrp_sparse_threshold) Init(3000) Optimization Param
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 4fc33e63e7d..eef02146ec6 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -1330,9 +1330,9 @@ public:
   unsigned int execute (function *fun) final override
 {
   // Check for fast vrp.
-  if (&data == &pass_data_fast_vrp)
+  if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
+	  &data == &pass_data_fast_vrp)
 	return execute_fast_vrp (fun, final_p);
-
   return execute_ranger_vrp (fun, final_p);
 }
 
-- 
2.45.0



[Patch] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]

2024-06-21 Thread Tobias Burnus

Hi all,

it turned out that 'declare target' with 'link' clause was broken in multiple 
ways.

The main fix is the attached patch, namely pushing the variables to
the offload-vars list already in the FE.

When implementing it, I noticed:
* C has a similar issue when using nested functions, which is
  a GNU extension → https://gcc.gnu.org/115574

* When doing partial mapping of arrays (which is one of the reasons for 'link'),
  offsets are mishandled in Fortran (not tested in C); see FIXME in the patch.
  There, arr2(10) should print 10 but with map(arr2(10:)) it prints 19.
  (I will file a PR about this.)

* It might happen that linked variables do not get linked. I have not investigated
  why, but 'arr2' gives link errors – while 'arr' works.
  See FIXME in the patch. (I will file a PR about this.)

* For COMMON blocks, map(/common/) is rejected, https://gcc.gnu.org/PR115577

* When then mapping map(a,b,c), which is identical for 'common /mycom/ a,b,c',
  it fails to link the device side as the 'mycom_' symbol cannot be found on the
  device side.  (I will file a PR about this.)

As COMMON has issues, an alternative would be to defer the trans-common.cc
changes to a later patch.

Comments, questions, concerns?

Tobias

PS: Tested with nvptx offloading on a system supporting page migration, with
nvptx and GCN offloading configured; no new fails observed.
OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]

Contrary to a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the offload_vars in the front end.

Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprogram or the main
program, this can only happen either in that program itself or in an
internal subprogram. Whether a variable is just normally mapped or linked then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure then has to be marked as declare target.

	PR fortran/115559

gcc/fortran/ChangeLog:

	* trans-common.cc (build_common_decl): Add 'omp declare target' and
	'omp declare target link' variables to offload_vars.
	* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
	call decl_attributes.
	(get_proc_pointer_decl, gfc_get_extern_function_decl,
	build_function_decl): Update calls.
	(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
	to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: New test.

 gcc/fortran/trans-common.cc|  21 
 gcc/fortran/trans-decl.cc  |  81 +-
 .../libgomp.fortran/declare-target-link.f90| 119 +
 3 files changed, 195 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663..e714342c3c0 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
 	  = tree_cons (get_identifier ("omp declare target"),
 		   omp_clauses, DECL_ATTRIBUTES (decl));
 
+  if (com->omp_declare_target_link || com->omp_declare_target)
+	{
+	  /* Add to offload_vars; get_create does so for omp_declare_target,
+	 omp_declare_target_link requires manual work.  */
+	  gcc_assert (symtab_node::get (decl) == 0);
+	  symtab_node *node = symtab_node::get_create (decl);
+	  if (node != NULL && com->omp_declare_target_link)
+	{
+	  node->offloadable = 1;
+	  if (ENABLE_OFFLOADING)
+		{
+		  g->have_offload = true;
+		  if (is_a  (node))
+		vec_safe_push (offload_vars, decl);
+		}
+	}
+	}
+
   /* Place the back end declaration for this common block in
  GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 8d4f06a4e1d..4067dd6ed77 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -46,7 +46,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-stmt.h"
 #include "gomp-constants.h"
 #include "gimplify.h"
+#include "context.h"
 #include "omp-general.h"
+#include "omp-offload.h"
 #include "attr-fnspec.h"
 #include "tree-iterator.h"
 #include "dependency.h"
@@ -1470,19 +1472,18 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
 }
 
 
-static tree
-add_attributes_to_decl (symbol_attribute sym_attr, tree list)
+static void
+add_attributes_to_decl (tree *decl_p, const gfc_symbol *sym)
 {
   unsigned id;
-  tree attr;
+  tree list = NUL

Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]

2024-06-21 Thread Christophe Lyon
On Fri, 21 Jun 2024 at 12:14, Richard Earnshaw (lists)
 wrote:
>
> On 21/06/2024 08:57, Alexandre Oliva wrote:
> > On Jun 20, 2024, Christophe Lyon  wrote:
> >
> >> Maybe using
> >> if ((unsigned)b[i] >= BITS) \
> >> would be clearer?
> >
> > Heh.  Why make it simpler if we can make it unreadable, right? :-D
> >
> > Thanks, here's another version I've just retested on x-arm-eabi.  Ok?
> >
> > I'm not sure how to credit your suggestion.  It's not like you pretty
> > much wrote the entire patch, as in Richard's case, but it's still a
> > sizable chunk of this two-liner.  Any preferences?
>
> How about mentioning Christophe's simplification in the commit log?

For the avoidance of doubt: it's OK for me (but you don't need to
mention my name in fact ;-)

Thanks,

Christophe

> >
> >
> > The test was too optimistic, alas.  We used to vectorize shifts
> > involving 8-bit and 16-bit integral types by clamping the shift count
> > at the highest in-range shift count, but that was not correct: such
> > narrow shifts expect integral promotion, so larger shift counts should
> > be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> > before the fix).
>
> This is OK, but you might wish to revisit this statement before committing.  
> I think the above is a mis-summary of the original bug report which had a 
> test to pick between 0 and 1 as the result of a shift operation.
>
> If I've understood what's going on here correctly, then we have
>
> (int16_t)32768 >> (int16_t) 16
>
> but shift is always done at int precision, so this is (due to default 
> promotions)
>
> (int)(int16_t)32768 >> 16  // size/type of the shift amount does not matter.
>
> which then simplifies to
>
> -32768 >> 16;  // 0x8000 >> 16
>
> = -1;
>
> I think the original bug was that we were losing the cast to short (and hence 
> the sign extension of the intermediate value), so effectively we simplified 
> this to
>
> 32768 >> 16; // 0x8000 >> 16
>
> = 0;
>
> And the other part of the observation was that it had to be done this way 
> (and couldn't be narrowed for vectorization) because 16 is larger than the 
> maximum shift for a short (actually you say that just below).
>
> R.
>
> >
> > Unfortunately, in the gimple model of vector units, such large shift
> > counts wouldn't be well-defined, so we won't vectorize such shifts any
> > more, unless we can tell they're in range or undefined.
> >
> > So the test that expected the incorrect clamping we no longer perform
> > needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> > suggested annotating the test with the expected ranges so as to enable
> > the optimization.
> >
> >
> > Co-Authored-By: Richard Earnshaw 
> >
> > for  gcc/testsuite/ChangeLog
> >
> >   PR tree-optimization/113281
> >   * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> > ---
> >  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c 
> > b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > index 8c7adef9ed8f1..03078de49c65e 100644
> > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > @@ -9,6 +9,8 @@
> >void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * 
> > __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> >  int i;   \
> >  for (i=0; i<NB; i++) { \
> > +  if ((unsigned)b[i] >= (unsigned)(BITS))  \
> > + __builtin_unreachable();\
> >dest[i] = a[i] OP b[i];  
> >   \
> >  }  
> >   \
> >  }
> >
> >
>
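
The promotion argument quoted above can be checked with a small sketch. The function names are hypothetical, and the result for the negative value assumes an arithmetic right shift, which is what GCC implements (and what C++20 requires).

```cpp
#include <cstdint>

// The shift is performed at int precision: the int16_t operand is
// sign-extended to int before shifting.
int shift_promoted (int16_t value, int count)
{
  return value >> count;
}

// The faulty simplification described above: the sign extension is lost,
// so the operand is treated as the unsigned 16-bit pattern.
int shift_without_sign_extension (int16_t value, int count)
{
  return static_cast<uint16_t> (value) >> count;
}
```

For the value -32768 and a shift count of 16, the first function sees 0xffff8000 as its promoted operand while the second sees 0x00008000, which is the difference between the two results discussed in the thread.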


Re: [PATCH v3 6/6] aarch64: Add DLL import/export to AArch64 target

2024-06-21 Thread Richard Sandiford
Evgeny Karpov  writes:
> Monday, June 10, 2024 7:03 PM
> Richard Sandiford  wrote:
>
>> Thanks for the update.  Parts 1-5 look good to me.  Some minor comments
>> below about part 6:
>> 
>> If the TARGET_DLLIMPORT_DECL_ATTRIBUTES condition can be dropped, the
>> series is OK from my POV with that change and with the changes above.
>> Please get sign-off from an x86 maintainer too though.
>
> Thank you for the review and suggestions. Here is the updated version of 
> patch 6, based on the comments.
> The x86 and mingw maintainers have already approved the series.
>
> Regards,
> Evgeny 
>
>
>
> This patch reuses the MinGW implementation to enable DLL import/export
> functionality for the aarch64-w64-mingw32 target. It also modifies
> environment configurations for MinGW.
>
> gcc/ChangeLog:
>
>   * config.gcc: Add winnt-dll.o, which contains the DLL
>   import/export implementation.
>   * config/aarch64/aarch64.cc (aarch64_legitimize_pe_coff_symbol):
>   Add a conditional function that reuses the MinGW implementation
>   for COFF and does nothing otherwise.
>   (aarch64_expand_call): Add dllimport implementation.
>   (aarch64_legitimize_address): Likewise.
>   * config/aarch64/cygming.h (SYMBOL_FLAG_DLLIMPORT): Modify MinGW
>   environment to support DLL import/export.
>   (SYMBOL_FLAG_DLLEXPORT): Likewise.
>   (SYMBOL_REF_DLLIMPORT_P): Likewise.
>   (SYMBOL_FLAG_STUBVAR): Likewise.
>   (SYMBOL_REF_STUBVAR_P): Likewise.
>   (TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
>   (TARGET_ASM_FILE_END): Likewise.
>   (SUB_TARGET_RECORD_STUB): Likewise.
>   (GOT_ALIAS_SET): Likewise.
>   (PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Likewise.
>   (HAVE_64BIT_POINTERS): Likewise.

OK, thanks.  If you'd like commit access, please follow the instructions
on https://gcc.gnu.org/gitwrite.html , listing me as sponsor.

Richard.

> ---
>  gcc/config.gcc|  4 +++-
>  gcc/config/aarch64/aarch64.cc | 26 ++
>  gcc/config/aarch64/cygming.h  | 26 --
>  3 files changed, 53 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d053b98efa8..331285b7b6d 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1276,10 +1276,12 @@ aarch64-*-mingw*)
>   tm_file="${tm_file} mingw/mingw32.h"
>   tm_file="${tm_file} mingw/mingw-stdint.h"
>   tm_file="${tm_file} mingw/winnt.h"
> + tm_file="${tm_file} mingw/winnt-dll.h"
>   tmake_file="${tmake_file} aarch64/t-aarch64"
>   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> + target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
>   extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
> - extra_objs="${extra_objs} winnt.o"
> + extra_objs="${extra_objs} winnt.o winnt-dll.o"
>   c_target_objs="${c_target_objs} msformat-c.o"
>   d_target_objs="${d_target_objs} winnt-d.o"
>   tmake_file="${tmake_file} mingw/t-cygming"
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 3418e57218f..32e31e08449 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -860,6 +860,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
>{ "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
>{ "SVE type",3, 3, false, true,  false, true,  NULL, NULL 
> },
>{ "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
> +#if TARGET_DLLIMPORT_DECL_ATTRIBUTES
> +  { "dllimport", 0, 0, false, false, false, false, handle_dll_attribute, 
> NULL },
> +  { "dllexport", 0, 0, false, false, false, false, handle_dll_attribute, 
> NULL },
> +#endif
>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>SUBTARGET_ATTRIBUTE_TABLE
>  #endif
> @@ -2865,6 +2869,15 @@ static void
>  aarch64_load_symref_appropriately (rtx dest, rtx imm,
>  enum aarch64_symbol_type type)
>  {
> +#if TARGET_PECOFF
> +  rtx tmp = legitimize_pe_coff_symbol (imm, true);
> +  if (tmp)
> +{
> +  emit_insn (gen_rtx_SET (dest, tmp));
> +  return;
> +}
> +#endif
> +
>switch (type)
>  {
>  case SYMBOL_SMALL_ABSOLUTE:
> @@ -11233,6 +11246,13 @@ aarch64_expand_call (rtx result, rtx mem, rtx 
> cookie, bool sibcall)
>  
>gcc_assert (MEM_P (mem));
>callee = XEXP (mem, 0);
> +
> +#if TARGET_PECOFF
> +  tmp = legitimize_pe_coff_symbol (callee, false);
> +  if (tmp)
> +callee = tmp;
> +#endif
> +
>mode = GET_MODE (callee);
>gcc_assert (mode == Pmode);
>  
> @@ -12709,6 +12729,12 @@ aarch64_anchor_offset (HOST_WIDE_INT offset, 
> HOST_WIDE_INT size,
>  static rtx
>  aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
>  {
> +#if TARGET_PECOFF
> +  rtx tmp = legitimize_pe_coff_symbol (x, true);
> +  if (tmp)
> +return tmp;
> +#endif
> +
>/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(

Re: [PATCH] Build: Set gcc_cv_as_mips_explicit_relocs if gcc_cv_as_mips_explicit_relocs_pcrel

2024-06-21 Thread Maciej W. Rozycki
On Fri, 21 Jun 2024, Maciej W. Rozycki wrote:

> > Yeah, agreed FWIW.  This was necessary while the feature was relatively
> > new, and while we still supported IRIX as, but I can't see any reasonable
> > justification for using such an ancient binutils with modern GCC.
> > 
> > Getting rid of -mno-explicit-relocs altogether might simplify things.
> 
>  FWIW I tend to agree too, although I think the current mess has to be 
> fixed first (and backported to the release branches) before going forward 
> with the removal.

 And FAOD I think a stub check has to remain even after the removal and 
just cause `configure' to bail out if an unsupported obsolete version of 
GAS has been identified.

  Maciej


Re: [PATCH] Build: Set gcc_cv_as_mips_explicit_relocs if gcc_cv_as_mips_explicit_relocs_pcrel

2024-06-21 Thread YunQiang Su
Maciej W. Rozycki wrote on Fri, Jun 21, 2024 at 20:55:
>
> On Fri, 21 Jun 2024, Richard Sandiford wrote:
>
> > > We check gcc_cv_as_mips_explicit_relocs if 
> > > gcc_cv_as_mips_explicit_relocs_pcrel
> > > only, while gcc_cv_as_mips_explicit_relocs is used by later code.
> > >
> > > > Maybe, it is time for us to set gcc_cv_as_mips_explicit_relocs always
> > > now,
> > > as it has been in Binutils for more than 20 years.
> >
> > Yeah, agreed FWIW.  This was necessary while the feature was relatively
> > new, and while we still supported IRIX as, but I can't see any reasonable
> > justification for using such an ancient binutils with modern GCC.
> >
> > Getting rid of -mno-explicit-relocs altogether might simplify things.
>
>  FWIW I tend to agree too, although I think the current mess has to be
> fixed first (and backported to the release branches) before going forward
> with the removal.
>

Sure.

>  And AFAICT the proposed change is the wrong one: it has to be analysed
> how we came at the current breakage and then the state reproducing how it
> used to work before recreated.
>
>  Perhaps we need to check for general explicit reloc support first, before
> following with PC-relative relocs.  It seems natural to me this way,
> because you can't have support for PC-relative relocs (narrower scope)
> unless you have general explicit reloc support (wider scope) in the first
> place, so I wonder why we came up with what we have now.
>

I think we can assume that each of these stages (some future one, pcrel, base)
is a strict superset of the one before it.

So we can probe for the newest one: if it is supported, all older ones are
available too.  If we check the oldest one first, we will have some trouble with
AC_DEFINE, as we may emit multiple "#define MIPS_EXPLICIT_RELOCS" lines.

>   Maciej


Re: [PATCH] Build: Set gcc_cv_as_mips_explicit_relocs if gcc_cv_as_mips_explicit_relocs_pcrel

2024-06-21 Thread YunQiang Su
Maciej W. Rozycki wrote on Fri, Jun 21, 2024 at 22:00:
>
> On Fri, 21 Jun 2024, Maciej W. Rozycki wrote:
>
> > > Yeah, agreed FWIW.  This was necessary while the feature was relatively
> > > new, and while we still supported IRIX as, but I can't see any reasonable
> > > justification for using such an ancient binutils with modern GCC.
> > >
> > > Getting rid of -mno-explicit-relocs altogether might simplify things.
> >
> >  FWIW I tend to agree too, although I think the current mess has to be
> > fixed first (and backported to the release branches) before going forward
> > with the removal.
>
>  And FAOD I think a stub check has to remain even after the removal and
> just cause `configure' to bail out if an unsupported obsolete version of
> GAS has been identified.
>

Sure.  It would also be useful to emit an error if we cannot find the MIPS binutils.
In fact, I sometimes run into problems when I forget to install the MIPS binutils first.

>   Maciej


Re: [PATCH] Build: Set gcc_cv_as_mips_explicit_relocs if gcc_cv_as_mips_explicit_relocs_pcrel

2024-06-21 Thread YunQiang Su
> >
> >  And FAOD I think a stub check has to remain even after the removal and
> > just cause `configure' to bail out if an unsupported obsolete version of
> > GAS has been identified.
> >

Oh, I think that we shouldn't remove it now, as I have already figured out
the PCREL patch, and I am still waiting for your response on the PCREL
support in Binutils.

My plan is to submit my GCC patch once Binutils is ready.
I don't want to rewrite it.

After that, we can remove all no_explicit_relocs support.  I mean that I
plan to remove all the `TARGET_EXPLICIT_RELOCS` macro related code in
mips.cc/mips.h/mips.md etc.


Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-21 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 20 Jun 2024, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Mon, 17 Jun 2024, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > On Fri, 14 Jun 2024, Richard Biener wrote:
>> >> >
>> >> >> On Fri, 14 Jun 2024, Richard Sandiford wrote:
>> >> >> 
>> >> >> > Richard Biener  writes:
>> >> >> > > On Fri, 14 Jun 2024, Richard Sandiford wrote:
>> >> >> > >
>> >> >> > >> Richard Biener  writes:
>> >> >> > >> > The following retires vcond{,u,eq} optabs by stopping to use 
>> >> >> > >> > them
>> >> >> > >> > from the middle-end.  Targets instead (should) implement 
>> >> >> > >> > vcond_mask
>> >> >> > >> > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
>> >> >> > >> > possibly affected targets - those implementing these patterns,
>> >> >> > >> > and in particular it lists mips, sparc and ia64 as targets that
>> >> >> > >> > most definitely will regress while others might simply remove
>> >> >> > >> > their vcond{,u,eq} patterns.
>> >> >> > >> >
>> >> >> > >> > I'd appreciate testing, I do not expect fallout for x86 or 
>> >> >> > >> > arm/aarch64.
>> >> >> > >> > I know riscv doesn't implement any of the legacy optabs.  But 
>> >> >> > >> > less
>> >> >> > >> > maintained vector targets might need adjustments.
>> >> >> > >> >
>> >> >> > >> > I want to get rid of those optabs for GCC 15.  If I don't hear 
>> >> >> > >> > from
>> >> >> > >> > you I will assume your target is fine.
>> >> >> > >> 
>> >> >> > >> Great!  Thanks for doing this.
>> >> >> > >> 
>> >> >> > >> Is there a plan for how we should handle vector comparisons that
>> >> >> > >> have to be done as the inverse of the negated condition?  Should
>> >> >> > >> targets simply not provide vec_cmp for such conditions and leave
>> >> >> > >> the target-independent code to deal with the fallout?  (For a
>> >> >> > >> standalone comparison, it would invert the result.  For a 
>> >> >> > >> VEC_COND_EXPR
>> >> >> > >> it would swap the true and false values.)
>> >> >> > >
>> >> >> > > I would expect that the ISEL pass which currently deals with 
>> >> >> > > finding
>> >> >> > > valid combos of .VCMP{,U,EQ} and .VCOND_MASK deals with this.
>> >> >> > > So how do we deal with this right now?  I expect RTL expansion will
>> >> >> > > do the inverse trick, no?
>> >> >> > 
>> >> >> > I think in practice (at least for the targets I've worked on),
>> >> >> > the target's vec_cmp handles the inversion itself.  Thus the
>> >> >> > main optimisation done by targets' vcond patterns is to avoid
>> >> >> > the inversion (and instead swap the true/false values) when the
>> >> >> > "opposite" comparison is the native one.
>> >> >> 
>> >> >> I see.  I suppose whether or not vec_cmp is handled is determined
>> >> >> by a FAIL so it's somewhat difficult to determine this at ISEL time.
>> >> 
>> >> In principle we could say that the predicates should accept only the
>> >> conditions that can be done natively.  Then target-independent code
>> >> can apply the usual approaches to generating other conditions
>> >> (which tend to be replicated across targets anyway).
>> >
>> > Ah yeah, I suppose that would work.  So we'd update the docs
>> > to say predicates are required to reject not handled compares
>> > and otherwise the expander may not FAIL?
>> >
>> > I'll note that expand_vec_cmp_expr_p already looks at the insn
>> > predicates, so adjusting vector lowering (and vectorization) to
>> > emit only recognized compares (and requiring folding to keep it at that)
>> > should be possible.
>> >
>> > ISEL would then mainly need to learn the trick of swapping vector
>> > cond arms on inverted masks.  OTOH folding should also do that.
>> 
>> Yeah.
>> 
>> > Or do you suggest to allow all compares on GIMPLE and only fixup
>> > during ISEL?  How do we handle vector lowering then?  Would it be
>> > enough to require "any" condition code and thus we expect targets
>> > to implement enough codes so all compares can be handled by
>> > swapping/inversion?
>> 
>> I'm not sure TBH.  I can see the argument that "canonicalising"
>> conditions for the target could be either vector lowering or ISEL.
>> 
>> If a target can only do == or != natively, for instance (is any target
>> like that?), then I think it should be ok for the predicates to accept
>> only that condition.  Then the opposite != or == could be done using
>> vector lowering/ISEL, but ordered comparisons would need to be lowered
>> as though vec_cmp wasn't implemented at all.
>> 
>> Something similar probably applies to FP comparisons if the handling
>> of unordered comparisons is limited.
>> 
>> And if we do that, it might be easier for vector lowering to handle
>> everything itself, rather than try to predict what ISEL is going to do.
>
> I agree that as we have to handle completely unsupported cases in
> vector lowering anyway it's reasonable to try to force only supported
> ops after that.
>
> Note that when targets stop to advertise not supported compa
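
The inversion trick discussed in this thread can be sketched on plain
arrays.  This is only a toy model, assuming a hypothetical target whose
sole native vector comparison is EQ; the names Vec, Mask, cmp_eq,
cmp_ne, vcond_mask and select_ne are made up for illustration and are
not GCC interfaces.  A standalone NE inverts the EQ mask, while a
select on NE can reuse the EQ mask and swap the two arms instead.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;
using Vec = std::array<std::int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// The only comparison the hypothetical target supports natively.
Mask cmp_eq(const Vec &a, const Vec &b) {
  Mask m{};
  for (std::size_t i = 0; i < kLanes; ++i) m[i] = a[i] == b[i];
  return m;
}

// Lane-wise select: m ? t : f.
Vec vcond_mask(const Mask &m, const Vec &t, const Vec &f) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = m[i] ? t[i] : f[i];
  return r;
}

// Standalone "a != b": native EQ followed by an explicit inversion.
Mask cmp_ne(const Vec &a, const Vec &b) {
  Mask m = cmp_eq(a, b);
  for (auto &lane : m) lane = !lane;
  return m;
}

// "(a != b) ? t : f" without any inversion: keep EQ, swap the arms.
Vec select_ne(const Vec &a, const Vec &b, const Vec &t, const Vec &f) {
  return vcond_mask(cmp_eq(a, b), f, t);
}
```

Both routes give the same lanes; the second one simply avoids the extra
inversion step, which is the optimisation the vcond patterns were said
to perform.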

Re: [PATCH 2/6] rtl-ssa: Don't cost no-op moves

2024-06-21 Thread Jeff Law




On 6/20/24 7:34 AM, Richard Sandiford wrote:

No-op moves are given the code NOOP_MOVE_INSN_CODE if we plan
to delete them later.  Such insns shouldn't be costed, partly
because they're going to disappear, and partly because targets
won't recognise the insn code.

gcc/
* rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Don't
cost no-op moves.
* rtl-ssa/insns.cc (insn_info::calculate_cost): Likewise.
This is OK.  Your call if you want to include it now or wait for the 
full series to be ACK'd.


jeff



Re: [PATCH 5/6] xstormy16: Fix xs_hi_nonmemory_operand

2024-06-21 Thread Jeff Law




On 6/20/24 7:34 AM, Richard Sandiford wrote:

All uses of xs_hi_nonmemory_operand allow constraint "i",
which means that they allow consts, symbol_refs and label_refs.
The definition of xs_hi_nonmemory_operand accounted for consts,
but not for symbol_refs and label_refs.

gcc/
* config/stormy16/predicates.md (xs_hi_nonmemory_operand): Handle
symbol_ref and label_ref.

OK for the trunk anytime.
jeff



Re: [PATCH 3/6] iq2000: Fix test and branch instructions

2024-06-21 Thread Jeff Law




On 6/20/24 7:34 AM, Richard Sandiford wrote:

The iq2000 test and branch instructions had patterns like:

   [(set (pc)
(if_then_else
 (eq (and:SI (match_operand:SI 0 "register_operand" "r")
 (match_operand:SI 1 "power_of_2_operand" "I"))
  (const_int 0))
 (match_operand 2 "pc_or_label_operand" "")
 (match_operand 3 "pc_or_label_operand" "")))]

power_of_2_operand allows any 32-bit power of 2, whereas "I" only
accepts 16-bit signed constants.  This meant that any power of 2
greater than 32768 would cause an "insn does not satisfy its
constraints" ICE.

Also, the %p operand modifier barfed on 1<<31, which is sign-
rather than zero-extended to 64 bits.  The code is inherently
limited to 32-bit operands -- power_of_2_operand contains a test
involving "unsigned" -- so this patch just ands with 0x.
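
The sign-extension problem can be reproduced outside the compiler.  The
sketch below is not the iq2000 code: exact_log2_u64, sign_extended_bit31
and bit_index_32 are hypothetical helpers, and it assumes the mask keeps
exactly the low 32 bits.  It shows that 1<<31, once sign-extended into a
64-bit host integer, is no longer a power of 2 until the upper bits are
masked off.

```cpp
#include <cstdint>

// Hypothetical helper: log2 of VALUE if it is a power of two, else -1.
int exact_log2_u64(std::uint64_t value) {
  if (value == 0 || (value & (value - 1)) != 0) return -1;
  int bit = 0;
  while ((value >>= 1) != 0) ++bit;
  return bit;
}

// The 32-bit constant 1<<31 as it looks after sign extension to 64 bits.
std::int64_t sign_extended_bit31() {
  return static_cast<std::int64_t>(INT32_MIN);
}

// Bit index computed after keeping only the low 32 bits (assumed mask).
int bit_index_32(std::int64_t hwi) {
  return exact_log2_u64(static_cast<std::uint64_t>(hwi) & 0xffffffffu);
}
```

Without the mask the helper rejects the value; with it, bit 31 is found
as expected.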

gcc/
* config/iq2000/iq2000.cc (iq2000_print_operand): Make %p handle 1<<31.
* config/iq2000/iq2000.md: Remove "I" constraints on
power_of_2_operands.

OK for the trunk.
jeff



Re: [PATCH 1/6] rtl-ssa: Rework _ignoring interfaces

2024-06-21 Thread Jeff Law




On 6/20/24 7:34 AM, Richard Sandiford wrote:

rtl-ssa has routines for scanning forwards or backwards for something
under the control of an exclusion set.  These searches are currently
used for two main things:

- to work out where an instruction can be moved within its EBB
- to work out whether recog can add a new hard register clobber

The exclusion set was originally a callback function that returned
true for insns that should be ignored.  However, for the late-combine
work, I'd also like to be able to skip an entire definition, along
with all its uses.

This patch prepares for that by turning the exclusion set into an
object that provides predicate member functions.  Currently the
only two member functions are:

- should_ignore_insn: what the old callback did
- should_ignore_def: the new functionality

but more could be added later.
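
As a rough illustration of the shape described above (not the actual
rtl-ssa declarations), the exclusion set can be thought of as a small
policy object passed to a templated scan.  Everything in the sketch
namespace below, including Insn, Def, last_def, ignore_one_insn and
ignore_one_def, is a toy stand-in invented for this example; only the
member names should_ignore_insn and should_ignore_def, and the name
ignore_nothing, are taken from the patch description.

```cpp
#include <vector>

namespace sketch {

// Toy stand-ins for an instruction and a register definition.
struct Insn { int uid; };
struct Def { int regno; const Insn *insn; };

// Policy that excludes nothing.
struct ignore_nothing {
  bool should_ignore_insn(const Insn *) const { return false; }
  bool should_ignore_def(const Def *) const { return false; }
};

// Policy that excludes one instruction (what the old callback could do).
struct ignore_one_insn {
  const Insn *skipped;
  bool should_ignore_insn(const Insn *insn) const { return insn == skipped; }
  bool should_ignore_def(const Def *) const { return false; }
};

// Policy that excludes one whole definition (the new capability).
struct ignore_one_def {
  const Def *skipped;
  bool should_ignore_insn(const Insn *) const { return false; }
  bool should_ignore_def(const Def *def) const { return def == skipped; }
};

// Scan backwards for the last definition of REGNO that is not excluded.
template <typename IgnorePredicates>
const Def *last_def(const std::vector<Def> &defs, int regno,
                    IgnorePredicates ignore) {
  for (auto it = defs.rbegin(); it != defs.rend(); ++it) {
    if (it->regno != regno) continue;
    if (ignore.should_ignore_def(&*it) ||
        ignore.should_ignore_insn(it->insn))
      continue;
    return &*it;
  }
  return nullptr;
}

}  // namespace sketch
```

Because the scan only relies on the two member functions, a further
predicate could be added to the policy later without changing callers
that do not care about it.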

Doing this also makes it easy to remove some asymmetry that I think
in hindsight was a mistake: in forward scans, ignoring an insn meant
ignoring all definitions in that insn (ok) and all uses of those
definitions (non-obvious).  The new interface makes it possible
to select the required behaviour, with that behaviour being applied
consistently in both directions.

Now that the exclusion set is a dedicated object, rather than
just a "random" function, I think it makes sense to remove the
_ignoring suffix from the function names.  The suffix was originally
there to describe the callback, and in particular to emphasise that
a true return meant "ignore" rather than "heed".

gcc/
* rtl-ssa.h: Include predicates.h.
* rtl-ssa/predicates.h: New file.
* rtl-ssa/access-utils.h (prev_call_clobbers_ignoring): Rename to...
(prev_call_clobbers): ...this and treat the ignore parameter as an
object with the same interface as ignore_nothing.
(next_call_clobbers_ignoring): Rename to...
(next_call_clobbers): ...this and treat the ignore parameter as an
object with the same interface as ignore_nothing.
(first_nondebug_insn_use_ignoring): Rename to...
(first_nondebug_insn_use): ...this and treat the ignore parameter as
an object with the same interface as ignore_nothing.
(last_nondebug_insn_use_ignoring): Rename to...
(last_nondebug_insn_use): ...this and treat the ignore parameter as
an object with the same interface as ignore_nothing.
(last_access_ignoring): Rename to...
(last_access): ...this and treat the ignore parameter as an object
with the same interface as ignore_nothing.  Conditionally skip
definitions.
(prev_access_ignoring): Rename to...
(prev_access): ...this and treat the ignore parameter as an object
with the same interface as ignore_nothing.
(first_def_ignoring): Replace with...
(first_access): ...this new function.
(next_access_ignoring): Rename to...
(next_access): ...this and treat the ignore parameter as an object
with the same interface as ignore_nothing.  Conditionally skip
definitions.
* rtl-ssa/change-utils.h (insn_is_changing): Delete.
(restrict_movement_ignoring): Rename to...
(restrict_movement): ...this and treat the ignore parameter as an
object with the same interface as ignore_nothing.
(recog_ignoring): Rename to...
(recog): ...this and treat the ignore parameter as an object with
the same interface as ignore_nothing.
* rtl-ssa/changes.h (insn_is_changing_closure): Delete.
* rtl-ssa/functions.h (function_info::add_regno_clobber): Treat
the ignore parameter as an object with the same interface as
ignore_nothing.
* rtl-ssa/insn-utils.h (insn_is): Delete.
* rtl-ssa/insns.h (insn_is_closure): Delete.
* rtl-ssa/member-fns.inl
(insn_is_changing_closure::insn_is_changing_closure): Delete.
(insn_is_changing_closure::operator()): Likewise.
(function_info::add_regno_clobber): Treat the ignore parameter
as an object with the same interface as ignore_nothing.
(ignore_changing_insns::ignore_changing_insns): New function.
(ignore_changing_insns::should_ignore_insn): Likewise.
* rtl-ssa/movement.h (restrict_movement_for_dead_range): Treat
the ignore parameter as an object with the same interface as
ignore_nothing.
(restrict_movement_for_defs_ignoring): Rename to...
(restrict_movement_for_defs): ...this and treat the ignore parameter
as an object with the same interface as ignore_nothing.
(restrict_movement_for_uses_ignoring): Rename to...
(restrict_movement_for_uses): ...this and treat the ignore parameter
as an object with the same interface as ignore_nothing.  Conditionally
skip definitions.
* doc/rtl.texi: Update for above name changes.  Use
ignore_changing_insns instead of insn_is_changing.
* conf

RE: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

2024-06-21 Thread Li, Pan2
Thanks Richard for the suggestion.  I tried the (convert? form with the gimple
stmt below but got a missing-definition ICE.
To double-confirm: *type_out should be the vector type of the lhs, and we only
need to build one cvt stmt from itype to otype here?  Or should we just return
the call directly and set type_out to v_otype?

static gimple *
vect_recog_build_binary_gimple_stmt (vec_info *vinfo, gimple *stmt,
 internal_fn fn, tree *type_out,
 tree lhs, tree op_0, tree op_1)
{
  tree itype = TREE_TYPE (op_0);
  tree otype = TREE_TYPE (lhs);
  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);

  if (v_itype != NULL_TREE && v_otype != NULL_TREE
&& direct_internal_fn_supported_p (fn, v_itype, OPTIMIZE_FOR_BOTH))
{
  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
  tree itype_ssa = vect_recog_temp_ssa_var (itype, NULL);

  gimple_call_set_lhs (call, itype_ssa);
  gimple_call_set_nothrow (call, /* nothrow_p */ false);
  gimple_set_location (call, gimple_location (stmt));

  *type_out = v_otype;
  gimple *new_stmt = call;

  if (itype != otype)
{
  tree otype_ssa = vect_recog_temp_ssa_var (otype, NULL);
  new_stmt = gimple_build_assign (otype_ssa, CONVERT_EXPR, itype_ssa);
}

  return new_stmt;
}

  return NULL;
}

-cut the ice---

zip.test.c: In function ‘test’:
zip.test.c:4:6: error: missing definition
4 | void test (uint16_t *x, unsigned b, unsigned n)
  |  ^~~~
for SSA_NAME: patt_40 in statement:
vect_cst__151 = [vec_duplicate_expr] patt_40;
during GIMPLE pass: vect
dump file: zip.test.c.180t.vect
zip.test.c:4:6: internal compiler error: verify_ssa failed
0x1de0860 verify_ssa(bool, bool)

/home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
0x1919f69 execute_function_todo

/home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
0x1918b46 do_per_function

/home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
0x191a116 execute_todo

Pan


-Original Message-
From: Richard Biener  
Sent: Friday, June 21, 2024 5:29 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB

On Fri, Jun 21, 2024 at 10:50 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > to match this by changing it to
>
> > /* Unsigned saturation sub, case 2 (branch with ge):
> >SAT_U_SUB = X >= Y ? X - Y : 0.  */
> > (match (unsigned_integer_sat_sub @0 @1)
> > (cond^ (ge @0 @1) (convert? (minus @0 @1)) integer_zerop)
> >  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> >   && types_match (type, @0, @1
>
> Do we need another name for this match?  Adding (convert? here may change the
> semantics of .SAT_SUB.
> When we call gimple_unsigned_integer_sat_sub (lhs, ops, NULL), the converted
> value that is returned may differ
> from the (minus @0 @1).  Please correct me if my understanding is wrong.

I think gimple_unsigned_integer_sat_sub (lhs, ...) simply matches
(typeof LHS).SAT_SUB (ops[0], ops[1]) now, I don't think it's necessary to
handle the case where typeof LHS and typeof ops[0] are equal specially?
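
At the scalar level the equivalence being relied on can be checked with
a small model.  This is only a sketch with made-up function names
(sat_sub_u32, truncated_in_arm, truncated_after), assuming 32-bit
operands and a 16-bit result as in the zip example; it says nothing
about how the vectorizer pattern has to be built.

```cpp
#include <cstdint>

// Scalar model of unsigned .SAT_SUB computed in the operand type.
std::uint32_t sat_sub_u32(std::uint32_t x, std::uint32_t y) {
  return x >= y ? x - y : 0u;
}

// Shape seen after ifcvt: the conversion sits inside the selected arm.
std::uint16_t truncated_in_arm(std::uint32_t a, std::uint32_t b) {
  return a >= b ? static_cast<std::uint16_t>(a - b)
                : static_cast<std::uint16_t>(0);
}

// Shape after matching: .SAT_SUB in 32 bits, then a single conversion.
std::uint16_t truncated_after(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint16_t>(sat_sub_u32(a, b));
}
```

Note that this is a truncation of the wide result, not a 16-bit
saturating subtraction of truncated inputs, which is why the conversion
has to stay outside the .SAT_SUB call.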

> > and when using the gimple_match_* function make sure to consider
> > that the .SAT_SUB (@0, @1) is converted to the type of the SSA name
> > we matched?
>
> This may be a problem for the vector part I guess; it required some additional
> changes in vectorize_convert when
> I tried to do that previously.  Let me double-check it and keep you
> posted.

You are using gimple_unsigned_integer_sat_sub from pattern recognition, the
thing to do is simply to add a conversion stmt to the pattern sequence in case
the types differ?

But maybe I'm missing something.

Richard.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Friday, June 21, 2024 3:00 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB
>
> On Fri, Jun 21, 2024 at 5:53 AM  wrote:
> >
> > From: Pan Li 
> >
> > The zip benchmark of coremark-pro have one SAT_SUB like pattern but
> > truncated as below:
> >
> > void test (uint16_t *x, unsigned b, unsigned n)
> > {
> >   unsigned a = 0;
> >   register uint16_t *p = x;
> >
> >   do {
> > a = *--p;
> > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate the result of SAT_SUB
> >   } while (--n);
> > }
> >
> > It will have gimple after ifcvt pass,  it cannot hit any pattern of
> > SAT_SUB and then cannot vectorize to SAT_SUB.
> >
> > _2 = a_11 - b_12(D);
> > iftmp.0_13 = (short unsigned int) _2;
> > _18 = a_11 >= b_12(D);
> > iftmp.0_5 = _18 ? iftmp.0_13 :

[PATCH] RISC-V: Fix unrecognizable pattern in riscv_expand_conditional_move()

2024-06-21 Thread Artemiy Volkov
Presently, the code fragment:

int x[5];

void
d(int a, int b, int c) {
  for (int i = 0; i < 5; i++)
x[i] = (a != b) ? c : a;
}

causes an ICE when compiled with -O2 -march=rv32i_zicond:

test.c: In function 'd':
test.c: error: unrecognizable insn:
   11 | }
  | ^
(insn 8 5 9 2 (set (reg:SI 139 [ iftmp.0_2 ])
(if_then_else:SI (ne:SI (reg/v:SI 136 [ a ])
(reg/v:SI 137 [ b ]))
(reg/v:SI 136 [ a ])
(reg/v:SI 138 [ c ]))) -1
 (nil))
during RTL pass: vregs

This happens because, as part of one of the optimizations in
riscv_expand_conditional_move(), an if_then_else is generated with both
comparands being register operands, resulting in an unmatchable insn since
Zicond patterns require constant 0 as the second comparand.  Fix this by adding
an extra check before performing this optimization.
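
For readers less familiar with Zicond, the restriction is easier to see
with a scalar model of the two instructions.  The sketch below is not
what riscv_expand_conditional_move emits; czero_eqz, czero_nez and
select_ne are illustrative helpers that follow the documented
czero.eqz/czero.nez semantics, and the XOR step is just one assumed way
of reducing a register-register comparison to a test against zero.

```cpp
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.
std::uint32_t czero_eqz(std::uint32_t rs1, std::uint32_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}

// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.
std::uint32_t czero_nez(std::uint32_t rs1, std::uint32_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}

// (a != b) ? c : a, with the comparison first reduced to "diff != 0".
std::uint32_t select_ne(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
  std::uint32_t diff = a ^ b;  // zero exactly when a == b
  return czero_eqz(c, diff) | czero_nez(a, diff);
}
```

Both instructions only ever test one register against zero, so an
if_then_else that compares two arbitrary registers has no direct Zicond
form.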

The code snippet mentioned above is also included in this patch as a new Zicond
testcase.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Add a
CONST0_RTX check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ice-3.c: New test.

Signed-off-by: Artemiy Volkov 
---
 gcc/config/riscv/riscv.cc |  3 ++-
 gcc/testsuite/gcc.target/riscv/zicond-ice-3.c | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ice-3.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 029c80b21cf..6c58687b5e5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4674,8 +4674,9 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
   /* reg, reg  */
   else if (REG_P (cons) && REG_P (alt))
{
- if ((code == EQ && rtx_equal_p (cons, op0))
+ if (((code == EQ && rtx_equal_p (cons, op0))
   || (code == NE && rtx_equal_p (alt, op0)))
+ && op1 == CONST0_RTX (mode))
{
  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
  if (!rtx_equal_p (cons, op0))
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ice-3.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ice-3.c
new file mode 100644
index 000..ac6049c9ae5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ice-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+
+int x[5];
+
+void
+d(int a, int b, int c) {
+  for (int i = 0; i < 5; i++)
+x[i] = (a != b) ? c : a;
+}
-- 
2.37.1



Re: [PATCH 6/6] Add a late-combine pass [PR106594]

2024-06-21 Thread Jeff Law




On 6/20/24 7:34 AM, Richard Sandiford wrote:

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.
I would expect this to fix a problem we've seen on RISC-V as well.
Essentially we have A, B and C.  We want to combine A->B and A->C
generating B' and C' and eliminate A.  This shows up in the xz loop.


On most targets, the pass is enabled by default at -O2 and above.
However, it has a tendency to undo x86's STV and RPAD passes,
by folding the more complex post-STV/RPAD form back into the
simpler pre-pass form.
IIRC the limited enablement was one of the things folks were unhappy 
about in the gcc-14 cycle.  Good to see that addressed.





Also, running a pass after register allocation means that we can
now match define_insn_and_splits that were previously only matched
before register allocation.  This trips things like:

   (define_insn_and_split "..."
 [...pattern...]
 "...cond..."
 "#"
 "&& 1"
 [...pattern...]
 {
   ...unconditional use of gen_reg_rtx ()...;
 }

because matching and splitting after RA will call gen_reg_rtx when
pseudos are no longer allowed.  rs6000 has several instances of this.
Interesting.  I suspect ppc won't be the only affected port.  This is 
somewhat worrisome.




xtensa has a variation in which the split condition is:

 "&& can_create_pseudo_p ()"

The failure then is that, if we match after RA, we'll never be
able to split the instruction.

The patch therefore disables the pass by default on i386, rs6000
and xtensa.  Hopefully we can fix those ports later (if their
maintainers want).  It seems easier to add the pass first, though,
to make it easier to test any such fixes.
I suspect it'll be a "does this make code better on the port, then let's 
fix the port so it can be used consistently" kind of scenario.  Given 
the data you've presented I strongly suspect it would make the code 
better on the xtensa, so hopefully Max will do the gruntwork on that one.





gcc/
PR rtl-optimization/106594
* Makefile.in (OBJS): Add late-combine.o.
* common.opt (flate-combine-instructions): New option.
* doc/invoke.texi: Document it.
* opts.cc (default_options_table): Enable it by default at -O2
and above.
* tree-pass.h (make_pass_late_combine): Declare.
* late-combine.cc: New file.
* passes.def: Add two instances of late_combine.
* config/i386/i386-options.cc (ix86_override_options_after_change):
Disable late-combine by default.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Likewise.
* config/xtensa/xtensa.cc (xtensa_option_override): Likewise.

gcc/testsuite/
PR rtl-optimization/106594
* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
targets.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
* gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
-fno-late-combine-instructions.
* gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
described in the comment.
* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
* gcc.target/aarch64/pr106594_1.c: New test.
---



OK.  Obviously we'll need to keep an eye on testing state after this 
patch.  I do expect fallout from the splitter issue noted above, but 
IMHO those are port problems for the port maintainers to sort out.


Jeff


[PATCH] libstdc++: Fix test on x86_64 and non-simd targets

2024-06-21 Thread Matthias Kretz

* Running a test compiled with AVX512 instructions requires
avx512f_runtime not just avx512f.

* The 'reduce2' test violated an invariant of fixed_size_simd_mask and
thus failed on all targets without 16-Byte vector builtins enabled (in
bits/simd.h).

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/115575
* testsuite/experimental/simd/pr115454_find_last_set.cc: Require
avx512f_runtime. Don't memcpy fixed_size masks.
---
 .../testsuite/experimental/simd/pr115454_find_last_set.cc   | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──
diff --git a/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc b/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc
index b47f19d3067..25a713b4e94 100644
--- a/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc
+++ b/libstdc++-v3/testsuite/experimental/simd/pr115454_find_last_set.cc
@@ -1,7 +1,7 @@
 // { dg-options "-std=gnu++17" }
 // { dg-do run { target *-*-* } }
 // { dg-require-effective-target c++17 }
-// { dg-additional-options "-march=x86-64-v4" { target avx512f } }
+// { dg-additional-options "-march=x86-64-v4" { target avx512f_runtime } }
 // { dg-require-cmath "" }
 
 #include 
@@ -25,7 +25,9 @@ namespace stdx
 {
   using M8 = typename V::mask_type;
   using M4 = typename V::mask_type;
-  if constexpr (sizeof(M8) == sizeof(M4))
+  if constexpr (sizeof(M8) == sizeof(M4)
+		  && !std::is_same_v>)
+// fixed_size invariant: padding bits of masks are zero, the memcpy would violate that
 {
   M4 k;
   __builtin_memcpy(&__data(k), &__data(M8(true)), sizeof(M4));




[Patch] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Tobias Burnus

mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however,
once #embed has a large-file mode, it should also speed up
the offloading compilation quite a bit.
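
To give an idea of what the generated file contains, here is a small
sketch that builds the same text as the fprintf call in the patch in
this message.  embed_stub is a hypothetical helper written for this
illustration (it is not part of mkoffload.cc), and the path passed to it
is whatever object file name the caller chooses.

```cpp
#include <string>

// Hypothetical helper: returns the array definition that wraps #embed,
// with a guard that turns a missing file into a hard #error.
std::string embed_stub(const std::string &path) {
  return "static unsigned char gcn_code[] = {\n"
         "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"" + path +
         "\") == __STDC_EMBED_FOUND__\n"
         "#embed \"" + path + "\"\n"
         "#else\n"
         "#error \"#embed '" + path + "' failed\"\n"
         "#endif\n"
         "};\n\n";
}
```

Since the array length is now only known to the compiler of the
generated file, the image size has to come from sizeof(gcn_code) rather
than from a length computed by mkoffload itself.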

OK for mainline, once '#embed' support is in?

Tobias
gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_obj): Generate C file that uses #embed.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 66 +
 1 file changed, 12 insertions(+), 54 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0ccb874398a 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+	   "#embed \"%s\"\n"
+	   "#else\n"
+	   "#error \"#embed '%s' failed\"\n"
+	   "#endif\n"
+	   "};\n\n", fname_in, fname_in, fname_in);
 
   fprintf (cfile,
 	   "static const struct gcn_image {\n"
 	   "  size_t size;\n"
 	   "  void *image;\n"
 	   "} gcn_image = {\n"
-	   "  %zu,\n"
+	   "  sizeof(gcn_code),\n"
 	   "  gcn_code\n"
-	   "};\n\n",
-	   len);
+	   "};\n\n");
 
   fprintf (cfile,
 	   "static const struct gcn_data {\n"
@@ -1316,7 +1274,7 @@ main (int argc, char **argv)
   if (!in)
 	fatal_error (input_location, "cannot open intermediate gcn obj file");
 
-  process_obj (in, cfile, omp_requires);
+  process_obj (gcn_o_name, cfile, omp_requires);
 
   fclose (in);
 


Re: [committed] [RISC-V] Fix wrong patch application

2024-06-21 Thread Jeff Law




On 6/19/24 9:43 PM, Christoph Müllner wrote:

Hi Jeff,

the test should probably also be skipped on -Oz:

 === gcc: Unexpected fails for rv64imafdc lp64d medlow  ===
FAIL: gcc.target/riscv/zbs-ext-2.c  -Oz   scan-assembler-times andi\t 1
FAIL: gcc.target/riscv/zbs-ext-2.c  -Oz   scan-assembler-times andn\t 1
FAIL: gcc.target/riscv/zbs-ext-2.c  -Oz   scan-assembler-times li\t 1

Thanks.  I'll double-check.  I must admit that when I added -Os I was
surprised I didn't need -Oz, or maybe I just mis-read the results.


jeff



[Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Tobias Burnus

[I messed up copying from the build system, picking up an old version.
Changes to v1 (bottom of the diff): fopen is no longer required.]

Tobias Burnus wrote:

mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however,
once #embed has a large-file mode, it should also speed up
the offloading compilation quite a bit.

OK for mainline, once '#embed' support is in?

Tobias
gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_obj): Generate C file that uses #embed.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 72 -
 1 file changed, 12 insertions(+), 60 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0c840318b2d 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, &len);
-
   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+	   "#embed \"%s\"\n"
+	   "#else\n"
+	   "#error \"#embed '%s' failed\"\n"
+	   "#endif\n"
+	   "};\n\n", fname_in, fname_in, fname_in);
 
   fprintf (cfile,
 	   "static const struct gcn_image {\n"
 	   "  size_t size;\n"
 	   "  void *image;\n"
 	   "} gcn_image = {\n"
-	   "  %zu,\n"
+	   "  sizeof(gcn_code),\n"
 	   "  gcn_code\n"
-	   "};\n\n",
-	   len);
+	   "};\n\n");
 
   fprintf (cfile,
 	   "static const struct gcn_data {\n"
@@ -1312,13 +1270,7 @@ main (int argc, char **argv)
   fork_execute (ld_argv[0], CONST_CAST (char **, ld_argv), true, ".ld_args");
   obstack_free (&ld_argv_obstack, NULL);
 
-  in = fopen (gcn_o_name, "r");
-  if (!in)
-	fatal_error (input_location, "cannot open intermediate gcn obj file");
-
-  process_obj (in, cfile, omp_requires);
-
-  fclose (in);
+  process_obj (gcn_o_name, cfile, omp_requires);
 
   xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
   xputenv (concat ("COMPILER_PATH=", cpath, NULL));
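
For readers who want to see the text that the patched process_obj writes into the generated C file, here is a minimal sketch. It is not the mkoffload code: the helper name embed_array_source and its std::string interface are assumptions made for illustration, and only the emitted preprocessor text follows the diff above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of mkoffload.cc): build the source text of an
// array definition whose initializer is pulled in with a guarded #embed.
// The emitted lines mirror the format string used in the patch.
static std::string
embed_array_source (const std::string &array_name, const std::string &fname)
{
  // Assumption of this sketch: both names are non-empty and need no escaping.
  assert (!array_name.empty () && !fname.empty ());

  std::string out;
  out += "static unsigned char " + array_name + "[] = {\n";
  // Only use #embed if the preprocessor supports it and can find the file.
  out += "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"" + fname
	 + "\") == __STDC_EMBED_FOUND__\n";
  out += "#embed \"" + fname + "\"\n";
  out += "#else\n";
  out += "#error \"#embed '" + fname + "' failed\"\n";
  out += "#endif\n";
  out += "};\n\n";
  return out;
}
```

Calling embed_array_source ("gcn_code", "foo.o"), where foo.o is a placeholder file name, yields the array definition with the guarded #embed line. Because the array size is no longer known to the generator, the image size has to come from sizeof in the generated file, which is what the patch does. A file name containing quotes or backslashes would need escaping, which this sketch does not attempt.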


Re: [PATCH] libstdc++: Fix test on x86_64 and non-simd targets

2024-06-21 Thread Jonathan Wakely
On Fri, 21 Jun 2024 at 16:07, Matthias Kretz  wrote:
>
>
> * Running a test compiled with AVX512 instructions requires
> avx512f_runtime not just avx512f.
>
> * The 'reduce2' test violated an invariant of fixed_size_simd_mask and
> thus failed on all targets without 16-Byte vector builtins enabled (in
> bits/simd.h).

OK, thanks for the quick diagnosis and fix.


>
> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/115575
> * testsuite/experimental/simd/pr115454_find_last_set.cc: Require
> avx512f_runtime. Don't memcpy fixed_size masks.
> ---
>  .../testsuite/experimental/simd/pr115454_find_last_set.cc   | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  stdₓ::simd
> ──


Re: [Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Andrew Stubbs

On 21/06/2024 16:30, Tobias Burnus wrote:

[I messed up copying from the build system, picking up an old version.
Changes to v1 (bottom of the diff): fopen is no longer required.]

Tobias Burnus wrote:

mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however,
once #embed has a large-file mode, it should also speed up
the offloading compilation quite a bit.

OK for mainline, once '#embed' support is in?

Tobias




   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */


Please adjust that comment; the FIXME can be removed completely.

Otherwise LGTM, thanks. :-)

Andrew


Re: [PATCH] libstdc++: Fix std::fill and std::fill_n optimizations [PR109150]

2024-06-21 Thread Jonathan Wakely
Pushed to trunk now.

On Thu, 20 Jun 2024 at 16:28, Jonathan Wakely  wrote:
>
> I think the new conditions are correct. They're certainly an improvement
> on just checking __is_scalar without considering what we're assigning it
> to.
>
> Tested x86_64-linux.
>
> -- >8 --
>
> As noted in the PR, the optimization used for scalar types in std::fill
> and std::fill_n is non-conforming, because it doesn't consider that
> assigning a scalar type might have non-trivial side effects which are
> affected by the optimization.
>
> By changing the condition under which the optimization is done we ensure
> it's only performed when safe to do so, and we also enable it for
> additional types, which was the original subject of the PR.
>
> Instead of two overloads using __enable_if<__is_scalar::__value, R>
> we can combine them into one and create a local variable which is either
> a local copy of __value or another reference to it, depending on whether
> the optimization is allowed.
>
> This removes a use of std::__is_scalar, which is a step towards fixing
> PR 115497 by removing std::__is_pointer from 
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/109150
> * include/bits/stl_algobase.h (__fill_a1): Combine the
> !__is_scalar and __is_scalar overloads into one and rewrite the
> condition used to decide whether to perform the load outside the
> loop.
> * testsuite/25_algorithms/fill/109150.cc: New test.
> * testsuite/25_algorithms/fill_n/109150.cc: New test.
> ---
>  libstdc++-v3/include/bits/stl_algobase.h  | 75 ---
>  .../testsuite/25_algorithms/fill/109150.cc| 62 +++
>  .../testsuite/25_algorithms/fill_n/109150.cc  | 62 +++
>  3 files changed, 171 insertions(+), 28 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill/109150.cc
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc
>
> diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
> b/libstdc++-v3/include/bits/stl_algobase.h
> index d831e0e9883..1a0f8c14073 100644
> --- a/libstdc++-v3/include/bits/stl_algobase.h
> +++ b/libstdc++-v3/include/bits/stl_algobase.h
> @@ -929,28 +929,39 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
>  #define _GLIBCXX_MOVE_BACKWARD3(_Tp, _Up, _Vp) std::copy_backward(_Tp, _Up, 
> _Vp)
>  #endif
>
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions"
>template
>  _GLIBCXX20_CONSTEXPR
> -inline typename
> -__gnu_cxx::__enable_if::__value, void>::__type
> +inline void
>  __fill_a1(_ForwardIterator __first, _ForwardIterator __last,
>   const _Tp& __value)
>  {
> -  for (; __first != __last; ++__first)
> -   *__first = __value;
> -}
> +  // We can optimize this loop by moving the load from __value outside
> +  // the loop, but only if we know that making that copy is trivial,
> +  // and the assignment in the loop is also trivial (so that the identity
> +  // of the operand doesn't matter).
> +  const bool __load_outside_loop =
> +#if __has_builtin(__is_trivially_constructible) \
> +  && __has_builtin(__is_trivially_assignable)
> +   __is_trivially_constructible(_Tp, const _Tp&)
> +   && __is_trivially_assignable(__decltype(*__first), const _Tp&)
> +#else
> +   __is_trivially_copyable(_Tp)
> +   && __is_same(_Tp, __typeof__(*__first))
> +#endif
> +   && sizeof(_Tp) <= sizeof(long long);
>
> -  template
> -_GLIBCXX20_CONSTEXPR
> -inline typename
> -__gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type
> -__fill_a1(_ForwardIterator __first, _ForwardIterator __last,
> - const _Tp& __value)
> -{
> -  const _Tp __tmp = __value;
> +  // When the condition is true, we use a copy of __value,
> +  // otherwise we just use another reference.
> +  typedef typename __gnu_cxx::__conditional_type<__load_outside_loop,
> +const _Tp,
> +const _Tp&>::__type _Up;
> +  _Up __val(__value);
>for (; __first != __last; ++__first)
> -   *__first = __tmp;
> +   *__first = __val;
>  }
> +#pragma GCC diagnostic pop
>
>// Specialization: for char types we can use memset.
>template
> @@ -1079,28 +1090,36 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
>__size_to_integer(__float128 __n) { return (long long)__n; }
>  #endif
>
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wc++17-extensions"
>template
>  _GLIBCXX20_CONSTEXPR
> -inline typename
> -__gnu_cxx::__enable_if::__value, 
> _OutputIterator>::__type
> +inline _OutputIterator
>  __fill_n_a1(_OutputIterator __first, _Size __n, const _Tp& __value)
>  {
> -  for (; __n > 0; --__n, (void) ++__first)
> -   *__first = __value;
> -  return __first;
> -}
> +  // See std::__fill_a1 for explanation
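
The condition described in the commit message above can be illustrated outside libstdc++ with standard traits. The following is a sketch under stated assumptions, not the library code: fill_sketch, Bump and fill_sketch_demo are hypothetical names, and std::conditional stands in for the internal __conditional_type. It shows a case where hoisting the load would change the result, because assigning an int to the element type has a side effect on the source value.

```cpp
#include <cassert>
#include <type_traits>

// Sketch only: same shape as the patched __fill_a1, written with standard
// traits instead of the compiler built-ins used in the patch.
template<typename ForwardIt, typename T>
void
fill_sketch (ForwardIt first, ForwardIt last, const T& value)
{
  // Copy the value once only if the copy and each assignment are trivial.
  constexpr bool load_outside_loop
    = std::is_trivially_constructible<T, const T&>::value
      && std::is_trivially_assignable<decltype(*first), const T&>::value
      && sizeof(T) <= sizeof(long long);

  // Either a local copy of value or just another reference to it.
  using U = typename std::conditional<load_outside_loop,
				      const T, const T&>::type;
  U val(value);
  for (; first != last; ++first)
    *first = val;
}

// Hypothetical element type: assigning an int also increments the int that
// 'source' points to, so the assignment is not trivial.
struct Bump
{
  int stored = 0;
  int* source = nullptr;

  Bump& operator=(const int& v)
  {
    stored = v;
    if (source)
      ++*source;
    return *this;
  }
};

// Demonstration: the fill value aliases the int modified by each assignment.
inline void
fill_sketch_demo ()
{
  int src = 1;
  Bump a[3];
  for (Bump& b : a)
    b.source = &src;
  fill_sketch (a, a + 3, src);
  // Each assignment reads the current value of src, then bumps it.
  assert (a[0].stored == 1 && a[1].stored == 2 && a[2].stored == 3);
  assert (src == 4);
}
```

With a fill that unconditionally copied src before the loop, all three elements would store 1. For a plain int range the condition is true and the copy is made, which is the case the optimization is meant for.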

Re: [PATCH 4/4] libstdc++: Remove std::__is_pointer and std::__is_scalar [PR115497]

2024-06-21 Thread Jonathan Wakely
This series (and the patch it depends on) have been pushed to trunk now.


On Fri, 21 Jun 2024 at 10:31, Jonathan Wakely  wrote:
>
> Oops, this patch series actually depends on
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655267.html which
> was posted separately, but needs to be applied before 4/4 in this
> series.
>
> On Thu, 20 Jun 2024 at 16:35, Jonathan Wakely  wrote:
> >
> > We still have __is_arithmetic in  after this,
> > but that needs a lot more work to remove its uses from  and
> > .
> >
> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > This removes the std::__is_pointer and std::__is_scalar traits, as they
> > > conflict with a Clang built-in.
> >
> > Although Clang has a hack to make the class templates work despite using
> > reserved names, removing these class templates will allow that hack to
> > be dropped at some future date.
> >
> > libstdc++-v3/ChangeLog:
> >
> > PR libstdc++/115497
> > * include/bits/cpp_type_traits.h (__is_pointer, __is_scalar):
> > Remove.
> > (__is_arithmetic): Do not use __is_pointer in the primary
> > template. Add partial specialization for pointers.
> > ---
> >  libstdc++-v3/include/bits/cpp_type_traits.h | 33 -
> >  1 file changed, 33 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
> > b/libstdc++-v3/include/bits/cpp_type_traits.h
> > index 4d83b9472e6..abe0c7603e3 100644
> > --- a/libstdc++-v3/include/bits/cpp_type_traits.h
> > +++ b/libstdc++-v3/include/bits/cpp_type_traits.h
> > @@ -343,31 +343,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> >  };
> >  #endif
> >
> > -  //
> > -  // Pointer types
> > -  //
> > -#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > -  template
> > -struct __is_pointer : __truth_type<_IsPtr>
> > -{
> > -  enum { __value = _IsPtr };
> > -};
> > -#else
> > -  template
> > -struct __is_pointer
> > -{
> > -  enum { __value = 0 };
> > -  typedef __false_type __type;
> > -};
> > -
> > -  template
> > -struct __is_pointer<_Tp*>
> > -{
> > -  enum { __value = 1 };
> > -  typedef __true_type __type;
> > -};
> > -#endif
> > -
> >//
> >// An arithmetic type is an integer type or a floating point type
> >//
> > @@ -376,14 +351,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> >  : public __traitor<__is_integer<_Tp>, __is_floating<_Tp> >
> >  { };
> >
> > -  //
> > -  // A scalar type is an arithmetic type or a pointer type
> > -  //
> > -  template
> > -struct __is_scalar
> > -: public __traitor<__is_arithmetic<_Tp>, __is_pointer<_Tp> >
> > -{ };
> > -
> >//
> >// For use in std::copy and std::find overloads for streambuf iterators.
> >//
> > --
> > 2.45.2
> >



Re: [PATCH 2/3] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-06-21 Thread Patrick O'Neill

Hi Feng,

Pre-commit has flagged a build-failure for patch 2/3:
https://github.com/ewlu/gcc-precommit-ci/issues/1786#issuecomment-2181962244

When applied to 9a76db24e04 i386: Allow all register_operand SUBREGs in 
x86_ternlog_idx.


Re-confirmed locally with 5320bcbd342 xstormy16: Fix 
xs_hi_nonmemory_operand.


Additionally there is an apply failure for patch 3/3.

Results can be seen here:
Series:
https://patchwork.sourceware.org/project/gcc/list/?series=35407
Patch 2/3:
https://patchwork.sourceware.org/project/gcc/patch/20240621015459.13525-2-wangf...@eswincomputing.com/
https://github.com/ewlu/gcc-precommit-ci/issues/1786#issuecomment-2181863112
Patch 3/3:
https://patchwork.sourceware.org/project/gcc/patch/20240621015459.13525-3-wangf...@eswincomputing.com/
https://github.com/ewlu/gcc-precommit-ci/issues/1784#issuecomment-2181861381

Thanks,
Patrick

On 6/20/24 18:54, Feng Wang wrote:

According to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic
functions are added by this patch.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f):
Add 'Zvfbfmin' intrinsic in bases.
(class vfwcvtbf16_f): Ditto.
(class vfwmaccbf16): Add 'Zvfbfwma' intrinsic in bases.
(BASE): Add BASE macro for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration for 
'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-functions.def 
(REQUIRED_EXTENSIONS):
Add builtins def for 'Zvfbfmin' and 'Zvfbfwma'.
(vfncvtbf16_f): Ditto.
(vfncvtbf16_f_frm): Ditto.
(vfwcvtbf16_f): Ditto.
(vfwmaccbf16): Ditto.
(vfwmaccbf16_frm): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (supports_vectype_p):
Add vector intrinsic build judgment for BFloat16.
(build_all): Ditto.
(BASE_NAME_MAX_LEN): Adjust max length.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F32_OPS):
Add new operand type for BFloat16.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_F32_OPS): Ditto.
(validate_instance_type_required_extensions):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins.h (enum required_ext):
Add required_ext declaration for 'Zvfbfmin' and 'Zvfbfwma'.
(reqired_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Add match case for 'Zvfbfmin' and 
'Zvfbfwma'.
* config/riscv/riscv.cc (riscv_validate_vector_type):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.

---


Re: [PATCH 4/7 v2] lto: Implement ltrans cache

2024-06-21 Thread Michal Jireš

On 6/20/24 7:45 PM, Andi Kleen wrote:


There are much faster/optimized modern hashes for good collision detection over
MD5 especially when it's not needed to be cryptographically secure. Pick
something from smhasher.

Also perhaps the check sum should be cached in the file? I assume it's
cheap to compute while writing. It could be written at the tail of the
file. Then it can be read by seeking to the end and you save that
step.


Good ideas, but with only minor benefits relative to the time spent in the
LTRANS phase. The current focus was to create a simple, mostly self-contained
implementation that reduces LTRANS recompilations. We can make these more
pervasive improvements incrementally.

Just to clarify, the hashes are computed only once, then stored in the
cache.


The lockfiles scare me a bit. What happens when they get lost, e.g.
due to a compiler crash? You may need some recovery for that.
Perhaps it would be better to make the files self checking, so that
partial files can be detected when reading, and get rid of the locks.


It uses process-associated locks via fcntl, so if the compiler crashes,
the locks will be released. If the compiler process crashes and leaves a
partially written file, the lto-wrapper deletes it in tool_cleanup.
If a file is missing, the cache entry will be deleted.

Michal
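
As an illustration of the process-associated fcntl locks mentioned above, here is a minimal POSIX sketch. It is not taken from the patch: the helper name set_file_lock is hypothetical, and the non-blocking F_SETLK command is an assumption chosen to keep the example simple.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not from the ltrans cache patch): set or clear an
// advisory lock covering the whole file referred to by FD.
// TYPE is F_RDLCK, F_WRLCK or F_UNLCK.  Returns true on success.
static bool
set_file_lock (int fd, short type)
{
  assert (fd >= 0);

  struct flock fl = {};
  fl.l_type = type;
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;	 // A length of zero means "until end of file".

  // F_SETLK fails immediately if another process holds a conflicting lock;
  // F_SETLKW would wait instead.
  return fcntl (fd, F_SETLK, &fl) == 0;
}
```

Locks taken this way belong to the process, so the kernel drops them when the process exits for any reason, including a crash or Ctrl-C. No stale lock is left behind that would need manual recovery, although a partially written file can still remain and has to be cleaned up separately.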


Re: [PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
Kewen:

On 6/21/24 03:36, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/20 00:13, Carl Love wrote:
>> GCC maintainers:
>>
>> version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
>> Peter suggested the -mdejagnu-cpu= value must be power7.  
>> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  
>> Patch has been retested on a Power 10 box, it succeeds
>> with 2 passes and no fails.
> 
> IMHO Peter's suggestion on power7 (-mdejagnu-cpu=power7) is mainly for
> altivec-1-runnable.c.  Both your testing and the comments in the test
> case show this altivec-2-runnable.c requires at least power8.

OK.  Per other thread changed altivec-1-runnable to power7.

> 
>>
>> Per the additional feedback after patch: 
>>
>>   commit c892525813c94b018464d5a4edc17f79186606b7
>>   Author: Carl Love 
>>   Date:   Tue Jun 11 14:01:16 2024 -0400
>>
>>   rs6000, altivec-2-runnable.c should be a runnable test
>> 
>>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>>   test.  This patch changes the dg-do command argument to run.
>> 
>>   gcc/testsuite/ChangeLog:
>>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>>   argument to run.
>>
>> was approved and committed, I have updated the dg-require-effective-target
>> and dg-options as requested so the test will compile with -O2 on a 
>> machine that has a minimum support of Power 8 vector hardware.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> 
>> rs6000, altivec-2-runnable.c update the require-effective-target
>>
>> The test requires a minimum of Power8 vector HW and a compile level
>> of -O2.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-2-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> index 17b23eb9d50..9e7ef89327b 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> @@ -1,7 +1,7 @@
>> -/* { dg-do run } */
>> -/* { dg-options "-mvsx" } */
>> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! 
>> has_arch_pwr8 } } } */
>> -/* { dg-require-effective-target powerpc_vsx } */
>> +/* { dg-do run { target vsx_hw } } */
> 
> As this test case requires power8 and up, and dg-options specifies
> -mdejagnu-cpu=power8, we should use p8vector_hw instead of vsx_hw here,
> otherwise it will fail on power7 env.

Changed to p8vector_hw

> 
>> +/* { dg-do compile { target { ! vmx_hw } } } */
> 
> This condition should be ! , so ! p8vector_hw.

Changed. 

> 
>> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
>> +/* { dg-require-effective-target powerpc_altivec } */
> 
> This should be powerpc_vsx instead, otherwise this case can still be
> tested with -mno-vsx -maltivec, then this test case would fail.

OK
> 
> Besides, as the discussion on the name of this test case, could you also
> rename this to p8vector-builtin-9.c instead?

Put the name change in a separate patch to change both test file names.
 
  Carl 


[PATCH] rs6000, change altivec*-runnable.c test file names

2024-06-21 Thread Carl Love
GCC maintainers:

Per the discussion of the dg header changes for the test files altivec-1-runnable.c
and altivec-2-runnable.c, it was decided that it would be best to rename the two
tests so that their names match the existing tests they are most closely related
to.

This patch is dependent on the two patches to update the dg arguments for test 
files altivec-1-runnable.c and altivec-2-runnable.c being accepted and 
committed before this patch.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--
rs6000, change altivec*-runnable.c test file names

Changed the names of the test files.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the name to
altivec-38.c.
* gcc.target/powerpc/altivec-2-runnable.c: Change the name to
p8vector-builtin-9.c.
---
 .../gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} | 0
 .../powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c}| 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} 
(100%)
 rename gcc/testsuite/gcc.target/powerpc/{altivec-2-runnable.c => 
p8vector-builtin-9.c} (100%)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-38.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/altivec-38.c
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
-- 
2.45.0



Re: [PATCH] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
Kewen:

On 6/21/24 03:37, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/20 00:18, Carl Love wrote:
>> GCC maintainers:
>>
>> The dg options for this test should be the same as for altivec-2-runnable.c. 
>>  This patch updates the dg options to match 
>> the settings in altivec-2-runnable.c.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> --
>> From 289e15d215161ad45ae1aae7a5dedd2374737ec4
>> rs6000, altivec-1-runnable.c update the require-effective-target
>>
>> The test requires a minimum of Power8 vector HW and a compile level
>> of -O2.
> 
> This is not true, vec_unpackh and vec_unpackl don't require power8,
> vupk[hl]s[hb]/vupk[hl]px are all ISA 2.03.
> 
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-1-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> index da8ebbc30ba..c113089c13a 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> @@ -1,6 +1,7 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> -/* { dg-require-effective-target powerpc_altivec_ok } */
>> -/* { dg-options "-maltivec" } */
>> +/* { dg-do run { target vsx_hw } } */
> 
> So this line should check for vmx_hw.

OK, fingers are used to typing vsx.  Fixed.

> 
>> +/* { dg-do compile { target { ! vmx_hw } } } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> 
> With more thinking, I think it's better to use
> "-O2 -maltivec" to be consistent with the others.

OK, changed it back.  We now have:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-options "-O2 -maltivec" } */
/* { dg-require-effective-target powerpc_altivec } */

The regression test runs fine with the above.  Two passes, no failures.


> 
> As mentioned in the other thread, powerpc_altivec
> effective target check should guarantee the altivec
> feature support, if any default cpu type or user
> specified option disable altivec, this test case
> will not be tested.  If we specify one cpu type
> specially here, it may cause confusion why it's
> different from the other existing ones.  So let's
> go without any specified cpu type.
> 
> Besides, similar to the request for altivec-2-runnable.c,
> could you also rename this to altivec-38.c?

OK, will change the names for the two test cases at the same time in a separate 
patch.
 
 Carl 


[PATCH version 2] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
GCC maintainers:

version 2, update the dg options per the feedback.  Retested the patch on Power 
10 with no regressions.

This patch updates the dg options.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

-- 
rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index da8ebbc30ba..3f084c91798 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,6 +1,7 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
+/* { dg-require-effective-target powerpc_altivec } */
 
 #include 
 
-- 
2.45.0



Re: [PATCH 4/7 v2] lto: Implement ltrans cache

2024-06-21 Thread Andi Kleen


FWIW I suspect not handling lockfile errors could be a show stopper
even for an initial implementation.  It's not that uncommon that people
press Ctrl-C. flock on systems that have it would be a safer
alternative.

> There are many things to do and I think it is better to do that in trunk
> rather than accumulating relatively complex changes on a branch.
> md5 is already supported by libiberty so it is kind of easy choice for
> first cut implementation.

At least use sha1 then. This is also in libiberty and it has hardware
acceleration on modern x86.

-Andi


Re: [PATCH 4/7 v2] lto: Implement ltrans cache

2024-06-21 Thread Andi Kleen
On Fri, Jun 21, 2024 at 06:59:05PM +0200, Michal Jireš wrote:
> > The lockfiles scare me a bit. What happens when they get lost, e.g.
> > due to a compiler crash? You may need some recovery for that.
> > Perhaps it would be better to make the files self checking, so that
> > partial files can be detected when reading, and get rid of the locks.
> 
> It uses process-associated locks via fcntl, so if the compiler crashes,
> the locks will be released. If the compiler process crashes and leaves
> > a partially written file, the lto-wrapper deletes it in tool_cleanup.
> If a file is missing, the cache entry will be deleted.

Sounds good to me.

-Andi

