date:20241013

Re: [PATCH v6] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-13 Thread Yangyu Chen



> On Oct 7, 2024, at 18:15, Kito Cheng  wrote:
> 
> Could you implement the latest API defined in the doc?
> 
> struct {
> unsigned length;
> unsigned long long features[];
> } __riscv_feature_bits;
> 
> struct {
> unsigned length;
> unsigned long long features[];
> } __riscv_vendor_feature_bits;
> 
> struct {
> unsigned mvendorid;
> unsigned marchid;
> unsigned mimpid;
> } __riscv_cpu_model;
> 

I doubt whether we should use unsigned instead of unsigned long for marchid and 
mimpid since they are XLEN-sized CSR:

https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74#discussion_r1798166307

I will submit another patch when this question has been resolved.

Re: [PATCH] warning option for traps (-Wtrap)

2024-10-13 Thread Richard Biener

On Sat, 12 Oct 2024, Martin Uecker wrote:

> Am Samstag, dem 12.10.2024 um 18:44 +0200 schrieb Richard Biener:
> > 
> > > Am 12.10.2024 um 16:43 schrieb Martin Uecker :
> > > 
> > > 
> > > There is code which should not fail at run-time.  For this,
> > > it is helpful to get a warning when a compiler inserts traps
> > > (e.g. sanitizers, hardbools, __builtin_trap(), etc.).
> > > 
> > > Having a warning for this also has many other use cases, e.g.
> > > one can use it with some sanitizer to rule out that some
> > > piece of code has certain undefined behavior such as
> > > signed overflow or undefined behavior in left-shifts
> > > (one gets a warning if the optimizer does not prove the
> > > trap is dead and it is emitted).
> > > 
> > > Another potential use case could be writing tests.
> > > 
> > > 
> > > Bootstrapped and regression tested on x64_84.
> > > 
> > > 
> > >Add warning option that warns when a trap is generated.
> > > 
> > >This adds a warning option -Wtrap that is emitted in
> > >expand_builtin_trap.  It can be used to verify that traps
> > >are generated or - more importantly - are not generated
> > >under various conditions, e.g. for UBSan with -fsanitize-trap,
> > >hardbools, etc.
> > 
> > Isn’t it better to diagnose with more context from the callers that insert 
> > the trap?
> 
> More context would be better.  Should there be additional
> arguments when creating the call to the builtin?

Why not diagnose when we create the call?  But sure, adding a diagnostic
argument would work, it might also work to distinguish calls we want to
diagnose from those we don't.

Richard.

[releases/gcc-14 PATCH COMMITTED] testsuite: fix PR111613 test

2024-10-13 Thread Sam James

PR ipa/111613
* gcc.c-torture/pr111613.c: Rename to..
* gcc.c-torture/execute/pr111613.c: ...this.

(cherry picked from commit 5e5d7a88932b132437069f716160f8b20862890b)
---
I actually thought I'd backported this already. Pushed as obvious, as
the test issue was only introduced recently when the original change
was backported. (As discussed previously on IRC, I don't plan on backporting
my general test fixes I've been doing, but this is a different case.)

 gcc/testsuite/gcc.c-torture/{ => execute}/pr111613.c | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.c-torture/{ => execute}/pr111613.c (100%)

diff --git a/gcc/testsuite/gcc.c-torture/pr111613.c 
b/gcc/testsuite/gcc.c-torture/execute/pr111613.c
similarity index 100%
rename from gcc/testsuite/gcc.c-torture/pr111613.c
rename to gcc/testsuite/gcc.c-torture/execute/pr111613.c
-- 
2.47.0

Re: [PATCH 0/5] Provide better definitions of NULL

2024-10-13 Thread Alejandro Colomar

Hi Joseph,

On Fri, Oct 11, 2024 at 01:44:36PM GMT, Alejandro Colomar wrote:
> Hi,
> 
> This is just an untested draft.  If there's rough agreement that this is
> wanted, I'll test it, write changelog, etc.
> 
> The intention of this change is to help improve the common C/C++
> language subset, promoting the use of NULL in both languages as the null
> pointer constant, expanding to nullptr in C++, and to ((void *)0) in C.
> 
> While C23 added nullptr, it was a terrible mistake, and it doesn't do
> any good to the C language.  (See the link below for more details.)
> Instead, it's C++ which should reconcile with C.  Let's work in that
> direction.
> 
> Link: 
> 
> Have a lovely day!
> Alex
> 
> Alejandro Colomar (5):
>   gcc/ginclude/stddef.h: Indent nested cpp conditionals
>   gcc/ginclude/stddef.h: Invert conditional
>   gcc/ginclude/stddef.h: Define NULL as nullptr if possible
>   Don't define NULL as 0 in C
>   libgm2/libm2pim/wrapc.cc: Define NULL as nullptr
> 
>  gcc/ginclude/stddef.h | 24 +++
>  .../gcc.c-torture/execute/pr68143_1.c |  2 +-
>  gcc/testsuite/gcc.c-torture/execute/pr70566.c |  2 +-
>  gcc/testsuite/gcc.dg/tm/20100615.c|  2 +-
>  gcc/testsuite/gcc.target/aarch64/pr91927.c|  2 +-
>  libgm2/libm2pim/wrapc.cc  |  4 +---
>  libiberty/alloca.c|  2 +-
>  libiberty/argv.c  |  2 +-
>  libiberty/getopt1.c   |  2 +-
>  9 files changed, 22 insertions(+), 20 deletions(-)

There are some regressions.  Below is the diff of .sum files.  In the
.log files, I see some errors saying `CRC mismatch`.  Did I do anything
wrong?

Have a lovely night!
Alex


--- ../.tmp.null.b4/./gcc/testsuite/g++/g++.sum 2024-10-13 17:49:00.772692813 
+0200
+++ ../.tmp.null/./gcc/testsuite/g++/g++.sum2024-10-13 20:45:15.537873070 
+0200
@@ -1,4 +1,4 @@
-Test run by alx on Sun Oct 13 12:28:13 2024
+Test run by alx on Sun Oct 13 19:41:09 2024
 Native configuration is x86_64-pc-linux-gnu
 
=== g++ tests ===
@@ -192400,8 +192400,7 @@
 PASS: g++.dg/modules/p1689-1.C -std=c++2b (test for excess errors)
 FAIL: g++.dg/modules/p1689-2.C -std=c++17 (test for excess errors)
 FAIL: g++.dg/modules/p1689-2.C -std=c++2a (test for excess errors)
-PASS: g++.dg/modules/p1689-2.C -std=c++2b (test for excess errors)
-FAIL: ERROR: length mismatch at rules/0/requires: actual: "1" expect: "0"
+FAIL: g++.dg/modules/p1689-2.C -std=c++2b (test for excess errors)
 PASS: g++.dg/modules/p1689-3.C -std=c++17 (test for excess errors)
 PASS: g++.dg/modules/p1689-3.C -std=c++2a (test for excess errors)
 PASS: g++.dg/modules/p1689-3.C -std=c++2b (test for excess errors)
@@ -193382,12 +193381,12 @@
 PASS: g++.dg/modules/binding-1_b.H module-cmi  
(gcm.cache/$srcdir/g++.dg/modules/binding-1_b.H.gcm)
 FAIL: g++.dg/modules/binding-1_c.C -std=c++2a (test for excess errors)
 FAIL: g++.dg/modules/binding-1_c.C module-cmi hello (gcm.cache/hello.gcm)
-PASS: g++.dg/modules/binding-1_a.H -std=c++2b (test for excess errors)
-PASS: g++.dg/modules/binding-1_a.H module-cmi  
(gcm.cache/$srcdir/g++.dg/modules/binding-1_a.H.gcm)
+FAIL: g++.dg/modules/binding-1_a.H -std=c++2b (test for excess errors)
+FAIL: g++.dg/modules/binding-1_a.H module-cmi  
(gcm.cache/$srcdir/g++.dg/modules/binding-1_a.H.gcm)
 PASS: g++.dg/modules/binding-1_b.H -std=c++2b (test for excess errors)
 PASS: g++.dg/modules/binding-1_b.H module-cmi  
(gcm.cache/$srcdir/g++.dg/modules/binding-1_b.H.gcm)
-PASS: g++.dg/modules/binding-1_c.C -std=c++2b (test for excess errors)
-PASS: g++.dg/modules/binding-1_c.C module-cmi hello (gcm.cache/hello.gcm)
+FAIL: g++.dg/modules/binding-1_c.C -std=c++2b (test for excess errors)
+FAIL: g++.dg/modules/binding-1_c.C module-cmi hello (gcm.cache/hello.gcm)
 PASS: g++.dg/modules/block-decl-1_a.C -std=c++17  (test for bogus messages, 
line 7)
 PASS: g++.dg/modules/block-decl-1_a.C -std=c++17 (test for excess errors)
 PASS: g++.dg/modules/block-decl-1_a.C module-cmi bla (gcm.cache/bla.gcm)
@@ -194824,13 +194823,13 @@
 FAIL: g++.dg/modules/global-3_b.C -std=c++2a (test for excess errors)
 UNRESOLVED: g++.dg/modules/global-3 -std=c++2a link
 UNRESOLVED: g++.dg/modules/global-3 -std=c++2a execute
-PASS: g++.dg/modules/global-3_a.C -std=c++2b (test for excess errors)
-PASS: g++.dg/modules/global-3_a.C -std=c++2b  scan-lang-dump module 
"Dependencies of decl function_decl:'::Log'"
+FAIL: g++.dg/modules/global-3_a.C -std=c++2b (test for excess errors)
+FAIL: g++.dg/modules/global-3_a.C -std=c++2b  scan-lang-dump module 
"Dependencies of decl function_decl:'::Log'"
 PASS: g++.dg/modules/global-3_a.C -std=c++2b  scan-lang-dump-not module 
"Reachable GMF '::printf[^'\\n]*' added"
-PASS: g++.dg/modules/global-3_a.C module-cmi logger (gcm.cache/logger.gcm)
-PASS: g++.dg/modules/global-3_b.C -std=c++2b (test for excess errors)
-PASS: g++.dg/modules/global-3 -std=c+

Re: [PATCH RFC] build: enable C++11 narrowing warnings

2024-10-13 Thread Eric Gallager

On Tue, Sep 24, 2024 at 2:41 AM Richard Biener
 wrote:
>
> On Mon, Sep 23, 2024 at 3:41 PM Jason Merrill  wrote:
> >
> > On 9/23/24 9:05 AM, Richard Biener wrote:
> > > On Sat, Sep 21, 2024 at 2:49 AM Jason Merrill  wrote:
> > >>
> > >> Tested x86_64-pc-linux-gnu.  OK for trunk?
> > >>
> > >> -- 8< --
> > >>
> > >> We've been using -Wno-narrowing since gcc 4.7, but at this point 
> > >> narrowing
> > >> diagnostics seem like a stable part of C++ and we should adjust.
> > >>
> > >> This patch changes -Wno-narrowing to -Wno-error=narrowing so that 
> > >> narrowing
> > >> issues will still not break bootstrap, but we can see them.
> > >>
> > >> The rest of the patch fixes the narrowing warnings I see in an
> > >> x86_64-pc-linux-gnu bootstrap.  In most of the cases, by adjusting the 
> > >> types
> > >> of various declarations so that we store the values in the same types we
> > >> compute them in, which seems worthwhile anyway.  This also allowed us to
> > >> remove a few -Wsign-compare casts.
> > >>
> > >> The one place I didn't see how best to do this was in
> > >> vect_prologue_cost_for_slp: changing const_nunits to unsigned int broke 
> > >> the
> > >> call to TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits), since
> > >> poly_uint64.is_constant wants a pointer to unsigned HOST_WIDE_INT.  So I
> > >> added casts in that one place.  Not too bad, I think.
> > >>
> > >> +   unsigned HOST_WIDE_INT foff = bitpos_of_field (field);
> > >
> > > Can you make bitpos_of_field return unsigned HOST_WIDE_INT then and 
> > > adjust it
> > > accordingly - it looks for shwi fitting but negative DECL_FIELD_OFFSET
> > > or BIT_OFFSET are not a thing.
> >
> > So, like the attached?
> >
> > >> @@ -7471,7 +7471,8 @@ vect_prologue_cost_for_slp (slp_tree node,
> > >> nelt_limit = const_nunits;
> > >> hash_set vector_ops;
> > >> for (unsigned int i = 0; i < SLP_TREE_NUMBER_OF_VEC_STMTS 
> > >> (node); ++i)
> > >> -   if (!vector_ops.add ({ ops, i * const_nunits, const_nunits }))
> > >
> > > So why do we diagnose this case (unsigned int member) but not ...
> > >
> > >> +   if (!vector_ops.add
> > >> +   ({ ops, i * (unsigned)const_nunits, (unsigned)const_nunits 
> > >> }))
> > >>starts.quick_push (i * const_nunits);
> > >
> > > ... this one - unsigned int function argument?
> >
> > Because the former is in { }, and the latter isn't; narrowing
> > conversions are only ill-formed within { }.
> >
> > > I think it would be slightly better to do
> > >
> > >  {
> > > unsigned start = (unsigned) const_units * i;
> > > if (!vector_ops.add ({ ops, start, const_unints }))
> > >   starts.quick_push (start);
> > >  }
> > >
> > > to avoid the non-obvious difference between both.
> >
> > We'd still need the cast for the third element, but now I notice we can
> > use nelt_limit instead since it just got the same value.
> >
> > So, OK with this supplemental patch?
>
> OK.
>
> Thanks,
> Richard.

Note that libcpp uses -Wno-narrowing, too; perhaps it's worth making
this change there, as well?

[PATCH] tree-optimization/116907 - stale BLOCK reference from DECL_VALUE_EXPR

2024-10-13 Thread Richard Biener

When we remove unused BLOCKs we fail to clean references to them
from DECL_VALUE_EXPRs of variables in other BLOCKs which in the
PR causes LTO streaming to walk into pointers to GGC freed blocks.

There's the question of whether such DECL_VALUE_EXPRs should keep
variables and blocks referenced live (it doesn't seem to do that)
and whether such DECL_VALUE_EXPRs should have survived in the
first place.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/116907
* tree-ssa-live.cc (clear_unused_block_pointer_in_block): New
helper.
(clear_unused_block_pointer): Call it.
---
 gcc/tree-ssa-live.cc | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
index 0739faa022e..484698899cf 100644
--- a/gcc/tree-ssa-live.cc
+++ b/gcc/tree-ssa-live.cc
@@ -612,6 +612,22 @@ clear_unused_block_pointer_1 (tree *tp, int *, void *)
   return NULL_TREE;
 }
 
+/* Clear references to unused BLOCKs from DECL_VALUE_EXPRs of variables
+   in BLOCK.  */
+
+static void
+clear_unused_block_pointer_in_block (tree block)
+{
+  for (tree t = BLOCK_VARS (block); t; t = DECL_CHAIN (t))
+if (VAR_P (t) && DECL_HAS_VALUE_EXPR_P (t))
+  {
+   tree val = DECL_VALUE_EXPR (t);
+   walk_tree (&val, clear_unused_block_pointer_1, NULL, NULL);
+  }
+  for (tree t = BLOCK_SUBBLOCKS (block); t; t = BLOCK_CHAIN (t))
+clear_unused_block_pointer_in_block (t);
+}
+
 /* Set all block pointer in debug or clobber stmt to NULL if the block
is unused, so that they will not be streamed out.  */
 
@@ -667,6 +683,10 @@ clear_unused_block_pointer (void)
  walk_tree (gimple_op_ptr (stmt, i), clear_unused_block_pointer_1,
 NULL, NULL);
   }
+
+  /* Walk all variables mentioned in the functions BLOCK tree and clear
+ DECL_VALUE_EXPR from unused blocks where present.  */
+  clear_unused_block_pointer_in_block (DECL_INITIAL (current_function_decl));
 }
 
 /* Dump scope blocks starting at SCOPE to FILE.  INDENT is the
-- 
2.43.0

[PATCH] c++: Restore rust front-end build [PR117114]

2024-10-13 Thread Simon Martin

The patch that I merged via r15-4282-g60163c85730e6b breaks the build
for the rust front-end because it does not work well when virtual
inheritance is in play.

The problem is that in such a case, an overrider and its overridden base
method might have a different DECL_VINDEX, and the derived method would
be incorrectly considered as hiding the base one.

This patch fixes this by not comparing the DECL_VINDEX anymore, but
rather going back to comparing the signatures, only after having
excluded conversion operators to different types.

I'm currently running the testsuite on x86_64-pc-linux-gnu, and already
verified that the rust front-end builds again with the patch applied.

OK if the testsuite run finishes successfully?

PR c++/117114

gcc/cp/ChangeLog:

* class.cc (warn_hidden): Don't compare DECL_VINDEX but
signatures, after having excluded operators to different types.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Woverloaded-virt10.C: New test.

---
 gcc/cp/class.cc   | 20 +--
 .../g++.dg/warn/Woverloaded-virt10.C  |  7 +++
 2 files changed, 17 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index f754886a60a..fce71483ccc 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -3279,25 +3279,25 @@ warn_hidden (tree t)
  {
num_fns++;
tree fndecl_vindex = DECL_VINDEX (fndecl);
-   /* If the method from the base class has the same DECL_VINDEX
-  as the method from the derived, it has been overridden.
+   /* If the methods from the base and derived classes have the
+  same signature and they are not conversion operators to
+  different types, then the base method has been overriden.
   Note that we can't move on after finding one match: fndecl
   might override multiple base fns.  */
for (size_t k = 0; k < base_fndecls.length (); k++)
  {
if (!base_fndecls[k] || !DECL_VINDEX (base_fndecls[k]))
  continue;
-   tree base_fndecl_vindex = DECL_VINDEX (base_fndecls[k]);
-   if (fndecl_vindex
-   && !tree_int_cst_compare (fndecl_vindex,
- base_fndecl_vindex))
- overriden_base_fndecls.add (base_fndecls[k]);
-   else if (IDENTIFIER_CONV_OP_P (name)
-&& (DECL_NAME (fndecl)
-!= DECL_NAME (base_fndecls[k])))
+   if (IDENTIFIER_CONV_OP_P (name)
+   && (DECL_NAME (fndecl)
+   != DECL_NAME (base_fndecls[k])))
  /* If base_fndecl[k] and fndecl are conversion operators
 to different types, they're unrelated.  */
  ;
+   else if (same_signature_p (fndecl, base_fndecls[k]))
+ /* If fndecl has the same signature as base_fndecl[k], it
+overrides it.  */
+ overriden_base_fndecls.add (base_fndecls[k]);
else
  hidden_base_fndecls.put (base_fndecls[k], fndecl);
  }
diff --git a/gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C 
b/gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C
new file mode 100644
index 000..5e669360ec9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C
@@ -0,0 +1,7 @@
+// PR c++/117114
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -Wall }
+
+struct Troops { virtual ~Troops(); };
+struct Control { virtual int GetControl() const = 0; };
+struct Army : Troops, Control { int GetControl() const override; };
-- 
2.44.0

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Richard Biener

On Sun, 13 Oct 2024, Hongtao Liu wrote:

> On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu  wrote:
> >
> > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener  wrote:
> > >
> > > The following helps the x86 backend by canonicalizing FMAs to have
> > > any negation done to one of the commutative multiplication operands
> > > be done to a register (and not a memory operand).  Likewise to
> > > put a register operand first and a memory operand second;
> > > swap_commutative_operands_p seems to treat REG_P and MEM_P the
> > > same but comments indicate "complex expressiosn should be first".
> > >
> > > In particular this does (fma MEM REG REG) -> (fma REG MEM REG) and
> > > (fma (neg MEM) REG REG) -> (fma (neg REG) MEM REG) which are the
> > > reasons for the testsuite regressions in gcc.target/i386/cond_op_fma*.c
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > I'm not quite sure this is the correct approach - simplify-rtx
> > > doesn't seem to do "only canonicalization" but the existing FMA
> > > case looks odd in that context.
> > >
> > > Should the target simply reject cases with wrong "canonicalization"
> > > or does it need to cope with all variants in the patterns that fail
> > > matching during combine without the change?
> > Let me try the backend fix first.
> The backend fix requires at least 8 more patterns, so I think RTL
> canonicalization would be better.
> Please go ahead with the patch.

I'm still looking for insights on how we usually canonicalize on RTL
(and where we document canonicalizations) and how we maintain RTL
in canonical form.

I'm also still wondering why the backend accepts "non-canonical" RTL
instead of rejecting it, giving the middle-end the chance to try
an alternative variant?

Richard.

> > >
> > > Thanks,
> > > Richard.
> > >
> > > PR target/117072
> > > * simplify-rtx.cc (simplify_context::simplify_ternary_operation):
> > > Adjust FMA canonicalization.
> > > ---
> > >  gcc/simplify-rtx.cc | 15 +--
> > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > > index e8e60404ef6..8b4fa0d7aa4 100644
> > > --- a/gcc/simplify-rtx.cc
> > > +++ b/gcc/simplify-rtx.cc
> > > @@ -6830,10 +6830,21 @@ simplify_context::simplify_ternary_operation 
> > > (rtx_code code, machine_mode mode,
> > > op0 = tem, op1 = XEXP (op1, 0), any_change = true;
> > > }
> > >
> > > -  /* Canonicalize the two multiplication operands.  */
> > > +  /* Canonicalize the two multiplication operands.  A negation
> > > +should go first and if possible the negation should be
> > > +to a register.  */
> > >/* a * -b + c  =>  -b * a + c.  */
> > > -  if (swap_commutative_operands_p (op0, op1))
> > > +  if (swap_commutative_operands_p (op0, op1)
> > > + || (REG_P (op1) && !REG_P (op0) && GET_CODE (op0) != NEG))
> > > std::swap (op0, op1), any_change = true;
> > > +  else if (GET_CODE (op0) == NEG && !REG_P (XEXP (op0, 0))
> > > +  && REG_P (op1))
> > > +   {
> > > + op0 = XEXP (op0, 0);
> > > + op1 = simplify_gen_unary (NEG, mode, op1, mode);
> > > + std::swap (op0, op1);
> > > + any_change = true;
> > > +   }
> > >
> > >if (any_change)
> > > return gen_rtx_FMA (mode, op0, op1, op2);
> > > --
> > > 2.43.0
> >
> >
> >
> > --
> > BR,
> > Hongtao
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

[PATCH] tree-optimization/116290 - fix compare-debug issue in ldist

2024-10-13 Thread Richard Biener

Loop distribution does different analysis with -g0/-g due to counting
a debug stmt starting a BB against a limit which will everntually
lead to different IVOPTs choices.  I've fixed a possible IVOPTs
issue on the way even though it doesn't make a difference here.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/116290
* tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs
have no debug variants.  Start with first non-debug real stmt.
* tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze
debug stmts.

* gcc.dg/pr116290.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr116290.c | 18 ++
 gcc/tree-loop-distribution.cc   |  6 +++---
 gcc/tree-ssa-loop-ivopts.cc |  3 ++-
 3 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr116290.c

diff --git a/gcc/testsuite/gcc.dg/pr116290.c b/gcc/testsuite/gcc.dg/pr116290.c
new file mode 100644
index 000..97b946bda89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr116290.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O2 -fcompare-debug" } */
+
+char *camel_message_info_class_intern_init_part;
+void g_once_init_enter();
+void camel_message_info_class_intern_init() {
+  int ii;
+  char *label;
+  for (; camel_message_info_class_intern_init_part[ii]; ii++)
+if (camel_message_info_class_intern_init_part) {
+  if (label && *label)
+g_once_init_enter();
+  label = &camel_message_info_class_intern_init_part[ii + 1];
+  camel_message_info_class_intern_init_part[ii] = ' ';
+}
+  if (label)
+g_once_init_enter();
+}
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index f0430ede2f4..d7013ba5f8d 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -3552,7 +3552,7 @@ determine_reduction_stmt_1 (const loop_p loop, const 
basic_block *bbs)
   basic_block bb = bbs[i];
 
   for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
-  gsi_next_nondebug (&bsi))
+  gsi_next (&bsi))
{
  gphi *phi = bsi.phi ();
  if (virtual_operand_p (gimple_phi_result (phi)))
@@ -3565,8 +3565,8 @@ determine_reduction_stmt_1 (const loop_p loop, const 
basic_block *bbs)
}
}
 
-  for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
-  gsi_next_nondebug (&bsi), ++ninsns)
+  for (gimple_stmt_iterator bsi = gsi_start_nondebug_bb (bb);
+  !gsi_end_p (bsi); gsi_next_nondebug (&bsi), ++ninsns)
{
  /* Bail out early for loops which are unlikely to match.  */
  if (ninsns > 16)
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index dfe1b254156..7441324aec2 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -1461,7 +1461,8 @@ find_givs_in_bb (struct ivopts_data *data, basic_block bb)
   gimple_stmt_iterator bsi;
 
   for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
-find_givs_in_stmt (data, gsi_stmt (bsi));
+if (!is_gimple_debug (gsi_stmt (bsi)))
+  find_givs_in_stmt (data, gsi_stmt (bsi));
 }
 
 /* Finds general ivs.  */
-- 
2.43.0

Re: [RFC PATCH] RISC-V: Implement riscv_minimal_hwprobe_feature_bits

2024-10-13 Thread Yangyu Chen




> On Oct 7, 2024, at 20:37, Kito Cheng  wrote:
> 
> I suggest not handling the extension implication rules. This way, it
> can simplify
> the logic and reduce the cost of run-time checks.
> 
> Also, you need to consider situations where that extension can't be detected.

For this, I would also like to update the __init_riscv_features_bits_linux
function in another patch [1] to set implied extensions such as V
implies Zve32x.

[1] 
https://patchwork.sourceware.org/project/gcc/patch/20241003182256.1765569-1-chenyan...@isrc.iscas.ac.cn/

> 
> And last, I would like to defer this until run-time resolver patch coming, but
> welcome to send another version of RFC patch :P
> 

No worries. And I also provide a basic functional target_clones branch [2].

[2] https://github.com/cyyself/gcc/tree/rv_target_clones_20241007

> 
> On Sun, Oct 6, 2024 at 2:21 AM Yangyu Chen  
> wrote:
>> 
>> This patch implements the riscv_minimal_hwprobe_feature_bits feature
>> for the RISC-V target. The feature bits are defined in the previous
>> patch [1] to provide bitmasks of ISA extensions that defined in RISC-V
>> C-API. Thus, we need a function to generate the feature bits for IFUNC
>> resolver to dispatch between different functions based on the hardware
>> features. The final version of the target_clones support on RISC-V is
>> still under development, I am working on it.
>> 
>> The minimal feature bits means to use the earliest extension appeard in
>> the Linux hwprobe to cover the given ISA string. To allow older kernels
>> without some implied extensions probe to run the FMV dispatcher
>> correctly.
>> 
>> For example, V implies Zve32x, but Zve32x appears in the Linux kernel
>> since v6.11. If we use isa string directly to generate FMV dispatcher
>> with functions with "arch=+v" extension, since we have V implied the
>> Zve32x, FMV dispatcher will check if the Zve32x extension is supported
>> by the host. If the Linux kernel is older than v6.11, the FMV dispatcher
>> will fail to detect the Zve32x extension even it already implies by the
>> V extension, thus making the FMV dispatcher fail to dispatch the correct
>> function.
>> 
>> Thus, we need to generate the minimal feature bits to cover the given
>> ISA string to allow the FMV dispatcher to work correctly on older
>> kernels.
>> 
>> [1] 
>> https://patchwork.sourceware.org/project/gcc/patch/20241003182256.1765569-1-chenyan...@isrc.iscas.ac.cn/
>> 
>> gcc/ChangeLog:
>> 
>>* common/config/riscv/riscv-common.cc
>>(struct riscv_ext_bitmask_table_t): New struct.
>>(riscv_minimal_hwprobe_feature_bits): New function.
>>* config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H):
>>(riscv_minimal_hwprobe_feature_bits): Declare the function.
>>* common/config/riscv/feature_bits.h: New file.
>> ---
>> gcc/common/config/riscv/feature_bits.h  |  33 ++
>> gcc/common/config/riscv/riscv-common.cc | 144 
>> gcc/config/riscv/riscv-subset.h |   4 +
>> 3 files changed, 181 insertions(+)
>> create mode 100644 gcc/common/config/riscv/feature_bits.h
>> 
>> diff --git a/gcc/common/config/riscv/feature_bits.h 
>> b/gcc/common/config/riscv/feature_bits.h
>> new file mode 100644
>> index 000..c6c6d983edb
>> --- /dev/null
>> +++ b/gcc/common/config/riscv/feature_bits.h
> 
> Rename to riscv_feature_bits.h to prevent potential file name conflict
> also move to gcc/config/riscv/ rather than gcc/config/riscv/
> 
>> @@ -0,0 +1,33 @@
>> +/* Definition of RISC-V feature bits corresponding to
>> +   libgcc/config/riscv/feature_bits.c
>> +   Copyright (C) 2024 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify
>> +it under the terms of the GNU General Public License as published by
>> +the Free Software Foundation; either version 3, or (at your option)
>> +any later version.
>> +
>> +GCC is distributed in the hope that it will be useful,
>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +GNU General Public License for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +.  */
>> +
> 
> Need header guard, something like #ifndef GCC_RISCV_FEATURE_BITS_H
> 
>> +#define RISCV_FEATURE_BITS_LENGTH 2
>> +struct riscv_feature_bits {
>> +  unsigned length;
>> +  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
>> +};
>> +
>> +#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
>> +
>> +struct riscv_vendor_feature_bits {
>> +  unsigned vendorID;
>> +  unsigned length;
>> +  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
>> +};
>> diff --git a/gcc/common/config/riscv/riscv-common.cc 
>> b/gcc/common/config/riscv/riscv-common.cc
>> index bd42fd01532..9f343782ae6 100644
>> --- a/gcc/common/config/riscv/riscv-common.cc
>> +++ b/gcc/c

Re: [Patch] Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device

2024-10-13 Thread Tobias Burnus


Now pushed as r15-4298-g3269a722b7a036.

Thanks to Bernhard for some proof reading!

Tobias

Tobias Burnus wrote:
Anyone feeling like reviewing this patch? (Mainly the Fortran side?!?) 
— Or should I declare it as OpenMP/(OpenACC) patch and just commit it?


→ https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664985.html

Tobias

Tobias Burnus write:
I forgot to update the subject line. To make it easier to find (patch 
archeology), now with proper subject line …


Tobias Burnus wrote:

Sometimes waiting a bit leads to better code …

Tobias Burnus wrote:

...
[I guess, we eventually want to add support for more builtins. For 
instance, acc_on_device would be a candidate, but I could imagine 
some additional builtins.]


I have now implemented acc_on_device and I think the new fix-up 
function is way is nicer.


Thus, this patch does:

* (v1) Fix omp_is_initial_device → do only replace when used in 
calls (and not when used as function pointer/actual to a dummy 
function) + fix ICE due to integer(4) != logical(4) in the middle end.


* (new) For OpenACC, use a builtin for acc_on_device + actually do 
compile-time optimization when offloading is not configured.


* (new) libgomp.texi: Typo fixes accumulated, fix wording, and for 
acc_on_device, add a note that compile-time folding may be done (and 
how it can be disabled).


For OpenACC, I now mix compile time folding vs. runtime to ensure 
that it works.


Tested on x86-64 without and with offloading configured, running 
with nvptx offloading.


Code review, comments, suggestions, remarks?

Tobias

PS: The testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c 
example is not completely clear to me; however, the new optimization 
causes that without offloading enabled, the dump message is not 
shown. I tried to understand it better with 
-fno-builtin-acc_on_device, but that then caused link errors as the 
device function wasn't optimizated away, leaving me puzzled. — At 
the end, I just changed the dg-* and did not try to understand the 
issue.

Re: [PATCH] warning option for traps (-Wtrap)

2024-10-13 Thread Martin Uecker

Am Sonntag, dem 13.10.2024 um 10:56 +0200 schrieb Richard Biener:
> On Sat, 12 Oct 2024, Martin Uecker wrote:
> 
> > Am Samstag, dem 12.10.2024 um 18:44 +0200 schrieb Richard Biener:
> > > 
> > > > Am 12.10.2024 um 16:43 schrieb Martin Uecker :
> > > > 
> > > > 
> > > > There is code which should not fail at run-time.  For this,
> > > > it is helpful to get a warning when a compiler inserts traps
> > > > (e.g. sanitizers, hardbools, __builtin_trap(), etc.).
> > > > 
> > > > Having a warning for this also has many other use cases, e.g.
> > > > one can use it with some sanitizer to rule out that some
> > > > piece of code has certain undefined behavior such as
> > > > signed overflow or undefined behavior in left-shifts
> > > > (one gets a warning if the optimizer does not prove the
> > > > trap is dead and it is emitted).
> > > > 
> > > > Another potential use case could be writing tests.
> > > > 
> > > > 
> > > > Bootstrapped and regression tested on x64_84.
> > > > 
> > > > 
> > > >Add warning option that warns when a trap is generated.
> > > > 
> > > >This adds a warning option -Wtrap that is emitted in
> > > >expand_builtin_trap.  It can be used to verify that traps
> > > >are generated or - more importantly - are not generated
> > > >under various conditions, e.g. for UBSan with -fsanitize-trap,
> > > >hardbools, etc.
> > > 
> > > Isn’t it better to diagnose with more context from the callers that 
> > > insert the trap?
> > 
> > More context would be better.  Should there be additional
> > arguments when creating the call to the builtin?
> 
> Why not diagnose when we create the call? 

I agree that having optional warnings for all situation where there
could be run-time UB (or a trap) would be useful.  But having a
generic warning for all such situations would produce many warnings
and also cover cases where we already have more specific warnings.

Doing it when the trap is generated directly gives me somewhat
different information that I sometimes need: Is there a trap left
in the generated binary?

We have a similar warning already for generating trampolines.

Before adding the warning to my local tree, I often looked at the
generated assembly to look for  generated "ud2" instructions.  But this
is painful and gives me even less context.

A practical example from failing to properly take integer
promotions into account (adapted from a old bug in a crypto
library) is:

uint32_t bar(int N, unsigned char key)
{
unsigned int kappa = key << 24;
return kappa;
}

which has UB that the warning tells me about and 
where adding a cast is required to eliminate it:
https://godbolt.org/z/osvEsdcqc


>  But sure, adding a diagnostic
> argument would work, it might also work to distinguish calls we want to
> diagnose from those we don't.

Would it be reasonable to approve this patch now and I try
to improve this later?

Martin

[PATCH] tree-optimization/116481 - avoid building function_type[]

2024-10-13 Thread Richard Biener

The following avoids building an array type with function or method
element type during diagnosing an array bound violation as this
will result in an error, rejecting a program with a not too useful
error message.  Instead build such array type manually.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/116481
* pointer-query.cc (build_printable_array_type):
Build an array types with function or method element type
manually to avoid bogus diagnostic.

* gcc.dg/pr116481.c: New testcase.
---
 gcc/pointer-query.cc| 11 +++
 gcc/testsuite/gcc.dg/pr116481.c | 13 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr116481.c

diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index c8ab0571e57..a85c04128ff 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -2587,6 +2587,17 @@ array_elt_at_offset (tree artype, HOST_WIDE_INT off,
 tree
 build_printable_array_type (tree eltype, unsigned HOST_WIDE_INT nelts)
 {
+  /* Cannot build an array type of functions or methods without
+ an error diagnostic.  */
+  if (FUNC_OR_METHOD_TYPE_P (eltype))
+{
+  tree arrtype = make_node (ARRAY_TYPE);
+  TREE_TYPE (arrtype) = eltype;
+  TYPE_SIZE (arrtype) = bitsize_zero_node;
+  TYPE_SIZE_UNIT (arrtype) = size_zero_node;
+  return arrtype;
+}
+
   if (TYPE_SIZE_UNIT (eltype)
   && TREE_CODE (TYPE_SIZE_UNIT (eltype)) == INTEGER_CST
   && !integer_zerop (TYPE_SIZE_UNIT (eltype))
diff --git a/gcc/testsuite/gcc.dg/pr116481.c b/gcc/testsuite/gcc.dg/pr116481.c
new file mode 100644
index 000..3ee6d747087
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr116481.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+
+extern void tramp ();
+
+int is_trampoline (void* function) /* { dg-bogus "arrays of functions are not 
meaningful" } */
+{
+  void* tramp_address = tramp;
+  if (!(((unsigned long)function & 3) == 2))
+return 0;
+  return (((long *) ((char*)function - 2))[0]
+ == ((long *) ((char*)tramp_address-2))[0]); /* { dg-warning "outside 
array bounds" } */
+}
-- 
2.43.0

[PATCH v7] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-13 Thread Yangyu Chen

From: Kito Cheng 

This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.

The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
  - For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...

This API is intended to provide the capability to query minimal common 
available extensions on the system.

The API is defined in the riscv-c-api-doc:
https://github.com/riscv-non-isa/riscv-c-api-doc/blob/main/src/c-api.adoc

Proposal to use unsigned long long for marchid and mimpid:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/91

Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.

Changes since v6:
- Implement __riscv_cpu_model.
- Set new sub extension bits which implied from previous extensions.

Changes since v5:
- Minor fixes on indentation.

Changes since v4:
- Bump to newest riscv-c-api-doc with some new extensions like Zve*, Zc*
  Zimop, Zcmop, Zawrs.
- Rename the return variable name of hwprobe syscall.
- Minor fixes on indentation.

Changes since v3:
- Fix non-linux build.
- Let __init_riscv_feature_bits become constructor

Changes since v2:
- Prevent it initialize more than once.

Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
  operations during the computation of the feature bit.

libgcc/ChangeLog:

* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.

Co-Developed-by: Yangyu Chen 
Signed-off-by: Yangyu Chen 
---
 libgcc/config/riscv/feature_bits.c | 403 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 404 insertions(+)
 create mode 100644 libgcc/config/riscv/feature_bits.c

diff --git a/libgcc/config/riscv/feature_bits.c 
b/libgcc/config/riscv/feature_bits.c
new file mode 100644
index 000..59c23791995
--- /dev/null
+++ b/libgcc/config/riscv/feature_bits.c
@@ -0,0 +1,403 @@
+/* Helper function for function multi-versioning for RISC-V.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#define RISCV_FEATURE_BITS_LENGTH 2
+struct {
+  unsigned length;
+  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
+} __riscv_feature_bits __attribute__ ((visibility ("hidden"), nocommon));
+
+#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
+
+struct {
+  unsigned vendorID;
+  unsigned length;
+  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
+} __riscv_vendor_feature_bits __attribute__ ((visibility ("hidden"), 
nocommon));
+
+struct {
+  unsigned mvendorid;
+  unsigned long long marchid;
+  unsigned long long mimpid;
+} __riscv_cpu_model __attribute__ ((visibility ("hidden"), nocommon));
+
+#define A_GROUPID 0
+#define A_BITMASK (1ULL << 0)
+#define C_GROUPID 0
+#define C_BITMASK (1ULL << 2)
+#define D_GROUPID 0
+#define D_BITMASK (1ULL << 3)
+#define F_GROUPID 0
+#define F_BITMASK (1ULL << 5)
+#define I_GROUPID 0
+#define I_BITMASK (1ULL << 8)
+#define M_GROUPID 0
+#define M_BITMASK (1ULL << 12)
+#define V_GROUPID 0
+#define V_BITMASK (1ULL << 21)
+#define ZACAS_GROUPID 0
+#define ZACAS_BITMASK (1ULL << 26)
+#define ZBA_GROUPID 0
+#define ZBA_BITMASK (1ULL << 27)
+#define ZBB_GROUPID 0
+#define ZBB_BITMASK (1ULL << 28)
+#define ZBC_GROUPID 0
+#define ZBC_BITMASK (1ULL << 29)
+#define ZBKB_GROUPID 0
+#define ZBKB_BITMASK (1ULL << 30)
+#define ZBKC_GROUPID 0
+#define ZBKC_BITMASK (1ULL << 31)
+#define ZBKX_GROUPID 0
+#define ZBKX_BITMASK (1ULL << 32)
+#define ZBS_GROUPID 0
+#define ZBS_BITMASK (1ULL << 33)
+#define ZFA_GROUPID 0
+#define ZFA_BITMASK (1ULL << 34)
+#define ZFH_GROUPID 0
+#define ZFH_BITMASK (1ULL << 35)
+#define ZFHMIN_GROUPID 0
+#

[releases/gcc-13 PATCH 0/2] Two backports (Valgrind noise, wrong-code)

2024-10-13 Thread Sam James

Two backports for Valgrind noise (PR109920, PR116808) and wrong-code
(PR109934, PR117100).

OK? Bootstrapped and regtested with no regressions.

Aldy Hernandez (2):
  Use delete[] in int_range destructor [PR109920]
  Remove buggy special case in irange::invert [PR109934].

 gcc/testsuite/gcc.dg/tree-ssa/pr109934.c | 22 ++
 gcc/value-range.cc   |  8 
 gcc/value-range.h|  2 +-
 3 files changed, 23 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109934.c


base-commit: e3c492a5ad847f5f21189d48c7ff2d0ee5920529
-- 
2.47.0

[releases/gcc-13 PATCH 2/2] Remove buggy special case in irange::invert [PR109934].

2024-10-13 Thread Sam James

From: Aldy Hernandez 

This patch removes a buggy special case in irange::invert which seems
to have been broken for a while, and probably never triggered because
the legacy code was handled elsewhere, and the non-legacy code was
using an int_range_max of int_range<255> which made it extremely
likely for num_ranges == 255.  However, with auto-resizing ranges,
int_range_max will start off at 3 and can hit this bogus code in the
unswitching code.

PR tree-optimization/109934

gcc/ChangeLog:

* value-range.cc (irange::invert): Remove buggy special case.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr109934.c: New test.

(cherry picked from commit 8d5f050dabbf6dd3b992c3b46661848dbcf30d9e)
---
I'm testing this for 12 as well. OK for 13 (and 12 if no regressions)?

 gcc/testsuite/gcc.dg/tree-ssa/pr109934.c | 22 ++
 gcc/value-range.cc   |  8 
 2 files changed, 22 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109934.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c
new file mode 100644
index ..08bd5ce95c61
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c
@@ -0,0 +1,22 @@
+// { dg-do run }
+// { dg-options "-O3" }
+
+int printf(const char *, ...);
+short a;
+long b = 3, c;
+int d(int e) {
+  switch (e)
+  case 111:
+  case 222:
+  case 44:
+return 0;
+  return e;
+}
+int main() {
+  for (; a >= 0; --a)
+if (d(c + 23) - 23)
+  b = 0;
+
+  if (b != 3)
+__builtin_abort ();
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 753f5e8cc769..1ac62e3018e1 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -2833,14 +2833,6 @@ irange::invert ()
   wide_int type_min = wi::min_value (prec, sign);
   wide_int type_max = wi::max_value (prec, sign);
   m_nonzero_mask = NULL;
-  if (m_num_ranges == m_max_ranges
-  && lower_bound () != type_min
-  && upper_bound () != type_max)
-{
-  m_base[1] = wide_int_to_tree (ttype, type_max);
-  m_num_ranges = 1;
-  return;
-}
 
   // At this point, we need one extra sub-range to represent the
   // inverse.
-- 
2.47.0

[releases/gcc-13 PATCH 1/2] Use delete[] in int_range destructor [PR109920]

2024-10-13 Thread Sam James

From: Aldy Hernandez 

gcc/ChangeLog:

PR tree-optimization/109920
* value-range.h (RESIZABLE>::~int_range): Use delete[].

(cherry picked from commit 493a63af6cbab094c36a76435c12b1886328dab8)
---
 gcc/value-range.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 0284d6cedf4c..f859fe7e7a70 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -504,7 +504,7 @@ inline
 int_range::~int_range ()
 {
   if (RESIZABLE && m_base != m_ranges)
-delete m_base;
+delete[] m_base;
 }
 
 // This is an "infinite" precision irange for use in temporary
-- 
2.47.0

Re: [PATCH] tree-optimization/116481 - avoid building function_type[]

2024-10-13 Thread Jakub Jelinek

On Sun, Oct 13, 2024 at 12:50:31PM +0200, Richard Biener wrote:
> The following avoids building an array type with function or method
> element type during diagnosing an array bound violation as this
> will result in an error, rejecting a program with a not too useful
> error message.  Instead build such array type manually.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/116481
>   * pointer-query.cc (build_printable_array_type):
>   Build an array types with function or method element type
>   manually to avoid bogus diagnostic.
> 
>   * gcc.dg/pr116481.c: New testcase.

Ok.

Jakub

[PATCH] libstdc++: testsuite: adjust name_fortify test for pre-defined _FORTIFY_SOURCE

2024-10-13 Thread Sam James

Otherwise we get failures with toolchains that have _FORTIFY_SOURCE
defined already to a different value like 3.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names_fortify.cc: Undefine _FORTIFY_SOURCE.
---
I'll commit later if no objections.

 libstdc++-v3/testsuite/17_intro/names_fortify.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/17_intro/names_fortify.cc 
b/libstdc++-v3/testsuite/17_intro/names_fortify.cc
index c975412074be..f24af21f8a70 100644
--- a/libstdc++-v3/testsuite/17_intro/names_fortify.cc
+++ b/libstdc++-v3/testsuite/17_intro/names_fortify.cc
@@ -1,6 +1,7 @@
 // { dg-do compile { target *-*-linux* } }
 // { dg-add-options no_pch }
 
+#undef _FORTIFY_SOURCE
 #define _FORTIFY_SOURCE 2
 // Now we can define the macros to poison uses of non-reserved names:
 #include "names.cc"
-- 
2.47.0

Re: [PATCH] libstdc++: testsuite: adjust name_fortify test for pre-defined _FORTIFY_SOURCE

2024-10-13 Thread Jonathan Wakely

On Sun, 13 Oct 2024 at 23:23, Sam James  wrote:
>
> Otherwise we get failures with toolchains that have _FORTIFY_SOURCE
> defined already to a different value like 3.

I was going to say we could do:

#ifndef _FORTIFY_SOURCE
#define _FORTIFY_SOURCE 2
#endif

But I realised that the original names.cc test will already run with
whatever default value the toolchain uses. So this looks fine, thanks.

>
> libstdc++-v3/ChangeLog:
>
> * testsuite/17_intro/names_fortify.cc: Undefine _FORTIFY_SOURCE.
> ---
> I'll commit later if no objections.
>
>  libstdc++-v3/testsuite/17_intro/names_fortify.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libstdc++-v3/testsuite/17_intro/names_fortify.cc 
> b/libstdc++-v3/testsuite/17_intro/names_fortify.cc
> index c975412074be..f24af21f8a70 100644
> --- a/libstdc++-v3/testsuite/17_intro/names_fortify.cc
> +++ b/libstdc++-v3/testsuite/17_intro/names_fortify.cc
> @@ -1,6 +1,7 @@
>  // { dg-do compile { target *-*-linux* } }
>  // { dg-add-options no_pch }
>
> +#undef _FORTIFY_SOURCE
>  #define _FORTIFY_SOURCE 2
>  // Now we can define the macros to poison uses of non-reserved names:
>  #include "names.cc"
> --
> 2.47.0
>

[PATCH] libstdc++: Refactor std::uninitialized_{copy, fill, fill_n} algos [PR68350]

2024-10-13 Thread Jonathan Wakely

Tested x86_64-linux.

-- >8 --

This refactors the std::uninitialized_copy, std::uninitialized_fill and
std::uninitialized_fill_n algorithms to directly perform memcpy/memset
optimizations instead of dispatching to std::copy/std::fill/std::fill_n.

The reasons for this are:

- Use 'if constexpr' to simplify and optimize compilation throughput, so
  dispatching to specialized class templates is only needed for C++98
  mode.
- Relax the conditions for using memcpy/memset, because the C++20 rules
  on implicit-lifetime types mean that we can rely on memcpy to begin
  lifetimes of trivially copyable types.  We don't need to require
  trivially default constructible, so don't need to limit the
  optimization to trivial types. See PR 68350 for more details.
- The conditions on non-overlapping ranges are stronger for
  std::uninitialized_copy than for std::copy so we can use memcpy instead
  of memmove, which might be a minor optimization.
- Avoid including  in .
  It only needs some iterator utilities from that file now, which belong
  in  anyway, so this moves them there.

Several tests need changes to the diagnostics matched by dg-error
because we no longer use the __constructible() function that had a
static assert in. Now we just get straightforward errors for attempting
to use a deleted constructor.

Two tests needed more signficant changes to the actual expected results
of executing the tests, because they were checking for old behaviour
which was incorrect according to the standard.
20_util/specialized_algorithms/uninitialized_copy/64476.cc was expecting
std::copy to be used for a call to std::uninitialized_copy involving two
trivially copyable types. That was incorrect behaviour, because a
non-trivial constructor should have been used, but using std::copy used
trivial default initialization followed by assignment.
20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc was testing
the behaviour with a non-integral Size passed to uninitialized_fill_n,
but I wrote the test looking at the requirements of uninitialized_copy_n
which are not the same as uninitialized_fill_n. The former uses --n and
tests n > 0, but the latter just tests n-- (which will never be false
for a floating-point value with a fractional part).

libstdc++-v3/ChangeLog:

PR libstdc++/68350
PR libstdc++/93059
* include/bits/stl_algobase.h (__niter_base, __niter_wrap): Move
to ...
* include/bits/stl_iterator.h: ... here.
* include/bits/stl_uninitialized.h (__check_constructible)
(_GLIBCXX_USE_ASSIGN_FOR_INIT): Remove.
[C++98] (__unwrappable_niter): New trait.
(__uninitialized_copy): Replace use of std::copy.
(uninitialized_copy): Fix Doxygen comments. Open-code memcpy
optimization for C++11 and later.
(__uninitialized_fill): Replace use of std::fill.
(uninitialized_fill): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
(__uninitialized_fill_n): Replace use of std::fill_n.
(uninitialized_fill_n): Fix Doxygen comments. Open-code memset
optimization for C++11 and later.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/64476.cc:
Adjust expected behaviour to match what the standard specifies.
* 
testsuite/20_util/specialized_algorithms/uninitialized_fill_n/sizes.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
Adjust dg-error directives.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc:
Likewise.
* 
testsuite/20_util/specialized_algorithms/uninitialized_copy_n/89164.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc:
Likewise.
* 
testsuite/20_util/specialized_algorithms/uninitialized_fill_n/89164.cc:
Likewise.
* testsuite/23_containers/vector/cons/89164.cc: Likewise.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Likewise.
---
 libstdc++-v3/include/bits/stl_algobase.h  |  45 --
 libstdc++-v3/include/bits/stl_iterator.h  |  54 +++
 libstdc++-v3/include/bits/stl_uninitialized.h | 385 +-
 .../uninitialized_copy/1.cc   |   3 +-
 .../uninitialized_copy/64476.cc   |   6 +-
 .../uninitialized_copy/89164.cc   |   3 +-
 .../uninitialized_copy_n/89164.cc |   3 +-
 .../uninitialized_fill/89164.cc   |   3 +-
 .../uninitialized_fill_n/89164.cc |   3 +-
 .../uninitialized_fill_n/sizes.cc |  22 +-
 .../23_containers/vector/cons/89164.cc|   5 +-
 .../23_containers/vector/cons/89164_c++17.cc  |   3 +-
 12 files changed, 383 insertions(+), 152 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 384e5fdcdc9..751b7ad119b 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstd

[PATCH] libstdc++: Implement LWG 3798 for range adaptors [PR106676]

2024-10-13 Thread Jonathan Wakely

Tested x86_64-linux.

-- >8 --

LWG 3798 modified the iterator_category of the iterator types for
transform_view, join_with_view, zip_transform_view and
adjacent_transform_view, to allow the iterator's reference type to be an
rvalue reference.

libstdc++-v3/ChangeLog:

PR libstdc++/106676
* include/bits/iterator_concepts.h (__cpp17_fwd_iterator): Use
is_reference instead of is_value_reference.
rvalue references.
* include/std/ranges (transform_view:__iter_cat::_S_iter_cat):
Likewise.
(zip_transform_view::__iter_cat::_S_iter_cat): Likewise.
(adjacent_transform_view::__iter_cat::_S_iter_cat): Likewise.
(join_with_view::__iter_cat::_S_iter_cat): Likewise.
* testsuite/std/ranges/adaptors/transform.cc: Check
iterator_category when the transformation function returns an
rvalue reference type.
---
 libstdc++-v3/include/bits/iterator_concepts.h|  4 +++-
 libstdc++-v3/include/std/ranges  | 16 
 .../testsuite/std/ranges/adaptors/transform.cc   | 16 
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index 490a362cdf1..669d3ddfd1e 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -333,10 +333,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
typename incrementable_traits<_Iter>::difference_type>;
};
 
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// 3798. Rvalue reference and iterator_category
 template
   concept __cpp17_fwd_iterator = __cpp17_input_iterator<_Iter>
&& constructible_from<_Iter>
-   && is_lvalue_reference_v>
+   && is_reference_v>
&& same_as>,
   typename indirectly_readable_traits<_Iter>::value_type>
&& requires(_Iter __it)
diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index f0d81cbea0c..941189d65c3 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1892,7 +1892,9 @@ namespace views::__adaptor
using _Fpc = __detail::__maybe_const_t<_Const, _Fp>;
using _Base = transform_view::_Base<_Const>;
using _Res = invoke_result_t<_Fpc&, range_reference_t<_Base>>;
-   if constexpr (is_lvalue_reference_v<_Res>)
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3798. Rvalue reference and iterator_category
+   if constexpr (is_reference_v<_Res>)
  {
using _Cat
  = typename 
iterator_traits>::iterator_category;
@@ -5047,7 +5049,9 @@ namespace views::__adaptor
  using __detail::__range_iter_cat;
  using _Res = invoke_result_t<__maybe_const_t<_Const, _Fp>&,
   
range_reference_t<__maybe_const_t<_Const, _Vs>>...>;
- if constexpr (!is_lvalue_reference_v<_Res>)
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // 3798. Rvalue reference and iterator_category
+ if constexpr (!is_reference_v<_Res>)
return input_iterator_tag{};
  else if constexpr ((derived_from<__range_iter_cat<_Vs, _Const>,
   random_access_iterator_tag> && ...))
@@ -5820,7 +5824,9 @@ namespace views::__adaptor
   using _Res = invoke_result_t<__unarize<__maybe_const_t<_Const, _Fp>&, 
_Nm>,
   range_reference_t<_Base>>;
   using _Cat = typename 
iterator_traits>::iterator_category;
-  if constexpr (!is_lvalue_reference_v<_Res>)
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3798. Rvalue reference and iterator_category
+  if constexpr (!is_reference_v<_Res>)
return input_iterator_tag{};
   else if constexpr (derived_from<_Cat, random_access_iterator_tag>)
return random_access_iterator_tag{};
@@ -7228,7 +7234,9 @@ namespace views::__adaptor
  using _OuterCat = typename 
iterator_traits<_OuterIter>::iterator_category;
  using _InnerCat = typename 
iterator_traits<_InnerIter>::iterator_category;
  using _PatternCat = typename 
iterator_traits<_PatternIter>::iterator_category;
- if constexpr 
(!is_lvalue_reference_v,
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // 3798. Rvalue reference and iterator_category
+ if constexpr 
(!is_reference_v,
  
iter_reference_t<_PatternIter>>>)
return input_iterator_tag{};
  else if constexpr (derived_from<_OuterCat, bidirectional_iterator_tag>
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
index ca695349650..e1b74353e63 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
@@ -214,6 +214,21 @@ test10()

[PATCH] libstdc++: Implement LWG 3564 for ranges::transform_view

2024-10-13 Thread Jonathan Wakely

Tested x86_64-linux.

-- >8 --

The _Iterator type returned by begin() const uses const F& to
transform the elements, so it should use const F& to determine the
iterator's value_type and iterator_category as well.

This was accepted into the WP in July 2022.

libstdc++-v3/ChangeLog:

* include/std/ranges (transform_view:_Iterator): Use const F&
to determine value_type and iterator_category of
_Iterator, as per LWG 3564.
* testsuite/std/ranges/adaptors/transform.cc: Check value_type
and iterator_category.
---
 libstdc++-v3/include/std/ranges   |  9 +++--
 .../std/ranges/adaptors/transform.cc  | 19 +++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 6e6e3b97d82..f0d81cbea0c 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1886,8 +1886,12 @@ namespace views::__adaptor
  static auto
  _S_iter_cat()
  {
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3564. transform_view::iterator::value_type and
+   // iterator_category should use const F&
+   using _Fpc = __detail::__maybe_const_t<_Const, _Fp>;
using _Base = transform_view::_Base<_Const>;
-   using _Res = invoke_result_t<_Fp&, range_reference_t<_Base>>;
+   using _Res = invoke_result_t<_Fpc&, range_reference_t<_Base>>;
if constexpr (is_lvalue_reference_v<_Res>)
  {
using _Cat
@@ -1927,6 +1931,7 @@ namespace views::__adaptor
  return input_iterator_tag{};
  }
 
+ using _Fpc = __detail::__maybe_const_t<_Const, _Fp>;
  using _Base_iter = iterator_t<_Base>;
 
  _Base_iter _M_current = _Base_iter();
@@ -1936,7 +1941,7 @@ namespace views::__adaptor
  using iterator_concept = decltype(_S_iter_concept());
  // iterator_category defined in __transform_view_iter_cat
  using value_type
-   = remove_cvref_t>>;
+   = remove_cvref_t>>;
  using difference_type = range_difference_t<_Base>;
 
  _Iterator() requires default_initializable<_Base_iter> = default;
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
index bcb18a3fc6c..ca695349650 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/transform.cc
@@ -196,6 +196,24 @@ test09()
 #endif
 }
 
+void
+test10()
+{
+  struct F {
+short operator()(int) { return 0; }
+const int& operator()(const int& i) const { return i; }
+  };
+
+  int x[] {2, 4};
+  const auto xform = x | views::transform(F{});
+  using const_iterator = decltype(xform.begin());
+  // LWG 3564. transform_view::iterator::value_type and iterator_category
+  // should use const F&
+  static_assert(std::same_as, int>);
+  using cat = std::iterator_traits::iterator_category;
+  static_assert(std::same_as);
+}
+
 int
 main()
 {
@@ -208,4 +226,5 @@ main()
   test07();
   test08();
   test09();
+  test10();
 }
-- 
2.46.2

[PATCH] libstdc++: Use std::move for iterator in ranges::fill [PR117094]

2024-10-13 Thread Jonathan Wakely

Tested x86_64-linux.

-- >8 --

Input iterators aren't required to be copyable.

libstdc++-v3/ChangeLog:

PR libstdc++/117094
* include/bits/ranges_algobase.h (__fill_fn): Use std::move for
iterator that might not be copyable.
* testsuite/25_algorithms/fill/constrained.cc: Check
non-copyable iterator with sized sentinel.
---
 libstdc++-v3/include/bits/ranges_algobase.h   |  2 +-
 .../25_algorithms/fill/constrained.cc | 34 +++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index 3c8d46198c5..0345ea850a4 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -592,7 +592,7 @@ namespace ranges
if constexpr (sized_sentinel_for<_Sent, _Out>)
  {
const auto __len = __last - __first;
-   return ranges::fill_n(__first, __len, __value);
+   return ranges::fill_n(std::move(__first), __len, __value);
  }
else if constexpr (is_scalar_v<_Tp>)
  {
diff --git a/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc 
b/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc
index 126515eddca..7cae99f2d5c 100644
--- a/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/fill/constrained.cc
@@ -83,9 +83,43 @@ test02()
   return ok;
 }
 
+void
+test03()
+{
+  // Bug libstdc++/117094 - ranges::fill misses std::move for output_iterator
+
+  // Move-only output iterator
+  struct Iterator
+  {
+using difference_type = long;
+Iterator(int* p) : p(p) { }
+Iterator(Iterator&&) = default;
+Iterator& operator=(Iterator&&) = default;
+int& operator*() const { return *p; }
+Iterator& operator++() { ++p; return *this; }
+Iterator operator++(int) { return Iterator(p++ ); }
+int* p;
+
+struct Sentinel
+{
+  const int* p;
+  bool operator==(const Iterator& i) const { return p == i.p; }
+  long operator-(const Iterator& i) const { return p - i.p; }
+};
+
+long operator-(Sentinel s) const { return p - s.p; }
+  };
+  static_assert(std::sized_sentinel_for);
+  int a[2];
+  std::ranges::fill(Iterator(a), Iterator::Sentinel{a+2}, 999);
+  VERIFY( a[0] == 999 );
+  VERIFY( a[1] == 999 );
+}
+
 int
 main()
 {
   test01();
   static_assert(test02());
+  test03();
 }
-- 
2.46.2

Re: [PATCH] libstdc++: testsuite: adjust name_fortify test for pre-defined _FORTIFY_SOURCE

2024-10-13 Thread Sam James

Jonathan Wakely  writes:

> On Sun, 13 Oct 2024 at 23:23, Sam James  wrote:
>>
>> Otherwise we get failures with toolchains that have _FORTIFY_SOURCE
>> defined already to a different value like 3.
>
> I was going to say we could do:
>
> #ifndef _FORTIFY_SOURCE
> #define _FORTIFY_SOURCE 2
> #endif
>
> But I realised that the original names.cc test will already run with
> whatever default value the toolchain uses. So this looks fine, thanks.

Thanks, pushed.

> [...]

Re: [PATCH] c++: Restore rust front-end build [PR117114]

2024-10-13 Thread Simon Martin

The patch failed so I’ve reverted the offending commit for the
moment; commit r15-4301-ga4eec6c712ec16 refers.

I’ll work on a resubmit via the initial PR (109918).

Apologies for the breakage.

Simon

On 13 Oct 2024, at 13:55, Simon Martin wrote:

> The patch that I merged via r15-4282-g60163c85730e6b breaks the build
> for the rust front-end because it does not work well when virtual
> inheritance is in play.
>
> The problem is that in such a case, an overrider and its overridden 
> base
> method might have a different DECL_VINDEX, and the derived method 
> would
> be incorrectly considered as hiding the base one.
>
> This patch fixes this by not comparing the DECL_VINDEX anymore, but
> rather going back to comparing the signatures, only after having
> excluded conversion operators to different types.
>
> I'm currently running the testsuite on x86_64-pc-linux-gnu, and 
> already
> verified that the rust front-end builds again with the patch applied.
>
> OK if the testsuite run finishes successfully?
>
>   PR c++/117114
>
> gcc/cp/ChangeLog:
>
>   * class.cc (warn_hidden): Don't compare DECL_VINDEX but
>   signatures, after having excluded operators to different types.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/warn/Woverloaded-virt10.C: New test.
>
> ---
>  gcc/cp/class.cc   | 20 
> +--
>  .../g++.dg/warn/Woverloaded-virt10.C  |  7 +++
>  2 files changed, 17 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C
>
> diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
> index f754886a60a..fce71483ccc 100644
> --- a/gcc/cp/class.cc
> +++ b/gcc/cp/class.cc
> @@ -3279,25 +3279,25 @@ warn_hidden (tree t)
> {
>   num_fns++;
>   tree fndecl_vindex = DECL_VINDEX (fndecl);
> - /* If the method from the base class has the same DECL_VINDEX
> -as the method from the derived, it has been overridden.
> + /* If the methods from the base and derived classes have the
> +same signature and they are not conversion operators to
> +different types, then the base method has been overriden.
>  Note that we can't move on after finding one match: fndecl
>  might override multiple base fns.  */
>   for (size_t k = 0; k < base_fndecls.length (); k++)
> {
>   if (!base_fndecls[k] || !DECL_VINDEX (base_fndecls[k]))
> continue;
> - tree base_fndecl_vindex = DECL_VINDEX (base_fndecls[k]);
> - if (fndecl_vindex
> - && !tree_int_cst_compare (fndecl_vindex,
> -   base_fndecl_vindex))
> -   overriden_base_fndecls.add (base_fndecls[k]);
> - else if (IDENTIFIER_CONV_OP_P (name)
> -  && (DECL_NAME (fndecl)
> -  != DECL_NAME (base_fndecls[k])))
> + if (IDENTIFIER_CONV_OP_P (name)
> + && (DECL_NAME (fndecl)
> + != DECL_NAME (base_fndecls[k])))
> /* If base_fndecl[k] and fndecl are conversion operators
>to different types, they're unrelated.  */
> ;
> + else if (same_signature_p (fndecl, base_fndecls[k]))
> +   /* If fndecl has the same signature as base_fndecl[k], it
> +  overrides it.  */
> +   overriden_base_fndecls.add (base_fndecls[k]);
>   else
> hidden_base_fndecls.put (base_fndecls[k], fndecl);
> }
> diff --git a/gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C 
> b/gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C
> new file mode 100644
> index 000..5e669360ec9
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/warn/Woverloaded-virt10.C
> @@ -0,0 +1,7 @@
> +// PR c++/117114
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options -Wall }
> +
> +struct Troops { virtual ~Troops(); };
> +struct Control { virtual int GetControl() const = 0; };
> +struct Army : Troops, Control { int GetControl() const override; };
> -- 
> 2.44.0

Re: [PATCH] DSE: add remove_unused_locals to the todos [PR117096]

2024-10-13 Thread Richard Biener

On Sun, Oct 13, 2024 at 12:07 AM Andrew Pinski  wrote:
>
> This is a better patch to fix PR 117096 (phiopt vs clobbers).
> The only time we remove clobbers of local variables is during
> remove_unused_locals. Since DSE might remove all of the stores
> to a local variable, it makes sense to also try to remove unused local
> variables afterwards.
> This shows up more in C++ code with encapsulation.
>
> This was previously mentioned to be done:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567744.html
> In the end it was chosen "Or we might want to place explicit schedules of
> remove_unused_locals in the pass pipeline." (in stdargs,
> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568807.html) but that is 
> too
> late for early phiopt. This patch does the "maybe we should unconditionally 
> do so"
> option instead.
>
> Bootstrapped and tested on x86_64-linux-gnu.

We've been very "careful" with placing this TODO it seems.  Note that now
all DSE pass invocations are followed by DCE which knows how to "lazily"
remove _indirect_ CLOBBERs and it can elide loads.  So I think it makes
more sense to stick the TODO_remove_unused_locals on the DCE pass?
Maybe add a 2nd pass parameter indicating whether to remove unused
locals after it?

Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/117096
> * tree-ssa-dse.cc (pass_data_dse): Add TODO_remove_unused_locals
> to todo_flags_finish.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-ssa-dse.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 63bf4491cf6..42a133b8631 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -1654,7 +1654,7 @@ const pass_data pass_data_dse =
>0, /* properties_provided */
>0, /* properties_destroyed */
>0, /* todo_flags_start */
> -  0, /* todo_flags_finish */
> +  TODO_remove_unused_locals, /* todo_flags_finish */
>  };
>
>  class pass_dse : public gimple_opt_pass
> --
> 2.43.0
>

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Hongtao Liu

On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu  wrote:
>
> On Fri, Oct 11, 2024 at 8:22 PM Richard Biener  wrote:
> >
> > The following helps the x86 backend by canonicalizing FMAs to have
> > any negation done to one of the commutative multiplication operands
> > be done to a register (and not a memory operand).  Likewise to
> > put a register operand first and a memory operand second;
> > swap_commutative_operands_p seems to treat REG_P and MEM_P the
> > same but comments indicate "complex expressiosn should be first".
> >
> > In particular this does (fma MEM REG REG) -> (fma REG MEM REG) and
> > (fma (neg MEM) REG REG) -> (fma (neg REG) MEM REG) which are the
> > reasons for the testsuite regressions in gcc.target/i386/cond_op_fma*.c
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > I'm not quite sure this is the correct approach - simplify-rtx
> > doesn't seem to do "only canonicalization" but the existing FMA
> > case looks odd in that context.
> >
> > Should the target simply reject cases with wrong "canonicalization"
> > or does it need to cope with all variants in the patterns that fail
> > matching during combine without the change?
> Let me try the backend fix first.
The backend fix requires at least 8 more patterns, so I think RTL
canonicalization would be better.
Please go ahead with the patch.
> >
> > Thanks,
> > Richard.
> >
> > PR target/117072
> > * simplify-rtx.cc (simplify_context::simplify_ternary_operation):
> > Adjust FMA canonicalization.
> > ---
> >  gcc/simplify-rtx.cc | 15 +--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index e8e60404ef6..8b4fa0d7aa4 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -6830,10 +6830,21 @@ simplify_context::simplify_ternary_operation 
> > (rtx_code code, machine_mode mode,
> > op0 = tem, op1 = XEXP (op1, 0), any_change = true;
> > }
> >
> > -  /* Canonicalize the two multiplication operands.  */
> > +  /* Canonicalize the two multiplication operands.  A negation
> > +should go first and if possible the negation should be
> > +to a register.  */
> >/* a * -b + c  =>  -b * a + c.  */
> > -  if (swap_commutative_operands_p (op0, op1))
> > +  if (swap_commutative_operands_p (op0, op1)
> > + || (REG_P (op1) && !REG_P (op0) && GET_CODE (op0) != NEG))
> > std::swap (op0, op1), any_change = true;
> > +  else if (GET_CODE (op0) == NEG && !REG_P (XEXP (op0, 0))
> > +  && REG_P (op1))
> > +   {
> > + op0 = XEXP (op0, 0);
> > + op1 = simplify_gen_unary (NEG, mode, op1, mode);
> > + std::swap (op0, op1);
> > + any_change = true;
> > +   }
> >
> >if (any_change)
> > return gen_rtx_FMA (mode, op0, op1, op2);
> > --
> > 2.43.0
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

m68k: replace reload_in_progress by reload_in_progress || lra_in_progress

2024-10-13 Thread Andreas Schwab

For now assume that LRA needs the same treatment as reload.

* config/m68k/m68k.md ("movsi", "movxf"): Replace
reload_in_progress by reload_in_progress || lra_in_progress.
* config/m68k/m68k.cc (m68k_legitimate_mem_p)
(emit_move_sequence): Likewise.
* config/m68k/predicates.md ("fp_src_operand"): Likewise.
---
 gcc/config/m68k/m68k.cc   | 21 ++---
 gcc/config/m68k/m68k.md   |  9 +
 gcc/config/m68k/predicates.md |  1 +
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/config/m68k/m68k.cc b/gcc/config/m68k/m68k.cc
index 7986e92c511..729a1e8875d 100644
--- a/gcc/config/m68k/m68k.cc
+++ b/gcc/config/m68k/m68k.cc
@@ -2352,7 +2352,8 @@ m68k_legitimate_mem_p (rtx x, struct m68k_address 
*address)
 {
   return (MEM_P (x)
  && m68k_decompose_address (GET_MODE (x), XEXP (x, 0),
-reload_in_progress || reload_completed,
+(reload_in_progress || lra_in_progress
+ || reload_completed),
 address));
 }
 
@@ -3899,11 +3900,13 @@ emit_move_sequence (rtx *operands, machine_mode mode, 
rtx scratch_reg)
   rtx tem;
 
   if (scratch_reg
-  && reload_in_progress && GET_CODE (operand0) == REG
+  && (reload_in_progress || lra_in_progress)
+  && GET_CODE (operand0) == REG
   && REGNO (operand0) >= FIRST_PSEUDO_REGISTER)
 operand0 = reg_equiv_mem (REGNO (operand0));
   else if (scratch_reg
-  && reload_in_progress && GET_CODE (operand0) == SUBREG
+  && (reload_in_progress || lra_in_progress)
+  && GET_CODE (operand0) == SUBREG
   && GET_CODE (SUBREG_REG (operand0)) == REG
   && REGNO (SUBREG_REG (operand0)) >= FIRST_PSEUDO_REGISTER)
 {
@@ -3916,11 +3919,13 @@ emit_move_sequence (rtx *operands, machine_mode mode, 
rtx scratch_reg)
 }
 
   if (scratch_reg
-  && reload_in_progress && GET_CODE (operand1) == REG
+  && (reload_in_progress || lra_in_progress)
+  && GET_CODE (operand1) == REG
   && REGNO (operand1) >= FIRST_PSEUDO_REGISTER)
 operand1 = reg_equiv_mem (REGNO (operand1));
   else if (scratch_reg
-  && reload_in_progress && GET_CODE (operand1) == SUBREG
+  && (reload_in_progress || lra_in_progress)
+  && GET_CODE (operand1) == SUBREG
   && GET_CODE (SUBREG_REG (operand1)) == REG
   && REGNO (SUBREG_REG (operand1)) >= FIRST_PSEUDO_REGISTER)
 {
@@ -3932,11 +3937,13 @@ emit_move_sequence (rtx *operands, machine_mode mode, 
rtx scratch_reg)
   operand1 = alter_subreg (&temp, true);
 }
 
-  if (scratch_reg && reload_in_progress && GET_CODE (operand0) == MEM
+  if (scratch_reg && (reload_in_progress || lra_in_progress)
+  && GET_CODE (operand0) == MEM
   && ((tem = find_replacement (&XEXP (operand0, 0)))
  != XEXP (operand0, 0)))
 operand0 = gen_rtx_MEM (GET_MODE (operand0), tem);
-  if (scratch_reg && reload_in_progress && GET_CODE (operand1) == MEM
+  if (scratch_reg && (reload_in_progress || lra_in_progress)
+  && GET_CODE (operand1) == MEM
   && ((tem = find_replacement (&XEXP (operand1, 0)))
  != XEXP (operand1, 0)))
 operand1 = gen_rtx_MEM (GET_MODE (operand1), tem);
diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index e5c25288844..1c9a6bf1748 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -957,11 +957,12 @@ (define_expand "movsi"
   /* The source is an address which requires PIC relocation.
  Call legitimize_pic_address with the source, mode, and a relocation
  register (a new pseudo, or the final destination if reload_in_progress
- is set).   Then fall through normally */
-  rtx temp = reload_in_progress ? operands[0] : gen_reg_rtx (Pmode);
+ or lra_in_progress is set).   Then fall through normally */
+  rtx temp = ((reload_in_progress || lra_in_progress)
+ ? operands[0] : gen_reg_rtx (Pmode));
   operands[1] = legitimize_pic_address (operands[1], SImode, temp);
 }
-  else if (flag_pic && TARGET_PCREL && ! reload_in_progress)
+  else if (flag_pic && TARGET_PCREL && ! (reload_in_progress || 
lra_in_progress))
 {
   /* Don't allow writes to memory except via a register;
 the m68k doesn't consider PC-relative addresses to be writable.  */
@@ -1452,7 +1453,7 @@ (define_expand "movxf"
   ""
 {
   /* We can't rewrite operands during reload.  */
-  if (! reload_in_progress)
+  if (! (reload_in_progress || lra_in_progress))
 {
   if (CONSTANT_P (operands[1]))
{
diff --git a/gcc/config/m68k/predicates.md b/gcc/config/m68k/predicates.md
index 46fc3795a17..787e544a43f 100644
--- a/gcc/config/m68k/predicates.md
+++ b/gcc/config/m68k/predicates.md
@@ -237,6 +237,7 @@ (define_predicate "fp_src_operand"
  || (TARGET_68881
  && (!standard_68881_constant_p (op)

[PATCH v2] testsuite: Sanitize pacbti test cases for Cortex-M

2024-10-13 Thread Torbjörn SVENSSON

Ok for trunk and releases/gcc-14?

Changes since v1:

- Dropped changes to dg- instructions. These will be addressed in a separate 
set of patches later.

--

Some of the test cases were scanning for "bti", but it would,
incorrectly, match the ".arch_extenssion pacbti".

gcc/testsuite/ChangeLog:

* gcc.target/arm/bti-1.c: Check for asm instructions starting
with a tab.
* gcc.target/arm/bti-2.c: Likewise.
* gcc.target/arm/pac-1.c: Likewise.
* gcc.target/arm/pac-2.c: Likewise.
* gcc.target/arm/pac-3.c: Likewise.
* gcc.target/arm/pac-4.c: Likewise.
* gcc.target/arm/pac-6.c: Likewise.
* gcc.target/arm/pac-7.c: Likewise.
* gcc.target/arm/pac-8.c: Likewise.
* gcc.target/arm/pac-9.c: Likewise.
* gcc.target/arm/pac-10.c: Likewise.
* gcc.target/arm/pac-11.c: Likewise.
* gcc.target/arm/pac-15.c: Likewise.
* gcc.target/arm/pac-sibcall.c: Likewise.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/testsuite/gcc.target/arm/bti-1.c   | 2 +-
 gcc/testsuite/gcc.target/arm/bti-2.c   | 2 +-
 gcc/testsuite/gcc.target/arm/pac-1.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-10.c  | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-11.c  | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-15.c  | 2 +-
 gcc/testsuite/gcc.target/arm/pac-2.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-3.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-4.c   | 2 +-
 gcc/testsuite/gcc.target/arm/pac-6.c   | 6 +++---
 gcc/testsuite/gcc.target/arm/pac-7.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-8.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-9.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-sibcall.c | 2 +-
 14 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bti-1.c 
b/gcc/testsuite/gcc.target/arm/bti-1.c
index 79dd8010d2d..54a733d9f20 100644
--- a/gcc/testsuite/gcc.target/arm/bti-1.c
+++ b/gcc/testsuite/gcc.target/arm/bti-1.c
@@ -9,4 +9,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler "bti" } } */
+/* { dg-final { scan-assembler "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/bti-2.c 
b/gcc/testsuite/gcc.target/arm/bti-2.c
index 33910563849..645ad369424 100644
--- a/gcc/testsuite/gcc.target/arm/bti-2.c
+++ b/gcc/testsuite/gcc.target/arm/bti-2.c
@@ -55,4 +55,4 @@ lab2:
   return 2;
 }
 
-/* { dg-final { scan-assembler-times "bti" 15 } } */
+/* { dg-final { scan-assembler-times "\tbti" 14 } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-1.c 
b/gcc/testsuite/gcc.target/arm/pac-1.c
index 9b26f62b65f..e0eea0858e0 100644
--- a/gcc/testsuite/gcc.target/arm/pac-1.c
+++ b/gcc/testsuite/gcc.target/arm/pac-1.c
@@ -6,6 +6,6 @@
 
 #include "pac.h"
 
-/* { dg-final { scan-assembler-times "pac\tip, lr, sp" 2 } } */
-/* { dg-final { scan-assembler-times "aut\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\tpac\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\taut\tip, lr, sp" 2 } } */
 /* { dg-final { scan-assembler-not "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-10.c 
b/gcc/testsuite/gcc.target/arm/pac-10.c
index a794195e8f6..6da8434aeaf 100644
--- a/gcc/testsuite/gcc.target/arm/pac-10.c
+++ b/gcc/testsuite/gcc.target/arm/pac-10.c
@@ -5,6 +5,6 @@
 
 #include "pac.h"
 
-/* { dg-final { scan-assembler "pac\tip, lr, sp" } } */
-/* { dg-final { scan-assembler "aut\tip, lr, sp" } } */
+/* { dg-final { scan-assembler "\tpac\tip, lr, sp" } } */
+/* { dg-final { scan-assembler "\taut\tip, lr, sp" } } */
 /* { dg-final { scan-assembler-not "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-11.c 
b/gcc/testsuite/gcc.target/arm/pac-11.c
index 37ffc93b41b..0bb727c2c80 100644
--- a/gcc/testsuite/gcc.target/arm/pac-11.c
+++ b/gcc/testsuite/gcc.target/arm/pac-11.c
@@ -5,6 +5,6 @@
 
 #include "pac.h"
 
-/* { dg-final { scan-assembler-times "pacbti\tip, lr, sp" 2 } } */
-/* { dg-final { scan-assembler-times "aut\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\tpacbti\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\taut\tip, lr, sp" 2 } } */
 /* { dg-final { scan-assembler-not "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-15.c 
b/gcc/testsuite/gcc.target/arm/pac-15.c
index e1054902955..979941492d3 100644
--- a/gcc/testsuite/gcc.target/arm/pac-15.c
+++ b/gcc/testsuite/gcc.target/arm/pac-15.c
@@ -24,7 +24,7 @@ int main (void)
 }
 
 /* { dg-final { scan-assembler-times "\.pacspval" 1 } } */
-/* { dg-final { scan-assembler-times "pac  ip, lr, sp" 3 } } */
+/* { dg-final { scan-assembler-times "\tpac\tip, lr, sp" 3 } } */
 /* { dg-final { scan-assembler-times "\.cfi_register 143, 12" 3 } } */
 /* { dg-final { scan-assembler-times "\.save {r7, ra_auth_code, lr}" 2 } } */
 /* { dg-final { scan-assembler-times "\.cfi_offset 143, -8" 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-2.c 
b/gcc/testsuite/gcc.target/arm/pac-2.c
index 945ce

[COMMITED] MAINTAINERS: Add myself to write after approval

2024-10-13 Thread Josef Melcr

ChangeLog:

* MAINTAINERS: Add myself to write after approval

Signed-off-by: Josef Melcr 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4a53521d8eb..f94aa9aeb79 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -657,6 +657,7 @@ Bryce McKinlay  bryce   

 Adam Megacz -   
 Bingfeng Meimeibf   
 Michael Meissnermeissner
+Josef Melcr -   
 Jason Merrill   jason   
 Jim Meyering-   
 Martin Michlmayrtbm 
-- 
2.47.0

[PATCH] testsuite: arm: Update expected asm in no-literal-pool-m0.c

2024-10-13 Thread Torbjörn SVENSSON

Ok for trunk?

--

With the changes in r15-1579-g792f97b44ff, the constants have been
updated. This patch aligns the constants in the test cases with the
updated implementation.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pure-code/no-literal-pool-m0.c: Update expected
asm.

Signed-off-by: Torbjörn SVENSSON 
---
 .../arm/pure-code/no-literal-pool-m0.c| 29 ++-
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c 
b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
index bd6f4af183b..effaf8b60b6 100644
--- a/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
+++ b/gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool-m0.c
@@ -95,8 +95,13 @@ test_65536 ()
 /*
 ** test_0x123456:
 ** ...
+** (
 ** movsr[0-3], #18
 ** lslsr[0-3], r[0-3], #8
+** |
+** movsr[0-3], #144
+** lslsr[0-3], r[0-3], #5
+** )
 ** addsr[0-3], r[0-3], #52
 ** lslsr[0-3], r[0-3], #8
 ** addsr[0-3], r[0-3], #86
@@ -125,18 +130,16 @@ test_0x1123456 ()
   return 0x1123456;
 }
 
-/* With -Os, we generate:
-   movs r0, #16
-   lsls r0, r0, r0
-   With the other optimization levels, we generate:
-   movs r0, #16
-   lsls r0, r0, #16
-   hence the two alternatives.  */
 /*
 ** test_0x110:
 ** ...
+** (
 ** movsr[0-3], #16
-** lslsr[0-3], r[0-3], (#16|r[0-3])
+** lslsr[0-3], r[0-3], #16
+** |
+** movsr[0-3], #128
+** lslsr[0-3], r[0-3], #13
+** )
 ** addsr[0-3], r[0-3], #1
 ** lslsr[0-3], r[0-3], #4
 ** ...
@@ -150,8 +153,13 @@ test_0x110 ()
 /*
 ** test_0x111:
 ** ...
+** (
 ** movsr[0-3], #1
 ** lslsr[0-3], r[0-3], #24
+** |
+** movsr[0-3], #128
+** lslsr[0-3], r[0-3], #17
+** )
 ** addsr[0-3], r[0-3], #17
 ** ...
 */
@@ -164,8 +172,13 @@ test_0x111 ()
 /*
 ** test_m8192:
 ** ...
+** (
 ** movsr[0-3], #1
 ** lslsr[0-3], r[0-3], #13
+** |
+** movsr[0-3], #128
+** lslsr[0-3], r[0-3], #6
+** )
 ** rsbsr[0-3], r[0-3], #0
 ** ...
 */
-- 
2.25.1

Re: [PATCH] libstdc++: improve std::atomic compatibility with Clang

2024-10-13 Thread Giuseppe D'Angelo


Hello,

On 11/10/2024 21:43, Jonathan Wakely wrote:


Or alternatively, and arguably simpler, we could get rid of the ugly
_GLIBCXX20_INIT macro and just do this:
+#if __cpp_lib_atomic_value_initialization
+  atomic() noexcept (is_nothrow_default_constructible_v<_Tp>)
+  requires (is_default_constructible_v<_Tp>)
+  : _M_i() { }
+#else
   atomic() = default;
+#endif
   ~atomic() noexcept = default;
   atomic(const atomic&) = delete;
   atomic& operator=(const atomic&) = delete;


This is now basically my original patch :) but it does correctly not 
value-initialize _M_i in pre-C++20 modes -- my bad, I didn't realize 
that P0883R2 was *not* meant to be treated as a DR and backported.




This more closely matches what's in the standard, except that we have
a requires-clause on the default constructor. It seems like a defect
that the standard doesn't do this, instead requiring
is_constructible> to be true, but Mandating an error if
you try to use it.



Did you mean is_default_constructible? Yes, the core of the "problem" 
seems to be the Mandates instead of the Constraints used to describe 
atomic()'s default constructor in 
https://eel.is/c++draft/atomics.types.operations#1 . I also think a 
constraint would be more natural, in line with the idea of atomic 
"wrapping" a T object. I don't seem to find any rationale for this 
choice in P0883's revisions, I guess I'll just ask SG1 and/or possibly 
submit a LWG issue.



Thank you again for the review,
--
Giuseppe D'Angelo



smime.p7s
Description: S/MIME Cryptographic Signature

Re: [PATCH v4] libstdc++: implement concatenation of strings and string_views

2024-10-13 Thread Giuseppe D'Angelo


Hello,

On 09/10/2024 22:39, Patrick Palka wrote:

+#if __glibcxx_string_view >= 202403L
+  // const string & + string_view
+  template
+[[nodiscard]]
+constexpr inline basic_string<_CharT, _Traits, _Alloc>


Redundant 'inline's


+operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
+  type_identity_t> __rhs)
+{
+  typedef basic_string<_CharT, _Traits, _Alloc> _Str;


These typedefs might as well be usings instead

Besides that LGTM!


Thank you for the review, updated patch attached to fix both of these.
(Just for the record, these had been C&P from the corresponding 
operator+ overloads that deal with const char *.)


Thanks,
--
Giuseppe D'Angelo
From 19c8c67f81661f41d97ca7c48f5feaf7444d6e48 Mon Sep 17 00:00:00 2001
From: Giuseppe D'Angelo 
Date: Tue, 30 Jul 2024 20:09:12 +0200
Subject: [PATCH] libstdc++: implement concatenation of strings and
 string_views

This adds support for P2591R5, merged for C++26.

libstdc++-v3/ChangeLog:

	* include/bits/basic_string.h: Implement the four operator+
	overloads between basic_string and (types convertible to)
	basic_string_view.
	* include/bits/version.def: Bump the feature-testing macro.
	* include/bits/version.h: Regenerate.
	* testsuite/21_strings/basic_string/operators/char/op_plus_fspath_neg.cc: New test.
	* testsuite/21_strings/basic_string/operators/char/op_plus_string_view.cc: New test.
	* testsuite/21_strings/basic_string/operators/char/op_plus_string_view_compat.cc:
	New test.

Signed-off-by: Giuseppe D'Angelo 
---
 libstdc++-v3/include/bits/basic_string.h  |  48 +
 libstdc++-v3/include/bits/version.def |   5 +
 libstdc++-v3/include/bits/version.h   |   7 +-
 .../operators/char/op_plus_fspath_neg.cc  |  13 ++
 .../operators/char/op_plus_string_view.cc | 169 ++
 .../char/op_plus_string_view_compat.cc|  63 +++
 6 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/operators/char/op_plus_fspath_neg.cc
 create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/operators/char/op_plus_string_view.cc
 create mode 100644 libstdc++-v3/testsuite/21_strings/basic_string/operators/char/op_plus_string_view_compat.cc

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index e9b17ea48b5..1b49b7e58c3 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3747,6 +3747,54 @@ _GLIBCXX_END_NAMESPACE_CXX11
 { return std::move(__lhs.append(1, __rhs)); }
 #endif
 
+#if __glibcxx_string_view >= 202403L
+  // const string & + string_view
+  template
+[[nodiscard]]
+constexpr basic_string<_CharT, _Traits, _Alloc>
+operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
+	   type_identity_t> __rhs)
+{
+  using _Str = basic_string<_CharT, _Traits, _Alloc>;
+  return std::__str_concat<_Str>(__lhs.data(), __lhs.size(),
+  __rhs.data(), __rhs.size(),
+  __lhs.get_allocator());
+}
+
+  // string && + string_view
+  template
+[[nodiscard]]
+constexpr basic_string<_CharT, _Traits, _Alloc>
+operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
+	   type_identity_t> __rhs)
+{
+  return std::move(__lhs.append(__rhs));
+}
+
+  // string_view + const string &
+  template
+[[nodiscard]]
+constexpr basic_string<_CharT, _Traits, _Alloc>
+operator+(type_identity_t> __lhs,
+	   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
+{
+  using _Str = basic_string<_CharT, _Traits, _Alloc>;
+  return std::__str_concat<_Str>(__lhs.data(), __lhs.size(),
+  __rhs.data(), __rhs.size(),
+  __rhs.get_allocator());
+}
+
+  // string_view + string &&
+  template
+[[nodiscard]]
+constexpr basic_string<_CharT, _Traits, _Alloc>
+operator+(type_identity_t> __lhs,
+	   basic_string<_CharT, _Traits, _Alloc>&& __rhs)
+{
+  return std::move(__rhs.insert(0, __lhs));
+}
+#endif
+
   // operator ==
   /**
*  @brief  Test equivalence of two strings.
diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def
index 477651f358a..b88376bae23 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -698,6 +698,11 @@ ftms = {
 
 ftms = {
   name = string_view;
+  values = {
+v = 202403;
+cxxmin = 26;
+hosted = yes;
+  };
   values = {
 v = 201803;
 cxxmin = 17;
diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h
index 342ee6b4c7a..ab551f6a719 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -776,7 +776,12 @@
 #undef __glibcxx_want_shared_ptr_weak_type
 
 #if !defined(__cpp_lib_string_view)
-# if (__cplusplus >= 201703L) && _GLIBCXX_HOSTED
+# if (__cplusplus >  202302L) && _GLIBCXX_HOSTED
+#  define __glibcxx_string_view 202403L

Re: [PATCH] libstdc++: improve std::atomic compatibility with Clang

2024-10-13 Thread Jonathan Wakely

On Sun, 13 Oct 2024 at 18:59, Giuseppe D'Angelo
 wrote:
>
> Hello,
>
> On 11/10/2024 21:43, Jonathan Wakely wrote:
>
> > Or alternatively, and arguably simpler, we could get rid of the ugly
> > _GLIBCXX20_INIT macro and just do this:
> > +#if __cpp_lib_atomic_value_initialization
> > +  atomic() noexcept (is_nothrow_default_constructible_v<_Tp>)
> > +  requires (is_default_constructible_v<_Tp>)
> > +  : _M_i() { }
> > +#else
> >atomic() = default;
> > +#endif
> >~atomic() noexcept = default;
> >atomic(const atomic&) = delete;
> >atomic& operator=(const atomic&) = delete;
>
> This is now basically my original patch :) but it does correctly not
> value-initialize _M_i in pre-C++20 modes -- my bad, I didn't realize
> that P0883R2 was *not* meant to be treated as a DR and backported.

There was a related issue:
https://cplusplus.github.io/LWG/issue2334

Do you know what other implementations do?


>
> > This more closely matches what's in the standard, except that we have
> > a requires-clause on the default constructor. It seems like a defect
> > that the standard doesn't do this, instead requiring
> > is_constructible> to be true, but Mandating an error if
> > you try to use it.
>
>
> Did you mean is_default_constructible?

It's the same thing :-)
is_default_constructible is defined as is_constructible.

> Yes, the core of the "problem"
> seems to be the Mandates instead of the Constraints used to describe
> atomic()'s default constructor in
> https://eel.is/c++draft/atomics.types.operations#1 . I also think a
> constraint would be more natural, in line with the idea of atomic
> "wrapping" a T object. I don't seem to find any rationale for this
> choice in P0883's revisions, I guess I'll just ask SG1 and/or possibly
> submit a LWG issue.

I think that would be good, thanks.

[committed] libstdc++: Fix ranges::copy_backward for a single memcpyable element [PR117121]

2024-10-13 Thread Jonathan Wakely

Tested x86_64-linux. Pushed to trunk.

-- >8 --

The result iterator needs to be decremented before writing to it.

Improve the PR 108846 tests for all of std::copy, std::copy_n,
std::copy_backward, and the std::ranges versions.

libstdc++-v3/ChangeLog:

PR libstdc++/117121
* include/bits/ranges_algobase.h (copy_backward): Decrement
output iterator before assigning one element through it.
* testsuite/25_algorithms/copy/108846.cc: Ensure the algorithm's
effects are correct for a single memcpyable element.
* testsuite/25_algorithms/copy_backward/108846.cc: Likewise.
* testsuite/25_algorithms/copy_n/108846.cc: Likewise.
---
 libstdc++-v3/include/bits/ranges_algobase.h   |  5 +++--
 .../testsuite/25_algorithms/copy/108846.cc| 15 +++
 .../25_algorithms/copy_backward/108846.cc | 15 +++
 .../testsuite/25_algorithms/copy_n/108846.cc  | 15 +++
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index 40c628b3818..3c8d46198c5 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -418,12 +418,13 @@ namespace ranges
{
  using _ValueTypeI = iter_value_t<_Iter>;
  auto __num = __last - __first;
+ __result -= __num;
  if (__num > 1) [[likely]]
-   __builtin_memmove(__result - __num, __first,
+   __builtin_memmove(__result, __first,
  sizeof(_ValueTypeI) * __num);
  else if (__num == 1)
ranges::__assign_one<_IsMove>(__first, __result);
- return {__first + __num, __result - __num};
+ return {__first + __num, __result};
}
}
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/108846.cc 
b/libstdc++-v3/testsuite/25_algorithms/copy/108846.cc
index e3b722c068a..a283e6fcd9f 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy/108846.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/108846.cc
@@ -25,10 +25,15 @@ test_pr108846()
 B *src = &dsrc;
 // If this is optimized to memmove it will overwrite tail padding.
 std::copy(src, src+1, dst);
+// Check tail padding is unchanged:
 VERIFY(ddst.x == 3);
+// Check B subobject was copied:
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #if __cpp_lib_ranges >= 201911L
+ddst.i = ddst.j = 99;
 std::ranges::copy(src, src+1, dst);
 VERIFY(ddst.x == 3);
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #endif
 }
 
@@ -52,10 +57,15 @@ test_non_const_copy_assign()
 B2 *src = &dsrc;
 // Ensure the not-taken trivial copy path works for this type.
 std::copy(src, src+1, dst);
+// Check tail padding is unchanged:
 VERIFY(ddst.x == 3);
+// Check B subobject was copied:
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #if __cpp_lib_ranges >= 201911L
+ddst.i = ddst.j = 99;
 std::ranges::copy(src, src+1, dst);
 VERIFY(ddst.x == 3);
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #endif
 }
 
@@ -81,10 +91,15 @@ test_non_const_copy_assign_trivial()
 B3 *src = &dsrc;
 // If this is optimized to memmove it will overwrite tail padding.
 std::copy(src, src+1, dst);
+// Check tail padding is unchanged:
 VERIFY(ddst.x == 3);
+// Check B subobject was copied:
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #if __cpp_lib_ranges >= 201911L
+ddst.i = ddst.j = 99;
 std::ranges::copy(src, src+1, dst);
 VERIFY(ddst.x == 3);
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #endif
 }
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy_backward/108846.cc 
b/libstdc++-v3/testsuite/25_algorithms/copy_backward/108846.cc
index 206748d92d3..855ee3e182f 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy_backward/108846.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy_backward/108846.cc
@@ -25,10 +25,15 @@ test_pr108846()
 B *src = &dsrc;
 // If this is optimized to memmove it will overwrite tail padding.
 std::copy_backward(src, src+1, dst+1);
+// Check tail padding is unchanged:
 VERIFY(ddst.x == 3);
+// Check B subobject was copied:
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #if __cpp_lib_ranges >= 201911L
+ddst.i = ddst.j = 99;
 std::ranges::copy_backward(src, src+1, dst+1);
 VERIFY(ddst.x == 3);
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #endif
 }
 
@@ -52,10 +57,15 @@ test_non_const_copy_assign()
 B2 *src = &dsrc;
 // Ensure the not-taken trivial copy path works for this type.
 std::copy_backward(src, src+1, dst+1);
+// Check tail padding is unchanged:
 VERIFY(ddst.x == 3);
+// Check B subobject was copied:
+VERIFY(ddst.i == 4 && ddst.j == 5);
 #if __cpp_lib_ranges >= 201911L
+ddst.i = ddst.j = 99;
 std::ranges::copy_backward(src, src+1, dst+1);
 VERIFY(

[PATCH] libquadmath: Typo in powq summary

2024-10-13 Thread Ivan Agarsky

Hello,

since step 9 in "Basics: Contributing to GCC in 10 easy steps" says I
should not commit the first few patches, I kindly ask someone to commit
the following for me:

libquadmath\ChangeLog:

* math/powq.c:


--- a/libquadmath/math/powq.c
+++ b/libquadmath/math/powq.c
@@ -37,7 +37,7 @@
  * 1. Compute and return log2(x) in two pieces:
  * log2(x) = w1 + w2,
  *where w1 has 113-53 = 60 bit trailing zeros.
- * 2. Perform y*log2(x) = n+y' by simulating muti-precision
+ * 2. Perform y*log2(x) = n+y' by simulating multi-precision
  *arithmetic, where |y'|<=0.5.
  * 3. Return x**y = 2**n*exp(y'*log2)
  *
--


Kind regards,
Ivan

[PATCH 4/5] dce: Use a base common base class for pass_cd_dce and pass_dce

2024-10-13 Thread Andrew Pinski

The classes pass_dce and pass_cd_dce share the same mechansim for their
params and almost the same execute functionality so let's create a new
base class which will be used for these two classes and move the common
code into the same one.

Note update_address_taken_p was updated to be a NSDMI instead of initializing
it explicitly in the constructor.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-dce.cc (tree_ssa_dce): Remove.
(tree_ssa_cd_dce): Remove.
(class pass_dce_base): New class.
(class pass_dce): Use pass_dce_base as the base class.
(class pass_cd_dce): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-dce.cc | 72 +++--
 1 file changed, 31 insertions(+), 41 deletions(-)

diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 66612b5d575..3075459e25f 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2057,19 +2057,6 @@ perform_tree_ssa_dce (bool aggressive)
   return todo;
 }
 
-/* Pass entry points.  */
-static unsigned int
-tree_ssa_dce (void)
-{
-  return perform_tree_ssa_dce (/*aggressive=*/false);
-}
-
-static unsigned int
-tree_ssa_cd_dce (void)
-{
-  return perform_tree_ssa_dce (/*aggressive=*/optimize >= 2);
-}
-
 namespace {
 
 const pass_data pass_data_dce =
@@ -2085,15 +2072,11 @@ const pass_data pass_data_dce =
   0, /* todo_flags_finish */
 };
 
-class pass_dce : public gimple_opt_pass
+class pass_dce_base : public gimple_opt_pass
 {
 public:
-  pass_dce (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_dce, ctxt), update_address_taken_p (false)
-  {}
-
   /* opt_pass methods: */
-  opt_pass * clone () final override { return new pass_dce (m_ctxt); }
+  bool gate (function *) final override { return flag_tree_dce != 0; }
   void set_pass_param (unsigned n, bool param) final override
 {
   gcc_assert (n == 0 || n == 1);
@@ -2102,17 +2085,38 @@ public:
   else if (n == 1)
remove_unused_locals_p = param;
 }
-  bool gate (function *) final override { return flag_tree_dce != 0; }
-  unsigned int execute (function *) final override
+
+protected:
+  pass_dce_base (const pass_data &data, gcc::context *ctxt)
+: gimple_opt_pass (data, ctxt)
+  {}
+  unsigned int execute_dce (function *, bool aggressive)
 {
-  return (tree_ssa_dce ()
+  return (perform_tree_ssa_dce (aggressive)
  | (remove_unused_locals_p ? TODO_remove_unused_locals : 0)
  | (update_address_taken_p ? TODO_update_address_taken : 0));
 }
 
 private:
-  bool update_address_taken_p;
+  bool update_address_taken_p = false;
   bool remove_unused_locals_p = false;
+}; // class pass_dce_base
+
+
+class pass_dce : public pass_dce_base
+{
+public:
+  pass_dce (gcc::context *ctxt)
+: pass_dce_base (pass_data_dce, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  opt_pass * clone () final override { return new pass_dce (m_ctxt); }
+  unsigned int execute (function *func) final override
+{
+  return execute_dce (func, /*aggressive=*/false);
+}
+
 }; // class pass_dce
 
 } // anon namespace
@@ -2138,34 +2142,20 @@ const pass_data pass_data_cd_dce =
   0, /* todo_flags_finish */
 };
 
-class pass_cd_dce : public gimple_opt_pass
+class pass_cd_dce : public pass_dce_base
 {
 public:
   pass_cd_dce (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_cd_dce, ctxt), update_address_taken_p (false)
+: pass_dce_base (pass_data_cd_dce, ctxt)
   {}
 
   /* opt_pass methods: */
   opt_pass * clone () final override { return new pass_cd_dce (m_ctxt); }
-  void set_pass_param (unsigned n, bool param) final override
-{
-  gcc_assert (n == 0 || n == 1);
-  if (n == 0)
-   update_address_taken_p = param;
-  else if (n == 1)
-   remove_unused_locals_p = param;
-}
-  bool gate (function *) final override { return flag_tree_dce != 0; }
-  unsigned int execute (function *) final override
+  unsigned int execute (function *func) final override
 {
-  return (tree_ssa_cd_dce ()
- | (remove_unused_locals_p ? TODO_remove_unused_locals : 0)
- | (update_address_taken_p ? TODO_update_address_taken : 0));
+  return execute_dce (func, /*aggressive=*/optimize >= 2);
 }
 
-private:
-  bool update_address_taken_p;
-  bool remove_unused_locals_p = false;
 }; // class pass_cd_dce
 
 } // anon namespace
-- 
2.43.0

[PATCH 5/5] passes: Remove limit on the number of params

2024-10-13 Thread Andrew Pinski

Having a limit of 2 params for NEXT_PASS was just done because I didn't think 
there was
a way to handle arbitrary number of params. But I found that we can handle this
via a static const variable array (constexpr so we know it is true or false at 
compile time)
and just loop over the array.

Note I keep around NEXT_PASS_WITH_ARG and NEXT_PASS macros instead of always 
using
NEXT_PASS_WITH_ARGS macro to make sure these cases get optimized for -O0 
(stage1).

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gen-pass-instances.awk: Remove the limit of the params.
* pass_manager.h (NEXT_PASS_WITH_ARG2): Rename to ...
(NEXT_PASS_WITH_ARGS): This.
* passes.cc (NEXT_PASS_WITH_ARG2): Rename to ...
(NEXT_PASS_WITH_ARGS): This and support more than 2 params by using
a constexpr array.

Signed-off-by: Andrew Pinski 
---
 gcc/gen-pass-instances.awk | 11 ++-
 gcc/pass_manager.h |  2 +-
 gcc/passes.cc  | 13 +
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index def09347765..9bd73c9ce0f 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -195,7 +195,6 @@ function replace_pass(line, fnname, num, i)
 }
 
 END {
-  max_number_args = 2;
   for (i = 1; i < lineno; i++)
 {
   ret = parse_line(lines[i], "NEXT_PASS");
@@ -220,13 +219,8 @@ END {
  if (num_args > 0)
{
  printf "NEXT_PASS_WITH_ARG";
- if (num_args > max_number_args)
-   {
- print "ERROR: Only supports up to " max_number_args " args to 
NEXT_PASS";
- exit 1;
-   }
  if (num_args != 1)
-   printf num_args;
+   printf "S";
}
  else
printf "NEXT_PASS";
@@ -266,8 +260,7 @@ END {
   print "#undef POP_INSERT_PASSES"
   print "#undef NEXT_PASS"
   print "#undef NEXT_PASS_WITH_ARG"
-  for (i = 2; i <= max_number_args; i++)
-print "#undef NEXT_PASS_WITH_ARG" i
+  print "#undef NEXT_PASS_WITH_ARGS"
   print "#undef TERMINATE_PASS_LIST"
 }
 
diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index f18ae026257..294cdd0b1f7 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -130,7 +130,7 @@ private:
 #define POP_INSERT_PASSES()
 #define NEXT_PASS(PASS, NUM) opt_pass *PASS ## _ ## NUM
 #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
-#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) NEXT_PASS (PASS, NUM)
+#define NEXT_PASS_WITH_ARGS(PASS, NUM, ...) NEXT_PASS (PASS, NUM)
 #define TERMINATE_PASS_LIST(PASS)
 
 #include "pass-instances.def"
diff --git a/gcc/passes.cc b/gcc/passes.cc
index b5475fce522..ae80f40b96a 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -1589,7 +1589,7 @@ pass_manager::pass_manager (context *ctxt)
 #define POP_INSERT_PASSES()
 #define NEXT_PASS(PASS, NUM) PASS ## _ ## NUM = NULL
 #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
-#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) NEXT_PASS (PASS, NUM)
+#define NEXT_PASS_WITH_ARGS(PASS, NUM, ...) NEXT_PASS (PASS, NUM)
 #define TERMINATE_PASS_LIST(PASS)
 #include "pass-instances.def"
 
@@ -1636,11 +1636,16 @@ pass_manager::pass_manager (context *ctxt)
   PASS ## _ ## NUM->set_pass_param (0, ARG);   \
 } while (0)
 
-#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) \
+#define NEXT_PASS_WITH_ARGS(PASS, NUM, ...)\
 do {   \
   NEXT_PASS (PASS, NUM);   \
-  PASS ## _ ## NUM->set_pass_param (0, ARG0);  \
-  PASS ## _ ## NUM->set_pass_param (1, ARG1);  \
+  static constexpr bool values[] = { __VA_ARGS__ };\
+  unsigned i = 0;  \
+  for (bool value : values)\
+   {   \
+ PASS ## _ ## NUM->set_pass_param (i, value);  \
+ i++;  \
+   }   \
 } while (0)
 
 #include "pass-instances.def"
-- 
2.43.0

[SH][committed] Add -fno-math-errno to fsca,fsrra tests.

2024-10-13 Thread Oleg Endo

Without -fno-math-errno some of the test might fail because the expected
insns will not be generated.

Tested with make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-
mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed to master.

gcc/testsuite/ChangeLog:
* gcc.target/sh/pr53512-1.c: Add -fno-math-errno option.
* gcc.target/sh/pr53512-2.c: Likewise.
* gcc.target/sh/pr53512-3.c: Likewise.
* gcc.target/sh/pr53512-4.c: Likewise.
* gcc.target/sh/pr54680.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/sh/pr53512-1.c b/gcc/testsuite/gcc.target/sh/pr53512-1.c
index 14106c0..a032738 100644
--- a/gcc/testsuite/gcc.target/sh/pr53512-1.c
+++ b/gcc/testsuite/gcc.target/sh/pr53512-1.c
@@ -1,8 +1,8 @@
 /* Verify that the fsca insn is used when specifying -mfsca and
   -funsafe-math-optimizations.  */
 /* { dg-do compile { target { has_fsca } } }  */
-/* { dg-options "-O1 -mfsca -funsafe-math-optimizations" } */
+/* { dg-options "-O1 -mfsca -funsafe-math-optimizations -fno-math-errno" } */
 /* { dg-final { scan-assembler-times "fsca" 3 } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/sh/pr53512-2.c b/gcc/testsuite/gcc.target/sh/pr53512-2.c
index a79e58a..48872a1 100644
--- a/gcc/testsuite/gcc.target/sh/pr53512-2.c
+++ b/gcc/testsuite/gcc.target/sh/pr53512-2.c
@@ -1,8 +1,8 @@
 /* Verify that the fsca insn is not used when specifying -mno-fsca and
   -funsafe-math-optimizations.  */
 /* { dg-do compile { target { has_fsca } } }  */
-/* { dg-options "-O1 -mno-fsca -funsafe-math-optimizations" } */
+/* { dg-options "-O1 -mno-fsca -funsafe-math-optimizations -fno-math-errno" } */
 /* { dg-final { scan-assembler-not "fsca" } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/sh/pr53512-3.c b/gcc/testsuite/gcc.target/sh/pr53512-3.c
index 19e9ede..b834f35 100644
--- a/gcc/testsuite/gcc.target/sh/pr53512-3.c
+++ b/gcc/testsuite/gcc.target/sh/pr53512-3.c
@@ -1,8 +1,8 @@
 /* Verify that the fsrra insn is used when specifying -mfsrra and
   -funsafe-math-optimizations and -ffinite-math-only.  */
 /* { dg-do compile { target { has_fsrra } } }  */
-/* { dg-options "-O1 -mfsrra -funsafe-math-optimizations -ffinite-math-only" } */
+/* { dg-options "-O1 -mfsrra -funsafe-math-optimizations -ffinite-math-only -fno-math-errno" } */
 /* { dg-final { scan-assembler "fsrra" } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/sh/pr53512-4.c b/gcc/testsuite/gcc.target/sh/pr53512-4.c
index a1d3e81..01a981d 100644
--- a/gcc/testsuite/gcc.target/sh/pr53512-4.c
+++ b/gcc/testsuite/gcc.target/sh/pr53512-4.c
@@ -1,8 +1,8 @@
 /* Verify that the fsrra insn is not used when specifying -mno-fsrra and
   -funsafe-math-optimizations and -ffinite-math-only.  */
 /* { dg-do compile { target { has_fsrra } } }  */
-/* { dg-options "-O1 -mno-fsrra -funsafe-math-optimizations -ffinite-math-only" } */
+/* { dg-options "-O1 -mno-fsrra -funsafe-math-optimizations -ffinite-math-only -fno-math-errno" } */
 /* { dg-final { scan-assembler-not "fsrra" } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/sh/pr54680.c b/gcc/testsuite/gcc.target/sh/pr54680.c
index 7b02de3..1ca67b7 100644
--- a/gcc/testsuite/gcc.target/sh/pr54680.c
+++ b/gcc/testsuite/gcc.target/sh/pr54680.c
@@ -1,9 +1,9 @@
 /* Verify that the fsca input value is not converted to float and then back
to int.  Notice that we can't count just "lds" insns because mode switches
use "lds.l".  */
 /* { dg-do compile { target { has_fsca } } }  */
-/* { dg-options "-O2 -mfsca -funsafe-math-optimizations -fno-ipa-icf" }  */
+/* { dg-options "-O2 -mfsca -funsafe-math-optimizations -fno-ipa-icf -fno-math-errno" }  */
 /* { dg-final { scan-assembler-times "fsca" 7 } } */
 /* { dg-final { scan-assembler-times "shad" 1 } } */
 /* { dg-final { scan-assembler-times "lds\tr\[0-9\],fpul" 6 } } */
 /* { dg-final { scan-assembler-times "fmul" 2 } } */
--
libgit2 1.7.2

[SH][committed] PR 113533

2024-10-13 Thread Oleg Endo

For memory loads/stores (that contain a MEM rtx) sh_rtx_costs would wrongly
report a cost lower than 1 insn which is not accurate as it makes
loads/stores appear cheaper than simple arithmetic insns.  The cost of a
load/store insn is at least 1 insn plus the cost of the address expression
(some addressing modes can be considered more expensive than others due to
additional constraints).

Tested with make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-
mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

CSiBE set shows a little bit of +/- code size movement due to some insn
reordering.  Difficult to judge whether it's all good or bad.  Doesn't seem
that significant.

Thanks to Roger for the original patch proposal.
Committed to master.


gcc/ChangeLog:

PR target/113533
* config/sh/sh.cc (sh_rtx_costs): Adjust cost estimation of MEM rtx
to be always at least COST_N_INSNS (1).  Forward speed argument to
sh_address_cost.

From b717c462b96e7870f8081d2bc330e4749a4b0538 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Sun, 13 Oct 2024 11:36:38 +0900
Subject: [PATCH] SH: Fix cost estimation of mem load/store

For memory loads/stores (that contain a MEM rtx) sh_rtx_costs would wrongly
report a cost lower than 1 insn which is not accurate as it makes loads/stores
appear cheaper than simple arithmetic insns.  The cost of a load/store insn is
at least 1 insn plus the cost of the address expression (some addressing modes
can be considered more expensive than others due to additional constraints).

gcc/ChangeLog:

	PR target/113533
	* config/sh/sh.cc (sh_rtx_costs): Adjust cost estimation of MEM rtx
	to be always at least COST_N_INSNS (1).  Forward speed argument to
	sh_address_cost.

Co-authored-by: Roger Sayle 
---
 gcc/config/sh/sh.cc | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 7391b8d..6ad202f 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -3230,9 +3230,9 @@ multcosts (rtx x ATTRIBUTE_UNUSED)
scanned.  In either case, *TOTAL contains the cost result.  */
 static bool
 sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 	  int opno ATTRIBUTE_UNUSED,
-	  int *total, bool speed ATTRIBUTE_UNUSED)
+	  int *total, bool speed)
 {
   int code = GET_CODE (x);
 
   switch (code)
@@ -3263,12 +3263,14 @@ sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 	  return true;
 }
   return false;
 
-/* The cost of a mem access is mainly the cost of the address mode.  */
+/* The cost of a mem access is mainly the cost of the address mode on top
+   of the cost of the load/store insn itself.  */
 case MEM:
   *total = sh_address_cost (XEXP (x, 0), GET_MODE (x), MEM_ADDR_SPACE (x),
-true);
+speed)
+	   + COSTS_N_INSNS (1);
   return true;
 
 case IF_THEN_ELSE:
   /* This case is required for the if_then_else negc pattern.  */
@@ -3316,9 +3318,10 @@ sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
   if (MEM_P (XEXP (x, 0)))
 	{
 	  *total = sh_address_cost (XEXP (XEXP (x, 0), 0),
 GET_MODE (XEXP (x, 0)),
-MEM_ADDR_SPACE (XEXP (x, 0)), true);
+MEM_ADDR_SPACE (XEXP (x, 0)), speed)
+		   + COSTS_N_INSNS (1);
 	  return true;
 	}
   return false;
 
@@ -3334,9 +3337,10 @@ sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 	{
 	  /* Handle SH2A's movu.b and movu.w insn.  */
 	  *total = sh_address_cost (XEXP (XEXP (x, 0), 0), 
 GET_MODE (XEXP (x, 0)), 
-MEM_ADDR_SPACE (XEXP (x, 0)), true);
+MEM_ADDR_SPACE (XEXP (x, 0)), speed)
+		   + COSTS_N_INSNS (1);
 	  return true;
 	}
   return false;
 
@@ -3349,16 +3353,18 @@ sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 	  if (GET_CODE (xx) == SET && MEM_P (XEXP (xx, 0)))
 	{
 	  *total = sh_address_cost (XEXP (XEXP (xx, 0), 0), 
 	GET_MODE (XEXP (xx, 0)),
-	MEM_ADDR_SPACE (XEXP (xx, 0)), true);
+	MEM_ADDR_SPACE (XEXP (xx, 0)), speed);
+		   + COSTS_N_INSNS (1);
 	  return true;
 	}
 	  if (GET_CODE (xx) == SET && MEM_P (XEXP (xx, 1)))
 	{
 	  *total = sh_address_cost (XEXP (XEXP (xx, 1), 0),
 	GET_MODE (XEXP (xx, 1)),
-	MEM_ADDR_SPACE (XEXP (xx, 1)), true);
+	MEM_ADDR_SPACE (XEXP (xx, 1)), speed);
+		   + COSTS_N_INSNS (1);
 	  return true;
 	}
 	}
 
--
libgit2 1.7.2

Re: [PATCH 3/5][_Hashtable] Avoid redundant usage of rehash policy

2024-10-13 Thread François Dumont


Here is a fresh rebased version.

François

On 04/01/2024 06:50, François Dumont wrote:

Here is an updated version.

    libstdc++: [_Hashtable] Avoid redundant usage of rehash policy

    Bypass call to __detail::__distance_fwd and the check if rehash is 
needed when
    instantiating from an iterator range or assigning an 
initializer_list to an

    unordered_multimap or unordered_multiset.

    libstdc++-v3/ChangeLog:

    * include/bits/hashtable.h
    (_Hashtable<>::_M_insert_range(_InputIte, _InputIte, 
_NodeGen&)): New.

(_Hashtable<>::operator=(initializer_list)): Use latter.
    (_Hashtable<>::_Hashtable(_InputIte, _InputIte, size_type, 
const _Hash&, const _Equal&,

    const allocator_type&, false_type)): Use latter.
    * include/bits/hashtable_policy.h
    (_Insert_base<>::_M_insert_range): Rename in...
    (_Insert_base<>::_M_insert): ...this and private.

Ok to commit ?

François


On 21/12/2023 23:17, Jonathan Wakely wrote:
On Thu, 23 Nov 2023 at 21:59, François Dumont  
wrote:

  libstdc++: [_Hashtable] Avoid redundant usage of rehash policy

  Bypass call to __detail::__distance_fwd and the check if 
rehash is

needed when
  assigning an initializer_list to an unordered_multimap or
unordered_multiset.

I find this patch and the description a bit confusing. It would help
if the new _Hashtable::_M_insert_range function had a comment (or a
different name!) explaining how it's different from the existing
_Insert_base::_M_insert_range functions.



  libstdc++-v3/ChangeLog:

  * include/bits/hashtable.h
  (_Hashtable<>::_M_insert_range(_InputIte, _InputIte,
_NodeGen&)): New.
(_Hashtable<>::operator=(initializer_list)): Use latter.
  (_Hashtable<>::_Hashtable(_InputIte, _InputIte, 
size_type,

const _Hash&, const _Equal&,
  const allocator_type&, false_type)): Use latter.
  * include/bits/hashtable_policy.h
(_Insert_base<>::_M_insert_range(_InputIte, _InputIte,
true_type)): Use latter.
(_Insert_base<>::_M_insert_range(_InputIte, _InputIte,
false_type)): Likewise.

Tested under Linux x64

Ok to commit ?

Françoisdiff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index b4e8e4d3fb2..abb672a2dbf 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -624,7 +624,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if (_M_bucket_count < __l_bkt_count)
  rehash(__l_bkt_count);
 
-   this->_M_insert_range(__l.begin(), __l.end(), __roan, __unique_keys{});
+   _M_insert_range(__l.begin(), __l.end(), __roan);
return *this;
   }
 
@@ -992,6 +992,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_insert(const_iterator, _Arg&&,
  _NodeGenerator&, false_type __uks);
 
+  template
+   void
+   _M_insert_range(_InputIterator __first, _InputIterator __last,
+   _NodeGenerator&);
+
   size_type
   _M_erase(true_type __uks, const key_type&);
 
@@ -1307,8 +1312,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }
 
__alloc_node_gen_t __node_gen(*this);
-   for (; __f != __l; ++__f)
- _M_insert(*__f, __node_gen, __uks);
+   _M_insert_range(__f, __l, __node_gen);
   }
 
   template
+template
+  void
+  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
+_Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
+  _M_insert_range(_InputIterator __first, _InputIterator __last,
+ _NodeGenerator& __node_gen)
+  {
+   for (; __first != __last; ++__first)
+ _M_insert(*__first, __node_gen, __unique_keys{});
+  }
+
   template(this)); }
 
-  template
+private:
+  template
void
-   _M_insert_range(_InputIterator __first, _InputIterator __last,
-   _NodeGetter&, true_type __uks);
+   _M_insert(_InputIterator __first, _InputIterator __last,
+ true_type __uks);
 
-  template
+  template
void
-   _M_insert_range(_InputIterator __first, _InputIterator __last,
-   _NodeGetter&, false_type __uks);
+   _M_insert(_InputIterator __first, _InputIterator __last,
+ false_type __uks);
 
 public:
   using iterator = _Node_iterator<_Value, __constant_iterators::value,
@@ -1075,41 +1076,37 @@ namespace __detail
   template
void
insert(_InputIterator __first, _InputIterator __last)
-   {
- __hashtable& __h = _M_conjure_hashtable();
- __node_gen_type __node_gen(__h);
- return _M_insert_range(__first, __last, __node_gen, __unique_keys{});
-   }
+   { _M_insert(__first, __last, __unique_keys{}); }
 };
 
   template
-template
+template
   void
   _Insert_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
   _Hash, _RangeHash, _Unused,

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Richard Biener

On Mon, 14 Oct 2024, Hongtao Liu wrote:

> On Sun, Oct 13, 2024 at 8:02 PM Richard Biener  wrote:
> >
> > On Sun, 13 Oct 2024, Hongtao Liu wrote:
> >
> > > On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener  
> > > > wrote:
> > > > >
> > > > > The following helps the x86 backend by canonicalizing FMAs to have
> > > > > any negation done to one of the commutative multiplication operands
> > > > > be done to a register (and not a memory operand).  Likewise to
> > > > > put a register operand first and a memory operand second;
> > > > > swap_commutative_operands_p seems to treat REG_P and MEM_P the
> > > > > same but comments indicate "complex expressiosn should be first".
> > > > >
> > > > > In particular this does (fma MEM REG REG) -> (fma REG MEM REG) and
> > > > > (fma (neg MEM) REG REG) -> (fma (neg REG) MEM REG) which are the
> > > > > reasons for the testsuite regressions in 
> > > > > gcc.target/i386/cond_op_fma*.c
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > >
> > > > > I'm not quite sure this is the correct approach - simplify-rtx
> > > > > doesn't seem to do "only canonicalization" but the existing FMA
> > > > > case looks odd in that context.
> > > > >
> > > > > Should the target simply reject cases with wrong "canonicalization"
> > > > > or does it need to cope with all variants in the patterns that fail
> > > > > matching during combine without the change?
> > > > Let me try the backend fix first.
> > > The backend fix requires at least 8 more patterns, so I think RTL
> > > canonicalization would be better.
> > > Please go ahead with the patch.
> >
> > I'm still looking for insights on how we usually canonicalize on RTL
> > (and where we document canonicalizations) and how we maintain RTL
> > in canonical form.
> >
> > I'm also still wondering why the backend accepts "non-canonical" RTL
> > instead of rejecting it, giving the middle-end the chance to try
> > an alternative variant?
> So you mean middle-end will alway canonicalize (fma: reg mem reg) to
> (fma: mem reg reg)?
> I only saw that RTL will canonicalize (fma: a (neg: b) c) to (fma: (neg a) b 
> c).
> Or what do you mean about "non-canonical" RTL in the backend?

"non-canonical" RTL in the backend is what the patterns in question
for this bugreport do not accept.  But maybe I'm missing something here.

IIRC there's code somewhere in combine to try several "canonical"
varaints of an insns in recog_for_combine, but as said I'm not very
familiar with how things work on RTL to decide what's conceptually the
correct thing to do here.  I just discvered simplify_rtx already does some
minor canonicalization for FMAs ...

Richard.

> >
> > Richard.
> >
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > PR target/117072
> > > > > * simplify-rtx.cc 
> > > > > (simplify_context::simplify_ternary_operation):
> > > > > Adjust FMA canonicalization.
> > > > > ---
> > > > >  gcc/simplify-rtx.cc | 15 +--
> > > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > > > > index e8e60404ef6..8b4fa0d7aa4 100644
> > > > > --- a/gcc/simplify-rtx.cc
> > > > > +++ b/gcc/simplify-rtx.cc
> > > > > @@ -6830,10 +6830,21 @@ simplify_context::simplify_ternary_operation 
> > > > > (rtx_code code, machine_mode mode,
> > > > > op0 = tem, op1 = XEXP (op1, 0), any_change = true;
> > > > > }
> > > > >
> > > > > -  /* Canonicalize the two multiplication operands.  */
> > > > > +  /* Canonicalize the two multiplication operands.  A negation
> > > > > +should go first and if possible the negation should be
> > > > > +to a register.  */
> > > > >/* a * -b + c  =>  -b * a + c.  */
> > > > > -  if (swap_commutative_operands_p (op0, op1))
> > > > > +  if (swap_commutative_operands_p (op0, op1)
> > > > > + || (REG_P (op1) && !REG_P (op0) && GET_CODE (op0) != NEG))
> > > > > std::swap (op0, op1), any_change = true;
> > > > > +  else if (GET_CODE (op0) == NEG && !REG_P (XEXP (op0, 0))
> > > > > +  && REG_P (op1))
> > > > > +   {
> > > > > + op0 = XEXP (op0, 0);
> > > > > + op1 = simplify_gen_unary (NEG, mode, op1, mode);
> > > > > + std::swap (op0, op1);
> > > > > + any_change = true;
> > > > > +   }
> > > > >
> > > > >if (any_change)
> > > > > return gen_rtx_FMA (mode, op0, op1, op2);
> > > > > --
> > > > > 2.43.0
> > > >
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> > >
> > >
> > >
> > >
> >
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > HRB 36809 (AG Nuernberg)
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernbe

Re: [PATCH v7] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-13 Thread Kito Cheng

> +  if (hwprobe_ima_ext.value & RISCV_HWPROBE_IMA_C)
> +{
> +  SET_EXT (C);
> +  SET_EXT (ZCA);
> +  if (hwprobe_ima_ext.value & RISCV_HWPROBE_IMA_FD)
> +   {
> + SET_EXT (ZCF);

Zcf if F is specified (RV32 only)

https://github.com/riscv/riscv-isa-manual/blob/main/src/zc.adoc#c

> + SET_EXT (ZCD);
> +   }
> +}

Re: [wwwdocs] projects/gomp: Update for TR13; gcc-15/changes.html link to routine doc

2024-10-13 Thread Gerald Pfeifer

On Tue, 8 Oct 2024, Tobias Burnus wrote:
> Update and sync with
> https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Implementation-Status.html
> 
> Current page: https://gcc.gnu.org/projects/gomp/ (and
> https://gcc.gnu.org/gcc-15/changes.html )
> 
> Comments before I apply it?

Quite some work! I like how you changed some anchors from "manual" to
"Memory Allocation"; this looks like an improvement. Maybe this could be 
  ...Memory Allocation in the manual...
even?

Gerald

[PATCH] middle-end/116891 - fix (negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)

2024-10-13 Thread Richard Biener

Transforming -fma (-a, b, -c) to fma (a, b, c) is only valid when
not rounding towards -inf or +inf as the sign of the multiplication
changes.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

Richard.

PR middle-end/116891
* match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
---
 gcc/match.pd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index eff1ace87f5..b65998f201d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9482,7 +9482,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (IFN_FMA @0 @1 @2))
  (simplify
   (negate (IFN_FNMS@3 @0 @1 @2))
-  (if (single_use (@3))
+  (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
(IFN_FMA @0 @1 @2
 
 /* CLZ simplifications.  */
-- 
2.43.0

Re: [PATCH v7] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-13 Thread Kito Cheng

On Mon, Oct 14, 2024 at 1:40 PM Yangyu Chen  wrote:
>
>
>
> > On Oct 14, 2024, at 12:32, Kito Cheng  wrote:
> >
> > I prefer not to handle the extension implication rules, as it's easy
> > for them to get out of sync, and this should be the linux kernel's
> > responsibility rather than libgcc's.
> >
>
> I still prefer to handle the extension here since users may use old
> versions / LTS versions of the Linux kernel, which have no some
> implied extensions.
>
> I have another patch to let GCC set the minimal required bits to
> provide the best compatibility. However, in some cases, such as
> some targets requiring Zve32x but not full V and users who have
> kernel version < v6.11, the resolver will not probe the Zve32x
> extension. Thus, we will not use the corresponding version. If we
> check V in zve32x ifunc resolver, building it in gimple will make
> it harder to maintain. Thus, I think both handling minimal extension
> in libgcc and having the minimal feature bits to generate the IFUNC
> resolver will be a better way to handle this.
>
> Meanwhile, handling it in this way does not increase the maintenance
> overhead since we don’t need to handle the extensions and implied
> extensions appear in the same version of the Linux kernel. So implied
> extensions like Zbc->Zbkc does not need to be handled here. Till now,
> we only need the handle C->Zca, Zcd, Zcf and V->Zve32x, Zve32f,
> Zve64x, Zve64f, Zve64d.

I am OK with minimal extension implication rules, like what you implement now,
but I am still concerned about implementing full rules.

ping: [PATCH] Fix crash with constant initializer

2024-10-13 Thread Eric Botcazou

The original submission patch is at:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-September/664127.html

Thanks in advance.

-- 
Eric Botcazou

[PATCH 2/5] passes: Allow for second param for NEXT_PASS

2024-10-13 Thread Andrew Pinski

Right now we currently only support 1 parameter for each pass in NEXT_PASS.
We also don't error out if someone tries to use more than 1.
This adds support for more than one but only to a max of max_number_args
(which is currently 2).
In the next patch, this will be used for DCE, adding a new parameter.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gen-pass-instances.awk (END): Handle processing
of multiple arguments to NEXT_PASS. Also error out
if using more than max_number_args (2).
* pass_manager.h (NEXT_PASS_WITH_ARG2): New define.
* passes.cc (NEXT_PASS_WITH_ARG2): New define.

Signed-off-by: Andrew Pinski 
---
 gcc/gen-pass-instances.awk | 24 +++-
 gcc/pass_manager.h |  1 +
 gcc/passes.cc  |  8 
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index f56b8072ed5..def09347765 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -195,6 +195,7 @@ function replace_pass(line, fnname, num, i)
 }
 
 END {
+  max_number_args = 2;
   for (i = 1; i < lineno; i++)
 {
   ret = parse_line(lines[i], "NEXT_PASS");
@@ -202,7 +203,9 @@ END {
{
  # Set pass_name argument, an optional with_arg argument
  pass_name = args[1];
- with_arg = args[2];
+ num_args = 0;
+ while (args[num_args + 2])
+   num_args++;
 
  # Set pass_final_counts
  if (pass_name in pass_final_counts)
@@ -214,13 +217,22 @@ END {
 
  # Print call expression with extra pass_num argument
  printf "%s", prefix;
- if (with_arg)
-   printf "NEXT_PASS_WITH_ARG";
+ if (num_args > 0)
+   {
+ printf "NEXT_PASS_WITH_ARG";
+ if (num_args > max_number_args)
+   {
+ print "ERROR: Only supports up to " max_number_args " args to 
NEXT_PASS";
+ exit 1;
+   }
+ if (num_args != 1)
+   printf num_args;
+   }
  else
printf "NEXT_PASS";
  printf " (%s, %s", pass_name, pass_num;
- if (with_arg)
-   printf ",%s", with_arg;
+ for (j = 0; j < num_args; j++)
+   printf ",%s", args[j+2];
  printf ")%s\n", postfix;
 
  continue;
@@ -254,6 +266,8 @@ END {
   print "#undef POP_INSERT_PASSES"
   print "#undef NEXT_PASS"
   print "#undef NEXT_PASS_WITH_ARG"
+  for (i = 2; i <= max_number_args; i++)
+print "#undef NEXT_PASS_WITH_ARG" i
   print "#undef TERMINATE_PASS_LIST"
 }
 
diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index 5a78d3fe56b..f18ae026257 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -130,6 +130,7 @@ private:
 #define POP_INSERT_PASSES()
 #define NEXT_PASS(PASS, NUM) opt_pass *PASS ## _ ## NUM
 #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
+#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) NEXT_PASS (PASS, NUM)
 #define TERMINATE_PASS_LIST(PASS)
 
 #include "pass-instances.def"
diff --git a/gcc/passes.cc b/gcc/passes.cc
index 3abae971ace..b5475fce522 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -1589,6 +1589,7 @@ pass_manager::pass_manager (context *ctxt)
 #define POP_INSERT_PASSES()
 #define NEXT_PASS(PASS, NUM) PASS ## _ ## NUM = NULL
 #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
+#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) NEXT_PASS (PASS, NUM)
 #define TERMINATE_PASS_LIST(PASS)
 #include "pass-instances.def"
 
@@ -1635,6 +1636,13 @@ pass_manager::pass_manager (context *ctxt)
   PASS ## _ ## NUM->set_pass_param (0, ARG);   \
 } while (0)
 
+#define NEXT_PASS_WITH_ARG2(PASS, NUM, ARG0, ARG1) \
+do {   \
+  NEXT_PASS (PASS, NUM);   \
+  PASS ## _ ## NUM->set_pass_param (0, ARG0);  \
+  PASS ## _ ## NUM->set_pass_param (1, ARG1);  \
+} while (0)
+
 #include "pass-instances.def"
 
   /* Register the passes with the tree dump code.  */
-- 
2.43.0

[PATCH 1/5] passes: Move #undef to pass-instances.def

2024-10-13 Thread Andrew Pinski

Like what was done r6-4608-g0aad01985747ab for builtins.def/DEF_BUILTIN,
the same should be done for the defines that are used for pass-instances.def.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gen-pass-instances.awk: Print out the #undefs.
* pass_manager.h Don't #undef INSERT_PASSES_AFTER,
PUSH_INSERT_PASSES_WITHIN, POP_INSERT_PASSES, NEXT_PASS,
NEXT_PASS_WITH_ARG, and TERMINATE_PASS_LIST.
* passes.cc: Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/gen-pass-instances.awk |  7 +++
 gcc/pass_manager.h |  7 ---
 gcc/passes.cc  | 13 -
 3 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 871ac0cdb52..f56b8072ed5 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -248,6 +248,13 @@ END {
 
   print lines[i];
 }
+  # print out the #undefs
+  print "#undef INSERT_PASSES_AFTER"
+  print "#undef PUSH_INSERT_PASSES_WITHIN"
+  print "#undef POP_INSERT_PASSES"
+  print "#undef NEXT_PASS"
+  print "#undef NEXT_PASS_WITH_ARG"
+  print "#undef TERMINATE_PASS_LIST"
 }
 
 # Local Variables:
diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index edd775e9a9c..5a78d3fe56b 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -134,13 +134,6 @@ private:
 
 #include "pass-instances.def"
 
-#undef INSERT_PASSES_AFTER
-#undef PUSH_INSERT_PASSES_WITHIN
-#undef POP_INSERT_PASSES
-#undef NEXT_PASS
-#undef NEXT_PASS_WITH_ARG
-#undef TERMINATE_PASS_LIST
-
 }; // class pass_manager
 
 } // namespace gcc
diff --git a/gcc/passes.cc b/gcc/passes.cc
index 775c3e46302..3abae971ace 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -1591,12 +1591,6 @@ pass_manager::pass_manager (context *ctxt)
 #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
 #define TERMINATE_PASS_LIST(PASS)
 #include "pass-instances.def"
-#undef INSERT_PASSES_AFTER
-#undef PUSH_INSERT_PASSES_WITHIN
-#undef POP_INSERT_PASSES
-#undef NEXT_PASS
-#undef NEXT_PASS_WITH_ARG
-#undef TERMINATE_PASS_LIST
 
   /* Initialize the pass_lists array.  */
 #define DEF_PASS_LIST(LIST) pass_lists[PASS_LIST_NO_##LIST] = &LIST;
@@ -1643,13 +1637,6 @@ pass_manager::pass_manager (context *ctxt)
 
 #include "pass-instances.def"
 
-#undef INSERT_PASSES_AFTER
-#undef PUSH_INSERT_PASSES_WITHIN
-#undef POP_INSERT_PASSES
-#undef NEXT_PASS
-#undef NEXT_PASS_WITH_ARG
-#undef TERMINATE_PASS_LIST
-
   /* Register the passes with the tree dump code.  */
   register_dump_files (all_lowering_passes);
   register_dump_files (all_small_ipa_passes);
-- 
2.43.0

[PATCH 3/5] dce: add remove_unused_locals conditionally to the todos [PR117096]

2024-10-13 Thread Andrew Pinski

This is the updated patch with the suggestion from:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665217.html
Where we use a second arg/param to set which passes we want to have
the remove_unused_locals on the dce.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimize/117096
* passes.def: Update some of the dce/cd-cde passes setting
the 2nd arg to true.
Also remove comment about stdarg since dce does it.
* tree-ssa-dce.cc (pass_dce): Add remove_unused_locals_p field.
Update set_pass_param to allow for 2nd param.
Use remove_unused_locals_p in execute to return 
TODO_remove_unused_locals.
(pass_cd_dce): Likewise.
* tree-stdarg.cc (pass_data_stdarg): Remove TODO_remove_unused_locals.

Signed-off-by: Andrew Pinski 
---
 gcc/passes.def  | 11 ---
 gcc/tree-ssa-dce.cc | 18 ++
 gcc/tree-stdarg.cc  |  2 +-
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 40162ac20a0..7d01227eed1 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_early_vrp);
  NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_dse);
- NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+ NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */, true /* 
remove_unused_locals */);
  NEXT_PASS (pass_phiopt, true /* early_p */);
  NEXT_PASS (pass_tail_recursion);
  NEXT_PASS (pass_if_to_switch);
@@ -225,10 +225,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_vrp, false /* final_p*/);
   NEXT_PASS (pass_array_bounds);
   NEXT_PASS (pass_dse);
-  NEXT_PASS (pass_dce);
-  /* pass_stdarg is always run and at this point we execute
- TODO_remove_unused_locals to prune CLOBBERs of dead
-variables which are otherwise a churn on alias walkings.  */
+  NEXT_PASS (pass_dce, false /* update_address_taken_p */, true /* 
remove_unused_locals */);
   NEXT_PASS (pass_stdarg);
   NEXT_PASS (pass_call_cdce);
   NEXT_PASS (pass_cselim);
@@ -273,7 +270,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   NEXT_PASS (pass_dse, true /* use DR analysis */);
-  NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_dce, false /* update_address_taken_p */, false /* 
remove_unused_locals */);
   /* Pass group that runs when 1) enabled, 2) there are loops
 in the function.  Make sure to run pass_fix_loops before
 to discover/remove loops before running the gate function
@@ -355,7 +352,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
   NEXT_PASS (pass_warn_restrict);
   NEXT_PASS (pass_dse);
-  NEXT_PASS (pass_dce, true /* update_address_taken_p */);
+  NEXT_PASS (pass_dce, true /* update_address_taken_p */, true /* 
remove_unused_locals */);
   /* After late DCE we rewrite no longer addressed locals into SSA
 form if possible.  */
   NEXT_PASS (pass_forwprop);
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 69249c73013..66612b5d575 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2096,18 +2096,23 @@ public:
   opt_pass * clone () final override { return new pass_dce (m_ctxt); }
   void set_pass_param (unsigned n, bool param) final override
 {
-  gcc_assert (n == 0);
-  update_address_taken_p = param;
+  gcc_assert (n == 0 || n == 1);
+  if (n == 0)
+   update_address_taken_p = param;
+  else if (n == 1)
+   remove_unused_locals_p = param;
 }
   bool gate (function *) final override { return flag_tree_dce != 0; }
   unsigned int execute (function *) final override
 {
   return (tree_ssa_dce ()
+ | (remove_unused_locals_p ? TODO_remove_unused_locals : 0)
  | (update_address_taken_p ? TODO_update_address_taken : 0));
 }
 
 private:
   bool update_address_taken_p;
+  bool remove_unused_locals_p = false;
 }; // class pass_dce
 
 } // anon namespace
@@ -2144,18 +2149,23 @@ public:
   opt_pass * clone () final override { return new pass_cd_dce (m_ctxt); }
   void set_pass_param (unsigned n, bool param) final override
 {
-  gcc_assert (n == 0);
-  update_address_taken_p = param;
+  gcc_assert (n == 0 || n == 1);
+  if (n == 0)
+   update_address_taken_p = param;
+  else if (n == 1)
+   remove_unused_locals_p = param;
 }
   bool gate (function *) final override { return flag_tree_dce != 0; }
   unsigned int execute (function *) final override
 {
   return (tree_ssa_cd_dce ()
+ | (remove_unused_locals_p ? TODO_remove_unused_locals : 0)
  | (update_address_taken_p ? TODO_update_address_taken : 0));
 }
 
 private:
   bool update_address

Re: [PATCH v7] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-13 Thread Kito Cheng

I prefer not to handle the extension implication rules, as it's easy
for them to get out of sync, and this should be the linux kernel's
responsibility rather than libgcc's.

> +struct {
> +  unsigned vendorID;

This field is gone since it moved to __riscv_cpu_model

> +  unsigned length;
> +  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
> +} __riscv_vendor_feature_bits __attribute__ ((visibility ("hidden"), 
> nocommon));
> +
> +struct {
> +  unsigned mvendorid;
> +  unsigned long long marchid;
> +  unsigned long long mimpid;

although it mismatch with what riscv-c-api-doc defined but you are
right, it's MXLEN, rather than 32 bits only, also MXLEN may larger than
UXLEN which is legal, so unsigned long long is the right

> +} __riscv_cpu_model __attribute__ ((visibility ("hidden"), nocommon));
> +

Re: [PATCH v7] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-13 Thread Yangyu Chen

> On Oct 14, 2024, at 12:32, Kito Cheng  wrote:
> 
> I prefer not to handle the extension implication rules, as it's easy
> for them to get out of sync, and this should be the linux kernel's
> responsibility rather than libgcc's.
> 

I still prefer to handle the extension here since users may use old
versions / LTS versions of the Linux kernel, which have no some
implied extensions.

I have another patch to let GCC set the minimal required bits to
provide the best compatibility. However, in some cases, such as
some targets requiring Zve32x but not full V and users who have
kernel version < v6.11, the resolver will not probe the Zve32x
extension. Thus, we will not use the corresponding version. If we
check V in zve32x ifunc resolver, building it in gimple will make
it harder to maintain. Thus, I think both handling minimal extension
in libgcc and having the minimal feature bits to generate the IFUNC
resolver will be a better way to handle this.

Meanwhile, handling it in this way does not increase the maintenance
overhead since we don’t need to handle the extensions and implied
extensions appear in the same version of the Linux kernel. So implied
extensions like Zbc->Zbkc does not need to be handled here. Till now,
we only need the handle C->Zca, Zcd, Zcf and V->Zve32x, Zve32f,
Zve64x, Zve64f, Zve64d.

>> +struct {
>> +  unsigned vendorID;
> 
> This field is gone since it moved to __riscv_cpu_model
> 

I missed this update. I will fix it in the next revision.

>> +  unsigned length;
>> +  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
>> +} __riscv_vendor_feature_bits __attribute__ ((visibility ("hidden"), 
>> nocommon));
>> +
>> +struct {
>> +  unsigned mvendorid;
>> +  unsigned long long marchid;
>> +  unsigned long long mimpid;
> 
> although it mismatch with what riscv-c-api-doc defined but you are
> right, it's MXLEN, rather than 32 bits only, also MXLEN may larger than
> UXLEN which is legal, so unsigned long long is the right
> 

Thanks. Since the next revision is needed, I will submit the next
revision after that PR to modify the type of marchid and mimpid
being merged to riscv-c-api-doc.

Link: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/91

>> +} __riscv_cpu_model __attribute__ ((visibility ("hidden"), nocommon));
>> +

Re: [PATCH] middle-end/116891 - fix (negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)

2024-10-13 Thread Jakub Jelinek

On Mon, Oct 14, 2024 at 08:14:01AM +0200, Richard Biener wrote:
> Transforming -fma (-a, b, -c) to fma (a, b, c) is only valid when
> not rounding towards -inf or +inf as the sign of the multiplication
> changes.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> 
> Richard.
> 
>   PR middle-end/116891
>   * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
>   Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.

Guess it would be nice to have a testcase which FAILs without the patch and
PASSes with it, but it can be added later.

So ok.

> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -9482,7 +9482,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(IFN_FMA @0 @1 @2))
>   (simplify
>(negate (IFN_FNMS@3 @0 @1 @2))
> -  (if (single_use (@3))
> +  (if (!HONOR_SIGN_DEPENDENT_ROUNDING (type) && single_use (@3))
> (IFN_FMA @0 @1 @2
>  
>  /* CLZ simplifications.  */
> -- 
> 2.43.0

Jakub

Re: [PR middle-end/114635] Set OMP safelen handling to INT_MAX when the pragma didn’t provide one.

2024-10-13 Thread Kugan Vivekanandarajah

Hi Richard,

Thanks for the review.

> On 8 Oct 2024, at 7:15 pm, Richard Biener  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, Aug 5, 2024 at 7:05 AM Kugan Vivekanandarajah
>  wrote:
>> 
>> 
>> 
>>> On 15 Jul 2024, at 5:18 pm, Jakub Jelinek  wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> On Mon, Jul 15, 2024 at 12:39:22AM +, Kugan Vivekanandarajah wrote:
 OMP safelen handling is assigning backend provided max as an int even when 
 the pragma didn’t provide one. As a result, vectoriser is rejecting SVE 
 modes while comparing poly_int with the safelen.
 
 That is, for the attached test case,  omp_max_vf gets [16, 16] from the 
 backend. This then becomes 16 as omp safelen is an integer. When 
 vectoriser compares the potential vector mode with  maybe_lt (max_vf, 
 min_vf)) , this would fail resulting in any SVE vector mode being  
 selected.
 
 One suggestion there was to set safelen to INT_MAX when OMP pragma does 
 not provide safely explicitly.
>>> 
>>> This is wrong.  The reason why safelen is set to that sctx.max_vf is that if
>>> there are any "omp simd array" arrays, sctx.max_vf is their size.
>>> The code you're touching has a comment that explains it even:
>>> /* If max_vf is non-zero, then we can use only a vectorization factor
>>>up to the max_vf we chose.  So stick it into the safelen clause.  */
>>> if (maybe_ne (sctx.max_vf, 0U))
>>> 
>>> If sctx.max_vf is 0, there were no "omp simd array" arrays emitted and so
>>> OMP_CLAUSE_SAFELEN isn't set.
>>> The vectorizer can only shrink the arrays, not grow them and that is why
>>> there is this limit.
>>> 
>>> Now, I think even SVE has a limit, which is not a scalar constant but
>>> poly_int, so I think in that case you need to arrange for the size of the
>>> arrays to be POLY_INT_CST as well and use that as a limit.
>>> Now, the clause argument itself at least in the OpenMP standard needs to be 
>>> an
>>> integer constant (if provided), because the proposals to extend it for the
>>> SVE-like arches (aarch64, RISC-V) have not been voted in I believe.
>>> So, there is a question what to do if user specifies safelen (32) or
>>> something similar.
>>> But if the user didn't specify it (so it is effectively infinitity), then
>>> yes, it might be ok to set it to some POLY_INT_CST representing the sizes of
>>> the arrays and tweak the loop safelen so that it can represent those.
>> 
>> Thanks for the explanation. Does that mean:
>> 1. We change loop->safelen to poly_int
>> 2. Modify the apply_safelen  to account for the poly_int.
>> 
>> I am attaching an RFC patch for your reference.
> 
> @@ -412,10 +413,15 @@ vect_analyze_data_ref_dependence (struct
> data_dependence_relation *ddr,
>  executed concurrently, assume independence.  */
>   auto apply_safelen = [&]()
> {
> -  if (loop->safelen >= 2)
> +  if (maybe_ge (loop->safelen,  2UL))
>{
> - if ((unsigned int) loop->safelen < *max_vf)
> -   *max_vf = loop->safelen;
> + unsigned int safelen;
> + if (loop->safelen.is_constant ())
> +   safelen = loop->safelen.coeffs[0];
> + else
> +   safelen = INT_MAX;
> + if (safelen < *max_vf)
> +   *max_vf = safelen;
> 
> isn't that effectively the same as doing
> 
>  if (known_lt (safelen, *max_vf))
>*max_vf = ;
> 
> ?  Specifically at this point we have to take a _lower_ bound on 'safelen', so
> I think assuming INT_MAX here is just bogus, or at least turning loop->safelen
> into a POLY_INT just to represent a "maximum" value in a strange way is
> broken given that exact chosen POLY_INT is _not_ the actual safelen (else
> your change above is wrong?).

Doesn’t POLY_INT with two coefficients means that we can (in theory) have large 
 enough value with this? That is why I  thought max_vf in that case can reach 
INT_MAX.
> 
> Can't we just explicitly document that loop->safelen == 0 means there's no
> guaranteed concurrent evaluation but loop->safelen == -1 means it's safe
> to evaluate any number of concurrent iterations and in turn special-case
> the OMP SIMD array handling to use a proper array size in that case
> (dependent on targets)?

This sounds good to me. If you like this approach, I can revise the patch 
accordingly.

Thanks,
Kugan

> 
> Richard.
> 
>> Thanks,
>> Kugan
>> 
>> 
>> 
>> Signed-off-by: Kugan Vivekanandarajah 
>> 
>>> 
   PR middle-end/114635
   PR 114635
 
 gcc/ChangeLog:
 
   * omp-low.cc (lower_rec_input_clauses): Set INT_MAX
   when safelen is not provided instead of using backend
   provided safelen.
 
 gcc/testsuite/ChangeLog:
 
   * c-c++-common/pr114635-1.cpp: New test.
   * c-c++-common/pr114635-2.cpp: New test.
 
 Signed-off-by: Kugan Vivekanandarajah 
>>> 
>>>   Jakub

Re: [PATCH] [RFC] target/117072 - more RTL FMA canonicalization

2024-10-13 Thread Hongtao Liu

On Sun, Oct 13, 2024 at 8:02 PM Richard Biener  wrote:
>
> On Sun, 13 Oct 2024, Hongtao Liu wrote:
>
> > On Fri, Oct 11, 2024 at 8:33 PM Hongtao Liu  wrote:
> > >
> > > On Fri, Oct 11, 2024 at 8:22 PM Richard Biener  wrote:
> > > >
> > > > The following helps the x86 backend by canonicalizing FMAs to have
> > > > any negation done to one of the commutative multiplication operands
> > > > be done to a register (and not a memory operand).  Likewise to
> > > > put a register operand first and a memory operand second;
> > > > swap_commutative_operands_p seems to treat REG_P and MEM_P the
> > > > same but comments indicate "complex expressiosn should be first".
> > > >
> > > > In particular this does (fma MEM REG REG) -> (fma REG MEM REG) and
> > > > (fma (neg MEM) REG REG) -> (fma (neg REG) MEM REG) which are the
> > > > reasons for the testsuite regressions in gcc.target/i386/cond_op_fma*.c
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > >
> > > > I'm not quite sure this is the correct approach - simplify-rtx
> > > > doesn't seem to do "only canonicalization" but the existing FMA
> > > > case looks odd in that context.
> > > >
> > > > Should the target simply reject cases with wrong "canonicalization"
> > > > or does it need to cope with all variants in the patterns that fail
> > > > matching during combine without the change?
> > > Let me try the backend fix first.
> > The backend fix requires at least 8 more patterns, so I think RTL
> > canonicalization would be better.
> > Please go ahead with the patch.
>
> I'm still looking for insights on how we usually canonicalize on RTL
> (and where we document canonicalizations) and how we maintain RTL
> in canonical form.
>
> I'm also still wondering why the backend accepts "non-canonical" RTL
> instead of rejecting it, giving the middle-end the chance to try
> an alternative variant?
So you mean middle-end will alway canonicalize (fma: reg mem reg) to
(fma: mem reg reg)?
I only saw that RTL will canonicalize (fma: a (neg: b) c) to (fma: (neg a) b c).
Or what do you mean about "non-canonical" RTL in the backend?

>
> Richard.
>
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > PR target/117072
> > > > * simplify-rtx.cc 
> > > > (simplify_context::simplify_ternary_operation):
> > > > Adjust FMA canonicalization.
> > > > ---
> > > >  gcc/simplify-rtx.cc | 15 +--
> > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > > > index e8e60404ef6..8b4fa0d7aa4 100644
> > > > --- a/gcc/simplify-rtx.cc
> > > > +++ b/gcc/simplify-rtx.cc
> > > > @@ -6830,10 +6830,21 @@ simplify_context::simplify_ternary_operation 
> > > > (rtx_code code, machine_mode mode,
> > > > op0 = tem, op1 = XEXP (op1, 0), any_change = true;
> > > > }
> > > >
> > > > -  /* Canonicalize the two multiplication operands.  */
> > > > +  /* Canonicalize the two multiplication operands.  A negation
> > > > +should go first and if possible the negation should be
> > > > +to a register.  */
> > > >/* a * -b + c  =>  -b * a + c.  */
> > > > -  if (swap_commutative_operands_p (op0, op1))
> > > > +  if (swap_commutative_operands_p (op0, op1)
> > > > + || (REG_P (op1) && !REG_P (op0) && GET_CODE (op0) != NEG))
> > > > std::swap (op0, op1), any_change = true;
> > > > +  else if (GET_CODE (op0) == NEG && !REG_P (XEXP (op0, 0))
> > > > +  && REG_P (op1))
> > > > +   {
> > > > + op0 = XEXP (op0, 0);
> > > > + op1 = simplify_gen_unary (NEG, mode, op1, mode);
> > > > + std::swap (op0, op1);
> > > > + any_change = true;
> > > > +   }
> > > >
> > > >if (any_change)
> > > > return gen_rtx_FMA (mode, op0, op1, op2);
> > > > --
> > > > 2.43.0
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)



-- 
BR,
Hongtao

Re: [PATCH] SVE intrinsics: Fold constant operands for svlsl.

2024-10-13 Thread Soumya AR

Pinging with updated subject, had missed the [PATCH] header before.

Regards,
Soumya

> On 24 Sep 2024, at 2:00 PM, Soumya AR  wrote:
> 
> This patch implements constant folding for svlsl. Test cases have been added 
> to
> check for the following cases:
> 
> Zero, merge, and don't care predication.
> Shift by 0.
> Shift by register width.
> Overflow shift on signed and unsigned integers.
> Shift on a negative integer.
> Maximum possible shift, eg. shift by 7 on an 8-bit integer.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Soumya AR 
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sve-builtins-base.cc (svlsl_impl::fold):
> Try constant folding.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/sve/const_fold_lsl_1.c: New test.
> 
> <0001-SVE-intrinsics-Fold-constant-operands-for-svlsl.patch>

Re: [PATCH v7] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-13 Thread Yangyu Chen




> On Oct 14, 2024, at 14:12, Kito Cheng  wrote:
> 
> On Mon, Oct 14, 2024 at 1:40 PM Yangyu Chen  
> wrote:
>> 
>>> On Oct 14, 2024, at 12:32, Kito Cheng  wrote:
>>> 
>>> I prefer not to handle the extension implication rules, as it's easy
>>> for them to get out of sync, and this should be the linux kernel's
>>> responsibility rather than libgcc's.
>> I still prefer to handle the extension here since users may use old
>> versions / LTS versions of the Linux kernel, which have no some
>> implied extensions.
>> 
>> I have another patch to let GCC set the minimal required bits to
>> provide the best compatibility. However, in some cases, such as
>> some targets requiring Zve32x but not full V and users who have
>> kernel version < v6.11, the resolver will not probe the Zve32x
>> extension. Thus, we will not use the corresponding version. If we
>> check V in zve32x ifunc resolver, building it in gimple will make
>> it harder to maintain. Thus, I think both handling minimal extension
>> in libgcc and having the minimal feature bits to generate the IFUNC
>> resolver will be a better way to handle this.
>> 
>> Meanwhile, handling it in this way does not increase the maintenance
>> overhead since we don’t need to handle the extensions and implied
>> extensions appear in the same version of the Linux kernel. So implied
>> extensions like Zbc->Zbkc does not need to be handled here. Till now,
>> we only need the handle C->Zca, Zcd, Zcf and V->Zve32x, Zve32f,
>> Zve64x, Zve64f, Zve64d.
> 
> I am OK with minimal extension implication rules, like what you implement now,
> but I am still concerned about implementing full rules.

I didn't know the reason for the concerns. Maybe I didn't tell it
clearly.

I want both sets of implied extensions in libgcc here and also to
use a minimal feature bit for the IFUNC resolver to solve. I think
this is the safest way to provide compatibility for both older
versions of Linux kernels and users who only set a minimal extension
(like Zve32x but no V).

Would this be acceptable? If OK, I will fix the C->Zcd for RV64 only
and submit the next revision.

Thanks,
Yangyu Chen

58 matches

Mail list logo