Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Andrew Pinski
On Tue, Mar 4, 2025 at 9:49 AM Richard Sandiford
 wrote:
>
> Wilco Dijkstra  writes:
> > Hi Richard&Kyrill,
> >
> >>> I’m in favour of this.
> >>
> >> Yeah, seems ok to me too.  I suppose we ought to update the documentation 
> >> too:
> >
> > I've added a note to the documentation. However it is impossible to be 
> > complete here
> > since many targets switch off early scheduling under various circumstances.
>
> Yeah, that's fair. :)
>
> > So I've just mentioned what x86 and AArch64 do (see v2 below).
>
> SGTM.  Getting it right for two target is a strict improvement
> over the status quo.  Wording suggestion below though:

Note the documentation issue for x86 not enabling the pre-RA scheduler
at -O2+ is PR 38768; though it was closed as fixed but it was not
actually fixed (I think there was a missing reading of the sources).
So please add a reference to PR 38768 in the commit message.

Thanks,
Andrew Pinski

>
> > Cheers,
> > Wilco
> >
> >
> > v2: Update documentation
> >
> > Enable the early scheduler on AArch64 for O3/Ofast.  This means GCC15 
> > benefits
> > from much faster build times with -O2, but avoids the regressions in lbm 
> > which
> > is very sensitive to minor scheduling changes due to long FMA chains.
> >
> > gcc:
> > PR target/118351
> > * common/config/aarch64/aarch64-common.cc: Enable early scheduling 
> > with
> > -O3 and higher.
> > * doc/invoke.texi (-fschedule-insns): Update comment.
> >
> > ---
> >
> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> > b/gcc/common/config/aarch64/aarch64-common.cc
> > index 
> > 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd
> >  100644
> > --- a/gcc/common/config/aarch64/aarch64-common.cc
> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
> > @@ -54,8 +54,10 @@ static const struct default_options 
> > aarch_option_optimization_table[] =
> >  { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 },
> >  /* Enable -fsched-pressure by default when optimizing.  */
> >  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> > -/* Disable early scheduling due to high compile-time overheads.  */
> > +/* Except for -O3 and higher, disable early scheduling due to high
> > +   compile-time overheads.  */
> >  { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> > +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 },
> >  /* Enable redundant extension instructions removal at -O2 and higher.  
> > */
> >  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> >  { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 
> > bad49a017cc1bb0922eba9f0b2db80166d409cd7..5eb2e96c483126ad075b427969e8c02ce3f51e7a
> >  100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13503,7 +13503,9 @@ helps machines that have slow floating point or 
> > memory load instructions
> >  by allowing other instructions to be issued until the result of the load
> >  or floating-point instruction is required.
> >
> > -Enabled at levels @option{-O2}, @option{-O3}.
> > +Enabled at levels @option{-O2}, @option{-O3}.  Many targets use a different
> > +default, for example, it is disabled on x86 and enabled only at level
> > +@option{-O3} on AArch64.
>
> Very minor, and this is going to be personal taste, but how about:
>
> Conventionally enabled at optimization levels @option{-O2} and @option{-O3}.
> However, many targets override this behavior.  For example, on x86, it is
> disabled at all levels, while on AArch64, it is enabled only at @option{-O3}.
>
> (I'm just trying to avoid the second sentence contradicting the first.)
>
> OK with that change if noone objects in 24hrs.
>
> Thanks,
> Richard
>


Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard&Kyrill,
>
>>> I’m in favour of this.
>>
>> Yeah, seems ok to me too.  I suppose we ought to update the documentation 
>> too:
>
> I've added a note to the documentation. However it is impossible to be 
> complete here
> since many targets switch off early scheduling under various circumstances.

Yeah, that's fair. :)

> So I've just mentioned what x86 and AArch64 do (see v2 below).

SGTM.  Getting it right for two target is a strict improvement
over the status quo.  Wording suggestion below though:

> Cheers,
> Wilco
>
>
> v2: Update documentation
>
> Enable the early scheduler on AArch64 for O3/Ofast.  This means GCC15 benefits
> from much faster build times with -O2, but avoids the regressions in lbm which
> is very sensitive to minor scheduling changes due to long FMA chains.
>
> gcc:
> PR target/118351
> * common/config/aarch64/aarch64-common.cc: Enable early scheduling 
> with
> -O3 and higher.
> * doc/invoke.texi (-fschedule-insns): Update comment.
>
> ---
>
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 
> 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -54,8 +54,10 @@ static const struct default_options 
> aarch_option_optimization_table[] =
>  { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 },
>  /* Enable -fsched-pressure by default when optimizing.  */
>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> -/* Disable early scheduling due to high compile-time overheads.  */
> +/* Except for -O3 and higher, disable early scheduling due to high
> +   compile-time overheads.  */
>  { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 },
>  /* Enable redundant extension instructions removal at -O2 and higher.  */
>  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
>  { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> bad49a017cc1bb0922eba9f0b2db80166d409cd7..5eb2e96c483126ad075b427969e8c02ce3f51e7a
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13503,7 +13503,9 @@ helps machines that have slow floating point or 
> memory load instructions
>  by allowing other instructions to be issued until the result of the load
>  or floating-point instruction is required.
>  
> -Enabled at levels @option{-O2}, @option{-O3}.
> +Enabled at levels @option{-O2}, @option{-O3}.  Many targets use a different
> +default, for example, it is disabled on x86 and enabled only at level
> +@option{-O3} on AArch64.

Very minor, and this is going to be personal taste, but how about:

Conventionally enabled at optimization levels @option{-O2} and @option{-O3}.
However, many targets override this behavior.  For example, on x86, it is
disabled at all levels, while on AArch64, it is enabled only at @option{-O3}.

(I'm just trying to avoid the second sentence contradicting the first.)

OK with that change if noone objects in 24hrs.

Thanks,
Richard



Re: [PATCH v2] c++: Check invalid use of constrained auto with trailing return type [PR100589]

2025-03-04 Thread Jason Merrill

On 3/4/25 9:51 AM, Patrick Palka wrote:

On Sun, 2 Mar 2025, Da Xie wrote:


Add check for constrained auto type specifier in function declaration or
function type declaration with trailing return type. Issue error if such
usage is detected.

Test file renamed, and added a new test for type declaration.

Successfully bootstrapped and regretested on x86_64-pc-linux-gnu:
Added 6 passed and 4 unsupported tests.

PR c++/100589

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Issue an error for a declarator with
constrained auto type specifier and trailing return types. Include
function names if available.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr100589.C: New test.


LGTM, thanks!  Jason, shall I push this to trunk?


Please.



Re: [PATCH] c++: wrong targs printed in hard satisfaction error [PR99214]

2025-03-04 Thread Jason Merrill

On 3/4/25 2:49 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template.  This manifests as
misleading diagnostic context lines when issuing a hard error (or a
constraint recursion error) that occurred during satisfaction, e.g. for
the below testcase without this patch we emit:
   In substitution of '... void A::f() [with U = int]'
and with this patch we emit:
   In substitution of '... void A::f() [with U = char; T = int]'.

This patch fixes this by always passing the most general template to
push_tinst_level.


That soungs good, but getting it by passing it back from 
get_normalized_constraints_from_decl seems confusing; I'd think we 
should calculate it in parallel to changing args to correspond to that 
template.



PR c++/99214

gcc/cp/ChangeLog:

* constraint.cc (get_normalized_constraints_from_decl): New
out-parameter GEN_D.
(satisfy_declaration_constraints): Use it to pass the most
general version of T to push_tinst_level.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic20.C: New test.
---
  gcc/cp/constraint.cc | 15 +++
  gcc/testsuite/g++.dg/concepts/diagnostic20.C | 15 +++
  2 files changed, 26 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic20.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a9caba8e2cc..f688a99c5fd 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -648,10 +648,13 @@ get_normalized_constraints_from_info (tree ci, tree 
in_decl, bool diag = false)
return t;
  }
  
-/* Returns the normalized constraints for the declaration D.  */

+/* Returns the normalized constraints for the declaration D.
+   If GEN_D is non-NULL, sets *GEN_D to the most general version
+   of D that ultimately owns its constraints.  */
  
  static tree

-get_normalized_constraints_from_decl (tree d, bool diag = false)
+get_normalized_constraints_from_decl (tree d, bool diag = false,
+ tree *gen_d = nullptr)
  {
tree tmpl;
tree decl;
@@ -716,6 +719,8 @@ get_normalized_constraints_from_decl (tree d, bool diag = 
false)
  tmpl = most_general_template (tmpl);
  
d = tmpl ? tmpl : decl;

+  if (gen_d)
+*gen_d = d;
  
/* If we're not diagnosing errors, use cached constraints, if any.  */

if (!diag)
@@ -2730,9 +2735,11 @@ satisfy_declaration_constraints (tree t, tree args, 
sat_info info)
  return boolean_true_node;
  
tree result = boolean_true_node;

-  if (tree norm = get_normalized_constraints_from_decl (t, info.noisy ()))
+  tree gen_t;
+  if (tree norm = get_normalized_constraints_from_decl (t, info.noisy (),
+   &gen_t))
  {
-  if (!push_tinst_level (t, args))
+  if (!push_tinst_level (gen_t, args))
return result;
tree pattern = DECL_TEMPLATE_RESULT (t);
push_to_top_level ();
diff --git a/gcc/testsuite/g++.dg/concepts/diagnostic20.C 
b/gcc/testsuite/g++.dg/concepts/diagnostic20.C
new file mode 100644
index 000..b8d586e9a21
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/diagnostic20.C
@@ -0,0 +1,15 @@
+// PR c++/99214
+// { dg-do compile { target c++20 } }
+
+template 
+struct A {
+  template  static void f() requires ([] { return U::fail; }()); // { dg-error 
"fail" }
+  template  static void f();
+};
+
+int main() {
+  A::f();
+}
+
+// This matches the context line "In substitution of '... [with U = char; T = 
int]'"
+// { dg-message "U = char; T = int" "" { target *-*-* } 0 }




Re: [patch, Fortran] Fix PR 119049 and 119074, external prototypes with different arglists

2025-03-04 Thread Thomas Koenig

Hi Andre,


while the patch looks ok to me, why did you not choose to generate a "22.7.2
Variable-Length Parameter Lists"

https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Variable-Number-of-Arguments.html

when the arguments differ? Then its the callee responsibility to figure
stuff out. Just out of curiosity. Is this prohibited by any standard? My
question is just out of interest. No need to change the patch. Having a warning
is better, then running into "WTF?".


A variable-length parameter list is something that the caller and
the callee must agree on, otherwise this can (and will) cause breakage
on many ABIs - aarch64 and Power among them. x86_64 is an exception,
there it is mostly harmless.


Look good to me, ok for mainline.


Committed as r15-7817-g21ca9153ebe525b077ac96811cfd48be6b259e7e .

Thanks for the review!

I will mention the new option in changes.html shortly.

Best regards

Thomas



Re: [PATCH] go/119098 - bump libgo version for GCC 15 release

2025-03-04 Thread Ian Lance Taylor
On Tue, Mar 4, 2025 at 2:12 AM Richard Biener  wrote:

> As PR119098 shows the gccgo driver uses the install path compiled
> into the shared libgo and thus updating that but not the driver
> will cause the driver to fail to find other tools like cgo.
>
> Thus bump the SONAME for GCC 15 as well, as we've done for each
> release in the past.
>
> I think libgo changes go through another "upstream", so Ian, can you
> please take care of this?
>

Thanks, committed.

I'll try to fix the underlying issue.

Ian


[PATCH] c++: wrong targs printed in hard satisfaction error [PR99214]

2025-03-04 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template.  This manifests as
misleading diagnostic context lines when issuing a hard error (or a
constraint recursion error) that occurred during satisfaction, e.g. for
the below testcase without this patch we emit:
  In substitution of '... void A::f() [with U = int]'
and with this patch we emit:
  In substitution of '... void A::f() [with U = char; T = int]'.

This patch fixes this by always passing the most general template to
push_tinst_level.

PR c++/99214

gcc/cp/ChangeLog:

* constraint.cc (get_normalized_constraints_from_decl): New
out-parameter GEN_D.
(satisfy_declaration_constraints): Use it to pass the most
general version of T to push_tinst_level.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic20.C: New test.
---
 gcc/cp/constraint.cc | 15 +++
 gcc/testsuite/g++.dg/concepts/diagnostic20.C | 15 +++
 2 files changed, 26 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic20.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a9caba8e2cc..f688a99c5fd 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -648,10 +648,13 @@ get_normalized_constraints_from_info (tree ci, tree 
in_decl, bool diag = false)
   return t;
 }
 
-/* Returns the normalized constraints for the declaration D.  */
+/* Returns the normalized constraints for the declaration D.
+   If GEN_D is non-NULL, sets *GEN_D to the most general version
+   of D that ultimately owns its constraints.  */
 
 static tree
-get_normalized_constraints_from_decl (tree d, bool diag = false)
+get_normalized_constraints_from_decl (tree d, bool diag = false,
+ tree *gen_d = nullptr)
 {
   tree tmpl;
   tree decl;
@@ -716,6 +719,8 @@ get_normalized_constraints_from_decl (tree d, bool diag = 
false)
 tmpl = most_general_template (tmpl);
 
   d = tmpl ? tmpl : decl;
+  if (gen_d)
+*gen_d = d;
 
   /* If we're not diagnosing errors, use cached constraints, if any.  */
   if (!diag)
@@ -2730,9 +2735,11 @@ satisfy_declaration_constraints (tree t, tree args, 
sat_info info)
 return boolean_true_node;
 
   tree result = boolean_true_node;
-  if (tree norm = get_normalized_constraints_from_decl (t, info.noisy ()))
+  tree gen_t;
+  if (tree norm = get_normalized_constraints_from_decl (t, info.noisy (),
+   &gen_t))
 {
-  if (!push_tinst_level (t, args))
+  if (!push_tinst_level (gen_t, args))
return result;
   tree pattern = DECL_TEMPLATE_RESULT (t);
   push_to_top_level ();
diff --git a/gcc/testsuite/g++.dg/concepts/diagnostic20.C 
b/gcc/testsuite/g++.dg/concepts/diagnostic20.C
new file mode 100644
index 000..b8d586e9a21
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/diagnostic20.C
@@ -0,0 +1,15 @@
+// PR c++/99214
+// { dg-do compile { target c++20 } }
+
+template 
+struct A {
+  template  static void f() requires ([] { return U::fail; }()); // { 
dg-error "fail" }
+  template  static void f();
+};
+
+int main() {
+  A::f();
+}
+
+// This matches the context line "In substitution of '... [with U = char; T = 
int]'"
+// { dg-message "U = char; T = int" "" { target *-*-* } 0 }
-- 
2.49.0.rc0.57.gdb91954e18



Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Wilco Dijkstra
Hi Ramana,

> -Generate code for the large code model.  This makes no assumptions about
> -addresses and sizes of sections.  Programs can be statically linked only.  
> The
> +Generate code for the large code model.  This allows large .bss and .data
> +sections, however .text and .rodata must still be < 2GiB in total.  Position
> +independent code or statically linked libraries are not supported.  The
>  @option{-mcmodel=large} option is incompatible with @option{-mabi=ilp32},
> -@option{-fpic} and @option{-fPIC}.
> +@option{-fpie}, @option{-fPIE}, @option{-fpic}, @option{-fPIC},
> +@option{-static}, @option{-moutline-atomics}.

> What's the issue with -static ?

Basically the small and large model are fundamentally incompatible. The infamous
"dumb linker" approach means it doesn't try to sort sections, so an ADRP 
relocation
will be out of reach if its data is placed after a huge array. Static linking 
with GLIBC or
enabling outline atomics are in fact the same issue.

As I proposed for the medium model, the solution is to place large arrays into
specially named sections so that they are ordered last. That means all "small" 
data
referenced via ADRP will be in range, and thus using outline atomics or mixing
small/medium model just work. The large model will have to do the same.

Cheers,
Wilco



Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-04 Thread Hongtao Liu
On Tue, Mar 4, 2025 at 6:31 PM Richard Biener
 wrote:
>
> On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford
>  wrote:
> >
> > Richard Sandiford  writes:
> > > Jan Hubicka  writes:
> > >>>
> > >>> Thanks for running these.  I saw poor results for perlbench with my
> > >>> initial aarch64 hooks because the hooks reduced the cost to zero for
> > >>> the entry case:
> > >>>
> > >>> auto entry_cost = targetm.callee_save_cost
> > >>>   (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> > >>>ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> > >>>allocated_callee_save_regs, existing_spills_p);
> > >>> /* In the event of a tie between caller-save and callee-save,
> > >>>prefer callee-save.  We apply this to the entry cost rather
> > >>>than the exit cost since the entry frequency must be at
> > >>>least as high as the exit frequency.  */
> > >>> if (entry_cost > 0)
> > >>>   entry_cost -= 1;
> > >>>
> > >>> I "fixed" that by bumping the cost to a minimum of 2, but I was
> > >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 
> > >>> 1",
> > >>> so that the cost is always greater than not using a callee save for
> > >>> registers that don't cross a call.  WDYT?
> > >>
> > >> For x86 perfomance costs, the push cost should be memory_move_cost which
> > >> is 6, -2 for adjustment in the target hook and -1 for this. So cost
> > >> should not be 0 I think.
> > >>
> > >> For size cost, I currently return 1, so we indeed get 0 after
> > >> adjustment.
> > >>
> > >> I think cost of 0 will make us to pick callee save even if caller save
> > >> is available and there are no function calls, so I guess we do not want
> > >> that
> > >
> > > OK, here's an updated patch that makes that change.  The x86 parts
> > > should be replaced by your patch.
> > >
> > > Tested on aarch64-linux-gnu.  I also tried to test on 
> > > pwoerpc64el-linux-gnu
> > > (on gcc112), but I keep getting broken pipes during the test runs,
> > > so I'm struggling to get good before/after comparisons.  It does at
> > > least bootstrap though...
> >
> > Here's the patch with Honza's x86 changes.  Boostrapped & regresiion-tested
> > on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120).  The powerpc64le
> > results regressed:
> >
> > FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 i == 5
> > FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 sizeof 
> > (a) == 17 * sizeof (short)
> >
> > but the same test already failed for -O2 and -O3.
> >
> > OK to install now?  Or, given the lateness in the release cycle,
> > would it be better to wait for GCC 16?
>
> I think it's OK to install now.  Not installing anything isn't an option, the
> alternative would be to at least revert HJs change.
I'm hoping to install this patch in GCC15.
>
> Thanks,
> Richard.
>
> >
> > Thanks,
> > Richard
> >
> >
> > Following on from the discussion in:
> >
> >   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html
> >
> > this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
> > replaces it with two hooks: one that controls the cost of using an
> > extra callee-saved register and one that controls the cost of allocating
> > a frame for the first spill.
> >
> > (The patch does not attempt to address the shrink-wrapping part of
> > the thread above.)
> >
> > On AArch64, this is enough to fix PR117477, as verified by the new tests.
> > The patch does not change the SPEC2017 scores significantly.  (I saw a
> > slight improvement in fotonik3d and roms, but I'm not convinced that
> > the improvements are real.)
> >
> > The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
> > which is a scan-dump correctness test that relies on not using
> > caller saves.  The decision to use caller saves looks appropriate,
> > and saves an instruction, so I've just added -fno-caller-saves
> > to the test options.
> >
> > The x86 parts were written by Honza.
> >
> > gcc/
> > PR rtl-optimization/117477
> > * config/aarch64/aarch64.cc (aarch64_count_saves): New function.
> > (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
> > (aarch64_frame_allocation_cost): Likewise.
> > (TARGET_CALLEE_SAVE_COST): Define.
> > (TARGET_FRAME_ALLOCATION_COST): Likewise.
> > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> > Replace with...
> > (ix86_callee_save_cost): ...this new hook.
> > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
> > (TARGET_CALLEE_SAVE_COST): Define.
> > * target.h (spill_cost_type, frame_cost_type): New enums.
> > * target.def (callee_save_cost, frame_allocation_cost): New hooks.
> > (ira_callee_saved_register_cost_scale): Delete.
> > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): 
> > Delete.
> >   

Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-04 Thread Lulu Cheng

LGTM!

Thanks.

在 2025/3/3 下午3:29, Xi Ruoyao 写道:

They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression.  The incorrect reorder has been observed in openh264 LTO
build.

Expand them to a (mem) expression instead of unspec to fix the issue.
Then we need to make loongarch_address_insns return 1 for
ADDRESS_REG_REG because the constraint "R" expects this behavior, or
the vldx instruction will be considered invalid by the register
allocate pass and turned to add.d + vld.  Apply the ADDRESS_REG_REG
penalty in loongarch_address_cost instead, loongarch_rtx_costs should
also call loongarch_address_cost instead of loongarch_address_insns
then.

Closes: https://github.com/cisco/openh264/issues/3857

gcc/ChangeLog:

PR target/119084
* config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove.
(lasx_xvldx): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove.
(lsx_vldx): Remove.
* config/loongarch/simd.md (QIVEC): New define_mode_iterator.
(_vldx): New define_expand.
* config/loongarch/loongarch.cc (loongarch_address_insns_1): New
static function with most logic factored out from ...
(loongarch_address_insns): ... here.  Call
loongarch_address_insns_1 with reg_reg_cost = 1.
(loongarch_address_cost): Call loongarch_address_insns_1 with
reg_reg_cost = la_addr_reg_reg_cost.

gcc/testsuite/ChangeLog:

PR target/119084
* gcc.target/loongarch/pr119084.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk and
gcc-14 branch?

  gcc/config/loongarch/lasx.md  | 13 -
  gcc/config/loongarch/loongarch.cc | 48 +++
  gcc/config/loongarch/lsx.md   | 13 -
  gcc/config/loongarch/simd.md  |  9 
  gcc/testsuite/gcc.target/loongarch/pr119084.c | 24 ++
  5 files changed, 61 insertions(+), 46 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr119084.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e4505c1660d..43e3ab0026a 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -119,7 +119,6 @@ (define_c_enum "unspec" [
UNSPEC_LASX_XVSSRLRN
UNSPEC_LASX_XVEXTL_QU_DU
UNSPEC_LASX_XVLDI
-  UNSPEC_LASX_XVLDX
UNSPEC_LASX_XVSTX
UNSPEC_LASX_VECINIT_MERGE
UNSPEC_LASX_VEC_SET_INTERNAL
@@ -3579,18 +3578,6 @@ (define_insn "lasx_xvldi"
[(set_attr "type" "simd_load")
 (set_attr "mode" "V4DI")])
  
-(define_insn "lasx_xvldx"

-  [(set (match_operand:V32QI 0 "register_operand" "=f")
-   (unspec:V32QI [(match_operand:DI 1 "register_operand" "r")
-  (match_operand:DI 2 "reg_or_0_operand" "rJ")]
- UNSPEC_LASX_XVLDX))]
-  "ISA_HAS_LASX"
-{
-  return "xvldx\t%u0,%1,%z2";
-}
-  [(set_attr "type" "simd_load")
-   (set_attr "mode" "V32QI")])
-
  (define_insn "lasx_xvstx"
[(set (mem:V32QI (plus:DI (match_operand:DI 1 "register_operand" "r")
(match_operand:DI 2 "reg_or_0_operand" "rJ")))
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index eb3baac7019..3779e283f8d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2363,14 +2363,9 @@ loongarch_index_address_p (rtx addr, machine_mode mode 
ATTRIBUTE_UNUSED)
return true;
  }
  
-/* Return the number of instructions needed to load or store a value

-   of mode MODE at address X.  Return 0 if X isn't valid for MODE.
-   Assume that multiword moves may need to be split into word moves
-   if MIGHT_SPLIT_P, otherwise assume that a single load or store is
-   enough.  */
-
-int
-loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
+static int
+loongarch_address_insns_1 (rtx x, machine_mode mode, bool might_split_p,
+  int reg_reg_cost)
  {
struct loongarch_address_info addr;
int factor;
@@ -2405,7 +2400,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool 
might_split_p)
return factor;
  
case ADDRESS_REG_REG:

-   return factor * la_addr_reg_reg_cost;
+   return factor * reg_reg_cost;
  
case ADDRESS_CONST_INT:

return lsx_p ? 0 : factor;
@@ -2420,6 +2415,18 @@ loongarch_address_insns (rtx x, machine_mode mode, bool 
might_split_p)
return 0;
  }
  
+/* Return the number of instructions needed to load or store a value

+   of mode MODE at address X.  Return 0 if X isn't valid for MODE.
+   Assume that multiword moves may need to be split into word moves
+   if MIGHT_SPLIT_P, otherwise assume that a single load or store is
+   enough.  */
+
+int
+loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
+{
+  return loongarch_address_insns_1 (x, mode, might_split_p, 1);
+}
+
  /* Return true if X fits within an 

Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-04 Thread Xi Ruoyao
On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote:
> LGTM!

Pushed to trunk.  The draft of gcc-14 backport is attached, I'll push it
if it builds & tests fine and there's no objection.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 6e4ca8b9c4944ffe896f4d8952f29c4cd518df22 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Sun, 2 Mar 2025 19:02:50 +0800
Subject: [gcc14 PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and
 __lasx_xvldx [PR119084]

They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression.  The incorrect reorder has been observed in openh264 LTO
build.

Expand them to a (mem) expression instead of unspec to fix the issue.

Closes: https://github.com/cisco/openh264/issues/3857

(cherry picked from commit 4856292f7a680ec478e7607f1b71781996d7d542)

Edited to remove the loongarch.cc change which is not needed for gcc-14
branch.

gcc/ChangeLog:

	PR target/119084
	* config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove.
	(lasx_xvldx): Remove.
	* config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove.
	(lsx_vldx): Remove.
	* config/loongarch/simd.md (QIVEC): New define_mode_iterator.
	(_vldx): New define_expand.

gcc/testsuite/ChangeLog:

	PR target/119084
	* gcc.target/loongarch/pr119084.c: New test.
---
 gcc/config/loongarch/lasx.md  | 13 --
 gcc/config/loongarch/lsx.md   | 13 --
 gcc/config/loongarch/simd.md  | 10 
 gcc/testsuite/gcc.target/loongarch/pr119084.c | 24 +++
 4 files changed, 34 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr119084.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 94bbd0c26bb..fe32194e811 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -155,7 +155,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVSSRLRN
   UNSPEC_LASX_XVEXTL_QU_DU
   UNSPEC_LASX_XVLDI
-  UNSPEC_LASX_XVLDX
   UNSPEC_LASX_XVSTX
   UNSPEC_LASX_VECINIT_MERGE
   UNSPEC_LASX_VEC_SET_INTERNAL
@@ -4679,18 +4678,6 @@ (define_insn "lasx_xvldi"
   [(set_attr "type" "simd_load")
(set_attr "mode" "V4DI")])
 
-(define_insn "lasx_xvldx"
-  [(set (match_operand:V32QI 0 "register_operand" "=f")
-	(unspec:V32QI [(match_operand:DI 1 "register_operand" "r")
-		   (match_operand:DI 2 "reg_or_0_operand" "rJ")]
-		  UNSPEC_LASX_XVLDX))]
-  "ISA_HAS_LASX"
-{
-  return "xvldx\t%u0,%1,%z2";
-}
-  [(set_attr "type" "simd_load")
-   (set_attr "mode" "V32QI")])
-
 (define_insn "lasx_xvstx"
   [(set (mem:V32QI (plus:DI (match_operand:DI 1 "register_operand" "r")
 			(match_operand:DI 2 "reg_or_0_operand" "rJ")))
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 5ee5845e84b..67ba8e8ad5d 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -100,7 +100,6 @@ (define_c_enum "unspec" [
   UNSPEC_LSX_VSSRLRN
   UNSPEC_LSX_VLDI
   UNSPEC_LSX_VSHUF_B
-  UNSPEC_LSX_VLDX
   UNSPEC_LSX_VSTX
   UNSPEC_LSX_VEXTL_QU_DU
   UNSPEC_LSX_VSETEQZ_V
@@ -3070,18 +3069,6 @@ (define_insn "lsx_vshuf_b"
   [(set_attr "type" "simd_shf")
(set_attr "mode" "V16QI")])
 
-(define_insn "lsx_vldx"
-  [(set (match_operand:V16QI 0 "register_operand" "=f")
-	(unspec:V16QI [(match_operand:DI 1 "register_operand" "r")
-		   (match_operand:DI 2 "reg_or_0_operand" "rJ")]
-		  UNSPEC_LSX_VLDX))]
-  "ISA_HAS_LSX"
-{
-  return "vldx\t%w0,%1,%z2";
-}
-  [(set_attr "type" "simd_load")
-   (set_attr "mode" "V16QI")])
-
 (define_insn "lsx_vstx"
   [(set (mem:V16QI (plus:DI (match_operand:DI 1 "register_operand" "r")
 			(match_operand:DI 2 "reg_or_0_operand" "rJ")))
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 00ff2823a4e..691e7195636 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -113,6 +113,16 @@ (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
 ;; instruction here so we can avoid duplicating logics.
 ;; ===
 
+;; REG + REG load
+
+(define_mode_iterator QIVEC [(V16QI "ISA_HAS_LSX") (V32QI "ISA_HAS_LASX")])
+(define_expand "_vldx"
+  [(set (match_operand:QIVEC 0 "register_operand" "=f")
+	(mem:QIVEC (plus:DI (match_operand:DI 1 "register_operand")
+			(match_operand:DI 2 "register_operand"]
+  "TARGET_64BIT")
+
+
 ;;
 ;; FP vector rounding instructions
 ;;
diff --git a/gcc/testsuite/gcc.target/loongarch/pr119084.c b/gcc/testsuite/gcc.target/loongarch/pr119084.c
new file mode 100644
index 000..b5943303851
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr119084.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mlsx" } */
+/* { dg-require-effective-target loongarch_sx_hw } */
+
+typedef signed char V16QI __attribute__ ((vector_size (16)));
+static char x[128];
+
+__attribute__ ((noipa)) int
+noopt (int x)
+{

Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-04 Thread Lulu Cheng



在 2025/3/5 上午11:03, Xi Ruoyao 写道:

On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote:

LGTM!

Pushed to trunk.  The draft of gcc-14 backport is attached, I'll push it
if it builds & tests fine and there's no objection.



Thanks a lot.



Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Kyrylo Tkachov


> On 3 Mar 2025, at 19:52, Wilco Dijkstra  wrote:
> 
> 
> Outline atomics is not designed to be used with -mcmodel=large, so disable
> it automatically if the large code model is used.
> 
> Passes regress, OK for commit?
> 

This restriction should be documented in invoke.texi IMO.
I also think it would be more user friendly to warn them about the 
incompatibility if an explicit -moutline-atomics option is passed.
It’s okay though to silently turn off the implicit default-on option though.

Thanks,
Kyrill

> gcc:
>PR target/112465
>* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
>Turn off outline atomics with -mcmodel=large.
> 
> ---
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> fe76730b0a7c8a2baaae24152e13d82a12d5d0a3..31d083d11bfc3e9756b41c901c96749a7b8a840a
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18563,6 +18563,10 @@ aarch64_override_options_after_change_1 (struct 
> gcc_options *opts)
>  intermediary step for the former.  */
>   if (flag_mlow_precision_sqrt)
> flag_mrecip_low_precision_sqrt = true;
> +
> +  /* Turn off outline atomics with -mcmodel=large.  */
> +  if (aarch64_cmodel == AARCH64_CMODEL_LARGE)
> +opts->x_aarch64_flag_outline_atomics = 0;
> }
> 
> /* 'Unpack' up the internal tuning structs and update the options
> 
> 



Re: [Fortran, Patch, PR77872, v1] Fix ICE when getting caf-token from abstract class type.

2025-03-04 Thread Andre Vehreschild
Hi Steve,

thanks for the review. Committed as gcc-15-7804-g5bd66483839.

Thanks again,
Andre

On Mon, 3 Mar 2025 12:34:49 -0800
Steve Kargl  wrote:

> On Mon, Mar 03, 2025 at 03:58:24PM +0100, Andre Vehreschild wrote:
> >
> > attached patches fix a 12-regression, when a caf token is requested from an
> > abstract class-typed dummy. The token was not looked up in the correct spot.
> > Due the class typed object getting an artificial variable for direct derived
> > type access, the get_caf_decl was looking at the wrong decl.
> >
> > This patch consists of two parts, the first is just some code complexity
> > reduction, where an existing attr is now used instead of checking for
> > BT_CLASS type and branching.
> >
> > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> >
>
> Thanks.  OK to commit.
>


--
Andre Vehreschild * Email: vehre ad gmx dot de


[Fortran, Patch, PR103391, v1] Fix gimple error on assign to pointer array.

2025-03-04 Thread Andre Vehreschild
Hi all,

attached patch fixes a 12-regression when an assignment to a pointer array is
done. The issue was a missing indirect ref on assign as it was already done for
allocatable arrays.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 27ca62555c4b09349ab33f806b386b485dfe7c8a Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Tue, 4 Mar 2025 12:56:20 +0100
Subject: [PATCH] Fortran: Fix gimplification error on assignment to pointer
 [PR103391]

	PR fortran/103391

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_trans_assignment_1): Do not use poly assign
	for pointer arrays on lhs (as it is done for allocatables
	already).

gcc/testsuite/ChangeLog:

	* gfortran.dg/assign_12.f90: New test.
---
 gcc/fortran/trans-expr.cc   | 16 +++---
 gcc/testsuite/gfortran.dg/assign_12.f90 | 28 +
 2 files changed, 36 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/assign_12.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 0d790b63f95..fbe7333fd71 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -12876,14 +12876,14 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
  needed.  */
   lhs_attr = gfc_expr_attr (expr1);

-  is_poly_assign = (use_vptr_copy || lhs_attr.pointer
-		|| (lhs_attr.allocatable && !lhs_attr.dimension))
-		   && (expr1->ts.type == BT_CLASS
-		   || gfc_is_class_array_ref (expr1, NULL)
-		   || gfc_is_class_scalar_expr (expr1)
-		   || gfc_is_class_array_ref (expr2, NULL)
-		   || gfc_is_class_scalar_expr (expr2))
-		   && lhs_attr.flavor != FL_PROCEDURE;
+  is_poly_assign
+= (use_vptr_copy
+   || ((lhs_attr.pointer || lhs_attr.allocatable) && !lhs_attr.dimension))
+  && (expr1->ts.type == BT_CLASS || gfc_is_class_array_ref (expr1, NULL)
+	  || gfc_is_class_scalar_expr (expr1)
+	  || gfc_is_class_array_ref (expr2, NULL)
+	  || gfc_is_class_scalar_expr (expr2))
+  && lhs_attr.flavor != FL_PROCEDURE;

   assoc_assign = is_assoc_assign (expr1, expr2);

diff --git a/gcc/testsuite/gfortran.dg/assign_12.f90 b/gcc/testsuite/gfortran.dg/assign_12.f90
new file mode 100644
index 000..be31021f24c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/assign_12.f90
@@ -0,0 +1,28 @@
+!{ dg-do run }
+!
+! Check assignment works for derived types to memory referenced by pointer
+! Contributed by G. Steinmetz  
+
+program pr103391
+   type t
+ character(1) :: c
+   end type
+   type t2
+  type(t), pointer :: a(:)
+   end type
+
+   type(t), target :: arr(2)
+   type(t2) :: r
+
+   arr = [t('a'), t('b')]
+
+   r = f([arr])
+   if (any(r%a(:)%c /= ['a', 'b'])) stop 1
+contains
+   function f(x)
+  class(t), intent(in), target :: x(:)
+  type(t2) :: f
+  allocate(f%a(size(x,1)))
+  f%a = x
+   end
+end
--
2.48.1



Re: [Fortran, Patch, PR103391, v1] Fix gimple error on assign to pointer array.

2025-03-04 Thread Paul Richard Thomas
Hi Andre,

You beat me to it! I had just noticed the missing indirect reference. Yes,
this is good for mainline and backporting to 14-branch at very least.

Thanks for the patch. If you want to do the push to mainline, I will do the
backporting if you like? I'll not be back at base for a little while so
there will be a suitable delay.
I have several Steinmetz sourced regression fixes to bring to the list in
the next days.

Cheers

Paul


On Tue, 4 Mar 2025 at 12:02, Andre Vehreschild  wrote:

> Hi all,
>
> attached patch fixes a 12-regression when an assignment to a pointer array
> is
> done. The issue was a missing indirect ref on assign as it was already
> done for
> allocatable arrays.
>
> Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>


Re: [patch, Fortran] Fix PR 119049 and 119074, external prototypes with different arglists

2025-03-04 Thread Harald Anlauf

Hi Thomas,

Am 03.03.25 um 22:50 schrieb Thomas Koenig:

Hello world,

this patch is a bit more complicated than originally envisioned.

The problem was that we were not handling external dummy arguments
with -fc-prototypes-external. In looking at this, I found that we
were not warning about external procedures with different argument
lists.  This can actually be legal (see the two test cases) but
creates a problem for the C prototypes: If we have something like

subroutine foo(a,n)
   external a
   if (n == 1) call a(1)
   if (n == 2) call a(2,3)
end subroutine foo

then, pre-C23, we could just have written out the prototype as

void foo_ (void (*a) (), int *n);

but this is illegal in C23. What to do?  I finally chose to warn
about the argument mismatch, with a new option. Warn only because the
code above is legal, but include in -Wall because such code seems highly
suspect.  This option is also implied in -fc-prototypes-external. I also
put a warning in the generated header file in that case, so users
have a chance to see what is going on (especially since gcc now
defaults to C23).

Regression-tested.

Comments?  Suggestions for better wordings?  Is -Wall too strong,
should this be -Wextra (but then nobody would see it, probably...)?
OK for trunk?

Best regards

 Thomas


aren't we wasting memory here?

@@ -2023,6 +2023,10 @@ typedef struct gfc_symbol
  scope. Used in the suppression of uninitialized warnings in
reallocation
  on assignment.  */
   unsigned allocated_in_scope:1;
+  /* Set if an external dummy argument is called with different
argument lists.
+ This is legal in Fortran, but can cause problems with autogenerated
+ C prototypes for C23.  */
+  unsigned ext_dummy_arglist_mismatch;


I assume you wanted a single bit instead of a full unsigned, i.e.:

+  unsigned ext_dummy_arglist_mismatch:1;

Harald



Break false dependency chains on zen5

2025-03-04 Thread Jan Hubicka
Hi,
Zen5 on some variants has false dependency on tzcnt, blsi, blsr and blsmsk
instructions.  Those can be tested by the following benchmark

jh@shroud:~> cat ee.c
int
main()
{
   int a = 10;
   int b = 0;
   for (int i = 0; i < 10; i++)
   {
#ifdef BREAK
   asm volatile ("xor %0, %0": "=r" (b));
#endif
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   asm volatile (INST " %2, %0": "=r"(b): "0"(b),"r"(a));
   }
   return 0;
}
jh@shroud:~> cat bmk.sh
gcc ee.c -DBREAK -DINST=\"$1\" -O2 ; time ./a.out ; gcc ee.c -DINST=\"$1\" -O2 
; time ./a.out
jh@shroud:~> sh bmk.sh tzcnt

real0m0.886s
user0m0.886s
sys 0m0.000s

real0m0.886s
user0m0.886s
sys 0m0.000s 

jh@shroud:~> sh bmk.sh blsi

real0m0.979s
user0m0.979s
sys 0m0.000s

real0m2.418s
user0m2.418s
sys 0m0.000s

jh@shroud:~> sh bmk.sh blsr

real0m0.986s
user0m0.986s
sys 0m0.000s

real0m2.422s
user0m2.421s
sys 0m0.000s
jh@shroud:~> sh bmk.sh blsmsk

real0m0.973s
user0m0.973s
sys 0m0.000s

real0m2.422s
user0m2.422s
sys 0m0.000s 

So zen5 tested seems to have problem with bls* but not with tzcnt.
I tested zen1-zen4 and they do not seem to be affected.

This can be mitigated by adding a splitter that first clear the
destination register.  We already have runable that controls tzcnt
together with lzcnt and popcnt.  Since it seems that only tzcnt is
affected I added new tunable to control tzcnt only.  I also added
splitters for blsi/blsr/blsmsk implemented analogously to existing
splitter for lzcnt.

The patch is neutral on SPEC. We produce blsi and blsr in some internal
loops, but they usually have same destination as source. However it is
good to break the dependency chain to avoid patogolical cases and it is
quite cheap overall, so I think we want to enable this for generic.  I
will send followup patch for this.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_TZCNT): New macro.
(TARGET_AVOID_FALSE_DEP_FOR_BLS): New macro.
* config/i386/i386.md (*bmi_blsi_): Add splitter for false
dependency.
(*bmi_blsi__ccno): Add splitter for false dependency.
(*bmi_blsi__falsedep): New pattern.
(*bmi_blsmsk_): Add splitter for false dependency.
(*bmi_blsmsk__falsedep): New pattern.
(*bmi_blsr_): Add splitter for false dependency.
(*bmi_blsr__cmp): Add splitter for false dependency
(*bmi_blsr__cmp_falsedep): New pattern.
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT): New 
tune.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BLS): New tune.

gcc/testsuite/ChangeLog:

* gcc.target/i386/blsi.c: New test.
* gcc.target/i386/blsmsk.c: New test.
* gcc.target/i386/blsr.c: New test.

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 40b1aa4e6df..c89f493d180 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -455,6 +455,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 ix86_tune_features[X86_TUNE_ADJUST_UNROLL]
 #define TARGET_AVOID_FALSE_DEP_FOR_BMI \
ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_BMI]
+#define TARGET_AVOID_FALSE_DEP_FOR_TZCNT \
+   ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_TZCNT]
+#define TARGET_AVOID_FALSE_DEP_FOR_BLS \
+   ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_BLS]
 #define TARGET_ONE_IF_CONV_INSN \
ix86_tune_features[X86_TUNE_ONE_IF_CONV_INSN]
 #define TARGET_AVOID_MFENCE ix86_tune_features[X86_TUNE_AVOID_MFENCE]
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8575fbf40fe..b1cd52382a8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20993,7 +20993,8 @@
(ctz:SWI48 (match_dup 1)))]
   "TARGET_BMI"
   "tzcnt{}\t{%1, %0|%0, %1}";
-  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+  "&& (TARGET_AVOID_FALSE_DEP_FOR_BMI || TARGET_AVOID_FALSE_DEP_FOR_TZCNT)
+   && epilogue_completed
&& optimize_function_for_speed_p (cfun)
&& !reg_mentioned_p (operands[0], operands[1])"
   [(parallel
@@ -21060,7 +21061,8 @@
   return "bsf{}\t{%1, %0|%0, %1}";
 }
   "(TARGET_BMI || TARGET_CPU_P (GENERIC))
-   && TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && (TARGET_AVOID_FALSE_DEP_FOR_BMI || TARG

[PATCH] ARM ACLE: add inline definitions for __fma and __fmaf in aarch64 and aarch32 headers

2025-03-04 Thread Ayan Shafqat
Hi GCC team,

This patch introduces inline definitions for the __fma and __fmaf
functions in the ARM ACLE headers for both aarch64 and arm targets. The
new implementations use the built-in functions (__builtin_fma and
__builtin_fmaf) to ensure proper inlining and adherence to the ARM ACLE
requirements[1].

Changes include:
  - In gcc/config/aarch64/arm_acle.h:
  Added inline definitions for __fma and __fmaf.
  - In gcc/config/arm/arm_acle.h:
  Added inline definitions for __fma and __fmaf.

These changes have been tested locally, and I have verified that they
integrate smoothly with the existing ARM backend configurations. Please
let me know if you have any questions or need further modifications.

Thanks,
Ayan

[1] ARM ACLE Document: 
https://arm-software.github.io/acle/main/acle.html#fused-multiply-accumulate-fma

Signed-off-by: Ayan Shafqat 

diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 7976c117daf..d9e2401ea9f 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -129,6 +129,20 @@ __jcvt (double __a)
 
 #pragma GCC pop_options
 
+__extension__ extern __inline double
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__fma (double __x, double __y, double __z)
+{
+  return __builtin_fma (__x, __y, __z);
+}
+
+__extension__ extern __inline float
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__fmaf (float __x, float __y, float __z)
+{
+  return __builtin_fmaf (__x, __y, __z);
+}
+
 #pragma GCC push_options
 #pragma GCC target ("+nothing+frintts")
 __extension__ extern __inline float
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index c6c03fdce27..256710a2c31 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -829,6 +829,20 @@ __crc32cd (uint32_t __a, uint64_t __b)
 #endif /* __ARM_FEATURE_CRC32  */
 #pragma GCC pop_options
 
+__extension__ extern __inline double
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__fma (double __x, double __y, double __z)
+{
+  return __builtin_fma (__x, __y, __z);
+}
+
+__extension__ extern __inline float
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__fmaf (float __x, float __y, float __z)
+{
+  return __builtin_fmaf (__x, __y, __z);
+}
+
 #ifdef __cplusplus
 }
 #endif



Re: [PATCH 1/2] __builtin_bswapXX: improve docs and tests

2025-03-04 Thread Richard Sandiford
Oscar Gustafsson  writes:
> ChangeLog:
>
> * doc/extend.texi: Improve example for __builtin_swap16.
>
> ---
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index ec9bb59900c..83f6e45170b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -16338,7 +16338,7 @@ Returns the first argument raised to the power of
> the second.  Unlike the
>
>  @defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})}
>  Returns @var{x} with the order of the bytes reversed; for example,
> -@code{0xaabb} becomes @code{0xbbaa}.  Byte here always means
> +@code{0xabcd} becomes @code{0xcdab}.  Byte here always means
>  exactly 8 bits.
>  @enddefbuiltin

Thanks for the patch.  I agree this example is clearer, so I've pushed
it to trunk.  I'll push the testsuite update tomorrow if there are no
objections.

Richard


Re: [3/3 PATCH v6]middle-end: delay checking for alignment to load [PR118464]

2025-03-04 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, 4 Mar 2025, Tamar Christina wrote:
>
>> Hi All,
>> 
>> This fixes two PRs on Early break vectorization by delaying the safety 
>> checks to
>> vectorizable_load when the VF, VMAT and vectype are all known.
>> 
>> This patch does add two new restrictions:
>> 
>> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>>group sizes, as they are unaligned every n % 2 iterations and so may cross
>>a page unwittingly.

Sorry for the drive-by comment, but: it might be worth updating the
commit message to say non-power-of-2, rather than uneven.  The patch
uses the right check, but the message made it sound like it didn't.

Thanks,
Richard

>> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization 
>> if
>>we cannot peel for alignment, as the alignment requirement is quite large 
>> at
>>GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
>>don't support it for now.
>> 
>> There are other steps documented inside the code itself so that the reasoning
>> is next to the code.
>> 
>> As a fall-back, when the alignment fails we require partial vector support.
>> 
>> For VLA targets like SVE return element alignment as the desired vector
>> alignment.  This means that the loads are never misaligned and so annoying it
>> won't ever need to peel.
>> 
>> So what I think needs to happen in GCC 16 is that.
>> 
>> 1. during vect_compute_data_ref_alignment we need to take the max of
>>POLY_VALUE_MIN and vector_alignment.
>> 
>> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add 
>> a
>>check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use 
>> as a
>>proxy for pagesize.
>> 
>> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>>vect_determine_partial_vectors_and_peeling since the first iteration has 
>> to
>>be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
>>vectorize.
>> 
>> 4. Create a default mask to be used, so that 
>> vect_use_loop_mask_for_alignment_p
>>becomes true and we generate the peeled check through loop control for
>>partial loops.  From what I can tell this won't work for
>>LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support 
>> at
>>all in the compiler.  That would need to be done independently from the
>>above.
>> 
>> In any case, not GCC 15 material so I've kept the WIP patches I have 
>> downstream.
>> 
>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
>> -m32, -m64 and no issues.
>> 
>> Ok for master?
>
> OK.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Tamar
>> 
>> gcc/ChangeLog:
>> 
>>  PR tree-optimization/118464
>>  PR tree-optimization/116855
>>  * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>>  * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>>  checks.
>>  (vect_compute_data_ref_alignment): Remove alignment checks and move to
>>  get_load_store_type, increase group access alignment.
>>  (vect_enhance_data_refs_alignment): Add note to comment needing
>>  investigating.
>>  (vect_analyze_data_refs_alignment): Likewise.
>>  (vect_supportable_dr_alignment): For group loads look at first DR.
>>  * tree-vect-stmts.cc (get_load_store_type):
>>  Perform safety checks for early break pfa.
>>  * tree-vectorizer.h (dr_set_safe_speculative_read_required,
>>  dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
>>  (need_peeling_for_alignment): Renamed to...
>>  (safe_speculative_read_required): .. This
>>  (class dr_vec_info): Add scalar_access_known_in_bounds.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  PR tree-optimization/118464
>>  PR tree-optimization/116855
>>  * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>>  load type is relaxed later.
>>  * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>>  * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
>>  * gcc.dg/vect/vect-early-break_128.c: Likewise.
>>  * gcc.dg/vect/vect-early-break_26.c: Likewise.
>>  * gcc.dg/vect/vect-early-break_43.c: Likewise.
>>  * gcc.dg/vect/vect-early-break_44.c: Likewise.
>>  * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
>>  * gcc.dg/vect/vect-early-break_7.c: Likewise.
>>  * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>>  * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>>  * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>>  * gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
>>  * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>>  * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>>  * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>>  * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>>  * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>

Re: [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-03-04 Thread Richard Earnshaw (lists)
On 04/03/2025 14:49, Torbjorn SVENSSON wrote:
> 
> 
> On 2025-03-03 18:00, Richard Earnshaw wrote:
>> On 28/02/2025 16:18, Richard Earnshaw wrote:
>>> On 28/02/2025 16:12, Richard Earnshaw wrote:
 On 08/11/2024 18:47, Torbjörn SVENSSON wrote:
> Ok for trunk and releases/gcc-14?
>
> -- 
>
> A long time ago, this test forced -march=armv6.
>
> With -marm, the generated assembler is:
> foo:
>   sub r0, r0, #48
>   cmp r0, #9
>   movhi   r0, #0
>   movls   r0, #1
>   bx  lr
>
> With -mthumb, the generated assembler is:
> foo:
>   subs    r0, r0, #48
>   movs    r2, #9
>   uxtb    r3, r0
>   movs    r0, #0
>   cmp r2, r3
>   adcs    r0, r0, r0
>   uxtb    r0, r0
>   bx  lr
>

 Looking at this code, I think both the uxtb instructions are unnecessary.

 For the first UXTB.  On entry, 'c' is a char (unsigned) so the value is 
 passed by the caller in a 32-bit reg, but range-limited (by the ABI) to 
 values between 0 and 255.
 We subtract 48 from that, so the range now, is from -48 to 207, but we're 
 going to look at this as an unsigned value, so it's effectively 0..207 or 
 UINT_MAX-48..UINT_MAX.
 The UXTB instruction then converts the range so that the range becomes 
 0..255 again (but importantly, the values between UINT_MAX-48 and UINT_MAX 
 are mapped to the range 208..255.  We then do an unsigned comparison 
 between that value and the constant 9 to test whether the result is >= 9.
 Now, importantly, all the values that are in the range 208..255 are >= 9, 
 but were also >= 9 before the UXTB instruction.  In effect, the UXTB is 
 completely redundant.

>>> I said, >= 9 above, but it should be > 9 (hence condition code 'hi'). That 
>>> does affect the rest of the argument.
>>>
>>> R.
>>>
 The second UXTB is even more egregious.  We have 0 in r0 and we then 
 conditionally add 1 to it, so r0 is in the range 0..1.  Zero extending 
 that with a UXTB is therefore clearly pointless.

 So neither of the UXTB instructions should be present, even if generating 
 Thumb1 code.

 I think the failures are, therefore, real and we should look to fix the 
 compiler rather than 'fix' the scope of the test.

 R.

> Require effective-target arm_arm_ok to skip the test for thumb-only
> targets (Cortex-M).
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/arm/unsigned-extend-1.c: Use effective-target
>  arm_arm_ok.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>    gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
>    1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ 
> testsuite/gcc.target/arm/unsigned-extend-1.c
> index 3b4ab048fb0..73f2e1a556d 100644
> --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
> +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
> @@ -1,4 +1,5 @@
>    /* { dg-do compile } */
> +/* { dg-require-effective-target arm_arm_ok } */
>    /* { dg-options "-O2" } */
>    unsigned char foo (unsigned char c)

>>>
>>
>> I've just pushed a patch to eliminate one of the redundant zero-extends.  
>> The other is harder to eliminate, so I've adjusted the test to xfail when 
>> compiling for thumb1.
> 
> Hi Richard,
> 
> Can this be backported to releases/gcc-14 too?
> I can do the backport if you just approve it.
> 
> Kind regards,
> Torbjörn

By all means backport the testsuite change, but not the change to thumb1.md, 
please.

R.


Re: [PATCH][stage1] tree-optimization/119103 - missed overwidening detection for shift

2025-03-04 Thread Richard Sandiford
Richard Biener  writes:
> When vectorizing a shift of u16 data by an amount that's known to
> be less than 16 we currently fail to emit a vector u16 shift.  The
> first reason is that the promotion of the shift amount is hoisted
> only by PRE and that cannot preserve range info, the second reason
> is that pattern detection doesn't use range info when computing
> the precision required for an operation.
>
> The following addresses the first issue by making LIM hoist all
> expressions for the pass that runs right before PRE and the
> second issue by querying ranges for the shift amount.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
>   PR tree-optimization/119103
>   * tree-ssa-loop-im.cc (in_loop_pipeline): Globalize.
>   (compute_invariantness): Override costing when we run
>   right before PRE and PRE is enabled.
>   (pass_lim::execute): Adjust.
>   * tree-vect-patterns.cc (vect_determine_precisions_from_users):
>   For variable shift amounts use range information.
>
>   * gcc.target/i386/pr119103.c: New testcase.

Nice!

> [...]
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 4f0a7ea162b..09408359cf2 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6544,10 +6544,19 @@ vect_determine_precisions_from_users (stmt_vec_info 
> stmt_info, gassign *stmt)
>  case RSHIFT_EXPR:
>{
>   tree shift = gimple_assign_rhs2 (stmt);
> - if (TREE_CODE (shift) != INTEGER_CST
> - || !wi::ltu_p (wi::to_widest (shift), precision))
> + unsigned int const_shift;
> + wide_int min_shift, max_shift;
> + if (TREE_CODE (shift) == SSA_NAME
> + && vect_get_range_info (shift, &min_shift, &max_shift)
> + && wi::ge_p (min_shift, 0, TYPE_SIGN (TREE_TYPE (shift)))
> + && wi::lt_p (max_shift, TYPE_PRECISION (type),
> +  TYPE_SIGN (TREE_TYPE (shift
> +   const_shift = max_shift.to_uhwi ();

Don't we need to use min_shift rather than max_shift for LSHIFT_EXPR's:

/* We need CONST_SHIFT fewer bits of the input.  */
min_input_precision = (MAX (operation_precision, const_shift)
   - const_shift);

?  If so, I guess that means replacing const_shift with a minimum and
a maximum.

Thanks,
Richard

> + else if (TREE_CODE (shift) == INTEGER_CST
> +  && wi::ltu_p (wi::to_widest (shift), precision))
> +   const_shift = TREE_INT_CST_LOW (shift);
> + else
> return;
> - unsigned int const_shift = TREE_INT_CST_LOW (shift);
>   if (code == LSHIFT_EXPR)
> {
>   /* Avoid creating an undefined shift.


[PATCH] middle-end/97323 - TYPE_CANONICAL vs. ARRAY_TYPE modes

2025-03-04 Thread Richard Biener
For strict-alignment targets we can end up with BLKmode single-element
array types when the element type is unaligned.  This confuses
type checking since the canonical type would have an aligned
element type and a non-BLKmode mode.  The following simply ignores
the mode we assign to array types for this purpose, like we already
do for record and union types.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR middle-end/97323
* tree.cc (gimple_canonical_types_compatible_p): Ignore
TYPE_MODE also for ARRAY_TYPE.
(verify_type): Likewise.

* gcc.dg/pr97323.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr97323.c | 5 +
 gcc/tree.cc| 2 ++
 2 files changed, 7 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr97323.c

diff --git a/gcc/testsuite/gcc.dg/pr97323.c b/gcc/testsuite/gcc.dg/pr97323.c
new file mode 100644
index 000..8845a34fa15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97323.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-g" } */
+
+typedef int a __attribute__((aligned(2)));
+a b[1];
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 0743ed71c78..4e855b4b2d3 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -13982,6 +13982,7 @@ gimple_canonical_types_compatible_p (const_tree t1, 
const_tree t2,
  flexible array members, we allow mismatching modes for structures or
  unions.  */
   if (!RECORD_OR_UNION_TYPE_P (t1)
+  && TREE_CODE (t1) != ARRAY_TYPE
   && TYPE_MODE (t1) != TYPE_MODE (t2))
 return false;
 
@@ -14313,6 +14314,7 @@ verify_type (const_tree t)
 flexible array members.  */
   && !RECORD_OR_UNION_TYPE_P (t)
   && !RECORD_OR_UNION_TYPE_P (TYPE_CANONICAL (t))
+  && TREE_CODE (t) != ARRAY_TYPE
   && TYPE_MODE (t) != TYPE_MODE (TYPE_CANONICAL (t)))
 {
   error ("% of % is not compatible");
-- 
2.43.0


Re: [Fortran, Patch, PR103391, v1] Fix gimple error on assign to pointer array.

2025-03-04 Thread Andre Vehreschild
Hi Paul,

thanks for the review. Committed as gcc-15-7812-g04909c7ecc0.

Yes please: you do the backport. Thank you very much.

I am looking at pr104684 at the moment, which is the last regression on my list
(i.e. that I am assigned or cc'ed to). When the regression test for this is
fine, I will continue with the coarray teams stuff. If you find any regressions
I have caused or you don't find the solution for, just add me to cc.

Thanks again,
Andre

On Tue, 4 Mar 2025 13:46:33 +
Paul Richard Thomas  wrote:

> Hi Andre,
>
> You beat me to it! I had just noticed the missing indirect reference. Yes,
> this is good for mainline and backporting to 14-branch at very least.
>
> Thanks for the patch. If you want to do the push to mainline, I will do the
> backporting if you like? I'll not be back at base for a little while so
> there will be a suitable delay.
> I have several Steinmetz sourced regression fixes to bring to the list in
> the next days.
>
> Cheers
>
> Paul
>
>
> On Tue, 4 Mar 2025 at 12:02, Andre Vehreschild  wrote:
>
> > Hi all,
> >
> > attached patch fixes a 12-regression when an assignment to a pointer array
> > is
> > done. The issue was a missing indirect ref on assign as it was already
> > done for
> > allocatable arrays.
> >
> > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> >
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
> >


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH v4] libstdc++: implement constexpr memory algorithms

2025-03-04 Thread Patrick Palka
On Fri, 28 Feb 2025, Giuseppe D'Angelo wrote:

> Hello,
> 
> On 27/02/2025 15:34, Jonathan Wakely wrote:
> > On 26/02/25 17:27 +0100, Giuseppe D'Angelo wrote:
> > > On 26/02/2025 16:33, Giuseppe D'Angelo wrote:
> > > > Whops, sorry, missed this sub-thread (while replying to the other one).
> > > > Change of plans then, I'll amend and remove the ad-hoc constexpr macro.
> > > 
> > > Done, v3 attached.
> > > 
> > > Thanks,
> > > -- 
> > > Giuseppe D'Angelo
> > > 
> > 
> > > From de3751a38330f508be9f08b77136a31481018828 Mon Sep 17 00:00:00 2001
> > > From: Giuseppe D'Angelo 
> > > Date: Sun, 16 Feb 2025 19:37:07 +0100
> > > Subject: [PATCH] libstdc++: implement constexpr memory algorithms
> > > 
> > > This commit adds support for C++26's constexpr specialized memory
> > > algorithms, introduced by P2283R2, P3508R0, P3369R0.
> > > 
> > > The uninitialized_default, value, copy, move and fill algorithms are
> > > affected, in all of their variants (iterator-based, range-based and _n
> > > versions.)
> > > 
> > > The changes are mostly mechanical -- add `constexpr` to a number of
> > > signatures when compiling in C++26 and above modes. The internal helper
> > > guard class for range algorithms instead can be marked unconditionally.
> > > 
> > > The only "real" change to the implementation of the algorithms is that
> > > during constant evaluation I need to dispatch to a constexpr-friendly
> > > version of them.
> > > 
> > > For each algorithm family I've added only one test to cover it and its
> > > variants; the idea is to avoid too much repetition and simplify future
> > > maintenance.
> > 
> > The patch itself looks good, but I have some comments on the ChangeLog
> > part.
> 
> I'm adding a v4 because I discovered that one more tweak was necessary to an
> existing test of the  header synopsis...
> 
> 
>  These are thigns I would have tweaked myself before pushing, but
> > now that you have commit access ...
> > 
> > > libstdc++-v3/ChangeLog:
> > > 
> > >   * include/bits/ranges_uninitialized.h: Mark the specialized
> > >   memory algorithms as constexpr in C++26. Also mark the members
> > >   of the _DestroyGuard helper class.
> > >   * include/bits/stl_uninitialized.h: Ditto.
> > >   * include/bits/stl_construct.h: Mark _Construct_novalue (which
> > >   uses placement new to do default initialization) as constexpr
> > >   in C++26. This is possible due to P2747R2, which GCC already
> > >   implements; other compilers in C++26 modes already implement
> > >   P2448R2, so there should be no issues there.
> > 
> > Per-file ChangeLog entries should say "what, not why" ... for reasons.
> 
> The best reasons :)
> 
> 
> > That makes old GNU-style ChangeLog files sometimes not very useful.
> > But at least we now have Git commit messages where we can put all the
> > detailed background, rationale etc.
> > 
> > So the explanation that it's possible to make it constexpr due to
> > P2448R2 should be above in the free text part of the commit message
> > above, along with the parenthesis saying what the function does.
> > 
> > Also, please mention the changed function by name, i.e.
> > 
> > * include/bits/stl_construct.h (_Construct_novalue): Mark
> > constexpr for C++26.
> > 
> > See
> > https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html
> > for the format.
> > 
> > For the changes in ranges_uninitialized.h and stl_uninitialized.h it's
> > basically "all the functions" so I think it's OK not to name them all.
> > (It would also be OK if you did name them all individually, but I
> > don't think it's necessary.)
> > 
> > >   * include/bits/version.def: Bump the feature-testing macro.
> > 
> > Please mention which macro, e.g. something like:
> > 
> > * include/bits/version.def (raw_memory_algorithms): Bump
> >   value.
> > 
> > >   * include/bits/version.h: Regenerate.
> > 
> > (No need to mention the specific macro by name here, because the whole
> > file was regenerated so this is fine.)
> > 
> > >   * testsuite/20_util/specialized_algorithms/feature_test_macro.cc: New
> > > test.
> > 
> > Please add a line break after the colon here (and the lines below). We
> > try to keep ChangeLog entries below about 78 columns, but for
> > libstdc++ tests that's often impossible because the pathname is
> > already longer than that! But we can still put the "New test." on a
> > new line.
> 
> The contrib/mklog.py script has a maximum column width of 100, should it be
> amended to 78? (I think that's what happened here, the script generated the
> overlong line and I didn't touch it.)
> 
> 
> > 
> > Feel free to push to trunk with those changes to the ChangeLog lines
> > (and add Reviewed-by: tags for me and/or Patrick if you like). If
> > you'd feel more comfortable sending it to the mailing list again for a
> > final check, that's fine too.
> > 
> > Either way, the traditional first commit when you've got write after
> > approval access is to add yourself to the MAINTAINERS file, as per:
> > http

Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])

2025-03-04 Thread Robin Dapp

Yeah I didn't know how to articulate  it (and perhaps this still requires
clarification)

Say we have following

// reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-1.c
 main
    set_frm (4);    // orig global FRM update

    test_float_point_frm_run_1 (op1, op2, vl)
   set_frm (0);
   result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
   assert_equal (1, get_frm ())
   // restore global 4 before returning

    assert_equal (4, get_frm ()  <-- here call restores global

vs.

// reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-5.c
main
  set_frm (1);    // orig global FRM update

  test_float_point_frm_run_1 (op1, op2, vl)
      other_function()
           set_frm (2);    // also global update

  assert_equal (2, get_frm ()    <-- here call doesn't restore

There's no explicit annotation or anything about the FRM, yet in 2nd example
other_function () we consider frm write a global event but not in
test_float_point_frm_run_1 () in 1st example: I'm missing words how to explain
that :-)


AFAIR the general idea is that every function can potentially "clobber", i.e. 
set FRM globally.


The intent of the first test is to save and restore the global rounding mode 
because we modify it locally by a rounding-mode changing intrinsic.


However, as you note, the set_frm at the beginning of 
test_float_point_frm_run_1 has no effect.  What should intuitively(?) happen is 
that it sets a new global FRM that we restore upon exit.  We, however, restore 
the previous global FRM.  I believe that's because set_frm is an inline 
assembly, inlined into test_float_point_frm_run_1 and we expect inline 
assemblies to _not_ change the rounding mode.  That was one of my concerns 
during the review some time in 2023.  If set_frm were a non-inlined function 
we'd need to assume it changes the rounding mode and restore the value it has 
after the set_frm call.


And that's also what is different in frm-run-5.c (your second snippet).  We 
wrap the set_frm call in another non-inlinable call.  Thus, we need to assume 
the global rounding mode changes and save/restore.


I guess it was a deliberate decision to handle asm snippets different.  At 
least as long as they are in the same visible scope it's obvious when they're 
changing the rounding mode.  One could argue that users doing that are on their 
own.  Not sure that argument has worked very well, historically, though ;)


--
Regards
Robin



Re: [3/3 PATCH v6]middle-end: delay checking for alignment to load [PR118464]

2025-03-04 Thread Richard Biener
On Tue, 4 Mar 2025, Tamar Christina wrote:

> Hi All,
> 
> This fixes two PRs on Early break vectorization by delaying the safety checks 
> to
> vectorizable_load when the VF, VMAT and vectype are all known.
> 
> This patch does add two new restrictions:
> 
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>group sizes, as they are unaligned every n % 2 iterations and so may cross
>a page unwittingly.
> 
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization 
> if
>we cannot peel for alignment, as the alignment requirement is quite large 
> at
>GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
>don't support it for now.
> 
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
> 
> As a fall-back, when the alignment fails we require partial vector support.
> 
> For VLA targets like SVE return element alignment as the desired vector
> alignment.  This means that the loads are never misaligned and so annoying it
> won't ever need to peel.
> 
> So what I think needs to happen in GCC 16 is that.
> 
> 1. during vect_compute_data_ref_alignment we need to take the max of
>POLY_VALUE_MIN and vector_alignment.
> 
> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
>check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use 
> as a
>proxy for pagesize.
> 
> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>vect_determine_partial_vectors_and_peeling since the first iteration has to
>be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
>vectorize.
> 
> 4. Create a default mask to be used, so that 
> vect_use_loop_mask_for_alignment_p
>becomes true and we generate the peeled check through loop control for
>partial loops.  From what I can tell this won't work for
>LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
>all in the compiler.  That would need to be done independently from the
>above.
> 
> In any case, not GCC 15 material so I've kept the WIP patches I have 
> downstream.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/118464
>   PR tree-optimization/116855
>   * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>   checks.
>   (vect_compute_data_ref_alignment): Remove alignment checks and move to
>   get_load_store_type, increase group access alignment.
>   (vect_enhance_data_refs_alignment): Add note to comment needing
>   investigating.
>   (vect_analyze_data_refs_alignment): Likewise.
>   (vect_supportable_dr_alignment): For group loads look at first DR.
>   * tree-vect-stmts.cc (get_load_store_type):
>   Perform safety checks for early break pfa.
>   * tree-vectorizer.h (dr_set_safe_speculative_read_required,
>   dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
>   (need_peeling_for_alignment): Renamed to...
>   (safe_speculative_read_required): .. This
>   (class dr_vec_info): Add scalar_access_known_in_bounds.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/118464
>   PR tree-optimization/116855
>   * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>   load type is relaxed later.
>   * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>   * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
>   * gcc.dg/vect/vect-early-break_128.c: Likewise.
>   * gcc.dg/vect/vect-early-break_26.c: Likewise.
>   * gcc.dg/vect/vect-early-break_43.c: Likewise.
>   * gcc.dg/vect/vect-early-break_44.c: Likewise.
>   * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
>   * gcc.dg/vect/vect-early-break_7.c: Likewise.
>   * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>   * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>   * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
>   * gcc

[PATCH][stage1] tree-optimization/119103 - missed overwidening detection for shift

2025-03-04 Thread Richard Biener
When vectorizing a shift of u16 data by an amount that's known to
be less than 16 we currently fail to emit a vector u16 shift.  The
first reason is that the promotion of the shift amount is hoisted
only by PRE and that cannot preserve range info, the second reason
is that pattern detection doesn't use range info when computing
the precision required for an operation.

The following addresses the first issue by making LIM hoist all
expressions for the pass that runs right before PRE and the
second issue by querying ranges for the shift amount.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

PR tree-optimization/119103
* tree-ssa-loop-im.cc (in_loop_pipeline): Globalize.
(compute_invariantness): Override costing when we run
right before PRE and PRE is enabled.
(pass_lim::execute): Adjust.
* tree-vect-patterns.cc (vect_determine_precisions_from_users):
For variable shift amounts use range information.

* gcc.target/i386/pr119103.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr119103.c | 13 +
 gcc/tree-ssa-loop-im.cc  | 10 --
 gcc/tree-vect-patterns.cc| 15 ---
 3 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr119103.c

diff --git a/gcc/testsuite/gcc.target/i386/pr119103.c 
b/gcc/testsuite/gcc.target/i386/pr119103.c
new file mode 100644
index 000..57210dc3bbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr119103.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+void lshift(unsigned short *x, unsigned char amount)
+{
+  if (amount > 15)
+__builtin_unreachable();
+
+  for (int i = 0; i < 16; i++)
+x[i] <<= amount;
+}
+
+/* { dg-final { scan-assembler "vpsllw" } } */
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 225964c6215..a3ca5af3e3e 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -143,6 +143,8 @@ public:
   different modes.  */
 };
 
+static bool in_loop_pipeline;
+
 /* We use six bits per loop in the ref->dep_loop bitmap to record
the dep_kind x dep_state combinations.  */
 
@@ -1239,7 +1241,11 @@ compute_invariantness (basic_block bb)
   lim_data->cost);
}
 
-  if (lim_data->cost >= LIM_EXPENSIVE)
+  if (lim_data->cost >= LIM_EXPENSIVE
+ /* When we run before PRE and PRE is active hoist all expressions
+since PRE would do so anyway and we can preserve range info
+but PRE cannot.  */
+ || (flag_tree_pre && !in_loop_pipeline))
set_profitable_level (stmt);
 }
 }
@@ -3759,7 +3765,7 @@ public:
 unsigned int
 pass_lim::execute (function *fun)
 {
-  bool in_loop_pipeline = scev_initialized_p ();
+  in_loop_pipeline = scev_initialized_p ();
   if (!in_loop_pipeline)
 loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
 
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4f0a7ea162b..09408359cf2 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6544,10 +6544,19 @@ vect_determine_precisions_from_users (stmt_vec_info 
stmt_info, gassign *stmt)
 case RSHIFT_EXPR:
   {
tree shift = gimple_assign_rhs2 (stmt);
-   if (TREE_CODE (shift) != INTEGER_CST
-   || !wi::ltu_p (wi::to_widest (shift), precision))
+   unsigned int const_shift;
+   wide_int min_shift, max_shift;
+   if (TREE_CODE (shift) == SSA_NAME
+   && vect_get_range_info (shift, &min_shift, &max_shift)
+   && wi::ge_p (min_shift, 0, TYPE_SIGN (TREE_TYPE (shift)))
+   && wi::lt_p (max_shift, TYPE_PRECISION (type),
+TYPE_SIGN (TREE_TYPE (shift
+ const_shift = max_shift.to_uhwi ();
+   else if (TREE_CODE (shift) == INTEGER_CST
+&& wi::ltu_p (wi::to_widest (shift), precision))
+ const_shift = TREE_INT_CST_LOW (shift);
+   else
  return;
-   unsigned int const_shift = TREE_INT_CST_LOW (shift);
if (code == LSHIFT_EXPR)
  {
/* Avoid creating an undefined shift.
-- 
2.43.0


[Fortran, Patch, PR104684, v1] Fix gimple error when pointer assigning alloc array.

2025-03-04 Thread Andre Vehreschild
Hi all,

attached patch fixes a gimplification fault when a pointer assignment to an
allocatable array is done. The tree type is different, because of that flag in
the lang_specific structure. View-converting the type fixes the issue.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 3acd5266f70c29d6b2b3078e122f61f6537b602d Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Tue, 4 Mar 2025 17:06:31 +0100
Subject: [PATCH] Fortran: Add view convert to pointer assign when only
 pointer/alloc attr differs [PR104684]

	PR fortran/104684

gcc/fortran/ChangeLog:

	* trans-array.cc (gfc_conv_expr_descriptor): Look at the
	lang-specific akind and do a view convert when only the akind
	attribute differs between pointer and allocatable array.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/ptr_comp_6.f08: New test.
---
 gcc/fortran/trans-array.cc| 10 +++-
 .../gfortran.dg/coarray/ptr_comp_6.f08| 25 +++
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 6a00d26cb2f..925030465ac 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -8186,8 +8186,16 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
 	{
 	  if (se->direct_byref && !se->byref_noassign)
 	{
+	  struct lang_type *lhs_ls
+		= TYPE_LANG_SPECIFIC (TREE_TYPE (se->expr)),
+		*rhs_ls = TYPE_LANG_SPECIFIC (TREE_TYPE (desc));
+	  /* When only the array_kind differs, do a view_convert.  */
+	  tmp = lhs_ls && rhs_ls && lhs_ls->rank == rhs_ls->rank
+			&& lhs_ls->akind != rhs_ls->akind
+		  ? build1 (VIEW_CONVERT_EXPR, TREE_TYPE (se->expr), desc)
+		  : desc;
 	  /* Copy the descriptor for pointer assignments.  */
-	  gfc_add_modify (&se->pre, se->expr, desc);
+	  gfc_add_modify (&se->pre, se->expr, tmp);

 	  /* Add any offsets from subreferences.  */
 	  gfc_get_dataptr_offset (&se->pre, se->expr, desc, NULL_TREE,
diff --git a/gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08 b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08
new file mode 100644
index 000..397a09bc8bc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_6.f08
@@ -0,0 +1,25 @@
+!{ dg-do run }
+!
+! Contributed by Arseny Solokha  
+
+program pr104684
+  type :: index_map
+integer, allocatable :: send_index(:)
+  end type
+  type(index_map) :: imap
+
+  imap%send_index = [5,4,3]
+  call sub(imap)
+contains
+  subroutine sub(this)
+type(index_map), intent(inout), target :: this
+type :: box
+  integer, pointer :: array(:)
+end type
+type(box), allocatable :: buffer[:]
+allocate(buffer[*])
+buffer%array => this%send_index
+if (any(buffer%array /= [5,4,3])) stop 1
+  end subroutine
+end program
+
--
2.48.1



Re: [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-03-04 Thread Torbjorn SVENSSON




On 2025-03-04 16:44, Richard Earnshaw (lists) wrote:

On 04/03/2025 14:49, Torbjorn SVENSSON wrote:



On 2025-03-03 18:00, Richard Earnshaw wrote:

On 28/02/2025 16:18, Richard Earnshaw wrote:

On 28/02/2025 16:12, Richard Earnshaw wrote:

On 08/11/2024 18:47, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

--

A long time ago, this test forced -march=armv6.

With -marm, the generated assembler is:
foo:
   sub r0, r0, #48
   cmp r0, #9
   movhi   r0, #0
   movls   r0, #1
   bx  lr

With -mthumb, the generated assembler is:
foo:
   subs    r0, r0, #48
   movs    r2, #9
   uxtb    r3, r0
   movs    r0, #0
   cmp r2, r3
   adcs    r0, r0, r0
   uxtb    r0, r0
   bx  lr



Looking at this code, I think both the uxtb instructions are unnecessary.

For the first UXTB.  On entry, 'c' is a char (unsigned) so the value is passed 
by the caller in a 32-bit reg, but range-limited (by the ABI) to values between 
0 and 255.
We subtract 48 from that, so the range now, is from -48 to 207, but we're going 
to look at this as an unsigned value, so it's effectively 0..207 or 
UINT_MAX-48..UINT_MAX.
The UXTB instruction then converts the range so that the range becomes 0..255 
again (but importantly, the values between UINT_MAX-48 and UINT_MAX are mapped to 
the range 208..255.  We then do an unsigned comparison between that value and the 
constant 9 to test whether the result is >= 9.
Now, importantly, all the values that are in the range 208..255 are >= 9, but were 
also >= 9 before the UXTB instruction.  In effect, the UXTB is completely 
redundant.


I said, >= 9 above, but it should be > 9 (hence condition code 'hi'). That does 
affect the rest of the argument.

R.


The second UXTB is even more egregious.  We have 0 in r0 and we then 
conditionally add 1 to it, so r0 is in the range 0..1.  Zero extending that 
with a UXTB is therefore clearly pointless.

So neither of the UXTB instructions should be present, even if generating 
Thumb1 code.

I think the failures are, therefore, real and we should look to fix the 
compiler rather than 'fix' the scope of the test.

R.


Require effective-target arm_arm_ok to skip the test for thumb-only
targets (Cortex-M).

gcc/testsuite/ChangeLog:

  * gcc.target/arm/unsigned-extend-1.c: Use effective-target
  arm_arm_ok.

Signed-off-by: Torbjörn SVENSSON 
---
    gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
    1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ 
testsuite/gcc.target/arm/unsigned-extend-1.c
index 3b4ab048fb0..73f2e1a556d 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
@@ -1,4 +1,5 @@
    /* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
    /* { dg-options "-O2" } */
    unsigned char foo (unsigned char c)






I've just pushed a patch to eliminate one of the redundant zero-extends.  The 
other is harder to eliminate, so I've adjusted the test to xfail when compiling 
for thumb1.


Hi Richard,

Can this be backported to releases/gcc-14 too?
I can do the backport if you just approve it.

Kind regards,
Torbjörn


By all means backport the testsuite change, but not the change to thumb1.md, 
please.


Partial backport done in r14.2.0-853-g7fb1d7bff18.

Kind regards,
Torbjörn


Re: The COBOL front end, version 3, now in 14 easy pieces

2025-03-04 Thread James K. Lowden
On Tue, 4 Mar 2025 00:08:16 +0100
Jakub Jelinek  wrote:

> On Mon, Mar 03, 2025 at 05:21:38PM -0500, James K. Lowden wrote:
> > However IMO, the incantation:
> > 
> > make install DESTDIR=/foo
> > 
> > is invalid.  The compiler's library search path is fixed when the
> > compiler is built, based on configure options.  Installing into an
> > arbitrary directory cannot work; there is no opportunity then to
> > alter the gcobol binary.
> 
> This is how most distros install stuff for their packaging systems,
> make install DESTDIR=/some/directory
> and then everything under /some/directory
> is packaged into the package.
> GCC is generally relocatable, the compiler driver should find the
> compiler and library/include directories etc. relative to where the
> driver resides in the filesystem.

Thank you for taking the time to explain that, Jakub.  I stand
corrected, and I believe you'll find that DESTDIR works as intended
using the current cobol-patched branch.  It works here for me.  

Somehow, despite reading the automake documentation over and over
these last several days while we struggled to bring the libgcobol
build process into the 20th [sic] century, I didn't put 2 + 2 together.
Looking at it now, I see that by using automake we get DESTDIR for free,
provided (as we do) that we use libtool for installation.  

In short, despite not trying to support DESTDIR, we do anyway, by happy
accident.  And we are now better informed.  

Regards, 

--jkl


Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Wilco Dijkstra
Hi Kyrill,

> This restriction should be documented in invoke.texi IMO.
> I also think it would be more user friendly to warn them about the 
> incompatibility if an explicit -moutline-atomics option is passed.
> It’s okay though to silently turn off the implicit default-on option though.

I've updated the documentation for -mcmodel=large so it's a little closer to 
reality.
I've also added the same check as for other incompatible options like -fPIC - 
this
means with explicit use of outline atomics you get an error immediately instead 
of
an unclear linker error.

Cheers,
Wilco


v2: Update docs, report incompatible use of -mcmodel=large and outline atomics.

Outline atomics is not designed to be used with -mcmodel=large, so disable
it automatically if the large code model is used.  Report incompatible
explicit use of outline atomics with large model.  Update documentation.

gcc:
PR target/112465
* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Turn off outline atomics with -mcmodel=large.
(initialize_aarch64_code_model): Report incompatible use of large model
and outline atomics.
* doc/invoke.texi (-mcmodel=large): Update description.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
fe76730b0a7c8a2baaae24152e13d82a12d5d0a3..8bf0c011309d1f603137556006297dae5f009a47
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18563,6 +18563,10 @@ aarch64_override_options_after_change_1 (struct 
gcc_options *opts)
  intermediary step for the former.  */
   if (flag_mlow_precision_sqrt)
 flag_mrecip_low_precision_sqrt = true;
+
+  /* Turn off outline atomics with -mcmodel=large.  */
+  if (aarch64_cmodel == AARCH64_CMODEL_LARGE)
+opts->x_aarch64_flag_outline_atomics = 0;
 }
 
 /* 'Unpack' up the internal tuning structs and update the options
@@ -19150,6 +19154,8 @@ initialize_aarch64_code_model (struct gcc_options *opts)
   opts->x_flag_pic > 1 ? "PIC" : "pic");
   if (opts->x_aarch64_abi == AARCH64_ABI_ILP32)
sorry ("code model %qs not supported in ilp32 mode", "large");
+  if (opts->x_aarch64_flag_outline_atomics == 1)
+   sorry ("code model %qs does not support %<-moutline-atomics%>", 
"large");
   break;
 case AARCH64_CMODEL_TINY_PIC:
 case AARCH64_CMODEL_SMALL_PIC:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
bad49a017cc1bb0922eba9f0b2db80166d409cd7..7352da7724a60efdbdbb6b743b1f7fef258f4bc8
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21512,10 +21512,12 @@ dynamically linked.  This is the default code model.
 
 @opindex mcmodel=large
 @item -mcmodel=large
-Generate code for the large code model.  This makes no assumptions about
-addresses and sizes of sections.  Programs can be statically linked only.  The
+Generate code for the large code model.  This allows large .bss and .data
+sections, however .text and .rodata must still be < 2GiB in total.  Position
+independent code or statically linked libraries are not supported.  The
 @option{-mcmodel=large} option is incompatible with @option{-mabi=ilp32},
-@option{-fpic} and @option{-fPIC}.
+@option{-fpie}, @option{-fPIE}, @option{-fpic}, @option{-fPIC},
+@option{-static}, @option{-moutline-atomics}.
 
 @item -mtp=@var{name}
 @opindex mtp



Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Ramana Radhakrishnan
On Tue, Mar 4, 2025 at 11:47 PM Wilco Dijkstra  wrote:
>
> Hi Kyrill,
>
> > This restriction should be documented in invoke.texi IMO.
> > I also think it would be more user friendly to warn them about the 
> > incompatibility if an explicit -moutline-atomics option is passed.
> > It’s okay though to silently turn off the implicit default-on option though.
>
> I've updated the documentation for -mcmodel=large so it's a little closer to 
> reality.
> I've also added the same check as for other incompatible options like -fPIC - 
> this
> means with explicit use of outline atomics you get an error immediately 
> instead of
> an unclear linker error.
>
> Cheers,
> Wilco
>
>
> v2: Update docs, report incompatible use of -mcmodel=large and outline 
> atomics.
>
> Outline atomics is not designed to be used with -mcmodel=large, so disable
> it automatically if the large code model is used.  Report incompatible
> explicit use of outline atomics with large model.  Update documentation.
>
> gcc:
> PR target/112465
> * config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
> Turn off outline atomics with -mcmodel=large.
> (initialize_aarch64_code_model): Report incompatible use of large 
> model
> and outline atomics.
> * doc/invoke.texi (-mcmodel=large): Update description.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> fe76730b0a7c8a2baaae24152e13d82a12d5d0a3..8bf0c011309d1f603137556006297dae5f009a47
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18563,6 +18563,10 @@ aarch64_override_options_after_change_1 (struct 
> gcc_options *opts)
>   intermediary step for the former.  */
>if (flag_mlow_precision_sqrt)
>  flag_mrecip_low_precision_sqrt = true;
> +
> +  /* Turn off outline atomics with -mcmodel=large.  */
> +  if (aarch64_cmodel == AARCH64_CMODEL_LARGE)
> +opts->x_aarch64_flag_outline_atomics = 0;
>  }
>
>  /* 'Unpack' up the internal tuning structs and update the options
> @@ -19150,6 +19154,8 @@ initialize_aarch64_code_model (struct gcc_options 
> *opts)
>opts->x_flag_pic > 1 ? "PIC" : "pic");
>if (opts->x_aarch64_abi == AARCH64_ABI_ILP32)
> sorry ("code model %qs not supported in ilp32 mode", "large");
> +  if (opts->x_aarch64_flag_outline_atomics == 1)
> +   sorry ("code model %qs does not support %<-moutline-atomics%>", 
> "large");
>break;
>  case AARCH64_CMODEL_TINY_PIC:
>  case AARCH64_CMODEL_SMALL_PIC:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> bad49a017cc1bb0922eba9f0b2db80166d409cd7..7352da7724a60efdbdbb6b743b1f7fef258f4bc8
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21512,10 +21512,12 @@ dynamically linked.  This is the default code model.
>
>  @opindex mcmodel=large
>  @item -mcmodel=large
> -Generate code for the large code model.  This makes no assumptions about
> -addresses and sizes of sections.  Programs can be statically linked only.  
> The
> +Generate code for the large code model.  This allows large .bss and .data
> +sections, however .text and .rodata must still be < 2GiB in total.  Position
> +independent code or statically linked libraries are not supported.  The
>  @option{-mcmodel=large} option is incompatible with @option{-mabi=ilp32},
> -@option{-fpic} and @option{-fPIC}.
> +@option{-fpie}, @option{-fPIE}, @option{-fpic}, @option{-fPIC},
> +@option{-static}, @option{-moutline-atomics}.

What's the issue with -static ?


Ramana

>
>  @item -mtp=@var{name}
>  @opindex mtp
>


Re: [Fortran, Patch, PR104684, v1] Fix gimple error when pointer assigning alloc array.

2025-03-04 Thread Harald Anlauf

Hi Andre,

Am 04.03.25 um 17:11 schrieb Andre Vehreschild:

Hi all,

attached patch fixes a gimplification fault when a pointer assignment to an
allocatable array is done. The tree type is different, because of that flag in
the lang_specific structure. View-converting the type fixes the issue.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?


this LGTM.

I am wondering if this kind of fix also helps with pr107143,
where pointer remapping is used, but your code is not reached
in that situation.

Thanks for the patch!

Harald


Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de




Re: [PATCH] c++: disable -Wnonnull in unevaluated context [PR115580]

2025-03-04 Thread Jason Merrill

On 3/4/25 3:26 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR complains that we issue a -Wnonnull even in a decltype.
Since we can't use cp_unevaluated_operand in c-common.cc, this


Would it make sense to check c_inhibit_evaluation_warnings instead?


fix targets even -Wformat and -Wrestrict.  I think that's fine.

PR c++/115580

gcc/cp/ChangeLog:

* call.cc (build_over_call): Don't call check_function_arguments
when cp_unevaluated_operand.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wnonnull16.C: New test.
---
  gcc/cp/call.cc |  1 +
  gcc/testsuite/g++.dg/warn/Wnonnull16.C | 16 
  2 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wnonnull16.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index be9b0cf62f1..87ca02507cf 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10669,6 +10669,7 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
  
bool warned_p = false;

if ((complain & tf_warning)
+  && !cp_unevaluated_operand
&& (warn_nonnull
  || warn_format
  || warn_suggest_attribute_format
diff --git a/gcc/testsuite/g++.dg/warn/Wnonnull16.C 
b/gcc/testsuite/g++.dg/warn/Wnonnull16.C
new file mode 100644
index 000..8740f351ac7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wnonnull16.C
@@ -0,0 +1,16 @@
+// PR c++/115580
+// { dg-do compile { target c++11 } }
+
+class WithMember {
+public:
+  int foo();
+};
+
+decltype(((WithMember*)nullptr)->foo()) footype; // { dg-bogus "pointer is 
null" }
+
+int f(void*) __attribute__((nonnull));
+
+void g()
+{
+  [[maybe_unused]] decltype(f(nullptr)) b; // { dg-bogus "non-null" }
+}

base-commit: 879fd9c822633ecf2c62471d1a7f9b9619e296b7




Re: [PATCH] c++: ICE with operator new[] in constexpr [PR118775]

2025-03-04 Thread Jason Merrill

On 2/11/25 6:24 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we ICE since r11-7740 because we no longer say that (long)&a
(where a is a global var) is non_constant_p.  So VERIFY_CONSTANT
does not return and we crash on tree_to_uhwi.  We should check
tree_fits_uhwi_p before calling tree_to_uhwi.

PR c++/118775

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Check tree_fits_uhwi_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-new24.C: New test.
* g++.dg/cpp2a/constexpr-new25.C: New test.
---
  gcc/cp/constexpr.cc  |  7 +
  gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C | 25 ++
  gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C | 27 
  3 files changed, 59 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f142dd32bc8..f8f9a9df1a2 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2909,6 +2909,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  gcc_assert (arg0);
  if (new_op_p)
{
+ if (!tree_fits_uhwi_p (arg0))
+   {
+ if (!ctx->quiet)
+   error_at (loc, "cannot allocate array: size too large");


"too large" seems misleading in this case, where it just isn't a 
compile-time constant.


Why didn't the VERIFY_CONSTANT just above already reject this?


+ *non_constant_p = true;
+ return t;
+   }
  tree type = build_array_type_nelts (char_type_node,
  tree_to_uhwi (arg0));
  tree var = build_decl (loc, VAR_DECL,
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
new file mode 100644
index 000..debb7f0f5c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
@@ -0,0 +1,25 @@
+// PR c++/118775
+// { dg-do compile { target c++20 } }
+
+int a;
+
+constexpr char *
+f1 ()
+{
+  constexpr auto p = new char[(long int) &a]; // { dg-error "size too large" }
+  return p;
+}
+
+constexpr char *
+f2 ()
+{
+  auto p = new char[(long int) &a];  // { dg-error "size too large" }
+  return p;
+}
+
+void
+g ()
+{
+  auto r1 = f2 ();
+  constexpr auto r2 = f2 (); // { dg-message "in .constexpr. expansion" }
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C
new file mode 100644
index 000..91c0318abd8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C
@@ -0,0 +1,27 @@
+// PR c++/118775
+// { dg-do compile { target c++20 } }
+
+namespace std {
+struct __uniq_ptr_impl {
+  constexpr __uniq_ptr_impl(char *) {}
+};
+template  struct unique_ptr {
+  __uniq_ptr_impl _M_t;
+  constexpr ~unique_ptr() {}
+};
+template  struct _MakeUniq;
+template  struct _MakeUniq<_Tp[]> {
+  typedef unique_ptr<_Tp[]> __array;
+};
+template  using __unique_ptr_array_t = _MakeUniq<_Tp>::__array;
+constexpr __unique_ptr_array_t make_unique(long __num) {
+  return unique_ptr(new char[__num]);
+}
+} // namespace std
+int a;
+int
+main ()
+{
+  std::unique_ptr p = std::make_unique((long)&a);
+  constexpr std::unique_ptr p2 = std::make_unique((long)&a); // { dg-error 
"conversion" }
+}

base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2




Re: [PATCH v2 3/5][STAGE1] ctf: translate annotation DIEs to internal ctf

2025-03-04 Thread Yonghong Song




On 2/24/25 1:55 PM, David Faust wrote:


On 2/24/25 11:04, Indu Bhagat wrote:

On 2/6/25 11:54 AM, David Faust wrote:

Translate DW_TAG_GNU_annotation DIEs created for C attributes
btf_decl_tag and btf_type_tag into an in-memory representation in the
CTF/BTF container.  They will be output in BTF as BTF_KIND_DECL_TAG and
BTF_KIND_TYPE_TAG records.

The new CTF kinds used to represent these annotations, CTF_K_DECL_TAG
and CTF_K_TYPE_TAG, are expected to be formalized in the next version of
the CTF specification.  For now they only exist in memory as a
translation step to BTF, and are not emitted when generating CTF
information.


The patch overall looks OK to me.  Some comments inlined below.


David, any progress with this? Looks like the patch is in pretty good
shape. Let us push hard so the patch can be merged soon?

Thanks,
Yonghong



Re: [PATCH] c++: ICE with operator new[] in constexpr [PR118775]

2025-03-04 Thread Marek Polacek
Ping.

On Tue, Feb 11, 2025 at 06:24:32PM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> Here we ICE since r11-7740 because we no longer say that (long)&a
> (where a is a global var) is non_constant_p.  So VERIFY_CONSTANT
> does not return and we crash on tree_to_uhwi.  We should check
> tree_fits_uhwi_p before calling tree_to_uhwi.
> 
>   PR c++/118775
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (cxx_eval_call_expression): Check tree_fits_uhwi_p.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/constexpr-new24.C: New test.
>   * g++.dg/cpp2a/constexpr-new25.C: New test.
> ---
>  gcc/cp/constexpr.cc  |  7 +
>  gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C | 25 ++
>  gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C | 27 
>  3 files changed, 59 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index f142dd32bc8..f8f9a9df1a2 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -2909,6 +2909,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
> gcc_assert (arg0);
> if (new_op_p)
>   {
> +   if (!tree_fits_uhwi_p (arg0))
> + {
> +   if (!ctx->quiet)
> + error_at (loc, "cannot allocate array: size too large");
> +   *non_constant_p = true;
> +   return t;
> + }
> tree type = build_array_type_nelts (char_type_node,
> tree_to_uhwi (arg0));
> tree var = build_decl (loc, VAR_DECL,
> diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C 
> b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
> new file mode 100644
> index 000..debb7f0f5c4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new24.C
> @@ -0,0 +1,25 @@
> +// PR c++/118775
> +// { dg-do compile { target c++20 } }
> +
> +int a;
> +
> +constexpr char *
> +f1 ()
> +{
> +  constexpr auto p = new char[(long int) &a]; // { dg-error "size too large" 
> }
> +  return p;
> +}
> +
> +constexpr char *
> +f2 ()
> +{
> +  auto p = new char[(long int) &a];  // { dg-error "size too large" }
> +  return p;
> +}
> +
> +void
> +g ()
> +{
> +  auto r1 = f2 ();
> +  constexpr auto r2 = f2 (); // { dg-message "in .constexpr. expansion" }
> +}
> diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C 
> b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C
> new file mode 100644
> index 000..91c0318abd8
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-new25.C
> @@ -0,0 +1,27 @@
> +// PR c++/118775
> +// { dg-do compile { target c++20 } }
> +
> +namespace std {
> +struct __uniq_ptr_impl {
> +  constexpr __uniq_ptr_impl(char *) {}
> +};
> +template  struct unique_ptr {
> +  __uniq_ptr_impl _M_t;
> +  constexpr ~unique_ptr() {}
> +};
> +template  struct _MakeUniq;
> +template  struct _MakeUniq<_Tp[]> {
> +  typedef unique_ptr<_Tp[]> __array;
> +};
> +template  using __unique_ptr_array_t = _MakeUniq<_Tp>::__array;
> +constexpr __unique_ptr_array_t make_unique(long __num) {
> +  return unique_ptr(new char[__num]);
> +}
> +} // namespace std
> +int a;
> +int
> +main ()
> +{
> +  std::unique_ptr p = std::make_unique((long)&a);
> +  constexpr std::unique_ptr p2 = std::make_unique((long)&a); // { dg-error 
> "conversion" }
> +}
> 
> base-commit: 299a8e2dc667e795991bc439d2cad5ea5bd379e2
> -- 
> 2.48.1
> 

Marek



Re: [PATCH] c++: Check invalid use of constrained auto with trailing return type [PR100589]

2025-03-04 Thread Patrick Palka
On Sun, 2 Mar 2025, xxie-xd wrote:

> * Patrick Palka  [2025-02-28 10:37:37]:
> 
> > Hi,
> > 
> > On Fri, 28 Feb 2025, Da Xie wrote:
> > 
> > > This bug comes from a missing check when a function declaration has a
> > > late-specified return type and a constrained auto type specifier. By
> > > adding such a check, we can catch such uses and diagnose them as errors.
> > > 
> > > Successfully bootstrapped and regretested on x86_64-pc-linux-gnu:
> > > adds 6 new test results and 4 UNSUPPORTED results to g++.sum.  The 4
> > > UNSUPPORTED results are due to the fact that concepts are not supported
> > > before c++20, so tests cannot pass with c++17 and lower standards.
> > > 
> > >   PR c++/100589
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * decl.cc (grokdeclarator): Issue an error for a declarator with
> > >   constrained auto type specifier and trailing return types.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/other/pr100589.C: New test.
> > 
> > Thanks for the patch!  Looks good to me overall, some minor comments
> > below.
> > 
> 
> Thank you for the review! I'll make changes to the patch based on your
> advice.
> 
> > > 
> > > Signed-off-by: Da Xie 
> > > ---
> > >  gcc/cp/decl.cc| 7 +++
> > >  gcc/testsuite/g++.dg/other/pr100589.C | 7 +++
> > >  2 files changed, 14 insertions(+)
> > >  create mode 100644 gcc/testsuite/g++.dg/other/pr100589.C
> > > 
> > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > > index 9ca8c6c4481..f283ed996a2 100644
> > > --- a/gcc/cp/decl.cc
> > > +++ b/gcc/cp/decl.cc
> > > @@ -14037,6 +14037,13 @@ grokdeclarator (const cp_declarator *declarator,
> > >   "invalid use of %");
> > >   return error_mark_node;
> > > }
> > > + else if (is_constrained_auto (type))
> > > +   {
> > > + error_at (typespec_loc, "invalid use of "
> > > +   "constrained auto type specifier "
> > > +   "in function with trailing return type");
> > 
> > We should quote the 'auto' in this diagnostic by writing %
> > for consistency with the other diagnostics.
> > 
> 
> I have added an if statement to handle cases where the declarator
> represents a named function. In such scenarios, an error will be issued,
> including the function name, the use of constrained %, and the
> existence of a trailing return type. For all other cases, an error
> message will indicate the invalid use of constrained %.
> 
> Curiously, I found that if the same error message occurred multiple
> times(In my case, the above error message does not print function names, 
> so every function/type declartions issues the same error message), 
> a single dg-error will catch all of them, even if these messages
> are issued on different lines of code.

I believe that's actually because of the .* in your pattern, which also
matches trailing newlines and so it can greedily match errors issued
later in the testcase.  Normally dg-error matches only errors occurring on
the line it's written in.

So for that reason it's better to use \[^\n\r\]* instead of .* as the
"match anything" pattern in a dg-error.  Very rarely do we actually want
a dg-error to look into multiple lines of errors.  This could use
documenting on the wiki page.

> I think this is the bug mentioned
> in GCC Wiki (https://gcc.gnu.org/wiki/HowToPrepareATestcase#TODO).
> Thankfully, in the updated testcase, the error messages are issued for a
> function declaration and type declaration each. I have updated the
> dg-error regex pattern to capture error messages separately for
> functions and types. This ensures more precise matching and handling of
> the respective error cases.
> 
> > > + return error_mark_node;
> > > +   }
> > >   tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node);
> > >   if (!tmpl)
> > > if (tree late_auto = type_uses_auto (late_return_type))
> > > diff --git a/gcc/testsuite/g++.dg/other/pr100589.C 
> > > b/gcc/testsuite/g++.dg/other/pr100589.C
> > > new file mode 100644
> > > index 000..3fa054ed34d
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/other/pr100589.C
> > 
> > Since this is a concepts-specific test let's name it
> > 
> >   g++.dg/cpp2a/concepts-pr100589.C
> > 
> 
> Renamed.
> 
> > > @@ -0,0 +1,7 @@
> > > +// PR c++/100589
> > > +// { dg-do compile { target c++20 } }
> > > +
> > > +template
> > > +concept false_concept = false;
> > > +
> > > +false_concept auto f() -> int; // { dg-error "constrained auto type 
> > > specifier.* trailing return type" }
> > 
> > It'd be good to also test a bare function type:
> > 
> >  using type = false_concept auto() -> int;
> > 
> 
> Test added.
> 
> > > \ No newline at end of file
> > > -- 
> > > 2.34.1
> > > 
> > > 
> 
> Thanks again for your reply. I will send updated patch when the
> bootstrap and regression tests are finished.
> 
> 



Re: [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-03-04 Thread Torbjorn SVENSSON




On 2025-03-03 18:00, Richard Earnshaw wrote:

On 28/02/2025 16:18, Richard Earnshaw wrote:

On 28/02/2025 16:12, Richard Earnshaw wrote:

On 08/11/2024 18:47, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

--

A long time ago, this test forced -march=armv6.

With -marm, the generated assembler is:
foo:
  sub r0, r0, #48
  cmp r0, #9
  movhi   r0, #0
  movls   r0, #1
  bx  lr

With -mthumb, the generated assembler is:
foo:
  subs    r0, r0, #48
  movs    r2, #9
  uxtb    r3, r0
  movs    r0, #0
  cmp r2, r3
  adcs    r0, r0, r0
  uxtb    r0, r0
  bx  lr



Looking at this code, I think both the uxtb instructions are unnecessary.

For the first UXTB.  On entry, 'c' is a char (unsigned) so the value is passed 
by the caller in a 32-bit reg, but range-limited (by the ABI) to values between 
0 and 255.
We subtract 48 from that, so the range now, is from -48 to 207, but we're going 
to look at this as an unsigned value, so it's effectively 0..207 or 
UINT_MAX-48..UINT_MAX.
The UXTB instruction then converts the range so that the range becomes 0..255 
again (but importantly, the values between UINT_MAX-48 and UINT_MAX are mapped to 
the range 208..255.  We then do an unsigned comparison between that value and the 
constant 9 to test whether the result is >= 9.
Now, importantly, all the values that are in the range 208..255 are >= 9, but were 
also >= 9 before the UXTB instruction.  In effect, the UXTB is completely 
redundant.


I said, >= 9 above, but it should be > 9 (hence condition code 'hi'). That does 
affect the rest of the argument.

R.


The second UXTB is even more egregious.  We have 0 in r0 and we then 
conditionally add 1 to it, so r0 is in the range 0..1.  Zero extending that 
with a UXTB is therefore clearly pointless.

So neither of the UXTB instructions should be present, even if generating 
Thumb1 code.

I think the failures are, therefore, real and we should look to fix the 
compiler rather than 'fix' the scope of the test.

R.


Require effective-target arm_arm_ok to skip the test for thumb-only
targets (Cortex-M).

gcc/testsuite/ChangeLog:

 * gcc.target/arm/unsigned-extend-1.c: Use effective-target
 arm_arm_ok.

Signed-off-by: Torbjörn SVENSSON 
---
   gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ 
testsuite/gcc.target/arm/unsigned-extend-1.c
index 3b4ab048fb0..73f2e1a556d 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
@@ -1,4 +1,5 @@
   /* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
   /* { dg-options "-O2" } */
   unsigned char foo (unsigned char c)






I've just pushed a patch to eliminate one of the redundant zero-extends.  The 
other is harder to eliminate, so I've adjusted the test to xfail when compiling 
for thumb1.


Hi Richard,

Can this be backported to releases/gcc-14 too?
I can do the backport if you just approve it.

Kind regards,
Torbjörn


Re: [PATCH v2] c++: Check invalid use of constrained auto with trailing return type [PR100589]

2025-03-04 Thread Patrick Palka
On Sun, 2 Mar 2025, Da Xie wrote:

> Add check for constrained auto type specifier in function declaration or
> function type declaration with trailing return type. Issue error if such
> usage is detected.
> 
> Test file renamed, and added a new test for type declaration.
> 
> Successfully bootstrapped and regretested on x86_64-pc-linux-gnu:
> Added 6 passed and 4 unsupported tests.
> 
>   PR c++/100589
> 
> gcc/cp/ChangeLog:
> 
>   * decl.cc (grokdeclarator): Issue an error for a declarator with
>   constrained auto type specifier and trailing return types. Include
>   function names if available.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-pr100589.C: New test.

LGTM, thanks!  Jason, shall I push this to trunk?

> 
> Signed-off-by: Da Xie 
> ---
>  gcc/cp/decl.cc | 13 +
>  gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C |  9 +
>  2 files changed, 22 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C
> 
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index 9ca8c6c4481..337ee65752e 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -14037,6 +14037,19 @@ grokdeclarator (const cp_declarator *declarator,
>   "invalid use of %");
>   return error_mark_node;
> }
> + else if (is_constrained_auto (type))
> +   {
> + if (funcdecl_p)
> +   error_at (typespec_loc,
> + "%qs function with trailing return type "
> + "has constrained % type specifier "
> + "rather than plain %",
> + name);
> + else
> +   error_at (typespec_loc,
> + "invalid use of constrained % type");
> + return error_mark_node;
> +   }
>   tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node);
>   if (!tmpl)
> if (tree late_auto = type_uses_auto (late_return_type))
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C
> new file mode 100644
> index 000..0c60d31f29b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr100589.C
> @@ -0,0 +1,9 @@
> +// PR c++/100589
> +// { dg-do compile { target c++20 } }
> +
> +template
> +concept false_concept = false;
> +
> +false_concept auto f() -> int; // { dg-error "'f' .* trailing return type.* 
> constrained 'auto'" }

I can replace the use of .* with \[^\r\n\]* before pushing.

> +
> +using type = false_concept auto() -> int; // { dg-error "invalid use of 
> constrained 'auto' type"}
> -- 
> 2.34.1
> 
> 



RE: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])

2025-03-04 Thread Li, Pan2
Agree with Robin and Andrew.

I was thinking we may not need to take care of vxrm similar as frm in previous, 
but seems
not that simple up to a point after Richard S points it out.

I also failed to locate some spec/doc about the details. I remember Jeff has 
some findings/comments,
and may wait for a while if we need to tweak the vxrm behaviors, like set, 
consume, global_reg, ... etc.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, March 4, 2025 8:51 PM
To: Vineet Gupta ; Andrew Waterman 
Cc: Li, Pan2 ; jeffreya...@gmail.com; 
gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
richard.sandif...@arm.com; Robin Dapp 
Subject: Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global 
register [PR118103])

> Yeah I didn't know how to articulate  it (and perhaps this still requires
> clarification)
>
> Say we have following
>
> // reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-1.c
>  main
>     set_frm (4);    // orig global FRM update
>
>     test_float_point_frm_run_1 (op1, op2, vl)
>    set_frm (0);
>    result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
>    assert_equal (1, get_frm ())
>    // restore global 4 before returning
>
>     assert_equal (4, get_frm ()  <-- here call restores global
>
> vs.
>
> // reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-5.c
> main
>   set_frm (1);    // orig global FRM update
>
>   test_float_point_frm_run_1 (op1, op2, vl)
>       other_function()
>            set_frm (2);    // also global update
>
>   assert_equal (2, get_frm ()    <-- here call doesn't restore
>
> There's no explicit annotation or anything about the FRM, yet in 2nd example
> other_function () we consider frm write a global event but not in
> test_float_point_frm_run_1 () in 1st example: I'm missing words how to explain
> that :-)

AFAIR the general idea is that every function can potentially "clobber", i.e. 
set FRM globally.

The intent of the first test is to save and restore the global rounding mode 
because we modify it locally by a rounding-mode changing intrinsic.

However, as you note, the set_frm at the beginning of 
test_float_point_frm_run_1 has no effect.  What should intuitively(?) happen is 
that it sets a new global FRM that we restore upon exit.  We, however, restore 
the previous global FRM.  I believe that's because set_frm is an inline 
assembly, inlined into test_float_point_frm_run_1 and we expect inline 
assemblies to _not_ change the rounding mode.  That was one of my concerns 
during the review some time in 2023.  If set_frm were a non-inlined function 
we'd need to assume it changes the rounding mode and restore the value it has 
after the set_frm call.

And that's also what is different in frm-run-5.c (your second snippet).  We 
wrap the set_frm call in another non-inlinable call.  Thus, we need to assume 
the global rounding mode changes and save/restore.

I guess it was a deliberate decision to handle asm snippets different.  At 
least as long as they are in the same visible scope it's obvious when they're 
changing the rounding mode.  One could argue that users doing that are on their 
own.  Not sure that argument has worked very well, historically, though ;)

-- 
Regards
 Robin



Re: [PATCH] testsuite: arm: Use effective-target for pr68674.c test

2025-03-04 Thread Torbjorn SVENSSON




On 2025-02-28 16:06, Richard Earnshaw wrote:

On 08/11/2024 17:44, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

--

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr68674.c: Use effective-target arm_arch_v7a
and arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.target/arm/pr68674.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr68674.c b/gcc/testsuite/ 
gcc.target/arm/pr68674.c

index 0b3237458fe..3fd562d0518 100644
--- a/gcc/testsuite/gcc.target/arm/pr68674.c
+++ b/gcc/testsuite/gcc.target/arm/pr68674.c
@@ -1,9 +1,10 @@
  /* PR target/68674 */
  /* { dg-do compile } */
-/* { dg-require-effective-target arm_neon_ok } */
-/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-require-effective-target arm_libc_fp_abi_ok } */
  /* { dg-options "-O2" } */
-/* { dg-add-options arm_fp } */
+/* { dg-add-options arm_arch_v7a } */
+/* { dg-add-options arm_libc_fp_abi } */
  #pragma GCC target ("fpu=vfp")


I've spent some time thinking about this issue.  If we change the test 
in the way you suggest we lose some coverage of other variants where 
this ought to work, but the alternative is to only run the test on 
platforms where the base architecture is compatible with neon anyway, so 
we lose coverage for other builds entirely.


So there's no easy answer here.  However, the options you've picked 
above pretty much match the original bug report.  I think that's 
sufficient.


OK, and apologies for taking so long pondering this issue.

R.


Pushed as r15-7815-g879fd9c8226 and r14.2.0-852-ga42ef0938f3.

Kind regards,
Torbjörn


Re: [PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-04 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi all,
>
> In this testcase late-combine was failing to merge:
> dup v31.4s, v31.s[3]
> fmla v30.4s, v31.4s, v29.4s
> into the lane-wise fmla form.
> This is because late-combine checks may_trap_p under the hood on the dup insn.
> This ended up returning true for the insn:
> (set (reg:V4SF 152 [ _32 ])
>   (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
>   (parallel:V4SF [
>   (const_int 3 [0x3])]
>
> Although mem_trap_p correctly reasoned that vec_duplicate and vec_select of
> floating-point modes can't trap, it assumed that the V4SF parallel can trap.
> The correct behaviour is to recurse into vector inside the PARALLEL and check
> the sub-expression. This patch adjusts may_trap_p_1 to do just that.
> With this check the above insn is not deemed to be trapping and is propagated
> into the FMLA giving:
> fmla vD.4s, vA.4s, vB.s[3]
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Apparently this also fixes a regression in
> gcc.target/aarch64/vmul_element_cost.c that I observed.
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   PR rtl-optimization/119046
>   * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.
>
> gcc/testsuite/
>
>   PR rtl-optimization/119046
>   * gcc.target/aarch64/pr119046.c: New test.
>
> From 39fa1be73e8fed7fc673329e1854ba41587623c4 Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov 
> Date: Thu, 27 Feb 2025 09:00:25 -0800
> Subject: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with
>  floating-point mode as trapping
>
> In this testcase late-combine was failing to merge:
> dup v31.4s, v31.s[3]
> fmlav30.4s, v31.4s, v29.4s
> into the lane-wise fmla form.
> This is because late-combine checks may_trap_p under the hood on the dup insn.
> This ended up returning true for the insn:
> (set (reg:V4SF 152 [ _32 ])
> (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
> (parallel:V4SF [
> (const_int 3 [0x3])]
>
> Although mem_trap_p correctly reasoned that vec_duplicate and vec_select of
> floating-point modes can't trap, it assumed that the V4SF parallel can trap.
> The correct behaviour is to recurse into vector inside the PARALLEL and check
> the sub-expression.  This patch adjusts may_trap_p_1 to do just that.
> With this check the above insn is not deemed to be trapping and is propagated
> into the FMLA giving:
> fmlavD.4s, vA.4s, vB.s[3]
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Apparently this also fixes a regression in
> gcc.target/aarch64/vmul_element_cost.c that I observed.
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   PR rtl-optimization/119046
>   * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.

I don't object to this, if someone else agrees that it's the right fix.
But the mode on the parallel looks iffy to me, in that the data is not
floating-point data.  VOIDmode would probably be more accurate and
seems to be what x86 uses.

Thanks,
Richard

>
> gcc/testsuite/
>
>   PR rtl-optimization/119046
>   * gcc.target/aarch64/pr119046.c: New test.
> ---
>  gcc/rtlanal.cc  |  1 +
>  gcc/testsuite/gcc.target/aarch64/pr119046.c | 16 
>  2 files changed, 17 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr119046.c
>
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index 8caffafdaa4..7ad67afb9fe 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -3252,6 +3252,7 @@ may_trap_p_1 (const_rtx x, unsigned flags)
>   return true;
>break;
>  
> +case PARALLEL:
>  case NEG:
>  case ABS:
>  case SUBREG:
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr119046.c 
> b/gcc/testsuite/gcc.target/aarch64/pr119046.c
> new file mode 100644
> index 000..aa5fa7c848c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr119046.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2" } */
> +
> +#include 
> +
> +float32x4_t madd_helper_1(float32x4_t a, float32x4_t b, float32x4_t d)
> +{
> +  float32x4_t t = a;
> +  t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
> +  t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
> +  return t;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tdup\tv[0-9]+\.4s, v[0-9]+.s\[1\]\n} } 
> } */
> +/* { dg-final { scan-assembler-times {\tfmla\tv[0-9]+\.4s, v[0-9]+\.4s, 
> v[0-9]+\.s\[1\]\n} 2 } } */
> +


Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Wilco Dijkstra
Hi Richard&Kyrill,

>> I’m in favour of this.
>
> Yeah, seems ok to me too.  I suppose we ought to update the documentation too:

I've added a note to the documentation. However it is impossible to be complete 
here
since many targets switch off early scheduling under various circumstances. So 
I've just
mentioned what x86 and AArch64 do (see v2 below).

Cheers,
Wilco


v2: Update documentation

Enable the early scheduler on AArch64 for O3/Ofast.  This means GCC15 benefits
from much faster build times with -O2, but avoids the regressions in lbm which
is very sensitive to minor scheduling changes due to long FMA chains.

gcc:
PR target/118351
* common/config/aarch64/aarch64-common.cc: Enable early scheduling with
-O3 and higher.
* doc/invoke.texi (-fschedule-insns): Update comment.

---

diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -54,8 +54,10 @@ static const struct default_options 
aarch_option_optimization_table[] =
 { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 },
 /* Enable -fsched-pressure by default when optimizing.  */
 { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-/* Disable early scheduling due to high compile-time overheads.  */
+/* Except for -O3 and higher, disable early scheduling due to high
+   compile-time overheads.  */
 { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
+{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 },
 /* Enable redundant extension instructions removal at -O2 and higher.  */
 { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
bad49a017cc1bb0922eba9f0b2db80166d409cd7..5eb2e96c483126ad075b427969e8c02ce3f51e7a
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13503,7 +13503,9 @@ helps machines that have slow floating point or memory 
load instructions
 by allowing other instructions to be issued until the result of the load
 or floating-point instruction is required.
 
-Enabled at levels @option{-O2}, @option{-O3}.
+Enabled at levels @option{-O2}, @option{-O3}.  Many targets use a different
+default, for example, it is disabled on x86 and enabled only at level
+@option{-O3} on AArch64.
 
 @opindex fschedule-insns2
 @item -fschedule-insns2


[PATCH] c++: disable -Wnonnull in unevaluated context [PR115580]

2025-03-04 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR complains that we issue a -Wnonnull even in a decltype.
Since we can't use cp_unevaluated_operand in c-common.cc, this
fix targets even -Wformat and -Wrestrict.  I think that's fine.

PR c++/115580

gcc/cp/ChangeLog:

* call.cc (build_over_call): Don't call check_function_arguments
when cp_unevaluated_operand.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wnonnull16.C: New test.
---
 gcc/cp/call.cc |  1 +
 gcc/testsuite/g++.dg/warn/Wnonnull16.C | 16 
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wnonnull16.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index be9b0cf62f1..87ca02507cf 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10669,6 +10669,7 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
 
   bool warned_p = false;
   if ((complain & tf_warning)
+  && !cp_unevaluated_operand
   && (warn_nonnull
  || warn_format
  || warn_suggest_attribute_format
diff --git a/gcc/testsuite/g++.dg/warn/Wnonnull16.C 
b/gcc/testsuite/g++.dg/warn/Wnonnull16.C
new file mode 100644
index 000..8740f351ac7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wnonnull16.C
@@ -0,0 +1,16 @@
+// PR c++/115580
+// { dg-do compile { target c++11 } }
+
+class WithMember {
+public:
+  int foo();
+};
+
+decltype(((WithMember*)nullptr)->foo()) footype; // { dg-bogus "pointer is 
null" }
+
+int f(void*) __attribute__((nonnull));
+
+void g()
+{
+  [[maybe_unused]] decltype(f(nullptr)) b; // { dg-bogus "non-null" }
+}

base-commit: 879fd9c822633ecf2c62471d1a7f9b9619e296b7
-- 
2.48.1



[PATCH] i386: Correct mask width for bf8->fp16 intrin on 256/512 bit

2025-03-04 Thread Haochen Jiang
Hi all,

For bf8 -> pf16 convert, when dst is 256 bit, the mask should be
16 bit since 16*16=256, not the 8 bit in the current intrin. In
512 bit intrin, the mask bit is also halved. This patch will fix
both of them.

Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

* config/i386/avx10_2-512convertintrin.h
(_mm512_mask_cvtbf8_ph): Correct mask width.
(_mm512_maskz_cvtbf8_ph): Ditto.
* config/i386/avx10_2convertintrin.h
(_mm256_mask_cvtbf8_ph): Ditto.
(_mm256_maskz_cvtbf8_ph): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-convert-1.c: Change function call.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
---
 gcc/config/i386/avx10_2-512convertintrin.h| 4 ++--
 gcc/config/i386/avx10_2convertintrin.h| 4 ++--
 gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c | 4 ++--
 gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/avx10_2-512convertintrin.h 
b/gcc/config/i386/avx10_2-512convertintrin.h
index 1079e0a2bda..a44481e0b4e 100644
--- a/gcc/config/i386/avx10_2-512convertintrin.h
+++ b/gcc/config/i386/avx10_2-512convertintrin.h
@@ -550,7 +550,7 @@ _mm512_cvtbf8_ph (__m256i __A)
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtbf8_ph (__m512h __S, __mmask16 __U, __m256i __A)
+_mm512_mask_cvtbf8_ph (__m512h __S, __mmask32 __U, __m256i __A)
 {
   return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_mask_slli_epi16 (
 (__m512i) __S, __U, (__m512i) _mm512_cvtepi8_epi16 (__A), 8));
@@ -558,7 +558,7 @@ _mm512_mask_cvtbf8_ph (__m512h __S, __mmask16 __U, __m256i 
__A)
 
 extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtbf8_ph (__mmask16 __U, __m256i __A)
+_mm512_maskz_cvtbf8_ph (__mmask32 __U, __m256i __A)
 {
   return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_slli_epi16 (
 (__m512i) _mm512_maskz_cvtepi8_epi16 (__U, __A), 8));
diff --git a/gcc/config/i386/avx10_2convertintrin.h 
b/gcc/config/i386/avx10_2convertintrin.h
index 3fc51b17435..7c9c238a3b4 100644
--- a/gcc/config/i386/avx10_2convertintrin.h
+++ b/gcc/config/i386/avx10_2convertintrin.h
@@ -1004,7 +1004,7 @@ _mm256_cvtbf8_ph (__m128i __A)
 
 extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_mask_cvtbf8_ph (__m256h __S, __mmask8 __U, __m128i __A)
+_mm256_mask_cvtbf8_ph (__m256h __S, __mmask16 __U, __m128i __A)
 {
   return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_mask_slli_epi16 (
 (__m256i) __S, __U, (__m256i) _mm256_cvtepi8_epi16 (__A), 8));
@@ -1012,7 +1012,7 @@ _mm256_mask_cvtbf8_ph (__m256h __S, __mmask8 __U, __m128i 
__A)
 
 extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_maskz_cvtbf8_ph (__mmask8 __U, __m128i __A)
+_mm256_maskz_cvtbf8_ph (__mmask16 __U, __m128i __A)
 {
   return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_slli_epi16 (
 (__m256i) _mm256_maskz_cvtepi8_epi16 (__U, __A), 8));
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
index bda74b5776b..c1e44efdb2f 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c
@@ -183,6 +183,6 @@ void extern
 avx10_2_512_cvtbf8_fp16_test (void)
 {
   y = _mm512_cvtbf8_ph (z1);
-  y = _mm512_mask_cvtbf8_ph (z, m16, z1);
-  y = _mm512_maskz_cvtbf8_ph (m16, z1);
+  y = _mm512_mask_cvtbf8_ph (z, m32, z1);
+  y = _mm512_maskz_cvtbf8_ph (m32, z1);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
index 57b5fce7fb6..729496f7173 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c
@@ -289,6 +289,6 @@ avx10_2_cvtbf8_fp16_test (void)
   y = _mm_maskz_cvtbf8_ph (m8, z3);
 
   y2 = _mm256_cvtbf8_ph (z3);
-  y2 = _mm256_mask_cvtbf8_ph (z2, m8, z3);
-  y2 = _mm256_maskz_cvtbf8_ph (m8, z3);
+  y2 = _mm256_mask_cvtbf8_ph (z2, m16, z3);
+  y2 = _mm256_maskz_cvtbf8_ph (m16, z3);
 }
-- 
2.31.1



Re: [PATCH] c++: Fix checking assert upon invalid class definition [PR116740]

2025-03-04 Thread Jason Merrill

On 2/18/25 8:00 AM, Simon Martin wrote:

A checking assert triggers upon the following invalid code since
GCC 11:

=== cut here ===
class { a (struct b;
} struct b
=== cut here ===

The problem is that during error recovery, we call
set_identifier_type_value_with_scope for B in the global namespace, and
the checking assert added via r11-7228-g8f93e1b892850b fails.

This patch relaxes that assert to not fail if we've seen a parser error
(it a generalization of another fix done to that checking assert via
r11-7266-g24bf79f1798ad1).

Successfully tested on x86_64-pc-linux-gnu.


OK.


PR c++/116740

gcc/cp/ChangeLog:

* name-lookup.cc (set_identifier_type_value_with_scope): Don't
fail assert with ill-formed input.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash80.C: New test.

---
  gcc/cp/name-lookup.cc| 6 ++
  gcc/testsuite/g++.dg/parse/crash80.C | 7 +++
  2 files changed, 9 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/parse/crash80.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index d1abb205bc7..742e5d289dc 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5101,10 +5101,8 @@ set_identifier_type_value_with_scope (tree id, tree 
decl, cp_binding_level *b)
if (b->kind == sk_namespace)
  /* At namespace scope we should not see an identifier type value.  */
  gcc_checking_assert (!REAL_IDENTIFIER_TYPE_VALUE (id)
-/* We could be pushing a friend underneath a template
-   parm (ill-formed).  */
-|| (TEMPLATE_PARM_P
-(TYPE_NAME (REAL_IDENTIFIER_TYPE_VALUE (id);
+/* But we might end up here with ill-formed input.  */
+|| seen_error ());
else
  {
/* Push the current type value, so we can restore it later  */
diff --git a/gcc/testsuite/g++.dg/parse/crash80.C 
b/gcc/testsuite/g++.dg/parse/crash80.C
new file mode 100644
index 000..cd9216adf5c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash80.C
@@ -0,0 +1,7 @@
+// PR c++/116740
+// { dg-do "compile" }
+
+class K {
+  int a(struct b; // { dg-error "expected '\\)'" }
+};
+struct b {};




[pushed] c++: C++23 range-for temps and ?: [PR119073]

2025-03-04 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Here gimplification got confused because extend_temps_r messed up the types
of the arms of a COND_EXPR.

PR c++/119073

gcc/cp/ChangeLog:

* call.cc (extend_temps_r): Preserve types of COND_EXPR arms.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/range-for39.C: New test.
---
 gcc/cp/call.cc   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/range-for39.C | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/range-for39.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index be9b0cf62f1..f7b4cccb1c7 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14902,7 +14902,7 @@ extend_temps_r (tree *tp, int *walk_subtrees, void 
*data)
  {
tree set = build2 (MODIFY_EXPR, boolean_type_node,
   cur_cond_guard, boolean_true_node);
-   op = add_stmt_to_compound (set, op);
+   op = cp_build_compound_expr (set, op, tf_none);
  }
   };
   walk_arm (TREE_OPERAND (*tp, 1));
diff --git a/gcc/testsuite/g++.dg/cpp0x/range-for39.C 
b/gcc/testsuite/g++.dg/cpp0x/range-for39.C
new file mode 100644
index 000..ebb6acafe7f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/range-for39.C
@@ -0,0 +1,12 @@
+// PR c++/119073
+// { dg-do compile { target c++11 } }
+
+struct A { ~A (); };
+struct B { B (const A &a = A ()); int *begin (); int *end (); ~B (); };
+
+void
+foo (bool x)
+{
+  for (auto i : (x ? B{} : B{}))
+;
+}

base-commit: 8d7762945bba3c8321b5856cee3d5e5aed55facf
-- 
2.48.1



Re: [PATCH] c++: Use capture from outer lambda, if any, instead of erroring out [PR110584]

2025-03-04 Thread Jason Merrill

On 2/18/25 5:12 AM, Simon Martin wrote:

We've been rejecting this valid code since r8-4571:

=== cut here ===
void foo (float);
int main () {
   constexpr float x = 0;
   (void) [&] () {
 foo (x);
 (void) [] () {
   foo (x);
 };
   };
}
=== cut here ===

The problem is that when processing X in the inner lambda,
process_outer_var_ref errors out even though it does find the capture
from the enclosing lambda.

This patch changes process_outer_var_ref to accept and return the outer
proxy if it finds any.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/110584

gcc/cp/ChangeLog:

* semantics.cc (process_outer_var_ref): Use capture from
enclosing lambda, if any.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-nested10.C: New test.

---
  gcc/cp/semantics.cc   |  4 ++
  .../g++.dg/cpp0x/lambda/lambda-nested10.C | 46 +++
  2 files changed, 50 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 7c7d3e3c432..7bbc82f7dc1 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4598,6 +4598,10 @@ process_outer_var_ref (tree decl, tsubst_flags_t 
complain, bool odr_use)
if (!odr_use && context == containing_function)
  decl = add_default_capture (lambda_stack,
/*id=*/DECL_NAME (decl), initializer);
+  /* When doing lambda capture, if we found a capture in an enclosing lambda,
+ we can use it.  */
+  else if (!odr_use && is_capture_proxy (decl))
+return decl;


What makes this OK is not that it's a capture, but that the captured 
variable is constant, so it should have satisfied this branch:


  /* Only an odr-use of an outer automatic variable causes an
 error, and a constant variable can decay to a prvalue   
 constant without odr-use.  So don't complain yet.  */

  else if (!odr_use && decl_constant_var_p (decl))
return decl;


is_constant_capture_proxy might be useful here.  And/or a new 
strip_normal_capture function (that I'm surprised doesn't exist yet).


Jason



Re: [PATCH] c++: Apply/diagnose attributes when instatiating ARRAY/POINTER/REFERENCE_TYPE [PR118787]

2025-03-04 Thread Jason Merrill

On 2/11/25 1:04 PM, Jakub Jelinek wrote:

Hi!

The following testcase IMO in violation of the P2552R3 paper doesn't
pedwarn on alignas applying to dependent types or alignas with dependent
argument.

tsubst was just ignoring TYPE_ATTRIBUTES.

The following patch fixes it for the POINTER/REFERENCE_TYPE and
ARRAY_TYPE cases, but perhaps we need to do the same also for other
types (INTEGER_TYPE/REAL_TYPE and the like).  I guess I'll need to
construct more testcases.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2025-02-11  Jakub Jelinek  

PR c++/118787
* pt.cc (tsubst) : Use return t; only if it doesn't
have any TYPE_ATTRIBUTES.  Call apply_late_template_attributes.
: Likewise.  Formatting fix.

* g++.dg/cpp0x/alignas22.C: New test.

--- gcc/cp/pt.cc.jj 2025-02-07 17:03:13.560227281 +0100
+++ gcc/cp/pt.cc2025-02-10 17:17:47.65131 +0100
@@ -16854,7 +16854,9 @@ tsubst (tree t, tree args, tsubst_flags_
  case POINTER_TYPE:
  case REFERENCE_TYPE:
{
-   if (type == TREE_TYPE (t) && TREE_CODE (type) != METHOD_TYPE)
+   if (type == TREE_TYPE (t)
+   && TREE_CODE (type) != METHOD_TYPE
+   && TYPE_ATTRIBUTES (t) == NULL_TREE)
  return t;
  
  	/* [temp.deduct]

@@ -16924,9 +16926,9 @@ tsubst (tree t, tree args, tsubst_flags_
 A,' while an attempt to create the type type rvalue reference to
 cv T' creates the type T"
  */
- r = cp_build_reference_type
- (TREE_TYPE (type),
-  TYPE_REF_IS_RVALUE (t) && TYPE_REF_IS_RVALUE (type));
+ r = cp_build_reference_type (TREE_TYPE (type),
+  TYPE_REF_IS_RVALUE (t)
+  && TYPE_REF_IS_RVALUE (type));
else
  r = cp_build_reference_type (type, TYPE_REF_IS_RVALUE (t));
r = cp_build_qualified_type (r, cp_type_quals (t), complain);
@@ -16935,6 +16937,11 @@ tsubst (tree t, tree args, tsubst_flags_
  /* Will this ever be needed for TYPE_..._TO values?  */
  layout_type (r);
  
+	if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t),

+/*flags=*/0,
+args, complain, in_decl))
+ return error_mark_node;
+
return r;
}
  case OFFSET_TYPE:
@@ -17009,7 +17016,9 @@ tsubst (tree t, tree args, tsubst_flags_
  
  	/* As an optimization, we avoid regenerating the array type if

   it will obviously be the same as T.  */
-   if (type == TREE_TYPE (t) && domain == TYPE_DOMAIN (t))
+   if (type == TREE_TYPE (t)
+   && domain == TYPE_DOMAIN (t)
+   && TYPE_ATTRIBUTES (t) == NULL_TREE)
  return t;
  
  	/* These checks should match the ones in create_array_type_for_decl.

@@ -17048,6 +17057,11 @@ tsubst (tree t, tree args, tsubst_flags_
TYPE_USER_ALIGN (r) = 1;
  }
  
+	if (!apply_late_template_attributes (&r, TYPE_ATTRIBUTES (t),

+/*flags=*/0,
+args, complain, in_decl))
+ return error_mark_node;
+
return r;
}
  
--- gcc/testsuite/g++.dg/cpp0x/alignas22.C.jj	2025-02-10 17:33:16.242452750 +0100

+++ gcc/testsuite/g++.dg/cpp0x/alignas22.C  2025-02-10 17:36:28.739046629 
+0100
@@ -0,0 +1,23 @@
+// PR c++/118787
+// { dg-do compile { target c++11 } }
+// { dg-options "-pedantic" }
+
+template 
+void foo (T & alignas (N));// { dg-warning "'alignas' on a type 
other than class" }
+template 
+void bar (T (&)[N] alignas (N));   // { dg-warning "'alignas' on a type other than 
class" }
+template 
+using U = T * alignas (N); // { dg-warning "'alignas' on a type other 
than class" }
+template 
+using V = T[N] alignas (N);// { dg-warning "'alignas' on a type other 
than class" }
+
+void
+baz ()
+{
+  int x alignas (4) = 0;
+  foo  (x);
+  int y alignas (4) [4];
+  bar  (y);
+  U  u;
+  V  v;
+}

Jakub





Re: [PATCH] c++: Don't replace INDIRECT_REFs by a const capture proxy too eagerly [PR117504]

2025-03-04 Thread Jason Merrill

On 2/14/25 12:08 PM, Simon Martin wrote:

We have been miscompiling the following valid code since GCC8, and
r8-3497-g281e6c1d8f1b4c

=== cut here ===
struct span {
   span (const int (&__first)[1]) : _M_ptr (__first) {}
   int operator[] (long __i) { return _M_ptr[__i]; }
   const int *_M_ptr;
};
void foo () {
   constexpr int a_vec[]{1};
   auto vec{[&a_vec]() -> span { return a_vec; }()};
}
=== cut here ===

The problem is that perform_implicit_conversion_flags (via
mark_rvalue_use) replaces "a_vec" in the return statement by a
CONSTRUCTOR representing a_vec's constant value, and then takes its
address when invoking span's constructor. So we end up with an instance
that points to garbage instead of a_vec's storage.

I've tried many things to somehow recover from this replacement, but I
actually think we should not do it when converting to a class type: we
have no idea whether the conversion will involve a constructor taking an
address or reference. So we should assume it's the case, and call
mark_lvalue_use, not mark_rvalue_use (I might very weel be overseeing
things, and feedback is more than welcome).


Yeah, those mark_*_use calls seem misplaced, they should be called 
instead by the code that actually does the conversion.


What if we replace all of them here with just mark_exp_read?  Or nothing?


This is what the patch does, successfully tested on x86_64-pc-linux-gnu.

PR c++/117504

gcc/cp/ChangeLog:

* call.cc (perform_implicit_conversion_flags): When possibly
converting to a class, call mark_lvalue_use.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-117504.C: New test.
* g++.dg/cpp2a/constexpr-117504a.C: New test.

---
  gcc/cp/call.cc|  4 ++
  gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C | 60 +++
  .../g++.dg/cpp2a/constexpr-117504a.C  | 12 
  3 files changed, 76 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 38a8f7fdcda..097b1fa55a4 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13973,6 +13973,10 @@ perform_implicit_conversion_flags (tree type, tree 
expr,
  
if (TYPE_REF_P (type))

  expr = mark_lvalue_use (expr);
+  else if (MAYBE_CLASS_TYPE_P (type))
+/* We might convert using a constructor that takes the address of EXPR, so
+   assume that it will be the case.  */
+expr = mark_lvalue_use (expr);
else
  expr = mark_rvalue_use (expr);
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C

new file mode 100644
index 000..290d3dfd61e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C
@@ -0,0 +1,60 @@
+// PR c++/117504 - Initial report
+// { dg-do "run" { target c++20 } }
+
+struct span {
+  span (const int (&__first)[1]) : _M_ptr (__first) {}
+  int operator[] (long __i) { return _M_ptr[__i]; }
+  const int *_M_ptr;
+};
+
+constexpr int a_global_vec[]{1};
+span myFunctor() {
+  return a_global_vec;
+}
+
+int main() {
+  constexpr int a_vec[]{1};
+
+  //
+  // This PR's case, that used to be miscompiled.
+  //
+  auto lambda_1 = [&a_vec] () -> span { return a_vec; };
+  auto vec_1 { lambda_1 () };
+  if (vec_1[0] != 1)
+__builtin_abort ();
+
+  // Variant that used to be miscompiled as well.
+  auto lambda_2 = [&] () -> span { return a_vec; };
+  auto vec_2 { lambda_2 () };
+  if (vec_2[0] != 1)
+__builtin_abort ();
+
+  //
+  // Related cases that worked already.
+  //
+  auto lambda_3 = [&a_vec] () /* -> span */ { return a_vec; };
+  auto vec_3 { lambda_3 () };
+  if (vec_3[0] != 1)
+__builtin_abort ();
+
+  auto lambda_4 = [&] () /* -> span */ { return a_vec; };
+  auto vec_4 { lambda_4 () };
+  if (vec_4[0] != 1)
+__builtin_abort ();
+
+  const int (&vec_5)[1] = a_vec;
+  if (vec_5[0] != 1)
+__builtin_abort ();
+
+  span vec_6 (a_vec);
+  if (vec_6[0] != 1)
+__builtin_abort ();
+
+  auto vec_7 = myFunctor ();
+  if (vec_7[0] != 1)
+__builtin_abort ();
+
+  const int (&vec_8)[1] { a_vec };
+  if (vec_8[0] != 1)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C
new file mode 100644
index 000..f6d4dc8cbc5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C
@@ -0,0 +1,12 @@
+// PR c++/117504 - ICE discovered by ppalka@ when reducing.
+// { dg-do "compile" { target c++20 } }
+
+struct span {
+  span (const int* __first) : _M_ptr (__first) {}
+  int operator[] (long __i) { return _M_ptr[__i]; }
+  const int *_M_ptr;
+};
+int main() {
+  constexpr int a_vec[]{1};
+  auto vec { [&a_vec]() -> span { return a_vec; } () };
+}




[committed] openmp, c++: Fix up OpenMP/OpenACC handling in C++ modules [PR119102]

2025-03-04 Thread Jakub Jelinek
Hi!

modules.cc has apparently support for extensions and attempts to ensure
that if a module is compiled with those extensions enabled, sources which
use the module are compiled with the same extensions.
The only extension supported is SE_OPENMP right now.
And the use of the extension is keyed on streaming out or in OMP_CLAUSE
tree.
This is undesirable for several reasons.
OMP_CLAUSE is the only tree which can appear in the IL even without
-fopenmp/-fopenmp-simd/-fopenacc (when simd ("notinbranch") or
simd ("inbranch") attributes are used), and it can appear also in all
the 3 modes mentioned above.  On the other side, with the exception of
arguments of attributes added e.g. for declare simd where no harm should
be done if -fopenmp/-fopenmp-simd isn't enabled later on, OMP_CLAUSE appears
in OMP_*_CLAUSES of OpenMP/OpenACC construct trees.  And those construct
trees often have no clauses at all, so keying the extension on OMP_CLAUSE
doesn't catch many cases that should be caught.
Furthermore, for OpenMP we have 2 modes, -fopenmp-simd which parses some
OpenMP but constructs from that mostly OMP_SIMD and a few other cases,
and -fopenmp which includes that and far more on top of that; and there is
also -fopenacc.

So, this patch stops setting/requesting the extension on OMP_CLAUSE,
introduces 3 extensions rather than one (SE_OPENMP_SIMD, SE_OPENMP and
SE_OPENACC) and keyes those on OpenMP constructs from the -fopenmp-simd
subset, other OpenMP constructs and OpenACC constructs.

2025-03-05  Jakub Jelinek  

PR c++/119102
gcc/cp/
* module.cc (enum streamed_extensions): Add SE_OPENMP_SIMD
and SE_OPENACC, change value of SE_OPENMP and SE_BITS.
(CASE_OMP_SIMD_CODE, CASE_OMP_CODE, CASE_OACC_CODE): Define.
(trees_out::start): Don't set SE_OPENMP extension for OMP_CLAUSE.
Set SE_OPENMP_SIMD extension for CASE_OMP_SIMD_CODE, SE_OPENMP
for CASE_OMP_CODE and SE_OPENACC for CASE_OACC_CODE.
(trees_in::start): Don't fail for OMP_CLAUSE with missing
SE_OPENMP extension.  Do fail for CASE_OMP_SIMD_CODE and missing
SE_OPENMP_SIMD extension, or CASE_OMP_CODE and missing SE_OPENMP
extension, or CASE_OACC_CODE and missing SE_OPENACC extension.
(module_state::write_readme): Write all of SE_OPENMP_SIMD, SE_OPENMP
and SE_OPENACC extensions.
(module_state::read_config): Diagnose missing -fopenmp, -fopenmp-simd
and/or -fopenacc depending on extensions used.
gcc/testsuite/
* g++.dg/modules/pr119102_a.H: New test.
* g++.dg/modules/pr119102_b.C: New test.
* g++.dg/modules/omp-3_a.C: New test.
* g++.dg/modules/omp-3_b.C: New test.
* g++.dg/modules/omp-3_c.C: New test.
* g++.dg/modules/omp-3_d.C: New test.
* g++.dg/modules/oacc-1_a.C: New test.
* g++.dg/modules/oacc-1_b.C: New test.
* g++.dg/modules/oacc-1_c.C: New test.

--- gcc/cp/module.cc.jj 2025-03-03 17:36:37.263748933 +0100
+++ gcc/cp/module.cc2025-03-04 17:52:16.439987517 +0100
@@ -3613,8 +3613,10 @@ void slurping::release_macros ()
 /* Flags for extensions that end up being streamed.  */
 
 enum streamed_extensions {
-  SE_OPENMP = 1 << 0,
-  SE_BITS = 1
+  SE_OPENMP_SIMD = 1 << 0,
+  SE_OPENMP = 1 << 1,
+  SE_OPENACC = 1 << 2,
+  SE_BITS = 3
 };
 
 /* Counter indices.  */
@@ -5276,6 +5278,53 @@ trees_in::tree_list (bool has_purpose)
 
   return res;
 }
+
+#define CASE_OMP_SIMD_CODE \
+case OMP_SIMD: \
+case OMP_STRUCTURED_BLOCK: \
+case OMP_LOOP: \
+case OMP_ORDERED:  \
+case OMP_TILE: \
+case OMP_UNROLL
+#define CASE_OMP_CODE \
+case OMP_PARALLEL: \
+case OMP_TASK: \
+case OMP_FOR:  \
+case OMP_DISTRIBUTE:   \
+case OMP_TASKLOOP: \
+case OMP_TEAMS:\
+case OMP_TARGET_DATA:  \
+case OMP_TARGET:   \
+case OMP_SECTIONS: \
+case OMP_CRITICAL: \
+case OMP_SINGLE:   \
+case OMP_SCOPE:\
+case OMP_TASKGROUP:\
+case OMP_MASKED:   \
+case OMP_DISPATCH: \
+case OMP_INTEROP:  \
+case OMP_MASTER:   \
+case OMP_TARGET_UPDATE:\
+case OMP_TARGET_ENTER_DATA:\
+case OMP_TARGET_EXIT_DATA: \
+case OMP_METADIRECTIVE:\
+case OMP_ATOMIC:   \
+case OMP_ATOMIC_READ:  \
+case OMP_ATOMIC_CAPTURE_OLD:   \
+case OMP_ATOMIC_CAPTURE_NEW
+#define CASE_OACC_CODE \
+case OACC_PARALLEL:\
+case OACC_KERNELS: \
+case OACC_SERIAL:  \
+case OACC_DATA:\
+case OACC_HOS

Re: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-03-04 Thread Jin Ma
On Fri, 28 Feb 2025 12:48:36 +0100, "Robin Dapp" wrote:
> > Okay, let me explain the background of my previous patch.
> >
> > Prior to applying my patch, for the test case bug-10.c (a reduced example 
> > of 
> > a larger program with incorrect runtime results),
> > the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as 
> > follows:
> > 1. vsetvli zero,a4,e16,m4,ta,ma + vsetvli zero,a4,e32,m8,ta,ma + vsetvli 
> > zero,a4,e8,m2,ta,ma
> >
> > The vsetvli sequence compiled with --param=vsetvl-strategy=optim was as 
> > follows:
> > 2. vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,zero,e8,m2,ta,ma >
> > Although vl remains unchanged, the SEW/LMUL ratio in sequence 2 changes, 
> > leading to undefined behavior.
> 
> The only difference I see with your patch vs without is
> 
> <   vsetvli zero,zero,e8,m2,ta,ma
> ---
> >   vsetvli zero,a3,e8,m2,ta,ma
> 
> and we ensure the former doesn't occur in the test.
> 
> But that difference doesn't matter because the ratio is the same before and 
> after.  That's why I'm asking.  bug-10.c as is doesn't test anything 
> reasonable 
> IMHO.  Right, the ratio (or rather the associated LMUL) was wrong but the 
> current test doesn't make sure it isn't.  Can you share the non-reduced (or 
> less reduced) case?

Hi Robin,

I apologize for the delayed response. I spent quite a bit of time trying to 
reproduce
the case, and given the passage of time, it wasn't easy to refine the testing.
Fortunately, you can see the results here.

https://godbolt.org/z/Mc8veW7oT

Using GCC version 14.2.0 should allow you to replicate the issue. If all goes as
expected, you will encounter a "Segmentation fault (core dumped)."
By disassembling the binary, you'll notice the presence of "vsetvli 
zero,zero,e32,m4,ta,ma",
which is where the problem lies, just as I mentioned previously.

Best regards,
Jin Ma

> > /* { dg-do compile { target { rv64 } } } */
> > /* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3" } */
> >
> > #include 
> >
> > _Float16 a (uint64_t);
> > int8_t b () {
> >   int c = 100;
> >   double *d;
> >   _Float16 *e;
> >   for (size_t f;; c -= f)
> > {
> >   f = c;
> >   __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 2, f);
> >   vfloat16mf4_t g;
> >   a (1);
> >   g = __riscv_vfmv_s_f_f16mf4 (2, f);
> >   vfloat64m1_t i = __riscv_vfmv_s_f_f64m1 (30491, f);
> >   vuint16mf4_t j;
> >   __riscv_vsoxei16_v_f16mf4 (e, j, g, f);
> >   vuint8mf8_t k = __riscv_vsll_vx_u8mf8 (__riscv_vid_v_u8mf8 (f), 3, f);
> >   __riscv_vsoxei8_v_f64m1 (d, k, i, f);
> > }
> > }
> >
> > /* { dg-final { scan-assembler-not "e64,mf4" } } */
> 
> That works, thanks.
> 
> -- 
> Regards
>  Robin


[committed] c++: Fix a comment typo

2025-03-04 Thread Jakub Jelinek
Hi!

During the 118874 coro investigation I found a typo in a comment.

Fixed thusly, committed to trunk as obvious.

2025-03-05  Jakub Jelinek  

* typeck.cc (check_return_expr): Fix comment typo, rom -> from.

--- gcc/cp/typeck.cc.jj 2025-02-25 09:26:38.761944513 +0100
+++ gcc/cp/typeck.cc2025-03-04 12:19:41.495160309 +0100
@@ -11450,7 +11450,7 @@ check_return_expr (tree retval, bool *no
   bool converted = false;
   tree moved;
   /* Until C++23, this was only interesting for class type, but in C++23,
-we should do the below when we're converting rom/to a class/reference
+we should do the below when we're converting from/to a class/reference
 (a non-scalar type).  */
if ((cxx_dialect < cxx23
 ? CLASS_TYPE_P (functype)

Jakub



[pushed] Fix folding of BIT_NOT_EXPR for POLY_INT_CST [PR118976]

2025-03-04 Thread Richard Sandiford
There was an embarrassing typo in the folding of BIT_NOT_EXPR for
POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
how that happened, but it might have been due to the way that
~x is implemented as -1 - x internally.

Tested on aarch64-linux-gnu & powerpc64le-linux-gnu (as part
of other testing).  Pushed as obvious.  I'll backport to all
open branches.

Richard


gcc/
PR tree-optimization/118976
* fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
* config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
(aarch64_run_selftests): Run it.
---
 gcc/config/aarch64/aarch64.cc | 11 +++
 gcc/fold-const.cc |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fe76730b0a7..af3871ce8a1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -31336,6 +31336,16 @@ aarch64_test_sysreg_encoding_clashes (void)
 }
 }
 
+/* Test SVE arithmetic folding.  */
+
+static void
+aarch64_test_sve_folding ()
+{
+  tree res = fold_unary (BIT_NOT_EXPR, ssizetype,
+ssize_int (poly_int64 (1, 1)));
+  ASSERT_TRUE (operand_equal_p (res, ssize_int (poly_int64 (-2, -1;
+}
+
 /* Run all target-specific selftests.  */
 
 static void
@@ -31344,6 +31354,7 @@ aarch64_run_selftests (void)
   aarch64_test_loading_full_dump ();
   aarch64_test_fractional_cost ();
   aarch64_test_sysreg_encoding_clashes ();
+  aarch64_test_sve_folding ();
 }
 
 } // namespace selftest
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index f9f7f4d2f91..fef7a6cc48e 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1964,7 +1964,7 @@ const_unop (enum tree_code code, tree type, tree arg0)
   if (TREE_CODE (arg0) == INTEGER_CST)
return fold_not_const (arg0, type);
   else if (POLY_INT_CST_P (arg0))
-   return wide_int_to_tree (type, -poly_int_cst_value (arg0));
+   return wide_int_to_tree (type, ~poly_int_cst_value (arg0));
   /* Perform BIT_NOT_EXPR on each element individually.  */
   else if (TREE_CODE (arg0) == VECTOR_CST)
{
-- 
2.25.1



Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-04 Thread Jakub Jelinek
On Tue, Mar 04, 2025 at 09:26:07AM +, Richard Sandiford wrote:
> gcc/
>   PR rtl-optimization/119002
>   * simplify-rtx.cc
>   (simplify_context::simplify_logical_relational_operation): Handle
>   comparisons between CC values.  If there is no evidence that the
>   CC values are unsigned, restrict the fold to always-true or
>   always-false results.
> 
> gcc/testsuite/
>   * gcc.c-torture/execute/ieee/pr119002.c: New test.
>   * gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O.

LGTM.

Jakub



Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-04 Thread Richard Sandiford
Richard Sandiford  writes:
> Jakub Jelinek  writes:
>> On Mon, Mar 03, 2025 at 01:46:20PM +, Richard Sandiford wrote:
>>> Jakub Jelinek  writes:
>>> > On Mon, Mar 03, 2025 at 01:02:07PM +, Richard Sandiford wrote:
>>> >> ...how about something like this?  Completely untested, and I haven't
>>> >> thought about it much.  Just didn't want to hold up the discussion.
>>> >
>>> > Works for me.
>>> >
>>> > Just wonder if there is anything that will actually verify that XEXP 
>>> > (op0, 0)
>>> > and XEXP (op1, 0) modes are at least from the same class, rather than say
>>> > have one of the comparisons in MODE_CC and another in MODE_INT or vice 
>>> > versa
>>> > or whatever other modes.
>>> 
>>> There's:
>>> 
>>>   if (!(rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
>>> && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1
>>> return 0;
>>
>> You're right, and rtx_equal_p returns false for GET_MODE differences.
>>
>> Will you test your patch (with the testcase from my patch) or should I?
>
> I'm just about to test it with a tweak to the mode check.  Should be
> done in a couple of hours.

Heh.  Forgot when I said that that I would need to test on powerpc64le
as well, which isn't something I do regularly, and so took a couple of
gos to get right.  I've now bootstrapped & regression-tested on both
powerpc64le-linux-gnu and aarch64-linux-gnu.  OK to install?

Richard


The following testcase is miscompiled on powerpc64le-linux starting with
r15-6777.  During combine we see:

(set (reg:SI 134)
(ior:SI (ge:SI (reg:CCFP 128)
(const_int 0 [0]))
(lt:SI (reg:CCFP 128)
(const_int 0 [0]

The simplify_logical_relational_operation code (in its current form)
was written with arithmetic rather than CC modes in mind.  Since CCFP
is a CC mode, it fails the HONOR_NANS check, and so the function assumes
that ge | lt => true.

If one comparison is unsigned then it should be safe to assume that
the other comparison is also unsigned, even for CC modes, since the
optimisation checks that the comparisons are between the same operands.
For the other cases, we can only safely fold comparisons of CC mode
values if the result is always-true (15) or always-false (0).

It turns out that the original testcase for PR117186, which ran at -O,
was relying on the old behaviour for some of the functions.  It needs
4-instruction combinations, and so -fexpensive-optimizations, to pass
in its intended form.

gcc/
PR rtl-optimization/119002
* simplify-rtx.cc
(simplify_context::simplify_logical_relational_operation): Handle
comparisons between CC values.  If there is no evidence that the
CC values are unsigned, restrict the fold to always-true or
always-false results.

gcc/testsuite/
* gcc.c-torture/execute/ieee/pr119002.c: New test.
* gcc.target/aarch64/pr117186.c: Run at -O2 rather than -O.

Co-authored-by: Jakub Jelinek 
---
 gcc/simplify-rtx.cc   | 12 --
 .../gcc.c-torture/execute/ieee/pr119002.c | 23 +++
 gcc/testsuite/gcc.target/aarch64/pr117186.c   |  2 +-
 3 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index c478bd060fc..fe007bc7d96 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2655,6 +2655,9 @@ simplify_context::simplify_logical_relational_operation 
(rtx_code code,
 
   enum rtx_code code0 = GET_CODE (op0);
   enum rtx_code code1 = GET_CODE (op1);
+  machine_mode cmp_mode = GET_MODE (XEXP (op0, 0));
+  if (cmp_mode == VOIDmode)
+cmp_mode = GET_MODE (XEXP (op0, 1));
 
   /* Assume at first that the comparisons are on integers, and that the
  operands are therefore ordered.  */
@@ -2672,8 +2675,10 @@ simplify_context::simplify_logical_relational_operation 
(rtx_code code,
 }
   else
 {
-  /* See whether the operands might be unordered.  */
-  if (HONOR_NANS (GET_MODE (XEXP (op0, 0
+  /* See whether the operands might be unordered.  Assume that all
+results are possible for CC modes, and punt later if we don't get an
+always-true or always-false answer.  */
+  if (GET_MODE_CLASS (cmp_mode) == MODE_CC || HONOR_NANS (cmp_mode))
all = 15;
   mask0 = comparison_to_mask (code0) & all;
   mask1 = comparison_to_mask (code1) & all;
@@ -2702,6 +2707,9 @@ simplify_context::simplify_logical_relational_operation 
(rtx_code code,
 code = mask_to_unsigned_comparison (mask);
   else
 {
+  if (GET_MODE_CLASS (cmp_mode) == MODE_CC)
+   return 0;
+
   code = mask_to_comparison (mask);
   /* LTGT and NE are arithmetically equivalent for ordered operands,
 with NE being the canonical choice.  */
diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c 
b/gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c
new file mode 100644
index 000..af1a705f17

[committed] testsuite: Add tests for already fixed PR [PR119071]

2025-03-04 Thread Jakub Jelinek
Hi!

Uros' r15-7793 fixed this PR as well, I'm just committing tests
from the PR so that it can be closed.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2025-03-04  Jakub Jelinek  

PR rtl-optimization/119071
* gcc.dg/pr119071.c: New test.
* gcc.c-torture/execute/pr119071.c: New test.

--- gcc/testsuite/gcc.dg/pr119071.c.jj  2025-03-03 21:39:28.828809833 +0100
+++ gcc/testsuite/gcc.dg/pr119071.c 2025-03-03 21:41:21.847251265 +0100
@@ -0,0 +1,45 @@
+/* PR rtl-optimization/119071 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fgimple" } */
+
+int a, b;
+
+int __GIMPLE (ssa,startwith("expand"))
+foo (void)
+{
+  int _1;
+  int _2;
+  int _3;
+  int _5;
+  _Bool _7;
+  int _8;
+  int _9;
+  _Bool _14;
+  int _15;
+  int _16;
+  _Bool _17;
+  int _18;
+
+  __BB(2):
+  _1 = a;
+  _17 = _1 != _Literal (int) -2;
+  _18 = (int) _17;
+  _2 = _18 + _Literal (int) -1;
+  _3 = _2 + _18;
+  _14 = _3 != 1;
+  _15 = (int) _14;
+  _16 = -_15;
+  _7 = _3 == 1;
+  _9 = (int) _7;
+  _5 = _9 << _16;
+  _8 = _5 - _15;
+  b = _8;
+  return _8;
+}
+
+int
+main ()
+{
+  if (foo () != 1)
+__builtin_abort ();
+}
--- gcc/testsuite/gcc.c-torture/execute/pr119071.c.jj   2025-03-03 
21:40:21.931077530 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr119071.c  2025-03-03 
21:40:11.765217724 +0100
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/119071 */
+
+int a, b;
+
+int
+main ()
+{
+  int c = 0;
+  if (a + 2)
+c = 1;
+  int d = (1 + c - 2 + c == 1) - 1;
+  b = ((d + 1) << d) + d;
+  if (b != 1)
+__builtin_abort ();
+}

Jakub



Re: [libstdc++] Testsuite should pay attention to extra flags

2025-03-04 Thread Thomas Schwinge
Hi!

On 2025-03-03T17:30:06+0100, I wrote:
> On 2002-04-17T21:37:50-0400, Phil Edwards  wrote:
>> If the user decides to build the library with extra compiler options
>> via --enable-cxx-flags, the testsuite should (by default) use those same
>> options when running.
>
> Hmm, are we sure that's what we actually want?
>
>> Verified by passing strange things via --enable
>> and watching their effects on the testsuite.
>
>> --- testsuite_flags.in   7 Jan 2002 00:07:27 -   1.11
>> +++ testsuite_flags.in   18 Apr 2002 01:34:43 -
>> @@ -49,7 +49,7 @@ case ${query} in
>>;;
>>  --cxxflags)
>>CXXFLAGS=' -g @SECTION_FLAGS@ @SECTION_LDFLAGS@
>> - -fmessage-length=0
>> + -fmessage-length=0 @EXTRA_CXX_FLAGS@
>>   -DDEBUG_ASSERT  -DLOCALEDIR="@glibcpp_localedir@" '
>>echo ${CXXFLAGS}
>>;;
>
> This got installed in Subversion r52450
> (Git commit 822ca943a31961fd7339de6fbe6593118ec7f231).
>
> To me at least, this comes unexpected: per my reading of (the current)
> 'libstdc++-v3/acinclude.m4:GLIBCXX_ENABLE_CXX_FLAGS', I'd have thought
> these flags apply only to the build of libstdc++ itself, but apparently
> "flags to pass to the compiler while building" or
> "extra compiler flags for building" also includes use by
> 'make check-target-libstdc++-v3'?
>
> I'm OK to leave this as-is, if only for hysterical raisins, but would
> then proposa a patch to document this behavior?
>
>
> Alas, there's an additional issue: with Subversion r42272
> (Git commit 28e8acb68f8a427552bc9f2d4cae12a1ac477855) -- so, already
> in place when the change above got installed -- we'd gotten:
>
>   * testsuite/lib/libstdc++-v3-dg.exp (libstdc++-v3-init): Set flags
>   appropriately for remote testing and testing installed files without
>   a build dir.
>
> -[...]
> -set cxxflags [exec sh ${blddir}/testsuite_flags --cxxflags]
> -[...]
> +if [is_remote host] {
> +  [...]
> +  set cxxflags "-ggdb3 -DDEBUG_ASSERT"
> +  [...]
> +} else {
> +# If we find a testsuite_flags file, we're testing in the build 
> dir.
> +set flags_file "${blddir}/testsuite_flags"
> +if { [file exists $flags_file] } {
> +[...]
> +set cxxflags [exec sh $flags_file --cxxflags]
> +[...]
> +} else {
> +[...]
> +set cxxflags "-ggdb3 -DDEBUG_ASSERT"
> +[...]
> +}
> +}
>
> (Nowadays: 'libstdc++-v3/testsuite/lib/libstdc++.exp:libstdc++_init',
> similarly.)
>
> That means, any 'EXTRA_CXX_FLAGS' specified by '--enable-cxx-flags=[...]'
> are *not* considered for remote host testing -- which in turn appears to
> be in conflict with the original intent, quoted at the beginning of this
> email?  Hmm...
>
>
> I think we should clarify the intended cases where 'EXTRA_CXX_FLAGS' (as
> specified by '--enable-cxx-flags=[...]') apply, and then make that apply
> in a uniform way?  I think I'd be in favor of removing them from
> 'testsuite_flags --cxxflags' -- but can't quite tell which use cases
> that'd break...

I still maintain my position that 'testsuite_flags' should not be about
passing compiler flags (as opposed to all kinds of search paths, where
it's appropriate).

However, 'libstdc++-v3/doc/xml/manual/test.xml' also "You can run the
tests with different options by adding them to the output of the
'--cxxflags' option of that script, or [...]", so there's more precedence
of doing it this way.

And, 'testsuite_flags -cxxflags' actually isn't only about
'EXTRA_CXX_FLAGS' but also '-fmessage-length=0 -fno-show-column', and
'@SECTION_FLAGS@' ('-ffunction-sections -fdata-sections'), and other
random ones, like '-fcf-protection', '-mshstk'.

All this doesn't work for the '[is_remote host]' case, as far as I can
tell, but resolving that shall not be my problem of today.  ;-)

And, considering that for GCN/nvptx libstdc++ I also need a way to inject
compiler flags into 'make check-target-libstdc++-v3', and these happen to
overlap with those needed for the libstdc++ build, so the current
'EXTRA_CXX_FLAGS' actually fits that model.  ;-)

And for flags that only apply to libstdc++ compilation but not apply to
'make check-target-libstdc++-v3', I may use 'OPTIMIZE_CXXFLAGS'.

Sounds like a plan, I guess?  (So, effectively no code changes, but I'll
still propose updates to the documentation.)


Grüße
 Thomas


[PATCH] arm: Handle fixed PIC register in require_pic_register (PR target/115485)

2025-03-04 Thread Christophe Lyon
Commit r9-4307-g89d7557202d25a forgot to accept a fixed PIC register
when extending the assert in require_pic_register.

arm_pic_register can be set explicitly by the user
(e.g. -mpic-register=r9) or implicitly as the default value with
-fpic/-fPIC/-fPIE and -mno-pic-data-is-text-relative -mlong-calls, and
we want to use/accept it when recording cfun->machine->pic_reg as used
to be the case.

Tested on arm-none-linux-gnueabihf and arm-none-eabi with several sets
of options, covering all M-profile architecture versions.

PR target/115485
gcc/
* config/arm/arm.cc (require_pic_register): Fix typos in
comment. Handle fixed arm_pic_register.

gcc/testsuite/
* g++.target/arm/pr115485.C: New test.
---
 gcc/config/arm/arm.cc   |  5 +++--
 gcc/testsuite/g++.target/arm/pr115485.C | 16 
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/arm/pr115485.C

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 00499a26bae..670f487bcce 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -8086,8 +8086,8 @@ legitimate_pic_operand_p (rtx x)
 
 /* Record that the current function needs a PIC register.  If PIC_REG is null,
a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
-   both case cfun->machine->pic_reg is initialized if we have not already done
-   so.  COMPUTE_NOW decide whether and where to set the PIC register.  If true,
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If 
true,
PIC register is reloaded in the current position of the instruction stream
irregardless of whether it was loaded before.  Otherwise, it is only loaded
if not already done so (crtl->uses_pic_offset_table is null).  Note that
@@ -8107,6 +8107,7 @@ require_pic_register (rtx pic_reg, bool compute_now)
   if (!crtl->uses_pic_offset_table || compute_now)
 {
   gcc_assert (can_create_pseudo_p ()
+ || (arm_pic_register != INVALID_REGNUM)
  || (pic_reg != NULL_RTX
  && REG_P (pic_reg)
  && GET_MODE (pic_reg) == Pmode));
diff --git a/gcc/testsuite/g++.target/arm/pr115485.C 
b/gcc/testsuite/g++.target/arm/pr115485.C
new file mode 100644
index 000..491b48c726a
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/pr115485.C
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fPIE -mno-pic-data-is-text-relative -mlong-calls 
-ffunction-sections" } */
+
+struct c1 {
+  virtual void func1() = 0;
+};
+struct  c2 {
+  virtual ~c2() {}
+};
+struct c3 : c2, c1 {
+  void func1() override;
+  void func3();
+};
+void c3::func1() {
+  func3();
+}
-- 
2.34.1



[committed] libstdc++: Remove stray comma in testing docs

2025-03-04 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* doc/xml/manual/test.xml: Remove stray comma.
* doc/html/manual/test.html: Regenerate.
---

Pushed to trunk.

 libstdc++-v3/doc/html/manual/test.html | 2 +-
 libstdc++-v3/doc/xml/manual/test.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/test.html 
b/libstdc++-v3/doc/html/manual/test.html
index 1c7af1193da..4b295e583f6 100644
--- a/libstdc++-v3/doc/html/manual/test.html
+++ b/libstdc++-v3/doc/html/manual/test.html
@@ -230,7 +230,7 @@ cat 27_io/objects/char/3_xin.in | a.out
 
   The testsuite will create a number of files in the directory in
-  which you run this command,.  Some of those files might use the
+  which you run this command.  Some of those files might use the
   same name as files created by other testsuites (like the ones
   for GCC and G++), so you should not try to run all the
   testsuites in parallel from the same directory.
diff --git a/libstdc++-v3/doc/xml/manual/test.xml 
b/libstdc++-v3/doc/xml/manual/test.xml
index 6b7f1b04a2a..963e3e13500 100644
--- a/libstdc++-v3/doc/xml/manual/test.xml
+++ b/libstdc++-v3/doc/xml/manual/test.xml
@@ -381,7 +381,7 @@ cat 27_io/objects/char/3_xin.in | a.out
 
 
   The testsuite will create a number of files in the directory in
-  which you run this command,.  Some of those files might use the
+  which you run this command.  Some of those files might use the
   same name as files created by other testsuites (like the ones
   for GCC and G++), so you should not try to run all the
   testsuites in parallel from the same directory.
-- 
2.48.1



Re: libatomic: use HWCAPs in AArch64 ifunc tests

2025-03-04 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Feedback from the kernel team suggests that it's best to only use HWCAPs
> rather than also use low-level checks as done by has_lse128() and has_rcpc3().
> So change these to just use HWCAPs which simplifies the code and speeds up
> ifunc selection by avoiding expensive system register accesses.

Could you give details?  I thought it was always known that trapped
system register accesses were slow.  In the previous versions, the
checks seemed to be presented as an up-front price worth paying for
faster atomic operations, on the systems that would use those paths.
Now the checks are being presented as something that are good to remove
to make the code simpler and faster.

There have been a few changes to this code in the current release cycle,
and each time it seems like the new version is being presented as better
than the previous one with single-sentence justifications.

Could we instead have a comment in the code discussing the various
approaches that we could take, including the ones that previous versions
took, describes the trade-offs, and explains why we've chosen to do what
we've chosen to do?

Thanks,
Richard

>
> Passes regress, OK for commit?
>
> libatomic:
> * config/linux/aarch64/host-config.h (has_lse2): Remove unused arg.
> (has_lse128): Change to just use HWCAPs.
> (has_rcpc3): Likewise.
>
> ---
>
> diff --git a/libatomic/config/linux/aarch64/host-config.h 
> b/libatomic/config/linux/aarch64/host-config.h
> index 
> d0d44bf18eaa64437f52c2894da6ece9e02618df..6a4f7014323a2ed196cabe408aaa6df0d2521518
>  100644
> --- a/libatomic/config/linux/aarch64/host-config.h
> +++ b/libatomic/config/linux/aarch64/host-config.h
> @@ -69,7 +69,7 @@ typedef struct __ifunc_arg_t {
>  #  elif defined (LSE2_LRCPC3_ATOP)
>  #   define IFUNC_NCOND(N)  2
>  #   define IFUNC_COND_1(has_rcpc3 (hwcap, features))
> -#   define IFUNC_COND_2(has_lse2 (hwcap, features))
> +#   define IFUNC_COND_2(has_lse2 (hwcap))
>  #  elif defined (LSE128_ATOP)
>  #   define IFUNC_NCOND(N)  1
>  #   define IFUNC_COND_1(has_lse128 (hwcap, features))
> @@ -86,7 +86,7 @@ typedef struct __ifunc_arg_t {
>  #define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
>
>  static inline bool
> -has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
> +has_lse2 (unsigned long hwcap)
>  {
>/* Check for LSE2.  */
>if (hwcap & HWCAP_USCAT)
> @@ -105,50 +105,20 @@ has_lse2 (unsigned long hwcap, const __ifunc_arg_t 
> *features)
>return false;
>  }
>
> -/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic, bits[23:20].
> -   The minimum value for LSE128 is 0b0011.  */
> -
> -#define AT_FEAT_FIELD(isar0)   (((isar0) >> 20) & 15)
> -
>  static inline bool
>  has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
>  {
> -  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128)
> -return true;
> -
> -  /* If LSE2 and CPUID are supported, check for LSE128.  */
> -  if (hwcap & HWCAP_CPUID && hwcap & HWCAP_USCAT)
> -{
> -  unsigned long isar0;
> -  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
> -  return AT_FEAT_FIELD (isar0) >= 3;
> -}
> -
> -  return false;
> +  return hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128;
>  }
>
> -/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].
> -   The minimum value for LRCPC3 is 0b0011.  */
> -
>  static inline bool
>  has_rcpc3 (unsigned long hwcap, const __ifunc_arg_t *features)
>  {
>/* LSE2 is a prerequisite for atomic LDIAPP/STILP - check HWCAP_USCAT since
>   has_lse2 is more expensive and Neoverse N1 does not have LRCPC3. */
> -  if (!(hwcap & HWCAP_USCAT))
> -return false;
> -
> -  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LRCPC3)
> -return true;
> -
> -  if (hwcap & HWCAP_CPUID)
> -{
> -  unsigned long isar1;
> -  asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1));
> -  return AT_FEAT_FIELD (isar1) >= 3;
> -}
> -
> -  return false;
> +  return (hwcap & HWCAP_USCAT
> + && hwcap & _IFUNC_ARG_HWCAP
> + && features->_hwcap2 & HWCAP2_LRCPC3);
>  }
>
>  #endif /* HAVE_IFUNC */


[PATCH] ranger: Improve nonnull_if_nonzero attribute [PR117023]

2025-03-04 Thread Jakub Jelinek
Hi!

On Fri, Nov 22, 2024 at 07:25:23PM -0500, Andrew MacLeod wrote:
>  I will shortly be submitting , and presumable committing, this patch as
> part of a series to improve VRP time for 117467..
> 
> So it may be in place by the time you need it

> On 11/18/24 09:31, Andrew MacLeod wrote:
> > Attached is a pre-approved patch which adds a range_query to the
> > inferred range mechanism.
> > 
> > The only change you will need to make is to replace "get_range_query
> > (cfun)->"  with "q->" which is passed in.
> > 
> > This regstraps on x86 without your patch, and I got as far as a
> > bootstrap with your patches..

Sorry for the delay.  Only recently the remaining patches have been
committed, so I'm getting back to this.

The current state of the trunk is that it handles constant arg2 (if it is
zero or non-zero) and punts on everything else.

I've changed get_range_query (cfun)-> to q-> but unfortunately the test
FAILs with it, while it improves the if (n >= 42) case where there is a
SSA_NAME for n - 10, it doesn't improve the if (n) case, so it feels like
it is querying just the global range rather than the local one from the
statement, because on the foo (b, n); statement which is guarded by if (n)
n should be provably [1, SIZE_MAX].

The patch still bootstraps/regtests on x86_64-linux and i686-linux
and improves at least something, so I guess I could just comment out part
of the testcase.  Any thoughts why this happens though?  And is that something
that can be improved for GCC 15 or should wait for GCC 16?

2025-03-04  Jakub Jelinek  
Andrew MacLeod  

PR c/117023
* gimple-range-infer.cc (gimple_infer_range::gimple_infer_range):
For nonnull_if_nonzero attribute check also arg2 range if it doesn't
include zero and in that case call add_nonzero too.

* gcc.dg/tree-ssa/pr78154-2.c: New test.

--- gcc/gimple-range-infer.cc.jj2025-01-02 11:23:24.0 +0100
+++ gcc/gimple-range-infer.cc   2025-03-03 12:37:52.836895208 +0100
@@ -208,8 +208,13 @@ gimple_infer_range::gimple_infer_range (
  continue;
if (integer_nonzerop (arg2))
  add_nonzero (arg);
-   // FIXME: Can one query here whether arg2 has
-   // nonzero range if it is a SSA_NAME?
+   else
+ {
+   value_range r (TREE_TYPE (arg2));
+   if (q->range_of_expr (r, arg2, s)
+   && !r.contains_p (build_zero_cst (TREE_TYPE (arg2
+ add_nonzero (arg);
+ }
  }
  }
   // Fallthru and walk load/store ops now.
--- gcc/testsuite/gcc.dg/tree-ssa/pr78154-2.c.jj2025-03-03 
12:38:37.841273517 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr78154-2.c   2025-03-03 12:38:37.841273517 
+0100
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp-slim -fdelete-null-pointer-checks" } */
+/* { dg-skip-if "" { keeps_null_pointer_checks } } */
+
+void foo (void *, __SIZE_TYPE__) __attribute__((nonnull_if_nonzero (1, 2)));
+void baz (void);
+
+void
+bar (void *a, void *b, void *c, void *d, void *e, __SIZE_TYPE__ n)
+{
+  foo (a, 42);
+  if (a == 0)
+__builtin_abort ();
+  if (n)
+{
+  foo (b, n);
+  if (b == 0)
+   __builtin_abort ();
+}
+  if (n >= 42)
+{
+  foo (c, n - 10);
+  if (c == 0)
+   __builtin_abort ();
+}
+  foo (d, 0);
+  if (d == 0)
+baz ();
+  if (n != 42)
+{
+  foo (e, n);
+  if (e == 0)
+   baz ();
+}
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */
+/* { dg-final { scan-tree-dump-times "baz \\\(" 2 "evrp" } } */


Jakub



Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Richard Sandiford
Kyrylo Tkachov  writes:
>> On 3 Mar 2025, at 19:58, Wilco Dijkstra  wrote:
>> 
>> 
>> Enable the early scheduler on AArch64 for O3/Ofast.  This means GCC15 
>> benefits
>> from much faster build times with -O2, but avoids the regressions in lbm 
>> which
>> is very sensitive to minor scheduling changes due to long FMA chains.  We can
>> then revisit this for GCC16.
>> 
>
> I’m in favour of this.

Yeah, seems ok to me too.  I suppose we ought to update the documentation too:

  @opindex fschedule-insns
  @item -fschedule-insns
  If supported for the target machine, attempt to reorder instructions to
  eliminate execution stalls due to required data being unavailable.  This
  helps machines that have slow floating point or memory load instructions
  by allowing other instructions to be issued until the result of the load
  or floating-point instruction is required.

  Enabled at levels @option{-O2}, @option{-O3}.

Richard

> So ok for me if no objections within 24h.
> Thanks,
> Kyrill
>
>> gcc:
>>PR target/118351
>>* common/config/aarch64/aarch64-common.cc: Enable early scheduling 
>> with
>>-O3 and higher.
>> 
>> ---
>> 
>> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
>> b/gcc/common/config/aarch64/aarch64-common.cc
>> index 
>> 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd
>>  100644
>> --- a/gcc/common/config/aarch64/aarch64-common.cc
>> +++ b/gcc/common/config/aarch64/aarch64-common.cc
>> @@ -54,8 +54,10 @@ static const struct default_options 
>> aarch_option_optimization_table[] =
>> { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 },
>> /* Enable -fsched-pressure by default when optimizing.  */
>> { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>> -/* Disable early scheduling due to high compile-time overheads.  */
>> +/* Except for -O3 and higher, disable early scheduling due to high
>> +   compile-time overheads.  */
>> { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
>> +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 },
>> /* Enable redundant extension instructions removal at -O2 and higher.  */
>> { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
>> { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
>> 


Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Kyrylo Tkachov


> On 3 Mar 2025, at 19:58, Wilco Dijkstra  wrote:
> 
> 
> Enable the early scheduler on AArch64 for O3/Ofast.  This means GCC15 benefits
> from much faster build times with -O2, but avoids the regressions in lbm which
> is very sensitive to minor scheduling changes due to long FMA chains.  We can
> then revisit this for GCC16.
> 

I’m in favour of this.
So ok for me if no objections within 24h.
Thanks,
Kyrill

> gcc:
>PR target/118351
>* common/config/aarch64/aarch64-common.cc: Enable early scheduling with
>-O3 and higher.
> 
> ---
> 
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 
> 3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -54,8 +54,10 @@ static const struct default_options 
> aarch_option_optimization_table[] =
> { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 },
> /* Enable -fsched-pressure by default when optimizing.  */
> { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> -/* Disable early scheduling due to high compile-time overheads.  */
> +/* Except for -O3 and higher, disable early scheduling due to high
> +   compile-time overheads.  */
> { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> +{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 },
> /* Enable redundant extension instructions removal at -O2 and higher.  */
> { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
> 



Re: [RFA] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-04 Thread Richard Biener
On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > Jan Hubicka  writes:
> >>>
> >>> Thanks for running these.  I saw poor results for perlbench with my
> >>> initial aarch64 hooks because the hooks reduced the cost to zero for
> >>> the entry case:
> >>>
> >>> auto entry_cost = targetm.callee_save_cost
> >>>   (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> >>>ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> >>>allocated_callee_save_regs, existing_spills_p);
> >>> /* In the event of a tie between caller-save and callee-save,
> >>>prefer callee-save.  We apply this to the entry cost rather
> >>>than the exit cost since the entry frequency must be at
> >>>least as high as the exit frequency.  */
> >>> if (entry_cost > 0)
> >>>   entry_cost -= 1;
> >>>
> >>> I "fixed" that by bumping the cost to a minimum of 2, but I was
> >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 1",
> >>> so that the cost is always greater than not using a callee save for
> >>> registers that don't cross a call.  WDYT?
> >>
> >> For x86 perfomance costs, the push cost should be memory_move_cost which
> >> is 6, -2 for adjustment in the target hook and -1 for this. So cost
> >> should not be 0 I think.
> >>
> >> For size cost, I currently return 1, so we indeed get 0 after
> >> adjustment.
> >>
> >> I think cost of 0 will make us to pick callee save even if caller save
> >> is available and there are no function calls, so I guess we do not want
> >> that
> >
> > OK, here's an updated patch that makes that change.  The x86 parts
> > should be replaced by your patch.
> >
> > Tested on aarch64-linux-gnu.  I also tried to test on pwoerpc64el-linux-gnu
> > (on gcc112), but I keep getting broken pipes during the test runs,
> > so I'm struggling to get good before/after comparisons.  It does at
> > least bootstrap though...
>
> Here's the patch with Honza's x86 changes.  Boostrapped & regresiion-tested
> on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120).  The powerpc64le
> results regressed:
>
> FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 i == 5
> FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 sizeof 
> (a) == 17 * sizeof (short)
>
> but the same test already failed for -O2 and -O3.
>
> OK to install now?  Or, given the lateness in the release cycle,
> would it be better to wait for GCC 16?

I think it's OK to install now.  Not installing anything isn't an option, the
alternative would be to at least revert HJs change.

Thanks,
Richard.

>
> Thanks,
> Richard
>
>
> Following on from the discussion in:
>
>   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html
>
> this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
> replaces it with two hooks: one that controls the cost of using an
> extra callee-saved register and one that controls the cost of allocating
> a frame for the first spill.
>
> (The patch does not attempt to address the shrink-wrapping part of
> the thread above.)
>
> On AArch64, this is enough to fix PR117477, as verified by the new tests.
> The patch does not change the SPEC2017 scores significantly.  (I saw a
> slight improvement in fotonik3d and roms, but I'm not convinced that
> the improvements are real.)
>
> The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
> which is a scan-dump correctness test that relies on not using
> caller saves.  The decision to use caller saves looks appropriate,
> and saves an instruction, so I've just added -fno-caller-saves
> to the test options.
>
> The x86 parts were written by Honza.
>
> gcc/
> PR rtl-optimization/117477
> * config/aarch64/aarch64.cc (aarch64_count_saves): New function.
> (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
> (aarch64_frame_allocation_cost): Likewise.
> (TARGET_CALLEE_SAVE_COST): Define.
> (TARGET_FRAME_ALLOCATION_COST): Likewise.
> * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> Replace with...
> (ix86_callee_save_cost): ...this new hook.
> (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
> (TARGET_CALLEE_SAVE_COST): Define.
> * target.h (spill_cost_type, frame_cost_type): New enums.
> * target.def (callee_save_cost, frame_allocation_cost): New hooks.
> (ira_callee_saved_register_cost_scale): Delete.
> * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): 
> Delete.
> (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
> * doc/tm.texi: Regenerate.
> * hard-reg-set.h (hard_reg_set_popcount): New function.
> * ira-color.cc (allocated_memory_p): New variable.
> (allocated_callee_save_regs): Likewise.
> (record_allocation): New function.
>  

[PATCH] go/119098 - bump libgo version for GCC 15 release

2025-03-04 Thread Richard Biener
As PR119098 shows the gccgo driver uses the install path compiled
into the shared libgo and thus updating that but not the driver
will cause the driver to fail to find other tools like cgo.

Thus bump the SONAME for GCC 15 as well, as we've done for each
release in the past.

I think libgo changes go through another "upstream", so Ian, can you
please take care of this?

Thanks,
Richard.

PR go/119098
libgo/
* configure.ac (libtool_VERSION): Bump.
* configure: Regenerate.
---
 libgo/configure| 2 +-
 libgo/configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgo/configure b/libgo/configure
index c0d0a1560f5..b1a2228fa1b 100755
--- a/libgo/configure
+++ b/libgo/configure
@@ -2611,7 +2611,7 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 ac_config_headers="$ac_config_headers config.h"
 
 
-libtool_VERSION=23:0:0
+libtool_VERSION=24:0:0
 
 
 # Default to --enable-multilib
diff --git a/libgo/configure.ac b/libgo/configure.ac
index 898091276f7..0b05551aacb 100644
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -10,7 +10,7 @@ AC_INIT(package-unused, version-unused,, libgo)
 AC_CONFIG_SRCDIR(Makefile.am)
 AC_CONFIG_HEADER(config.h)
 
-libtool_VERSION=23:0:0
+libtool_VERSION=24:0:0
 AC_SUBST(libtool_VERSION)
 
 AM_ENABLE_MULTILIB(, ..)
-- 
2.43.0


[RFA] ira: Add new hooks for callee-save vs spills [PR117477]

2025-03-04 Thread Richard Sandiford
Richard Sandiford  writes:
> Jan Hubicka  writes:
>>> 
>>> Thanks for running these.  I saw poor results for perlbench with my
>>> initial aarch64 hooks because the hooks reduced the cost to zero for
>>> the entry case:
>>> 
>>> auto entry_cost = targetm.callee_save_cost
>>>   (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
>>>ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
>>>allocated_callee_save_regs, existing_spills_p);
>>> /* In the event of a tie between caller-save and callee-save,
>>>prefer callee-save.  We apply this to the entry cost rather
>>>than the exit cost since the entry frequency must be at
>>>least as high as the exit frequency.  */
>>> if (entry_cost > 0)
>>>   entry_cost -= 1;
>>> 
>>> I "fixed" that by bumping the cost to a minimum of 2, but I was
>>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 1",
>>> so that the cost is always greater than not using a callee save for
>>> registers that don't cross a call.  WDYT?
>>
>> For x86 perfomance costs, the push cost should be memory_move_cost which
>> is 6, -2 for adjustment in the target hook and -1 for this. So cost
>> should not be 0 I think.
>>
>> For size cost, I currently return 1, so we indeed get 0 after
>> adjustment.
>>
>> I think cost of 0 will make us to pick callee save even if caller save
>> is available and there are no function calls, so I guess we do not want
>> that
>
> OK, here's an updated patch that makes that change.  The x86 parts
> should be replaced by your patch.
>
> Tested on aarch64-linux-gnu.  I also tried to test on pwoerpc64el-linux-gnu
> (on gcc112), but I keep getting broken pipes during the test runs,
> so I'm struggling to get good before/after comparisons.  It does at
> least bootstrap though...

Here's the patch with Honza's x86 changes.  Boostrapped & regresiion-tested
on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120).  The powerpc64le
results regressed:

FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 i == 5
FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 sizeof (a) 
== 17 * sizeof (short)

but the same test already failed for -O2 and -O3.

OK to install now?  Or, given the lateness in the release cycle,
would it be better to wait for GCC 16?

Thanks,
Richard


Following on from the discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html

this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
replaces it with two hooks: one that controls the cost of using an
extra callee-saved register and one that controls the cost of allocating
a frame for the first spill.

(The patch does not attempt to address the shrink-wrapping part of
the thread above.)

On AArch64, this is enough to fix PR117477, as verified by the new tests.
The patch does not change the SPEC2017 scores significantly.  (I saw a
slight improvement in fotonik3d and roms, but I'm not convinced that
the improvements are real.)

The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
which is a scan-dump correctness test that relies on not using
caller saves.  The decision to use caller saves looks appropriate,
and saves an instruction, so I've just added -fno-caller-saves
to the test options.

The x86 parts were written by Honza.

gcc/
PR rtl-optimization/117477
* config/aarch64/aarch64.cc (aarch64_count_saves): New function.
(aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
(aarch64_frame_allocation_cost): Likewise.
(TARGET_CALLEE_SAVE_COST): Define.
(TARGET_FRAME_ALLOCATION_COST): Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
Replace with...
(ix86_callee_save_cost): ...this new hook.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST): Define.
* target.h (spill_cost_type, frame_cost_type): New enums.
* target.def (callee_save_cost, frame_allocation_cost): New hooks.
(ira_callee_saved_register_cost_scale): Delete.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
* doc/tm.texi: Regenerate.
* hard-reg-set.h (hard_reg_set_popcount): New function.
* ira-color.cc (allocated_memory_p): New variable.
(allocated_callee_save_regs): Likewise.
(record_allocation): New function.
(assign_hard_reg): Use targetm.frame_allocation_cost to model
the cost of the first spill or first caller save.  Use
targetm.callee_save_cost to model the cost of using new callee-saved
registers.  Apply the exit rather than entry frequency to the cost
of restoring a register or deallocating the frame.  Update the
new variables above.
(improve_allocatio

[PATCH] c++: Handle RAW_DATA_CST in modules.cc [PR119076]

2025-03-04 Thread Jakub Jelinek
Hi!

The following testcases (one with #embed, one with large initializer
turned into RAW_DATA_CST) show that I forgot to handle RAW_DATA_CST in
module streaming.

Similar to the PCH case we need to stream out RAW_DATA_CST with NULL
RAW_DATA_OWNER (i.e. a tree which has data owned by libcpp buffer) so
that it will be streamed back in as STRING_CST which owns the data,
but because the data can be really large (hopefully not so much for
header modules though), without actually trying to build a STRING_CST
on the module writing side because that would mean another large
allocation and copying of the large data.
RAW_DATA_CST with RAW_DATA_OWNER then needs to be streamed out and in
by streaming the owner and offset from owner's data and length.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-03-04  Jakub Jelinek  

PR c++/119076
* module.cc (trees_out::start): Handle RAW_DATA_CST.
(trees_in::start): Likewise.
(trees_out::core_vals): Likewise.
(trees_in::core_vals): Likewise.

* g++.dg/modules/pr119076-1_a.H: New test.
* g++.dg/modules/pr119076-1_b.C: New test.
* g++.dg/modules/pr119076-2_a.H: New test.
* g++.dg/modules/pr119076-2_b.C: New test.

--- gcc/cp/module.cc.jj 2025-01-28 09:23:38.195599960 +0100
+++ gcc/cp/module.cc2025-03-04 00:52:50.947063367 +0100
@@ -5319,6 +5319,30 @@ trees_out::start (tree t, bool code_stre
   str (TREE_STRING_POINTER (t), TREE_STRING_LENGTH (t));
   break;
 
+case RAW_DATA_CST:
+  if (RAW_DATA_OWNER (t) == NULL_TREE)
+   {
+ /* Stream RAW_DATA_CST with no owner (i.e. data pointing
+into libcpp buffers) as something we can stream in as
+STRING_CST which owns the data.  */
+ u (0);
+ /* Can't use str (RAW_DATA_POINTER (t), RAW_DATA_LENGTH (t));
+here as there isn't a null termination after it.  */
+ z (RAW_DATA_LENGTH (t));
+ if (RAW_DATA_LENGTH (t))
+   if (void *ptr = buf (RAW_DATA_LENGTH (t) + 1))
+ {
+   memcpy (ptr, RAW_DATA_POINTER (t), RAW_DATA_LENGTH (t));
+   ((char *) ptr)[RAW_DATA_LENGTH (t)] = '\0';
+ }
+   }
+  else
+   {
+ gcc_assert (RAW_DATA_LENGTH (t));
+ u (RAW_DATA_LENGTH (t));
+   }
+  break;
+
 case VECTOR_CST:
   u (VECTOR_CST_LOG2_NPATTERNS (t));
   u (VECTOR_CST_NELTS_PER_PATTERN (t));
@@ -5400,6 +5424,24 @@ trees_in::start (unsigned code)
   }
   break;
 
+case RAW_DATA_CST:
+  {
+   size_t l = u ();
+   if (l == 0)
+ {
+   /* Stream in RAW_DATA_CST with no owner as STRING_CST
+  which owns the data.  */
+   const char *chars = str (&l);
+   t = build_string (l, chars);
+ }
+   else
+ {
+   t = make_node (RAW_DATA_CST);
+   RAW_DATA_LENGTH (t) = l;
+ }
+  }
+  break;
+
 case VECTOR_CST:
   {
unsigned log2_npats = u ();
@@ -6311,6 +6353,22 @@ trees_out::core_vals (tree t)
   /* Streamed during start.  */
   break;
 
+case RAW_DATA_CST:
+  if (RAW_DATA_OWNER (t) == NULL_TREE)
+   break; /* Streamed as STRING_CST during start.  */
+  WT (RAW_DATA_OWNER (t));
+  if (streaming_p ())
+   {
+ if (TREE_CODE (RAW_DATA_OWNER (t)) == RAW_DATA_CST)
+   z (RAW_DATA_POINTER (t) - RAW_DATA_POINTER (RAW_DATA_OWNER (t)));
+ else if (TREE_CODE (RAW_DATA_OWNER (t)) == STRING_CST)
+   z (RAW_DATA_POINTER (t)
+  - TREE_STRING_POINTER (RAW_DATA_OWNER (t)));
+ else
+   gcc_unreachable ();
+   }
+  break;
+
 case VECTOR_CST:
   for (unsigned ix = vector_cst_encoded_nelts (t); ix--;)
WT (VECTOR_CST_ENCODED_ELT (t, ix));
@@ -6845,6 +6903,13 @@ trees_in::core_vals (tree t)
   /* Streamed during start.  */
   break;
 
+case RAW_DATA_CST:
+  RT (RAW_DATA_OWNER (t));
+  gcc_assert (TREE_CODE (RAW_DATA_OWNER (t)) == STRING_CST
+ && TREE_STRING_LENGTH (RAW_DATA_OWNER (t)));
+  RAW_DATA_POINTER (t) = TREE_STRING_POINTER (RAW_DATA_OWNER (t)) + z ();
+  break;
+
 case VECTOR_CST:
   for (unsigned ix = vector_cst_encoded_nelts (t); ix--;)
RT (VECTOR_CST_ENCODED_ELT (t, ix));
--- gcc/testsuite/g++.dg/modules/pr119076-1_a.H.jj  2025-03-03 
10:22:17.514294281 +0100
+++ gcc/testsuite/g++.dg/modules/pr119076-1_a.H 2025-03-03 12:04:10.041822216 
+0100
@@ -0,0 +1,41 @@
+// { dg-additional-options "-fmodule-header -Wno-pedantic" }
+// { dg-module-cmi {} }
+
+constexpr unsigned char a[] = {
+#embed __FILE__
+};
+
+constexpr int
+foo (const unsigned char *p, int s)
+{
+  int r = 0;
+  for (int i = 0; i < s; ++i)
+r += p[i];
+  return r;
+}
+
+constexpr int b = foo (a, sizeof a);
+
+inline int
+bar ()
+{
+  const unsigned char b[] = {
+#embed __FILE__
+  };
+  int r = 0;
+  for (int i =

Re: [patch, Fortran] Fix PR 119049 and 119074, external prototypes with different arglists

2025-03-04 Thread Andre Vehreschild
Hi Thomas,

while the patch looks ok to me, why did you not choose to generate a "22.7.2
Variable-Length Parameter Lists"

https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Variable-Number-of-Arguments.html

when the arguments differ? Then its the callee responsibility to figure
stuff out. Just out of curiosity. Is this prohibited by any standard? My
question is just out of interest. No need to change the patch. Having a warning
is better, then running into "WTF?".

Look good to me, ok for mainline.

Thanks for the patch,
Andre

On Mon, 3 Mar 2025 22:50:16 +0100
Thomas Koenig  wrote:

> Hello world,
>
> this patch is a bit more complicated than originally envisioned.
>
> The problem was that we were not handling external dummy arguments
> with -fc-prototypes-external. In looking at this, I found that we
> were not warning about external procedures with different argument
> lists.  This can actually be legal (see the two test cases) but
> creates a problem for the C prototypes: If we have something like
>
> subroutine foo(a,n)
>external a
>if (n == 1) call a(1)
>if (n == 2) call a(2,3)
> end subroutine foo
>
> then, pre-C23, we could just have written out the prototype as
>
> void foo_ (void (*a) (), int *n);
>
> but this is illegal in C23. What to do?  I finally chose to warn
> about the argument mismatch, with a new option. Warn only because the
> code above is legal, but include in -Wall because such code seems highly
> suspect.  This option is also implied in -fc-prototypes-external. I also
> put a warning in the generated header file in that case, so users
> have a chance to see what is going on (especially since gcc now
> defaults to C23).
>
> Regression-tested.
>
> Comments?  Suggestions for better wordings?  Is -Wall too strong,
> should this be -Wextra (but then nobody would see it, probably...)?
> OK for trunk?
>
> Best regards
>
>   Thomas


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH 0/2] __builtin_bswapXX: improve docs and tests

2025-03-04 Thread Oscar Gustafsson
Let's see if I get this correct... Apologies in advance if not.

While browsing the intrinsic/built-in functions documentation, I had to
think a bit extra to see what the function did as the example was ambiguous.

Searching for the place to change it, I also realized that the tests were
not ideal as the will not detect the (admittedly unlikely) event that a
nibble swap instruction is used.

BR Oscar Gustafsson


[PATCH 1/2] __builtin_bswapXX: improve docs and tests

2025-03-04 Thread Oscar Gustafsson
ChangeLog:

* doc/extend.texi: Improve example for __builtin_swap16.

---

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ec9bb59900c..83f6e45170b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16338,7 +16338,7 @@ Returns the first argument raised to the power of
the second.  Unlike the

 @defbuiltin{uint16_t __builtin_bswap16 (uint16_t @var{x})}
 Returns @var{x} with the order of the bytes reversed; for example,
-@code{0xaabb} becomes @code{0xbbaa}.  Byte here always means
+@code{0xabcd} becomes @code{0xcdab}.  Byte here always means
 exactly 8 bits.
 @enddefbuiltin


[PATCH 1/2] __builtin_bswapXX: improve docs and tests

2025-03-04 Thread Oscar Gustafsson
gcc/testsuite/ChangeLog

* gcc.dg/builtin-bswap-5.c: improve test vector to avoid nibble swaps
passing

---

diff --git a/gcc/testsuite/gcc.dg/builtin-bswap-5.c
b/gcc/testsuite/gcc.dg/builtin-bswap-5.c
index b29931e4e10..2ba6c31194f 100644
--- a/gcc/testsuite/gcc.dg/builtin-bswap-5.c
+++ b/gcc/testsuite/gcc.dg/builtin-bswap-5.c
@@ -6,13 +6,13 @@ main (void)
   /* Test constant folding.  */
   extern void link_error (void);

-  if (__builtin_bswap16(0xaabb) != 0xbbaa)
+  if (__builtin_bswap16(0xabcd) != 0xcdab)
 link_error ();

-  if (__builtin_bswap32(0xaabbccdd) != 0xddccbbaa)
+  if (__builtin_bswap32(0x89abcdef) != 0xefcdab89)
 link_error ();

-  if (__builtin_bswap64(0x1122334455667788ULL) != 0x8877665544332211ULL)
+  if (__builtin_bswap64(0x0123456789abcdefULL) != 0xefcdab8967452301ULL)
 link_error ();

   return 0;