[Fortran, Patch, PR77872, v1] Fix ICE when getting caf-token from abstract class type.

2025-03-03 Thread Andre Vehreschild
Hi all,

the attached patches fix a 12-regression where a caf token is requested from
an abstract class-typed dummy. The token was not looked up in the correct
spot: because the class-typed object gets an artificial variable for direct
derived-type access, get_caf_decl was looking at the wrong decl.

This patch consists of two parts: the first is just a code-complexity
reduction, where an existing attr is now used instead of checking for BT_CLASS
type and branching.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 9b7aeeef184b1e7afbc771e4ef723e4367e8f832 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Mon, 3 Mar 2025 14:42:28 +0100
Subject: [PATCH 2/2] Fortran: Prevent ICE when getting caf-token from abstract
 type [PR77872]

	PR fortran/77872

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_get_tree_for_caf_expr): Pick up token from
	decl when it is present there for class types.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/class_1.f90: New test.
---
 gcc/fortran/trans-expr.cc |  5 +++++
 gcc/testsuite/gfortran.dg/coarray/class_1.f90 | 16 ++++++++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/class_1.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 7c0b17428cd..0d790b63f95 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -2394,6 +2394,11 @@ gfc_get_tree_for_caf_expr (gfc_expr *expr)
 	  if (CLASS_DATA (expr->symtree->n.sym)->attr.codimension)
 	return caf_decl;
 	}
+  else if (DECL_P (caf_decl) && DECL_LANG_SPECIFIC (caf_decl)
+	   && GFC_DECL_TOKEN (caf_decl)
+	   && CLASS_DATA (expr->symtree->n.sym)->attr.codimension)
+	return caf_decl;
+
   for (ref = expr->ref; ref; ref = ref->next)
 	{
 	  if (ref->type == REF_COMPONENT
diff --git a/gcc/testsuite/gfortran.dg/coarray/class_1.f90 b/gcc/testsuite/gfortran.dg/coarray/class_1.f90
new file mode 100644
index 000..fa70b1d6162
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/class_1.f90
@@ -0,0 +1,16 @@
+!{ dg-do compile }
+!
+! Compiling the call x%f() ICEd.  Check it's fixed.
+! Contributed by Gerhard Steinmetz  
+
+module pr77872_abs
+   type, abstract :: t
+   contains
+  procedure(s), pass, deferred :: f
+   end type
+contains
+   subroutine s(x)
+  class(t) :: x[*]
+  call x%f()
+   end
+end module pr77872_abs
--
2.48.1

From 504b6270f535bf41ba5943d87e6bbbf7fc1df62a Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Mon, 3 Mar 2025 10:41:05 +0100
Subject: [PATCH 1/2] Fortran: Reduce code complexity [PR77872]

	PR fortran/77872

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_procedure_call): Use attr instead of
	doing type check and branching for BT_CLASS.
---
 gcc/fortran/trans-expr.cc | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index e619013f261..7c0b17428cd 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -8216,23 +8216,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   /* For descriptorless coarrays and assumed-shape coarray dummies, we
 	 pass the token and the offset as additional arguments.  */
   if (fsym && e == NULL && flag_coarray == GFC_FCOARRAY_LIB
-	  && ((fsym->ts.type != BT_CLASS && fsym->attr.codimension
-	   && !fsym->attr.allocatable)
-	  || (fsym->ts.type == BT_CLASS
-		  && CLASS_DATA (fsym)->attr.codimension
-		  && !CLASS_DATA (fsym)->attr.allocatable)))
+	  && attr->codimension && !attr->allocatable)
 	{
 	  /* Token and offset.  */
 	  vec_safe_push (stringargs, null_pointer_node);
 	  vec_safe_push (stringargs, build_int_cst (gfc_array_index_type, 0));
 	  gcc_assert (fsym->attr.optional);
 	}
-  else if (fsym && flag_coarray == GFC_FCOARRAY_LIB
-	   && ((fsym->ts.type != BT_CLASS && fsym->attr.codimension
-		&& !fsym->attr.allocatable)
-		   || (fsym->ts.type == BT_CLASS
-		   && CLASS_DATA (fsym)->attr.codimension
-		   && !CLASS_DATA (fsym)->attr.allocatable)))
+  else if (fsym && flag_coarray == GFC_FCOARRAY_LIB && attr->codimension
+	   && !attr->allocatable)
 	{
 	  tree caf_decl, caf_type, caf_desc = NULL_TREE;
 	  tree offset, tmp2;
--
2.48.1




[committed] combine: Reverse negative logic in ternary operator

2025-03-03 Thread Uros Bizjak
Reverse negative logic in !a ? b : c to become a ? c : b.

No functional changes.

gcc/ChangeLog:

* combine.cc (distribute_notes):
Reverse negative logic in ternary operators.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed as an obvious patch.

Uros.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 1b2bd34748e..892d37641e9 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -14515,9 +14515,9 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
rtx_insn *i3, rtx_insn *i2,
  if (from_insn != i3)
break;
 
- if (! (REG_P (XEXP (note, 0))
-? find_regno_note (i3, REG_UNUSED, REGNO (XEXP (note, 0)))
-: find_reg_note (i3, REG_UNUSED, XEXP (note, 0))))
+ if (REG_P (XEXP (note, 0))
+ ? find_reg_note (i3, REG_UNUSED, XEXP (note, 0))
+ : find_regno_note (i3, REG_UNUSED, REGNO (XEXP (note, 0))))
place = i3;
}
  /* Otherwise, if this register is used by I3, then this register
@@ -14525,9 +14525,9 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
rtx_insn *i3, rtx_insn *i2,
 is one already.  */
  else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3)))
{
- if (! (REG_P (XEXP (note, 0))
-? find_regno_note (i3, REG_DEAD, REGNO (XEXP (note, 0)))
-: find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
+ if (REG_P (XEXP (note, 0))
+ ? find_reg_note (i3, REG_DEAD, XEXP (note, 0))
+ : find_regno_note (i3, REG_DEAD, REGNO (XEXP (note, 0))))
{
  PUT_REG_NOTE_KIND (note, REG_DEAD);
  place = i3;
@@ -14564,11 +14564,11 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
rtx_insn *i3, rtx_insn *i2,
{
  if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
PUT_REG_NOTE_KIND (note, REG_DEAD);
- if (! (REG_P (XEXP (note, 0))
-? find_regno_note (i2, REG_NOTE_KIND (note),
-   REGNO (XEXP (note, 0)))
-: find_reg_note (i2, REG_NOTE_KIND (note),
- XEXP (note, 0))))
+ if (REG_P (XEXP (note, 0))
+ ? find_reg_note (i2, REG_NOTE_KIND (note),
+  XEXP (note, 0))
+ : find_regno_note (i2, REG_NOTE_KIND (note),
+REGNO (XEXP (note, 0))))
place = i2;
}
}


Re: [libstdc++] Testsuite should pay attention to extra flags

2025-03-03 Thread Thomas Schwinge
Hi!

On 2002-04-17T21:37:50-0400, Phil Edwards  wrote:
> If the user decides to build the library with extra compiler options
> via --enable-cxx-flags, the testsuite should (by default) use those same
> options when running.

Hmm, are we sure that's what we actually want?

> Verified by passing strange things via --enable
> and watching their effects on the testsuite.

> --- testsuite_flags.in7 Jan 2002 00:07:27 -   1.11
> +++ testsuite_flags.in18 Apr 2002 01:34:43 -
> @@ -49,7 +49,7 @@ case ${query} in
>;;
>  --cxxflags)
>CXXFLAGS=' -g @SECTION_FLAGS@ @SECTION_LDFLAGS@
> -  -fmessage-length=0
> +  -fmessage-length=0 @EXTRA_CXX_FLAGS@
>-DDEBUG_ASSERT  -DLOCALEDIR="@glibcpp_localedir@" '
>echo ${CXXFLAGS}
>;;

This got installed in Subversion r52450
(Git commit 822ca943a31961fd7339de6fbe6593118ec7f231).

To me at least, this comes as a surprise: per my reading of (the current)
'libstdc++-v3/acinclude.m4:GLIBCXX_ENABLE_CXX_FLAGS', I'd have thought
these flags apply only to the build of libstdc++ itself, but apparently
"flags to pass to the compiler while building" or
"extra compiler flags for building" also includes use by
'make check-target-libstdc++-v3'?

I'm OK to leave this as-is, if only for hysterical raisins, but would
then propose a patch to document this behavior?


Alas, there's an additional issue: with Subversion r42272
(Git commit 28e8acb68f8a427552bc9f2d4cae12a1ac477855) -- so, already
in place when the change above got installed -- we'd gotten:

* testsuite/lib/libstdc++-v3-dg.exp (libstdc++-v3-init): Set flags
appropriately for remote testing and testing installed files without
a build dir.

-[...]
-set cxxflags [exec sh ${blddir}/testsuite_flags --cxxflags]
-[...]
+if [is_remote host] {
+  [...]
+  set cxxflags "-ggdb3 -DDEBUG_ASSERT"
+  [...]
+} else {
+# If we find a testsuite_flags file, we're testing in the build 
dir.
+set flags_file "${blddir}/testsuite_flags"
+if { [file exists $flags_file] } {
+[...]
+set cxxflags [exec sh $flags_file --cxxflags]
+[...]
+} else {
+[...]
+set cxxflags "-ggdb3 -DDEBUG_ASSERT"
+[...]
+}
+}

(Nowadays: 'libstdc++-v3/testsuite/lib/libstdc++.exp:libstdc++_init',
similarly.)

That means, any 'EXTRA_CXX_FLAGS' specified by '--enable-cxx-flags=[...]'
are *not* considered for remote host testing -- which in turn appears to
be in conflict with the original intent, quoted at the beginning of this
email?  Hmm...


I think we should clarify the intended cases where 'EXTRA_CXX_FLAGS' (as
specified by '--enable-cxx-flags=[...]') apply, and then make that apply
in a uniform way?  I think I'd be in favor of removing them from
'testsuite_flags --cxxflags' -- but can't quite tell which use cases
that'd break...


Grüße
 Thomas


Re: [committed] combine: Reverse negative logic in ternary operator

2025-03-03 Thread Richard Biener



> Am 03.03.2025 um 17:08 schrieb Uros Bizjak :
> 
> Reverse negative logic in !a ? b : c to become a ? c : b.
> 
> No functional changes.
> 
> gcc/ChangeLog:
> 
>* combine.cc (distribute_notes):
>Reverse negative logic in ternary operators.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Committed as an obvious patch.

Err, isn’t the result of the ternary negated and thus your patch is wrong?
> 
> Uros.
> 


Re: [PATCH htdocs] bugs: Link to all 'Porting to' docs in 'Common problems when upgrading ...'

2025-03-03 Thread Gerald Pfeifer
On Wed, 12 Feb 2025, Sam James wrote:
> Suggested by Andrew Pinski. I think it makes sense to have it in here 
> even if perhaps a bit verbose, because we really try to tell bug 
> reporters to read the page properly.

Makes sense.

> This could also be a table.

I'm not sure what a table would look like. (Genuinely, if you feel it'd be 
better, I'm happy to have a look.)

> +GCC maintains a 'Porting to' resource for new versions of the compiler:

How about skipping "of the compiler" (not least since it's a set of 
compilers ;-) and then something like

  "...new version: GCC 15 | GCC 14 | GCC 13 | ..."

(or with commas or something along these lines)?

> +  <a href="https://gcc.gnu.org/gcc-15/porting_to.html">GCC 15</a>

Can you please make all these links relative, such as 
../gcc-15/porting_to.html ?

Fine with these changes. 

Can you please also enhance the documentation for our release managers?
Probably branching.html (over releasing HTML since some distros give new 
GCC versions a spin before the first release in the series).

Thank you,
Gerald


Re: [committed] combine: Reverse negative logic in ternary operator

2025-03-03 Thread Uros Bizjak
On Mon, Mar 3, 2025 at 5:44 PM Richard Biener
 wrote:
>
>
>
> > Am 03.03.2025 um 17:08 schrieb Uros Bizjak :
> >
> > Reverse negative logic in !a ? b : c to become a ? c : b.
> >
> > No functional changes.
> >
> > gcc/ChangeLog:
> >
> >* combine.cc (distribute_notes):
> >Reverse negative logic in ternary operators.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Committed as an obvious patch.
>
> Err, isn’t the result of the ternary negated and thus your patch is wrong?

Ouch ... indeed. What I was looking at... :(

Reverted, and thanks!

Uros.


Re: [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c

2025-03-03 Thread Richard Earnshaw
On 28/02/2025 16:18, Richard Earnshaw wrote:
> On 28/02/2025 16:12, Richard Earnshaw wrote:
>> On 08/11/2024 18:47, Torbjörn SVENSSON wrote:
>>> Ok for trunk and releases/gcc-14?
>>>
>>> -- 
>>>
>>> A long time ago, this test forced -march=armv6.
>>>
>>> With -marm, the generated assembler is:
>>> foo:
>>>  sub r0, r0, #48
>>>  cmp r0, #9
>>>  movhi   r0, #0
>>>  movls   r0, #1
>>>  bx  lr
>>>
>>> With -mthumb, the generated assembler is:
>>> foo:
>>>  subs    r0, r0, #48
>>>  movs    r2, #9
>>>  uxtb    r3, r0
>>>  movs    r0, #0
>>>  cmp r2, r3
>>>  adcs    r0, r0, r0
>>>  uxtb    r0, r0
>>>  bx  lr
>>>
>>
>> Looking at this code, I think both the uxtb instructions are unnecessary.
>>
>> For the first UXTB.  On entry, 'c' is a char (unsigned) so the value is 
>> passed by the caller in a 32-bit reg, but range-limited (by the ABI) to 
>> values between 0 and 255.
>> We subtract 48 from that, so the range now, is from -48 to 207, but we're 
>> going to look at this as an unsigned value, so it's effectively 0..207 or 
>> UINT_MAX-48..UINT_MAX.
>> The UXTB instruction then converts the range so that the range becomes 
>> 0..255 again (but importantly, the values between UINT_MAX-48 and UINT_MAX 
>> are mapped to the range 208..255).  We then do an unsigned comparison between 
>> that value and the constant 9 to test whether the result is >= 9.
>> Now, importantly, all the values that are in the range 208..255 are >= 9, 
>> but were also >= 9 before the UXTB instruction.  In effect, the UXTB is 
>> completely redundant.
>>
> I said, >= 9 above, but it should be > 9 (hence condition code 'hi'). That 
> does affect the rest of the argument.
> 
> R.
> 
>> The second UXTB is even more egregious.  We have 0 in r0 and we then 
>> conditionally add 1 to it, so r0 is in the range 0..1.  Zero extending that 
>> with a UXTB is therefore clearly pointless.
>>
>> So neither of the UXTB instructions should be present, even if generating 
>> Thumb1 code.
>>
>> I think the failures are, therefore, real and we should look to fix the 
>> compiler rather than 'fix' the scope of the test.
>>
>> R.
>>
>>> Require effective-target arm_arm_ok to skip the test for thumb-only
>>> targets (Cortex-M).
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/arm/unsigned-extend-1.c: Use effective-target
>>> arm_arm_ok.
>>>
>>> Signed-off-by: Torbjörn SVENSSON 
>>> ---
>>>   gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ 
>>> testsuite/gcc.target/arm/unsigned-extend-1.c
>>> index 3b4ab048fb0..73f2e1a556d 100644
>>> --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
>>> +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
>>> @@ -1,4 +1,5 @@
>>>   /* { dg-do compile } */
>>> +/* { dg-require-effective-target arm_arm_ok } */
>>>   /* { dg-options "-O2" } */
>>>   unsigned char foo (unsigned char c)
>>
> 

I've just pushed a patch to eliminate one of the redundant zero-extends.  The 
other is harder to eliminate, so I've adjusted the test to xfail when compiling 
for thumb1.

R.


Re: [PATCH]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-03-03 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> When the input is already a subreg and we try to make a paradoxical
> subreg out of it for copysign this can fail if it violates the sugreg

subreg

> relationship.
>
> Use force_lowpart_subreg instead of lowpart_subreg to then force the
> results to a register instead of ICEing.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/118892
>   * config/aarch64/aarch64.md (copysign3): Use
>   force_lowpart_subreg instead of lowpart_subreg.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/118892
>   * gcc.target/aarch64/copysign-pr118892.c: New test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> cfe730f3732ce45c914b30a908851a4a7dd77c0f..62be9713cf417922b3c06e38f12f401872751fa2
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7479,8 +7479,8 @@ (define_expand "copysign3"
>&& real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
>  {
>emit_insn (gen_ior3 (
> - lowpart_subreg (mode, operands[0], mode),
> - lowpart_subreg (mode, operands[1], mode),
> + force_lowpart_subreg (mode, operands[0], mode),
> + force_lowpart_subreg (mode, operands[1], mode),

force_lowpart_subreg conditionally forces the SUBREG_REG into a new temporary
register and then takes the subreg of that.  It's therefore only appropriate
for source operands, not destination operands.

It's true that the same problem could in principle occur for the
destination, but that would need to be fixed in a different way.

OK with just the operands[1] change, without the operands[0] change.

Thanks,
Richard


>   v_bitmask));
>DONE;
>  }
> diff --git a/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c 
> b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> new file mode 100644
> index 
> ..adfa30dc3e2db895af4f2057bdd1011fdb7d4537
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +
> +double l();
> +double f()
> +{
> +  double t6[2] = {l(), l()};
> +  double t7[2];
> +  __builtin_memcpy(&t7, &t6, sizeof(t6));
> +  return -__builtin_fabs(t7[1]);
> +}


RE: [PATCH]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-03-03 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, March 3, 2025 10:12 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH]AArch64: force operand to fresh register to avoid subreg
> issues [PR118892]
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > When the input is already a subreg and we try to make a paradoxical
> > subreg out of it for copysign this can fail if it violates the sugreg
> 
> subreg
> 
> > relationship.
> >
> > Use force_lowpart_subreg instead of lowpart_subreg to then force the
> > results to a register instead of ICEing.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/118892
> > * config/aarch64/aarch64.md (copysign3): Use
> > force_lowpart_subreg instead of lowpart_subreg.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/118892
> > * gcc.target/aarch64/copysign-pr118892.c: New test.
> >
> > ---
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index
> cfe730f3732ce45c914b30a908851a4a7dd77c0f..62be9713cf417922b3c06e38f
> 12f401872751fa2 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -7479,8 +7479,8 @@ (define_expand "copysign3"
> >&& real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
> >  {
> >emit_insn (gen_ior3 (
> > -   lowpart_subreg (mode, operands[0], mode),
> > -   lowpart_subreg (mode, operands[1], mode),
> > +   force_lowpart_subreg (mode, operands[0],
> mode),
> > +   force_lowpart_subreg (mode, operands[1],
> mode),
> 
> force_lowpart_subreg conditionally forces the SUBREG_REG into a new temporary
> register and then takes the subreg of that.  It's therefore only appropriate
> for source operands, not destination operands.
> 
> It's true that the same problem could in principle occur for the
> destination, but that would need to be fixed in a different way.
> 

Ah, true. Should have thought about it a bit more.

> OK with just the operands[1] change, without the operands[0] change.
> 

I forgot to ask if OK for GCC 14 backport after some stewing.

Thanks,
Tamar
> Thanks,
> Richard
> 
> 
> > v_bitmask));
> >DONE;
> >  }
> > diff --git a/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> > new file mode 100644
> > index
> ..adfa30dc3e2db895af4f205
> 7bdd1011fdb7d4537
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Ofast" } */
> > +
> > +double l();
> > +double f()
> > +{
> > +  double t6[2] = {l(), l()};
> > +  double t7[2];
> > +  __builtin_memcpy(&t7, &t6, sizeof(t6));
> > +  return -__builtin_fabs(t7[1]);
> > +}


Re: [PATCH htdocs] bugs: Link to all 'Porting to' docs in 'Common problems when upgrading ...'

2025-03-03 Thread Sam James
Sam James  writes:

> Suggested by Andrew Pinski. I think it makes sense to have it in here even
> if perhaps a bit verbose, because we really try to tell bug reporters to
> read the page properly.
>
> This could also be a table.

Ping on this if I may.


Re: [PATCH htdocs 1/2] bugs: improve "ABI changes" subsection

2025-03-03 Thread Sam James
Jonathan Wakely  writes:

> This looks more accurate than the current wording, yes.
>
> Specifically, only objects/libraries "built with experimental standard
> support" need to be recompiled.
>
> LGTM, but I'll let Jason give approval.
>

ping (I've found myself citing this section a few times recently and
don't want people to get the wrong idea).

>
>
>
> On Wed, 12 Feb 2025 at 09:30, Sam James  wrote:
>>
>> C++ ABI for C++ standards with full support by GCC (rather than those
>> marked as experimental per https://gcc.gnu.org/projects/cxx-status.html)
>> should be stable. It's certainly not the case in 2025 that one needs a
>> full world rebuild for C++ libraries using e.g. the default standard
>> or any other supported standard by C++, unless it is marked experimental
>> where we provide no guarantees.
>> ---
>>  htdocs/bugs/index.html | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
>> index d6556b26..99d19095 100644
>> --- a/htdocs/bugs/index.html
>> +++ b/htdocs/bugs/index.html
>> @@ -633,14 +633,14 @@ changed the parser rules so that <:: 
>> works as expected.
>>  components: the first defines how the elements of classes are laid
>>  out, how functions are called, how function names are mangled, etc;
>>  the second part deals with the internals of the objects in libstdc++.
>> -Although we strive for a non-changing ABI, so far we have had to
>> -modify it with each major release.  If you change your compiler to a
>> -different major release you must recompile all libraries that
>> -contain C++ code.  If you fail to do so you risk getting linker
>> -errors or malfunctioning programs.
>> -It should not be necessary to recompile if you have changed
>> -to a bug-fix release of the same version of the compiler; bug-fix
>> -releases are careful to avoid ABI changes. See also the
>> +For C++ standards marked as
>> +<a href="https://gcc.gnu.org/projects/cxx-status.html">experimental</a>,
>> +stable ABI is not guaranteed: for these, if you change your compiler to a
>> +different major release you must recompile any such libraries built
>> +with experimental standard support that contain C++ code.  If you fail
>> +to do so, you risk getting linker errors or malfunctioning programs.
>> +It should not be necessary to recompile for C++ standards supported fully
>> +by GCC, such as the default standard.  See also the
>>  > href="https://gcc.gnu.org/onlinedocs/gcc/Compatibility.html";>compatibility
>>  section of the GCC manual.
>>
>> --
>> 2.48.1
>>


[PING 2][PATCH v2] libcpp: Fix incorrect line numbers in large files [PR108900]

2025-03-03 Thread Yash . Shinde
From: Yash Shinde 

This patch addresses an issue in the C preprocessor where incorrect line number 
information is generated when processing
files with a large number of lines. The problem arises from improper handling 
of location intervals in the line map,
particularly when locations exceed LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES.

By ensuring that the highest location is not decremented if it would move to a 
different ordinary map, this fix resolves
the line number discrepancies observed in certain test cases. This change 
improves the accuracy of line number reporting,
benefiting users relying on precise code coverage and debugging information.

Signed-off-by: Jeremy Bettis 
Signed-off-by: Yash Shinde 
---
 libcpp/files.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/libcpp/files.cc b/libcpp/files.cc
index 1ed19ca..3e6ca119ad5 100644
--- a/libcpp/files.cc
+++ b/libcpp/files.cc
@@ -1046,6 +1046,14 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file *file, 
include_type type,
&& type < IT_DIRECTIVE_HWM
&& (pfile->line_table->highest_location
!= LINE_MAP_MAX_LOCATION - 1));
+
+  if (decrement && LINEMAPS_ORDINARY_USED (pfile->line_table))
+{
+  const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP 
(pfile->line_table);
+  if (map && map->start_location == pfile->line_table->highest_location)
+   decrement = false;
+}
+
   if (decrement)
 pfile->line_table->highest_location--;
 
-- 
2.43.0



Re: [Fortran, Patch, PR118747, v1] Prevent double free alloc. comp. in derived type function results

2025-03-03 Thread Andre Vehreschild
Hi Paul,

thanks for the review. Committed as gcc-15-7789-g43c11931acc.

The regression is tagged as 15-regression only and was caused by PR
fortran/90068. At least the change in trans-array.cc:2000-.. is one of
major causes for that regression.

Thanks again,
Andre

On Sat, 1 Mar 2025 08:09:46 +
Paul Richard Thomas  wrote:

> Hi Andre,
>
> This looks fine to me. You say that this is a regression. How far back does
> it go?
>
> OK for mainline and, if required, for backporting.
>
> Thanks for the patch.
>
> Paul
>
>
> On Fri, 28 Feb 2025 at 15:54, Andre Vehreschild  wrote:
>
> > Hi all,
> >
> > I had to chew on this regression for a while. Assume this Fortran:
> >
> > type T
> >integer, allocatable:: a
> > end type T
> >
> > type(T) function bar()
> >   allocate(bar%a)
> > end function
> >
> > call foo([bar()])
> >
> > That Fortran fragment was translated to something like (pseudo code):
> >
> > T temp;
> > T arr[];
> > temp = bar();
> > arr[0]= temp;
> > foo(arr);
> > if (temp.a) { free(temp.a); temp.a= NULL;}
> > for (i in size(arr))
> >   if (arr[i].a) { free(arr[i].a); <-- double free here
> > arr[i].a = NULL;
> > }
> >
> > I.e., when the derived-type result of a function was used in an array
> > constructor that was itself used as a function argument, the temporary
> > introduced to evaluate the function only once was declared by value. When
> > the derived type had allocatable components, those were freed on the
> > value-typed temporary (here temp). But later on the array would also be
> > freed, and a double free occurred because the temporary's components had
> > already been freed. The patch fixes this by avoiding the temporary when it
> > is not necessary, or by using a temporary that is a reference into the
> > array, i.e., the memory being freed (and marked as such) is stored at the
> > same location.
> >
> > So after the patch this looks like this:
> >
> > T *temp; // Now a pointer!
> > T arr[];
> > arr[0] = bar();
> > temp = &arr[0];
> > ... Now we're safe, because freeing temp->a sets arr[0].a to NULL and the
> > following loop is safe.
> >
> > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> >
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
> >


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov
Hi all,

In this testcase late-combine was failing to merge:
dup v31.4s, v31.s[3]
fmla v30.4s, v31.4s, v29.4s
into the lane-wise fmla form.
This is because late-combine checks may_trap_p under the hood on the dup insn.
This ended up returning true for the insn:
(set (reg:V4SF 152 [ _32 ])
(vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
(parallel:V4SF [
(const_int 3 [0x3])]))))

Although may_trap_p correctly reasoned that vec_duplicate and vec_select of
floating-point modes can't trap, it assumed that the V4SF PARALLEL can trap.
The correct behaviour is to recurse into the vector inside the PARALLEL and
check the sub-expressions. This patch adjusts may_trap_p_1 to do just that.
With this check the above insn is not deemed to be trapping and is propagated
into the FMLA giving:
fmla vD.4s, vA.4s, vB.s[3]

Bootstrapped and tested on aarch64-none-linux-gnu.
Apparently this also fixes a regression in
gcc.target/aarch64/vmul_element_cost.c that I observed.

Signed-off-by: Kyrylo Tkachov 

gcc/

PR rtl-optimization/119046
* rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.

gcc/testsuite/

PR rtl-optimization/119046
* gcc.target/aarch64/pr119046.c: New test.



v2_0001-PR-rtl-optimization-119046-Don-t-mark-PARALLEL-RTXes.patch
Description: v2_0001-PR-rtl-optimization-119046-Don-t-mark-PARALLEL-RTXes.patch


Re: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov


> On 3 Mar 2025, at 09:49, Andrew Pinski  wrote:
> 
> On Mon, Mar 3, 2025 at 12:43 AM Kyrylo Tkachov  wrote:
>> 
>> 
>> 
>>> On 28 Feb 2025, at 19:06, Andrew Pinski  wrote:
>>> 
>>> On Fri, Feb 28, 2025 at 5:25 AM Kyrylo Tkachov  wrote:
 
 Hi all,
 
 In this PR late-combine was failing to merge:
 dup v31.4s, v31.s[3]
 fmla v30.4s, v31.4s, v29.4s
 into the lane-wise fmla form.
 This is because late-combine checks may_trap_p under the hood on the dup 
 insn.
 This ended up returning true for the insn:
 (set (reg:V4SF 152 [ _32 ])
  (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
  (parallel:V4SF [
  (const_int 3 [0x3])]
 
 Although mem_trap_p correctly reasoned that vec_duplicate and vec_select of
 floating-point modes can't trap, it assumed that the V4SF parallel can 
 trap.
 The correct behaviour is to recurse into vector inside the PARALLEL and 
 check
 the sub-expression. This patch adjusts may_trap_p_1 to do just that.
 With this check the above insn is not deemed to be trapping and is 
 propagated
 into the FMLA giving:
 
 fmla vD.4s, vA.4s, vB.s[3]
 
 The testcase is reduced from a larger example so it has some C++ cruft 
 around
 the main part but still demonstrates the desired effect.
 
 Bootstrapped and tested on aarch64-none-linux-gnu.
 Apparently this also fixes a regression in
 gcc.target/aarch64/vmul_element_cost.c that I observed.
 
 Ok for trunk?
>>> 
>>> This looks ok but here is a better/more reduced testcase without the
>>> extra C++ism:
>>> ```
>>> /* { dg-do compile } */
>>> /* { dg-additional-options "-O2" } */
>>> #include 
>>> 
>>> float32x4_t madd_helper_1(float32x4_t a, float32x4_t b, float32x4_t d)
>>> {
>>> float32x4_t t = a;
>>> t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
>>> t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
>>> return t;
>>> }
>>> /* { dg-final { scan-assembler-not {\tdup\tv[0-9]+\.4s, v[0-9]+.s\[3\]\n} } 
>>> } */
>>> /* { dg-final { scan-assembler-times {\tfmla\tv[0-9]+\.4s,
>>> v[0-9]+\.4s, v[0-9]+\.s\[3\]\n} 1 } } */
>>> ```
>>> 
>> 
>> Thanks, but this doesn’t reproduce the issue on trunk. Normal combine merges 
>> the lane forms as is.
>> What we want is a case where combine refuses to do the merging but late 
>> combine does (or should be doing).
> 
> I just tested it on the trunk and we get dup:
> https://godbolt.org/z/Yfc7d1r1x
> 
> If I had used vgetq_lane_f32 with lane 0, then there would have been
> no dup as that would be represented as a subreg.
> 

You’re right, I must have been misusing godbolt this morning.
I’ve sent an updated patch with the smaller test case at:
https://gcc.gnu.org/pipermail/gcc-patches/2025-March/676730.html

Thanks again,
Kyrill

> Thanks,
> Andrew
> 
> 
>> 
>> Thanks,
>> Kyrill
>> 
>>> Thanks,
>>> Andrew Pinski
>>> 
 Thanks,
 Kyrill
 
 Signed-off-by: Kyrylo Tkachov 
 
 gcc/
 
  PR rtl-optimization/119046
  * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.
 
 gcc/testsuite/
 
  PR rtl-optimization/119046
  * g++.target/aarch64/pr119046.C: New test.
 
>> 



Re: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov


> On 28 Feb 2025, at 19:06, Andrew Pinski  wrote:
> 
> On Fri, Feb 28, 2025 at 5:25 AM Kyrylo Tkachov  wrote:
>> 
>> Hi all,
>> 
>> In this PR late-combine was failing to merge:
>> dup v31.4s, v31.s[3]
>> fmla v30.4s, v31.4s, v29.4s
>> into the lane-wise fmla form.
>> This is because late-combine checks may_trap_p under the hood on the dup 
>> insn.
>> This ended up returning true for the insn:
>> (set (reg:V4SF 152 [ _32 ])
>>   (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
>>   (parallel:V4SF [
>>   (const_int 3 [0x3])]
>> 
>> Although may_trap_p correctly reasoned that vec_duplicate and vec_select of
>> floating-point modes can't trap, it assumed that the V4SF parallel can trap.
>> The correct behaviour is to recurse into the expressions inside the PARALLEL
>> and check the sub-expression.  This patch adjusts may_trap_p_1 to do just that.
>> With this check the above insn is not deemed to be trapping and is propagated
>> into the FMLA giving:
>> 
>> fmla vD.4s, vA.4s, vB.s[3]
>> 
>> The testcase is reduced from a larger example so it has some C++ cruft around
>> the main part but still demonstrates the desired effect.
>> 
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Apparently this also fixes a regression in
>> gcc.target/aarch64/vmul_element_cost.c that I observed.
>> 
>> Ok for trunk?
> 
> This looks ok but here is a better/more reduced testcase without the
> extra C++ism:
> ```
> /* { dg-do compile } */
> /* { dg-additional-options "-O2" } */
> #include 
> 
> float32x4_t madd_helper_1(float32x4_t a, float32x4_t b, float32x4_t d)
> {
> float32x4_t t = a;
> t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
> t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
> return t;
> }
> /* { dg-final { scan-assembler-not {\tdup\tv[0-9]+\.4s, v[0-9]+.s\[3\]\n} } } 
> */
> /* { dg-final { scan-assembler-times {\tfmla\tv[0-9]+\.4s,
> v[0-9]+\.4s, v[0-9]+\.s\[3\]\n} 1 } } */
> ```
> 

Thanks, but this doesn’t reproduce the issue on trunk.  Normal combine merges
the lane forms as is.  What we want is a case where combine refuses to do the
merging but late combine does (or should).

Thanks,
Kyrill

> Thanks,
> Andrew Pinski
> 
>> Thanks,
>> Kyrill
>> 
>> Signed-off-by: Kyrylo Tkachov 
>> 
>> gcc/
>> 
>>   PR rtl-optimization/119046
>>   * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.
>> 
>> gcc/testsuite/
>> 
>>   PR rtl-optimization/119046
>>   * g++.target/aarch64/pr119046.C: New test.
>> 



Re: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Andrew Pinski
On Mon, Mar 3, 2025 at 12:43 AM Kyrylo Tkachov  wrote:
>
>
>
> > On 28 Feb 2025, at 19:06, Andrew Pinski  wrote:
> >
> > On Fri, Feb 28, 2025 at 5:25 AM Kyrylo Tkachov  wrote:
> >>
> >> Hi all,
> >>
> >> In this PR late-combine was failing to merge:
> >> dup v31.4s, v31.s[3]
> >> fmla v30.4s, v31.4s, v29.4s
> >> into the lane-wise fmla form.
> >> This is because late-combine checks may_trap_p under the hood on the dup 
> >> insn.
> >> This ended up returning true for the insn:
> >> (set (reg:V4SF 152 [ _32 ])
> >>   (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
> >>   (parallel:V4SF [
> >>   (const_int 3 [0x3])]
> >>
> >> Although may_trap_p correctly reasoned that vec_duplicate and vec_select of
> >> floating-point modes can't trap, it assumed that the V4SF parallel can trap.
> >> The correct behaviour is to recurse into the expressions inside the PARALLEL
> >> and check the sub-expression.  This patch adjusts may_trap_p_1 to do just that.
> >> With this check the above insn is not deemed to be trapping and is propagated
> >> into the FMLA giving:
> >>
> >> fmla vD.4s, vA.4s, vB.s[3]
> >>
> >> The testcase is reduced from a larger example so it has some C++ cruft around
> >> the main part but still demonstrates the desired effect.
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Apparently this also fixes a regression in
> >> gcc.target/aarch64/vmul_element_cost.c that I observed.
> >>
> >> Ok for trunk?
> >
> > This looks ok but here is a better/more reduced testcase without the
> > extra C++ism:
> > ```
> > /* { dg-do compile } */
> > /* { dg-additional-options "-O2" } */
> > #include 
> >
> > float32x4_t madd_helper_1(float32x4_t a, float32x4_t b, float32x4_t d)
> > {
> > float32x4_t t = a;
> > t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
> > t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
> > return t;
> > }
> > /* { dg-final { scan-assembler-not {\tdup\tv[0-9]+\.4s, v[0-9]+.s\[3\]\n} } 
> > } */
> > /* { dg-final { scan-assembler-times {\tfmla\tv[0-9]+\.4s,
> > v[0-9]+\.4s, v[0-9]+\.s\[3\]\n} 1 } } */
> > ```
> >
>
> Thanks, but this doesn’t reproduce the issue on trunk. Normal combine merges 
> the lane forms as is.
> What we want is a case where combine refuses to do the merging but late 
> combine does (or should be doing).

I just tested it on the trunk and we get dup:
https://godbolt.org/z/Yfc7d1r1x

If I had used vgetq_lane_f32 with lane 0, then there would have been
no dup as that would be represented as a subreg.

Thanks,
Andrew


>
> Thanks,
> Kyrill
>
> > Thanks,
> > Andrew Pinski
> >
> >> Thanks,
> >> Kyrill
> >>
> >> Signed-off-by: Kyrylo Tkachov 
> >>
> >> gcc/
> >>
> >>   PR rtl-optimization/119046
> >>   * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as 
> >> trapping.
> >>
> >> gcc/testsuite/
> >>
> >>   PR rtl-optimization/119046
> >>   * g++.target/aarch64/pr119046.C: New test.
> >>
>


Re: [PATCH v1] RISC-V: Fix the test case bug-3.c failure

2025-03-03 Thread Robin Dapp

LGTM.

--
Regards
Robin



[PATCH] ipa/119067 - bogus TYPE_PRECISION check on VECTOR_TYPE

2025-03-03 Thread Richard Biener
odr_types_equivalent_p can end up using TYPE_PRECISION on vector
types which is a no-go.  The following instead uses TYPE_VECTOR_SUBPARTS
for vector types so we also end up comparing the number of vector elements.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR ipa/119067
* ipa-devirt.cc (odr_types_equivalent_p): Check
TYPE_VECTOR_SUBPARTS for vectors.

* g++.dg/lto/pr119067_0.C: New testcase.
* g++.dg/lto/pr119067_1.C: Likewise.
---
 gcc/ipa-devirt.cc | 10 +-
 gcc/testsuite/g++.dg/lto/pr119067_0.C | 22 ++
 gcc/testsuite/g++.dg/lto/pr119067_1.C | 10 ++
 3 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/pr119067_0.C
 create mode 100644 gcc/testsuite/g++.dg/lto/pr119067_1.C

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index c31658f57ef..532e25e87c6 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -1259,13 +1259,21 @@ odr_types_equivalent_p (tree t1, tree t2, bool warn, 
bool *warned,
   || TREE_CODE (t1) == OFFSET_TYPE
   || POINTER_TYPE_P (t1))
 {
-  if (TYPE_PRECISION (t1) != TYPE_PRECISION (t2))
+  if (!VECTOR_TYPE_P (t1) && TYPE_PRECISION (t1) != TYPE_PRECISION (t2))
{
  warn_odr (t1, t2, NULL, NULL, warn, warned,
G_("a type with different precision is defined "
   "in another translation unit"));
  return false;
}
+  if (VECTOR_TYPE_P (t1)
+ && maybe_ne (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2)))
+   {
+ warn_odr (t1, t2, NULL, NULL, warn, warned,
+   G_("a vector type with different number of elements "
+  "is defined in another translation unit"));
+ return false;
+   }
   if (TYPE_UNSIGNED (t1) != TYPE_UNSIGNED (t2))
{
  warn_odr (t1, t2, NULL, NULL, warn, warned,
diff --git a/gcc/testsuite/g++.dg/lto/pr119067_0.C 
b/gcc/testsuite/g++.dg/lto/pr119067_0.C
new file mode 100644
index 000..e0f813ceffe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr119067_0.C
@@ -0,0 +1,22 @@
+/* { dg-lto-do link } */
+/* { dg-skip-if "" { ! { x86_64-*-* i?86-*-* } } } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-require-effective-target shared } */
+/* { dg-lto-options { { -O2 -fPIC -flto } } } */
+/* { dg-extra-ld-options { -shared } } */
+
+#pragma GCC push_options
+#pragma GCC target("avx2")
+typedef char __v32qi __attribute__ ((__vector_size__ (32)));
+struct ff
+{
+  __v32qi t;
+};
+__v32qi g(struct ff a);
+
+__v32qi h(__v32qi a)
+{
+  struct ff t = {a};
+  return g(t);
+}
+#pragma GCC pop_options
diff --git a/gcc/testsuite/g++.dg/lto/pr119067_1.C 
b/gcc/testsuite/g++.dg/lto/pr119067_1.C
new file mode 100644
index 000..d8e2935fa24
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr119067_1.C
@@ -0,0 +1,10 @@
+/* { dg-options "-mavx2" } */
+
+typedef char __v32qi __attribute__ ((__vector_size__ (32)));
+struct ff
+{
+  __v32qi t;
+};
+__v32qi g(struct ff a) {
+ return a.t;
+}
-- 
2.43.0


Re: [PATCH] gimple: sccopy: Prune removed statements from SCCs [PR117919]

2025-03-03 Thread Richard Biener
On Mon, 3 Mar 2025, Filip Kastl wrote:

> Hi Richard,
> 
> I almost forgot that the issue is also present on GCC 14.  Can I backport to
> releases/gcc-14 branch?

Sure.

> Thanks,
> Filip Kastl
> 
> On Fri 2025-02-28 17:46:42, Richard Biener wrote:
> > 
> > 
> > > Am 28.02.2025 um 17:02 schrieb Filip Kastl :
> > > 
> > > Hi,
> > > 
> > > bootstrapped and regtested on x86_64 linux.  Ok to be pushed?
> > 
> > Ok
> > 
> > Richard 
> > 
> > > Thanks,
> > > Filip Kastl
> > > 
> > > 
> > > -- 8< --
> > > 
> > > 
> > > While writing the sccopy pass I didn't realize that 'replace_uses_by ()'
> > > can remove portions of the CFG.  This happens when replacing arguments of
> > > some statement results in the removal of an EH edge.  Because of this
> > > sccopy can then work with GIMPLE statements that aren't part of the IR
> > > anymore.  In PR117919 this triggered an assertion within the pass which
> > > assumes that statements the pass works with are reachable.
> > > 
> > > This patch tells the pass to notice when a statement isn't in the IR
> > > anymore and remove it from its worklist.
> > > 
> > >PR tree-optimization/117919
> > > 
> > > gcc/ChangeLog:
> > > 
> > >* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Prune
> > >statements that 'replace_uses_by ()' removed.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >* g++.dg/pr117919.C: New test.
> > > 
> > > Signed-off-by: Filip Kastl 
> > > ---
> > > gcc/gimple-ssa-sccopy.cc| 13 +
> > > gcc/testsuite/g++.dg/pr117919.C | 52 +
> > > 2 files changed, 65 insertions(+)
> > > create mode 100644 gcc/testsuite/g++.dg/pr117919.C
> > > 
> > > diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> > > index 9f25fbaff36..7ffb5718ab6 100644
> > > --- a/gcc/gimple-ssa-sccopy.cc
> > > +++ b/gcc/gimple-ssa-sccopy.cc
> > > @@ -568,6 +568,19 @@ scc_copy_prop::propagate ()
> > > {
> > >   vec scc = worklist.pop ();
> > > 
> > > +  /* When we do 'replace_scc_by_value' it may happen that some EH 
> > > edges
> > > + get removed.  That means parts of CFG get removed.  Those may
> > > + contain copy statements.  For that reason we prune SCCs here.  */
> > > +  unsigned i;
> > > +  for (i = 0; i < scc.length (); i++)
> > > +if (gimple_bb (scc[i]) == NULL)
> > > +  scc.unordered_remove (i);
> > > +  if (scc.is_empty ())
> > > +{
> > > +  scc.release ();
> > > +  continue;
> > > +}
> > > +
> > >   auto_vec inner;
> > >   hash_set outer_ops;
> > >   tree last_outer_op = NULL_TREE;
> > > diff --git a/gcc/testsuite/g++.dg/pr117919.C 
> > > b/gcc/testsuite/g++.dg/pr117919.C
> > > new file mode 100644
> > > index 000..fa2d9c9cd1e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/pr117919.C
> > > @@ -0,0 +1,52 @@
> > > +/* PR tree-optimization/117919 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O1 -fno-tree-forwprop -fnon-call-exceptions 
> > > --param=early-inlining-insns=192 -std=c++20" } */
> > > +
> > > +char _M_p, _M_construct___beg;
> > > +struct _Alloc_hider {
> > > +  _Alloc_hider(char);
> > > +};
> > > +long _M_string_length;
> > > +void _M_destroy();
> > > +void _S_copy_chars(char *, char *, char *) noexcept;
> > > +char _M_local_data();
> > > +struct Trans_NS___cxx11_basic_string {
> > > +  _Alloc_hider _M_dataplus;
> > > +  bool _M_is_local() {
> > > +if (_M_local_data())
> > > +  if (_M_string_length)
> > > +return true;
> > > +return false;
> > > +  }
> > > +  void _M_dispose() {
> > > +if (!_M_is_local())
> > > +  _M_destroy();
> > > +  }
> > > +  char *_M_construct___end;
> > > +  Trans_NS___cxx11_basic_string(Trans_NS___cxx11_basic_string &)
> > > +  : _M_dataplus(0) {
> > > +struct _Guard {
> > > +  ~_Guard() { _M_guarded->_M_dispose(); }
> > > +  Trans_NS___cxx11_basic_string *_M_guarded;
> > > +} __guard0;
> > > +_S_copy_chars(&_M_p, &_M_construct___beg, _M_construct___end);
> > > +  }
> > > +};
> > > +namespace filesystem {
> > > +struct path {
> > > +  path();
> > > +  Trans_NS___cxx11_basic_string _M_pathname;
> > > +};
> > > +} // namespace filesystem
> > > +struct FileWriter {
> > > +  filesystem::path path;
> > > +  FileWriter() : path(path) {}
> > > +};
> > > +struct LanguageFileWriter : FileWriter {
> > > +  LanguageFileWriter(filesystem::path) {}
> > > +};
> > > +int
> > > +main() {
> > > +  filesystem::path output_file;
> > > +  LanguageFileWriter writer(output_file);
> > > +}
> > > --
> > > 2.47.1
> > > 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH v1] RISC-V: Fix the test case bug-3.c failure

2025-03-03 Thread pan2 . li
From: Pan Li 

The bug-3.c test checks for slli a[0-9]+, a[0-9]+, 33 as part of the
big poly int handling.  But with some optimizations the underlying insn
may be split into slli 1 + slli 32.  Thus, update the asm check to a
function body check that matches the slli 1 + slli 32 sequence.

The following test suites pass with this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/bug-3.c: Update asm check to
function body check.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-3.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-3.c
index 05ac2e54cbe..2d5f4c2e0de 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d -mrvv-max-lmul=m8 
-mrvv-vector-bits=scalable -fno-vect-cost-model -O2 -ffast-math" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #define N 16
 
@@ -25,7 +26,15 @@ _Complex float res[N] =
   -740.0F + 2488.0iF, -760.0F + 2638.0iF,
   -780.0F + 2792.0iF, -800.0F + 2950.0iF };
 
-
+/*
+** foo:
+** ...
+** csrr\s+[atx][0-9]+,\s*vlenb
+** slli\s+[atx][0-9]+,\s*[atx][0-9],\s*1
+** ...
+** slli\s+[atx][0-9]+,\s*[atx][0-9],\s*32
+** ...
+*/
 void
 foo (void)
 {
@@ -36,4 +45,3 @@ foo (void)
 }
 
 /* { dg-final { scan-assembler-not {li\s+[a-x0-9]+,\s*0} } } */
-/* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*33} 1 } 
} */
-- 
2.43.0



Re: [PATCH] Fortran: reject empty derived type with bind(C) attribute [PR101577]

2025-03-03 Thread Andre Vehreschild
Hi Harald,

in +++ b/gcc/fortran/symbol.cc
@@ -4624,12 +4624,28 @@ verify_bind_c_derived_type (gfc_symbol *derived_sym)

there is

+  else if (!pedantic)
+   gfc_warning (0, "Derive ...

To me the "not pedantic" is counter-intuitive. In pedantic mode I would have
expected this to be at least a warning (if not an error). Why is it not flagged
at all? Maybe I expect something wrong from "pedantic".

Besides that: Looks good to me.

Regards,
Andre

On Sun, 2 Mar 2025 22:35:47 +0100
Harald Anlauf  wrote:

> Dear all,
>
> due to an oversight in the Fortran standard before 2018,
> empty derived types with bind(C) attribute were explicitly
> (deliberately?) accepted by gfortran, giving a warning that
> the companion processor might not provide an interoperating
> entity.
>
> In the PR, Tobias pointed to a discussion on the J3 ML that
> there was a defect in older standards.  The attached patch
> now generates an error when -std=f20xx is specified, and
> continues to generate a warning otherwise.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald
>


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH] RISC-V: Using O2 instead of O1 in testsuites when using -fdump-ext_dce

2025-03-03 Thread Liao Shihua
The ext-dce pass is only activated at -O2 and above.  Use -O2 instead of -O1
in testsuites that use -fdump-rtl-ext_dce.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/core_list_init.c: Use -O2 instead of -O1.
* gcc.target/riscv/pr111384.c: Ditto.

---
 gcc/testsuite/gcc.target/riscv/core_list_init.c | 2 +-
 gcc/testsuite/gcc.target/riscv/pr111384.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/core_list_init.c 
b/gcc/testsuite/gcc.target/riscv/core_list_init.c
index 2f36dae85aa7..be341fa6b456 100644
--- a/gcc/testsuite/gcc.target/riscv/core_list_init.c
+++ b/gcc/testsuite/gcc.target/riscv/core_list_init.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-rtl-ext_dce" } */
+/* { dg-options "-O2 -fdump-rtl-ext_dce" } */
 /* { dg-final { scan-rtl-dump {Successfully transformed} "ext_dce" } } */
 
 unsigned short
diff --git a/gcc/testsuite/gcc.target/riscv/pr111384.c 
b/gcc/testsuite/gcc.target/riscv/pr111384.c
index a4e77d4aeb64..7bf36a403611 100644
--- a/gcc/testsuite/gcc.target/riscv/pr111384.c
+++ b/gcc/testsuite/gcc.target/riscv/pr111384.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-rtl-ext_dce" } */
+/* { dg-options "-O2 -fdump-rtl-ext_dce" } */
 /* { dg-final { scan-rtl-dump {Successfully transformed} "ext_dce" } } */
 
 void
-- 
2.43.0



Re: [PATCH] gimple: sccopy: Prune removed statements from SCCs [PR117919]

2025-03-03 Thread Filip Kastl
Hi Richard,

I almost forgot that the issue is also present on GCC 14.  Can I backport to
releases/gcc-14 branch?

Thanks,
Filip Kastl

On Fri 2025-02-28 17:46:42, Richard Biener wrote:
> 
> 
> > Am 28.02.2025 um 17:02 schrieb Filip Kastl :
> > 
> > Hi,
> > 
> > bootstrapped and regtested on x86_64 linux.  Ok to be pushed?
> 
> Ok
> 
> Richard 
> 
> > Thanks,
> > Filip Kastl
> > 
> > 
> > -- 8< --
> > 
> > 
> > While writing the sccopy pass I didn't realize that 'replace_uses_by ()' can
> > remove portions of the CFG.  This happens when replacing arguments of some
> > statement results in the removal of an EH edge.  Because of this sccopy can
> > then work with GIMPLE statements that aren't part of the IR anymore.  In
> > PR117919 this triggered an assertion within the pass which assumes that
> > statements the pass works with are reachable.
> > 
> > This patch tells the pass to notice when a statement isn't in the IR anymore
> > and remove it from its worklist.
> > 
> >PR tree-optimization/117919
> > 
> > gcc/ChangeLog:
> > 
> >* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Prune
> >statements that 'replace_uses_by ()' removed.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >* g++.dg/pr117919.C: New test.
> > 
> > Signed-off-by: Filip Kastl 
> > ---
> > gcc/gimple-ssa-sccopy.cc| 13 +
> > gcc/testsuite/g++.dg/pr117919.C | 52 +
> > 2 files changed, 65 insertions(+)
> > create mode 100644 gcc/testsuite/g++.dg/pr117919.C
> > 
> > diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> > index 9f25fbaff36..7ffb5718ab6 100644
> > --- a/gcc/gimple-ssa-sccopy.cc
> > +++ b/gcc/gimple-ssa-sccopy.cc
> > @@ -568,6 +568,19 @@ scc_copy_prop::propagate ()
> > {
> >   vec scc = worklist.pop ();
> > 
> > +  /* When we do 'replace_scc_by_value' it may happen that some EH edges
> > + get removed.  That means parts of CFG get removed.  Those may
> > + contain copy statements.  For that reason we prune SCCs here.  */
> > +  unsigned i;
> > +  for (i = 0; i < scc.length (); i++)
> > +if (gimple_bb (scc[i]) == NULL)
> > +  scc.unordered_remove (i);
> > +  if (scc.is_empty ())
> > +{
> > +  scc.release ();
> > +  continue;
> > +}
> > +
> >   auto_vec inner;
> >   hash_set outer_ops;
> >   tree last_outer_op = NULL_TREE;
> > diff --git a/gcc/testsuite/g++.dg/pr117919.C 
> > b/gcc/testsuite/g++.dg/pr117919.C
> > new file mode 100644
> > index 000..fa2d9c9cd1e
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/pr117919.C
> > @@ -0,0 +1,52 @@
> > +/* PR tree-optimization/117919 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fno-tree-forwprop -fnon-call-exceptions 
> > --param=early-inlining-insns=192 -std=c++20" } */
> > +
> > +char _M_p, _M_construct___beg;
> > +struct _Alloc_hider {
> > +  _Alloc_hider(char);
> > +};
> > +long _M_string_length;
> > +void _M_destroy();
> > +void _S_copy_chars(char *, char *, char *) noexcept;
> > +char _M_local_data();
> > +struct Trans_NS___cxx11_basic_string {
> > +  _Alloc_hider _M_dataplus;
> > +  bool _M_is_local() {
> > +if (_M_local_data())
> > +  if (_M_string_length)
> > +return true;
> > +return false;
> > +  }
> > +  void _M_dispose() {
> > +if (!_M_is_local())
> > +  _M_destroy();
> > +  }
> > +  char *_M_construct___end;
> > +  Trans_NS___cxx11_basic_string(Trans_NS___cxx11_basic_string &)
> > +  : _M_dataplus(0) {
> > +struct _Guard {
> > +  ~_Guard() { _M_guarded->_M_dispose(); }
> > +  Trans_NS___cxx11_basic_string *_M_guarded;
> > +} __guard0;
> > +_S_copy_chars(&_M_p, &_M_construct___beg, _M_construct___end);
> > +  }
> > +};
> > +namespace filesystem {
> > +struct path {
> > +  path();
> > +  Trans_NS___cxx11_basic_string _M_pathname;
> > +};
> > +} // namespace filesystem
> > +struct FileWriter {
> > +  filesystem::path path;
> > +  FileWriter() : path(path) {}
> > +};
> > +struct LanguageFileWriter : FileWriter {
> > +  LanguageFileWriter(filesystem::path) {}
> > +};
> > +int
> > +main() {
> > +  filesystem::path output_file;
> > +  LanguageFileWriter writer(output_file);
> > +}
> > --
> > 2.47.1
> > 


Re: [PATCH]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-03-03 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Monday, March 3, 2025 10:12 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH]AArch64: force operand to fresh register to avoid subreg
>> issues [PR118892]
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > When the input is already a subreg and we try to make a paradoxical
>> > subreg out of it for copysign this can fail if it violates the sugreg
>> 
>> subreg
>> 
>> > relationship.
>> >
>> > Use force_lowpart_subreg instead of lowpart_subreg to then force the
>> > results to a register instead of ICEing.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/118892
>> >* config/aarch64/aarch64.md (copysign3): Use
>> >force_lowpart_subreg instead of lowpart_subreg.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR target/118892
>> >* gcc.target/aarch64/copysign-pr118892.c: New test.
>> >
>> > ---
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> > index
>> cfe730f3732ce45c914b30a908851a4a7dd77c0f..62be9713cf417922b3c06e38f
>> 12f401872751fa2 100644
>> > --- a/gcc/config/aarch64/aarch64.md
>> > +++ b/gcc/config/aarch64/aarch64.md
>> > @@ -7479,8 +7479,8 @@ (define_expand "copysign3"
>> >&& real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
>> >  {
>> >emit_insn (gen_ior3 (
>> > -  lowpart_subreg (mode, operands[0], mode),
>> > -  lowpart_subreg (mode, operands[1], mode),
>> > +  force_lowpart_subreg (mode, operands[0],
>> mode),
>> > +  force_lowpart_subreg (mode, operands[1],
>> mode),
>> 
>> force_lowpart_subreg conditionally forces the SUBREG_REG into a new temporary
>> register and then takes the subreg of that.  It's therefore only appropriate
>> for source operands, not destination operands.
>> 
>> It's true that the same problem could in principle occur for the
>> destination, but that would need to be fixed in a different way.
>> 
>
> Ah, true. Should have thought about it a bit more.
>
>> OK with just the operands[1] change, without the operands[0] change.
>> 
>
> I forgot to ask if OK for GCC 14 backport after some stew.

Yeah, ok for GCC 14 too.  The force_lowpart_subreg function hasn't been
backported to GCC 14 yet, but I think it should be (as part of this patch).
Other backportable fixes rely on it too.

Thanks,
Richard

>
> Thanks,
> Tamar
>> Thanks,
>> Richard
>> 
>> 
>> >v_bitmask));
>> >DONE;
>> >  }
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
>> b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
>> > new file mode 100644
>> > index
>> ..adfa30dc3e2db895af4f205
>> 7bdd1011fdb7d4537
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
>> > @@ -0,0 +1,11 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-Ofast" } */
>> > +
>> > +double l();
>> > +double f()
>> > +{
>> > +  double t6[2] = {l(), l()};
>> > +  double t7[2];
>> > +  __builtin_memcpy(&t7, &t6, sizeof(t6));
>> > +  return -__builtin_fabs(t7[1]);
>> > +}


Re: [PATCH]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-03-03 Thread Christophe Lyon
On Mon, 3 Mar 2025 at 12:29, Richard Sandiford
 wrote:
>
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Monday, March 3, 2025 10:12 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> ; ktkac...@gcc.gnu.org
> >> Subject: Re: [PATCH]AArch64: force operand to fresh register to avoid 
> >> subreg
> >> issues [PR118892]
> >>
> >> Tamar Christina  writes:
> >> > Hi All,
> >> >
> >> > When the input is already a subreg and we try to make a paradoxical
> >> > subreg out of it for copysign this can fail if it violates the sugreg
> >>
> >> subreg
> >>
> >> > relationship.
> >> >
> >> > Use force_lowpart_subreg instead of lowpart_subreg to then force the
> >> > results to a register instead of ICEing.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >PR target/118892
> >> >* config/aarch64/aarch64.md (copysign3): Use
> >> >force_lowpart_subreg instead of lowpart_subreg.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >PR target/118892
> >> >* gcc.target/aarch64/copysign-pr118892.c: New test.
> >> >
> >> > ---
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64.md 
> >> > b/gcc/config/aarch64/aarch64.md
> >> > index
> >> cfe730f3732ce45c914b30a908851a4a7dd77c0f..62be9713cf417922b3c06e38f
> >> 12f401872751fa2 100644
> >> > --- a/gcc/config/aarch64/aarch64.md
> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> > @@ -7479,8 +7479,8 @@ (define_expand "copysign3"
> >> >&& real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
> >> >  {
> >> >emit_insn (gen_ior3 (
> >> > -  lowpart_subreg (mode, operands[0], mode),
> >> > -  lowpart_subreg (mode, operands[1], mode),
> >> > +  force_lowpart_subreg (mode, operands[0],
> >> mode),
> >> > +  force_lowpart_subreg (mode, operands[1],
> >> mode),
> >>
> >> force_lowpart_subreg conditionally forces the SUBREG_REG into a new 
> >> temporary
> >> register and then takes the subreg of that.  It's therefore only 
> >> appropriate
> >> for source operands, not destination operands.
> >>
> >> It's true that the same problem could in principle occur for the
> >> destination, but that would need to be fixed in a different way.
> >>
> >
> > Ah, true. Should have thought about it a bit more.
> >
> >> OK with just the operands[1] change, without the operands[0] change.
> >>
> >
> > I forgot to ask if OK for GCC 14 backport after some stew.
>
> Yeah, ok for GCC 14 too.  The force_lowpart_subreg function hasn't been
> backported to GCC 14 yet, but I think it should be (as part of this patch).
> Other backportable fixes rely on it too.
>

Looks like I was too conservative when I backported my fix for PR
114801, and chose not to include force_lowpart_subreg.

Should my backport be updated to match trunk once force_lowpart_subreg
is backported to gcc-14 too?

(see https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673048.html)

Thanks,

Christophe

> Thanks,
> Richard
>
> >
> > Thanks,
> > Tamar
> >> Thanks,
> >> Richard
> >>
> >>
> >> >v_bitmask));
> >> >DONE;
> >> >  }
> >> > diff --git a/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> >> b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> >> > new file mode 100644
> >> > index
> >> ..adfa30dc3e2db895af4f205
> >> 7bdd1011fdb7d4537
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/aarch64/copysign-pr118892.c
> >> > @@ -0,0 +1,11 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-Ofast" } */
> >> > +
> >> > +double l();
> >> > +double f()
> >> > +{
> >> > +  double t6[2] = {l(), l()};
> >> > +  double t7[2];
> >> > +  __builtin_memcpy(&t7, &t6, sizeof(t6));
> >> > +  return -__builtin_fabs(t7[1]);
> >> > +}


Re: [PATCH v3] aarch64: Ignore target pragmas while defining intrinsics

2025-03-03 Thread Richard Sandiford
Andrew Carlotti  writes:
> Compared to v2, this splits out the alignment switching into a new class and
> merges the rest of the switching functionality into aarch64_target_switcher,
> as agreed with Richard in the previous review discussion.
>
> Bootstrapped and regression tested on aarch64. Is this ok for master?
>
> ---
>
> Refactor the switcher classes into two separate classes:
>
> - sve_alignment_switcher takes the alignment switching functionality,
>   and is used only for ABI correctness when defining sve structure
>   types.
> - aarch64_target_switcher takes the rest of the functionality of
>   aarch64_simd_switcher and sve_switcher, and gates simd/sve specific
>   parts upon the specified feature flags.
>
> Additionally, aarch64_target_switcher now adds dependencies of the
> specified flags (which adds +fcma and +bf16 to some intrinsic
> declarations), and unsets current_target_pragma.
>
> This last change fixes an internal bug where we would sometimes add a
> user specified target pragma (stored in current_target_pragma) on top of
> an internally specified target architecture while initialising
> intrinsics with `#pragma GCC aarch64 "arm_*.h"`.  As far as I can tell, this
> has no visible impact at the moment.  However, the unintended target
> feature combinations lead to unwanted behaviour in an under-development
> patch.
>
> gcc/ChangeLog:
>
>   * common/config/aarch64/aarch64-common.cc
>   (struct aarch64_extension_info): Add field.
>   (aarch64_get_required_features): New.
>   * config/aarch64/aarch64-builtins.cc
>   (aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
>   (aarch64_target_switcher::aarch64_target_switcher): ...this,
>   and extend to handle sve, nosimd and target pragmas.
>   (aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
>   (aarch64_target_switcher::~aarch64_target_switcher): ...this,
>   and extend to handle sve, nosimd and target pragmas.
>   (handle_arm_acle_h): Use aarch64_target_switcher.
>   (handle_arm_neon_h): Rename switcher and pass explicit flags.
>   (aarch64_general_init_builtins): Ditto.
>   * config/aarch64/aarch64-protos.h
>   (class aarch64_simd_switcher): Rename to...
>   (class aarch64_target_switcher): ...this, and add new members.
>   (aarch64_get_required_features): New prototype.
>   * config/aarch64/aarch64-sve-builtins.cc
>   (sve_switcher::sve_switcher): Delete.
>   (sve_switcher::~sve_switcher): Delete.
>   (sve_alignment_switcher::sve_alignment_switcher): New.
>   (sve_alignment_switcher::~sve_alignment_switcher): New.
>   (register_builtin_types): Use alignment switcher.
>   (init_builtins): Rename switcher.
>   (handle_arm_sve_h): Ditto.
>   (handle_arm_neon_sve_bridge_h): Ditto.
>   (handle_arm_sme_h): Ditto.
>   * config/aarch64/aarch64-sve-builtins.h
>   (class sve_switcher): Delete.
>   (class sme_switcher): Delete.
>   (class sve_alignment_switcher): New.

OK, thanks.  Personally I think we should keep the sve_alignment_switcher
at function scope (in handle_arm_sve_h), for two reasons:

(a) Nothing in arm_sve.h should be affected by -fpack-struct, so it seems
safer/more future-proof to apply it to the whole header.

(b) Even the reduced scope isn't precise, since it includes vectors as
well as structures.

So I'd slightly prefer the patch in that form (pre-approved).  The patch
is still OK as posted though.

Richard

> diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
> index ef4458fb69308d2bb6785e97be5be85226cf0ebb..500bf784983d851c54ea4ec59cf3cad29e5e309e 100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -157,6 +157,8 @@ struct aarch64_extension_info
>aarch64_feature_flags flags_on;
>   /* If this feature is turned off, these bits also need to be turned off.  */
>aarch64_feature_flags flags_off;
> +  /* If this feature remains enabled, these bits must also remain enabled.  */
> +  aarch64_feature_flags flags_required;
>  };
>  
>  /* ISA extensions in AArch64.  */
> @@ -164,9 +166,10 @@ static constexpr aarch64_extension_info all_extensions[] =
>  {
>  #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
>{NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
> -   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
> +   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
> +   feature_deps::IDENT ().enable},
>  #include "config/aarch64/aarch64-option-extensions.def"
> -  {NULL, 0, 0, 0}
> +  {NULL, 0, 0, 0, 0}
>  };
>  
>  struct aarch64_arch_info
> @@ -204,6 +207,18 @@ static constexpr aarch64_processor_info all_cores[] =
>{NULL, aarch64_no_cpu, aarch64_no_arch, 0}
>  };
>  
> +/* Return the set of feature flags that are required to be enabled when the
> +   features in FLAGS are enabled.  */

C++ patch ping (Re: [PATCH] c++: Apply/diagnose attributes when instatiating ARRAY/POINTER/REFERENCE_TYPE [PR118787])

2025-03-03 Thread Jakub Jelinek
Hi!

On Tue, Feb 11, 2025 at 07:04:31PM +0100, Jakub Jelinek wrote:
> The following testcase, IMO in violation of the P2552R3 paper, doesn't
> pedwarn on alignas applying to dependent types or alignas with dependent
> argument.
> 
> tsubst was just ignoring TYPE_ATTRIBUTES.
> 
> The following patch fixes it for the POINTER/REFERENCE_TYPE and
> ARRAY_TYPE cases, but perhaps we need to do the same also for other
> types (INTEGER_TYPE/REAL_TYPE and the like).  I guess I'll need to
> construct more testcases.

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675531.html
patch.

Thanks.

> 2025-02-11  Jakub Jelinek  
> 
>   PR c++/118787
>   * pt.cc (tsubst) : Use return t; only if it doesn't
>   have any TYPE_ATTRIBUTES.  Call apply_late_template_attributes.
>   : Likewise.  Formatting fix.
> 
>   * g++.dg/cpp0x/alignas22.C: New test.

Jakub



Re: [1/3 PATCH]AArch64: add support for partial modes to last extractions [PR118464]

2025-03-03 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> The last extraction instructions work for both full and partial SVE vectors,
> however we currently only define them for FULL vectors.
>
> Early break code for VLA now however requires partial vector support, which
> relies on extract_last support.
>
> I have not added any new testcases as they overlap with the existing Early
> break tests which now fail without this.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>
>   PR tree-optimization/118464
>   PR tree-optimization/116855
>   * config/aarch64/aarch64-sve.md (@extract__,
>   @fold_extract__,
>   @aarch64_fold_extract_vector__): Change SVE_FULL to
>   SVE_ALL.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index e975286a01904bec0b283b7ba4afde6f0fd60bf1..6c0be3c1a51449274720175b5e6e7d7535928de6 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3107,7 +3107,7 @@ (define_insn "@extract__"
>[(set (match_operand: 0 "register_operand")
>   (unspec:
> [(match_operand: 1 "register_operand")
> -(match_operand:SVE_FULL 2 "register_operand")]
> +(match_operand:SVE_ALL 2 "register_operand")]
> LAST))]
>"TARGET_SVE"
>{@ [ cons: =0 , 1   , 2  ]

It looks like this will use (say):

  lasta b, pg, z.b

for VNx4QI, is that right?  I don't think that's safe, since the .b form
treats all bits of the pg input as significant, whereas only one in every
four bits of pg is defined for VNx4BI (the predicate associated with VNx4QI).

I think converting these patterns to partial vectors means operating
on containers rather than elements.  E.g. the VNx4QI case should use
.s rather than .b.  That should just be a case of changing vwcore to
vccore and Vetype to Vctype, but I haven't looked too deeply.
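
To make the predicate-bit concern concrete, here is a small Python model
(illustrative only; the helper names and fixed vector length are assumptions,
not real SVE or GCC semantics).  For VNx4BI only one predicate bit in every
four is defined, so a .b-style operation that treats every bit as significant
can be steered by the undefined bits, while a container-style (.s) operation
ignores them:

```python
def lasta_b(pg_bits, z_bytes):
    # ".b" form: every predicate bit is significant.  LASTA-like semantics:
    # return the element after the last active one (wrapping around).
    last = max((i for i, p in enumerate(pg_bits) if p), default=-1)
    return z_bytes[(last + 1) % len(z_bytes)]

def lasta_s(pg_bits, z_bytes):
    # ".s" form: only bits 0, 4, 8, ... (one per 32-bit container) are
    # significant; all other predicate bits are ignored.
    active = [i for i in range(0, len(pg_bits), 4) if pg_bits[i]]
    last = active[-1] if active else -4
    return z_bytes[(last + 4) % len(z_bytes)]

pg = [0] * 16
pg[4] = 1   # the defined predicate bit for container 1
pg[5] = 1   # an undefined bit that VNx4BI allows to contain garbage
z = list(range(16))

print(lasta_b(pg, z))   # the garbage bit steers the .b form mid-container
print(lasta_s(pg, z))   # the container form ignores the garbage bit
```

With a garbage bit set, the two forms disagree, which is the unsafety
described above.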

Thanks,
Richard

> @@ -8899,7 +8899,7 @@ (define_insn "@fold_extract__"
>   (unspec:
> [(match_operand: 1 "register_operand")
>  (match_operand: 2 "register_operand")
> -(match_operand:SVE_FULL 3 "register_operand")]
> +(match_operand:SVE_ALL 3 "register_operand")]
> CLAST))]
>"TARGET_SVE"
>{@ [ cons: =0 , 1 , 2   , 3  ]
> @@ -8909,11 +8909,11 @@ (define_insn "@fold_extract__"
>  )
>  
>  (define_insn "@aarch64_fold_extract_vector__"
> -  [(set (match_operand:SVE_FULL 0 "register_operand")
> - (unspec:SVE_FULL
> -   [(match_operand:SVE_FULL 1 "register_operand")
> +  [(set (match_operand:SVE_ALL 0 "register_operand")
> + (unspec:SVE_ALL
> +   [(match_operand:SVE_ALL 1 "register_operand")
>  (match_operand: 2 "register_operand")
> -(match_operand:SVE_FULL 3 "register_operand")]
> +(match_operand:SVE_ALL 3 "register_operand")]
> CLAST))]
>"TARGET_SVE"
>{@ [ cons: =0 , 1 , 2   , 3  ]


Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-03 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> The following testcase is miscompiled on powerpc64le-linux starting with
> r15-6777.
> That change has the if (HONOR_NANS (GET_MODE (XEXP (op0, 0)))) all = 15;
> lines which work fine if the comparisons use MODE_FLOAT or MODE_INT operands
> (or say MODE_VECTOR* etc.).  But on this testcase on ppc64le during combine
> we see
> (set (reg:SI 134)
> (ior:SI (ge:SI (reg:CCFP 128)
> (const_int 0 [0]))
> (lt:SI (reg:CCFP 128)
> (const_int 0 [0]))))
> HONOR_NANS is obviously false on CCFPmode, because MODE_HAS_NANS is false,
> it isn't FLOAT_MODE_P.  But still it is a MODE_CC mode used for floating
> point comparisons and so we need to consider the possibility of unordered
> operands.
> I'm not sure how we could look at the setter of those MODE_CC regs
> from the simplifiers, after all they can happen in the middle of combiner
> trying to combine multiple instructions.
> So, instead the following patch attempts to be conservative for MODE_CC
> with some exceptions.  One is flag_finite_math_only, regardless of
> MODE_HAS_NANS in that case HONOR_NANS will be always false.
> Another one is for targets which provide REVERSE_CONDITION condition
> and reverse one way the floating point MODE_CC modes and another the
> integral ones.  If REVERSE_CONDITION for GT gives LE, then unordered
> is not an option.  And finally it searches if there are any scalar floating
> point modes with MODE_HAS_NANS at all, if not, it is also safe to
> assume there are no NaNs.
>
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
> aarch64-linux and bootstrapped on s390x-linux (regtest there still pending).
>
> Ok for trunk?
>
> Or any other ideas how to handle this?

I think we should instead go back to punting on comparisons whose inputs
are CC modes, as we did (indirectly, via comparison_code_valid_for_mode)
before r15-6777.  Sorry, I'd forgotten/hadn't thought to exclude CC modes
explicitly when removing that function.

Richard

>
> 2025-02-24  Jakub Jelinek  
>
>   PR rtl-optimization/119002
>   * simplify-rtx.cc: Include tm_p.h.
>   (simplify_context::simplify_logical_relational_operation): Set
>   all = 15 also if op0's first operand has MODE_CC mode and it
>   is or could be floating point comparison which honors NaNs.
>
>   * gcc.c-torture/execute/ieee/pr119002.c: New test.
>
> --- gcc/simplify-rtx.cc.jj	2025-01-15 08:43:39.611918569 +0100
> +++ gcc/simplify-rtx.cc   2025-02-24 21:16:09.980758481 +0100
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
>  #include "selftest-rtl.h"
>  #include "rtx-vector-builder.h"
>  #include "rtlanal.h"
> +#include "tm_p.h"
>  
>  /* Simplification and canonicalization of RTL.  */
>  
> @@ -2675,6 +2676,24 @@ simplify_context::simplify_logical_relat
>/* See whether the operands might be unordered.  */
>if (HONOR_NANS (GET_MODE (XEXP (op0, 0))))
>   all = 15;
> +  else if (GET_MODE_CLASS (GET_MODE (XEXP (op0, 0))) == MODE_CC
> +&& !flag_finite_math_only)
> + {
> +   /* HONOR_NANS will be false for MODE_CC comparisons, even though
> +  they could actually be floating point.  If the mode is
> +  reversible, ask the backend if it could be unordered, otherwise
> +  err on the side of caution and assume it could be unordered
> +  if any supported floating mode honors NaNs.  */
> +   machine_mode mode = GET_MODE (XEXP (op0, 0));
> +   if (!REVERSIBLE_CC_MODE (mode)
> +   || REVERSE_CONDITION (GT, mode) != LE)
> + FOR_EACH_MODE_IN_CLASS (mode, MODE_FLOAT)
> +   if (HONOR_NANS (mode))
> + {
> +   all = 15;
> +   break;
> + }
> + }
>mask0 = comparison_to_mask (code0) & all;
>mask1 = comparison_to_mask (code1) & all;
>  }
>
> --- gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c.jj	2025-02-24 21:18:45.880622627 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/ieee/pr119002.c	2025-02-24 21:19:02.418396051 +0100
> @@ -0,0 +1,23 @@
> +/* PR rtl-optimization/119002 */
> +
> +__attribute__((noipa)) unsigned int
> +foo (void *x, float y, float z)
> +{
> +  unsigned int a, b;
> +  float c, d, e;
> +  c = y;
> +  d = z;
> +  a = c < d;
> +  d = y;
> +  e = z;
> +  b = d >= e;
> +  a |= b;
> +  return a;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo ((void *) 0, 0.f, __builtin_nanf ("")))
> +__builtin_abort ();
> +}
>
>   Jakub


[PATCH] ipa-cp: Avoid ICE when redistributing nodes among edges to recursive clones (PR 118318)

2025-03-03 Thread Martin Jambor
Hi,

PR 118318 reported an ICE during a PGO build of Firefox in IPA-CP, in
the final stages of update_counts_for_self_gen_clones, where it
attempts to guess how to distribute profile counts among clones created
for recursive edges and the various edges that are created in the
process.  If one such edge has a profile count of kind GUESSED_GLOBAL0,
the compatibility check in operator+ will lead to an ICE.  After
discussing the situation with Honza, we concluded that there is little
more we can do other than check for this situation before touching the
edge count, so this is what this patch does.
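
As a toy model of the guard described above (the class, names, and quality
kinds here are illustrative; GCC's real profile_count class is far richer
than this sketch):

```python
from dataclasses import dataclass

PRECISE = "precise"
GUESSED_GLOBAL0 = "guessed_global0"

@dataclass
class Count:
    value: int
    quality: str

    def compatible_p(self, other):
        # Simplified stand-in for the compatibility predicate: counts of
        # different quality kinds cannot be meaningfully combined.
        return self.quality == other.quality

    def __add__(self, other):
        # Mirrors the operator+ compatibility check that ICEd.
        assert self.compatible_p(other), "ICE: incompatible profile counts"
        return Count(self.value + other.value, self.quality)

edge_count = Count(100, GUESSED_GLOBAL0)
desc_count = Count(40, PRECISE)

# The fix: check compatibility first and leave the edge count untouched
# otherwise, instead of unconditionally adding and tripping the assert.
if edge_count.compatible_p(desc_count):
    edge_count = edge_count + desc_count
print(edge_count.value)
```

Under this model the incompatible count is simply left alone, which is all
the patch aims to do.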

Bootstrapped and LTO-profile-bootstrapped and tested on x86_64.  OK for
master?  (Should I then backport this to active release branches?  I
guess it would make sense.)

Thanks,

Martin


gcc/ChangeLog:

2025-02-28  Martin Jambor  

PR ipa/118318
* ipa-cp.cc (adjust_clone_incoming_counts): Add a compatible_p check.
---
 gcc/ipa-cp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 68959f2677b..a63463c2906 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -4599,7 +4599,8 @@ adjust_clone_incoming_counts (cgraph_node *node,
cs->count = cs->count.combine_with_ipa_count (sum);
   }
 else if (!desc->processed_edges->contains (cs)
-&& cs->caller->clone_of == desc->orig)
+&& cs->caller->clone_of == desc->orig
+&& cs->count.compatible_p (desc->count))
   {
cs->count += desc->count;
if (dump_file)
-- 
2.47.1



Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-03 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Mon, Mar 03, 2025 at 01:02:07PM +, Richard Sandiford wrote:
>> ...how about something like this?  Completely untested, and I haven't
>> thought about it much.  Just didn't want to hold up the discussion.
>
> Works for me.
>
> Just wonder if there is anything that will actually verify that XEXP (op0, 0)
> and XEXP (op1, 0) modes are at least from the same class, rather than say
> have one of the comparisons in MODE_CC and another in MODE_INT or vice versa
> or whatever other modes.

There's:

  if (!(rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
	&& rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))))
return 0;

But I suppose there's the (separate) question of whether (const_int 0)
can be the first operand and the CC value the second.  AIUI that ought
to be canonicalised to the opposite order, but if we're relying on the
order for correctness, perhaps we should check.

Thanks,
Richard

>
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -2655,6 +2655,7 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
>>  
>>enum rtx_code code0 = GET_CODE (op0);
>>enum rtx_code code1 = GET_CODE (op1);
>> +  machine_mode cmp_mode = GET_MODE (XEXP (op0, 0));
>>  
>>/* Assume at first that the comparisons are on integers, and that the
>>   operands are therefore ordered.  */
>> @@ -2672,8 +2673,10 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
>>  }
>>else
>>  {
>> -  /* See whether the operands might be unordered.  */
>> -  if (HONOR_NANS (GET_MODE (XEXP (op0, 0))))
>> +  /* See whether the operands might be unordered.  Assume that all
>> + results are possible for CC modes, and punt later if we don't get
>> + an all-true or all-false answer.  */
>> +  if (GET_MODE_CLASS (cmp_mode) == MODE_CC || HONOR_NANS (cmp_mode))
>>  all = 15;
>>mask0 = comparison_to_mask (code0) & all;
>>mask1 = comparison_to_mask (code1) & all;
>> @@ -2702,6 +2705,9 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
>>  code = mask_to_unsigned_comparison (mask);
>>else
>>  {
>> +  if (GET_MODE_CLASS (cmp_mode) == MODE_CC)
>> +return 0;
>> +
>>code = mask_to_comparison (mask);
>>/* LTGT and NE are arithmetically equivalent for ordered operands,
>>   with NE being the canonical choice.  */
>
>   Jakub


Re: [3/3 PATCH v4]middle-end: delay checking for alignment to load [PR118464]

2025-03-03 Thread Richard Biener
On Fri, 28 Feb 2025, Tamar Christina wrote:

> Hi All,
> 
> This fixes two PRs on Early break vectorization by delaying the safety checks
> to vectorizable_load when the VF, VMAT and vectype are all known.
> 
> This patch does add two new restrictions:
> 
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>group sizes, as they are unaligned every n % 2 iterations and so may cross
>a page unwittingly.
> 
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization
>    if we cannot peel for alignment, as the alignment requirement is quite
>    large at GROUP_SIZE * vectype_size.  This is unlikely to ever be
>    beneficial so we don't support it for now.
> 
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
> 
> As a fall-back, when the alignment fails we require partial vector support.
> 
> For VLA targets like SVE return element alignment as the desired vector
> alignment.  This means that the loads are never misaligned and so,
> annoyingly, it won't ever need to peel.
> 
> So what I think needs to happen in GCC 16 is the following.
> 
> 1. during vect_compute_data_ref_alignment we need to take the max of
>POLY_VALUE_MIN and vector_alignment.
> 
> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add
>    a check that ncopies * vectype does not exceed POLY_VALUE_MAX which we
>    use as a proxy for pagesize.
> 
> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>vect_determine_partial_vectors_and_peeling since the first iteration has to
>be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
>vectorize.
> 
> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
>becomes true and we generate the peeled check through loop control for
>partial loops.  From what I can tell this won't work for
>LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
>all in the compiler.  That would need to be done independently from the
>above.
> 
> In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/118464
>   PR tree-optimization/116855
>   * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>   checks.
>   (vect_compute_data_ref_alignment): Remove alignment checks and move to
>   get_load_store_type, increase group access alignment.
>   (vect_enhance_data_refs_alignment): Add note to comment needing
>   investigating.
>   (vect_analyze_data_refs_alignment): Likewise.
>   (vect_supportable_dr_alignment): For group loads look at first DR.
>   * tree-vect-stmts.cc (get_load_store_type):
>   Perform safety checks for early break pfa.
>   * tree-vectorizer.h (dr_set_safe_speculative_read_required,
>   dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
>   (need_peeling_for_alignment): Renamed to...
>   (safe_speculative_read_required): .. This
>   (class dr_vec_info): Add scalar_access_known_in_bounds.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/118464
>   PR tree-optimization/116855
>   * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>   load type is relaxed later.
>   * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>   * gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
>   * gcc.dg/vect/vect-early-break_128.c: Likewise.
>   * gcc.dg/vect/vect-early-break_26.c: Likewise.
>   * gcc.dg/vect/vect-early-break_43.c: Likewise.
>   * gcc.dg/vect/vect-early-break_44.c: Likewise.
>   * gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
>   * gcc.dg/vect/vect-early-break_7.c: Likewise.
>   * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>   * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>   * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
>   * gcc.dg/vect/vect-early-break_53.c: Likewise.
>   * gcc.dg/vect/vect-early-break_5

C patch ping

2025-03-03 Thread Jakub Jelinek
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675704.html
and
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675765.html
patches (both for PR117178, -Wunterminated-string-initialization
regressions and attempts to mitigate it).

Thanks.

Jakub



Re: [PATCH]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-03-03 Thread Richard Sandiford
Christophe Lyon  writes:
> On Mon, 3 Mar 2025 at 12:29, Richard Sandiford
>  wrote:
>>
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Monday, March 3, 2025 10:12 AM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> >> ; ktkac...@gcc.gnu.org
>> >> Subject: Re: [PATCH]AArch64: force operand to fresh register to avoid 
>> >> subreg
>> >> issues [PR118892]
>> >>
>> >> Tamar Christina  writes:
>> >> > Hi All,
>> >> >
>> >> > When the input is already a subreg and we try to make a paradoxical
>> >> > subreg out of it for copysign this can fail if it violates the sugreg
>> >>
>> >> subreg
>> >>
>> >> > relationship.
>> >> >
>> >> > Use force_lowpart_subreg instead of lowpart_subreg to then force the
>> >> > results to a register instead of ICEing.
>> >> >
>> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >> >
>> >> > Ok for master?
>> >> >
>> >> > Thanks,
>> >> > Tamar
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> >PR target/118892
>> >> >* config/aarch64/aarch64.md (copysign3): Use
>> >> >force_lowpart_subreg instead of lowpart_subreg.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> >PR target/118892
>> >> >* gcc.target/aarch64/copysign-pr118892.c: New test.
>> >> >
>> >> > ---
>> >> >
>> >> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> >> > index cfe730f3732ce45c914b30a908851a4a7dd77c0f..62be9713cf417922b3c06e38f12f401872751fa2 100644
>> >> > --- a/gcc/config/aarch64/aarch64.md
>> >> > +++ b/gcc/config/aarch64/aarch64.md
>> >> > @@ -7479,8 +7479,8 @@ (define_expand "copysign3"
>> >> >&& real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
>> >> >  {
>> >> >emit_insn (gen_ior3 (
>> >> > -  lowpart_subreg (mode, operands[0], mode),
>> >> > -  lowpart_subreg (mode, operands[1], mode),
>> >> > +  force_lowpart_subreg (mode, operands[0], mode),
>> >> > +  force_lowpart_subreg (mode, operands[1], mode),
>> >>
>> >> force_lowpart_subreg conditionally forces the SUBREG_REG into a new 
>> >> temporary
>> >> register and then takes the subreg of that.  It's therefore only 
>> >> appropriate
>> >> for source operands, not destination operands.
>> >>
>> >> It's true that the same problem could in principle occur for the
>> >> destination, but that would need to be fixed in a different way.
>> >>
>> >
>> > Ah, true. Should have thought about it a bit more.
>> >
>> >> OK with just the operands[1] change, without the operands[0] change.
>> >>
>> >
>> > I forgot to ask if OK for GCC 14 backport after some stew.
>>
>> Yeah, ok for GCC 14 too.  The force_lowpart_subreg function hasn't been
>> backported to GCC 14 yet, but I think it should be (as part of this patch).
>> Other backportable fixes rely on it too.
>>
>
> Looks like I was too conservative when I backported my fix for PR
> 114801, and chose not to include force_lowpart_subreg.
>
> Should my backport be updated to match trunk once force_lowpart_subreg
> is backported to gcc-14 too?
>
> (see https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673048.html)

One for Richard E, but FWIW, either way sounds ok to me.

Thanks,
Richard


[PATCH] tree-optimization/119057 - bogus double reduction detection

2025-03-03 Thread Richard Biener
We are detecting a cycle as double reduction where the inner loop
cycle has extra out-of-loop uses.  This clashes at least with
assumptions from the SLP discovery code which says the cycle
isn't reachable from another SLP instance.  It also was not intended
to support this case, in fact with GCC 14 we seem to generate wrong
code here.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/119057
* tree-vect-loop.cc (check_reduction_path): Add argument
specifying whether we're analyzing the inner loop of a
double reduction.  Do not allow extra uses outside of the
double reduction cycle in this case.
(vect_is_simple_reduction): Adjust.

* gcc.dg/vect/pr119057.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr119057.c | 19 +++
 gcc/tree-vect-loop.cc| 12 +++-
 2 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr119057.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr119057.c b/gcc/testsuite/gcc.dg/vect/pr119057.c
new file mode 100644
index 000..582bb8ff86c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr119057.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-vrp -fno-tree-forwprop" } */
+
+int a, b, c, d;
+unsigned e;
+static void f(void)
+{
+  unsigned h;
+  for (d = 0; d < 2; d++)
+b |= e;
+  h = b;
+  c |= h;
+}
+int main()
+{
+  for (; a; a++)
+f();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b279ebe2793..dc15b955aad 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4044,7 +4044,8 @@ needs_fold_left_reduction_p (tree type, code_helper code)
 static bool
 check_reduction_path (dump_user_location_t loc, loop_p loop, gphi *phi,
  tree loop_arg, code_helper *code,
- vec > &path)
+ vec > &path,
+ bool inner_loop_of_double_reduc)
 {
   auto_bitmap visited;
   tree lookfor = PHI_RESULT (phi);
@@ -4181,7 +4182,8 @@ pop:
  break;
}
   /* Check there's only a single stmt the op is used on.  For the
-not value-changing tail and the last stmt allow out-of-loop uses.
+not value-changing tail and the last stmt allow out-of-loop uses,
+but not when this is the inner loop of a double reduction.
 ???  We could relax this and handle arbitrary live stmts by
 forcing a scalar epilogue for example.  */
   imm_use_iterator imm_iter;
@@ -4216,7 +4218,7 @@ pop:
}
}
  else if (!is_gimple_debug (op_use_stmt)
-  && (*code != ERROR_MARK
+  && ((*code != ERROR_MARK || inner_loop_of_double_reduc)
   || flow_bb_inside_loop_p (loop,
 gimple_bb (op_use_stmt
FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
@@ -4238,7 +4240,7 @@ check_reduction_path (dump_user_location_t loc, loop_p loop, gphi *phi,
 {
   auto_vec > path;
   code_helper code_;
-  return (check_reduction_path (loc, loop, phi, loop_arg, &code_, path)
+  return (check_reduction_path (loc, loop, phi, loop_arg, &code_, path, false)
  && code_ == code);
 }
 
@@ -4449,7 +4451,7 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
   auto_vec > path;
   code_helper code;
   if (check_reduction_path (vect_location, loop, phi, latch_def, &code,
-   path))
+   path, inner_loop_of_double_reduc))
 {
   STMT_VINFO_REDUC_CODE (phi_info) = code;
   if (code == COND_EXPR && !nested_in_vect_loop)
-- 
2.43.0


Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2025 at 01:02:07PM +, Richard Sandiford wrote:
> ...how about something like this?  Completely untested, and I haven't
> thought about it much.  Just didn't want to hold up the discussion.

Works for me.

Just wonder if there is anything that will actually verify that XEXP (op0, 0)
and XEXP (op1, 0) modes are at least from the same class, rather than say
have one of the comparisons in MODE_CC and another in MODE_INT or vice versa
or whatever other modes.

> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -2655,6 +2655,7 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
>  
>enum rtx_code code0 = GET_CODE (op0);
>enum rtx_code code1 = GET_CODE (op1);
> +  machine_mode cmp_mode = GET_MODE (XEXP (op0, 0));
>  
>/* Assume at first that the comparisons are on integers, and that the
>   operands are therefore ordered.  */
> @@ -2672,8 +2673,10 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
>  }
>else
>  {
> -  /* See whether the operands might be unordered.  */
> -  if (HONOR_NANS (GET_MODE (XEXP (op0, 0))))
> +  /* See whether the operands might be unordered.  Assume that all
> +  results are possible for CC modes, and punt later if we don't get
> +  an all-true or all-false answer.  */
> +  if (GET_MODE_CLASS (cmp_mode) == MODE_CC || HONOR_NANS (cmp_mode))
>   all = 15;
>mask0 = comparison_to_mask (code0) & all;
>mask1 = comparison_to_mask (code1) & all;
> @@ -2702,6 +2705,9 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
>  code = mask_to_unsigned_comparison (mask);
>else
>  {
> +  if (GET_MODE_CLASS (cmp_mode) == MODE_CC)
> + return 0;
> +
>code = mask_to_comparison (mask);
>/* LTGT and NE are arithmetically equivalent for ordered operands,
>with NE being the canonical choice.  */

Jakub



[PATCH] tree-optimization/119096 - bogus conditional reduction vectorization

2025-03-03 Thread Richard Biener
When we vectorize a .COND_ADD reduction and apply the single-use-def
cycle optimization we can end up choosing the wrong else value for
subsequent .COND_ADD.  The following rectifies this.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/119096
* tree-vect-loop.cc (vect_transform_reduction): Use the
correct else value for .COND_fn.

* gcc.dg/vect/pr119096.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr119096.c | 21 +
 gcc/tree-vect-loop.cc|  2 +-
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr119096.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr119096.c b/gcc/testsuite/gcc.dg/vect/pr119096.c
new file mode 100644
index 000..2c03a593683
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr119096.c
@@ -0,0 +1,21 @@
+#include "tree-vect.h"
+
+long __attribute__((noipa))
+sum(int* A, int* B)
+{
+long total = 0;
+for(int j = 0; j < 16; j++)
+if((A[j] > 0) & (B[j] > 0))
+total += (long)A[j];
+return total;
+}
+int main()
+{
+  int A[16] = { 1,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1 };
+  int B[16] = { };
+  check_vect ();
+  if (sum (A, B) != 0)
+abort ();
+  return 0;
+}
+
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index dc15b955aad..52533623cab 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9064,7 +9064,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
new_stmt = gimple_build_call_internal (internal_fn (code),
   op.num_ops,
   vop[0], vop[1], vop[2],
-  vop[1]);
+  vop[reduc_index]);
  else
new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
vop[0], vop[1], vop[2]);
-- 
2.43.0


Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2025 at 12:20:00PM +, Richard Sandiford wrote:
> I think we should instead go back to punting on comparisons whose inputs
> are CC modes, as we did (indirectly, via comparison_code_valid_for_mode)
> before r15-6777.  Sorry, I'd forgotten/hadn't thought to exclude CC modes
> explicitly when removing that function.

I believe that is not what was the case before r15-6777.
We punted simply because comparison_to_mask returned for GE 6, for LT it
returned 8, 6 | 8 is not 15, no optimization.
There wasn't this all = 14 vs. 15 thing.
comparison_code_valid_for_mode is actually checking mode which is the mode
in which IOR is performed, e.g. SImode in the testcase.
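
The mask arithmetic being discussed can be sketched as follows (bit values
match the description above: 8 for "less", 4 for "greater", 2 for "equal",
1 for "unordered"; the function name is illustrative, not GCC's API):

```python
# GCC's comparison_to_mask uses this encoding; the rest of the helper is
# a simplified model of the all = 14 vs. 15 logic under discussion.
MASKS = {"LT": 8, "GT": 4, "EQ": 2, "UNORDERED": 1,
         "GE": 4 | 2, "LE": 8 | 2, "NE": 8 | 4}

def ior_folds_to_true(code0, code1, may_be_unordered):
    # 14 = LT|GT|EQ: the only possible outcomes for ordered operands.
    all_outcomes = 15 if may_be_unordered else 14
    combined = (MASKS[code0] | MASKS[code1]) & all_outcomes
    return combined == all_outcomes

# Integer comparison: GE | LT covers every ordered outcome, so it folds.
print(ior_folds_to_true("GE", "LT", may_be_unordered=False))   # True
# Possibly-unordered (FP or CC-mode) comparison: GE | LT misses UNORDERED.
print(ior_folds_to_true("GE", "LT", may_be_unordered=True))    # False
```

This is exactly why treating a CC-mode comparison as ordered (all = 14) lets
GE | LT fold to true even when the operands might really be unordered.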

So, do you want a simpler
  if (GET_MODE (XEXP (op0, 0)) == MODE_CC
  || HONOR_NANS (GET_MODE (XEXP (op0, 0))))
all = 15;
or
  if ((!INTEGRAL_MODE_P (GET_MODE (XEXP (op0, 0)))
   && !FLOAT_MODE_P (GET_MODE (XEXP (op0, 0)))
   && !VECTOR_MODE_P (GET_MODE (XEXP (op0, 0))))
  || HONOR_NANS (GET_MODE (XEXP (op0, 0))))
all = 15;
or something else?

Jakub



Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-03 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Mon, Mar 03, 2025 at 12:20:00PM +, Richard Sandiford wrote:
>> I think we should instead go back to punting on comparisons whose inputs
>> are CC modes, as we did (indirectly, via comparison_code_valid_for_mode)
>> before r15-6777.  Sorry, I'd forgotten/hadn't thought to exclude CC modes
>> explicitly when removing that function.
>
> I believe that is not what was the case before r15-6777.
> We punted simply because comparison_to_mask returned 6 for GE and 8 for LT,
> and 6 | 8 is 14, not 15, so no optimization was done.
> There wasn't this all = 14 vs. 15 thing.
> comparison_code_valid_for_mode is actually checking mode which is the mode
> in which IOR is performed, e.g. SImode in the testcase.

Ah, right.  But like I said in the covering note, that choice of mode
seemed to be unintentional (since it should never be a floating-point
mode, and even if it were, the mode of the IOR wouldn't affect whether
something like ORDERED is valid).

So I still think that punting (returning 0) on CC modes would be safer.
We just don't have enough information to tell what a CCmode value represents.

If that seems too conservative, and in particular if we want to preserve
the old "all true" optimisation, then...

> So, do you want a simpler
>   if (GET_MODE_CLASS (GET_MODE (XEXP (op0, 0))) == MODE_CC
>       || HONOR_NANS (GET_MODE (XEXP (op0, 0))))
>     all = 15;
> or
>   if ((!INTEGRAL_MODE_P (GET_MODE (XEXP (op0, 0)))
>        && !FLOAT_MODE_P (GET_MODE (XEXP (op0, 0)))
>        && !VECTOR_MODE_P (GET_MODE (XEXP (op0, 0))))
>       || HONOR_NANS (GET_MODE (XEXP (op0, 0))))
> all = 15;
> or something else?

...how about something like this?  Completely untested, and I haven't
thought about it much.  Just didn't want to hold up the discussion.

Thanks,
Richard


diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index c478bd060fc..d20aa518a64 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2655,6 +2655,7 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
 
   enum rtx_code code0 = GET_CODE (op0);
   enum rtx_code code1 = GET_CODE (op1);
+  machine_mode cmp_mode = GET_MODE (XEXP (op0, 0));
 
   /* Assume at first that the comparisons are on integers, and that the
  operands are therefore ordered.  */
@@ -2672,8 +2673,10 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
 }
   else
 {
-  /* See whether the operands might be unordered.  */
-  if (HONOR_NANS (GET_MODE (XEXP (op0, 0))))
+  /* See whether the operands might be unordered.  Assume that all
+     results are possible for CC modes, and punt later if we don't get
+     an all-true or all-false answer.  */
+  if (GET_MODE_CLASS (cmp_mode) == MODE_CC || HONOR_NANS (cmp_mode))
all = 15;
   mask0 = comparison_to_mask (code0) & all;
   mask1 = comparison_to_mask (code1) & all;
@@ -2702,6 +2705,9 @@ simplify_context::simplify_logical_relational_operation (rtx_code code,
 code = mask_to_unsigned_comparison (mask);
   else
 {
+  if (GET_MODE_CLASS (cmp_mode) == MODE_CC)
+   return 0;
+
   code = mask_to_comparison (mask);
   /* LTGT and NE are arithmetically equivalent for ordered operands,
 with NE being the canonical choice.  */


Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2025 at 01:46:20PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Mon, Mar 03, 2025 at 01:02:07PM +, Richard Sandiford wrote:
> >> ...how about something like this?  Completely untested, and I haven't
> >> thought about it much.  Just didn't want to hold up the discussion.
> >
> > Works for me.
> >
> > Just wonder if there is anything that will actually verify that XEXP (op0, 0)
> > and XEXP (op1, 0) modes are at least from the same class, rather than say
> > have one of the comparisons in MODE_CC and another in MODE_INT or vice versa
> > or whatever other modes.
> 
> There's:
> 
>   if (!(rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
>         && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))))
>     return 0;

You're right, and rtx_equal_p returns false for GET_MODE differences.

Will you test your patch (with the testcase from my patch) or should I?

Jakub



[PING] [PATCH] c++: Fix checking assert upon invalid class definition [PR116740]

2025-03-03 Thread Simon Martin
Hi,

On 18 Feb 2025, at 14:00, Simon Martin wrote:

> A checking assert triggers upon the following invalid code since
> GCC 11:
>
> === cut here ===
> class { a (struct b;
> } struct b
> === cut here ===
>
> The problem is that during error recovery, we call
> set_identifier_type_value_with_scope for B in the global namespace, and
> the checking assert added via r11-7228-g8f93e1b892850b fails.
>
> This patch relaxes that assert to not fail if we've seen a parser error
> (it is a generalization of another fix done to that checking assert via
> r11-7266-g24bf79f1798ad1).
>
> Successfully tested on x86_64-pc-linux-gnu.
Friendly ping.

Thanks! Simon

>   PR c++/116740
>
> gcc/cp/ChangeLog:
>
>   * name-lookup.cc (set_identifier_type_value_with_scope): Don't
>   fail assert with ill-formed input.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/parse/crash80.C: New test.
>
> ---
>  gcc/cp/name-lookup.cc| 6 ++
>  gcc/testsuite/g++.dg/parse/crash80.C | 7 +++
>  2 files changed, 9 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/parse/crash80.C
>
> diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> index d1abb205bc7..742e5d289dc 100644
> --- a/gcc/cp/name-lookup.cc
> +++ b/gcc/cp/name-lookup.cc
> @@ -5101,10 +5101,8 @@ set_identifier_type_value_with_scope (tree id, tree decl, cp_binding_level *b)
>if (b->kind == sk_namespace)
>  /* At namespace scope we should not see an identifier type value.  */
>  gcc_checking_assert (!REAL_IDENTIFIER_TYPE_VALUE (id)
> -  /* We could be pushing a friend underneath a template
> - parm (ill-formed).  */
> -  || (TEMPLATE_PARM_P
> -  (TYPE_NAME (REAL_IDENTIFIER_TYPE_VALUE (id)))));
> +  /* But we might end up here with ill-formed input.  */
> +  || seen_error ());
>else
>  {
>/* Push the current type value, so we can restore it later  */
> diff --git a/gcc/testsuite/g++.dg/parse/crash80.C b/gcc/testsuite/g++.dg/parse/crash80.C
> new file mode 100644
> index 000..cd9216adf5c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/parse/crash80.C
> @@ -0,0 +1,7 @@
> +// PR c++/116740
> +// { dg-do "compile" }
> +
> +class K {
> +  int a(struct b; // { dg-error "expected '\\)'" }
> +};
> +struct b {};
> -- 
> 2.44.0


[PING] [PATCH] c++: Use capture from outer lambda, if any, instead of erroring out [PR110584]

2025-03-03 Thread Simon Martin
Hi,

On 18 Feb 2025, at 11:12, Simon Martin wrote:

> We've been rejecting this valid code since r8-4571:
>
> === cut here ===
> void foo (float);
> int main () {
>   constexpr float x = 0;
>   (void) [&] () {
> foo (x);
> (void) [] () {
>   foo (x);
> };
>   };
> }
> === cut here ===
>
> The problem is that when processing X in the inner lambda,
> process_outer_var_ref errors out even though it does find the capture
> from the enclosing lambda.
>
> This patch changes process_outer_var_ref to accept and return the outer
> proxy if it finds any.
>
> Successfully tested on x86_64-pc-linux-gnu.
Friendly ping.

Thanks! Simon

>   PR c++/110584
>
> gcc/cp/ChangeLog:
>
>   * semantics.cc (process_outer_var_ref): Use capture from
>   enclosing lambda, if any.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/cpp0x/lambda/lambda-nested10.C: New test.
>
> ---
>  gcc/cp/semantics.cc   |  4 ++
>  .../g++.dg/cpp0x/lambda/lambda-nested10.C | 46 +++
>  2 files changed, 50 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C
>
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 7c7d3e3c432..7bbc82f7dc1 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -4598,6 +4598,10 @@ process_outer_var_ref (tree decl, tsubst_flags_t complain, bool odr_use)
>if (!odr_use && context == containing_function)
>  decl = add_default_capture (lambda_stack,
>   /*id=*/DECL_NAME (decl), initializer);
> +  /* When doing lambda capture, if we found a capture in an enclosing lambda,
> +we can use it.  */
> +  else if (!odr_use && is_capture_proxy (decl))
> +return decl;
>/* Only an odr-use of an outer automatic variable causes an
>   error, and a constant variable can decay to a prvalue
>   constant without odr-use.  So don't complain yet.  */
> diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C
> new file mode 100644
> index 000..2dd9dd4955e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested10.C
> @@ -0,0 +1,46 @@
> +// PR c++/110584
> +// { dg-do "run" { target c++11 } }
> +
> +void foo (int i) {
> +  if (i != 0)
> +__builtin_abort ();
> +}
> +
> +int main () {
> +  const int x = 0;
> +
> +  // We would error out on this.
> +  (void) [&] () {
> +foo (x);
> +(void)[] () {
> +  foo (x);
> +};
> +  } ();
> +  // As well as those.
> +  (void) [&] () {
> +(void) [] () {
> +  foo (x);
> +};
> +  } ();
> +  (void) [&x] () {
> +(void) [] () {
> +  foo (x);
> +};
> +  } ();
> +  // But those would work already.
> +  (void) [] () {
> +(void) [&] () {
> +  foo (x);
> +};
> +  } ();
> +  (void) [&] () {
> +(void) [&] () {
> +  foo (x);
> +};
> +  } ();
> +  (void) [=] () {
> +(void) [] () {
> +  foo (x);
> +};
> +  } ();
> +}
> -- 
> 2.44.0


[PING] [PATCH] c++: Don't replace INDIRECT_REFs by a const capture proxy too eagerly [PR117504]

2025-03-03 Thread Simon Martin
Hi,

On 14 Feb 2025, at 18:08, Simon Martin wrote:

> We have been miscompiling the following valid code since GCC8, and
> r8-3497-g281e6c1d8f1b4c
>
> === cut here ===
> struct span {
>   span (const int (&__first)[1]) : _M_ptr (__first) {}
>   int operator[] (long __i) { return _M_ptr[__i]; }
>   const int *_M_ptr;
> };
> void foo () {
>   constexpr int a_vec[]{1};
>   auto vec{[&a_vec]() -> span { return a_vec; }()};
> }
> === cut here ===
>
> The problem is that perform_implicit_conversion_flags (via
> mark_rvalue_use) replaces "a_vec" in the return statement by a
> CONSTRUCTOR representing a_vec's constant value, and then takes its
> address when invoking span's constructor. So we end up with an 
> instance
> that points to garbage instead of a_vec's storage.
>
> I've tried many things to somehow recover from this replacement, but I
> actually think we should not do it when converting to a class type: we
> have no idea whether the conversion will involve a constructor taking an
> address or reference. So we should assume it does, and call
> mark_lvalue_use, not mark_rvalue_use (I might very well be overlooking
> things, and feedback is more than welcome).
>
> This is what the patch does, successfully tested on x86_64-pc-linux-gnu.
Friendly ping.

Thanks! Simon

>   PR c++/117504
>
> gcc/cp/ChangeLog:
>
>   * call.cc (perform_implicit_conversion_flags): When possibly
>   converting to a class, call mark_lvalue_use.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/cpp2a/constexpr-117504.C: New test.
>   * g++.dg/cpp2a/constexpr-117504a.C: New test.
>
> ---
>  gcc/cp/call.cc|  4 ++
>  gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C | 60 +++
>  .../g++.dg/cpp2a/constexpr-117504a.C  | 12 
>  3 files changed, 76 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C
>
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index 38a8f7fdcda..097b1fa55a4 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -13973,6 +13973,10 @@ perform_implicit_conversion_flags (tree type, tree expr,
>
>if (TYPE_REF_P (type))
>  expr = mark_lvalue_use (expr);
> +  else if (MAYBE_CLASS_TYPE_P (type))
> +/* We might convert using a constructor that takes the address of EXPR, so
> +   assume that it will be the case.  */
> +expr = mark_lvalue_use (expr);
>else
>  expr = mark_rvalue_use (expr);
>
> diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C
> new file mode 100644
> index 000..290d3dfd61e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504.C
> @@ -0,0 +1,60 @@
> +// PR c++/117504 - Initial report
> +// { dg-do "run" { target c++20 } }
> +
> +struct span {
> +  span (const int (&__first)[1]) : _M_ptr (__first) {}
> +  int operator[] (long __i) { return _M_ptr[__i]; }
> +  const int *_M_ptr;
> +};
> +
> +constexpr int a_global_vec[]{1};
> +span myFunctor() {
> +  return a_global_vec;
> +}
> +
> +int main() {
> +  constexpr int a_vec[]{1};
> +
> +  //
> +  // This PR's case, that used to be miscompiled.
> +  //
> +  auto lambda_1 = [&a_vec] () -> span { return a_vec; };
> +  auto vec_1 { lambda_1 () };
> +  if (vec_1[0] != 1)
> +__builtin_abort ();
> +
> +  // Variant that used to be miscompiled as well.
> +  auto lambda_2 = [&] () -> span { return a_vec; };
> +  auto vec_2 { lambda_2 () };
> +  if (vec_2[0] != 1)
> +__builtin_abort ();
> +
> +  //
> +  // Related cases that worked already.
> +  //
> +  auto lambda_3 = [&a_vec] () /* -> span */ { return a_vec; };
> +  auto vec_3 { lambda_3 () };
> +  if (vec_3[0] != 1)
> +__builtin_abort ();
> +
> +  auto lambda_4 = [&] () /* -> span */ { return a_vec; };
> +  auto vec_4 { lambda_4 () };
> +  if (vec_4[0] != 1)
> +__builtin_abort ();
> +
> +  const int (&vec_5)[1] = a_vec;
> +  if (vec_5[0] != 1)
> +__builtin_abort ();
> +
> +  span vec_6 (a_vec);
> +  if (vec_6[0] != 1)
> +__builtin_abort ();
> +
> +  auto vec_7 = myFunctor ();
> +  if (vec_7[0] != 1)
> +__builtin_abort ();
> +
> +  const int (&vec_8)[1] { a_vec };
> +  if (vec_8[0] != 1)
> +__builtin_abort ();
> +}
> diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C
> new file mode 100644
> index 000..f6d4dc8cbc5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117504a.C
> @@ -0,0 +1,12 @@
> +// PR c++/117504 - ICE discovered by ppalka@ when reducing.
> +// { dg-do "compile" { target c++20 } }
> +
> +struct span {
> +  span (const int* __first) : _M_ptr (__first) {}
> +  int operator[] (long __i) { return _M_ptr[__i]; }
> +  const int *_M_ptr;
> +};
> +int main() {
> +  constexpr int a_vec[]{1};
> +  auto vec { [&a_vec]() -> span { return a_vec; } () };
> +}
> -- 
> 2.44.0


Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation [PR119002]

2025-03-03 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Mon, Mar 03, 2025 at 01:46:20PM +, Richard Sandiford wrote:
>> Jakub Jelinek  writes:
>> > On Mon, Mar 03, 2025 at 01:02:07PM +, Richard Sandiford wrote:
>> >> ...how about something like this?  Completely untested, and I haven't
>> >> thought about it much.  Just didn't want to hold up the discussion.
>> >
>> > Works for me.
>> >
>> > Just wonder if there is anything that will actually verify that XEXP (op0, 0)
>> > and XEXP (op1, 0) modes are at least from the same class, rather than say
>> > have one of the comparisons in MODE_CC and another in MODE_INT or vice versa
>> > or whatever other modes.
>> 
>> There's:
>> 
>>   if (!(rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
>>         && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))))
>>     return 0;
>
> You're right, and rtx_equal_p returns false for GET_MODE differences.
>
> Will you test your patch (with the testcase from my patch) or should I?

I'm just about to test it with a tweak to the mode check.  Should be
done in a couple of hours.

Thanks,
Richard


RE: [3/3 PATCH v4]middle-end: delay checking for alignment to load [PR118464]

2025-03-03 Thread Tamar Christina
> >/* For now assume all conditional loads/stores support unaligned
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..b661dd400e5826fc1c4f70957b335d1741fa 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
> >return false;
> >  }
> >
> > +  /* If this DR needs alignment for correctness, we must ensure the target
> > + alignment is a constant power-of-two multiple of the amount read per
> > + vector iteration or force masking.  */
> > +  if (dr_safe_speculative_read_required (stmt_info))
> > +{
> > +  /* We can only peel for loops, of course.  */
> > +  gcc_checking_assert (loop_vinfo);
> > +
> > +  /* Check if we support the operation if early breaks are needed.  Here we
> > +must ensure that we don't access any more than the scalar code would
> > +have.  A masked operation would ensure this, so for these load types
> > +force masking.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > + && (*memory_access_type == VMAT_GATHER_SCATTER
> > + || *memory_access_type == VMAT_STRIDED_SLP))
> > +   {
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +"early break not supported: cannot peel for "
> > +"alignment. With non-contiguous memory vectorization"
> > +" could read out of bounds at %G ",
> > +STMT_VINFO_STMT (stmt_info));
> > + LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > +   }
> > +
> > +  auto target_alignment
> > +   = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
> > +  unsigned HOST_WIDE_INT target_align;
> > +  bool inbounds
> > +   = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
> > +
> > +  /* If the scalar loop is known to be in bounds, and we're using scalar
> > +accesses then there's no need to check further.  */
> > +  if (inbounds
> > + && *memory_access_type == VMAT_ELEMENTWISE)
> > +   {
> > + *alignment_support_scheme = dr_aligned;
> 
> Nothing should look at *alignment_support_scheme for VMAT_ELEMENTWISE.
> Did you actually need this adjustment?
> 

Yes, bitfields are relaxed a few lines up from contiguous to this:

  if (SLP_TREE_LANES (slp_node) == 1)
{
  *memory_access_type = VMAT_ELEMENTWISE;
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "single-element interleaving not supported "
 "for not adjacent vector loads, using "
 "elementwise access\n");
}

This means we then reach:
  if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())

we bail out because the permutes still exist.  The code relaxed the load to elements but
never removed the permutes or any associated information.

If the permutes are removed or some other workaround, you then hit

  if (!group_aligned && inbounds)
LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;

Because these aren't group loads.  Because the original load didn't have any misalignment they
never needed peeling and as such are dr_unaligned_supported.

So the only way to avoid checking elementwise is by guarding the top level with

  if (dr_safe_speculative_read_required (stmt_info)
  && *alignment_support_scheme == dr_aligned)
{

Instead of just 

  if (dr_safe_speculative_read_required (stmt_info))
{

Which I wasn't sure if it was the right thing to do...  Anyway if I do that I 
can remove...

> > + return true;
> > +   }
> > +
> > +  bool group_aligned = false;
> > +  if (*alignment_support_scheme == dr_aligned
> > + && target_alignment.is_constant (&target_align)
> > + && nunits.is_constant ())
> > +   {
> > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > + auto vectype_size
> > +   = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> > + poly_uint64 required_alignment = vf * vectype_size;
> > + /* If we have a grouped access we require that the alignment be N * 
> > elem.
> */
> > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > +   required_alignment *=
> > +   DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > + if (!multiple_p (target_alignment, required_alignment))
> > +   {
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"desired alignment %wu not met. Instead got %wu "
> > +"for DR alignment at %G",
> > +required_alignment.to_constant (),
> > +targe

Re: [PATCH] libstdc++: implement tuple protocol for std::complex (P2819R2)

2025-03-03 Thread Jonathan Wakely
On Sat, 1 Mar 2025 at 15:52, Giuseppe D'Angelo wrote:
>
> Hello,
>
> The attached patch implements the tuple protocol for std::complex (added
> by P2819R2 for C++26).
>
> Tested on x86-64 Linux. Beware that you need the latest GCC trunk,
> otherwise you'll get an ICE (PR 119045).
>
> It's also on Forge here
>
> https://forge.sourceware.org/gcc/gcc-TEST/pulls/34

Thanks - comments submitted there.

>
> together with a workaround for the ICE (please ignore that, the GCC
> mirror hasn't synced the proper fix just yet.)

And I've just synced it again (I should really set a cronjob to do that).



[PATCH 13/17] LoongArch: Add -m[no-]scq option

2025-03-03 Thread Xi Ruoyao
We'll use the sc.q instruction for some 16-byte atomic operations, but
it's only added in LoongArch 1.1 evolution so we need to gate it with
an option.

gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in (scq): New evolution
feature.
* config/loongarch/loongarch-evolution.cc: Regenerate.
* config/loongarch/loongarch-evolution.h: Regenerate.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-def.cc: Make -mscq the default for
-march=la664 and -march=la64v1.1.
* doc/invoke.texi (LoongArch Options): Document -m[no-]scq.
---
 gcc/config/loongarch/genopts/isa-evolution.in |  1 +
 gcc/config/loongarch/loongarch-def.cc |  4 ++--
 gcc/config/loongarch/loongarch-evolution.cc   |  4 
 gcc/config/loongarch/loongarch-evolution.h|  8 ++--
 gcc/config/loongarch/loongarch-str.h  |  1 +
 gcc/config/loongarch/loongarch.opt|  4 
 gcc/doc/invoke.texi   | 11 ++-
 7 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/genopts/isa-evolution.in b/gcc/config/loongarch/genopts/isa-evolution.in
index 50f72d5a0bc..836d93a0038 100644
--- a/gcc/config/loongarch/genopts/isa-evolution.in
+++ b/gcc/config/loongarch/genopts/isa-evolution.in
@@ -2,4 +2,5 @@
 2  26  div32   1.1 Support div.w[u] and mod.w[u] instructions with inputs not sign-extended.
 2  27  lam-bh  1.1 Support am{swap/add}[_db].{b/h} instructions.
 2  28  lamcas  1.1 Support amcas[_db].{b/h/w/d} instructions.
+2  30  scq 1.1 Support sc.q instruction.
 3  23  ld-seq-sa   1.1 Do not need load-load barriers (dbar 0x700).
diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index 5f235a04ef2..b19720ee066 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -72,7 +72,7 @@ array_arch loongarch_cpu_default_isa =
.simd_ (ISA_EXT_SIMD_LASX)
.evolution_ (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA
 | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS
-| OPTION_MASK_ISA_FRECIPE))
+| OPTION_MASK_ISA_FRECIPE | OPTION_MASK_ISA_SCQ))
 .set (ARCH_LA64V1_0,
  loongarch_isa ()
.base_ (ISA_BASE_LA64)
@@ -86,7 +86,7 @@ array_arch loongarch_cpu_default_isa =
.simd_ (ISA_EXT_SIMD_LSX)
.evolution_ (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA
 | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS
-| OPTION_MASK_ISA_FRECIPE));
+| OPTION_MASK_ISA_FRECIPE | OPTION_MASK_ISA_SCQ));
 
 
 static inline loongarch_cache la464_cache ()
diff --git a/gcc/config/loongarch/loongarch-evolution.cc b/gcc/config/loongarch/loongarch-evolution.cc
index de68624f949..a92a6455df6 100644
--- a/gcc/config/loongarch/loongarch-evolution.cc
+++ b/gcc/config/loongarch/loongarch-evolution.cc
@@ -32,6 +32,7 @@ int la_evo_feature_masks[] = {
   OPTION_MASK_ISA_DIV32,
   OPTION_MASK_ISA_LAM_BH,
   OPTION_MASK_ISA_LAMCAS,
+  OPTION_MASK_ISA_SCQ,
   OPTION_MASK_ISA_LD_SEQ_SA,
 };
 
@@ -40,6 +41,7 @@ const char* la_evo_macro_name[] = {
   "__loongarch_div32",
   "__loongarch_lam_bh",
   "__loongarch_lamcas",
+  "__loongarch_scq",
   "__loongarch_ld_seq_sa",
 };
 
@@ -48,6 +50,7 @@ int la_evo_version_major[] = {
   1,/* DIV32 */
   1,/* LAM_BH */
   1,/* LAMCAS */
+  1,/* SCQ */
   1,/* LD_SEQ_SA */
 };
 
@@ -56,5 +59,6 @@ int la_evo_version_minor[] = {
   1,/* DIV32 */
   1,/* LAM_BH */
   1,/* LAMCAS */
+  1,/* SCQ */
   1,/* LD_SEQ_SA */
 };
diff --git a/gcc/config/loongarch/loongarch-evolution.h b/gcc/config/loongarch/loongarch-evolution.h
index 5f908394c22..7fb7b0d3d86 100644
--- a/gcc/config/loongarch/loongarch-evolution.h
+++ b/gcc/config/loongarch/loongarch-evolution.h
@@ -36,6 +36,7 @@ static constexpr struct {
   { 2, 1u << 26, OPTION_MASK_ISA_DIV32 },
   { 2, 1u << 27, OPTION_MASK_ISA_LAM_BH },
   { 2, 1u << 28, OPTION_MASK_ISA_LAMCAS },
+  { 2, 1u << 30, OPTION_MASK_ISA_SCQ },
   { 3, 1u << 23, OPTION_MASK_ISA_LD_SEQ_SA },
 };
 
@@ -58,8 +59,9 @@ enum {
   EVO_DIV32 = 1,
   EVO_LAM_BH = 2,
   EVO_LAMCAS = 3,
-  EVO_LD_SEQ_SA = 4,
-  N_EVO_FEATURES = 5
+  EVO_SCQ = 4,
+  EVO_LD_SEQ_SA = 5,
+  N_EVO_FEATURES = 6
 };
 
 /* Condition macros */
@@ -71,6 +73,8 @@ enum {
   (la_target.isa.evolution & OPTION_MASK_ISA_LAM_BH)
 #define ISA_HAS_LAMCAS \
   (la_target.isa.evolution & OPTION_MASK_ISA_LAMCAS)
+#define ISA_HAS_SCQ \
+  (la_target.isa.evolution & OPTION_MASK_ISA_SCQ)
 #define ISA_HAS_LD_SEQ_SA \
   (la_target.isa.evolution & OPTION_MASK_ISA_LD_SEQ_SA)
 
diff --git a/gcc/config/loonga

[PATCH 08/17] LoongArch: Implement subword atomic_fetch_{and, or, xor} with am*.w instructions

2025-03-03 Thread Xi Ruoyao
We can just shift the mask and fill the other bits with 0 (for ior/xor)
or 1 (for and), and use an am*.w instruction to perform the atomic
operation, instead of using a LL-SC loop.

gcc/ChangeLog:

* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND):
Remove.
(UNSPEC_COMPARE_AND_SWAP_XOR): Remove.
(UNSPEC_COMPARE_AND_SWAP_OR): Remove.
(atomic_test_and_set): Rename to ...
(atomic_fetch_): ... this, and
adapt the expansion to use it for any bitwise operations and any
val, instead of just ior 1.
(atomic_test_and_set): New define_expand.
---
 gcc/config/loongarch/sync.md | 177 +++
 1 file changed, 34 insertions(+), 143 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index b3666c0c992..b6acfff3a61 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -24,9 +24,6 @@ (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP_AMCAS
   UNSPEC_COMPARE_AND_SWAP_ADD
   UNSPEC_COMPARE_AND_SWAP_SUB
-  UNSPEC_COMPARE_AND_SWAP_AND
-  UNSPEC_COMPARE_AND_SWAP_XOR
-  UNSPEC_COMPARE_AND_SWAP_OR
   UNSPEC_COMPARE_AND_SWAP_NAND
   UNSPEC_SYNC_OLD_OP
   UNSPEC_SYNC_EXCHANGE
@@ -343,17 +340,18 @@ (define_expand "atomic_compare_and_swap"
   DONE;
 })
 
-(define_expand "atomic_test_and_set"
-  [(match_operand:QI 0 "register_operand" "") ;; bool output
-   (match_operand:QI 1 "memory_operand" "+ZB");; memory
-   (match_operand:SI 2 "const_int_operand" "")]   ;; model
+(define_expand "atomic_fetch_"
+  [(match_operand:SHORT 0 "register_operand" "");; output
+   (any_bitwise (match_operand:SHORT 1 "memory_operand"   "+ZB") ;; memory
+   (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; val
+   (match_operand:SI 3 "const_int_operand" "")] ;; model
   ""
 {
-  /* We have no QImode atomics, so use the address LSBs to form a mask,
- then use an aligned SImode atomic.  */
+  /* We have no QI/HImode bitwise atomics, so use the address LSBs to form
+ a mask, then use an aligned SImode atomic.  */
   rtx result = operands[0];
   rtx mem = operands[1];
-  rtx model = operands[2];
+  rtx model = operands[3];
   rtx addr = force_reg (Pmode, XEXP (mem, 0));
   rtx mask = gen_int_mode (-4, Pmode);
   rtx aligned_addr = gen_reg_rtx (Pmode);
@@ -367,7 +365,8 @@ (define_expand "atomic_test_and_set"
   set_mem_alias_set (aligned_mem, 0);
 
   rtx tmp = gen_reg_rtx (SImode);
-  emit_move_insn (tmp, GEN_INT (1));
+  emit_move_insn (tmp, simplify_gen_unary (ZERO_EXTEND, SImode,
+  operands[2], mode));
 
   /* Note that we have defined SHIFT_COUNT_TRUNCATED to 1, so we don't need
  to mask addr with 0b11 here.  */
@@ -378,14 +377,37 @@ (define_expand "atomic_test_and_set"
   rtx word = gen_reg_rtx (SImode);
   emit_move_insn (word, gen_rtx_ASHIFT (SImode, tmp, shmt));
 
+  if ()
+{
+  /* word = word | ~(mode_mask << shmt) */
+  rtx tmp = force_reg (SImode,
+  gen_int_mode (GET_MODE_MASK (mode),
+SImode));
+  emit_move_insn (tmp, gen_rtx_ASHIFT (SImode, tmp, shmt));
+  emit_move_insn (word, gen_rtx_IOR (SImode, gen_rtx_NOT (SImode, tmp),
+word));
+}
+
   tmp = gen_reg_rtx (SImode);
-  emit_insn (gen_atomic_fetch_orsi (tmp, aligned_mem, word, model));
+  emit_insn (gen_atomic_fetch_si (tmp, aligned_mem, word, model));
 
   emit_move_insn (gen_lowpart (SImode, result),
  gen_rtx_LSHIFTRT (SImode, tmp, shmt));
   DONE;
 })
 
+(define_expand "atomic_test_and_set"
+  [(match_operand:QI 0 "register_operand" "") ;; bool output
+   (match_operand:QI 1 "memory_operand" "+ZB");; memory
+   (match_operand:SI 2 "const_int_operand" "")]   ;; model
+  ""
+{
+  rtx one = force_reg (QImode, gen_int_mode (1, QImode));
+  emit_insn (gen_atomic_fetch_orqi (operands[0], operands[1], one,
+   operands[2]));
+  DONE;
+})
+
 (define_insn "atomic_cas_value_cmp_and_7_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
(match_operand:GPR 1 "memory_operand" "+ZC"))
@@ -524,83 +546,6 @@ (define_insn "atomic_cas_value_sub_7_"
 }
   [(set (attr "length") (const_int 28))])
 
-(define_insn "atomic_cas_value_and_7_"
-  [(set (match_operand:GPR 0 "register_operand" "=&r") 
;; res
-   (match_operand:GPR 1 "memory_operand" "+ZC"))
-   (set (match_dup 1)
-   (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ") 
;; mask
- (match_operand:GPR 3 "reg_or_0_operand" "rJ") 
;; inverted_mask
- (match_operand:GPR 4 "reg_or_0_operand"  "rJ")
;; old val
- (match_operand:GPR 5 "reg_or_0_operand"  "rJ")
;; new val
- (match_operand:SI 6 "const_int_operand")]  

RE: [3/3 PATCH v4]middle-end: delay checking for alignment to load [PR118464]

2025-03-03 Thread Richard Biener
On Mon, 3 Mar 2025, Tamar Christina wrote:

> > >/* For now assume all conditional loads/stores support unaligned
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..b661dd400e5826fc1c4f70957b335d1741fa 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2597,6 +2597,128 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
> > >return false;
> > >  }
> > >
> > > +  /* If this DR needs alignment for correctness, we must ensure the 
> > > target
> > > + alignment is a constant power-of-two multiple of the amount read per
> > > + vector iteration or force masking.  */
> > > +  if (dr_safe_speculative_read_required (stmt_info))
> > > +{
> > > +  /* We can only peel for loops, of course.  */
> > > +  gcc_checking_assert (loop_vinfo);
> > > +
> > > +  /* Check if we support the operation if early breaks are needed.  Here we
> > > +  must ensure that we don't access any more than the scalar code would
> > > +  have.  A masked operation would ensure this, so for these load types
> > > +  force masking.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +   && (*memory_access_type == VMAT_GATHER_SCATTER
> > > +   || *memory_access_type == VMAT_STRIDED_SLP))
> > > + {
> > > +   if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > +  "early break not supported: cannot peel for "
> > > +  "alignment. With non-contiguous memory vectorization"
> > > +  " could read out of bounds at %G ",
> > > +  STMT_VINFO_STMT (stmt_info));
> > > +   LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > > + }
> > > +
> > > +  auto target_alignment
> > > + = DR_TARGET_ALIGNMENT (STMT_VINFO_DR_INFO (stmt_info));
> > > +  unsigned HOST_WIDE_INT target_align;
> > > +  bool inbounds
> > > + = DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info));
> > > +
> > > +  /* If the scalar loop is known to be in bounds, and we're using scalar
> > > +  accesses then there's no need to check further.  */
> > > +  if (inbounds
> > > +   && *memory_access_type == VMAT_ELEMENTWISE)
> > > + {
> > > +   *alignment_support_scheme = dr_aligned;
> > 
> > Nothing should look at *alignment_support_scheme for VMAT_ELEMENTWISE.
> > Did you actually need this adjustment?
> > 
> 
> Yes, bitfields are relaxed a few lines up from contiguous to this:
> 
> if (SLP_TREE_LANES (slp_node) == 1)
>   {
> *memory_access_type = VMAT_ELEMENTWISE;
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >"single-element interleaving not supported "
>"for not adjacent vector loads, using "
>"elementwise access\n");
>   }
> 
> This means we then reach:
>   if (SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> 
> we bail out because the permutes still exist.  The code relaxed the load to
> elements but never removed the permutes or any associated information.
> 
> If the permutes are removed or some other workaround, you then hit
> 
>   if (!group_aligned && inbounds)
>   LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
> 
> Because these aren't group loads.  Because the original load didn't have any
> misalignment they never needed peeling and as such are dr_unaligned_supported.
> 
> So the only way to avoid checking elementwise is by guarding the top level
> with
> 
>   if (dr_safe_speculative_read_required (stmt_info)
>   && *alignment_support_scheme == dr_aligned)
> {
> 
> Instead of just 
> 
>   if (dr_safe_speculative_read_required (stmt_info))
> {
> 
> Which I wasn't sure if it was the right thing to do...  Anyway if I do that I 
> can remove...
> 
> > > +   return true;
> > > + }
> > > +
> > > +  bool group_aligned = false;
> > > +  if (*alignment_support_scheme == dr_aligned
> > > +   && target_alignment.is_constant (&target_align)
> > > +   && nunits.is_constant ())
> > > + {
> > > +   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > > +   auto vectype_size
> > > + = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
> > > +   poly_uint64 required_alignment = vf * vectype_size;
> > > +   /* If we have a grouped access we require that the alignment be N * elem.  */
> > > +   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > > + required_alignment *=
> > > + DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
> > > +   if (!multiple_p (target_alignment, required_alignment))
> > > + {
> > > +   if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > + 

[committed v4] aarch64: Ignore target pragmas while defining intrinsics

2025-03-03 Thread Andrew Carlotti
Compared to v3, this version:
- moves the sve_alignment_switcher in handle_arm_sve_h to function scope (and
  fixes an inaccurate changelog message);
- updates affected Makefile dependencies.

The patch was preapproved by Richard with the first change, and the second
change is obvious, so I've committed the below after verifying that it still
works on some SVE intrinsic tests.

---

Refactor the switcher classes into two separate classes:

- sve_alignment_switcher takes the alignment switching functionality,
  and is used only for ABI correctness when defining sve structure
  types.
- aarch64_target_switcher takes the rest of the functionality of
  aarch64_simd_switcher and sve_switcher, and gates simd/sve specific
  parts upon the specified feature flags.

Additionally, aarch64_target_switcher now adds dependencies of the
specified flags (which adds +fcma and +bf16 to some intrinsic
declarations), and unsets current_target_pragma.

This last change fixes an internal bug where we would sometimes add a
user specified target pragma (stored in current_target_pragma) on top of
an internally specified target architecture while initialising
intrinsics with `#pragma GCC aarch64 "arm_*.h"`.  As far as I can tell, this
has no visible impact at the moment.  However, the unintended target
feature combinations lead to unwanted behaviour in an under-development
patch.

This also fixes a missing Makefile dependency, which was due to
aarch64-sve-builtins.o incorrectly depending on the undefined $(REG_H).
The correct $(REGS_H) dependency is added to the switcher's new source
location.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(struct aarch64_extension_info): Add field.
(aarch64_get_required_features): New.
* config/aarch64/aarch64-builtins.cc
(aarch64_simd_switcher::aarch64_simd_switcher): Rename to...
(aarch64_target_switcher::aarch64_target_switcher): ...this,
and extend to handle sve, nosimd and target pragmas.
(aarch64_simd_switcher::~aarch64_simd_switcher): Rename to...
(aarch64_target_switcher::~aarch64_target_switcher): ...this,
and extend to handle sve, nosimd and target pragmas.
(handle_arm_acle_h): Use aarch64_target_switcher.
(handle_arm_neon_h): Rename switcher and pass explicit flags.
(aarch64_general_init_builtins): Ditto.
* config/aarch64/aarch64-protos.h
(class aarch64_simd_switcher): Rename to...
(class aarch64_target_switcher): ...this, and add new members.
(aarch64_get_required_features): New prototype.
* config/aarch64/aarch64-sve-builtins.cc
(sve_switcher::sve_switcher): Delete.
(sve_switcher::~sve_switcher): Delete.
(sve_alignment_switcher::sve_alignment_switcher): New.
(sve_alignment_switcher::~sve_alignment_switcher): New.
(register_builtin_types): Use alignment switcher.
(init_builtins): Rename switcher.
(handle_arm_neon_sve_bridge_h): Ditto.
(handle_arm_sme_h): Ditto.
(handle_arm_sve_h): Ditto, and use alignment switcher.
* config/aarch64/aarch64-sve-builtins.h
(class sve_switcher): Delete.
(class sme_switcher): Delete.
(class sve_alignment_switcher): New.
* config/aarch64/t-aarch64 (aarch64-builtins.o): Add $(REGS_H).
(aarch64-sve-builtins.o): Remove $(REG_H).


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
ef4458fb69308d2bb6785e97be5be85226cf0ebb..500bf784983d851c54ea4ec59cf3cad29e5e309e
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -157,6 +157,8 @@ struct aarch64_extension_info
   aarch64_feature_flags flags_on;
   /* If this feature is turned off, these bits also need to be turned off.  */
   aarch64_feature_flags flags_off;
+  /* If this feature remains enabled, these bits must also remain enabled.  */
+  aarch64_feature_flags flags_required;
 };
 
 /* ISA extensions in AArch64.  */
@@ -164,9 +166,10 @@ static constexpr aarch64_extension_info all_extensions[] =
 {
 #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
   {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
-   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
+   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
+   feature_deps::IDENT ().enable},
 #include "config/aarch64/aarch64-option-extensions.def"
-  {NULL, 0, 0, 0}
+  {NULL, 0, 0, 0, 0}
 };
 
 struct aarch64_arch_info
@@ -204,6 +207,18 @@ static constexpr aarch64_processor_info all_cores[] =
   {NULL, aarch64_no_cpu, aarch64_no_arch, 0}
 };
 
+/* Return the set of feature flags that are required to be enabled when the
+   features in FLAGS are enabled.  */
+
+aarch64_feature_flags
+aarch64_get_required_features (aarch64_feature_flags flags)
+{
+  const struct aarch64_extension_info *opt;
+  for (opt = all_extensions

Make ix86_macro_fusion_pair_p and ix86_fuse_mov_alu_p match current CPUs better

2025-03-03 Thread Jan Hubicka
Hi,
The current implementation of fusion predicates misses some common
fusion cases on zen and more recent cores.  I added knobs for the
individual conditionals we test.

 1) I split checks for fusing an ALU op with a conditional when the ALU
 op has a memory operand.  This seems to be supported by zen3+ and by
 Tiger Lake and Cooper Lake (according to Agner Fog's manual).

 2) znver4 and znver5 support fusion of ALU and conditional even if the
ALU op has memory and immediate operands.
This seems to be relatively important, enabling 25% more fusions on
a gcc bootstrap.

 3) No CPU supports fusing when the ALU op contains an IP-relative memory
reference.  I added a separate knob so we do not forget about this if
it gets supported later.

The patch does not solve the limitation of sched that fuse pairs must be
adjacent on input and the first operation must be single-set.  Fixing
single-set is easy (I have a separate patch for this); for non-adjacent
pairs we need bigger surgery.

To verify what the CPU really does I made a simple test script.

jh@ryzen3:~> cat fuse-test.c
#ifdef IPRELATIVE
int b;
const int z = 0;
const int o = 1;
#endif
int
main()
{
int a = 10;
#ifndef IPRELATIVE
int b;
int z = 0;
int o = 1;
#endif
asm volatile ("\n"
".L1234:\n"
#ifndef FUSE
"nop\n"
#endif
"subl   %3, %0\n"

#ifdef CMP
"movl %0, %1\n"
"cmpl %2, %1\n"
#endif
#ifdef TEST
"movl %0, %1\n"
"test %1, %1\n"
#endif

#ifndef FUSE
"nop\n"
#endif
"jne.L1234":"=a"(a),
#if (defined(MEM) && !defined (CMP)) || defined (MEMIMM)
"=m"(b)
#else
"=r"(b)
#endif
:
#ifdef MEM
"m"(z),
"m"(o),
#else
"i"(0),
"i"(1),
#endif
"0"(a)
);
}
jh@ryzen3:~> cat fuse-test.sh 
EVENT=ex_ret_fused_instr
#EVENT=ex_ret_fus_brnch_inst
dotest()
{
gcc -O2  fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse  2>&1 | grep $EVENT
gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse 
perf stat  -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
}

echo ALU with immediate
dotest 
echo ALU with memory
dotest -D MEM
echo ALU with IP relative memory
dotest -D MEM -D IPRELATIVE
echo CMP with immediate
dotest -D CMP
echo CMP with memory
dotest -D CMP -D MEM
echo CMP with memory and immediate
dotest -D CMP -D MEMIMM
echo CMP with IP relative memory
dotest -D CMP -D MEM -D IPRELATIVE
echo TEST
dotest -D TEST

On zen5 I get:
ALU with immediate
20,345  ex_ret_fused_instr:u
  
 1,000,020,278  ex_ret_fused_instr:u
  
ALU with memory
20,367  ex_ret_fused_instr:u
  
 1,000,020,290  ex_ret_fused_instr:u
  
ALU with IP relative memory
20,395  ex_ret_fused_instr:u
  
20,403  ex_ret_fused_instr:u
  
CMP with immediate
20,369  ex_ret_fused_instr:u
  
 1,000,020,301  ex_ret_fused_instr:u
  
CMP with memory
20,314  ex_ret_fused_instr:u
  
 1,000,020,341  ex_ret_fused_instr:u
  
CMP with memory and immediate
20,372  ex_ret_fused_instr:u
  
 1,000,020,266  ex_ret_fused_instr:u
  
CMP with IP relative memory
20,382  ex_ret_fused_instr:u
  
20,369  ex_ret_fused_instr:u
  
TEST
20,346  ex_ret_fused_instr:u
  
 1,000,020,301  ex_ret_fused_instr:u
  

IP relative memory seems to not be documented.

On zen3/4 I get:

ALU with immediate
20,263  ex_ret_fused_instr:u
  
 1,000,020,051  ex_ret_fused_instr:u
  
ALU with memory
20,255  ex_ret_fused_instr:u
  
 1,000,020,056  ex_ret_fused_instr:u
  
ALU with IP relative memory
20,253  ex_ret_fused_instr:u
  
20,266  ex_ret_fused_instr:u
  
CMP with immediate
20,264  ex_ret_fused

Re: [Fortran, Patch, PR77872, v1] Fix ICE when getting caf-token from abstract class type.

2025-03-03 Thread Steve Kargl
On Mon, Mar 03, 2025 at 03:58:24PM +0100, Andre Vehreschild wrote:
> 
> attached patches fix a 12 regression, where a caf token is requested from an
> abstract class-typed dummy. The token was not looked up in the correct spot.
> Because the class-typed object gets an artificial variable for direct derived
> type access, get_caf_decl was looking at the wrong decl.
> 
> This patch consists of two parts, the first is just some code complexity
> reduction, where an existing attr is now used instead of checking for BT_CLASS
> type and branching.
> 
> Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> 

Thanks.  OK to commit.

-- 
steve


[committed] arm: remove some redundant zero_extend ops on thumb1

2025-03-03 Thread Richard Earnshaw
The code in gcc.target/arm/unsigned-extend-1.c really should not need any
unsigned extension operations when the optimizers are used.  For Arm
and thumb2 that is indeed the case, but for thumb1 code it gets more
complicated as there are too many instructions for combine to look at.
For thumb1 we end up with two redundant zero_extend patterns which are
not removed: the first after the subtract instruction and the second on
the final boolean result.

We can partially fix this (for the second case above) by adding a new
split pattern for LEU and GEU patterns which work because the two
instructions for the [LG]EU pattern plus the redundant extension
instruction are combined into a single insn, which we can then split
using the 3->2 method back into the two insns of the [LG]EU sequence.

Because we're missing the optimization for all thumb1 cases (not just
those architectures with UXTB), I've adjusted the testcase to detect all
the idioms that we might use for zero-extending a value, namely:

   UXTB
   AND ...#255 (in thumb1 this would require a register to hold 255)
   LSL ... #24; LSR ... #24

but I've also marked this test as XFAIL for thumb1 because we can't yet
eliminate the first of the two extend instructions.

gcc/
* config/arm/thumb1.md (split patterns for GEU and LEU): New.

gcc/testsuite:
* gcc.target/arm/unsigned-extend-1.c: Expand check for any
insn suggesting a zero-extend.  XFAIL for thumb1 code.
---
 gcc/config/arm/thumb1.md  | 28 +++
 .../gcc.target/arm/unsigned-extend-1.c|  4 +--
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 548c36979f1..f9e89e991d9 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -1810,6 +1810,34 @@ (define_insn "thumb1_addsi3_addgeu"
(set_attr "type" "multiple")]
 )
 
+;; Re-split an LEU/GEU sequence if combine tries to oversimplify a 3-plus
+;; insn sequence.  Beware of the early-clobber of operand0
+(define_split
+ [(set (match_operand:SI 0 "s_register_operand")
+   (leu:SI (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")))]
+ "TARGET_THUMB1
+  && !reg_overlap_mentioned_p (operands[0], operands[1])
+  && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (const_int 0))
+  (set (match_dup 0) (plus:SI (plus:SI (match_dup 0) (match_dup 0))
+ (geu:SI (match_dup 2) (match_dup 1))))]
+ {}
+)
+
+(define_split
+ [(set (match_operand:SI 0 "s_register_operand")
+   (geu:SI (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "thumb1_cmp_operand")))]
+ "TARGET_THUMB1
+  && !reg_overlap_mentioned_p (operands[0], operands[1])
+  && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (const_int 0))
+  (set (match_dup 0) (plus:SI (plus:SI (match_dup 0) (match_dup 0))
+ (geu:SI (match_dup 1) (match_dup 2))))]
+ {}
+)
+
 
 (define_insn "*thumb_jump"
   [(set (pc)
diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c 
b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
index 3b4ab048fb0..fa3d34400bf 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c
@@ -5,5 +5,5 @@ unsigned char foo (unsigned char c)
 {
   return (c >= '0') && (c <= '9');
 }
-
-/* { dg-final { scan-assembler-not "uxtb" } } */
+/* We shouldn't need any zero-extension idioms here.  */
/* { dg-final { scan-assembler-not "\t(uxtb|and|lsr|lsl)" { xfail arm_thumb1 } } } */
-- 
2.34.1



AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-03 Thread Wilco Dijkstra

Enable the early scheduler on AArch64 for O3/Ofast.  This means GCC15 benefits
from much faster build times with -O2, but avoids the regressions in lbm which
is very sensitive to minor scheduling changes due to long FMA chains.  We can
then revisit this for GCC16.

gcc:
PR target/118351
* common/config/aarch64/aarch64-common.cc: Enable early scheduling with
-O3 and higher.

---

diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
3d694f16d1fd84e142254a4880c91a7f053e72aa..3044336923415d9414b6c66e66d872612ead24cd
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -54,8 +54,10 @@ static const struct default_options aarch_option_optimization_table[] =
 { OPT_LEVELS_FAST, OPT_fomit_frame_pointer, NULL, 1 },
 /* Enable -fsched-pressure by default when optimizing.  */
 { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
-/* Disable early scheduling due to high compile-time overheads.  */
+/* Except for -O3 and higher, disable early scheduling due to high
+   compile-time overheads.  */
 { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
+{ OPT_LEVELS_3_PLUS, OPT_fschedule_insns, NULL, 1 },
 /* Enable redundant extension instructions removal at -O2 and higher.  */
 { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },



libgcc: Remove PREDRES and LS64 from AArch64 cpuinfo

2025-03-03 Thread Wilco Dijkstra
Change AArch64 cpuinfo to follow the latest updates to the FMV spec [1]:
Remove FEAT_PREDRES and FEAT_LS64*.  Preserve the ordering in enum CPUFeatures.

Passes regress, OK for commit?

[1] https://github.com/ARM-software/acle/pull/382

gcc:
* common/config/aarch64/cpuinfo.h: Remove FEAT_PREDRES and FEAT_LS64*.  
* config/aarch64/aarch64-option-extensions.def: Remove FMV support
for PREDRES.

libgcc:
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Remove FEAT_PREDRES and FEAT_LS64* support.

---

diff --git a/gcc/common/config/aarch64/cpuinfo.h 
b/gcc/common/config/aarch64/cpuinfo.h
index 
aff43908e01a685bebe56351d61cb966c3cc9736..cd3c2b20c5315b035870528fa39246bbc780f369
 100644
--- a/gcc/common/config/aarch64/cpuinfo.h
+++ b/gcc/common/config/aarch64/cpuinfo.h
@@ -75,13 +75,13 @@ enum CPUFeatures {
   FEAT_MEMTAG2,
   FEAT_MEMTAG3,
   FEAT_SB,
-  FEAT_PREDRES,
+  FEAT_unused1,
   FEAT_SSBS,
   FEAT_SSBS2,
   FEAT_BTI,
-  FEAT_LS64,
-  FEAT_LS64_V,
-  FEAT_LS64_ACCDATA,
+  FEAT_unused2,
+  FEAT_unused3,
+  FEAT_unused4,
   FEAT_WFXT,
   FEAT_SME_F64,
   FEAT_SME_I64,
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 
aa8d315c240fbd25b49008b131cc09f04001eb80..79b79358c5d4a9e23c7601f7a1ba742dddadb778
 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -213,7 +213,7 @@ AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
 
 AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
 
-AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
+AARCH64_OPT_EXTENSION("predres", PREDRES, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
 
diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
index 
6b4952ee542e3fdb58f007200aa8690261b1c543..dda9dc696893cd392dd1e15d03672053cc481c6f
 100644
--- a/libgcc/config/aarch64/cpuinfo.c
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -339,25 +339,6 @@ __init_cpu_features_constructor (unsigned long hwcap,
 setCPUFeature(FEAT_SME_I64);
   if (hwcap2 & HWCAP2_SME_F64F64)
 setCPUFeature(FEAT_SME_F64);
-  if (hwcap & HWCAP_CPUID)
-{
-  unsigned long ftr;
-
-  getCPUFeature(ID_AA64ISAR1_EL1, ftr);
-  /* ID_AA64ISAR1_EL1.SPECRES >= 0b0001  */
-  if (extractBits(ftr, 40, 4) >= 0x1)
-   setCPUFeature(FEAT_PREDRES);
-  /* ID_AA64ISAR1_EL1.LS64 >= 0b0001  */
-  if (extractBits(ftr, 60, 4) >= 0x1)
-   setCPUFeature(FEAT_LS64);
-  /* ID_AA64ISAR1_EL1.LS64 >= 0b0010  */
-  if (extractBits(ftr, 60, 4) >= 0x2)
-   setCPUFeature(FEAT_LS64_V);
-  /* ID_AA64ISAR1_EL1.LS64 >= 0b0011  */
-  if (extractBits(ftr, 60, 4) >= 0x3)
-   setCPUFeature(FEAT_LS64_ACCDATA);
-}
-
   if (hwcap & HWCAP_FP)
 {
   setCPUFeature(FEAT_FP);



libatomic: use HWCAPs in AArch64 ifunc tests

2025-03-03 Thread Wilco Dijkstra

Feedback from the kernel team suggests that it's best to only use HWCAPs
rather than also use low-level checks as done by has_lse128() and has_rcpc3().
So change these to just use HWCAPs which simplifies the code and speeds up
ifunc selection by avoiding expensive system register accesses.

Passes regress, OK for commit?

libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Remove unused arg.
(has_lse128): Change to just use HWCAPs.
(has_rcpc3): Likewise.

---

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index 
d0d44bf18eaa64437f52c2894da6ece9e02618df..6a4f7014323a2ed196cabe408aaa6df0d2521518
 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -69,7 +69,7 @@ typedef struct __ifunc_arg_t {
 #  elif defined (LSE2_LRCPC3_ATOP)
 #   define IFUNC_NCOND(N)  2
 #   define IFUNC_COND_1(has_rcpc3 (hwcap, features))
-#   define IFUNC_COND_2(has_lse2 (hwcap, features))
+#   define IFUNC_COND_2(has_lse2 (hwcap))
 #  elif defined (LSE128_ATOP)
 #   define IFUNC_NCOND(N)  1
 #   define IFUNC_COND_1(has_lse128 (hwcap, features))
@@ -86,7 +86,7 @@ typedef struct __ifunc_arg_t {
 #define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
 
 static inline bool
-has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
+has_lse2 (unsigned long hwcap)
 {
   /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
@@ -105,50 +105,20 @@ has_lse2 (unsigned long hwcap, const __ifunc_arg_t *features)
   return false;
 }
 
-/* LSE128 atomic support encoded in ID_AA64ISAR0_EL1.Atomic, bits[23:20].
-   The minimum value for LSE128 is 0b0011.  */
-
-#define AT_FEAT_FIELD(isar0)   (((isar0) >> 20) & 15)
-
 static inline bool
 has_lse128 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
-  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128)
-return true;
-
-  /* If LSE2 and CPUID are supported, check for LSE128.  */
-  if (hwcap & HWCAP_CPUID && hwcap & HWCAP_USCAT)
-{
-  unsigned long isar0;
-  asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
-  return AT_FEAT_FIELD (isar0) >= 3;
-}
-
-  return false;
+  return hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LSE128;
 }
 
-/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].
-   The minimum value for LRCPC3 is 0b0011.  */
-
 static inline bool
 has_rcpc3 (unsigned long hwcap, const __ifunc_arg_t *features)
 {
   /* LSE2 is a prerequisite for atomic LDIAPP/STILP - check HWCAP_USCAT since
  has_lse2 is more expensive and Neoverse N1 does not have LRCPC3. */
-  if (!(hwcap & HWCAP_USCAT))
-return false;
-
-  if (hwcap & _IFUNC_ARG_HWCAP && features->_hwcap2 & HWCAP2_LRCPC3)
-return true;
-
-  if (hwcap & HWCAP_CPUID)
-{
-  unsigned long isar1;
-  asm volatile ("mrs %0, ID_AA64ISAR1_EL1" : "=r" (isar1));
-  return AT_FEAT_FIELD (isar1) >= 3;
-}
-
-  return false;
+  return (hwcap & HWCAP_USCAT
+ && hwcap & _IFUNC_ARG_HWCAP
+ && features->_hwcap2 & HWCAP2_LRCPC3);
 }
 
 #endif /* HAVE_IFUNC */



Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])

2025-03-03 Thread Andrew Waterman
On Mon, Mar 3, 2025 at 2:37 PM Vineet Gupta  wrote:
>
> Hi Pan, Andrew
>
> I'm trying to understand the semantics of FRM as it intersects with calling
> convention.
> psABI is not explicit about it and refers to C standard [1]
>
> > On 2/14/25 03:39, Li, Pan2 wrote:
>
> [snip]
>
>
> > With option "-march=rv64gcv_zvfh -O3"
> >
> >   10   │ vxrm:
> >   11   │ csrwi   vxrm,2  // Just set rm directly
> > ...
> >   17   │ vle16.v v2,0(a4)
> >   18   │ vle16.v v1,0(a3)
> > ...
> >   21   │ vaaddu.vv   v1,v1,v2
> >   22   │ vse16.v v1,0(a4)
> >   23   │ tailcall_external
> >   28   │ frm:
> >   29   │ frrma2// backup
> >   30   │ fsrmi   2  // set rm
> > ...
> >   35   │ vle16.v v1,0(a3)
> >   36   │ addia5,a5,%lo(bf)
> >   37   │ vfnmadd.vv  v1,v1,v1
> >   38   │ vse16.v v1,0(a5)
> >   39   │ fsrma2   // restore
> >   40   │ tailcall_external
>
> [snip]
>
> > If instead we want to set a global register to a specific local value,
> > the sequence would be:
> >
> > call foo
> > TMP := FIXED_REG
> > FIXED_REG := ...
> > ...use FIXED_REG...
> > FIXED_REG := TMP
> > call bar
> >
> > It sounds like this is the correct sequence for FRM and it seemed to be
> > what the port was generating in the PR.
>
> So from above msg snippets and commit 46a508ec7aee503 and its numerous tests,
> I'm summarizing the following
>
> >1. The static frm before call should not pollute the frm value in call.
>
> In simple terms: Before a call, if FRM is clobbered, it needs to be restored
> before making the call (to "retain the global value")

That sounds right to me.

>
> >2. The updated frm value in call should be sticky after call completed.
>
> After a call (which can potentially set FRM globally), if the caller clobbers
> FRM, it needs to be restored (by reading the value right after the call).

That also sounds right to me.

>
> So in some convoluted way both the above scenarios have callee-saved semantics
> for FRM, except for the leaf function which unconditionally sets FRM where 
> this
> save/restore is not done.

I don't follow the last part about leaf functions.  Unless the leaf
function intends to change FRM globally (e.g. the leaf function in
question is fesetround), then, if it changes FRM, it must restore FRM
before returning.


>
> Is the above understanding correct, or is there more to it.
>
> Thx,
> -Vineet
>
> [1] 
> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
>


Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])

2025-03-03 Thread Vineet Gupta
On 3/3/25 15:18, Andrew Waterman wrote:
>> So in some convoluted way both the above scenarios have callee-saved 
>> semantics
>> for FRM, except for the leaf function which unconditionally sets FRM where 
>> this
>> save/restore is not done.
> I don't follow the last part about leaf functions.  Unless the leaf
> function intends to change FRM globally (e.g. the leaf function in
> question is fesetround), then, if it changes FRM, it must restore FRM
> before returning.

Yeah, I didn't know how to articulate it (and perhaps this still requires
clarification).

Say we have following

// reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-1.c
 main
    set_frm (4);    // orig global FRM update

    test_float_point_frm_run_1 (op1, op2, vl)
   set_frm (0);
   result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
   assert_equal (1, get_frm ())
   // restore global 4 before returning

    assert_equal (4, get_frm ()  <-- here call restores global

vs.

// reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-5.c
main
  set_frm (1);    // orig global FRM update

  test_float_point_frm_run_1 (op1, op2, vl)
      other_function()
           set_frm (2);    // also global update

  assert_equal (2, get_frm ()    <-- here call doesn't restore

There's no explicit annotation or anything about the FRM, yet in the 2nd
example we consider the frm write in other_function () a global event, but not
the one in test_float_point_frm_run_1 () in the 1st example: I'm at a loss for
words on how to explain that :-)

Thx,
-Vineet


Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])

2025-03-03 Thread Andrew Waterman
On Mon, Mar 3, 2025 at 3:40 PM Vineet Gupta  wrote:
>
> On 3/3/25 15:18, Andrew Waterman wrote:
> >> So in some convoluted way both the above scenarios have callee-saved 
> >> semantics
> >> for FRM, except for the leaf function which unconditionally sets FRM where 
> >> this
> >> save/restore is not done.
> > I don't follow the last part about leaf functions.  Unless the leaf
> > function intends to change FRM globally (e.g. the leaf function in
> > question is fesetround), then, if it changes FRM, it must restore FRM
> > before returning.
>
> Yeah I didn't know how to articulate  it (and perhaps this still requires
> clarification)
>
> Say we have following
>
> // reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-1.c
>  main
> set_frm (4);// orig global FRM update
>
> test_float_point_frm_run_1 (op1, op2, vl)
>set_frm (0);
>result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
>assert_equal (1, get_frm ())
>// restore global 4 before returning
>
> assert_equal (4, get_frm ()  <-- here call restores global
>
> vs.
>
> // reduced version of  gcc.target/riscv/rvv/base/float-point-frm-run-5.c
> main
>   set_frm (1);// orig global FRM update
>
>   test_float_point_frm_run_1 (op1, op2, vl)
>   other_function()
>set_frm (2);// also global update
>
>   assert_equal (2, get_frm ()<-- here call doesn't restore
>
> There's no explicit annotation or anything about the FRM, yet in the 2nd
> example we consider the frm write in other_function () a global event, but not
> the one in test_float_point_frm_run_1 () in the 1st example: I'm at a loss for
> words on how to explain that :-)

Looks like the assumption is that set_frm is locally scoped, as
opposed to e.g. fesetround, which is global.  This doesn't imply that
leaf functions are treated differently than non-leaf ones (which is
good, because they shouldn't be).

>
> Thx,
> -Vineet


[wwwdocs] steering.html

2025-03-03 Thread David Edelsohn
Update my affiliation.

Cheers,
David

diff --git a/htdocs/steering.html b/htdocs/steering.html
index 6039a503..b03ade5a 100644
--- a/htdocs/steering.html
+++ b/htdocs/steering.html
@@ -29,7 +29,7 @@ committee.
 place to reach them is the gcc mailing list.

 
-David Edelsohn (IBM)
+David Edelsohn (Nvidia)
 Kaveh R. Ghazi
 Jeffrey A. Law (Ventana Micro Systems)
 Marc Lehmann (nethype GmbH)


Re: [PING, REFORMAT][PATCH v2, 0/1] libstdc++: Fix localized D_T_FMT %c formatting for [PR117214]

2025-03-03 Thread Jonathan Wakely
On Sat, 1 Mar 2025 at 05:19, XU Kailiang  wrote:
>
> Hello libstdc++ maintainers,
>
> I sent a patch in January, but as it was my first patch, my email client
> was not properly configured so the patch format was broken. So I am
> re-sending it now.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674531.html
>
> Since I do not have commit access, if you find it okay, could you please
> help commit it? Or please let me know if there is something I still need
> to improve.

Thank you for the reminder, I'll review this.

>
> Thank you for your assistance!
>
> Best regards,
> XU Kailiang
>
> XU Kailiang (1):
>   libstdc++: Fix localized D_T_FMT %c formatting for  [PR117214]
>
>  libstdc++-v3/include/bits/chrono_io.h | 35 ++-
>  .../testsuite/std/time/format/pr117214.cc | 32 +
>  2 files changed, 51 insertions(+), 16 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/format/pr117214.cc
>
> --
> 2.48.1
>



[patch, Fortran] Fix PR 119049 and 119074, external prototypes with different arglists

2025-03-03 Thread Thomas Koenig

Hello world,

this patch is a bit more complicated than originally envisioned.

The problem was that we were not handling external dummy arguments
with -fc-prototypes-external. In looking at this, I found that we
were not warning about external procedures with different argument
lists.  This can actually be legal (see the two test cases) but
creates a problem for the C prototypes: If we have something like

subroutine foo(a,n)
  external a
  if (n == 1) call a(1)
  if (n == 2) call a(2,3)
end subroutine foo

then, pre-C23, we could just have written out the prototype as

void foo_ (void (*a) (), int *n);

but this is illegal in C23. What to do?  I finally chose to warn
about the argument mismatch, with a new option. It is a warning only,
because the code above is legal, but it is included in -Wall because
such code seems highly suspect.  This option is also implied by
-fc-prototypes-external. I also put a warning in the generated header
file in that case, so users have a chance to see what is going on
(especially since gcc now defaults to C23).

Regression-tested.

Comments?  Suggestions for better wordings?  Is -Wall too strong,
should this be -Wextra (but then nobody would see it, probably...)?
OK for trunk?

Best regards

Thomas
gcc/fortran/ChangeLog:

PR fortran/119049
PR fortran/119074
* dump-parse-tree.cc (seen_conflict): New static variable.
(gfc_dump_external_c_prototypes): Initialize it. If it was
set, write out a warning that -std=c23 will not work.
(write_proc): Move the work of actually writing out the
formal arglist to...
(write_formal_arglist): New function. Handle external dummy
parameters and their argument lists. If there were mismatched
arguments, output an empty argument list in pre-C23 style.
* gfortran.h (struct gfc_symbol): Add ext_dummy_arglist_mismatch
flag and formal_at.
* invoke.texi: Document -Wexternal-argument-mismatch.
* lang.opt: Put it in.
* resolve.cc (resolve_function): If warning about external
argument mismatches, build a formal arglist from the actual
arglist the first time around, and later compare and warn.
(resolve_call): Likewise.

gcc/testsuite/ChangeLog:

PR fortran/119049
PR fortran/119074
* gfortran.dg/interface_55.f90: New test.
* gfortran.dg/interface_56.f90: New test.

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 7726b708ad8..1a15757b57b 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -4108,6 +4108,8 @@ gfc_dump_c_prototypes (FILE *file)
 
 /* Loop over all external symbols, writing out their declarations.  */
 
+static bool seen_conflict;
+
 void
 gfc_dump_external_c_prototypes (FILE * file)
 {
@@ -4119,6 +4121,7 @@ gfc_dump_external_c_prototypes (FILE * file)
 return;
 
   dumpfile = file;
+  seen_conflict = false;
   fprintf (dumpfile,
 	   _("/* Prototypes for external procedures generated from %s\n"
 	 "   by GNU Fortran %s%s.\n\n"
@@ -4130,6 +4133,11 @@ gfc_dump_external_c_prototypes (FILE * file)
 return;
 
   gfc_traverse_gsymbol (gfc_gsym_root, show_external_symbol, (void *) &bind_c);
+  if (seen_conflict)
+fprintf (dumpfile,
+	 _("\n\n/* WARNING: Because of differing arguments to an external\n"
+	   "   procedure, this header file is not compatible with -std=c23."
+	   "\n\n   Use another -std option to compile.  */\n"));
 }
 
 /* Callback function for dumping external symbols, be they BIND(C) or
@@ -4406,52 +4414,35 @@ write_variable (gfc_symbol *sym)
   fputs (";\n", dumpfile);
 }
 
-
-/* Write out a procedure, including its arguments.  */
 static void
-write_proc (gfc_symbol *sym, bool bind_c)
+write_formal_arglist (gfc_symbol *sym, bool bind_c)
 {
-  const char *pre, *type_name, *post;
-  bool asterisk;
-  enum type_return rok;
   gfc_formal_arglist *f;
-  const char *sym_name;
-  const char *intent_in;
-  bool external_character;
-
-  external_character =  sym->ts.type == BT_CHARACTER && !bind_c;
-
-  if (sym->binding_label)
-sym_name = sym->binding_label;
-  else
-sym_name = sym->name;
-
-  if (sym->ts.type == BT_UNKNOWN || external_character)
-{
-  fprintf (dumpfile, "void ");
-  fputs (sym_name, dumpfile);
-}
-  else
-write_decl (&(sym->ts), sym->as, sym_name, true, &sym->declared_at, bind_c);
-
-  if (!bind_c)
-fputs ("_", dumpfile);
 
-  fputs (" (", dumpfile);
-  if (external_character)
-{
-  fprintf (dumpfile, "char *result_%s, size_t result_%s_len",
-	   sym_name, sym_name);
-  if (sym->formal)
-	fputs (", ", dumpfile);
-}
-
-  for (f = sym->formal; f; f = f->next)
+  for (f = sym->formal; f != NULL; f = f->next)
 {
+  enum type_return rok;
+  const char *intent_in;
   gfc_symbol *s;
+  const char *pre, *type_name, *post;
+  bool asterisk;
+
   s = f->sym;
   rok = get_c_type_name (&(s->ts), s->as, &pre,

Re: The COBOL front end, version 3, now in 14 easy pieces

2025-03-03 Thread James K. Lowden
On Mon, 24 Feb 2025 14:51:27 +0100
Richard Biener  wrote:

> > Our repository is
> >
> > https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/
> >
> > using branch
> >
> > cobol-stage
> >
> > I tested these patches using "git apply" to an unpublished branch
> > "cobol-patched".
> 
> I have now built the compiler from the (now published) cobol-patched
> branch.
> 
> On x86_64-linux and noticed the following issues:

Hi Richard,

I have regenerated and force-pushed cobol-patched.  It is still based
on 3e08a4ecea27c54fda90e8f58641b1986ad957e1 from February 4 because I
wanted to maintain a consistent baseline.  We will update our
repository from master tomorrow; the next set, if needed, will be from
March 4.  

https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/-/tree/cobol-patched?ref_type=heads

There are now 15 patches, the last being the info documents.  As a
reminder, there have been changes to the texi files since our last
update from master.  These patches apply cleanly to master as of the
above commit, but not to its current state.


I think we have answered all issues raised: 

- Flex & Bison requirements are documented
- gcobc is intentional
- man pages install correctly
- libgcobol installs to PREFIX/lib64
- nothing is written to the source tree unless requested
- any and all variations of 32-bit are prevented
- under "all" languages, gcobol is disabled except for
  x86_64 and aarch64.
- for --languages=cobol, gcobol will build for a
  64-bit host and target.

By far the hardest for us was the autotool work for libgcobol.  My
thanks to our comrades on IRC who helped me grope my way along. 

Details below, with reference to your message:

> The toplevel configure script on the branch wasn't re-generated (duh),

committed

>  %require "3.5.1"  //3.8.2 also works, but not 3.8.0
>   ^^^
> 
> this requirement isn't documented

documented.  cobol-patched currently is missing updates to gcc/doc
from master branch.  Will update soon.

> During the build we write to
>
> gcc/cobol/charmaps-dupe.cc
> gcc/cobol/valconv-dupe.cc

corrected

> Installing the result via make install DESTDIR=/foo I see both a
> 'gcobol' and a 'gcobc' program
> being installed - is that intentional?
 
yes, gcobc is an emulation script

> I also see the gcobol.3
> manpage reside directly in /foo/gcobol.3 rather
> than in the expected /foo/usr/local/man/..

corrected

> Installing of libgcobol fails for me:

corrected, with a caveat.  libgcobol.so (etc) now installs in
PREFIX/lib64 (previously PREFIX/lib).

However IMO, the incantation:

make install DESTDIR=/foo

is invalid.  The compiler's library search path is fixed when the
compiler is built, based on configure options.  Installing into an
arbitrary directory cannot work; there is no opportunity then to alter
the gcobol binary.

> so the suppression seems incomplete?

gcobol -m32 now produces an error message.  The built compiler will
not build 32-bit binaries, and libgcobol cannot be configured for
32-bit targets.

With regard to 64-bit hosts and targets:

1.  if --enable-languages is not used, gcobol is not built because it
is not a default language.

2.  if --enable-languages=all, gcobol is built only if the host and
target are x86_64 or aarch64.  We have not successfully built a
cross-compiler (yet), but it's allowed.

3.  if --enable-languages=cobol, gcobol will be built for any 64-bit
host or target.  The presumption is that the user has read the
documented limitations.  If he's working outside those bounds, we hope
it's because he's hoping to move them.  You mentioned,

> The way to check is likely whether a mode for __int128 exists and is
> supported.  The C fronted does
> 
>   if (targetm.scalar_mode_supported_p (TImode))
> 
> it also checks float128_type_node, but I didn't find who initializes
> that.

We have not investigated that tweak yet.  

> > ./install/gcc-cobol/usr/local/bin/gcobol t.cob -m32
> 
> only fails during linking as we do build a 32bit binary but do not
> find (the not built) 32bit runtime.  

corrected

I am sorry this took as long as it did.  We've been working on it night
and day, and know a lot more about the build system now than we ever
expected to.

Kind regards,

--jkl



[PATCH] Makefile.tpl: Implement per-stage GDCFLAGS [PR116975]

2025-03-03 Thread Iain Buclaw
Hi,

This patch implements STAGE1_GDCFLAGS and others to the configure
machinery, allowing the GDCFLAGS for each bootstrap stage of building
gdc to be overridden, as is the case with CXXFLAGS for other front-ends.

This is limited to generating the recipes for configure-stage-gcc and
all-stage-gcc, as no other module optionally needs a D compiler.

OK for mainline?

Regards,
Iain.

---
PR d/116975

ChangeLog:

* Makefile.in: Regenerate.
* Makefile.tpl (STAGE[+id+]_GDCFLAGS): New.
(STAGE2_GDCFLAGS): Add -fno-checking.
(STAGE3_GDCFLAGS): Add -fchecking=1.
(BASE_FLAGS_TO_PASS): Pass STAGE[+id+]_GDCFLAGS down.
(configure-stage[+id+]-[+prefix+][+module+]): Set GDCFLAGS for all gcc
module stages.
(all-stage[+id+]-[+prefix+][+module+]): Likewise.
---
 Makefile.in  | 51 +++
 Makefile.tpl | 15 +--
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 966d6045496..b80855ffc78 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -619,6 +619,26 @@ STAGE1_CONFIGURE_FLAGS = $(STAGE1_CHECKING) \
  --disable-build-format-warnings
 
 @if target-libphobos-bootstrap
+# Defaults for each stage if we're bootstrapping D.
+
+STAGE1_GDCFLAGS = $(GDCFLAGS)
+
+STAGE2_GDCFLAGS = $(GDCFLAGS)
+
+STAGE3_GDCFLAGS = $(GDCFLAGS)
+
+STAGE4_GDCFLAGS = $(GDCFLAGS)
+
+STAGEprofile_GDCFLAGS = $(GDCFLAGS)
+
+STAGEtrain_GDCFLAGS = $(GDCFLAGS)
+
+STAGEfeedback_GDCFLAGS = $(GDCFLAGS)
+
+STAGEautoprofile_GDCFLAGS = $(GDCFLAGS)
+
+STAGEautofeedback_GDCFLAGS = $(GDCFLAGS)
+
 STAGE1_CONFIGURE_FLAGS += --with-libphobos-druntime-only
 STAGE2_CONFIGURE_FLAGS += --with-libphobos-druntime-only
 @endif target-libphobos-bootstrap
@@ -632,6 +652,10 @@ STAGE2_CFLAGS += -fno-checking
 STAGE2_TFLAGS += -fno-checking
 STAGE3_CFLAGS += -fchecking=1
 STAGE3_TFLAGS += -fchecking=1
+@if target-libphobos-bootstrap
+STAGE2_GDCFLAGS += -fno-checking
+STAGE3_GDCFLAGS += -fchecking=1
+@endif target-libphobos-bootstrap
 
 STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate
 STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
@@ -921,38 +945,47 @@ BASE_FLAGS_TO_PASS = \
"LEAN=$(LEAN)" \
"STAGE1_CFLAGS=$(STAGE1_CFLAGS)" \
"STAGE1_CXXFLAGS=$(STAGE1_CXXFLAGS)" \
+   "STAGE1_GDCFLAGS=$(STAGE1_GDCFLAGS)" \
"STAGE1_GENERATOR_CFLAGS=$(STAGE1_GENERATOR_CFLAGS)" \
"STAGE1_TFLAGS=$(STAGE1_TFLAGS)" \
"STAGE2_CFLAGS=$(STAGE2_CFLAGS)" \
"STAGE2_CXXFLAGS=$(STAGE2_CXXFLAGS)" \
+   "STAGE2_GDCFLAGS=$(STAGE2_GDCFLAGS)" \
"STAGE2_GENERATOR_CFLAGS=$(STAGE2_GENERATOR_CFLAGS)" \
"STAGE2_TFLAGS=$(STAGE2_TFLAGS)" \
"STAGE3_CFLAGS=$(STAGE3_CFLAGS)" \
"STAGE3_CXXFLAGS=$(STAGE3_CXXFLAGS)" \
+   "STAGE3_GDCFLAGS=$(STAGE3_GDCFLAGS)" \
"STAGE3_GENERATOR_CFLAGS=$(STAGE3_GENERATOR_CFLAGS)" \
"STAGE3_TFLAGS=$(STAGE3_TFLAGS)" \
"STAGE4_CFLAGS=$(STAGE4_CFLAGS)" \
"STAGE4_CXXFLAGS=$(STAGE4_CXXFLAGS)" \
+   "STAGE4_GDCFLAGS=$(STAGE4_GDCFLAGS)" \
"STAGE4_GENERATOR_CFLAGS=$(STAGE4_GENERATOR_CFLAGS)" \
"STAGE4_TFLAGS=$(STAGE4_TFLAGS)" \
"STAGEprofile_CFLAGS=$(STAGEprofile_CFLAGS)" \
"STAGEprofile_CXXFLAGS=$(STAGEprofile_CXXFLAGS)" \
+   "STAGEprofile_GDCFLAGS=$(STAGEprofile_GDCFLAGS)" \
"STAGEprofile_GENERATOR_CFLAGS=$(STAGEprofile_GENERATOR_CFLAGS)" \
"STAGEprofile_TFLAGS=$(STAGEprofile_TFLAGS)" \
"STAGEtrain_CFLAGS=$(STAGEtrain_CFLAGS)" \
"STAGEtrain_CXXFLAGS=$(STAGEtrain_CXXFLAGS)" \
+   "STAGEtrain_GDCFLAGS=$(STAGEtrain_GDCFLAGS)" \
"STAGEtrain_GENERATOR_CFLAGS=$(STAGEtrain_GENERATOR_CFLAGS)" \
"STAGEtrain_TFLAGS=$(STAGEtrain_TFLAGS)" \
"STAGEfeedback_CFLAGS=$(STAGEfeedback_CFLAGS)" \
"STAGEfeedback_CXXFLAGS=$(STAGEfeedback_CXXFLAGS)" \
+   "STAGEfeedback_GDCFLAGS=$(STAGEfeedback_GDCFLAGS)" \
"STAGEfeedback_GENERATOR_CFLAGS=$(STAGEfeedback_GENERATOR_CFLAGS)" \
"STAGEfeedback_TFLAGS=$(STAGEfeedback_TFLAGS)" \
"STAGEautoprofile_CFLAGS=$(STAGEautoprofile_CFLAGS)" \
"STAGEautoprofile_CXXFLAGS=$(STAGEautoprofile_CXXFLAGS)" \
+   "STAGEautoprofile_GDCFLAGS=$(STAGEautoprofile_GDCFLAGS)" \
"STAGEautoprofile_GENERATOR_CFLAGS=$(STAGEautoprofile_GENERATOR_CFLAGS)" \
"STAGEautoprofile_TFLAGS=$(STAGEautoprofile_TFLAGS)" \
"STAGEautofeedback_CFLAGS=$(STAGEautofeedback_CFLAGS)" \
"STAGEautofeedback_CXXFLAGS=$(STAGEautofeedback_CXXFLAGS)" \
+   "STAGEautofeedback_GDCFLAGS=$(STAGEautofeedback_GDCFLAGS)" \
"STAGEautofeedback_GENERATOR_CFLAGS=$(STAGEautofeedback_GENERATOR_CFLAGS)" \
"STAGEautofeedback_TFLAGS=$(STAGEautofeedback_TFLAGS)" \
$(CXX_FOR_TARGET_FLAG_TO_PASS) \
@@ -12114,6 +12147,7 @@ configure-stage1-gcc:
$(HOST_EXPORTS) \
CFLAGS="$(STAGE1_CFLAGS)"; export CFLAGS; \
CX

FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])

2025-03-03 Thread Vineet Gupta
Hi Pan, Andrew

I'm trying to understand the semantics of FRM as it intersects with calling
convention.
The psABI is not explicit about it and refers to the C standard [1].

> On 2/14/25 03:39, Li, Pan2 wrote:

[snip] 


> With option "-march=rv64gcv_zvfh -O3"
>
>   10   │ vxrm:
>   11   │ csrwi   vxrm,2  // Just set rm directly
> ...
>   17   │ vle16.v v2,0(a4)
>   18   │ vle16.v v1,0(a3)
> ...
>   21   │ vaaddu.vv   v1,v1,v2
>   22   │ vse16.v v1,0(a4)
>   23   │ tailcall_external
>   28   │ frm:
>   29   │ frrma2// backup
>   30   │ fsrmi   2  // set rm
> ...
>   35   │ vle16.v v1,0(a3)
>   36   │ addia5,a5,%lo(bf)
>   37   │ vfnmadd.vv  v1,v1,v1
>   38   │ vse16.v v1,0(a5)
>   39   │ fsrma2   // restore
>   40   │ tailcall_external

[snip]

> If instead we want to set a global register to a specific local value,
> the sequence would be:
>
> call foo
> TMP := FIXED_REG
> FIXED_REG := ...
> ...use FIXED_REG...
> FIXED_REG := TMP
> call bar
>
> It sounds like this is the correct sequence for FRM and it seemed to be
> what the port was generating in the PR. 

So from the message snippets above, and commit 46a508ec7aee503 with its
numerous tests, I'm summarizing the following:

>    1. The static frm before call should not pollute the frm value in call.

In simple terms: Before a call, if FRM is clobbered, it needs to be restored
before making the call (to "retain the global value").

>    2. The updated frm value in call should be sticky after call completed.

After a call (which can potentially set FRM globally), if the caller clobbers
FRM, it needs to be restored back (by reading the value right after the call).

So, in a somewhat convoluted way, both of the above scenarios give FRM
callee-saved semantics, except for a leaf function that unconditionally
sets FRM, where this save/restore is not done.

Is the above understanding correct, or is there more to it?

Thx,
-Vineet

[1] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc



Re: The COBOL front end, version 3, now in 14 easy pieces

2025-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2025 at 05:21:38PM -0500, James K. Lowden wrote:
> However IMO, the incantation:
> 
> make install DESTDIR=/foo
> 
> is invalid.  The compiler's library search path is fixed when the
> compiler is built, based on configure options.  Installing into an
> arbitrary directory cannot work; there is no opportunity then to alter
> the gcobol binary.

This is how most distros install stuff for their packaging systems,
make install DESTDIR=/some/directory
and then everything under /some/directory
is packaged into the package.
GCC is generally relocatable, the compiler driver should find the
compiler and library/include directories etc. relative to where the
driver resides in the filesystem.

Jakub



Re: [PATCH] Fortran: reject empty derived type with bind(C) attribute [PR101577]

2025-03-03 Thread Harald Anlauf

Hi Andre,

Am 03.03.25 um 10:08 schrieb Andre Vehreschild:

Hi Harald,

in +++ b/gcc/fortran/symbol.cc
@@ -4624,12 +4624,28 @@ verify_bind_c_derived_type (gfc_symbol *derived_sym)

there is

+  else if (!pedantic)
+   gfc_warning (0, "Derive ...

To me the "not pedantic" is counter-intuitive. In pedantic mode I would have
expected this to be at least a warning (if not an error). Why is it not flagged
at all? Maybe I expect something wrong from "pedantic".


it is actually flagged, but one would get the warning *twice*
without the above.  The reason is the following in gfc_post_options:

  /* If -pedantic, warn about the use of GNU extensions.  */
  if (pedantic && (gfc_option.allow_std & GFC_STD_GNU) != 0)
gfc_option.warn_std |= GFC_STD_GNU;
  /* -std=legacy -pedantic is effectively -std=gnu.  */
  if (pedantic && (gfc_option.allow_std & GFC_STD_LEGACY) != 0)
gfc_option.warn_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL | GFC_STD_LEGACY;


Therefore gfc_notify_std always warns for -std=gnu and -std=legacy
when -pedantic is given, unless it generates an error.

I've added a comment and pushed as r15-7798-gf9f16b9f74b767.

Thanks for the review!

Harald


Besides that: Looks good to me.

Regards,
Andre

On Sun, 2 Mar 2025 22:35:47 +0100
Harald Anlauf  wrote:


Dear all,

due to an oversight in the Fortran standard before 2018,
empty derived types with bind(C) attribute were explicitly
(deliberately?) accepted by gfortran, giving a warning that
the companion processor might not provide an interoperating
entity.

In the PR, Tobias pointed to a discussion on the J3 ML that
there was a defect in older standards.  The attached patch
now generates an error when -std=f20xx is specified, and
continues to generate a warning otherwise.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald




--
Andre Vehreschild * Email: vehre ad gmx dot de






AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-03 Thread Wilco Dijkstra

Outline atomics is not designed to be used with -mcmodel=large, so disable
it automatically if the large code model is used.

Passes regress, OK for commit?

gcc:
PR target/112465
* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Turn off outline atomics with -mcmodel=large.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fe76730b0a7c8a2baaae24152e13d82a12d5d0a3..31d083d11bfc3e9756b41c901c96749a7b8a840a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18563,6 +18563,10 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
  intermediary step for the former.  */
   if (flag_mlow_precision_sqrt)
 flag_mrecip_low_precision_sqrt = true;
+
+  /* Turn off outline atomics with -mcmodel=large.  */
+  if (aarch64_cmodel == AARCH64_CMODEL_LARGE)
+opts->x_aarch64_flag_outline_atomics = 0;
 }
 
 /* 'Unpack' up the internal tuning structs and update the options