[PATCH] gcc: add trigonometric pi-based functions as gcc builtins

2025-05-14 Thread Yuao Ma
Hi Joseph,

I have updated the patch based on your review comments. I added the newly 
introduced builtins to extend.texi and mentioned the PR in the commit message. 
Could you please take another look when you have a moment?

Yuao

From: Joseph Myers 
Sent: Thursday, May 15, 2025 0:47
To: Yuao Ma 
Cc: gcc-patches@gcc.gnu.org ; fort...@gcc.gnu.org 
; tbur...@baylibre.com 
Subject: Re: [PATCH] gcc: add trigonometric pi-based functions as gcc builtins

On Wed, 14 May 2025, Yuao Ma wrote:

> Hi all,
>
> This patch adds trigonometric pi-based functions as gcc builtins: acospi, 
> asinpi, atan2pi,
> atanpi, cospi, sinpi, and tanpi. Latest glibc already provides support for
> these functions, which we plan to leverage in future gfortran implementations.
>
> The patch includes two test cases to verify both correct code generation and
> function definition.
>
> If approved, I suggest committing this foundational change first. Constant
> folding for these builtins will be addressed in subsequent patches.

Note that either this change, or a subsequent one that makes the built-in
functions do something useful, should also update extend.texi, "Library
Builtins", to mention the new functions.  (The text there doesn't
distinguish existing C23 built-in functions, such as exp10 or roundeven,
from those that are pure extensions, but addressing that is independent of
adding new functions to the list.  Also, I'm not sure these sentences with
very long lists of functions are really the optimal way of presenting the
information about such built-in functions; maybe Sandra has better ideas
about how to document this, but again that's independent of adding new
functions.)

The commit message should reference PR c/118592 (it's not a full fix, but
it's partial progress towards the full set of built-in functions /
constant folding).

--
Joseph S. Myers
josmy...@redhat.com



0001-gcc-add-trigonometric-pi-based-functions-as-gcc-buil.patch
Description: 0001-gcc-add-trigonometric-pi-based-functions-as-gcc-buil.patch


Re: [PATCH v2] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-05-14 Thread Rainer Orth
Hi Jonathan,

> On 14/05/25 10:01 +0200, Tomasz Kamiński wrote:
>>This commit adjusts how the arguments are stored in the _Arg_value
>>(and thus basic_format_args), by preserving the types of fixed-width
>>floating-point values that were previously converted to float, double,
>>or long double.
>>
>>The _Arg_value union now contains alternatives with std::bfloat16_t,
>>std::float16_t, std::float32_t, std::float64_t that use the pre-existing
>>_Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f64 argument types.
>>
>>This does not affect formatting, as the formatter specializations for
>>fixed-width floating-point types format them by casting to the
>>corresponding standard floating-point type.
>>
>>For 128-bit floating-point types we need to handle the ppc64 architecture
>>(_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT), for which long double may (on a
>>per-TU basis) designate either the __ibm128 or the __ieee128 type; we need
>>to store both types in _Arg_value and have two _Arg_types (_Arg_ibm128,
>>_Arg_ieee128). On other architectures we use an extra enumerator value to
>>store a __float128 that is different from long double and _Float128. This
>>is consistent with ppc64, for which __float128, if present, is the same
>>type as __ieee128. We use the _Arg_float128 and _M_float128 names, which
>>deviate from the _Arg_fN naming scheme, to emphasize that this flag is not
>>used for the std::float128_t (_Float128) type, which is consistently
>>formatted via handle.
>>
>>The __format::__float128_t type is renamed to __format::__flt128_t, to
>>mitigate visual confusion between this type and __float128. We also
>>introduce a __bflt16_t typedef instead of using decltype.
>>
>>We add new alternatives to the _Arg_value union and allow them to be
>>accessed via _S_get when the types are available. However, we produce and
>>handle the corresponding _Arg_type only when we can format the type.
>>See also r14-3329-g27d0cfcb2b33de.
>>
>>The formatter<_Float128, _CharT> that formats via __format::__flt128_t is
>>always provided when the type is available. It remains correct when
>>__format::__flt128_t is _Float128.
>>
>>We also provide formatter<__float128, _CharT> that formats via __flt128_t.
>>As this type may be disabled (-mno-float128), extra care needs to be taken
>>for the situation when __float128 is the same type as long double. If the
>>formatter were defined in such a case, it would be generated from different
>>specializations, and thus have different mangling:
>>  * formatter<__float128, _CharT> if __float128 is present,
>>  * formatter<__format::__formattable_float, _CharT> otherwise.
>>To the best of my knowledge this happens only on ppc64 for __ieee128 and
>>__float128, so the formatter is not defined in this case. A static_assert
>>is added to detect other configurations like that; in such a case we
>>should replace it with a constraint.
>>
>>  PR libstdc++/119246
>>
>>libstdc++-v3/ChangeLog:
>>
>>  * include/std/format (__format::__bflt16_t): Define.
>>  (_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128
>>  is used.
>>  (__format::__float128_t): Renamed to __format::__flt128_t.
>>  (std::formatter<_Float128, _CharT>): Define always if there is
>>  formattable 128bit float.
>>  (std::formatter<__float128, _CharT>): Define.
>>  (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
>>  (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
>>  (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
>>  (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
>>  (_Arg_value::_M_ieee128, _Arg_value::_M_float128)
>>  (_Arg_value::_M_bf16, _Arg_value::_M_f16, _Arg_value::_M_f32)
>>   _Arg_value::_M_f64): Define.
>>  (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle __bflt16,
>>  _Float16, _Float32, _Float64, and __float128 types.
>>  (basic_format_arg::_S_to_arg_type): Preserve _bflt16, _Float16,
>>  _Float32, _Float64 and __float128 types.
>>  (basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
>>  _Arg_b16, _Arg_f16, _Arg_f32, _Arg_f64.
>>  * testsuite/std/format/arguments/args.cc: Updated to illustrate
>>  that extended floating point types use handles now. Added test
>>  for __float128.
>>  * testsuite/std/format/parse_ctx.cc: Extended test to cover calls
>>  to check_dynamic_spec with floating point types and handles.
>>---
>>I believe I have fixed all the typos. OK for trunk?
>
>
> OK, thanks

this patch broke Solaris bootstrap, both i386-pc-solaris2.11 and
sparc-sun-solaris2.11:

In file included from 
/vol/gcc/src/hg/master/local/libstdc++-v3/src/c++20/format.cc:29:
/var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/format:
 In member function ‘typename std::basic_format_context<_Out, _CharT>::iterator 
std::formatter<__float128, _CharT>::format(__float128, 
std::basic_format_context<_Out, _CharT>&) const’:
/var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libst

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-14 Thread Karl Meakin



On 07/05/2025 14:32, Richard Sandiford wrote:

Karl Meakin  writes:

Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR
extension is enabled.

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): Emit CMPBR
instructions if possible.
(cbranch4): New expand rule.
(aarch64_cb): Likewise.
(aarch64_cb): Likewise.
* config/aarch64/iterators.md (cmpbr_suffix): New mode attribute.
* config/aarch64/predicates.md (const_0_to_63_operand): New
predicate.
(aarch64_cb_immediate): Likewise.
(aarch64_cb_operand): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: Update tests.

In addition to Kyrill's comments (which I agree with):


@@ -720,18 +720,41 @@ (define_constants
  ;; Conditional jumps
  ;; ---
  
-(define_expand "cbranch4"

+(define_expand "cbranch4"
[(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
   (label_ref (match_operand 3))
   (pc)))]
""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+  if (TARGET_CMPBR && aarch64_cb_operand (operands[2], mode))
+{
+  emit_jump_insn (gen_aarch64_cb (operands[0], operands[1],
+   operands[2], operands[3]));
+  DONE;
+}

There is an implicit choice here to use a separate CMP + Bcc if the
immediate is out of range, rather than force out-of-range immediates into
a temporary register.  That can be the right choice for immediates in the
range of CMP, but whether it is or not depends on global information that
we don't have.  If the immediate is needed for multiple branches, it would
be better (sizewise) to load the immediate into a temporary register and
use it for each branch, provided that there's a call-clobbered register
free and that the branches are in the 1KiB range.  In other situations,
what the patch does is best.


Do you mean replacing code like
```
cmp x1, 100
beq .L1
cmp x2, 100
beq .L2
```

with
```
mov x0, 100
cbeq x0, x1, .L1
cbeq x0, x2, .L2
```

That would be preferable, but as you say we would need to know whether 
.L1 and .L2 are in range, and whether x0 is free.
I don't think that is something we can easily determine in the middle of 
RTL generation.



But perhaps it would be worth forcing values that are outside the
range of CMP into a register and using the new form, rather than
emitting an immediate move, a CMP, and a branch.


That already happens thanks to the ordering of the rules (the rule for 
MOV+CB comes before the rule for MOV+CMP+B).

I will add some tests to record this behaviour.



Either way, I think it's worth a comment saying what we do with
out-of-range immediates.


+  else
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
+operands[1], operands[2]);
+  operands[2] = const0_rtx;
+}
+  }
+)
+
@@ -758,6 +781,58 @@ (define_expand "cbranchcc4"
""
  )
  
+;; Emit a `CB (register)` or `CB (immediate)` instruction.

+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:GPI 1 "register_operand")
+(match_operand:GPI 2 "aarch64_cb_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  "cb%m0\\t%1, %2, %l3";
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; Emit a `CBB (register)` or `CBH (register)` instruction.
+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:SHORT 1 "register_operand")
+(match_operand:SHORT 2 
"aarch64_cb_short_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  "cb%m0\\t%1, %2, %l3";
+  [(set_attr "type" "branch")

Re: [PATCH v4 0/3] extend "counted_by" attribute to pointer fields of structures

2025-05-14 Thread Qing Zhao
FYI.

This feature was committed into Clang yesterday.

https://github.com/llvm/llvm-project/pull/137250


Qing
> On May 13, 2025, at 17:03, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the 4th version of the patch set to extend "counted_by" attribute
> to pointer fields of structures.
> 
> compared to the 3rd version:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/682310.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/682312.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/682311.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/682313.html
> 
> The major change are:
> 
> A. Explicitly disallow the counted_by attribute for void * fields. Report an
> error for such cases. Delete the support for void * from both __bdos and the
> bounds sanitizer.
> 
> B. Some refactoring of the 3rd patch (bounds sanitizer) to make it easier
> to understand.
> 
> C. Bug fixes on the 3rd patch to fix a bug in the bounds sanitizer that Kees
> reported when he ran the 3rd version on his testing suites.
> 
> 
> This patch set includes 3 parts:
> 
> 1. Extend the "counted_by" attribute to pointer fields of structures.
> 2. Convert a pointer reference with the counted_by attribute to .ACCESS_WITH_SIZE
>and use it in builtin-object-size.
> 3. Use the counted_by attribute of pointers in the array bound checker.
> 
> In which, patches 1 and 2 are simple and straightforward; however, patch 3
> is a little complicated due to the following reason:
> 
>Current array bound checker only instruments ARRAY_REF, and the INDEX
>information is the 2nd operand of the ARRAY_REF.
> 
>When extending the array bound checker to pointer references with
>counted_by attributes, the hardest part is to get the INDEX of the
>corresponding array ref from the offset computation expression of
>the pointer ref. 
> 
> I have done some study on the other approach I considered previously, and
> realized that the current implementation might be better. Please see the
> following for details:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683136.html
> 
> The whole patch set has been bootstrapped and regression tested on both
> aarch64 and x86.
> 
> Okay for trunk?
> 
> Thanks a lot.
> 
> Qing
> 
> 
> 
> 
> the first version was submitted 4 months ago on 1/16/2025, and triggered
> a lot of discussion on whether we need a new syntax for counted_by
> attribute.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673837.html
> 
> After a long discussion since then: 
> (https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677024.html)
> 
> We agreed to the following compromised solution:
> 
> 1. Keep the current syntax of counted_by for lone identifier;
> 2. Add a new attribute "counted_by_exp" for expressions.
> 
> Although there is still some discussion going on for the new
> counted_by_exp attribute (in the Clang community):
> https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885
> 
> The syntax for the lone identifier is kept the same as before.
> 
> So, I'd like to resubmit my previous patch of extending "counted_by"
> to pointer fields of structures. 
> 
> The whole patch set has been rebased on the latest trunk, with some test
> case adjustments, and bootstrapped and regression tested on both aarch64
> and x86.
> 
> There will be a separate patch set for the new "counted_by_exp" 
> attribute later to cover the expression cases.
> 
> The following are more details on this patch set:
> 
> For example:
> 
> struct PP {
>  size_t count2;
>  char other1;
>  char *array2 __attribute__ ((counted_by (count2)));
>  int other2;
> } *pp;
> 
> specifies that the "array2" is an array that is pointed by the
> pointer field, and its number of elements is given by the field
> "count2" in the same structure.
> 
> There are the following important facts about "counted_by" on pointer
> fields compared to the "counted_by" on FAM fields:
> 
> 1. one more new requirement for pointer fields with "counted_by" attribute:
>   pp->array2 and pp->count2 can ONLY be changed by changing the whole 
> structure
>   at the same time.
> 
> 2. the following feature for FAM field with "counted_by" attribute is NOT
>   valid for the pointer field any more:
> 
>" One important feature of the attribute is, a reference to the
> flexible array member field uses the latest value assigned to the
> field that represents the number of the elements before that
> reference.  For example,
> 
>p->count = val1;
>p->array[20] = 0;  // ref1 to p->array
>p->count = val2;
>p->array[30] = 0;  // ref2 to p->array
> 
> in the above, 'ref1' uses 'val1' as the number of the elements in
> 'p->array', and 'ref2' uses 'val2' as the number of elements in
> 'p->array'. "



Re: [PATCH][x86] Fix regression from x86 multi-epilogue tuning

2025-05-14 Thread Jan Hubicka
> With the avx512_two_epilogues tuning enabled for zen4 and zen5
> the gcc.target/i386/vect-epilogues-5.c testcase below regresses
> and ends up using AVX2 sized vectors for the masked epilogue
> rather than AVX512 sized vectors.  The following patch rectifies
> this and adds coverage for the intended behavior.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> OK for trunk and 15 branch if that succeeds?
> 
> Thanks,
> Richard.
> 
>   * config/i386/i386.cc (ix86_vector_costs::finish_cost):
>   Do not suggest a first epilogue mode for AVX512 sized
>   main loops with X86_TUNE_AVX512_TWO_EPILOGUES as that
>   interferes with using a masked epilogue.
> 
>   * gcc.target/i386/vect-epilogues-1.c: New testcase.
>   * gcc.target/i386/vect-epilogues-2.c: Likewise.
>   * gcc.target/i386/vect-epilogues-3.c: Likewise.
>   * gcc.target/i386/vect-epilogues-4.c: Likewise.
>   * gcc.target/i386/vect-epilogues-5.c: Likewise.
OK
thanks,
Honza


Re: [PATCH] c++: unifying specializations of non-primary tmpls [PR120161]

2025-05-14 Thread Jason Merrill

On 5/12/25 7:53 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look OK
for trunk/15/14?

-- >8 --

Here unification of P=Wrap<int>::type, A=Wrap<long>::type wrongly
succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
no longer recurse into template arguments for non-primary templates
(since they're a non-deduced context) and so the int/long mismatch that
makes the two types distinct goes unnoticed.

In the case of (comparing specializations of) a non-primary template,
unify should still go on to compare the types directly before returning
success.


Should the PRIMARY_TEMPLATE_P check instead move up to join the 
CLASSTYPE_TEMPLATE_INFO check?  try_class_deduction also doesn't seem 
applicable to non-primary templates.



PR c++/120161

gcc/cp/ChangeLog:

* pt.cc (unify) : When comparing specializations
of a non-primary template, still perform a type comparison.

gcc/testsuite/ChangeLog:

* g++.dg/template/unify13.C: New test.
---
  gcc/cp/pt.cc|  6 +++---
  gcc/testsuite/g++.dg/template/unify13.C | 18 ++
  2 files changed, 21 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/unify13.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0d64a1cfb128..868dd0e2b3ff 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -25785,10 +25785,10 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
  INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
  INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (t)),
  UNIFY_ALLOW_NONE, explain_p);
- else
-   return unify_success (explain_p);
+ gcc_checking_assert (t == arg);
}
-  else if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
+
+  if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
return unify_type_mismatch (explain_p, parm, arg);
return unify_success (explain_p);
  
diff --git a/gcc/testsuite/g++.dg/template/unify13.C b/gcc/testsuite/g++.dg/template/unify13.C

new file mode 100644
index ..ec7ca9d17a44
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/unify13.C
@@ -0,0 +1,18 @@
+// PR c++/120161
+
+template<class T, class U>
+struct mp_list { };
+
+template<class T>
+struct Wrap { struct type { }; };
+
+struct A : mp_list<Wrap<int>::type, void>
+ , mp_list<Wrap<long>::type, void> { };
+
+template<class U>
+void f(mp_list<Wrap<int>::type, U>*);
+
+int main() {
+  A a;
+  f(&a);
+}




Re: [PATCH 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA, and ADDHNB instructions

2025-05-14 Thread Dhruv Chawla

On 06/05/25 19:35, Richard Sandiford wrote:

External email: Use caution opening links or attachments


Hi,

Thanks for the update.  The patch mostly looks good, but one minor and
one more substantial comment below.

BTW, the patch seems to have been corrupted en route, in that unchanged
lines have too much space.  Attaching is fine if that's easier.


Hi,

I have tried using git send-email for the next round of patches. Please let me 
know if the formatting is still broken! Thanks.



Dhruv Chawla  writes:

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index a72ca2a500d..42802bac653 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4149,80 +4149,58 @@
   (define_expand "@aarch64_adr_shift"
 [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
   (plus:SVE_FULL_SDI
-   (unspec:SVE_FULL_SDI
- [(match_dup 4)
-  (ashift:SVE_FULL_SDI
-(match_operand:SVE_FULL_SDI 2 "register_operand")
-(match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
- UNSPEC_PRED_X)
+   (ashift:SVE_FULL_SDI
+ (match_operand:SVE_FULL_SDI 2 "register_operand")
+ (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
 (match_operand:SVE_FULL_SDI 1 "register_operand")))]
 "TARGET_SVE && TARGET_NON_STREAMING"
-  {
-operands[4] = CONSTM1_RTX (mode);
-  }
+  {}
   )


The {} can be removed.


[...]
@@ -4803,6 +4781,9 @@

   ;; Unpredicated shift by a scalar, which expands into one of the vector
   ;; shifts below.
+;;
+;; The unpredicated form is emitted only when the shift amount is a constant
+;; value that is valid for the shift being carried out.
   (define_expand "3"
 [(set (match_operand:SVE_I 0 "register_operand")
   (ASHIFT:SVE_I
@@ -4810,20 +4791,29 @@
 (match_operand: 2 "general_operand")))]
 "TARGET_SVE"
 {
-rtx amount;
+rtx amount = NULL_RTX;
   if (CONST_INT_P (operands[2]))
 {
- amount = gen_const_vec_duplicate (mode, operands[2]);
- if (!aarch64_sve_shift_operand (operands[2], mode))
-   amount = force_reg (mode, amount);
+ if (aarch64_simd_shift_imm_p (operands[2], mode, _optab == 
ashl_optab))
+   operands[2] = aarch64_simd_gen_const_vector_dup (mode, INTVAL 
(operands[2]));
+ else
+   {
+ amount = gen_const_vec_duplicate (mode, operands[2]);
+ if (!aarch64_sve_shift_operand (operands[2], mode))
+   amount = force_reg (mode, amount);
+   }
 }
   else
 {
   amount = convert_to_mode (mode, operands[2], 0);
   amount = expand_vector_broadcast (mode, amount);
 }
-emit_insn (gen_v3 (operands[0], operands[1], amount));
-DONE;
+
+if (amount)
+  {
+ emit_insn (gen_v3 (operands[0], operands[1], amount));
+ DONE;
+  }
 }
   )



Instead of the two hunks above, I think we should leave 3
alone and change v3.


I was not able to move all of the changes to v3: I had to gate the call to 
force_reg on aarch64_simd_shift_imm_p, as it would otherwise move the immediate 
into a register and then fail to match the pattern later on in v3.



This would involve changing:


@@ -4867,27 +4857,27 @@
 ""
   )



-;; Unpredicated shift operations by a constant (post-RA only).
+;; Unpredicated shift operations by a constant.
   ;; These are generated by splitting a predicated instruction whose
   ;; predicate is unused.
-(define_insn "*post_ra_v_ashl3"
+(define_insn "*v_ashl3"
 [(set (match_operand:SVE_I 0 "register_operand")
   (ashift:SVE_I
 (match_operand:SVE_I 1 "register_operand")
 (match_operand:SVE_I 2 "aarch64_simd_lshift_imm")))]
-  "TARGET_SVE && reload_completed"
+  "TARGET_SVE"
 {@ [ cons: =0 , 1 , 2   ]
[ w, w , vs1 ] add\t%0., %1., %1.
[ w, w , Dl  ] lsl\t%0., %1., #%2
 }
   )



-(define_insn "*post_ra_v_3"
+(define_insn "*v_3"
 [(set (match_operand:SVE_I 0 "register_operand" "=w")
   (SHIFTRT:SVE_I
 (match_operand:SVE_I 1 "register_operand" "w")
 (match_operand:SVE_I 2 "aarch64_simd_rshift_imm")))]
-  "TARGET_SVE && reload_completed"
+  "TARGET_SVE"
 "\t%0., %1., #%2"
   )


...these instructions to named patterns, e.g. aarch64_vashl3_const
and aarch64_v3_const respectively (with no _ after v, for
consistency with the optab name).  Then v3 could
generate aarch64_v3_const for the constant case.

Thanks,
Richard


--
Regards,
Dhruv



[PATCH] libstdc++: Implement C++26 function_ref [PR119126]

2025-05-14 Thread Tomasz Kamiński
This patch implements C++26 function_ref as specified in P0792R14,
with the correction from LWG 4256 to the constraints of the constructor
accepting a nontype_t parameter.

As function_ref may store a pointer to a const object, __Ptrs::_M_obj is
changed to const void*, so again we do not cast away const from const
objects. To help with the necessary casts, a __polyfunc::__cast_to helper is
added that accepts a reference to the target type.

The _Invoker now defines additional call methods used by function_ref:
_S_ptrs() for invoking a target passed by reference, and _S_nttp, _S_bind_ptr,
_S_bind_ref for handling the constructors accepting nontype_t. The existing
_S_call_storage is changed to a thin wrapper that initializes _Ptrs
and forwards to _S_call_ptrs.

This removed most uses of _Storage::_M_ptr and _Storage::_M_ref, so these
functions were removed and the _Manager uses were adjusted.

Finally, we make function_ref available in freestanding mode; as
move_only_function and copyable_function are currently only available in
hosted mode, we define _Manager and _Mo_base only if either
__glibcxx_move_only_function or __glibcxx_copyable_function is defined.

PR libstdc++/119126

libstdc++-v3/ChangeLog:

* doc/doxygen/stdheader.cc: Added funcref_impl.h file.
* include/Makefile.am: Added funcref_impl.h file.
* include/Makefile.in: Added funcref_impl.h file.
* include/bits/funcref_impl.h: New file.
* include/bits/funcwrap.h: (_Ptrs::_M_obj): Const-qualify.
(_Storage::_M_ptr, _Storage::_M_ref): Remove.
(__polyfunc::__cast_to): Define.
(_Base_invoker::_S_ptrs, _Base_invoker::_S_nttp)
(_Base_invoker::_S_bind_ptrs, _Base_invoker::_S_bind_ref)
(_Base_invoker::_S_call_ptrs): Define.
(_Base_invoker::_S_call_storage): Forward to _S_call_ptrs.
(_Manager::_S_local, _Manager::_S_ptr): Adjust for _M_obj being
const qualified.
(__polyfunc::_Manager, __polyfunc::_Mo_base): Guard with
__glibcxx_move_only_function || __glibcxx_copyable_function.
(std::function_ref, std::__is_function_ref_v)
[__glibcxx_function_ref]: Define.
* include/bits/utility.h (std::nontype_t, std::nontype)
(__is_nontype_v) [__glibcxx_function_ref]: Define.
* include/bits/version.def: Define function_ref.
* include/bits/version.h: Regenerate.
* src/c++23/std.cc.in (std::function_ref) [__cpp_lib_function_ref]:
 Export.
* testsuite/20_util/function_ref/assign.cc: New test.
* testsuite/20_util/function_ref/call.cc: New test.
* testsuite/20_util/function_ref/cons.cc: New test.
* testsuite/20_util/function_ref/cons_neg.cc: New test.
* testsuite/20_util/function_ref/conv.cc: New test.
---
Would appreciate check of the documentation comments in funcref_impl.h
file. 

 libstdc++-v3/doc/doxygen/stdheader.cc |   1 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/funcref_impl.h  | 185 +++
 libstdc++-v3/include/bits/funcwrap.h  | 154 
 libstdc++-v3/include/bits/utility.h   |  17 ++
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/functional   |   4 +-
 libstdc++-v3/src/c++23/std.cc.in  |   3 +
 .../testsuite/20_util/function_ref/assign.cc  | 110 +
 .../testsuite/20_util/function_ref/call.cc| 145 
 .../testsuite/20_util/function_ref/cons.cc| 219 ++
 .../20_util/function_ref/cons_neg.cc  |  30 +++
 .../testsuite/20_util/function_ref/conv.cc| 152 
 15 files changed, 993 insertions(+), 47 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/funcref_impl.h
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/assign.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/call.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons_neg.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/conv.cc

diff --git a/libstdc++-v3/doc/doxygen/stdheader.cc 
b/libstdc++-v3/doc/doxygen/stdheader.cc
index 839bfc81bc0..938b2b04a26 100644
--- a/libstdc++-v3/doc/doxygen/stdheader.cc
+++ b/libstdc++-v3/doc/doxygen/stdheader.cc
@@ -55,6 +55,7 @@ void init_map()
 headers["functional_hash.h"]= "functional";
 headers["mofunc_impl.h"]= "functional";
 headers["cpyfunc_impl.h"]   = "functional";
+headers["funcref_impl.h"]   = "functional";
 headers["funcwrap.h"]   = "functional";
 headers["invoke.h"] = "functional";
 headers["ranges_cmp.h"] = "functional";
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 3e5b6c4142e..baf0

[PATCH][x86] Fix regression from x86 multi-epilogue tuning

2025-05-14 Thread Richard Biener
With the avx512_two_epilogues tuning enabled for zen4 and zen5
the gcc.target/i386/vect-epilogues-5.c testcase below regresses
and ends up using AVX2 sized vectors for the masked epilogue
rather than AVX512 sized vectors.  The following patch rectifies
this and adds coverage for the intended behavior.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK for trunk and 15 branch if that succeeds?

Thanks,
Richard.

* config/i386/i386.cc (ix86_vector_costs::finish_cost):
Do not suggest a first epilogue mode for AVX512 sized
main loops with X86_TUNE_AVX512_TWO_EPILOGUES as that
interferes with using a masked epilogue.

* gcc.target/i386/vect-epilogues-1.c: New testcase.
* gcc.target/i386/vect-epilogues-2.c: Likewise.
* gcc.target/i386/vect-epilogues-3.c: Likewise.
* gcc.target/i386/vect-epilogues-4.c: Likewise.
* gcc.target/i386/vect-epilogues-5.c: Likewise.
---
 gcc/config/i386/i386.cc  | 10 +++---
 gcc/testsuite/gcc.target/i386/vect-epilogues-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/vect-epilogues-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/vect-epilogues-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/vect-epilogues-4.c | 13 +
 gcc/testsuite/gcc.target/i386/vect-epilogues-5.c | 13 +
 6 files changed, 73 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-epilogues-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-epilogues-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-epilogues-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-epilogues-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-epilogues-5.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38df84f7db2..a6f0a582c3d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25545,14 +25545,10 @@ ix86_vector_costs::finish_cost (const vector_costs 
*scalar_costs)
   /* When X86_TUNE_AVX512_TWO_EPILOGUES is enabled arrange for both
  a AVX2 and a SSE epilogue for AVX512 vectorized loops.  */
   if (loop_vinfo
+  && LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+  && GET_MODE_SIZE (loop_vinfo->vector_mode) == 32
   && ix86_tune_features[X86_TUNE_AVX512_TWO_EPILOGUES])
-{
-  if (GET_MODE_SIZE (loop_vinfo->vector_mode) == 64)
-   m_suggested_epilogue_mode = V32QImode;
-  else if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-  && GET_MODE_SIZE (loop_vinfo->vector_mode) == 32)
-   m_suggested_epilogue_mode = V16QImode;
-}
+m_suggested_epilogue_mode = V16QImode;
   /* When a 128bit SSE vectorized epilogue still has a VF of 16 or larger
  enable a 64bit SSE epilogue.  */
   if (loop_vinfo
diff --git a/gcc/testsuite/gcc.target/i386/vect-epilogues-1.c 
b/gcc/testsuite/gcc.target/i386/vect-epilogues-1.c
new file mode 100644
index 000..a7f5f12c71b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-epilogues-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mno-avx512f -mtune=generic 
-fdump-tree-vect-optimized" } */
+
+int test (signed char *data, int n)
+{
+  int sum = 0;
+  for (int i = 0; i < n; ++i)
+sum += data[i];
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump "loop vectorized using 32 byte vectors" "vect" 
} } */
+/* { dg-final { scan-tree-dump "loop vectorized using 16 byte vectors" "vect" 
} } */
+/* { dg-final { scan-tree-dump "loop vectorized using 8 byte vectors" "vect" { 
target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-epilogues-2.c 
b/gcc/testsuite/gcc.target/i386/vect-epilogues-2.c
new file mode 100644
index 000..d6c06edcacd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-epilogues-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512bw -mtune=generic -fdump-tree-vect-optimized" } */
+
+int test (signed char *data, int n)
+{
+  int sum = 0;
+  for (int i = 0; i < n; ++i)
+sum += data[i];
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump "loop vectorized using 64 byte vectors" "vect" 
} } */
+/* { dg-final { scan-tree-dump "loop vectorized using 32 byte vectors" "vect" 
} } */
+/* { dg-final { scan-tree-dump-not "loop vectorized using 16 byte vectors" 
"vect" } } */
+/* { dg-final { scan-tree-dump-not "loop vectorized using 8 byte vectors" 
"vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-epilogues-3.c 
b/gcc/testsuite/gcc.target/i386/vect-epilogues-3.c
new file mode 100644
index 000..0ee610f5e3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-epilogues-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512bw -mtune=znver4 -fdump-tree-vect-optimized" } */
+
+int test (signed char *data, int n)
+{
+  int sum = 0;
+  for (int i = 0; i < n; ++i)
+sum += data[i];
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump "loop vectorized using 64 byte vectors" "vect" 
} } */
+/* {

[PATCH] Enhance -fopt-info-vec vectorized loop diagnostic

2025-05-14 Thread Richard Biener
The following includes whether we vectorize an epilogue, whether
we use loop masking and what vectorization factor (unroll factor)
we use.  So it's now

t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 32
t.c:4:21: optimized: epilogue loop vectorized using masked 64 byte vectors and 
unroll factor 32

for a masked epilogue with AVX512 and HImode data for example.  Rather
than

t.c:4:21: optimized: loop vectorized using 64 byte vectors
t.c:4:21: optimized: loop vectorized using 64 byte vectors

I verified we don't translate opt-info messages and thus excessive
use of %s to compose the strings should be OK.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu
(merely to look for testcases scanning for the old message too
closely).

Any comments or suggestions for improvements?

* tree-vectorizer.cc (vect_transform_loops): When diagnosing
a vectorized loop indicate whether we vectorized an epilogue,
whether we used masked vectors and what unroll factor was
used.
---
 gcc/tree-vectorizer.cc | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index 447f882c518..2f77e46ba99 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1026,10 +1026,19 @@ vect_transform_loops (hash_table 
*&simduid_to_vf_htab,
 {
   if (GET_MODE_SIZE (loop_vinfo->vector_mode).is_constant (&bytes))
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"loop vectorized using %wu byte vectors\n", bytes);
+"%sloop vectorized using %s%wu byte vectors and"
+" unroll factor %u\n",
+LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+? "epilogue " : "",
+LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+? "masked " : "", bytes,
+(unsigned int) LOOP_VINFO_VECT_FACTOR
+(loop_vinfo).to_constant ());
   else
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"loop vectorized using variable length vectors\n");
+"%sloop vectorized using variable length vectors\n",
+LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+? "epilogue " : "");
 }
 
   loop_p new_loop = vect_transform_loop (loop_vinfo,
-- 
2.43.0


[PATCH v3 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-14 Thread dhruvc
From: Dhruv Chawla 

This patch modifies the shift expander to immediately lower constant
shifts without unspec. It also modifies the ADR, SRA and ADDHNB patterns
to match the lowered forms of the shifts, as the predicate register is
not required for these instructions.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-sve.md (@aarch64_adr_shift):
Match lowered form of ashift.
(*aarch64_adr_shift): Likewise.
(*aarch64_adr_shift_sxtw): Likewise.
(*aarch64_adr_shift_uxtw): Likewise.
(3): Avoid moving legal immediate shift
amounts into a new register.
(v3): Generate unpredicated shifts for constant
operands.
(*post_ra_v_ashl3): Rename to ...
(aarch64_vashl3_const): ... this and remove reload requirement.
(*post_ra_v_3): Rename to ...
(aarch64_v3_const): ... this and remove reload
requirement.
* gcc/config/aarch64/aarch64-sve2.md
(@aarch64_sve_add_): Match lowered form of
SHIFTRT.
(*aarch64_sve2_sra): Likewise.
(*bitmask_shift_plus): Match lowered form of lshiftrt.
---
 gcc/config/aarch64/aarch64-sve.md  | 90 +-
 gcc/config/aarch64/aarch64-sve2.md | 46 +--
 2 files changed, 53 insertions(+), 83 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index bf7569f932b..cb88d6d95a6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4234,80 +4234,57 @@
 (define_expand "@aarch64_adr_shift"
   [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
(plus:SVE_FULL_SDI
- (unspec:SVE_FULL_SDI
-   [(match_dup 4)
-(ashift:SVE_FULL_SDI
-  (match_operand:SVE_FULL_SDI 2 "register_operand")
-  (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:SVE_FULL_SDI
+   (match_operand:SVE_FULL_SDI 2 "register_operand")
+   (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
  (match_operand:SVE_FULL_SDI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {
-operands[4] = CONSTM1_RTX (mode);
-  }
 )
 
-(define_insn_and_rewrite "*aarch64_adr_shift"
+(define_insn "*aarch64_adr_shift"
   [(set (match_operand:SVE_24I 0 "register_operand" "=w")
(plus:SVE_24I
- (unspec:SVE_24I
-   [(match_operand 4)
-(ashift:SVE_24I
-  (match_operand:SVE_24I 2 "register_operand" "w")
-  (match_operand:SVE_24I 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:SVE_24I
+   (match_operand:SVE_24I 2 "register_operand" "w")
+   (match_operand:SVE_24I 3 "const_1_to_3_operand"))
  (match_operand:SVE_24I 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0., [%1., %2., lsl %3]"
-  "&& !CONSTANT_P (operands[4])"
-  {
-operands[4] = CONSTM1_RTX (mode);
-  }
 )
 
 ;; Same, but with the index being sign-extended from the low 32 bits.
 (define_insn_and_rewrite "*aarch64_adr_shift_sxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
(plus:VNx2DI
- (unspec:VNx2DI
-   [(match_operand 4)
-(ashift:VNx2DI
-  (unspec:VNx2DI
-[(match_operand 5)
- (sign_extend:VNx2DI
-   (truncate:VNx2SI
- (match_operand:VNx2DI 2 "register_operand" "w")))]
-UNSPEC_PRED_X)
-  (match_operand:VNx2DI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:VNx2DI
+   (unspec:VNx2DI
+ [(match_operand 4)
+  (sign_extend:VNx2DI
+(truncate:VNx2SI
+  (match_operand:VNx2DI 2 "register_operand" "w")))]
+UNSPEC_PRED_X)
+   (match_operand:VNx2DI 3 "const_1_to_3_operand"))
  (match_operand:VNx2DI 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0.d, [%1.d, %2.d, sxtw %3]"
-  "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))"
+  "&& !CONSTANT_P (operands[4])"
   {
-operands[5] = operands[4] = CONSTM1_RTX (VNx2BImode);
+operands[4] = CONSTM1_RTX (VNx2BImode);
   }
 )
 
 ;; Same, but with the index being zero-extended from the low 32 bits.
-(define_insn_and_rewrite "*aarch64_adr_shift_uxtw"
+(define_insn "*aarch64_adr_shift_uxtw"
   [(set (match_operand:VNx2DI 0 "register_operand" "=w")
(plus:VNx2DI
- (unspec:VNx2DI
-   [(match_operand 5)
-(ashift:VNx2DI
-  (and:VNx2DI
-(match_operand:VNx2DI 2 "register_operand" "w")
-(match_operand:VNx2DI 4 "aarch64_sve_uxtw_immediate"))
-  (match_operand:VNx2DI 3 "const_1_to_3_operand"))]
-   UNSPEC_PRED_X)
+ (ashift:VNx2DI
+   

Re: [PATCH] RISC-V: Fix uninit riscv_subset_list::m_allow_adding_dup issue

2025-05-14 Thread Jeff Law




On 5/12/25 8:34 PM, Kito Cheng wrote:

We forgot to initialize m_allow_adding_dup in the constructor of
riscv_subset_list, so it will hold a random value... which will lead
to random behavior where -march may accept duplicate extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::riscv_subset_list): Init m_allow_adding_dup.

Thanks.  I haven't dug into the failure yet, but I'm hoping this was the
cause of the recent bootstrap failure.


jeff



Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-14 Thread Karl Meakin



On 07/05/2025 14:32, Richard Sandiford wrote:

Karl Meakin  writes:

Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR
extension is enabled.

gcc/ChangeLog:

* config/aarch64/aarch64.md (cbranch4): emit CMPBR
instructions if possible.
(cbranch4): new expand rule.
(aarch64_cb): likewise.
(aarch64_cb): likewise.
* config/aarch64/iterators.md (cmpbr_suffix): new mode attr.
* config/aarch64/predicates.md (const_0_to_63_operand): new
predicate.
(aarch64_cb_immediate): likewise.
(aarch64_cb_operand): likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cmpbr.c: update tests.

In addition to Kyrill's comments (which I agree with):


@@ -720,18 +720,41 @@ (define_constants
  ;; Conditional jumps
  ;; ---
  
-(define_expand "cbranch4"

+(define_expand "cbranch4"
[(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
[(match_operand:GPI 1 "register_operand")
 (match_operand:GPI 2 "aarch64_plus_operand")])
   (label_ref (match_operand 3))
   (pc)))]
""
-  "
-  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1],
-operands[2]);
-  operands[2] = const0_rtx;
-  "
+  {
+  if (TARGET_CMPBR && aarch64_cb_operand (operands[2], mode))
+{
+  emit_jump_insn (gen_aarch64_cb (operands[0], operands[1],
+   operands[2], operands[3]));
+  DONE;
+}

There is an implicit choice here to use a separate CMP + Bcc if the
immediate is out of range, rather than force out-of-range immediates into
a temporary register.  That can be the right choice for immediates in the
range of CMP, but whether it is or not depends on global information that
we don't have.  If the immediate is needed for multiple branches, it would
be better (sizewise) to load the immediate into a temporary register and
use it for each branch, provided that there's a call-clobbered register
free and that the branches are in the 1KiB range.  In other situations,
what the patch does is best.

But perhaps it would be worth forcing values that are outside the
range of CMP into a register and using the new form, rather than
emitting an immediate move, a CMP, and a branch.

Either way, I think it's worth a comment saying what we do with
out-of-range immediates.


+  else
+{
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
+operands[1], operands[2]);
+  operands[2] = const0_rtx;
+}
+  }
+)
+
@@ -758,6 +781,58 @@ (define_expand "cbranchcc4"
""
  )
  
+;; Emit a `CB (register)` or `CB (immediate)` instruction.

+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:GPI 1 "register_operand")
+(match_operand:GPI 2 "aarch64_cb_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  "cb%m0\\t%1, %2, %l3";
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_string "no")
+ (const_string "yes")))]
+)
+
+;; Emit a `CBB (register)` or `CBH (register)` instruction.
+(define_insn "aarch64_cb"
+  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
+   [(match_operand:SHORT 1 "register_operand")
+(match_operand:SHORT 2 
"aarch64_cb_short_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
+  "TARGET_CMPBR"
+  "cb%m0\\t%1, %2, %l3";
+  [(set_attr "type" "branch")
+   (set (attr "length")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_P_1Kib)))
+ (const_int 4)
+ (const_int 8)))
+   (set (attr "far_branch")
+   (if_then_else (and (ge (minus (match_dup 3) (pc))
+  (const_int BRANCH_LEN_N_1Kib))
+  (lt 

Re: [patch, Fortran] Fix PR 120139, missing asterisk on prototype with -fc-prototypes

2025-05-14 Thread Thomas Koenig

Hi Paul,


Same remark as for PR120107! LGTM for both branches.


Committed both patches. Thanks for the reviews!

Best regards

Thomas



Re: [PATCH] c++: unifying specializations of non-primary tmpls [PR120161]

2025-05-14 Thread Patrick Palka
On Wed, 14 May 2025, Jason Merrill wrote:

> On 5/12/25 7:53 PM, Patrick Palka wrote:
> > Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look OK
> > for trunk/15/14?
> > 
> > -- >8 --
> > 
> > Here unification of P=Wrap::type, A=Wrap::type wrongly
> > succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
> > no longer recurse into template arguments for non-primary templates
> > (since they're a non-deduced context) and so the int/long mismatch that
> > makes the two types distinct goes unnoticed.
> > 
> > In the case of (comparing specializations of) a non-primary template,
> > unify should still go on to compare the types directly before returning
> > success.
> 
> Should the PRIMARY_TEMPLATE_P check instead move up to join the
> CLASSTYPE_TEMPLATE_INFO check?  try_class_deduction also doesn't seem
> applicable to non-primary templates.

I don't think that'd work, for either the CLASSTYPE_TEMPLATE_INFO (parm) check
or the earlier CLASSTYPE_TEMPLATE_INFO (arg) check.

While try_class_deduction directly doesn't apply to non-primary templates,
get_template_base still might, so if we move up the PRIMARY_TEMPLATE_P to join
the C_T_I (parm) check, then we wouldn't try get_template_base anymore which
would  break e.g.

template struct B { };

template
struct A {
  struct C : B { };
};

template void f(B*);

int main() {
  A::C c;
  f(&c);
}

If we move the PRIMARY_TEMPLATE_P check up to the C_T_I (arg) check, then
that'd mean we still don't check same_type_p on the two types in the
non-primary case, which seems wrong (although it'd fix the PR thanks to the
parm == arg early exit in unify).

> 
> > PR c++/120161
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (unify) : When comparing specializations
> > of a non-primary template, still perform a type comparison.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/template/unify13.C: New test.
> > ---
> >   gcc/cp/pt.cc|  6 +++---
> >   gcc/testsuite/g++.dg/template/unify13.C | 18 ++
> >   2 files changed, 21 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/template/unify13.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 0d64a1cfb128..868dd0e2b3ff 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -25785,10 +25785,10 @@ unify (tree tparms, tree targs, tree parm, tree
> > arg, int strict,
> >   INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
> >   INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (t)),
> >   UNIFY_ALLOW_NONE, explain_p);
> > - else
> > -   return unify_success (explain_p);
> > + gcc_checking_assert (t == arg);
> > }
> > -  else if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
> > +
> > +  if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
> > return unify_type_mismatch (explain_p, parm, arg);
> > return unify_success (explain_p);
> >   diff --git a/gcc/testsuite/g++.dg/template/unify13.C
> > b/gcc/testsuite/g++.dg/template/unify13.C
> > new file mode 100644
> > index ..ec7ca9d17a44
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/template/unify13.C
> > @@ -0,0 +1,18 @@
> > +// PR c++/120161
> > +
> > +template
> > +struct mp_list { };
> > +
> > +template
> > +struct Wrap { struct type { }; };
> > +
> > +struct A : mp_list::type, void>
> > + , mp_list::type, void> { };
> > +
> > +template
> > +void f(mp_list::type, U>*);
> > +
> > +int main() {
> > +  A a;
> > +  f(&a);
> > +}
> 
> 



Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-14 Thread Richard Sandiford
Karl Meakin  writes:
> On 07/05/2025 14:32, Richard Sandiford wrote:
>> Karl Meakin  writes:
>>> Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR
>>> extension is enabled.
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/aarch64.md (cbranch4): emit CMPBR
>>> instructions if possible.
>>> (cbranch4): new expand rule.
>>> (aarch64_cb): likewise.
>>> (aarch64_cb): likewise.
>>> * config/aarch64/iterators.md (cmpbr_suffix): new mode attr.
>>> * config/aarch64/predicates.md (const_0_to_63_operand): new
>>> predicate.
>>> (aarch64_cb_immediate): likewise.
>>> (aarch64_cb_operand): likewise.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/aarch64/cmpbr.c: update tests.
>> In addition to Kyrill's comments (which I agree with):
>>
>>> @@ -720,18 +720,41 @@ (define_constants
>>>   ;; Conditional jumps
>>>   ;; ---
>>>   
>>> -(define_expand "cbranch4"
>>> +(define_expand "cbranch4"
>>> [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>>> [(match_operand:GPI 1 "register_operand")
>>>  (match_operand:GPI 2 "aarch64_plus_operand")])
>>>(label_ref (match_operand 3))
>>>(pc)))]
>>> ""
>>> -  "
>>> -  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), 
>>> operands[1],
>>> -operands[2]);
>>> -  operands[2] = const0_rtx;
>>> -  "
>>> +  {
>>> +  if (TARGET_CMPBR && aarch64_cb_operand (operands[2], mode))
>>> +{
>>> +  emit_jump_insn (gen_aarch64_cb (operands[0], operands[1],
>>> +   operands[2], operands[3]));
>>> +  DONE;
>>> +}
>> There is an implicit choice here to use a separate CMP + Bcc if the
>> immediate is out of range, rather than force out-of-range immediates into
>> a temporary register.  That can be the right choice for immediates in the
>> range of CMP, but whether it is or not depends on global information that
>> we don't have.  If the immediate is needed for multiple branches, it would
>> be better (sizewise) to load the immediate into a temporary register and
>> use it for each branch, provided that there's a call-clobbered register
>> free and that the branches are in the 1KiB range.  In other situations,
>> what the patch does is best.
>
> Do you mean replacing code like
> ```
> cmp x1, 100
> beq .L1
> cmp x2 100
> beq .L2
> ```
>
> with
> ```
> mov x0 100
> cbeq x0, x1, .L1
> cbeq x0, x2, .L2
> ```
>
> That would be preferable, but as you say we would need to know whether 
> .L1 and .L2 are in range, and if x0 is free.
> I don't think that is something we can easily determine in the middle of 
> RTL generation.

Right, exactly.

>> But perhaps it would be worth forcing values that are outside the
>> range of CMP into a register and using the new form, rather than
>> emitting an immediate move, a CMP, and a branch.
>
> That already happens thanks to the ordering of the rules (the rule for 
> MOV+CB comes before the rule for MOV+CMP+B).

Ah, right, I missed that the predicate on operand 2 was still
restricted to the CMP range.

> I will add some tests to record this behaviour.

Thanks.  Like I say, a comment acknowledging/explaining the choice would
be good too.

Richard

>> Either way, I think it's worth a comment saying what we do with
>> out-of-range immediates.


[PATCH v2] c++, coroutines: Fix handling of early exceptions [PR113773].

2025-05-14 Thread Iain Sandoe
>>  that indicates we have not yet reached the ramp return.

>This flag was not part of the fix on trunk, and could use more rationale.

The original fix was OK on trunk because exceptions thrown from the
return expression would happen before the initial suspend.  Having fixed
BZ199916 (which restores the state as per GCC-14), such throws would
become indistinguishable.  Unfortunately, on many OSs the fact that the
frame has been destroyed does not become immediately obvious, and
testcases pass.  This is why I included Sparc9 Solaris in the testing -
there the frame destruction does cause a fail.  So, for an
implementation where the return expression can throw after the frame is
destroyed, the additional flag is needed (I amended the patch
description with an abbreviated comment).

>> +  gate = build2 (TRUTH_AND_EXPR, boolean_type_node, gate,
>> + coro_before_return);

>Doesn't the order of operands to the && need to be the other way around, 
>to avoid checking iarc_x after the coro state has been destroyed?

Thanks for catching that, I need to check this on the trunk patch too.

OK now (after a retest)?
thanks
Iain

--- 8< ---

This is a GCC-14 version of the same strategy as used on trunk, but
with the more wide-ranging code cleanups elided.  Since the return
expression could throw, but after the frame is destroyed, we must
also account for this, in addition to whether initial await_resume
has been called.

PR c++/113773

gcc/cp/ChangeLog:

* coroutines.cc (coro_rewrite_function_body): Do not set
initial_await_resume_called here.
(morph_fn_to_coro): Set it here, and introduce a new flag
that indicates we have not yet reached the ramp return.
Gate the EH cleanups on both of these flags.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr113773.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc  | 45 ++---
 .../g++.dg/coroutines/torture/pr113773.C  | 66 +++
 2 files changed, 102 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/pr113773.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 8811d249c02..d96176973ec 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4460,7 +4460,7 @@ coro_rewrite_function_body (location_t fn_start, tree 
fnbody, tree orig,
   tree i_a_r_c
= coro_build_artificial_var (fn_start, coro_frame_i_a_r_c_id,
 boolean_type_node, orig,
-boolean_false_node);
+NULL_TREE);
   DECL_CHAIN (i_a_r_c) = var_list;
   var_list = i_a_r_c;
   add_decl_expr (i_a_r_c);
@@ -4779,10 +4779,15 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
   tree coro_gro_live
 = coro_build_artificial_var (fn_start, "_Coro_gro_live",
 boolean_type_node, orig, boolean_false_node);
-
   DECL_CHAIN (coro_gro_live) = varlist;
   varlist = coro_gro_live;
 
+  tree coro_before_return
+= coro_build_artificial_var (fn_start, "_Coro_before_return",
+boolean_type_node, orig, boolean_true_node);
+  DECL_CHAIN (coro_before_return) = varlist;
+  varlist = coro_before_return;
+
   /* Collected the scope vars we need ... only one for now. */
   BIND_EXPR_VARS (ramp_bind) = nreverse (varlist);
 
@@ -4811,6 +4816,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
   }
   add_decl_expr (coro_promise_live);
   add_decl_expr (coro_gro_live);
+  add_decl_expr (coro_before_return);
 
   /* The CO_FRAME internal function is a mechanism to allow the middle end
  to adjust the allocation in response to optimizations.  We provide the
@@ -4964,8 +4970,10 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
 
   tree allocated = build1 (CONVERT_EXPR, coro_frame_ptr, new_fn);
   tree r = cp_build_init_expr (coro_fp, allocated);
-  r = coro_build_cvt_void_expr_stmt (r, fn_start);
-  add_stmt (r);
+  finish_expr_stmt (r);
+
+  /* deref the frame pointer, to use in member access code.  */
+  tree deref_fp = build_x_arrow (fn_start, coro_fp, tf_warning_or_error);
 
   /* If the user provided a method to return an object on alloc fail, then
  check the returned pointer and call the func if it's null.
@@ -5001,16 +5009,22 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
  destruction in the case that promise or g.r.o setup fails or an exception
  is thrown from the initial suspend expression.  */
   tree ramp_cleanup = NULL_TREE;
+  tree iarc_x = NULL_TREE;
   if (flag_exceptions)
 {
+  iarc_x = lookup_member (coro_frame_type, coro_frame_i_a_r_c_id,
+/*protect=*/1, /*want_type=*/0, tf_warning_or_error);
+  iarc_x
+   = build_class_member_access_expr (deref_fp, iarc_x, NULL_TREE, false,
+ tf_warning_or

Re: [PATCH] c++: unifying specializations of non-primary tmpls [PR120161]

2025-05-14 Thread Patrick Palka
On Wed, 14 May 2025, Patrick Palka wrote:

> On Wed, 14 May 2025, Jason Merrill wrote:
> 
> > On 5/12/25 7:53 PM, Patrick Palka wrote:
> > > Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look OK
> > > for trunk/15/14?
> > > 
> > > -- >8 --
> > > 
> > > Here unification of P=Wrap::type, A=Wrap::type wrongly
> > > succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
> > > no longer recurse into template arguments for non-primary templates
> > > (since they're a non-deduced context) and so the int/long mismatch that
> > > makes the two types distinct goes unnoticed.
> > > 
> > > In the case of (comparing specializations of) a non-primary template,
> > > unify should still go on to compare the types directly before returning
> > > success.
> > 
> > Should the PRIMARY_TEMPLATE_P check instead move up to join the
> > CLASSTYPE_TEMPLATE_INFO check?  try_class_deduction also doesn't seem
> > applicable to non-primary templates.
> 
> I don't think that'd work, for either the CLASSTYPE_TEMPLATE_INFO (parm) check
> or the earlier CLASSTYPE_TEMPLATE_INFO (arg) check.
> 
> While try_class_deduction directly doesn't apply to non-primary templates,
> get_template_base still might, so if we move up the PRIMARY_TEMPLATE_P to join
> the C_T_I (parm) check, then we wouldn't try get_template_base anymore which
> would  break e.g.
> 
> template struct B { };
> 
> template
> struct A {
>   struct C : B { };
> };
> 
> template void f(B*);
> 
> int main() {
>   A::C c;
>   f(&c);
> }
> 
> If we move the PRIMARY_TEMPLATE_P check up to the C_T_I (arg) check, then
> that'd mean we still don't check same_type_p on the two types in the
> non-primary case, which seems wrong (although it'd fix the PR thanks to the
> parm == arg early exit in unify).

FWIW it seems part of the weird/subtle logic here is due to the fact
that when unifying e.g. P=C with A=C, we do it twice, first via
try_class_deduction using a copy of 'targs', and if that succeeds we do
it again with the real 'targs'.  I think the logic could simultaneously
be simplified and made memory efficient if we made it so that if the
trial unification from try_class_deduction succeeds we just use its
'targs' instead of having to repeat the unification.

> 
> > 
> > >   PR c++/120161
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * pt.cc (unify) : When comparing specializations
> > >   of a non-primary template, still perform a type comparison.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/template/unify13.C: New test.
> > > ---
> > >   gcc/cp/pt.cc|  6 +++---
> > >   gcc/testsuite/g++.dg/template/unify13.C | 18 ++
> > >   2 files changed, 21 insertions(+), 3 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/template/unify13.C
> > > 
> > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > index 0d64a1cfb128..868dd0e2b3ff 100644
> > > --- a/gcc/cp/pt.cc
> > > +++ b/gcc/cp/pt.cc
> > > @@ -25785,10 +25785,10 @@ unify (tree tparms, tree targs, tree parm, tree
> > > arg, int strict,
> > > INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS 
> > > (parm)),
> > > INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS 
> > > (t)),
> > > UNIFY_ALLOW_NONE, explain_p);
> > > -   else
> > > - return unify_success (explain_p);
> > > +   gcc_checking_assert (t == arg);
> > >   }
> > > -  else if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
> > > +
> > > +  if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
> > >   return unify_type_mismatch (explain_p, parm, arg);
> > > return unify_success (explain_p);
> > >   diff --git a/gcc/testsuite/g++.dg/template/unify13.C
> > > b/gcc/testsuite/g++.dg/template/unify13.C
> > > new file mode 100644
> > > index ..ec7ca9d17a44
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/template/unify13.C
> > > @@ -0,0 +1,18 @@
> > > +// PR c++/120161
> > > +
> > > +template
> > > +struct mp_list { };
> > > +
> > > +template
> > > +struct Wrap { struct type { }; };
> > > +
> > > +struct A : mp_list::type, void>
> > > + , mp_list::type, void> { };
> > > +
> > > +template
> > > +void f(mp_list::type, U>*);
> > > +
> > > +int main() {
> > > +  A a;
> > > +  f(&a);
> > > +}
> > 
> > 
> 



Re: [PATCH 2/6] RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE

2025-05-14 Thread Vineet Gupta
On 5/13/25 10:07, Vineet Gupta wrote:
>
>
> On 5/10/25 07:20, Jeff Law wrote:
>> On 5/9/25 2:27 PM, Vineet Gupta wrote:
>>> This is effectively reverting e5d1f538bb7d
>>> "(RISC-V: Allow different dynamic floating point mode to be merged)"
>>> while retaining the testcase.
>>>
>>> The change itself is valid, however it obfuscates the deficiencies in
>>> current frm mode switching code.
>>>
>>> Also for a SPEC2017 -Ofast -march=rv64gcv build, it ends up generating
>>> net more FRM restores (writes) vs. the rest of this changeset.
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): Remove.
>>> (riscv_mode_confluence): Ditto.
>>> (TARGET_MODE_CONFLUENCE): Ditto.
>> Unsure on this one.
>>
>>
>>
>>> -   /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible.
>>> -  Although we already try to set the mode needed to FRM_DYN after a
>>> -  function call, there are still some corner cases where both FRM_DYN
>>> -  and FRM_DYN_CALL may appear on incoming edges.  */
>> Do we have an understanding of these corner cases?  That's my biggest 
>> worry with simply removing this code.

Yes we do.

1. The first argument is that, with or without the patch, we have the same results in the end.

(a) With Confluence patch + my changes on top we have the following

   a1-confluence
   a2-rem-edge-insert
   a3-remove-mode-after
   a4-reduce-frm-restore a5-call-backtrack
   
    frrm fsrmi fsrm  
    perlbench_r   17    0    1 17    0    1 
   cpugcc_r   11    0    0 11    0    0 
   bwaves_r   16    0    1 16    0    1 
  mcf_r   11    0    0 11    0    0 
   cactusBSSN_r   19    0    1 19    0    1 
 namd_r   14    0    1 14    0    1 
   parest_r   24    0    1 24    0    1 
   povray_r   26    1    6 26    1    6 
  lbm_r    6    0    0  6    0    0 
  omnetpp_r   17    0    1 17    0    1 
  wrf_r  411   13  164    613   13   82 
 cpuxalan_r   17    0    1 17    0    1 
   ldecod_r   11    0    0 11    0    0 
 x264_r   11    0    0 11    0    0 
  blender_r   37   12   16 39   12   16 
 cam4_r   37   13   17 40   13   17 
    deepsjeng_r   11    0    0 11    0    0 
  imagick_r   33   16   18 33   16   18 
    leela_r   12    0    0 12    0    0 
  nab_r   13    0    1 13    0    1 
    exchange2_r   16    0    1 16    0    1 
    fotonik3d_r   19    0    1 19    0    1 
 roms_r   21    0    1 21    0    1
   xz_r    6    0    0  6    0    0
 ---
 816   55  232   1023   55  150


(b) Reverting the confluence patch + my changes, we still see the same final result:

>   a1-confluence  b1-revert-confluence   
> b2-rem-edge-insert    b4-reduce-frm-restores  b5-call-backtrack
>     
> b3-remove-mode-after  b6-readd-confluence
>   
> ---
>     perlbench_r    17    0    1   42    0    4   42    0    4 
>     17    0    1  17    0    1   
>    cpugcc_r    11    0    0  167    0   17  167    0   17 
>     11    0    0  11    0    0   
>    bwaves_r    16    0    1   16    0    1   16    0    1 
>     16    0    1  16    0    1   
>   mcf_r    11    0    0   11    0    0   11    0    0 
>     11    0    0  11    0    0   
>    cactusBSSN_r    19    0    1   79    0   27   76    0   27 
>     19    0    1  19    0    1   
>  namd_r    14    0    1  119    0   63  119    0   63 
>     14    0    1  14    0    1   
>    parest_r    24    0    1  218    0  114  168    0  114 
>     24    0    1  24    0    1   
>    povray_r    26    1    6  123    1   17  123    1   17 
>     26    1    6  26    1    6   
>   lbm_r 6    0    0    6    0    0    6    0    0 
>  6    0    0   6    0    0   
>   omnetpp_r    17    0    1   17    0    1   17    0    1 
>     17    0    1  17    0    1   
>   wrf_r   411   13  164 2287   13 1956 2287   13 1956 
>   1268   13 1603 613   13   82   
>  cpuxalan_r    17    0    1   17    0    1   17    0    1 
>     17    0    1  17    0    1   
>    ldecod_r    11    0    0   11    0    0 

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-14 Thread Richard Sandiford
Karl Meakin  writes:
>>> +  else
>>> +{
>>> +  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
>>> +operands[1], operands[2]);
>>> +  operands[2] = const0_rtx;
>>> +}
>>> +  }
>>> +)
>>> +
>>> @@ -758,6 +781,58 @@ (define_expand "cbranchcc4"
>>> ""
>>>   )
>>>   
>>> +;; Emit a `CB (register)` or `CB (immediate)` instruction.
>>> +(define_insn "aarch64_cb"
>>> +  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>>> +   [(match_operand:GPI 1 "register_operand")
>>> +(match_operand:GPI 2 "aarch64_cb_operand")])
>>> +  (label_ref (match_operand 3))
>>> +  (pc)))]
>>> +  "TARGET_CMPBR"
>>> +  "cb%m0\\t%1, %2, %l3";
>>> +  [(set_attr "type" "branch")
>>> +   (set (attr "length")
>>> +   (if_then_else (and (ge (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_N_1Kib))
>>> +  (lt (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_P_1Kib)))
>>> + (const_int 4)
>>> + (const_int 8)))
>>> +   (set (attr "far_branch")
>>> +   (if_then_else (and (ge (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_N_1Kib))
>>> +  (lt (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_P_1Kib)))
>>> + (const_string "no")
>>> + (const_string "yes")))]
>>> +)
>>> +
>>> +;; Emit a `CBB (register)` or `CBH (register)` instruction.
>>> +(define_insn "aarch64_cb"
>>> +  [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>>> +   [(match_operand:SHORT 1 "register_operand")
>>> +(match_operand:SHORT 2 
>>> "aarch64_cb_short_operand")])
>>> +  (label_ref (match_operand 3))
>>> +  (pc)))]
>>> +  "TARGET_CMPBR"
>>> +  "cb%m0\\t%1, %2, %l3";
>>> +  [(set_attr "type" "branch")
>>> +   (set (attr "length")
>>> +   (if_then_else (and (ge (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_N_1Kib))
>>> +  (lt (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_P_1Kib)))
>>> + (const_int 4)
>>> + (const_int 8)))
>>> +   (set (attr "far_branch")
>>> +   (if_then_else (and (ge (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_N_1Kib))
>>> +  (lt (minus (match_dup 3) (pc))
>>> +  (const_int BRANCH_LEN_P_1Kib)))
>>> + (const_string "no")
>>> + (const_string "yes")))]
>>> +)
>>> +
>> The patch defines cmpbr_suffix to handle :GPI as well as :SHORT.
>> It looks like the main remaining difference between these two
>> patterns is the predicate (and as Kyrill says, the constraint)
>> on operand 2.  It would be possible to handle that difference
>> using mode attributes too, such as:
>>
>>"aarch64_" "r"
>>
>> with cb_operand defined to cb_short_operand for :SHORT and
>> cb_operand for :GPI.  Similarly  would map to Z for
>> :SHORT and a new constraint for :GPI.  I don't know that that's
>> better, just thought I'd mention it in case.
>>
>> A slight wrinkle is that the CB immediate instruction requires CBLT
>> rather than CBLE, etc.  IIRC, GCC canonicalises in the opposite
>> direction, preferring LEU over LTU, etc.
>>
>> So I think we might need a custom version of aarch64_comparison_operator
>> that checks whether the immediate is in the range [0, 63] for the "native"
>> comparisons and an appropriate variant for the "non-native" comparisons
>> (LE, GE, LEU, GEU).  The output asm section would then need to adjust
>> the instruction accordingly before printing it out.
>
> I think that will be handled for us by the assembler: `CBLE x0 42` will 
> be rewritten to `CBLT x0 43` etc

Ah, ok, so we can just emit the natural asm.  But I think the rest still
stands.  We'd need to use different constraints, since the lower and upper
bounds of CBLE are one less than the corresponding bounds of CBLT.

It would also be good to make the tests accept both the alias and
non-alias forms.

Richard


Re: [PATCH] Enhance -fopt-info-vec vectorized loop diagnostic

2025-05-14 Thread Richard Sandiford
Richard Biener  writes:
> The following includes whether we vectorize an epilogue, whether
> we use loop masking and what vectorization factor (unroll factor)
> we use.  So it's now
>
> t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 
> 32
> t.c:4:21: optimized: epilogue loop vectorized using masked 64 byte vectors 
> and unroll factor 32
>
> for a masked epilogue with AVX512 and HImode data for example.  Rather
> than
>
> t.c:4:21: optimized: loop vectorized using 64 byte vectors
> t.c:4:21: optimized: loop vectorized using 64 byte vectors
>
> I verified we don't translate opt-info messages and thus excessive
> use of %s to compose the strings should be OK.
>
> Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu
> (merely to look for testcases scanning for the old message too
> closely).
>
> Any comments or suggestions for improvements?

It might be worth adding the vector-level unroll factor for the
variable-length case (suggested_unroll_factor), but that could be
a future change.

So LGTM FWIW.  Thanks for doing this.

Richard

>
>   * tree-vectorizer.cc (vect_transform_loops): When diagnosing
>   a vectorized loop indicate whether we vectorized an epilogue,
>   whether we used masked vectors and what unroll factor was
>   used.
> ---
>  gcc/tree-vectorizer.cc | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index 447f882c518..2f77e46ba99 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -1026,10 +1026,19 @@ vect_transform_loops (hash_table 
> *&simduid_to_vf_htab,
>  {
>if (GET_MODE_SIZE (loop_vinfo->vector_mode).is_constant (&bytes))
>   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> -  "loop vectorized using %wu byte vectors\n", bytes);
> +  "%sloop vectorized using %s%wu byte vectors and"
> +  " unroll factor %u\n",
> +  LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  ? "epilogue " : "",
> +  LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> +  ? "masked " : "", bytes,
> +  (unsigned int) LOOP_VINFO_VECT_FACTOR
> +  (loop_vinfo).to_constant ());
>else
>   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> -  "loop vectorized using variable length vectors\n");
> +  "%sloop vectorized using variable length vectors\n",
> +  LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  ? "epilogue " : "");
>  }
>  
>loop_p new_loop = vect_transform_loop (loop_vinfo,


Re: [PATCH 4/5] c++, coroutines: Use decltype(auto) for the g_r_o.

2025-05-14 Thread Jason Merrill

On 5/13/25 10:30 AM, Iain Sandoe wrote:

The revised wording for coroutines uses decltype(auto) for the
type of the get-return-object, which preserves references. The
test is expected to fail, since it attempts to initialize the
return object from an object that has already been destroyed.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Use
decltype(auto) to determine the type of the temporary
get_return_object.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr115908.C: Count promise construction
and destruction.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc   | 22 ---
  gcc/testsuite/g++.dg/coroutines/pr115908.C | 69 +++---
  2 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 42f6e32e89c..ce3e022a516 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -5120,8 +5120,11 @@ cp_coroutine_transform::build_ramp_function ()
/* Check for a bad get return object type.
   [dcl.fct.def.coroutine] / 7 requires:
   The expression promise.get_return_object() is used to initialize the
- returned reference or prvalue result object ... */
-  tree gro_type = TREE_TYPE (get_ro);
+ returned reference or prvalue result object ...
+ When we use a local to hold this, it is decltype(auto).  */
+  tree gro_type
+= finish_decltype_type (get_ro, /*id_expression_or_member_access_p*/true,


This should be false, not true; a call is not an id-expr or member access.


+   tf_warning_or_error); // TREE_TYPE (get_ro);
if (VOID_TYPE_P (gro_type) && !void_ramp_p)
  {
error_at (fn_start, "no viable conversion from % provided by"
@@ -5129,11 +5132,6 @@ cp_coroutine_transform::build_ramp_function ()
return false;
  }
  
-  /* Initialize the resume_idx_var to 0, meaning "not started".  */

-  coro_build_and_push_artificial_var_with_dve
-(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
- build_zero_cst (short_unsigned_type_node), deref_fp);


Moving this initialization doesn't seem connected to the type of gro, or 
mentioned above?



/* [dcl.fct.def.coroutine] / 7
   The expression promise.get_return_object() is used to initialize the
   glvalue result or prvalue result object of a call to a coroutine.  */
@@ -5153,7 +5151,7 @@ cp_coroutine_transform::build_ramp_function ()
= coro_build_and_push_artificial_var (loc, "_Coro_gro", gro_type,
  orig_fn_decl, NULL_TREE);
  
-  r = cp_build_init_expr (coro_gro, get_ro);

+  r = cp_build_init_expr (coro_gro, STRIP_REFERENCE_REF (get_ro));
finish_expr_stmt (r);
tree coro_gro_cleanup
= cxx_maybe_build_cleanup (coro_gro, tf_warning_or_error);
@@ -5161,6 +5159,11 @@ cp_coroutine_transform::build_ramp_function ()
push_cleanup (coro_gro, coro_gro_cleanup, /*eh_only*/false);
  }
  
+  /* Initialize the resume_idx_var to 0, meaning "not started".  */

+  coro_build_and_push_artificial_var_with_dve
+(loc, coro_resume_index_id, short_unsigned_type_node,  orig_fn_decl,
+ build_zero_cst (short_unsigned_type_node), deref_fp);
+
/* Start the coroutine body.  */
r = build_call_expr_loc (fn_start, resumer, 1, coro_fp);
finish_expr_stmt (r);
@@ -5179,7 +5182,8 @@ cp_coroutine_transform::build_ramp_function ()
/* The ramp is done, we just need the return statement, which we build from
   the return object we constructed before we called the function body.  */
  
-  finish_return_stmt (void_ramp_p ? NULL_TREE : coro_gro);

+  r = void_ramp_p ? NULL_TREE : convert_from_reference (coro_gro);
+  finish_return_stmt (r);
  
if (flag_exceptions)

  {
diff --git a/gcc/testsuite/g++.dg/coroutines/pr115908.C 
b/gcc/testsuite/g++.dg/coroutines/pr115908.C
index ac27d916de2..6956c83a8df 100644
--- a/gcc/testsuite/g++.dg/coroutines/pr115908.C
+++ b/gcc/testsuite/g++.dg/coroutines/pr115908.C
@@ -6,23 +6,28 @@
  
  struct Promise;
  
-bool promise_live = false;

+int promise_life = 0;
  
  struct Handle : std::coroutine_handle {

+
+#if 1
+/* We now expect the handle to be created after the promise is destroyed.  */


Yes, though this is terrible, as noted in my email to core today.  Why 
doesn't this also break folly::Optional/Expected?


This shouldn't block this patch series, but ISTM we'll want something 
further to address this issue.



  Handle(Promise &p) : 
std::coroutine_handle(Handle::from_promise(p)) {
-if (!promise_live)
-  __builtin_abort ();
  #ifdef OUTPUT
-std::cout << "Handle(Promise &)\n";
+std::cout << "Handle(Promise &) " << promise_life << std::endl;
  #endif
-}
-Handle(Promise &&p) : 
std::coroutine_handle(Handle::from_promise(p)) {
-if (!promise_live)
+ if (promise_life <= 0)
__bu

Re: [PATCH 5/5] c++, coroutines: Clean up the ramp cleanups.

2025-05-14 Thread Jason Merrill

On 5/13/25 10:30 AM, Iain Sandoe wrote:

This replaces the cleanup try-catch block in the ramp with a series of
eh-only cleanup statements.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Replace ramp
cleanup try-catch block with eh-only cleanup statements.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc | 207 +++
  1 file changed, 69 insertions(+), 138 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index ce3e022a516..299e36fd3c2 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4866,39 +4866,6 @@ cp_coroutine_transform::build_ramp_function ()
coro_fp = pushdecl (coro_fp);
add_decl_expr (coro_fp);
  
-  tree coro_promise_live = NULL_TREE;

-  if (flag_exceptions)
-{
-  /* Signal that we need to clean up the promise object on exception.  */
-  coro_promise_live
-   = coro_build_and_push_artificial_var (loc, "_Coro_promise_live",
- boolean_type_node, orig_fn_decl,
- boolean_false_node);
-
-  /* To signal that we need to cleanup copied function args.  */
-  if (DECL_ARGUMENTS (orig_fn_decl))
-   for (tree arg = DECL_ARGUMENTS (orig_fn_decl); arg != NULL;
-arg = DECL_CHAIN (arg))
- {
-   param_info *parm_i = param_uses.get (arg);
-   if (parm_i->trivial_dtor)
- continue;
-   parm_i->guard_var = pushdecl (parm_i->guard_var);
-   add_decl_expr (parm_i->guard_var);
- }
-}
-
-  /* deref the frame pointer, to use in member access code.  */
-  tree deref_fp
-= cp_build_indirect_ref (loc, coro_fp, RO_UNARY_STAR,
-tf_warning_or_error);
-  tree frame_needs_free
-= coro_build_and_push_artificial_var_with_dve (loc,
-  coro_frame_needs_free_id,
-  boolean_type_node,
-  orig_fn_decl, NULL_TREE,
-  deref_fp);
-
/* Build the frame.  */
  
/* The CO_FRAME internal function is a mechanism to allow the middle end

@@ -4942,25 +4909,24 @@ cp_coroutine_transform::build_ramp_function ()
finish_if_stmt (if_stmt);
  }
  
+  /* deref the frame pointer, to use in member access code.  */

+  tree deref_fp
+= cp_build_indirect_ref (loc, coro_fp, RO_UNARY_STAR,
+tf_warning_or_error);
+
/* For now, once allocation has succeeded we always assume that this needs
   destruction, there's no impl. for frame allocation elision.  */
-  r = cp_build_init_expr (frame_needs_free, boolean_true_node);
-  finish_expr_stmt (r);
-
-  /* Set up the promise.  */
-  tree p
-= coro_build_and_push_artificial_var_with_dve (loc, coro_promise_id,
-  promise_type, orig_fn_decl,
-  NULL_TREE, deref_fp);
+  tree frame_needs_free
+= coro_build_and_push_artificial_var_with_dve (loc,
+  coro_frame_needs_free_id,
+  boolean_type_node,
+  orig_fn_decl,
+  boolean_true_node,
+  deref_fp);
+  /* Although it appears to be unused here the frame entry is needed and we
+ just set it true.  */
+  TREE_USED (frame_needs_free) = true;
  
-  /* Up to now any exception thrown will propagate directly to the caller.

- This is OK since the only source of such exceptions would be in allocation
- of the coroutine frame, and therefore the ramp will not have initialized
- any further state.  From here, we will track state that needs explicit
- destruction in the case that promise or g.r.o setup fails or an exception
- is thrown from the initial suspend expression.  */
-  tree ramp_try_block = NULL_TREE;
-  tree ramp_try_stmts = NULL_TREE;
tree iarc_x = NULL_TREE;
tree coro_before_return = NULL_TREE;
if (flag_exceptions)
@@ -4976,8 +4942,17 @@ cp_coroutine_transform::build_ramp_function ()
   orig_fn_decl,
   boolean_false_node,
   deref_fp);
-  ramp_try_block = begin_try_block ();
-  ramp_try_stmts = begin_compound_stmt (BCS_TRY_BLOCK);
+  tree frame_cleanup = push_stmt_list ();
+  tree do_fr_cleanup
+   = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, iarc_x);
+  do_fr_cleanup = build2_loc (loc, TRUTH_AND_EXPR, boolean_type_node,
+ do_fr_cleanup, coro_before_return);


This al

Re: [PATCH 1/2] forwprop: Fix looping after fold_stmt and some forwprop local folds happen

2025-05-14 Thread Richard Biener



> Am 13.05.2025 um 19:24 schrieb Andrew Pinski :
> 
> r10-2587-gcc19f80ceb27cc added a loop over the current statement if there was
> a change. Except in some cases it turns out `changed` will flip from true to
> false, because instead of doing |= after the fold_stmt, there was just an `=`.
> This fixes that, and now we loop even if fold_stmt changed the statement and
> a local fold also happened.

Ok

Richard 

> gcc/ChangeLog:
> 
>* tree-ssa-forwprop.cc (pass_forwprop::execute): Use `|=` for
>changed on the local folding.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/tree-ssa-forwprop.cc | 14 +++---
> 1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index fafc4d6b77a..bcdec1aadc3 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -4564,7 +4564,7 @@ pass_forwprop::execute (function *fun)
>  bitmap_set_bit (to_purge, bb->index);
>if (did_something == 2)
>  cfg_changed = true;
> -changed = did_something != 0;
> +changed |= did_something != 0;
>  }
>else if ((code == PLUS_EXPR
>  || code == BIT_IOR_EXPR
> @@ -4580,15 +4580,15 @@ pass_forwprop::execute (function *fun)
>  }
>else if (code == CONSTRUCTOR
> && TREE_CODE (TREE_TYPE (rhs1)) == VECTOR_TYPE)
> -  changed = simplify_vector_constructor (&gsi);
> +  changed |= simplify_vector_constructor (&gsi);
>else if (code == ARRAY_REF)
> -  changed = simplify_count_trailing_zeroes (&gsi);
> +  changed |= simplify_count_trailing_zeroes (&gsi);
>break;
>  }
> 
>case GIMPLE_SWITCH:
> -  changed = simplify_gimple_switch (as_a  (stmt),
> -edges_to_remove);
> +  changed |= simplify_gimple_switch (as_a  (stmt),
> + edges_to_remove);
>  break;
> 
>case GIMPLE_COND:
> @@ -4597,7 +4597,7 @@ pass_forwprop::execute (function *fun)
>(as_a  (stmt));
>if (did_something == 2)
>  cfg_changed = true;
> -changed = did_something != 0;
> +changed |= did_something != 0;
>break;
>  }
> 
> @@ -4606,7 +4606,7 @@ pass_forwprop::execute (function *fun)
>tree callee = gimple_call_fndecl (stmt);
>if (callee != NULL_TREE
>&& fndecl_built_in_p (callee, BUILT_IN_NORMAL))
> -  changed = simplify_builtin_call (&gsi, callee);
> +  changed |= simplify_builtin_call (&gsi, callee);
>break;
>  }
> 
> --
> 2.43.0
> 


[PATCH v2 3/3] libstdc++: Renamed bits/move_only_function.h to bits/funcwrap.h [PR119125]

2025-05-14 Thread Tomasz Kamiński
The file now includes copyable_function in addition to
move_only_function.

PR libstdc++/119125

libstdc++-v3/ChangeLog:
* include/bits/move_only_function.h: Move to...
* include/bits/funcwrap.h: ...here.
* doc/doxygen/stdheader.cc (init_map): Replace move_only_function.h
with funcwrap.h.
* include/Makefile.am: Likewise.
* include/Makefile.in: Likewise.
* include/std/functional: Likewise.
---
 libstdc++-v3/doc/doxygen/stdheader.cc | 2 +-
 libstdc++-v3/include/Makefile.am  | 2 +-
 libstdc++-v3/include/Makefile.in  | 2 +-
 .../include/bits/{move_only_function.h => funcwrap.h} | 8 
 libstdc++-v3/include/std/functional   | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)
 rename libstdc++-v3/include/bits/{move_only_function.h => funcwrap.h} (98%)

diff --git a/libstdc++-v3/doc/doxygen/stdheader.cc 
b/libstdc++-v3/doc/doxygen/stdheader.cc
index 8a201334410..839bfc81bc0 100644
--- a/libstdc++-v3/doc/doxygen/stdheader.cc
+++ b/libstdc++-v3/doc/doxygen/stdheader.cc
@@ -55,7 +55,7 @@ void init_map()
 headers["functional_hash.h"]= "functional";
 headers["mofunc_impl.h"]= "functional";
 headers["cpyfunc_impl.h"]   = "functional";
-headers["move_only_function.h"] = "functional";
+headers["funcwrap.h"]   = "functional";
 headers["invoke.h"] = "functional";
 headers["ranges_cmp.h"] = "functional";
 headers["refwrap.h"]= "functional";
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 5cc13381b02..3e5b6c4142e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -205,6 +205,7 @@ bits_headers = \
${bits_srcdir}/fs_ops.h \
${bits_srcdir}/fs_path.h \
${bits_srcdir}/fstream.tcc \
+   ${bits_srcdir}/funcwrap.h \
${bits_srcdir}/gslice.h \
${bits_srcdir}/gslice_array.h \
${bits_srcdir}/hashtable.h \
@@ -224,7 +225,6 @@ bits_headers = \
${bits_srcdir}/mask_array.h \
${bits_srcdir}/memory_resource.h \
${bits_srcdir}/mofunc_impl.h \
-   ${bits_srcdir}/move_only_function.h \
${bits_srcdir}/new_allocator.h \
${bits_srcdir}/node_handle.h \
${bits_srcdir}/ostream.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 6e5e97aa236..3531162b5f7 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -558,6 +558,7 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/fs_ops.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/fs_path.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/fstream.tcc \
+@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/funcwrap.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/gslice.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/gslice_array.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/hashtable.h \
@@ -577,7 +578,6 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/mask_array.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/memory_resource.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/mofunc_impl.h \
-@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/move_only_function.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/new_allocator.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/node_handle.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/ostream.tcc \
diff --git a/libstdc++-v3/include/bits/move_only_function.h 
b/libstdc++-v3/include/bits/funcwrap.h
similarity index 98%
rename from libstdc++-v3/include/bits/move_only_function.h
rename to libstdc++-v3/include/bits/funcwrap.h
index ecaded79d37..aa4b962c234 100644
--- a/libstdc++-v3/include/bits/move_only_function.h
+++ b/libstdc++-v3/include/bits/funcwrap.h
@@ -22,13 +22,13 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // .
 
-/** @file include/bits/move_only_function.h
+/** @file include/bits/funcwrap.h
  *  This is an internal header file, included by other library headers.
  *  Do not attempt to use it directly. @headername{functional}
  */
 
-#ifndef _GLIBCXX_MOVE_ONLY_FUNCTION_H
-#define _GLIBCXX_MOVE_ONLY_FUNCTION_H 1
+#ifndef _GLIBCXX_FUNCWRAP_H
+#define _GLIBCXX_FUNCWRAP_H 1
 
 #ifdef _GLIBCXX_SYSHDR
 #pragma GCC system_header
@@ -504,4 +504,4 @@ _GLIBCXX_END_NAMESPACE_VERSION
 #endif // __glibcxx_copyable_function
 
 #endif // __cplusplus > 202002L && _GLIBCXX_HOSTED
-#endif // _GLIBCXX_MOVE_ONLY_FUNCTION_H
+#endif // _GLIBCXX_FUNCWRAP_H
diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 46179998eeb..1f9c7df1891 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -73,7 +73,7 @@
 # include 
 #endif
 #if __cplusplus > 202002L && _GLIBCXX_HOSTED
-# include 
+# include 
 #endif
 
 #define __glibcxx_want_boyer_moore_searcher
-- 
2.49.0



Re: [PATCH 3/3] gimple: Move canonicalization of bool==0 and bool!=1 to cleanupcfg

2025-05-14 Thread Richard Biener



> Am 14.05.2025 um 03:12 schrieb Andrew Pinski :
> 
> This moves the canonicalization of `bool==0` and `bool!=1` from
> forwprop to cleanupcfg. We will still need to call it from forwprop,
> so that we don't need to run forwprop a few times for fp comparisons in
> some cases (forwprop-16.c was originally added for exactly this code).
> 
> This is the first step in removing forward_propagate_into_gimple_cond
> and forward_propagate_into_comparison.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 

> gcc/ChangeLog:
> 
>* tree-cfgcleanup.cc (canonicalize_bool_cond): New function.
>(cleanup_control_expr_graph): Call canonicalize_bool_cond for GIMPLE_COND.
>* tree-cfgcleanup.h (canonicalize_bool_cond): New declaration.
>* tree-ssa-forwprop.cc (forward_propagate_into_gimple_cond):
>Call canonicalize_bool_cond.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/tree-cfgcleanup.cc   | 39 +++
> gcc/tree-cfgcleanup.h|  1 +
> gcc/tree-ssa-forwprop.cc | 18 ++
> 3 files changed, 42 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
> index 38a62499f93..66575393a44 100644
> --- a/gcc/tree-cfgcleanup.cc
> +++ b/gcc/tree-cfgcleanup.cc
> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-into-ssa.h"
> #include "tree-cfgcleanup.h"
> #include "target.h"
> +#include "gimple-pretty-print.h"
> 
> 
> /* The set of blocks in that at least one of the following changes happened:
> @@ -123,6 +124,41 @@ convert_single_case_switch (gswitch *swtch, 
> gimple_stmt_iterator &gsi)
>   return true;
> }
> 
> +/* Canonicalize _Bool == 0 and _Bool != 1 to _Bool != 0 of STMT in BB by
> +   swapping edges of the BB.  */
> +bool
> +canonicalize_bool_cond (gcond *stmt, basic_block bb)
> +{
> +  tree rhs1 = gimple_cond_lhs (stmt);
> +  tree rhs2 = gimple_cond_rhs (stmt);
> +  enum tree_code code = gimple_cond_code (stmt);
> +  if (code != EQ_EXPR && code != NE_EXPR)
> +return false;
> +  if (TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE
> +  && (!INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
> +   || TYPE_PRECISION (TREE_TYPE (rhs1)) != 1))
> +return false;
> +
> +  /* Canonicalize _Bool == 0 and _Bool != 1 to _Bool != 0 by swapping edges. 
>  */
> +  if (code == EQ_EXPR && !integer_zerop (rhs2))
> +return false;
> +  if (code == NE_EXPR && !integer_onep (rhs2))
> +return false;
> +
> +  gimple_cond_set_code (stmt, NE_EXPR);
> +  gimple_cond_set_rhs (stmt, build_zero_cst (TREE_TYPE (rhs1)));
> +  EDGE_SUCC (bb, 0)->flags ^= (EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
> +  EDGE_SUCC (bb, 1)->flags ^= (EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
> +
> +  if (dump_file)
> +{
> +  fprintf (dump_file, "  Swapped '");
> +  print_gimple_expr (dump_file, stmt, 0);
> +  fprintf (dump_file, "'\n");
> +}
> +  return true;
> +}
> +
> /* Disconnect an unreachable block in the control expression starting
>at block BB.  */
> 
> @@ -146,6 +182,9 @@ cleanup_control_expr_graph (basic_block bb, 
> gimple_stmt_iterator gsi)
>  && convert_single_case_switch (as_a (stmt), gsi))
>stmt = gsi_stmt (gsi);
> 
> +  if (gimple_code (stmt) == GIMPLE_COND)
> +canonicalize_bool_cond (as_a (stmt), bb);
> +
>   fold_defer_overflow_warnings ();
>   switch (gimple_code (stmt))
>{
> diff --git a/gcc/tree-cfgcleanup.h b/gcc/tree-cfgcleanup.h
> index 83c857fe33a..94b430e0c71 100644
> --- a/gcc/tree-cfgcleanup.h
> +++ b/gcc/tree-cfgcleanup.h
> @@ -28,5 +28,6 @@ extern bool delete_unreachable_blocks_update_callgraph 
> (cgraph_node *dst_node,
>bool update_clones);
> extern unsigned clean_up_loop_closed_phi (function *);
> extern bool phi_alternatives_equal (basic_block, edge, edge);
> +extern bool canonicalize_bool_cond (gcond *stmt, basic_block bb);
> 
> #endif /* GCC_TREE_CFGCLEANUP_H */
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index bd407ef8a69..d718d8f7faf 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -579,22 +579,8 @@ forward_propagate_into_gimple_cond (gcond *stmt)
>   return (cfg_changed || is_gimple_min_invariant (tmp)) ? 2 : 1;
> }
> 
> -  /* Canonicalize _Bool == 0 and _Bool != 1 to _Bool != 0 by swapping edges. 
>  */
> -  if ((TREE_CODE (TREE_TYPE (rhs1)) == BOOLEAN_TYPE
> -   || (INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
> -   && TYPE_PRECISION (TREE_TYPE (rhs1)) == 1))
> -  && ((code == EQ_EXPR
> -   && integer_zerop (rhs2))
> -  || (code == NE_EXPR
> -  && integer_onep (rhs2
> -{
> -  basic_block bb = gimple_bb (stmt);
> -  gimple_cond_set_code (stmt, NE_EXPR);
> -  gimple_cond_set_rhs (stmt, build_zero_cst (TREE_TYPE (rhs1)));
> -  EDGE_SUCC (bb, 0)->flags ^= (EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
> -  EDGE_SUCC (bb, 1)->flags ^= (EDGE_TRUE_VALUE|EDGE_FALSE_VALUE);
> -  return 1;
> -}
> +  if (canonicalize_bool_cond (s

Re: [PATCH 1/3] forwprop: Change an if into an assert

2025-05-14 Thread Richard Biener



> Am 14.05.2025 um 03:13 schrieb Andrew Pinski :
> 
> Since the merge of the tuples branch (r0-88576-g726a989a8b74bf), the
> if:
> ```
>  if (TREE_CODE_CLASS (gimple_cond_code (stmt)) != tcc_comparison)
> ```
> Will always be false so let's change it into an assert.

Ok

Richard 

> gcc/ChangeLog:
> 
>* tree-ssa-forwprop.cc (forward_propagate_into_gimple_cond): Assert
>that gimple_cond_code is always a comparison.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/tree-ssa-forwprop.cc | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index fafc4d6b77a..bd407ef8a69 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -551,9 +551,8 @@ forward_propagate_into_gimple_cond (gcond *stmt)
>   tree rhs1 = gimple_cond_lhs (stmt);
>   tree rhs2 = gimple_cond_rhs (stmt);
> 
> -  /* We can do tree combining on SSA_NAME and comparison expressions.  */
> -  if (TREE_CODE_CLASS (gimple_cond_code (stmt)) != tcc_comparison)
> -return 0;
> +  /* GIMPLE_COND will always be a comparison.  */
> +  gcc_assert (TREE_CODE_CLASS (gimple_cond_code (stmt)) == tcc_comparison);
> 
>   tmp = forward_propagate_into_comparison_1 (stmt, code,
> boolean_type_node,
> --
> 2.43.0
> 


[PATCH v2 1/3] libstdc++: Avoid double indirection in move_only_function when possible [PR119125]

2025-05-14 Thread Tomasz Kamiński
Based on the provision in C++26 [func.wrap.general] p2, this patch adjusts the generic
move_only_function(_Fn&&) constructor, such that when _Fn refers to selected
move_only_function instantiations, the ownership of the target object is directly
transferred to the constructed object. This avoids the cost of double indirection in
this situation. We apply this also in C++23 mode.

We also fix the handling of self-assignment, to match the behavior required by the
standard, via use of the copy-and-swap idiom.

An instantiation MF1 of move_only_function can take over the target of another
instantiation MF2, if it can be constructed via the usual rules
(__is_callable_from<_MF2>) and their invokers are convertible
(__is_invoker_convertible()), i.e.:
* MF1 is less noexcept than MF2,
* the return types are the same after stripping cv-quals,
* the adjusted parameter types are the same (__poly::_param_t), i.e. parameters of
  types T and T&& are compatible for non-trivially copyable objects.
Compatibility of cv/ref qualification is checked via __is_callable_from<_MF2>.

To achieve the above, the generation of the _M_invoke functions is moved to the
_Invoke class template, which depends only on the noexcept-ness, return type,
and adjusted parameters of the signature. To make the invoker signature
compatible between const and mutable qualified signatures, we always accept
_Storage as const& and perform a const_cast for locally stored objects. This
approach guarantees that we never strip const from a const object.

Another benefit of this approach is that different move_only_function
instantiations use the same function pointer, which should reduce binary size.

The _Storage and _Manager functionality was also extracted and adjusted from
the _Mo_func base, in preparation for the implementation of copyable_function
and function_ref. _Storage was adjusted to store function pointers as
void(*)().
The manage function now accepts an _Op enum parameter and supports additional
operations:
 * _Op::_Address stores the address of the target object in the destination,
 * _Op::_Copy, when enabled, copies from source to destination.
Furthermore, we provide type-independent manage functions for handling all:
 * function pointer types,
 * trivially copyable objects stored locally.
Similarly to the invoker, we always pass the source as const (for copy),
and cast away constness for move operations, where we know that the source
is mutable.

Finally, the new helpers are defined in __polyfunc internal namespace.

PR libstdc++/119125

libstdc++-v3/ChangeLog:

* include/bits/mofunc_impl.h (std::move_only_function): Adjust for
changes in bits/move_only_function.h.
(move_only_function::move_only_function(_Fn&&)): Special case
move_only_functions with the same invoker.
(move_only_function::operator=(move_only_function&&)): Handle
self-assignment.
* include/bits/move_only_function.h (__polyfunc::_Ptrs)
(__polyfunc::_Storage): Refactored from _Mo_func::_Storage.
(__polyfunc::__param_t): Moved from move_only_function::__param_t.
(__polyfunc::_Base_invoker, __polyfunc::_Invoke): Refactored from
move_only_function::_S_invoke.
(__polyfunc::_Manager): Refactored from _Mo_func::_S_manager.
(std::_Mofunc_base): Moved into __polyfunc::_Mo_base with parts
extracted to __polyfunc::_Storage and __polyfunc::_Manager.
(__polyfunc::__deref_as, __polyfunc::__invoker_of)
(__polyfunc::__base_of, __polyfunc::__is_invoker_convertible): Define.
(std::__is_move_only_function_v): Renamed to
__is_polymorphic_function_v.
(std::__is_polymorphic_function_v): Renamed from
__is_move_only_function_v.
* testsuite/20_util/move_only_function/call.cc: Test for
functions pointers.
* testsuite/20_util/move_only_function/conv.cc: New test.
* testsuite/20_util/move_only_function/move.cc: Tests for
self-assignment.
---
In addition to adjusting formatting and fixing typos, this update:
 * consistently calls global new when placement new is used, and
   non-global new for heap allocations,
 * moves _Invoker before _Manager.
The _Invoker can be supported in non-hosted environments, as well
as function_ref.

 libstdc++-v3/include/bits/mofunc_impl.h   |  74 +--
 .../include/bits/move_only_function.h | 455 +-
 .../20_util/move_only_function/call.cc|  14 +
 .../20_util/move_only_function/conv.cc| 188 
 .../20_util/move_only_function/move.cc|  11 +
 5 files changed, 588 insertions(+), 154 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/move_only_function/conv.cc

diff --git a/libstdc++-v3/include/bits/mofunc_impl.h 
b/libstdc++-v3/include/bits/mofunc_impl.h
index 318a55e618f..5eb4b5a0047 100644
--- a/libstdc++-v3/include/bits/mofunc_impl.h
+++ b/libstdc++-v3/include/bits/mofunc_impl.h
@@ -62,8 +62,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class move_only_function<_Res(_ArgTypes...) _GLIBCXX_M

Re: [PATCH 2/3] gimple: Add assert for code being a comparison in gimple_cond_set_code

2025-05-14 Thread Richard Biener



> Am 14.05.2025 um 03:12 schrieb Andrew Pinski :
> 
> We have code later on that verifies the code is a comparison. So let's
> try to catch it earlier. So it is easier to debug where the incorrect code
> gets set.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

> gcc/ChangeLog:
> 
>* gimple.h (gimple_cond_set_code): Add assert of the code
>being a comparison.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/gimple.h | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 977ff1c923c..94d5a13fcb2 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -3716,6 +3716,7 @@ gimple_cond_code (const gimple *gs)
> inline void
> gimple_cond_set_code (gcond *gs, enum tree_code code)
> {
> +  gcc_gimple_checking_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>   gs->subcode = code;
> }
> 
> --
> 2.43.0
> 


[PATCH] libgcobol: Add multilib support

2025-05-14 Thread Rainer Orth
Prompted by Jakub's recent work to enable a 32-bit biarch gcobol to
compile 64-bit COBOL code, I tried to bootstrap with cobol included on
Solaris/i386, Linux/i686, Darwin/i386, and Solaris/sparc.

While the builds mostly finished, all tests failed since the 64-bit
libgcobol was missing.  As it turns out, libgcobol currently lacks
multilib support, which this patch adds.  Unlike some runtime libs that
can get away without setting AM_MAKEFLAGS and friends, libgcobol cannot,
since it then tries to link the 64-bit libgcobol with 32-bit libstdc++.

Bootstrapped on i386-pc-solaris2.11, amd64-pc-solaris2.11,
i686-pc-linux-gnu, x86_64-pc-linux-gnu, x86_64-apple-darwin20.6.0,
sparc-sun-solaris2.11, and sparcv9-sun-solaris2.11.

An i386-apple-darwin15.6.0 bootstrap still fails due to PR cobol/119975
(unportable use of clock_gettime).

On Solaris/x86 and Linux/x86, 64-bit cobol.dg test results were
identical between 32-bit-default and 64-bit-default configurations,
while on Solaris/SPARC the 32-bit-default build shows additional
failures.  However, given the sorry state of the cobol.dg testsuite on
big-endian (and strict-alignment) targets in general, I won't worry
about that now.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2025-05-12  Rainer Orth  

libgcobol:
* Makefile.am: Only wrap toolexeclib_LTLIBRARIES, toolexeclib_DATA
in BUILD_LIBGCOBOL.
(MAKEOVERRIDES, AM_MAKEFLAGS, FLAGS_TO_PASS): New macros.
($(top_srcdir)/../multilib.am): Include.
* Makefile.in: Regenerate.

# HG changeset patch
# Parent  cfa382c0315b5932a2b465bc1f4fa9ef7ef8fbdc
libgcobol: Add multilib support

diff --git a/libgcobol/Makefile.am b/libgcobol/Makefile.am
--- a/libgcobol/Makefile.am
+++ b/libgcobol/Makefile.am
@@ -25,10 +25,10 @@ ACLOCAL_AMFLAGS = -I .. -I ../config
 # May be used by various substitution variables.
 gcc_version := $(shell @get_gcc_base_ver@ $(top_srcdir)/../gcc/BASE-VER)
 
-# Skip the whole process if we are not building libgcobol.
 if BUILD_LIBGCOBOL
 toolexeclib_LTLIBRARIES  = libgcobol.la
 toolexeclib_DATA = libgcobol.spec
+endif
 
 ##
 ## 2.2.12 Automatic Dependency Tracking
@@ -66,4 +66,46 @@ libgcobol_la_LDFLAGS = $(LTLDFLAGS) $(LI
 	$(extra_ldflags_libgcobol) $(LIBS) $(version_arg)
 libgcobol_la_DEPENDENCIES = libgcobol.spec $(LIBQUADLIB_DEP)
 
-endif BUILD_LIBGCOBOL
+# Multilib support.
+MAKEOVERRIDES=
+
+# Work around what appears to be a GNU make bug handling MAKEFLAGS
+# values defined in terms of make variables, as is the case for CC and
+# friends when we are called from the top level Makefile.
+AM_MAKEFLAGS = \
+	"AR_FLAGS=$(AR_FLAGS)" \
+	"CC_FOR_BUILD=$(CC_FOR_BUILD)" \
+	"CC_FOR_TARGET=$(CC_FOR_TARGET)" \
+	"CFLAGS=$(CFLAGS)" \
+	"CXXFLAGS=$(CXXFLAGS)" \
+	"CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
+	"CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
+	"EXPECT=$(EXPECT)" \
+	"INSTALL=$(INSTALL)" \
+	"INSTALL_DATA=$(INSTALL_DATA)" \
+	"INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
+	"INSTALL_SCRIPT=$(INSTALL_SCRIPT)" \
+	"LDFLAGS=$(LDFLAGS)" \
+	"MAKE=$(MAKE)" \
+	"MAKEINFO=$(MAKEINFO) $(MAKEINFOFLAGS)" \
+	"SHELL=$(SHELL)" \
+	"RUNTESTFLAGS=$(RUNTESTFLAGS)" \
+	"exec_prefix=$(exec_prefix)" \
+	"infodir=$(infodir)" \
+	"libdir=$(libdir)" \
+	"includedir=$(includedir)" \
+	"prefix=$(prefix)" \
+	"AR=$(AR)" \
+	"AS=$(AS)" \
+	"LD=$(LD)" \
+	"RANLIB=$(RANLIB)" \
+	"NM=$(NM)" \
+	"NM_FOR_BUILD=$(NM_FOR_BUILD)" \
+	"NM_FOR_TARGET=$(NM_FOR_TARGET)" \
+	"DESTDIR=$(DESTDIR)" \
+	"WERROR=$(WERROR)"
+
+# Subdir rules rely on $(FLAGS_TO_PASS)
+FLAGS_TO_PASS = $(AM_MAKEFLAGS)
+
+include $(top_srcdir)/../multilib.am
diff --git a/libgcobol/Makefile.in b/libgcobol/Makefile.in
--- a/libgcobol/Makefile.in
+++ b/libgcobol/Makefile.in
@@ -113,8 +113,8 @@ host_triplet = @host@
 target_triplet = @target@
 
 # Handle embedded rpaths for Darwin.
-@BUILD_LIBGCOBOL_TRUE@@ENABLE_DARWIN_AT_RPATH_TRUE@am__append_1 = -Wc,-nodefaultrpaths \
-@BUILD_LIBGCOBOL_TRUE@@ENABLE_DARWIN_AT_RPATH_TRUE@	-Wl,-rpath,@loader_path
+@ENABLE_DARWIN_AT_RPATH_TRUE@am__append_1 = -Wc,-nodefaultrpaths \
+@ENABLE_DARWIN_AT_RPATH_TRUE@	-Wl,-rpath,@loader_path
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/depstand.m4 \
@@ -175,10 +175,8 @@ am__installdirs = "$(DESTDIR)$(toolexecl
 	"$(DESTDIR)$(toolexeclibdir)"
 LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
 libgcobol_la_LIBADD =
-@BUILD_LIBGCOBOL_TRUE@am_libgcobol_la_OBJECTS = charmaps.lo \
-@BUILD_LIBGCOBOL_TRUE@	constants.lo gfileio.lo gmath.lo \
-@BUILD_LIBGCOBOL_TRUE@	intrinsic.lo io.lo libgcobol.lo \
-@BUILD_LIBGCOBOL_TRUE@	valconv.lo
+am_libgcobol_la_OBJECTS = charmaps.lo constants.lo gfileio.lo gmath.lo \
+	intrinsic.lo io.lo libgcobol.lo valconv.lo
 libgcobol_la_OBJECTS = $(am_libgcobol_la_OBJECTS)
 @BUILD_LIBGCOBOL_TRUE@am_libgcobol_la_rpath = -rpath $(toolexeclibdir)
 AM_V_P = $(am__v_P_@AM_V@)
@

[PATCH] libiberty: remove duplicated declaration of mkstemps

2025-05-14 Thread Andreas Schwab
* libiberty.h (mkstemps): Remove duplicate.
---
 include/libiberty.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/include/libiberty.h b/include/libiberty.h
index d4e8791b14b..4ec9b9afd17 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -215,10 +215,6 @@ extern int ffs(int);
 extern int mkstemps(char *, int);
 #endif
 
-#if defined (HAVE_DECL_MKSTEMPS) && !HAVE_DECL_MKSTEMPS
-extern int mkstemps(char *, int);
-#endif
-
 /* Make memrchr available on systems that do not have it.  */
 #if !defined (__GNU_LIBRARY__ ) && !defined (__linux__) && \
 !defined (HAVE_MEMRCHR)
-- 
2.49.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] RISC-V: Fix uninit riscv_subset_list::m_allow_adding_dup issue

2025-05-14 Thread Christoph Müllner
On Tue, May 13, 2025 at 4:34 AM Kito Cheng  wrote:
>
> We forgot to initialize m_allow_adding_dup in the constructor of
> riscv_subset_list, so it had a random value, which could lead to
> random behavior where -march may accept duplicate extensions.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (riscv_subset_list::riscv_subset_list): Init m_allow_adding_dup.

Reviewed-by: Christoph Müllner 

Thanks!

> ---
>  gcc/common/config/riscv/riscv-common.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index d3240f79240..2834697a857 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -620,7 +620,7 @@ riscv_subset_t::riscv_subset_t ()
>
>  riscv_subset_list::riscv_subset_list (const char *arch, location_t loc)
>: m_arch (arch), m_loc (loc), m_head (NULL), m_tail (NULL), m_xlen (0),
> -m_subset_num (0)
> +m_subset_num (0), m_allow_adding_dup (false)
>  {
>  }
>
> --
> 2.34.1
>


2nd Ping: Re: [Stage 1][Middle-end][PATCH v5 0/3] Provide more contexts for -Warray-bounds and -Wstringop-* warning messages

2025-05-14 Thread Qing Zhao
Hi,

This patch set has been waiting for middle-end review since last year.

Could you Please take a look and let me know whether it’s ready for GCC16? 

Thanks a lot.

Qing

On May 1, 2025, at 10:02, Qing Zhao  wrote:
> 
> Hi, 
> 
> A gentle ping on review of the Middle-end of change of this patch set.
> The diagnostic part has been reviewed and approved by David last year 
> already. 
> 
> The 4th version of the patch set has been sent for review since Nov 5, 2024. 
> Pinged 5 times since then. 
> 
> Linux Kernel has been using this feature for a while, and found it very 
> useful.
> Kees has been asking for the status of this patch many many times.  -:)
> 
> We are hoping to make this into GCC16, should be a nice improvement in 
> general.
> 
> Please take a look and let me know whether it’s ready for GCC16? 
> 
> Thanks a lot.
> 
> Qing.
> 
> For your convenience, the links to the latest version are:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680336.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680337.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680339.html
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680338.html
> 
> 
>> On Apr 7, 2025, at 11:04, Qing Zhao  wrote:
>> 
>> Hi,
>> 
>> These are the patches for fixing PR109071 for GCC16 stage1:
>> 
>> Adding -fdiagnostics-details into GCC to provide more hints to the
>> end users on how the warnings come from, in order to help the user
>> to locate the exact location in source code on the specific warnings
>> due to compiler optimizations.
>> 
>> They base on the the following 4th version of the patch and rebased
>> on the latest trunk. 
>> 
>> bootstrapping and regression testing on both x86 and aarch64.
>> 
>> Kees and Sam have been using this option for a while in the Linux kernel
>> and other applications, and both found it very helpful.
>> 
>> They asked me several times about the status of this work and hope
>> the functionality can be available in GCC as soon as possible.
>> 
>> The diagnostic part of the patch had been reviewed and approved by
>> David already last year. 
>> 
>> Please review the middle-end part of the change.
>> 
>> thanks a lot.
>> 
>> Qing
>> 
>> ===
>> 
>> The latest version of(4th version) is:
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667613.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667614.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667615.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667616.html
>> 
>> The major improvements to this patch compared to version 3 are:
>> 
>> 1. Divide the patch into 3 parts:
>>   Part 1: Add new data structure move_history, record move_history during
>>   transformation;
>>   Part 2: In warning analysis, Use the new move_history to form a rich
>>   location with a sequence of events, to report more context info
>>   of the warnings.
>>   Part 3: Add debugging mechanism for move_history.
>> 
>> 2. Major change to the above Part 2, completely rewritten based on David's
>>  new class lazy_diagnostic_path. 
>> 
>> 3. Fix all issues identified by Sam;
>>  A. fix PR117375 (Bug in tree-ssa-sink.cc);
>>  B. documentation clarification;
>>  C. Add all the duplicated PRs in the commit comments;
>> 
>> 4. Bootstrap GCC with the new -fdiagnostics-details on by default (Init (1)).
>>  exposed some ICE similar as PR117375 in tree-ssa-sink.cc, fixed.
>> 
>> Qing Zhao (3):
>> Provide more contexts for -Warray-bounds, -Wstringop-*warning messages
>>   due to code movements from compiler transformation (Part 1)
>>   [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
>> Provide more contexts for -Warray-bounds, -Wstringop-*warning messages
>>   due to code movements from compiler transformation (Part 2)
>>   [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
>> Provide more contexts for -Warray-bounds, -Wstringop-* warning
>>   messages due to code movements from compiler transformation (Part 3)
>>   [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
>> 
>> gcc/Makefile.in   |   2 +
>> gcc/common.opt|   4 +
>> gcc/diagnostic-move-history.cc| 332 ++
>> gcc/diagnostic-move-history.h |  94 +
>> gcc/doc/invoke.texi   |  11 +
>> gcc/gimple-array-bounds.cc|  39 ++--
>> gcc/gimple-array-bounds.h |   2 +-
>> gcc/gimple-iterator.cc|   3 +
>> gcc/gimple-pretty-print.cc|   4 +
>> gcc/gimple-ssa-isolate-paths.cc   |  21 ++
>> gcc/gimple-ssa-warn-access.cc | 131 +++-
>> gcc/gimple-ssa-warn-restrict.cc   |  25 ++-
>> gcc/move-history-rich-location.cc |  56 +
>> gcc/move-history-rich-location.h  |  65 ++
>> gcc/testsuite/gcc.dg/pr109071.c   |  43 
>> gcc/testsuite/gcc.dg/pr109071_1.c |  36 
>> gcc/testsuite/gcc.dg/pr109071_2.c |  50 +
>> gcc/testsuite/

Re: [PATCH][GCC15/14/13/12] dwarf2out: Propagate dtprel into the .debug_addr table in resolve_addr_in_expr

2025-05-14 Thread Kyle Huey
On Wed, May 14, 2025 at 9:26 AM Richard Biener
 wrote:
>
> On Wed, May 14, 2025 at 5:25 AM Kyle Huey  wrote:
> >
> > For a debugger to display statically-allocated[0] TLS variables the compiler
> > must communicate information[1] that can be used in conjunction with 
> > knowledge
> > of the runtime environment[2] to calculate a location for the variable for
> > each thread. That need gives rise to dw_loc_dtprel in dwarf2out, a flag 
> > tracking
> > whether the location description is dtprel, or relative to the
> > "dynamic thread pointer". Location descriptions in the .debug_info section 
> > for
> > TLS variables need to be relocated by the static linker accordingly, and
> > dw_loc_dtprel controls emission of the needed relocations.
> >
> > This is further complicated by -gsplit-dwarf. -gsplit-dwarf is designed to 
> > allow
> > as much debugging information as possible to bypass the static linker to 
> > improve
> > linking performance. One of the ways that is done is by introducing a layer 
> > of
> > indirection for relocatable values[3]. That gives rise to addr_index_table 
> > which
> > ultimately results in the .debug_addr section.
> >
> > While the code handling addr_index_table clearly contemplates the existence 
> > of
> > dtprel entries[4] resolve_addr_in_expr does not, and the result is that when
> > using -gsplit-dwarf the DWARF for TLS variables contains an address[5] 
> > rather
> > than an offset, and debuggers can't work with that.
> >
> > This is visible on a trivial example. Compile
> >
> > ```
> > static __thread int tls_var;
> >
> > int main(void) {
> >   tls_var = 42;
> >   return 0;
> > }
> > ```
> >
> > with -g and -g -gsplit-dwarf. Run the program under gdb. When examining the
> > value of tls_var before and after the assignment, -g behaves as one would
> > expect but -g -gsplit-dwarf does not. If the user is lucky and the 
> > miscalculated
> > address is not mapped, gdb will print "Cannot access memory at address ...".
> > If the user is unlucky and the miscalculated address is mapped, gdb will 
> > simply
> > give the wrong value. You can further confirm that the issue is the address
> > calculation by asking gdb for the address of tls_var and comparing that to 
> > what
> > one would expect.[6]
> >
> > Thankfully this is trivial to fix by modifying resolve_addr_in_expr to 
> > propagate
> > the dtprel character of the location where necessary. gdb begins working as
> > expected and the diff in the generated assembly is clear.
> >
> > ```
> > .section.debug_addr,"",@progbits
> > .long   0x14
> > .value  0x5
> > .byte   0x8
> > .byte   0
> >  .Ldebug_addr0:
> > -   .quad   tls_var
> > +   .long   tls_var@dtpoff, 0
> > .quad   .LFB0
> > ```
> >
> > [0] Referring to e.g. __thread as statically-allocated vs. e.g. a
> > dynamically-allocated pthread_key_create() call.
> > [1] Generally an offset in a TLS block.
> > [2] With glibc, provided by libthread_db.so.
> > [3] Relocatable values are moved to a table in the .debug_addr section, 
> > those
> > values in .debug_info are replaced with special values that look up 
> > indexes
> > in that table, and then the static linker elsewhere assigns a single 
> > per-CU
> > starting index in the .debug_addr section, allowing those special 
> > values to
> > remain permanently fixed and the resulting data to be ignored by the 
> > linker.
> > [4] ate_kind_rtx_dtprel exists, after all, and new_addr_loc_descr does 
> > produce
> > it where appropriate.
> > [5] e.g. an address in the .tbss/.tdata section.
> > [6] e.g. on x86-64 by examining %fsbase and the offset in the assembly
>
> I have bootstrapped/tested this on x86_64-unknown-linux-gnu on the
> 15 and 14 branches and pushed there.
>
> Richard.

Thank you.

- Kyle

> > 2025-05-01  Kyle Huey  
> >
> > * dwarf2out.cc (resolve_addr_in_expr): Propagate dtprel into the 
> > address
> > table when appropriate.
> > ---
> >  gcc/dwarf2out.cc | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> > index 69e9d775d0d..2437610d48d 100644
> > --- a/gcc/dwarf2out.cc
> > +++ b/gcc/dwarf2out.cc
> > @@ -31068,7 +31068,8 @@ resolve_addr_in_expr (dw_attr_node *a, 
> > dw_loc_descr_ref loc)
> >return false;
> >  remove_addr_table_entry (loc->dw_loc_oprnd1.val_entry);
> > loc->dw_loc_oprnd1.val_entry
> > - = add_addr_table_entry (rtl, ate_kind_rtx);
> > + = add_addr_table_entry (rtl, loc->dtprel
> > + ? ate_kind_rtx_dtprel : ate_kind_rtx);
> >}
> > break;
> >case DW_OP_const4u:
> > --
> > 2.43.0
> >


Re: [PATCH] libstdc++: Implement C++26 function_ref [PR119126]

2025-05-14 Thread Patrick Palka


On Wed, 14 May 2025, Tomasz Kamiński wrote:

> This patch implements C++26 function_ref as specified in P0792R14,
> with correction for constraints for constructor accepting nontype_t
> parameter from LWG 4256.
> 
> As function_ref may store a pointer to the const object, __Ptrs::_M_obj is
> changed to const void*, so again we do not cast away const from const
> objects. To help with necessary cast, a __polyfunc::__cast_to helper is
> added, that accepts a reference to that type.
> 
> The _Invoker now defines additional call methods used by function_ref:
> _S_ptrs() for invoking target passed by reference, and __S_nttp, _S_bind_ptr,
> _S_bind_ref for handling constructors accepting nontype_t. The existing
> _S_call_storage is changed to thin wrappers that initialize _Ptrs
> and forward to _S_call_ptrs.
> 
> This removed most uses of _Storage::_M_ptr and _Storage::_M_ref,
> so these functions were removed, and _Manager uses were adjusted.
> 
> Finally we make function_ref available in freestanding mode, as
> move_only_function and copyable_function iarecurrently only available in 
> hosted,

"are currently"

> so we define _Manager and _Mo_base only if either __glibcxx_move_only_function
> or __glibcxx_copyable_function is defined.
> 
>   PR libstdc++/119126
> 
> libstdc++-v3/ChangeLog:
> 
>   * doc/doxygen/stdheader.cc: Added funcref_impl.h file.
>   * include/Makefile.am: Added funcref_impl.h file.
>   * include/Makefile.in: Added funcref_impl.h file.
>   * include/bits/funcref_impl.h: New file.
>   * include/bits/funcwrap.h: (_Ptrs::_M_obj): Const-qualify.
>   (_Storage::_M_ptr, _Storage::_M_ref): Remove.
>   (__polyfunc::__cast_to) Define.
>   (_Base_invoker::_S_ptrs, _Base_invoker::_S_nttp)
>   (_Base_invoker::_S_bind_ptrs, _Base_invoker::_S_bind_ref)
>   (_Base_invoker::_S_call_ptrs): Define.
>   (_Base_invoker::_S_call_storage): Foward to _S_call_ptrs.
>   (_Manager::_S_local, _Manager::_S_ptr): Adjust for _M_obj being
>   const qualified.
>   (__polyfunc::_Manager, __polyfunc::_Mo_base): Guard with
>   __glibcxx_move_only_function || __glibcxx_copyable_function.
>   (std::function_ref, std::__is_function_ref_v)
>   [__glibcxx_function_ref]: Define.
>   * include/bits/utility.h (std::nontype_t, std::nontype)
>   (__is_nontype_v) [__glibcxx_function_ref]: Define.
>   * include/bits/version.def: Define function_ref.
>   * include/bits/version.h: Regenerate.
>   * src/c++23/std.cc.in (std::function_ref) [__cpp_lib_function_ref]:
>Export.
>   * testsuite/20_util/function_ref/assign.cc: New test.
>   * testsuite/20_util/function_ref/call.cc: New test.
>   * testsuite/20_util/function_ref/cons.cc: New test.
>   * testsuite/20_util/function_ref/cons_neg.cc: New test.
>   * testsuite/20_util/function_ref/conv.cc: New test.

Should some of these tests run in freestanding mode too, given that
function_ref is freestanding?

> ---
> Would appreciate check of the documentation comments in funcref_impl.h
> file. 
> 
>  libstdc++-v3/doc/doxygen/stdheader.cc |   1 +
>  libstdc++-v3/include/Makefile.am  |   1 +
>  libstdc++-v3/include/Makefile.in  |   1 +
>  libstdc++-v3/include/bits/funcref_impl.h  | 185 +++
>  libstdc++-v3/include/bits/funcwrap.h  | 154 
>  libstdc++-v3/include/bits/utility.h   |  17 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  10 +
>  libstdc++-v3/include/std/functional   |   4 +-
>  libstdc++-v3/src/c++23/std.cc.in  |   3 +
>  .../testsuite/20_util/function_ref/assign.cc  | 110 +
>  .../testsuite/20_util/function_ref/call.cc| 145 
>  .../testsuite/20_util/function_ref/cons.cc| 219 ++
>  .../20_util/function_ref/cons_neg.cc  |  30 +++
>  .../testsuite/20_util/function_ref/conv.cc| 152 
>  15 files changed, 993 insertions(+), 47 deletions(-)
>  create mode 100644 libstdc++-v3/include/bits/funcref_impl.h
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/assign.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/call.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/cons_neg.cc
>  create mode 100644 libstdc++-v3/testsuite/20_util/function_ref/conv.cc
> 
> diff --git a/libstdc++-v3/doc/doxygen/stdheader.cc 
> b/libstdc++-v3/doc/doxygen/stdheader.cc
> index 839bfc81bc0..938b2b04a26 100644
> --- a/libstdc++-v3/doc/doxygen/stdheader.cc
> +++ b/libstdc++-v3/doc/doxygen/stdheader.cc
> @@ -55,6 +55,7 @@ void init_map()
>  headers["functional_hash.h"]= "functional";
>  headers["mofunc_impl.h"]= "functional";
>  headers["cpyfunc_impl.h"]   = "functional";
> +headers["funcref_impl.h"]

[COMMITTED][gcc13] PR tree-optimization/117287 - Backport new assume implementation

2025-05-14 Thread Andrew MacLeod


On 4/29/25 18:00, Andrew MacLeod wrote:


On 3/28/25 05:25, Jakub Jelinek wrote:

On Fri, Mar 28, 2025 at 08:12:35AM +0100, Richard Biener wrote:
On Thu, Mar 27, 2025 at 8:14 PM Andrew MacLeod  
wrote:

This patch backports the ASSUME support that was rewritten in GCC 15.

It's slightly more complicated than the port to GCC 14 was, in that a few
classes have been rewritten. I've isolated them all to tree-assume.cc,
which contains the pass.

It also has to bring in the ssa_cache and lazy_ssa_cache from GCC 14,
along with some tweaks to those classes to deal with changes in the way
range_allocators work starting in GCC 14. Those changes are all at the
top of the tree-assume.cc file. The rest of the file is a carbon copy of
the GCC 14 version. (Well, what it should be... I discovered there is
outstanding debug output support that was never submitted.)

I'm not sure if it's worth putting this in GCC13 or not, but I will
submit it and leave it to the release managers :-)  It should be low
risk, especially since assume was experimental support?

I have no strong opinion here besides questioning whether it's
necessary (as you say, assume is experimental) and the fact that
by splicing out the VRP changes to a special place further maintenance
is made more difficult.

IMO, up to you (expecting you'll fix issues if they come up), but would
like to hear a 2nd opinion from Jakub.

I'd probably apply it, it was a wrong-code issue and I'm not sure
users understand assume as experimental.
While the [[assume (...)]]; form is a C++23 feature which is 
experimental,

we accept that attribute even since C++11 and in C23 and in the
__attribute__((assume (...))); form everywhere and as a documented
extension.

If the ranger changes are done only when users actually use assume 
rather

than all the time (and only when using non-trivial assumptions, trivial
ones with no side-effects are turned into if (!x) 
__builtin_unreachable ()),

I think this decreases the risks.

Jakub



I've committed the following patch to GCC 13 branch.

Bootstrapped on x86_64-pc-linux-gnu  with no regressions. Pushed.

From ba19612f021ccd39925ef99b51c6aa2c59155800 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 27 Mar 2025 10:51:16 -0400
Subject: [PATCH] Backport new assume implementation and cache.

Rework the assume pass to work properly and fail conservatively when it
does.  Also move it to its own file.

	PR tree-optimization/117287
	gcc/
	* Makefile.in (OBJS): Add tree-assume.o
	* gimple-range-fold.cc (class fur_edge): Relocate to...
	* gimple-range-fold.h (class fur_edge): Here.
	* gimple-range.cc (assume_query::assume_range_p): Remove.
	(assume_query::range_of_expr): Remove.
	(assume_query::assume_query): Move to tree-assume.cc.
	(assume_query::~assume_query): Remove.
	(assume_query::calculate_op): Move to tree-assume.cc.
	(assume_query::calculate_phi): Likewise.
	(assume_query::check_taken_edge): Remove.
	(assume_query::calculate_stmt): Move to tree-assume.cc.
	(assume_query::dump): Remove.
	* gimple-range.h (class assume_query): Move to tree-assume.cc
	* tree-assume.cc: New
	* tree-vrp.cc (struct pass_data_assumptions): Move to tree-assume.cc.
	(class pass_assumptions): Likewise.
	(make_pass_assumptions): Likewise.

	gcc/testsuite/
	* g++.dg/cpp23/pr117287-attr.C: New.
---
 gcc/Makefile.in|   1 +
 gcc/gimple-range-fold.cc   |  13 -
 gcc/gimple-range-fold.h|  12 +
 gcc/gimple-range.cc| 189 --
 gcc/gimple-range.h |  19 -
 gcc/testsuite/g++.dg/cpp23/pr117287-attr.C |  38 ++
 gcc/tree-assume.cc | 650 +
 gcc/tree-vrp.cc|  68 ---
 8 files changed, 701 insertions(+), 289 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/pr117287-attr.C
 create mode 100644 gcc/tree-assume.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 775aaa1b3c4..1d9e10127ca 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1633,6 +1633,7 @@ OBJS = \
 	ubsan.o \
 	sanopt.o \
 	sancov.o \
+	tree-assume.o \
 	tree-call-cdce.o \
 	tree-cfg.o \
 	tree-cfgcleanup.o \
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 180f349eda9..e2bb294624f 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -103,21 +103,8 @@ fur_source::register_relation (edge e ATTRIBUTE_UNUSED,
 {
 }
 
-// This version of fur_source will pick a range up off an edge.
-
-class fur_edge : public fur_source
-{
-public:
-  fur_edge (edge e, range_query *q = NULL);
-  virtual bool get_operand (vrange &r, tree expr) override;
-  virtual bool get_phi_operand (vrange &r, tree expr, edge e) override;
-private:
-  edge m_edge;
-};
-
 // Instantiate an edge based fur_source.
 
-inline
 fur_edge::fur_edge (edge e, range_query *q) : fur_source (q)
 {
   m_edge = e;
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index 68c6d7743e9..0a028e31be0 100644
--

[PATCH] match: Allow some optional casts for boolean comparisons

2025-05-14 Thread Andrew Pinski
This is the next step in removing forward_propagate_into_comparison
and forward_propagate_into_gimple_cond; in the case of `((int)(a cmp b)) != 0`
we want to do the transformation to `a cmp b` even if the cast is used twice.
This is exactly what
forward_propagate_into_comparison/forward_propagate_into_gimple_cond
do, including making the copy.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`(a cmp b) != false`, `(a cmp b) == true`,
`(a cmp b) != true`, `(a cmp b) == false`): Allow an
optional cast between the comparison and the eq/ne.
(`bool_val != false`, `bool_val == true`): Allow an optional
cast between the bool_val and the ne/eq.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 79485f9678a..ffb1695e6e6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6913,15 +6913,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (ncmp @0 @1)
  /* The following bits are handled by fold_binary_op_with_conditional_arg.  */
  (simplify
-  (ne (cmp@2 @0 @1) integer_zerop)
+  (ne (convert? (cmp@2 @0 @1)) integer_zerop)
   (if (types_match (type, TREE_TYPE (@2)))
(cmp @0 @1)))
  (simplify
-  (eq (cmp@2 @0 @1) integer_truep)
+  (eq (convert? (cmp@2 @0 @1)) integer_truep)
   (if (types_match (type, TREE_TYPE (@2)))
(cmp @0 @1)))
  (simplify
-  (ne (cmp@2 @0 @1) integer_truep)
+  (ne (convert? (cmp@2 @0 @1)) integer_truep)
   (if (types_match (type, TREE_TYPE (@2)))
(with { enum tree_code ic = invert_tree_comparison
 (cmp, HONOR_NANS (@0)); }
@@ -6930,7 +6930,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (ic == ncmp)
   (ncmp @0 @1))
  (simplify
-  (eq (cmp@2 @0 @1) integer_zerop)
+  (eq (convert? (cmp@2 @0 @1)) integer_zerop)
   (if (types_match (type, TREE_TYPE (@2)))
(with { enum tree_code ic = invert_tree_comparison
 (cmp, HONOR_NANS (@0)); }
@@ -8104,13 +8104,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* bool_var != 0 becomes bool_var.  */
 (simplify
- (ne @0 integer_zerop)
+ (ne (convert? @0) integer_zerop)
  (if (TREE_CODE (TREE_TYPE (@0)) == BOOLEAN_TYPE
   && types_match (type, TREE_TYPE (@0)))
   (non_lvalue @0)))
 /* bool_var == 1 becomes bool_var.  */
 (simplify
- (eq @0 integer_onep)
+ (eq (convert? @0) integer_onep)
  (if (TREE_CODE (TREE_TYPE (@0)) == BOOLEAN_TYPE
   && types_match (type, TREE_TYPE (@0)))
   (non_lvalue @0)))
-- 
2.43.0



RE: [PATCH] x86: Enable separate shrink wrapping

2025-05-14 Thread Cui, Lili


> -Original Message-
> From: Uros Bizjak 
> Sent: Tuesday, May 13, 2025 6:04 PM
> To: Cui, Lili 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] x86: Enable separate shrink wrapping
> 
> On Tue, May 13, 2025 at 8:15 AM Cui, Lili  wrote:
> >
> > From: Lili Cui 
> >
> > Hi,
> >
> > This patch is to enable separate shrink wrapping for x86.
> >
> > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> >
> > Ok for trunk?
> 
> Unfortunately, the patched compiler fails to boot the latest linux kernel.
> 
Thank you very much for reporting this issue, I will reproduce it.

Lili.

> Uros.
> 
> 
> 
> Uros.
> >
> >
> > This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
> > enable separate shrink wrapping for function prologues/epilogues in
> > x86.
> >
> > When performing separate shrink wrapping, we choose to use mov instead
> > of push/pop, because using push/pop is more complicated to handle rsp
> > adjustment and may lose performance, so here we choose to use mov,
> > which has a small impact on code size, but guarantees performance.
> >
> > Tested against SPEC CPU 2017, this change always has a net-positive
> > effect on the dynamic instruction count.  See the following table for
> > the breakdown on how this reduces the number of dynamic instructions
> > per workload on a like-for-like (with/without this commit):
> >
> > instruction count   base        with commit  (commit-base)/commit
> > 502.gcc_r   98666845943 96891561634 -1.80%
> > 526.blender_r   6.21226E+11 6.12992E+11 -1.33%
> > 520.omnetpp_r   1.1241E+11  1.11093E+11 -1.17%
> > 500.perlbench_r 1271558717  1263268350  -0.65%
> > 523.xalancbmk_r 2.20103E+11 2.18836E+11 -0.58%
> > 531.deepsjeng_r 2.73591E+11 2.72114E+11 -0.54%
> > 500.perlbench_r 64195557393 63881512409 -0.49%
> > 541.leela_r 2.99097E+11 2.98245E+11 -0.29%
> > 548.exchange2_r 1.27976E+11 1.27784E+11 -0.15%
> > 527.cam4_r  88981458425 7334679 -0.11%
> > 554.roms_r  2.60072E+11 2.59809E+11 -0.10%
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-protos.h (ix86_get_separate_components):
> > New function.
> > (ix86_components_for_bb): Likewise.
> > (ix86_disqualify_components): Likewise.
> > (ix86_emit_prologue_components): Likewise.
> > (ix86_emit_epilogue_components): Likewise.
> > (ix86_set_handled_components): Likewise.
> > * config/i386/i386.cc (save_regs_using_push_pop):
> > Encapsulate code.
> > (ix86_compute_frame_layout):
> > Handle save_regs_using_push_pop.
> > (ix86_emit_save_regs_using_mov):
> > Skip registers that are wrapped separately.
> > (ix86_expand_prologue): Likewise.
> > (ix86_emit_restore_regs_using_mov): Likewise.
> > (ix86_expand_epilogue): Likewise.
> > (ix86_get_separate_components): New function.
> > (ix86_components_for_bb): Likewise.
> > (ix86_disqualify_components): Likewise.
> > (ix86_emit_prologue_components): Likewise.
> > (ix86_emit_epilogue_components): Likewise.
> > (ix86_set_handled_components): Likewise.
> > (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
> > (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
> > (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
> > (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
> > (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
> > (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
> > * config/i386/i386.h (struct machine_function):Add
> > reg_is_wrapped_separately array for register wrapping
> > information.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/x86_64/abi/callabi/leaf-2.c: Adjust the test.
> > * gcc.target/i386/interrupt-16.c: Likewise.
> > * g++.target/i386/shrink_wrap_separate.c: New test.
> > ---
> >  gcc/config/i386/i386-protos.h |   7 +
> >  gcc/config/i386/i386.cc   | 261 +++---
> >  gcc/config/i386/i386.h|   1 +
> >  .../g++.target/i386/shrink_wrap_separate.c|  24 ++
> >  gcc/testsuite/gcc.target/i386/interrupt-16.c  |   4 +-
> >  .../gcc.target/x86_64/abi/callabi/leaf-2.c|   2 +-
> >  6 files changed, 257 insertions(+), 42 deletions(-)  create mode
> > 100644 gcc/testsuite/g++.target/i386/shrink_wrap_separate.c
> >
> > diff --git a/gcc/config/i386/i386-protos.h
> > b/gcc/config/i386/i386-protos.h index e85b925704b..11d26e93973 100644
> > --- a/gcc/config/i386/i386-protos.h
> > +++ b/gcc/config/i386/i386-protos.h
> > @@ -436,6 +436,13 @@ extern rtl_opt_pass *make_pass_align_tight_loops
> > (gcc::context *);  extern bool ix86_has_no_direct_extern_access;
> > extern bool ix86_rpa

[PATCH] RISC-V: Add new operand constraint: cR

2025-05-14 Thread Kito Cheng
This commit introduces a new operand constraint `cR` for the RISC-V
architecture, which allows the use of an even-odd RVC general purpose register
(x8-x15) in inline asm.

Ref: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/102

gcc/ChangeLog:

* config/riscv/constraints.md (cR): New constraint.
* doc/md.texi (Machine Constraints::RISC-V): Document the new cR
constraint.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/constraint-cR.c: New test case.
---
 gcc/config/riscv/constraints.md|  4 
 gcc/doc/md.texi|  3 +++
 gcc/testsuite/gcc.target/riscv/constraint-cR.c | 13 +
 3 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/constraint-cR.c

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 18556a59141..58355cf03f2 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -43,6 +43,10 @@ (define_register_constraint "cr" "RVC_GR_REGS"
 (define_register_constraint "cf" "TARGET_HARD_FLOAT ? RVC_FP_REGS : 
(TARGET_ZFINX ? RVC_GR_REGS : NO_REGS)"
   "RVC floating-point registers (f8-f15), if available, reuse GPR as FPR when 
use zfinx.")
 
+(define_register_constraint "cR" "RVC_GR_REGS"
+  "Even-odd RVC general purpose register (x8-x15)."
+  "regno % 2 == 0")
+
 ;; General constraints
 
 (define_constraint "I"
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f6314af4692..1a1c1b73089 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3694,6 +3694,9 @@ RVC general purpose register (x8-x15).
 RVC floating-point registers (f8-f15), if available, reuse GPR as FPR when use
 zfinx.
 
+@item cR
+Even-odd RVC general purpose register pair.
+
 @item R
 Even-odd general purpose register pair.
 
diff --git a/gcc/testsuite/gcc.target/riscv/constraint-cR.c 
b/gcc/testsuite/gcc.target/riscv/constraint-cR.c
new file mode 100644
index 000..479246b632a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/constraint-cR.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void foo(int a0, int a1, int a2, int a3, int a4, int a5, int a6, int a7, int 
m0, int m1) {
+/*
+** foo:
+**   ...
+**   addi\s*t0,\s*(a[024]|s0),\s*(a[024]|s0)
+**   ...
+*/
+__asm__ volatile("addi t0, %0, %0" : : "cR" (m0) : "memory");
+}
-- 
2.34.1



[PATCH] RISC-V: Support Zilsd code gen

2025-05-14 Thread Kito Cheng
This commit adds the code gen support for Zilsd, which is a
newly added extension for RISC-V. The Zilsd extension allows
for loading and storing 64-bit values using even-odd register
pairs.

We only do minimal code gen support for now, which means the new
instructions are only used when the load/store operates on 64-bit data.
They could also be used to optimize the code gen of memcpy/memset/memmove
as well as function prologues and epilogues, but I think that should
probably be done in a follow-up patch.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Handle
load/store with odd-even reg pair.
(riscv_split_64bit_move_p): Don't split load/store if zilsd enabled.
(riscv_hard_regno_mode_ok): Only allow even reg can be used for
64 bits mode for zilsd.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zilsd-code-gen.c: New test.
---
 gcc/config/riscv/riscv.cc | 38 +++
 .../gcc.target/riscv/zilsd-code-gen.c | 18 +
 2 files changed, 56 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zilsd-code-gen.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d28aee4b439..f5ee3ce9034 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3742,6 +3742,25 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
   return true;
 }
 
+  if (TARGET_ZILSD
+  && (GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))
+  && ((REG_P (dest) && MEM_P (src))
+	  || (MEM_P (dest) && REG_P (src))))
+{
+  rtx reg = REG_P (dest) ? dest : src;
+  unsigned regno = REGNO (reg);
+  /* Zilsd requires an even-odd register pair; let RA
+	 fix the constraint if the reg is a hard reg and not an even reg.  */
+  if ((regno < FIRST_PSEUDO_REGISTER)
+ && (regno % 2) != 0)
+   {
+ rtx tmp = gen_reg_rtx (GET_MODE (reg));
+ emit_move_insn (tmp, src);
+ emit_move_insn (dest, tmp);
+ return true;
+   }
+}
+
   /* RISC-V GCC may generate non-legitimate address due to we provide some
  pattern for optimize access PIC local symbol and it's make GCC generate
  unrecognizable instruction during optimizing.  */
@@ -4577,6 +4596,19 @@ riscv_split_64bit_move_p (rtx dest, rtx src)
   if (TARGET_64BIT)
 return false;
 
+  /* Zilsd provides load/store with even-odd register pair. */
+  if (TARGET_ZILSD
+  && (((REG_P (dest) && MEM_P (src))
+	   || (MEM_P (dest) && REG_P (src))))
+{
+  rtx reg = REG_P (dest) ? dest : src;
+  unsigned regno = REGNO (reg);
+  /* GCC may still generate some load/store with odd-even reg pairs
+	 because of the ABI handling, but that's fine, just split those later.  */
+  if (GP_REG_P (regno))
+   return (regno < FIRST_PSEUDO_REGISTER) && ((regno % 2) != 0);
+}
+
   /* There is no need to split if the FLI instruction in the `Zfa` extension 
can be used.  */
   if (satisfies_constraint_zfli (src))
 return false;
@@ -9799,6 +9831,12 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
   if (riscv_v_ext_mode_p (mode))
return false;
 
+  /* Zilsd requires load/store with an even-odd reg pair.  */
+  if (TARGET_ZILSD
+ && (GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))
+ && ((regno % 2) != 0))
+   return false;
+
   if (!GP_REG_P (regno + nregs - 1))
return false;
 }
diff --git a/gcc/testsuite/gcc.target/riscv/zilsd-code-gen.c 
b/gcc/testsuite/gcc.target/riscv/zilsd-code-gen.c
new file mode 100644
index 000..9155622ea55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zilsd-code-gen.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32i_zilsd -mabi=ilp32" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+long long foo1(long long *a)
+{
+return *a;
+}
+
+long long g;
+
+void foo2(long long a)
+{
+g = a;
+}
+
+/* { dg-final { scan-assembler-times "ld\t" 1 } } */
+/* { dg-final { scan-assembler-times "sd\t" 1 } } */
-- 
2.34.1



Re: [PATCH] match: Allow some optional casts for boolean comparisons

2025-05-14 Thread Andrew Pinski
On Wed, May 14, 2025 at 7:39 PM Andrew Pinski  wrote:
>
> This is the next step in removing forward_propagate_into_comparison
> and forward_propagate_into_gimple_cond; In the case of `((int)(a cmp b)) != 0`
> we want to do the transformation to `a cmp b` even if the cast is used twice.
> This is exactly what 
> forward_propagate_into_comparison/forward_propagate_into_gimple_cond
> do and does the copy.

Actually I am thinking we should change this set of patterns:
```
(for cmp (simple_comparison)
 (simplify
  (cmp (convert@0 @00) (convert?@1 @10))
  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
   /* Disable this optimization if we're casting a function pointer
  type on targets that require function pointer canonicalization.  */
   && !(targetm.have_canonicalize_funcptr_for_compare ()
&& ((POINTER_TYPE_P (TREE_TYPE (@00))
&& FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@00
|| (POINTER_TYPE_P (TREE_TYPE (@10))
&& FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@10))
   && single_use (@0))
```
In the case of:
(if (TREE_CODE (@1) == INTEGER_CST)
 (cmp @00 ...)
We don't need to care if @0 is single_use or not as we remove one cast.
Plus all of the cases where we produce constants don't care about
single_use either.

So let's ignore this patch for now. I will get back to it tomorrow.

Thanks,
Andrew

>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * match.pd (`(a cmp b) != false`, `(a cmp b) == true`,
> `(a cmp b) != true`, `(a cmp b) == false`): Allow an
> optional cast between the comparison and the eq/ne.
> (`bool_val != false`, `bool_val == true`): Allow an optional
> cast between the bool_val and the ne/eq.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 79485f9678a..ffb1695e6e6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6913,15 +6913,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (ncmp @0 @1)
>   /* The following bits are handled by fold_binary_op_with_conditional_arg.  
> */
>   (simplify
> -  (ne (cmp@2 @0 @1) integer_zerop)
> +  (ne (convert? (cmp@2 @0 @1)) integer_zerop)
>(if (types_match (type, TREE_TYPE (@2)))
> (cmp @0 @1)))
>   (simplify
> -  (eq (cmp@2 @0 @1) integer_truep)
> +  (eq (convert? (cmp@2 @0 @1)) integer_truep)
>(if (types_match (type, TREE_TYPE (@2)))
> (cmp @0 @1)))
>   (simplify
> -  (ne (cmp@2 @0 @1) integer_truep)
> +  (ne (convert? (cmp@2 @0 @1)) integer_truep)
>(if (types_match (type, TREE_TYPE (@2)))
> (with { enum tree_code ic = invert_tree_comparison
>  (cmp, HONOR_NANS (@0)); }
> @@ -6930,7 +6930,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (ic == ncmp)
>(ncmp @0 @1))
>   (simplify
> -  (eq (cmp@2 @0 @1) integer_zerop)
> +  (eq (convert? (cmp@2 @0 @1)) integer_zerop)
>(if (types_match (type, TREE_TYPE (@2)))
> (with { enum tree_code ic = invert_tree_comparison
>  (cmp, HONOR_NANS (@0)); }
> @@ -8104,13 +8104,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* bool_var != 0 becomes bool_var.  */
>  (simplify
> - (ne @0 integer_zerop)
> + (ne (convert? @0) integer_zerop)
>   (if (TREE_CODE (TREE_TYPE (@0)) == BOOLEAN_TYPE
>&& types_match (type, TREE_TYPE (@0)))
>(non_lvalue @0)))
>  /* bool_var == 1 becomes bool_var.  */
>  (simplify
> - (eq @0 integer_onep)
> + (eq (convert? @0) integer_onep)
>   (if (TREE_CODE (TREE_TYPE (@0)) == BOOLEAN_TYPE
>&& types_match (type, TREE_TYPE (@0)))
>(non_lvalue @0)))
> --
> 2.43.0
>


RE: [PATCH] x86: Enable separate shrink wrapping

2025-05-14 Thread Cui, Lili
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, May 13, 2025 7:49 PM
> To: Uros Bizjak 
> Cc: Cui, Lili ; gcc-patches@gcc.gnu.org; Liu, Hongtao
> 
> Subject: Re: [PATCH] x86: Enable separate shrink wrapping
> 
> On Tue, May 13, 2025 at 12:36 PM Uros Bizjak  wrote:
> >
> > On Tue, May 13, 2025 at 8:15 AM Cui, Lili  wrote:
> > >
> > > From: Lili Cui 
> > >
> > > Hi,
> > >
> > > This patch is to enable separate shrink wrapping for x86.
> > >
> > > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> > >
> > > Ok for trunk?
> >
> > Unfortunately, the patched compiler fails to boot the latest linux kernel.
> 
> Michael Matz also posted x86 separate shrink wrapping here:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657519.html
> 

Thanks Richard. When I analyzed the 511.povray_r regressions, I found that some 
hot functions in 511 often return early, and using separate shrink wrapping can 
reduce the push and pop instructions before return. So, I created this patch. 
But I didn't notice that Michael had already posted it. I will test and compare 
the two patches, hoping to find the optimal solution.

Lili.

> Richard.
> 
> > Uros.
> >
> >
> >
> > Uros.
> > >
> > >
> > > This commit implements the target macros (TARGET_SHRINK_WRAP_*) that
> > > enable separate shrink wrapping for function prologues/epilogues in
> > > x86.
> > >
> > > When performing separate shrink wrapping, we choose to use mov
> > > instead of push/pop, because using push/pop is more complicated to
> > > handle rsp adjustment and may lose performance, so here we choose to
> > > use mov, which has a small impact on code size, but guarantees
> performance.
> > >
> > > Tested against SPEC CPU 2017, this change always has a net-positive
> > > effect on the dynamic instruction count.  See the following table
> > > for the breakdown on how this reduces the number of dynamic
> > > instructions per workload on a like-for-like (with/without this commit):
> > >
> > > instruction count   base        with commit  (commit-base)/commit
> > > 502.gcc_r   98666845943 96891561634 -1.80%
> > > 526.blender_r   6.21226E+11 6.12992E+11 -1.33%
> > > 520.omnetpp_r   1.1241E+11  1.11093E+11 -1.17%
> > > 500.perlbench_r 1271558717  1263268350  -0.65%
> > > 523.xalancbmk_r 2.20103E+11 2.18836E+11 -0.58%
> > > 531.deepsjeng_r 2.73591E+11 2.72114E+11 -0.54%
> > > 500.perlbench_r 64195557393 63881512409 -0.49%
> > > 541.leela_r 2.99097E+11 2.98245E+11 -0.29%
> > > 548.exchange2_r 1.27976E+11 1.27784E+11 -0.15%
> > > 527.cam4_r  88981458425 7334679 -0.11%
> > > 554.roms_r  2.60072E+11 2.59809E+11 -0.10%
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-protos.h (ix86_get_separate_components):
> > > New function.
> > > (ix86_components_for_bb): Likewise.
> > > (ix86_disqualify_components): Likewise.
> > > (ix86_emit_prologue_components): Likewise.
> > > (ix86_emit_epilogue_components): Likewise.
> > > (ix86_set_handled_components): Likewise.
> > > * config/i386/i386.cc (save_regs_using_push_pop):
> > > Encapsulate code.
> > > (ix86_compute_frame_layout):
> > > Handle save_regs_using_push_pop.
> > > (ix86_emit_save_regs_using_mov):
> > > Skip registers that are wrapped separately.
> > > (ix86_expand_prologue): Likewise.
> > > (ix86_emit_restore_regs_using_mov): Likewise.
> > > (ix86_expand_epilogue): Likewise.
> > > (ix86_get_separate_components): New function.
> > > (ix86_components_for_bb): Likewise.
> > > (ix86_disqualify_components): Likewise.
> > > (ix86_emit_prologue_components): Likewise.
> > > (ix86_emit_epilogue_components): Likewise.
> > > (ix86_set_handled_components): Likewise.
> > > (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS): Define.
> > > (TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB): Likewise.
> > > (TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS): Likewise.
> > > (TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS): Likewise.
> > > (TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS): Likewise.
> > > (TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Likewise.
> > > * config/i386/i386.h (struct machine_function):Add
> > > reg_is_wrapped_separately array for register wrapping
> > > information.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/x86_64/abi/callabi/leaf-2.c: Adjust the test.
> > > * gcc.target/i386/interrupt-16.c: Likewise.
> > > * g++.target/i386/shrink_wrap_separate.c: New test.
> > > ---
> > >  gcc/config/i386/i386-protos.h |   7 +
> > >  gcc/config/i386/i386.cc   | 261 +++---
> > >  gcc/config/i386/i386.h   

Re: [PATCH] libstdc++: Make debug iterator pointer sequence const [PR116369]

2025-05-14 Thread François Dumont

On 14/05/2025 18:46, Jonathan Wakely wrote:

On Wed, 14 May 2025 at 17:31, François Dumont  wrote:

On 12/05/2025 23:03, Jonathan Wakely wrote:

On 31/03/25 22:20 +0200, François Dumont wrote:

Hi

Following this previous patch
https://gcc.gnu.org/pipermail/libstdc++/2024-August/059418.html I've
completed it for the _Safe_unordered_container_base type and
implemented the rest of the change to store the safe iterator
sequence as a pointer-to-const.

 libstdc++: Make debug iterator pointer sequence const [PR116369]

 In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug sequence
 has been made mutable to allow attaching iterators to const containers.
 This change completes this fix by also declaring debug unordered
 container members mutable.

 Additionally the debug iterator sequence is now a pointer-to-const and
 so _Safe_sequence_base _M_attach and all other methods are
 const-qualified.  Symbol exports are maintained thanks to __asm
 directives.


I can't compile this, it seems to be missing changes to
safe_local_iterator.tcc:

In file included from
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.h:444,
  from
/home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:33:
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:
In member function ‘typename
__gnu_debug::_Distance_traits<_Iterator>::__type
__gnu_debug::_Safe_local_iterator<_Iterator,
_Sequence>::_M_get_distance_to(const
__gnu_debug::_Safe_local_iterator<_Iterator, _Sequence>&) const’:
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
error: there are no arguments to ‘_M_get_sequence’ that depend on a
template parameter, so a declaration of ‘_M_get_sequence’ must be
available [-Wtemplate-body]
47 | _M_get_sequence()->bucket_size(bucket()),
   | ^~~
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
note: (if you use ‘-fpermissive’, G++ will accept your code, but
allowing the use of an undeclared name is deprecated)
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:59:18:
error: there are no arguments to ‘_M_get_sequence’ that depend on a
template parameter, so a declaration of ‘_M_get_sequence’ must be
available [-Wtemplate-body]
59 | -_M_get_sequence()->bucket_size(bucket()),
   |  ^~~


Yes, sorry, I had already spotted this problem, but only updated the PR
and did not re-send the patch here.



Also available as a PR

https://forge.sourceware.org/gcc/gcc-TEST/pulls/47

 /** Detach all singular iterators.
  *  @post for all iterators i attached to this sequence,
  *   i->_M_version == _M_version.
  */
 void
-_M_detach_singular();
+_M_detach_singular() const
+ __asm("_ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEv");

Does this work on all targets?

No idea! I thought the symbol name used here just had to match the
entries in config/abi/pre/gnu.ver.

That linker script is not used for all targets.


Ok, got it, I only need to use this when symbol versioning is activated.

I think this new patch should do it if so.

François

diff --git a/libstdc++-v3/include/debug/formatter.h 
b/libstdc++-v3/include/debug/formatter.h
index d80e8a78dcb..8aa84adec77 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -96,7 +96,7 @@ namespace __gnu_debug
   template
 class _Safe_iterator;
 
-  template
+  template
 class _Safe_local_iterator;
 
   template
@@ -316,8 +316,8 @@ namespace __gnu_debug
}
}
 
-  template
-   _Parameter(_Safe_local_iterator<_Iterator, _Sequence> const& __it,
+  template
+   _Parameter(_Safe_local_iterator<_Iterator, _UContainer> const& __it,
   const char* __name, _Is_iterator)
: _M_kind(__iterator),  _M_variant()
{
@@ -326,8 +326,8 @@ namespace __gnu_debug
  _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(_Iterator);
  _M_variant._M_iterator._M_constness =
__it._S_constant() ? __const_iterator : __mutable_iterator;
- _M_variant._M_iterator._M_sequence = __it._M_get_sequence();
- _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_Sequence);
+ _M_variant._M_iterator._M_sequence = __it._M_get_ucontainer();
+ _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_UContainer);
 
  if (__it._M_singular())
{
diff --git a/libstdc++-v3/include/debug/safe_base.h 
b/libstdc++-v3/include/debug/safe_base.h
index cf3f1708ad2..7f7876e4017 100644
--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -31,6 +31,12 @@
 
 #include 
 
+#if _GLIBCXX_SYMVER_GNU
+# define _GLIBCXX_SYMVER_ASM(S) __asm(S)
+#

[committed] RISC-V: Drop duplicate build rule for riscv-ext.opt [NFC]

2025-05-14 Thread Kito Cheng
gcc/ChangeLog:

* config/riscv/t-riscv: Drop duplicate build rule for
riscv-ext.opt.
---
 gcc/config/riscv/t-riscv | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index e99d6689ba0..854daa96e73 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -198,8 +198,6 @@ RISCV_EXT_DEFS = \
 
 $(srcdir)/config/riscv/riscv-ext.opt: $(RISCV_EXT_DEFS)
 
-$(srcdir)/config/riscv/riscv-ext.opt: s-riscv-ext.opt ; @true
-
 build/gen-riscv-ext-opt$(build_exeext): 
$(srcdir)/config/riscv/gen-riscv-ext-opt.cc \
$(RISCV_EXT_DEFS)
$(CXX_FOR_BUILD) $(CXXFLAGS_FOR_BUILD) $< -o $@
-- 
2.34.1



[PATCH 2/2] forwprop: Add alias walk limit to optimize_memcpy_to_memset.

2025-05-14 Thread Andrew Pinski
As suggested in 
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681507.html,
this adds the aliasing walk limit.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Add a limit on the 
alias walk.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-forwprop.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 71de99f46ff..0f52e8fe6ef 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -1216,13 +1216,15 @@ optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, 
tree dest, tree src, tree
   ao_ref_init (&read, src);
   tree vuse = gimple_vuse (stmt);
   gimple *defstmt;
+  unsigned limit = param_sccvn_max_alias_queries_per_access;
   do {
 if (vuse == NULL || TREE_CODE (vuse) != SSA_NAME)
   return false;
 defstmt = SSA_NAME_DEF_STMT (vuse);
    if (is_a <gphi *> (defstmt))
   return false;
-
+if (limit-- == 0)
+  return false;
    /* If the len was null, then we can use TBAA. */
 if (stmt_may_clobber_ref_p_1 (defstmt, &read,
  /* tbaa_p = */ len_was_null))
-- 
2.43.0



[PATCH 1/2] forwprop: Move memcpy_to_memset from gimple fold to forwprop

2025-05-14 Thread Andrew Pinski
Since this optimization now walks the vops, it is better to only
do it in forwprop rather than all the time in fold_stmt.

The next patch will add the limit to the alias walk.

gcc/ChangeLog:

* gimple-fold.cc (optimize_memcpy_to_memset): Move to
tree-ssa-forwprop.cc.
(gimple_fold_builtin_memory_op): Remove call to
optimize_memcpy_to_memset.
(fold_stmt_1): Likewise.
* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Move from
gimple-fold.cc.
(simplify_builtin_call): Try to optimize memcpy/memset.
(pass_forwprop::execute): Try to optimize memcpy like assignment
from a previous memset.

gcc/testsuite/ChangeLog:

* gcc.dg/pr78408-1.c: Update scan to forwprop1 only.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc   | 168 -
 gcc/testsuite/gcc.dg/pr78408-1.c |   3 +-
 gcc/tree-ssa-forwprop.cc | 176 +++
 3 files changed, 177 insertions(+), 170 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 009c5737ef9..b74fb8bb50c 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -894,155 +894,6 @@ size_must_be_zero_p (tree size)
   return vr.zero_p ();
 }
 
-/* Optimize
-   a = {};
-   b = a;
-   into
-   a = {};
-   b = {};
-   Similarly for memset (&a, ..., sizeof (a)); instead of a = {};
-   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;  */
-
-static bool
-optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, tree dest, tree src, 
tree len)
-{
-  ao_ref read;
-  gimple *stmt = gsi_stmt (*gsip);
-  if (gimple_has_volatile_ops (stmt))
-return false;
-
-
-  tree src2 = NULL_TREE, len2 = NULL_TREE;
-  poly_int64 offset, offset2;
-  tree val = integer_zero_node;
-  bool len_was_null = len == NULL_TREE;
-  if (len == NULL_TREE)
-len = (TREE_CODE (src) == COMPONENT_REF
-  ? DECL_SIZE_UNIT (TREE_OPERAND (src, 1))
-  : TYPE_SIZE_UNIT (TREE_TYPE (src)));
-  if (len == NULL_TREE
-  || !poly_int_tree_p (len))
-return false;
-
-  ao_ref_init (&read, src);
-  tree vuse = gimple_vuse (stmt);
-  gimple *defstmt;
-  do {
-if (vuse == NULL || TREE_CODE (vuse) != SSA_NAME)
-  return false;
-defstmt = SSA_NAME_DEF_STMT (vuse);
-if (is_a <gphi *> (defstmt))
-  return false;
-
-/* If the len was null, then we can use TBAA. */
-if (stmt_may_clobber_ref_p_1 (defstmt, &read,
- /* tbaa_p = */ len_was_null))
-  break;
-vuse = gimple_vuse (defstmt);
-  } while (true);
-
-  if (gimple_store_p (defstmt)
-  && gimple_assign_single_p (defstmt)
-  && TREE_CODE (gimple_assign_rhs1 (defstmt)) == STRING_CST
-  && !gimple_clobber_p (defstmt))
-{
-  tree str = gimple_assign_rhs1 (defstmt);
-  src2 = gimple_assign_lhs (defstmt);
-  /* The string must contain all null char's for now.  */
-  for (int i = 0; i < TREE_STRING_LENGTH (str); i++)
-   {
- if (TREE_STRING_POINTER (str)[i] != 0)
-   {
- src2 = NULL_TREE;
- break;
-   }
-   }
-}
-  else if (gimple_store_p (defstmt)
-  && gimple_assign_single_p (defstmt)
-  && TREE_CODE (gimple_assign_rhs1 (defstmt)) == CONSTRUCTOR
-  && !gimple_clobber_p (defstmt))
-src2 = gimple_assign_lhs (defstmt);
-  else if (gimple_call_builtin_p (defstmt, BUILT_IN_MEMSET)
-  && TREE_CODE (gimple_call_arg (defstmt, 0)) == ADDR_EXPR
-  && TREE_CODE (gimple_call_arg (defstmt, 1)) == INTEGER_CST)
-{
-  src2 = TREE_OPERAND (gimple_call_arg (defstmt, 0), 0);
-  len2 = gimple_call_arg (defstmt, 2);
-  val = gimple_call_arg (defstmt, 1);
-  /* For non-0 val, we'd have to transform stmt from assignment
-into memset (only if dest is addressable).  */
-  if (!integer_zerop (val) && is_gimple_assign (stmt))
-   src2 = NULL_TREE;
-}
-
-  if (src2 == NULL_TREE)
-return false;
-
-  if (len2 == NULL_TREE)
-len2 = (TREE_CODE (src2) == COMPONENT_REF
-   ? DECL_SIZE_UNIT (TREE_OPERAND (src2, 1))
-   : TYPE_SIZE_UNIT (TREE_TYPE (src2)));
-  if (len2 == NULL_TREE
-  || !poly_int_tree_p (len2))
-return false;
-
-  src = get_addr_base_and_unit_offset (src, &offset);
-  src2 = get_addr_base_and_unit_offset (src2, &offset2);
-  if (src == NULL_TREE
-  || src2 == NULL_TREE
-  || maybe_lt (offset, offset2))
-return false;
-
-  if (!operand_equal_p (src, src2, 0))
-return false;
-
-  /* [ src + offset2, src + offset2 + len2 - 1 ] is set to val.
- Make sure that
- [ src + offset, src + offset + len - 1 ] is a subset of that.  */
-  if (maybe_gt (wi::to_poly_offset (len) + (offset - offset2),
-   wi::to_poly_offset (len2)))
-return false;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-{
-  fprintf (dump_file, "Simplified\n  ");
-  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
-  fprintf (dump_file, "aft

Re: [PATCH 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-14 Thread Dhruv Chawla

On 06/05/25 21:57, Richard Sandiford wrote:



Dhruv Chawla  writes:

This patch modifies the intrinsic expanders to expand svlsl and svlsr to
unpredicated forms when the predicate is a ptrue. It also folds the
following pattern:

   lsl <y>, <x>, <shift>
   lsr <z>, <x>, <shift>
   orr <r>, <y>, <z>

to:

   revb/h/w <r>, <x>

when the shift amount is equal to half the bitwidth of the <x>
register.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 

gcc/ChangeLog:

   * config/aarch64/aarch64-sve-builtins-base.cc
   (svlsl_impl::expand): Define.
   (svlsr_impl): New class.
   (svlsr_impl::fold): Define.
   (svlsr_impl::expand): Likewise.
   * config/aarch64/aarch64-sve.md
   (*v_rev): New pattern.
   (*v_revvnx8hi): Likewise.

gcc/testsuite/ChangeLog:

   * gcc.target/aarch64/sve/shift_rev_1.c: New test.
   * gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
---
  .../aarch64/aarch64-sve-builtins-base.cc  | 33 +++-
  gcc/config/aarch64/aarch64-sve.md | 49 +++
  .../gcc.target/aarch64/sve/shift_rev_1.c  | 83 +++
  .../gcc.target/aarch64/sve/shift_rev_2.c  | 63 ++
  4 files changed, 227 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 927c5bbae21..938d010e11b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2086,6 +2086,37 @@ public:
{
  return f.fold_const_binary (LSHIFT_EXPR);
}
+
+  rtx expand (function_expander &e) const override
+  {
+tree pred = TREE_OPERAND (e.call_expr, 3);
+tree shift = TREE_OPERAND (e.call_expr, 5);
+if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+ && uniform_integer_cst_p (shift))
+  return e.use_unpred_insn (e.direct_optab_handler (ashl_optab));
+return rtx_code_function::expand (e);
+  }
+};
+
+class svlsr_impl : public rtx_code_function
+{
+public:
+  CONSTEXPR svlsr_impl () : rtx_code_function (LSHIFTRT, LSHIFTRT) {}
+
+  gimple *fold (gimple_folder &f) const override
+  {
+return f.fold_const_binary (RSHIFT_EXPR);
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+tree pred = TREE_OPERAND (e.call_expr, 3);
+tree shift = TREE_OPERAND (e.call_expr, 5);
+if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+ && uniform_integer_cst_p (shift))
+  return e.use_unpred_insn (e.direct_optab_handler (lshr_optab));
+return rtx_code_function::expand (e);
+  }
  };

  class svmad_impl : public function_base
@@ -3572,7 +3603,7 @@ FUNCTION (svldnt1, svldnt1_impl,)
  FUNCTION (svlen, svlen_impl,)
  FUNCTION (svlsl, svlsl_impl,)
  FUNCTION (svlsl_wide, shift_wide, (ASHIFT, UNSPEC_ASHIFT_WIDE))
-FUNCTION (svlsr, rtx_code_function, (LSHIFTRT, LSHIFTRT))
+FUNCTION (svlsr, svlsr_impl, )
  FUNCTION (svlsr_wide, shift_wide, (LSHIFTRT, UNSPEC_LSHIFTRT_WIDE))
  FUNCTION (svmad, svmad_impl,)
  FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX,


I'm hoping that this won't be necessary after the changes I mentioned
in patch 1.  The expander should handle everything itself.


Hi,

Unfortunately this still turned out to be required - removing the changes to the expander would cause a 
call to @aarch64_pred_ which would bypass the whole 
v3 pattern.




diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 42802bac653..7cce18c024b 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3232,6 +3232,55 @@
  ;; - REVW
  ;; -

+(define_insn_and_split "*v_rev"
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w")
+ (rotate:SVE_FULL_HSDI
+   (match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
+   (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE"
+  "#"
+  "&& !reload_completed"


This is an ICE trap: a pattern that requires a split ("#") must have
an unconditional split.

Which makes this awkward...argh!

However, since this is intended to match a 3-instruction combination,
I think we can do without the define_insn and just use a define_split.
That only works if we emit at most 2 instructions, but that can be
done with a bit of work.

I've attached a patch below that does that.


+  [(set (match_dup 3)
+ (ashift:SVE_FULL_HSDI (match_dup 1)
+   (match_dup 2)))
+   (set (match_dup 0)
+ (plus:SVE_FULL_HSDI
+   (lshiftrt:SVE_FULL_HSDI (match_dup 1)
+   (match_dup 4))
+   (match_dup 3)))]


This is an SVE2 instruction, but the guard is only for TARGET_SVE.
For TARGET_SVE, we should FAIL i

[PATCH v3 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-14 Thread dhruvc
From: Dhruv Chawla 

This patch modifies the intrinsic expanders to expand svlsl and svlsr to
unpredicated forms when the predicate is a ptrue. It also folds the
following pattern:

  lsl <y>, <x>, <shift>
  lsr <z>, <x>, <shift>
  orr <r>, <y>, <z>

to:

  revb/h/w <r>, <x>

when the shift amount is equal to half the bitwidth of the <x>
register.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla 
Co-authored-by: Richard Sandiford 

gcc/ChangeLog:

* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if the
target already provided the result in the expected register.
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
Avoid forcing subregs into fresh registers unnecessarily.
* config/aarch64/aarch64-sve-builtins-base.cc
(svlsl_impl::expand): Define.
(svlsr_impl): New class.
(svlsr_impl::fold): Define.
(svlsr_impl::expand): Likewise.
* config/aarch64/aarch64-sve.md: Add define_split for rotate.
(*v_revvnx8hi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/shift_rev_1.c: New test.
* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 33 +++-
 gcc/config/aarch64/aarch64-sve.md | 55 
 gcc/config/aarch64/aarch64.cc | 10 ++-
 gcc/expmed.cc |  3 +-
 .../gcc.target/aarch64/sve/shift_rev_1.c  | 83 +++
 .../gcc.target/aarch64/sve/shift_rev_2.c  | 63 ++
 .../gcc.target/aarch64/sve/shift_rev_3.c  | 83 +++
 7 files changed, 326 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_3.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index b4396837c24..90dd5c97a10 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2086,6 +2086,37 @@ public:
   {
 return f.fold_const_binary (LSHIFT_EXPR);
   }
+
+  rtx expand (function_expander &e) const override
+  {
+tree pred = TREE_OPERAND (e.call_expr, 3);
+tree shift = TREE_OPERAND (e.call_expr, 5);
+if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+   && uniform_integer_cst_p (shift))
+  return e.use_unpred_insn (e.direct_optab_handler (ashl_optab));
+return rtx_code_function::expand (e);
+  }
+};
+
+class svlsr_impl : public rtx_code_function
+{
+public:
+  CONSTEXPR svlsr_impl () : rtx_code_function (LSHIFTRT, LSHIFTRT) {}
+
+  gimple *fold (gimple_folder &f) const override
+  {
+return f.fold_const_binary (RSHIFT_EXPR);
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+tree pred = TREE_OPERAND (e.call_expr, 3);
+tree shift = TREE_OPERAND (e.call_expr, 5);
+if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+   && uniform_integer_cst_p (shift))
+  return e.use_unpred_insn (e.direct_optab_handler (lshr_optab));
+return rtx_code_function::expand (e);
+  }
 };
 
 class svmad_impl : public function_base
@@ -3586,7 +3617,7 @@ FUNCTION (svldnt1, svldnt1_impl,)
 FUNCTION (svlen, svlen_impl,)
 FUNCTION (svlsl, svlsl_impl,)
 FUNCTION (svlsl_wide, shift_wide, (ASHIFT, UNSPEC_ASHIFT_WIDE))
-FUNCTION (svlsr, rtx_code_function, (LSHIFTRT, LSHIFTRT))
+FUNCTION (svlsr, svlsr_impl,)
 FUNCTION (svlsr_wide, shift_wide, (LSHIFTRT, UNSPEC_LSHIFTRT_WIDE))
 FUNCTION (svmad, svmad_impl,)
 FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX,
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index cb88d6d95a6..0156afc1e7d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3317,6 +3317,61 @@
 ;; - REVW
 ;; -
 
+(define_split
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand")
+   (rotate:SVE_FULL_HSDI
+ (match_operand:SVE_FULL_HSDI 1 "register_operand")
+ (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+   (ashift:SVE_FULL_HSDI (match_dup 1)
+ (match_dup 2)))
+   (set (match_dup 0)
+   (plus:SVE_FULL_HSDI
+ (lshiftrt:SVE_FULL_HSDI (match_dup 1)
+ (match_dup 4))
+ (match_dup 3)))]
+  {
+if (aarch64_emit_opt_vec_rotate (operands[0], operands[1], operands[2]))
+  DONE;
+
+if (!TARGET_SVE2)
+  FAIL;
+
+operands[3] = gen_reg_rtx (mode);
+HOST_WIDE_INT shift_amount =
+  INTVAL (unwrap_const_vec_duplicate (operands[2]));
+int bitwidth = GET_MODE_UNIT_BITSIZE (mode);
+

Re: [PATCH v2 2/8] RISC-V: Use riscv-ext.def to generate target options and variables

2025-05-14 Thread Kito Cheng
Hi Mark:

Thanks for your reminder. I got a few mails from the buildbot, but didn't
figure out how to regenerate that correctly yesterday, and I finally found
the right way to regenerate it... (Yeah, I had added an empty file manually
to make it buildable, since it doesn't actually introduce a new command
line option.)

https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683671.html

On Wed, May 14, 2025 at 6:58 PM Mark Wielaard  wrote:
>
> Hi Kito,
>
> On Mon, May 12, 2025 at 10:17:36PM +0800, Kito Cheng wrote:
> > Leverage the centralized riscv-ext.def definitions to auto-generate
> > the target option parsing and associated internal flags, replacing
> > manual listings in riscv.opt; `riscv_ext_flag_table` part will remove in
> > later patch.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/gen-riscv-ext-opt.cc: New.
> >   * config/riscv/riscv.opt: Drop manual entries for target
> >   options, and include riscv-ext.opt.
> >   * config/riscv/riscv-ext.opt: New.
> >   * config/riscv/riscv-ext.opt.urls: New.
> [...]
> >  gcc/config/riscv/riscv-ext.opt.urls   |   0
> >  gcc/config/riscv/riscv-opts.h |  12 +-
> >  gcc/config/riscv/riscv-vector-builtins.cc |  20 +-
> >  gcc/config/riscv/riscv.opt| 336 +-
> >  gcc/config/riscv/t-riscv  |  13 +
> >  10 files changed, 603 insertions(+), 406 deletions(-)
> >  create mode 100644 gcc/config/riscv/gen-riscv-ext-opt.cc
> >  create mode 100644 gcc/config/riscv/riscv-ext.opt
> >  create mode 100644 gcc/config/riscv/riscv-ext.opt.urls
> [...]
> > diff --git a/gcc/config/riscv/riscv-ext.opt.urls 
> > b/gcc/config/riscv/riscv-ext.opt.urls
> > new file mode 100644
> > index ..e69de29bb2d1
>
> This added an empty riscv-ext.opt.urls file.
> Which breaks the autoregen builder:
> https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen
>
> Because when regenerating this file with make regenerate-opt-urls it
> is still empty, but gets an header:
>
> diff --git a/gcc/config/riscv/riscv-ext.opt.urls 
> b/gcc/config/riscv/riscv-ext.opt.urls
> index e69de29bb2d..c4f471079df 100644
> --- a/gcc/config/riscv/riscv-ext.opt.urls
> +++ b/gcc/config/riscv/riscv-ext.opt.urls
> @@ -0,0 +1,2 @@
> +; Autogenerated by regenerate-opt-urls.py from 
> gcc/config/riscv/riscv-ext.opt and generated HTML
> +
>
> Could you see if you need to add this header or if the file should be
> regenerated differently so it isn't empty?
>
> Thanks,
>
> Mark


[committed] RISC-V: Regen riscv-ext.opt.urls

2025-05-14 Thread Kito Cheng
gcc/ChangeLog:

* config/riscv/riscv-ext.opt.urls: Regenerate.
---
 gcc/config/riscv/riscv-ext.opt.urls | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv-ext.opt.urls 
b/gcc/config/riscv/riscv-ext.opt.urls
index e69de29bb2d..c4f471079df 100644
--- a/gcc/config/riscv/riscv-ext.opt.urls
+++ b/gcc/config/riscv/riscv-ext.opt.urls
@@ -0,0 +1,2 @@
+; Autogenerated by regenerate-opt-urls.py from gcc/config/riscv/riscv-ext.opt 
and generated HTML
+
-- 
2.34.1



Re: [PATCH] libgcobol: Allow for lack of LOG_PERROR

2025-05-14 Thread James K. Lowden
On Mon, 12 May 2025 14:05:50 +0200
Rainer Orth  wrote:

> Before going further, I'd first like to understand why you chose to
> use syslog in a runtime lib.  While logging to syslog is certainly
> useful in daemons and such, a runtime lib is different IMO: while
> regular users can log to syslog, access to the log files is usually
> restricted to privileged users. 

Hi Rainer, 

Thank you for asking the question, and for providing the exact minimal
fix I would have suggested.  I support your effort to ensure the COBOL
FE compiles on Solaris, and I'm sorry I ran afoul of it.

I added syslog for error messages to libgcobol because it will be
required in production.  Daemon mode is normal for COBOL
programs, and interactive mode is rare.  There is no need for a 
daemonization process on a mainframe because running "headless" was
its modus operandi before it had a name.  

Most COBOL systems are launched by something like JCL, and monitored via
logs. They may be batch programs.  Or they may be online in the form of
e.g. CICS, which has as you might guess extensive logging.

The use of LOG_PERROR is a convenience for the gcobol user during
development and testing. It's my intention to make it an option,
something that I'll address when I overhaul options Real Soon Now. 

I don't intend an option to defeat logging.  If logging is defeated
during testing and enabled in production, surprises will be
unpleasant.  If logging is defeated in production, support will be much
more difficult.  If log messages are overwhelming or inconvenient,
that's what identification, facility, and level are for, in syslog
configuration.  

As a matter of GCC policy for libgcobol messages, I would suggest:

1.  Use LOG_PERROR if available, else #define LOG_PERROR 0.  We
can manage that in configure.ac.  For systems that don't support it,
document that it's not available.  If that presents a problem to
someone, the license lets them modify the code.  (As can we, of
course.)  

2.  If different syslog systems format their messages differently,
that's ok.  The messages produced by libgcobol will be in the format
familiar to the sysadmin.  

3.  Simplest is best.  As of now, there are 3 calls to syslog(3), with
no workarounds and no conditional code.  That serves all demonstrated
need.  By keeping it simple we make it reliable, which I'm sure we all
agree is a key feature of logging.  

Finally, I want to clarify for those following along at home why
libgcobol produces any message at all, ever, since the normal rule is
that libraries should be quiet.  

ISO COBOL defines Exception Conditions that may be enabled by the
program and Error Status on files.  Some are defined as fatal, meaning
if they aren't handled by the program they lead to process
termination.  In libgcobol, any fatal condition -- enabled explicitly
or implicitly -- that is not handled by the program is logged and leads
to a call to abort(3).  In addition, any EC explicitly enabled and not
explicitly handled (fatal or not) is logged.  

By default, no EC is enabled.  The theory is that if the program
enables an EC and doesn't handle it, that represents a logic error.
The message can be quelled by disabling the EC or handling it, either
way.  

I hope that answers your question.  It's not the only way in which
COBOL is different from C, believe me.  Welcome to my world.   

--jkl


Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-14 Thread Richard Sandiford
Kugan Vivekanandarajah  writes:
> Adding Eugene and Andi to CC as Sam suggested.
>
>> On 13 May 2025, at 12:57 am, Richard Sandiford  
>> wrote:
>>
>>
>>
>> Kugan Vivekanandarajah  writes:
>>> diff --git a/configure.ac b/configure.ac
>>> index 730db3c1402..701284e38f2 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -621,6 +621,14 @@ case "${target}" in
>>> ;;
>>> esac
>>>
>>> +autofdo_target="i386"
>>> +case "${target}" in
>>> +  aarch64-*-*)
>>> +autofdo_target="aarch64"
>>> +;;
>>> +esac
>>> +AC_SUBST(autofdo_target)
>>> +
>>> # Disable libssp for some systems.
>>> case "${target}" in
>>>   avr-*-*)
>>
>> Couldn't we use the existing $cpu_type, rather than adding a new variable?
>> I don't think the two would ever need to diverge.
>
> I tried doing this but looks to me that $cpu_type is available only in 
> libgcc. Am I missing something  or do you want me to replicate that here?

What I meant was that we could simply add:

AC_SUBST(cpu_type)

rather than add a new variable and case statement.

Thanks,
Richard


Re: [14.x PATCH] c: Allow bool and enum null pointer constants [PR112556]

2025-05-14 Thread Sam James
Joseph Myers  writes:

> On Wed, 14 May 2025, Sam James wrote:
>
>> > (cherry picked from commit 3d525fce70fa0ffa0b22af6e213643e1ceca5ab5)
>> > ---
>> > As discussed on the PR, I feel like this is worth having for 14 as we're
>> > asking upstreams to try reproduce issues w/ -std=gnu23 (or -std=c23) if
>> > they don't have access to GCC 15, and this bug may lead to them being
>> > confused.
>> >
>> > Regtested on x86_64-pc-linux-gnu with no regressions.
>> >
>> > OK?
>> 
>> Ping, as 14 RC is tomorrow.
>
> Backporting this to GCC 14 is OK.

Thanks.


Re: [PATCH v2] c++, coroutines: Fix handling of early exceptions [PR113773].

2025-05-14 Thread Jason Merrill

On 5/14/25 2:10 PM, Iain Sandoe wrote:

that indicates we have not yet reached the ramp return.



This flag was not part of the fix on trunk, and could use more rationale.


The original fix was OK on trunk because exceptions thrown from the
return expression would happen before the initial suspend.  Having fixed
BZ199916 (which restores the state as per GCC-14), such throws would
become indistinguishable.  Unfortunately, on many OSes the fact that the
frame is destroyed does not become immediately obvious, and testcases
pass.  This is why I included SPARCv9 Solaris in the testing - there the
frame destruction does cause a fail.  So, for an implementation where
the return expr. can throw after the frame is destroyed, the additional
flag is needed (I amended the patch description with an abbreviated
comment).


+  gate = build2 (TRUTH_AND_EXPR, boolean_type_node, gate,
+coro_before_return);



Doesn't the order of operands to the && need to be the other way around,
to avoid checking iarc_x after the coro state has been destroyed?


Thanks for catching that, I need to check this on the trunk patch too.

OK now (after a retest)?


OK.


thanks
Iain

--- 8< ---

This is a GCC-14 version of the same strategy as used on trunk, but
with the more wide-ranging code cleanups elided.  Since the return
expression could throw after the frame has been destroyed, we must
also account for that, in addition to whether the initial await_resume
has been called.

PR c++/113773

gcc/cp/ChangeLog:

* coroutines.cc (coro_rewrite_function_body): Do not set
initial_await_resume_called here.
(morph_fn_to_coro): Set it here, and introduce a new flag
that indicates we have not yet reached the ramp return.
Gate the EH cleanups on both of these flags.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr113773.C: New test.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc  | 45 ++---
  .../g++.dg/coroutines/torture/pr113773.C  | 66 +++
  2 files changed, 102 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/pr113773.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 8811d249c02..d96176973ec 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4460,7 +4460,7 @@ coro_rewrite_function_body (location_t fn_start, tree 
fnbody, tree orig,
tree i_a_r_c
= coro_build_artificial_var (fn_start, coro_frame_i_a_r_c_id,
 boolean_type_node, orig,
-boolean_false_node);
+NULL_TREE);
DECL_CHAIN (i_a_r_c) = var_list;
var_list = i_a_r_c;
add_decl_expr (i_a_r_c);
@@ -4779,10 +4779,15 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
tree coro_gro_live
  = coro_build_artificial_var (fn_start, "_Coro_gro_live",
 boolean_type_node, orig, boolean_false_node);
-
DECL_CHAIN (coro_gro_live) = varlist;
varlist = coro_gro_live;
  
+  tree coro_before_return

+= coro_build_artificial_var (fn_start, "_Coro_before_return",
+boolean_type_node, orig, boolean_true_node);
+  DECL_CHAIN (coro_before_return) = varlist;
+  varlist = coro_before_return;
+
/* Collected the scope vars we need ... only one for now. */
BIND_EXPR_VARS (ramp_bind) = nreverse (varlist);
  
@@ -4811,6 +4816,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)

}
add_decl_expr (coro_promise_live);
add_decl_expr (coro_gro_live);
+  add_decl_expr (coro_before_return);
  
/* The CO_FRAME internal function is a mechanism to allow the middle end

   to adjust the allocation in response to optimizations.  We provide the
@@ -4964,8 +4970,10 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
  
tree allocated = build1 (CONVERT_EXPR, coro_frame_ptr, new_fn);

tree r = cp_build_init_expr (coro_fp, allocated);
-  r = coro_build_cvt_void_expr_stmt (r, fn_start);
-  add_stmt (r);
+  finish_expr_stmt (r);
+
+  /* deref the frame pointer, to use in member access code.  */
+  tree deref_fp = build_x_arrow (fn_start, coro_fp, tf_warning_or_error);
  
/* If the user provided a method to return an object on alloc fail, then

   check the returned pointer and call the func if it's null.
@@ -5001,16 +5009,22 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
   destruction in the case that promise or g.r.o setup fails or an exception
   is thrown from the initial suspend expression.  */
tree ramp_cleanup = NULL_TREE;
+  tree iarc_x = NULL_TREE;
if (flag_exceptions)
  {
+  iarc_x = lookup_member (coro_frame_type, coro_frame_i_a_r_c_id,
+/*protect=*/1, /*want_type=*/0, tf_warning_or_error);
+  iarc_x
+   = build_class_member_access_expr (deref

Re: [PATCH v2] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-05-14 Thread Jonathan Wakely
On Wed, 14 May 2025 at 20:50, Iain Sandoe  wrote:
>
>
>
> > On 14 May 2025, at 18:42, Rainer Orth  wrote:
> >
> > Hi Jonathan,
> >
> >> On 14/05/25 10:01 +0200, Tomasz Kamiński wrote:
> >>> This commits adjust the way how the arguments are stored in the _Arg_value
> >>> (and thus basic_format_args), by preserving the types of fixed width
> >>> floating-point types, that were previously converted to float, double,
> >>> long double.
> >>>
> >>> The _Arg_value union now contains alternatives with std::bfloat16_t,
> >>> std::float16_t, std::float32_t, std::float64_t that use pre-existing
> >>> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f32 argument types.
> >>>
> >>> This does not affect formatting, as specialization of formatters for fixed
> >>> width floating-point types formats them by casting to the corresponding
> >>> standard floating point type.
> >>>
> >>> For the 128-bit floating-point types we need to handle the ppc64 architecture,
> >>> (_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT) for which the long double may (per TU
> >>> basis) designate either __ibm128 and __ieee128 type, we need to store both
> >>> types in the _Arg_value and have two _Arg_types (_Arg_ibm128, 
> >>> _Arg_ieee128).
> >>> On other architectures we use extra enumerator value to store __float128,
> >>> that is different from long double and _Float128. This is consistent with 
> >>> ppc64,
> >>> for which __float128, if present, is same type as __ieee128. We use
> >> _Arg_float128
> >>> _M_float128 names that deviate from _Arg_fN naming scheme, to emphasize 
> >>> that
> >>> this flag is not used for std::float128_t (_Float128) type, that is 
> >>> consistently
> >>> formatted via handle.
> >>>
> >>> The __format::__float128_t type is renamed to __format::__flt128_t, to 
> >>> mitigate
> >>> visual confusion between this type and __float128. We also introduce 
> >>> __bflt16_t
> >>> typedef instead of using of decltype.
> >>>
> >>> We add new alternative for the _Arg_value and allow them to be accessed
> >> via _S_get,
> >>> when the types are available. However, we produce and handle corresponding
> >> _Arg_type,
> >>> only when we can format them. See also r14-3329-g27d0cfcb2b33de.
> >>>
> >>> The formatter<_Float128, _CharT> that formats via __format::__flt128_t is 
> >>> always
> >>> provided, when type is available. It is still correct when 
> >>> __format::__flt128_t
> >>> is _Float128.
> >>>
> >>> We also provide formatter<__float128, _CharT> that formats via __flt128_t.
> >>> As this type may be disabled (-mno-float128), extra care needs to be 
> >>> taken,
> >>> for situation when __float128 is same as long double. If the formatter 
> >>> would be
> >>> defined in such case, the formatter would be 
> >>> generated
> >>> from different specializations, and have different mangling:
> >>> * formatter<__float128, _CharT> if __float128 is present,
> >>> * formatter<__format::__formattable_float, _CharT> otherwise.
> >>> To the best of my knowledge this happens only on ppc64 for __ieee128 and 
> >>> __float128,
> >>> so the formatter is not defined in this case. static_assert is added to 
> >>> detect
> >>> other configurations like that. In such case we should replace it with
> >> constraint.
> >>>
> >>> PR libstdc++/119246
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>> * include/std/format (__format::__bflt16_t): Define.
> >>> (_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128
> >>> is used.
> >>> (__format::__float128_t): Renamed to __format::__flt128_t.
> >>> (std::formatter<_Float128, _CharT>): Define always if there is
> >>> formattable 128bit float.
> >>> (std::formatter<__float128, _CharT>): Define.
> >>> (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
> >>> (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
> >>> (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
> >>> (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
> >>> (_Arg_value::_M_ieee128, _Arg_value::_M_float128)
> >>> (_Arg_value::_M_bf16, _Arg_value::_M_f16, _Arg_value::_M_f32)
> >>>  _Arg_value::_M_f64): Define.
> >>> (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle __bflt16,
> >>> _Float16, _Float32, _Float64, and __float128 types.
> >>> (basic_format_arg::_S_to_arg_type): Preserve _bflt16, _Float16,
> >>> _Float32, _Float64 and __float128 types.
> >>> (basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
> >>> _Arg_b16, _Arg_f16, _Arg_f32, _Arg_f64.
> >>> * testsuite/std/format/arguments/args.cc: Updated to illustrate
> >>> that extended floating point types use handles now. Added test
> >>> for __float128.
> >>> * testsuite/std/format/parse_ctx.cc: Extended test to cover class
> >>> to check_dynamic_spec with floating point types and handles.
> >>> ---
> >>> I believe I have fixed all the typos. OK for trunk?
> >>
> >>
> >> OK, thanks
> >
> > this patch broke Solaris bootstrap, both i386-pc-solaris2.11 a

[Patch] OpenMP/Fortran: Fix allocatable-component mapping of derived-type array comps

2025-05-14 Thread Tobias Burnus

The testcase was found when looking at mapping failures with
SPEC HPC's 619.clvleaf_s; however, the variant fixed by the
attached patch only showed up when experimenting, not
in the SPEC testcase itself.
Before the included fix, the to-be-added testcase failed with
an ICE.

I intend to commit the attached patch tomorrow,
unless there are comments or suggestions.

Thanks,

Tobias
OpenMP/Fortran: Fix allocatable-component mapping of derived-type array comps

The check whether the location expression in the map clause has allocatable
components was failing for some derived-type array expressions such as
  map(var%tiles(1))
as the compiler produced
  _4 = var.tiles;
  MEMREF(_4, _5);
This commit now also handles this case.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_omp_deep_mapping_do): Handle SSA_NAME if
	a def_stmt is available.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/alloc-comp-4.f90: New test.

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 0b8150fb977..2a48d4af527 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2478,6 +2478,26 @@ gfc_omp_deep_mapping_do (bool is_cnt, const gimple *ctx, tree clause,
   else
 while (TREE_CODE (tmp) == COMPONENT_REF || TREE_CODE (tmp) == ARRAY_REF)
   tmp = TREE_OPERAND (tmp, TREE_CODE (tmp) == COMPONENT_REF ? 1 : 0);
+  if (TREE_CODE (tmp) == MEM_REF)
+tmp = TREE_OPERAND (tmp, 0);
+  if (TREE_CODE (tmp) == SSA_NAME)
+{
+  gimple *def_stmt = SSA_NAME_DEF_STMT (tmp);
+  if (gimple_code (def_stmt) == GIMPLE_ASSIGN)
+	{
+	  tmp = gimple_assign_rhs1 (def_stmt);
+	  if (poly)
+	{
+	  tmp = TYPE_FIELDS (type);
+	  type = TREE_TYPE (tmp);
+	}
+	  else
+	while (TREE_CODE (tmp) == COMPONENT_REF
+		   || TREE_CODE (tmp) == ARRAY_REF)
+	  tmp = TREE_OPERAND (tmp,
+  TREE_CODE (tmp) == COMPONENT_REF ? 1 : 0);
+	}
+}
   /* If the clause argument is nonallocatable, skip is-allocate check. */
   if (GFC_DECL_GET_SCALAR_ALLOCATABLE (tmp)
   || GFC_DECL_GET_SCALAR_POINTER (tmp)
diff --git a/libgomp/testsuite/libgomp.fortran/alloc-comp-4.f90 b/libgomp/testsuite/libgomp.fortran/alloc-comp-4.f90
new file mode 100644
index 000..d5e982ba1a8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/alloc-comp-4.f90
@@ -0,0 +1,75 @@
+!
+! Check that mapping with map(var%tiles(1)) works.
+!
+! This uses deep mapping to handle the allocatable
+! derived-type components
+!
+! The tricky part is that GCC generates intermittently
+! an SSA_NAME that needs to be resolved.
+!
+module m
+type t
+ integer, allocatable :: den1(:,:), den2(:,:)
+end type t
+
+type t2
+ type(t), allocatable :: tiles(:)
+end type t2
+end
+
+use m
+use iso_c_binding
+implicit none (type, external)
+type(t2), target :: var
+logical :: is_self_map
+type(C_ptr) :: pden1, pden2, ptiles, ptiles1
+
+allocate(var%tiles(1))
+var%tiles(1)%den1 = reshape([1,2,3,4],[2,2])
+var%tiles(1)%den2 = reshape([11,22,33,44],[2,2])
+
+ptiles = c_loc(var%tiles)
+ptiles1 = c_loc(var%tiles(1))
+pden1 = c_loc(var%tiles(1)%den1)
+pden2 = c_loc(var%tiles(1)%den2)
+
+
+is_self_map = .false.
+!$omp target map(to: is_self_map)
+  is_self_map = .true.
+!$omp end target
+
+!$omp target enter data map(var%tiles(1))
+
+!$omp target firstprivate(ptiles, ptiles1, pden1, pden2)
+ if (any (var%tiles(1)%den1 /= reshape([1,2,3,4],[2,2]))) stop 1
+ if (any (var%tiles(1)%den2 /= reshape([11,22,33,44],[2,2]))) stop 2
+ var%tiles(1)%den1 = var%tiles(1)%den1 + 5
+ var%tiles(1)%den2 = var%tiles(1)%den2 + 7
+
+ if (is_self_map) then
+   if (.not. c_associated (ptiles, c_loc(var%tiles))) stop 3
+   if (.not. c_associated (ptiles1, c_loc(var%tiles(1)))) stop 4
+   if (.not. c_associated (pden1, c_loc(var%tiles(1)%den1))) stop 5
+   if (.not. c_associated (pden2, c_loc(var%tiles(1)%den2))) stop 6
+ else
+   if (c_associated (ptiles, c_loc(var%tiles))) stop 3
+   if (c_associated (ptiles1, c_loc(var%tiles(1)))) stop 4
+   if (c_associated (pden1, c_loc(var%tiles(1)%den1))) stop 5
+   if (c_associated (pden2, c_loc(var%tiles(1)%den2))) stop 6
+ endif
+!$omp end target
+
+if (is_self_map) then
+  if (any (var%tiles(1)%den1 /= 5 + reshape([1,2,3,4],[2,2]))) stop 7
+  if (any (var%tiles(1)%den2 /= 7 + reshape([11,22,33,44],[2,2]))) stop 8
+else
+  if (any (var%tiles(1)%den1 /= reshape([1,2,3,4],[2,2]))) stop 7
+  if (any (var%tiles(1)%den2 /= reshape([11,22,33,44],[2,2]))) stop 8
+endif
+
+!$omp target exit data map(var%tiles(1))
+
+if (any (var%tiles(1)%den1 /= 5 + reshape([1,2,3,4],[2,2]))) stop 7
+if (any (var%tiles(1)%den2 /= 7 + reshape([11,22,33,44],[2,2]))) stop 8
+end


Re: [PATCH] c++: unifying specializations of non-primary tmpls [PR120161]

2025-05-14 Thread Jason Merrill

On 5/14/25 2:44 PM, Patrick Palka wrote:

On Wed, 14 May 2025, Patrick Palka wrote:


On Wed, 14 May 2025, Jason Merrill wrote:


On 5/12/25 7:53 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look OK
for trunk/15/14?

-- >8 --

Here unification of P=Wrap<int>::type, A=Wrap<long>::type wrongly
succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
no longer recurse into template arguments for non-primary templates
(since they're a non-deduced context) and so the int/long mismatch that
makes the two types distinct goes unnoticed.

In the case of (comparing specializations of) a non-primary template,
unify should still go on to compare the types directly before returning
success.


Should the PRIMARY_TEMPLATE_P check instead move up to join the
CLASSTYPE_TEMPLATE_INFO check?  try_class_deduction also doesn't seem
applicable to non-primary templates.


I don't think that'd work, for either the CLASSTYPE_TEMPLATE_INFO (parm) check
or the earlier CLASSTYPE_TEMPLATE_INFO (arg) check.

While try_class_deduction directly doesn't apply to non-primary templates,
get_template_base still might, so if we move up the PRIMARY_TEMPLATE_P to join
the C_T_I (parm) check, then we wouldn't try get_template_base anymore which
would  break e.g.

 template<class T> struct B { };

 template<class T>
 struct A {
   struct C : B<T> { };
 };

 template<class T> void f(B<T>*);

 int main() {
   A<int>::C c;
   f(&c);
 }

If we move the PRIMARY_TEMPLATE_P check up to the C_T_I (arg) check, then
that'd mean we still don't check same_type_p on the two types in the
non-primary case, which seems wrong (although it'd fix the PR thanks to the
parm == arg early exit in unify).


FWIW it seems part of the weird/subtle logic here is due to the fact
that when unifying e.g. P=C with A=C, we do it twice, first via
try_class_deduction using a copy of 'targs', and if that succeeds we do
it again with the real 'targs'.  I think the logic could simultaneously
be simplified and made memory efficient if we made it so that if the
trial unification from try_class_deduction succeeds we just use its
'targs' instead of having to repeat the unification.


Hmm, good point, though I don't see what you mean by "a copy", it looks 
to me like we do it twice with the real 'targs'.  Seems like we should 
move try_class_unification out of the UNIFY_ALLOW_DERIVED block and 
remove the unify that your previous patch conditionalized.


Jason



Re: [PATCH] gcc: add trigonometric pi-based functions as gcc builtins

2025-05-14 Thread Joseph Myers
On Wed, 14 May 2025, Yuao Ma wrote:

> Hi Joseph,
> 
> I have updated the patch based on your review comments. I added the 
> newly introduced builtin to extend.texi and mentioned the PR in the 
> commit message. Could you please take another look when you have a 
> moment?

This version is OK in the absence of objections within 48 hours.

-- 
Joseph S. Myers
josmy...@redhat.com



[wwwdocs] Remove claims that the release timeline shows future releases

2025-05-14 Thread Jonathan Wakely
The timeline hasn't shown any tentative dates for future releases since
2006-03-06 when GCC 3.4.6 was released and the tentative date got
replaced with the real date. Since then only actual release dates have
been added, on the day when the release happens.
---

OK for wwwdocs?

 htdocs/develop.html  | 2 +-
 htdocs/releases.html | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/htdocs/develop.html b/htdocs/develop.html
index d0ae36bd..f4625817 100644
--- a/htdocs/develop.html
+++ b/htdocs/develop.html
@@ -298,7 +298,7 @@ number carried little to no useful information.
 
 Release Timeline
 
-Here is a history of recent and a tentative timeline of upcoming
+Here is a timeline of historical
 stages of development, branch points, and releases:
 
 
diff --git a/htdocs/releases.html b/htdocs/releases.html
index 56f73ca5..13a086c5 100644
--- a/htdocs/releases.html
+++ b/htdocs/releases.html
@@ -28,8 +28,8 @@ binaries. for various platforms.
 GCC Timeline
 
 The table is sorted by date.  Please refer to our
-development plan for future
-releases and an alternative view of the release history.
+development plan for
+an alternative view of the release history.
 
 
 ReleaseRelease date
-- 
2.49.0



Re: [PATCH] c++: Add testcase for issue fixed in GCC 15 [PR120126]

2025-05-14 Thread Jason Merrill

On 5/14/25 3:43 AM, Simon Martin wrote:

Patrick noticed that this PR's testcase has been fixed by the patch for
PR c++/114292 (r15-7238-gceabea405ffdc8), more specifically the part
that walks the type of DECL_EXPR DECLs.

This simply adds the case to the testsuite.


OK.


Successfully tested on x86_64-pc-linux-gnu.

PR c++/120126

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-ice33.C: New test.

---
  gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice33.C | 12 
  1 file changed, 12 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice33.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice33.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice33.C
new file mode 100644
index 000..85642863530
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice33.C
@@ -0,0 +1,12 @@
+// PR c++/120126
+// { dg-do compile { target c++11 } }
+
+template 
+int sum(Args... args) {
+  return [args...] { // { dg-error "parameter packs not expanded with" }
+typename decltype(args)::type temp;
+  };
+}
+int main() {
+  sum(1, 10);
+}




[Patch] OpenMP: Fix mapping of zero-sized arrays with non-literal size: map(var[:n]), n = 0

2025-05-14 Thread Tobias Burnus

[I intend to commit this patch later today or probably tomorrow,
unless there are comments, questions or concerns.]

The issue showed up for SPEC HPC's 634.hpgmgfv_s benchmark, but only when
running with multiple MPI processes. The reason is that in that case, the
work is distributed over multiple processes and for some, the code

#pragma omp target enter data 
map(to:level->vectors[i][:num_my_boxes*box_volume])

happens to have the value num_my_boxes == 0.

While for map(ptr[:0]) the pointer attach was permitted to fail,
and for map(ptr[:5]) both the mapping and the pointer attach happened,
it failed for map(ptr[:n]) when n == 0 at runtime.

In this case, it is simple to check whether a previous item - e.g.
the one just before - is now a zero-sized item (which has an extra
map type - updated for non-constant values at runtime).
However, with 'target enter data' items get split - and while that
tries to keep items together, for more complex code like in the
second test case, only the ATTACH remains. The third testcase is
even weirder as the preceding item is unrelated to the attach
and just happens to sit there - as with struct mappings, the
attaches are all clustered at the end.

An example is code like 'map(var->array[i][:n])' and possibly
combined with mapping multiple members of the same 'var',
some with n > 0 and others with n == 0.


Solution: For 'target' an attempt is made to check whether this
is just a broken attachment or valid. But for 'target data',
the code now assumes that not finding a pointer target is
fine as it comes from some zero-sized attachment.

Thus, while losing some diagnostic checks, it at least has no
false positives for valid real-world code.

Any comment before I apply the patch?

Tobias

PS: I excluded OpenACC - but I think it will have similar issues.
However, I have not tried to identify them.
OpenMP: Fix mapping of zero-sized arrays with non-literal size: map(var[:n]), n = 0

For map(ptr[:0]), the used map kind is GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION
and it is permitted that 'ptr' does not exist. 'ptr' is set to the device
pointee if it exists or to the host value otherwise.

For map(ptr[:3]), the variable is first mapped and then ptr is updated to point
to the just-mapped device data; the attachment uses GOMP_MAP_ATTACH.

For map(ptr[:n]), GCC always generates a GOMP_MAP_ATTACH, but when n == 0, it
was failing with:
   "pointer target not mapped for attach"

The solution is not to fail but first to check whether it was mapped before.
It turned out that for the mapping part, GCC adds a run-time check whether
n == 0 - and uses GOMP_MAP_ZERO_LEN_ARRAY_SECTION for the mapping.
Thus, we just have to check whether there is such a mapping for the address
for which the GOMP_MAP_ATTACH was requested.  And, if there was, the
error diagnostic can be skipped.

Unsurprisingly, this issue occurs in real-world code; it was detected in
a code that distributes work via MPI and for some processes, some bounds
ended up being zero.

libgomp/ChangeLog:

	* target.c (gomp_attach_pointer): Return bool; accept additional
	bool to optionally silence the fatal pointee-not-found error.
	(gomp_map_vars_internal): If the pointee could not be found,
	check whether it was mapped as GOMP_MAP_ZERO_LEN_ARRAY_SECTION.
	* libgomp.h (gomp_attach_pointer): Update prototype.
	* oacc-mem.c (acc_attach_async, goacc_enter_data_internal): Update
	calls.
	* testsuite/libgomp.c/target-map-zero-sized.c: New test.
	* testsuite/libgomp.c/target-map-zero-sized-2.c: New test.
	* testsuite/libgomp.c/target-map-zero-sized-3.c: New test.

 libgomp/libgomp.h  |   4 +-
 libgomp/oacc-mem.c |   6 +-
 libgomp/target.c   |  64 +---
 .../testsuite/libgomp.c/target-map-zero-sized-2.c  |  74 ++
 .../testsuite/libgomp.c/target-map-zero-sized-3.c  |  49 ++
 .../testsuite/libgomp.c/target-map-zero-sized.c| 107 +
 6 files changed, 288 insertions(+), 16 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d97768f5125..6030f9d0a2c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1468,10 +1468,10 @@ extern void gomp_copy_dev2host (struct gomp_device_descr *,
 struct goacc_asyncqueue *, void *, const void *,
 size_t);
 extern uintptr_t gomp_map_val (struct target_mem_desc *, void **, size_t);
-extern void gomp_attach_pointer (struct gomp_device_descr *,
+extern bool gomp_attach_pointer (struct gomp_device_descr *,
  struct goacc_asyncqueue *, splay_tree,
  splay_tree_key, uintptr_t, size_t,
- struct gomp_coalesce_buf *, bool);
+ struct gomp_coalesce_buf *, bool, bool);
 extern void gomp_detach_pointer (struct gomp_device_descr *,
  struct goacc_asyncqueue *, splay_tree_key,
  uintptr_t, bool, struct gomp_coalesce_buf *);
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 718252b44ba..0482ed37d95 100644
--- a/libgomp/oacc-mem.c
+++ b

Re: [PATCH][GCC15/14/13/12] dwarf2out: Propagate dtprel into the .debug_addr table in resolve_addr_in_expr

2025-05-14 Thread Richard Biener
On Wed, May 14, 2025 at 5:25 AM Kyle Huey  wrote:
>
> For a debugger to display statically-allocated[0] TLS variables the compiler
> must communicate information[1] that can be used in conjunction with knowledge
> of the runtime environment[2] to calculate a location for the variable for
> each thread. That need gives rise to dw_loc_dtprel in dwarf2out, a flag 
> tracking
> whether the location description is dtprel, or relative to the
> "dynamic thread pointer". Location descriptions in the .debug_info section for
> TLS variables need to be relocated by the static linker accordingly, and
> dw_loc_dtprel controls emission of the needed relocations.
>
> This is further complicated by -gsplit-dwarf. -gsplit-dwarf is designed to 
> allow
> as much debugging information as possible to bypass the static linker to 
> improve
> linking performance. One of the ways that is done is by introducing a layer of
> indirection for relocatable values[3]. That gives rise to addr_index_table 
> which
> ultimately results in the .debug_addr section.
>
> While the code handling addr_index_table clearly contemplates the existence of
> dtprel entries[4] resolve_addr_in_expr does not, and the result is that when
> using -gsplit-dwarf the DWARF for TLS variables contains an address[5] rather
> than an offset, and debuggers can't work with that.
>
> This is visible on a trivial example. Compile
>
> ```
> static __thread int tls_var;
>
> int main(void) {
>   tls_var = 42;
>   return 0;
> }
> ```
>
> with -g and -g -gsplit-dwarf. Run the program under gdb. When examining the
> value of tls_var before and after the assignment, -g behaves as one would
> expect but -g -gsplit-dwarf does not. If the user is lucky and the 
> miscalculated
> address is not mapped, gdb will print "Cannot access memory at address ...".
> If the user is unlucky and the miscalculated address is mapped, gdb will 
> simply
> give the wrong value. You can further confirm that the issue is the address
> calculation by asking gdb for the address of tls_var and comparing that to 
> what
> one would expect.[6]
>
> Thankfully this is trivial to fix by modifying resolve_addr_in_expr to 
> propagate
> the dtprel character of the location where necessary. gdb begins working as
> expected and the diff in the generated assembly is clear.
>
> ```
> .section.debug_addr,"",@progbits
> .long   0x14
> .value  0x5
> .byte   0x8
> .byte   0
>  .Ldebug_addr0:
> -   .quad   tls_var
> +   .long   tls_var@dtpoff, 0
> .quad   .LFB0
> ```
>
> [0] Referring to e.g. __thread as statically-allocated vs. e.g. a
> dynamically-allocated pthread_key_create() call.
> [1] Generally an offset in a TLS block.
> [2] With glibc, provided by libthread_db.so.
> [3] Relocatable values are moved to a table in the .debug_addr section, those
> values in .debug_info are replaced with special values that look up 
> indexes
> in that table, and then the static linker elsewhere assigns a single 
> per-CU
> starting index in the .debug_addr section, allowing those special values 
> to
> remain permanently fixed and the resulting data to be ignored by the 
> linker.
> [4] ate_kind_rtx_dtprel exists, after all, and new_addr_loc_descr does produce
> it where appropriate.
> [5] e.g. an address in the .tbss/.tdata section.
> [6] e.g. on x86-64 by examining %fsbase and the offset in the assembly

I have bootstrapped/tested this on x86_64-unknown-linux-gnu on the
15 and 14 branches and pushed there.

Richard.

> 2025-05-01  Kyle Huey  
>
> * dwarf2out.cc (resolve_addr_in_expr): Propagate dtprel into the 
> address
> table when appropriate.
> ---
>  gcc/dwarf2out.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index 69e9d775d0d..2437610d48d 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -31068,7 +31068,8 @@ resolve_addr_in_expr (dw_attr_node *a, 
> dw_loc_descr_ref loc)
>return false;
>  remove_addr_table_entry (loc->dw_loc_oprnd1.val_entry);
> loc->dw_loc_oprnd1.val_entry
> - = add_addr_table_entry (rtl, ate_kind_rtx);
> + = add_addr_table_entry (rtl, loc->dtprel
> + ? ate_kind_rtx_dtprel : ate_kind_rtx);
>}
> break;
>case DW_OP_const4u:
> --
> 2.43.0
>


Re: [PATCH] libgcobol: Add multilib support

2025-05-14 Thread James K. Lowden
On Wed, 14 May 2025 11:04:50 +0200
Rainer Orth  wrote:

> Work around what appears to be a GNU make bug handling MAKEFLAGS

Before I say Yes, could someone please tell me why this rumored bug is
responsible for so much boilerplate in our Makefile.am files?  You
say, 

> Unlike some runtime libs that can get away without setting
> AM_MAKEFLAGS and friends, libgcobol can not since it then tries to
> link the 64-bit libgcobol with 32-bit libstdc++.

but I don't see the connection between that and 20 lines of definition
resting on "what appears to be a bug".  

I guess I can live with "no one knows, that's what we do."  But I'm
sure I'm not alone in preferring to understand how the build builds.  

--jkl


Re: [PATCH] libstdc++: Make debug iterator pointer sequence const [PR116369]

2025-05-14 Thread François Dumont

On 12/05/2025 23:03, Jonathan Wakely wrote:

On 31/03/25 22:20 +0200, François Dumont wrote:

Hi

Following this previous patch 
https://gcc.gnu.org/pipermail/libstdc++/2024-August/059418.html I've 
completed it for the _Safe_unordered_container_base type and 
implemented the rest of the change to store the safe iterator 
sequence as a pointer-to-const.


    libstdc++: Make debug iterator pointer sequence const [PR116369]

    In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug sequence
    has been made mutable to allow attaching iterators to const containers.
    This change completes that fix by also declaring the debug unordered
    container members mutable.

    Additionally the debug iterator sequence is now a pointer-to-const and
    so _Safe_sequence_base _M_attach and all other methods are const
    qualified.

    Symbol exports are maintained thanks to __asm directives.


I can't compile this, it seems to be missing changes to
safe_local_iterator.tcc:

In file included from 
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.h:444,
 from 
/home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:33:
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc: 
In member function ‘typename 
__gnu_debug::_Distance_traits<_Iterator>::__type 
__gnu_debug::_Safe_local_iterator<_Iterator, 
_Sequence>::_M_get_distance_to(const 
__gnu_debug::_Safe_local_iterator<_Iterator, _Sequence>&) const’:
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17: 
error: there are no arguments to ‘_M_get_sequence’ that depend on a 
template parameter, so a declaration of ‘_M_get_sequence’ must be 
available [-Wtemplate-body]

   47 | _M_get_sequence()->bucket_size(bucket()),
  | ^~~
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17: 
note: (if you use ‘-fpermissive’, G++ will accept your code, but 
allowing the use of an undeclared name is deprecated)
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:59:18: 
error: there are no arguments to ‘_M_get_sequence’ that depend on a 
template parameter, so a declaration of ‘_M_get_sequence’ must be 
available [-Wtemplate-body]

   59 | -_M_get_sequence()->bucket_size(bucket()),
  |  ^~~

Yes, sorry, I had already spotted this problem, but only updated the PR 
and did not re-send the patch here.






Also available as a PR

https://forge.sourceware.org/gcc/gcc-TEST/pulls/47

    /** Detach all singular iterators.
 *  @post for all iterators i attached to this sequence,
 *   i->_M_version == _M_version.
 */
    void
-    _M_detach_singular();
+    _M_detach_singular() const
+ __asm("_ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEv");


Does this work on all targets?


No idea ! I thought the symbol name used here just had to match the 
entries in config/abi/pre/gnu.ver.


It is what other usages of __asm in the lib are doing for the moment, in 
 header, without target considerations.




I think darwin uses __Z as the prefix
for mangled names.
It might be necessary to use a macro to do this, so that it
conditionally puts "_" before the name.

If it's the only "exotic" target I can indeed deal with it with a macro. 
Otherwise I might have to find another alternative.


Here is an updated version considering all your other remarks.

    libstdc++: Make debug iterator pointer sequence const [PR116369]

    In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug sequence
    has been made mutable to allow attaching iterators to const containers.
    This change completes that fix by also declaring the debug unordered
    container members mutable.

    Additionally the debug iterator sequence is now a pointer-to-const and
    so _Safe_sequence_base _M_attach and all other methods are const
    qualified.

    Symbol exports are maintained thanks to __asm directives.

    libstdc++-v3/ChangeLog:

    PR c++/116369
    * include/debug/safe_base.h
    (_Safe_iterator_base::_M_sequence): Declare as 
pointer-to-const.
    (_Safe_iterator_base::_M_attach, _M_attach_single): Take 
pointer-to-const
    _Safe_sequence_base. Add __asm directive to preserve name 
mangling.
    (_Safe_sequence_base::_M_detach_all, _M_detach_singular, 
_M_revalidate_singular)
    (_M_swap, _M_get_mutex, _M_attach, _M_attach_single, 
_M_detach, _M_detach_single):
    Add const qualifier and __asm directive to preserve name 
mangling.

    * include/debug/safe_unordered_base.h
    (_Safe_local_iterator_base::_M_safe_container): New.
(_Safe_local_iterator_base::_Safe_local_iterator_base): Take
    _Safe_unordered_container_base as pointer-to-const.
    (_Safe_unordered_container_base:

Re: [PATCH GCC-14.3] c++, coroutines: Fix handling of early exceptions [PR113773].

2025-05-14 Thread Jason Merrill

On 5/13/25 11:06 AM, Iain Sandoe wrote:

This could not be done as a cherry-pick from the trunk resolution.
Tested on x86_64-darwin, powerpc64le linux sparc9 solaris,
OK for 14.3 ?
thanks
Iain

--- 8< ---

This is a GCC-14 version of the same strategy as used on trunk, but
with the more wide-ranging code cleanups elided.

PR c++/113773

gcc/cp/ChangeLog:

* coroutines.cc (coro_rewrite_function_body): Do not set
initial_await_resume_called here.
(morph_fn_to_coro): Set it here, and introduce a new flag
that indicates we have not yet reached the ramp return.


This flag was not part of the fix on trunk, and could use more rationale.


+  gate = build2 (TRUTH_AND_EXPR, boolean_type_node, gate,
+coro_before_return);


Doesn't the order of operands to the && need to be the other way around, 
to avoid checking iarc_x after the coro state has been destroyed?


Jason



Re: [PATCH] gcc: add trigonometric pi-based functions as gcc builtins

2025-05-14 Thread Joseph Myers
On Wed, 14 May 2025, Yuao Ma wrote:

> Hi all,
> 
> This patch adds trigonometric pi-based functions as gcc builtins: acospi, 
> asinpi, atan2pi,
> atanpi, cospi, sinpi, and tanpi. Latest glibc already provides support for
> these functions, which we plan to leverage in future gfortran implementations.
> 
> The patch includes two test cases to verify both correct code generation and
> function definition.
> 
> If approved, I suggest committing this foundational change first. Constant
> folding for these builtins will be addressed in subsequent patches.

Note that either this change, or a subsequent one that makes the built-in 
functions do something useful, should also update extend.texi, "Library 
Builtins", to mention the new functions.  (The text there doesn't 
distinguish existing C23 built-in functions, such as exp10 or roundeven, 
from those that are pure extensions, but addressing that is independent of 
adding new functions to the list.  Also, I'm not sure these sentences with 
very long lists of functions are really the optimal way of presenting the 
information about such built-in functions; maybe Sandra has better ideas 
about how to document this, but again that's independent of adding new 
functions.)

The commit message should reference PR c/118592 (it's not a full fix, but 
it's partial progress towards the full set of built-in functions / 
constant folding).

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH 2/2][v2] Remove the mixed stmt_vec_info/SLP node record_stmt_cost overload

2025-05-14 Thread Richard Biener
The following changes the record_stmt_cost calls in
vectorizable_load/store to only pass the SLP node when costing
vector stmts.  For now we'll still pass the stmt_vec_info,
determined from SLP_TREE_REPRESENTATIVE, so this merely cleans up
the API.

v2 does away with the idea to use stmt_info and not slp_node for
scalar_{load,store,stmt} for now, it confuses the aarch64
hooks.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* tree-vectorizer.h (record_stmt_cost): Remove mixed
stmt_vec_info/SLP node inline overload.
* tree-vect-stmts.cc (vectorizable_store): For costing
vector stmts only pass SLP node to record_stmt_cost.
(vectorizable_load): Likewise.
---
 gcc/tree-vect-stmts.cc | 62 +-
 gcc/tree-vectorizer.h  | 13 -
 2 files changed, 25 insertions(+), 50 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index eb0b0d00e75..66958543bf8 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8680,7 +8680,7 @@ vectorizable_store (vec_info *vinfo,
   }
 else if (vls_type != VLS_STORE_INVARIANT)
   return;
-*prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info,
+*prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
slp_node, 0, vect_prologue);
   };
 
@@ -8989,8 +8989,7 @@ vectorizable_store (vec_info *vinfo,
  if (nstores > 1)
inside_cost
  += record_stmt_cost (cost_vec, n_adjacent_stores,
-  vec_to_scalar, stmt_info, slp_node,
-  0, vect_body);
+  vec_to_scalar, slp_node, 0, vect_body);
}
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
@@ -9327,8 +9326,7 @@ vectorizable_store (vec_info *vinfo,
{
  if (costing_p && vls_type == VLS_STORE_INVARIANT)
prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
-  stmt_info, slp_node, 0,
-  vect_prologue);
+  slp_node, 0, vect_prologue);
  else if (!costing_p)
{
  /* Since the store is not grouped, DR_GROUP_SIZE is 1, and
@@ -9402,8 +9400,7 @@ vectorizable_store (vec_info *vinfo,
  unsigned int cnunits = vect_nunits_for_cost (vectype);
  inside_cost
+= record_stmt_cost (cost_vec, cnunits, scalar_store,
-stmt_info, slp_node, 0,
-vect_body);
+slp_node, 0, vect_body);
  continue;
}
 
@@ -9471,7 +9468,7 @@ vectorizable_store (vec_info *vinfo,
  unsigned int cnunits = vect_nunits_for_cost (vectype);
  inside_cost
+= record_stmt_cost (cost_vec, cnunits, scalar_store,
-stmt_info, slp_node, 0, vect_body);
+slp_node, 0, vect_body);
  continue;
}
 
@@ -9579,14 +9576,14 @@ vectorizable_store (vec_info *vinfo,
 consumed by the load).  */
  inside_cost
+= record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
-stmt_info, slp_node, 0, vect_body);
+slp_node, 0, vect_body);
  /* N scalar stores plus extracting the elements.  */
  inside_cost
+= record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
-stmt_info, slp_node, 0, vect_body);
+slp_node, 0, vect_body);
  inside_cost
+= record_stmt_cost (cost_vec, cnunits, scalar_store,
-stmt_info, slp_node, 0, vect_body);
+slp_node, 0, vect_body);
  continue;
}
 
@@ -9780,8 +9777,7 @@ vectorizable_store (vec_info *vinfo,
  int group_size = DR_GROUP_SIZE (first_stmt_info);
  int nstmts = ceil_log2 (group_size) * group_size;
  inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
-  stmt_info, slp_node, 0,
-  vect_body);
+  slp_node, 0, vect_body);
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "vect_model_store_cost: "
@@ -9810,8 +9806,7 @@ vectorizable_store (vec_info *vi

[PATCH 1/2] Use vectype from SLP node for vect_get_{load, store}_cost if possible

2025-05-14 Thread Richard Biener
The vect_get_{load,store}_cost API is used from both vectorizable_*
where we've done SLP analysis and from alignment peeling analysis
with is done before this and thus only stmt_vec_infos are available.
The following patch makes sure we pick the vector type relevant
for costing from the SLP node when available.

* tree-vect-stmts.cc (vect_get_store_cost): Compute vectype based
on whether we got an SLP node or a stmt_vec_info and use the full
record_stmt_cost API.
(vect_get_load_cost): Likewise.
---
 gcc/tree-vect-stmts.cc | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ec50f5098b5..eb0b0d00e75 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1017,13 +1017,15 @@ vect_get_store_cost (vec_info *, stmt_vec_info 
stmt_info, slp_tree slp_node,
 unsigned int *inside_cost,
 stmt_vector_for_cost *body_cost_vec)
 {
+  tree vectype
+= slp_node ? SLP_TREE_VECTYPE (slp_node) : STMT_VINFO_VECTYPE (stmt_info);
   switch (alignment_support_scheme)
 {
 case dr_aligned:
   {
*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
- vector_store, stmt_info, slp_node, 0,
- vect_body);
+ vector_store, stmt_info, slp_node,
+ vectype, 0, vect_body);
 
 if (dump_enabled_p ())
   dump_printf_loc (MSG_NOTE, vect_location,
@@ -1036,7 +1038,7 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, 
slp_tree slp_node,
 /* Here, we assign an additional cost for the unaligned store.  */
*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
  unaligned_store, stmt_info, slp_node,
- misalignment, vect_body);
+ vectype, misalignment, vect_body);
 if (dump_enabled_p ())
   dump_printf_loc (MSG_NOTE, vect_location,
"vect_model_store_cost: unaligned supported by "
@@ -1070,12 +1072,15 @@ vect_get_load_cost (vec_info *, stmt_vec_info 
stmt_info, slp_tree slp_node,
stmt_vector_for_cost *body_cost_vec,
bool record_prologue_costs)
 {
+  tree vectype
+= slp_node ? SLP_TREE_VECTYPE (slp_node) : STMT_VINFO_VECTYPE (stmt_info);
   switch (alignment_support_scheme)
 {
 case dr_aligned:
   {
*inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
- stmt_info, slp_node, 0, vect_body);
+ stmt_info, slp_node, vectype,
+ 0, vect_body);
 
 if (dump_enabled_p ())
   dump_printf_loc (MSG_NOTE, vect_location,
@@ -1088,7 +1093,7 @@ vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, 
slp_tree slp_node,
 /* Here, we assign an additional cost for the unaligned load.  */
*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
  unaligned_load, stmt_info, slp_node,
- misalignment, vect_body);
+ vectype, misalignment, vect_body);
 
 if (dump_enabled_p ())
   dump_printf_loc (MSG_NOTE, vect_location,
@@ -1100,18 +1105,19 @@ vect_get_load_cost (vec_info *, stmt_vec_info 
stmt_info, slp_tree slp_node,
 case dr_explicit_realign:
   {
*inside_cost += record_stmt_cost (body_cost_vec, ncopies * 2,
- vector_load, stmt_info, slp_node, 0,
- vect_body);
+ vector_load, stmt_info, slp_node,
+ vectype, 0, vect_body);
*inside_cost += record_stmt_cost (body_cost_vec, ncopies,
- vec_perm, stmt_info, slp_node, 0,
- vect_body);
+ vec_perm, stmt_info, slp_node,
+ vectype, 0, vect_body);
 
 /* FIXME: If the misalignment remains fixed across the iterations of
the containing loop, the following cost should be added to the
prologue costs.  */
 if (targetm.vectorize.builtin_mask_for_load)
  *inside_cost += record_stmt_cost (body_cost_vec, 1, vector_stmt,
-   stmt_info, slp_node, 0, vect_body);
+   stmt_info, slp_node, vectype,
+   0, vect_body);
 
 if (dump_enabled_p ())
   dump_printf_loc (MSG_NOTE, vect_location,
@@ -1137,17 +1143,21 @@ vect_get_load_c

Re: [PATCH v3] RISC-V: Add augmented hypervisor series extensions.

2025-05-14 Thread Kito Cheng
Pushed, thanks :)


On Tue, May 13, 2025 at 3:25 PM Jiawei  wrote:
>
> The augmented hypervisor series extensions 'sha'[1] is a new profile-defined
> extension series that captures the full set of features that are mandated to
> be supported along with the 'H' extension.
>
> [1] 
> https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc#rva23s64-profile
>
> Version log: Update implements, fix testcase format.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-ext.def: New extension defs.
> * config/riscv/riscv-ext.opt: Ditto.
> * doc/riscv-ext.texi: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/arch-55.c: New test.
>
> ---
>  gcc/config/riscv/riscv-ext.def   | 91 
>  gcc/config/riscv/riscv-ext.opt   | 17 +
>  gcc/doc/riscv-ext.texi   | 28 
>  gcc/testsuite/gcc.target/riscv/arch-55.c |  9 +++
>  4 files changed, 145 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-55.c
>
> diff --git a/gcc/config/riscv/riscv-ext.def b/gcc/config/riscv/riscv-ext.def
> index 34742d912f8..97b576617ad 100644
> --- a/gcc/config/riscv/riscv-ext.def
> +++ b/gcc/config/riscv/riscv-ext.def
> @@ -1571,6 +1571,97 @@ DEFINE_RISCV_EXT(
>/* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
>/* EXTRA_EXTENSION_FLAGS */ 0)
>
> +DEFINE_RISCV_EXT(
> +  /* NAME */ sha,
> +  /* UPPERCAE_NAME */ SHA,
> +  /* FULL_NAME */ "The augmented hypervisor extension",
> +  /* DESC */ "",
> +  /* URL */ ,
> +  /* DEP_EXTS */ ({"h", "shcounterenw", "shgatpa", "shtvala", "shvstvala", 
> "shvstvecd", "shvsatpa", "ssstateen"}),
> +  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
> +  /* FLAG_GROUP */ sh,
> +  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
> +  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
> +  /* EXTRA_EXTENSION_FLAGS */ 0)
> +
> +DEFINE_RISCV_EXT(
> +  /* NAME */ shcounterenw,
> +  /* UPPERCAE_NAME */ SHCOUNTERENW,
> +  /* FULL_NAME */ "Support writeable enables for any supported counter",
> +  /* DESC */ "",
> +  /* URL */ ,
> +  /* DEP_EXTS */ ({"h", "zihpm"}),
> +  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
> +  /* FLAG_GROUP */ sh,
> +  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
> +  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
> +  /* EXTRA_EXTENSION_FLAGS */ 0)
> +
> +DEFINE_RISCV_EXT(
> +  /* NAME */ shgatpa,
> +  /* UPPERCAE_NAME */ SHGATPA,
> +  /* FULL_NAME */ "SvNNx4 mode supported for all modes supported by satp",
> +  /* DESC */ "",
> +  /* URL */ ,
> +  /* DEP_EXTS */ ({"h", "ssstateen"}),
> +  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
> +  /* FLAG_GROUP */ sh,
> +  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
> +  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
> +  /* EXTRA_EXTENSION_FLAGS */ 0)
> +
> +DEFINE_RISCV_EXT(
> +  /* NAME */ shtvala,
> +  /* UPPERCAE_NAME */ SHTVALA,
> +  /* FULL_NAME */ "The htval register provides all needed values",
> +  /* DESC */ "",
> +  /* URL */ ,
> +  /* DEP_EXTS */ ({"h"}),
> +  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
> +  /* FLAG_GROUP */ sh,
> +  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
> +  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
> +  /* EXTRA_EXTENSION_FLAGS */ 0)
> +
> +DEFINE_RISCV_EXT(
> +  /* NAME */ shvstvala,
> +  /* UPPERCAE_NAME */ SHVSTVALA,
> +  /* FULL_NAME */ "The vstval register provides all needed values",
> +  /* DESC */ "",
> +  /* URL */ ,
> +  /* DEP_EXTS */ ({"h"}),
> +  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
> +  /* FLAG_GROUP */ sh,
> +  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
> +  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
> +  /* EXTRA_EXTENSION_FLAGS */ 0)
> +
> +DEFINE_RISCV_EXT(
> +  /* NAME */ shvstvecd,
> +  /* UPPERCAE_NAME */ SHVSTVECD,
> +  /* FULL_NAME */ "The vstvec register supports Direct mode",
> +  /* DESC */ "",
> +  /* URL */ ,
> +  /* DEP_EXTS */ ({"h"}),
> +  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
> +  /* FLAG_GROUP */ sh,
> +  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
> +  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
> +  /* EXTRA_EXTENSION_FLAGS */ 0)
> +
> +DEFINE_RISCV_EXT(
> +  /* NAME */ shvsatpa,
> +  /* UPPERCAE_NAME */ SHVSATPA,
> +  /* FULL_NAME */ "The vsatp register supports all modes supported by satp",
> +  /* DESC */ "",
> +  /* URL */ ,
> +  /* DEP_EXTS */ ({"h"}),
> +  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
> +  /* FLAG_GROUP */ sh,
> +  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
> +  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
> +  /* EXTRA_EXTENSION_FLAGS */ 0)
> +
>  DEFINE_RISCV_EXT(
>/* NAME */ smaia,
>/* UPPERCAE_NAME */ SMAIA,
> diff --git a/gcc/config/riscv/riscv-ext.opt b/gcc/config/riscv/riscv-ext.opt
> index 0c56dc9b271..9199aa31b42 100644
> --- a/gcc/config/riscv/riscv-ext.opt
> +++ b/gcc/config/riscv/riscv-ext.opt
> @@ -28,6 +28,9 @@ int riscv_base_subext
>  TargetVariable
>  int riscv_sd_subext
>
> +TargetVariable
> +int riscv_sh_subext
> +
>  TargetVariable
>  int riscv_sm_sub

Re: [14.x PATCH] c: Allow bool and enum null pointer constants [PR112556]

2025-05-14 Thread Sam James
Sam James  writes:

> From: Joseph Myers 
>
> As reported in bug 112556, GCC wrongly rejects conversion of null
> pointer constants with bool or enum type to pointers in
> convert_for_assignment (assignment, initialization, argument passing,
> return).  Fix the code there to allow BOOLEAN_TYPE and ENUMERAL_TYPE;
> it already allowed INTEGER_TYPE and BITINT_TYPE.
>
> This bug (together with -std=gnu23 meaning false has type bool rather
> than int) has in turn resulted in people thinking they need to fix
> code using false as a null pointer constant for C23 compatibility.
> While such a usage is certainly questionable, it has nothing to do
> with C23 compatibility and the right place for warnings about such
> usage is -Wzero-as-null-pointer-constant.  I think it would be
> appropriate to extend -Wzero-as-null-pointer-constant to cover
> BOOLEAN_TYPE, ENUMERAL_TYPE and BITINT_TYPE (in all the various
> contexts in which that option generates warnings), though this patch
> doesn't do anything about that option.
>
> Bootstrapped with no regressions for x86-64-pc-linux-gnu.
>
>   PR c/112556
>
> gcc/c/
>   * c-typeck.cc (convert_for_assignment): Allow conversion of
>   ENUMERAL_TYPE and BOOLEAN_TYPE null pointer constants to pointers.
>
> gcc/testsuite/
>   * gcc.dg/c11-null-pointer-constant-1.c,
>   gcc.dg/c23-null-pointer-constant-1.c: New tests.
>
> (cherry picked from commit 3d525fce70fa0ffa0b22af6e213643e1ceca5ab5)
> ---
> As discussed on the PR, I feel like this is worth having for 14 as we're
> asking upstreams to try reproduce issues w/ -std=gnu23 (or -std=c23) if
> they don't have access to GCC 15, and this bug may lead to them being
> confused.
>
> Regtested on x86_64-pc-linux-gnu with no regressions.
>
> OK?

Ping, as 14 RC is tomorrow.


Re: [PATCH] RISC-V: Fix uninit riscv_subset_list::m_allow_adding_dup issue

2025-05-14 Thread Kito Cheng
pushed :)

On Wed, May 14, 2025 at 9:18 PM Christoph Müllner <
christoph.muell...@vrull.eu> wrote:

> On Tue, May 13, 2025 at 4:34 AM Kito Cheng  wrote:
> >
> > We forgot to initialize m_allow_adding_dup in the constructor of
> > riscv_subset_list, so it held an indeterminate value, which could lead
> > to random behavior where -march may accept duplicate extensions.
> >
> > gcc/ChangeLog:
> >
> > * common/config/riscv/riscv-common.cc
> > (riscv_subset_list::riscv_subset_list): Init m_allow_adding_dup.
>
> Reviewed-by: Christoph Müllner 
>
> Thanks!
>
> > ---
> >  gcc/common/config/riscv/riscv-common.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/common/config/riscv/riscv-common.cc
> b/gcc/common/config/riscv/riscv-common.cc
> > index d3240f79240..2834697a857 100644
> > --- a/gcc/common/config/riscv/riscv-common.cc
> > +++ b/gcc/common/config/riscv/riscv-common.cc
> > @@ -620,7 +620,7 @@ riscv_subset_t::riscv_subset_t ()
> >
> >  riscv_subset_list::riscv_subset_list (const char *arch, location_t loc)
> >: m_arch (arch), m_loc (loc), m_head (NULL), m_tail (NULL), m_xlen
> (0),
> > -m_subset_num (0)
> > +m_subset_num (0), m_allow_adding_dup (false)
> >  {
> >  }
> >
> > --
> > 2.34.1
> >
>


Re: [PATCH v1] libstdc++: Fix class mandate for extents.

2025-05-14 Thread Luc Grosheintz

To make review easier, I'd like to provide links to the
sections in the standard I used.

mandate: https://eel.is/c++draft/mdspan.extents#overview-1.1

signed integer: https://eel.is/c++draft/basic.fundamental#1
unsigned integer: https://eel.is/c++draft/basic.fundamental#2

integral: https://eel.is/c++draft/basic.fundamental#11
  https://eel.is/c++draft/meta#tab:meta.unary.cat-row-4

On 5/14/25 9:13 PM, Luc Grosheintz wrote:

The standard states that the IndexType must be a signed or unsigned
integer. This mandate was implemented using `std::is_integral_v`, which
also accepts (among others) char and bool, which are neither signed nor
unsigned integers.

libstdc++-v3/ChangeLog:

* include/std/mdspan: Implement the mandate for extents as
signed or unsigned integer and not any integral type.
* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: Check
that extents and extents are invalid.
* testsuite/23_containers/mdspan/extents/misc.cc: Update
tests to avoid `char` and `bool` as IndexType.
---
  libstdc++-v3/include/std/mdspan|  3 ++-
  .../23_containers/mdspan/extents/class_mandates_neg.cc | 10 +++---
  .../testsuite/23_containers/mdspan/extents/misc.cc |  8 
  3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index aee96dda7cd..22509d9c8f4 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -163,7 +163,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
template
  class extents
  {
-  static_assert(is_integral_v<_IndexType>, "_IndexType must be integral.");
+  static_assert(__is_standard_integer<_IndexType>::value,
+   "_IndexType must be a signed or unsigned integer.");
static_assert(
  (__mdspan::__valid_static_extent<_Extents, _IndexType> && ...),
  "Extents must either be dynamic or representable as _IndexType");
diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
index b654e3920a8..63a2db77c08 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
@@ -1,8 +1,12 @@
  // { dg-do compile { target c++23 } }
  #include
  
-std::extents e1; // { dg-error "from here" }

-std::extents e2;// { dg-error "from here" }
+#include 
+
+std::extents e1; // { dg-error "from here" }
+std::extents e2; // { dg-error "from here" }
+std::extents e3; // { dg-error "from here" }
+std::extents e4;   // { dg-error "from here" }
  // { dg-prune-output "dynamic or representable as _IndexType" }
-// { dg-prune-output "must be integral" }
+// { dg-prune-output "signed or unsigned integer" }
  // { dg-prune-output "invalid use of incomplete type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc
index 1620475..e71fdc54230 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc
@@ -1,6 +1,7 @@
  // { dg-do run { target c++23 } }
  #include 
  
+#include 

  #include 
  
  constexpr size_t dyn = std::dynamic_extent;

@@ -20,7 +21,6 @@ static_assert(std::is_same_v::rank_type, 
size_t>);
  static_assert(std::is_unsigned_v::size_type>);
  static_assert(std::is_unsigned_v::size_type>);
  
-static_assert(std::is_same_v::index_type, char>);

  static_assert(std::is_same_v::index_type, int>);
  static_assert(std::is_same_v::index_type,
  unsigned int>);
@@ -49,7 +49,7 @@ static_assert(check_rank_return_types());
  
  // Check that the static extents don't take up space.

  static_assert(sizeof(std::extents) == sizeof(int));
-static_assert(sizeof(std::extents) == sizeof(char));
+static_assert(sizeof(std::extents) == sizeof(short));
  
  template

  class Container
@@ -58,7 +58,7 @@ class Container
[[no_unique_address]] std::extents b0;
  };
  
-static_assert(sizeof(Container>) == sizeof(int));

+static_assert(sizeof(Container>) == sizeof(int));
  static_assert(sizeof(Container>) == sizeof(int));
  
  // operator=

@@ -103,7 +103,7 @@ test_deduction_all()
test_deduction<0>();
test_deduction<1>(1);
test_deduction<2>(1.0, 2.0f);
-  test_deduction<3>(int(1), char(2), size_t(3));
+  test_deduction<3>(int(1), short(2), size_t(3));
return true;
  }
  




Contents of PO file 'cpplib-15.1-b20250316.es.po'

2025-05-14 Thread Translation Project Robot


cpplib-15.1-b20250316.es.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



New Spanish PO file for 'cpplib' (version 15.1-b20250316)

2025-05-14 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Spanish team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/es.po

(This file, 'cpplib-15.1-b20250316.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH v1] libstdc++: Fix class mandate for extents.

2025-05-14 Thread Luc Grosheintz
The standard states that the IndexType must be a signed or unsigned
integer. This mandate was implemented using `std::is_integral_v`, which
also accepts (among others) char and bool, which are neither signed nor
unsigned integers.

libstdc++-v3/ChangeLog:

* include/std/mdspan: Implement the mandate for extents as
signed or unsigned integer and not any integral type.
* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: Check
that extents and extents are invalid.
* testsuite/23_containers/mdspan/extents/misc.cc: Update
tests to avoid `char` and `bool` as IndexType.
---
 libstdc++-v3/include/std/mdspan|  3 ++-
 .../23_containers/mdspan/extents/class_mandates_neg.cc | 10 +++---
 .../testsuite/23_containers/mdspan/extents/misc.cc |  8 
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index aee96dda7cd..22509d9c8f4 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -163,7 +163,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 class extents
 {
-  static_assert(is_integral_v<_IndexType>, "_IndexType must be integral.");
+  static_assert(__is_standard_integer<_IndexType>::value,
+   "_IndexType must be a signed or unsigned integer.");
   static_assert(
  (__mdspan::__valid_static_extent<_Extents, _IndexType> && ...),
  "Extents must either be dynamic or representable as _IndexType");
diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
index b654e3920a8..63a2db77c08 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_mandates_neg.cc
@@ -1,8 +1,12 @@
 // { dg-do compile { target c++23 } }
 #include
 
-std::extents e1; // { dg-error "from here" }
-std::extents e2;// { dg-error "from here" }
+#include 
+
+std::extents e1; // { dg-error "from here" }
+std::extents e2; // { dg-error "from here" }
+std::extents e3; // { dg-error "from here" }
+std::extents e4;   // { dg-error "from here" }
 // { dg-prune-output "dynamic or representable as _IndexType" }
-// { dg-prune-output "must be integral" }
+// { dg-prune-output "signed or unsigned integer" }
 // { dg-prune-output "invalid use of incomplete type" }
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc
index 1620475..e71fdc54230 100644
--- a/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/misc.cc
@@ -1,6 +1,7 @@
 // { dg-do run { target c++23 } }
 #include 
 
+#include 
 #include 
 
 constexpr size_t dyn = std::dynamic_extent;
@@ -20,7 +21,6 @@ static_assert(std::is_same_v::rank_type, size_t>);
 static_assert(std::is_unsigned_v::size_type>);
 static_assert(std::is_unsigned_v::size_type>);
 
-static_assert(std::is_same_v::index_type, char>);
 static_assert(std::is_same_v::index_type, int>);
 static_assert(std::is_same_v::index_type,
  unsigned int>);
@@ -49,7 +49,7 @@ static_assert(check_rank_return_types());
 
 // Check that the static extents don't take up space.
 static_assert(sizeof(std::extents) == sizeof(int));
-static_assert(sizeof(std::extents) == sizeof(char));
+static_assert(sizeof(std::extents) == sizeof(short));
 
 template
 class Container
@@ -58,7 +58,7 @@ class Container
   [[no_unique_address]] std::extents b0;
 };
 
-static_assert(sizeof(Container>) == sizeof(int));
+static_assert(sizeof(Container>) == sizeof(int));
 static_assert(sizeof(Container>) == sizeof(int));
 
 // operator=
@@ -103,7 +103,7 @@ test_deduction_all()
   test_deduction<0>();
   test_deduction<1>(1);
   test_deduction<2>(1.0, 2.0f);
-  test_deduction<3>(int(1), char(2), size_t(3));
+  test_deduction<3>(int(1), short(2), size_t(3));
   return true;
 }
 
-- 
2.49.0



[PATCH] libstdc++: Deprecate non-standard std::fabs(const complex&) [PR120235]

2025-05-14 Thread Jonathan Wakely
There was an overload of fabs for std::complex in TR1 and in some C++0x
drafts, but it was removed from the working draft by LWG 595.

Since we've been providing it for decades we should deprecate it before
removing it.

libstdc++-v3/ChangeLog:

PR libstdc++/120235
* doc/html/*: Regenerate.
* doc/xml/manual/evolution.xml: Document deprecation.
* include/std/complex: Replace references to TR1 subclauses with
corresponding C++11 subclauses.
(fabs): Add deprecated attribute.
* testsuite/26_numerics/complex/fabs_neg.cc: New test.
---

Tested x86_64-linux.

 libstdc++-v3/doc/html/index.html  |  2 +-
 libstdc++-v3/doc/html/manual/api.html |  5 -
 libstdc++-v3/doc/html/manual/appendix.html|  2 +-
 .../doc/html/manual/appendix_porting.html |  2 +-
 libstdc++-v3/doc/html/manual/index.html   |  2 +-
 libstdc++-v3/doc/xml/manual/evolution.xml |  9 +++-
 libstdc++-v3/include/std/complex  | 22 +--
 .../testsuite/26_numerics/complex/fabs_neg.cc | 13 +++
 8 files changed, 40 insertions(+), 17 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/26_numerics/complex/fabs_neg.cc

diff --git a/libstdc++-v3/doc/html/index.html b/libstdc++-v3/doc/html/index.html
index d465fb6c7f57..dd31cff31819 100644
--- a/libstdc++-v3/doc/html/index.html
+++ b/libstdc++-v3/doc/html/index.html
@@ -142,7 +142,7 @@
 Existing tests
 
 C++11 Requirements Test Sequence Descriptions
-ABI Policy and 
GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed 
ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.31415Backwards 
CompatibilityFirstSecondThirdPre-ISO headers 
removedExtension headers hash_map, 
hash_set moved to ext or backwardsNo ios::nocreate/ios::noreplace.
+ABI Policy and 
GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed 
ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.3141516Backwards 
CompatibilityFirstSecondThirdPre-ISO headers 
removedExtension headers hash_map, 
hash_set moved to ext or backwardsNo ios::nocreate/ios::noreplace.
 
 No stream::attach(int fd)
 
diff --git a/libstdc++-v3/doc/html/manual/api.html 
b/libstdc++-v3/doc/html/manual/api.html
index 09afdb3e7033..4441d9cdbae0 100644
--- a/libstdc++-v3/doc/html/manual/api.html
+++ b/libstdc++-v3/doc/html/manual/api.html
@@ -490,7 +490,7 @@ to provide the symbols for the experimental C++ Contracts 
support.
 header were added to the static library libstdc++exp.a.
 14
-Deprecate the non-standard overload that allows std::setfill
+Deprecated the non-standard overload that allows std::setfill
 to be used with std::basic_istream.
 
   The extension allowing std::basic_string to be 
instantiated
@@ -509,4 +509,7 @@ and removed in C++20:
 
 Nested result_type and argument_type removed from
 std::hash specializations for C++20.
+16
+Deprecated the non-standard overload of std::fabs for
+std::complex arguments.
 Prev Up NextABI Policy and Guidelines Home Backwards 
Compatibility
\ No newline at end of file
diff --git a/libstdc++-v3/doc/html/manual/appendix.html 
b/libstdc++-v3/doc/html/manual/appendix.html
index 69a0e0018f37..e71ea5423a0a 100644
--- a/libstdc++-v3/doc/html/manual/appendix.html
+++ b/libstdc++-v3/doc/html/manual/appendix.html
@@ -16,7 +16,7 @@
 Existing tests
 
 C++11 Requirements Test Sequence Descriptions
-ABI Policy and GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.31415Backwards 
CompatibilityFirstSecondThirdPre-ISO 
headers removedExtension headers hash_map, hash_set 
moved to ext or backwardsNo ios::nocreate/ios::noreplace.
+ABI Policy and GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.3141516Backwards 
CompatibilityFirstSecondThirdPre-ISO 
headers removedExtension headers hash_map, hash_set 
moved to ext or backwardsNo ios::nocreate/ios::noreplace.
 
 No stream::attach(int fd)
 
diff --git a/libstdc++-v3/doc/html/manual/appendix_porting.html 
b/libstdc++-v3/doc/html/manual/appendix_porting.html
index c76ef295e782..e0f52dba6d2a 100644
--- a/libstdc++-v3/doc/

Re: [PATCH] c++: unifying specializations of non-primary tmpls [PR120161]

2025-05-14 Thread Patrick Palka
On Wed, 14 May 2025, Patrick Palka wrote:

> On Wed, 14 May 2025, Jason Merrill wrote:
> 
> > On 5/14/25 2:44 PM, Patrick Palka wrote:
> > > On Wed, 14 May 2025, Patrick Palka wrote:
> > > 
> > > > On Wed, 14 May 2025, Jason Merrill wrote:
> > > > 
> > > > > On 5/12/25 7:53 PM, Patrick Palka wrote:
> > > > > > Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look OK
> > > > > > for trunk/15/14?
> > > > > > 
> > > > > > -- >8 --
> > > > > > 
> > > > > > Here unification of P=Wrap::type, A=Wrap::type wrongly
> > > > > > succeeds ever since r14-4112 which made the RECORD_TYPE case of 
> > > > > > unify
> > > > > > no longer recurse into template arguments for non-primary templates
> > > > > > (since they're a non-deduced context) and so the int/long mismatch
> > > > > > that
> > > > > > makes the two types distinct goes unnoticed.
> > > > > > 
> > > > > > In the case of (comparing specializations of) a non-primary 
> > > > > > template,
> > > > > > unify should still go on to compare the types directly before
> > > > > > returning
> > > > > > success.
> > > > > 
> > > > > Should the PRIMARY_TEMPLATE_P check instead move up to join the
> > > > > CLASSTYPE_TEMPLATE_INFO check?  try_class_deduction also doesn't seem
> > > > > applicable to non-primary templates.
> > > > 
> > > > I don't think that'd work, for either the CLASSTYPE_TEMPLATE_INFO (parm)
> > > > check
> > > > or the earlier CLASSTYPE_TEMPLATE_INFO (arg) check.
> > > > 
> > > > While try_class_deduction directly doesn't apply to non-primary 
> > > > templates,
> > > > get_template_base still might, so if we move up the PRIMARY_TEMPLATE_P 
> > > > to
> > > > join
> > > > the C_T_I (parm) check, then we wouldn't try get_template_base anymore
> > > > which
> > > > would  break e.g.
> > > > 
> > > >  template struct B { };
> > > > 
> > > >  template
> > > >  struct A {
> > > >struct C : B { };
> > > >  };
> > > > 
> > > >  template void f(B*);
> > > > 
> > > >  int main() {
> > > >A::C c;
> > > >f(&c);
> > > >  }
> > > > 
> > > > If we move the PRIMARY_TEMPLATE_P check up to the C_T_I (arg) check, 
> > > > then
> > > > that'd mean we still don't check same_type_p on the two types in the
> > > > non-primary case, which seems wrong (although it'd fix the PR thanks to
> > > > the
> > > > parm == arg early exit in unify).
> > > 
> > > FWIW it seems part of the weird/subtle logic here is due to the fact
> > > that when unifying e.g. P=C with A=C, we do it twice, first via
> > > try_class_deduction using a copy of 'targs', and if that succeeds we do
> > > it again with the real 'targs'.  I think the logic could simultaneously
> > > be simplified and made memory efficient if we made it so that if the
> > > trial unification from try_class_deduction succeeds we just use its
> > > 'targs' instead of having to repeat the unification.
> > 
> > Hmm, good point, though I don't see what you mean by "a copy", it looks to 
> > me
> > like we do it twice with the real 'targs'.  Seems like we should move
> > try_class_unification out of the UNIFY_ALLOW_DERIVED block and remove the
> > unify that your previous patch conditionalized.
> 
> By a copy, I mean via the call to copy_template_args from
> try_class_unification?  There's currently no way to get at the
> arguments that were deduced by try_class_unification because of
> that copy.
> 
> Ah, and the function has a long comment with an example about why it
> uses an empty (innermost) targ vector rather than a straight copy.  If
> that comment is still correct, I guess we won't be able to avoid the
> trial unify after all :/ But I noticed that Clang accepts the example in
> the comment, whereas GCC rejects.  I wonder who is correct?

In any case, shall we go with the original patch for sake of backports?

> 
> > 
> > Jason
> > 
> > 
> 



Re: [PATCH RFC] libstdc++: run testsuite with -Wabi

2025-05-14 Thread Jonathan Wakely
On Mon, 12 May 2025 at 21:30, Jonathan Wakely  wrote:
>
> On Mon, 12 May 2025 at 16:13, Jason Merrill  wrote:
> >
> > On 5/9/25 1:31 PM, Jonathan Wakely wrote:
> > > On Fri, 9 May 2025 at 18:13, Jonathan Wakely  wrote:
> > >>
> > >> On Fri, 9 May 2025 at 11:19, Jonathan Wakely  wrote:
> > >>>
> > >>> On Thu, 8 May 2025 at 20:56, Jason Merrill  wrote:
> > 
> >  Tested x86_64-pc-linux-gnu.  Does this make sense for trunk?
> > >>>
> > >>> Yes, it looks useful. I'm going to test it with my "very -std and -m32
> > >>> and old-string ABI" test settings to be sure it doesn't cause any
> > >>> problems.
> > >>
> > >> There are a few failures when using GLIBCXX_TESTSUITE_STDS=20 to run
> > >> tests as C++20 or later:
> > >>
> > >> FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++23
> > >> (test for excess errors)
> > >> Excess errors:
> > >> /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/experimental/internet:2100:
> > >> warning: offset of
> > >> 'std::experimental::net::v1::ip::basic_resolver::_M_ctx'
> > >> for '-std=c++20' and up changes in '-fabi-version=21' (GCC 16) [-Wabi]
> > >
> > > We have code like this in the networking TS headers:
> > >
> > > struct Base {
> > > protected:
> > > Base() = default;
> > > ~Base() = default;
> > > };
> > >
> > > struct Derived : Base {
> > > void* ptr;
> > > };
> > >
> > > Is the warning wrong?
> >
> > Hmm, it seems to be: sizeof(Derived) isn't affected by -std=c++20 before
> > the fix, I guess the warning doesn't handle empty bases properly.
>
>
> Thanks for the fix, I'll rerun the tests.

The testsuite is clean with -Wabi=20 now, if you want to push it.



[PATCH] libstdc++: Micro-optimization in std::arg overload for scalars

2025-05-14 Thread Jonathan Wakely
Use __builtin_signbit directly instead of std::signbit.

libstdc++-v3/ChangeLog:

* include/std/complex (arg(T)): Use __builtin_signbit instead of
std::signbit.
---

This would avoid overload resolution for std::signbit, and avoid a
function call at -O0, but I'm not sure it's worth bothering.

Tested x86_64-linux.

 libstdc++-v3/include/std/complex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/complex b/libstdc++-v3/include/std/complex
index 67f37d4ec2b7..d9d2d8afda89 100644
--- a/libstdc++-v3/include/std/complex
+++ b/libstdc++-v3/include/std/complex
@@ -2532,8 +2532,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
   typedef typename __gnu_cxx::__promote<_Tp>::__type __type;
 #if (_GLIBCXX11_USE_C99_MATH && !_GLIBCXX_USE_C99_FP_MACROS_DYNAMIC)
-  return std::signbit(__x) ? __type(3.1415926535897932384626433832795029L)
-  : __type();
+  return __builtin_signbit(__type(__x))
+  ? __type(3.1415926535897932384626433832795029L) : __type();
 #else
   return std::arg(std::complex<__type>(__x));
 #endif
-- 
2.49.0



Re: [PATCH] c++: unifying specializations of non-primary tmpls [PR120161]

2025-05-14 Thread Patrick Palka
On Wed, 14 May 2025, Jason Merrill wrote:

> On 5/14/25 2:44 PM, Patrick Palka wrote:
> > On Wed, 14 May 2025, Patrick Palka wrote:
> > 
> > > On Wed, 14 May 2025, Jason Merrill wrote:
> > > 
> > > > On 5/12/25 7:53 PM, Patrick Palka wrote:
> > > > > Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look OK
> > > > > for trunk/15/14?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > Here unification of P=Wrap::type, A=Wrap::type wrongly
> > > > > succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
> > > > > no longer recurse into template arguments for non-primary templates
> > > > > (since they're a non-deduced context) and so the int/long mismatch
> > > > > that
> > > > > makes the two types distinct goes unnoticed.
> > > > > 
> > > > > In the case of (comparing specializations of) a non-primary template,
> > > > > unify should still go on to compare the types directly before
> > > > > returning
> > > > > success.
> > > > 
> > > > Should the PRIMARY_TEMPLATE_P check instead move up to join the
> > > > CLASSTYPE_TEMPLATE_INFO check?  try_class_deduction also doesn't seem
> > > > applicable to non-primary templates.
> > > 
> > > I don't think that'd work, for either the CLASSTYPE_TEMPLATE_INFO (parm)
> > > check
> > > or the earlier CLASSTYPE_TEMPLATE_INFO (arg) check.
> > > 
> > > While try_class_deduction directly doesn't apply to non-primary templates,
> > > get_template_base still might, so if we move up the PRIMARY_TEMPLATE_P to
> > > join
> > > the C_T_I (parm) check, then we wouldn't try get_template_base anymore
> > > which
> > > would  break e.g.
> > > 
> > >  template struct B { };
> > > 
> > >  template
> > >  struct A {
> > >struct C : B { };
> > >  };
> > > 
> > >  template void f(B*);
> > > 
> > >  int main() {
> > >A::C c;
> > >f(&c);
> > >  }
> > > 
> > > If we move the PRIMARY_TEMPLATE_P check up to the C_T_I (arg) check, then
> > > that'd mean we still don't check same_type_p on the two types in the
> > > non-primary case, which seems wrong (although it'd fix the PR thanks to
> > > the
> > > parm == arg early exit in unify).
> > 
> > FWIW it seems part of the weird/subtle logic here is due to the fact
> > that when unifying e.g. P=C with A=C, we do it twice, first via
> > try_class_deduction using a copy of 'targs', and if that succeeds we do
> > it again with the real 'targs'.  I think the logic could simultaneously
> > be simplified and made memory efficient if we made it so that if the
> > trial unification from try_class_deduction succeeds we just use its
> > 'targs' instead of having to repeat the unification.
> 
> Hmm, good point, though I don't see what you mean by "a copy", it looks to me
> like we do it twice with the real 'targs'.  Seems like we should move
> try_class_unification out of the UNIFY_ALLOW_DERIVED block and remove the
> unify that your previous patch conditionalized.

By a copy, I mean via the call to copy_template_args from
try_class_unification?  There's currently no way to get at the
arguments that were deduced by try_class_unification because of
that copy.

Ah, and the function has a long comment with an example about why it
uses an empty (innermost) targ vector rather than a straight copy.  If
that comment is still correct, I guess we won't be able to avoid the
trial unify after all :/ But I noticed that Clang accepts the example in
the comment, whereas GCC rejects.  I wonder who is correct?

> 
> Jason
> 
> 



Re: [wwwdocs] Remove claims that the release timeline shows future releases

2025-05-14 Thread Jakub Jelinek
On Wed, May 14, 2025 at 09:31:59PM +0100, Jonathan Wakely wrote:
> The timeline hasn't shown any tentative dates for future releases since
> 2006-03-06 when GCC 3.4.6 was released and the tentative date got
> replaced with the real date. Since then only actual release dates have
> been added, on the day when the release happens.
> ---
> 
> OK for wwwdocs?

Yes.

Jakub



New Swedish PO file for 'gcc' (version 15.1.0)

2025-05-14 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-15.1.0.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH 3/5] c++, coroutines: Address CWG2563 return value init [PR119916].

2025-05-14 Thread Jason Merrill

On 5/13/25 10:30 AM, Iain Sandoe wrote:

This addresses the clarification that, when the get_return_object is of a
different type from the ramp return, any necessary conversions should be
performed on the return expression (so that they typically occur after the
function body has started execution).

PR c++/119916

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Do not
initialise initial_await_resume_called here...
(cp_coroutine_transform::build_ramp_function): ... but here.
When the coroutine is not void, initialize a GRO object from
promise.get_return_object().  Use this as the argument to the
return expression.  Use a regular cleanup for the GRO, since
it is ramp-local.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/special-termination-00-sync-completion.C:
Amend for CWG2563 expected behaviour.
* g++.dg/coroutines/torture/special-termination-01-self-destruct.C:
Likewise.
* g++.dg/coroutines/torture/pr119916.C: New test.

Signed-off-by: Iain Sandoe 



+  /* We must manage the cleanups ourselves, because the responsibility for
+ them changes after the initial suspend.  However, any use of
+ cxx_maybe_build_cleanup () can set the throwing_cleanup flag.  */
+  cp_function_chain->throwing_cleanup = false;


Hmm...what if the gro cleanup throws after initializing the (different 
type) return value?  That seems like a case that we need 
throwing_cleanup set for.



@@ -5245,8 +5195,11 @@ cp_coroutine_transform::build_ramp_function ()
  
tree not_iarc

= build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, iarc_x);
+  tree do_cleanup = build2_loc (loc, TRUTH_AND_EXPR, boolean_type_node,
+   not_iarc, coro_before_return);


As with the 14 patch, this should be reversed.

Jason



Re: [PATCH 6/8] AArch64: recognize `+cmpbr` option

2025-05-14 Thread Karl Meakin



On 07/05/2025 12:48, Kyrylo Tkachov wrote:



On 7 May 2025, at 12:27, Karl Meakin  wrote:

Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
extension.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cmpbr): new
option.
* config/aarch64/aarch64.h (TARGET_CMPBR): new macro.
* doc/invoke.texi (cmpbr): new option.

Looks ok to me.
Not a blocker here, but does this need any FMV handling? I guess this is one of 
those transparent codegen features and maybe doesn’t need FMV clones…
Thanks,
Kyrill

I don't think it needs any special handling; as you say, it should be 
handled transparently.

---
gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
gcc/config/aarch64/aarch64.h | 3 +++
gcc/doc/invoke.texi  | 3 +++
3 files changed, 8 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index dbbb021f05a..1c3e69799f5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -249,6 +249,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "mops")

AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")

+AARCH64_OPT_EXTENSION("cmpbr", CMPBR, (), (), (), "cmpbr")
+
AARCH64_OPT_EXTENSION("lse128", LSE128, (LSE), (), (), "lse128")

AARCH64_OPT_EXTENSION("d128", D128, (LSE128), (), (), "d128")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e8bd8c73c12..d5c4a42e96d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -410,6 +410,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
/* CSSC instructions are enabled through +cssc.  */
#define TARGET_CSSC AARCH64_HAVE_ISA (CSSC)

+/* CB instructions are enabled through +cmpbr.  */
+#define TARGET_CMPBR AARCH64_HAVE_ISA (CMPBR)
+
/* Make sure this is always defined so we don't have to check for ifdefs
but rather use normal ifs.  */
#ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32bc45725de..3f05e5e0e34 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -22252,6 +22252,9 @@ Enable the FlagM2 flag conversion instructions.
Enable the Pointer Authentication Extension.
@item cssc
Enable the Common Short Sequence Compression instructions.
+@item cmpbr
+Enable the shorter compare and branch instructions, @code{cbb}, @code{cbh} and
+@code{cb}.
@item sme
Enable the Scalable Matrix Extension.  This is only supported when SVE2 is also
enabled.
--
2.45.2



Re: [14.x PATCH] c: Allow bool and enum null pointer constants [PR112556]

2025-05-14 Thread Joseph Myers
On Wed, 14 May 2025, Sam James wrote:

> > (cherry picked from commit 3d525fce70fa0ffa0b22af6e213643e1ceca5ab5)
> > ---
> > As discussed on the PR, I feel like this is worth having for 14 as we're
> > asking upstreams to try reproduce issues w/ -std=gnu23 (or -std=c23) if
> > they don't have access to GCC 15, and this bug may lead to them being
> > confused.
> >
> > Regtested on x86_64-pc-linux-gnu with no regressions.
> >
> > OK?
> 
> Ping, as 14 RC is tomorrow.

Backporting this to GCC 14 is OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH 0/8] AArch64: CMPBR support

2025-05-14 Thread Karl Meakin



On 07/05/2025 13:00, Kyrylo Tkachov wrote:

Hi Karl,


On 7 May 2025, at 12:27, Karl Meakin  wrote:

This patch series adds support for the CMPBR extension. It includes the
new `+cmpbr` option and rules to generate the new instructions when
lowering conditional branches.

Thanks for the series.
You didn’t state it explicitly, but have you run a bootstrap and testsuite run 
with this series?
It’s usually best to include testing information in the patches to help 
reviewers.

Thanks,
Kyrill

Yes, I will update the cover letter to include that information, thanks 
for the reminder.

Karl Meakin (8):
  AArch64: place branch instruction rules together
  AArch64: reformat branch instruction rules
  AArch64: rename branch instruction rules
  AArch64: add constants for branch displacements
  AArch64: make `far_branch` attribute a boolean
  AArch64: recognize `+cmpbr` option
  AArch64: precommit test for CMPBR instructions
  AArch64: rules for CMPBR instructions

.../aarch64/aarch64-option-extensions.def |2 +
gcc/config/aarch64/aarch64-simd.md|2 +-
gcc/config/aarch64/aarch64-sme.md |3 +-
gcc/config/aarch64/aarch64.cc |2 +-
gcc/config/aarch64/aarch64.h  |3 +
gcc/config/aarch64/aarch64.md |  557 +---
gcc/config/aarch64/iterators.md   |5 +
gcc/config/aarch64/predicates.md  |   17 +
gcc/doc/invoke.texi   |3 +
gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1238 +
10 files changed, 1615 insertions(+), 217 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c

--
2.45.2



Re: [PATCH] libstdc++: Make debug iterator pointer sequence const [PR116369]

2025-05-14 Thread Jonathan Wakely
On Wed, 14 May 2025 at 17:31, François Dumont  wrote:
>
> On 12/05/2025 23:03, Jonathan Wakely wrote:
> > On 31/03/25 22:20 +0200, François Dumont wrote:
> >> Hi
> >>
> >> Following this previous patch
> >> https://gcc.gnu.org/pipermail/libstdc++/2024-August/059418.html I've
> >> completed it for the _Safe_unordered_container_base type and
> >> implemented the rest of the change to store the safe iterator
> >> sequence as a pointer-to-const.
> >>
> >> libstdc++: Make debug iterator pointer sequence const [PR116369]
> >>
> >> In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug
> >> sequence
>> has been made mutable to allow attaching iterators to const
> >> containers.
> >> This change completes this fix by also declaring debug unordered
> >> container
> >> members mutable.
> >>
> >> Additionally the debug iterator sequence is now a
> >> pointer-to-const and so
> >> _Safe_sequence_base _M_attach and all other methods are const
> >> qualified.
> >> Symbols export are maintained thanks to __asm directives.
> >>
> > I can't compile this, it seems to be missing changes to
> > safe_local_iterator.tcc:
> >
> > In file included from
> > /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.h:444,
> >  from
> > /home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:33:
> > /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:
> > In member function ‘typename
> > __gnu_debug::_Distance_traits<_Iterator>::__type
> > __gnu_debug::_Safe_local_iterator<_Iterator,
> > _Sequence>::_M_get_distance_to(const
> > __gnu_debug::_Safe_local_iterator<_Iterator, _Sequence>&) const’:
> > /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
> > error: there are no arguments to ‘_M_get_sequence’ that depend on a
> > template parameter, so a declaration of ‘_M_get_sequence’ must be
> > available [-Wtemplate-body]
> >47 | _M_get_sequence()->bucket_size(bucket()),
> >   | ^~~
> > /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:47:17:
> > note: (if you use ‘-fpermissive’, G++ will accept your code, but
> > allowing the use of an undeclared name is deprecated)
> > /home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_local_iterator.tcc:59:18:
> > error: there are no arguments to ‘_M_get_sequence’ that depend on a
> > template parameter, so a declaration of ‘_M_get_sequence’ must be
> > available [-Wtemplate-body]
> >59 | -_M_get_sequence()->bucket_size(bucket()),
> >   |  ^~~
> >
> Yes, sorry, I had already spotted this problem, but only updated the PR
> and did not re-send the patch here.
>
>
> >
> >> Also available as a PR
> >>
> >> https://forge.sourceware.org/gcc/gcc-TEST/pulls/47
> >>
> >> /** Detach all singular iterators.
> >>  *  @post for all iterators i attached to this sequence,
> >>  *   i->_M_version == _M_version.
> >>  */
> >> void
> >> -_M_detach_singular();
> >> +_M_detach_singular() const
> >> + __asm("_ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEv");
> >
> > Does this work on all targets?
>
> No idea!  I thought the symbol name used here just had to match the
> entries in config/abi/pre/gnu.ver.

That linker script is not used for all targets.

>
> It is what other usages of __asm in the lib are doing for the moment, in
>  header, without target considerations.

The _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT macro is only ever defined for
powerpc64le-unknown-linux-gnu so those uses in  are certainly
not without target considerations. They are used on exactly one
particular target, and no others.

>
>
> > I think darwin uses __Z as the prefix
> > for mangled names.
> > It might be necessary to use a macro to do this, so that it
> > conditionally puts "_" before the name.
> >
> If it's the only "exotic" target I can indeed deal with it with a macro.
> Otherwise I might have to find another alternative.
>
> Here is an updated version considering all your other remarks.
>
>  libstdc++: Make debug iterator pointer sequence const [PR116369]
>
>  In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug sequence
>  has been made mutable to allow attaching iterators to const containers.
>  This change completes this fix by also declaring debug unordered
> container
>  members mutable.
>
>  Additionally the debug iterator sequence is now a pointer-to-const
> and so
>  _Safe_sequence_base _M_attach and all other methods are const
> qualified.
>  Symbols export are maintained thanks to __asm directives.
>
>  libstdc++-v3/ChangeLog:
>
>  PR c++/116369
>  * include/debug/safe_base.h
>  (_Safe_iterator_base::_M_sequence): Declare as
> pointer-to-const.
>  (_Safe_iterator_base::_M_a

Re: [PATCH] libgcobol: Add multilib support

2025-05-14 Thread Richard Biener
On Wed, May 14, 2025 at 6:29 PM James K. Lowden
 wrote:
>
> On Wed, 14 May 2025 11:04:50 +0200
> Rainer Orth  wrote:
>
> > Work around what appears to be a GNU make bug handling MAKEFLAGS
>
> Before I say Yes, could someone please tell me why this rumored bug is
> responsible for so much boilerplate in our Makefile.am files?  You
> say,
>
> > Unlike some runtime libs that can get away without setting
> > AM_MAKEFLAGS and friends, libgcobol can not since it then tries to
> > link the 64-bit libgcobol with 32-bit libstdc++.
>
> but I don't see the connection between that and 20 lines of definition
> resting on "what appears to be a bug".
>
> I guess I can live with "no one knows, that's what we do."  But I'm
> sure I'm not alone in preferring to understand how the build builds.

That's the case ... though this boilerplate is not used consistently.

Richard.

>
> --jkl


[PATCH] tree: Canonical order for ADDR

2025-05-14 Thread Andrew Pinski
This is the followup based on the review at
https://inbox.sourceware.org/gcc-patches/cafiyyc3xeg75dswaf63zbu5uelpeaeohwgfogavydwouuj7...@mail.gmail.com/
.
We should put all ADDR_EXPRs last instead of just the is_gimple_invariant_address ones.

Note a few match patterns needed to be updated for this change, but we get a
decent improvement: forwprop-38.c can now be optimized during CCP rather than
waiting all the way until forwprop.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* fold-const.cc (tree_swap_operands_p): Put ADDR_EXPR last
instead of just is_gimple_invariant_address ones.
* match.pd (`a ptr+ b !=\== ADDR`, `ADDR !=/== ssa_name`):
Move the ADDR to the last operand. Update comment.

Signed-off-by: Andrew Pinski 
---
 gcc/fold-const.cc | 6 +++---
 gcc/match.pd  | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 35fcf5087fb..5f48ced5063 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -7246,10 +7246,10 @@ tree_swap_operands_p (const_tree arg0, const_tree arg1)
   if (TREE_CONSTANT (arg0))
 return true;
 
-  /* Put invariant address in arg1. */
-  if (is_gimple_invariant_address (arg1))
+  /* Put addresses in arg1. */
+  if (TREE_CODE (arg1) == ADDR_EXPR)
 return false;
-  if (is_gimple_invariant_address (arg0))
+  if (TREE_CODE (arg0) == ADDR_EXPR)
 return true;
 
   /* It is preferable to swap two SSA_NAME to ensure a canonical form
diff --git a/gcc/match.pd b/gcc/match.pd
index 96136404f5e..79485f9678a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2845,7 +2845,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* (&a + b) !=/== (&a[1] + c) -> (&a[0] - &a[1]) + b !=/== c */
 (for neeq (ne eq)
  (simplify
-  (neeq:c ADDR_EXPR@0 (pointer_plus @2 @3))
+  (neeq:c (pointer_plus @2 @3) ADDR_EXPR@0)
(with { poly_int64 diff; tree inner_type = TREE_TYPE (@3);}
 (if (ptr_difference_const (@0, @2, &diff))
  (neeq { build_int_cst_type (inner_type, diff); } @3
@@ -7658,8 +7658,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 (for cmp (eq ne)
  (simplify
-  /* SSA names are canonicalized to 2nd place.  */
-  (cmp addr@0 SSA_NAME@1)
+  /* ADDRs are canonicalized to 2nd place.  */
+  (cmp SSA_NAME@1 addr@0)
   (with
{
  poly_int64 off; tree base;
-- 
2.43.0



Re: [PATCH] gimple: Canonical order for invariants [PR118902]

2025-05-14 Thread Andrew Pinski
On Mon, May 12, 2025 at 3:53 AM Richard Biener
 wrote:
>
> On Fri, May 9, 2025 at 10:12 PM Andrew Pinski  wrote:
> >
> > On Mon, Apr 21, 2025 at 1:42 AM Richard Biener
> >  wrote:
> > >
> > > On Thu, Apr 17, 2025 at 7:37 PM Andrew Pinski  
> > > wrote:
> > > >
> > > > So unlike constants, address invariants are currently put first if
> > > > used with a SSA NAME.
> > > > It would be better if address invariants are consistent with constants
> > > > and this patch changes that.
> > > > gcc.dg/tree-ssa/pr118902-1.c is an example where this canonicalization
> > > > can help. In it if `p` variable was a global variable, FRE (VN) would 
> > > > have figured
> > > > it out that `a` could never be equal to `&p` inside the loop. But 
> > > > without the
> > > > canonicalization we end up with `&p == a.0_1` which VN does try to 
> > > > handle for conditional
> > > > VN.
> > > >
> > > > Bootstrapped and tested on x86_64.
> > > >
> > > > PR tree-optimization/118902
> > > > gcc/ChangeLog:
> > > >
> > > > * fold-const.cc (tree_swap_operands_p): Place invariants in the 
> > > > first operand
> > > > if not used with constants.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/tree-ssa/pr118902-1.c: New test.
> > > >
> > > > Signed-off-by: Andrew Pinski 
> > > > ---
> > > >  gcc/fold-const.cc  |  6 ++
> > > >  gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c | 21 +
> > > >  2 files changed, 27 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c
> > > >
> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > index 1275ef75315..c9471ea44b0 100644
> > > > --- a/gcc/fold-const.cc
> > > > +++ b/gcc/fold-const.cc
> > > > @@ -7246,6 +7246,12 @@ tree_swap_operands_p (const_tree arg0, 
> > > > const_tree arg1)
> > > >if (TREE_CONSTANT (arg0))
> > > >  return true;
> > > >
> > > > +  /* Put invariant address in arg1. */
> > > > +  if (is_gimple_invariant_address (arg1))
> > > > +return false;
> > > > +  if (is_gimple_invariant_address (arg0))
> > > > +return true;
> > >
> > > We could make this cheaper by considering all ADDR_EXPRs here?
> > >
> > > I'll note that with this or the above
> > >
> > >   /* Put SSA_NAMEs last.  */
> > >   if (TREE_CODE (arg1) == SSA_NAME)
> > > return false;
> > >   if (TREE_CODE (arg0) == SSA_NAME)
> > > return true;
> > >
> > > is a bit redundant and contradicting, when we are in GIMPLE, at least.
> > > I'd say on GIMPLE reversing the above to put SSA_NAMEs first would
> > > solve the ADDR_EXPR issue as well.
> > >
> > > The idea of tree_swap_operands_p seems to be to put "simple" things
> > > second, but on GIMPLE SSA_NAME is not simple.  With GENERIC
> > > this would put memory refs first, SSA_NAME second, which is reasonable.
> > >
> > > I'd say since an ADDR_EXPR is always a "value" (not memory), putting it
> > > last makes sense in general, whether invariant or not.  Can you test that?
> > > The issue with is_gimple_invariant_address is that it walks all handled
> > > components.
> >
> > Coming back to this, I will make a change to put ADDR first instead of
> > my patch of is_gimple_invariant_address, next week.
> >
> > Note I just noticed while trying to remove
> > forward_propagate_into_gimple_cond and
> > forward_propagate_into_comparison that we have:
> > (for cmp (eq ne)
> >  (simplify
> >   /* SSA names are canonicalized to 2nd place.  */
> >   (cmp addr@0 SSA_NAME@1)
> >
> > But that seems wrong if we had SSA_NAME which was defined by an
> > ADDR_EXPR as we don't redo canonicalization when doing valueization.
>
> We always valueize first, then canonicalize, so this should work (but the
> above would need adjustment if we order ADDR_EXPR differently now).

So doing ADDR_EXPR last worked and I submitted a patch which fixes up
the issue with the already applied patch:
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683711.html
It includes the match change too as needed.

Thanks,
Andrew Pinski

>
> > It just happens to work in the end because fold will do the
> > canonicalization  before match and simplify and forwprop uses fold do
> > the simplifcation during forward_propagate_into_comparison_1. While
> > for match and simplify on gimple, it does not while valueization of
> > the names. Anyways I will fix that; and add a comment on why it is not
> > always canonicalized. Just bringing it up as a related issue here.
> >
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > Richard.
> > >
> > > > +
> > > >/* It is preferable to swap two SSA_NAME to ensure a canonical form
> > > >   for commutative and comparison operators.  Ensuring a canonical
> > > >   form allows the optimizers to find additional redundancies without
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c 
> > > > b/gcc/testsuite/gcc.dg/tree-ssa/pr118902-1.c
> > > > new file mode 100644
> > > > index 000..fa21b8a74ef
> > > > --- /dev/null
> > >

Re: [PATCH v2] libstdc++: Preserve the argument type in basic_format_args [PR119246]

2025-05-14 Thread Iain Sandoe



> On 14 May 2025, at 18:42, Rainer Orth  wrote:
> 
> Hi Jonathan,
> 
>> On 14/05/25 10:01 +0200, Tomasz Kamiński wrote:
>>> This commits adjust the way how the arguments are stored in the _Arg_value
>>> (and thus basic_format_args), by preserving the types of fixed width
>>> floating-point types, that were previously converted to float, double,
>>> long double.
>>> 
>>> The _Arg_value union now contains alternatives with std::bfloat16_t,
>>> std::float16_t, std::float32_t, std::float64_t that use pre-existing
>>> _Arg_bf16, _Arg_f16, _Arg_f32, _Arg_f32 argument types.
>>> 
>>> This does not affect formatting, as specialization of formatters for fixed
>>> width floating-point types formats them by casting to the corresponding
>>> standard floating point type.
>>> 
>>> For 128-bit floating point we need to handle the ppc64 architecture
>>> (_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT), for which long double may (on a per-TU
>>> basis) designate either the __ibm128 or the __ieee128 type; we need to store both
>>> types in the _Arg_value and have two _Arg_types (_Arg_ibm128, _Arg_ieee128).
>>> On other architectures we use an extra enumerator value to store __float128,
>>> that is different from long double and _Float128. This is consistent with 
>>> ppc64,
>>> for which __float128, if present, is the same type as __ieee128. We use
>> _Arg_float128
>>> _M_float128 names that deviate from _Arg_fN naming scheme, to emphasize that
>>> this flag is not used for std::float128_t (_Float128) type, that is 
>>> consistently
>>> formatted via handle.
>>> 
>>> The __format::__float128_t type is renamed to __format::__flt128_t, to 
>>> mitigate
>>> visual confusion between this type and __float128. We also introduce 
>>> __bflt16_t
>>> typedef instead of using of decltype.
>>> 
>>> We add new alternative for the _Arg_value and allow them to be accessed
>> via _S_get,
>>> when the types are available. However, we produce and handle corresponding
>> _Arg_type,
>>> only when we can format them. See also r14-3329-g27d0cfcb2b33de.
>>> 
>>> The formatter<_Float128, _CharT> that formats via __format::__flt128_t is 
>>> always
>>> provided, when type is available. It is still correct when 
>>> __format::__flt128_t
>>> is _Float128.
>>> 
>>> We also provide formatter<__float128, _CharT> that formats via __flt128_t.
>>> As this type may be disabled (-mno-float128), extra care needs to be taken,
>>> for the situation when __float128 is the same as long double. If the formatter 
>>> would be
>>> defined in such case, the formatter would be generated
>>> from different specializations, and have different mangling:
>>> * formatter<__float128, _CharT> if __float128 is present,
>>> * formatter<__format::__formattable_float, _CharT> otherwise.
>>> To best of my knowledge this happens only on ppc64 for __ieee128 and 
>>> __float128,
>>> so the formatter is not defined in this case. static_assert is added to 
>>> detect
>>> other configurations like that. In such case we should replace it with
>> constraint.
>>> 
>>> PR libstdc++/119246
>>> 
>>> libstdc++-v3/ChangeLog:
>>> 
>>> * include/std/format (__format::__bflt16_t): Define.
>>> (_GLIBCXX_FORMAT_F128): Separate value for cases where _Float128
>>> is used.
>>> (__format::__float128_t): Renamed to __format::__flt128_t.
>>> (std::formatter<_Float128, _CharT>): Define always if there is
>>> formattable 128bit float.
>>> (std::formatter<__float128, _CharT>): Define.
>>> (_Arg_type::_Arg_f128): Rename to _Arg_float128 and adjust value.
>>> (_Arg_type::_Arg_ibm128): Change value to _Arg_ldbl.
>>> (_Arg_type::_Arg_ieee128): Define as alias to _Arg_float128.
>>> (_Arg_value::_M_f128): Replaced with _M_ieee128 and _M_float128.
>>> (_Arg_value::_M_ieee128, _Arg_value::_M_float128)
>>> (_Arg_value::_M_bf16, _Arg_value::_M_f16, _Arg_value::_M_f32)
>>>  _Arg_value::_M_f64): Define.
>>> (_Arg_value::_S_get, basic_format_arg::_S_to_enum): Handle __bflt16,
>>> _Float16, _Float32, _Float64, and __float128 types.
>>> (basic_format_arg::_S_to_arg_type): Preserve _bflt16, _Float16,
>>> _Float32, _Float64 and __float128 types.
>>> (basic_format_arg::_M_visit): Handle _Arg_float128, _Arg_ieee128,
>>> _Arg_b16, _Arg_f16, _Arg_f32, _Arg_f64.
>>> * testsuite/std/format/arguments/args.cc: Updated to illustrate
>>> that extended floating point types use handles now. Added test
>>> for __float128.
>>> * testsuite/std/format/parse_ctx.cc: Extended test to cover class
>>> to check_dynamic_spec with floating point types and handles.
>>> ---
>>> I believe I have fixed all the typos. OK for trunk?
>> 
>> 
>> OK, thanks
> 
> this patch broke Solaris bootstrap, both i386-pc-solaris2.11 and
> sparc-sun-solaris2.11:
> 
> In file included from 
> /vol/gcc/src/hg/master/local/libstdc++-v3/src/c++20/format.cc:29:
> /var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/format:
>  In member function ‘typename std::basic_format_c

Re: [PATCH v2 1/3] libstdc++: Avoid double indirection in move_only_function when possible [PR119125]

2025-05-14 Thread Jonathan Wakely

On 14/05/25 11:52 +0100, Jonathan Wakely wrote:

On 14/05/25 10:48 +0200, Tomasz Kamiński wrote:

Based on the provision in C++26 [func.wrap.general] p2 this patch adjusts the 
generic
move_only_function(_Fn&&) constructor, such that when _Fn refers to selected
move_only_function instantiations, the ownership of the target object is 
direclty


s/direclty/directly/


transferred to the constructed object. This avoids the cost of double indireciton in this 
situation.


s/indireciton/indirection/


We apply this also in C++23 mode.

We also fix handling of self assigments, to match the behavior required by the standard,


s/assigments/assignments/


due to the use of the copy-and-swap idiom.

An instantiation MF1 of move_only_function can transfer the target of another
instantiation MF2, if it can be constructed via the usual rules 
(__is_callable_from<_MF2>),
and their invokers are convertible (__is_invoker_convertible()), i.e.:
* MF1 is less noexcept than MF2,
* return types are the same after stripping cv-quals
* adjusted parameter types are the same (__poly::_param_t), i.e. params of types T and 
T&&
are compatible for non-trivially copyable objects.
Compatibility of cv ref qualification is checked via __is_callable_from<_MF2>.

To achieve the above, the generation of _M_invoke functions is moved to _Invoke class


s/_Invoke/_Invoker/


templates, which depend only on noexcept, return type and adjusted parameters of 
the
signature. To make the invoker signature compatible between const and mutable
qualified signatures, we always accept _Storage as const& and perform a 
const_cast
for locally stored objects. This approach guarantees that we never strip const 
from
a const object.

Another benefit of this approach is that move_only_function
and move_only_function use the same function pointer, which 
should
reduce binary size.

The _Storage and _Manager functionality was also extracted and adjusted from
_Mo_func base, in preparation for the implementation of copyable_function and
function_ref. The _Storage was adjusted to store function pointers as 
void(*)().
The manage function now accepts an _Op enum parameter and supports additional
operations:
* _Op::_Address stores address of target object in destination
* _Op::_Copy, when enabled, copies from source to destination
Furthremore, we provide type-independent manage functions for handling all:


s/Furthremore/Furthermore/


* function pointer types
* trivially copyable object stored locally.
Similarly, as in the case of the invoker, we always pass the source as const (for copy),
and cast away constness in case of move operations, where we know that the source
is mutable.

Finally, the new helpers are defined in __polyfunc internal namespace.

PR libstdc++/119125

libstdc++-v3/ChangeLog:

* include/bits/mofunc_impl.h: (std::move_only_function): Adjusted for
changes in bits/move_only_function.h
(move_only_function::move_only_function(_Fn&&)): Special case
move_only_functions with same invoker.
(move_only_function::operator=(move_only_function&&)): Handle self
assigment.


s/assigment/assignment/


* include/bits/move_only_function.h (__polyfunc::_Ptrs)
(__polyfunc::_Storage): Refactored from _Mo_func::_Storage.
(__polyfunc::__param_t): Moved from move_only_function::__param_t.
(__polyfunc::_Base_invoker, __polyfunc::_Invoke): Refactored from


s/_Invoke/_Invoker/


move_only_function::_S_invoke.
(__polyfunc::_Manager): Refactored from _Mo_func::_S_manager.
(std::_Mofunc_base): Moved into __polyfunc::_Mo_base with parts
extracted to __polyfunc::_Storage and __polyfunc::_Manager.
(__polyfunc::__deref_as, __polyfunc::__invoker_of)
(__polyfunc::__base_of, __polyfunc::__is_invoker_convertible): Define.
(std::__is_move_only_function_v): Renamed to
__is_polymorphic_function_v.
(std::__is_polymorphic_function_v): Renamed from
__is_move_only_function_v.
* testsuite/20_util/move_only_function/call.cc: Test for
functions pointers.
* testsuite/20_util/move_only_function/conv.cc: New test.
* testsuite/20_util/move_only_function/move.cc: Tests for
self assignment.
---
In addition to adjusting formatting and fixing typos, this update:
* consistently call global new when placement new is used, and
 non-global for heap allocations
* moves _Invoker before _Manager.
The _Invoker can be supported for non-hosted environments, as well
as function_ref.

libstdc++-v3/include/bits/mofunc_impl.h   |  74 +--
.../include/bits/move_only_function.h | 455 +-
.../20_util/move_only_function/call.cc|  14 +
.../20_util/move_only_function/conv.cc| 188 
.../20_util/move_only_function/move.cc|  11 +
5 files changed, 588 insertions(+), 154 deletions(-)
create mode 100644 libstdc++-v3/testsuite/20_util/move_only_function/conv.cc

diff --git a/libstdc++-v3/include/bits/mofunc_impl.h 
b/libstdc++-v3/includ

[PATCH v2 2/2]AArch64: propose -mmax-vectorization as an option to override vector costing

2025-05-14 Thread Tamar Christina
Hi All,

With the middle-end providing a way to make vectorization more profitable by
scaling vect-scalar-cost-multiplier, this patch adds a more user-friendly
option that is easier to use.

I propose making it an actual -m option that we document and retain vs using
the parameter name.  In the future I would like to extend this option to modify
additional costing in the AArch64 backend itself.

This can be used together with --param aarch64-autovec-preference to get the
vectorizer to, say, always vectorize with SVE.  I did consider making this an
additional enum to --param aarch64-autovec-preference but I also think this is
a useful thing to be able to set with pragmas and attributes, but am open to
suggestions.

Note that as a follow-up I plan to extend -fdump-tree-vect to support -stats,
which is then intended to be usable with this flag.

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.opt (max-vectorization): New.
* config/aarch64/aarch64.cc (aarch64_override_options_internal): Save
and restore option.
Implement it through vect-scalar-cost-multiplier.
(aarch64_attributes): Default to off.
* common/config/aarch64/aarch64-common.cc (aarch64_handle_option):
Initialize option.
* doc/extend.texi (max-vectorization): Document attribute.
* doc/invoke.texi (max-vectorization): Document flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cost_model_17.c: New test.
* gcc.target/aarch64/sve/cost_model_18.c: New test.

---
diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
b9ed83642ade4462f1b030d68cf9744d31d70c23..1488697c6ce43108ae2938e5b8a00ac7ac262da6
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -142,6 +142,10 @@ aarch64_handle_option (struct gcc_options *opts,
   opts->x_aarch64_flag_outline_atomics = val;
   return true;
 
+case OPT_mmax_vectorization:
+  opts->x_flag_aarch64_max_vectorization = val;
+  return true;
+
 default:
   return true;
 }
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
9e3f2885bccb62550c5fcfdf93d72fbc2e63233e..01bc0f9f7bb6e14d5d180949404238baef3f1cac
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18973,6 +18973,12 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
   if (TARGET_SME && !TARGET_SVE2)
 sorry ("no support for %qs without %qs", "sme", "sve2");
 
+  /* Set scalar costing to a high value such that we always pick
+ vectorization.  Increase scalar costing by 400%.  */
+  if (opts->x_flag_aarch64_max_vectorization)
+SET_OPTION_IF_UNSET (opts, &global_options_set,
+param_vect_scalar_cost_multiplier, 400);
+
   aarch64_override_options_after_change_1 (opts);
 }
 
@@ -19723,6 +19729,8 @@ static const struct aarch64_attribute_info 
aarch64_attributes[] =
  OPT_msign_return_address_ },
   { "outline-atomics", aarch64_attr_bool, true, NULL,
  OPT_moutline_atomics},
+  { "max-vectorization", aarch64_attr_bool, false, NULL,
+ OPT_mmax_vectorization},
   { NULL, aarch64_attr_custom, false, NULL, OPT }
 };
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 
f32d56d4ffaef7862c1c45a11753be5d480220d0..2725c50da64a2c05489ea6202bdd5eedf1ba7e27
 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -290,6 +290,10 @@ msve-vector-bits=
 Target RejectNegative Joined Enum(sve_vector_bits) 
Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE)
 -msve-vector-bits= Set the number of bits in an SVE vector 
register.
 
+mmax-vectorization
+Target Undocumented Var(flag_aarch64_max_vectorization) Save
+Override the scalar cost model such that vectorization is always profitable.
+
 mverbose-cost-dump
 Target Undocumented Var(flag_aarch64_verbose_cost)
 Enables verbose cost model dumping in the debug dump files.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 
40ccf22b29f4316928f905ec2c978fdaf30a55ec..759a04bc7c4c66155154d55045bb75d695b2d6c2
 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3882,6 +3882,13 @@ Enable or disable calls to out-of-line helpers to 
implement atomic operations.
 This corresponds to the behavior of the command-line options
 @option{-moutline-atomics} and @option{-mno-outline-atomics}.
 
+@cindex @code{max-vectorization} function attribute, AArch64
+@item max-vectorization
+Enable or disable loop costing overrides inside the current function to apply
+a penalty to scalar loops such that vector costing is always profitable.
+This corresponds to the behavior of the command-line options
+@option{-mmax-vectorization} and @option{-mno-max-vectorization}.
+
 @cindex @code{ind

Re: Stepping down as aarch64 port maintainer

2025-05-14 Thread Richard Earnshaw (lists)
On 13/05/2025 16:26, Marcus Shawcroft wrote:
> Hello GCC Community,
> 
> Many years have passed since I actively worked on the aarch64 backend, so this 
> email is long overdue. I’ll be leaving Arm shortly so this seems like a good 
> time to formally step down as an AArch64 maintainer. It has been a privilege 
> to contribute to GCC and to work alongside the community. I wish the project 
> all the best going forward.
> 
> Cheers
> /Marcus
> 
> 

Hi Marcus,

Many thanks for your contributions to the project.  We wish you well in your 
future endeavours.

Richard.

I'll commit the following to the repository:

diff --git a/MAINTAINERS b/MAINTAINERS
index b1e7fadf1b8..a3e3f25d9d1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -57,7 +57,6 @@ docs, and the testsuite related to that.
 aarch64 ldp/stp Alex Coplan 
 aarch64 portRichard Earnshaw
 aarch64 portRichard Sandiford   
-aarch64 portMarcus Shawcroft
 aarch64 portKyrylo Tkachov  
 alpha port  Richard Henderson   
 amdgcn port Julian Brown
@@ -792,7 +791,6 @@ Senthil Kumar Selvaraj  saaadhu 

 Kostya Serebryany   kcc 
 Thiemo Seufer   -   
 Bill Seurer seurer  
-Marcus Shawcroftmshawcroft  
 Nathaniel Shead nshead  
 Tim Shentimshen 
 Joel Sherrill   joel


Re: [PATCH v2 2/3] libstdc++: Implement C++26 copyable_function [PR119125]

2025-05-14 Thread Jonathan Wakely

On 14/05/25 10:48 +0200, Tomasz Kamiński wrote:

This patch implements C++26 copyable_function as specified in P2548R6.
It also implements LWG 4255, which adjusts move_only_function so that
constructing from an empty copyable_function produces an empty functor.
This falls out of the existing checks, after specializing
__is_polymorphic_function_v for copyable_function specializations.

For compatible invoker signatures, the move_only_function may be constructed
from copyable_function without double indirection. To achieve that we derive
_Cpy_base from _Mo_base, and specialize __is_polymorphic_function_v for
copyable_function. Similarly, copyable_functions with compatible signatures
can be converted without double indirection.

As we are starting to use the _Op::_Copy operation from the _M_manage
function, invocations of those functions may now throw exceptions, so noexcept
needs to be removed from the signature of the stored _M_manage pointers. This
also affects operations in _Mo_base; however, we already wrap _M_manage
invocations in noexcept member functions (_M_move, _M_destroy, swap).

PR libstdc++/119125

libstdc++-v3/ChangeLog:

* doc/doxygen/stdheader.cc: Added cpyfunc_impl.h header.
* include/Makefile.am: Add bits cpyfunc_impl.h.
* include/Makefile.in: Add bits cpyfunc_impl.h.
* include/bits/cpyfunc_impl.h: New file.
* include/bits/mofunc_impl.h: Mention LWG 4255.
* include/bits/move_only_function.h: Update header description
and change guard to __cplusplus > 202002L.
(_Manager::_Func): Remove noexcept.
(std::__is_polymorphic_function_v>)

(__variant::_Never_valueless_alt>)
(move_only_function) [__glibcxx_move_only_function]: Adjust guard.
(std::__is_polymorphic_function_v>)
(__variant::_Never_valueless_alt>)
(__polyfunc::_Cpy_base, std::copyable_function)
[__glibcxx_copyable_function]: Define.
* include/bits/version.def: Define copyable_function.
* include/bits/version.h: Regenerate.
* include/std/functional: Define __cpp_lib_copyable_function.
* src/c++23/std.cc.in (copyable_function)
[__cpp_lib_copyable_function]: Export.
* testsuite/20_util/copyable_function/call.cc: New test based on
move_only_function tests.
* testsuite/20_util/copyable_function/cons.cc: New test based on
move_only_function tests.
* testsuite/20_util/copyable_function/conv.cc: New test based on
move_only_function tests.
* testsuite/20_util/copyable_function/copy.cc: New test.
* testsuite/20_util/copyable_function/move.cc: New test based on
move_only_function tests.
---
In addition to fixing formatting and typos, this patch adds an export of
copyable_function to the std module.

libstdc++-v3/doc/doxygen/stdheader.cc |   1 +
libstdc++-v3/include/Makefile.am  |   1 +
libstdc++-v3/include/Makefile.in  |   1 +
libstdc++-v3/include/bits/cpyfunc_impl.h  | 269 ++
libstdc++-v3/include/bits/mofunc_impl.h   |   4 +
.../include/bits/move_only_function.h |  94 +-
libstdc++-v3/include/bits/version.def |  10 +
libstdc++-v3/include/bits/version.h   |  10 +
libstdc++-v3/include/std/functional   |   1 +
libstdc++-v3/src/c++23/std.cc.in  |   3 +
.../20_util/copyable_function/call.cc | 224 +++
.../20_util/copyable_function/cons.cc | 126 
.../20_util/copyable_function/conv.cc | 251 
.../20_util/copyable_function/copy.cc | 154 ++
.../20_util/copyable_function/move.cc | 120 
15 files changed, 1264 insertions(+), 5 deletions(-)
create mode 100644 libstdc++-v3/include/bits/cpyfunc_impl.h
create mode 100644 libstdc++-v3/testsuite/20_util/copyable_function/call.cc
create mode 100644 libstdc++-v3/testsuite/20_util/copyable_function/cons.cc
create mode 100644 libstdc++-v3/testsuite/20_util/copyable_function/conv.cc
create mode 100644 libstdc++-v3/testsuite/20_util/copyable_function/copy.cc
create mode 100644 libstdc++-v3/testsuite/20_util/copyable_function/move.cc

diff --git a/libstdc++-v3/doc/doxygen/stdheader.cc 
b/libstdc++-v3/doc/doxygen/stdheader.cc
index 3ee825feb66..8a201334410 100644
--- a/libstdc++-v3/doc/doxygen/stdheader.cc
+++ b/libstdc++-v3/doc/doxygen/stdheader.cc
@@ -54,6 +54,7 @@ void init_map()
headers["function.h"]   = "functional";
headers["functional_hash.h"]= "functional";
headers["mofunc_impl.h"]= "functional";
+headers["cpyfunc_impl.h"]   = "functional";
headers["move_only_function.h"] = "functional";
headers["invoke.h"] = "functional";
headers["ranges_cmp.h"] = "functional";
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 1140fa0dffd..5cc13381b02 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libs

RE: [PATCH v2 1/2] middle-end: Add new parameter to scale scalar loop costing in vectorizer

2025-05-14 Thread Tamar Christina
> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, May 14, 2025 12:19 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de
> Subject: [PATCH v2 1/2]middle-end: Add new parameter to scale scalar loop
> costing in vectorizer
> 
> Hi All,
> 
> This patch adds a new param vect-scalar-cost-multiplier to scale the scalar
> costing during vectorization.  If the cost is set high enough, then when
> using the dynamic cost model it effectively disables the costing vs scalar
> and assumes all vectorization to be profitable.
> 
> This is similar to using the unlimited cost model, but unlike unlimited it
> does not fully disable the vector cost model.  That means that we still
> perform comparisons between vector modes, and that we still do costing for
> alias analysis.
> 
> As an example, the following:
> 
> void
> foo (char *restrict a, int *restrict b, int *restrict c,
>  int *restrict d, int stride)
> {
> if (stride <= 1)
> return;
> 
> for (int i = 0; i < 3; i++)
> {
> int res = c[i];
> int t = b[i * stride];
> if (a[i] != 0)
> res = t * d[i];
> c[i] = res;
> }
> }
> 
> compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to
> vectorize as it assumes scalar would be faster, and with
> -fvect-cost-model=unlimited it picks a vector type that's so big that the 
> large
> sequence generated is working on mostly inactive lanes:
> 
> ...
> and p3.b, p3/z, p4.b, p4.b
> whilelo p0.s, wzr, w7
> ld1wz23.s, p3/z, [x3, #3, mul vl]
> ld1wz28.s, p0/z, [x5, z31.s, sxtw 2]
> add x0, x5, x0
> punpklo p6.h, p6.b
> ld1wz27.s, p4/z, [x0, z31.s, sxtw 2]
> and p6.b, p6/z, p0.b, p0.b
> punpklo p4.h, p7.b
> ld1wz24.s, p6/z, [x3, #2, mul vl]
> and p4.b, p4/z, p2.b, p2.b
> uqdecw  w6
> ld1wz26.s, p4/z, [x3]
> whilelo p1.s, wzr, w6
> mul z27.s, p5/m, z27.s, z23.s
> ld1wz29.s, p1/z, [x4, z31.s, sxtw 2]
> punpkhi p7.h, p7.b
> mul z24.s, p5/m, z24.s, z28.s
> and p7.b, p7/z, p1.b, p1.b
> mul z26.s, p5/m, z26.s, z30.s
> ld1wz25.s, p7/z, [x3, #1, mul vl]
> st1wz27.s, p3, [x2, #3, mul vl]
> mul z25.s, p5/m, z25.s, z29.s
> st1wz24.s, p6, [x2, #2, mul vl]
> st1wz25.s, p7, [x2, #1, mul vl]
> st1wz26.s, p4, [x2]
> ...
> 
> With -fvect-cost-model=dynamic --param vect-scalar-cost-multiplier=200
> you get more reasonable code:
> 
> foo:
> cmp w4, 1
> ble .L1
> ptrue   p7.s, vl3
> index   z0.s, #0, w4
> ld1bz29.s, p7/z, [x0]
> ld1wz30.s, p7/z, [x1, z0.s, sxtw 2]
>   ptrue   p6.b, all
> cmpne   p7.b, p7/z, z29.b, #0
> ld1wz31.s, p7/z, [x3]
>   mul z31.s, p6/m, z31.s, z30.s
> st1wz31.s, p7, [x2]
> .L1:
> ret
> 
> This model has been useful internally for performance exploration and
> cost-model validation.  It allows us to force realistic vectorization,
> overriding the cost model, so that we can tell whether it is correct wrt
> profitability.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * params.opt (vect-scalar-cost-multiplier): New.
>   * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
>   * doc/invoke.texi (vect-scalar-cost-multiplier): Document it.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/cost_model_16.c: New test.
> 
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index
> 699ee1cc0b7580d4729bbefff8f897eed1c3e49b..95a25c0f63b77f26db05a7b48
> bfad8f9c58bcc5f 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17273,6 +17273,10 @@ this parameter.  The default value of this parameter
> is 50.
>  @item vect-induction-float
>  Enable loop vectorization of floating point inductions.
> 
> +@item vect-scalar-cost-multiplier
> +Apply the given multiplier % to scalar loop costing during vectorization.
> +Increasing the cost multiplier will make vector loops more profitable.
> +
>  @item vrp-block-limit
>  Maximum number of basic blocks before VRP switches to a lower memory
> algorithm.
> 
> diff --git a/gcc/params.opt b/gcc/params.opt
> index
> 1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593f
> e17cd88f2fc32367 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies
> to the cost of statements i
>  Common Joined UInteger Var(param_vect_induction_float) Init(1)
> IntegerRange(0, 1) Param Optimization
>  Enable loop vectorization of floating point indu

RE: [PATCH 1/2] middle-end: Apply loop->unroll directly in vectorizer

2025-05-14 Thread Tamar Christina
> > > >
> > > > -  /* Loops vectorized with a variable factor won't benefit from
> > > > +  /* Loops vectorized would have already taken into account unrolling
> specified
> > > > + by the user as the suggested unroll factor, as such we need to 
> > > > prevent the
> > > > + RTL unroller from unrolling twice.  The only exception is static 
> > > > known
> > > > + iterations where we would have expected the loop to be fully 
> > > > unrolled.
> > > > + Loops vectorized with a variable factor won't benefit from
> > > >   unrolling/peeling.  */
> > > > -  if (!vf.is_constant ())
> > > > +  if (LOOP_VINFO_USER_UNROLL (loop_vinfo)
> > >
> > > ... this is the transform phase - is LOOP_VINFO_USER_UNROLL copied
> > > from the earlier attempt?
> >
> > Ah, I see I forgot to copy it when the loop_vinfo is copied..  Will fix.
> >

I've been looking more into the behavior and I think it's correct not to copy 
it from an earlier attempt.
The flag would be re-set every time during vect_estimate_min_profitable_iters 
as we have to recalculate
the unroll based on the assumed_vf.

When vect_analyze_loop_2 initializes the costing structure, we just set it 
again during vect_analyze_loop_costing
as loop->unroll is not cleared until vectorization succeeds.

For the epilogue it would be false, which I think makes sense, as the
epilogues should determine their VF solely based on that of the previous
attempt.  It also makes sense for an epilogue to be able to tell the
vectorizer that it wants to re-use the same mode for the next attempt, just
without the unrolling.

> In the end whatever we do it's going to be a matter of documenting
> the interaction between vectorization and #pragma GCC unroll.
> 

Docs added

> The way you handle it is reasonable, the question is whether to
> set loop->unroll to 1 in the end (disable any further unrolling)
> or to 0 (only auto-unroll based on heuristics).  I'd argue 0
> makes more sense - iff we chose to apply the extra unrolling
> during vectorization.

0 does make more sense to me as well.  I think where we got crossed earlier
was that I was saying that having unroll > 1 after this wasn't a good idea,
so it was a miscommunication.

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vectorizer.h (vector_costs::set_suggested_unroll_factor,
LOOP_VINFO_USER_UNROLL): New.
(class _loop_vec_info): Add user_unroll.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Set
suggested_unroll_factor before calling backend costing.
(_loop_vec_info::_loop_vec_info): Initialize user_unroll.
(vect_transform_loop): Clear the loop->unroll value if the pragma was
used.
* doc/extend.texi (pragma unroll): Document vectorizer interaction.

-- inline copy of patch --

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 
e87a3c271f8420d8fd175823b5bb655f76c89afe..f8261d13903afc90d3341c09ab3fdbd0ab96ea49
 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10398,6 +10398,11 @@ unrolled @var{n} times regardless of any commandline 
arguments.
 When the option is @var{preferred} then the user is allowed to override the
 unroll amount through commandline options.
 
+If the loop was vectorized the unroll factor specified will be used to seed the
+vectorizer unroll factor.  Whether the loop is unrolled or not will be
+determined by target costing.  The resulting vectorized loop may still be
+unrolled more in later passes depending on the target costing.
+
 @end table
 
 @node Thread-Local
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..1fbf92b5f4b176ada7379930b73ab503fb423e99
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1073,6 +1073,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 peeling_for_gaps (false),
 peeling_for_niter (false),
 early_breaks (false),
+user_unroll (false),
 no_data_dependencies (false),
 has_mask_store (false),
 scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -4983,6 +4984,26 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
}
 }
 
+  /* Seed the target cost model with what the user requested if the unroll
+ factor is larger than 1 vector VF.  */
+  auto user_unroll = LOOP_VINFO_LOOP (loop_vinfo)->unroll;
+  if (user_unroll > 1)
+{
+  LOOP_VINFO_USER_UNROLL (loop_vinfo) = true;
+  int unroll_fact = user_unroll / assumed_vf;
+  unroll_fact = 1 << ceil_log2 (unroll_fact);
+  if (unroll_fact > 1)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"setting unroll factor to %d based on user requested "
+"unroll factor %d 

[PATCH 2/2]AArch64: Use vectorizer initial unrolling as default

2025-05-14 Thread Tamar Christina
Hi All,

The vectorizer now tries to maintain the target VF that the user wanted by
increasing the unroll factor if the user used pragma GCC unroll and we've
vectorized the loop.

This change makes the AArch64 backend honor the initial value set by the
vectorizer.

Consider the loop

void f1 (int *restrict a, int n)
{
#pragma GCC unroll 4 requested
  for (int i = 0; i < n; i++)
a[i] *= 2;
}

The target can then choose to create multiple epilogues to deal with the "rest".

The example above now generates:

.L4:
ldr q31, [x2]
add v31.4s, v31.4s, v31.4s
str q31, [x2], 16
cmp x2, x3
bne .L4

as V4SI maintains the requested VF, but e.g. pragma unroll 8 generates:

.L4:
ldp q30, q31, [x2]
add v30.4s, v30.4s, v30.4s
add v31.4s, v31.4s, v31.4s
stp q30, q31, [x2], 32
cmp x3, x2
bne .L4

Note that as a follow-up I plan to look into asking the vectorizer to
generate multiple epilogues when we unroll like this, as we can re-request
the same mode, but without the unroll, as the first epilogue.
Atm I added a TODO since e.g. for early break we don't support vector
epilogues yet, and multiple epilogues need some thought and internal
discussion.

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_vector_costs::determine_suggested_unroll_factor): Use
m_suggested_unroll_factor instead of 1.
(aarch64_vector_costs::finish_cost): Add todo for epilogues.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/unroll-vect.c: New test.

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
9e3f2885bccb62550c5fcfdf93d72fbc2e63233e..cf6f56a08d67044c8dc34578902eb4cb416641bd
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18075,7 +18075,7 @@ aarch64_vector_costs::determine_suggested_unroll_factor 
()
   if (!sve && !TARGET_SVE2 && m_has_avg)
 return 1;
 
-  unsigned int max_unroll_factor = 1;
+  unsigned int max_unroll_factor = m_suggested_unroll_factor;
   for (auto vec_ops : m_ops)
 {
   aarch64_simd_vec_issue_info const *vec_issue
@@ -18293,6 +18293,8 @@ aarch64_vector_costs::finish_cost (const vector_costs 
*uncast_scalar_costs)
 m_costs[vect_body]);
   m_suggested_unroll_factor = determine_suggested_unroll_factor ();
 
+  /* TODO: Add support for multiple epilogues and costing for early break. 
 */
+
   /* For gather and scatters there's an additional overhead for the first
 iteration.  For low count loops they're not beneficial so model the
 overhead as loop prologue costs.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/unroll-vect.c 
b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c
new file mode 100644
index 
..3cb774ba95787ebee488fbe7306299ef28e6bb35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -march=armv8-a --param 
aarch64-autovec-preference=asimd-only -std=gnu99" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f1:
+** ...
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** ...
+*/
+void f1 (int *restrict a, int n)
+{
+#pragma GCC unroll 16
+  for (int i = 0; i < n; i++)
+a[i] *= 2;
+}
+



Re: [PATCH v2 1/3] libstdc++: Avoid double indirection in move_only_function when possible [PR119125]

2025-05-14 Thread Tomasz Kaminski
On Wed, May 14, 2025 at 12:52 PM Jonathan Wakely  wrote:

> On 14/05/25 10:48 +0200, Tomasz Kamiński wrote:
> >Based on the provision in C++26 [func.wrap.general] p2 this patch adjust
> the generic
> >move_only_function(_Fn&&) constructor, such that when _Fn refers to
> selected
> >move_only_function instantiations, the ownership of the target object is
> direclty
>
> s/direclty/directly/
>
> >transfered to constructor object. This avoid cost of double indireciton
> in this situation.
>
> s/indireciton/indirection/
>
> >We apply this also in C++23 mode.
> >
> >We also fix handling of self assigments, to match behavior required by
> standard,
>
> s/assigments/assignments/
>
> >due use of copy and swap idiom.
> >
> >An instantiations MF1 of move_only_function can transfer target of another
> >instantiation MF2, if it can be constructed via usual rules
> (__is_callable_from<_MF2>),
> >and their invoker are convertible (__is_invocer_convertible()),
> i.e.:
> >* MF1 is less noexcept than MF2,
> >* return types are the same after stripping cv-quals
> >* adujsted parameters type are the same (__poly::_param_t), i.e. param of
> types T and T&&
> >  are compatible for non-trivially copyable objects.
> >Compatiblity of cv ref qualification is checked via
> __is_callable_from<_MF2>.
> >
> >To achieve above the generation of _M_invoke functions is moved to
> _Invoke class
>
> s/_Invoke/_Invoker/
>
> >templates, that only depends on noexcept, return type and adjusted
> parameter of the
> >signature. To make the invoker signature compatible between const and
> mutable
> >qualified signatures, we always accept _Storage as const& and perform a
> const_cast
> >for locally stored object. This approach guarantees that we never strip
> const from
> >const object.
> >
> >Another benefit of this approach is that
> move_only_function
> >and move_only_function use same funciton pointer,
> which should
> >reduce binary size.
> >
> >The _Storage and _Manager functionality was also extracted and adjusted
> from
> >_Mo_func base, in preparation for implementation for copyable_function and
> >function_ref. The _Storage was adjusted to store functions pointers as
> void(*)().
> >The manage function, now accepts _Op enum parameter, and supports
> additional
> >operations:
> > * _Op::_Address stores address of target object in destination
> > * _Op::_Copy, when enabled, copies from source to destination
> >Furthremore, we provide a type-independent mamange functions for handling
> all:
>
> s/Furthremore/Furthermore/
>
> > * function pointer types
> > * trivially copyable object stored locally.
> >Similary as in case of invoker, we always pass source as const (for copy),
> >and cast away constness in case of move operations, where we know that
> source
> >is mutable.
> >
> >Finally, the new helpers are defined in __polyfunc internal namespace.
> >
> >   PR libstdc++/119125
> >
> >libstdc++-v3/ChangeLog:
> >
> >   * include/bits/mofunc_impl.h: (std::move_only_function): Adjusted
> for
> >   changes in bits/move_only_function.h
> >   (move_only_function::move_only_function(_Fn&&)): Special case
> >   move_only_functions with same invoker.
> >   (move_only_function::operator=(move_only_function&&)): Handle self
> >   assigment.
>
> s/assigment/assignment/
>
> >   * include/bits/move_only_function.h (__polyfunc::_Ptrs)
> >   (__polyfunc::_Storage): Refactored from _Mo_func::_Storage.
> >   (__polyfunc::__param_t): Moved from move_only_function::__param_t.
> >   (__polyfunc::_Base_invoker, __polyfunc::_Invoke): Refactored from
>
> s/_Invoke/_Invoker/
>
> >   move_only_function::_S_invoke.
> >   (__polyfunc::_Manager): Refactored from _Mo_func::_S_manager.
> >   (std::_Mofunc_base): Moved into __polyfunc::_Mo_base with parts
> >   extracted to __polyfunc::_Storage and __polyfunc::_Manager.
> >   (__polyfunc::__deref_as, __polyfunc::__invoker_of)
> >   (__polyfunc::__base_of, __polyfunc::__is_invoker_convertible):
> Define.
> >   (std::__is_move_only_function_v): Renamed to
> >   __is_polymorphic_function_v.
> >   (std::__is_polymorphic_function_v): Renamed from
> >   __is_move_only_function_v.
> >   * testsuite/20_util/move_only_function/call.cc: Test for
> >   functions pointers.
> >   * testsuite/20_util/move_only_function/conv.cc: New test.
> >   * testsuite/20_util/move_only_function/move.cc: Tests for
> >   self assigment.
> >---
> >In addition to adjusting formatting and fixing typo, this update:
> > * consistently call global new when placement new is used, and
> >   non-global for heap allocations
> > * moves _Invoker before _Manager.
> >The _Invoker can be supported for non hosted enviroment, as well
> >as function_ref.
> >
> > libstdc++-v3/include/bits/mofunc_impl.h   |  74 +--
> > .../include/bits/move_only_function.h | 455 +-
> > .../20_util/move_only_function/call.cc|  14 +
> > .../20_util/move
