date:20250401

[PATCH] c++: Fix typo in RAW_DATA_CST build_list_conv subsubconv hanling [PR119563]

2025-04-01 Thread Jakub Jelinek

Hi!

The following testcase ICEs (the embed one actually doesn't but
dereferences random uninitialized pointer far after allocated memory)
because of a typo.  In the RAW_DATA_CST handling of list conversion
where there are conversions to something other than
initializer_list<{{,un}signed ,}char>, the code now calls
implicit_conversion for all the RAW_DATA_CST elements and stores them
into subsubconvs array.
The next loop (done in a separate loop because subsubconvs[0] is
handled differently) attempts to do the
  for (i = 0; i < len; ++i)
{
  conversion *sub = subconvs[i];
  if (sub->rank > t->rank)
t->rank = sub->rank;
  if (sub->user_conv_p)
t->user_conv_p = true;
  if (sub->bad_p)
t->bad_p = true;
}
rank/user_conv_p/bad_p merging, but I mistyped the index, the loop
iterates with j iterator and i is subconvs index, so the loop effectively
doesn't do anything interesting except for merging from one of the
subsubconvs element, if lucky within the subsubconvs array, if unlucky
not even from inside of the array.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-01  Andrew Pinski  
Jakub Jelinek  

PR c++/119563
* call.cc (build_list_conv): Fix a typo in loop gathering
summary information from subsubconvs.

* g++.dg/cpp0x/pr119563.C: New test.
* g++.dg/cpp/embed-26.C: New test.

--- gcc/cp/call.cc.jj   2025-03-06 11:08:20.657231238 +0100
+++ gcc/cp/call.cc  2025-04-01 11:25:14.030321320 +0200
@@ -917,7 +917,7 @@ build_list_conv (tree type, tree ctor, i
  t->rank = cr_exact;
  for (j = 0; j < (unsigned) RAW_DATA_LENGTH (val); ++j)
{
- sub = subsubconvs[i];
+ sub = subsubconvs[j];
  if (sub->rank > t->rank)
t->rank = sub->rank;
  if (sub->user_conv_p)
--- gcc/testsuite/g++.dg/cpp0x/pr119563.C.jj2025-04-01 11:14:24.702394749 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/pr119563.C   2025-04-01 11:13:50.962866358 
+0200
@@ -0,0 +1,79 @@
+// PR c++/119563
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+
+namespace std {
+template 
+struct initializer_list {
+private:
+  T *_M_array;
+  decltype (sizeof 0) _M_len;
+public:
+  constexpr decltype (sizeof 0)
+  size () const noexcept { return _M_len; }
+  constexpr const T *
+  begin () const noexcept { return _M_array; }
+  constexpr const T *
+  end () const noexcept { return begin () + size (); }
+};
+}
+
+struct A {} a;
+
+struct B {
+  constexpr B (int x) : B (a, x) {}
+  template 
+  constexpr B (A, T... x) : b(x...) {}
+  int b;
+};
+
+struct C {
+  C (std::initializer_list x)
+  {
+if (x.size () != 130 + 1024 + 130)
+  __builtin_abort ();
+unsigned int i = 1, j = 0;
+for (auto a = x.begin (); a < x.end (); ++a)
+  if (a->b != i)
+   __builtin_abort ();
+  else
+   {
+ if (j == 129 || j == 129 + 1024)
+   i = 0;
+ i = (i & 15) + 1;
+ ++j;
+   }
+c = true;
+  }
+  bool c;
+};
+
+#define D 1 + 0, 2 + 0, 3 + 0, 4 + 0, 5 + 0, 6 + 0, 7 + 0, 8 + 0, \
+ 9 + 0, 10 + 0, 11 + 0, 12 + 0, 13 + 0, 14 + 0, 15 + 0, 16 + 0
+#define E D, D, D, D, D, D, D, D
+
+C c { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, E, E, E, E, E, E, E, E,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2 };
+
+int
+main ()
+{
+  if (!c.c)
+__builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-26.C.jj  2025-04-01 11:18:56.563595381 
+0200
+++ gcc/testsuite/g++.dg/cpp/embed-26.C 2025-04-01 11:18:35.444890450 +0200
@@ -0,0 +1,63 @@
+// PR c++/119563
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+
+namespace std {
+template 
+struct initializer_list {
+private:
+  T *_M_array;
+  decltype (sizeof 0) _M_len;
+public:
+  constexpr decltype (sizeof 0)
+  size () const noexcept { return _M_len; }
+  constexpr const T *
+  begin () const noexcept { return _M_array; }
+  constexpr const T *
+  end () const noexcept { return begin () + size ();

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Qing Zhao



> On Apr 1, 2025, at 10:04, Martin Uecker  wrote:
> 
> 
> 
> Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling:
>>> I'd like to offer up this to solve the issues we're facing. This is a
>>> combination of everything that's been discussed here (or at least that
>>> I've been able to read in the centi-thread :-).
> 
> Thanks! I think this proposal much better as it avoids undue burden
> on parsers, but it does not address all my concerns.
> 
> 
> From my side, the issue about compromising the scoping rules of C
> is also about unintended non-local effects of code changes. In
> my opinion, a change in a library header elsewhere should not cause 
> code in a local scope (which itself might also come from a macro) to
> emit a warning or require a programmer to add a workaround. So I am
> not convinced that adding warnings or a workaround such as
> __builtin_global_ref  is a good solution.
> 
> 
> I could see the following as a possible way forward: We only 
> allow the following two syntaxes:
> 
> 1. Single argument referring to a member.
> 
> __counted_by(len)
> 
> with an argument that must be a single identifier and where
> the identifier then must refer to a struct member. 
> 
> (I still think this is not ideal and potentially
> confusing, but in contrast to new scoping rules it is
> at least relatively easily to explain as a special rule.).  
> 
> 
> 2. Forward declarations. 
> 
> __counted_by(size_t len; len + PADDING)

In the above, the PADDING is some constant? 

More complicated expressions involving globals will not be supported?
> 
> where then the second part can also be a more complicated 
> expression, but with the explicit requirement that all
> identifiers in this expression are then looked up according to
> regular C language rules. So except for the forward declared
> member(s) they are *never* looked up in the member namespace of
> the struct, i.e. no new name lookup rules are introduced.

One question here:

What’s the major issue if we’d like to add one new scoping rule, for example,
“Structure scope” (the same as the C++’s instance scope) to C? 

(In addition to the "VLA in structure" issue I mentioned in my previous 
writeup, 
is there any other issue to prevent this new scoping rule being added into C ?).

Qing


> 
> 
> I think this could address my concerns about breaking
> scoping in C. Still, I personally would prefer designator syntax
> for both C and C++ as a nicer solution, and one that already
> has some support from WG14.
> 
> Martin
> 
> 
>>> 
>>> ---
>>> 
>>> 1. The use of '__self' isn't feasible, so we won't use it. Instead,
>>> we'll rely upon the current behavior—resolving any identifiers to the
>>> "instance scope". This new scope is used __only__ in attributes, and
>>> resolves identifiers to those in the least enclosing, non-anonymous
>>> struct. For example:
>>> 
>>> struct foo {
>>>  char count;
>>>  struct bar {
>>>struct {
>>>  int len;
>>>};
>>>struct {
>>>  struct {
>>>int *valid_use __counted_by(len); // Valid.
>>>  };
>>>};
>>>int *invalid_use __counted_by(count); // Invalid.
>>>  } b;
>>> };
>>> 
>>> Rationale: This is how '__guarded_by' currently resolves identifiers,
>>> so there's precedence. And if we can't force its usage in all
>>> situations, it's less a feature and more a "nicety" which will lead to
>>> a massive discrepancy between compiler implementations. Despite the
>>> fact that this introduces a new scoping mechanism to C, its use is not
>>> as extensive as C++'s instance scoping and will apply only to
>>> attributes. In the case where we have two different resolution
>>> techniquest happening within the same structure (e.g. VLAs), we can
>>> issue warnings as outlined in Yeoul's RFC[1].
>>> 
>>> 2. A method of forward declaring variables will be added for variables
>>> that occur in the struct after the attribute. For example:
>>> 
>>> A: Necessary usage:
>>> 
>>> struct foo {
>>>  int *buf __counted_by(char count; count);
>>>  char count;
>>> };
>>> 
>>> B: Unnecessary, but still valid, usage:
>>> 
>>> struct foo {
>>>  char count;
>>>  int *buf __counted_by(char count; count);
>>> };
>>> 
>>> * The forward declaration is required in (A) but not in (B).
>>> * The type of 'count' as declared in '__counted_by' *must* match the real 
>>> type.
>>> 
>>> Rationale: This alleviates the issues of "double parsing" for
>>> compilers that aren't able to handle it. (We can also remove the
>>> '-fexperimental-late-parse-attributes' flag in Clang.)
>>> 
>>> 3. A new builtin '__builtin_global_ref()' (or similarly named) is
>>> added to refer to variables outside of the most-enclosing structure.
>>> Example:
>>> 
>>> int count_that_will_never_change_we_promise;
>>> 
>>> struct foo {
>>>  int *bar 
>>> __counted_by(__builtin_global_ref(count_that_will_never_change_we_promise));
>>>  unsigned flags;
>>> };
>>> 
>>> As Yeoul pointed out, there isn't a way to refer to variables that
>>> have been shadowed, so t

Re: [PATCH] gimple-low: Diagnose assume attr expressions defining labels which are used as unary && operands outside of those [PR119537]

2025-04-01 Thread Richard Biener

On Tue, 1 Apr 2025, Jakub Jelinek wrote:

> Hi!
> 
> The following testcases ICE on invalid code which defines
> labels inside of statement expressions and then uses &&label
> from code outside of the statement expressions.
> The C++ FE diagnoses that with a warning (not specifically for
> assume attribute, genericallly about taking address of a label
> outside of a statement expression so computed goto could violate
> the requirement that statement expression is not entered from
> outside of it through a jump into it), the C FE doesn't diagnose
> anything.
> Normal direct gotos to such labels are diagnosed by both C and C++.
> In the assume attribute case it is actually worse than for
> addresses of labels in normal statement expressions, in that case
> the labels are still in the current function, so invalid program
> can still jump to those (and in case of OpenMP/OpenACC where it
> is also invalid and stuff is moved to a separate function, such
> movement is done post cfg discovery of FORCED_LABELs and worst
> case one can run into cases which fail to assemble, but I haven't
> succeeded in creating ICE for that).
> For assume at -O0 we'd just throw away the assume expression if
> it is not a simple condition and so the label is then not defined
> anywhere and we ICE during cfg pass.
> The gimplify.cc hunks fix that, as we don't have FORCED_LABELs
> discovery done yet, it preserves all assume expressions which contain
> used user labels.
> With that we ICE during IRA, which is upset about an indirect jump
> to a label which doesn't exist.
> So, the gimple-low.cc hunks add diagnostics of the problem, it gathers
> uids of all the user used labels inside of the assume expressions (usually
> none) and if it finds any, walks the IL to find uses of those from outside
> of those expressions now outlined into separate magic functions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2025-04-01  Jakub Jelinek  
> 
>   PR middle-end/119537
>   * gimplify.cc (find_used_user_labels): New function.
>   (gimplify_call_expr): Don't remove complex assume expression at -O0
>   if it defines any user labels.
>   * gimple-low.cc: Include diagnostic-core.h.
>   (assume_labels): New variable.
>   (diagnose_assume_labels): New function.
>   (lower_function_body): Call it via walk_gimple_seq if assume_labels
>   is non-NULL, then BITMAP_FREE assume_labels.
>   (find_assumption_locals_r): Record in assume_labels uids of user
>   labels defined in assume attribute expressions.
> 
>   * c-c++-common/pr119537-1.c: New test.
>   * c-c++-common/pr119537-2.c: New test.
> 
> --- gcc/gimplify.cc.jj2025-03-31 12:53:44.853727077 +0200
> +++ gcc/gimplify.cc   2025-03-31 17:05:40.854893880 +0200
> @@ -4508,6 +4508,21 @@ gimplify_variant_call_expr (tree expr, f
>  }
>  
>  
> +/* Helper function for gimplify_call_expr, called via walk_tree.
> +   Find used user labels.  */
> +
> +static tree
> +find_used_user_labels (tree *tp, int *, void *)
> +{
> +  if (TREE_CODE (*tp) == LABEL_EXPR
> +  && !DECL_ARTIFICIAL (LABEL_EXPR_LABEL (*tp))
> +  && DECL_NAME (LABEL_EXPR_LABEL (*tp))
> +  && TREE_USED (LABEL_EXPR_LABEL (*tp)))
> +return *tp;
> +  return NULL_TREE;
> +}
> +
> +
>  /* Gimplify the CALL_EXPR node *EXPR_P into the GIMPLE sequence PRE_P.
> WANT_VALUE is true if the result of the call is desired.  */
>  
> @@ -4568,8 +4583,14 @@ gimplify_call_expr (tree *expr_p, gimple
>  fndecl, 0));
> return GS_OK;
>   }
> -   /* If not optimizing, ignore the assumptions.  */
> -   if (!optimize || seen_error ())
> +   /* If not optimizing, ignore the assumptions unless there
> +  are used user labels in it.  */
> +   if ((!optimize
> +&& !walk_tree_without_duplicates (&CALL_EXPR_ARG (*expr_p,
> +  0),
> +  find_used_user_labels,
> +  NULL))
> +   || seen_error ())
>   {
> *expr_p = NULL_TREE;
> return GS_ALL_DONE;
> --- gcc/gimple-low.cc.jj  2025-03-21 20:25:38.396064280 +0100
> +++ gcc/gimple-low.cc 2025-03-31 20:20:38.961772695 +0200
> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-inline.h"
>  #include "gimple-walk.h"
>  #include "attribs.h"
> +#include "diagnostic-core.h"
>  
>  /* The differences between High GIMPLE and Low GIMPLE are the
> following:
> @@ -78,6 +79,10 @@ struct lower_data
>bool cannot_fallthru;
>  };
>  
> +/* Bitmap of LABEL_DECL uids for user labels moved into assume outlined
> +   functions.  */
> +static bitmap assume_labels;
> +
>  static void lower_stmt (gimple_stmt_iterator *, struct lower_data *);
>  static

Re: [PATCH] RISC-V: xtheadmemidx: Split slli.uw pattern

2025-04-01 Thread Jeff Law





On 3/30/25 8:54 PM, Bohan Lei wrote:

The RTL pattern has an "and" operation, which clears out the upper bits after
the shift operation.  Since we have (INTVAL (operands[3]) >> INTVAL
(operands[2])) == 0x as a constraint, the RTL template and the split
code should be semantically identical.  Also, the RTL template here is
technically the same as that of "*slliuw" in bitmanip.md, and the split code
shows the semantics of an slli.uw operation.
You're absolutely right.  Not sure how I missed that.  I'll try to take 
another look at this tomorrow AM.  It's just too late to try tonight.


Thanks,
jeff

Re: [PATCH][v2] target/119549 - fixup handling of -mno-sse4 in target attribute

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 10:13:22AM +0200, Richard Biener wrote:
> The following fixes ix86_valid_target_attribute_inner_p to properly
> handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4.
> I've added asserts to ix86_handle_option that RejectNegative is honored
> for both.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks,
> Richard.
> 
>   PR target/119549
>   * common/config/i386/i386-common.cc (ix86_handle_option):
>   Assert that both OPT_msse4 and OPT_mno_sse4 are never unset.
>   * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
>   Process negated OPT_msse4 as OPT_mno_sse4.
> 
>   * gcc.target/i386/pr119549.c: New testcase.

LGTM.

Jakub

[PATCH] c++/modules: Forbid exposures of TU-local entities in inline variables [PR119551]

2025-04-01 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

An inline variable has vague linkage, and needs to be conditionally
emitted in TUs that reference it.  Unfortunately this clashes with
[basic.link] p14.2, which says that we ignore the initialisers of all
variables (including inline ones), since importers will not have access
to the referenced TU-local entities to write the definition.

This patch makes such exposures be ill-formed.  One case that continues
to work is if the exposure is part of the dynamic initialiser of an
inline variable; in such cases, the definition has been built as part of
the module interface unit anyway, and importers don't need to write it
out again, so such exposures are "harmless".

PR c++/119551

gcc/cp/ChangeLog:

* module.cc (trees_out::write_var_def): Only ignore non-inline
variable initializers.

gcc/testsuite/ChangeLog:

* g++.dg/modules/internal-5_a.C: Add cases that should be
ignored.
* g++.dg/modules/internal-5_b.C: Test these new cases, and make
the testcase more robust.
* g++.dg/modules/internal-11.C: New test.
* g++.dg/modules/internal-12_a.C: New test.
* g++.dg/modules/internal-12_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc |  7 +++---
 gcc/testsuite/g++.dg/modules/internal-11.C   | 24 
 gcc/testsuite/g++.dg/modules/internal-12_a.C | 13 +++
 gcc/testsuite/g++.dg/modules/internal-12_b.C | 14 
 gcc/testsuite/g++.dg/modules/internal-5_a.C  |  8 ++-
 gcc/testsuite/g++.dg/modules/internal-5_b.C  |  6 +
 6 files changed, 68 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-11.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 894c70f7225..ce22b2ece3f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12679,9 +12679,10 @@ trees_in::read_function_def (tree decl, tree 
maybe_template)
 void
 trees_out::write_var_def (tree decl)
 {
-  /* The initializer of a variable or variable template is ignored for
- determining exposures.  */
-  auto ovr = make_temp_override (dep_hash->ignore_tu_local, VAR_P (decl));
+  /* The initializer of a non-inline variable or variable template is
+ ignored for determining exposures.  */
+  auto ovr = make_temp_override (dep_hash->ignore_tu_local,
+VAR_P (decl) && !DECL_INLINE_VAR_P (decl));
 
   tree init = DECL_INITIAL (decl);
   tree_node (init);
diff --git a/gcc/testsuite/g++.dg/modules/internal-11.C 
b/gcc/testsuite/g++.dg/modules/internal-11.C
new file mode 100644
index 000..53eb30a62be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-11.C
@@ -0,0 +1,24 @@
+// PR c++/119551
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi !M }
+
+export module M;
+
+static int tu_local = 5;
+static int* foo() { return &tu_local; }
+
+// For implementation reasons, we adjust [basic.link] p14.2 to restrict ignored
+// exposures to non-inline variables, since for inline variables without
+// dynamic initialisation we need to emit their initialiser for importer use.
+
+int* a = &tu_local;  // OK
+inline int* b = &tu_local;  // { dg-error "exposes TU-local entity" }
+
+// But dynamic initialisers are fine, importers will just treat them as 
external.
+inline int* c = foo();  // OK
+
+// For consistency, we follow the same rules with templates, noting that
+// we still need to emit definitions with dynamic initializers so we error.
+template  int* d = &tu_local;  // OK
+template  inline int* e = &tu_local;  // { dg-error "exposes 
TU-local entity" }
+template  inline int* f = foo();  // { dg-error "exposes TU-local 
entity" }
diff --git a/gcc/testsuite/g++.dg/modules/internal-12_a.C 
b/gcc/testsuite/g++.dg/modules/internal-12_a.C
new file mode 100644
index 000..5c4e7c602b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-12_a.C
@@ -0,0 +1,13 @@
+// PR c++/119551
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi M }
+// Test that emitting variables referencing TU-local entities
+// builds and runs correctly.
+
+export module M;
+
+static int tu_local_var = 5;
+static int* tu_local_func() { return &tu_local_var; }
+
+export int* a = &tu_local_var;
+export inline int* b = tu_local_func();
diff --git a/gcc/testsuite/g++.dg/modules/internal-12_b.C 
b/gcc/testsuite/g++.dg/modules/internal-12_b.C
new file mode 100644
index 000..bc3edf99a11
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-12_b.C
@@ -0,0 +1,14 @@
+// PR c++/119551
+// { dg-module-do run }
+// { dg-additional-options "-fmodules" }
+
+import M;
+
+int main() {
+  if (*a != 5)
+__builtin_abort();
+  if (*b != 5)
+__builtin_abort();
+  if (a != b)
+__builtin_abort();
+}
diff --git a/gcc/testsuite/g++.dg/m

Re: [PATCH] tailc, v2: Improve tail recursion handling [PR119493]

2025-04-01 Thread Richard Biener

On Tue, 1 Apr 2025, Jakub Jelinek wrote:

> On Tue, Apr 01, 2025 at 10:59:36AM +0200, Jakub Jelinek wrote:
> > On Tue, Apr 01, 2025 at 10:46:15AM +0200, Richard Biener wrote:
> > > This looks OK, but I wonder if ...
> > > > - /* The parameter should be a real operand, so that phi 
> > > > node
> > > > -created for it at the start of the function has the 
> > > > meaning
> > > > -of copying the value.  This test implies 
> > > > is_gimple_reg_type
> > > > -from the previous condition, however this one could be
> > > > -relaxed by being more careful with copying the new 
> > > > value
> > > > -of the parameter (emitting appropriate GIMPLE_ASSIGN 
> > > > and
> > > > -updating the virtual operands).  */
> > > > - if (!is_gimple_reg (param))
> > > > + if (is_gimple_reg_type (TREE_TYPE (param))
> > > > + ? !is_gimple_reg (param)
> > > 
> > > ... we want to restrict this to musttail calls at this point and
> > > relax for stage1 only?
> > 
> > I can do that, sure.
> 
> Here it is, ok if it passes bootstrap/regtest?  I'll queue the interdiff
> between this patch and the previous one for GCC 16.

OK.

Thanks,
Richard.

> 2025-04-01  Jakub Jelinek  
> 
>   PR tree-optimization/119493
>   * tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion
>   if some arguments don't have is_gimple_reg_type, only punt if they
>   have non-POD types, or volatile, or addressable or (for now) it is
>   not a musttail call.  Set tailr_arg_needs_copy in those cases too.
>   (eliminate_tail_call): Copy call arguments to params if they don't
>   have is_gimple_reg_type, use temporaries if the argument is used
>   later.
>   (tree_optimize_tail_calls_1): Skip !is_gimple_reg_type
>   tailr_arg_needs_copy parameters.  Formatting fix.
> 
>   * gcc.dg/pr119493-1.c: New test.
> 
> --- gcc/tree-tailcall.cc.jj   2025-03-28 10:49:30.0 +0100
> +++ gcc/tree-tailcall.cc  2025-04-01 11:47:31.417637335 +0200
> @@ -676,19 +676,17 @@ find_tail_calls (basic_block bb, struct
>have a copyable type and the two arguments must have reasonably
>equivalent types.  The latter requirement could be relaxed if
>we emitted a suitable type conversion statement.  */
> -   if (!is_gimple_reg_type (TREE_TYPE (param))
> +   if (TREE_ADDRESSABLE (TREE_TYPE (param))
> || !useless_type_conversion_p (TREE_TYPE (param),
>TREE_TYPE (arg)))
>   break;
>  
> -   /* The parameter should be a real operand, so that phi node
> -  created for it at the start of the function has the meaning
> -  of copying the value.  This test implies is_gimple_reg_type
> -  from the previous condition, however this one could be
> -  relaxed by being more careful with copying the new value
> -  of the parameter (emitting appropriate GIMPLE_ASSIGN and
> -  updating the virtual operands).  */
> -   if (!is_gimple_reg (param))
> +   if (is_gimple_reg_type (TREE_TYPE (param))
> +   ? !is_gimple_reg (param)
> +   : (!is_gimple_variable (param)
> +  || TREE_THIS_VOLATILE (param)
> +  || may_be_aliased (param)
> +  || !gimple_call_must_tail_p (call)))
>   break;
>   }
>   }
> @@ -938,9 +936,9 @@ find_tail_calls (basic_block bb, struct
>  param = DECL_CHAIN (param), idx++)
>   {
> tree ddef, arg = gimple_call_arg (call, idx);
> -   if (is_gimple_reg (param)
> -   && (ddef = ssa_default_def (cfun, param))
> -   && (arg != ddef))
> +   if (!is_gimple_reg (param)
> +   || ((ddef = ssa_default_def (cfun, param))
> +   && arg != ddef))
>   bitmap_set_bit (tailr_arg_needs_copy, idx);
>   }
>  }
> @@ -1216,6 +1214,7 @@ eliminate_tail_call (struct tailcall *t,
>  
>/* Add phi node entries for arguments.  The ordering of the phi nodes 
> should
>   be the same as the ordering of the arguments.  */
> +  auto_vec copies;
>for (param = DECL_ARGUMENTS (current_function_decl),
>idx = 0, gpi = gsi_start_phis (first);
> param;
> @@ -1224,6 +1223,35 @@ eliminate_tail_call (struct tailcall *t,
>if (!bitmap_bit_p (tailr_arg_needs_copy, idx))
>   continue;
>  
> +  if (!is_gimple_reg_type (TREE_TYPE (param)))
> + {
> +   if (param == gimple_call_arg (stmt, idx))
> + continue;
> +   /* First check if param isn't used by any of the following
> +  call arguments.  If it is, we need to copy first to
> +  a temporary and only after doing all the assignments copy it
> +  to param.  */
> +   size_t idx2 = idx + 1;
> +   tree

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Michael Matz

Hello,

On Mon, 31 Mar 2025, Bill Wendling wrote:

> 1. The use of '__self' isn't feasible, so we won't use it.

That's a bold statement.  How's that?  The only thing I read here is that 
the very spelling of "self" was objected to.  So, call it _Self, 
_Selfref, or something.  Even _Whatever42 would be better.

Lookup rule changes only for identifiers used in attributes (all of them? 
some of them?) is ... well, not good language design?  I think you used 
the word "overreaching" upthread, correctly so.  And that for wanting to 
avoid the established member-lookup syntax A.B by way of "we won't use 
it"?  That seems a terrible tradeoff.  I don't understand the push for 
that approach, at all.


Ciao,
Michael.

Re: [PATCH] combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]

2025-04-01 Thread Richard Biener

On Tue, 1 Apr 2025, Jakub Jelinek wrote:

> On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> > > Looking at this some more today, I think we should special case
> > > set_noop_p because that can be put into i2 (except for the JUMP_P
> > > violations), currently both modified_between_p (pc_rtx, i2, i3)
> > > and reg_used_between_p (pc_rtx, i2, i3) returns false.
> > > I'll post a patch incrementally for that (but that feels like
> > > new optimization, so probably not something that should be backported).
> > 
> > Trying to review this I noticed that both the comment in combine suggests
> > that memory set is important and modified_between_p handles MEM_P
> > 'x' and for non-REG 'x' possibly MEM operands (CONCAT?) while
> > reg_used_between_p only handles REG_P 'reg'.  Also shouldn't
> > you use reg_referenced_p?  At least that function seems to handle
> > SET_SRC/SET_DEST separately?
> > 
> > Can we constrain SET_DEST (set1/set0) to a REG_P in combine?  Why
> > does the comment talk about memory?
> 
> I was worried about making too risky changes this late in stage4
> (and especially also for backports).  Most of this code is 1992-ish.
> I think many of the functions are just misnamed, the reg_ in there doesn't
> match what those functions do (bet they initially supported just REGs
> and later on support for other kinds of expressions was added, but haven't
> done git archeology to prove that).
> 
> What we know for sure is:
>&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
>&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
>&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
>&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
> that is checked earlier in the condition.
> Then it calls
>&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
>   XVECEXP (newpat, 0, 0))
>&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
>   XVECEXP (newpat, 0, 1))
> While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
> which is also misnamed, that function handles just fine all of
> REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
> STRICT_LOW_PART, PC and even some further cases.
> So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG
> of REG, PC (at least the REG and PC cases are triggered on the testcase)
> and quite possibly also MEM (SUBREG of MEM not, see below).
> 
> Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
> function for constants just returns false, for PC returns true, for REG
> returns reg_set_between_p, for MEM recurses on the address, for
> MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
> whether the memory could have been modified in between, for all other
> rtxes recurses on the subrtxes.  This part didn't change in my patch.
> 
> I've only changed those
> - && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
> + && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
> where the former has been described above and clearly handles all of
> REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.
> 
> The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
> instruction in between i2 and i3.  So, there is clearly a difference
> in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
> returns unconditionally true even if there are no instructions in between,
> but reg_used_between_p if there are no non-debug insns in between returns
> false.  Sorry for missing that, guess I should check for that (with the
> exception of the noop moves which are often (set (pc) (pc)) and handled
> by the incremental patch).  In fact not just that, reg_used_between_p
> will only return true for PC if it is mentioned anywhere in the insns
> in between.
> Anyway, except for that, for REG it calls refers_to_regno_p
> and so should find any occurrences of any of the REG or parts of it for hard
> registers, for MEM returns true if it sees any MEMs in insns in between
> (conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
> (so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
> SUBREG_REG, PC I've already described.
> 
> Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
> already the initial
>&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
>   XVECEXP (newpat, 0, 0))
>&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
>   XVECEXP (newpat, 0, 1))
> calls would have failed --enable-checking=rtl or would have misbehaved, so
> I think there is no need to check for it further.
> 
> To your question why I don't use reg_referenced_p, that is because
> reg_refere

Re: [PATCH] Libstdc++: Fix bootstrap failure for cross without tm.tm_zone [PR119550]

2025-04-01 Thread Jonathan Wakely

On Tue, 1 Apr 2025 at 11:34, Tomasz Kaminski  wrote:
>
>
>
> On Mon, Mar 31, 2025 at 7:28 PM Jonathan Wakely  wrote:
>>
>> In r15-8491-g778c28c70f8573 I added a use of the Autoconf macro
>> AC_STRUCT_TIMEZONE, but that requires a link-test for the global tzname
>> object if tm.tm_zone isn't supported. That link-test isn't allowed for
>> cross-compilation, so bootstrap fails if tm.tm_zone isn't supported.
>>
>> Since libstdc++ only cares about tm.tm_zone and won't use tzname anyway,
>> we don't need the link-test. Replace AC_STRUCT_TIMEZONE with a custom
>> macro that only checks for tm.tm_zone. We can improve on the Autoconf
>> macro by checking it's a suitable type, which isn't actually checked by
>> AC_STRUCT_TIMEZONE.
>>
>> libstdc++-v3/ChangeLog:
>>
>> PR libstdc++/119550
>> * acinclude.m4 (GLIBCXX_STRUCT_TM_TM_ZONE): New macro.
>> * config.h.in: Regenerate.
>> * configure: Regenerate.
>> * configure.ac: Use GLIBCXX_STRUCT_TM_TM_ZONE.
>> * include/bits/chrono_io.h (__formatter_chrono::_M_c): Check
>> _GLIBCXX_USE_STRUCT_TM_TM_ZONE instead of
>> _GLIBCXX_HAVE_STRUCT_TM_TM_ZONE.
>> ---
>>
>> Testing x86_64-linux and sparc-solaris2.11, looks fine so far.
>>
>>  libstdc++-v3/acinclude.m4 |  35 
>>  libstdc++-v3/config.h.in  |  21 +--
>>  libstdc++-v3/configure| 238 --
>>  libstdc++-v3/configure.ac |   5 +-
>>  libstdc++-v3/include/bits/chrono_io.h |   2 +-
>>  5 files changed, 109 insertions(+), 192 deletions(-)
>>
>> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
>> index e668d2dba27..02fd349e11d 100644
>> --- a/libstdc++-v3/acinclude.m4
>> +++ b/libstdc++-v3/acinclude.m4
>> @@ -5744,6 +5744,41 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [
>>fi
>>  ])
>>
>> +dnl
>> +dnl Check for a tm_zone member in struct tm.
>> +dnl
>> +dnl This member is defined as const char* in Glibc, newlib, POSIX.1-2024,
>> +dnl and as char* in BSD (including macOS).
>> +dnl
>> +dnl Defines:
>> +dnl  _GLIBCXX_USE_STRUCT_TM_TM_ZONE if struct tm has a tm_zone member.
>> +dnl
>> +AC_DEFUN([GLIBCXX_STRUCT_TM_TM_ZONE], [
>> +
>> +  AC_LANG_SAVE
>
> From documentation this is deprecated in favor of AC_LANG_PUSH.

It looks like that is supported by autoconf 2.69 and is already used
elsewhere in GCC. We don't currently use it in libstdc++ but it should
be safe to do so.

>> +  AC_LANG_CPLUSPLUS
>> +  ac_save_CXXFLAGS="$CXXFLAGS"
>> +  CXXFLAGS="$CXXFLAGS -std=c++20"
>
> The program that is compiled does not seem to require C++20.
> If we change the declaration of "t" to use "= {}", we could do it in C++98.
> Any reason to adjust the flags at all?

We only need to use the tm_zone member in C++20 mode, and it's
possible that on some platform it's not exposed for older standards
(very unlikely, but possible). I wanted to test for exactly what we
require. I don't feel strongly about it though.


>>
>> +
>> +  AC_CACHE_CHECK([for tm_zone member of struct tm], glibcxx_cv_tm_zone, [
>> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([#include 
>> +   ],
>> +   [struct tm t{}; t.tm_zone = (char*)0;]
>> +   )],
>> +   [glibcxx_cv_tm_zone=yes],
>> +   [glibcxx_cv_tm_zone=no]
>> +  )
>> +])
>> +
>> +  if test $glibcxx_cv_tm_zone = yes; then
>> +AC_DEFINE(_GLIBCXX_USE_STRUCT_TM_TM_ZONE, 1,
>> + [Define if struct tm has a tm_zone member.])
>> +  fi
>> +
>> +  CXXFLAGS="$ac_save_CXXFLAGS"
>> +  AC_LANG_RESTORE
>> +])
>> +
>>  dnl
>>  dnl Check whether lock tables can be aligned to avoid false sharing.
>>  dnl
>> diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
>> index be151f43dd6..77bbaf1beaa 100644
>> --- a/libstdc++-v3/config.h.in
>> +++ b/libstdc++-v3/config.h.in
>> @@ -74,10 +74,6 @@
>> don't. */
>>  #undef HAVE_DECL_STRNLEN
>>
>> -/* Define to 1 if you have the declaration of `tzname', and to 0 if you 
>> don't.
>> -   */
>> -#undef HAVE_DECL_TZNAME
>> -
>>  /* Define to 1 if you have the  header file. */
>>  #undef HAVE_DIRENT_H
>>
>> @@ -412,9 +408,6 @@
>>  /* Define to 1 if `d_type' is a member of `struct dirent'. */
>>  #undef HAVE_STRUCT_DIRENT_D_TYPE
>>
>> -/* Define to 1 if `tm_zone' is a member of `struct tm'. */
>> -#undef HAVE_STRUCT_TM_TM_ZONE
>> -
>>  /* Define if strxfrm_l is available in . */
>>  #undef HAVE_STRXFRM_L
>>
>> @@ -506,17 +499,9 @@
>>  /* Define to 1 if the target supports thread-local storage. */
>>  #undef HAVE_TLS
>>
>> -/* Define to 1 if your `struct tm' has `tm_zone'. Deprecated, use
>> -   `HAVE_STRUCT_TM_TM_ZONE' instead. */
>> -#undef HAVE_TM_ZONE
>> -
>>  /* Define if truncate is available in . */
>>  #undef HAVE_TRUNCATE
>>
>> -/* Define to 1 if you don't have `tm_zone' but do have the external array
>> -   `tzname'. */
>> -#undef HAVE_TZNAME
>> -
>>  /* Define to 1 if you have the  header file. */
>>  #undef HAVE_UCHAR_H
>>
>> @@ -605,9 +590,6 @@
>>  /* Define to 1 if you have the ANSI C header fil

Re: [PATCH v2 0/4] Libsanitizer improvements

2025-04-01 Thread Richard Biener

On Mon, Mar 17, 2025 at 2:24 PM Aleksandar Rakic
 wrote:
>
> This patch series improves the libsanitizer for the mips target
> in GCC.

You should send these improvements to upstream libsanitizer instead.

> These patches are cherry-picked from the mips_rel/11_2_0/master
> and mips_rel/9_3_0/master branches from the MIPS' repository:
> https://github.com/MIPS/gcc .
> Further details on the individual changes are included in the
> respective patches.

[PATCH] tailr: Punt on tail recursions that would break musttail [PR119493]

2025-04-01 Thread Jakub Jelinek

Hi!

While working on the previous tailc patch, I've noticed the following
problem.
The testcase below fails, because we decide to tail recursion optimize
the call, but tail recursion (as documented in tree-tailcall.cc) needs to
add some result multiplication and/or addition if any tail recursion uses
accumulator, which is added right before the return.
So, if there are musttail non-recurive calls in the function, successful
tail recursion optimization will mean we'll later error on the musttail
calls.  musttail recursive calls are ok, those would be tail recursion
optimized.

So, the following patch punts on all tail recursion optimizations if it
needs accumulators (add and/or mult) if there is at least one non-recursive
musttail call.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-01  Jakub Jelinek  

PR tree-optimization/119493
* tree-tailcall.cc (tree_optimize_tail_calls_1): Ignore tail recursion
candidates which need accumulators if there is at least one musttail
non-recursive call.

* gcc.dg/pr119493-2.c: New test.

--- gcc/tree-tailcall.cc.jj 2025-03-31 13:00:58.288725496 +0200
+++ gcc/tree-tailcall.cc2025-03-31 14:34:31.285106333 +0200
@@ -1373,6 +1373,38 @@ tree_optimize_tail_calls_1 (bool opt_tai
   live_vars = NULL;
 }
 
+  if (cfun->has_musttail)
+{
+  /* We can't mix non-recursive must tail calls with tail recursive
+calls which require accumulators, because in that case we have to
+emit code in between the musttail calls and return, which prevent
+calling them as tail calls.  So, in that case give up on the
+tail recursion.  */
+  for (act = tailcalls; act; act = act->next)
+   if (!act->tail_recursion)
+ {
+   gcall *call = as_a  (gsi_stmt (act->call_gsi));
+   if (gimple_call_must_tail_p (call))
+ break;
+ }
+  if (act)
+   for (struct tailcall **p = &tailcalls; *p; )
+ {
+   if ((*p)->tail_recursion && ((*p)->add || (*p)->mult))
+ {
+   struct tailcall *a = *p;
+   *p = (*p)->next;
+   gcall *call = as_a  (gsi_stmt (a->call_gsi));
+   maybe_error_musttail (call,
+ _("tail recursion with accumulation "
+   "mixed with musttail "
+   "non-recursive call"), diag_musttail);
+   free (a);
+ }
+   else
+ p = &(*p)->next;
+ }
+}
   /* Construct the phi nodes and accumulators if necessary.  */
   a_acc = m_acc = NULL_TREE;
   for (act = tailcalls; act; act = act->next)
--- gcc/testsuite/gcc.dg/pr119493-2.c.jj2025-03-31 14:41:18.977464154 
+0200
+++ gcc/testsuite/gcc.dg/pr119493-2.c   2025-03-31 14:45:24.327068682 +0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/119493 */
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-O2 -fdump-tree-tailr1-details" } */
+/* { dg-final { scan-tree-dump-times "tail recursion with accumulation mixed 
with musttail non-recursive call" 2 "tailr1" } } */
+
+[[gnu::noipa]] int
+bar (int x, int y)
+{
+  return x + y;
+}
+
+[[gnu::noinline, gnu::noclone]] int
+foo (int x, int y)
+{
+  if (x < 10)
+[[gnu::musttail]] return bar (x, y);
+  if (y & 2)
+return foo (x - 1, y) * 2;
+  if (y & 1)
+[[gnu::musttail]] return foo (x - 1, y);
+  return foo (x - 1, y) * 3;
+}

Jakub

Re: [PATCH] phiprop: Avoid proping loads into loops [PR116835]

2025-04-01 Thread Richard Biener

On Tue, Apr 1, 2025 at 6:10 AM Andrew Pinski  wrote:
>
> phiprop can sometimes prop loads back into loops
> and in some cases cause wrong code when the load
> was from a weak symbol as now it becomes an unconditional
> load before the loop.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR tree-optimization/116835
>
> gcc/ChangeLog:
>
> * tree-ssa-phiprop.cc (propagate_with_phi): Check
> the use is at the same or deeper loop depth than
> the phi node.
> (pass_phiprop::execute): Initialize loops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr116835.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.dg/torture/pr116835.c | 33 +
>  gcc/tree-ssa-phiprop.cc | 14 ++-
>  2 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116835.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116835.c 
> b/gcc/testsuite/gcc.dg/torture/pr116835.c
> new file mode 100644
> index 000..31d3b59d945
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116835.c
> @@ -0,0 +1,33 @@
> +/* { dg-do run { target { weak_undefined } } } */
> +/* { dg-add-options weak_undefined } */
> +
> +/* PR tree-optimization/116835 */
> +/* phiprop would prop into the loop the load of b
> +   and also prop the load of a before the loop.
> +   Which is incorrect as a is a weak symbol.  */
> +
> +struct s1
> +{
> +  int t;
> +  int t1;
> +};
> +typedef struct s1 type;
> +extern type a __attribute__((weak));
> +int t;
> +type b;
> +type bar (int c) __attribute__((noipa, noinline));
> +type
> +bar (int c)
> +{
> +  t = 1;
> +  type *p = &a;
> +  for (int j = 0; j < c; ++j)
> +p = &b;
> +  return *p;
> +}
> +
> +int main(void)
> +{
> + if (bar(&a == nullptr).t)
> +   __builtin_abort();
> +}
> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> index a2e1fb16a30..c5fe231d3e9 100644
> --- a/gcc/tree-ssa-phiprop.cc
> +++ b/gcc/tree-ssa-phiprop.cc
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-loop.h"
>  #include "tree-cfg.h"
>  #include "tree-ssa-dce.h"
> +#include "cfgloop.h"
>
>  /* This pass propagates indirect loads through the PHI node for its
> address to make the load source possibly non-addressable and to
> @@ -348,11 +349,17 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
> phiprop_d *phivn,
> calculate_dominance_info (CDI_POST_DOMINATORS);
>
>/* Only replace loads in blocks that post-dominate the PHI node.  That
> - makes sure we don't end up speculating loads.  */
> +makes sure we don't end up speculating loads.  */
>if (!dominated_by_p (CDI_POST_DOMINATORS,
>bb, gimple_bb (use_stmt)))
> continue;

I think the testcase shows the above test is not sufficient to avoid doing
invalid speculation.  We need to verify the assumption that the
PHI is dereferenced each time it is "executed".
With cycles SSA form and post-dominance are not helpful here.

I think we have to amend the post-dominance check to deal with
SSA cycles, but your check does not look fully correct (and its
comment is misleading).
We need to constrain the load to be in the same or a subloop (use
flow_loop_nested_p,
not loop depth) or in the same BB when either the load or the PHI is
in an irreducible region.

>
> +  /* Only replace loads in blocks are in the same loop
> +are inside an deeper loop.  This is to make sure not
> +to prop back into the loop.  */
> +  if (bb_loop_depth (gimple_bb (use_stmt)) < bb_loop_depth (bb))
> +   continue;
> +

So something like

   /* Amend the post-dominance check for SSA cycles, we need to
  make sure each PHI result value is dereferenced.  */
   if (!(gimple_bb (use_stmt) == bb
 || (!(bb->flags & BB_IRREDUCIBLE_LOOP)
&& !(gimple_bb (use_stmt)->flags & BB_IRREDUCIBLE_LOOP)
&& (bb->loop_father == gimple_bb (use_stmt)->loop_father
  || flow_loop_nested_p (bb->loop_father, gimple_bb
(use_stmt)->loop_father)
continue;

>/* Check whether this is a load of *ptr.  */
>if (!(is_gimple_assign (use_stmt)
> && gimple_assign_rhs_code (use_stmt) == MEM_REF
> @@ -520,6 +527,10 @@ pass_phiprop::execute (function *fun)
>size_t n;
>auto_bitmap dce_ssa_names;
>
> +  /* Find the loops, so that we can prevent moving loads in
> + them.  */
> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +
>calculate_dominance_info (CDI_DOMINATORS);
>
>n = num_ssa_names;
> @@ -548,6 +559,7 @@ pass_phiprop::execute (function *fun)
>free (phivn);
>
>free_dominance_info (CDI_POST_DOMINATORS);
> +  loop_optimizer_finalize ();
>
>return did_something ? TODO_update_ssa_only_virtuals : 0;
>  }
> --
> 2.43.0
>

[PATCH][v2] target/119549 - fixup handling of -mno-sse4 in target attribute

2025-04-01 Thread Richard Biener

The following fixes ix86_valid_target_attribute_inner_p to properly
handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4.
I've added asserts to ix86_handle_option that RejectNegative is honored
for both.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR target/119549
* common/config/i386/i386-common.cc (ix86_handle_option):
Assert that both OPT_msse4 and OPT_mno_sse4 are never unset.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Process negated OPT_msse4 as OPT_mno_sse4.

* gcc.target/i386/pr119549.c: New testcase.
---
 gcc/common/config/i386/i386-common.cc|  2 ++
 gcc/config/i386/i386-options.cc  |  7 +++
 gcc/testsuite/gcc.target/i386/pr119549.c | 15 +++
 3 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr119549.c

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 80aec3244bc..4815fbc4d35 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1519,11 +1519,13 @@ ix86_handle_option (struct gcc_options *opts,
   return true;
 
 case OPT_msse4:
+  gcc_assert (value != 0);
   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_SSE4_SET;
   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_SET;
   return true;
 
 case OPT_mno_sse4:
+  gcc_assert (value != 0);
   opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_SSE4_UNSET;
   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_SSE4_UNSET;
   opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_SSE4_UNSET;
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 91f59e16fd5..a9fac011f3d 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1271,6 +1271,13 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
args, char *p_strings[],
}
}
 
+  /* Fixup -msse4 which is RejectNegative to -mno-sse4 when negated.  */
+  if (opt == OPT_msse4 && !opt_set_p)
+   {
+ opt = OPT_mno_sse4;
+ opt_set_p = true;
+   }
+
   /* Process the option.  */
   if (opt == N_OPTS)
{
diff --git a/gcc/testsuite/gcc.target/i386/pr119549.c 
b/gcc/testsuite/gcc.target/i386/pr119549.c
new file mode 100644
index 000..a465bec3cf0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr119549.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-msse4" } */
+
+typedef long long v2di __attribute__((vector_size(16)));
+
+static inline __attribute__((always_inline))
+int rte_trace_feature_is_enabled() { return 1; } /* { dg-error "inlining 
failed" } */
+
+void __attribute__((target ("no-sse3"))) __attribute__((target ("no-sse4")))
+rte_eal_trace_generic_void_init(void)
+{
+  if (!rte_trace_feature_is_enabled()) return;
+  __asm__ volatile ("" : : : "memory");
+}
+
-- 
2.43.0

Re: [PATCH] tailc: Improve tail recursion handling [PR119493]

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 10:46:15AM +0200, Richard Biener wrote:
> This looks OK, but I wonder if ...
> > - /* The parameter should be a real operand, so that phi node
> > -created for it at the start of the function has the meaning
> > -of copying the value.  This test implies is_gimple_reg_type
> > -from the previous condition, however this one could be
> > -relaxed by being more careful with copying the new value
> > -of the parameter (emitting appropriate GIMPLE_ASSIGN and
> > -updating the virtual operands).  */
> > - if (!is_gimple_reg (param))
> > + if (is_gimple_reg_type (TREE_TYPE (param))
> > + ? !is_gimple_reg (param)
> 
> ... we want to restrict this to musttail calls at this point and
> relax for stage1 only?

I can do that, sure.

Jakub

Re: [PATCH] tailr: Punt on tail recursions that would break musttail [PR119493]

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 10:50:01AM +0200, Richard Biener wrote:
> > While working on the previous tailc patch, I've noticed the following
> > problem.
> > The testcase below fails, because we decide to tail recursion optimize
> > the call, but tail recursion (as documented in tree-tailcall.cc) needs to
> > add some result multiplication and/or addition if any tail recursion uses
> > accumulator, which is added right before the return.
> > So, if there are musttail non-recurive calls in the function, successful
> > tail recursion optimization will mean we'll later error on the musttail
> > calls.  musttail recursive calls are ok, those would be tail recursion
> > optimized.
> > 
> > So, the following patch punts on all tail recursion optimizations if it
> > needs accumulators (add and/or mult) if there is at least one non-recursive
> > musttail call.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> OK.
> 
> Do we update cfun->has_musttail at some point for optimized away
> musttail calls with respect to inlining/DCE?  I do wonder whether

I think we don't update it at all.
During inlining that is not a problem, because we clear musttail in functions
being inlined unless the call being inlined was musttail too and in that case
the caller should have cfun->has_musttail already.
And I think it isn't a big deal when it might be pessimistically set
even when the musttail calls are DCEd, in tree-tailcall.cc we can perform
some extra work if cfun->has_musttail is set, and in case of -O0/-Og even
that means running an entirely new pass, but it shouldn't be that expensive
and if we don't actually find any musttail calls, it behaves pretty much
the same as if cfun->has_musttail wasn't set (there are some cases where
with !cfun->has_musttail we punt earlier than with it, but if there are
no musttail calls, it shouldn't make a difference in the end except for
perhaps different details in -fdump-tree-tail{r,c}-details dump.

Jakub

[PATCH] tailc, v2: Improve tail recursion handling [PR119493]

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 10:59:36AM +0200, Jakub Jelinek wrote:
> On Tue, Apr 01, 2025 at 10:46:15AM +0200, Richard Biener wrote:
> > This looks OK, but I wonder if ...
> > > -   /* The parameter should be a real operand, so that phi node
> > > -  created for it at the start of the function has the meaning
> > > -  of copying the value.  This test implies is_gimple_reg_type
> > > -  from the previous condition, however this one could be
> > > -  relaxed by being more careful with copying the new value
> > > -  of the parameter (emitting appropriate GIMPLE_ASSIGN and
> > > -  updating the virtual operands).  */
> > > -   if (!is_gimple_reg (param))
> > > +   if (is_gimple_reg_type (TREE_TYPE (param))
> > > +   ? !is_gimple_reg (param)
> > 
> > ... we want to restrict this to musttail calls at this point and
> > relax for stage1 only?
> 
> I can do that, sure.

Here it is, ok if it passes bootstrap/regtest?  I'll queue the interdiff
between this patch and the previous one for GCC 16.

2025-04-01  Jakub Jelinek  

PR tree-optimization/119493
* tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion
if some arguments don't have is_gimple_reg_type, only punt if they
have non-POD types, or volatile, or addressable or (for now) it is
not a musttail call.  Set tailr_arg_needs_copy in those cases too.
(eliminate_tail_call): Copy call arguments to params if they don't
have is_gimple_reg_type, use temporaries if the argument is used
later.
(tree_optimize_tail_calls_1): Skip !is_gimple_reg_type
tailr_arg_needs_copy parameters.  Formatting fix.

* gcc.dg/pr119493-1.c: New test.

--- gcc/tree-tailcall.cc.jj 2025-03-28 10:49:30.0 +0100
+++ gcc/tree-tailcall.cc2025-04-01 11:47:31.417637335 +0200
@@ -676,19 +676,17 @@ find_tail_calls (basic_block bb, struct
 have a copyable type and the two arguments must have reasonably
 equivalent types.  The latter requirement could be relaxed if
 we emitted a suitable type conversion statement.  */
- if (!is_gimple_reg_type (TREE_TYPE (param))
+ if (TREE_ADDRESSABLE (TREE_TYPE (param))
  || !useless_type_conversion_p (TREE_TYPE (param),
 TREE_TYPE (arg)))
break;
 
- /* The parameter should be a real operand, so that phi node
-created for it at the start of the function has the meaning
-of copying the value.  This test implies is_gimple_reg_type
-from the previous condition, however this one could be
-relaxed by being more careful with copying the new value
-of the parameter (emitting appropriate GIMPLE_ASSIGN and
-updating the virtual operands).  */
- if (!is_gimple_reg (param))
+ if (is_gimple_reg_type (TREE_TYPE (param))
+ ? !is_gimple_reg (param)
+ : (!is_gimple_variable (param)
+|| TREE_THIS_VOLATILE (param)
+|| may_be_aliased (param)
+|| !gimple_call_must_tail_p (call)))
break;
}
}
@@ -938,9 +936,9 @@ find_tail_calls (basic_block bb, struct
   param = DECL_CHAIN (param), idx++)
{
  tree ddef, arg = gimple_call_arg (call, idx);
- if (is_gimple_reg (param)
- && (ddef = ssa_default_def (cfun, param))
- && (arg != ddef))
+ if (!is_gimple_reg (param)
+ || ((ddef = ssa_default_def (cfun, param))
+ && arg != ddef))
bitmap_set_bit (tailr_arg_needs_copy, idx);
}
 }
@@ -1216,6 +1214,7 @@ eliminate_tail_call (struct tailcall *t,
 
   /* Add phi node entries for arguments.  The ordering of the phi nodes should
  be the same as the ordering of the arguments.  */
+  auto_vec copies;
   for (param = DECL_ARGUMENTS (current_function_decl),
 idx = 0, gpi = gsi_start_phis (first);
param;
@@ -1224,6 +1223,35 @@ eliminate_tail_call (struct tailcall *t,
   if (!bitmap_bit_p (tailr_arg_needs_copy, idx))
continue;
 
+  if (!is_gimple_reg_type (TREE_TYPE (param)))
+   {
+ if (param == gimple_call_arg (stmt, idx))
+   continue;
+ /* First check if param isn't used by any of the following
+call arguments.  If it is, we need to copy first to
+a temporary and only after doing all the assignments copy it
+to param.  */
+ size_t idx2 = idx + 1;
+ tree param2 = DECL_CHAIN (param);
+ for (; param2; param2 = DECL_CHAIN (param2), idx2++)
+   if (!is_gimple_reg_type (TREE_TYPE (param)))
+ {
+   tree base = get_base_address (gimple_call_arg (stmt, idx2));
+

Re: [pushed] Only write gcov when file output is on [PR119553]

2025-04-01 Thread Richard Biener

On Mon, Mar 31, 2025 at 7:54 PM Jørgen Kvalsvik  wrote:
>
> gcov_write_* functions must be guarded so they only are called when
> output_to_file is true, like for -fcondition-coverage, otherwise it
> triggers an invalid read as detected by valgrind. The gcno file is
> mostly written to from profile.cc, so it doesn't make too much sense
> to hide it in path-coverage.cc. The find_paths name was a bit
> imprecise, and is renamed to instrument_prime_paths.

OK.

Richard.

> PR gcov-profile/119553
>
> gcc/ChangeLog:
>
> * path-coverage.cc (find_paths): Return path count, don't
> write to gcno, and rename to ...
> (instrument_prime_paths): ... this.
> * profile.cc (branch_prob): Write path counts to gcno.
> ---
>  gcc/path-coverage.cc | 15 ++-
>  gcc/profile.cc   | 12 ++--
>  2 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/path-coverage.cc b/gcc/path-coverage.cc
> index 55b39298603..55058fd04ff 100644
> --- a/gcc/path-coverage.cc
> +++ b/gcc/path-coverage.cc
> @@ -504,8 +504,8 @@ flush_on_gsi (gimple_stmt_iterator *gsi, size_t bucket, 
> tree local, tree mask,
> bit N%64 in bucket N/64.  For large functions, an individual basic block
> will only be part of a small subset of paths, and by extension buckets and
> local counters.  Only the necessary counters are read and written.  */
> -void
> -find_paths (struct function *fn)
> +unsigned
> +instrument_prime_paths (struct function *fn)
>  {
>mark_dfs_back_edges (fn);
>vec> paths = find_prime_paths (fn);
> @@ -515,7 +515,7 @@ find_paths (struct function *fn)
>warning_at (fn->function_start_locus, OPT_Wcoverage_too_many_paths,
>   "paths exceeding limit, giving up path coverage");
>release_vec_vec (paths);
> -  return;
> +  return 0;
>  }
>
>tree gcov_type_node = get_gcov_type ();
> @@ -526,14 +526,9 @@ find_paths (struct function *fn)
>if (!coverage_counter_alloc (GCOV_COUNTER_PATHS, nbuckets))
>  {
>release_vec_vec (paths);
> -  return;
> +  return 0;
>  }
>
> -  gcov_position_t offset {};
> -  offset = gcov_write_tag (GCOV_TAG_PATHS);
> -  gcov_write_unsigned (paths.length ());
> -  gcov_write_length (offset);
> -
>hash_map  ands;
>hash_map  iors;
>hash_map  flushes;
> @@ -771,6 +766,8 @@ find_paths (struct function *fn)
> }
>  }
>
> +  const unsigned npaths = paths.length ();
>blocks.release ();
>release_vec_vec (paths);
> +  return npaths;
>  }
> diff --git a/gcc/profile.cc b/gcc/profile.cc
> index 0b222cf3864..c0f5097726b 100644
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -1611,9 +1611,17 @@ branch_prob (bool thunk)
> instrument_values (values);
>  }
>
> -  void find_paths (struct function*);
> +  unsigned instrument_prime_paths (struct function*);
>if (path_coverage_flag)
> -find_paths (cfun);
> +{
> +  const unsigned npaths = instrument_prime_paths (cfun);
> +  if (output_to_file)
> +   {
> + gcov_position_t offset = gcov_write_tag (GCOV_TAG_PATHS);
> + gcov_write_unsigned (npaths);
> + gcov_write_length (offset);
> +   }
> +}
>
>free_aux_for_edges ();
>
> --
> 2.39.5
>

[PATCH v2] RISC-V: Tweak testcase for PIE

2025-04-01 Thread Kito Cheng

Linux toolchain may configured with --enable-default-pie, and that will
cause lots of regression test failures because the function name will
append with @plt suffix (e.g. `call foo` become `call foo@plt`), also
some code generation  will different due to the code model like the address
generation for global variable, so we may add -fno-pie to those
testcases to prevent that.

We may consider just drop @plt suffix to prevent that at all, because
it's not difference between w/ and w/o @plt suffix, the linker will pick
the right one to do, however it's late stage of GCC development, so just
tweak the testcase should be the best way to do now.

Changes from v1:
- Add more testcase for PIE (from rvv.exp).
- Tweak the rule for match @plt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32i_zcmp.c: Tweak testcase for PIE.
* gcc.target/riscv/rv32e_zcmp.c: Likewise.
* gcc.target/riscv/zcmp_stack_alignment.c: Likewise.
* gcc.target/riscv/cm_mv_rv32.c: Likewise.
* gcc.target/riscv/cpymem-64.c: Likewise.
* gcc.target/riscv/fmax-snan.c: Likewise.
* gcc.target/riscv/fmaxf-snan.c: Likewise.
* gcc.target/riscv/fmin-snan.c: Likewise.
* gcc.target/riscv/fminf-snan.c: Likewise.
* gcc.target/riscv/large-model.c: Likewise.
* gcc.target/riscv/predef-1.c: Likewise.
* gcc.target/riscv/predef-4.c: Likewise.
* gcc.target/riscv/predef-7.c: Likewise.
* gcc.target/riscv/predef-9.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Likewise.
* gcc.target/riscv/rvv/base/cmpmem-1.c: Likewise.
* gcc.target/riscv/rvv/base/cmpmem-3.c: Likewise.
* gcc.target/riscv/rvv/base/cmpmem-4.c: Likewise.
* gcc.target/riscv/rvv/base/cpymem-1.c: Likewise.
* gcc.target/riscv/rvv/base/movmem-1.c: Likewise.
* gcc.target/riscv/rvv/base/pr114352-3.c: Likewise.
* gcc.target/riscv/rvv/base/setmem-1.c: Likewise.
* gcc.target/riscv/rvv/base/setmem-2.c: Likewise.
* gcc.target/riscv/rvv/base/setmem-3.c: Likewise.
* gcc.target/riscv/rvv/base/spill-9.c: Likewise.
* g++.target/riscv/mv-symbols1.C: Likewise.
* g++.target/riscv/mv-symbols3.C: Likewise.
* g++.target/riscv/mv-symbols4.C: Likewise.
* g++.target/riscv/mv-symbols5.C: Likewise.
* g++.target/riscv/mvc-symbols1.C: Likewise.
* g++.target/riscv/mvc-symbols3.C: Likewise.
---
 gcc/testsuite/g++.target/riscv/mv-symbols1.C| 4 ++--
 gcc/testsuite/g++.target/riscv/mv-symbols3.C| 4 ++--
 gcc/testsuite/g++.target/riscv/mv-symbols4.C| 4 ++--
 gcc/testsuite/g++.target/riscv/mv-symbols5.C| 4 ++--
 gcc/testsuite/g++.target/riscv/mvc-symbols1.C   | 4 ++--
 gcc/testsuite/g++.target/riscv/mvc-symbols3.C   | 4 ++--
 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/cpymem-64.c  | 4 ++--
 gcc/testsuite/gcc.target/riscv/fmax-snan.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/fmaxf-snan.c | 2 +-
 gcc/testsuite/gcc.target/riscv/fmin-snan.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/fminf-snan.c | 2 +-
 gcc/testsuite/gcc.target/riscv/large-model.c| 2 +-
 gcc/testsuite/gcc.target/riscv/predef-1.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/predef-4.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/predef-7.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/predef-9.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c | 6 +++---
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c | 6 +++---
 .../riscv/rvv/base/abi-callee-saved-2-save-restore.c| 6 +++---
 .../gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c | 6 +++---
 .../gcc.target/riscv/rvv/base/abi-callee-saved-2.c  | 6 +++---
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c  | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c  | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c  | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c  | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c| 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/spill-9.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c   | 2 +-
 33 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/gcc/testsuite/g++.targ

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Uros Bizjak

On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu  wrote:
>
> On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > For spiltter after 3_mask it now splits the pattern
> > to *3_mask, causing the splitter doesn't generate
> > nf variant. Add corresponding nf counterpart for define_insn_and_split
> > to make the splitter also works for nf insn.
> >
> > Bootstrapped & regtested on x86-64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/119539
> > * config/i386/i386.md (*3_mask_nf): New
> > define_insn_and_split.
> > (*3_mask_1_nf): Likewise.
> > (*3_mask): Use force_lowpart_subreg.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/119539
> > * gcc.target/i386/apx-nf-pr119539.c: New test.
> > ---
> >  gcc/config/i386/i386.md   | 46 ++-
> >  .../gcc.target/i386/apx-nf-pr119539.c |  6 +++
> >  2 files changed, 50 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index f7f790d2aeb..42312f0c330 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -18131,6 +18131,30 @@ (define_expand "3"
> >DONE;
> >  })
> >
> > +;; Avoid useless masking of count operand.
> > +(define_insn_and_split "*3_mask_nf"
> > +  [(set (match_operand:SWI 0 "nonimmediate_operand")
> > +   (any_rotate:SWI
> > + (match_operand:SWI 1 "nonimmediate_operand")
> > + (subreg:QI
> > +   (and
> > + (match_operand 2 "int248_register_operand" "c")
> > + (match_operand 3 "const_int_operand")) 0)))]
> > +  "TARGET_APX_NF
> > +   && ix86_binary_operator_ok (, mode, operands)
> > +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> > +  == GET_MODE_BITSIZE (mode)-1
> > +   && ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0)
> > +   (any_rotate:SWI (match_dup 1)
> > +   (match_dup 2)))]
> > +{
> > +  operands[2] = force_lowpart_subreg (QImode, operands[2],
> > + GET_MODE (operands[2]));
> > +})
> Can we just change the output in original pattern, I think combine
> will still match the pattern even w/ clobber flags.
>
> like
>
> @@ -17851,8 +17851,17 @@ (define_insn_and_split "*3_mask"
> (match_dup 2)))
>(clobber (reg:CC FLAGS_REG))])]
>  {
> -  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> -  operands[2] = gen_lowpart (QImode, operands[2]);
> +  if (TARGET_APX_F)
> +{
> +  emit_move_insn (operands[0],
> +gen_rtx_ (mode, operands[1], operands[2]));
> +  DONE;
> +}
> +  else
> +{
> +  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> +  operands[2] = gen_lowpart (QImode, operands[2]);

Please note we have a new "force_lowpart_subreg" function that
operates on a register operand and (if possible) simplifies a subreg
of a subreg, otherwise it forces the operand to register and creates a
subreg of the temporary. Similar to the above combination, with a
possibility to avoid a temporary reg.

> Also we can remove constraint "c" in the original pattern.

The constraint is here to handle corner case, where combine propagated
an insn RTX with fixed register, other than %ecx, into shift RTX.
Register allocator was not able to fix the combination, so
TARGET_LEGITIMATE_COMBINED_INSN hook was introduced that rejected
unwanted combinations. Please see the comment in
ix86_legitimate_combined_insn function.

Perhaps the above is not relevant anymore with the new register
allocator (LRA), and the constraint can indeed be removed. But please
take some caution.

Uros.

Re: [PATCH] combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]

2025-04-01 Thread Richard Biener

On Fri, 28 Mar 2025, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
> We have from earlier combinations
> (insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
> (const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
>  (nil))
> (insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
> (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
>  (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
> (nil)))
> (note 24 23 25 4 NOTE_INSN_DELETED)
> (insn 25 24 26 4 (parallel [
> (set (reg:CCZ 17 flags)
> (compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
> (const_int 0 [0])))
> (set (reg/v:SI 116 [ e ])
> (neg:SI (reg:SI 104 [ _7 ])))
> ]) "pr119291.c":26:13 977 {*negsi_2}
>  (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
> (nil)))
> (note 26 25 27 4 NOTE_INSN_DELETED)
> (insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
> (ne:DI (reg:CCZ 17 flags)
> (const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
>  (expr_list:REG_DEAD (reg:CCZ 17 flags)
> (nil)))
> and try_combine is called on i3 25 and i2 22 (second time)
> and reach the hunk being patched with simplified i3
> (insn 25 24 26 4 (parallel [
> (set (pc)
> (pc))
> (set (reg/v:SI 116 [ e ])
> (const_int 0 [0]))
> ]) "pr119291.c":28:13 977 {*negsi_2}
>  (expr_list:REG_DEAD (reg:SI 104 [ _7 ])
> (nil)))
> and
> (insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
> (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
>  (nil))
> Now, the try_combine code there attempts to split two independent
> sets in newpat by moving one of them to i2.
> And among other tests it checks
> !modified_between_p (SET_DEST (set1), i2, i3)
> which is certainly needed, if there would be say
> (set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
> in between i2 and i3, we couldn't do that, as that set would overwrite
> the value set by set1 we want to move to the i2 position.
> But in this case pseudo 116 isn't set in between i2 and i3, but used
> (and additionally there is a REG_DEAD note for it).
> 
> This is equally bad for the move, because while the i3 insn
> and later will see the pseudo value that we set, the insn in between
> which uses the value will see a different value from the one that
> it should see.
> 
> As we don't check for that, in the end try_combine succeeds and
> changes the IL to:
> (insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
> (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
>  (nil))
> (insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
> (reg/v:SI 116 [ e ])) 96 {*movsi_internal}
>  (expr_list:REG_DEAD (reg/v:SI 116 [ e ])
> (nil)))
> (note 24 23 25 4 NOTE_INSN_DELETED)
> (insn 25 24 26 4 (set (pc)
> (pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
>  (nil))
> (note 26 25 27 4 NOTE_INSN_DELETED)
> (insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
> (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
>  (nil))
> (note, the i3 got turned into a nop and try_combine also modified insn 27).
> 
> The following patch replaces the modified_between_p
> tests with reg_used_between_p, my understanding is that
> modified_between_p is a subset of reg_used_between_p, so one
> doesn't need both.
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux,
> powerpc64le-linux and s390x-linux, ok for trunk?
> 
> Looking at this some more today, I think we should special case
> set_noop_p because that can be put into i2 (except for the JUMP_P
> violations), currently both modified_between_p (pc_rtx, i2, i3)
> and reg_used_between_p (pc_rtx, i2, i3) returns false.
> I'll post a patch incrementally for that (but that feels like
> new optimization, so probably not something that should be backported).

Trying to review this I noticed that both the comment in combine suggests
that memory set is important and modified_between_p handles MEM_P
'x' and for non-REG 'x' possibly MEM operands (CONCAT?) while
reg_used_between_p only handles REG_P 'reg'.  Also shouldn't
you use reg_referenced_p?  At least that function seems to handle
SET_SRC/SET_DEST separately?

Can we constrain SET_DEST (set1/set0) to a REG_P in combine?  Why
does the comment talk about memory?

> 2025-03-28  Jakub Jelinek  
> 
>   PR rtl-optimization/119291
>   * combine.cc (try_combine): For splitting of PARALLEL with
>   2 independent SETs into i2 and i3 sets check reg_used_between_p
>   of the SET_DESTs rather than just modified_between_p.
> 
>   * gcc.c-torture/execute/pr119291.c: New test.
> 
> --- gcc/combine.cc.jj 2025-03-25 09:34:33.469102343 +0100
> +++ gcc/combine.cc2025-03-27 09:50:15.768567383 +0100
> @@ -4012,18 +4012,18 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>rtx set1 = XVECEXP (newpat, 0, 1);
>  
>/* Normally, it doesn't matter which of the two is done first, b

Re: [PATCH] testsuite: arm: Fix dg-final in short-vfp-1.c [PR119556]

2025-04-01 Thread Richard Earnshaw (lists)

On 31/03/2025 20:04, Christophe Lyon wrote:
> Recent syntactic fixes enabled the test, but the result was failing.
> 
> It turns out it was missing a space between the register arguments in
> the scan-assembler-times directives.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/119556
>   * gcc.target/arm/short-vfp-1.c: Add missing spaces.
> ---
>  gcc/testsuite/gcc.target/arm/short-vfp-1.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/short-vfp-1.c 
> b/gcc/testsuite/gcc.target/arm/short-vfp-1.c
> index 18d38a58037..f6866c4f601 100644
> --- a/gcc/testsuite/gcc.target/arm/short-vfp-1.c
> +++ b/gcc/testsuite/gcc.target/arm/short-vfp-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-require-effective-target arm_vfp_ok }
> +/* { dg-require-effective-target arm_vfp_ok } */
>  /* { dg-add-options arm_vfp } */
>  
>  int
> @@ -38,8 +38,8 @@ test_sihi (short x)
>return (int)x;
>  }
>  
> -/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+,s[0-9]+} 2 } } 
> */
> -/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+,s[0-9]+} 2 } } 
> */
> -/* { dg-final { scan-assembler-times {vmov\tr[0-9]+,s[0-9]+} 2 } } */
> -/* { dg-final { scan-assembler-times {vmov\ts[0-9]+,r[0-9]+} 2 } } */
> -/* { dg-final { scan-assembler-times {sxth\tr[0-9]+,r[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 2 } 
> } */
> +/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 2 } 
> } */
> +/* { dg-final { scan-assembler-times {vmov\tr[0-9]+, s[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, r[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {sxth\tr[0-9]+, r[0-9]+} 2 } } */

OK.

Changes like this should be considered obvious, BTW.

R.

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote:
> >Changing ix86_valid_target_attribute_inner_p might be even better because
> >OPT_msse4 is RejectNegative option, so !value for it looks weird.
> msse4 is defined as ix86_opt_isa in ix86_valid_target_attribute_inner_p
> 
> 1055IX86_ATTR_ISA ("sse4",  OPT_msse4),
> 
> and would be handled in ix86_handle_option
> 
> 1282  else if (type == ix86_opt_isa)
> 1283{
> 1284  struct cl_decoded_option decoded;
> 1285
> 1286  generate_option (opt, NULL, opt_set_p, CL_TARGET,
> &decoded);
> 1287  ix86_handle_option (opts, opts_set,
> 1288  &decoded, input_location);
> 1289}
> 
> So I think it's already correct here.

The entries are
msse4
Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and 
code generation.

mno-sse4
Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save
Do not support SSE4.1 and SSE4.2 built-in functions and code generation.

so if it is turned into something without RejectNegative,
it will be wrong in the Mask argument, either for the positive or for the
negative case.

I wonder though if msse4 should not be a Target RejectNegative Alias to msse4.2
and mno-sse4 RejectNegative Alias to mno-sse4.1.
Though, unsure if one will still be able to specify it in #pragma GCC target
or target attribute...

Jakub

[PATCH] gimple-low: Diagnose assume attr expressions defining labels which are used as unary && operands outside of those [PR119537]

2025-04-01 Thread Jakub Jelinek

Hi!

The following testcases ICE on invalid code which defines
labels inside of statement expressions and then uses &&label
from code outside of the statement expressions.
The C++ FE diagnoses that with a warning (not specifically for
assume attribute, genericallly about taking address of a label
outside of a statement expression so computed goto could violate
the requirement that statement expression is not entered from
outside of it through a jump into it), the C FE doesn't diagnose
anything.
Normal direct gotos to such labels are diagnosed by both C and C++.
In the assume attribute case it is actually worse than for
addresses of labels in normal statement expressions, in that case
the labels are still in the current function, so invalid program
can still jump to those (and in case of OpenMP/OpenACC where it
is also invalid and stuff is moved to a separate function, such
movement is done post cfg discovery of FORCED_LABELs and worst
case one can run into cases which fail to assemble, but I haven't
succeeded in creating ICE for that).
For assume at -O0 we'd just throw away the assume expression if
it is not a simple condition and so the label is then not defined
anywhere and we ICE during cfg pass.
The gimplify.cc hunks fix that, as we don't have FORCED_LABELs
discovery done yet, it preserves all assume expressions which contain
used user labels.
With that we ICE during IRA, which is upset about an indirect jump
to a label which doesn't exist.
So, the gimple-low.cc hunks add diagnostics of the problem, it gathers
uids of all the user used labels inside of the assume expressions (usually
none) and if it finds any, walks the IL to find uses of those from outside
of those expressions now outlined into separate magic functions.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-01  Jakub Jelinek  

PR middle-end/119537
* gimplify.cc (find_used_user_labels): New function.
(gimplify_call_expr): Don't remove complex assume expression at -O0
if it defines any user labels.
* gimple-low.cc: Include diagnostic-core.h.
(assume_labels): New variable.
(diagnose_assume_labels): New function.
(lower_function_body): Call it via walk_gimple_seq if assume_labels
is non-NULL, then BITMAP_FREE assume_labels.
(find_assumption_locals_r): Record in assume_labels uids of user
labels defined in assume attribute expressions.

* c-c++-common/pr119537-1.c: New test.
* c-c++-common/pr119537-2.c: New test.

--- gcc/gimplify.cc.jj  2025-03-31 12:53:44.853727077 +0200
+++ gcc/gimplify.cc 2025-03-31 17:05:40.854893880 +0200
@@ -4508,6 +4508,21 @@ gimplify_variant_call_expr (tree expr, f
 }
 
 
+/* Helper function for gimplify_call_expr, called via walk_tree.
+   Find used user labels.  */
+
+static tree
+find_used_user_labels (tree *tp, int *, void *)
+{
+  if (TREE_CODE (*tp) == LABEL_EXPR
+  && !DECL_ARTIFICIAL (LABEL_EXPR_LABEL (*tp))
+  && DECL_NAME (LABEL_EXPR_LABEL (*tp))
+  && TREE_USED (LABEL_EXPR_LABEL (*tp)))
+return *tp;
+  return NULL_TREE;
+}
+
+
 /* Gimplify the CALL_EXPR node *EXPR_P into the GIMPLE sequence PRE_P.
WANT_VALUE is true if the result of the call is desired.  */
 
@@ -4568,8 +4583,14 @@ gimplify_call_expr (tree *expr_p, gimple
   fndecl, 0));
  return GS_OK;
}
- /* If not optimizing, ignore the assumptions.  */
- if (!optimize || seen_error ())
+ /* If not optimizing, ignore the assumptions unless there
+are used user labels in it.  */
+ if ((!optimize
+  && !walk_tree_without_duplicates (&CALL_EXPR_ARG (*expr_p,
+0),
+find_used_user_labels,
+NULL))
+ || seen_error ())
{
  *expr_p = NULL_TREE;
  return GS_ALL_DONE;
--- gcc/gimple-low.cc.jj2025-03-21 20:25:38.396064280 +0100
+++ gcc/gimple-low.cc   2025-03-31 20:20:38.961772695 +0200
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.
 #include "tree-inline.h"
 #include "gimple-walk.h"
 #include "attribs.h"
+#include "diagnostic-core.h"
 
 /* The differences between High GIMPLE and Low GIMPLE are the
following:
@@ -78,6 +79,10 @@ struct lower_data
   bool cannot_fallthru;
 };
 
+/* Bitmap of LABEL_DECL uids for user labels moved into assume outlined
+   functions.  */
+static bitmap assume_labels;
+
 static void lower_stmt (gimple_stmt_iterator *, struct lower_data *);
 static void lower_gimple_bind (gimple_stmt_iterator *, struct lower_data *);
 static void lower_try_catch (gimple_stmt_iterator *, struct lower_data *);
@@ -87,6 +92,29 @@ static void lower_builtin_posix_memalign
 static void lower_builti

[committed] OpenMP: Reorder diagnostic in modify_call_for_omp_dispatch [PR119559]

2025-04-01 Thread Tobias Burnus


Reorder to the diagnostic to avoid issues when
nappend < ninteropfor the case no append_args arguments at all (nappend == 0) and one 
interop clause to dispatch (ninterop == 1)


Committed as r15-9120-gde92ac6f11e605.

Tobias
commit de92ac6f11e605987421fe1443b5b81ff172dbb6
Author: Tobias Burnus 
Date:   Tue Apr 1 10:29:27 2025 +0200

OpenMP: Reorder diagnostic in modify_call_for_omp_dispatch [PR119559]

gcc/ChangeLog:

PR middle-end/119559
* gimplify.cc (modify_call_for_omp_dispatch): Reorder checks to avoid
asserts and bogus diagnostic.
---
 gcc/gimplify.cc | 36 
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index a8399dc8363..02ad3981adf 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3933,25 +3933,9 @@ modify_call_for_omp_dispatch (tree expr, tree dispatch_clauses,
  the split between early/late resolution, etc instead of the code
  as written by the user.  */
   if (dispatch_interop)
-{
-  for (tree t = dispatch_interop; t; t = TREE_CHAIN (t))
-	if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_INTEROP)
-	  ninterop++;
-  if (nappend < ninterop)
-	{
-	  error_at (OMP_CLAUSE_LOCATION (dispatch_interop),
-		"number of list items in % clause (%d) "
-		"exceeds the number of % items (%d) for "
-		"% candidate %qD",
-		ninterop, nappend, fndecl);
-	  inform (dispatch_append_args
-		  ? EXPR_LOCATION (TREE_PURPOSE (dispatch_append_args))
-		  : DECL_SOURCE_LOCATION (fndecl),
-		  "% candidate %qD declared here",
-		  fndecl);
-	  ninterop = nappend;
-	}
-}
+for (tree t = dispatch_interop; t; t = TREE_CHAIN (t))
+  if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_INTEROP)
+	ninterop++;
   if (dispatch_interop && !dispatch_device_num)
 {
   gcc_checking_assert (ninterop > 1);
@@ -3959,7 +3943,19 @@ modify_call_for_omp_dispatch (tree expr, tree dispatch_clauses,
 		"the % clause must be present if the % "
 		"clause has more than one list item");
 }
-  else if (dispatch_append_args)
+  if (nappend < ninterop)
+{
+  error_at (OMP_CLAUSE_LOCATION (dispatch_interop),
+		"number of list items in % clause (%d) "
+		"exceeds the number of % items (%d) for "
+		"% candidate %qD", ninterop, nappend, fndecl);
+  inform (dispatch_append_args
+	  ? EXPR_LOCATION (TREE_PURPOSE (dispatch_append_args))
+	  : DECL_SOURCE_LOCATION (fndecl),
+	  "% candidate %qD declared here", fndecl);
+  ninterop = nappend;
+}
+  if (dispatch_append_args)
 {
   tree *buffer = XALLOCAVEC (tree, nargs + nappend);
   tree arg = TYPE_ARG_TYPES (TREE_TYPE (fndecl));

[PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Hongyu Wang

Hi,

For spiltter after 3_mask it now splits the pattern
to *3_mask, causing the splitter doesn't generate
nf variant. Add corresponding nf counterpart for define_insn_and_split
to make the splitter also works for nf insn.

Bootstrapped & regtested on x86-64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

PR target/119539
* config/i386/i386.md (*3_mask_nf): New
define_insn_and_split.
(*3_mask_1_nf): Likewise.
(*3_mask): Use force_lowpart_subreg.

gcc/testsuite/ChangeLog:

PR target/119539
* gcc.target/i386/apx-nf-pr119539.c: New test.
---
 gcc/config/i386/i386.md   | 46 ++-
 .../gcc.target/i386/apx-nf-pr119539.c |  6 +++
 2 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f7f790d2aeb..42312f0c330 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18131,6 +18131,30 @@ (define_expand "3"
   DONE;
 })
 
+;; Avoid useless masking of count operand.
+(define_insn_and_split "*3_mask_nf"
+  [(set (match_operand:SWI 0 "nonimmediate_operand")
+   (any_rotate:SWI
+ (match_operand:SWI 1 "nonimmediate_operand")
+ (subreg:QI
+   (and
+ (match_operand 2 "int248_register_operand" "c")
+ (match_operand 3 "const_int_operand")) 0)))]
+  "TARGET_APX_NF
+   && ix86_binary_operator_ok (, mode, operands)
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (any_rotate:SWI (match_dup 1)
+   (match_dup 2)))]
+{
+  operands[2] = force_lowpart_subreg (QImode, operands[2],
+ GET_MODE (operands[2]));
+})
+
 ;; Avoid useless masking of count operand.
 (define_insn_and_split "*3_mask"
   [(set (match_operand:SWI 0 "nonimmediate_operand")
@@ -18153,8 +18177,8 @@ (define_insn_and_split "*3_mask"
   (match_dup 2)))
   (clobber (reg:CC FLAGS_REG))])]
 {
-  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
-  operands[2] = gen_lowpart (QImode, operands[2]);
+  operands[2] = force_lowpart_subreg (QImode, operands[2],
+ GET_MODE (operands[2]));
 })
 
 (define_split
@@ -18174,6 +18198,24 @@ (define_split
 (and:SI (match_dup 2) (match_dup 3)) 0)))]
  "operands[4] = gen_reg_rtx (mode);")
 
+(define_insn_and_split "*3_mask_1_nf"
+  [(set (match_operand:SWI 0 "nonimmediate_operand")
+   (any_rotate:SWI
+ (match_operand:SWI 1 "nonimmediate_operand")
+ (and:QI
+   (match_operand:QI 2 "register_operand" "c")
+   (match_operand:QI 3 "const_int_operand"]
+  "TARGET_APX_NF
+   && ix86_binary_operator_ok (, mode, operands)
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (any_rotate:SWI (match_dup 1)
+   (match_dup 2)))])
+
 (define_insn_and_split "*3_mask_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand")
(any_rotate:SWI
diff --git a/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c 
b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
new file mode 100644
index 000..5dfec55ed76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mapx-features=nf -march=x86-64 -O2" } */
+/* { dg-final { scan-assembler-times "\{nf\} rol" 2 } } */
+
+long int f1 (int x) { return ~(1ULL << (x & 0x3f)); }
+long int f2 (char x) { return ~(1ULL << (x & 0x3f)); }
-- 
2.31.1

Re: [PATCH] libstdc++: Fix -Warray-bounds warning in std::vector::resize [PR114945]

2025-04-01 Thread Tomasz Kaminski

On Mon, Mar 31, 2025 at 6:23 PM Jonathan Wakely  wrote:

> This is yet another false positive warning fix. This time the compiler
> can't prove that when the vector has sufficient excess capacity to
> append new elements, the pointer to the existing storage is not null.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/114945
> * include/bits/vector.tcc (vector::_M_default_append): Add
> unreachable condition so the compiler knows that _M_finish is
> not null.
> * testsuite/23_containers/vector/capacity/114945.cc: New test.
> ---
>
> Tested x86_64-linux.
>
LGTM

>
>  libstdc++-v3/include/bits/vector.tcc  |  3 ++
>  .../23_containers/vector/capacity/114945.cc   | 36 +++
>  2 files changed, 39 insertions(+)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc
>
> diff --git a/libstdc++-v3/include/bits/vector.tcc
> b/libstdc++-v3/include/bits/vector.tcc
> index 7a92f34ec64..66d73b4cfd7 100644
> --- a/libstdc++-v3/include/bits/vector.tcc
> +++ b/libstdc++-v3/include/bits/vector.tcc
> @@ -768,6 +768,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>
>   if (__navail >= __n)
> {
> + if (!this->_M_impl._M_finish)
> +   __builtin_unreachable();
> +
>   _GLIBCXX_ASAN_ANNOTATE_GROW(__n);
>   this->_M_impl._M_finish =
> std::__uninitialized_default_n_a(this->_M_impl._M_finish,
> diff --git
> a/libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc
> b/libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc
> new file mode 100644
> index 000..daafc59d5a9
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/vector/capacity/114945.cc
> @@ -0,0 +1,36 @@
> +// { dg-options "-O2 -Werror=stringop-overflow -Werror=array-bounds" }
> +// { dg-do compile { target c++11 } }
> +
> +// Bug libstdc++/114945
> +// Sporadic std::vector::resize() -Wstringop-overflow or -Warray-bounds
> warning
> +
> +#include 
> +#include 
> +template  struct b {
> +  void resize(std::size_t c) { d.resize(c); }
> +  template  void f(a, e);
> +  std::vector d;
> +};
> +#include 
> +std::regex g;
> +uint64_t h;
> +uint32_t i;
> +struct s {
> +  enum class j : size_t;
> +  void k();
> +  using l = b;
> +  std::vector m;
> +};
> +enum class s::j : size_t { n };
> +void o() { g = ""; }
> +void s::k() {
> +  l p;
> +  auto q = uint32_t(), r = uint32_t();
> +  if (h)
> +r = i;
> +  b t;
> +  if (q || r)
> +p.f(j::n, 5);
> +  t.resize(4);
> +  m.push_back(p);
> +}
> --
> 2.49.0
>
>

Re: [PATCH] profile: Another profiling musttail call fix [PR119535]

2025-04-01 Thread Richard Biener

On Tue, 1 Apr 2025, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcase shows, EDGE_FAKE edges from musttail calls to
> EXIT aren't the only edges we should ignore, we need to ignore also
> edges created by the splitting of blocks for the EDGE_FAKE creation that
> point from the musttail calls to the fallthrough block, which typically does
> the return or with PHIs for the return value.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK

> 2025-04-01  Jakub Jelinek  
> 
>   PR gcov-profile/119535
>   * profile.cc (branch_prob): Ignore any edges from bbs ending with
>   musttail call, rather than only EDGE_FAKE edges from those to EXIT.
> 
>   * c-c++-common/pr119535.c: New test.
> 
> --- gcc/profile.cc.jj 2025-03-27 09:29:37.112574359 +0100
> +++ gcc/profile.cc2025-03-31 10:24:39.593446405 +0200
> @@ -1341,8 +1341,7 @@ branch_prob (bool thunk)
> ignored_edges++;
>   }
> -  /* Ignore fake edges after musttail calls.  */
> -  if ((e->flags & EDGE_FAKE)
> -   && e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
> +  /* Ignore edges after musttail calls.  */
> +  if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
>   {
> gimple_stmt_iterator gsi = gsi_last_bb (e->src);
> gimple *stmt = gsi_stmt (gsi);
> --- gcc/testsuite/c-c++-common/pr119535.c.jj  2025-03-31 10:36:18.844758921 
> +0200
> +++ gcc/testsuite/c-c++-common/pr119535.c 2025-04-01 09:12:08.998801814 
> +0200
> @@ -0,0 +1,31 @@
> +/* PR gcov-profile/119535
> +/* { dg-do compile { target musttail } } */
> +/* { dg-options "-fprofile-generate -O2" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +
> +[[gnu::noipa]] int
> +foo (int x)
> +{
> +  return 42 + x;
> +}
> +
> +int
> +bar (int x)
> +{
> +  foo (x);
> +  foo (2);
> +  [[clang::musttail]] return foo (3);
> +}
> +
> +int
> +baz (int x)
> +{
> +  if (x == 42)
> +return -1;
> +  else if (x == 15)
> +return 25;
> +  else if (x == 26)
> +[[clang::musttail]] return foo (4);
> +  else
> +[[clang::musttail]] return foo (5);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Hongtao Liu

On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang  wrote:
>
> Hi,
>
> For spiltter after 3_mask it now splits the pattern
> to *3_mask, causing the splitter doesn't generate
> nf variant. Add corresponding nf counterpart for define_insn_and_split
> to make the splitter also works for nf insn.
>
> Bootstrapped & regtested on x86-64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/119539
> * config/i386/i386.md (*3_mask_nf): New
> define_insn_and_split.
> (*3_mask_1_nf): Likewise.
> (*3_mask): Use force_lowpart_subreg.
>
> gcc/testsuite/ChangeLog:
>
> PR target/119539
> * gcc.target/i386/apx-nf-pr119539.c: New test.
> ---
>  gcc/config/i386/i386.md   | 46 ++-
>  .../gcc.target/i386/apx-nf-pr119539.c |  6 +++
>  2 files changed, 50 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index f7f790d2aeb..42312f0c330 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -18131,6 +18131,30 @@ (define_expand "3"
>DONE;
>  })
>
> +;; Avoid useless masking of count operand.
> +(define_insn_and_split "*3_mask_nf"
> +  [(set (match_operand:SWI 0 "nonimmediate_operand")
> +   (any_rotate:SWI
> + (match_operand:SWI 1 "nonimmediate_operand")
> + (subreg:QI
> +   (and
> + (match_operand 2 "int248_register_operand" "c")
> + (match_operand 3 "const_int_operand")) 0)))]
> +  "TARGET_APX_NF
> +   && ix86_binary_operator_ok (, mode, operands)
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> +  == GET_MODE_BITSIZE (mode)-1
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +   (any_rotate:SWI (match_dup 1)
> +   (match_dup 2)))]
> +{
> +  operands[2] = force_lowpart_subreg (QImode, operands[2],
> + GET_MODE (operands[2]));
> +})
Can we just change the output in original pattern, I think combine
will still match the pattern even w/ clobber flags.

like

@@ -17851,8 +17851,17 @@ (define_insn_and_split "*3_mask"
(match_dup 2)))
   (clobber (reg:CC FLAGS_REG))])]
 {
-  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
-  operands[2] = gen_lowpart (QImode, operands[2]);
+  if (TARGET_APX_F)
+{
+  emit_move_insn (operands[0],
+gen_rtx_ (mode, operands[1], operands[2]));
+  DONE;
+}
+  else
+{
+  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
+  operands[2] = gen_lowpart (QImode, operands[2]);
+}
 }

Also we can remove constraint "c" in the original pattern.

> +
>  ;; Avoid useless masking of count operand.
>  (define_insn_and_split "*3_mask"
>[(set (match_operand:SWI 0 "nonimmediate_operand")
> @@ -18153,8 +18177,8 @@ (define_insn_and_split "*3_mask"
>(match_dup 2)))
>(clobber (reg:CC FLAGS_REG))])]
>  {
> -  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
> -  operands[2] = gen_lowpart (QImode, operands[2]);
> +  operands[2] = force_lowpart_subreg (QImode, operands[2],
> + GET_MODE (operands[2]));
>  })
>
>  (define_split
> @@ -18174,6 +18198,24 @@ (define_split
>  (and:SI (match_dup 2) (match_dup 3)) 0)))]
>   "operands[4] = gen_reg_rtx (mode);")
>
> +(define_insn_and_split "*3_mask_1_nf"
> +  [(set (match_operand:SWI 0 "nonimmediate_operand")
> +   (any_rotate:SWI
> + (match_operand:SWI 1 "nonimmediate_operand")
> + (and:QI
> +   (match_operand:QI 2 "register_operand" "c")
> +   (match_operand:QI 3 "const_int_operand"]
> +  "TARGET_APX_NF
> +   && ix86_binary_operator_ok (, mode, operands)
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> +  == GET_MODE_BITSIZE (mode)-1
> +   && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +   (any_rotate:SWI (match_dup 1)
> +   (match_dup 2)))])
> +
>  (define_insn_and_split "*3_mask_1"
>[(set (match_operand:SWI 0 "nonimmediate_operand")
> (any_rotate:SWI
> diff --git a/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c 
> b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
> new file mode 100644
> index 000..5dfec55ed76
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/apx-nf-pr119539.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-mapx-features=nf -march=x86-64 -O2" } */
> +/* { dg-final { scan-assembler-times "\{nf\} rol" 2 } } */
> +
> +long int f1 (int x) { return ~(1ULL << (x & 0x3f)); }
> +long int f2 (char x) { return ~(1ULL << (x & 0x3f)); }
> --
> 2.31.1
>


-- 
BR,
Hongtao

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Martin Uecker

Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao:
> 
> > On Apr 1, 2025, at 10:04, Martin Uecker  wrote:
> > 
> > 
> > 
> > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling:
> > > > I'd like to offer up this to solve the issues we're facing. This is a
> > > > combination of everything that's been discussed here (or at least that
> > > > I've been able to read in the centi-thread :-).
> > 
> > Thanks! I think this proposal much better as it avoids undue burden
> > on parsers, but it does not address all my concerns.
> > 
> > 
> > From my side, the issue about compromising the scoping rules of C
> > is also about unintended non-local effects of code changes. In
> > my opinion, a change in a library header elsewhere should not cause 
> > code in a local scope (which itself might also come from a macro) to
> > emit a warning or require a programmer to add a workaround. So I am
> > not convinced that adding warnings or a workaround such as
> > __builtin_global_ref  is a good solution.
> > 
> > 
> > I could see the following as a possible way forward: We only 
> > allow the following two syntaxes:
> > 
> > 1. Single argument referring to a member.
> > 
> > __counted_by(len)
> > 
> > with an argument that must be a single identifier and where
> > the identifier then must refer to a struct member. 
> > 
> > (I still think this is not ideal and potentially
> > confusing, but in contrast to new scoping rules it is
> > at least relatively easily to explain as a special rule.).  
> > 
> > 
> > 2. Forward declarations. 
> > 
> > __counted_by(size_t len; len + PADDING)
> 
> In the above, the PADDING is some constant? 

In principle - when considering only the name lookup rules -
it could be a constant, a global variable, or an automatic
variable, i.e. any ordinary identifiers which is visible at
this point. 

> 
> More complicated expressions involving globals will not be supported?

I think one could allow such expressions, But I think the
expressions should be restricted to expressions which have
no side effects.

> 
> > where then the second part can also be a more complicated 
> > expression, but with the explicit requirement that all
> > identifiers in this expression are then looked up according to
> > regular C language rules. So except for the forward declared
> > member(s) they are *never* looked up in the member namespace of
> > the struct, i.e. no new name lookup rules are introduced.
> 
> One question here:
> 
> What’s the major issue if we’d like to add one new scoping rule, for example,
> “Structure scope” (the same as the C++’s instance scope) to C? 
> 
> (In addition to the "VLA in structure" issue I mentioned in my previous 
> writeup, 
> is there any other issue to prevent this new scoping rule being added into C 
> ?).

Note that the "VLA in structure" is a bit of a red herring.  The exact same
issues apply to lookup of any other ordinary identifiers in this context.

enum { MY_BUF_SIZE = 100 };
struct foo {
  char buf[MY_BUF_SIZE];
};

C++ has instance scope for member functions. The rules for C++ are also
complex and not very consistent (see the examples I posted earlier,
demonstrating UB and compiler divergence).  For C such a concept would
be new and much less useful, so the trade-off seems unfavorable (in
constrast to C++ where it is needed).  I also see others issues:  Fully
supporting instance scope would require changes to how C is parsed, 
placing a burden on all C compilers and tooling. Depending on how you 
specify it, it would also cause a change in semantics
for existing code, something C tries very hard to avoid. If you add
warnings as mitigation,  it has the problem that it causes non-local
effects where introducing a name in in enclosing scope somewhere else
now necessitates a change to unrelated code, exactly what scoping rules
are meant to prevent.  

In any case, it seems a major change with many ramifications, including
possibly unintended ones. This should certainly not be done without
having a clear specification and support from WG14 (and probably not
done at all.)

Martin

> 
> Qing
> 
> 
> > 
> > 
> > I think this could address my concerns about breaking
> > scoping in C. Still, I personally would prefer designator syntax
> > for both C and C++ as a nicer solution, and one that already
> > has some support from WG14.
> > 
> > Martin
> > 
> > 
> > > > 
> > > > ---
> > > > 
> > > > 1. The use of '__self' isn't feasible, so we won't use it. Instead,
> > > > we'll rely upon the current behavior—resolving any identifiers to the
> > > > "instance scope". This new scope is used __only__ in attributes, and
> > > > resolves identifiers to those in the least enclosing, non-anonymous
> > > > struct. For example:
> > > > 
> > > > struct foo {
> > > >  char count;
> > > >  struct bar {
> > > >struct {
> > > >  int len;
> > > >};
> > > >struct {
> > > >  struct {
> > > >int *valid_use __counted_by(len); // Valid.
> > > >  };
> > > >

[PATCH] combine: Allow 2->2 combos but limit insn searches [PR116398]

2025-04-01 Thread Richard Sandiford

The problem in PR101523 was that, after each successful 2->2
combination attempt, distribute_links would search further and
further for the next combinable use of the i2 destination.
The original patch for the PR dealt with that by disallowing such
combinations.  However, that led to various optimisation regressions,
so there was interest in allowing the combinations again, at least
until an alternative way of getting the same results is in place.

This patch does that, but limits distribute_links to a certain number
of instructions when i2 is unchanged.  As Segher said in the PR trail,
it would make more conceptual sense to apply the limit unconditionally,
but I thought it would be better to change as little as possible at
this development stage.  Logically, in stage 1, the --param should
be applied directly by distribute_links with no input from callers.

As I mentioned in:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28

I think it's safe to drop log links even if a use exists.  All
processing of log links seems to handle the absence of a link
for a particular register in a conservative way.

The initial set-up errs on the side of dropping links, since for example
create_log_links has:

 /* flow.c claimed:

 We don't build a LOG_LINK for hard registers contained
 in ASM_OPERANDs.  If these registers get replaced,
 we might wind up changing the semantics of the insn,
 even if reload can make what appear to be valid
 assignments later.  */
  if (regno < FIRST_PSEUDO_REGISTER
  && asm_noperands (PATTERN (use_insn)) >= 0)
continue;

which excludes combinations by dropping log links, rather than during
try_combine.  And:

  /* If this register is being initialized using itself, and the
 register is uninitialized in this basic block, and there are
 no LOG_LINKS which set the register, then part of the
 register is uninitialized.  In that case we can't assume
 anything about the number of nonzero bits.

 ??? We could do better if we checked this in
 reg_{nonzero_bits,num_sign_bit_copies}_for_combine.  Then we
 could avoid making assumptions about the insn which initially
 sets the register, while still using the information in other
 insns.  We would have to be careful to check every insn
 involved in the combination.  */

  if (insn
  && reg_referenced_p (x, PATTERN (insn))
  && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
   REGNO (x)))
{
  struct insn_link *link;

  FOR_EACH_LOG_LINK (link, insn)
if (dead_or_set_p (link->insn, x))
  break;
  if (!link)
{
  rsp->nonzero_bits = GET_MODE_MASK (mode);
  rsp->sign_bit_copies = 1;
  return;
}
}

treats the lack of a log link is a possible sign of uninitialised data,
but that would be a missed optimisation rather than a correctness issue.

One question is what the default --param value should be.  I went with
Jakub's suggestion of 3000 from:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25

Also, to answer Jakub's question in that comment, I tried bisecting:

  int limit = atoi (getenv ("BISECT"));

(so applying the limit for all calls from try_combine) with an
abort in distribute_links if the limit caused a link to be skipped.
The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
was 142, so much lower than the parameter value.  I realised too late
that --enable-checking=release would probably have been a more
interesting test.

Some of the try_combine changes come from Richi's patch in:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53

Bootstrapped & regression-tested on aarch64-linux-gnu,
powerpc64le-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR testsuite/116398
* params.opt (-param=max-combine-search-insns=): New param.
* doc/invoke.texi: Document it.
* combine.cc (distribute_links): Add a limit parameter.
(try_combine): Use the new param to limit distribute_links
when i2 is unchanged.  Return i3 rather than i2 if i2 is unchanged.

gcc/testsuite/
* gcc.target/aarch64/popcnt-le-1.c: Account for commutativity of TST.
* gcc.target/aarch64/popcnt-le-3.c: Likewise AND.
* gcc.target/aarch64/sve/pred-not-gen-1.c: Revert previous patch.
* gcc.target/aarch64/sve/pred-not-gen-4.c: Likewise.
* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
* gcc.target/aarch64/sve/var_stride_4.c: Likewise.

Co-authored-by: Richard Biener 
---
 gcc/combine.cc| 30 +--
 gcc/doc/invoke.texi   |  9

[PATCH] profile: Another profiling musttail call fix [PR119535]

2025-04-01 Thread Jakub Jelinek

Hi!

As the following testcase shows, EDGE_FAKE edges from musttail calls to
EXIT aren't the only edges we should ignore, we need to ignore also
edges created by the splitting of blocks for the EDGE_FAKE creation that
point from the musttail calls to the fallthrough block, which typically does
the return or with PHIs for the return value.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2025-04-01  Jakub Jelinek  

PR gcov-profile/119535
* profile.cc (branch_prob): Ignore any edges from bbs ending with
musttail call, rather than only EDGE_FAKE edges from those to EXIT.

* c-c++-common/pr119535.c: New test.

--- gcc/profile.cc.jj   2025-03-27 09:29:37.112574359 +0100
+++ gcc/profile.cc  2025-03-31 10:24:39.593446405 +0200
@@ -1341,8 +1341,7 @@ branch_prob (bool thunk)
  ignored_edges++;
}
-  /* Ignore fake edges after musttail calls.  */
-  if ((e->flags & EDGE_FAKE)
- && e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+  /* Ignore edges after musttail calls.  */
+  if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
{
  gimple_stmt_iterator gsi = gsi_last_bb (e->src);
  gimple *stmt = gsi_stmt (gsi);
--- gcc/testsuite/c-c++-common/pr119535.c.jj2025-03-31 10:36:18.844758921 
+0200
+++ gcc/testsuite/c-c++-common/pr119535.c   2025-04-01 09:12:08.998801814 
+0200
@@ -0,0 +1,31 @@
+/* PR gcov-profile/119535
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-fprofile-generate -O2" } */
+/* { dg-require-profiling "-fprofile-generate" } */
+
+[[gnu::noipa]] int
+foo (int x)
+{
+  return 42 + x;
+}
+
+int
+bar (int x)
+{
+  foo (x);
+  foo (2);
+  [[clang::musttail]] return foo (3);
+}
+
+int
+baz (int x)
+{
+  if (x == 42)
+return -1;
+  else if (x == 15)
+return 25;
+  else if (x == 26)
+[[clang::musttail]] return foo (4);
+  else
+[[clang::musttail]] return foo (5);
+}

Jakub

[PATCH] config, toplevel, Darwin: Pass -B instead of -L to C++ commands.

2025-04-01 Thread Iain Sandoe

Another misconfigure (Darwin-only) found by a combination of increases
in system security contraints and the COBOL runtime.  We will now fail
any C++ configure test that needs the binary to run.

Tested on x86_64-darwin17,21,24 and on x86_64, powerpc64le, aarch64-linux
confirmed manually that the CXX_FOR_TARGET commands are unchanged for Linux.
OK for trunk (and backports)?
thanks,
Iain

--- 8< ---

Darwin from 10.11 needs embedded rpaths to find the correct libraries at
runtime.  Recent increases in hardening have made it such that the dynamic
loader will no longer fall back to using an installed libstdc++ when the
(new) linked one is not found.  This means we fail configure tests (that
should pass) for runtimes that use C++.

We can resolve this by passing '-B' to the C++ command lines instead of '-L'
(-B implies -L on Darwin, but also causes a corresponding embedded rpath).

ChangeLog:

* configure: Regenerate.
* configure.ac: Use -B instead of -L to specifiy the C++ runtime
paths on Darwin.

Signed-off-by: Iain Sandoe 
---
 configure| 100 +--
 configure.ac |  16 +++--
 2 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 036142a8d06..bf574efd1d8 100755
--- a/configure
+++ b/configure
@@ -19081,7 +19081,101 @@ $as_echo "pre-installed" >&6; }
   fi
 fi
 
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target 
c++" >&5
+case $target in
+  *-*-darwin*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target 
c++" >&5
+$as_echo_n "checking where to find the target c++... " >&6; }
+if test "x${build}" != "x${host}" ; then
+  if expr "x$CXX_FOR_TARGET" : "x/" > /dev/null; then
+# We already found the complete path
+ac_dir=`dirname $CXX_FOR_TARGET`
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" 
>&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  else
+# Canadian cross, just use what we found
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+else
+  ok=yes
+  case " ${configdirs} " in
+*" gcc "*) ;;
+*) ok=no ;;
+  esac
+  case ,${enable_languages}, in
+*,c++,*) ;;
+*) ok=no ;;
+  esac
+  if test $ok = yes; then
+# An in-tree tool is available and we can use it
+CXX_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/xg++ -B$$r/$(HOST_SUBDIR)/gcc/ 
-nostdinc++ `if test -f 
$$r/$(TARGET_SUBDIR)/libstdc++-v3/scripts/testsuite_flags; then $(SHELL) 
$$r/$(TARGET_SUBDIR)/libstdc++-v3/scripts/testsuite_flags --build-includes; 
else echo -funconfigured-libstdc++-v3 ; fi` 
-B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src 
-B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs 
-B$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs'
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5
+$as_echo "just compiled" >&6; }
+  elif expr "x$CXX_FOR_TARGET" : "x/" > /dev/null; then
+# We already found the complete path
+ac_dir=`dirname $CXX_FOR_TARGET`
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" 
>&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  elif test "x$target" = "x$host"; then
+# We can use an host tool
+CXX_FOR_TARGET='$(CXX)'
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: host tool" >&5
+$as_echo "host tool" >&6; }
+  else
+# We need a cross tool
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+fi
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target 
c++ for libstdc++" >&5
+$as_echo_n "checking where to find the target c++ for libstdc++... " >&6; }
+if test "x${build}" != "x${host}" ; then
+  if expr "x$RAW_CXX_FOR_TARGET" : "x/" > /dev/null; then
+# We already found the complete path
+ac_dir=`dirname $RAW_CXX_FOR_TARGET`
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" 
>&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  else
+# Canadian cross, just use what we found
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+else
+  ok=yes
+  case " ${configdirs} " in
+*" gcc "*) ;;
+*) ok=no ;;
+  esac
+  case ,${enable_languages}, in
+*,c++,*) ;;
+*) ok=no ;;
+  esac
+  if test $ok = yes; then
+# An in-tree tool is available and we can use it
+RAW_CXX_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/xgcc -shared-libgcc 
-B$$r/$(HOST_SUBDIR)/gcc -nostdinc++ -B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src 
-B$$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs 
-B$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs'
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5
+$as_echo "just compiled" >&6; }
+  elif expr "x$RAW_CXX_FOR_TARGET" : "x/" > /dev/null; then
+# We already found the complete path
+ac_dir=`dirname $RAW_CXX_FOR_TARGET`
+{ $as_echo "$as_me:${as_lineno-$LINENO}:

[PATCH] tailc: Improve tail recursion handling [PR119493]

2025-04-01 Thread Jakub Jelinek

Hi!

This is a partial step towards fixing that PR.
For musttail recursive calls which have non-is_gimple_reg_type typed
parameters, the only case we've handled was if the exact parameter
was passed through (perhaps modified, but still the same PARM_DECL).
That isn't necessary, we can copy the argument to the parameter as well
(just need to watch for the use of the parameter in later arguments,
say musttail recursive call which swaps 2 structure arguments).

The patch attempts to play safe and punts if any of the parameters are
addressable (like we do for all normal tail calls and tail recursions,
except for musttail in the posted unreviewed patch).

With this patch (at least when early inlining isn't done on not yet
optimized body) inlining should see already tail recursion optimized
body and will not have problems with SRA breaking musttail.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-01  Jakub Jelinek  

PR tree-optimization/119493
* tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion
if some arguments don't have is_gimple_reg_type, only punt if they
have non-POD types, or volatile, or addressable.  Set
tailr_arg_needs_copy in those cases too.
(eliminate_tail_call): Copy call arguments to params if they don't
have is_gimple_reg_type, use temporaries if the argument is used
later.
(tree_optimize_tail_calls_1): Skip !is_gimple_reg_type
tailr_arg_needs_copy parameters.  Formatting fix.

* gcc.dg/pr119493-1.c: New test.

--- gcc/tree-tailcall.cc.jj 2025-03-28 10:49:30.0 +0100
+++ gcc/tree-tailcall.cc2025-03-31 12:31:01.717603446 +0200
@@ -672,19 +672,16 @@ find_tail_calls (basic_block bb, struct
 have a copyable type and the two arguments must have reasonably
 equivalent types.  The latter requirement could be relaxed if
 we emitted a suitable type conversion statement.  */
- if (!is_gimple_reg_type (TREE_TYPE (param))
+ if (TREE_ADDRESSABLE (TREE_TYPE (param))
  || !useless_type_conversion_p (TREE_TYPE (param),
 TREE_TYPE (arg)))
break;
 
- /* The parameter should be a real operand, so that phi node
-created for it at the start of the function has the meaning
-of copying the value.  This test implies is_gimple_reg_type
-from the previous condition, however this one could be
-relaxed by being more careful with copying the new value
-of the parameter (emitting appropriate GIMPLE_ASSIGN and
-updating the virtual operands).  */
- if (!is_gimple_reg (param))
+ if (is_gimple_reg_type (TREE_TYPE (param))
+ ? !is_gimple_reg (param)
+ : (!is_gimple_variable (param)
+|| TREE_THIS_VOLATILE (param)
+|| may_be_aliased (param)))
break;
}
}
@@ -934,9 +931,9 @@ find_tail_calls (basic_block bb, struct
   param = DECL_CHAIN (param), idx++)
{
  tree ddef, arg = gimple_call_arg (call, idx);
- if (is_gimple_reg (param)
- && (ddef = ssa_default_def (cfun, param))
- && (arg != ddef))
+ if (!is_gimple_reg (param)
+ || ((ddef = ssa_default_def (cfun, param))
+ && arg != ddef))
bitmap_set_bit (tailr_arg_needs_copy, idx);
}
 }
@@ -1212,6 +1209,7 @@ eliminate_tail_call (struct tailcall *t,
 
   /* Add phi node entries for arguments.  The ordering of the phi nodes should
  be the same as the ordering of the arguments.  */
+  auto_vec copies;
   for (param = DECL_ARGUMENTS (current_function_decl),
 idx = 0, gpi = gsi_start_phis (first);
param;
@@ -1220,6 +1218,35 @@ eliminate_tail_call (struct tailcall *t,
   if (!bitmap_bit_p (tailr_arg_needs_copy, idx))
continue;
 
+  if (!is_gimple_reg_type (TREE_TYPE (param)))
+   {
+ if (param == gimple_call_arg (stmt, idx))
+   continue;
+ /* First check if param isn't used by any of the following
+call arguments.  If it is, we need to copy first to
+a temporary and only after doing all the assignments copy it
+to param.  */
+ size_t idx2 = idx + 1;
+ tree param2 = DECL_CHAIN (param);
+ for (; param2; param2 = DECL_CHAIN (param2), idx2++)
+   if (!is_gimple_reg_type (TREE_TYPE (param)))
+ {
+   tree base = get_base_address (gimple_call_arg (stmt, idx2));
+   if (base == param)
+ break;
+ }
+ tree tmp = param;
+ if (param2)
+   {
+ tmp = create_tmp_var (TREE_TYPE (param));
+ copies.safe_push (param);
+

[COMMITTED] Doc: Document enum with underlying type extension [PR117689]

2025-04-01 Thread Sandra Loosemore

This is a C23/C++11 feature that is supported as an extension with
earlier -std= options too, but was never previously documented.  It
interacts with the already-documented forward enum definition extension,
so I have merged discussion of the two extensions into the same section.

gcc/ChangeLog
PR c/117689
* doc/extend.texi (Incomplete Enums): Rename to
(Enum Extensions): This.  Document support for specifying the
underlying type of an enum as an extension in all earlier C
and C++ standards.  Document that a forward declaration with
underlying type is not an incomplete type, and which dialects
GCC supports that in.
---
 gcc/doc/extend.texi | 62 -
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a49b1cdd1e3..76fb210060d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12943,7 +12943,7 @@ C and/or C++ standards, while others remain specific to 
GNU C.
 * Typeof::  @code{typeof}: referring to the type of an expression.
 * Offsetof::Special syntax for @code{offsetof}.
 * Alignment::   Determining the alignment of a function, type or 
variable.
-* Incomplete Enums::@code{enum foo;}, with details to follow.
+* Enum Extensions:: Forward declarations and specifying the underlying 
type.
 * Variadic Macros:: Macros with a variable number of arguments.
 * Conditionals::Omitting the middle operand of a @samp{?:} expression.
 * Case Ranges:: `case 1 ... 9' and such.
@@ -13643,22 +13643,60 @@ If the operand of the @code{__alignof__} expression 
is a function,
 the expression evaluates to the alignment of the function which may
 be specified by attribute @code{aligned} (@pxref{Common Function Attributes}).
 
-@node Incomplete Enums
-@subsection Incomplete @code{enum} Types
+@node Enum Extensions
+@subsection Extensions to @code{enum} Type Declarations
+@anchor{Incomplete Enums}
+@cindex @code{enum} extensions
+@cindex base type of an @code{enum}
+@cindex underlying type of an @code{enum}
+@cindex forward declaration of an @code{enum}
+@cindex opaque @code{enum} types
+@cindex incomplete @code{enum} types
 
-You can define an @code{enum} tag without specifying its possible values.
-This results in an incomplete type, much like what you get if you write
-@code{struct foo} without describing the elements.  A later declaration
-that does specify the possible values completes the type.
+The C23 and C++11 standards added new syntax to specify the underlying
+type of an @code{enum} type.  For example,
 
+@smallexample
+enum pet : unsigned char @{ CAT, DOG, ROCK @};
+@end smallexample
+
+In GCC, this feature is supported as an extension in all older
+dialects of C and C++ as well.  For C++ dialects before C++11, use
+@option{-Wno-c++11-extensions} to silence the associated warnings.
+
+You can also forward-declare an @code{enum} type, without specifying
+its possible values.  The enumerators are supplied in a later
+redeclaration of the type, which must match the underlying type of the
+first declaration.
+
+@smallexample
+enum pet : unsigned char;
+static enum pet my_pet;
+...
+enum pet : unsigned char @{ CAT, DOG, ROCK @};
+@end smallexample
+
+Forward declaration of @code{enum} types with an explicit underlying
+type is also a feature of C++11 that is supported as an extension by
+GCC for all C dialects.  However, it's not available in C++ dialects
+prior to C++11.
+
+The C++ standard refers to a forward declaration of an @code{enum}
+with an explicit underlying type as an @dfn{opaque type}.  It is not
+considered an incomplete type, since its size is known.  That means
+you can declare variables or allocate storage using the type before
+the redeclaration, not just use pointers of that type.
+
+GCC has also traditionally supported forward declarations of
+@code{enum} types that don't include an explicit underlying type
+specification.  This results in an incomplete type, much like what you
+get if you write @code{struct foo} without describing the elements.
 You cannot allocate variables or storage using the type while it is
 incomplete.  However, you can work with pointers to that type.
 
-This extension may not be very useful, but it makes the handling of
-@code{enum} more consistent with the way @code{struct} and @code{union}
-are handled.
-
-This extension is not supported by GNU C++.
+Forward-declaring an incomplete enum type without an explicit
+underlying type is supported as an extension in all GNU C dialects,
+but is not supported at all in GNU C++.
 
 @node Variadic Macros
 @subsection Macros with a Variable Number of Arguments.
-- 
2.34.1

[PATCH] testsuite: Add support for GCOV_UNDER_TEST

2025-04-01 Thread Christophe Lyon

After commit r15-8947-g8ed2d5d219e999, which added new tests using
gcov, the CI noticed failures because it was calling 'gcov' instead of
$target-gcov.

This is because the CI scripts override GXX_UNDER_TEST, but still run
the testsuite in-tree, and gcc-transform-out-of-tree only depends on
TESTING_IN_BUILD_TREE but the definition of GCOV uses GCC_UNDER_TEST,
GXX_UNDER_TEST or GDC_UNDER_TEST (assuming their default values when
TESTING_IN_BUILD_TREE).

To handle such a case, this patch adds support for a new variable,
GCOV_UNDER_TEST, which overrides the current behavior if defined, and
has an effect similar to overriding GCC_UNDER_TEST etc...

Unfortunately, the change needs to be duplicated in several places,
which use either GCC_UNDER_TEST, GXX_UNDER_TEST or GDC_UNDER_TEST.

Tested g++.dg/gcov/gcov.exp and now g++.dg/gcov/gcov-22.C passes
(calling /bin/$target-gcov as instructed by the CI
scripts).  No change observed on gcc.misc-tests/gcov.exp, and I could
not test gdc.dg/gcov.exp and gnat.dg/gcov/gcov.exp.

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov.exp: Handle GCOV_UNDER_TEST.
* gcc.misc-tests/gcov.exp: Likewise.
* gdc.dg/gcov.exp: Likewise.
* gnat.dg/gcov/gcov.exp: Likewise.
---
 gcc/testsuite/g++.dg/gcov/gcov.exp| 15 +++
 gcc/testsuite/gcc.misc-tests/gcov.exp | 15 +++
 gcc/testsuite/gdc.dg/gcov.exp | 15 +++
 gcc/testsuite/gnat.dg/gcov/gcov.exp   | 15 +++
 4 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/g++.dg/gcov/gcov.exp 
b/gcc/testsuite/g++.dg/gcov/gcov.exp
index 50f60c4a011..d85bf125d16 100644
--- a/gcc/testsuite/g++.dg/gcov/gcov.exp
+++ b/gcc/testsuite/g++.dg/gcov/gcov.exp
@@ -21,12 +21,19 @@ load_lib g++-dg.exp
 load_lib gcov.exp
 
 global GXX_UNDER_TEST
+global GCOV_UNDER_TEST
 
-# Find gcov in the same directory as $GXX_UNDER_TEST.
-if { ![is_remote host] && [string match "*/*" [lindex $GXX_UNDER_TEST 0]] } {
-set GCOV [file dirname [lindex $GXX_UNDER_TEST 
0]]/[gcc-transform-out-of-tree gcov]
+# Find gcov in the same directory as $GXX_UNDER_TEST. under
+# GCOV_UNDER_TEST is defined.
+
+if ![info exists GCOV_UNDER_TEST] {
+if { ![is_remote host] && [string match "*/*" [lindex $GXX_UNDER_TEST 0]] 
} {
+   set GCOV [file dirname [lindex $GXX_UNDER_TEST 
0]]/[gcc-transform-out-of-tree gcov]
+} else {
+   set GCOV [gcc-transform-out-of-tree gcov]
+}
 } else {
-set GCOV [gcc-transform-out-of-tree gcov]
+set GCOV $GCOV_UNDER_TEST
 }
 
 # Initialize harness.
diff --git a/gcc/testsuite/gcc.misc-tests/gcov.exp 
b/gcc/testsuite/gcc.misc-tests/gcov.exp
index c8f20e1e70e..4278b625c0b 100644
--- a/gcc/testsuite/gcc.misc-tests/gcov.exp
+++ b/gcc/testsuite/gcc.misc-tests/gcov.exp
@@ -21,12 +21,19 @@ load_lib gcc-dg.exp
 load_lib gcov.exp
 
 global GCC_UNDER_TEST
+global GCOV_UNDER_TEST
 
-# For now find gcov in the same directory as $GCC_UNDER_TEST.
-if { ![is_remote host] && [string match "*/*" [lindex $GCC_UNDER_TEST 0]] } {
-set GCOV [file dirname [lindex $GCC_UNDER_TEST 
0]]/[gcc-transform-out-of-tree gcov]
+# For now find gcov in the same directory as $GCC_UNDER_TEST, unless
+# GCOV_UNDER_TEST is defined..
+
+if ![info exists GCOV_UNDER_TEST] {
+if { ![is_remote host] && [string match "*/*" [lindex $GCC_UNDER_TEST 0]] 
} {
+   set GCOV [file dirname [lindex $GCC_UNDER_TEST 
0]]/[gcc-transform-out-of-tree gcov]
+} else {
+   set GCOV [gcc-transform-out-of-tree gcov]
+}
 } else {
-set GCOV [gcc-transform-out-of-tree gcov]
+set GCOV $GCOV_UNDER_TEST
 }
 
 # Initialize harness.
diff --git a/gcc/testsuite/gdc.dg/gcov.exp b/gcc/testsuite/gdc.dg/gcov.exp
index a65dd34..3f0234e9f2f 100644
--- a/gcc/testsuite/gdc.dg/gcov.exp
+++ b/gcc/testsuite/gdc.dg/gcov.exp
@@ -21,12 +21,19 @@ load_lib gdc-dg.exp
 load_lib gcov.exp
 
 global GDC_UNDER_TEST
+global GCOV_UNDER_TEST
 
-# For now find gcov in the same directory as $GDC_UNDER_TEST.
-if { ![is_remote host] && [string match "*/*" [lindex $GDC_UNDER_TEST 0]] } {
-set GCOV [file dirname [lindex $GDC_UNDER_TEST 
0]]/[gcc-transform-out-of-tree gcov]
+# For now find gcov in the same directory as $GDC_UNDER_TEST, unless
+# GCOV_UNDER_TEST is defined.
+
+if ![info exists GCOV_UNDER_TEST] {
+if { ![is_remote host] && [string match "*/*" [lindex $GDC_UNDER_TEST 0]] 
} {
+   set GCOV [file dirname [lindex $GDC_UNDER_TEST 
0]]/[gcc-transform-out-of-tree gcov]
+} else {
+   set GCOV [gcc-transform-out-of-tree gcov]
+}
 } else {
-set GCOV [gcc-transform-out-of-tree gcov]
+set GCOV $GCOV_UNDER_TEST
 }
 
 # Initialize harness.
diff --git a/gcc/testsuite/gnat.dg/gcov/gcov.exp 
b/gcc/testsuite/gnat.dg/gcov/gcov.exp
index 4fa887d5bad..d5977a85e8a 100644
--- a/gcc/testsuite/gnat.dg/gcov/gcov.exp
+++ b/gcc/testsuite/gnat.dg/gcov/gcov.exp
@@ -21,12 +21,19 @@ load_lib gnat-dg.exp
 load_lib gcov.exp
 
 global GCC_UNDER_TEST
+global GCOV_UNDER_TEST
 
-# For now find

Re: [PATCH] simplify-rtx: Fix shortcut for vector eq/ne

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 01:16:32PM +0100, Richard Sandiford wrote:
> gcc/
>   * simplify-rtx.cc (simplify_const_relational_operation): Generalize
>   the constant checks in the fold-via-minus path to match the
>   INTEGRAL_MODE_P condition.

LGTM.

Jakub

[PATCH] OpenMP: Require target and/or targetsync init modifier [PR118965]

2025-04-01 Thread Sandra Loosemore

As noted in PR 118965, the initial interop implementation overlooked
the requirement in the OpenMP spec that at least one of the "target"
and "targetsync" modifiers is required in both the interop construct
init clause and the declare variant append_args clause.

Adding the check was fairly straightforward, but it broke about a
gazillion existing test cases.  In particular, things like "init (x, y)"
which were previously accepted (and tested for being accepted) aren't
supposed to be allowed by the spec, much less things like "init (target)"
where target was previously interpreted as a variable name instead of a
modifier.  Since one of the effects of the change is that at least one
modifier is always required, I found that deleting all the code that was
trying to detect and handle the no-modifier case allowed for better
diagnostics.

gcc/c/ChangeLog
PR middle-end/118965
* c-parser.cc (c_parser_omp_clause_init_modifiers): Adjust
error message.
(c_parser_omp_clause_init): Remove code for recognizing clauses
without modifiers.  Diagnose missing target/targetsync modifier.
(c_finish_omp_declare_variant): Diagnose missing target/targetsync
modifier.

gcc/cp/ChangeLog
PR middle-end/118965
* parser.cc (c_parser_omp_clause_init_modifiers): Adjust
error message.
(cp_parser_omp_clause_init): Remove code for recognizing clauses
without modifiers.  Diagnose missing target/targetsync modifier.
(cp_finish_omp_declare_variant): Diagnose missing target/targetsync
modifier.

gcc/gfortran/ChangeLog
PR middle-end/118965
* openmp.cc (gfc_parser_omp_clause_init_modifiers): Fix some
inconsistent code indentation.  Remove code for recognizing
clauses without modifiers.  Diagnose prefer_type without a
following '('.  Adjust error message for an unrecognized modifier.
Diagnose missing target/targetsync modifier.
(gfc_match_omp_init): Fix more inconsistent code indentation.

gcc/testsuite/ChangeLog
PR middle-end/118965
* c-c++-common/gomp/append-args-1.c: Add target/targetsync
modifiers so tests do what they were previously supposed to do.
Adjust expected output.
* c-c++-common/gomp/append-args-7.c: Likewise.
* c-c++-common/gomp/append-args-8.c: Likewise.
* c-c++-common/gomp/append-args-9.c: Likewise.
* c-c++-common/gomp/interop-1.c: Likewise.
* c-c++-common/gomp/interop-2.c: Likewise.
* c-c++-common/gomp/interop-3.c: Likewise.
* c-c++-common/gomp/interop-4.c: Likewise.
* c-c++-common/gomp/pr118965-1.c: New.
* c-c++-common/gomp/pr118965-2.c: New.
* g++.dg/gomp/append-args-1.C: Add target/targetsync modifiers
and adjust expected output.
* g++.dg/gomp/append-args-2.C: Likewise.
* g++.dg/gomp/append-args-6.C: Likewise.
* g++.dg/gomp/append-args-7.C: Likewise.
* g++.dg/gomp/append-args-8.C: Likewise.
* g++.dg/gomp/interop-5.C: Likewise.
* gfortran.dg/gomp/append_args-1.f90: Add target/targetsync
modifiers and adjust expected output.
* gfortran.dg/gomp/append_args-2.f90: Likewise.
* gfortran.dg/gomp/append_args-3.f90: Likewise.
* gfortran.dg/gomp/append_args-4.f90: Likewise.
* gfortran.dg/gomp/interop-1.f90: Likewise.
* gfortran.dg/gomp/interop-2.f90: Likewise.
* gfortran.dg/gomp/interop-3.f90: Likewise.
* gfortran.dg/gomp/interop-4.f90: Likewise.
* gfortran.dg/gomp/pr118965-1.f90: New.
* gfortran.dg/gomp/pr118965-2.f90: New.
---
 gcc/c/c-parser.cc | 44 ---
 gcc/cp/parser.cc  | 45 ---
 gcc/fortran/openmp.cc | 77 +-
 .../c-c++-common/gomp/append-args-1.c |  4 +-
 .../c-c++-common/gomp/append-args-7.c |  4 +-
 .../c-c++-common/gomp/append-args-8.c |  9 ++-
 .../c-c++-common/gomp/append-args-9.c |  7 +-
 gcc/testsuite/c-c++-common/gomp/interop-1.c   | 78 +--
 gcc/testsuite/c-c++-common/gomp/interop-2.c   | 62 ++-
 gcc/testsuite/c-c++-common/gomp/interop-3.c   | 26 +++
 gcc/testsuite/c-c++-common/gomp/interop-4.c   |  8 +-
 gcc/testsuite/c-c++-common/gomp/pr118965-1.c  | 57 ++
 gcc/testsuite/c-c++-common/gomp/pr118965-2.c  | 31 
 gcc/testsuite/g++.dg/gomp/append-args-1.C |  6 +-
 gcc/testsuite/g++.dg/gomp/append-args-2.C |  6 +-
 gcc/testsuite/g++.dg/gomp/append-args-6.C |  4 +-
 gcc/testsuite/g++.dg/gomp/append-args-7.C |  2 +-
 gcc/testsuite/g++.dg/gomp/append-args-8.C |  9 +--
 gcc/testsuite/g++.dg/gomp/interop-5.C |  8 +-
 .../gfortran.dg/gomp/append_args-1.f90|  8 +-
 .../gfortran.dg/gomp/append_args-2.f90| 24 +++---
 .../gfortran.dg/gomp/append_args-3.f90|  2 +-
 .../gfortran.dg/gomp/

Re: [PATCH v2 0/4] Libsanitizer improvements

2025-04-01 Thread Aleksandar Rakic

HTEC Proprietary

Hi,

Could you please provide an email address for upstream libsanitizer?
We are not sure where to send these patches.

Kind regards,
Aleksandar Rakic

From: Richard Biener 
Sent: Tuesday, April 1, 2025 2:50 PM
To: Aleksandar Rakic
Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic; c...@mips.com; ja...@redhat.com
Subject: Re: [PATCH v2 0/4] Libsanitizer improvements

CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you recognize the sender and know the content 
is safe.

On Mon, Mar 17, 2025 at 2:24 PM Aleksandar Rakic
 wrote:
>
> This patch series improves the libsanitizer for the mips target
> in GCC.

You should send these improvements to upstream libsanitizer instead.

> These patches are cherry-picked from the mips_rel/11_2_0/master
> and mips_rel/9_3_0/master branches from the MIPS' repository:
> https://github.com/MIPS/gcc .
> Further details on the individual changes are included in the
> respective patches.

Re: [PATCH v3] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-04-01 Thread Andrew Waterman

On Mon, Mar 31, 2025 at 11:34 PM Robin Dapp  wrote:
>
> >> Yeah...and I also don't like the magic "ceil(AVL / 2) ≤ vl ≤ VLMAX if
> >> AVL < (2 * VLMAX)" rule...
> >
> > +1, spec has some description about this but I am not sure if I really get 
> > the point.
> >
> > From Spec:
> >
> > "For  example,  this  permits  an  implementation  to  set  vl  =  ceil(AVL
> > /  2)  for  VLMAX  <  AVL  <  2*VLMAX  in  order  to  evenly
> > distribute work over the last two iterations of a stripmine loop. 
> > Requirement
> > 2 ensures that the  rst stripmine iteration of reduction
> > loops uses the largest vector length of all iterations, even in the case of
> > AVL < 2*VLMAX. This allows software to avoid needing to
> > explicitly  calculate  a  running  maximum  of  vector  lengths  observed
> > during  a  stripmined  loop.  Requirement  2  also  allows  an
> > implementation to set vl to VLMAX for VLMAX < AVL < 2*VLMAX"
>
> Yeah, that's very unfortunate.
>
> The rule is something like
>
>   if AVL >= 2 * VLMAX
> vl = vsetvl = min (AVL, VLMAX)
>
>   if VLMAX > AVL < 2 * VLMAX
> vl = vsetvl = "whatever" ;)

Note it's not quite "whatever" -- there is a constraint that vl be
monotonically nonincreasing, which in some cases is the only important
property.  No denying this is an annoyance, though.

>
>   if AVL <= VLMAX
> vl = vsetvl = min (AVL, VLMAX)
>
> The idea of load balancing is alright I guess but it really complicates 
> matters
> in the compiler.
>
> FWIW my plan for GCC 16 is to define a SELECT_VL_SANE (or any better name I 
> can
> come up with) that doesn't have this behavior and always only performs a
> minimum instead.  This will allow us to perform scalar evolution on vsetvl
> rather than giving up as we do right now.  Microarchitectures where vsetvl
> always behaves like a minimum would then enable the corresponding 
> expander/insn
> and others would fall back to the current behavior.
>
> --
> Regards
>  Robin
>

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Martin Uecker

Am Dienstag, dem 01.04.2025 um 18:58 + schrieb Qing Zhao:
> 
> > On Apr 1, 2025, at 11:28, Martin Uecker  wrote:
> > 
> > Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao:
> > > 
> > > > On Apr 1, 2025, at 10:04, Martin Uecker  wrote:
> > > > 
> > > > 
> > > > 
> > > > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling:
> > > > > > I'd like to offer up this to solve the issues we're facing. This is 
> > > > > > a
> > > > > > combination of everything that's been discussed here (or at least 
> > > > > > that
> > > > > > I've been able to read in the centi-thread :-).
> > > > 
> > > > Thanks! I think this proposal much better as it avoids undue burden
> > > > on parsers, but it does not address all my concerns.
> > > > 
> > > > 
> > > > From my side, the issue about compromising the scoping rules of C
> > > > is also about unintended non-local effects of code changes. In
> > > > my opinion, a change in a library header elsewhere should not cause 
> > > > code in a local scope (which itself might also come from a macro) to
> > > > emit a warning or require a programmer to add a workaround. So I am
> > > > not convinced that adding warnings or a workaround such as
> > > > __builtin_global_ref  is a good solution.
> > > > 
> > > > 
> > > > I could see the following as a possible way forward: We only 
> > > > allow the following two syntaxes:
> > > > 
> > > > 1. Single argument referring to a member.
> > > > 
> > > > __counted_by(len)
> > > > 
> > > > with an argument that must be a single identifier and where
> > > > the identifier then must refer to a struct member. 
> > > > 
> > > > (I still think this is not ideal and potentially
> > > > confusing, but in contrast to new scoping rules it is
> > > > at least relatively easily to explain as a special rule.).  
> > > > 
> 
> So, in allowed syntax 1, the identifier inside counted_by attribute will be 
> looked up inside 
> the structure. 
> 
> This is our current implementation of the counted_by for FAM and my previous 
> submitted
> patch for counted_by for Pointers inside structures. 
> 
> Keeping this syntax is good. 
> 
> > > > 
> > > > 2. Forward declarations. 
> > > > 
> > > > __counted_by(size_t len; len + PADDING)
> > > 
> > > In the above, the PADDING is some constant? 
> > 
> > In principle - when considering only the name lookup rules -
> > it could be a constant, a global variable, or an automatic
> > variable, i.e. any ordinary identifiers which is visible at
> > this point. 
> 
> I am a little confused here:
> Is this syntax 2 a new syntax, and with new name lookup rules other than the 
> syntax 1?

Yes. With the regular C name lookup rules other than syntax 1.

> 
> How should the identifiers inside counted_by attribute with this syntax be 
> looked up?
> Inside the structure first? Then if not found, looking up the outer scope for 
> identifiers in the
> PADDING part?

The identifier in the forward declaration ("len") will be looked
up in the structure and will be made available when parsing
the expression.  Any other identifiers (such as "PADDING")
will not be looked up in the structure.  So it is always
clear where each identifier is going to be looked up.

> Then, has a new scoping been introduced now?
> Or some other special looking up rules for counted_by attribute?
> 
> > 
> > > 
> > > More complicated expressions involving globals will not be supported?
> > 
> > I think one could allow such expressions, But I think the
> > expressions should be restricted to expressions which have
> > no side effects. 
> 
> See my question in above,  does this new syntax 2 introduce a new “structure 
> scope” to enable
> the identifiers to be looked up inside the structure first as syntax 1?  Or, 
> this new syntax has the 
> same lookup rule as the current C, will NOT look up inside the structure 
> first?

It will NOT look into the structure, except for the forward
declared identifier.


Martin

> 
> > 
> > > 
> > > > where then the second part can also be a more complicated 
> > > > expression, but with the explicit requirement that all
> > > > identifiers in this expression are then looked up according to
> > > > regular C language rules. So except for the forward declared
> > > > member(s) they are *never* looked up in the member namespace of
> > > > the struct, i.e. no new name lookup rules are introduced.
> > > 
> > > One question here:
> > > 
> > > What’s the major issue if we’d like to add one new scoping rule, for 
> > > example,
> > > “Structure scope” (the same as the C++’s instance scope) to C? 
> > > 
> > > (In addition to the "VLA in structure" issue I mentioned in my previous 
> > > writeup, 
> > > is there any other issue to prevent this new scoping rule being added 
> > > into C ?).
> > 
> > Note that the "VLA in structure" is a bit of a red herring.  The exact same
> > issues apply to lookup of any other ordinary identifiers in this context.
> > 
> > enum { MY_BUF_SIZE = 100 };
> > struct foo {
> >  char buf[M

Re: [PATCH v3] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-04-01 Thread Robin Dapp


Note it's not quite "whatever" -- there is a constraint that vl be
monotonically nonincreasing, which in some cases is the only important
property.  No denying this is an annoyance, though.


Yes,  I was hoping the smiley would convey that "whatever" was not to be taken 
literally.  In terms of SCEV analysis a minimum is much more helpful.


And given our experience with the vector spec so far I have no doubt that there 
is already a design in the wild that does something other than a minimum in 
that case.


--
Regards
Robin

Re: [PATCH v2 0/4] Libsanitizer improvements

2025-04-01 Thread Andrew Pinski

On Tue, Apr 1, 2025 at 12:02 PM Aleksandar Rakic
 wrote:
>
> HTEC Proprietary
>
> Hi,
>
> Could you please provide an email address for upstream libsanitizer?
> We are not sure where to send these patches.
It is mentioned in libsantizer/README.gcc :
```
Both tools consist of a compiler module and a run-time library.
The sources of the run-time library for these projects are hosted at
https://github.com/llvm/llvm-project in the following directories:
```

Thanks,
Andrew

>
> Kind regards,
> Aleksandar Rakic
>
> 
> From: Richard Biener 
> Sent: Tuesday, April 1, 2025 2:50 PM
> To: Aleksandar Rakic
> Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic; c...@mips.com; 
> ja...@redhat.com
> Subject: Re: [PATCH v2 0/4] Libsanitizer improvements
>
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> On Mon, Mar 17, 2025 at 2:24 PM Aleksandar Rakic
>  wrote:
> >
> > This patch series improves the libsanitizer for the mips target
> > in GCC.
>
> You should send these improvements to upstream libsanitizer instead.
>
> > These patches are cherry-picked from the mips_rel/11_2_0/master
> > and mips_rel/9_3_0/master branches from the MIPS' repository:
> > https://github.com/MIPS/gcc .
> > Further details on the individual changes are included in the
> > respective patches.

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Qing Zhao



> On Apr 1, 2025, at 15:25, Martin Uecker  wrote:
> 
> Am Dienstag, dem 01.04.2025 um 18:58 + schrieb Qing Zhao:
>> 
>>> On Apr 1, 2025, at 11:28, Martin Uecker  wrote:
>>> 
>>> Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao:
 
> On Apr 1, 2025, at 10:04, Martin Uecker  wrote:
> 
> 
> 
> Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling:
>>> I'd like to offer up this to solve the issues we're facing. This is a
>>> combination of everything that's been discussed here (or at least that
>>> I've been able to read in the centi-thread :-).
> 
> Thanks! I think this proposal much better as it avoids undue burden
> on parsers, but it does not address all my concerns.
> 
> 
> From my side, the issue about compromising the scoping rules of C
> is also about unintended non-local effects of code changes. In
> my opinion, a change in a library header elsewhere should not cause 
> code in a local scope (which itself might also come from a macro) to
> emit a warning or require a programmer to add a workaround. So I am
> not convinced that adding warnings or a workaround such as
> __builtin_global_ref  is a good solution.
> 
> 
> I could see the following as a possible way forward: We only 
> allow the following two syntaxes:
> 
> 1. Single argument referring to a member.
> 
> __counted_by(len)
> 
> with an argument that must be a single identifier and where
> the identifier then must refer to a struct member. 
> 
> (I still think this is not ideal and potentially
> confusing, but in contrast to new scoping rules it is
> at least relatively easily to explain as a special rule.).  
> 
>> 
>> So, in allowed syntax 1, the identifier inside counted_by attribute will be 
>> looked up inside 
>> the structure. 
>> 
>> This is our current implementation of the counted_by for FAM and my previous 
>> submitted
>> patch for counted_by for Pointers inside structures. 
>> 
>> Keeping this syntax is good. 
>> 
> 
> 2. Forward declarations. 
> 
> __counted_by(size_t len; len + PADDING)
 
 In the above, the PADDING is some constant? 
>>> 
>>> In principle - when considering only the name lookup rules -
>>> it could be a constant, a global variable, or an automatic
>>> variable, i.e. any ordinary identifiers which is visible at
>>> this point. 
>> 
>> I am a little confused here:
>> Is this syntax 2 a new syntax, and with new name lookup rules other than the 
>> syntax 1?
> 
> Yes. With the regular C name lookup rules other than syntax 1.
> 
>> 
>> How should the identifiers inside counted_by attribute with this syntax be 
>> looked up?
>> Inside the structure first? Then if not found, looking up the outer scope 
>> for identifiers in the
>> PADDING part?
> 
> The identifier in the forward declaration ("len") will be looked
> up in the structure and will be made available when parsing
> the expression.  Any other identifiers (such as "PADDING")
> will not be looked up in the structure.  So it is always
> clear where each identifier is going to be looked up.

Yeah, this sounds a good idea to me, and a nice compromise solution.  -:)

Then, if more than one members need to be in the expression, for example:

int number;

struct A {
   size_t count_1;
   size_t count_2;
   char *array __counted_by (size_t count_1; size_t count_2; count1 + count2 + 
number * 4)
}

i.e., all the members that will be in the counted_by expression should be 
declared first inside the 
counted_by, then all other variables in the expression could be looked up per 
the default scoping rule. 

Is the understanding correct?

Qing

> 
>> Then, has a new scoping been introduced now?
>> Or some other special looking up rules for counted_by attribute?
>> 
>>> 
 
 More complicated expressions involving globals will not be supported?
>>> 
>>> I think one could allow such expressions, But I think the
>>> expressions should be restricted to expressions which have
>>> no side effects. 
>> 
>> See my question in above,  does this new syntax 2 introduce a new “structure 
>> scope” to enable
>> the identifiers to be looked up inside the structure first as syntax 1?  Or, 
>> this new syntax has the 
>> same lookup rule as the current C, will NOT look up inside the structure 
>> first?
> 
> It will NOT look into the structure, except for the forward
> declared identifier.
> 
> 
> Martin
> 
>> 
>>> 
 
> where then the second part can also be a more complicated 
> expression, but with the explicit requirement that all
> identifiers in this expression are then looked up according to
> regular C language rules. So except for the forward declared
> member(s) they are *never* looked up in the member namespace of
> the struct, i.e. no new name lookup rules are introduced.
 
 One question here:
 
 What’s the major issue if we’d like to

Re: [PATCH] [testsuite] [riscv] limit vwaddsub-1.c to rv64

2025-04-01 Thread Jeff Law





On 4/1/25 8:03 AM, Kito Cheng wrote:

On Tue, Apr 1, 2025 at 12:47 PM Jeff Law  wrote:




On 3/31/25 7:03 PM, Alexandre Oliva wrote:

On Mar 31, 2025, Jeff Law  wrote:

I don't immediately see anything in this test or its history to
indicate it's only supposed to work for rv64.


It's the 64-bit integral argument rs1.

Right, but ISTM we ought to be able to handle a vector of 64bit integral
types, especially from the intrinsics interface.

Probably the thing to do is see what the intrinsics docs say.   Or maybe
Kito knows offhand.


Long story, fortunately we have some record on that:

https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/9
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/199
Thanks.  While they don't address this issue specifically, Craig's 
comments seem to indicate a desire to support the intrinsics in these 
scenarios and the compiler has to deal with the 64bit integer problems 
on rv32.


I don't think it changes anything for Alex, we still need to disable the 
test for rv32 as the expected sequence just isn't right in that case.


jeff

Re: [PATCH] aarch64: Deprecate -march= for the month of April

2025-04-01 Thread Richard Earnshaw (lists)

On 01/04/2025 09:42, Kyrylo Tkachov wrote:
> Hi all,
> 
> As we're starting a new month, introduce a more appropriate -mapril=
> to specify the compilation target instead.
> This helps keep GCC more up to date with the passage of time.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Signed-off-by: Kyrylo Tkachov 
> 
> gcc/
> 
>   * config/aarch64/aarch64.opt (mapril): Define.
>   * doc/invoke.texi (AArch64 Options): Document the above.
>   Deprecate -march=.
> 

Nice one.  But I think we'll need to wait until -may before committing this.

R.

Re: [PATCH] c++: Fix typo in RAW_DATA_CST build_list_conv subsubconv hanling [PR119563]

2025-04-01 Thread Marek Polacek

On Tue, Apr 01, 2025 at 04:31:38PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs (the embed one actually doesn't but
> dereferences random uninitialized pointer far after allocated memory)
> because of a typo.  In the RAW_DATA_CST handling of list conversion
> where there are conversions to something other than
> initializer_list<{{,un}signed ,}char>, the code now calls
> implicit_conversion for all the RAW_DATA_CST elements and stores them
> into subsubconvs array.
> The next loop (done in a separate loop because subsubconvs[0] is
> handled differently) attempts to do the
>   for (i = 0; i < len; ++i)
> {
>   conversion *sub = subconvs[i];
>   if (sub->rank > t->rank)
> t->rank = sub->rank;
>   if (sub->user_conv_p)
> t->user_conv_p = true;
>   if (sub->bad_p)
> t->bad_p = true;
> }
> rank/user_conv_p/bad_p merging, but I mistyped the index, the loop
> iterates with j iterator and i is subconvs index, so the loop effectively
> doesn't do anything interesting except for merging from one of the
> subsubconvs element, if lucky within the subsubconvs array, if unlucky
> not even from inside of the array.
> 
> The following patch fixes that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM.
 
> 2025-04-01  Andrew Pinski  
>   Jakub Jelinek  
> 
>   PR c++/119563
>   * call.cc (build_list_conv): Fix a typo in loop gathering
>   summary information from subsubconvs.
> 
>   * g++.dg/cpp0x/pr119563.C: New test.
>   * g++.dg/cpp/embed-26.C: New test.
> 
> --- gcc/cp/call.cc.jj 2025-03-06 11:08:20.657231238 +0100
> +++ gcc/cp/call.cc2025-04-01 11:25:14.030321320 +0200
> @@ -917,7 +917,7 @@ build_list_conv (tree type, tree ctor, i
> t->rank = cr_exact;
> for (j = 0; j < (unsigned) RAW_DATA_LENGTH (val); ++j)
>   {
> -   sub = subsubconvs[i];
> +   sub = subsubconvs[j];
> if (sub->rank > t->rank)
>   t->rank = sub->rank;
> if (sub->user_conv_p)
> --- gcc/testsuite/g++.dg/cpp0x/pr119563.C.jj  2025-04-01 11:14:24.702394749 
> +0200
> +++ gcc/testsuite/g++.dg/cpp0x/pr119563.C 2025-04-01 11:13:50.962866358 
> +0200
> @@ -0,0 +1,79 @@
> +// PR c++/119563
> +// { dg-do run { target c++11 } }
> +// { dg-options "-O2" }
> +
> +namespace std {
> +template 
> +struct initializer_list {
> +private:
> +  T *_M_array;
> +  decltype (sizeof 0) _M_len;
> +public:
> +  constexpr decltype (sizeof 0)
> +  size () const noexcept { return _M_len; }
> +  constexpr const T *
> +  begin () const noexcept { return _M_array; }
> +  constexpr const T *
> +  end () const noexcept { return begin () + size (); }
> +};
> +}
> +
> +struct A {} a;
> +
> +struct B {
> +  constexpr B (int x) : B (a, x) {}
> +  template 
> +  constexpr B (A, T... x) : b(x...) {}
> +  int b;
> +};
> +
> +struct C {
> +  C (std::initializer_list x)
> +  {
> +if (x.size () != 130 + 1024 + 130)
> +  __builtin_abort ();
> +unsigned int i = 1, j = 0;
> +for (auto a = x.begin (); a < x.end (); ++a)
> +  if (a->b != i)
> + __builtin_abort ();
> +  else
> + {
> +   if (j == 129 || j == 129 + 1024)
> + i = 0;
> +   i = (i & 15) + 1;
> +   ++j;
> + }
> +c = true;
> +  }
> +  bool c;
> +};
> +
> +#define D 1 + 0, 2 + 0, 3 + 0, 4 + 0, 5 + 0, 6 + 0, 7 + 0, 8 + 0, \
> +   9 + 0, 10 + 0, 11 + 0, 12 + 0, 13 + 0, 14 + 0, 15 + 0, 16 + 0
> +#define E D, D, D, D, D, D, D, D
> +
> +C c { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, E, E, E, E, E, E, E, E,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> +  1, 2 };
> +
> +int
> +main ()
> +{
> +  if (!c.c)
> +__builtin_abort ();
> +}
> --- gcc/testsuite/g++.dg/cpp/embed-26.C.jj2025-04-01 11:18:56.563595381 
> +0200
> +++ gcc/testsuite/g++.dg/cpp/embed-26.C   2025-04-01 11:18:35.444890450 
> +0200
> @@ -0,0 +1,63 @@
> +// PR c++/119563
> +// { dg-do run { target c++11 } }
> +// { dg-options "-O2" }
> +
> +namespace std {
> +template 
>

[PATCH] PR119482: Avoid mispredictions in bitmap_set_bit

2025-04-01 Thread Andi Kleen

From: Andi Kleen 

This isn't a regression, but it's a very simple patch with high
performance improvement, so perhaps suitable in the current stage.

---

bitmap_set_bit checks the original value of the bit to return it to the
caller and then only writes the new value back if it changes.
Most callers of bitmap_set_bit don't need the return value, but with the 
conditional store
the CPU still has to predict it correctly since gcc doesn't know how to do
that without APX on x86 (even though CMOV could do it with a dummy target).
Really if-conversion should handle this case, but for now we can fix
it.

This simple patch improves runtime by 15% for the test case in the PR.
Which is more than I expected given it only has ~1.44% of the cycles, but I 
guess
the mispredicts caused some down stream effects.

  ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet ran
1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 
pr119482.cc  -quiet

At least with this test case the total number of branches decreases
drastically. Even though the mispredict rate goes up slightly it is
still a big win.

$ perf stat -e 
branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ \
-a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc  -quiet -w

 Performance counter stats for 'system wide':

41,932,957,091  branches
   686,117,623  branch-misses#1.64% of all 
branches
 43,690.47 MiB  uncore_imc/cas_count_read/
 12,362.56 MiB  uncore_imc/cas_count_write/

  49.328633365 seconds time elapsed

$ perf stat -e 
branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ \
-a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet -w

 Performance counter stats for 'system wide':

37,092,113,179  branches
   663,641,708  branch-misses#1.79% of all 
branches
 43,196.52 MiB  uncore_imc/cas_count_read/
 12,369.33 MiB  uncore_imc/cas_count_write/

  42.632458350 seconds time elapsed

gcc/ChangeLog:

PR middle-end/119482
* bitmap.cc (bitmap_set_bit): Write back value unconditionally
---
 gcc/bitmap.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index f5a64b495ab3..7744f8f8c2e4 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit)
   if (ptr != 0)
 {
   bool res = (ptr->bits[word_num] & bit_val) == 0;
-  if (res)
-   ptr->bits[word_num] |= bit_val;
+  ptr->bits[word_num] |= bit_val;
   return res;
 }
 
-- 
2.49.0

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Bill Wendling

On Tue, Apr 1, 2025 at 5:41 AM Michael Matz  wrote:
>
> Hello,
>
> On Mon, 31 Mar 2025, Bill Wendling wrote:
>
> > 1. The use of '__self' isn't feasible, so we won't use it.
>
> That's a bold statement.  How's that?  The only thing I read here is that
> the very spelling of "self" was objected to.  So, call it _Self,
> _Selfref, or something.  Even _Whatever42 would be better.
>
> Lookup rule changes only for identifiers used in attributes (all of them?
> some of them?) is ... well, not good language design?  I think you used
> the word "overreaching" upthread, correctly so.  And that for wanting to
> avoid the established member-lookup syntax A.B by way of "we won't use
> it"?  That seems a terrible tradeoff.  I don't understand the push for
> that approach, at all.
>
It's not the name but the use of the identifier itself. If we can't
enforce using it in every situation it's no longer useful. This is
something that Apple (and the Clang community in general) is very
against doing (not the identifier, but requiring its use). I still
agree that adding a new scoping rule is overreach, but I'm trying to
compromise so that we don't have multiple compilers with competing
syntaxes. And as Yeoul pointed out in her RFC, we already have a
similar lookup scheme with the "guarded_by" attribute.

-bw

[COMMITTED 16/35] gccrs: Fix ICE when array elements are not a value

2025-04-01 Thread arthur . cohen

From: Philip Herron 

We need to check for error_mark_node when doing adjustments from coercion
sites otherwise we hit assetions as part of the coercion. That fixes the
ICE but the reason for the error_mark_node is because the array element
value.

Fixes Rust-GCC#3567

gcc/rust/ChangeLog:

* backend/rust-compile-expr.cc (CompileExpr::array_value_expr): add 
value chk for array expr

gcc/testsuite/ChangeLog:

* rust/compile/issue-3567.rs: New test.

Signed-off-by: Philip Herron 
---
 gcc/rust/backend/rust-compile-expr.cc| 11 +++
 gcc/testsuite/rust/compile/issue-3567.rs |  4 
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/rust/compile/issue-3567.rs

diff --git a/gcc/rust/backend/rust-compile-expr.cc 
b/gcc/rust/backend/rust-compile-expr.cc
index 21f4ca1ba18..e4ab9f010b8 100644
--- a/gcc/rust/backend/rust-compile-expr.cc
+++ b/gcc/rust/backend/rust-compile-expr.cc
@@ -1902,6 +1902,14 @@ CompileExpr::array_value_expr (location_t expr_locus,
   for (auto &elem : elems.get_values ())
 {
   tree translated_expr = CompileExpr::Compile (*elem, ctx);
+  if (translated_expr == error_mark_node)
+   {
+ rich_location r (line_table, expr_locus);
+ r.add_fixit_replace (elem->get_locus (), "not a value");
+ rust_error_at (r, ErrorCode::E0423, "expected value");
+ return error_mark_node;
+   }
+
   constructor.push_back (translated_expr);
   indexes.push_back (i++);
 }
@@ -2003,6 +2011,9 @@ HIRCompileBase::resolve_adjustements (
   tree e = expression;
   for (auto &adjustment : adjustments)
 {
+  if (e == error_mark_node)
+   return error_mark_node;
+
   switch (adjustment.get_type ())
{
case Resolver::Adjustment::AdjustmentType::ERROR:
diff --git a/gcc/testsuite/rust/compile/issue-3567.rs 
b/gcc/testsuite/rust/compile/issue-3567.rs
new file mode 100644
index 000..021d9c27838
--- /dev/null
+++ b/gcc/testsuite/rust/compile/issue-3567.rs
@@ -0,0 +1,4 @@
+fn main() {
+let _: &[i8] = &[i8];
+// { dg-error "expected value .E0423." "" { target *-*-* } .-1 }
+}
-- 
2.49.0

Patch ping [PATCH] tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]

2025-04-01 Thread Jakub Jelinek

Hi!

On Tue, Mar 25, 2025 at 08:34:10AM +0100, Jakub Jelinek wrote:
> As discussed here and in bugzilla, [[clang::musttail]] attribute in clang
> not just strongly asks for tail call or error, but changes behavior.
> To quote:
> https://clang.llvm.org/docs/AttributeReference.html#musttail
> "The lifetimes of all local variables and function parameters end immediately
> before the call to the function.  This means that it is undefined behaviour
> to pass a pointer or reference to a local variable to the called function,
> which is not the case without the attribute.  Clang will emit a warning in
> common cases where this happens."
> 
> The GCC behavior was just to error if we can't prove the musttail callee
> could not have dereferenced escaped pointers to local vars or parameters
> of the caller.  That is still the case for variables with non-trivial
> destruction (even in clang), like vars with C++ non-trivial destructors or
> variables with cleanup attribute.
> 
> The following patch changes the behavior to match that of clang, for all of
> [[clang::musttail]], [[gnu::musttail]] and __attribute__((musttail)).
> 
> clang 20 actually added warning for some cases of it in
> https://github.com/llvm/llvm-project/pull/109255
> but it is under -Wreturn-stack-address warning.
> 
> Now, gcc doesn't have that warning, but -Wreturn-local-addr instead, and
> IMHO it is better to have this under new warnings, because this isn't about
> returning local address, but about passing it to a musttail call, or maybe
> escaping to a musttail call.  And perhaps users will appreciate they can
> control it separately as well.
> 
> The patch introduces 2 new warnings.
> -Wmusttail-local-addr
> which is turn on by default and warns for the always dumb cases of passing
> an address of a local variable or parameter to musttail call's argument.
> And then
> -Wmaybe-musttail-local-addr
> which is only diagnosed if -Wmusttail-local-addr was not diagnosed and
> diagnoses at most one (so that we don't emit 100s of warnings for one call
> if 100s of vars can escape) case where an address of a local var could have
> escaped to the musttail call.  This is less severe, the code doesn't have
> to be obviously wrong, so the warning is only enabled in -Wextra.
> 
> And I've adjusted also the documentation for this change and addition of
> new warnings.

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679182.html
patch.
I know it is quite controversial and if clang wouldn't be the first
to implement this I'd certainly not go that way; I am willing to change
the warning option names or move the maybe one from -Wextra to -Wall
if there is agreement on that.  Unfortunately there seems to be
quite a lot of code in the wild that uses this attribute already
and without this patch GCC 15 will simply not be able to compile that
(whether it is firefox (skia etc.), protobuf, ...).

The only other option is IMHO to drop the musttail attribute support
for GCC 15 or name it differently with different semantics.
But not sure projects in the wild will like to annotate their calls
with two different musttail like attributes if they satisfy behavior
of both.

Jakub

Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]

2025-04-01 Thread Jeff Law





On 4/1/25 12:15 PM, Vineet Gupta wrote:

On 3/31/25 23:48, Heinrich Schuchardt wrote:

On 3/30/25 01:49, Vineet Gupta wrote:

changes since v2
   - dump log sanfu

---
vsetvl phase4 uses LCM guided info to insert VSETVL insns.
It has an additional loop to insert missing vsetvls on certain edges.
Currently it asserts/aborts on encountering EDGE_ABNORMAL.
When enabling go frontend with V enabled, libgo build hits the assert.

It seems to be safe to just skip the abnormal edge.


Hello Vineet,

Is there a test case where only following an abnormal edge between code
blocks would require to call VSETVL?

In the sketched code below following the exception would require VSETVL
to be called while the edge from the try block to the finally block
would not require this.

try {
for (..) {
uint16_t vectorizable math
if (condition)
throw exception
uint16_t vectorizable math
}
for (..) {
uint8_t vectorizable math
}
} catch exception {
} finally
for (..) {
uint8_t vectorizable math
}
}


Yeah we are going to run testsuite with -fnon-call-exceptions to find such 
cases.

But I'd argue, there is no need to optimize vsetvl for such esoteric cases (vs.
code complexity trade-off).
After all we'd just endup with an extra VSETVL in the finally block.
I'd look at that skeleton code as a way to potentially trip this issue 
without needing golang.


Jeff

[PATCH, V6] PR target/118541 - Do not generate unordered fp cmoves for IEEE compares on PowerPC

2025-04-01 Thread Michael Meissner

 Bernhard Reutner-Fischer suggested some typos to the patch for 118551.  Here
 is the changed patch.

In bug PR target/118541 on power9, power10, and power11 systems, for the
function:

extern double __ieee754_acos (double);

double
__acospi (double x)
{
  double ret = __ieee754_acos (x) / 3.14;
  return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
}

GCC currently generates the following code:

Power9  Power10 and Power11
==  ===
bl __ieee754_acos   bl __ieee754_acos@notoc
nop plfd 0,.LC0@pcrel
addis 9,2,.LC2@toc@ha   xxspltidp 12,1065353216
addi 1,1,32 addi 1,1,32
lfd 0,.LC2@toc@l(9) ld 0,16(1)
addis 9,2,.LC0@toc@ha   fdiv 0,1,0
ld 0,16(1)  mtlr 0
lfd 12,.LC0@toc@l(9)xscmpgtdp 1,0,12
fdiv 0,1,0  xxsel 1,0,12,1
mtlr 0  blr
xscmpgtdp 1,0,12
xxsel 1,0,12,1
blr

This is because ifcvt.c optimizes the conditional floating point move to use the
XSCMPGTDP instruction.

However, the XSCMPGTDP instruction will generate an interrupt if one of the
arguments is a signalling NaN and signalling NaNs can generate an interrupt.
The IEEE comparison functions (isgreater, etc.) require that the comparison not
raise an interrupt.

The following patch changes the PowerPC back end so that ifcvt.c will not change
the if/then test and move into a conditional move if the comparison is one of
the comparisons that do not raise an error with signalling NaNs and -Ofast is
not used.  If a normal comparison is used or -Ofast is used, GCC will continue
to generate XSCMPGTDP and XXSEL.

For the following code:

double
ordered_compare (double a, double b, double c, double d)
{
  return __builtin_isgreater (a, b) ? c : d;
}

/* Verify normal > does generate xscmpgtdp.  */

double
normal_compare (double a, double b, double c, double d)
{
  return a > b ? c : d;
}

with the following patch, GCC generates the following for power9, power10, and
power11:

ordered_compare:
fcmpu 0,1,2
fmr 1,4
bnglr 0
fmr 1,3
blr

normal_compare:
xscmpgtdp 1,1,2
xxsel 1,4,3,1
blr

I have built bootstrap compilers on big endian power9 systems and little endian
power9/power10 systems and there were no regressions.  Can I check this patch
into the GCC trunk, and after a waiting period, can I check this into the active
older branches?

2025-04-01  Michael Meissner  

gcc/

PR target/118541
* config/rs6000/predicates.md (invert_fpmask_comparison_operator): Do
not allow UNLT and UNLE unless -ffast-math.
* config/rs6000/rs6000-protos.h (enum rev_cond_ordered): New 
enumeration.
(rs6000_reverse_condition): Add argument.
* config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
ordered comparisons to be reversed for floating point conditional moves,
but allow ordered comparisons to be reversed on jumps.
(rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
* config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
* config/rs6000/rs6000.md (reverse_branch_comparison): Name insn.
Adjust rs6000_reverse_condition calls.

gcc/testsuite/

PR target/118541
* gcc.target/powerpc/pr118541-1.c: New test.
* gcc.target/powerpc/pr118541-2.c: Likewise.
* gcc.target/powerpc/pr118541-3.c: Likewise.
* gcc.target/powerpc/pr118541-4.c: Likewise.
---
 gcc/config/rs6000/predicates.md   | 10 +++-
 gcc/config/rs6000/rs6000-protos.h | 17 ++-
 gcc/config/rs6000/rs6000.cc   | 46 ++-
 gcc/config/rs6000/rs6000.h| 10 +++-
 gcc/config/rs6000/rs6000.md   | 25 ++
 gcc/testsuite/gcc.target/powerpc/pr118541-1.c | 28 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-2.c | 26 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-3.c | 26 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-4.c | 26 +++
 9 files changed, 190 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr118541-4.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 647e89afb6a..ba8df6a7979 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1466,8 +146

[COMMITTED] Doc: Document _Bool type as C90 extension [PR118118]

2025-04-01 Thread Sandra Loosemore

gcc/ChangeLog
PR c/118118
* doc/extend.texi (Boolean Type): New section.
---
 gcc/doc/extend.texi | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 76fb210060d..d2bf6048be7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12944,6 +12944,7 @@ C and/or C++ standards, while others remain specific to 
GNU C.
 * Offsetof::Special syntax for @code{offsetof}.
 * Alignment::   Determining the alignment of a function, type or 
variable.
 * Enum Extensions:: Forward declarations and specifying the underlying 
type.
+* Boolean Type::   Support for the @code{_Bool} keyword.
 * Variadic Macros:: Macros with a variable number of arguments.
 * Conditionals::Omitting the middle operand of a @samp{?:} expression.
 * Case Ranges:: `case 1 ... 9' and such.
@@ -13698,6 +13699,17 @@ Forward-declaring an incomplete enum type without an 
explicit
 underlying type is supported as an extension in all GNU C dialects,
 but is not supported at all in GNU C++.
 
+@node Boolean Type
+@subsection Support for the @code{_Bool} Type
+@cindex boolean type
+@cindex @code{_Bool} keyword
+
+The C99 standard added @code{_Bool} as a C language keyword naming the
+boolean type.  As an extension, GNU C also recognizes @code{_Bool} in
+C90 mode as well as with @option{-std=c99} and later.
+
+GNU C++ does not support the @code{_Bool} keyword.
+
 @node Variadic Macros
 @subsection Macros with a Variable Number of Arguments.
 @cindex variable number of arguments
-- 
2.34.1

[COMMITTED] Doc: Cross-reference constructor and init_priority attributes [PR118982]

2025-04-01 Thread Sandra Loosemore

Per the issue, the discussion of these two attributes needed to be
better integrated.  I also did some editing for style and readability,
and clarified that almost all targets support this feature (it is
enabled by default unless the back end disables it), not just "some".

Co-Authored_by: Jonathan Wakely 

gcc/ChangeLog
PR c++/118982
* doc/extend.texi (Common Function Attributes): For the
constructor/destructory attribute, be more explicit about the
relationship between the constructor attribute and
the C++ init_priority attribute, and add a cross-reference.
Also document that most targets support this.
(C++ Attributes): Similarly for the init_priority attribute.
---
 gcc/doc/extend.texi | 49 -
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d2bf6048be7..48c3aeb1318 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2035,24 +2035,32 @@ called.  Functions with these attributes are useful for
 initializing data that is used implicitly during the execution of
 the program.
 
-On some targets the attributes also accept an integer argument to
+On most targets the attributes also accept an integer argument to
 specify a priority to control the order in which constructor and
-destructor functions are run.  A constructor
-with a smaller priority number runs before a constructor with a larger
-priority number; the opposite relationship holds for destructors.  Note
-that priorities 0-100 are reserved.  So, if you have a constructor that
+destructor functions are run.  The @var{priority} argument is a
+constant integral expression bounded between 101 and 65535 inclusive;
+priorities 0-100 are reserved for use by the compiler and its runtime
+libraries.
+A constructor with a smaller priority number runs before a constructor with
+a larger priority number; the opposite relationship holds for destructors.
+So, if you have a constructor that
 allocates a resource and a destructor that deallocates the same
-resource, both functions typically have the same priority.  The
-priorities for constructor and destructor functions are the same as
-those specified for namespace-scope C++ objects (@pxref{C++ Attributes}).
-However, at present, the order in which constructors for C++ objects
-with static storage duration and functions decorated with attribute
-@code{constructor} are invoked is unspecified. In mixed declarations,
-attribute @code{init_priority} can be used to impose a specific ordering.
+resource, both functions typically have the same priority.
 
-Using the argument forms of the @code{constructor} and @code{destructor}
-attributes on targets where the feature is not supported is rejected with
-an error.
+The order in which constructors for C++ objects with static storage
+duration are invoked relative to functions decorated with attribute
+@code{constructor} is normally unspecified.  You can use
+attribute @code{init_priority} (@pxref{C++ Attributes}) on the
+declarations of namespace-scope C++ objects to impose a specific
+ordering; the @var{priority} for the @code{init_priority} attribute
+has the same effect as the @var{priority} for the @code{constructor}
+attribute.
+
+Using the argument form of the @code{constructor} and
+@code{destructor} attributes on targets where the feature is not
+supported is rejected with an error.  Only a few targets (typically
+those not using ELF object format, or the GNU linker) reject this
+usage.
 
 @cindex @code{copy} function attribute
 @item copy
@@ -30291,11 +30299,11 @@ variable or function or moving it into a tagged 
inline namespace.
 In Standard C++, objects defined at namespace scope are guaranteed to be
 initialized in an order in strict accordance with that of their definitions
 @emph{in a given translation unit}.  No guarantee is made for initializations
-across translation units.  However, GNU C++ allows users to control the
+across translation units.  However, GNU C++ allows you to control the
 order of initialization of objects defined at namespace scope with the
 @code{init_priority} attribute by specifying a relative @var{priority},
-a constant integral expression currently bounded between 101 and 65535
-inclusive.  Lower numbers indicate a higher priority.
+with the same meaning as for the @code{constructor} attribute
+(@pxref{Common Function Attributes}).
 
 In the following example, @code{A} would normally be created before
 @code{B}, but the @code{init_priority} attribute reverses that order:
@@ -30309,6 +30317,11 @@ Some_Class  B  __attribute__ ((init_priority (543)));
 Note that the particular values of @var{priority} do not matter; only their
 relative ordering.
 
+As with the @var{priority} argument to the @code{constructor} and
+@code{destructor} attributes, a few targets do not support the
+@code{init_priority} attribute.  In that case the attribute is rejected
+with an error rat

Re: Patch ping [PATCH] tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]

2025-04-01 Thread Andi Kleen

> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679182.html
> patch.
> I know it is quite controversial and if clang wouldn't be the first
> to implement this I'd certainly not go that way; I am willing to change
> the warning option names or move the maybe one from -Wextra to -Wall
> if there is agreement on that.  Unfortunately there seems to be
> quite a lot of code in the wild that uses this attribute already
> and without this patch GCC 15 will simply not be able to compile that
> (whether it is firefox (skia etc.), protobuf, ...).

FWIW the patch looks good to me.
> 
> The only other option is IMHO to drop the musttail attribute support
> for GCC 15 or name it differently with different semantics.
> But not sure projects in the wild will like to annotate their calls
> with two different musttail like attributes if they satisfy behavior
> of both.

Projects that are serious about the performance will also need
to use __attribute__((no_callee_saved_registers)) and that is different
than the clang equivalent. So some ifdefery is needed anyways.

-Andi

[COMMITTED 09/35] gccrs: nr2.0: Fix test const_generics_3.rs

2025-04-01 Thread arthur . cohen

From: Owen Avery 

gcc/testsuite/ChangeLog:

* rust/compile/const_generics_3.rs: Modify test to run with name
resolution 2.0 only and to handle the absence of a bogus
resolution error.
* rust/compile/nr2/exclude: Remove const_generics_3.rs.

Signed-off-by: Owen Avery 
---
 gcc/testsuite/rust/compile/const_generics_3.rs | 8 +---
 gcc/testsuite/rust/compile/nr2/exclude | 1 -
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/rust/compile/const_generics_3.rs 
b/gcc/testsuite/rust/compile/const_generics_3.rs
index e4e9008c4a6..524d48d5bcf 100644
--- a/gcc/testsuite/rust/compile/const_generics_3.rs
+++ b/gcc/testsuite/rust/compile/const_generics_3.rs
@@ -1,10 +1,12 @@
-// { dg-additional-options "-w" }
+// { dg-additional-options "-w -frust-name-resolution-2.0" }
+
+#[lang = "sized"]
+trait Sized {}
 
 const M: usize = 4;
 
 struct Foo {
-// FIXME: This error is bogus. But having it means parsing is valid!
-value: [i32; N], // { dg-error "cannot find value .N. in this scope" }
+value: [T; N],
 }
 
 fn main() {
diff --git a/gcc/testsuite/rust/compile/nr2/exclude 
b/gcc/testsuite/rust/compile/nr2/exclude
index 45e90a4df93..71c2b682bf0 100644
--- a/gcc/testsuite/rust/compile/nr2/exclude
+++ b/gcc/testsuite/rust/compile/nr2/exclude
@@ -1,6 +1,5 @@
 canonical_paths1.rs
 cfg1.rs
-const_generics_3.rs
 generics9.rs
 issue-2043.rs
 issue-2330.rs
-- 
2.49.0

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Qing Zhao



> On Apr 1, 2025, at 11:28, Martin Uecker  wrote:
> 
> Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao:
>> 
>>> On Apr 1, 2025, at 10:04, Martin Uecker  wrote:
>>> 
>>> 
>>> 
>>> Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling:
> I'd like to offer up this to solve the issues we're facing. This is a
> combination of everything that's been discussed here (or at least that
> I've been able to read in the centi-thread :-).
>>> 
>>> Thanks! I think this proposal much better as it avoids undue burden
>>> on parsers, but it does not address all my concerns.
>>> 
>>> 
>>> From my side, the issue about compromising the scoping rules of C
>>> is also about unintended non-local effects of code changes. In
>>> my opinion, a change in a library header elsewhere should not cause 
>>> code in a local scope (which itself might also come from a macro) to
>>> emit a warning or require a programmer to add a workaround. So I am
>>> not convinced that adding warnings or a workaround such as
>>> __builtin_global_ref  is a good solution.
>>> 
>>> 
>>> I could see the following as a possible way forward: We only 
>>> allow the following two syntaxes:
>>> 
>>> 1. Single argument referring to a member.
>>> 
>>> __counted_by(len)
>>> 
>>> with an argument that must be a single identifier and where
>>> the identifier then must refer to a struct member. 
>>> 
>>> (I still think this is not ideal and potentially
>>> confusing, but in contrast to new scoping rules it is
>>> at least relatively easily to explain as a special rule.).  
>>> 

So, in allowed syntax 1, the identifier inside counted_by attribute will be 
looked up inside 
the structure. 

This is our current implementation of the counted_by for FAM and my previous 
submitted
patch for counted_by for Pointers inside structures. 

Keeping this syntax is good. 

>>> 
>>> 2. Forward declarations. 
>>> 
>>> __counted_by(size_t len; len + PADDING)
>> 
>> In the above, the PADDING is some constant? 
> 
> In principle - when considering only the name lookup rules -
> it could be a constant, a global variable, or an automatic
> variable, i.e. any ordinary identifiers which is visible at
> this point. 

I am a little confused here:
Is this syntax 2 a new syntax, and with new name lookup rules other than the 
syntax 1?

How should the identifiers inside counted_by attribute with this syntax be 
looked up?
Inside the structure first? Then if not found, looking up the outer scope for 
identifiers in the
PADDING part?

Then, has a new scoping been introduced now?
Or some other special looking up rules for counted_by attribute?

> 
>> 
>> More complicated expressions involving globals will not be supported?
> 
> I think one could allow such expressions, But I think the
> expressions should be restricted to expressions which have
> no side effects.

See my question in above,  does this new syntax 2 introduce a new “structure 
scope” to enable
the identifiers to be looked up inside the structure first as syntax 1?  Or, 
this new syntax has the 
same lookup rule as the current C, will NOT look up inside the structure first?

> 
>> 
>>> where then the second part can also be a more complicated 
>>> expression, but with the explicit requirement that all
>>> identifiers in this expression are then looked up according to
>>> regular C language rules. So except for the forward declared
>>> member(s) they are *never* looked up in the member namespace of
>>> the struct, i.e. no new name lookup rules are introduced.
>> 
>> One question here:
>> 
>> What’s the major issue if we’d like to add one new scoping rule, for example,
>> “Structure scope” (the same as the C++’s instance scope) to C? 
>> 
>> (In addition to the "VLA in structure" issue I mentioned in my previous 
>> writeup, 
>> is there any other issue to prevent this new scoping rule being added into C 
>> ?).
> 
> Note that the "VLA in structure" is a bit of a red herring.  The exact same
> issues apply to lookup of any other ordinary identifiers in this context.
> 
> enum { MY_BUF_SIZE = 100 };
> struct foo {
>  char buf[MY_BUF_SIZE];
> };
> 
Yes, this is because there is NO “structure scope” available in C. As long as 
the “structure scope”
is added into C, identifiers could be looked up inside the “structure scope” 
first before looking up
outer scopes. 

> 
> C++ has instance scope for member functions. The rules for C++ are also
> complex and not very consistent (see the examples I posted earlier,
> demonstrating UB and compiler divergence). 

Yes, I studied those C++ examples when I wrote the proposal. And my observation
was: in C++, the instance scope always has higher priority than local and 
global scopes. 
i.e, when there is a conflict between instance scope and local/global scope for 
the identifier,
The identifier within the instance scope will shadow the one with the same name 
in the 
outer scope. 

But in C, there is No concept of “structure scope” at all. Identifiers will NOT 
looked u

Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 05:13:46PM -0700, Bill Wendling wrote:
> On Tue, Apr 1, 2025 at 8:29 AM Martin Uecker  wrote:
> > Am Dienstag, dem 01.04.2025 um 15:01 + schrieb Qing Zhao:
> > > > On Apr 1, 2025, at 10:04, Martin Uecker  wrote:
> > > > Am Montag, dem 31.03.2025 um 13:59 -0700 schrieb Bill Wendling:
> > > > > > I'd like to offer up this to solve the issues we're facing. This is 
> > > > > > a
> > > > > > combination of everything that's been discussed here (or at least 
> > > > > > that
> > > > > > I've been able to read in the centi-thread :-).
> > > >
> > > > Thanks! I think this proposal much better as it avoids undue burden
> > > > on parsers, but it does not address all my concerns.
> > > >
> > > >
> > > > From my side, the issue about compromising the scoping rules of C
> > > > is also about unintended non-local effects of code changes. In
> > > > my opinion, a change in a library header elsewhere should not cause
> > > > code in a local scope (which itself might also come from a macro) to
> > > > emit a warning or require a programmer to add a workaround. So I am
> > > > not convinced that adding warnings or a workaround such as
> > > > __builtin_global_ref  is a good solution.
> 
> To clarify, I'm not in favor of adding a generalized new scoping rule
> to C, but only for identifiers in attributes. From what I've seen in

There are existing attributes which support general expressions in their
arguments, e.g. the assume attribute, OpenMP attributes etc.  Those definitely
shouldn't change behavior if some new behavior for identifier lookup in 
attributes
is added.
Many others do support identifiers as their arguments (e.g. mode attribute,
2 argument malloc attribute, ...).  Those can't change behavior either.

I think using either a predefined identifier or self-declared one is the
only reasonable way to go (whether it represents something like this pointer
in C++ or in the latter case possibly forward declaration of the member), 
otherwise
it will be hard to understand for users and very much error prone.

Jakub

Re: [PATCH] RISC-V: xtheadmemidx: Split slli.uw pattern [combine question]

2025-04-01 Thread Jeff Law




Segher -- there's a combine question near the end...


On 3/23/25 8:43 PM, Bohan Lei wrote:

The combine pass can generate an index like (and:DI (mult:DI (reg:DI)
(const_int scale)) (const_int mask)) when XTheadMemIdx is available.
LRA may pull it out, and thus a splitter is needed when Zba is not
available.

A similar splitter were introduced when XTheadMemIdx support was added,
but removed in commit 31c3c5d.  The new splitter in this new patch is
based on the removed one.

gcc/ChangeLog:

 * config/riscv/thead.md (*th_memidx_operand): New splitter.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/xtheadmemidx-bug.c: New test.
---
  gcc/config/riscv/thead.md | 21 +++
  .../gcc.target/riscv/xtheadmemidx-bug.c   | 13 
  2 files changed, 34 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmemidx-bug.c

diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index d816f3b86dd..20e82e68df2 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -458,6 +458,27 @@ (define_insn "*th_mempair_load_zero_extendsidi2"
  
  ;; XTheadMemIdx
  
+;; Help reload to add a displacement for the base register.

+;; In the case `zext(*(uN*))(base+((rN<<1)&0x1fffe))` LRA splits
+;; off two new instructions: a) `new_base = base + disp`, and
+;; b) `index = (rN<<1)&0x1fffe`.  The index calculation has no
+;; corresponding instruction pattern and needs this insn_and_split
+;; to recover.
+
+(define_insn_and_split "*th_memidx_operand"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+  (match_operand:QI 2 "imm123_operand" "Ds3"))
+   (match_operand 3 "const_int_operand" "n")))]
+  "TARGET_64BIT && TARGET_XTHEADMEMIDX && (lra_in_progress || reload_completed)
+   && (INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x"

So this is a nasty little problem and touches on a deeper issue.

The core problem is that combine doesn't know anything about 
constraints.  It works with predicates and conditions.So combine has 
no idea if it's creating an ASM with operands that can't be handled.  As 
a result combine has to be careful with ASMs.


In can_combine_p we have:



 /* Can't merge an ASM_OPERANDS.  */
  || GET_CODE (src) == ASM_OPERANDS


Which is part of a large conditional indicating when we can't combine 
two insns together.  So I would have expected it to reject this insn for 
combining:



(insn 12 11 0 2 (set (mem/f:DI (reg/v/f:DI 138 [ e ]) [2 *e_6+0 S8 A64])
(asm_operands:DI ("") ("=A") 0 [
(mem/f:DI (reg/v/f:DI 138 [ e ]) [2 *e_6+0 S8 A64])
]
 [
(asm_input:DI ("A") j.c:8)
]
 [] j.c:8)) "j.c":8:3 -1
 (expr_list:REG_DEAD (reg/v/f:DI 138 [ e ])
(nil)))


So the natural question is why did insn 12 participate in combination at 
all:



Trying 11 -> 12:
   11: r138:DI=r145:DI+r144:DI
  REG_DEAD r145:DI
  REG_DEAD r144:DI
   12: [r138:DI]=asm_operands
  REG_DEAD r138:DI
Failed to match this instruction:
(set (mem/f:DI (plus:DI (reg/f:DI 145 [ b ])
(reg:DI 144 [ _4 ])) [2 *e_6+0 S8 A64])
(asm_operands:DI ("") ("=A") 0 [
(mem/f:DI (plus:DI (reg/f:DI 145 [ b ])
(reg:DI 144 [ _4 ])) [2 *e_6+0 S8 A64])
]
 [
(asm_input:DI ("A") j.c:8)
]
 [] j.c:8))
allowing combination of insns 11 and 12
original costs 4 + 36 = 40
replacement cost 36
deferring deletion of insn with uid = 11.
modifying insn i312: [r145:DI+r144:DI]=asm_operands
  REG_DEAD r144:DI
  REG_DEAD r145:DI
deferring rescan insn with uid = 12.


So without a deep dive, my question is should this have been rejected 
for combination before we even got here?  I just don't see how combine 
can do anything with asms other than replacing one pseudo with another 
since combine doesn't have any notion of constraints.


Segher, comments?

jeff

[PATCH] LoongArch: Make gen-evolution.awk compatible with FreeBSD awk

2025-04-01 Thread Xi Ruoyao

Avoid using gensub that FreeBSD awk lacks, use gsub and split those each
of gawk, mawk, and FreeBSD awk provides.

Reported-by: mp...@vip.163.com
Link: https://man.freebsd.org/cgi/man.cgi?query=awk

gcc/ChangeLog:

* config/loongarch/genopts/gen-evolution.awk: Avoid using gensub
that FreeBSD awk lacks.
---

Manually tested the script with gawk and FreeBSD awk.  Ok for trunk?

 gcc/config/loongarch/genopts/gen-evolution.awk | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/genopts/gen-evolution.awk 
b/gcc/config/loongarch/genopts/gen-evolution.awk
index bf16b26760e..142b658fe7a 100644
--- a/gcc/config/loongarch/genopts/gen-evolution.awk
+++ b/gcc/config/loongarch/genopts/gen-evolution.awk
@@ -33,10 +33,12 @@ BEGIN {
 {
 cpucfg_word[NR] = $1
 cpucfg_bit_in_word[NR] = $2
-name[NR] = gensub(/-/, "_", "g", $3)
+name[NR] = $3
+gsub("-", "_", name[NR])
 name_capitalized[NR] = toupper(name[NR])
-isa_version_major[NR] = gensub(/^([1-9][0-9]*)\.([0-9]+)$/, "\\1", 1, $4)
-isa_version_minor[NR] = gensub(/^([1-9][0-9]*)\.([0-9]+)$/, "\\2", 1, $4)
+split($4, isa_ver, "\\.")
+isa_version_major[NR] = isa_ver[1]
+isa_version_minor[NR] = isa_ver[2]
 
 $1 = $2 = $3 = $4 = ""
 sub (/^\s*/, "")
-- 
2.49.0

Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]

2025-04-01 Thread Heinrich Schuchardt


On 4/1/25 20:15, Vineet Gupta wrote:

On 3/31/25 23:48, Heinrich Schuchardt wrote:

On 3/30/25 01:49, Vineet Gupta wrote:

changes since v2
   - dump log sanfu

---
vsetvl phase4 uses LCM guided info to insert VSETVL insns.
It has an additional loop to insert missing vsetvls on certain edges.
Currently it asserts/aborts on encountering EDGE_ABNORMAL.
When enabling go frontend with V enabled, libgo build hits the assert.

It seems to be safe to just skip the abnormal edge.


Hello Vineet,

Is there a test case where only following an abnormal edge between code
blocks would require to call VSETVL?

In the sketched code below following the exception would require VSETVL
to be called while the edge from the try block to the finally block
would not require this.

try {
for (..) {
uint16_t vectorizable math
if (condition)
throw exception
uint16_t vectorizable math
}
for (..) {
uint8_t vectorizable math
}
} catch exception {
} finally
for (..) {
uint8_t vectorizable math
}
}


Yeah we are going to run testsuite with -fnon-call-exceptions to find such 
cases.

But I'd argue, there is no need to optimize vsetvl for such esoteric cases (vs.
code complexity trade-off).
After all we'd just endup with an extra VSETVL in the finally block.

Thx,
-Vineet


My worry is not superfluous vsetvl statements but missing ones on 
abnormal edges.


Your patch sounds to me like:
Ignore abnormal edges. Never insert vsetvl statements there.

We need to test that on an abnormal edge we are inserting a vsetvl 
instruction when needed.


As abnormal edges typically don't connect two vectorized code-blocks 
with different vsetvl requirements this requires analyzing generated 
code for this specific scenario.


Best regards

Heinrich

Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]

2025-04-01 Thread Jeff Law





On 4/1/25 10:46 PM, Heinrich Schuchardt wrote:

On 4/1/25 20:15, Vineet Gupta wrote:

On 3/31/25 23:48, Heinrich Schuchardt wrote:

On 3/30/25 01:49, Vineet Gupta wrote:

changes since v2
   - dump log sanfu

---
vsetvl phase4 uses LCM guided info to insert VSETVL insns.
It has an additional loop to insert missing vsetvls on certain edges.
Currently it asserts/aborts on encountering EDGE_ABNORMAL.
When enabling go frontend with V enabled, libgo build hits the assert.

It seems to be safe to just skip the abnormal edge.


Hello Vineet,

Is there a test case where only following an abnormal edge between code
blocks would require to call VSETVL?

In the sketched code below following the exception would require VSETVL
to be called while the edge from the try block to the finally block
would not require this.

try {
for (..) {
    uint16_t vectorizable math
    if (condition)
    throw exception
    uint16_t vectorizable math
}
for (..) {
    uint8_t vectorizable math
}
} catch exception {
} finally
for (..) {
    uint8_t vectorizable math
}
}


Yeah we are going to run testsuite with -fnon-call-exceptions to find 
such cases.


But I'd argue, there is no need to optimize vsetvl for such esoteric 
cases (vs.

code complexity trade-off).
After all we'd just endup with an extra VSETVL in the finally block.

Thx,
-Vineet


My worry is not superfluous vsetvl statements but missing ones on 
abnormal edges.


Your patch sounds to me like:
Ignore abnormal edges. Never insert vsetvl statements there.

We need to test that on an abnormal edge we are inserting a vsetvl 
instruction when needed.


As abnormal edges typically don't connect two vectorized code-blocks 
with different vsetvl requirements this requires analyzing generated 
code for this specific scenario.
Which is what Vineet, Robin, myself and others discussed at length in 
this morning's patch review meeting.  It's a classic problem with LCM 
based algorithms.


We can't just ignore the edge.  What we instead need to do is ensure the 
algorithm never wants to insert on these problematical edges.  This is 
typically handled by "killing" expressions so that they don't propagate 
to these problem edges.


jeff

Re: [PATCH] PR119482: Avoid mispredictions in bitmap_set_bit

2025-04-01 Thread Richard Biener




> Am 02.04.2025 um 00:02 schrieb Andi Kleen :
> 
> From: Andi Kleen 
> 
> This isn't a regression, but it's a very simple patch with high
> performance improvement, so perhaps suitable in the current stage.

Ok.  Can you add a comment as to why we store unconditionally?

Richard 

> ---
> 
> bitmap_set_bit checks the original value of the bit to return it to the
> caller and then only writes the new value back if it changes.
> Most callers of bitmap_set_bit don't need the return value, but with the 
> conditional store
> the CPU still has to predict it correctly since gcc doesn't know how to do
> that without APX on x86 (even though CMOV could do it with a dummy target).
> Really if-conversion should handle this case, but for now we can fix
> it.
> 
> This simple patch improves runtime by 15% for the test case in the PR.
> Which is more than I expected given it only has ~1.44% of the cycles, but I 
> guess
> the mispredicts caused some down stream effects.
> 
>  ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet ran
>1.15 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 
> pr119482.cc  -quiet
> 
> At least with this test case the total number of branches decreases
> drastically. Even though the mispredict rate goes up slightly it is
> still a big win.
> 
> $ perf stat -e 
> branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ 
> \
> -a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc  -quiet -w
> 
> Performance counter stats for 'system wide':
> 
>41,932,957,091  branches
>   686,117,623  branch-misses#1.64% of all 
> branches
> 43,690.47 MiB  uncore_imc/cas_count_read/
> 12,362.56 MiB  uncore_imc/cas_count_write/
> 
>  49.328633365 seconds time elapsed
> 
> $ perf stat -e 
> branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ 
> \
> -a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet -w
> 
> Performance counter stats for 'system wide':
> 
>37,092,113,179  branches
>   663,641,708  branch-misses#1.79% of all 
> branches
> 43,196.52 MiB  uncore_imc/cas_count_read/
> 12,369.33 MiB  uncore_imc/cas_count_write/
> 
>  42.632458350 seconds time elapsed
> 
> gcc/ChangeLog:
> 
>PR middle-end/119482
>* bitmap.cc (bitmap_set_bit): Write back value unconditionally
> ---
> gcc/bitmap.cc | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
> index f5a64b495ab3..7744f8f8c2e4 100644
> --- a/gcc/bitmap.cc
> +++ b/gcc/bitmap.cc
> @@ -969,8 +969,7 @@ bitmap_set_bit (bitmap head, int bit)
>   if (ptr != 0)
> {
>   bool res = (ptr->bits[word_num] & bit_val) == 0;
> -  if (res)
> -ptr->bits[word_num] |= bit_val;
> +  ptr->bits[word_num] |= bit_val;
>   return res;
> }
> 
> --
> 2.49.0
>

Re: [PATCH] Libstdc++: Fix bootstrap failure for cross without tm.tm_zone [PR119550]

2025-04-01 Thread Tomasz Kaminski

On Tue, Apr 1, 2025 at 2:46 PM Jonathan Wakely  wrote:

> On Tue, 1 Apr 2025 at 11:34, Tomasz Kaminski  wrote:
> >
> >
> >
> > On Mon, Mar 31, 2025 at 7:28 PM Jonathan Wakely 
> wrote:
> >>
> >> In r15-8491-g778c28c70f8573 I added a use of the Autoconf macro
> >> AC_STRUCT_TIMEZONE, but that requires a link-test for the global tzname
> >> object if tm.tm_zone isn't supported. That link-test isn't allowed for
> >> cross-compilation, so bootstrap fails if tm.tm_zone isn't supported.
> >>
> >> Since libstdc++ only cares about tm.tm_zone and won't use tzname anyway,
> >> we don't need the link-test. Replace AC_STRUCT_TIMEZONE with a custom
> >> macro that only checks for tm.tm_zone. We can improve on the Autoconf
> >> macro by checking it's a suitable type, which isn't actually checked by
> >> AC_STRUCT_TIMEZONE.
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >> PR libstdc++/119550
> >> * acinclude.m4 (GLIBCXX_STRUCT_TM_TM_ZONE): New macro.
> >> * config.h.in: Regenerate.
> >> * configure: Regenerate.
> >> * configure.ac: Use GLIBCXX_STRUCT_TM_TM_ZONE.
> >> * include/bits/chrono_io.h (__formatter_chrono::_M_c): Check
> >> _GLIBCXX_USE_STRUCT_TM_TM_ZONE instead of
> >> _GLIBCXX_HAVE_STRUCT_TM_TM_ZONE.
> >> ---
> >>
> >> Testing x86_64-linux and sparc-solaris2.11, looks fine so far.
> >>
> >>  libstdc++-v3/acinclude.m4 |  35 
> >>  libstdc++-v3/config.h.in  |  21 +--
> >>  libstdc++-v3/configure| 238 --
> >>  libstdc++-v3/configure.ac |   5 +-
> >>  libstdc++-v3/include/bits/chrono_io.h |   2 +-
> >>  5 files changed, 109 insertions(+), 192 deletions(-)
> >>
> >> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> >> index e668d2dba27..02fd349e11d 100644
> >> --- a/libstdc++-v3/acinclude.m4
> >> +++ b/libstdc++-v3/acinclude.m4
> >> @@ -5744,6 +5744,41 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [
> >>fi
> >>  ])
> >>
> >> +dnl
> >> +dnl Check for a tm_zone member in struct tm.
> >> +dnl
> >> +dnl This member is defined as const char* in Glibc, newlib,
> POSIX.1-2024,
> >> +dnl and as char* in BSD (including macOS).
> >> +dnl
> >> +dnl Defines:
> >> +dnl  _GLIBCXX_USE_STRUCT_TM_TM_ZONE if struct tm has a tm_zone member.
> >> +dnl
> >> +AC_DEFUN([GLIBCXX_STRUCT_TM_TM_ZONE], [
> >> +
> >> +  AC_LANG_SAVE
> >
> > From documentation this is deprecated in favor of AC_LANG_PUSH.
>
> It looks like that is supported by autoconf 2.69 and is already used
> elsewhere in GCC. We don't currently use it in libstdc++ but it should
> be safe to do so.
>
> >> +  AC_LANG_CPLUSPLUS
> >> +  ac_save_CXXFLAGS="$CXXFLAGS"
> >> +  CXXFLAGS="$CXXFLAGS -std=c++20"
> >
> > The program that is compiled does not seem to require C++20.
> > If we change the declaration of "t" to use "= {}", we could do it in
> C++98.
> > Any reason to adjust the flags at all?
>
> We only need to use the tm_zone member in C++20 mode, and it's
> possible that on some platform it's not exposed for older standards
> (very unlikely, but possible). I wanted to test for exactly what we
> require. I don't feel strongly about it though.
>
I would found it surprising if the layout of the structure would fiffer
that much,
wouldn't this cause ABI compatibility problems for TU that memcopies struct
tm
and are compiled with different language versions?
I have slight preference towards not overriding CXXFLAGS, as for me this
suggest
that struct tm was changed in C++20 (or linked C standard), which is not
the case.
But I am fine either way.

>
> >>
> >> +
> >> +  AC_CACHE_CHECK([for tm_zone member of struct tm],
> glibcxx_cv_tm_zone, [
> >> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([#include 
> >> +   ],
> >> +   [struct tm t{}; t.tm_zone = (char*)0;]
> >> +   )],
> >> +   [glibcxx_cv_tm_zone=yes],
> >> +   [glibcxx_cv_tm_zone=no]
> >> +  )
> >> +])
> >> +
> >> +  if test $glibcxx_cv_tm_zone = yes; then
> >> +AC_DEFINE(_GLIBCXX_USE_STRUCT_TM_TM_ZONE, 1,
> >> + [Define if struct tm has a tm_zone member.])
> >> +  fi
> >> +
> >> +  CXXFLAGS="$ac_save_CXXFLAGS"
> >> +  AC_LANG_RESTORE
> >> +])
> >> +
> >>  dnl
> >>  dnl Check whether lock tables can be aligned to avoid false sharing.
> >>  dnl
> >> diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
> >> index be151f43dd6..77bbaf1beaa 100644
> >> --- a/libstdc++-v3/config.h.in
> >> +++ b/libstdc++-v3/config.h.in
> >> @@ -74,10 +74,6 @@
> >> don't. */
> >>  #undef HAVE_DECL_STRNLEN
> >>
> >> -/* Define to 1 if you have the declaration of `tzname', and to 0 if
> you don't.
> >> -   */
> >> -#undef HAVE_DECL_TZNAME
> >> -
> >>  /* Define to 1 if you have the  header file. */
> >>  #undef HAVE_DIRENT_H
> >>
> >> @@ -412,9 +408,6 @@
> >>  /* Define to 1 if `d_type' is a member of `struct dirent'. */
> >>  #undef HAVE_STRUCT_DIRENT_D_TYPE
> >>
> >> -/* Define to 1 if `tm_zone' is a member of `struct tm'. */
> >> -#undef HAVE_STR

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-01 Thread Hongtao Liu

On Tue, Apr 1, 2025 at 3:56 PM Jakub Jelinek  wrote:
>
> On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote:
> > >Changing ix86_valid_target_attribute_inner_p might be even better because
> > >OPT_msse4 is RejectNegative option, so !value for it looks weird.
> > msse4 is defined as ix86_opt_isa in ix86_valid_target_attribute_inner_p
> >
> > 1055IX86_ATTR_ISA ("sse4",  OPT_msse4),
> >
> > and would be handled in ix86_handle_option
> >
> > 1282  else if (type == ix86_opt_isa)
> > 1283{
> > 1284  struct cl_decoded_option decoded;
> > 1285
> > 1286  generate_option (opt, NULL, opt_set_p, CL_TARGET,
> > &decoded);
> > 1287  ix86_handle_option (opts, opts_set,
> > 1288  &decoded, input_location);
> > 1289}
> >
> > So I think it's already correct here.
>
> The entries are
> msse4
> Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
> Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and 
> code generation.
>
> mno-sse4
> Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save
> Do not support SSE4.1 and SSE4.2 built-in functions and code generation.
>
> so if it is turned into something without RejectNegative
> it will be wrong in the Mask argument, either for the positive or for the
> negative case.
For the command line, or target attribute, the actual operation goes
into ix86_handle_option, and as long as we get it right in this
ix86_handle_option, everything else should be fine.
As for the macros generated by the mask name (TARGET_SSE4_1_P), their
meanings remain the same, so in my opinion they can be handled this
way. But to be on the safe side, I agree that it should be adjusted in
ix86_valid_target_attribute_inner_p first.
In GCC16, we can do a refactor for this.
>
> I wonder though if msse4 should not be a Target RejectNegative Alias to 
> msse4.2
> and mno-sse4 RejectNegative Alias to mno-sse4.1.
> Though, unsure if one will still be able to specify it in #pragma GCC target
> or target attribute...
>
> Jakub
>


-- 
BR,
Hongtao

Re: [PATCH] tailc: Improve tail recursion handling [PR119493]

2025-04-01 Thread Richard Biener

On Tue, 1 Apr 2025, Jakub Jelinek wrote:

> Hi!
> 
> This is a partial step towards fixing that PR.
> For musttail recursive calls which have non-is_gimple_reg_type typed
> parameters, the only case we've handled was if the exact parameter
> was passed through (perhaps modified, but still the same PARM_DECL).
> That isn't necessary, we can copy the argument to the parameter as well
> (just need to watch for the use of the parameter in later arguments,
> say musttail recursive call which swaps 2 structure arguments).
> 
> The patch attempts to play safe and punts if any of the parameters are
> addressable (like we do for all normal tail calls and tail recursions,
> except for musttail in the posted unreviewed patch).
> 
> With this patch (at least when early inlining isn't done on not yet
> optimized body) inlining should see already tail recursion optimized
> body and will not have problems with SRA breaking musttail.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This looks OK, but I wonder if ...

> 2025-04-01  Jakub Jelinek  
> 
>   PR tree-optimization/119493
>   * tree-tailcall.cc (find_tail_calls): Don't punt on tail recusion
>   if some arguments don't have is_gimple_reg_type, only punt if they
>   have non-POD types, or volatile, or addressable.  Set
>   tailr_arg_needs_copy in those cases too.
>   (eliminate_tail_call): Copy call arguments to params if they don't
>   have is_gimple_reg_type, use temporaries if the argument is used
>   later.
>   (tree_optimize_tail_calls_1): Skip !is_gimple_reg_type
>   tailr_arg_needs_copy parameters.  Formatting fix.
> 
>   * gcc.dg/pr119493-1.c: New test.
> 
> --- gcc/tree-tailcall.cc.jj   2025-03-28 10:49:30.0 +0100
> +++ gcc/tree-tailcall.cc  2025-03-31 12:31:01.717603446 +0200
> @@ -672,19 +672,16 @@ find_tail_calls (basic_block bb, struct
>have a copyable type and the two arguments must have reasonably
>equivalent types.  The latter requirement could be relaxed if
>we emitted a suitable type conversion statement.  */
> -   if (!is_gimple_reg_type (TREE_TYPE (param))
> +   if (TREE_ADDRESSABLE (TREE_TYPE (param))
> || !useless_type_conversion_p (TREE_TYPE (param),
>TREE_TYPE (arg)))
>   break;
>  
> -   /* The parameter should be a real operand, so that phi node
> -  created for it at the start of the function has the meaning
> -  of copying the value.  This test implies is_gimple_reg_type
> -  from the previous condition, however this one could be
> -  relaxed by being more careful with copying the new value
> -  of the parameter (emitting appropriate GIMPLE_ASSIGN and
> -  updating the virtual operands).  */
> -   if (!is_gimple_reg (param))
> +   if (is_gimple_reg_type (TREE_TYPE (param))
> +   ? !is_gimple_reg (param)

... we want to restrict this to musttail calls at this point and
relax for stage1 only?

> +   : (!is_gimple_variable (param)
> +  || TREE_THIS_VOLATILE (param)
> +  || may_be_aliased (param)))
>   break;
>   }
>   }
> @@ -934,9 +931,9 @@ find_tail_calls (basic_block bb, struct
>  param = DECL_CHAIN (param), idx++)
>   {
> tree ddef, arg = gimple_call_arg (call, idx);
> -   if (is_gimple_reg (param)
> -   && (ddef = ssa_default_def (cfun, param))
> -   && (arg != ddef))
> +   if (!is_gimple_reg (param)
> +   || ((ddef = ssa_default_def (cfun, param))
> +   && arg != ddef))
>   bitmap_set_bit (tailr_arg_needs_copy, idx);
>   }
>  }
> @@ -1212,6 +1209,7 @@ eliminate_tail_call (struct tailcall *t,
>  
>/* Add phi node entries for arguments.  The ordering of the phi nodes 
> should
>   be the same as the ordering of the arguments.  */
> +  auto_vec copies;
>for (param = DECL_ARGUMENTS (current_function_decl),
>idx = 0, gpi = gsi_start_phis (first);
> param;
> @@ -1220,6 +1218,35 @@ eliminate_tail_call (struct tailcall *t,
>if (!bitmap_bit_p (tailr_arg_needs_copy, idx))
>   continue;
>  
> +  if (!is_gimple_reg_type (TREE_TYPE (param)))
> + {
> +   if (param == gimple_call_arg (stmt, idx))
> + continue;
> +   /* First check if param isn't used by any of the following
> +  call arguments.  If it is, we need to copy first to
> +  a temporary and only after doing all the assignments copy it
> +  to param.  */
> +   size_t idx2 = idx + 1;
> +   tree param2 = DECL_CHAIN (param);
> +   for (; param2; param2 = DECL_CHAIN (param2), idx2++)
> + if (!is_gimple_reg_type (TREE_TYPE (param)))
> +   {
> + tree base = get_base_address (gimple_call_arg (stmt, idx2

Re: [PATCH] tailr: Punt on tail recursions that would break musttail [PR119493]

2025-04-01 Thread Richard Biener

On Tue, 1 Apr 2025, Jakub Jelinek wrote:

> Hi!
> 
> While working on the previous tailc patch, I've noticed the following
> problem.
> The testcase below fails, because we decide to tail recursion optimize
> the call, but tail recursion (as documented in tree-tailcall.cc) needs to
> add some result multiplication and/or addition if any tail recursion uses
> accumulator, which is added right before the return.
> So, if there are musttail non-recurive calls in the function, successful
> tail recursion optimization will mean we'll later error on the musttail
> calls.  musttail recursive calls are ok, those would be tail recursion
> optimized.
> 
> So, the following patch punts on all tail recursion optimizations if it
> needs accumulators (add and/or mult) if there is at least one non-recursive
> musttail call.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Do we update cfun->has_musttail at some point for optimized away
musttail calls with respect to inlining/DCE?  I do wonder whether
tree-tailcall.cc itself could "quickly" determine whether we have
any musttail calls that we can handle and will not diagnose?

Thanks,
Richard.

> 2025-04-01  Jakub Jelinek  
> 
>   PR tree-optimization/119493
>   * tree-tailcall.cc (tree_optimize_tail_calls_1): Ignore tail recursion
>   candidates which need accumulators if there is at least one musttail
>   non-recursive call.
> 
>   * gcc.dg/pr119493-2.c: New test.
> 
> --- gcc/tree-tailcall.cc.jj   2025-03-31 13:00:58.288725496 +0200
> +++ gcc/tree-tailcall.cc  2025-03-31 14:34:31.285106333 +0200
> @@ -1373,6 +1373,38 @@ tree_optimize_tail_calls_1 (bool opt_tai
>live_vars = NULL;
>  }
>  
> +  if (cfun->has_musttail)
> +{
> +  /* We can't mix non-recursive must tail calls with tail recursive
> +  calls which require accumulators, because in that case we have to
> +  emit code in between the musttail calls and return, which prevent
> +  calling them as tail calls.  So, in that case give up on the
> +  tail recursion.  */
> +  for (act = tailcalls; act; act = act->next)
> + if (!act->tail_recursion)
> +   {
> + gcall *call = as_a  (gsi_stmt (act->call_gsi));
> + if (gimple_call_must_tail_p (call))
> +   break;
> +   }
> +  if (act)
> + for (struct tailcall **p = &tailcalls; *p; )
> +   {
> + if ((*p)->tail_recursion && ((*p)->add || (*p)->mult))
> +   {
> + struct tailcall *a = *p;
> + *p = (*p)->next;
> + gcall *call = as_a  (gsi_stmt (a->call_gsi));
> + maybe_error_musttail (call,
> +   _("tail recursion with accumulation "
> + "mixed with musttail "
> + "non-recursive call"), diag_musttail);
> + free (a);
> +   }
> + else
> +   p = &(*p)->next;
> +   }
> +}
>/* Construct the phi nodes and accumulators if necessary.  */
>a_acc = m_acc = NULL_TREE;
>for (act = tailcalls; act; act = act->next)
> --- gcc/testsuite/gcc.dg/pr119493-2.c.jj  2025-03-31 14:41:18.977464154 
> +0200
> +++ gcc/testsuite/gcc.dg/pr119493-2.c 2025-03-31 14:45:24.327068682 +0200
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/119493 */
> +/* { dg-do compile { target musttail } } */
> +/* { dg-options "-O2 -fdump-tree-tailr1-details" } */
> +/* { dg-final { scan-tree-dump-times "tail recursion with accumulation mixed 
> with musttail non-recursive call" 2 "tailr1" } } */
> +
> +[[gnu::noipa]] int
> +bar (int x, int y)
> +{
> +  return x + y;
> +}
> +
> +[[gnu::noinline, gnu::noclone]] int
> +foo (int x, int y)
> +{
> +  if (x < 10)
> +[[gnu::musttail]] return bar (x, y);
> +  if (y & 2)
> +return foo (x - 1, y) * 2;
> +  if (y & 1)
> +[[gnu::musttail]] return foo (x - 1, y);
> +  return foo (x - 1, y) * 3;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] sra: Avoid creating TBAA hazards (PR118924)

2025-04-01 Thread Richard Biener

On Mon, 31 Mar 2025, Martin Jambor wrote:

> Hi,
> 
> the testcase in PR 118924, when compiled on Aarch64, contains an
> gimple aggregate assignment statement in between different types which
> are types_compatible_p but behave differently for the purposes of
> alias analysis.
> 
> SRA replaces the statement with a series of scalar assignments which
> however have LHSs access chains modeled on the RHS type and so do not
> alias with a subsequent reads and so are DSEd.
> 
> SRA clearly gets its "same_access_path" logic subtly wrong.  One issue
> is that the same_access_path_p function probably should be implemented
> more along the lines of (parts of ao_compare::compare_ao_refs) instead
> of internally relying on operand_equal_p.  That is however not the
> problem in the PR and so I will deal with it only later.
> 
> The issue here is that even when the access path is the same, it must
> not be bolted on an aggregate type that does not match.  This patch
> does that, taking just one simple function from the
> ao_compare::compare_ao_refs machinery and using it to detect the
> situation.  The rest is just merging the information in between
> accesses of the same access group.
> 
> I looked at how many times we come across such assignment during
> "make stage2-bubble" of GCC (configured with only c and C++ and
> without multilib and libsanitizers) and on an x86_64 there were 87924
> such assignments (though now I realize not all of them had to be
> aggregate), so they do happen.  The patch leads to about 5% increase
> of cases where we don't use an "access path" but resort to a
> MEM_REF (from 90209 to 95204).  On an Aarch64, there were 92268 such
> assignments and the increase of falling back to MEM_REFs was by
> 4% (but from a bigger base 132983 to 107991).
> 
> Bootstrapped on x86_64-linux and Aarch64-linux.  OK for master and then
> for all active release branches?

I'm unsure about the lto_streaming_safe arg, in fact the implementation is

  if (lto_streaming_safe)
return type1 == type2;
  else
return TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2);

but I think we guarantee (we _need_ to guarantee!) that when
TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2), after LTO streaming
it's still TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2).  Otherwise
assignments previously valid GIMPLE might no longer be valid and
things that aliased previously might no longer alias.

But that's an implementation bug in types_equal_for_same_type_for_tbaa_p,
but I think you should be able to pass in false as if the implementation
were fixed.  IMO if necessary the implementation itself should
use sth like if (flag_lto && !in_lto_p) rather than leaving it up to
the caller ... the only existing caller uses
lto_streaming_expected_p () as arg, which is similar to my proposal.

I'd say you want to export at a forwarder

bool
types_equal_for_same_type_for_tbaa_p (tree type1, tree type2)
{
  return types_equal_for_same_type_for_tbaa_p (type1, type2, 
lto_streaming_expected_p ());
}

instead as it should be an internal detail.

OK with that change.

Can you fixup the comment

/* Return ture if TYPE1 and TYPE2 will always give the same answer
   when compared wit hother types using same_type_for_tbaa_p.  */

when you are there?  The function is called same_type_for_tbaa
and 'wit hother' should be 'with other'

Richard. 


> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2025-03-24  Martin Jambor  
> 
>   PR tree-optimization/118924
>   * tree-ssa-alias-compare.h (types_equal_for_same_type_for_tbaa_p):
>   Declare.
>   * tree-ssa-alias.cc (types_equal_for_same_type_for_tbaa_p): Make
>   public.
>   * tree-sra.cc: Include tree-ssa-alias-compare.h.
>   (create_access): Initialzie grp_same_access_path to true.
>   (build_accesses_from_assign): Detect tbaa hazards and clear
>   grp_same_access_path fields of involved accesses when they occur.
>   (sort_and_splice_var_accesses): Take previous values of
>   grp_same_access_path into account.
> 
> gcc/testsuite/ChangeLog:
> 
> 2025-03-25  Martin Jambor  
> 
>   PR tree-optimization/118924
>   * g++.dg/tree-ssa/pr118924.C: New test.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr118924.C | 29 
>  gcc/tree-sra.cc  | 18 ---
>  gcc/tree-ssa-alias-compare.h |  3 +++
>  gcc/tree-ssa-alias.cc|  2 +-
>  4 files changed, 48 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr118924.C
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr118924.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr118924.C
> new file mode 100644
> index 000..c95eacafc9c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr118924.C
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-std=c++17 -O2" } */
> +
> +template  struct Vector {
> +  int m_data[Size];
> +  Vector(int, int, int) {}
> +};
> +enum class E { POINTS, LINES, TRIANGLES };
>

Re: [PATCH] c++/modules: Forbid exposures of TU-local entities in inline variables [PR119551]

2025-04-01 Thread Jason Merrill


On 4/1/25 7:02 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

An inline variable has vague linkage, and needs to be conditionally
emitted in TUs that reference it.  Unfortunately this clashes with
[basic.link] p14.2, which says that we ignore the initialisers of all
variables (including inline ones), since importers will not have access
to the referenced TU-local entities to write the definition.

This patch makes such exposures be ill-formed.  One case that continues
to work is if the exposure is part of the dynamic initialiser of an
inline variable; in such cases, the definition has been built as part of
the module interface unit anyway, and importers don't need to write it
out again, so such exposures are "harmless".


OK.


PR c++/119551

gcc/cp/ChangeLog:

* module.cc (trees_out::write_var_def): Only ignore non-inline
variable initializers.

gcc/testsuite/ChangeLog:

* g++.dg/modules/internal-5_a.C: Add cases that should be
ignored.
* g++.dg/modules/internal-5_b.C: Test these new cases, and make
the testcase more robust.
* g++.dg/modules/internal-11.C: New test.
* g++.dg/modules/internal-12_a.C: New test.
* g++.dg/modules/internal-12_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc |  7 +++---
  gcc/testsuite/g++.dg/modules/internal-11.C   | 24 
  gcc/testsuite/g++.dg/modules/internal-12_a.C | 13 +++
  gcc/testsuite/g++.dg/modules/internal-12_b.C | 14 
  gcc/testsuite/g++.dg/modules/internal-5_a.C  |  8 ++-
  gcc/testsuite/g++.dg/modules/internal-5_b.C  |  6 +
  6 files changed, 68 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/internal-11.C
  create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/internal-12_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 894c70f7225..ce22b2ece3f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12679,9 +12679,10 @@ trees_in::read_function_def (tree decl, tree 
maybe_template)
  void
  trees_out::write_var_def (tree decl)
  {
-  /* The initializer of a variable or variable template is ignored for
- determining exposures.  */
-  auto ovr = make_temp_override (dep_hash->ignore_tu_local, VAR_P (decl));
+  /* The initializer of a non-inline variable or variable template is
+ ignored for determining exposures.  */
+  auto ovr = make_temp_override (dep_hash->ignore_tu_local,
+VAR_P (decl) && !DECL_INLINE_VAR_P (decl));
  
tree init = DECL_INITIAL (decl);

tree_node (init);
diff --git a/gcc/testsuite/g++.dg/modules/internal-11.C 
b/gcc/testsuite/g++.dg/modules/internal-11.C
new file mode 100644
index 000..53eb30a62be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-11.C
@@ -0,0 +1,24 @@
+// PR c++/119551
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi !M }
+
+export module M;
+
+static int tu_local = 5;
+static int* foo() { return &tu_local; }
+
+// For implementation reasons, we adjust [basic.link] p14.2 to restrict ignored
+// exposures to non-inline variables, since for inline variables without
+// dynamic initialisation we need to emit their initialiser for importer use.
+
+int* a = &tu_local;  // OK
+inline int* b = &tu_local;  // { dg-error "exposes TU-local entity" }
+
+// But dynamic initialisers are fine, importers will just treat them as 
external.
+inline int* c = foo();  // OK
+
+// For consistency, we follow the same rules with templates, noting that
+// we still need to emit definitions with dynamic initializers so we error.
+template  int* d = &tu_local;  // OK
+template  inline int* e = &tu_local;  // { dg-error "exposes TU-local 
entity" }
+template  inline int* f = foo();  // { dg-error "exposes TU-local 
entity" }
diff --git a/gcc/testsuite/g++.dg/modules/internal-12_a.C 
b/gcc/testsuite/g++.dg/modules/internal-12_a.C
new file mode 100644
index 000..5c4e7c602b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-12_a.C
@@ -0,0 +1,13 @@
+// PR c++/119551
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi M }
+// Test that emitting variables referencing TU-local entities
+// builds and runs correctly.
+
+export module M;
+
+static int tu_local_var = 5;
+static int* tu_local_func() { return &tu_local_var; }
+
+export int* a = &tu_local_var;
+export inline int* b = tu_local_func();
diff --git a/gcc/testsuite/g++.dg/modules/internal-12_b.C 
b/gcc/testsuite/g++.dg/modules/internal-12_b.C
new file mode 100644
index 000..bc3edf99a11
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/internal-12_b.C
@@ -0,0 +1,14 @@
+// PR c++/119551
+// { dg-module-do run }
+// { dg-additional-options "-fmodules" }
+
+import M;
+
+int main() {
+  if (*a != 5)
+__builtin_abort();
+  if (*b != 5)
+__builtin_abort();
+  if (a

Re: [PATCH] target/119549 - fixup handling of -mno-sse4

2025-04-01 Thread Richard Biener

On Tue, 1 Apr 2025, Jakub Jelinek wrote:

> On Tue, Apr 01, 2025 at 01:36:23PM +0800, Hongtao Liu wrote:
> > >Changing ix86_valid_target_attribute_inner_p might be even better because
> > >OPT_msse4 is RejectNegative option, so !value for it looks weird.
> > msse4 is defined as ix86_opt_isa in ix86_valid_target_attribute_inner_p
> > 
> > 1055IX86_ATTR_ISA ("sse4",  OPT_msse4),
> > 
> > and would be handled in ix86_handle_option
> > 
> > 1282  else if (type == ix86_opt_isa)
> > 1283{
> > 1284  struct cl_decoded_option decoded;
> > 1285
> > 1286  generate_option (opt, NULL, opt_set_p, CL_TARGET,
> > &decoded);
> > 1287  ix86_handle_option (opts, opts_set,
> > 1288  &decoded, input_location);
> > 1289}
> > 
> > So I think it's already correct here.
> 
> The entries are
> msse4
> Target RejectNegative Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
> Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in functions and 
> code generation.
> 
> mno-sse4
> Target RejectNegative InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save
> Do not support SSE4.1 and SSE4.2 built-in functions and code generation.
> 
> so if it is turned into something without RejectNegative,
> it will be wrong in the Mask argument, either for the positive or for the
> negative case.
> 
> I wonder though if msse4 should not be a Target RejectNegative Alias to 
> msse4.2
> and mno-sse4 RejectNegative Alias to mno-sse4.1.
> Though, unsure if one will still be able to specify it in #pragma GCC target
> or target attribute...

Yeah, I think the lack of Alias(-msse4.1,-msse4.2) aka alias to 
enable/disable two options is what makes it difficult.  I'll post
a v2 momentarily fixing up ix86_valid_target_attribute_inner_p
instead.  That at least seems quite a safe change (also for branches).

Richard.

[PATCH] simplify-rtx: Fix shortcut for vector eq/ne

2025-04-01 Thread Richard Sandiford

This patch forestalls a regression in gcc.dg/rtl/x86_64/vector_eq.c
with the patch for PR116398.  The test wants:

  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) (const_int 
0) (const_int 0) (const_int 0)])))
  (cinsn 5 (set (reg:V4SI <2>)
(eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>

to be folded to a vector of -1s.  One unusual thing about the fold
is that the <1> in the second insn is uninitialised; it looks like
it should be replaced by <0>, or that there should be an insn 4 that
copies <0> to <1>.

As it stands, the test relies on init-regs to insert a zero
initialisation of <1>.  This happens after all the cse/pre/fwprop
stuff, with only dce passes between init-regs and combine.
Combine therefore sees:

(insn 3 2 8 2 (set (reg:V4SI 98)
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])) 2403 {movv4si_internal}
 (nil))
(insn 8 3 9 2 (clobber (reg:V4SI 99)) -1
 (nil))
(insn 9 8 5 2 (set (reg:V4SI 99)
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])) -1
 (nil))
(insn 5 9 7 2 (set (reg:V4SI 100)
(eq:V4SI (reg:V4SI 98)
(reg:V4SI 99))) 7874 {*sse2_eqv4si3}
 (expr_list:REG_DEAD (reg:V4SI 99)
(expr_list:REG_DEAD (reg:V4SI 98)
(expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(reg:V4SI 99))
(nil)

It looks like the test should then pass through a 3, 9 -> 5 combination,
so that we get an (eq ...) between two zeros and fold it to a vector
of -1s.  But although the combination is attempted, the fold doesn't
happen.  Instead, combine is left to match the unsimplified (eq ...)
between two zeros, which rightly fails.  The test only passes because
late_combine2 happens to try simplifying an (eq ...) between reg X and
reg X, which does fold to a vector of -1s.

The different handling of registers and constants is due to this
code in simplify_const_relational_operation:

  if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx
  && (code == EQ || code == NE)
  && ! ((REG_P (op0) || CONST_INT_P (trueop0))
&& (REG_P (op1) || CONST_INT_P (trueop1)))
  && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0
  /* We cannot do this if tem is a nonzero address.  */
  && ! nonzero_address_p (tem))
return simplify_const_relational_operation (signed_condition (code),
mode, tem, const0_rtx);

INTEGRAL_MODE_P matches vector integer modes, but everything else
about the condition is written for scalar integers only.  Thus if
trueop0 and trueop1 are equal vector constants, we'll bypass all
the exclusions and try simplifying a subtraction.  This will succeed,
giving a vector of zeros.  The recursive call will then try to simplify
a comparison between the vector of zeros and const0_rtx, which isn't
well-formed.  Luckily or unluckily, the ill-formedness doesn't trigger
an ICE, but it does prevent any simplification from happening.

The least-effort fix would be to replace INTEGRAL_MODE_P with
SCALAR_INT_MODE_P.  But the fold does make conceptual sense for
vectors too, so it seemed better to keep the INTEGRAL_MODE_P and
generalise the rest of the condition to match.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

I'm hoping to post the actual patch for PR116398 later today.

Richard


gcc/
* simplify-rtx.cc (simplify_const_relational_operation): Generalize
the constant checks in the fold-via-minus path to match the
INTEGRAL_MODE_P condition.
---
 gcc/simplify-rtx.cc | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index fe007bc7d96..6f969effdf9 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -6657,15 +6657,20 @@ simplify_const_relational_operation (enum rtx_code code,
  we do not know the signedness of the operation on either the left or
  the right hand side of the comparison.  */
 
-  if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx
+  if (INTEGRAL_MODE_P (mode)
+  && trueop1 != CONST0_RTX (mode)
   && (code == EQ || code == NE)
-  && ! ((REG_P (op0) || CONST_INT_P (trueop0))
-   && (REG_P (op1) || CONST_INT_P (trueop1)))
+  && ! ((REG_P (op0)
+|| CONST_SCALAR_INT_P (trueop0)
+|| CONST_VECTOR_P (trueop0))
+   && (REG_P (op1)
+   || CONST_SCALAR_INT_P (trueop1)
+   || CONST_VECTOR_P (trueop1)))
   && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0
   /* We cannot do this if tem is a nonzero address.  */
   && ! nonzero_address_p (tem))
 return simplify_const_relational_operation (signed_condition (code),
-   mode, tem, const0_rtx);
+

Re: [PATCH][gcc-14] libstdc++: Fix bogus -Wstringop-overflow in std::vector::insert [PR117983]

2025-04-01 Thread Tomasz Kaminski

On Tue, Apr 1, 2025 at 11:32 AM Jonathan Wakely  wrote:

> This was fixed on trunk by r15-4473-g3abe751ea86e34, but that isn't
> suitable for backporting. Instead, just add another unreachable
> condition in std::vector::_M_range_insert so the compiler knows this
> memcpy doesn't use a length originating from a negative ptrdiff_t
> converted to a very positive size_t.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/117983
> * include/bits/vector.tcc (vector::_M_range_insert): Add
> unreachable condition to tell the compiler begin() <= end().
> * testsuite/23_containers/vector/modifiers/insert/117983.cc: New
> test.
>
> (cherry picked from commit 878812b6f6905774ab37cb78903e3e11bf1c508c)
> ---
>
> Tested x86_64-linux.
>
LGTM.

>
> This warning is a regression on 12/13/14 so this is needed on all
> release branches.
>
>  libstdc++-v3/include/bits/vector.tcc|  2 ++
>  .../vector/modifiers/insert/117983.cc   | 17 +
>  2 files changed, 19 insertions(+)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc
>
> diff --git a/libstdc++-v3/include/bits/vector.tcc
> b/libstdc++-v3/include/bits/vector.tcc
> index 0458d560075..b7f472ae09c 100644
> --- a/libstdc++-v3/include/bits/vector.tcc
> +++ b/libstdc++-v3/include/bits/vector.tcc
> @@ -1002,6 +1002,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> // reachable.
> pointer __old_start = this->_M_impl._M_start;
> pointer __old_finish = this->_M_impl._M_finish;
> +   if ((__old_finish - __old_start) < 0)
> + __builtin_unreachable();
>
> const size_type __len =
>   _M_check_len(__n, "vector::_M_range_insert");
> diff --git
> a/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc
> b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc
> new file mode 100644
> index 000..e6027a677ee
> --- /dev/null
> +++
> b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/117983.cc
> @@ -0,0 +1,17 @@
> +// { dg-options "-O3 -Werror=stringop-overflow" }
> +// { dg-do compile }
> +
> +// PR libstdc++/117983
> +// -Wstringop-overflow false positive for __builtin_memmove from
> vector::insert
> +
> +#include 
> +
> +typedef std::vector bytes;
> +
> +void push(bytes chunk, bytes& data) {
> +  if (data.empty()) {
> +data.swap(chunk);
> +  } else {
> +data.insert(data.end(), chunk.begin(), chunk.end());
> +  }
> +}
> --
> 2.49.0
>
>

Re: [PATCH] combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]

2025-04-01 Thread Jakub Jelinek

On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> > Looking at this some more today, I think we should special case
> > set_noop_p because that can be put into i2 (except for the JUMP_P
> > violations), currently both modified_between_p (pc_rtx, i2, i3)
> > and reg_used_between_p (pc_rtx, i2, i3) returns false.
> > I'll post a patch incrementally for that (but that feels like
> > new optimization, so probably not something that should be backported).
> 
> Trying to review this I noticed that both the comment in combine suggests
> that memory set is important and modified_between_p handles MEM_P
> 'x' and for non-REG 'x' possibly MEM operands (CONCAT?) while
> reg_used_between_p only handles REG_P 'reg'.  Also shouldn't
> you use reg_referenced_p?  At least that function seems to handle
> SET_SRC/SET_DEST separately?
> 
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine?  Why
> does the comment talk about memory?

I was worried about making too risky changes this late in stage4
(and especially also for backports).  Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).

What we know for sure is:
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
   && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
  XVECEXP (newpat, 0, 0))
   && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
  XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).

Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes.  This part didn't change in my patch.

I've only changed those
- && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+ && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.

The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3.  So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false.  Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch).  In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.

Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
   && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
  XVECEXP (newpat, 0, 0))
   && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
  XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.

To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).

So, I think it would be safer to add && SET_DEST

Re: [PATCH v4 0/3] [AArch64, OpenMP] Support SVE types with various OpenMP clauses and constructs

2025-04-01 Thread Tejas Belagod




Ping.

Thanks,
Tejas.

On 3/18/25 11:27 AM, Tejas Belagod wrote:

This series is based on a previous thread and review comments from RichardS and
Jakub upstream:

https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677072.html

The changes suggested are cosmetic in nature. As suggested in the previous
series v3, this series doesn't include tests - they're going to be part of a
separate patch series.

Patches have been rebased, retested and bootstrapped on aarch64-linux-gcc.
OK for trunk?

Richard Sandiford (1):
   gomp: Various fixes for SVE types [PR101018]

Tejas Belagod (2):
   Add function to strip pointer type and get down to the actual pointee
 type.
   AArch64: Diagnose OpenMP offloading when SVE types involved.

  gcc/config/aarch64/aarch64-sve-builtins.cc | 37 -
  gcc/fold-const.cc  |  7 +++
  gcc/gimplify.cc| 60 ++
  gcc/omp-low.cc |  2 +-
  gcc/poly-int.h | 19 +++
  gcc/target.h   | 37 -
  gcc/tree.h |  9 
  7 files changed, 157 insertions(+), 14 deletions(-)

Re: [PATCH v2] RISC-V: Tweak testcase for PIE

2025-04-01 Thread Kito Cheng

Push to trunk since the CI is green :)

On Tue, Apr 1, 2025 at 3:43 PM Kito Cheng  wrote:
>
> Linux toolchain may configured with --enable-default-pie, and that will
> cause lots of regression test failures because the function name will
> append with @plt suffix (e.g. `call foo` become `call foo@plt`), also
> some code generation  will different due to the code model like the address
> generation for global variable, so we may add -fno-pie to those
> testcases to prevent that.
>
> We may consider just drop @plt suffix to prevent that at all, because
> it's not difference between w/ and w/o @plt suffix, the linker will pick
> the right one to do, however it's late stage of GCC development, so just
> tweak the testcase should be the best way to do now.
>
> Changes from v1:
> - Add more testcase for PIE (from rvv.exp).
> - Tweak the rule for match @plt.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rv32i_zcmp.c: Tweak testcase for PIE.
> * gcc.target/riscv/rv32e_zcmp.c: Likewise.
> * gcc.target/riscv/zcmp_stack_alignment.c: Likewise.
> * gcc.target/riscv/cm_mv_rv32.c: Likewise.
> * gcc.target/riscv/cpymem-64.c: Likewise.
> * gcc.target/riscv/fmax-snan.c: Likewise.
> * gcc.target/riscv/fmaxf-snan.c: Likewise.
> * gcc.target/riscv/fmin-snan.c: Likewise.
> * gcc.target/riscv/fminf-snan.c: Likewise.
> * gcc.target/riscv/large-model.c: Likewise.
> * gcc.target/riscv/predef-1.c: Likewise.
> * gcc.target/riscv/predef-4.c: Likewise.
> * gcc.target/riscv/predef-7.c: Likewise.
> * gcc.target/riscv/predef-9.c: Likewise.
> * gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: 
> Likewise.
> * gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Likewise.
> * gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Likewise.
> * gcc.target/riscv/rvv/base/cmpmem-1.c: Likewise.
> * gcc.target/riscv/rvv/base/cmpmem-3.c: Likewise.
> * gcc.target/riscv/rvv/base/cmpmem-4.c: Likewise.
> * gcc.target/riscv/rvv/base/cpymem-1.c: Likewise.
> * gcc.target/riscv/rvv/base/movmem-1.c: Likewise.
> * gcc.target/riscv/rvv/base/pr114352-3.c: Likewise.
> * gcc.target/riscv/rvv/base/setmem-1.c: Likewise.
> * gcc.target/riscv/rvv/base/setmem-2.c: Likewise.
> * gcc.target/riscv/rvv/base/setmem-3.c: Likewise.
> * gcc.target/riscv/rvv/base/spill-9.c: Likewise.
> * g++.target/riscv/mv-symbols1.C: Likewise.
> * g++.target/riscv/mv-symbols3.C: Likewise.
> * g++.target/riscv/mv-symbols4.C: Likewise.
> * g++.target/riscv/mv-symbols5.C: Likewise.
> * g++.target/riscv/mvc-symbols1.C: Likewise.
> * g++.target/riscv/mvc-symbols3.C: Likewise.
> ---
>  gcc/testsuite/g++.target/riscv/mv-symbols1.C| 4 ++--
>  gcc/testsuite/g++.target/riscv/mv-symbols3.C| 4 ++--
>  gcc/testsuite/g++.target/riscv/mv-symbols4.C| 4 ++--
>  gcc/testsuite/g++.target/riscv/mv-symbols5.C| 4 ++--
>  gcc/testsuite/g++.target/riscv/mvc-symbols1.C   | 4 ++--
>  gcc/testsuite/g++.target/riscv/mvc-symbols3.C   | 4 ++--
>  gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 4 ++--
>  gcc/testsuite/gcc.target/riscv/cpymem-64.c  | 4 ++--
>  gcc/testsuite/gcc.target/riscv/fmax-snan.c  | 2 +-
>  gcc/testsuite/gcc.target/riscv/fmaxf-snan.c | 2 +-
>  gcc/testsuite/gcc.target/riscv/fmin-snan.c  | 2 +-
>  gcc/testsuite/gcc.target/riscv/fminf-snan.c | 2 +-
>  gcc/testsuite/gcc.target/riscv/large-model.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/predef-1.c   | 2 +-
>  gcc/testsuite/gcc.target/riscv/predef-4.c   | 2 +-
>  gcc/testsuite/gcc.target/riscv/predef-7.c   | 2 +-
>  gcc/testsuite/gcc.target/riscv/predef-9.c   | 2 +-
>  gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c | 6 +++---
>  gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c | 6 +++---
>  .../riscv/rvv/base/abi-callee-saved-2-save-restore.c| 6 +++---
>  .../gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c | 6 +++---
>  .../gcc.target/riscv/rvv/base/abi-callee-saved-2.c  | 6 +++---
>  gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c  | 4 ++--
>  gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c  | 4 ++--
>  gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c  | 4 ++--
>  gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c  | 4 ++--
>  gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c  | 2 +-
>  gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c| 4 ++--
>  gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c  | 2 +-
>  gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c  | 2 +-
>  gcc/testsuite/gcc.target/riscv/rvv/

Re: [PATCH] [testsuite] [riscv] limit vwaddsub-1.c to rv64

2025-04-01 Thread Kito Cheng

On Tue, Apr 1, 2025 at 12:47 PM Jeff Law  wrote:
>
>
>
> On 3/31/25 7:03 PM, Alexandre Oliva wrote:
> > On Mar 31, 2025, Jeff Law  wrote:
> >> I don't immediately see anything in this test or its history to
> >> indicate it's only supposed to work for rv64.
> >
> > It's the 64-bit integral argument rs1.
> Right, but ISTM we ought to be able to handle a vector of 64bit integral
> types, especially from the intrinsics interface.
>
> Probably the thing to do is see what the intrinsics docs say.   Or maybe
> Kito knows offhand.

Long story, fortunately we have some record on that:

https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/9
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/199






>
>
> >
> > On rv64, it's in a single 64-bit register, that needs to be narrowed to
> > 32 bits and then sign extended to 64 bits to fit the semantics of
> > vwadd.vx.
> Ah.  A vx form.  That's the key here.  Indeed I don't think we can have
> a 64bit argument for the scalar argument in a .vx form insn.
>
> So objection cleared.  OK for the trunk.
> jeff
>
>

Re: [PATCH] Further use of mod_scope in modified_type_die

2025-04-01 Thread Tom Tromey

> "Richard" == Richard Biener  writes:

Richard> On Fri, Mar 28, 2025 at 8:32 AM Eric Botcazou  
wrote:
>> 
>> > Let's involve Eric here, I have no idea what was intended here.
>> 
>> We discussed it with Tom internally and he makes the final decisions here.

Richard> Well the, Tom, go ahead.

Thanks.  I made the change you suggested up-thread, and re-tested the patch.
I'm going to check it in now.

Tom

Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533]

2025-04-01 Thread Vineet Gupta

On 3/31/25 23:48, Heinrich Schuchardt wrote:
> On 3/30/25 01:49, Vineet Gupta wrote:
>> changes since v2
>>   - dump log sanfu
>>
>> ---
>> vsetvl phase4 uses LCM guided info to insert VSETVL insns.
>> It has an additional loop to insert missing vsetvls on certain edges.
>> Currently it asserts/aborts on encountering EDGE_ABNORMAL.
>> When enabling go frontend with V enabled, libgo build hits the assert.
>>
>> It seems to be safe to just skip the abnormal edge.
>
> Hello Vineet,
>
> Is there a test case where only following an abnormal edge between code 
> blocks would require to call VSETVL?
>
> In the sketched code below following the exception would require VSETVL 
> to be called while the edge from the try block to the finally block 
> would not require this.
>
> try {
>   for (..) {
>   uint16_t vectorizable math
>   if (condition)
>   throw exception
>   uint16_t vectorizable math
>   }
>   for (..) {
>   uint8_t vectorizable math
>   }
> } catch exception {
> } finally
>   for (..) {
>   uint8_t vectorizable math
>   }
> }

Yeah we are going to run testsuite with -fnon-call-exceptions to find such 
cases.

But I'd argue, there is no need to optimize vsetvl for such esoteric cases (vs.
code complexity trade-off).
After all we'd just endup with an extra VSETVL in the finally block.

Thx,
-Vineet

Re: [PATCH v2] c++: fix missing lifetime extension [PR119383]

2025-04-01 Thread Jason Merrill


On 3/31/25 4:59 PM, Marek Polacek wrote:

On Thu, Mar 27, 2025 at 11:27:04AM -0400, Jason Merrill wrote:

On 3/25/25 3:37 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?

-- >8 --
Since r15-8011 cp_build_indirect_ref_1 won't do the *&TARGET_EXPR ->
TARGET_EXPR folding not to change its value category.  That fix is
correct but it made us stop extending the lifetime in this testcase,
causing a wrong-code issue -- extend_ref_init_temps_1 did not see
through the extra *& because it doesn't use a tree walk.  It is not
hard to fix that, but there may be other places that need this
adjustment.  :/


Hmm, this suggests to me that we should revert the 117512 patch and instead
fix it in build_over_call, perhaps by forcing 'val' to be an lvalue after
it's built rather than relying on 'to' being an lvalue already.


Thanks for the suggestion.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?

@@ -3870,13 +3870,11 @@ cp_build_indirect_ref_1 (location_t loc, tree ptr, 
ref_operator errorstring,
  return error_mark_node;
}
else if (do_fold && TREE_CODE (pointer) == ADDR_EXPR
-  && same_type_p (t, TREE_TYPE (TREE_OPERAND (pointer, 0)))
-  /* Don't let this change the value category.  '*&TARGET_EXPR'
- is an lvalue, but folding it into 'TARGET_EXPR' would turn
- it into a prvalue of class type.  */
-  && lvalue_p (TREE_OPERAND (pointer, 0)))
+  && same_type_p (t, TREE_TYPE (TREE_OPERAND (pointer, 0
/* The POINTER was something like `&x'.  We simplify `*&x' to
-  `x'.  */
+  `x'.  This change the value category: '*&TARGET_EXPR'


"This might change" or "can change".  OK with that tweak.

Jason

[COMMITTED 26/35] gccrs: Fix ICE during const expr eval on array expressions

2025-04-01 Thread arthur . cohen

From: Philip Herron 

Array expressions are still getting turned into VIEW_CONVERT_EXPR's becuase
TYPE_MAIN_VARIANT is not set so then we might as well reuse the type-hasher
to sort this out.

Fixes Rust-GCC#3588

gcc/rust/ChangeLog:

* backend/rust-compile-context.h: only push named types
* backend/rust-compile-type.cc (TyTyResolveCompile::visit): run the 
type hasher

gcc/testsuite/ChangeLog:

* rust/compile/issue-3588.rs: New test.

Signed-off-by: Philip Herron 
---
 gcc/rust/backend/rust-compile-context.h  | 5 -
 gcc/rust/backend/rust-compile-type.cc| 2 ++
 gcc/testsuite/rust/compile/issue-3588.rs | 5 +
 3 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/rust/compile/issue-3588.rs

diff --git a/gcc/rust/backend/rust-compile-context.h 
b/gcc/rust/backend/rust-compile-context.h
index a44638817db..ce81a1d0db2 100644
--- a/gcc/rust/backend/rust-compile-context.h
+++ b/gcc/rust/backend/rust-compile-context.h
@@ -72,7 +72,10 @@ public:
   return it->second;
 
 compiled_type_map.insert ({h, type});
-push_type (type);
+
+if (TYPE_NAME (type) != NULL)
+  push_type (type);
+
 return type;
   }
 
diff --git a/gcc/rust/backend/rust-compile-type.cc 
b/gcc/rust/backend/rust-compile-type.cc
index 813e11c47cf..83e5756429f 100644
--- a/gcc/rust/backend/rust-compile-type.cc
+++ b/gcc/rust/backend/rust-compile-type.cc
@@ -481,6 +481,8 @@ TyTyResolveCompile::visit (const TyTy::ArrayType &type)
   tree folded_capacity_expr = fold_expr (capacity_expr);
 
   translated = Backend::array_type (element_type, folded_capacity_expr);
+  if (translated != error_mark_node)
+translated = ctx->insert_compiled_type (translated);
 }
 
 void
diff --git a/gcc/testsuite/rust/compile/issue-3588.rs 
b/gcc/testsuite/rust/compile/issue-3588.rs
new file mode 100644
index 000..744d9671c42
--- /dev/null
+++ b/gcc/testsuite/rust/compile/issue-3588.rs
@@ -0,0 +1,5 @@
+const FOO: i32 = if true { [1, 2, 3] } else { [2, 3, 4] }[0];
+
+pub fn test() -> i32 {
+FOO
+}
-- 
2.49.0

81 matches

Mail list logo