Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-06 Thread Robin Dapp
> There were absolutely problems without this. It's a while ago now, so I'm
> struggling with the details, but as GCC only applies the mask to selected
> operations there were all sorts of issues that crept in. Zeroing the
> undefined lanes seemed to match the middle end assumptions (or, at least it
> made the UB consistent?) or maybe I read it in the code somewhere. Sorry,
> it's years since I wrote that.

So far we have only found two instances of this problem, and both were related
to _Bools.  In case you have more, it would be greatly appreciated if you could
verify the series with them.  If you don't mind, would it be possible to
comment out the zeroing, re-run the testsuite and check for FAILs?

> This sounds like a generally good plan. Better than just zero it and hope
> that's right anyway. ;)
>
> So, in theory, is it better if amdgcn allows both? Or is that one little
> move immediate instruction in the backend going to produce better/cleaner
> middle end code?

The new predicate is supposed to inform the vectorizer of what it "prefers",
i.e. what the hardware does anyway.  So if amdgcn leaves the inactive elements
undefined, the predicate should only accept undefined as well.
Once the vectorizer requires zeros (or anything other than undefined),
it will explicitly emit a zeroing merge/blend in gimple.  That way the
zeroing can easily be combined with surrounding code.

Of course amdgcn could also advertise zero and then always force a zero before
loading as you currently do.  That would be unconditional, though, and the
combination with surrounding RTL might also be a bit more difficult than when
it's exposed in gimple already.
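In gimple terms, the intent could be sketched like this (illustrative pseudo-gimple, not actual vectorizer output; the else operand name is assumed):

```
  # The predicate admits only an undefined else value, so the plain
  # masked load is emitted ...
  vect_1 = .MASK_LOAD (ptr_2, align_3, mask_4, else_undef_5);
  # ... and only when zeros are actually required, an explicit blend
  # that later passes can combine with surrounding code:
  vect_6 = VEC_COND_EXPR <mask_4, vect_1, { 0, ... }>;
```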

Thanks!

-- 
Regards
 Robin



[PATCH] gimple ssa: Don't use __builtin_popcount in switch exp transform [PR116616]

2024-09-06 Thread Filip Kastl
Hi,

bootstrapped and regtested on x86_64-linux.  Ok to push?

Thanks,
Filip Kastl


 8< 


Switch exponential transformation in the switch conversion pass
currently generates

tmp1 = __builtin_popcount (var);
tmp2 = tmp1 == 1;

when inserting code to determine if var is a power of two.  If the target
doesn't support expanding the builtin as special instructions, switch
conversion relies on this whole pattern being expanded as bitmagic.
However, it is possible that other GIMPLE optimizations move the two
statements of the pattern apart.  In that case the builtin becomes a
libgcc call in the final binary.  The call is slow and, for freestanding
programs, can result in a linking error (this bug was originally found
while compiling the Linux kernel).

This patch modifies switch conversion to insert the bitmagic
(var ^ (var - 1)) > (var - 1) instead of the builtin.

gcc/ChangeLog:

PR tree-optimization/116616
* tree-switch-conversion.cc (can_pow2p): Remove this function.
(gen_pow2p): Generate bitmagic instead of a builtin.  Remove the
TYPE parameter.
(switch_conversion::is_exp_index_transform_viable): Don't call
can_pow2p.
(switch_conversion::exp_index_transform): Call gen_pow2p without
the TYPE parameter.
* tree-switch-conversion.h: Remove
m_exp_index_transform_pow2p_type.

gcc/testsuite/ChangeLog:

PR tree-optimization/116616
* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of the POPCOUNT internal fn call.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-1.c  |  7 +-
 gcc/tree-switch-conversion.cc | 82 ---
 gcc/tree-switch-conversion.h  |  6 +-
 3 files changed, 19 insertions(+), 76 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
index a8c9e03e515..4832f5b52c3 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -1,10 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt -mbmi" } */
+/* { dg-options "-O2 -fdump-tree-switchconv -mbmi" } */
 
 /* Checks that exponential index transform enables switch conversion to convert
-   this switch into an array lookup.  Also checks that the "index variable is a
-   power of two" check has been generated and that it has been later expanded
-   into an internal function.  */
+   this switch into an array lookup.  */
 
 int foo(unsigned bar)
 {
@@ -30,4 +28,3 @@ int foo(unsigned bar)
 }
 
 /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
-/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index c1332a26094..2e42611f9fd 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -133,75 +133,27 @@ gen_log2 (tree op, location_t loc, tree *result, tree type)
   return stmts;
 }
 
-/* Is it possible to efficiently check that a value of TYPE is a power of 2?
-
-   If yes, returns TYPE.  If no, returns NULL_TREE.  May also return another
-   type.  This indicates that logarithm of the variable can be computed but
-   only after it is converted to this type.
-
-   Also see gen_pow2p.  */
-
-static tree
-can_pow2p (tree type)
-{
-  /* __builtin_popcount supports the unsigned type or its long and long long
- variants.  Choose the smallest out of those that can still fit TYPE.  */
-  int prec = TYPE_PRECISION (type);
-  int i_prec = TYPE_PRECISION (unsigned_type_node);
-  int li_prec = TYPE_PRECISION (long_unsigned_type_node);
-  int lli_prec = TYPE_PRECISION (long_long_unsigned_type_node);
-
-  if (prec <= i_prec)
-return unsigned_type_node;
-  else if (prec <= li_prec)
-return long_unsigned_type_node;
-  else if (prec <= lli_prec)
-return long_long_unsigned_type_node;
-  else
-return NULL_TREE;
-}
-
-/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
-   special optabs if target supports them.  Return the result as a
-   boolean_type_node ssa name through RESULT.  Assumes that OP's value will
-   be non-negative.  The generated check may give arbitrary answer for negative
-   values.
-
-   Before computing the check, OP may have to be converted to another type.
-   This should be specified in TYPE.  Use can_pow2p to decide what this type
-   should be.
-
-   Should only be used if can_pow2p returns true for type of OP.  */
+/* Build a sequence of gimple statements checking that OP is a power of 2.
+   Return the result as a boolean_type_node ssa name through RESULT.  Assumes
+   that OP's value will be non-negative.  The generated check may give
+   arbitrary answer for negative values.  */
 
 static gimple_seq
-gen_pow2p (tree op, location_t loc, tree *result, tree type)
+gen_pow2p (tree 

[PATCH] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for xtheadvector

2024-09-06 Thread Jin Ma
Since the XTheadVector vsetvli does not support vl as an immediate, we
need to print the register name "zero" instead of the immediate 0 when
outputting the asm.

Ref:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116592
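To illustrate the failure mode (assumed output; the exact operands depend on the surrounding code):

```asm
# before: a const-0 vl operand was printed as the immediate 0,
# which the assembler rejects for th.vsetvli
th.vsetvli	zero,0,e32,m8
# after: the %z modifier prints the zero register instead
th.vsetvli	zero,zero,e32,m8
```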

gcc/ChangeLog:

* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to "zero".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/bug-116592.c: New test.

Reported-by: nihui 
---
 gcc/config/riscv/thead.cc |  4 +--
 .../riscv/rvv/xtheadvector/bug-116592.c   | 36 +++
 2 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/bug-116592.c

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2f1d83fbbc7f..707d91076eb5 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -960,11 +960,11 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
  if (strstr (p, "zero,zero"))
return "th.vsetvli\tzero,zero,e%0,%m1";
  else
-   return "th.vsetvli\tzero,%0,e%1,%m2";
+   return "th.vsetvli\tzero,%z0,e%1,%m2";
}
  else
{
- return "th.vsetvli\t%0,%1,e%2,%m3";
+ return "th.vsetvli\t%z0,%z1,e%2,%m3";
}
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/bug-116592.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/bug-116592.c
new file mode 100644
index ..937efbfd1b09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/bug-116592.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zfh_xtheadvector -mabi=ilp32d -O2" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zfh_xtheadvector -mabi=lp64d -O2" { target { rv64 } } } */
+
+#include 
+#include 
+
+static vfloat32m8_t atan2_ps(vfloat32m8_t a, vfloat32m8_t b, size_t vl)
+{
+  float tmpx[vl];
+  float tmpy[vl];
+  __riscv_vse32_v_f32m8(tmpx, a, vl);
+  __riscv_vse32_v_f32m8(tmpy, b, vl);
+  for (size_t i = 0; i < vl; i++)
+  {
+tmpx[i] = atan2(tmpx[i], tmpy[i]);
+  }
+  return __riscv_vle32_v_f32m8(tmpx, vl);
+}
+
+void my_atan2(const float *x, const float *y, float *out, int size)
+{
+  int n = size;
+  while (n > 0)
+  {
+size_t vl = __riscv_vsetvl_e32m8(n);
+vfloat32m8_t _x = __riscv_vle32_v_f32m8(x, vl);
+vfloat32m8_t _y = __riscv_vle32_v_f32m8(y, vl);
+vfloat32m8_t _out = atan2_ps(_x, _y, vl);
+__riscv_vse32_v_f32m8(out, _out, vl);
+n -= vl;
+x += vl;
+y += vl;
+out += vl;
+  }
+}
-- 
2.17.1



Re: [PATCH] gimple ssa: Don't use __builtin_popcount in switch exp transform [PR116616]

2024-09-06 Thread Andrew Pinski
On Fri, Sep 6, 2024 at 12:07 AM Filip Kastl  wrote:
>
> Hi,
>
> bootstrapped and regtested on x86_64-linux.  Ok to push?
>
> Thanks,
> Filip Kastl
>
>
>  8< 
>
>
> Switch exponential transformation in the switch conversion pass
> currently generates
>
> tmp1 = __builtin_popcount (var);
> tmp2 = tmp1 == 1;
>
> when inserting code to determine if var is a power of two.  If the target
> doesn't support expanding the builtin as special instructions, switch
> conversion relies on this whole pattern being expanded as bitmagic.
> However, it is possible that other GIMPLE optimizations move the two
> statements of the pattern apart.  In that case the builtin becomes a
> libgcc call in the final binary.  The call is slow and, for freestanding
> programs, can result in a linking error (this bug was originally found
> while compiling the Linux kernel).
>
> This patch modifies switch conversion to insert the bitmagic
> (var ^ (var - 1)) > (var - 1) instead of the builtin.
>
> gcc/ChangeLog:
>
> PR tree-optimization/116616
> * tree-switch-conversion.cc (can_pow2p): Remove this function.
> (gen_pow2p): Generate bitmagic instead of a builtin.  Remove the
> TYPE parameter.
> (switch_conversion::is_exp_index_transform_viable): Don't call
> can_pow2p.
> (switch_conversion::exp_index_transform): Call gen_pow2p without
> the TYPE parameter.
> * tree-switch-conversion.h: Remove
> m_exp_index_transform_pow2p_type.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/116616
> * gcc.target/i386/switch-exp-transform-1.c: Don't test for
> presence of the POPCOUNT internal fn call.
>
> Signed-off-by: Filip Kastl 
> ---
>  .../gcc.target/i386/switch-exp-transform-1.c  |  7 +-
>  gcc/tree-switch-conversion.cc | 82 ---
>  gcc/tree-switch-conversion.h  |  6 +-
>  3 files changed, 19 insertions(+), 76 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> index a8c9e03e515..4832f5b52c3 100644
> --- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> +++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
> @@ -1,10 +1,8 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt -mbmi" } */
> +/* { dg-options "-O2 -fdump-tree-switchconv -mbmi" } */
>
>  /* Checks that exponential index transform enables switch conversion to convert
> -   this switch into an array lookup.  Also checks that the "index variable is a
> -   power of two" check has been generated and that it has been later expanded
> -   into an internal function.  */
> +   this switch into an array lookup.  */
>
>  int foo(unsigned bar)
>  {
> @@ -30,4 +28,3 @@ int foo(unsigned bar)
>  }
>
>  /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
> -/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index c1332a26094..2e42611f9fd 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -133,75 +133,27 @@ gen_log2 (tree op, location_t loc, tree *result, tree type)
>return stmts;
>  }
>
> -/* Is it possible to efficiently check that a value of TYPE is a power of 2?
> -
> -   If yes, returns TYPE.  If no, returns NULL_TREE.  May also return another
> -   type.  This indicates that logarithm of the variable can be computed but
> -   only after it is converted to this type.
> -
> -   Also see gen_pow2p.  */
> -
> -static tree
> -can_pow2p (tree type)
> -{
> -  /* __builtin_popcount supports the unsigned type or its long and long long
> - variants.  Choose the smallest out of those that can still fit TYPE.  */
> -  int prec = TYPE_PRECISION (type);
> -  int i_prec = TYPE_PRECISION (unsigned_type_node);
> -  int li_prec = TYPE_PRECISION (long_unsigned_type_node);
> -  int lli_prec = TYPE_PRECISION (long_long_unsigned_type_node);
> -
> -  if (prec <= i_prec)
> -return unsigned_type_node;
> -  else if (prec <= li_prec)
> -return long_unsigned_type_node;
> -  else if (prec <= lli_prec)
> -return long_long_unsigned_type_node;
> -  else
> -return NULL_TREE;
> -}
> -
> -/* Build a sequence of gimple statements checking that OP is a power of 2.  Use
> -   special optabs if target supports them.  Return the result as a
> -   boolean_type_node ssa name through RESULT.  Assumes that OP's value will
> -   be non-negative.  The generated check may give arbitrary answer for negative
> -   values.
> -
> -   Before computing the check, OP may have to be converted to another type.
> -   This should be specified in TYPE.  Use can_pow2p to decide what this type
> -   should be.
> -
> -   Should only be used if can_pow2p returns true for type of OP.  */
> +/* Build a sequence of gimple statements checking that OP is a power of 2

Re: [PATCH] gimple ssa: Don't use __builtin_popcount in switch exp transform [PR116616]

2024-09-06 Thread Jakub Jelinek
On Fri, Sep 06, 2024 at 12:18:30AM -0700, Andrew Pinski wrote:
> You need to do this in an unsigned type. Otherwise you get the wrong
> answer and also introduce undefined code.
> So you need to use:
> tree utype = unsigned_type_for (type);
> tree tmp3;
> if (types_compatible_p (type, utype))
>   tmp3 = op;
> else
>   tmp3 = gimple_build (&gsi, false, GSI_NEW_STMT, loc, CONVERT_EXPR, utype, op);

I think NOP_EXPR is used for these conversions instead.

Otherwise agreed.

Jakub



[PATCH] fab: Factor out the main folding part of pass_fold_builtins::execute [PR116601]

2024-09-06 Thread Andrew Pinski
This is an alternative patch to fix PR tree-optimization/116601 by factoring
out the main part of pass_fold_builtins::execute into its own function so that
we don't need to repeat the code for doing the eh cleanup. It also fixes the
problem I saw with the atomics which might skip over a statement; though I don't
have a testcase for that.
Just a note on the return value of fold_all_builtin_stmt: it does not signal
whether something was folded.  Instead, true means the iterator already points
at the next statement to handle, and false means the caller should advance it.
This was the bug with the atomics: in some cases the atomic builtins could
remove the statement being processed, and then another gsi_next would still
happen.
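Reading the new function's comment, the intended caller-side pattern is roughly the following (a sketch of GCC-internal code, not compilable standalone):

```c
for (gimple_stmt_iterator i = gsi_start_bb (bb); !gsi_end_p (i);)
  {
    gimple *stmt = gsi_stmt (i);
    if (fold_all_builtin_stmt (i, stmt, cfg_changed))
      continue;  /* e.g. stmt was removed and I already points at the next one  */
    gsi_next (&i);
  }
```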

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116601

gcc/ChangeLog:

* tree-ssa-ccp.cc (optimize_memcpy): Return true if the statement
was updated.
(pass_fold_builtins::execute): Factor out folding code into ...
(fold_all_builtin_stmt): This.

gcc/testsuite/ChangeLog:

* g++.dg/torture/except-2.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/torture/except-2.C |  18 +
 gcc/tree-ssa-ccp.cc | 534 
 2 files changed, 276 insertions(+), 276 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C

diff --git a/gcc/testsuite/g++.dg/torture/except-2.C b/gcc/testsuite/g++.dg/torture/except-2.C
new file mode 100644
index 000..d896937a118
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/except-2.C
@@ -0,0 +1,18 @@
+// { dg-do compile }
+// { dg-additional-options "-fexceptions -fnon-call-exceptions" }
+// PR tree-optimization/116601
+
+struct RefitOption {
+  char subtype;
+  int string;
+} n;
+void h(RefitOption);
+void k(RefitOption *__val)
+{
+  try {
+*__val = RefitOption{};
+RefitOption __trans_tmp_2 = *__val;
+h(__trans_tmp_2);
+  }
+  catch(...){}
+}
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 44711018e0e..930432e3244 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -4166,18 +4166,19 @@ optimize_atomic_op_fetch_cmp_0 (gimple_stmt_iterator *gsip,
a = {};
b = {};
Similarly for memset (&a, ..., sizeof (a)); instead of a = {};
-   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;  */
+   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;
+   Returns true if the statement was changed.  */
 
-static void
+static bool
 optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src, tree len)
 {
   gimple *stmt = gsi_stmt (*gsip);
   if (gimple_has_volatile_ops (stmt))
-return;
+return false;
 
   tree vuse = gimple_vuse (stmt);
   if (vuse == NULL)
-return;
+return false;
 
   gimple *defstmt = SSA_NAME_DEF_STMT (vuse);
   tree src2 = NULL_TREE, len2 = NULL_TREE;
@@ -4202,7 +4203,7 @@ optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src, tree len)
 }
 
   if (src2 == NULL_TREE)
-return;
+return false;
 
   if (len == NULL_TREE)
 len = (TREE_CODE (src) == COMPONENT_REF
@@ -4216,24 +4217,24 @@ optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src, tree len)
   || !poly_int_tree_p (len)
   || len2 == NULL_TREE
   || !poly_int_tree_p (len2))
-return;
+return false;
 
   src = get_addr_base_and_unit_offset (src, &offset);
   src2 = get_addr_base_and_unit_offset (src2, &offset2);
   if (src == NULL_TREE
   || src2 == NULL_TREE
   || maybe_lt (offset, offset2))
-return;
+return false;
 
   if (!operand_equal_p (src, src2, 0))
-return;
+return false;
 
   /* [ src + offset2, src + offset2 + len2 - 1 ] is set to val.
  Make sure that
  [ src + offset, src + offset + len - 1 ] is a subset of that.  */
   if (maybe_gt (wi::to_poly_offset (len) + (offset - offset2),
wi::to_poly_offset (len2)))
-return;
+return false;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
@@ -4271,6 +4272,237 @@ optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src, tree len)
   fprintf (dump_file, "into\n  ");
   print_gimple_stmt (dump_file, stmt, 0, dump_flags);
 }
+  return true;
+}
+
+/* Fold statement STMT located at I. Maybe setting CFG_CHANGED if
+   the condition was changed and cfg_cleanup is needed to be run.
+   Returns true if the iterator I is at the statement to handle;
+   otherwise false means move the iterator to the next statement.  */
+static int
+fold_all_builtin_stmt (gimple_stmt_iterator &i, gimple *stmt,
+  bool &cfg_changed)
+{
+  /* Remove assume internal function calls. */
+  if (gimple_call_internal_p (stmt, IFN_ASSUME))
+{
+  gsi_remove (&i, true);
+  return true;
+   }
+
+  if (gimple_code (stmt) != GIMPLE_CALL)
+{
+  if (gimple_assign_load_p (stmt) && gimple_store_p (stmt))
+   return optimize_memcpy (&i, gimple_assign_lhs (stmt),
+   gimple_assign_rhs1 (stmt), NULL_TREE);
+  

Re: [PATCH] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for xtheadvector

2024-09-06 Thread Xi Ruoyao
On Fri, 2024-09-06 at 15:10 +0800, Jin Ma wrote:
> Since the THeadVector vsetvli does not support vl as an immediate, we
> need to convert 0 to zero when outputting asm.
> 
> Ref:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116592

See the "bug number" section of https://gcc.gnu.org/contribute.html for
how to refer to a PR correctly, instead of putting a URL here.
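i.e. a ChangeLog line of the form (as the v2 of this patch later uses):

```
	PR target/116592
```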

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] fab: Factor out the main folding part of pass_fold_builtins::execute [PR116601]

2024-09-06 Thread Jakub Jelinek
On Fri, Sep 06, 2024 at 12:21:20AM -0700, Andrew Pinski wrote:
> This is an alternative patch to fix PR tree-optimization/116601 by factoring
> out the main part of pass_fold_builtins::execute into its own function so that
> we don't need to repeat the code for doing the eh cleanup. It also fixes the
> problem I saw with the atomics which might skip over a statement; though I don't
> have a testcase for that.

I'm worried about using this elsewhere, various fab foldings are meant to be
done only in that pass and not earlier.
E.g. the __builtin_constant_p folding, __builtin_assume_aligned, stack
restore, unreachable, va_{start,end,copy}.

Jakub



[PATCH] ada: Fix gcc-interface/misc.cc compilation on SPARC

2024-09-06 Thread Rainer Orth
This patch

commit 72c6938f29cbeddb3220720e68add4cf09ffd794
Author: Eric Botcazou 
Date:   Sun Aug 25 15:20:59 2024 +0200

ada: Streamline handling of low-level peculiarities of record field layout

broke the Ada build on SPARC:

In file included from ./tm_p.h:4,
 from /vol/gcc/src/hg/master/local/gcc/ada/gcc-interface/misc.cc:31:
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc-protos.h:46:47: error: use of enum ‘memmodel’ without previous declaration
   46 | extern void sparc_emit_membar_for_model (enum memmodel, int, int);
  |   ^~~~

Fixed by including memmodel.h.

Bootstrapped without regressions on sparc-sun-solaris2.11 and
i386-pc-solaris2.11.

Ok for trunk?  I guess this is obvious, though.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-09-05  Rainer Orth  

gcc/ada:
* gcc-interface/misc.cc: Include memmodel.h.

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -28,6 +28,7 @@
 #include "coretypes.h"
 #include "target.h"
 #include "tree.h"
+#include "memmodel.h"
 #include "tm_p.h"
 #include "diagnostic.h"
 #include "opts.h"


Re: [PATCH] fab: Cleanup eh after optimize_memcpy [PR116601]

2024-09-06 Thread Richard Biener
On Fri, Sep 6, 2024 at 3:00 AM Andrew Pinski  wrote:
>
> On Thu, Sep 5, 2024 at 12:26 AM Richard Biener
>  wrote:
> >
> > On Thu, Sep 5, 2024 at 8:25 AM Andrew Pinski  
> > wrote:
> > >
> > > When optimize_memcpy was added in r7-5443-g7b45d0dfeb5f85,
> > > a path was added such that a statement was turned into a non-throwing
> > > statement and maybe_clean_or_replace_eh_stmt/gimple_purge_dead_eh_edges
> > > would not be called for that statement.
> > > This adds these calls to that path.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > Ok? For the trunk, 14, 13 and 12 branches?
> >
> > I wonder if this can be somehow integrated better with the existing
> >
> >   old_stmt = stmt;
> >   stmt = gsi_stmt (i);
> >   update_stmt (stmt);
> >
> >   if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)
> >   && gimple_purge_dead_eh_edges (bb))
> > cfg_changed = true;
> >
> > which frankly looks odd - update_stmt shouldn't ever change stmt.  Maybe
> > moving the old_stmt assign before the switch works?
>
> I agree it looks odd/wrong. But only moving the assignment before the
> switch does not fix this issue since if we don't have a builtin (which
> we have in this case, it is a memcpy like statement):
>   __trans_tmp_2 = MEM[(const struct RefitOption &)__val_5(D)];
>
> I have a set of patches to refactor this code to simplify and fix the
> issue with the update_stmt and more (since there are issues with the
> atomic replacements too).

If it gets too big for branches consider the original change approved for
backporting (you might want to start with pushing that to trunk and then
refactoring).  Looking I think the memcpy handling is simply misplaced
in the switch as it does more than all of the rest.

> >
> > > PR tree-optimization/116601
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-ccp.cc (pass_fold_builtins::execute): Cleanup eh
> > > after optimize_memcpy on a mem statement.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * g++.dg/torture/except-2.C: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/testsuite/g++.dg/torture/except-2.C | 18 ++
> > >  gcc/tree-ssa-ccp.cc | 11 +--
> > >  2 files changed, 27 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C
> > >
> > > diff --git a/gcc/testsuite/g++.dg/torture/except-2.C 
> > > b/gcc/testsuite/g++.dg/torture/except-2.C
> > > new file mode 100644
> > > index 000..d896937a118
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/torture/except-2.C
> > > @@ -0,0 +1,18 @@
> > > +// { dg-do compile }
> > > +// { dg-additional-options "-fexceptions -fnon-call-exceptions" }
> > > +// PR tree-optimization/116601
> > > +
> > > +struct RefitOption {
> > > +  char subtype;
> > > +  int string;
> > > +} n;
> > > +void h(RefitOption);
> > > +void k(RefitOption *__val)
> > > +{
> > > +  try {
> > > +*__val = RefitOption{};
> > > +RefitOption __trans_tmp_2 = *__val;
> > > +h(__trans_tmp_2);
> > > +  }
> > > +  catch(...){}
> > > +}
> > > diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> > > index 44711018e0e..3cd385f476b 100644
> > > --- a/gcc/tree-ssa-ccp.cc
> > > +++ b/gcc/tree-ssa-ccp.cc
> > > @@ -4325,8 +4325,15 @@ pass_fold_builtins::execute (function *fun)
> > >if (gimple_code (stmt) != GIMPLE_CALL)
> > > {
> > >   if (gimple_assign_load_p (stmt) && gimple_store_p (stmt))
> > > -   optimize_memcpy (&i, gimple_assign_lhs (stmt),
> > > -gimple_assign_rhs1 (stmt), NULL_TREE);
> > > +   {
> > > + optimize_memcpy (&i, gimple_assign_lhs (stmt),
> > > +  gimple_assign_rhs1 (stmt), NULL_TREE);
> > > + old_stmt = stmt;
> > > + stmt = gsi_stmt (i);
> > > + if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)
> > > + && gimple_purge_dead_eh_edges (bb))
> > > +   cfg_changed = true;
> > > +   }
> > >   gsi_next (&i);
> > >   continue;
> > > }
> > > --
> > > 2.43.0
> > >


Re: [PATCH] gimple ssa: Don't use __builtin_popcount in switch exp transform [PR116616]

2024-09-06 Thread Richard Biener
On Fri, 6 Sep 2024, Jakub Jelinek wrote:

> On Fri, Sep 06, 2024 at 12:18:30AM -0700, Andrew Pinski wrote:
> > You need to do this in an unsigned type. Otherwise you get the wrong
> > answer and also introduce undefined code.
> > So you need to use:
> > tree utype = unsigned_type_for (type);
> > tree tmp3;
> > if (types_compatible_p (type, utype))
> >   tmp3 = op;
> > else
> >   tmp3 = gimple_build (&gsi, false, GSI_NEW_STMT, loc, CONVERT_EXPR, utype, op);
> 
> I think NOP_EXPR is used for these conversions instead.

There is gimple_convert to abstract this.
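For reference, a sketch of what that looks like at a call site (variable names assumed):

```c
/* gimple_convert is a no-op when OP already has type UTYPE; otherwise
   it appends the conversion statement to the sequence.  */
op = gimple_convert (&stmts, loc, utype, op);
```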

Richard.


Re: [PATCH] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for xtheadvector

2024-09-06 Thread Jin Ma
> See the "bug number" section of https://gcc.gnu.org/contribute.html for
> how to refer to a PR correctly, instead of putting an URL here.

I am very sorry for this mistake; thank you for reminding me.  I will make
corrections.

BR
Jin

Re: [PATCH] fab: Factor out the main folding part of pass_fold_builtins::execute [PR116601]

2024-09-06 Thread Richard Biener
On Fri, Sep 6, 2024 at 9:31 AM Jakub Jelinek  wrote:
>
> On Fri, Sep 06, 2024 at 12:21:20AM -0700, Andrew Pinski wrote:
> > This is an alternative patch to fix PR tree-optimization/116601 by factoring
> > out the main part of pass_fold_builtins::execute into its own function so that
> > we don't need to repeat the code for doing the eh cleanup. It also fixes the
> > problem I saw with the atomics which might skip over a statement; though I don't
> > have a testcase for that.
>
> I'm worried about using this elsewhere, various fab foldings are meant to be
> done only in that pass and not earlier.
> E.g. the __builtin_constant_p folding, __builtin_assume_aligned, stack
> restore, unreachable, va_{start,end,copy}.

Maybe we can document this fact better or name the function differently?

>
> Jakub
>


[PATCH v2] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for XTheadVector

2024-09-06 Thread Jin Ma
Since the XTheadVector vsetvli does not support vl as an immediate, we
need to print the register name "zero" instead of the immediate 0 when
outputting the asm.

PR target/116592

gcc/ChangeLog:

* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to "zero".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.

Reported-by: nihui 
---
 gcc/config/riscv/thead.cc |  4 +--
 .../riscv/rvv/xtheadvector/pr116592.c | 36 +++
 2 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2f1d83fbbc7f..707d91076eb5 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -960,11 +960,11 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
  if (strstr (p, "zero,zero"))
return "th.vsetvli\tzero,zero,e%0,%m1";
  else
-   return "th.vsetvli\tzero,%0,e%1,%m2";
+   return "th.vsetvli\tzero,%z0,e%1,%m2";
}
  else
{
- return "th.vsetvli\t%0,%1,e%2,%m3";
+ return "th.vsetvli\t%z0,%z1,e%2,%m3";
}
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
new file mode 100644
index ..937efbfd1b09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zfh_xtheadvector -mabi=ilp32d -O2" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zfh_xtheadvector -mabi=lp64d -O2" { target { rv64 } } } */
+
+#include 
+#include 
+
+static vfloat32m8_t atan2_ps(vfloat32m8_t a, vfloat32m8_t b, size_t vl)
+{
+  float tmpx[vl];
+  float tmpy[vl];
+  __riscv_vse32_v_f32m8(tmpx, a, vl);
+  __riscv_vse32_v_f32m8(tmpy, b, vl);
+  for (size_t i = 0; i < vl; i++)
+  {
+tmpx[i] = atan2(tmpx[i], tmpy[i]);
+  }
+  return __riscv_vle32_v_f32m8(tmpx, vl);
+}
+
+void my_atan2(const float *x, const float *y, float *out, int size)
+{
+  int n = size;
+  while (n > 0)
+  {
+size_t vl = __riscv_vsetvl_e32m8(n);
+vfloat32m8_t _x = __riscv_vle32_v_f32m8(x, vl);
+vfloat32m8_t _y = __riscv_vle32_v_f32m8(y, vl);
+vfloat32m8_t _out = atan2_ps(_x, _y, vl);
+__riscv_vse32_v_f32m8(out, _out, vl);
+n -= vl;
+x += vl;
+y += vl;
+out += vl;
+  }
+}
-- 
2.17.1



Re: [PATCH] fab: Factor out the main folding part of pass_fold_builtins::execute [PR116601]

2024-09-06 Thread Jakub Jelinek
On Fri, Sep 06, 2024 at 09:51:38AM +0200, Richard Biener wrote:
> On Fri, Sep 6, 2024 at 9:31 AM Jakub Jelinek  wrote:
> >
> > On Fri, Sep 06, 2024 at 12:21:20AM -0700, Andrew Pinski wrote:
> > > This is an alternative patch to fix PR tree-optimization/116601 by factoring
> > > out the main part of pass_fold_builtins::execute into its own function so that
> > > we don't need to repeat the code for doing the eh cleanup. It also fixes the
> > > problem I saw with the atomics which might skip over a statement; though I don't
> > > have a testcase for that.
> >
> > I'm worried about using this elsewhere, various fab foldings are meant to be
> > done only in that pass and not earlier.
> > E.g. the __builtin_constant_p folding, __builtin_assume_aligned, stack
> > restore, unreachable, va_{start,end,copy}.
> 
> Maybe we can document this fact better or name the function differently?

Some of it is documented already in the source.
case BUILT_IN_CONSTANT_P:
  /* Resolve __builtin_constant_p.  If it hasn't been
 folded to integer_one_node by now, it's fairly
 certain that the value simply isn't constant.  */
  result = integer_zero_node;
or
case BUILT_IN_VA_START:
case BUILT_IN_VA_END:
case BUILT_IN_VA_COPY:
  /* These shouldn't be folded before pass_stdarg.  */
  result = optimize_stdarg_builtin (stmt);
Obviously, if either is done much earlier, then the former can fold to 1
(e.g. if it is before IPA or shortly after IPA and not all usual propagation
after inlining etc. is done already), or pass_stdarg hasn't been done, etc.

Jakub



Re: [PATCH v2] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for XTheadVector

2024-09-06 Thread 钟居哲
I think it's better to add a "vsetvli" assembly check in testcase.



juzhe.zh...@rivai.ai
 
From: Jin Ma
Date: 2024-09-06 15:52
To: gcc-patches
CC: jeffreyalaw; juzhe.zhong; pan2.li; kito.cheng; christoph.muellner; 
shuizhuyuanluo; pinskia; xry111; jinma.contrib; Jin Ma
Subject: [PATCH v2] RISC-V: Fix illegal operands "th.vsetvli zero,0,e32,m8" for 
XTheadVector
Since the THeadVector vsetvli does not support vl as an immediate, we
need to convert 0 to zero when outputting asm.
 
PR target/116592
 
gcc/ChangeLog:
 
* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to
"zero"
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.
 
Reported-by: nihui 
---
gcc/config/riscv/thead.cc |  4 +--
.../riscv/rvv/xtheadvector/pr116592.c | 36 +++
2 files changed, 38 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
 
diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2f1d83fbbc7f..707d91076eb5 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -960,11 +960,11 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
  if (strstr (p, "zero,zero"))
return "th.vsetvli\tzero,zero,e%0,%m1";
  else
- return "th.vsetvli\tzero,%0,e%1,%m2";
+ return "th.vsetvli\tzero,%z0,e%1,%m2";
}
  else
{
-   return "th.vsetvli\t%0,%1,e%2,%m3";
+   return "th.vsetvli\t%z0,%z1,e%2,%m3";
}
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
new file mode 100644
index ..937efbfd1b09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zfh_xtheadvector -mabi=ilp32d -O2" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zfh_xtheadvector -mabi=lp64d -O2" { target { rv64 } } } */
+
+#include <math.h>
+#include <riscv_vector.h>
+
+static vfloat32m8_t atan2_ps(vfloat32m8_t a, vfloat32m8_t b, size_t vl)
+{
+  float tmpx[vl];
+  float tmpy[vl];
+  __riscv_vse32_v_f32m8(tmpx, a, vl);
+  __riscv_vse32_v_f32m8(tmpy, b, vl);
+  for (size_t i = 0; i < vl; i++)
+  {
+tmpx[i] = atan2(tmpx[i], tmpy[i]);
+  }
+  return __riscv_vle32_v_f32m8(tmpx, vl);
+}
+
+void my_atan2(const float *x, const float *y, float *out, int size)
+{
+  int n = size;
+  while (n > 0)
+  {
+size_t vl = __riscv_vsetvl_e32m8(n);
+vfloat32m8_t _x = __riscv_vle32_v_f32m8(x, vl);
+vfloat32m8_t _y = __riscv_vle32_v_f32m8(y, vl);
+vfloat32m8_t _out = atan2_ps(_x, _y, vl);
+__riscv_vse32_v_f32m8(out, _out, vl);
+n -= vl;
+x += vl;
+y += vl;
+out += vl;
+  }
+}
-- 
2.17.1
 
 


Re: [PATCH] ada: Fix gcc-interface/misc.cc compilation on SPARC

2024-09-06 Thread Eric Botcazou
> commit 72c6938f29cbeddb3220720e68add4cf09ffd794
> Author: Eric Botcazou 
> Date:   Sun Aug 25 15:20:59 2024 +0200
> 
> ada: Streamline handling of low-level peculiarities of record field
> layout
> 
> broke the Ada build on SPARC:
> 
> In file included from ./tm_p.h:4,
>  from
> /vol/gcc/src/hg/master/local/gcc/ada/gcc-interface/misc.cc:31:
> /vol/gcc/src/hg/master/local/gcc/config/sparc/sparc-protos.h:46:47: error:
> use of enum ‘memmodel’ without previous declaration 46 | extern void
> sparc_emit_membar_for_model (enum memmodel, int, int);
>   |   ^~~~
> 
> Fixed by including memmodel.h.

Sorry about that, a small merge glitch, the fix is already in the pipeline.

-- 
Eric Botcazou




Re: [PATCH v2] RISC-V: Fix illegal operands "th.vsetvli zero,0,e32,m8" for XTheadVector

2024-09-06 Thread Jin Ma
> I think it's better to add a "vsetvli" assembly check in testcase.

> juzhe.zh...@rivai.ai

Yeah, apparently I forgot to modify it  :)

Thanks.
Jin


[PATCH v3] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for XTheadVector

2024-09-06 Thread Jin Ma
Since the THeadVector vsetvli does not support vl as an immediate, we
need to convert 0 to zero when outputting asm.

PR target/116592

gcc/ChangeLog:

* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to
"zero"

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.

Reported-by: nihui 
---
 gcc/config/riscv/thead.cc |  4 +--
 .../riscv/rvv/xtheadvector/pr116592.c | 36 +++
 2 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2f1d83fbbc7f..707d91076eb5 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -960,11 +960,11 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
  if (strstr (p, "zero,zero"))
return "th.vsetvli\tzero,zero,e%0,%m1";
  else
-   return "th.vsetvli\tzero,%0,e%1,%m2";
+   return "th.vsetvli\tzero,%z0,e%1,%m2";
}
  else
{
- return "th.vsetvli\t%0,%1,e%2,%m3";
+ return "th.vsetvli\t%z0,%z1,e%2,%m3";
}
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
new file mode 100644
index ..1350f739c42a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
@@ -0,0 +1,36 @@
+/* { dg-do assemble } */
+/* { dg-options "-march=rv32gc_zfh_xtheadvector -mabi=ilp32d -O2" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zfh_xtheadvector -mabi=lp64d -O2" { target { rv64 } } } */
+
+#include <math.h>
+#include <riscv_vector.h>
+
+static vfloat32m8_t atan2_ps(vfloat32m8_t a, vfloat32m8_t b, size_t vl)
+{
+  float tmpx[vl];
+  float tmpy[vl];
+  __riscv_vse32_v_f32m8(tmpx, a, vl);
+  __riscv_vse32_v_f32m8(tmpy, b, vl);
+  for (size_t i = 0; i < vl; i++)
+  {
+tmpx[i] = atan2(tmpx[i], tmpy[i]);
+  }
+  return __riscv_vle32_v_f32m8(tmpx, vl);
+}
+
+void my_atan2(const float *x, const float *y, float *out, int size)
+{
+  int n = size;
+  while (n > 0)
+  {
+size_t vl = __riscv_vsetvl_e32m8(n);
+vfloat32m8_t _x = __riscv_vle32_v_f32m8(x, vl);
+vfloat32m8_t _y = __riscv_vle32_v_f32m8(y, vl);
+vfloat32m8_t _out = atan2_ps(_x, _y, vl);
+__riscv_vse32_v_f32m8(out, _out, vl);
+n -= vl;
+x += vl;
+y += vl;
+out += vl;
+  }
+}
-- 
2.17.1



Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-06 Thread Andrew Stubbs

On 06/09/2024 08:06, Robin Dapp wrote:

>> There were absolutely problems without this. It's a while ago now, so I'm
>> struggling with the details, but as GCC only applies the mask to selected
>> operations there were all sorts of issues that crept in. Zeroing the
>> undefined lanes seemed to match the middle end assumptions (or, at least it
>> made the UB consistent?) or maybe I read it in the code somewhere. Sorry,
>> it's years since I wrote that.


> So we only found two instances of this problem and both were related to
> _Bools.  In case you have more cases, it would be greatly appreciated
> to verify the series with them.  If you don't mind, would it be possible
> to comment out the zeroing, re-run the testsuite and check for FAILs?


I looked it up, and it was an execution failure in testcase 
gfortran.dg/assumed_rank_1.f90 that prompted me to add the initialization.


I believe I observed other cases of this too, but I can't find a list.

It shouldn't be too hard to run the test you suggest, but I won't have 
the results today.



>> This sounds like a generally good plan. Better than just zero it and hope
>> that's right anyway. ;)
>>
>> So, in theory, is it better if amdgcn allows both? Or is that one little
>> move immediate instruction in the backend going to produce better/cleaner
>> middle end code?


> The new predicate is supposed to inform the vectorizer of what it "prefers",
> i.e. the hardware does anyway.  So if amdgcn leaves the inactive elements
> undefined the predicate should only accept undefined as well.
> Once the vectorizer requires zeros (or something other than undefined),
> it will, explicitly, emit a zeroing merge/blend in gimple.  That way the
> zeroing can easily be combined with surrounding code.
>
> Of course amdgcn could also advertise zero and then always force a zero before
> loading as you currently do.  That would be unconditional, though, and the
> combination with surrounding RTL might also be a bit more difficult than when
> it's exposed in gimple already.


OK, good to know, thanks!

Andrew


Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-06 Thread Robin Dapp
> > So we only found two instances of this problem and both were related to
> > _Bools.  In case you have more cases, it would be greatly appreciated
> > to verify the series with them.  If you don't mind, would it be possible
> > to comment out the zeroing, re-run the testsuite and check for FAILs?
>
> I looked it up, and it was an execution failure in testcase 
> gfortran.dg/assumed_rank_1.f90 that prompted me to add the initialization.

Ah, I saw that one as well here.  Thanks, will have a look locally.

-- 
Regards
 Robin



RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-06 Thread Thomas Schwinge
Hi!

On 2024-08-16T15:36:29+, Prathamesh Kulkarni  wrote:
>> > On 13.08.2024 at 17:48, Thomas Schwinge wrote:
>> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni
>>  wrote:
>> >>> From: Thomas Schwinge 
>> >>> Sent: Friday, August 9, 2024 12:55 AM
>> >
>> >>> On 2024-08-08T06:46:25-0700, Andrew Pinski 
>> wrote:
>>  On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
>>   wrote:
>> > compiled with -fopenmp -foffload=nvptx-none now fails with:
>> > gcc: error: unrecognized command-line option '-m64'
>> > nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit
>> >>> status compilation terminated.
>> >>>
>> >>> Heh.  Yeah...
>> >>>
>> > As mentioned in RFC email, this happens because
>> > nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host
>> > compiler
>> >>> depending on whether offload_abi is OFFLOAD_ABI_LP64 or
>> >>> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
>> >>> options.
>> >
>> >>> So, my idea is: instead of the current strategy that the host
>> >>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc.,
>> >>> which the 'mkoffload's then interpret and re-synthesize '-m64' etc.
>> >>> -- how about we instead directly tell the 'mkoffload's the relevant
>> >>> ABI options?  That is, 'TARGET_OFFLOAD_OPTIONS' instead synthesizes
>> >>> '-foffload-abi=-m64'
>> >>> etc., which the 'mkoffload's can then readily use.  Could you please
>> >>> give that a try, and/or does anyone see any issues with that approach?
>> >>>
>> >>> And use something like '-foffload-abi=disable' to replace the current:
>> >>>
>> >>>/* PR libgomp/65099: Currently, we only support offloading in 64-bit
>> >>>   configurations.  */
>> >>>if (offload_abi == OFFLOAD_ABI_LP64)
>> >>>  {
>> >>>
>> >>> (As discussed before, this should be done differently altogether,
>> >>> but that's for another day.)
>> >> Sorry, I don't quite follow. Currently we enable offloading if
>> >> offload_abi == OFFLOAD_ABI_LP64, which is synthesized from
>> >> -foffload-abi=lp64. If we change -foffload-abi to instead specify
>> >> host-specific ABI opts, I guess mkoffload will still need to somehow
>> >> figure out which ABI is used, so it can disable offloading for 32-bit
>> >> ? I suppose we could adjust TARGET_OFFLOAD_OPTIONS for each host to
>> pass -foffload-abi=disable if TARGET_ILP32 is set and offload target
>> is nvptx, but not sure if that'd be correct ?
>> >
>> > Basically, yes.  My idea was that all 'TARGET_OFFLOAD_OPTIONS'
>> > implementations return either the correct host flags to be used by the
>> > 'mkoffload's (the case that offloading is supported for the current
>> > host flags/ABI configuration), or otherwise return '-foffload-abi=disable'.

Oh..., you're right of course: we do need to continue to tell the
'mkoffload's which kind of offload code to generate!  My bad...

>> >> I added another option -foffload-abi-host-opts to specify host abi
>> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit which
>> >> mkoffload can use to enable/disable offloading (as before).
>> >
>> > I'm not sure however, if this additional option is really necessary?
> Well, my concern was if that'd change the behavior for TARGET_ILP32 ?
> IIUC, currently for -foffload-abi=ilp32, mkoffload will create empty C file
> for ptx_cfile_name (instead of munged ptx assembly since offloading will be 
> disabled),
> and pass that to host compiler with -m32 option (in compile_native).
>
> If we change -foffload-abi to specify ABI host opts, and pass 
> -foffload-abi=disable 
> for TARGET_ILP32 in TARGET_OFFLOAD_OPTIONS, mkoffload will no longer be able 
> to
> pass 32-bit ABI opts to host compiler, which may result in linker error (arch 
> mismatch?)
> if the host object files are 32-bit ABI and xnvptx-none.o is 64-bit (assuming 
> the host
> compiler is configured to generate 64-bit code-gen by default) ?
>
> So, I thought to add another option -foffload-abi-host-opts to pass 
> host-specific ABI opts,
> and keep -foffload-abi as-is to infer ABI type for enabling/disabling 
> offloading.

Quite right, yes.

>> -Original Message-
>> From: Richard Biener 
>> Sent: Tuesday, August 13, 2024 10:06 PM

>> Since we do not support 32 -> 64 bit offload

We don't -- but it's generally possible.  As Tobias recently educated
me, the OpenMP specification explicitly does *not* require matching
host 'sizeof (void *)' and device 'sizeof (void *)'.

At the LLVM workshop at ISC High Performance 2024 there was a (short)
presentation of someone who did LLVM offloading from host to a different
architecture, and from there again to a yet different architecture.  Heh!

Anyway:

>> wouldn’t the most
>> pragmatic fix be to recognize -m64 in the nvptx backend (and ignore
>> it)?

> I think nvptx already supports m64 and ignores it.
> From nvptx.opt:
>
> m64
> Target RejectNegative Mask(ABI64)
> Ignored, but preserved for backward compatibility.  Only 64-bit ABI is
> supported.


Re: [PATCH v3 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-09-06 Thread Richard Sandiford
"Kong, Lingling"  writes:
> Hi,
>
> This version has added a new optab named 'cfmovcc'. The new optab is used
> in the middle end to expand to cfcmov. And simplified my patch by trying to
> generate the conditional faulting movcc in noce_try_cmove_arith function.
>
> All the changes passed bootstrap & regtest x86-64-pc-linux-gnu.
> We also tested spec with SDE and passed the runtime test.
>
> Ok for trunk?
>
>
> The APX CFCMOV[1] feature implements conditional faulting: if the
> comparison is false, all memory faults are suppressed when loading or
> storing a memory operand.  This lets a conditional move load or store
> a memory operand that may trap or fault.
>
> In the middle end, we currently don't support a conditional move if we
> know that a load from A or B could trap or fault.  To enable CFCMOV, we
> added a new optab named cfmovcc.
>
> A conditional faulting move used for a conditional memory store does not
> move any arithmetic calculations.  For conditional memory loads we
> currently only support the case of one potentially trapping memory
> operand and one non-trapping, non-memory operand.

Sorry if this is going over old ground (I haven't read the earlier
versions yet), but: instead of adding a new optab, could we treat
CFCMOV as a scalar instance of maskload_optab?  Robin is working on
adding an "else" value for when the condition/mask is false.  After
that, it would seem to be a pretty close match to CFCMOV.

One reason for preferring maskload is that it makes the load an
explicit part of the interface.  We could then potentially use
it in gimple too, not just expand.

Thanks,
Richard

>
>
> [1].https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
>
> gcc/ChangeLog:
>
>* doc/md.texi: Add cfmovcc insn pattern explanation.
>* ifcvt.cc (can_use_cmove_load_mem_notrap): New func
>for conditional faulting movcc for load.
>(can_use_cmove_store_mem_notrap): New func for conditional
>faulting movcc for store.
>(can_use_cfmovcc):  New func for conditional faulting.
>(noce_try_cmove_arith): Try to convert to conditional faulting
>movcc.
>(noce_process_if_block): Ditto.
>* optabs.cc (emit_conditional_move): Handle cfmovcc.
>(emit_conditional_move_1): Ditto.
>* optabs.def (OPTAB_D): New optab.
> ---
> gcc/doc/md.texi |  10 
> gcc/ifcvt.cc| 119 
> gcc/optabs.cc   |  14 +-
> gcc/optabs.def  |   1 +
> 4 files changed, 132 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index a9259112251..5f563787c49 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8591,6 +8591,16 @@ Return 1 if operand 1 is a normal floating point 
> number and 0
> otherwise.  @var{m} is a scalar floating point mode.  Operand 0
> has mode @code{SImode}, and operand 1 has mode @var{m}.
> +@cindex @code{cfmov@var{mode}cc} instruction pattern
> +@item @samp{cfmov@var{mode}cc}
> +Similar to @samp{mov@var{mode}cc} but for conditional faulting.
> +If the comparison is false, all memory faults are suppressed
> +when loading or storing a memory operand.
> +
> +Conditionally move operand 2 or operand 3 into operand 0 according
> +to the comparison in operand 1.  If the comparison is true, operand 2
> +is moved into operand 0, otherwise operand 3 is moved.
> +
> @end table
>  @end ifset
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 6487574c514..59845390607 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -778,6 +778,9 @@ static bool noce_try_store_flag_mask (struct noce_if_info 
> *);
> static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
> rtx, rtx, rtx, rtx = NULL, 
> rtx = NULL);
> static bool noce_try_cmove (struct noce_if_info *);
> +static bool can_use_cmove_load_mem_notrap (rtx, rtx);
> +static bool can_use_cmove_store_mem_notrap (rtx, rtx, rtx, bool);
> +static bool can_use_cfmovcc (struct noce_if_info *);
> static bool noce_try_cmove_arith (struct noce_if_info *);
> static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
> static bool noce_try_minmax (struct noce_if_info *);
> @@ -2132,6 +2135,69 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool 
> simple)
>return true;
> }
> +/* Return TRUE if we could convert "if (test) x = *a; else x = b;"
> +   or "if (test) x = a; else x = *b;" to conditional faulting movcc,
> +   i.e. x86 cfcmov, especially when loading a or b may cause memory faults.  */
> +
> +static bool
> +can_use_cmove_load_mem_notrap (rtx a, rtx b)
> +{
> +  /* Just handle a conditional move from one trap MEM + other non_trap,
> + non mem cases.  */
> +  if (!(MEM_P (a) ^ MEM_P (b)))
> +  return false;
> +  bool a_trap = may_trap_or_fault_p (a);
> +  bool b_trap = may_trap_or_fault_p (b);
> +
> +  if (!(a_trap ^ b_trap))
> +

[PATCH v2] testsuite: Sanitize pacbti test cases for Cortex-M

2024-09-06 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

Changes since v1:

- Corrected changelog entry for pac-15.c
- Added a tab before all the asm instructions in the pac-*.c and bti-*.c tests
- Corrected the expected number of bti instructions for bti-2.c as it 
previously counted the .file directive

--

Some of the test cases were scanning for "bti", but that would,
incorrectly, also match the ".arch_extension pacbti" directive.
Also, keep test cases active if a supported Cortex-M core is supplied.

gcc/testsuite/ChangeLog:

* gcc.target/arm/bti-1.c: Enable for Cortex-M(52|55|85) and
check for asm instructions starting with a tab.
* gcc.target/arm/bti-2.c: Likewise.
* gcc.target/arm/pac-1.c: Check for asm instructions starting
with a tab.
* gcc.target/arm/pac-2.c: Likewise.
* gcc.target/arm/pac-3.c: Likewise.
* gcc.target/arm/pac-6.c: Likewise.
* gcc.target/arm/pac-7.c: Likewise.
* gcc.target/arm/pac-8.c: Likewise.
* gcc.target/arm/pac-9.c: Likewise.
* gcc.target/arm/pac-10.c: Likewise.
* gcc.target/arm/pac-11.c: Likewise.
* gcc.target/arm/pac-sibcall.c: Likewise.
* gcc.target/arm/pac-15.c: Enable for Cortex-M(52|55|85).

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/testsuite/gcc.target/arm/bti-1.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/bti-2.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-1.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-10.c  | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-11.c  | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-15.c  | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-2.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-3.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-4.c   | 2 +-
 gcc/testsuite/gcc.target/arm/pac-6.c   | 6 +++---
 gcc/testsuite/gcc.target/arm/pac-7.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-8.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-9.c   | 4 ++--
 gcc/testsuite/gcc.target/arm/pac-sibcall.c | 2 +-
 14 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bti-1.c 
b/gcc/testsuite/gcc.target/arm/bti-1.c
index 79dd8010d2d..70a62b5a70c 100644
--- a/gcc/testsuite/gcc.target/arm/bti-1.c
+++ b/gcc/testsuite/gcc.target/arm/bti-1.c
@@ -1,6 +1,6 @@
 /* Check that GCC does bti instruction.  */
 /* { dg-do compile } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" } { "-mcpu=cortex-m52*" "-mcpu=cortex-m55*" "-mcpu=cortex-m85*" } } */
 /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp 
-mbranch-protection=bti --save-temps" } */
 
 int
@@ -9,4 +9,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler "bti" } } */
+/* { dg-final { scan-assembler "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/bti-2.c 
b/gcc/testsuite/gcc.target/arm/bti-2.c
index 33910563849..7c901d06967 100644
--- a/gcc/testsuite/gcc.target/arm/bti-2.c
+++ b/gcc/testsuite/gcc.target/arm/bti-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* -Os to create jump table.  */
 /* { dg-options "-Os" } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" } { "-mcpu=cortex-m52*" "-mcpu=cortex-m55*" "-mcpu=cortex-m85*" } } */
 /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp 
-mbranch-protection=bti --save-temps" } */
 
 extern int f1 (void);
@@ -55,4 +55,4 @@ lab2:
   return 2;
 }
 
-/* { dg-final { scan-assembler-times "bti" 15 } } */
+/* { dg-final { scan-assembler-times "\tbti" 14 } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-1.c 
b/gcc/testsuite/gcc.target/arm/pac-1.c
index 9b26f62b65f..e0eea0858e0 100644
--- a/gcc/testsuite/gcc.target/arm/pac-1.c
+++ b/gcc/testsuite/gcc.target/arm/pac-1.c
@@ -6,6 +6,6 @@
 
 #include "pac.h"
 
-/* { dg-final { scan-assembler-times "pac\tip, lr, sp" 2 } } */
-/* { dg-final { scan-assembler-times "aut\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\tpac\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\taut\tip, lr, sp" 2 } } */
 /* { dg-final { scan-assembler-not "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-10.c 
b/gcc/testsuite/gcc.target/arm/pac-10.c
index a794195e8f6..6da8434aeaf 100644
--- a/gcc/testsuite/gcc.target/arm/pac-10.c
+++ b/gcc/testsuite/gcc.target/arm/pac-10.c
@@ -5,6 +5,6 @@
 
 #include "pac.h"
 
-/* { dg-final { scan-assembler "pac\tip, lr, sp" } } */
-/* { dg-final { scan-assembler "aut\tip, lr, sp" } } */
+/* { dg-final { scan-assembler "\tpac\tip, lr, sp" } } */
+/* { dg-final { scan-assembler "\taut\tip, lr, sp" } } */
 /* { dg-final { scan-assembler-not "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-11.c 
b/gcc/testsuite/gcc.target/arm/pac-11.c
index 37ffc93b41b..0bb727c2c80 100644
--- a/gcc/testsuite

Re: [PATCH RFA] libstdc++: avoid __GLIBCXX__ redefinition

2024-09-06 Thread Jonathan Wakely
On Fri, 6 Sept 2024 at 02:47, Jason Merrill  wrote:
>
> On 8/28/24 6:22 AM, Jason Merrill wrote:
> > On 8/28/24 6:09 AM, Jonathan Wakely wrote:
> >> On Wed, 28 Aug 2024 at 10:58, Jason Merrill  wrote:
> >>>
> >>> On 8/28/24 5:55 AM, Jonathan Wakely wrote:
>  On Wed, 28 Aug 2024 at 10:54, Jason Merrill wrote:
> >
> > Tested x86_64-pc-linux-gnu, OK for trunk?
> 
>  Redefining that macro to invalidate PCH is a bit of a hack, but it's
>  what we have for now, so OK for trunk, thanks.
> >>>
> >>> If it's just to invalidate PCH, do we want to #undef instead?
> >>
> >> It might not even be necessary now, since r14-3276-g91315f23ba127e
> >> removed any -include bits/stdc++.h from the flags. I'd need to look
> >> into that though.
> >
> > I suppose it could still find some random other PCH corresponding to the
> > first #include, though that seems very unlikely.
> >
> >>>   Does
> >>> anything care about the actual value of the macro?
> >>
> >> No, I don't think so.
> >>
> >> But #undef would only work if it comes after including
> >> , so we'd need to force an include of that into the
> >> flags.
> >
> > I meant #undef before #define in c++config.h so we get the normal value.
>
> Like so.  Do you prefer this or the original patch?

If this is still sufficient to invalidate the PCH then I prefer this
second version. OK for trunk, thanks.



Re: [PATCH 3/3] Handle non-grouped stores as single-lane SLP

2024-09-06 Thread Richard Biener
On Thu, 5 Sep 2024, Richard Biener wrote:

> The following enables single-lane loop SLP discovery for non-grouped stores
> and adjusts vectorizable_store to properly handle those.
> 
> For gfortran.dg/vect/vect-8.f90 we vectorize one additional loop,
> not running into the "not falling back to strided accesses" bail-out.
> I have not investigated in detail.
> 
> There is a set of i386 target assembler test FAILs,
> gcc.target/i386/pr88531-2[bc].c in particular fail because the
> target cannot identify SLP emulated gathers, see another mail from me.
> Others need adjustment, I've adjusted one with this patch only.
> In particular there are gcc.target/i386/cond_op_fma_*-1.c FAILs
> that are because we no longer fold a VEC_COND_EXPR during the
> region value-numbering we do after vectorization since we
> code-generate a { 0.0, ... } constant in the VEC_COND_EXPR now
> instead of having a separate statement which gets forwarded
> and then triggers folding.  This leads to slightly different
> code generation.  The solution is probably to use gimple_build
> when building stmts or, in this case, directly emit .COND_FMA
> instead of .FMA and a VEC_COND_EXPR.
> 
> gcc.dg/vect/slp-19a.c mixes contiguous 8-lane SLP with a single
> lane contiguous store from one lane of the 8-lane load and we
> expect to use load-lanes for this reason but the heuristic for
> forcing single-lane rediscovery as implemented doesn't trigger
> here as it treats both SLP instances separately.  FAILs on RISC-V
> 
> gcc.dg/vect/slp-19c.c shows we fail to implement an interleaving
> scheme for group_size 12 (by extension using the group_size 3
> scheme to reduce to 4 lanes and then continue with a pow2 scheme
> would work);  we are also not considering load-lanes because of
> the above reason, but aarch64 cannot do ld12.  FAILs on AARCH64
> (load requires three vectors) and x86_64.
> 
> gcc.dg/vect/slp-19c.c FAILs with variable-length vectors because
> of "SLP induction not supported for variable-length vectors".
> 
> gcc.target/aarch64/pr110449.c will FAIL because the (contested)
> optimization in r14-2367-g224fd59b2dc8a5 was only applied to
> loop-vect but not SLP vect.  I'll leave it to target maintainers
> to either XFAIL (the optimization is bad) or remove the test.

I have now pushed this as r15-3509-gd34cda72098867 - there is
fallout (see above), some is analyzed and tracked by bugs
linked from the meta-bug PR116578.

This should finish the larger changes around SLP in the vectorizer.
What's left is making not using SLP fatal - I'll assess the fallout
of doing that next week and I'm going to talk about the whole
topic during GNU Cauldron the weekend after.

The idea is to make it reasonably easy to "go back", so no non-SLP
code is ripped out for GCC 15.  There's stage1 and stage3 time left
to address missing SLP or target features or at least document them
in bugzilla (I've only thoroughly analyzed x86-64).

Richard. 

>   * tree-vect-slp.cc (vect_analyze_slp): Perform single-lane
>   loop SLP discovery for non-grouped stores.  Move check on the root
>   for re-doing SLP analysis with a single lane for load/store-lanes
>   earlier and make sure we are dealing with a grouped access.
>   * tree-vect-stmts.cc (vectorizable_store): Always set
>   vec_num for SLP.
> 
>   * gcc.dg/vect/O3-pr39675-2.c: Adjust expected number of SLP.
>   * gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
>   * gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
>   * gcc.dg/vect/slp-12b.c: Likewise.
>   * gcc.dg/vect/slp-12c.c: Likewise.
>   * gcc.dg/vect/slp-19a.c: Likewise.
>   * gcc.dg/vect/slp-19b.c: Likewise.
>   * gcc.dg/vect/slp-4-big-array.c: Likewise.
>   * gcc.dg/vect/slp-4.c: Likewise.
>   * gcc.dg/vect/slp-5.c: Likewise.
>   * gcc.dg/vect/slp-7.c: Likewise.
>   * gcc.dg/vect/slp-perm-7.c: Likewise.
>   * gcc.dg/vect/slp-37.c: Likewise.
>   * gcc.dg/vect/slp-26.c: RISC-V can now SLP two instances.
>   * gcc.dg/vect/vect-outer-slp-3.c: Disable vectorization of
>   initialization loop.
>   * gcc.dg/vect/slp-reduc-5.c: Likewise.
>   * gcc.dg/vect/no-scevccp-outer-12.c: Un-XFAIL.  SLP can handle
>   inner loop inductions with multiple vector stmt copies.
>   * gfortran.dg/vect/vect-8.f90: Adjust expected number of
>   vectorized loops.
>   * gcc.target/i386/vectorize1.c: Adjust what we scan for.
> ---
>  gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c  |  2 +-
>  .../gcc.dg/vect/fast-math-vect-call-1.c   |  2 +-
>  .../gcc.dg/vect/fast-math-vect-call-2.c   |  2 +-
>  .../gcc.dg/vect/no-scevccp-outer-12.c |  3 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c |  5 ++-
>  gcc/testsuite/gcc.dg/vect/slp-12b.c   |  2 +-
>  gcc/testsuite/gcc.dg/vect/slp-12c.c   |  2 +-
>  gcc/testsuite/gcc.dg/vect/slp-19a.c   |  2 +-
>  gcc/testsuite/gcc.dg/vect/slp-19b.c   |  2 +-
>  gcc/testsuite/gcc.dg/vect/slp-26.c| 

[patch,reload] PR116326: Add #define IN_RELOAD1_CC

2024-09-06 Thread Georg-Johann Lay

The reason for PR116326 is that LRA and reload require different
ELIMINABLE_REGS for a multi-register frame pointer.  As ELIMINABLE_REGS
is used to initialize static const objects, it is not possible to make
ELIMINABLE_REGS to depend on options or patch it in some target hook.

It was also concluded that it is not desirable to adjust reload so that
it behaves like LRA, but a hack like #define IN_RELOAD1_CC at the top
of reload1.cc would be fine, see

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116326#c8

This is an according patch that defines IN_RELOAD1_CC and uses it in
avr.h to define ELIMINABLE_REGS.

This is only required for trunk.

PR116326 occurred for some test case in avr-torture.exp, so I didn't
duplicate the test case.

As it appears, this patch also fixes:

https://gcc.gnu.org/PR116324
https://gcc.gnu.org/PR116325
https://gcc.gnu.org/PR116550

Johann

--

AVR: target/116326 - Adjust ELIMINABLE_REGS to reload resp. LRA.

PR target/116326
gcc/
* reload1.cc (IN_RELOAD1_CC): Define prior to all includes.
* config/avr/avr.h (ELIMINABLE_REGS): Depend on IN_RELOAD1_CC.

diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index 1cf4180e534..d540f2fcb13 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -308,11 +308,20 @@ enum reg_class {
 
 #define STATIC_CHAIN_REGNUM ((AVR_TINY) ? 18 :2)
 
+#ifndef IN_RELOAD1_CC
+#define ELIMINABLE_REGS		\
+  {\
+{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM },   \
+{ ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM },		\
+{ FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM }		\
+  }
+#else
 #define ELIMINABLE_REGS {	\
 { ARG_POINTER_REGNUM, STACK_POINTER_REGNUM },   \
 { ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM },   \
 { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM }, \
 { FRAME_POINTER_REGNUM + 1, STACK_POINTER_REGNUM + 1 } }
+#endif /* In reload1.cc ? */
 
 #define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET)			\
   OFFSET = avr_initial_elimination_offset (FROM, TO)
diff --git a/gcc/reload1.cc b/gcc/reload1.cc
index 2e059b09970..cfa0be24f32 100644
--- a/gcc/reload1.cc
+++ b/gcc/reload1.cc
@@ -17,6 +17,15 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
 
+// PR116326: Reload and LRA use different representations of ELIMINABLE_REGS
+// for a multi-register frame-pointer like it is the case for avr.  Since
+// ELIMINABLE_REGS is used to initialize a static const object, it is not
+// possible to cater for different ELIMINABLE_REGS by means of a command line
+// option like -m[no-]lra.  But with the following macro, we can use #ifdef.
+// This hack was added to reload1.cc because after a complete Reload -> LRA
+// transition, this file will be removed.
+#define IN_RELOAD1_CC
+
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"


[PATCH] RISC-V: Add more vector-vector extract cases.

2024-09-06 Thread Robin Dapp
Hi,

this adds a V16SI -> V4SI and related i.e. "quartering" vector-vector
extract expander for VLS modes.  It helps with unnecessary spills in
x264.

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract):
Add quarter vec-vec extract.
* config/riscv/vector-iterators.md: New iterators.
---
 gcc/config/riscv/autovec.md  |  28 
 gcc/config/riscv/vector-iterators.md | 184 +++
 2 files changed, 212 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a07aa0c26fd..905dcfe2dbc 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1488,6 +1488,34 @@ (define_expand "vec_extract"
   DONE;
 })
 
+(define_expand "vec_extract"
+  [(set (match_operand:0 "nonimmediate_operand")
+ (vec_select:
+   (match_operand:VLS_HAS_QUARTER   1 "register_operand")
+   (parallel
+[(match_operand 2 "immediate_operand")])))]
+  "TARGET_VECTOR"
+{
+  int sz = GET_MODE_NUNITS (mode).to_constant ();
+  int part = INTVAL (operands[2]);
+
+  rtx start = GEN_INT (part * sz);
+  rtx tmp = operands[1];
+
+  if (part != 0)
+{
+  tmp = gen_reg_rtx (mode);
+
+  rtx ops[] = {tmp, operands[1], start};
+  riscv_vector::emit_vlmax_insn
+   (code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode),
+riscv_vector::BINARY_OP, ops);
+}
+
+  emit_move_insn (operands[0], gen_lowpart (mode, tmp));
+  DONE;
+})
+
 ;; -
 ;;  [FP] Binary operations
 ;; -
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f27b89e841b..62195f65170 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4552,3 +4552,187 @@ (define_mode_attr vls_half [
   (V256DF "v128df")
   (V512DF "v256df")
 ])
+
+(define_mode_iterator VLS_HAS_QUARTER [
+  (V4QI "riscv_vector::vls_mode_valid_p (V4QImode)")
+  (V8QI "riscv_vector::vls_mode_valid_p (V8QImode)")
+  (V16QI "riscv_vector::vls_mode_valid_p (V16QImode)")
+  (V4HI "riscv_vector::vls_mode_valid_p (V4HImode)")
+  (V8HI "riscv_vector::vls_mode_valid_p (V8HImode)")
+  (V16HI "riscv_vector::vls_mode_valid_p (V16HImode)")
+  (V4SI "riscv_vector::vls_mode_valid_p (V4SImode)")
+  (V8SI "riscv_vector::vls_mode_valid_p (V8SImode)")
+  (V16SI "riscv_vector::vls_mode_valid_p (V16SImode) && TARGET_MIN_VLEN >= 64")
+  (V4DI "riscv_vector::vls_mode_valid_p (V4DImode) && TARGET_VECTOR_ELEN_64")
+  (V8DI "riscv_vector::vls_mode_valid_p (V8DImode) && TARGET_VECTOR_ELEN_64 && 
TARGET_MIN_VLEN >= 64")
+  (V16DI "riscv_vector::vls_mode_valid_p (V16DImode) && TARGET_VECTOR_ELEN_64 
&& TARGET_MIN_VLEN >= 128")
+  (V4SF "riscv_vector::vls_mode_valid_p (V4SFmode) && 
TARGET_VECTOR_ELEN_FP_32")
+  (V8SF "riscv_vector::vls_mode_valid_p (V8SFmode) && 
TARGET_VECTOR_ELEN_FP_32")
+  (V16SF "riscv_vector::vls_mode_valid_p (V16SFmode) && 
TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 64")
+  (V4DF "riscv_vector::vls_mode_valid_p (V4DFmode) && 
TARGET_VECTOR_ELEN_FP_64")
+  (V8DF "riscv_vector::vls_mode_valid_p (V8DFmode) && TARGET_VECTOR_ELEN_FP_64 
&& TARGET_MIN_VLEN >= 64")
+  (V16DF "riscv_vector::vls_mode_valid_p (V16DFmode) && 
TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
+  (V32QI "riscv_vector::vls_mode_valid_p (V32QImode)")
+  (V64QI "riscv_vector::vls_mode_valid_p (V64QImode) && TARGET_MIN_VLEN >= 64")
+  (V128QI "riscv_vector::vls_mode_valid_p (V128QImode) && TARGET_MIN_VLEN >= 
128")
+  (V256QI "riscv_vector::vls_mode_valid_p (V256QImode) && TARGET_MIN_VLEN >= 
256")
+  (V512QI "riscv_vector::vls_mode_valid_p (V512QImode) && TARGET_MIN_VLEN >= 
512")
+  (V1024QI "riscv_vector::vls_mode_valid_p (V1024QImode) && TARGET_MIN_VLEN >= 
1024")
+  (V2048QI "riscv_vector::vls_mode_valid_p (V2048QImode) && TARGET_MIN_VLEN >= 
2048")
+  (V4096QI "riscv_vector::vls_mode_valid_p (V4096QImode) && TARGET_MIN_VLEN >= 
4096")
+  (V32HI "riscv_vector::vls_mode_valid_p (V32HImode) && TARGET_MIN_VLEN >= 64")
+  (V64HI "riscv_vector::vls_mode_valid_p (V64HImode) && TARGET_MIN_VLEN >= 
128")
+  (V128HI "riscv_vector::vls_mode_valid_p (V128HImode) && TARGET_MIN_VLEN >= 
256")
+  (V256HI "riscv_vector::vls_mode_valid_p (V256HImode) && TARGET_MIN_VLEN >= 
512")
+  (V512HI "riscv_vector::vls_mode_valid_p (V512HImode) && TARGET_MIN_VLEN >= 
1024")
+  (V1024HI "riscv_vector::vls_mode_valid_p (V1024HImode) && TARGET_MIN_VLEN >= 
2048")
+  (V2048HI "riscv_vector::vls_mode_valid_p (V2048HImode) && TARGET_MIN_VLEN >= 
4096")
+  (V32SI "riscv_vector::vls_mode_valid_p (V32SImode) && TARGET_MIN_VLEN >= 
128")
+  (V64SI "riscv_vector::vls_mode_valid_p (V64SImode) && TARGET_MIN_VLEN >= 
256")
+  (V128SI "riscv_vector::vls_mode_valid_p (V128SImode) && TARGET_MIN_VLEN >= 
512")
+  (V256SI "riscv_vector::vls_mode_valid_p (V256S

New Ukrainian PO file for 'gcc' (version 14.2.0)

2024-09-06 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-14.2.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-06 Thread Jin Ma
When converting DF to SI, we generally use "unsigned_fix" rather than
"truncate". Although this makes no difference in general, an unexpected
ICE can occur when precise semantic analysis is required, such as in
function "simplify_const_unary_operation" in simplify-rtx.cc.

gcc/ChangeLog:

* config/riscv/riscv.md: Change "truncate" to "unsigned_fix" for
the Zfa extension on rv32.
---
 gcc/config/riscv/riscv.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 9f94b5aa0232..36d7b333c456 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2627,7 +2627,7 @@ (define_insn "*movdf_softfloat"
 
 (define_insn "movsidf2_low_rv32"
   [(set (match_operand:SI  0 "register_operand" "=  r")
-   (truncate:SI
+   (unsigned_fix:SI
(match_operand:DF 1 "register_operand"  "zmvf")))]
   "TARGET_HARD_FLOAT && !TARGET_64BIT && TARGET_ZFA"
   "fmv.x.w\t%0,%1"
@@ -2638,7 +2638,7 @@ (define_insn "movsidf2_low_rv32"
 
 (define_insn "movsidf2_high_rv32"
   [(set (match_operand:SI  0 "register_operand""=  r")
-   (truncate:SI
+   (unsigned_fix:SI
 (lshiftrt:DF
 (match_operand:DF 1 "register_operand" "zmvf")
 (const_int 32))))]
-- 
2.17.1



Re: [PATCH v1 4/9] aarch64: Exclude symbols using GOT from code models

2024-09-06 Thread Richard Sandiford
Evgeny Karpov  writes:
> Monday, September 2, 2024 5:00 PM
> Richard Sandiford  wrote:
>
>> I think we should instead patch the callers that are using
>> aarch64_symbol_binds_local_p for GOT decisions.  The function itself
>> is checking for a more general property (and one that could be useful
>> in other contexts).
>
> The patch has been refactored to address the review. Thanks!
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e4df70ddedc..8dc10efa629 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -20988,7 +20988,7 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
>   /* With -fPIC non-local symbols use the GOT.  For orthogonality
>  always use the GOT for extern weak symbols.  */
>   if ((flag_pic || SYMBOL_REF_WEAK (x))
> - && !aarch64_symbol_binds_local_p (x))
> + && !aarch64_symbol_binds_local_p (x) && !TARGET_PECOFF)
> return SYMBOL_TINY_GOT;
>
>   /* When we retrieve symbol + offset address, we have to make sure
> @@ -21010,7 +21010,7 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
> case AARCH64_CMODEL_SMALL_PIC:
> case AARCH64_CMODEL_SMALL:
>   if ((flag_pic || SYMBOL_REF_WEAK (x))
> - && !aarch64_symbol_binds_local_p (x))
> + && !aarch64_symbol_binds_local_p (x) && !TARGET_PECOFF)
> return aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC
> ? SYMBOL_SMALL_GOT_28K : SYMBOL_SMALL_GOT_4G;

Sorry for the nits, but: the GCC convention is to put each && on a separate
line when the && chain spans multiple lines.  And I think it makes sense
to test TARGET_PECOFF first:

 if (!TARGET_PECOFF
 && (flag_pic || SYMBOL_REF_WEAK (x))
 && !aarch64_symbol_binds_local_p (x))

Thanks,
Richard


Re: [PATCH] RISC-V: Add more vector-vector extract cases.

2024-09-06 Thread 钟居哲
Thanks. lgtm.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-09-06 17:56
To: gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com
Subject: [PATCH] RISC-V: Add more vector-vector extract cases.
Hi,
 
this adds a V16SI -> V4SI and related i.e. "quartering" vector-vector
extract expander for VLS modes.  It helps with unnecessary spills in
x264.
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (vec_extract):
Add quarter vec-vec extract.
* config/riscv/vector-iterators.md: New iterators.
---
gcc/config/riscv/autovec.md  |  28 
gcc/config/riscv/vector-iterators.md | 184 +++
2 files changed, 212 insertions(+)
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a07aa0c26fd..905dcfe2dbc 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1488,6 +1488,34 @@ (define_expand "vec_extract"
   DONE;
})
+(define_expand "vec_extract"
+  [(set (match_operand: 0 "nonimmediate_operand")
+ (vec_select:
+   (match_operand:VLS_HAS_QUARTER 1 "register_operand")
+   (parallel
+ [(match_operand 2 "immediate_operand")])))]
+  "TARGET_VECTOR"
+{
+  int sz = GET_MODE_NUNITS (mode).to_constant ();
+  int part = INTVAL (operands[2]);
+
+  rtx start = GEN_INT (part * sz);
+  rtx tmp = operands[1];
+
+  if (part != 0)
+{
+  tmp = gen_reg_rtx (mode);
+
+  rtx ops[] = {tmp, operands[1], start};
+  riscv_vector::emit_vlmax_insn
+ (code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode),
+ riscv_vector::BINARY_OP, ops);
+}
+
+  emit_move_insn (operands[0], gen_lowpart (mode, tmp));
+  DONE;
+})
+
;; -
;;  [FP] Binary operations
;; -
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f27b89e841b..62195f65170 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4552,3 +4552,187 @@ (define_mode_attr vls_half [
   (V256DF "v128df")
   (V512DF "v256df")
])
+
+(define_mode_iterator VLS_HAS_QUARTER [
+  (V4QI "riscv_vector::vls_mode_valid_p (V4QImode)")
+  (V8QI "riscv_vector::vls_mode_valid_p (V8QImode)")
+  (V16QI "riscv_vector::vls_mode_valid_p (V16QImode)")
+  (V4HI "riscv_vector::vls_mode_valid_p (V4HImode)")
+  (V8HI "riscv_vector::vls_mode_valid_p (V8HImode)")
+  (V16HI "riscv_vector::vls_mode_valid_p (V16HImode)")
+  (V4SI "riscv_vector::vls_mode_valid_p (V4SImode)")
+  (V8SI "riscv_vector::vls_mode_valid_p (V8SImode)")
+  (V16SI "riscv_vector::vls_mode_valid_p (V16SImode) && TARGET_MIN_VLEN >= 64")
+  (V4DI "riscv_vector::vls_mode_valid_p (V4DImode) && TARGET_VECTOR_ELEN_64")
+  (V8DI "riscv_vector::vls_mode_valid_p (V8DImode) && TARGET_VECTOR_ELEN_64 && 
TARGET_MIN_VLEN >= 64")
+  (V16DI "riscv_vector::vls_mode_valid_p (V16DImode) && TARGET_VECTOR_ELEN_64 
&& TARGET_MIN_VLEN >= 128")
+  (V4SF "riscv_vector::vls_mode_valid_p (V4SFmode) && 
TARGET_VECTOR_ELEN_FP_32")
+  (V8SF "riscv_vector::vls_mode_valid_p (V8SFmode) && 
TARGET_VECTOR_ELEN_FP_32")
+  (V16SF "riscv_vector::vls_mode_valid_p (V16SFmode) && 
TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 64")
+  (V4DF "riscv_vector::vls_mode_valid_p (V4DFmode) && 
TARGET_VECTOR_ELEN_FP_64")
+  (V8DF "riscv_vector::vls_mode_valid_p (V8DFmode) && TARGET_VECTOR_ELEN_FP_64 
&& TARGET_MIN_VLEN >= 64")
+  (V16DF "riscv_vector::vls_mode_valid_p (V16DFmode) && 
TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
+  (V32QI "riscv_vector::vls_mode_valid_p (V32QImode)")
+  (V64QI "riscv_vector::vls_mode_valid_p (V64QImode) && TARGET_MIN_VLEN >= 64")
+  (V128QI "riscv_vector::vls_mode_valid_p (V128QImode) && TARGET_MIN_VLEN >= 
128")
+  (V256QI "riscv_vector::vls_mode_valid_p (V256QImode) && TARGET_MIN_VLEN >= 
256")
+  (V512QI "riscv_vector::vls_mode_valid_p (V512QImode) && TARGET_MIN_VLEN >= 
512")
+  (V1024QI "riscv_vector::vls_mode_valid_p (V1024QImode) && TARGET_MIN_VLEN >= 
1024")
+  (V2048QI "riscv_vector::vls_mode_valid_p (V2048QImode) && TARGET_MIN_VLEN >= 
2048")
+  (V4096QI "riscv_vector::vls_mode_valid_p (V4096QImode) && TARGET_MIN_VLEN >= 
4096")
+  (V32HI "riscv_vector::vls_mode_valid_p (V32HImode) && TARGET_MIN_VLEN >= 64")
+  (V64HI "riscv_vector::vls_mode_valid_p (V64HImode) && TARGET_MIN_VLEN >= 
128")
+  (V128HI "riscv_vector::vls_mode_valid_p (V128HImode) && TARGET_MIN_VLEN >= 
256")
+  (V256HI "riscv_vector::vls_mode_valid_p (V256HImode) && TARGET_MIN_VLEN >= 
512")
+  (V512HI "riscv_vector::vls_mode_valid_p (V512HImode) && TARGET_MIN_VLEN >= 
1024")
+  (V1024HI "riscv_vector::vls_mode_valid_p (V1024HImode) && TARGET_MIN_VLEN >= 
2048")
+  (V2048HI "riscv_vector::vls_mode_valid_p (V2048HImode) && TARGET_MIN_VLEN >= 
4096")
+  (V32SI "riscv_vector::vls_mode_valid_p (V32SImode) && TARGET_MIN_VLEN >= 
128

Re: [PATCH v3] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for XTheadVector

2024-09-06 Thread 钟居哲
Sorry, I still don't see assembly check.



juzhe.zh...@rivai.ai
 
From: Jin Ma
Date: 2024-09-06 16:32
To: gcc-patches
CC: jeffreyalaw; juzhe.zhong; pan2.li; kito.cheng; christoph.muellner; 
shuizhuyuanluo; pinskia; xry111; jinma.contrib; Jin Ma
Subject: [PATCH v3] RISC-V: Fix illegal operands "th.vsetvli zero,0,e32,m8" for 
XTheadVector
Since the XTheadVector vsetvli does not support vl as an immediate, we
need to convert a constant 0 to "zero" when outputting asm.
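
Concretely, a sketch of the intended change in the emitted assembly
(operand values are illustrative):

```asm
# before: constant 0 printed literally -- rejected by the assembler
th.vsetvli      zero,0,e32,m8
# after: the %z modifier prints a constant-0 operand as the zero register
th.vsetvli      zero,zero,e32,m8
```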
 
PR target/116592
 
gcc/ChangeLog:
 
* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to
"zero".
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.
 
Reported-by: nihui 
---
gcc/config/riscv/thead.cc |  4 +--
.../riscv/rvv/xtheadvector/pr116592.c | 36 +++
2 files changed, 38 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
 
diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2f1d83fbbc7f..707d91076eb5 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -960,11 +960,11 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
  if (strstr (p, "zero,zero"))
return "th.vsetvli\tzero,zero,e%0,%m1";
  else
- return "th.vsetvli\tzero,%0,e%1,%m2";
+ return "th.vsetvli\tzero,%z0,e%1,%m2";
}
  else
{
-   return "th.vsetvli\t%0,%1,e%2,%m3";
+   return "th.vsetvli\t%z0,%z1,e%2,%m3";
}
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
new file mode 100644
index ..1350f739c42a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
@@ -0,0 +1,36 @@
+/* { dg-do assemble } */
+/* { dg-options "-march=rv32gc_zfh_xtheadvector -mabi=ilp32d -O2" { target { 
rv32 } } } */
+/* { dg-options "-march=rv64gc_zfh_xtheadvector -mabi=lp64d -O2" { target { 
rv64 } } } */
+
+#include 
+#include 
+
+static vfloat32m8_t atan2_ps(vfloat32m8_t a, vfloat32m8_t b, size_t vl)
+{
+  float tmpx[vl];
+  float tmpy[vl];
+  __riscv_vse32_v_f32m8(tmpx, a, vl);
+  __riscv_vse32_v_f32m8(tmpy, b, vl);
+  for (size_t i = 0; i < vl; i++)
+  {
+tmpx[i] = atan2(tmpx[i], tmpy[i]);
+  }
+  return __riscv_vle32_v_f32m8(tmpx, vl);
+}
+
+void my_atan2(const float *x, const float *y, float *out, int size)
+{
+  int n = size;
+  while (n > 0)
+  {
+size_t vl = __riscv_vsetvl_e32m8(n);
+vfloat32m8_t _x = __riscv_vle32_v_f32m8(x, vl);
+vfloat32m8_t _y = __riscv_vle32_v_f32m8(y, vl);
+vfloat32m8_t _out = atan2_ps(_x, _y, vl);
+__riscv_vse32_v_f32m8(out, _out, vl);
+n -= vl;
+x += vl;
+y += vl;
+out += vl;
+  }
+}
-- 
2.17.1
 
 


Re: [PATCH] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-06 Thread Robin Dapp
> When converting DF to SI, we generally use "unsigned_fix" rather than
> "truncate". Although this makes no difference in general, an unexpected
> ICE can occur when precise semantic analysis is required, such as in
> function "simplify_const_unary_operation" in simplify-rtx.cc.

Do you have a test case for this or does it fail already in the test suite?

-- 
Regards
 Robin



Re: [PATCH v2] testsuite: Sanitize pacbti test cases for Cortex-M

2024-09-06 Thread Christophe Lyon




On 9/6/24 11:17, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-14?

Changes since v1:

- Corrected changelog entry for pac-15.c
- Added a tab before all the asm instructions in the pac-*.c and bti-*.c tests
- Corrected the expected number of bti instructions for bti-2.c as it 
previously counted the .file directive


Thanks, this LGTM, but you'll have to wait for approval from a maintainer.

Christophe



--

Some of the test cases were scanning for "bti", but it would,
incorrectly, match the ".arch_extension pacbti" directive.
Also, keep test cases active if a supported Cortex-M core is supplied.

gcc/testsuite/ChangeLog:

* gcc.target/arm/bti-1.c: Enable for Cortex-M(52|55|85) and
check for asm instructions starting with a tab.
* gcc.target/arm/bti-2.c: Likewise.
* gcc.target/arm/pac-1.c: Check for asm instructions starting
with a tab.
* gcc.target/arm/pac-2.c: Likewise.
* gcc.target/arm/pac-3.c: Likewise.
* gcc.target/arm/pac-6.c: Likewise.
* gcc.target/arm/pac-7.c: Likewise.
* gcc.target/arm/pac-8.c: Likewise.
* gcc.target/arm/pac-9.c: Likewise.
* gcc.target/arm/pac-10.c: Likewise.
* gcc.target/arm/pac-11.c: Likewise.
* gcc.target/arm/pac-sibcall.c: Likewise.
* gcc.target/arm/pac-15.c: Enable for Cortex-M(52|55|85).

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
  gcc/testsuite/gcc.target/arm/bti-1.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/bti-2.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-1.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-10.c  | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-11.c  | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-15.c  | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-2.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-3.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-4.c   | 2 +-
  gcc/testsuite/gcc.target/arm/pac-6.c   | 6 +++---
  gcc/testsuite/gcc.target/arm/pac-7.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-8.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-9.c   | 4 ++--
  gcc/testsuite/gcc.target/arm/pac-sibcall.c | 2 +-
  14 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bti-1.c 
b/gcc/testsuite/gcc.target/arm/bti-1.c
index 79dd8010d2d..70a62b5a70c 100644
--- a/gcc/testsuite/gcc.target/arm/bti-1.c
+++ b/gcc/testsuite/gcc.target/arm/bti-1.c
@@ -1,6 +1,6 @@
  /* Check that GCC does bti instruction.  */
  /* { dg-do compile } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } { 
"-mcpu=cortex-m52*" "-mcpu=cortex-m55*" "-mcpu=cortex-m85*" } } */
  /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp 
-mbranch-protection=bti --save-temps" } */
  
  int

@@ -9,4 +9,4 @@ main (void)
return 0;
  }
  
-/* { dg-final { scan-assembler "bti" } } */

+/* { dg-final { scan-assembler "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/bti-2.c 
b/gcc/testsuite/gcc.target/arm/bti-2.c
index 33910563849..7c901d06967 100644
--- a/gcc/testsuite/gcc.target/arm/bti-2.c
+++ b/gcc/testsuite/gcc.target/arm/bti-2.c
@@ -1,7 +1,7 @@
  /* { dg-do compile } */
  /* -Os to create jump table.  */
  /* { dg-options "-Os" } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" 
"-mcpu=*" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } { 
"-mcpu=cortex-m52*" "-mcpu=cortex-m55*" "-mcpu=cortex-m85*" } } */
  /* { dg-options "-march=armv8.1-m.main -mthumb -mfloat-abi=softfp 
-mbranch-protection=bti --save-temps" } */
  
  extern int f1 (void);

@@ -55,4 +55,4 @@ lab2:
return 2;
  }
  
-/* { dg-final { scan-assembler-times "bti" 15 } } */

+/* { dg-final { scan-assembler-times "\tbti" 14 } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-1.c 
b/gcc/testsuite/gcc.target/arm/pac-1.c
index 9b26f62b65f..e0eea0858e0 100644
--- a/gcc/testsuite/gcc.target/arm/pac-1.c
+++ b/gcc/testsuite/gcc.target/arm/pac-1.c
@@ -6,6 +6,6 @@
  
  #include "pac.h"
  
-/* { dg-final { scan-assembler-times "pac\tip, lr, sp" 2 } } */

-/* { dg-final { scan-assembler-times "aut\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\tpac\tip, lr, sp" 2 } } */
+/* { dg-final { scan-assembler-times "\taut\tip, lr, sp" 2 } } */
  /* { dg-final { scan-assembler-not "\tbti" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pac-10.c 
b/gcc/testsuite/gcc.target/arm/pac-10.c
index a794195e8f6..6da8434aeaf 100644
--- a/gcc/testsuite/gcc.target/arm/pac-10.c
+++ b/gcc/testsuite/gcc.target/arm/pac-10.c
@@ -5,6 +5,6 @@
  
  #include "pac.h"
  
-/* { dg-final { scan-assembler "pac\tip, lr, sp" } } */

-/* { dg-final { scan-assembler "aut\tip, lr, sp" } } */
+/* { dg-final { scan-assembler "\tpac\tip, lr, sp" } } */
+/* { dg-final { scan-assembler "\taut\tip, lr, sp" } } */
  /* { dg-final {

[PATCH] c++: Properly mangle CONST_DECL without a INTEGER_CST value [PR116511]

2024-09-06 Thread Simon Martin
We ICE upon the following *valid* code when mangling the requires
clause

=== cut here ===
template  struct s1 {
  enum { e1 = 1 };
};
template  struct s2 {
  enum { e1 = s1::e1 };
  s2() requires(0 != e1) {}
};
s2<8> a;
=== cut here ===

The problem is that the mangler wrongly assumes that the DECL_INITIAL of
a CONST_DECL is always an INTEGER_CST, and blindly passes it to
write_integer_cst.

I assume we should be able to actually compute the value of e1 and use
it when mangling, however from my investigation, it seems to be a pretty
involved change.

What's clear, however, is that we should not try to write a non-literal as
a literal. This patch adds a utility function to determine whether a
tree is a literal as per the definition in the ABI, and uses it to only
call write_template_arg_literal when we actually have a literal in hand.

Note that I had to change the expectation of an existing test, that was
expecting "[...]void (AF::*)(){}[...]" and now gets an equivalent
"[...](void (AF::*)())0[...]" (and FWIW is what clang and icx give; see
https://godbolt.org/z/hnjdeKEhW).

Successfully tested on x86_64-pc-linux-gnu.

PR c++/116511

gcc/cp/ChangeLog:

* mangle.cc (literal_p): New.
(write_expression): Only call write_template_arg_literal for
expressions with literal_p.
(write_template_arg): Likewise.
(write_template_arg_literal): Assert literal_p.

gcc/testsuite/ChangeLog:

* g++.dg/abi/mangle72.C: Adjust test expectation.
* g++.dg/abi/mangle80.C: New test.

---
 gcc/cp/mangle.cc| 33 -
 gcc/testsuite/g++.dg/abi/mangle72.C |  2 +-
 gcc/testsuite/g++.dg/abi/mangle80.C | 13 
 3 files changed, 42 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle80.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 46dc6923add..8279c3fe177 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -223,6 +223,7 @@ static void write_method_parms (tree, const int, const 
tree);
 static void write_class_enum_type (const tree);
 static void write_template_args (tree, tree = NULL_TREE);
 static void write_expression (tree);
+static bool literal_p (const tree);
 static void write_template_arg_literal (const tree);
 static void write_template_arg (tree);
 static void write_template_template_arg (const tree);
@@ -3397,8 +3398,7 @@ write_expression (tree expr)
   || code == TEMPLATE_PARM_INDEX)
 write_template_param (expr);
   /* Handle literals.  */
-  else if (TREE_CODE_CLASS (code) == tcc_constant
-  || code == CONST_DECL)
+  else if (literal_p (expr))
 write_template_arg_literal (expr);
   else if (code == EXCESS_PRECISION_EXPR
   && TREE_CODE (TREE_OPERAND (expr, 0)) == REAL_CST)
@@ -3946,6 +3946,29 @@ write_expression (tree expr)
 }
 }
 
+/* Determine whether T is a literal per section 5.1.6.1 of the CXX ABI.  */
+
+static bool
+literal_p (const tree t)
+{
+  if ((TREE_TYPE (t) && NULLPTR_TYPE_P (TREE_TYPE (t)))
+  || null_member_pointer_value_p (t))
+return true;
+  else
+switch (TREE_CODE (t))
+  {
+  case CONST_DECL:
+   return literal_p (DECL_INITIAL (t));
+  case INTEGER_CST:
+  case REAL_CST:
+  case STRING_CST:
+  case COMPLEX_CST:
+   return true;
+  default:
+   return false;
+  }
+}
+
 /* Literal subcase of non-terminal .
 
  "Literal arguments, e.g. "A<42L>", are encoded with their type
@@ -3956,6 +3979,8 @@ write_expression (tree expr)
 static void
 write_template_arg_literal (const tree value)
 {
+  gcc_assert (literal_p (value));
+
   if (TREE_CODE (value) == STRING_CST)
 /* Temporarily mangle strings as braced initializer lists.  */
 write_string ("tl");
@@ -4113,9 +4138,7 @@ write_template_arg (tree node)
   else if (code == TEMPLATE_DECL)
 /* A template appearing as a template arg is a template template arg.  */
 write_template_template_arg (node);
-  else if ((TREE_CODE_CLASS (code) == tcc_constant && code != PTRMEM_CST)
-  || code == CONST_DECL
-  || null_member_pointer_value_p (node))
+  else if (literal_p (node))
 write_template_arg_literal (node);
   else if (code == EXCESS_PRECISION_EXPR
   && TREE_CODE (TREE_OPERAND (node, 0)) == REAL_CST)
diff --git a/gcc/testsuite/g++.dg/abi/mangle72.C 
b/gcc/testsuite/g++.dg/abi/mangle72.C
index 9581451c25d..fd7d6cb51ad 100644
--- a/gcc/testsuite/g++.dg/abi/mangle72.C
+++ b/gcc/testsuite/g++.dg/abi/mangle72.C
@@ -89,7 +89,7 @@ void k00 (F) { }
 // { dg-final { scan-assembler "_Z3k001FIXtl1DEEE" } }
 
 void k0x (F) { }
-// { dg-final { scan-assembler 
"_Z3k0x1FIXtl1DtlA2_M2AFFvvEtlS3_EtlS3_adL_ZNS1_1fEvEE" } }
+// { dg-final { scan-assembler 
"_Z3k0x1FIXtl1DtlA2_M2AFFvvELS3_0EtlS3_adL_ZNS1_1fEvEE" } }
 
 void kx_ (F) { }
 // { dg-final { scan-assembler 
"_Z3kx_1FIXtl1DtlA2_M2AFFvvEtlS3_adL_ZNS1_1fEvEE" } }
diff --git a/gcc/testsuite/g++.dg/abi/mangle80.C 
b/gcc/testsuite

Re: [PATCH] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-06 Thread Jin Ma
> Do you have a test case for this or does it fail already in the test suite?
>
> -- 
> Regards
>  Robin

Sorry, I'll try to write it.

BR
Jin

[PATCH v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

2024-09-06 Thread Jin Ma
When converting DF to SI, we generally use "unsigned_fix" rather than
"truncate". Although this makes no difference in general, an unexpected
ICE can occur when precise semantic analysis is required.

gcc/ChangeLog:

* config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
the Zfa extension on rv32.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
---
 gcc/config/riscv/riscv.md| 4 ++--
 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 9f94b5aa0232..36d7b333c456 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2627,7 +2627,7 @@ (define_insn "*movdf_softfloat"
 
 (define_insn "movsidf2_low_rv32"
   [(set (match_operand:SI  0 "register_operand" "=  r")
-   (truncate:SI
+   (unsigned_fix:SI
(match_operand:DF 1 "register_operand"  "zmvf")))]
   "TARGET_HARD_FLOAT && !TARGET_64BIT && TARGET_ZFA"
   "fmv.x.w\t%0,%1"
@@ -2638,7 +2638,7 @@ (define_insn "movsidf2_low_rv32"
 
 (define_insn "movsidf2_high_rv32"
   [(set (match_operand:SI  0 "register_operand""=  r")
-   (truncate:SI
+   (unsigned_fix:SI
 (lshiftrt:DF
 (match_operand:DF 1 "register_operand" "zmvf")
 (const_int 32))))]
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c 
b/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c
new file mode 100644
index ..e00047b09e3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c
@@ -0,0 +1,9 @@
+/* Test that we do not ICE at compile time.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zfa -mabi=ilp32d -O2 -g" } */
+
+unsigned int
+foo (double a) {
+  unsigned int tt = *(unsigned long long *)&a & 0x;
+  return tt;
+}
-- 
2.17.1



[PATCH v2 0/9] SMALL code model fixes, optimization fixes, LTO and minimal C++ enablement

2024-09-06 Thread Evgeny Karpov
Hello,

Thank you for reviewing v1!

v2 Changes:
- Add extra comments and extend patch descriptions.
- Extract libstdc++ changes to a separate patch.
- Minor style refactoring based on the reviews.
- Unify mingw_pe_declare_type for functions and objects.

Regards,
Evgeny

Evgeny Karpov (9):
  Support weak references
  aarch64: Add debugging information
  aarch64: Add minimal C++ support
  aarch64: Exclude symbols using GOT from code models
  aarch64: Multiple adjustments to support the SMALL code model
correctly
  aarch64: Use symbols without offset to prevent relocation issues
  aarch64: Disable the anchors
  Add LTO support
  aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

 gcc/config.gcc|  1 +
 gcc/config/aarch64/aarch64-coff.h | 32 +++---
 gcc/config/aarch64/aarch64.cc | 43 ---
 gcc/config/aarch64/cygming.h  | 69 +--
 gcc/config/i386/cygming.h | 16 +++
 gcc/config/i386/i386-protos.h |  2 -
 gcc/config/mingw/winnt-dll.cc |  4 +-
 gcc/config/mingw/winnt.cc | 33 ++-
 gcc/config/mingw/winnt.h  |  7 ++--
 libiberty/simple-object-coff.c|  4 +-
 10 files changed, 158 insertions(+), 53 deletions(-)

-- 
2.34.1



[PATCH v2 1/9] Support weak references

2024-09-06 Thread Evgeny Karpov
The patch adds support for weak references. The original MinGW
implementation targets ix86, which handles weak symbols differently
compared to AArch64. In AArch64, the weak symbols are replaced by
other symbols which reference the original weak symbols, and the
compiler does not track the original symbol names.
This patch resolves this and declares the original symbols.

Here is an explanation of why this change is needed and what the
difference is between x86_64-w64-mingw32 and aarch64-w64-mingw32.

The way x86_64 calls a weak function:
call  weak_fn2

GCC emits the call and creates the required definitions at the end
of the assembly:

.weak weak_fn2
.def  weak_fn2;   .scl  2;.type 32;   .endef

This is different from aarch64:

weak_fn2 will be legitimized and replaced by .refptr.weak_fn2,
and there will be no other references to weak_fn2 in the code.

adrp  x0, .refptr.weak_fn2
add   x0, x0, :lo12:.refptr.weak_fn2
ldr   x0, [x0]
blr   x0

GCC does not emit the required definitions at the end of the assembly,
and weak_fn2 is tracked only by the mingw stub symbol.

Without the change, the stub definition will emit:

.section  .rdata$.refptr.weak_fn2, "dr"
.globl  .refptr.weak_fn2
.linkonce discard
.refptr.weak_fn2:
.quad   weak_fn2

which is not enough. This fix will emit the required definitions:

.weak   weak_fn2
.defweak_fn2;   .scl  2;.type 32;   .endef
.section  .rdata$.refptr.weak_fn2, "dr"
.globl  .refptr.weak_fn2
.linkonce discard
.refptr.weak_fn2:
.quad   weak_fn2

gcc/ChangeLog:

* config/aarch64/cygming.h (SUB_TARGET_RECORD_STUB): Request
declaration for weak symbols.
(PE_COFF_LEGITIMIZE_EXTERN_DECL): Legitimize external
declaration for weak symbols.
* config/i386/cygming.h (SUB_TARGET_RECORD_STUB): Update
declarations in ix86 with the same functionality.
(PE_COFF_LEGITIMIZE_EXTERN_DECL): Likewise.
* config/mingw/winnt-dll.cc (legitimize_pe_coff_symbol):
Support declaration for weak symbols if requested.
* config/mingw/winnt.cc (struct stub_list): Likewise.
(mingw_pe_record_stub): Likewise.
(mingw_pe_file_end): Likewise.
* config/mingw/winnt.h (mingw_pe_record_stub): Likewise.
---
 gcc/config/aarch64/cygming.h  |  6 --
 gcc/config/i386/cygming.h |  4 ++--
 gcc/config/mingw/winnt-dll.cc |  4 ++--
 gcc/config/mingw/winnt.cc | 13 -
 gcc/config/mingw/winnt.h  |  2 +-
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 9ce140a356f..bd6078023e3 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -171,7 +171,8 @@ still needed for compilation.  */
 mingw_handle_selectany_attribute, NULL }
 
 #undef SUB_TARGET_RECORD_STUB
-#define SUB_TARGET_RECORD_STUB mingw_pe_record_stub
+#define SUB_TARGET_RECORD_STUB(NAME, DECL) mingw_pe_record_stub((NAME), \
+  DECL_WEAK ((DECL)))
 
 #define SUPPORTS_ONE_ONLY 1
 
@@ -186,7 +187,8 @@ still needed for compilation.  */
 #undef GOT_ALIAS_SET
 #define GOT_ALIAS_SET mingw_GOT_alias_set ()
 
-#define PE_COFF_LEGITIMIZE_EXTERN_DECL 1
+#define PE_COFF_LEGITIMIZE_EXTERN_DECL(RTX) \
+  (GET_CODE (RTX) == SYMBOL_REF && SYMBOL_REF_WEAK (RTX))
 
 #define HAVE_64BIT_POINTERS 1
 
diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 9c8c7e33cc2..1633017eff6 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -461,7 +461,7 @@ do {\
 #define TARGET_ASM_ASSEMBLE_VISIBILITY i386_pe_assemble_visibility
 
 #undef SUB_TARGET_RECORD_STUB
-#define SUB_TARGET_RECORD_STUB mingw_pe_record_stub
+#define SUB_TARGET_RECORD_STUB(NAME, DECL) mingw_pe_record_stub((NAME), 0)
 
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
@@ -470,7 +470,7 @@ do {\
 # define HAVE_GAS_ALIGNED_COMM 0
 #endif
 
-#define PE_COFF_LEGITIMIZE_EXTERN_DECL \
+#define PE_COFF_LEGITIMIZE_EXTERN_DECL(RTX) \
   (ix86_cmodel == CM_LARGE_PIC || ix86_cmodel == CM_MEDIUM_PIC)
 
 #define HAVE_64BIT_POINTERS TARGET_64BIT_DEFAULT
diff --git a/gcc/config/mingw/winnt-dll.cc b/gcc/config/mingw/winnt-dll.cc
index f74495b7fda..eb7cff7a593 100644
--- a/gcc/config/mingw/winnt-dll.cc
+++ b/gcc/config/mingw/winnt-dll.cc
@@ -134,7 +134,7 @@ get_dllimport_decl (tree decl, bool beimport)
 {
   SYMBOL_REF_FLAGS (rtl) |= SYMBOL_FLAG_EXTERNAL;
 #ifdef SUB_TARGET_RECORD_STUB
-  SUB_TARGET_RECORD_STUB (name);
+  SUB_TARGET_RECORD_STUB (name, decl);
 #endif
 }
 
@@ -206,7 +206,7 @@ legitimize_pe_coff_symbol (rtx addr, bool inreg)
}
 }
 
-  if (!PE_COFF_LEGITIMIZE_EXTERN_DECL)
+  if (!PE_COFF_LEGITIMIZE_EXTERN_DECL (addr))
 return NULL_RTX;
 
   if (GET_CODE (addr) == SYMBOL_REF
diff --git a/gcc/config/mingw/winnt.cc b/gcc/con

[PATCH v2 2/9] aarch64: Add debugging information

2024-09-06 Thread Evgeny Karpov
This patch enables DWARF and allows compilation with debugging
information by using "gcc -g". The unwind info is disabled for
the moment and will be revisited after SEH implementation for
the target.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (TARGET_ASM_UNALIGNED_HI_OP):
Enable DWARF.
(TARGET_ASM_UNALIGNED_SI_OP): Likewise.
(TARGET_ASM_UNALIGNED_DI_OP): Likewise.
* config/aarch64/cygming.h (DWARF2_DEBUGGING_INFO): Likewise.
(PREFERRED_DEBUGGING_TYPE): Likewise.
(DWARF2_UNWIND_INFO): Likewise.
(ASM_OUTPUT_DWARF_OFFSET): Likewise.
---
 gcc/config/aarch64/aarch64.cc |  9 
 gcc/config/aarch64/cygming.h  | 39 ++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index bfd7bcdef7c..e4df70ddedc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -30588,6 +30588,15 @@ aarch64_run_selftests (void)
 #undef TARGET_ASM_ALIGNED_SI_OP
 #define TARGET_ASM_ALIGNED_SI_OP "\t.word\t"
 
+#if TARGET_PECOFF
+#undef TARGET_ASM_UNALIGNED_HI_OP
+#define TARGET_ASM_UNALIGNED_HI_OP TARGET_ASM_ALIGNED_HI_OP
+#undef TARGET_ASM_UNALIGNED_SI_OP
+#define TARGET_ASM_UNALIGNED_SI_OP TARGET_ASM_ALIGNED_SI_OP
+#undef TARGET_ASM_UNALIGNED_DI_OP
+#define TARGET_ASM_UNALIGNED_DI_OP TARGET_ASM_ALIGNED_DI_OP
+#endif
+
 #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
 #define TARGET_ASM_CAN_OUTPUT_MI_THUNK \
   hook_bool_const_tree_hwi_hwi_const_tree_true
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index bd6078023e3..e4ceab82b9e 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -21,8 +21,13 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_AARCH64_CYGMING_H
 #define GCC_AARCH64_CYGMING_H
 
+#define DWARF2_DEBUGGING_INFO 1
+
 #undef PREFERRED_DEBUGGING_TYPE
-#define PREFERRED_DEBUGGING_TYPE DINFO_TYPE_NONE
+#define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
+
+#undef DWARF2_UNWIND_INFO
+#define DWARF2_UNWIND_INFO 0
 
 #define FASTCALL_PREFIX '@'
 
@@ -75,6 +80,38 @@ still needed for compilation.  */
 #define ASM_OUTPUT_EXTERNAL_LIBCALL(FILE, FUN) \
   mingw_pe_declare_function_type (FILE, XSTR (FUN, 0), 1)
 
+/* Use section relative relocations for debugging offsets.  Unlike
+   other targets that fake this by putting the section VMA at 0, PE
+   won't allow it.  */
+#define ASM_OUTPUT_DWARF_OFFSET(FILE, SIZE, LABEL, OFFSET, SECTION) \
+  do { \
+switch (SIZE)  \
+  {\
+  case 4:  \
+   fputs ("\t.secrel32\t", FILE);  \
+   assemble_name (FILE, LABEL);\
+   if ((OFFSET) != 0)  \
+ fprintf (FILE, "+" HOST_WIDE_INT_PRINT_DEC,   \
+  (HOST_WIDE_INT) (OFFSET));   \
+   break;  \
+  case 8:  \
+   /* This is a hack.  There is no 64-bit section relative \
+  relocation.  However, the COFF format also does not  \
+  support 64-bit file offsets; 64-bit applications are \
+  limited to 32-bits of code+data in any one module.   \
+  Fake the 64-bit offset by zero-extending it.  */ \
+   fputs ("\t.secrel32\t", FILE);  \
+   assemble_name (FILE, LABEL);\
+   if ((OFFSET) != 0)  \
+ fprintf (FILE, "+" HOST_WIDE_INT_PRINT_DEC,   \
+  (HOST_WIDE_INT) (OFFSET));   \
+   fputs ("\n\t.long\t0", FILE);   \
+   break;  \
+  default: \
+   gcc_unreachable (); \
+  }\
+  } while (0)
+
 #define TARGET_OS_CPP_BUILTINS()   \
   do   \
 {  \
-- 
2.34.1



[PATCH v2 4/9] aarch64: Exclude symbols using GOT from code models

2024-09-06 Thread Evgeny Karpov
Symbols using GOT are not supported by the aarch64-w64-mingw32
target and should be excluded from the code models.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_classify_symbol):
Disable GOT for PECOFF target.
---
 gcc/config/aarch64/aarch64.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e4df70ddedc..03362a975c0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20987,7 +20987,8 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
case AARCH64_CMODEL_TINY:
  /* With -fPIC non-local symbols use the GOT.  For orthogonality
 always use the GOT for extern weak symbols.  */
- if ((flag_pic || SYMBOL_REF_WEAK (x))
+ if (!TARGET_PECOFF
+ && (flag_pic || SYMBOL_REF_WEAK (x))
  && !aarch64_symbol_binds_local_p (x))
return SYMBOL_TINY_GOT;
 
@@ -21009,7 +21010,8 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
case AARCH64_CMODEL_SMALL_SPIC:
case AARCH64_CMODEL_SMALL_PIC:
case AARCH64_CMODEL_SMALL:
- if ((flag_pic || SYMBOL_REF_WEAK (x))
+ if (!TARGET_PECOFF
+ && (flag_pic || SYMBOL_REF_WEAK (x))
  && !aarch64_symbol_binds_local_p (x))
return aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC
? SYMBOL_SMALL_GOT_28K : SYMBOL_SMALL_GOT_4G;
-- 
2.34.1



[PATCH v2 3/9] aarch64: Add minimal C++ support

2024-09-06 Thread Evgeny Karpov
The patch resolves compilation issues for the C++ language. Previous
patch series contributed to C++ as well; however, C++ could not be
tested until we had a working C++ compiler and could build at least a
"Hello World" C++ program, and in practice considerably more than that.

Another issue has been fixed in the libstdc++ patch.
https://gcc.gnu.org/pipermail/libstdc++/2024-September/059472.html

gcc/ChangeLog:

* config.gcc: Add missing dependencies.
---
 gcc/config.gcc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a36dd1bcbc6..e1117c273f0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1283,6 +1283,7 @@ aarch64-*-mingw*)
extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
extra_objs="${extra_objs} winnt.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
+   cxx_target_objs="${cxx_target_objs} msformat-c.o"
d_target_objs="${d_target_objs} winnt-d.o"
tmake_file="${tmake_file} mingw/t-cygming"
case ${enable_threads} in
-- 
2.34.1



[PATCH v2 5/9] aarch64: Multiple adjustments to support the SMALL code model correctly

2024-09-06 Thread Evgeny Karpov
LOCAL_LABEL_PREFIX has been changed to help the assembler
recognize local labels. Emitting locals has been
replaced with the .lcomm directive to declare uninitialized
data without defining an exact section. Functions and objects
were missing declarations. Binutils was not able to distinguish
static from external, or an object from a function.
mingw_pe_declare_object_type has been added to have type
information for relocation on AArch64, which is not the case
for ix86.

This fix relies on changes in binutils.
aarch64: Relocation fixes and LTO
https://sourceware.org/pipermail/binutils/2024-August/136481.html

gcc/ChangeLog:

* config/aarch64/aarch64-coff.h (LOCAL_LABEL_PREFIX):
Use "." as the local label prefix.
(ASM_OUTPUT_ALIGNED_LOCAL): Remove.
(ASM_OUTPUT_LOCAL): New.
* config/aarch64/cygming.h (ASM_OUTPUT_EXTERNAL_LIBCALL):
Update.
(ASM_DECLARE_OBJECT_NAME): New.
(ASM_DECLARE_FUNCTION_NAME): New.
* config/i386/cygming.h (ASM_DECLARE_COLD_FUNCTION_NAME):
Update.
(ASM_OUTPUT_EXTERNAL_LIBCALL): Update.
* config/mingw/winnt.cc (mingw_pe_declare_function_type):
Rename into ...
(mingw_pe_declare_type): ... this.
(i386_pe_start_function): Update.
* config/mingw/winnt.h (mingw_pe_declare_function_type):
Rename into ...
(mingw_pe_declare_type): ... this.
---
 gcc/config/aarch64/aarch64-coff.h | 22 ++
 gcc/config/aarch64/cygming.h  | 18 +-
 gcc/config/i386/cygming.h |  8 
 gcc/config/mingw/winnt.cc | 18 +-
 gcc/config/mingw/winnt.h  |  3 +--
 5 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-coff.h 
b/gcc/config/aarch64/aarch64-coff.h
index 81fd9954f75..17f346fe540 100644
--- a/gcc/config/aarch64/aarch64-coff.h
+++ b/gcc/config/aarch64/aarch64-coff.h
@@ -20,9 +20,8 @@
 #ifndef GCC_AARCH64_COFF_H
 #define GCC_AARCH64_COFF_H
 
-#ifndef LOCAL_LABEL_PREFIX
-# define LOCAL_LABEL_PREFIX""
-#endif
+#undef LOCAL_LABEL_PREFIX
+#define LOCAL_LABEL_PREFIX  "."
 
 /* Using long long breaks -ansi and -std=c90, so these will need to be
made conditional for an LLP64 ABI.  */
@@ -54,19 +53,10 @@
 }
 #endif
 
-/* Output a local common block.  /bin/as can't do this, so hack a
-   `.space' into the bss segment.  Note that this is *bad* practice,
-   which is guaranteed NOT to work since it doesn't define STATIC
-   COMMON space but merely STATIC BSS space.  */
-#ifndef ASM_OUTPUT_ALIGNED_LOCAL
-# define ASM_OUTPUT_ALIGNED_LOCAL(STREAM, NAME, SIZE, ALIGN)   \
-{  \
-  switch_to_section (bss_section); \
-  ASM_OUTPUT_ALIGN (STREAM, floor_log2 (ALIGN / BITS_PER_UNIT));   \
-  ASM_OUTPUT_LABEL (STREAM, NAME); \
-  fprintf (STREAM, "\t.space\t%d\n", (int)(SIZE)); \
-}
-#endif
+#define ASM_OUTPUT_LOCAL(FILE, NAME, SIZE, ROUNDED)  \
+( fputs (".lcomm ", (FILE)),   \
+  assemble_name ((FILE), (NAME)),  \
+  fprintf ((FILE), ",%lu\n", (ROUNDED)))
 
 #define ASM_OUTPUT_SKIP(STREAM, NBYTES)\
   fprintf (STREAM, "\t.space\t%d  // skip\n", (int) (NBYTES))
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index e4ceab82b9e..0e484a84ba9 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -78,7 +78,7 @@ still needed for compilation.  */
 
 /* Declare the type properly for any external libcall.  */
 #define ASM_OUTPUT_EXTERNAL_LIBCALL(FILE, FUN) \
-  mingw_pe_declare_function_type (FILE, XSTR (FUN, 0), 1)
+  mingw_pe_declare_type (FILE, XSTR (FUN, 0), 1, 1)
 
 /* Use section relative relocations for debugging offsets.  Unlike
other targets that fake this by putting the section VMA at 0, PE
@@ -213,6 +213,22 @@ still needed for compilation.  */
 
 #define SUPPORTS_ONE_ONLY 1
 
+#undef ASM_DECLARE_OBJECT_NAME
+#define ASM_DECLARE_OBJECT_NAME(STREAM, NAME, DECL)\
+  do { \
+mingw_pe_declare_type (STREAM, NAME, TREE_PUBLIC (DECL), 0);   \
+ASM_OUTPUT_LABEL ((STREAM), (NAME));   \
+  } while (0)
+
+
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DECLARE_FUNCTION_NAME(STREAM, NAME, DECL)  \
+  do { \
+mingw_pe_declare_type (STREAM, NAME, TREE_PUBLIC (DECL), 1);   \
+aarch64_declare_function_name (STREAM, NAME, DECL);
\
+  } while (0)
+
+
 /* Define this to be nonzero if static stack checking is supported.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 1633017eff6..4c3d925e8b3 100644
--- a/gcc/config/

[PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-06 Thread Evgeny Karpov
aarch64.cc has been updated to prevent emitting "symbol + offset"
for SYMBOL_SMALL_ABSOLUTE for the PECOFF target. "symbol + offset"
cannot be used in relocations for aarch64-w64-mingw32 due to
relocation requirements.

Instead, it will adjust the address by an offset with the
"add" instruction.

This approach allows addressing 4GB, instead of 1MB as it was before.
This issue has been fixed in the binutils patch series.

https://sourceware.org/pipermail/binutils/2024-August/136481.html

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_load_symref_and_add_offset):
New.
(aarch64_expand_mov_immediate): Use
aarch64_load_symref_and_add_offset.
---
 gcc/config/aarch64/aarch64.cc | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 03362a975c0..3a8ecdf562b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4887,6 +4887,19 @@ aarch64_split_add_offset (scalar_int_mode mode, rtx 
dest, rtx src,
  temp1, temp2, 0, false);
 }
 
+/* Load the address and apply the offset by using "add" instruction.  */
+
+static void
+aarch64_load_symref_and_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+   poly_int64 offset)
+{
+  gcc_assert (can_create_pseudo_p ());
+  src = aarch64_force_temporary (mode, dest, src);
+  aarch64_add_offset (mode, dest, src, offset,
+ NULL_RTX, NULL_RTX, 0, false);
+}
+
+
 /* Add DELTA to the stack pointer, marking the instructions frame-related.
TEMP1 is available as a temporary if nonnull.  FORCE_ISA_MODE is as
for aarch64_add_offset.  EMIT_MOVE_IMM is false if TEMP1 already
@@ -6054,10 +6067,8 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
case SYMBOL_TINY_TLSIE:
  if (const_offset != 0)
{
- gcc_assert(can_create_pseudo_p ());
- base = aarch64_force_temporary (int_mode, dest, base);
- aarch64_add_offset (int_mode, dest, base, const_offset,
- NULL_RTX, NULL_RTX, 0, false);
+ aarch64_load_symref_and_add_offset (int_mode, dest, base,
+ const_offset);
  return;
}
  /* FALLTHRU */
@@ -6068,6 +6079,13 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
case SYMBOL_TLSLE24:
case SYMBOL_TLSLE32:
case SYMBOL_TLSLE48:
+ if (TARGET_PECOFF && const_offset != 0)
+   {
+ aarch64_load_symref_and_add_offset (int_mode, dest, base,
+ const_offset);
+ return;
+   }
+
  aarch64_load_symref_appropriately (dest, imm, sty);
  return;
 
-- 
2.34.1



[PATCH v2 7/9] aarch64: Disable the anchors

2024-09-06 Thread Evgeny Karpov
The anchors have been disabled as they use symbol + offset, which is
not applicable for COFF AArch64.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (TARGET_MIN_ANCHOR_OFFSET):
Keep default TARGET_MAX_ANCHOR_OFFSET for PECOFF target.
(TARGET_MAX_ANCHOR_OFFSET): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3a8ecdf562b..56315fae7b9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -30962,11 +30962,13 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MIN_ANCHOR_OFFSET
 #define TARGET_MIN_ANCHOR_OFFSET -256
 
+#if !TARGET_PECOFF
 /* Limit the maximum anchor offset to 4k-1, since that's the limit for a
byte offset; we can do much more for larger data types, but have no way
to determine the size of the access.  We assume accesses are aligned.  */
 #undef TARGET_MAX_ANCHOR_OFFSET
 #define TARGET_MAX_ANCHOR_OFFSET 4095
+#endif
 
 #undef TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT
 #define TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT \
-- 
2.34.1



[PATCH] c++, v2: Implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-06 Thread Jakub Jelinek
Hi!

On Wed, Aug 14, 2024 at 06:11:35PM +0200, Jakub Jelinek wrote:
> Here is the I believe ABI compatible version, which uses the separate
> guard variables, so different structured binding variables can be
> initialized in different threads, but the thread that did the artificial
> base initialization will keep temporaries live at least until the last
> guard variable is released (i.e. when even that variable has been
> initialized).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660354.html
> patch, ok for trunk?
> 
> As for namespace scope structured bindings and this DR, all of
> set_up_extended_ref_temp, cp_finish_decl -> expand_static_init and
> cp_finish_decl -> cp_finish_decomp -> cp_finish_decl -> expand_static_init
> in that case just push some decls into the static_aggregates or
> tls_aggregates chains.
> So, we can end up e.g. with the most important decl for a extended ref
> temporary (which initializes some temporaries), then perhaps some more
> of those, then DECL_DECOMPOSITION_P base, then n times optionally some further
> extended refs and DECL_DECOMPOSITION_P non-base and I think we need
> to one_static_initialization_or_destruction all of them together, by
> omitting CLEANUP_POINT_EXPR from the very first one (or all until the
> DECL_DECOMPOSITION_P base?), say through temporarily clearing
> stmts_are_full_exprs_p and then wrapping whatever
> one_static_initialization_or_destruction produces for all of those into
> a single CLEANUP_POINT_EXPR argument.
> Perhaps remember static_aggregates or tls_aggregates early before any
> check_initializer etc. calls and then after cp_finish_decomp cut that
> TREE_LIST nodes and pass that as a separate TREE_VALUE in the list.
> Though, not sure what to do about modules.cc uses of these, it needs
> to save/restore that stuff somehow too.

Now that the CWG 2867 patch for automatic structured bindings is in,
here is an updated version of the block scope static structured bindings
CWG 2867 patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

No patch for the namespace scope structured bindings yet, will work on that
soon.

2024-09-05  Jakub Jelinek  

PR c++/115769
* decl.cc: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decl): If need_decomp_init, for function scope structured
binding bases, temporarily clear stmts_are_full_exprs_p before
calling expand_static_init, after it call cp_finish_decomp and wrap
code emitted by both into maybe_cleanup_point_expr_void and ensure
cp_finish_decomp isn't called again.

* g++.dg/DRs/dr2867-3.C: New test.
* g++.dg/DRs/dr2867-4.C: New test.

--- gcc/cp/decl.cc.jj   2024-09-04 19:55:59.046491602 +0200
+++ gcc/cp/decl.cc  2024-09-04 20:04:35.695952219 +0200
@@ -9140,7 +9140,24 @@ cp_finish_decl (tree decl, tree init, bo
 initializer.  It is not legal to redeclare a static data
 member, so this issue does not arise in that case.  */
   else if (var_definition_p && TREE_STATIC (decl))
-   expand_static_init (decl, init);
+   {
+ if (decomp && DECL_FUNCTION_SCOPE_P (decl))
+   {
+ tree sl = push_stmt_list ();
+ auto saved_stmts_are_full_exprs_p = stmts_are_full_exprs_p ();
+ current_stmt_tree ()->stmts_are_full_exprs_p = 0;
+ expand_static_init (decl, init);
+ current_stmt_tree ()->stmts_are_full_exprs_p
+   = saved_stmts_are_full_exprs_p;
+ cp_finish_decomp (decl, decomp);
+ decomp = NULL;
+ sl = pop_stmt_list (sl);
+ sl = maybe_cleanup_point_expr_void (sl);
+ add_stmt (sl);
+   }
+ else
+   expand_static_init (decl, init);
+   }
 }
 
   /* If a CLEANUP_STMT was created to destroy a temporary bound to a
--- gcc/testsuite/g++.dg/DRs/dr2867-3.C.jj  2024-08-13 21:05:42.876446125 
+0200
+++ gcc/testsuite/g++.dg/DRs/dr2867-3.C 2024-08-13 21:05:42.876446125 +0200
@@ -0,0 +1,159 @@
+// CWG2867 - Order of initialization for structured bindings.
+// { dg-do run { target c++11 } }
+// { dg-options "" }
+
+#define assert(X) do { if (!(X)) __builtin_abort(); } while (0)
+
+namespace std {
+  template struct tuple_size;
+  template struct tuple_element;
+}
+
+int a, c, d, i;
+
+struct A {
+  A () { assert (c == 3); ++c; }
+  ~A () { ++a; }
+  template  int &get () const { assert (c == 5 + I); ++c; return i; }
+};
+
+template <> struct std::tuple_size  { static const int value = 4; };
+template  struct std::tuple_element  { using type = int; };
+template <> struct std::tuple_size  { static const int value = 4; };
+template  struct std::tuple_element  { using type = int; };
+
+struct B {
+  B () { assert (c >= 1 && c <= 2); ++c; }
+  ~B () { assert (c >= 9 && c <= 10); ++c; }
+};
+
+struct C {
+  

[PATCH v2 8/9] Add LTO support

2024-09-06 Thread Evgeny Karpov
The patch reuses the configuration for LTO from ix86 and adds the
aarch64 architecture to the list of supported COFF headers.

gcc/ChangeLog:

* config/aarch64/cygming.h (TARGET_ASM_LTO_START): New.
(TARGET_ASM_LTO_END): Likewise.
* config/i386/cygming.h (TARGET_ASM_LTO_START): Update.
(TARGET_ASM_LTO_END): Likewise.
* config/i386/i386-protos.h (i386_pe_asm_lto_start): Delete.
(i386_pe_asm_lto_end): Likewise.
* config/mingw/winnt.cc (i386_pe_asm_lto_start): Rename
into ...
(mingw_pe_asm_lto_start): ... this.
(i386_pe_asm_lto_end): Rename into ...
(mingw_pe_asm_lto_end): ... this.
* config/mingw/winnt.h (mingw_pe_asm_lto_start): New.
(mingw_pe_asm_lto_end): Likewise.

libiberty/ChangeLog:

* simple-object-coff.c: Add aarch64.
---
 gcc/config/aarch64/cygming.h   | 6 ++
 gcc/config/i386/cygming.h  | 4 ++--
 gcc/config/i386/i386-protos.h  | 2 --
 gcc/config/mingw/winnt.cc  | 4 ++--
 gcc/config/mingw/winnt.h   | 2 ++
 libiberty/simple-object-coff.c | 4 +++-
 6 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 0e484a84ba9..9e1c5a6fbc2 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -245,4 +245,10 @@ still needed for compilation.  */
 
 #define HAVE_64BIT_POINTERS 1
 
+/* Kludge because of missing PE-COFF support for early LTO debug.  */
+#undef  TARGET_ASM_LTO_START
+#define TARGET_ASM_LTO_START mingw_pe_asm_lto_start
+#undef  TARGET_ASM_LTO_END
+#define TARGET_ASM_LTO_END mingw_pe_asm_lto_end
+
 #endif
diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 4c3d925e8b3..742f67e9f10 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -348,9 +348,9 @@ do {\
 
 /* Kludge because of missing PE-COFF support for early LTO debug.  */
 #undef  TARGET_ASM_LTO_START
-#define TARGET_ASM_LTO_START i386_pe_asm_lto_start
+#define TARGET_ASM_LTO_START mingw_pe_asm_lto_start
 #undef  TARGET_ASM_LTO_END
-#define TARGET_ASM_LTO_END i386_pe_asm_lto_end
+#define TARGET_ASM_LTO_END mingw_pe_asm_lto_end
 
 #undef ASM_COMMENT_START
 #define ASM_COMMENT_START " #"
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3a7bc949e56..e9e4a9d4f08 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -316,8 +316,6 @@ extern void i386_pe_asm_output_aligned_decl_common (FILE *, 
tree,
const char *,
HOST_WIDE_INT,
HOST_WIDE_INT);
-extern void i386_pe_asm_lto_start (void);
-extern void i386_pe_asm_lto_end (void);
 extern void i386_pe_start_function (FILE *, const char *, tree);
 extern void i386_pe_end_function (FILE *, const char *, tree);
 extern void i386_pe_end_cold_function (FILE *, const char *, tree);
diff --git a/gcc/config/mingw/winnt.cc b/gcc/config/mingw/winnt.cc
index f93e80a1d52..5d7cc760bd7 100644
--- a/gcc/config/mingw/winnt.cc
+++ b/gcc/config/mingw/winnt.cc
@@ -831,14 +831,14 @@ mingw_pe_file_end (void)
 static enum debug_info_levels saved_debug_info_level;
 
 void
-i386_pe_asm_lto_start (void)
+mingw_pe_asm_lto_start (void)
 {
   saved_debug_info_level = debug_info_level;
   debug_info_level = DINFO_LEVEL_NONE;
 }
 
 void
-i386_pe_asm_lto_end (void)
+mingw_pe_asm_lto_end (void)
 {
   debug_info_level = saved_debug_info_level;
 }
diff --git a/gcc/config/mingw/winnt.h b/gcc/config/mingw/winnt.h
index 14bff19e697..1ac19fd2386 100644
--- a/gcc/config/mingw/winnt.h
+++ b/gcc/config/mingw/winnt.h
@@ -23,6 +23,8 @@ http://www.gnu.org/licenses/.  */
 extern tree mingw_handle_selectany_attribute (tree *, tree, tree, int, bool *);
 
 extern void mingw_pe_asm_named_section (const char *, unsigned int, tree);
+extern void mingw_pe_asm_lto_start (void);
+extern void mingw_pe_asm_lto_end (void);
 extern void mingw_pe_declare_type (FILE *, const char *, bool, bool);
 extern void mingw_pe_encode_section_info (tree, rtx, int);
 extern void mingw_pe_file_end (void);
diff --git a/libiberty/simple-object-coff.c b/libiberty/simple-object-coff.c
index e748205972f..fd3c310db51 100644
--- a/libiberty/simple-object-coff.c
+++ b/libiberty/simple-object-coff.c
@@ -219,7 +219,9 @@ static const struct coff_magic_struct coff_magic[] =
   /* i386.  */
   { 0x14c, 0, F_EXEC | IMAGE_FILE_SYSTEM | IMAGE_FILE_DLL },
   /* x86_64.  */
-  { 0x8664, 0, F_EXEC | IMAGE_FILE_SYSTEM | IMAGE_FILE_DLL }
+  { 0x8664, 0, F_EXEC | IMAGE_FILE_SYSTEM | IMAGE_FILE_DLL },
+  /* AArch64.  */
+  { 0xaa64, 0, F_EXEC | IMAGE_FILE_SYSTEM | IMAGE_FILE_DLL }
 };
 
 /* See if we have a COFF file.  */
-- 
2.34.1



[PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-09-06 Thread Evgeny Karpov
In some cases, the alignment can be bigger than BIGGEST_ALIGNMENT.

The issue was detected while building FFmpeg, which creates
over-aligned structures, most likely for AVX optimization.

For instance:
float __attribute__((aligned (32))) large_aligned_array[3];

BIGGEST_ALIGNMENT could be up to 512 bits on x64.
This patch has been added to cover this case without needing to
change the FFmpeg code.

gcc/ChangeLog:

* config/aarch64/aarch64-coff.h (ASM_OUTPUT_ALIGNED_LOCAL):
Change alignment.
---
 gcc/config/aarch64/aarch64-coff.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-coff.h 
b/gcc/config/aarch64/aarch64-coff.h
index 17f346fe540..bf8e30b9c08 100644
--- a/gcc/config/aarch64/aarch64-coff.h
+++ b/gcc/config/aarch64/aarch64-coff.h
@@ -58,6 +58,16 @@
   assemble_name ((FILE), (NAME)),  \
   fprintf ((FILE), ",%lu\n", (ROUNDED)))
 
+#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGNMENT)  \
+  { \
+unsigned HOST_WIDE_INT rounded = MAX ((SIZE), 1); \
+unsigned HOST_WIDE_INT alignment = MAX ((ALIGNMENT), BIGGEST_ALIGNMENT); \
+rounded += (alignment / BITS_PER_UNIT) - 1; \
+rounded = (rounded / (alignment / BITS_PER_UNIT) \
+  * (alignment / BITS_PER_UNIT)); \
+ASM_OUTPUT_LOCAL (FILE, NAME, SIZE, rounded); \
+  }
+
 #define ASM_OUTPUT_SKIP(STREAM, NBYTES)\
   fprintf (STREAM, "\t.space\t%d  // skip\n", (int) (NBYTES))
 
-- 
2.34.1



Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-06 Thread Martin Storsjö

On Fri, 6 Sep 2024, Evgeny Karpov wrote:


aarch64.cc has been updated to prevent emitting "symbol + offset"
for SYMBOL_SMALL_ABSOLUTE for the PECOFF target. "symbol + offset"
cannot be used in relocations for aarch64-w64-mingw32 due to
relocation requirements.

Instead, it will adjust the address by an offset with the
"add" instruction.

This approach allows addressing 4GB, instead of 1MB as it was before.
This issue has been fixed in the binutils patch series.

https://sourceware.org/pipermail/binutils/2024-August/136481.html


Sorry, but no.

You can't just redefine how relocations in your object file format work, 
just because you feel like it.


Didn't you see my reply about how MSVC themselves also use this 
relocation?


// Martin



Re: [PATCH] aarch64: Use is_attribute_namespace_p and get_attribute_name inside aarch64_lookup_shared_state_flags [PR116598]

2024-09-06 Thread Richard Sandiford
Andrew Pinski  writes:
> The code in aarch64_lookup_shared_state_flags assumed all C++11 attributes on 
> function type
> had a namespace associated with them. But with the addition of 
> reproducible/unsequenced,
> this is not true.
>
> This fixes the issue by using is_attribute_namespace_p instead of manually 
> figuring out
> the namespace is named "arm" and uses get_attribute_name instead of manually 
> grabbing
> the attribute name.
>
> Built and tested for aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>   PR target/116598
>   * config/aarch64/aarch64.cc (aarch64_lookup_shared_state_flags): Use
>   is_attribute_namespace_p and get_attribute_name instead of manually 
> grabbing
>   the namespace and name of the attribute.

OK, thanks.

Richard

> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 27e24ba70ab..6a3f1a23a9f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -594,14 +594,10 @@ aarch64_lookup_shared_state_flags (tree attrs, const 
> char *state_name)
>  {
>for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
>  {
> -  if (!cxx11_attribute_p (attr))
> +  if (!is_attribute_namespace_p ("arm", attr))
>   continue;
>  
> -  auto ns = IDENTIFIER_POINTER (TREE_PURPOSE (TREE_PURPOSE (attr)));
> -  if (strcmp (ns, "arm") != 0)
> - continue;
> -
> -  auto attr_name = IDENTIFIER_POINTER (TREE_VALUE (TREE_PURPOSE (attr)));
> +  auto attr_name = IDENTIFIER_POINTER (get_attribute_name (attr));
>auto flags = aarch64_attribute_shared_state_flags (attr_name);
>if (!flags)
>   continue;


[PATCH] Fix SLP double-reduction support

2024-09-06 Thread Richard Biener
When doing SLP discovery I forgot to handle double reductions even
though they are already queued in LOOP_VINFO_REDUCTIONS.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_analyze_slp): Also handle discovery
for double reductions.
---
 gcc/tree-vect-slp.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3d2973698e2..0fb17340bd3 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4652,7 +4652,9 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 reduction path.  In that case we'd have to reverse
 engineer that conversion stmt following the chain using
 reduc_idx and from the PHI using reduc_def.  */
- && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
+ && (STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
+ || (STMT_VINFO_DEF_TYPE (next_info)
+ == vect_double_reduction_def)))
{
  /* Do not discover SLP reductions combining lane-reducing
 ops, that will fail later.  */
-- 
2.43.0


[PATCH] x86-64: Don't use temp for argument in a TImode register

2024-09-06 Thread H.J. Lu
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
in a TImode register.  Otherwise, the TImode variable will be put in
the GPR save area which guarantees only 8-byte alignment.

gcc/

PR target/116621
* config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
a PARALLEL BLKmode container of an EXPR_LIST expression in a
TImode register.

gcc/testsuite/

PR target/116621
* gcc.target/i386/pr116621.c: New test.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc  | 22 ++--
 gcc/testsuite/gcc.target/i386/pr116621.c | 43 
 2 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116621.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 707b75a6d5d..45320124b91 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -4908,13 +4908,31 @@ ix86_gimplify_va_arg (tree valist, tree type, 
gimple_seq *pre_p,
 
   examine_argument (nat_mode, type, 0, &needed_intregs, &needed_sseregs);
 
-  need_temp = (!REG_P (container)
+  bool container_in_reg = false;
+  if (REG_P (container))
+   container_in_reg = true;
+  else if (GET_CODE (container) == PARALLEL
+  && GET_MODE (container) == BLKmode
+  && XVECLEN (container, 0) == 1)
+   {
+ /* Check if it is a PARALLEL BLKmode container of an EXPR_LIST
+expression in a TImode register.  In this case, temp isn't
+needed.  Otherwise, the TImode variable will be put in the
+GPR save area which guarantees only 8-byte alignment.   */
+ rtx x = XVECEXP (container, 0, 0);
+ if (GET_CODE (x) == EXPR_LIST
+ && REG_P (XEXP (x, 0))
+ && XEXP (x, 1) == const0_rtx)
+   container_in_reg = true;
+   }
+
+  need_temp = (!container_in_reg
   && ((needed_intregs && TYPE_ALIGN (type) > 64)
   || TYPE_ALIGN (type) > 128));
 
   /* In case we are passing structure, verify that it is consecutive block
  on the register save area.  If not we need to do moves.  */
-  if (!need_temp && !REG_P (container))
+  if (!need_temp && !container_in_reg)
{
  /* Verify that all registers are strictly consecutive  */
  if (SSE_REGNO_P (REGNO (XEXP (XVECEXP (container, 0, 0), 0
diff --git a/gcc/testsuite/gcc.target/i386/pr116621.c 
b/gcc/testsuite/gcc.target/i386/pr116621.c
new file mode 100644
index 000..704266458a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116621.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+union S8302
+{
+  union
+  {
+double b;
+int c;
+  } a;
+  long double d;
+  unsigned short int f[5];
+};
+
+union S8302 s8302;
+extern void check8302va (int i, ...);
+
+int
+main (void)
+{
+  memset (&s8302, '\0', sizeof (s8302));
+  s8302.a.b = -221438.25;
+  check8302va (1, s8302);
+  return 0;
+}
+
+__attribute__((noinline, noclone))
+void
+check8302va (int z, ...)
+{
+  union S8302 arg, *p;
+  va_list ap;
+
+  __builtin_va_start (ap, z);
+  p = &s8302;
+  arg = __builtin_va_arg (ap, union S8302);
+  if (p->a.b != arg.a.b)
+__builtin_abort ();
+  __builtin_va_end (ap);
+}
-- 
2.46.0



[PATCH] match: Change (A * B) + (-C) to (B - C/A) * A, if C multiple of A [PR109393]

2024-09-06 Thread konstantinos . eleftheriou
From: kelefth 

The following function:

int foo(int *a, int j)
{
  int k = j - 1;
  return a[j - 1] == a[k];
}

does not fold to `return 1;` using -O2 or higher. The cause of this is that
the expression `4 * j + (-4)` for the index computation is not folded to
`4 * (j - 1)`. Existing simplifications that handle similar cases are applied
when A == C, which is not the case in this instance.

A previous attempt to address this issue is
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html

This patch adds the following simplification in match.pd:
(A * B) + (-C) -> (B - C/A) * A, if C a multiple of A

which also handles cases where the index is j - 2, j - 3, etc.

Bootstrapped for all languages and regression tested on x86-64 and aarch64.

PR tree-optimization/109393

gcc/ChangeLog:

* match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A.

gcc/testsuite/ChangeLog:

* gcc.dg/pr109393.c: New test.

Tested-by: Christoph Müllner 
Signed-off-by: Philipp Tomsich 
Signed-off-by: Konstantinos Eleftheriou 
---
 gcc/match.pd| 15 ++-
 gcc/testsuite/gcc.dg/pr109393.c | 23 +++
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr109393.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 621306213e4..9d971b663c6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4216,7 +4216,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 ? wi::max_value (TYPE_PRECISION (type), SIGNED)
 : wi::min_value (TYPE_PRECISION (type), SIGNED))
 && single_use (@3))
- (mult (plusminus @2 { build_one_cst (type); }) @0))
+ (mult (plusminus @2 { build_one_cst (type); }) @0)))
+   /* (A * B) + (-C) -> (B - C/A) * A, if C is a multiple of A.  */
+   (simplify
+(plus (mult:cs@3 integer_nonzerop@0 @2) INTEGER_CST@4)
+  (if (TREE_CODE (type) == INTEGER_TYPE
+ && wi::neg_p (wi::to_wide (@4)))
+   (with {
+ wide_int c1 = wi::to_wide (@0);
+ wide_int c2_abs = wi::abs (wi::to_wide (@4));
+ /* Calculate @4 / @0 in order to factorize the expression.  */
+ wide_int div_res = wi::div_trunc (c2_abs, c1, TYPE_SIGN (type));
+ tree div_cst = wide_int_to_tree (type, div_res); }
+   (if (wi::multiple_of_p (c2_abs, c1, TYPE_SIGN (type)))
+ (mult (minus @2 { div_cst; }) @0
 
 #if GIMPLE
 /* Canonicalize X + (X << C) into X * (1 + (1 << C)) and
diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
new file mode 100644
index 000..17bf9330796
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109393.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/109393 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(int *a, int j)
+{
+  int k = j - 1;
+  return a[j - 1] == a[k];
+}
+
+int foo2(int *a, int j)
+{
+  int k = j - 5;
+  return a[j - 5] == a[k];
+}
+
+int bar(int *a, int j)
+{
+  int k = j - 1;
+  return (&a[j + 1] - 2) == &a[k];
+}
+
+/* { dg-final { scan-tree-dump-times "return 1;" 3 "optimized" } } */
\ No newline at end of file
-- 
2.46.0



[PATCH]middle-end: check that the lhs of a COND_EXPR is an SSA_NAME in cond_store recognition [PR116628]

2024-09-06 Thread Tamar Christina
Hi All,

Because the vect_recog_bool_pattern can at the moment still transition
out of GIMPLE and back into GENERIC the vect_recog_cond_store_pattern can
end up using an expression as a mask rather than an SSA_NAME.

This adds an explicit check that we have a mask and not an expression.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/116628
* tree-vect-patterns.cc (vect_recog_cond_store_pattern): Add SSA_NAME
check on expression.

gcc/testsuite/ChangeLog:

PR tree-optimization/116628
* gcc.dg/vect/pr116628.c: New test.

---
diff --git a/gcc/testsuite/gcc.dg/vect/pr116628.c 
b/gcc/testsuite/gcc.dg/vect/pr116628.c
new file mode 100644
index 
..4068c657ac5570b10f2dca4be5109abbaf574f55
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr116628.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect_masked_store } */
+/* { dg-additional-options "-Ofast -march=armv9-a" { target aarch64-*-* } } */
+
+typedef float c;
+c a[2000], b[0];
+void d() {
+  for (int e = 0; e < 2000; e++)
+if (b[e])
+  a[e] = b[e];
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
f7c3c623ea46ea09f4f86139d2a92bb6363aee3c..3a0d4cb7092cb59fe8b8664b682ade73ab5e9645
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6685,6 +6685,9 @@ vect_recog_cond_store_pattern (vec_info *vinfo,
   /* Check if the else value matches the original loaded one.  */
   bool invert = false;
   tree cmp_ls = gimple_arg (cond_stmt, 0);
+  if (TREE_CODE (cmp_ls) != SSA_NAME)
+return NULL;
+
   tree cond_arg1 = gimple_arg (cond_stmt, 1);
   tree cond_arg2 = gimple_arg (cond_stmt, 2);
 




-- 

Re: [PATCH RFC] c-family: add attribute flag_enum [PR46457]

2024-09-06 Thread Jonathan Wakely

On 05/09/24 21:44 -0400, Jason Merrill wrote:

On 9/4/24 11:02 AM, Marek Polacek wrote:

+handle_flag_enum_attribute (tree *node, tree ARG_UNUSED(name), tree args,
+   int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (args)
+warning (OPT_Wattributes, "%qE attribute arguments ignored", name);

You don't need this check I think; if the # of args isn't correct, we
should not get here.  Then the goto can...go too.


Dropped.

On 9/4/24 11:28 AM, Eric Gallager wrote:


Question about PR tagging: should PR c++/81665 be tagged here, too?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81665


Added.

Here's what I'm pushing:


 

diff --git a/libstdc++-v3/include/bits/regex_constants.h 
b/libstdc++-v3/include/bits/regex_constants.h
index 437895f1dc3..4148093bc4e 100644
--- a/libstdc++-v3/include/bits/regex_constants.h
+++ b/libstdc++-v3/include/bits/regex_constants.h
@@ -66,7 +66,7 @@ namespace regex_constants
   * elements @c ECMAScript, @c basic, @c extended, @c awk, @c grep, @c egrep
   * %set.
   */
-  enum syntax_option_type : unsigned int
+  enum [[gnu::flag_enum]] syntax_option_type : unsigned int


This needs to be [[__gnu__::__flag_enum__]] because valid programs can
#define gnu 1
#define flag_enum 1



Re: [PATCH]middle-end: check that the lhs of a COND_EXPR is an SSA_NAME in cond_store recognition [PR116628]

2024-09-06 Thread Richard Biener
On Fri, 6 Sep 2024, Tamar Christina wrote:

> Hi All,
> 
> Because the vect_recog_bool_pattern can at the moment still transition
> out of GIMPLE and back into GENERIC the vect_recog_cond_store_pattern can
> end up using an expression as a mask rather than an SSA_NAME.
> 
> This adds an explicit check that we have a mask and not an expression.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/116628
>   * tree-vect-patterns.cc (vect_recog_cond_store_pattern): Add SSA_NAME
>   check on expression.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/116628
>   * gcc.dg/vect/pr116628.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/pr116628.c 
> b/gcc/testsuite/gcc.dg/vect/pr116628.c
> new file mode 100644
> index 
> ..4068c657ac5570b10f2dca4be5109abbaf574f55
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr116628.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_masked_store } */
> +/* { dg-additional-options "-Ofast -march=armv9-a" { target aarch64-*-* } } 
> */
> +
> +typedef float c;
> +c a[2000], b[0];
> +void d() {
> +  for (int e = 0; e < 2000; e++)
> +if (b[e])
> +  a[e] = b[e];
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> f7c3c623ea46ea09f4f86139d2a92bb6363aee3c..3a0d4cb7092cb59fe8b8664b682ade73ab5e9645
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6685,6 +6685,9 @@ vect_recog_cond_store_pattern (vec_info *vinfo,
>/* Check if the else value matches the original loaded one.  */
>bool invert = false;
>tree cmp_ls = gimple_arg (cond_stmt, 0);
> +  if (TREE_CODE (cmp_ls) != SSA_NAME)
> +return NULL;
> +
>tree cond_arg1 = gimple_arg (cond_stmt, 1);
>tree cond_arg2 = gimple_arg (cond_stmt, 2);
>  
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 2/4]middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern

2024-09-06 Thread Richard Biener
On Tue, 3 Sep 2024, Tamar Christina wrote:

> Hi All,
> 
> Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
> In the cases where the conditional is loop invariant or non-boolean it instead
> converts the operation back into GENERIC and hides much of the operation from
> the analysis part of the vectorizer.
> 
> i.e.
> 
>   a ? b : c
> 
> is transformed into:
> 
>   a != 0 ? b : c
> 
> however by doing so we can't perform any optimization on the mask as they 
> aren't
> explicit until quite late during codegen.
> 
> To fix this this patch lowers booleans earlier and so ensures that we are 
> always
> in GIMPLE.
> 
> For when the value is a loop invariant boolean we have to generate an 
> additional
> conversion from bool to the integer mask form.
> 
> This is done by creating a loop invariant a ? -1 : 0 with the target mask
> precision and then doing a normal != 0 comparison on that.
> 
> To support this the patch also adds the ability to during pattern matching
> create a loop invariant pattern that won't be seen by the vectorizer and will
> instead be materialized inside the loop preheader in the case of loops, or in
> the case of BB vectorization it materializes it in the first BB in the region.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Ok for master?

OK, but can you clarify a question below?

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (append_inv_pattern_def_seq): New.
>   (vect_recog_bool_pattern): Lower COND_EXPRs.
>   * tree-vect-slp.cc (vect_schedule_slp): Materialize loop invariant
>   statements.
>   * tree-vect-loop.cc (vect_transform_loop): Likewise.
>   * tree-vect-stmts.cc (vectorizable_comparison_1): Remove
>   VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype.
>   * tree-vectorizer.cc (vec_info::vec_info): Initialize
>   inv_pattern_def_seq.
>   * tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New.
>   (class vec_info): Add inv_pattern_def_seq.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
>   * gcc.dg/vect/vect-conditional_store_5.c: New test.
>   * gcc.dg/vect/vect-conditional_store_6.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
> new file mode 100644
> index 
> ..650a3bfbfb1dd44afc2d58bbe85f75f1d28b9bd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_float } */
> +
> +/* { dg-additional-options "-mavx2" { target avx2 } } */
> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> +
> +void foo3 (float *restrict a, int *restrict c)
> +{
> +#pragma GCC unroll 8
> +  for (int i = 0; i < 8; i++)
> +c[i] = a[i] > 1.0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized using SLP" "slp1" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
> new file mode 100644
> index 
> ..37d60fa76351c13980427751be4450c14617a9a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_masked_store } */
> +
> +/* { dg-additional-options "-mavx2" { target avx2 } } */
> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> +
> +#include 
> +
> +void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int 
> stride)
> +{
> +  if (stride <= 1)
> +return;
> +
> +  bool ai = a[0];
> +
> +  for (int i = 0; i < n; i++)
> +{
> +  int res = c[i];
> +  int t = b[i+stride];
> +  if (ai)
> +t = res;
> +  c[i] = t;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { target 
> aarch64-*-* } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c 
> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
> new file mode 100644
> index 
> ..5e1aedf3726b073c132bb64a9b474592ceb8e9b9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_masked_store } */
> +
> +/* { dg-additional-options "-mavx2" { target avx2 } } */
> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> +
> +void foo3 (unsigned long long *restrict a, int *restrict b, int *restrict c, 
> int n, int stride)
> +{
> +  

[PATCH v3] GCC Driver : Enable very long gcc command-line option

2024-09-06 Thread Deepthi . Hemraj
From: Deepthi Hemraj 

For excessively long environment variables, i.e. >128KB,
Store the arguments in a temporary file and collect them back together in 
collect2.

This commit patches for COLLECT_GCC_OPTIONS issue:
GCC should not limit the length of command line passed to collect2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527

The Linux kernel has the following limits on shell commands:
I.  Total number of bytes used to specify arguments must be under 128KB.
II. Each environment variable passed to an executable must be under 128 KiB

In order to circumvent these limitations, many build tools support
response-files, i.e. files that contain the arguments for the executed
command. These are typically passed using @ syntax.

GCC uses the COLLECT_GCC_OPTIONS environment variable to transfer the
expanded command line to collect2.  With many options, this exceeds limit II.

GCC : Added Testcase for PR111527

TC1 : If the command-line arguments are less than 128KB, gcc should use
  COLLECT_GCC_OPTIONS to communicate and compile fine.
TC2 : If the command-line arguments are in the range of 128KB to 2MB,
  gcc should copy the arguments into a file and use FILE_GCC_OPTIONS
  to communicate and compile fine.
TC3 : If the command-line arguments are greater than 2MB, gcc should
  fail the compile and report an error. (Expected FAIL)

Signed-off-by: sunil dora 
Signed-off-by: Topi Kuutela 
---
 gcc/collect2.cc   | 42 --
 gcc/gcc.cc| 37 +--
 gcc/testsuite/gcc.dg/longcmd/longcmd.exp  | 16 +
 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c | 44 +++
 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c |  9 +
 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c | 10 ++
 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c | 10 ++
 7 files changed, 162 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/longcmd.exp
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c

diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index 902014a9cc1..5b5f16ab46c 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -376,6 +376,42 @@ typedef int scanfilter;
 
 static void scan_prog_file (const char *, scanpass, scanfilter);
 
+char* getenv_extended (const char* var_name)
+{
+  int file_size;
+  char* buf = NULL;
+  const char* prefix = "@";
+  size_t prefix_len = strlen(prefix);
+
+  char* string = getenv (var_name);
+  if (strncmp (string, prefix, prefix_len) == 0)
+{
+  FILE *fptr;
+  char *new_string = xstrdup(string + prefix_len);
+  fptr = fopen (new_string, "r");
+  if (fptr == NULL)
+   return (0);
+  /* Copy contents from temporary file to buffer */
+  if (fseek (fptr, 0, SEEK_END) == -1)
+   return (0);
+  file_size = ftell (fptr);
+  rewind (fptr);
+  buf = (char *) xmalloc (file_size + 1);
+  if (buf == NULL)
+   return (0);
+  if (fread ((void *) buf, file_size, 1, fptr) <= 0)
+   {
+ free (buf);
+ fatal_error (input_location, "fread failed");
+ return (0);
+   }
+  buf[file_size] = '\0';
+  free(new_string);
+  return buf;
+}
+  return string;
+}
+
 
 /* Delete tempfiles and exit function.  */
 
@@ -1004,7 +1040,7 @@ main (int argc, char **argv)
 /* Now pick up any flags we want early from COLLECT_GCC_OPTIONS
The LTO options are passed here as are other options that might
be unsuitable for ld (e.g. -save-temps).  */
-p = getenv ("COLLECT_GCC_OPTIONS");
+p = getenv_extended ("COLLECT_GCC_OPTIONS");
 while (p && *p)
   {
const char *q = extract_string (&p);
@@ -1200,7 +1236,7 @@ main (int argc, char **argv)
  AIX support needs to know if -shared has been specified before
  parsing commandline arguments.  */
 
-  p = getenv ("COLLECT_GCC_OPTIONS");
+  p = getenv_extended ("COLLECT_GCC_OPTIONS");
   while (p && *p)
 {
   const char *q = extract_string (&p);
@@ -1594,7 +1630,7 @@ main (int argc, char **argv)
   fprintf (stderr, "o_file  = %s\n",
   (o_file ? o_file : "not found"));
 
-  ptr = getenv ("COLLECT_GCC_OPTIONS");
+  ptr = getenv_extended ("COLLECT_GCC_OPTIONS");
   if (ptr)
fprintf (stderr, "COLLECT_GCC_OPTIONS = %s\n", ptr);
 
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index ae1d80fe00a..98c1dff6335 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -2953,12 +2953,43 @@ add_to_obstack (char *path, void *data)
   return NULL;
 }
 
-/* Add or change the value of an environment variable, outputting the
-   change to standard error if in verbose mode.  */
+/* Add or change the value of an environment variable,
+ * outputting the change to standard error if in verbose mode.  */
 static void
 xputenv (const char *string)

Re: [PATCH]middle-end: check that the lhs of a COND_EXPR is an SSA_NAME in cond_store recognition [PR116628]

2024-09-06 Thread Kyrylo Tkachov
Hi Tamar,


> On 6 Sep 2024, at 14:56, Tamar Christina  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi All,
> 
> Because the vect_recog_bool_pattern can at the moment still transition
> out of GIMPLE and back into GENERIC the vect_recog_cond_store_pattern can
> end up using an expression as a mask rather than an SSA_NAME.
> 
> This adds an explicit check that we have a mask and not an expression.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>PR tree-optimization/116628
>* tree-vect-patterns.cc (vect_recog_cond_store_pattern): Add SSA_NAME
>check on expression.
> 
> gcc/testsuite/ChangeLog:
> 
>PR tree-optimization/116628
>* gcc.dg/vect/pr116628.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/pr116628.c 
> b/gcc/testsuite/gcc.dg/vect/pr116628.c
> new file mode 100644
> index 
> ..4068c657ac5570b10f2dca4be5109abbaf574f55
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr116628.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_masked_store } */
> +/* { dg-additional-options "-Ofast -march=armv9-a" { target aarch64-*-* } } 
> */

FWIW the ICE in the PR doesn’t trigger for me with -march=armv9-a. I think 
something in the heuristics for -mcpu=neoverse-v2 is needed.
Thanks,
Kyrill

> +
> +typedef float c;
> +c a[2000], b[0];
> +void d() {
> +  for (int e = 0; e < 2000; e++)
> +if (b[e])
> +  a[e] = b[e];
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> f7c3c623ea46ea09f4f86139d2a92bb6363aee3c..3a0d4cb7092cb59fe8b8664b682ade73ab5e9645
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6685,6 +6685,9 @@ vect_recog_cond_store_pattern (vec_info *vinfo,
>   /* Check if the else value matches the original loaded one.  */
>   bool invert = false;
>   tree cmp_ls = gimple_arg (cond_stmt, 0);
> +  if (TREE_CODE (cmp_ls) != SSA_NAME)
> +return NULL;
> +
>   tree cond_arg1 = gimple_arg (cond_stmt, 1);
>   tree cond_arg2 = gimple_arg (cond_stmt, 2);
> 
> 
> 
> 
> 
> --
> 



Re: [PATCH 3/4][rtl]: simplify boolean vector EQ and NE comparisons

2024-09-06 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This adds vector constant simplification for EQ and NE.  This is useful since
> the vectorizer generates a lot more vector compares now, in particular NE and 
> EQ
> and so these help us optimize cases where the values were not known at GIMPLE
> but instead only at RTL.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * simplify-rtx.cc (simplify_context::simplify_unary_operation): Try
>   simplifying operand.
>   (simplify_const_relational_operation): Simplify vector EQ and NE.
>   (test_vector_int_const_compare): New.
>   (test_vector_int_const_compare_ops): New.
>   (simplify_rtx_cc_tests): Use them.
>
> ---
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 
> a20a61c5dddbc80b23a9489d925a2c31b2163458..7e83e80246b70c81c388e77967f645d171efe983
>  100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -886,6 +886,10 @@ simplify_context::simplify_unary_operation (rtx_code 
> code, machine_mode mode,
>  
>trueop = avoid_constant_pool_reference (op);
>  
> +  /* If the operand is not a reg or constant try simplifying it first.  */
> +  if (rtx tmp_op = simplify_rtx (op))
> +op = tmp_op;
> +

We shouldn't need to do this.  The assumption is that the operands are
already simplified.

Which caller required this?

>tem = simplify_const_unary_operation (code, mode, trueop, op_mode);
>if (tem)
>  return tem;
> @@ -6354,6 +6358,35 @@ simplify_const_relational_operation (enum rtx_code 
> code,
>   return 0;
>  }
>  
> +  /* Check if the operands are a vector EQ or NE comparison.  */
> +  if (VECTOR_MODE_P (mode)
> +  && INTEGRAL_MODE_P (mode)
> +  && GET_CODE (op0) == CONST_VECTOR
> +  && GET_CODE (op1) == CONST_VECTOR
> +  && (code == EQ || code == NE))
> +{
> +  if (rtx_equal_p (op0, op1))
> + return code == EQ ? const_true_rtx : const0_rtx;
> +
> +  unsigned int npatterns0, npatterns1;
> +  if (CONST_VECTOR_NUNITS (op0).is_constant (&npatterns0)
> +   && CONST_VECTOR_NUNITS (op1).is_constant (&npatterns1))
> + {
> +   if (npatterns0 != npatterns1)
> + return code == EQ ? const0_rtx : const_true_rtx;

This looks like a typing error.  The operands have to have the same
number of elements.  But...

> +
> +   for (unsigned i = 0; i < npatterns0; i++)
> + {
> +   rtx val0 = CONST_VECTOR_ELT (op0, i);
> +   rtx val1 = CONST_VECTOR_ELT (op1, i);
> +   if (!rtx_equal_p (val0, val1))
> + return code == EQ ? const0_rtx : const_true_rtx;
> + }
> +
> +   return code == EQ ? const_true_rtx : const0_rtx;
> + }

...when is this loop needed?  For constant-sized vectors, isn't the
result always rtx_equal_p for EQ and !rtx_equal_p for NE?  If we have
equal vectors for which rtx_equal_p returns false then that should be
fixed.

For variable-sized vectors, I suppose the question is whether the
first unequal element is found in the minimum vector length, or whether
it only occurs for larger lengths.  In the former case we can fold at
compile time, but in the latter case we can't.

So we probably do want the loop for variable-length vectors, up to
constant_lower_bound (CONST_VECTOR_NUNITS (...)).

> +}
> +
>/* We can't simplify MODE_CC values since we don't know what the
>   actual comparison is.  */
>if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
> @@ -8820,6 +8853,55 @@ test_vector_ops ()
>  }
>  }
>  
> +/* Verify vector constant comparisons for EQ and NE.  */
> +
> +static void
> +test_vector_int_const_compare (machine_mode mode)
> +{
> +  rtx zeros = CONST0_RTX (mode);
> +  rtx minusone = CONSTM1_RTX (mode);
> +  rtx series_0_1 = gen_const_vec_series (mode, const0_rtx, const1_rtx);
> +  ASSERT_RTX_EQ (const0_rtx,
> +  simplify_const_relational_operation (EQ, mode, zeros,
> +   CONST1_RTX (mode)));
> +  ASSERT_RTX_EQ (const_true_rtx,
> +  simplify_const_relational_operation (EQ, mode, zeros,
> +   CONST0_RTX (mode)));
> +  ASSERT_RTX_EQ (const_true_rtx,
> +  simplify_const_relational_operation (EQ, mode, minusone,
> +   CONSTM1_RTX (mode)));
> +  ASSERT_RTX_EQ (const_true_rtx,
> +  simplify_const_relational_operation (NE, mode, zeros,
> +   CONST1_RTX (mode)));
> +  ASSERT_RTX_EQ (const_true_rtx,
> +  simplify_const_relational_operation (NE, mode, zeros,
> +   series_0_1));
> +  ASSERT_RTX_EQ (const0_rtx,
> +  simplify_const_relational_operation (EQ, mode, zeros,
> +   series_0_1));

RE: [PATCH 2/4]middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern

2024-09-06 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, September 6, 2024 2:09 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH 2/4]middle-end: lower COND_EXPR into gimple form in
> vect_recog_bool_pattern
> 
> On Tue, 3 Sep 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
> > In the cases where the conditional is loop invariant or non-boolean it 
> > instead
> > converts the operation back into GENERIC and hides much of the operation 
> > from
> > the analysis part of the vectorizer.
> >
> > i.e.
> >
> >   a ? b : c
> >
> > is transformed into:
> >
> >   a != 0 ? b : c
> >
> > however by doing so we can't perform any optimization on the mask as they
> aren't
> > explicit until quite late during codegen.
> >
> > To fix this, this patch lowers booleans earlier and so ensures that we are 
> > always
> > in GIMPLE.
> >
> > For when the value is a loop invariant boolean we have to generate an 
> > additional
> > conversion from bool to the integer mask form.
> >
> > This is done by creating a loop invariant a ? -1 : 0 with the target mask
> > precision and then doing a normal != 0 comparison on that.
> >
> > To support this the patch also adds the ability to during pattern matching
> > create a loop invariant pattern that won't be seen by the vectorizer and 
> > will
> > instead be materialized inside the loop preheader in the case of loops, or 
> > in
> > the case of BB vectorization it materializes it in the first BB in the 
> > region.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master?
> 
> OK, but can you clarify a question below?
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-patterns.cc (append_inv_pattern_def_seq): New.
> > (vect_recog_bool_pattern): Lower COND_EXPRs.
> > * tree-vect-slp.cc (vect_schedule_slp): Materialize loop invariant
> > statements.
> > * tree-vect-loop.cc (vect_transform_loop): Likewise.
> > * tree-vect-stmts.cc (vectorizable_comparison_1): Remove
> > VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype.
> > * tree-vectorizer.cc (vec_info::vec_info): Initialize
> > inv_pattern_def_seq.
> > * tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New.
> > (class vec_info): Add inv_pattern_def_seq.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
> > * gcc.dg/vect/vect-conditional_store_5.c: New test.
> > * gcc.dg/vect/vect-conditional_store_6.c: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
> b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
> > new file mode 100644
> > index
> ..650a3bfbfb1dd44afc2d58b
> be85f75f1d28b9bd0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > +
> > +void foo3 (float *restrict a, int *restrict c)
> > +{
> > +#pragma GCC unroll 8
> > +  for (int i = 0; i < 8; i++)
> > +c[i] = a[i] > 1.0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized using SLP" "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
> > new file mode 100644
> > index
> ..37d60fa76351c13980427
> 751be4450c14617a9a9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_masked_store } */
> > +
> > +/* { dg-additional-options "-mavx2" { target avx2 } } */
> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > +
> > +#include <stdbool.h>
> > +
> > +void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int 
> > stride)
> > +{
> > +  if (stride <= 1)
> > +return;
> > +
> > +  bool ai = a[0];
> > +
> > +  for (int i = 0; i < n; i++)
> > +{
> > +  int res = c[i];
> > +  int t = b[i+stride];
> > +  if (ai)
> > +t = res;
> > +  c[i] = t;
> > +}
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { target 
> > aarch64-
> *-* } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
> > new file mode 100644
> > index
> ..5e1aedf3726b0

[r15-3509 Regression] FAIL: gcc.target/i386/pr88531-2c.c scan-assembler-times vmulps 1 on Linux/x86_64

2024-09-06 Thread haochen.jiang
On Linux/x86_64,

d34cda720988674bcf8a24267c9e1ec61335d6de is the first bad commit
commit d34cda720988674bcf8a24267c9e1ec61335d6de
Author: Richard Biener 
Date:   Fri Sep 29 12:54:17 2023 +0200

Handle non-grouped stores as single-lane SLP

caused

FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 1
FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmadd132pd[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfmsub132pd[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmadd132pd[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times vfnmsub132pd[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
vfmadd132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
vfmsub132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
vfnmadd132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
vfnmsub132ph[ \\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmadd132ps[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfmsub132ps[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmadd132ps[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times vfnmsub132ps[ 
\\t]+[^{\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/pr88531-2b.c scan-assembler-times vmulps 1
FAIL: gcc.target/i386/pr88531-2c.c scan-assembler-times vmulps 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3509/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-19c.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-19c.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-19c.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-19c.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_double-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_double-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_double-1.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_double-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma__Float16-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma__Float16-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma__Float16-1.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma__Float16-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_float-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_float-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_float-1.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/cond_op_fma_float-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr88531-2b.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr88531-2b.c --t

[PATCH][PR116569] match.pd: Check trunc_mod vector optab before folding.

2024-09-06 Thread Jennifer Schmitz
In the pattern X - (X / Y) * Y to X % Y, this patch guards the
simplification for vector types by a check for:
1) Support of the mod optab for vectors OR
2) Application during early gimple passes (using PROP_gimple_any).
This is to prevent reverting vectorization of modulo to div/mult/sub
if the target does not support vector mod optab, while still allowing
the simplification during early gimple passes (as tested, for example,
in gcc.dg/fold-minus-1.c).

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
PR tree-optimization/116569
* generic-match-head.cc (optimize_early_gimple_p): Add inline
function with mask for early gimple passes.
* gimple-match-head.cc (optimize_early_gimple_p): Likewise.
* match.pd: Guard simplification to trunc_mod with check for
mod optab support.

gcc/testsuite/
PR tree-optimization/116569
* gcc.dg/torture/pr116569.c: New test.


0001-PR116569-match.pd-Check-trunc_mod-vector-obtap-befor.patch
Description: Binary data


smime.p7s
Description: S/MIME cryptographic signature


RE: [PATCH 3/4][rtl]: simplify boolean vector EQ and NE comparisons

2024-09-06 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, September 6, 2024 2:21 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH 3/4][rtl]: simplify boolean vector EQ and NE comparisons
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This adds vector constant simplification for EQ and NE.  This is useful 
> > since
> > the vectorizer generates a lot more vector compares now, in particular NE 
> > and EQ
> > and so these help us optimize cases where the values were not known at 
> > GIMPLE
> > but instead only at RTL.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * simplify-rtx.cc (simplify_context::simplify_unary_operation): Try
> > simplifying operand.
> > (simplify_const_relational_operation): Simplify vector EQ and NE.
> > (test_vector_int_const_compare): New.
> > (test_vector_int_const_compare_ops): New.
> > (simplify_rtx_cc_tests): Use them.
> >
> > ---
> >
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index
> a20a61c5dddbc80b23a9489d925a2c31b2163458..7e83e80246b70c81c388e77
> 967f645d171efe983 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -886,6 +886,10 @@ simplify_context::simplify_unary_operation (rtx_code
> code, machine_mode mode,
> >
> >trueop = avoid_constant_pool_reference (op);
> >
> > +  /* If the operand is not a reg or constant try simplifying it first.  */
> > +  if (rtx tmp_op = simplify_rtx (op))
> > +op = tmp_op;
> > +
> 
> We shouldn't need to do this.  The assumption is that the operands are
> already simplified.
> 
> Which caller required this?
> 
> >tem = simplify_const_unary_operation (code, mode, trueop, op_mode);
> >if (tem)
> >  return tem;
> > @@ -6354,6 +6358,35 @@ simplify_const_relational_operation (enum rtx_code
> code,
> > return 0;
> >  }
> >
> > +  /* Check if the operands are a vector EQ or NE comparison.  */
> > +  if (VECTOR_MODE_P (mode)
> > +  && INTEGRAL_MODE_P (mode)
> > +  && GET_CODE (op0) == CONST_VECTOR
> > +  && GET_CODE (op1) == CONST_VECTOR
> > +  && (code == EQ || code == NE))
> > +{
> > +  if (rtx_equal_p (op0, op1))
> > +   return code == EQ ? const_true_rtx : const0_rtx;
> > +
> > +  unsigned int npatterns0, npatterns1;
> > +  if (CONST_VECTOR_NUNITS (op0).is_constant (&npatterns0)
> > + && CONST_VECTOR_NUNITS (op1).is_constant (&npatterns1))
> > +   {
> > + if (npatterns0 != npatterns1)
> > +   return code == EQ ? const0_rtx : const_true_rtx;
> 
> This looks like a typing error.  The operands have to have the same
> number of elements.  But...
> 
> > +
> > + for (unsigned i = 0; i < npatterns0; i++)
> > +   {
> > + rtx val0 = CONST_VECTOR_ELT (op0, i);
> > + rtx val1 = CONST_VECTOR_ELT (op1, i);
> > + if (!rtx_equal_p (val0, val1))
> > +   return code == EQ ? const0_rtx : const_true_rtx;
> > +   }
> > +
> > + return code == EQ ? const_true_rtx : const0_rtx;
> > +   }
> 
> ...when is this loop needed?  For constant-sized vectors, isn't the
> result always rtx_equal_p for EQ and !rtx_equal_p for NE?  If we have
> equal vectors for which rtx_equal_p returns false then that should be
> fixed.

Hmm I suppose, I guess

  if (rtx_equal_p (op0, op1))
return code == EQ ? const_true_rtx : const0_rtx;
  else
return code == NE ? const_true_rtx : const0_rtx;

does the same thing.

Fair, I just didn't think about it ☹

> 
> For variable-sized vectors, I suppose the question is whether the
> first unequal element is found in the minimum vector length, or whether
> it only occurs for larger lengths.  In the former case we can fold at
> compile time, but in the latter case we can't.
> 
> So we probably do want the loop for variable-length vectors, up to
> constant_lower_bound (CONST_VECTOR_NUNITS (...)).
> 
> > +}
> > +
> >/* We can't simplify MODE_CC values since we don't know what the
> >   actual comparison is.  */
> >if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
> > @@ -8820,6 +8853,55 @@ test_vector_ops ()
> >  }
> >  }
> >
> > +/* Verify vector constant comparisons for EQ and NE.  */
> > +
> > +static void
> > +test_vector_int_const_compare (machine_mode mode)
> > +{
> > +  rtx zeros = CONST0_RTX (mode);
> > +  rtx minusone = CONSTM1_RTX (mode);
> > +  rtx series_0_1 = gen_const_vec_series (mode, const0_rtx, const1_rtx);
> > +  ASSERT_RTX_EQ (const0_rtx,
> > +simplify_const_relational_operation (EQ, mode, zeros,
> > + CONST1_RTX (mode)));
> > +  ASSERT_RTX_EQ (const_true_rtx,
> > +simplify_const_relational_operation (EQ, mode, zeros,
> > + CONST0_RTX (mode)));
> > +  ASSERT_RTX_EQ (const_true_rtx,

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-06 Thread Qing Zhao


> On Sep 5, 2024, at 18:22, Bill Wendling  wrote:
> 
> Hi Qing,
> 
> Sorry for my late reply.
> 
> On Thu, Aug 29, 2024 at 7:22 AM Qing Zhao  wrote:
>> 
>> Hi,
>> 
>> Thanks for the information.
>> 
>> Yes, providing a unary operator similar as __counted_by(PTR) as suggested by 
>> multiple people previously is a cleaner approach.
>> 
>> Then the programmer will use the following:
>> 
>> __builtin_choose_expr(
>> __builtin_has_attribute (__p->FAM, "counted_by"),
>> __builtin_get_counted_by(__p->FAM) = COUNT, 0);
>> 
>> From the programmer’s point of view, it’s cleaner too.
>> 
>> However, there is one issue with “__builtin_choose_expr” currently in GCC, 
>> its documentation explicitly mentions this limitation:  
>> (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fchoose_005fexpr)
>> 
>> "Note: This construct is only available for C. Furthermore, the unused 
>> expression (exp1 or exp2 depending on the value of const_exp) may still 
>> generate syntax errors. This may change in future revisions.”
>> 
>> So, due to this limitation, when there is no counted_by attribute, the 
>> __builtin_get_counted_by() still is evaluated by the compiler and errors is 
>> issued and the compilation stops, this can be show from the small testing 
>> case:
>> 
>> [opc@qinzhao-ol8u3-x86 gcc]$ cat ttt.c
>> 
>> struct flex {
>>  unsigned int b;
>>  int c[];
>> } *array_flex;
>> 
>> #define MY_ALLOC(P, FAM, COUNT) ({ \
>>  typeof(P) __p; \
>>  unsigned int __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
>>  __p = (typeof(P)) __builtin_malloc(__size); \
>>  __builtin_choose_expr( \
>>__builtin_has_attribute (__p->FAM, counted_by), \
>>__builtin_counted_by_ref(__p->FAM) = COUNT, 0); \
>>  P = __p; \
>> })
>> 
>> int main(int argc, char *argv[])
>> {
>>  MY_ALLOC(array_flex, c, 20);
>>  return 0;
>> }
>> [opc@qinzhao-ol8u3-x86 gcc]$ sh t
>> ttt.c: In function ‘main’:
>> ttt.c:13:5: error: the argument must have ‘counted_by’ attribute 
>> ‘__builtin_counted_by_ref’
>> ttt.c:19:3: note: in expansion of macro ‘MY_ALLOC’
>> 
>> I checked the FE code on handling “__buiiltin_choose_expr”, Yes, it does 
>> parse the __builtin_counted_by_ref(__p->FAM) even when 
>> __builtin_has_attribute(__p->FAM, counted_by) is FALSE, and issued the error 
>> when parsing __builtin_counted_by_ref and stopped the compilation.
>> 
>> So, in order to support this approach, we first must fix the issue in the 
>> current __builtin_choose_expr in GCC. Otherwise, it’s impossible for the 
>> user to use this new builtin.
>> 
>> Let me know your comments and suggestions.
>> 
> Do you need to emit a diagnostic if the FAM doesn't have the
> counted_by attribute? It was originally supposed to "silently fail" if
> it didn't. We may need to do the same for Clang if so.

Yes, “silently fail” should work around this problem if fixing the issue in the 
current __builtin_choose_expr is too complicated. 

I will study a little bit on how to fix the issue in __builtin_choose_expr 
first.

Martin and Joseph, any comment or suggestion from you?

thanks.

Qing


> 
> -bw




Re: [PATCH][PR116569] match.pd: Check trunc_mod vector optab before folding.

2024-09-06 Thread Jakub Jelinek
On Fri, Sep 06, 2024 at 01:46:01PM +, Jennifer Schmitz wrote:
> In the pattern X - (X / Y) * Y to X % Y, this patch guards the
> simplification for vector types by a check for:
> 1) Support of the mod optab for vectors OR
> 2) Application during early gimple passes (using PROP_gimple_any).
> This is to prevent reverting vectorization of modulo to div/mult/sub
> if the target does not support vector mod optab, while still allowing
> the simplification during early gimple passes (as tested, for example,
> in gcc.dg/fold-minus-1.c).
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/
>   PR tree-optimization/116569
>   * generic-match-head.cc (optimize_early_gimple_p): Add inline
>   function with mask for early gimple passes.
>   * gimple-match-head.cc (optimize_early_gimple_p): Likewise.
>   * match.pd: Guard simplification to trunc_mod with check for
>   mod optab support.
> 
> gcc/testsuite/
>   PR tree-optimization/116569
>   * gcc.dg/torture/pr116569.c: New test.

This is certainly wrong.
PROP_gimple_any is set already at the end of gimplification, so certainly
doesn't include any other early gimple passes.
And, not all statements are folded during gimplification, e.g. in OpenMP
regions folding is postponed until the omp lowering pass and folded only
there (at which point PROP_gimple_any is already set).

What exactly are you trying to ensure this optimization goes before?
For non-VL vectors I guess vector lowering, but that is done far later
and we already have a different predicate for that.
For VL vectors, what transforms that if user write % ?

Jakub



[PATCH] vect: Do not try to duplicate_and_interleave one-element mode.

2024-09-06 Thread Robin Dapp
Hi,

PR112694 shows that we try to create sub-vectors of single-element
vectors because can_duplicate_and_interleave_p returns true.
The problem resurfaced in PR116611.

This patch makes can_duplicate_and_interleave_p return false
if count / nvectors is zero and removes the corresponding check in the riscv
backend.

This partially gets rid of the FAIL in slp-19a.c.  At least when built
with cost model we don't have LOAD_LANES anymore.  Without cost model,
as in the test suite, we choose a different path and still end up with
LOAD_LANES.

Bootstrapped and regtested on x86 and power10, regtested on
rv64gcv_zvfh_zvbb.  Still waiting for the aarch64 results.

Regards
 Robin

gcc/ChangeLog:

PR target/112694
PR target/116611.

* config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early
return.
* tree-vect-slp.cc (can_duplicate_and_interleave_p): Return
false when we cannot create sub-elements.
---
 gcc/config/riscv/riscv-v.cc | 9 -
 gcc/tree-vect-slp.cc| 4 
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9b6c3a21e2d..5c5ed63d22e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3709,15 +3709,6 @@ expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
  mask to do the iteration loop control. Just disable it directly.  */
   if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
 return false;
-  /* FIXME: Explicitly disable VLA interleave SLP vectorization when we
- may encounter ICE for poly size (1, 1) vectors in loop vectorizer.
- Ideally, middle-end loop vectorizer should be able to disable it
- itself, We can remove the codes here when middle-end code is able
- to disable VLA SLP vectorization for poly size (1, 1) VF.  */
-  if (!BYTES_PER_RISCV_VECTOR.is_constant ()
-  && maybe_lt (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
-  poly_int64 (16, 16)))
-return false;
 
   struct expand_vec_perm_d d;
 
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3d2973698e2..17b59870c69 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -434,6 +434,10 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
   unsigned int nvectors = 1;
   for (;;)
 {
+  /* We need to be able to fuse COUNT / NVECTORS elements together,
+     so no point in continuing if there are none.  */
+  if (nvectors > count)
+   return false;
   scalar_int_mode int_mode;
   poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
   if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
-- 
2.46.0



Re: [PATCH][PR116569] match.pd: Check trunc_mod vector optab before folding.

2024-09-06 Thread Kyrylo Tkachov


> On 6 Sep 2024, at 16:00, Jakub Jelinek  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Fri, Sep 06, 2024 at 01:46:01PM +, Jennifer Schmitz wrote:
>> In the pattern X - (X / Y) * Y to X % Y, this patch guards the
>> simplification for vector types by a check for:
>> 1) Support of the mod optab for vectors OR
>> 2) Application during early gimple passes (using PROP_gimple_any).
>> This is to prevent reverting vectorization of modulo to div/mult/sub
>> if the target does not support vector mod optab, while still allowing
>> the simplification during early gimple passes (as tested, for example,
>> in gcc.dg/fold-minus-1.c).
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>>  PR tree-optimization/116569
>>  * generic-match-head.cc (optimize_early_gimple_p): Add inline
>>  function with mask for early gimple passes.
>>  * gimple-match-head.cc (optimize_early_gimple_p): Likewise.
>>  * match.pd: Guard simplification to trunc_mod with check for
>>  mod optab support.
>> 
>> gcc/testsuite/
>>  PR tree-optimization/116569
>>  * gcc.dg/torture/pr116569.c: New test.
> 
> This is certainly wrong.
> PROP_gimple_any is set already at the end of gimplification, so certainly
> doesn't include any other early gimple passes.
> And, not all statements are folded during gimplification, e.g. in OpenMP
> regions folding is postponed until the omp lowering pass and folded only
> there (at which point PROP_gimple_any is already set).
> 
> What exactly are you trying to ensure this optimization goes before?
> For non-VL vectors I guess vector lowering, but that is done far later
> and we already have a different predicate for that.
> For VL vectors, what transforms that if user write % ?

There’s currently no way to write this in a generic VLA way. The SVE intrinsics 
for this would be opaque to GIMPLE and the generic vector extension doesn’t 
support VLA for now.
The problem is the fold-minus-1.c test case that wants to see the fold happen 
early on, and I think that makes sense from a canonicalization POV but when the 
vectorizer has expanded a vector mod later on we don’t want to put it back 
together.
I agree gimple_any doesn’t look like the right thing. Is there a better check 
to use?
Thanks,
Kyrill

> 
>Jakub
> 



Re: [PATCH][PR116569] match.pd: Check trunc_mod vector optab before folding.

2024-09-06 Thread Jakub Jelinek
On Fri, Sep 06, 2024 at 02:10:19PM +, Kyrylo Tkachov wrote:
> > This is certainly wrong.
> > PROP_gimple_any is set already at the end of gimplification, so certainly
> > doesn't include any other early gimple passes.
> > And, not all statements are folded during gimplification, e.g. in OpenMP
> > regions folding is postponed until the omp lowering pass and folded only
> > there (at which point PROP_gimple_any is already set).
> > 
> > What exactly are you trying to ensure this optimization goes before?
> > For non-VL vectors I guess vector lowering, but that is done far later
> > and we already have a different predicate for that.
> > For VL vectors, what transforms that if user write % ?
> 
> There’s currently no way to write this in a generic VLA way. The SVE 
> intrinsics for this would be opaque to GIMPLE and the generic vector 
> extension doesn’t support VLA for now.
> The problem is the fold-minus-1.c test case that wants to see the fold happen 
> early on, and I think that makes sense from a canonicalization POV but when 
> the vectorizer has expanded a vector mod later on we don’t want to put it 
> back together.
> I agree gimple_any doesn’t look like the right thing. Is there a better check 
> to use?

If it never works with SVE or RISC-V VL vectors, then match.pd shouldn't do
it unless it works.  Testcase can be always adjusted or limited to targets
which do support that.
Or, do it for VECTOR_INTEGER_TYPE_P only
if ((optimize_vectors_before_lowering_p ()
 && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
|| target_supports_op_p (type, TRUNC_MOD_EXPR, optab_vector))
i.e. do it before vector lowering for vectors which can be lowered,
and when the optabs works.
Anyway, I think it would be useful to check whether it actually results
in better generated code on targets which support TRUNC_DIV_EXPR
and MULT_EXPR on vectors but not TRUNC_MOD_EXPR (if there are any).

Jakub



[r15-3509 Regression] FAIL: gcc.target/i386/pr88531-2c.c scan-assembler-times vmulps 1 on Linux/x86_64

2024-09-06 Thread haochen.jiang
On Linux/x86_64,

d34cda720988674bcf8a24267c9e1ec61335d6de is the first bad commit
commit d34cda720988674bcf8a24267c9e1ec61335d6de
Author: Richard Biener 
Date:   Fri Sep 29 12:54:17 2023 +0200

Handle non-grouped stores as single-lane SLP

caused

FAIL: gcc.dg/vect/fast-math-vect-call-2.c scan-tree-dump-times vect 
"vectorizing stmts using SLP" 6

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3509/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/fast-math-vect-call-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)
(If you meet problems related to cascadelake, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH 2/4]middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern

2024-09-06 Thread Richard Biener



> Am 06.09.2024 um 15:28 schrieb Tamar Christina :
> 
> 
>> 
>> -Original Message-
>> From: Richard Biener 
>> Sent: Friday, September 6, 2024 2:09 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
>> Subject: Re: [PATCH 2/4]middle-end: lower COND_EXPR into gimple form in
>> vect_recog_bool_pattern
>> 
>>> On Tue, 3 Sep 2024, Tamar Christina wrote:
>>> 
>>> Hi All,
>>> 
>>> Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
>>> In the cases where the conditional is loop invariant or non-boolean it 
>>> instead
>>> converts the operation back into GENERIC and hides much of the operation 
>>> from
>>> the analysis part of the vectorizer.
>>> 
>>> i.e.
>>> 
>>>  a ? b : c
>>> 
>>> is transformed into:
>>> 
>>>  a != 0 ? b : c
>>> 
>>> however by doing so we can't perform any optimization on the mask as they
>> aren't
>>> explicit until quite late during codegen.
>>> 
>>> To fix this, this patch lowers booleans earlier and so ensures that we are 
>>> always
>>> in GIMPLE.
>>> 
>>> For when the value is a loop invariant boolean we have to generate an 
>>> additional
>>> conversion from bool to the integer mask form.
>>> 
>>> This is done by creating a loop invariant a ? -1 : 0 with the target mask
>>> precision and then doing a normal != 0 comparison on that.
>>> 
>>> To support this the patch also adds the ability to during pattern matching
>>> create a loop invariant pattern that won't be seen by the vectorizer and 
>>> will
>>> instead be materialized inside the loop preheader in the case of loops, or 
>>> in
>>> the case of BB vectorization it materializes it in the first BB in the 
>>> region.
>>> 
>>> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
>>> x86_64-pc-linux-gnu -m32, -m64 and no issues.
>>> 
>>> Ok for master?
>> 
>> OK, but can you clarify a question below?
>> 
>>> Thanks,
>>> Tamar
>>> 
>>> gcc/ChangeLog:
>>> 
>>>* tree-vect-patterns.cc (append_inv_pattern_def_seq): New.
>>>(vect_recog_bool_pattern): Lower COND_EXPRs.
>>>* tree-vect-slp.cc (vect_schedule_slp): Materialize loop invariant
>>>statements.
>>>* tree-vect-loop.cc (vect_transform_loop): Likewise.
>>>* tree-vect-stmts.cc (vectorizable_comparison_1): Remove
>>>VECT_SCALAR_BOOLEAN_TYPE_P handling for vectype.
>>>* tree-vectorizer.cc (vec_info::vec_info): Initialize
>>>inv_pattern_def_seq.
>>>* tree-vectorizer.h (LOOP_VINFO_INV_PATTERN_DEF_SEQ): New.
>>>(class vec_info): Add inv_pattern_def_seq.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>* gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
>>>* gcc.dg/vect/vect-conditional_store_5.c: New test.
>>>* gcc.dg/vect/vect-conditional_store_6.c: New test.
>>> 
>>> ---
>>> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
>> b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
>>> new file mode 100644
>>> index
>> ..650a3bfbfb1dd44afc2d58b
>> be85f75f1d28b9bd0
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-conditional_store_1.c
>>> @@ -0,0 +1,15 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target vect_float } */
>>> +
>>> +/* { dg-additional-options "-mavx2" { target avx2 } } */
>>> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
>>> +
>>> +void foo3 (float *restrict a, int *restrict c)
>>> +{
>>> +#pragma GCC unroll 8
>>> +  for (int i = 0; i < 8; i++)
>>> +c[i] = a[i] > 1.0;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "vectorized using SLP" "slp1" } } */
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
>> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
>>> new file mode 100644
>>> index
>> ..37d60fa76351c13980427
>> 751be4450c14617a9a9
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_5.c
>>> @@ -0,0 +1,28 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target vect_masked_store } */
>>> +
>>> +/* { dg-additional-options "-mavx2" { target avx2 } } */
>>> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
>>> +
>>> +#include <stdbool.h>
>>> +
>>> +void foo3 (float *restrict a, int *restrict b, int *restrict c, int n, int 
>>> stride)
>>> +{
>>> +  if (stride <= 1)
>>> +return;
>>> +
>>> +  bool ai = a[0];
>>> +
>>> +  for (int i = 0; i < n; i++)
>>> +{
>>> +  int res = c[i];
>>> +  int t = b[i+stride];
>>> +  if (ai)
>>> +t = res;
>>> +  c[i] = t;
>>> +}
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
>>> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "vect" { target 
>>> aarch64-
>> *-* } } } */
>>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
>> b/gcc/testsuite/gcc.dg/vect/vect-conditional_store_6.c
>>

Re: [PATCH] vect: Do not try to duplicate_and_interleave one-element mode.

2024-09-06 Thread Richard Biener



> Am 06.09.2024 um 16:05 schrieb Robin Dapp :
> 
> Hi,
> 
> PR112694 shows that we try to create sub-vectors of single-element
> vectors because can_duplicate_and_interleave_p returns true.

Can we avoid querying the function?  CCing Richard who should know more about 
this.

Richard 

> The problem resurfaced in PR116611.
> 
> This patch makes can_duplicate_and_interleave_p return false
> if count / nvectors == 0 and removes the corresponding check in the riscv
> backend.
> 
> This partially gets rid of the FAIL in slp-19a.c.  At least when built
> with cost model we don't have LOAD_LANES anymore.  Without cost model,
> as in the test suite, we choose a different path and still end up with
> LOAD_LANES.
> 
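The guard being added amounts to the following check (an illustrative model in plain C, not the actual GCC code; the name `can_fuse_elements` is invented):

```c
#include <stdbool.h>

/* Splitting COUNT scalars into NVECTORS sub-vectors only makes sense
   while each sub-vector still receives at least one element, i.e. while
   count / nvectors > 0 -- equivalently, while nvectors <= count.  */
static bool
can_fuse_elements (unsigned int count, unsigned int nvectors)
{
  return nvectors <= count;
}
```

For a single-element vector (count == 1) the first doubling of nvectors already fails the check, which is exactly the case the RISC-V workaround used to catch in the backend.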
> Bootstrapped and regtested on x86 and power10, regtested on
> rv64gcv_zvfh_zvbb.  Still waiting for the aarch64 results.
> 
> Regards
> Robin
> 
> gcc/ChangeLog:
> 
>PR target/112694
>PR target/116611
> 
>* config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early
>return.
>* tree-vect-slp.cc (can_duplicate_and_interleave_p): Return
>false when we cannot create sub-elements.
> ---
> gcc/config/riscv/riscv-v.cc | 9 -
> gcc/tree-vect-slp.cc| 4 
> 2 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 9b6c3a21e2d..5c5ed63d22e 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -3709,15 +3709,6 @@ expand_vec_perm_const (machine_mode vmode, 
> machine_mode op_mode, rtx target,
>  mask to do the iteration loop control. Just disable it directly.  */
>   if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
> return false;
> -  /* FIXME: Explicitly disable VLA interleave SLP vectorization when we
> - may encounter ICE for poly size (1, 1) vectors in loop vectorizer.
> - Ideally, middle-end loop vectorizer should be able to disable it
> - itself, We can remove the codes here when middle-end code is able
> - to disable VLA SLP vectorization for poly size (1, 1) VF.  */
> -  if (!BYTES_PER_RISCV_VECTOR.is_constant ()
> -  && maybe_lt (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
> -   poly_int64 (16, 16)))
> -return false;
> 
>   struct expand_vec_perm_d d;
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 3d2973698e2..17b59870c69 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -434,6 +434,10 @@ can_duplicate_and_interleave_p (vec_info *vinfo, 
> unsigned int count,
>   unsigned int nvectors = 1;
>   for (;;)
> {
> +  /* We need to be able to fuse COUNT / NVECTORS elements together,
> + so no point in continuing if there are none.  */
> +  if (nvectors > count)
> +return false;
>   scalar_int_mode int_mode;
>   poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
>   if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
> --
> 2.46.0
> 


[PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-06 Thread Evgeny Karpov
Friday, September 6, 2024
Martin Storsjö  wrote:

> Sorry, but no.
> 
> You can't just redefine how relocations in your object file format works,
> just because you feel like it.

This patch changes how a symbol with an offset is emitted.

It will change:
adrp x0, symbol + offset

to:
adrp  x0, symbol
add x0, x0, offset

It is not optimal; however, it is not critical, and both relocation approaches
work with it.  The concern regarding changing the relocation handling in
binutils is clear and will be discussed on the binutils mailing list.

> Didn't you see my reply about how MSVC themselves also use this
> relocation?

Yes, an extended answer will be prepared.

Regards,
Evgeny

RE: [PATCH]middle-end: check that the lhs of a COND_EXPR is an SSA_NAME in cond_store recognition [PR116628]

2024-09-06 Thread Tamar Christina
> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Friday, September 6, 2024 2:15 PM
> To: Tamar Christina 
> Cc: GCC Patches ; nd ; Richard Biener
> ; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: check that the lhs of a COND_EXPR is an
> SSA_NAME in cond_store recognition [PR116628]
> 
> Hi Tamar,
> 
> 
> > On 6 Sep 2024, at 14:56, Tamar Christina  wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > Hi All,
> >
> > Because the vect_recog_bool_pattern can at the moment still transition
> > out of GIMPLE and back into GENERIC the vect_recog_cond_store_pattern can
> > end up using an expression as a mask rather than an SSA_NAME.
> >
> > This adds an explicit check that we have a mask and not an expression.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >PR tree-optimization/116628
> >* tree-vect-patterns.cc (vect_recog_cond_store_pattern): Add SSA_NAME
> >check on expression.
> >
> > gcc/testsuite/ChangeLog:
> >
> >PR tree-optimization/116628
> >* gcc.dg/vect/pr116628.c: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr116628.c
> b/gcc/testsuite/gcc.dg/vect/pr116628.c
> > new file mode 100644
> > index
> ..4068c657ac5570b10f2dca
> 4be5109abbaf574f55
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr116628.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_float } */
> > +/* { dg-require-effective-target vect_masked_store } */
> > +/* { dg-additional-options "-Ofast -march=armv9-a" { target aarch64-*-* } 
> > } */
> 
> FWIW the ICE in the PR doesn’t trigger for me with -march=armv9-a. I think
> something in the heuristics for -mcpu=neoverse-v2 is needed.

Hmm, so it does generate the wrong pattern, but only for VNx2QI, and due to
costing it's not chosen.  Looks like any VL128 SVE mode picks the wrong one.

Ok I'll just update the test.

Thanks,
Tamar

> Thanks,
> Kyrill
> 
> > +
> > +typedef float c;
> > +c a[2000], b[0];
> > +void d() {
> > +  for (int e = 0; e < 2000; e++)
> > +if (b[e])
> > +  a[e] = b[e];
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index
> f7c3c623ea46ea09f4f86139d2a92bb6363aee3c..3a0d4cb7092cb59fe8b8664b6
> 82ade73ab5e9645 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -6685,6 +6685,9 @@ vect_recog_cond_store_pattern (vec_info *vinfo,
> >   /* Check if the else value matches the original loaded one.  */
> >   bool invert = false;
> >   tree cmp_ls = gimple_arg (cond_stmt, 0);
> > +  if (TREE_CODE (cmp_ls) != SSA_NAME)
> > +return NULL;
> > +
> >   tree cond_arg1 = gimple_arg (cond_stmt, 1);
> >   tree cond_arg2 = gimple_arg (cond_stmt, 2);
> >
> >
> >
> >
> >
> > --
> > 



Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-06 Thread Patrick Palka
On Thu, 5 Sep 2024, Jason Merrill wrote:

> On 9/5/24 2:28 PM, Patrick Palka wrote:
> > On Thu, 5 Sep 2024, Jason Merrill wrote:
> > 
> > > On 9/5/24 1:26 PM, Patrick Palka wrote:
> > > > On Thu, 5 Sep 2024, Jason Merrill wrote:
> > > > 
> > > > > On 9/5/24 10:54 AM, Patrick Palka wrote:
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > > > > > for trunk/14?
> > > > > > 
> > > > > > -- >8 --
> > > > > > 
> > > > > > A lambda within a default template argument used in some template-id
> > > > > > may have a smaller template depth than the context of the
> > > > > > template-id.
> > > > > > For example, the lambda in v1's default template argument has
> > > > > > template
> > > > > > depth 1, and in v2's has template depth 2, but the template-ids
> > > > > > v1<0>
> > > > > > and v2<0> which uses these default arguments appear in a depth 3
> > > > > > template
> > > > > > context.  So add_extra_args will ultimately return args with depth 3
> > > > > > --
> > > > > > too many args for the lambda, leading to a bogus substitution.
> > > > > > 
> > > > > > This patch fixes this by trimming the result of add_extra_args to
> > > > > > match
> > > > > > the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH
> > > > > > field
> > > > > > is added that tracks the template-ness of a lambda;
> > > > > > 
> > > > > > PR c++/116567
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > > * pt.cc (tsubst_lambda_expr): For a deferred-substitution
> > > > > > lambda,
> > > > > > trim the augmented template arguments to match the template
> > > > > > depth
> > > > > > of the lambda.
> > > > > > 
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > > * g++.dg/cpp2a/lambda-targ7.C: New test.
> > > > > > ---
> > > > > > gcc/cp/pt.cc  | 11 +
> > > > > > gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30
> > > > > > +++
> > > > > > 2 files changed, 41 insertions(+)
> > > > > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
> > > > > > 
> > > > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > > > > index 747e627f547..c49a26b4f5e 100644
> > > > > > --- a/gcc/cp/pt.cc
> > > > > > +++ b/gcc/cp/pt.cc
> > > > > > @@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args,
> > > > > > tsubst_flags_t complain, tree in_decl)
> > > > > >   LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args,
> > > > > > complain);
> > > > > >   return t;
> > > > > > }
> > > > > > +  if (LAMBDA_EXPR_EXTRA_ARGS (t))
> > > > > > +{
> > > > > > +  /* If we deferred substitution into this lambda, then it's
> > > > > > probably
> > > > > > from
> > > > > 
> > > > > "probably" seems wrong, given that it wasn't implemented for this
> > > > > case.
> > > > 
> > > > I said "probably" because in e.g.
> > > > 
> > > >   template
> > > >   bool b = true;
> > > > 
> > > >   template
> > > >   void f() {
> > > > b<0>;
> > > >   }
> > > > 
> > > > the lambda context has the same depth as the template-id context.  But
> > > > as you point out, the issue is ultimately related vs unrelated
> > > > parameters rather than depth.
> > > > 
> > > > > 
> > > > > > +a context (e.g. default template argument context) which may
> > > > > > have
> > > > > > fewer
> > > > > > +levels than the current context it's embedded in.  Adjust the
> > > > > > result
> > > > > > of
> > > > > > +add_extra_args accordingly.  */
> > > > > 
> > > > > Hmm, this looks like a situation of not just fewer levels, but
> > > > > potentially
> > > > > unrelated levels.  "args" here is for f, which shares no template
> > > > > context
> > > > > with
> > > > > v1.  What happens if your templates have non-type template parameters?
> > > > 
> > > > Indeed before add_extra_args 'args' will be unrelated, but after doing
> > > > add_extra_args the innermost levels of 'args' will correspond to the
> > > > lambda's template context, and so using get_innermost_template_args
> > > > ought to get rid of the unrelated arguments, keeping only the ones
> > > > relevant to the original lambda context.
> > > 
> > > Will they?  The original function of add_extra_args was to reintroduce
> > > outer
> > > args that we weren't able to substitute the last time through
> > > tsubst_lambda_expr.  I expect the innermost levels of 'args' to be the
> > > same
> > > before and after.
> > > 
> > > Hmm, looking at add_extra_args again, I see that whether the EXTRA_ARGS go
> > > on
> > > the outside or the inside depends on whether they're dependent. How does
> > > this
> > > work other than by accident? >.>
> > 
> > It's kind of a happy accident indeed :P  In the cases this patch is
> > concerned with, i.e. a template-id using a default argument containing
> > a lambda, the extra args will always be considered dependent because
> > during default template argument coercion we substitute an inc

Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-06 Thread Jason Merrill

On 9/6/24 11:19 AM, Patrick Palka wrote:

On Thu, 5 Sep 2024, Jason Merrill wrote:


On 9/5/24 2:28 PM, Patrick Palka wrote:

On Thu, 5 Sep 2024, Jason Merrill wrote:


On 9/5/24 1:26 PM, Patrick Palka wrote:

On Thu, 5 Sep 2024, Jason Merrill wrote:


On 9/5/24 10:54 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

A lambda within a default template argument used in some template-id
may have a smaller template depth than the context of the
template-id.
For example, the lambda in v1's default template argument has
template
depth 1, and in v2's has template depth 2, but the template-ids
v1<0>
and v2<0> which uses these default arguments appear in a depth 3
template
context.  So add_extra_args will ultimately return args with depth 3
--
too many args for the lambda, leading to a bogus substitution.

This patch fixes this by trimming the result of add_extra_args to
match
the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH
field
is added that tracks the template-ness of a lambda;

PR c++/116567

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): For a deferred-substitution
lambda,
trim the augmented template arguments to match the template
depth
of the lambda.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ7.C: New test.
---
 gcc/cp/pt.cc  | 11 +
 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30
+++
 2 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 747e627f547..c49a26b4f5e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args,
tsubst_flags_t complain, tree in_decl)
   LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args,
complain);
   return t;
 }
+  if (LAMBDA_EXPR_EXTRA_ARGS (t))
+{
+  /* If we deferred substitution into this lambda, then it's
probably
from


"probably" seems wrong, given that it wasn't implemented for this
case.


I said "probably" because in e.g.

   template
   bool b = true;

   template
   void f() {
 b<0>;
   }

the lambda context has the same depth as the template-id context.  But
as you point out, the issue is ultimately related vs unrelated
parameters rather than depth.




+a context (e.g. default template argument context) which may
have
fewer
+levels than the current context it's embedded in.  Adjust the
result
of
+add_extra_args accordingly.  */


Hmm, this looks like a situation of not just fewer levels, but
potentially
unrelated levels.  "args" here is for f, which shares no template
context
with
v1.  What happens if your templates have non-type template parameters?


Indeed before add_extra_args 'args' will be unrelated, but after doing
add_extra_args the innermost levels of 'args' will correspond to the
lambda's template context, and so using get_innermost_template_args
ought to get rid of the unrelated arguments, keeping only the ones
relevant to the original lambda context.


Will they?  The original function of add_extra_args was to reintroduce
outer
args that we weren't able to substitute the last time through
tsubst_lambda_expr.  I expect the innermost levels of 'args' to be the
same
before and after.

Hmm, looking at add_extra_args again, I see that whether the EXTRA_ARGS go
on
the outside or the inside depends on whether they're dependent. How does
this
work other than by accident? >.>


It's kind of a happy accident indeed :P  In the cases this patch is
concerned with, i.e. a template-id using a default argument containing
a lambda, the extra args will always be considered dependent because
during default template argument coercion we substitute an incomplete
set of targs into the default targ (namely the element corresponding to
the default targ is NULL_TREE), which any_dependent_template_arguments_p
considers dependent.  So add_extra_args will reliably put these captured
extra args as the innermost!


I see that the testcases you enabled this code to handle in
r12--g2c699fd29829cd were also about lambda/requires in default template
arguments.  Can we detect this case some other way than uses_template_parms?


I think during default template argument coercion we need to set
tf_partial when in a template context.  Then build_extra_args should
remember if this flag was set.  Then add_extra_args should merge the
extra arguments according to whether the flag was set during deferring.


Can we do the pruning added by this patch in add_extra_args instead of its
caller?


If we're only concerned with the default template argument situation, then
it seems 'extra' will always be a full set of template arguments (even when
the template-id names a template from the current instantiation) so I think
add_extra_args just needs to substitute into '

Re: [PATCH v2 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-06 Thread Martin Storsjö

On Fri, 6 Sep 2024, Evgeny Karpov wrote:


Friday, September 6, 2024
Martin Storsjö  wrote:


Sorry, but no.

You can't just redefine how relocations in your object file format works,
just because you feel like it.


This patch changes how symbol with offset will be emitted.

It will change:
adrp x0, symbol + offset

to:
adrp  x0, symbol
add x0, x0, offset


I presume more precisely, it changes

adrp x0, symbol + offset
add x0, x0, :lo12:symbol + offset

into

adrp x0, symbol
add x0, x0, :lo12:symbol
add x0, x0, offset


It is not optimal, however, it is not critical, and both relocation approaches
work with it.


Indeed, it's not optimal, and this other form is totally acceptable. And 
using it to work around an issue somewhere is also quite ok.


But don't make claims that "'symbol + offset' cannot be used in 
relocations for aarch64-w64-mingw32 due to relocation requirements." - 
because that is just misleading.


Only in the case if the offset is more than +/- 1 MB, you would need to 
change the form of the generated code.


And if the reason for this is the binutils patch that changes how the 
relocation is handled, that reason will go away, because I believe that 
patch is demonstrably wrong and that patch should be dropped (or 
reworked).


// Martin


Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option

2024-09-06 Thread Dora, Sunil Kumar
Hi Andrew,

Thank you for your feedback. Initially, we attempted to address the issue by 
utilizing GCC’s response files. However, we discovered that the 
COLLECT_GCC_OPTIONS variable already contains the expanded contents of the 
response files.

As a result, using response files only mitigates the multiplication factor but 
does not bypass the 128KB limit.
I have included the response file usage logs and the complete history in the
Bugzilla report for your reference.
Following your suggestion, I have updated the logic to avoid hardcoding /tmp.
Please find the revised version of the patch at the following link:

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662519.html

Thanks,
Sunil Dora

From: Andrew Pinski 
Sent: Friday, August 30, 2024 8:05 PM
To: Hemraj, Deepthi 
Cc: gcc-patches@gcc.gnu.org ; rguent...@suse.de 
; jeffreya...@gmail.com ; 
josmy...@redhat.com ; MacLeod, Randy 
; Gowda, Naveen ; 
Dora, Sunil Kumar 
Subject: Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option

CAUTION: This email comes from a non Wind River email account!
Do not click links or open attachments unless you recognize the sender and know 
the content is safe.

On Fri, Aug 30, 2024 at 12:34 AM  wrote:
>
> From: Deepthi Hemraj 
>
> For excessively long environment variables, i.e. >128KB,
> store the arguments in a temporary file and collect them back together in
> collect2.
>
> This commit patches for COLLECT_GCC_OPTIONS issue:
> GCC should not limit the length of command line passed to collect2.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527
>
> The Linux kernel has the following limits on shell commands:
> I.  Total number of bytes used to specify arguments must be under 128KB.
> II. Each environment variable passed to an executable must be under 128 KiB
>
> In order to circumvent these limitations, many build tools support
> response-files, i.e. files that contain the arguments for the executed
> command. These are typically passed using @ syntax.
>
> Gcc uses the COLLECT_GCC_OPTIONS environment variable to transfer the
> expanded command line to collect2. With many options, this exceeds the limit 
> II.
>
> GCC : Added Testcase for PR111527
>
> TC1 : If the command line arguments are less than 128KB, gcc should use
>   COLLECT_GCC_OPTIONS to communicate and compile fine.
> TC2 : If the command line arguments are in the range of 128KB to 2MB,
>   gcc should copy the arguments into a file and use FILE_GCC_OPTIONS
>   to communicate and compile fine.
> TC3 : If the command line arguments are greater than 2MB, gcc should
>   fail the compile and report an error. (Expected FAIL)
>
> Signed-off-by: sunil dora 
> Signed-off-by: Topi Kuutela 
> Signed-off-by: Deepthi Hemraj 
> ---
>  gcc/collect2.cc   | 39 ++--
>  gcc/gcc.cc| 37 +--
>  gcc/testsuite/gcc.dg/longcmd/longcmd.exp  | 16 +
>  gcc/testsuite/gcc.dg/longcmd/pr111527-1.c | 44 +++
>  gcc/testsuite/gcc.dg/longcmd/pr111527-2.c |  9 +
>  gcc/testsuite/gcc.dg/longcmd/pr111527-3.c | 10 ++
>  gcc/testsuite/gcc.dg/longcmd/pr111527-4.c | 10 ++
>  7 files changed, 159 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/longcmd.exp
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c
>
> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> index 902014a9cc1..1f56963b1ce 100644
> --- a/gcc/collect2.cc
> +++ b/gcc/collect2.cc
> @@ -376,6 +376,39 @@ typedef int scanfilter;
>
>  static void scan_prog_file (const char *, scanpass, scanfilter);
>
> +char* getenv_extended (const char* var_name)
> +{
> +  int file_size;
> +  char* buf = NULL;
> +  const char* prefix = "/tmp";
> +
> +  char* string = getenv (var_name);
> +  if (strncmp (var_name, prefix, strlen(prefix)) == 0)

This is not what was meant by saying using the same env and supporting
response files.
Instead what Richard meant was use `@file` as the option that gets
passed via COLLECT_GCC_OPTIONS and then if you see `@` expand the
options like what is done for the normal command line.
Hard coding "/tmp" here is wrong because TMPDIR might not be set to
"/tmp" and even more with -save-temps, the response file should stay
around afterwards and be in the working directory rather than TMPDIR.

Thanks,
Andrew Pinski

> +{
> +  FILE *fptr;
> +  fptr = fopen (string, "r");
> +  if (fptr == NULL)
> +   return (0);
> +  /* Copy contents from temporary file to buffer */
> +  if (fseek (fptr, 0, SEEK_END) == -1)
> +   return (0);
> +  file_size = ftell (fptr);
> +  rewind (fptr);
> +  buf = (char *) xmalloc (file_size + 1);
> +

[PATCH] RISC-V: Fix ICE for rvv in lto

2024-09-06 Thread Jin Ma
When we use flto, the function list of rvv will be generated twice,
once in the cc1 phase and once in the lto phase. However, due to
the different generation methods, the two lists are different.

For example, when there is no zvfh or zvfhmin in arch, it is
generated by calling the function "riscv_pragma_intrinsic".  Since the
TARGET_VECTOR_ELEN_FP_16 is enabled before rvv function generation,
a list of rvv functions related to float16 will be generated. In
the lto phase, the rvv function list is generated only by calling
the function "riscv_init_builtins", but the TARGET_VECTOR_ELEN_FP_16
is disabled, so that the float16-related rvv function list cannot
be generated as it is in cc1.  This causes confusion, resulting in
matching to the wrong function due to an inconsistent fcode in the lto
phase, eventually leading to an ICE.
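A minimal model of the mismatch (the list contents and names below are invented for illustration; the real lists are built by riscv-vector-builtins.cc) is:

```c
#include <string.h>

/* During cc1 the pragma handler forces TARGET_VECTOR_ELEN_FP_16 on, so
   FP16-related builtins are registered and shift the fcodes of every
   later entry.  */
static const char *cc1_list[] = { "vle32", "vle16_f16", "vse32" };

/* During lto the flag stays off, so the rebuilt list lacks the FP16
   entry; the same fcode now indexes a different (or missing) function.  */
static const char *lto_list[] = { "vle32", "vse32" };
```

With these lists, fcode 2 names "vse32" in cc1 but is out of range for the list rebuilt during lto — the kind of inconsistency that ends in an ICE.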

So I think we should be consistent with their generated lists, which
is exactly what this patch does.

But there is still a problem here.  If we use "-fchecking", we still
get an ICE.  This is because in the lto phase, after the rvv function
list is generated and before expand_builtin, ggc_grow will
be called to clean up memory, resulting in
"(* registered_functions)[code]->decl" being garbage-collected,
and finally an ICE.

I think this is wrong and needs to be fixed; maybe we shouldn't
use "ggc_alloc ()", or is there a better
way to implement it?

I'm trying to fix it here.  Any comments?

gcc/ChangeLog:

* config/riscv/riscv-c.cc (struct pragma_intrinsic_flags): Move
to riscv-protos.h.
(riscv_pragma_intrinsic_flags_pollute): Move to riscv-vector-builtins.cc.
(riscv_pragma_intrinsic_flags_restore): Likewise.
(riscv_pragma_intrinsic): Likewise.
* config/riscv/riscv-protos.h (struct pragma_intrinsic_flags):
New.
(riscv_pragma_intrinsic_flags_restore): New.
(riscv_pragma_intrinsic_flags_pollute): New.
* config/riscv/riscv-vector-builtins.cc 
(riscv_pragma_intrinsic_flags_pollute): New.
(riscv_pragma_intrinsic_flags_restore): New.
(handle_pragma_vector_for_lto): New.
(init_builtins): Correct the processing logic for lto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-10.c: New test.
---
 gcc/config/riscv/riscv-c.cc   | 70 +---
 gcc/config/riscv/riscv-protos.h   | 13 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 83 ++-
 .../gcc.target/riscv/rvv/base/bug-10.c| 18 
 4 files changed, 114 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 71112d9c66d7..7037ecc1268a 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -34,72 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define builtin_define(TXT) cpp_define (pfile, TXT)
 
-struct pragma_intrinsic_flags
-{
-  int intrinsic_target_flags;
-
-  int intrinsic_riscv_vector_elen_flags;
-  int intrinsic_riscv_zvl_flags;
-  int intrinsic_riscv_zvb_subext;
-  int intrinsic_riscv_zvk_subext;
-};
-
-static void
-riscv_pragma_intrinsic_flags_pollute (struct pragma_intrinsic_flags *flags)
-{
-  flags->intrinsic_target_flags = target_flags;
-  flags->intrinsic_riscv_vector_elen_flags = riscv_vector_elen_flags;
-  flags->intrinsic_riscv_zvl_flags = riscv_zvl_flags;
-  flags->intrinsic_riscv_zvb_subext = riscv_zvb_subext;
-  flags->intrinsic_riscv_zvk_subext = riscv_zvk_subext;
-
-  target_flags = target_flags
-| MASK_VECTOR;
-
-  riscv_zvl_flags = riscv_zvl_flags
-| MASK_ZVL32B
-| MASK_ZVL64B
-| MASK_ZVL128B;
-
-  riscv_vector_elen_flags = riscv_vector_elen_flags
-| MASK_VECTOR_ELEN_32
-| MASK_VECTOR_ELEN_64
-| MASK_VECTOR_ELEN_FP_16
-| MASK_VECTOR_ELEN_FP_32
-| MASK_VECTOR_ELEN_FP_64;
-
-  riscv_zvb_subext = riscv_zvb_subext
-| MASK_ZVBB
-| MASK_ZVBC
-| MASK_ZVKB;
-
-  riscv_zvk_subext = riscv_zvk_subext
-| MASK_ZVKG
-| MASK_ZVKNED
-| MASK_ZVKNHA
-| MASK_ZVKNHB
-| MASK_ZVKSED
-| MASK_ZVKSH
-| MASK_ZVKN
-| MASK_ZVKNC
-| MASK_ZVKNG
-| MASK_ZVKS
-| MASK_ZVKSC
-| MASK_ZVKSG
-| MASK_ZVKT;
-}
-
-static void
-riscv_pragma_intrinsic_flags_restore (struct pragma_intrinsic_flags *flags)
-{
-  target_flags = flags->intrinsic_target_flags;
-
-  riscv_vector_elen_flags = flags->intrinsic_riscv_vector_elen_flags;
-  riscv_zvl_flags = flags->intrinsic_riscv_zvl_flags;
-  riscv_zvb_subext = flags->intrinsic_riscv_zvb_subext;
-  riscv_zvk_subext = flags->intrinsic_riscv_zvk_subext;
-}
-
 static int
 riscv_ext_version_value (unsigned major, unsigned minor)
 {
@@ -269,14 +203,14 @@ riscv_pragma_intrinsic (cpp_reader *)
 {
   struct pragma_intrinsic_flags backup_flags;
 
-  riscv_pragma_intrinsic_flags_pollute (&backup_flags);
+  riscv_vector::riscv_pragma_intrinsic_flags_pollute (&backup_flags);
 
  

[PATCH v4] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for XTheadVector

2024-09-06 Thread Jin Ma
Since the THeadVector vsetvli does not support vl as an immediate, we
need to convert 0 to zero when outputting asm.
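The effect on the emitted assembly is the following (illustrative; on RISC-V the %z operand modifier prints an integer-constant 0 as the zero register):

```asm
# before: vl printed as an immediate, rejected by the assembler
th.vsetvli  zero,0,e32,m8
# after: %z prints the constant 0 as the zero register
th.vsetvli  zero,zero,e32,m8
```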

PR target/116592

gcc/ChangeLog:

* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to
"zero"

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.

Reported-by: nihui 
---
 gcc/config/riscv/thead.cc |  4 +-
 .../riscv/rvv/xtheadvector/pr116592.c | 38 +++
 2 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2f1d83fbbc7f..707d91076eb5 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -960,11 +960,11 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
  if (strstr (p, "zero,zero"))
return "th.vsetvli\tzero,zero,e%0,%m1";
  else
-   return "th.vsetvli\tzero,%0,e%1,%m2";
+   return "th.vsetvli\tzero,%z0,e%1,%m2";
}
  else
{
- return "th.vsetvli\t%0,%1,e%2,%m3";
+ return "th.vsetvli\t%z0,%z1,e%2,%m3";
}
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
new file mode 100644
index ..a7cd8c5bdb72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
@@ -0,0 +1,38 @@
+/* { dg-do assemble } */
+/* { dg-options "-march=rv32gc_zfh_xtheadvector -mabi=ilp32d -O2 -save-temps" 
{ target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zfh_xtheadvector -mabi=lp64d -O2 -save-temps" { 
target { rv64 } } } */
+
+#include <riscv_vector.h>
+#include <math.h>
+
+static vfloat32m8_t atan2_ps(vfloat32m8_t a, vfloat32m8_t b, size_t vl)
+{
+  float tmpx[vl];
+  float tmpy[vl];
+  __riscv_vse32_v_f32m8(tmpx, a, vl);
+  __riscv_vse32_v_f32m8(tmpy, b, vl);
+  for (size_t i = 0; i < vl; i++)
+  {
+tmpx[i] = atan2(tmpx[i], tmpy[i]);
+  }
+  return __riscv_vle32_v_f32m8(tmpx, vl);
+}
+
+void my_atan2(const float *x, const float *y, float *out, int size)
+{
+  int n = size;
+  while (n > 0)
+  {
+size_t vl = __riscv_vsetvl_e32m8(n);
+vfloat32m8_t _x = __riscv_vle32_v_f32m8(x, vl);
+vfloat32m8_t _y = __riscv_vle32_v_f32m8(y, vl);
+vfloat32m8_t _out = atan2_ps(_x, _y, vl);
+__riscv_vse32_v_f32m8(out, _out, vl);
+n -= vl;
+x += vl;
+y += vl;
+out += vl;
+  }
+}
+
+/* { dg-final { scan-assembler-not {th\.vsetvli\s+zero,0} } } */
-- 
2.17.1



Re: [PATCH v3] RISC-V: Fix illegal operands "th.vsetvli zero,0,e32,m8" for XTheadVector

2024-09-06 Thread Jin Ma

> Sorry, I still don't see assembly check.

I am very sorry, I uploaded the wrong patch and I tried to correct it :)

By the way, there seems to be no exp here to really test the XTheadVector test 
cases,
which may have been missed when the XTheadVector extension was first 
implemented.

Maybe I should write one for that.

BR
Jin

Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option

2024-09-06 Thread Andrew Pinski
On Fri, Sep 6, 2024, 9:38 AM Dora, Sunil Kumar <
sunilkumar.d...@windriver.com> wrote:

> Hi Andrew,
>
> Thank you for your feedback. Initially, we attempted to address the issue
> by utilizing GCC’s response files. However, we discovered that the
> COLLECT_GCC_OPTIONS variable already contains the expanded contents of
> the response files.
>
> As a result, using response files only mitigates the multiplication factor
> but does not bypass the 128KB limit.
>

I think you misunderstood me. What I was saying is: instead of
creating a string inside set_collect_gcc_options, create the response file
and pass that via COLLECT_GCC_OPTIONS in the @file format. Then,
inside collect2.cc, when using COLLECT_GCC_OPTIONS/extract_string,
read the options back in from the response file if there was an @file,
instead of those 2 loops. This requires more than what you did, but it
should be less memory hungry and maybe slightly faster.

Thanks,
Andrew



I have included the response file usage logs and the complete history in
> the Bugzilla report for your reference: Bugzilla Link
> .
> Following your suggestion, I have updated the logic to avoid hardcoding
> /tmp.
> Please find the revised version of patch at the following link:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662519.html
>
> Thanks,
> Sunil Dora
> --
> *From:* Andrew Pinski 
> *Sent:* Friday, August 30, 2024 8:05 PM
> *To:* Hemraj, Deepthi 
> *Cc:* gcc-patches@gcc.gnu.org ; rguent...@suse.de
> ; jeffreya...@gmail.com ;
> josmy...@redhat.com ; MacLeod, Randy <
> randy.macl...@windriver.com>; Gowda, Naveen ;
> Dora, Sunil Kumar 
> *Subject:* Re: [PATCH v2] GCC Driver : Enable very long gcc command-line
> option
>
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> On Fri, Aug 30, 2024 at 12:34 AM  wrote:
> >
> > From: Deepthi Hemraj 
> >
> > For excessively long environment variables i.e >128KB
> > Store the arguments in a temporary file and collect them back together
> in collect2.
> >
> > This commit patches for COLLECT_GCC_OPTIONS issue:
> > GCC should not limit the length of command line passed to collect2.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527
> >
> > The Linux kernel has the following limits on shell commands:
> > I.  Total number of bytes used to specify arguments must be under 128KB.
> > II. Each environment variable passed to an executable must be under 128
> KiB
> >
> > In order to circumvent these limitations, many build tools support
> > response-files, i.e. files that contain the arguments for the executed
> > command. These are typically passed using @ syntax.
> >
> > Gcc uses the COLLECT_GCC_OPTIONS environment variable to transfer the
> > expanded command line to collect2. With many options, this exceeds the
> limit II.
> >
> > GCC : Added Testcase for PR111527
> >
> > TC1 : If the command line arguments are less than 128KB, gcc should use
> >   COLLECT_GCC_OPTIONS to communicate and compile fine.
> > TC2 : If the command line arguments are in the range of 128KB to 2MB,
> >   gcc should copy the arguments into a file and use FILE_GCC_OPTIONS
> >   to communicate and compile fine.
> > TC3 : If the command line arguments are greater than 2MB, gcc should
> >   fail the compile and report an error. (Expected FAIL)
> >
> > Signed-off-by: sunil dora 
> > Signed-off-by: Topi Kuutela 
> > Signed-off-by: Deepthi Hemraj 
> > ---
> >  gcc/collect2.cc   | 39 ++--
> >  gcc/gcc.cc| 37 +--
> >  gcc/testsuite/gcc.dg/longcmd/longcmd.exp  | 16 +
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-1.c | 44 +++
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-2.c |  9 +
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-3.c | 10 ++
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-4.c | 10 ++
> >  7 files changed, 159 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/longcmd.exp
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c
> >
> > diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> > index 902014a9cc1..1f56963b1ce 100644
> > --- a/gcc/collect2.cc
> > +++ b/gcc/collect2.cc
> > @@ -376,6 +376,39 @@ typedef int scanfilter;
> >
> >  static void scan_prog_file (const char *, scanpass, scanfilter);
> >
> > +char* getenv_extended (const char* var_name)
> > +{
> > +  int file_size;
> > +  char* buf = NULL;
> > +  const char* prefix = "/tmp";
> > +
> > +  char* string = getenv (var_name);
> > +  if (strncmp (var_name, prefix, strlen(prefix)) == 0)
>
> This is not what was meant by saying using the same e

Re: New version of unsiged patch

2024-09-06 Thread Steve Kargl
On Thu, Sep 05, 2024 at 09:07:20AM +0200, Thomas Koenig wrote:
> Ping (a little bit)?
> 
> With another weekend coming up, I would have some time to
> work on incorporating any feedback, or on putting in
> more intrinsics.
> 

In the documentation, you have

+Generally, unsigned integers are only permitted as data in intrinsics.

Does the word 'intrinsics' apply to 'intrinsic operators'
or 'intrinsic subprograms' or both?  This might benefit from
a bit of wordiness.

   Generally, unsigned integers are only permitted as 
   operands in intrinsic operations or as actual arguments
   to subprograms with unsigned integer dummy arguments.

-- 
Steve


[PATCH] c++: Fix up pedantic handling of alignas [PR110345]

2024-09-06 Thread Jakub Jelinek
Hi!

The following patch on top of the PR110345 P2552R3 series:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661904.html   

  
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661905.html   

  
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661906.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662330.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662331.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662333.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662334.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662336.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662379.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662380.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662381.html
emits pedantic pedwarns for alignas appertaining to incorrect entities.
As the middle-end and attribute exclusions look for "aligned" attribute,
the patch transforms alignas into "internal "::aligned attribute (didn't
use [[aligned (x)]] so that people can't type it that way).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-06  Jakub Jelinek  

PR c++/110345
gcc/c-family/
* c-common.h (attr_aligned_exclusions): Declare.
(handle_aligned_attribute): Likewise.
* c-attribs.cc (handle_aligned_attribute): No longer
static.
(attr_aligned_exclusions): Use extern instead of static.
gcc/cp/
* cp-tree.h (enum cp_tree_index): Add CPTI_INTERNAL_IDENTIFIER.
(internal_identifier): Define.
(internal_attribute_table): Declare.
* parser.cc (cp_parser_exception_declaration): Error on alignas
on exception declaration.
(cp_parser_std_attribute_spec): Turn alignas into internal
ns aligned attribute rather than gnu.
* decl.cc (initialize_predefined_identifiers): Initialize
internal_identifier.
* tree.cc (handle_alignas_attribute): New function.
(internal_attributes): New variable.
(internal_attribute_table): Likewise.
* cp-objcp-common.h (cp_objcp_attribute_table): Add
internal_attribute_table entry.
gcc/testsuite/
* g++.dg/cpp0x/alignas1.C: Add dg-options "".
* g++.dg/cpp0x/alignas2.C: Likewise.
* g++.dg/cpp0x/alignas7.C: Likewise.
* g++.dg/cpp0x/alignas21.C: New test.
* g++.dg/ext/bitfield9.C: Expect a warning.
* g++.dg/cpp2a/is-layout-compatible3.C: Add dg-options -pedantic.
Expect a warning.

--- gcc/c-family/c-common.h.jj  2024-09-06 13:43:37.311307920 +0200
+++ gcc/c-family/c-common.h 2024-09-06 15:33:47.497616169 +0200
@@ -1645,8 +1645,10 @@ extern int parse_tm_stmt_attr (tree, int
 extern int tm_attr_to_mask (tree);
 extern tree tm_mask_to_attr (int);
 extern tree find_tm_attribute (tree);
+extern const struct attribute_spec::exclusions attr_aligned_exclusions[];
 extern const struct attribute_spec::exclusions attr_cold_hot_exclusions[];
 extern const struct attribute_spec::exclusions attr_noreturn_exclusions[];
+extern tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 extern tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
 extern tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
 extern bool has_attribute (location_t, tree, tree, tree (*)(tree));
--- gcc/c-family/c-attribs.cc.jj2024-09-06 13:43:37.300308064 +0200
+++ gcc/c-family/c-attribs.cc   2024-09-06 16:00:55.864465359 +0200
@@ -100,7 +100,6 @@ static tree handle_destructor_attribute
 static tree handle_mode_attribute (tree *, tree, tree, int, bool *);
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_special_var_sec_attribute (tree *, tree, tree, int, bool *);
-static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_warn_if_not_aligned_attribute (tree *, tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree *, tree, tree,
@@ -192,7 +191,7 @@ static tree handle_null_terminated_strin
   { name, function, type, variable }
 
 /* Define attributes that are mutually exclusive with one another.  */
-static const struct attribute_spec::exclusions attr_aligned_exclusions[] =
+extern const struct attribute_spec::exclusions attr_aligned_exclusions[] =
 {
   /* Attribute name exclusion applies to:
function, type, variable */
@@ -2806,7 +2805,7 @@ common_handle_aligned_attribute (tree *n
 /* Handle a "aligned" attribute; arguments as in
struct attribute_spec.handler.  */
 
-static tree
+tree
 handle_aligned_attribute (tree *node, tree name, tree args,
   

[PATCH] c++, v3: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]

2024-09-06 Thread Jakub Jelinek
On Wed, Sep 04, 2024 at 10:31:48PM +0200, Franz Sirl wrote:
> Hmm, it just occured to me, how about adding !NONVIRTUAL here? When
> NONVIRTUAL is true, there is no conditional stmt at all, or?

Yeah, that makes sense, the problem doesn't happen in that case.

Here is an adjusted patch, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?

2024-09-06  Jakub Jelinek  

PR c++/116449
* typeck.cc (get_member_function_from_ptrfunc): Use save_expr
on instance_ptr and function even if it doesn't have side-effects,
as long as it isn't a decl.

* g++.dg/ubsan/pr116449.C: New test.

--- gcc/cp/typeck.cc.jj 2024-09-02 17:07:30.115098114 +0200
+++ gcc/cp/typeck.cc2024-09-04 19:08:24.127490242 +0200
@@ -4188,10 +4188,23 @@ get_member_function_from_ptrfunc (tree *
   if (!nonvirtual && is_dummy_object (instance_ptr))
nonvirtual = true;
 
-  if (TREE_SIDE_EFFECTS (instance_ptr))
-   instance_ptr = instance_save_expr = save_expr (instance_ptr);
+  /* Use save_expr even when instance_ptr doesn't have side-effects,
+unless it is a simple decl (save_expr won't do anything on
+constants), so that we don't ubsan instrument the expression
+multiple times.  See PR116449.  */
+  if (TREE_SIDE_EFFECTS (instance_ptr)
+ || (!nonvirtual && !DECL_P (instance_ptr)))
+   {
+ instance_save_expr = save_expr (instance_ptr);
+ if (instance_save_expr == instance_ptr)
+   instance_save_expr = NULL_TREE;
+ else
+   instance_ptr = instance_save_expr;
+   }
 
-  if (TREE_SIDE_EFFECTS (function))
+  /* See above comment.  */
+  if (TREE_SIDE_EFFECTS (function)
+ || (!nonvirtual && !DECL_P (function)))
function = save_expr (function);
 
   /* Start by extracting all the information from the PMF itself.  */
--- gcc/testsuite/g++.dg/ubsan/pr116449.C.jj2024-09-04 18:58:46.106764285 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr116449.C   2024-09-04 18:58:46.106764285 
+0200
@@ -0,0 +1,14 @@
+// PR c++/116449
+// { dg-do compile }
+// { dg-options "-O2 -Wall -fsanitize=undefined" }
+
+struct C { void foo (int); void bar (); int c[16]; };
+typedef void (C::*P) ();
+struct D { P d; };
+static D e[1] = { { &C::bar } };
+
+void
+C::foo (int x)
+{
+  (this->*e[c[x]].d) ();
+}


Jakub



[PATCH] libiberty: Fix up > 64K section handling in simple_object_elf_copy_lto_debug_section [PR116614]

2024-09-06 Thread Jakub Jelinek
Hi!

cat abc.C
#define A(n) struct T##n {} t##n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) 
A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) 
B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) 
C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) 
D(n##8) D(n##9)
E(1) E(2) E(3)
int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g 
-fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially the reading
side is used quite often already), and a further bug in it was fixed by the
PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
{
  *err = ENOTSUP;
  return "Too many copied sections";
}
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be overwise stored but couldn't fix.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-06  Jakub Jelinek  

PR lto/116614
* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
comments.
(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
consistency.
(simple_object_elf_find_sections): Formatting fixes.
(simple_object_elf_fetch_attributes): Likewise.
(simple_object_elf_attributes_merge): Likewise.
(simple_object_elf_start_write): Likewise.
(simple_object_elf_write_ehdr): Likewise.
(simple_object_elf_write_shdr): Likewise.
(simple_object_elf_write_to_file): Likewise.
(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
over .symtab_shndx sections, though emit those last and compute their
section content when processing associated .symtab sections.  Handle
simple_object_internal_read failure even in the .symtab_shndx reading
case.

--- libiberty/simple-object-elf.c.jj2024-01-03 12:07:48.461085637 +0100
+++ libiberty/simple-object-elf.c   2024-09-06 13:34:12.796669098 +0200
@@ -128,9 +128,9 @@ typedef struct {
 
 #define SHN_UNDEF  0   /* Undefined section */
 #define SHN_LORESERVE  0xFF00  /* Begin range of reserved indices */
-#define SHN_COMMON 0xFFF2  /* Associated symbol is in common */
+#define SHN_COMMON 0xFFF2  /* Associated symbol is in common */
 #define SHN_XINDEX 0x  /* Section index is held elsewhere */
-#define SHN_HIRESERVE  0x  /* End of reserved indices */
+#define SHN_HIRESERVE  0x  /* End of reserved indices */
 
 
 /* 32-bit ELF program header.  */
@@ -569,8 +569,8 @@ simple_object_elf_find_sections (simple_
 void *data,
 int *err)
 {
-  struct simple_object_elf_read *eor =
-(struct simple_object_elf_read *) sobj->data;
+  struct simple_object_elf_read *eor
+= (struct simple_object_elf_read *) sobj->data;
   const struct elf_type_functions *type_functions = eor->type_functions;
   unsigned char ei_class = eor->ei_class;
   size_t shdr_size;
@@ -662,8 +662,8 @@ simple_object_elf_fetch_attributes (simp
const char **errmsg ATTRIBUTE_UNUSED,
int *err ATTRIBUTE_UNUSED)
 {
-  struct simple_object_elf_read *eor =
-(struct simple_object_elf_read *) sobj->data;
+  struct simple_object_elf_read *eor
+= (struct simple_object_elf_read *) sobj->data;
   struct simple_object_elf_attributes *ret;
 
   ret = XNEW (struct simple_object_elf_attribut

Re: New version of unsiged patch

2024-09-06 Thread Steve Kargl
On Thu, Sep 05, 2024 at 09:07:20AM +0200, Thomas Koenig wrote:
> Ping (a little bit)?
> 
> With another weekend coming up, I would have some time to
> work on incorporating any feedback, or on putting in
> more intrinsics.
> 

Last comment as I've made it to the end of the patch.

Your testcases are all free source form.  In fixed 
form, gfortran would need to deal with 'u = 1 2 3 4 u _8'.
I don't have the patch in my tree at the moment, so
it isn't clear to me if the matcher for an unsigned
constant is prepared to deal with the space between 4 and u
in the above.  The match_digits likely leaves the 
current locus at u, but I'll need to go check that.

-- 
Steve


Re: New version of unsiged patch

2024-09-06 Thread Steve Kargl
On Sun, Aug 18, 2024 at 12:10:18PM +0200, Thomas Koenig wrote:
> Hello world,
> 
> this version of the patch includes DOT_PRODUCT, MATMUL and quite
> a few improvements for simplification.
> 

All,

I have gone through Thomas's current patch and sent a
few emails with comments to him.  To keep things manageable
for both Thomas and future reviewers, I'm inclined to
have the patch committed unless there is some objection
from others.

Thanks Thomas for leading the charge.

-- 
steve


Re: [PATCH RFC] c-family: add attribute flag_enum [PR46457]

2024-09-06 Thread Jason Merrill

On 9/6/24 8:56 AM, Jonathan Wakely wrote:

On 05/09/24 21:44 -0400, Jason Merrill wrote:

On 9/4/24 11:02 AM, Marek Polacek wrote:
+handle_flag_enum_attribute (tree *node, tree ARG_UNUSED(name), tree 
args,

+    int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (args)
+    warning (OPT_Wattributes, "%qE attribute arguments ignored", 
name);

You don't need this check I think; if the # of args isn't correct, we
should not get here.  Then the goto can...go too.


Dropped.

On 9/4/24 11:28 AM, Eric Gallager wrote:


Question about PR tagging: should PR c++/81665 be tagged here, too?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81665


Added.

Here's what I'm pushing:



diff --git a/libstdc++-v3/include/bits/regex_constants.h 
b/libstdc++-v3/include/bits/regex_constants.h

index 437895f1dc3..4148093bc4e 100644
--- a/libstdc++-v3/include/bits/regex_constants.h
+++ b/libstdc++-v3/include/bits/regex_constants.h
@@ -66,7 +66,7 @@ namespace regex_constants
   * elements @c ECMAScript, @c basic, @c extended, @c awk, @c grep, 
@c egrep

   * %set.
   */
-  enum syntax_option_type : unsigned int
+  enum [[gnu::flag_enum]] syntax_option_type : unsigned int


This needs to be [[__gnu__::__flag_enum__]] because valid programs can
#define gnu 1
#define flag_enum 1


Oops, fixed:


From e4b64bea337d9ac936c555154f9d60c4876b65d3 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Fri, 6 Sep 2024 12:12:24 -0400
Subject: [PATCH] libstdc++: add missing __
To: gcc-patches@gcc.gnu.org

I forgot the __ in my recent r15-3500-g1914ca8791ce4e.

libstdc++-v3/ChangeLog:

	* include/bits/regex_constants.h: Add __ to attribute.
---
 libstdc++-v3/include/bits/regex_constants.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/regex_constants.h b/libstdc++-v3/include/bits/regex_constants.h
index 4148093bc4e..cb70a8647d9 100644
--- a/libstdc++-v3/include/bits/regex_constants.h
+++ b/libstdc++-v3/include/bits/regex_constants.h
@@ -66,7 +66,7 @@ namespace regex_constants
* elements @c ECMAScript, @c basic, @c extended, @c awk, @c grep, @c egrep
* %set.
*/
-  enum [[gnu::flag_enum]] syntax_option_type : unsigned int
+  enum [[__gnu__::__flag_enum__]] syntax_option_type : unsigned int
   {
 _S_icase		= 1 << 0,
 _S_nosubs		= 1 << 1,
-- 
2.46.0



Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-06 Thread Martin Uecker
Am Freitag, dem 06.09.2024 um 13:59 + schrieb Qing Zhao:
> 
> > On Sep 5, 2024, at 18:22, Bill Wendling  wrote:
> > 
> > Hi Qing,
> > 
> > Sorry for my late reply.
> > 
> > On Thu, Aug 29, 2024 at 7:22 AM Qing Zhao  wrote:
> > > 
> > > Hi,
> > > 
> > > Thanks for the information.
> > > 
> > > Yes, providing a unary operator similar as __counted_by(PTR) as suggested 
> > > by multiple people previously is a cleaner approach.
> > > 
> > > Then the programmer will use the following:
> > > 
> > > __builtin_choose_expr(
> > > __builtin_has_attribute (__p->FAM, "counted_by”)
> > > __builtin_get_counted_by(__p->FAM) = COUNT, 0);
> > > 
> > > From the programmer’s point of view, it’s cleaner too.
> > > 
> > > However, there is one issue with “__builtin_choose_expr” currently in 
> > > GCC, its documentation explicitly mentions this limitation:  
> > > (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fchoose_005fexpr)
> > > 
> > > "Note: This construct is only available for C. Furthermore, the unused 
> > > expression (exp1 or exp2 depending on the value of const_exp) may still 
> > > generate syntax errors. This may change in future revisions.”
> > > 
> > > So, due to this limitation, when there is no counted_by attribute, the 
> > > __builtin_get_counted_by() still is evaluated by the compiler and errors 
> > > is issued and the compilation stops, this can be show from the small 
> > > testing case:
> > > 
> > > [opc@qinzhao-ol8u3-x86 gcc]$ cat ttt.c
> > > 
> > > struct flex {
> > >  unsigned int b;
> > >  int c[];
> > > } *array_flex;
> > > 
> > > #define MY_ALLOC(P, FAM, COUNT) ({ \
> > >  typeof(P) __p; \
> > >  unsigned int __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
> > >  __p = (typeof(P)) __builtin_malloc(__size); \
> > >  __builtin_choose_expr( \
> > >__builtin_has_attribute (__p->FAM, counted_by), \
> > >__builtin_counted_by_ref(__p->FAM) = COUNT, 0); \
> > >  P = __p; \
> > > })
> > > 
> > > int main(int argc, char *argv[])
> > > {
> > >  MY_ALLOC(array_flex, c, 20);
> > >  return 0;
> > > }
> > > [opc@qinzhao-ol8u3-x86 gcc]$ sh t
> > > ttt.c: In function ‘main’:
> > > ttt.c:13:5: error: the argument must have ‘counted_by’ attribute 
> > > ‘__builtin_counted_by_ref’
> > > ttt.c:19:3: note: in expansion of macro ‘MY_ALLOC’
> > > 
> > > I checked the FE code on handling “__buiiltin_choose_expr”, Yes, it does 
> > > parse the __builtin_counted_by_ref(__p->FAM) even when 
> > > __builtin_has_attribute(__p->FAM, counted_by) is FALSE, and issued the 
> > > error when parsing __builtin_counted_by_ref and stopped the compilation.
> > > 
> > > So, in order to support this approach, we first must fix the issue in the 
> > > current __builtin_choose_expr in GCC. Otherwise, it’s impossible for the 
> > > user to use this new builtin.
> > > 
> > > Let me know your comments and suggestions.
> > > 
> > Do you need to emit a diagnostic if the FAM doesn't have the
> > counted_by attribute? It was originally supposed to "silently fail" if
> > it didn't. We may need to do the same for Clang if so.
> 
> Yes, “silently fail” should work around this problem if fixing the issue
> in the current __builtin_choose_expr is too complicated.
> 
> I will study a little bit on how to fix the issue in __builtin_choose_expr 
> first.
> 
> Martin and Joseph, any comment or suggestion from you?

My recommendation would be not to change __builtin_choose_expr.

The design where __builtin_get_counted_by  returns a null
pointer constant (void*)0 seems good.  Most users will
get an error which I think is what we want and for those
that want it to work even if the attribute is not there, the
following code seems perfectly acceptable to me:

auto p = __builtin_get_counted_by(__p->FAM)
*_Generic(p, void*: &(int){}, default: p) = 1;


Kees also seemed happy with it. And if I understood it correctly,
also Clang's bounds checking people can work with this.

Martin







[pushed] c++: adjust testcase to reveal failure [PR107919]

2024-09-06 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

This test appeared to be passing, but only because the warning was
suppressed by #pragma system_header.

PR tree-optimization/107919

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-pr107919-1.C: Add -Wsystem-headers and
xfail.
---
 gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C 
b/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
index 067a44a462e..049fa4d307a 100644
--- a/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
@@ -1,6 +1,6 @@
 // { dg-do compile }
 // { dg-require-effective-target c++17 }
-// { dg-options "-O2 -Wuninitialized" }
+// { dg-options "-O2 -Wuninitialized -Wsystem-headers" }
 
 #include 
 #include 
@@ -13,3 +13,5 @@ void do_something(void* storage)
   auto& swappedValue = *reinterpret_cast(storage);
   std::swap(event, swappedValue);
 }
+
+// { dg-bogus "may be used uninitialized" "" { xfail *-*-* } 0 }

base-commit: e4b64bea337d9ac936c555154f9d60c4876b65d3
-- 
2.46.0



[committed] libstdc++: Fix std::chrono::parse for TAI and GPS clocks

2024-09-06 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. This should be backported too.

I noticed while testing this that all the from_stream overloads for
time_point specializations use time_point_cast to convert to the correct
result type. That's wrong, but I'll fix that separately, as it affects
more than just GPS and TAI clocks.

-- >8 --

Howard Hinnant brought to my attention that chrono::parse was giving
incorrect values for chrono::gps_clock, because it was applying the
offset between the GPS clock and UTC. That's incorrect, because when we
parse HH:MM:SS as a GPS time, the result should be that time, not
HH:MM:SS+offset.

The problem was that I was using clock_cast to convert from sys_time to
utc_time and then using clock_cast again to convert to gps_time. The
solution is to convert the parsed time into a duration representing the
time since the GPS clock's epoch, then construct a gps_time directly
from that duration.

As well as adding tests for correct round tripping of times for all
clocks, this also adds some more tests for correct results with
std::format.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (from_stream): Fix conversions in
overloads for gps_time and tai_time.
* testsuite/std/time/clock/file/io.cc: Test round tripping using
chrono::parse. Add additional std::format tests.
* testsuite/std/time/clock/gps/io.cc: Likewise.
* testsuite/std/time/clock/local/io.cc: Likewise.
* testsuite/std/time/clock/tai/io.cc: Likewise.
* testsuite/std/time/clock/utc/io.cc: Likewise.
---
 libstdc++-v3/include/bits/chrono_io.h | 12 +++---
 .../testsuite/std/time/clock/file/io.cc   | 23 +++
 .../testsuite/std/time/clock/gps/io.cc| 22 ++-
 .../testsuite/std/time/clock/local/io.cc  | 17 
 .../testsuite/std/time/clock/tai/io.cc| 39 ++-
 .../testsuite/std/time/clock/utc/io.cc| 22 +++
 6 files changed, 128 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 38a0b002c81..0e4d23c9bb7 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2939,8 +2939,9 @@ namespace __detail
__is.setstate(ios_base::failbit);
  else
{
- auto __st = __p._M_sys_days + __p._M_time - *__offset;
- auto __tt = tai_clock::from_utc(utc_clock::from_sys(__st));
+ constexpr sys_days __epoch(-days(4383)); // 1958y/1/1
+ auto __d = __p._M_sys_days - __epoch + __p._M_time - *__offset;
+ tai_time> __tt(__d);
  __tp = chrono::time_point_cast<_Duration>(__tt);
}
}
@@ -2977,9 +2978,10 @@ namespace __detail
__is.setstate(ios_base::failbit);
  else
{
- auto __st = __p._M_sys_days + __p._M_time - *__offset;
- auto __tt = gps_clock::from_utc(utc_clock::from_sys(__st));
- __tp = chrono::time_point_cast<_Duration>(__tt);
+ constexpr sys_days __epoch(days(3657)); // 1980y/1/Sunday[1]
+ auto __d = __p._M_sys_days - __epoch + __p._M_time - *__offset;
+ gps_time> __gt(__d);
+ __tp = chrono::time_point_cast<_Duration>(__gt);
}
}
   return __is;
diff --git a/libstdc++-v3/testsuite/std/time/clock/file/io.cc 
b/libstdc++-v3/testsuite/std/time/clock/file/io.cc
index 9ab9f10ec77..9da5019ab78 100644
--- a/libstdc++-v3/testsuite/std/time/clock/file/io.cc
+++ b/libstdc++-v3/testsuite/std/time/clock/file/io.cc
@@ -32,6 +32,14 @@ test_format()
   auto ft = clock_cast(sys_days(2024y/January/21)) + 0ms + 2.5s;
   s = std::format("{}", ft);
   VERIFY( s == "2024-01-21 00:00:02.500");
+
+  const std::chrono::file_time t0{};
+  s = std::format("{:%Z %z %Ez %Oz}", t0);
+  VERIFY( s == "UTC + +00:00 +00:00" );
+
+  s = std::format("{}", t0);
+  // chrono::file_clock epoch is unspecified, so this is libstdc++-specific.
+  VERIFY( s == "2174-01-01 00:00:00" );
 }
 
 void
@@ -49,6 +57,21 @@ test_parse()
   VERIFY( tp == clock_cast(expected) );
   VERIFY( abbrev == "BST" );
   VERIFY( offset == 60min );
+
+  // Test round trip
+  std::stringstream ss;
+  ss << clock_cast(expected) << " 0123456";
+  VERIFY( ss >> parse("%F %T %z%Z", tp, abbrev, offset) );
+  VERIFY( ss.eof() );
+  VERIFY( (tp + offset) == clock_cast(expected) );
+  VERIFY( abbrev == "456" );
+  VERIFY( offset == (1h + 23min) );
+
+  ss.str("");
+  ss.clear();
+  ss << file_time{};
+  VERIFY( ss >> parse("%F %T", tp) );
+  VERIFY( tp.time_since_epoch() == 0s );
 }
 
 int main()
diff --git a/libstdc++-v3/testsuite/std/time/clock/gps/io.cc 
b/libstdc++-v3/testsuite/std/time/clock/gps/io.cc
index e995d9f3d78..c012520080a 100644
--- a/libstdc++-v3/testsuite/std/time/clock/gps/io.cc
+++ b/libstdc++-v3/testsuite/std/time/clock/gps/io.cc
@@ -42,13 +42,19 @@ test_format()
   // PR libstdc++/113500
  

[PATCH] libstdc++: Adjust std::span::iterator to be ADL-proof

2024-09-06 Thread Jonathan Wakely
This proposed patch means that span is not an associated type of
span::iterator, which means that we won't try to complete T when
doing ADL in the constraints for const_iterator. This makes it more
reliable to use std::span.

See https://github.com/llvm/llvm-project/issues/107215 for more info on
the problem and the constraint recursion that can happen.

Does this seem worthwhile?

It would be an ABI change to do something like this for other uses of
__normal_iterator, such as std::vector and std::string. But std::span is
a C++20 feature so still experimental. I think we should consider this
for other new uses of __normal_iterator too, e.g. in .

-- >8 --

libstdc++-v3/ChangeLog:

* include/std/span (span::__iter_tag): Declare nested type.
(span::iterator): Use __iter_tag as second template argument.
---
 libstdc++-v3/include/std/span | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index 00fc5279152..b7392a0500e 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -123,6 +123,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
using __is_compatible_ref
  = __is_array_convertible<_Type, remove_reference_t<_Ref>>;
 
+  // Nested type so that _Type is not an associated class of iterator.
+  struct __iter_tag;
+
 public:
   // member types
   using element_type   = _Type;
@@ -133,7 +136,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using const_pointer  = const _Type*;
   using reference  = element_type&;
   using const_reference= const element_type&;
-  using iterator = __gnu_cxx::__normal_iterator<pointer, span>;
+  using iterator = __gnu_cxx::__normal_iterator<pointer, __iter_tag>;
   using reverse_iterator   = std::reverse_iterator<iterator>;
 #if __cplusplus > 202002L
   using const_iterator = std::const_iterator<iterator>;
-- 
2.46.0



Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-06 Thread Bill Wendling
On Fri, Sep 6, 2024 at 12:32 PM Martin Uecker  wrote:
>
> Am Freitag, dem 06.09.2024 um 13:59 + schrieb Qing Zhao:
> >
> > > On Sep 5, 2024, at 18:22, Bill Wendling  wrote:
> > >
> > > Hi Qing,
> > >
> > > Sorry for my late reply.
> > >
> > > On Thu, Aug 29, 2024 at 7:22 AM Qing Zhao  wrote:
> > > >
> > > > Hi,
> > > >
> > > > Thanks for the information.
> > > >
> > > > Yes, providing a unary operator similar as __counted_by(PTR) as 
> > > > suggested by multiple people previously is a cleaner approach.
> > > >
> > > > Then the programmer will use the following:
> > > >
> > > > __builtin_choose_expr(
> > > > __builtin_has_attribute (__p->FAM, "counted_by”)
> > > > __builtin_get_counted_by(__p->FAM) = COUNT, 0);
> > > >
> > > > From the programmer’s point of view, it’s cleaner too.
> > > >
> > > > However, there is one issue with “__builtin_choose_expr” currently in 
> > > > GCC, its documentation explicitly mentions this limitation:  
> > > > (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fchoose_005fexpr)
> > > >
> > > > "Note: This construct is only available for C. Furthermore, the unused 
> > > > expression (exp1 or exp2 depending on the value of const_exp) may still 
> > > > generate syntax errors. This may change in future revisions.”
> > > >
> > > > So, due to this limitation, when there is no counted_by attribute, the 
> > > > __builtin_get_counted_by() still is evaluated by the compiler and 
> > > > errors is issued and the compilation stops, this can be show from the 
> > > > small testing case:
> > > >
> > > > [opc@qinzhao-ol8u3-x86 gcc]$ cat ttt.c
> > > >
> > > > struct flex {
> > > >  unsigned int b;
> > > >  int c[];
> > > > } *array_flex;
> > > >
> > > > #define MY_ALLOC(P, FAM, COUNT) ({ \
> > > >  typeof(P) __p; \
> > > >  unsigned int __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
> > > >  __p = (typeof(P)) __builtin_malloc(__size); \
> > > >  __builtin_choose_expr( \
> > > >__builtin_has_attribute (__p->FAM, counted_by), \
> > > >__builtin_counted_by_ref(__p->FAM) = COUNT, 0); \
> > > >  P = __p; \
> > > > })
> > > >
> > > > int main(int argc, char *argv[])
> > > > {
> > > >  MY_ALLOC(array_flex, c, 20);
> > > >  return 0;
> > > > }
> > > > [opc@qinzhao-ol8u3-x86 gcc]$ sh t
> > > > ttt.c: In function ‘main’:
> > > > ttt.c:13:5: error: the argument must have ‘counted_by’ attribute 
> > > > ‘__builtin_counted_by_ref’
> > > > ttt.c:19:3: note: in expansion of macro ‘MY_ALLOC’
> > > >
> > > > I checked the FE code on handling “__buiiltin_choose_expr”, Yes, it 
> > > > does parse the __builtin_counted_by_ref(__p->FAM) even when 
> > > > __builtin_has_attribute(__p->FAM, counted_by) is FALSE, and issued the 
> > > > error when parsing __builtin_counted_by_ref and stopped the compilation.
> > > >
> > > > So, in order to support this approach, we first must fix the issue in 
> > > > the current __builtin_choose_expr in GCC. Otherwise, it’s impossible 
> > > > for the user to use this new builtin.
> > > >
> > > > Let me know your comments and suggestions.
> > > >
> > > Do you need to emit a diagnostic if the FAM doesn't have the
> > > counted_by attribute? It was originally supposed to "silently fail" if
> > > it didn't. We may need to do the same for Clang if so.
> >
> > Yes, “silently fail” should workaround this problem if fixing the issue in 
> > the current __builtin_choose_expr is too complicate.
> >
> > I will study a little bit on how to fix the issue in __builtin_choose_expr 
> > first.
> >
> > Martin and Joseph, any comment or suggestion from you?
>
> My recommendation would be not to change __builtin_choose_expr.
>
> The design where __builtin_get_counted_by  returns a null
> pointer constant (void*)0 seems good.  Most users will
> get an error which I think is what we want and for those
> that want it to work even if the attribute is not there, the
> following code seems perfectly acceptable to me:
>
> auto p = __builtin_get_counted_by(__p->FAM)
> *_Generic(p, void*: &(int){}, default: p) = 1;
>
>
> Kees also seemed happy with it. And if I understood it correctly,
> also Clang's bounds checking people can work with this.
>
The problem with this is explained in the Clang RFC [1]. Apple's team
rejects taking the address of the 'counter' field when using
-fbounds-safety. They suggested this as an alternative:

  __builtin_bounds_attr_arg(ptr->FAM) = COUNT;

The __builtin_bounds_attr_arg(ptr->FAM) is replaced by an L-value to
the 'ptr->count' field during SEMA, and life goes on as normal. There
are a few reasons for this:

  1. They want to track the relationship between the FAM and the
counter so that one isn't modified without the other being modified.
Allowing the address to be taken makes that check vastly harder.

  2. Apple's implementation supports expressions in the '__counted_by'
attribute, thus the 'count' may be an R-value that can't have its
address taken.

[1] 
https://discourse.llvm.org/t/rfc-introducing-new-clang

Re: [PATCH RFC] c-family: add attribute flag_enum [PR46457]

2024-09-06 Thread Jonathan Wakely
On Fri, 6 Sept 2024 at 20:17, Jason Merrill  wrote:
>
> On 9/6/24 8:56 AM, Jonathan Wakely wrote:
> > On 05/09/24 21:44 -0400, Jason Merrill wrote:
> >> On 9/4/24 11:02 AM, Marek Polacek wrote:
>  +handle_flag_enum_attribute (tree *node, tree ARG_UNUSED(name), tree
>  args,
>  +int ARG_UNUSED (flags), bool *no_add_attrs)
>  +{
>  +  if (args)
>  +warning (OPT_Wattributes, "%qE attribute arguments ignored",
>  name);
> >>> You don't need this check I think; if the # of args isn't correct, we
> >>> should not get here.  Then the goto can...go too.
> >>
> >> Dropped.
> >>
> >> On 9/4/24 11:28 AM, Eric Gallager wrote:
> >>>
> >>> Question about PR tagging: should PR c++/81665 be tagged here, too?
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81665
> >>
> >> Added.
> >>
> >> Here's what I'm pushing:
> >
> >
> >> diff --git a/libstdc++-v3/include/bits/regex_constants.h
> >> b/libstdc++-v3/include/bits/regex_constants.h
> >> index 437895f1dc3..4148093bc4e 100644
> >> --- a/libstdc++-v3/include/bits/regex_constants.h
> >> +++ b/libstdc++-v3/include/bits/regex_constants.h
> >> @@ -66,7 +66,7 @@ namespace regex_constants
> >>* elements @c ECMAScript, @c basic, @c extended, @c awk, @c grep,
> >> @c egrep
> >>* %set.
> >>*/
> >> -  enum syntax_option_type : unsigned int
> >> +  enum [[gnu::flag_enum]] syntax_option_type : unsigned int
> >
> > This needs to be [[__gnu__::__flag_enum__]] because valid programs can
> > #define gnu 1
> > #define flag_enum 1
>
> Oops, fixed:

Thanks!



[PATCH] gimple-fold: Move optimizing memcpy to memset to fold_stmt from fab

2024-09-06 Thread Andrew Pinski
I noticed this folding inside fab could be done else where and could
even improve inlining decisions and a few other things so let's
move it to fold_stmt.
It also fixes PR 116601 because places which call fold_stmt already
have to deal with the stmt becoming a non-throw statement.

For the fix for PR 116601 on the branches should be the original patch
rather than a backport of this one.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116601

gcc/ChangeLog:

* gimple-fold.cc (optimize_memcpy_to_memset): Move
from tree-ssa-ccp.cc and rename. Also return true
if the optimization happened.
(gimple_fold_builtin_memory_op): Call
optimize_memcpy_to_memset.
(fold_stmt_1): Call optimize_memcpy_to_memset for
load/store copies.
* tree-ssa-ccp.cc (optimize_memcpy): Delete.
(pass_fold_builtins::execute): Remove code that
calls optimize_memcpy.

gcc/testsuite/ChangeLog:

* gcc.dg/pr78408-1.c: Adjust dump scan to match where
the optimization now happens.
* g++.dg/torture/except-2.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc  | 134 
 gcc/testsuite/g++.dg/torture/except-2.C |  18 
 gcc/testsuite/gcc.dg/pr78408-1.c|   5 +-
 gcc/tree-ssa-ccp.cc | 132 +--
 4 files changed, 156 insertions(+), 133 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 2746fcfe314..942de7720fd 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -894,6 +894,121 @@ size_must_be_zero_p (tree size)
   return vr.zero_p ();
 }
 
+/* Optimize
+   a = {};
+   b = a;
+   into
+   a = {};
+   b = {};
+   Similarly for memset (&a, ..., sizeof (a)); instead of a = {};
+   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;  */
+
+static bool
+optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, tree dest, tree src, 
tree len)
+{
+  gimple *stmt = gsi_stmt (*gsip);
+  if (gimple_has_volatile_ops (stmt))
+return false;
+
+  tree vuse = gimple_vuse (stmt);
+  if (vuse == NULL || TREE_CODE (vuse) != SSA_NAME)
+return false;
+
+  gimple *defstmt = SSA_NAME_DEF_STMT (vuse);
+  tree src2 = NULL_TREE, len2 = NULL_TREE;
+  poly_int64 offset, offset2;
+  tree val = integer_zero_node;
+  if (gimple_store_p (defstmt)
+  && gimple_assign_single_p (defstmt)
+  && TREE_CODE (gimple_assign_rhs1 (defstmt)) == CONSTRUCTOR
+  && !gimple_clobber_p (defstmt))
+src2 = gimple_assign_lhs (defstmt);
+  else if (gimple_call_builtin_p (defstmt, BUILT_IN_MEMSET)
+  && TREE_CODE (gimple_call_arg (defstmt, 0)) == ADDR_EXPR
+  && TREE_CODE (gimple_call_arg (defstmt, 1)) == INTEGER_CST)
+{
+  src2 = TREE_OPERAND (gimple_call_arg (defstmt, 0), 0);
+  len2 = gimple_call_arg (defstmt, 2);
+  val = gimple_call_arg (defstmt, 1);
+  /* For non-0 val, we'd have to transform stmt from assignment
+into memset (only if dest is addressable).  */
+  if (!integer_zerop (val) && is_gimple_assign (stmt))
+   src2 = NULL_TREE;
+}
+
+  if (src2 == NULL_TREE)
+return false;
+
+  if (len == NULL_TREE)
+len = (TREE_CODE (src) == COMPONENT_REF
+  ? DECL_SIZE_UNIT (TREE_OPERAND (src, 1))
+  : TYPE_SIZE_UNIT (TREE_TYPE (src)));
+  if (len2 == NULL_TREE)
+len2 = (TREE_CODE (src2) == COMPONENT_REF
+   ? DECL_SIZE_UNIT (TREE_OPERAND (src2, 1))
+   : TYPE_SIZE_UNIT (TREE_TYPE (src2)));
+  if (len == NULL_TREE
+  || !poly_int_tree_p (len)
+  || len2 == NULL_TREE
+  || !poly_int_tree_p (len2))
+return false;
+
+  src = get_addr_base_and_unit_offset (src, &offset);
+  src2 = get_addr_base_and_unit_offset (src2, &offset2);
+  if (src == NULL_TREE
+  || src2 == NULL_TREE
+  || maybe_lt (offset, offset2))
+return false;
+
+  if (!operand_equal_p (src, src2, 0))
+return false;
+
+  /* [ src + offset2, src + offset2 + len2 - 1 ] is set to val.
+ Make sure that
+ [ src + offset, src + offset + len - 1 ] is a subset of that.  */
+  if (maybe_gt (wi::to_poly_offset (len) + (offset - offset2),
+   wi::to_poly_offset (len2)))
+return false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Simplified\n  ");
+  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
+  fprintf (dump_file, "after previous\n  ");
+  print_gimple_stmt (dump_file, defstmt, 0, dump_flags);
+}
+
+  /* For simplicity, don't change the kind of the stmt,
+ turn dest = src; into dest = {}; and memcpy (&dest, &src, len);
+ into memset (&dest, val, len);
+ In theory we could change dest = src into memset if dest
+ is addressable (maybe beneficial if val is not 0), or
+ memcpy (&dest, &src, len) into dest = {} if len is the size
+ of dest, dest isn't volatile.  */
+  if (is_gim

Re: [PATCH v4] RISC-V: Fix illegal operands "th.vsetvli zero, 0, e32, m8" for XTheadVector

2024-09-06 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Jin Ma
Date: 2024-09-07 01:40
To: gcc-patches
CC: jeffreyalaw; juzhe.zhong; pan2.li; kito.cheng; jinma.contrib; Jin Ma; nihui
Subject: [PATCH v4] RISC-V: Fix illegal operands "th.vsetvli zero,0,e32,m8" for 
XTheadVector
Since the THeadVector vsetvli does not support vl as an immediate, we
need to convert 0 to zero when outputting asm.
 
PR target/116592
 
gcc/ChangeLog:
 
* config/riscv/thead.cc (th_asm_output_opcode): Change '0' to
"zero".
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.
 
Reported-by: nihui 
---
gcc/config/riscv/thead.cc |  4 +-
.../riscv/rvv/xtheadvector/pr116592.c | 38 +++
2 files changed, 40 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
 
diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 2f1d83fbbc7f..707d91076eb5 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -960,11 +960,11 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
  if (strstr (p, "zero,zero"))
return "th.vsetvli\tzero,zero,e%0,%m1";
  else
- return "th.vsetvli\tzero,%0,e%1,%m2";
+ return "th.vsetvli\tzero,%z0,e%1,%m2";
}
  else
{
-   return "th.vsetvli\t%0,%1,e%2,%m3";
+   return "th.vsetvli\t%z0,%z1,e%2,%m3";
}
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
new file mode 100644
index ..a7cd8c5bdb72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr116592.c
@@ -0,0 +1,38 @@
+/* { dg-do assemble } */
+/* { dg-options "-march=rv32gc_zfh_xtheadvector -mabi=ilp32d -O2 -save-temps" 
{ target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zfh_xtheadvector -mabi=lp64d -O2 -save-temps" { 
target { rv64 } } } */
+
+#include 
+#include 
+
+static vfloat32m8_t atan2_ps(vfloat32m8_t a, vfloat32m8_t b, size_t vl)
+{
+  float tmpx[vl];
+  float tmpy[vl];
+  __riscv_vse32_v_f32m8(tmpx, a, vl);
+  __riscv_vse32_v_f32m8(tmpy, b, vl);
+  for (size_t i = 0; i < vl; i++)
+  {
+tmpx[i] = atan2(tmpx[i], tmpy[i]);
+  }
+  return __riscv_vle32_v_f32m8(tmpx, vl);
+}
+
+void my_atan2(const float *x, const float *y, float *out, int size)
+{
+  int n = size;
+  while (n > 0)
+  {
+size_t vl = __riscv_vsetvl_e32m8(n);
+vfloat32m8_t _x = __riscv_vle32_v_f32m8(x, vl);
+vfloat32m8_t _y = __riscv_vle32_v_f32m8(y, vl);
+vfloat32m8_t _out = atan2_ps(_x, _y, vl);
+__riscv_vse32_v_f32m8(out, _out, vl);
+n -= vl;
+x += vl;
+y += vl;
+out += vl;
+  }
+}
+
+/* { dg-final { scan-assembler-not {th\.vsetvli\s+zero,0} } } */
-- 
2.17.1
 
 


Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-09-06 Thread Qing Zhao
Now, if

1. __builtin_get_counted_by should return an lvalue instead of a pointer
(as required by Clang's design)
And
2. It’s better not to change the behavior of __builtin_choose_expr.

Then the solution left is:

__builtin_get_counted_by (p->FAM) returns an lvalue, p->COUNT, if p->FAM has a 
counted_by attribute; if p->FAM does not have a counted_by attribute, it 
silently does nothing. (Or it just issues a warning, if Linux is OK with such a 
warning.) 

Is this acceptable?

thanks.

Qing
> On Sep 6, 2024, at 16:59, Bill Wendling  wrote:
> 
> On Fri, Sep 6, 2024 at 12:32 PM Martin Uecker  wrote:
>> 
>> Am Freitag, dem 06.09.2024 um 13:59 + schrieb Qing Zhao:
>>> 
 On Sep 5, 2024, at 18:22, Bill Wendling  wrote:
 
 Hi Qing,
 
 Sorry for my late reply.
 
 On Thu, Aug 29, 2024 at 7:22 AM Qing Zhao  wrote:
> 
> Hi,
> 
> Thanks for the information.
> 
> Yes, providing a unary operator similar as __counted_by(PTR) as suggested 
> by multiple people previously is a cleaner approach.
> 
> Then the programmer will use the following:
> 
> __builtin_choose_expr(
>__builtin_has_attribute (__p->FAM, "counted_by”)
>__builtin_get_counted_by(__p->FAM) = COUNT, 0);
> 
> From the programmer’s point of view, it’s cleaner too.
> 
> However, there is one issue with “__builtin_choose_expr” currently in 
> GCC, its documentation explicitly mentions this limitation:  
> (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fchoose_005fexpr)
> 
> "Note: This construct is only available for C. Furthermore, the unused 
> expression (exp1 or exp2 depending on the value of const_exp) may still 
> generate syntax errors. This may change in future revisions.”
> 
> So, due to this limitation, when there is no counted_by attribute, the 
> __builtin_get_counted_by() still is evaluated by the compiler and errors 
> is issued and the compilation stops, this can be show from the small 
> testing case:
> 
> [opc@qinzhao-ol8u3-x86 gcc]$ cat ttt.c
> 
> struct flex {
> unsigned int b;
> int c[];
> } *array_flex;
> 
> #define MY_ALLOC(P, FAM, COUNT) ({ \
> typeof(P) __p; \
> unsigned int __size = sizeof(*P) + sizeof(*P->FAM) * COUNT; \
> __p = (typeof(P)) __builtin_malloc(__size); \
> __builtin_choose_expr( \
>   __builtin_has_attribute (__p->FAM, counted_by), \
>   __builtin_counted_by_ref(__p->FAM) = COUNT, 0); \
> P = __p; \
> })
> 
> int main(int argc, char *argv[])
> {
> MY_ALLOC(array_flex, c, 20);
> return 0;
> }
> [opc@qinzhao-ol8u3-x86 gcc]$ sh t
> ttt.c: In function ‘main’:
> ttt.c:13:5: error: the argument must have ‘counted_by’ attribute 
> ‘__builtin_counted_by_ref’
> ttt.c:19:3: note: in expansion of macro ‘MY_ALLOC’
> 
> I checked the FE code on handling “__buiiltin_choose_expr”, Yes, it does 
> parse the __builtin_counted_by_ref(__p->FAM) even when 
> __builtin_has_attribute(__p->FAM, counted_by) is FALSE, and issued the 
> error when parsing __builtin_counted_by_ref and stopped the compilation.
> 
> So, in order to support this approach, we first must fix the issue in the 
> current __builtin_choose_expr in GCC. Otherwise, it’s impossible for the 
> user to use this new builtin.
> 
> Let me know your comments and suggestions.
> 
 Do you need to emit a diagnostic if the FAM doesn't have the
 counted_by attribute? It was originally supposed to "silently fail" if
 it didn't. We may need to do the same for Clang if so.
>>> 
>>> Yes, “silently fail” should workaround this problem if fixing the issue in 
>>> the current __builtin_choose_expr is too complicate.
>>> 
>>> I will study a little bit on how to fix the issue in __builtin_choose_expr 
>>> first.
>>> 
>>> Martin and Joseph, any comment or suggestion from you?
>> 
>> My recommendation would be not to change __builtin_choose_expr.
>> 
>> The design where __builtin_get_counted_by  returns a null
>> pointer constant (void*)0 seems good.  Most users will
>> get an error which I think is what we want and for those
>> that want it to work even if the attribute is not there, the
>> following code seems perfectly acceptable to me:
>> 
>> auto p = __builtin_get_counted_by(__p->FAM)
>> *_Generic(p, void*: &(int){}, default: p) = 1;
>> 
>> 
>> Kees also seemed happy with it. And if I understood it correctly,
>> also Clang's bounds checking people can work with this.
>> 
> The problem with this is explained in the Clang RFC [1]. Apple's team
> rejects taking the address of the 'counter' field when using
> -fbounds-safety. They suggested this as an alternative:
> 
>  __builtin_bounds_attr_arg(ptr->FAM) = COUNT;
> 
> The __builtin_bounds_attr_arg(ptr->FAM) is replaced by an L-value to
> the 'ptr->count' field during SEMA, and life goes on as normal. There
>

RE: [PATCH] RISC-V: Fix ICE for rvv in lto

2024-09-06 Thread Li, Pan2
> +/* Test that we do not ICE when compiling.  */
> +
> +/* { dg-do run } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=zvl -flto -O2 
> -fno-checking" } */
> +
> +#include 
> +
> +int
> +main ()
> +{
> +  size_t vl = 8;
> +  vint32m1_t vs1 = {};
> +  vint32m1_t vs2 = {};
> +
> +  __volatile__ vint32m1_t vd = __riscv_vadd_vv_i32m1(vs1, vs2, vl);
> +
> +  return 0;
> +}

Interesting, do we still have an ICE when there is no __volatile__ for vd? And 
on the gcc-14 branch as well?
Because it is quite a common case that should already be covered by the 
existing tests.

Pan

-Original Message-
From: Jin Ma  
Sent: Saturday, September 7, 2024 1:31 AM
To: gcc-patches@gcc.gnu.org
Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; Li, Pan2 ; 
kito.ch...@gmail.com; jinma.cont...@gmail.com; Jin Ma 
Subject: [PATCH] RISC-V: Fix ICE for rvv in lto

When we use flto, the function list of rvv will be generated twice,
once in the cc1 phase and once in the lto phase. However, due to
the different generation methods, the two lists are different.

For example, when there is no zvfh or zvfhmin in arch, it is
generated by calling function "riscv_pragma_intrinsic". since the
TARGET_VECTOR_ELEN_FP_16 is enabled before rvv function generation,
a list of rvv functions related to float16 will be generated. In
the lto phase, the rvv function list is generated only by calling
the function "riscv_init_builtins", but the TARGET_VECTOR_ELEN_FP_16
is disabled, so that the float16-related rvv function list cannot
be generated as in cc1. This causes confusion, resulting in a match
to the wrong function due to an inconsistent fcode in the lto
phase, eventually leading to an ICE.

So I think we should be consistent with their generated lists, which
is exactly what this patch does.

But there is still a problem here. If we use "-fchecking", we still
have ICE. This is because in the lto phase, after the rvv function
list is generated and before the expand_builtin, the ggc_grow will
be called to clean up the memory, resulting in
"(* registered_functions)[code]->decl" being cleaned up, and
finally an ICE.

I think this is wrong and needs to be fixed, maybe we shouldn't
use "ggc_alloc ()", or is there another better
way to implement it?

I'm trying to fix it here. Any comments here?

gcc/ChangeLog:

* config/riscv/riscv-c.cc (struct pragma_intrinsic_flags): Mov
to riscv-protos.h.
(riscv_pragma_intrinsic_flags_pollute): Mov to riscv-vector-builtins.c.
(riscv_pragma_intrinsic_flags_restore): Likewise.
(riscv_pragma_intrinsic): Likewise.
* config/riscv/riscv-protos.h (struct pragma_intrinsic_flags):
New.
(riscv_pragma_intrinsic_flags_restore): New.
(riscv_pragma_intrinsic_flags_pollute): New.
* config/riscv/riscv-vector-builtins.cc 
(riscv_pragma_intrinsic_flags_pollute): New.
(riscv_pragma_intrinsic_flags_restore): New.
(handle_pragma_vector_for_lto): New.
(init_builtins): Correct the processing logic for lto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-10.c: New test.
---
 gcc/config/riscv/riscv-c.cc   | 70 +---
 gcc/config/riscv/riscv-protos.h   | 13 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 83 ++-
 .../gcc.target/riscv/rvv/base/bug-10.c| 18 
 4 files changed, 114 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-10.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 71112d9c66d7..7037ecc1268a 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -34,72 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #define builtin_define(TXT) cpp_define (pfile, TXT)
 
-struct pragma_intrinsic_flags
-{
-  int intrinsic_target_flags;
-
-  int intrinsic_riscv_vector_elen_flags;
-  int intrinsic_riscv_zvl_flags;
-  int intrinsic_riscv_zvb_subext;
-  int intrinsic_riscv_zvk_subext;
-};
-
-static void
-riscv_pragma_intrinsic_flags_pollute (struct pragma_intrinsic_flags *flags)
-{
-  flags->intrinsic_target_flags = target_flags;
-  flags->intrinsic_riscv_vector_elen_flags = riscv_vector_elen_flags;
-  flags->intrinsic_riscv_zvl_flags = riscv_zvl_flags;
-  flags->intrinsic_riscv_zvb_subext = riscv_zvb_subext;
-  flags->intrinsic_riscv_zvk_subext = riscv_zvk_subext;
-
-  target_flags = target_flags
-| MASK_VECTOR;
-
-  riscv_zvl_flags = riscv_zvl_flags
-| MASK_ZVL32B
-| MASK_ZVL64B
-| MASK_ZVL128B;
-
-  riscv_vector_elen_flags = riscv_vector_elen_flags
-| MASK_VECTOR_ELEN_32
-| MASK_VECTOR_ELEN_64
-| MASK_VECTOR_ELEN_FP_16
-| MASK_VECTOR_ELEN_FP_32
-| MASK_VECTOR_ELEN_FP_64;
-
-  riscv_zvb_subext = riscv_zvb_subext
-| MASK_ZVBB
-| MASK_ZVBC
-| MASK_ZVKB;
-
-  riscv_zvk_subext = riscv_zvk_subext
-| MASK_ZVKG
-| MASK_ZVKNED
-| MASK_ZVKNHA
-| MASK_ZVKNHB
-| MASK_ZVKSED
-| MASK_ZVKSH
-  

  1   2   >