Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-26 Thread Martin Uecker


I haven't looked at the patch, but it sounds you give the result
the wrong type. Then patching up all use cases instead of the
type seems wrong.

Martin


Am Donnerstag, dem 25.01.2024 um 20:11 + schrieb Qing Zhao:
> Thanks a lot for the testing.
> 
> Yes, I can repeat the issue with the following small example:
> 
> #include 
> #include 
> #include 
> 
> #define MAX(a, b)  ((a) > (b) ? (a) :  (b))
> 
> struct untracked {
>int size;
>int array[] __attribute__((counted_by (size)));
> } *a;
> struct untracked * alloc_buf (int index)
> {
>   struct untracked *p;
>   p = (struct untracked *) malloc (MAX (sizeof (struct untracked),
> (offsetof (struct untracked, array[0])
>  + (index) * sizeof (int;
>   p->size = index;
>   return p;
> }
> 
> int main()
> {
>   a = alloc_buf(10);
>  printf ("same_type is %d\n",
>   (__builtin_types_compatible_p(typeof (a->array), typeof (&(a->array)[0];
>   return 0;
> }
> 
> 
> /home/opc/Install/latest-d/bin/gcc -O2 btcp.c
> same_type is 1
> 
> Looks like that the “typeof” operator need to be handled specially in C FE
>  for the new internal function .ACCESS_WITH_SIZE. 
> 
> (I have specially handle the operator “offsetof” in C FE already).
> 
> Will fix this issue.
> 
> Thanks.
> 
> Qing
> 
> > On Jan 24, 2024, at 7:51 PM, Kees Cook  wrote:
> > 
> > On Wed, Jan 24, 2024 at 12:29:51AM +, Qing Zhao wrote:
> > > This is the 4th version of the patch.
> > 
> > Thanks very much for this!
> > 
> > I tripped over an unexpected behavioral change that the Linux kernel
> > depends on:
> > 
> > __builtin_types_compatible_p() no longer treats an array marked with
> > counted_by as different from that array's decayed pointer. Specifically,
> > the kernel uses these macros:
> > 
> > 
> > /*
> > * Force a compilation error if condition is true, but also produce a
> > * result (of value 0 and type int), so the expression can be used
> > * e.g. in a structure initializer (or where-ever else comma expressions
> > * aren't permitted).
> > */
> > #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
> > 
> > #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
> > 
> > /* &a[0] degrades to a pointer: a different type from an array */
> > #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
> > 
> > 
> > This gets used in various places to make sure we're dealing with an
> > array for a macro:
> > 
> > #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + 
> > __must_be_array(arr))
> > 
> > 
> > So this builds:
> > 
> > struct untracked {
> >int size;
> >int array[];
> > } *a;
> > 
> > __must_be_array(a->array)
> > => 0 (as expected)
> > __builtin_types_compatible_p(typeof(a->array), typeof(&(a->array)[0]))
> > => 0 (as expected, array vs decayed array pointer)
> > 
> > 
> > But if counted_by is added, we get a build failure:
> > 
> > struct tracked {
> >int size;
> >int array[] __counted_by(size);
> > } *b;
> > 
> > __must_be_array(b->array)
> > => build failure (not expected)
> > __builtin_types_compatible_p(typeof(b->array), typeof(&(b->array)[0]))
> > => 1 (not expected, both pointers?)
> > 
> > 
> > 
> > 
> > -- 
> > Kees Cook
> 



Re: [PATCH v1] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-01-26 Thread chenglulu



在 2024/1/24 下午5:36, Li Wei 写道:

We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit
vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/vect-10.f90: New test.
---
  gcc/config/loongarch/loongarch.cc  | 42 +
  gcc/testsuite/gfortran.dg/vect/vect-10.f90 | 71 ++
  2 files changed, 113 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/vect/vect-10.f90

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 072c68d97e3..32a0b6f43e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4096,6 +4096,36 @@ 
loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi
return 1 << ceil_log2 (uf);
  }
  
+static bool

+loongarch_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
+{
+  gassign *assign = dyn_cast (stmt_info->stmt);
+  if (!assign)
+return false;
+  tree_code code = gimple_assign_rhs_code (assign);
+  if (code != PLUS_EXPR && code != MINUS_EXPR)
+return false;
+
+  auto is_mul_result = [&](int i)
+{
+  tree rhs = gimple_op (assign, i);
+  if (TREE_CODE (rhs) != SSA_NAME)
+   return false;
+
+  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+  if (!def_stmt_info
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
+   return false;
+  gassign *rhs_assign = dyn_cast (def_stmt_info->stmt);
+  if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
+   return false;
+
+  return true;
+};
+
+  return is_mul_result (1) || is_mul_result (2);
+}
+
  unsigned
  loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   stmt_vec_info stmt_info, slp_tree,
@@ -4108,6 +4138,18 @@ loongarch_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
  {
int stmt_cost = loongarch_builtin_vectorization_cost (kind, vectype,
misalign);
+  if (vectype && stmt_info)
+   {
+ gassign *assign = dyn_cast (STMT_VINFO_STMT (stmt_info));
+ machine_mode mode = TYPE_MODE (vectype);


Hi, Liwei:

I think the code here needs to be commented.

Thanks.


+ if (kind == vector_stmt && GET_MODE_SIZE (mode) == 16 && assign)
+   {
+ if (!vect_is_reduction (stmt_info)
+ && loongarch_multiply_add_p (m_vinfo, stmt_info))
+   stmt_cost = 0;
+   }
+   }
+
retval = adjust_cost_for_freq (stmt_info, where, count * stmt_cost);
m_costs[where] += retval;
  




Re:[pushed] [PATCH v3] LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT

2024-01-26 Thread chenglulu

Pushed to r14-8446.

在 2024/1/16 上午10:32, Jiahao Xu 写道:

Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0, for a short-circuit branch, use the
short-circuit operation instead of the non-short-circuit operation.

SPEC2017 performance evaluation shows 1% performance improvement for fprate
GEOMEAN and no obvious regression for others. Especially, 526.blender_r +10.6%
on 3A6000.

gcc/ChangeLog:

* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/short-circuit.c: New test.

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 4e6ede926d3..8b453ab3140 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -869,6 +869,7 @@ typedef struct {
 1 is the default; other values are interpreted relative to that.  */
  
  #define BRANCH_COST(speed_p, predictable_p) la_branch_cost

+#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
  
  /* Return the asm template for a conditional branch instruction.

 OPCODE is the opcode's mnemonic and OPERANDS is the asm template for
diff --git a/gcc/testsuite/gcc.target/loongarch/short-circuit.c 
b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
new file mode 100644
index 000..bed585ee172
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-gimple" } */
+
+int
+short_circuit (float *a)
+{
+  float t1x = a[0];
+  float t2x = a[1];
+  float t1y = a[2];
+  float t2y = a[3];
+  float t1z = a[4];
+  float t2z = a[5];
+
+  if (t1x > t2y  || t2x < t1y  || t1x > t2z || t2x < t1z || t1y > t2z || t2y < 
t1z)
+return 0;
+
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "if" 6 "gimple" } } */




Re: [pushed][PATCH v3] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-26 Thread chenglulu



在 2024/1/26 下午3:32, Richard Biener 写道:

On Fri, Jan 26, 2024 at 7:23 AM chenxiaolong  wrote:

gcc/testsuite/ChangeLog:

OK


Pushed to r14-8445.

Thank you everyone for your review!




 * gcc.dg/signbit-2.c: Added additional "-mlsx" compilation options.
 * gfortran.dg/graphite/vect-pr40979.f90: Dito.
 * gfortran.dg/vect/fast-math-mgrid-resid.f: Dito.
---
  gcc/testsuite/gcc.dg/signbit-2.c   | 1 +
  gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90| 1 +
  gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 +
  3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index 62bb4047d74..5511bb78149 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -5,6 +5,7 @@
  /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* 
x86_64-*-* } } } */
  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
  /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
+/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
  /* { dg-skip-if "no fallback for MVE" { arm_mve } } */

  #include 
diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 
b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
index a42290948c4..6f2ad1166a4 100644
--- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
@@ -1,6 +1,7 @@
  ! { dg-do compile }
  ! { dg-require-effective-target vect_double }
  ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-* } && 
ilp32 } } }
+! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }

  module mqc_m
  integer, parameter, private :: longreal = selected_real_kind(15,90)
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f 
b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 08965cc5e20..97b88821731 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -2,6 +2,7 @@
  ! { dg-require-effective-target vect_double }
  ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 
-fpredictive-commoning -fdump-tree-pcom-details -std=legacy" }
  ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } 
} }
+! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
  ! { dg-additional-options "-mzarch" { target { s390*-*-* } } }

  *** RESID COMPUTES THE RESIDUAL:  R = V - AU
--
2.20.1





[PATCH] genopinit: Split init_all_optabs [PR113575]

2024-01-26 Thread Robin Dapp
Hi,

init_all_optabs initializes > 1 patterns for riscv targets.  This
leads to pathological situations in dataflow analysis (which can occur
with many adjacent stores).
To alleviate this this patch makes genopinit split the init_all_optabs
function into several init_optabs_xx functions that each initialize 1000
patterns.

With this change insn-opinit.cc's compilation time is reduced from 4+
minutes to 1:30 and memory consumption decreases from 1.2G to 630M.

Bootstrapped and regtested on x86 and aarch64 (where we do split) and
on power10 (where we don't).  Regtested on riscv.

Regards
 Robin

gcc/ChangeLog:

PR other/113575

* genopinit.cc (main): Split init_all_optabs into functions
of 1000 patterns each.
---
 gcc/genopinit.cc | 43 ++-
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 88ccafa5b2c..d8682b2a9ad 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -367,11 +367,44 @@ main (int argc, const char **argv)
 fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
   fprintf (s_file, "};\n\n");
 
-  fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
*optabs)\n{\n");
-  fprintf (s_file, "  bool *ena = optabs->pat_enable;\n");
-  for (i = 0; patterns.iterate (i, &p); ++i)
-fprintf (s_file, "  ena[%u] = HAVE_%s;\n", i, p->name);
-  fprintf (s_file, "}\n\n");
+  /* Some targets like riscv have a large number of patterns.  In order to
+ prevent pathological situations in dataflow analysis split the init
+ function into separate ones that initialize 1000 patterns each.  */
+
+  const int patterns_per_function = 1000;
+
+  if (patterns.length () > patterns_per_function)
+{
+  unsigned num_init_functions
+   = patterns.length () / patterns_per_function + 1;
+  for (i = 0; i < num_init_functions; i++)
+   {
+ fprintf (s_file, "static void\ninit_optabs_%02d "
+  "(struct target_optabs *optabs)\n{\n", i);
+ fprintf (s_file, "  bool *ena = optabs->pat_enable;\n");
+ unsigned start = i * patterns_per_function;
+ unsigned end = MIN (patterns.length (),
+ (i + 1) * patterns_per_function);
+ for (j = start; j < end; ++j)
+   fprintf (s_file, "  ena[%u] = HAVE_%s;\n", j, patterns[j].name);
+ fprintf (s_file, "}\n\n");
+   }
+
+  fprintf (s_file, "void\ninit_all_optabs "
+  "(struct target_optabs *optabs)\n{\n");
+  for (i = 0; i < num_init_functions; ++i)
+   fprintf (s_file, "  init_optabs_%02d (optabs);\n", i);
+  fprintf (s_file, "}\n\n");
+}
+  else
+{
+  fprintf (s_file, "void\ninit_all_optabs "
+  "(struct target_optabs *optabs)\n{\n");
+  fprintf (s_file, "  bool *ena = optabs->pat_enable;\n");
+  for (i = 0; patterns.iterate (i, &p); ++i)
+   fprintf (s_file, "  ena[%u] = HAVE_%s;\n", i, p->name);
+  fprintf (s_file, "}\n\n");
+}
 
   fprintf (s_file,
   "/* Returns TRUE if the target supports any of the partial vector\n"
-- 
2.43.0


Re: [pushed][PATCH v1] LoongArch: Optimize implementation of single-precision floating-point approximate division.

2024-01-26 Thread chenglulu

Pushed to r14-8444.

在 2024/1/24 下午5:44, Li Wei 写道:

We found that in the spec17 521.wrf program, some loop invariant code generated
from single-precision floating-point approximate division calculation failed to
propose a loop. This is because the pseudo-register that stores the
intermediate temporary calculation results is rewritten in the implementation
of single-precision floating-point approximate division, failing to propose
invariants in the loop2_invariant pass. To this end, the intermediate temporary
calculation results are stored in new pseudo-registers without destroying the
read-write dependency, so that they could be recognized as loop invariants in
the loop2_invariant pass.
After optimization, the number of instructions of 521.wrf is reduced by 0.18%
compared with before optimization (1716612948501 -> 1713471771364).

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/invariant-recip.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 19 +++
  .../gcc.target/loongarch/invariant-recip.c| 33 +++
  2 files changed, 46 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/invariant-recip.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 32a0b6f43e8..1b88147fd8c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10894,16 +10894,23 @@ void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, 
machine_mode mode)
/* x0 = 1./b estimate.  */
emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
  unspec)));
-  /* 2.0 - b * x0  */
+  /* e0 = 2.0 - b * x0.  */
emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode,
   gen_rtx_NEG (mode, b), x0, mtwo)));
  
-  /* x0 = a * x0  */

if (a != CONST1_RTX (mode))
-emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0)));
-
-  /* res = e0 * x0  */
-  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+{
+  rtx e1 = gen_reg_rtx (mode);
+  /* e1 = a * x0.  */
+  emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, a, x0)));
+  /* res = e0 * e1.  */
+  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, e1)));
+}
+  else
+{
+  /* res = e0 * x0.  */
+  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+}
  }
  
  static bool

diff --git a/gcc/testsuite/gcc.target/loongarch/invariant-recip.c 
b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
new file mode 100644
index 000..2f64f6ed5e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=loongarch64 -mabi=lp64d -mrecip -mfrecipe 
-fdump-rtl-loop2_invariant " } */
+/* { dg-final { scan-rtl-dump "Decided to move dependent invariant" 
"loop2_invariant" } } */
+
+void
+nislfv_rain_plm (int im, int km, float dzl[im][km], float rql[im][km],
+ float dt)
+{
+  int i, k;
+  float con1, decfl;
+  float dz[km], qn[km], wi[km + 1];
+
+  for (i = 0; i < im; i++)
+{
+  for (k = 0; k < km; k++)
+{
+  dz[k] = dzl[i][k];
+}
+  con1 = 0.05;
+  for (k = km - 1; k >= 0; k--)
+{
+  decfl = (wi[k + 1] - wi[k]) * dt / dz[k];
+  if (decfl > con1)
+{
+  wi[k] = wi[k + 1] - con1 * dz[k] / dt;
+}
+}
+  for (k = 0; k < km; k++)
+{
+  rql[i][k] = qn[k];
+}
+}
+}




Re: [pushed][PATCH] LoongArch: Split vec_selects of bottom elements into simple move

2024-01-26 Thread chenglulu

Pushed to r14-8447.

在 2024/1/16 上午10:23, Jiahao Xu 写道:

For below pattern, can be treated as a simple move because floating point
and vector share a common register on loongarch64.

(set (reg/v:SF 32 $f0 [orig:93 res ] [93])
   (vec_select:SF (reg:V8SF 32 $f0 [115])
   (parallel [
   (const_int 0 [0])
   ])))

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_extract_0):
New define_insn_and_split patten.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-extract.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 72f7161311c..90f66ee4d24 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -761,6 +761,21 @@ (define_expand "vec_extract"
DONE;
  })
  
+(define_insn_and_split "vec_extract_0"

+  [(set (match_operand: 0 "register_operand" "=f")
+(vec_select:
+  (match_operand:FLASX 1 "register_operand" "f")
+  (parallel [(const_int 0)])))]
+  "ISA_HAS_LSX"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
+}
+  [(set_attr "move_type" "fmove")
+   (set_attr "mode" "")])
+
  (define_expand "vec_perm"
   [(match_operand:LASX 0 "register_operand")
(match_operand:LASX 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-extract.c 
b/gcc/testsuite/gcc.target/loongarch/vect-extract.c
new file mode 100644
index 000..ce126e3a4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-extract.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -mlasx -fno-vect-cost-model 
-fno-unroll-loops" } */
+/* { dg-final { scan-assembler-not "xvpickve.w" } } */
+/* { dg-final { scan-assembler-not "xvpickve.d" } } */
+
+float
+sum_float (float *a, int n) {
+  float res = 0.0;
+  for (int i = 0; i < n; i++)
+res += a[i];
+  return res;
+}
+
+double
+sum_double (double *a, int n) {
+  double res = 0.0;
+  for (int i = 0; i < n; i++)
+res += a[i];
+  return res;
+}




[Committed] RISC-V: Refine some codes of VSETVL PASS [NFC]

2024-01-26 Thread Juzhe-Zhong
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): 
Refine some codes.
(pre_vsetvl::emit_vsetvl): Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc | 69 +---
 1 file changed, 27 insertions(+), 42 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 1a398f02596..d7b40a5c813 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2930,28 +2930,19 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
   EXECUTE_IF_SET_IN_BITMAP (e, 0, expr_index, sbi)
{
  vsetvl_info &curr_info = *m_exprs[expr_index];
- if (!curr_info.valid_p ())
-   continue;
-
  edge eg = INDEX_EDGE (m_edges, ed);
- if (eg->probability == profile_probability::never ())
-   continue;
- if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
- || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
-   continue;
-
- /* When multiple set bits in earliest edge, such edge may
-have infinite loop in preds or succs or multiple conflict
-vsetvl expression which make such edge is unrelated.  We
-don't perform fusion for such situation.  */
- if (bitmap_count_bits (e) != 1)
-   continue;
-
  vsetvl_block_info &src_block_info = get_block_info (eg->src);
  vsetvl_block_info &dest_block_info = get_block_info (eg->dest);
 
- if (src_block_info.probability
- == profile_probability::uninitialized ())
+ if (!curr_info.valid_p ()
+ || eg->probability == profile_probability::never ()
+ || src_block_info.probability
+  == profile_probability::uninitialized ()
+ /* When multiple set bits in earliest edge, such edge may
+have infinite loop in preds or succs or multiple conflict
+vsetvl expression which make such edge is unrelated.  We
+don't perform fusion for such situation.  */
+ || bitmap_count_bits (e) != 1)
continue;
 
  if (src_block_info.empty_p ())
@@ -3058,29 +3049,27 @@ pre_vsetvl::earliest_fuse_vsetvl_info (int iter)
{
  vsetvl_info &prev_info = src_block_info.get_exit_info ();
  if (!prev_info.valid_p ()
- || m_dem.available_p (prev_info, curr_info))
+ || m_dem.available_p (prev_info, curr_info)
+ || !m_dem.compatible_p (prev_info, curr_info))
continue;
 
- if (m_dem.compatible_p (prev_info, curr_info))
+ if (dump_file && (dump_flags & TDF_DETAILS))
{
- if (dump_file && (dump_flags & TDF_DETAILS))
-   {
- fprintf (dump_file, "Fuse curr info since prev info "
- "compatible with it:\n");
- fprintf (dump_file, "  prev_info: ");
- prev_info.dump (dump_file, "");
- fprintf (dump_file, "  curr_info: ");
- curr_info.dump (dump_file, "");
-   }
- m_dem.merge (prev_info, curr_info);
- if (dump_file && (dump_flags & TDF_DETAILS))
-   {
- fprintf (dump_file, "  prev_info after fused: ");
- prev_info.dump (dump_file, "");
- fprintf (dump_file, "\n");
-   }
- changed = true;
+ fprintf (dump_file, "Fuse curr info since prev info "
+ "compatible with it:\n");
+ fprintf (dump_file, "  prev_info: ");
+ prev_info.dump (dump_file, "");
+ fprintf (dump_file, "  curr_info: ");
+ curr_info.dump (dump_file, "");
+   }
+ m_dem.merge (prev_info, curr_info);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "  prev_info after fused: ");
+ prev_info.dump (dump_file, "");
+ fprintf (dump_file, "\n");
}
+ changed = true;
}
}
 }
@@ -3344,15 +,11 @@ pre_vsetvl::emit_vsetvl ()
 {
   edge eg = INDEX_EDGE (m_edges, ed);
   sbitmap i = m_insert[ed];
-  if (bitmap_count_bits (i) < 1)
-   continue;
-
-  if (bitmap_count_bits (i) > 1)
+  if (bitmap_count_bits (i) != 1)
/* For code with infinite loop (e.g. pr61634.c), The data flow is
   completely wrong.  */
continue;
 
-  gcc_assert (bitmap_count_bits (i) == 1);
   unsigned expr_index = bitmap_first_set_bit (i);
   const vsetvl_info &info = *m_exprs[expr_index];
   gcc_assert (info.valid_p ());
-

[PATCH v2] LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.

2024-01-26 Thread Li Wei
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit
vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/vect/vect-10.f90: New test.
---
 gcc/config/loongarch/loongarch.cc  | 48 +++
 gcc/testsuite/gfortran.dg/vect/vect-10.f90 | 71 ++
 2 files changed, 119 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/vect/vect-10.f90

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index b494040d165..4d99e30828b 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4096,6 +4096,37 @@ 
loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi
   return 1 << ceil_log2 (uf);
 }
 
+/* Check if assign stmt rhs op comes from a multiply-add operation.  */
+static bool
+loongarch_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
+{
+  gassign *assign = dyn_cast (stmt_info->stmt);
+  if (!assign)
+return false;
+  tree_code code = gimple_assign_rhs_code (assign);
+  if (code != PLUS_EXPR && code != MINUS_EXPR)
+return false;
+
+  auto is_mul_result = [&](int i)
+{
+  tree rhs = gimple_op (assign, i);
+  if (TREE_CODE (rhs) != SSA_NAME)
+   return false;
+
+  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+  if (!def_stmt_info
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
+   return false;
+  gassign *rhs_assign = dyn_cast (def_stmt_info->stmt);
+  if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
+   return false;
+
+  return true;
+};
+
+  return is_mul_result (1) || is_mul_result (2);
+}
+
 unsigned
 loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   stmt_vec_info stmt_info, slp_tree,
@@ -4108,6 +4139,23 @@ loongarch_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
 {
   int stmt_cost = loongarch_builtin_vectorization_cost (kind, vectype,
misalign);
+  if (vectype && stmt_info)
+   {
+ gassign *assign = dyn_cast (STMT_VINFO_STMT (stmt_info));
+ machine_mode mode = TYPE_MODE (vectype);
+
+ /* We found through testing that this strategy (the stmt that
+matches the multiply-add pattern) has positive returns only
+when applied to the 128-bit vector stmt, so this restriction
+is currently made.  */
+ if (kind == vector_stmt && GET_MODE_SIZE (mode) == 16 && assign)
+   {
+ if (!vect_is_reduction (stmt_info)
+ && loongarch_multiply_add_p (m_vinfo, stmt_info))
+   stmt_cost = 0;
+   }
+   }
+
   retval = adjust_cost_for_freq (stmt_info, where, count * stmt_cost);
   m_costs[where] += retval;
 
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-10.f90 
b/gcc/testsuite/gfortran.dg/vect/vect-10.f90
new file mode 100644
index 000..b85bc2702a3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/vect-10.f90
@@ -0,0 +1,71 @@
+! { dg-do compile }
+! { dg-additional-options "-Ofast -mlsx -fvect-cost-model=dynamic" { target 
loongarch64*-*-* } }
+
+MODULE material_mod
+
+IMPLICIT NONE
+
+integer, parameter :: dfp = selected_real_kind (13, 99)
+integer, parameter :: rfp = dfp
+
+PUBLIC Mat_updateE, iepx, iepy, iepz
+
+PRIVATE
+
+integer, dimension (:, :, :), allocatable :: iepx, iepy, iepz
+real (kind = rfp), dimension (:), allocatable :: Dbdx, Dbdy, Dbdz
+integer :: imin, jmin, kmin
+integer, dimension (6) :: Exsize
+integer, dimension (6) :: Eysize
+integer, dimension (6) :: Ezsize
+integer, dimension (6) :: Hxsize
+integer, dimension (6) :: Hysize
+integer, dimension (6) :: Hzsize
+
+CONTAINS
+
+SUBROUTINE mat_updateE (nx, ny, nz, Hx, Hy, Hz, Ex, Ey, Ez)
+
+integer, intent (in) :: nx, ny, nz
+
+real (kind = rfp), intent (inout), &
+  dimension (Exsize (1) : Exsize (2), Exsize (3) : Exsize (4), Exsize (5) : 
Exsize (6)) :: Ex
+real (kind = rfp), intent (inout), &
+  dimension (Eysize (1) : Eysize (2), Eysize (3) : Eysize (4), Eysize (5) : 
Eysize (6)) :: Ey
+real (kind = rfp), intent (inout), &
+  dimension (Ezsize (1) : Ezsize (2), Ezsize (3) : Ezsize (4), Ezsize (5) : 
Ezsize (6)) :: Ez
+real (kind = rfp), intent (in),&
+  dimension (Hxsize (1) : Hxsize (2),

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> v3 -> v4:
>   1. Add macro support for TLS symbols
>   2. Added support for loading __get_tls_addr symbol address using call36.
>   3. Merge template got_load_tls_{ld/gd/le/ie}.
>   4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 87c9eafd88ae4a4339e094af08c77e7dfc9ea700 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Fri, 5 Jan 2024 18:40:06 +0800
Subject: [PATCH] LoongArch: Don't split the instructions containing relocs for
 extreme code model

The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for
addressing a symbol to be adjacent.  So model them as "one large
instruction", i.e. define_insn, with two output registers.  The real
address is the sum of these two registers.

The advantage of this approach is the RTL passes can still use ldx/stx
instructions to skip an addi.d instruction.

gcc/ChangeLog:

	* config/loongarch/loongarch.md (unspec): Add
	UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
	(la_pcrel64_two_parts): New define_insn.
	* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
	typo in the comment.
	(loongarch_call_tls_get_addr): If -mcmodel=extreme
	-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
	addressing the TLS symbol and __tls_get_addr.  Emit an REG_EQUAL
	note to allow CSE addressing __tls_get_addr.
	(loongarch_legitimize_tls_address): If -mcmodel=extreme
	-mexplicit-relocs={always,auto}, address TLS IE symbols with
	la_pcrel64_two_parts.
	(loongarch_split_symbol): If -mcmodel=extreme
	-mexplicit-relocs={always,auto}, address symbols with
	la_pcrel64_two_parts.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
	Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
	instruction sequences are not reordered by the compiler.
	(NOIPA): Disallow interprocedural optimizations.
	* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
	duplicated from func-call-extreme-1.c, include it instead.
	(dg-options): Likewise.
	* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
	Likewise.
	* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
	Likewise.
	* gcc.target/loongarch/cmodel-extreme-1.c: New test.
	* gcc.target/loongarch/cmodel-extreme-2.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 112 ++
 gcc/config/loongarch/loongarch.md |  20 
 .../gcc.target/loongarch/cmodel-extreme-1.c   |  18 +++
 .../gcc.target/loongarch/cmodel-extreme-2.c   |   7 ++
 .../loongarch/func-call-extreme-1.c   |  14 ++-
 .../loongarch/func-call-extreme-2.c   |  29 +
 .../loongarch/func-call-extreme-3.c   |   2 +-
 .../loongarch/func-call-extreme-4.c   |   2 +-
 8 files changed, 122 insertions(+), 82 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-2.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 481903147b8..e70ce80c7b3 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2737,7 +2737,7 @@ loongarch_add_offset (rtx temp, rtx reg, HOST_WIDE_INT offset)
   return plus_constant (Pmode, reg, offset);
 }
 
-/* The __tls_get_attr symbol.  */
+/* The __tls_get_addr symbol.  */
 static GTY (()) rtx loongarch_tls_symbol;
 
 /* Load an entry for a TLS access.  */
@@ -2777,20 +2777,22 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   if (loongarch_explicit_relocs_p (type))
 {
-  /* Split tls symbol to high and low.  */
-  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
-  high = loongarch_force_temporary (tmp, high);
-
   if (TARGET_CMODEL_EXTREME)
 	{
-	  rtx tmp1 = gen_reg_rtx (Pmode);
-	  emit_insn (gen_tls_low (Pmode, tmp1, gen_rtx_REG (Pmode, 0), loc));
-	  emit_insn (gen_lui_h_lo20 (tmp1, tmp1, loc));
-	  emit_insn (gen_lui_h_hi12 (tmp1, tmp1, loc));
-	  emit_move_insn (a0, gen_rtx_PLUS (Pmode, high, tmp1));
+	  rtx part1 = gen_reg_rtx (Pmode);
+	  rtx part2 = gen_reg_rtx (Pmode);
+
+	  emit_insn (gen_la_pcrel64_two_parts (part1, part2, loc));
+	  emit_move_insn (a0, gen_rtx_PLUS (Pmode, part1, part2));
 	}
   else
-	emit_insn (gen_tls_low (Pmode, a0, high, loc));
+	{
+	  /* Split tls symbol to high and low.  */
+	  rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
+
+	  high = loongarch_force_temporary (tmp, high);
+	  emit_insn (gen_tls_low (Pmode, a0, high, loc));
+	}
 }
   else
 emit_insn (loongarch_load_tls (a0, loc, type));
@@ -2870,20 +2872,30 @@ loongarch_call_tls_get_addr (rtx sym, enum

Re: [PATCH v2 3/5] C: Implement musttail attribute for returns

2024-01-26 Thread Joseph Myers
On Thu, 25 Jan 2024, Andi Kleen wrote:

> On Thu, Jan 25, 2024 at 08:08:23PM +, Joseph Myers wrote:
> > On Wed, 24 Jan 2024, Andi Kleen wrote:
> > 
> > > Implement a C23 clang compatible musttail attribute similar to the earlier
> > > C++ implementation in the C parser.
> > 
> > I'd expect diagnostics, and associated tests of those diagnostics, for:
> > 
> > * musttail attribute used with any arguments, even empty 
> > [[gnu::musttail()]], much like e.g. [[fallthrough()]] or 
> > [[maybe_unused()]] gets diagnosed.
> 
> These happen naturally because the attribute doesn't get removed when
> not in front of return, and it gets warned about like any other unknown 
> attribute:
> 
> tattr.c:5:9: warning: ‘musttail’ attribute ignored [-Wattributes]
> 5 | [[gnu::musttail]] i++;
>   | ^
> 
> I don't have tests for that but since it's not new behavior I suppose
> that's sufficient.

Each attribute should have tests that invalid uses are appropriately 
diagnosed.  See gcc.dg/c23-attr-fallthrough-2.c for examples of such tests 
in the case of the [[fallthrough]] attribute.  Some invalid uses may be 
diagnosed by existing code that's generic across attributes, others 
require specific code for the individual attribute.

The default parsing of an attribute without an entry in the table of 
attribute handlers is that arbitrary balanced token sequences are parsed 
and discarded as arguments.  To diagnose such arguments (in contexts when 
the attribute is otherwise valid), an entry in the table of attribute 
handlers is appropriate.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

> +(define_insn "@load_tls"
>    [(set (match_operand:P 0 "register_operand" "=r")
>   (unspec:P
>       [(match_operand:P 1 "symbolic_operand" "")]
> -     UNSPEC_TLS_GD))]
> +     UNSPEC_TLS))]

/* snip */

> +{
> +  enum loongarch_symbol_type symbol_type;
> +  gcc_assert (loongarch_symbolic_constant_p (operands[1],
> &symbol_type));

/* snip */

> +  switch (symbol_type)
> +    {
> +    case SYMBOL_TLS_LE:
> +  return "la.tls.le\t%0,%1";
> +    case SYMBOL_TLS_IE:
> +  return "la.tls.ie\t%0,%1";
> +    case SYMBOL_TLSLDM:
> +  return "la.tls.ld\t%0,%1";
> +    case SYMBOL_TLSGD:
> +  return "la.tls.gd\t%0,%1";

/* snip */

> +    default:
> +  gcc_unreachable ();
> +    }
> +}
> +  [(set_attr "mode" "")
> +   (set_attr "length" "2")])

Should be 8, it's in bytes.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v4 2/4] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> +;; Use two registers to get the global symbol address from the got table.
> +;; la.global rd, rt, sym
> +
> +(define_insn_and_split "movdi_symbolic_off64"
> + [(set (match_operand:DI 0 "register_operand" "=r,r")
> +   (match_operand:DI 1 "symbolic_off64_or_reg_operand" "Yd,r"))
> +  (unspec:DI [(const_int 0)]
> +    UNSPEC_LOAD_SYMBOL_OFFSET64)
> +  (clobber (match_operand:DI 2 "register_operand" "=&r,r"))]
> + "TARGET_64BIT && TARGET_CMODEL_EXTREME"
> +{
> +  if (which_alternative == 1)
> +    return "#";
> +
> +  enum loongarch_symbol_type symbol_type;
> +  gcc_assert (loongarch_symbolic_constant_p (operands[1], &symbol_type));
> +
> +  switch (symbol_type)
> +    {
> +    case SYMBOL_PCREL64:
> +  return "la.local\t%0,%2,%1";
> +    case SYMBOL_GOT_DISP:
> +  return "la.global\t%0,%2,%1";
> +    case SYMBOL_TLS_IE:
> +  return "la.tls.ie\t%0,%2,%1";
> +    case SYMBOL_TLSGD:
> +  return "la.tls.gd\t%0,%2,%1";
> +    case SYMBOL_TLSLDM:
> +  return "la.tls.ld\t%0,%2,%1";
> +
> +    default:
> +  gcc_unreachable ();
> +  }
> +}
> + "&& REG_P (operands[1]) && find_reg_note (insn, REG_UNUSED, operands[2]) != 
> 0"
> + [(set (match_dup 0) (match_dup 1))]
> + ""
> + [(set_attr "mode" "DI")
> +  (set_attr "length" "5")])

Should be 20, in bytes.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Richard Biener
On Wed, 24 Jan 2024, Andrew Stubbs wrote:

> This is enough to get gfx1100 working for most purposes, on top of the
> patch that Tobias committed a week or so ago; there are still some test
> failures to investigate, and probably some tuning to do.
> 
> It might also get gfx1030 working too. @Richi, could you test it,
> please?

I can report partial success here.  I do see quite some FAILs because of

/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:
 
In function 'accum_._omp_fn.1':^M
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38:
 
error: unrecognizable insn:^M
(insn 108 107 109 6 (set (reg:V8SF 849)^M
(unspec:V8SF [^M
(reg:V8SF 844 [ vect__43.12_106 ]) repeated x2^M
(const_int 1 [0x1])^M
] UNSPEC_PLUS_DPP_SHR)) 
"/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29
 
discrim 1 -1^M
 (nil))^M
during RTL pass: vregs^M
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38:
 
internal compiler error: in extract_insn, at recog.cc:2812^M

there are also quite a number of execution FAILs like

icv-5.exe: 
/space/rguenther/src/gcc-autopar_devel/libgomp/plugin/plugin-gcn.c:2462: 
isa_matches_agent: Assertion `agent_isa_s' failed.
FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test

(the assert in question looks bad - yeah, somehow we got past
device initialization - not sure how - but end up here)

Maybe HSA behaves odd here - I didn't constrain the device it should
choose but it works (most of the time).  GCN_DEBUG prints me all the
HSA agents available but I don't see any debug on which agent
is actually initialized during libgomp device init (at least nothing
I can easily match up).  Maybe sth to improve.

I'll followup with a test summary once the (serial) run of libgomp
testing finished.  At least there are quite some number of
actual kernel executions and PASSing testcases.

Richard.

> I can't test the other multilibs right now. @PA, can you test it please?
> 
> I can self-approve the patch, but I'll hold off the commit until the
> test results come back.
> 
> Andrew
> 
> gcc/ChangeLog:
> 
>   * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
>   * config/gcn/gcn-valu.md (all_convert): New iterator.
>   (2): New
>   define_expand, and rename the old one to ...
>   (*_sdwa): ... this.
>   (extend2): Likewise, to ...
>   (extend_sdwa): .. this.
>   (*_shift): New.
>   * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
>   (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
>   * config/gcn/gcn.md (mulhisi3): Disable on RDNA3.
>   (mulqihi3_scalar): Likewise.
> 
> libgcc/ChangeLog:
> 
>   * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
> 
> libgomp/ChangeLog:
> 
>   * config/gcn/time.c (RTC_TICKS): Configure RDNA3.
>   (omp_get_wtime): Add RDNA3-compatible variant.
>   * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
> 
> Signed-off-by:  Andrew Stubbs 
> ---
>  gcc/config/gcn/gcn-opts.h |  2 +-
>  gcc/config/gcn/gcn-valu.md| 41 ---
>  gcc/config/gcn/gcn.cc | 31 ---
>  gcc/config/gcn/gcn.md |  4 +--
>  libgcc/config/gcn/amdgcn_veclib.h |  2 +-
>  libgomp/config/gcn/time.c | 10 
>  libgomp/plugin/plugin-gcn.c   |  6 +++--
>  7 files changed, 77 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
> index 79fbda3ab25..6be2c9204fa 100644
> --- a/gcc/config/gcn/gcn-opts.h
> +++ b/gcc/config/gcn/gcn-opts.h
> @@ -62,7 +62,7 @@ extern enum gcn_isa {
>  
>  
>  #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
> -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
> +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3)
>  
>  #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
>  
> diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
> index 3d5b6271ee6..cd027f8b369 100644
> --- a/gcc/config/gcn/gcn-valu.md
> +++ b/gcc/config/gcn/gcn-valu.md
> @@ -3555,30 +3555,63 @@
>  ;; }}}
>  ;; {{{ Int/int conversions
>  
> +(define_code_iterator all_convert [truncate zero_extend sign_extend])
>  (define_code_iterator zero_convert [truncate zero_extend])
>  (define_code_attr convop [
>   (sign_extend "extend")
>   (zero_extend "zero_extend")
>   (truncate "trunc")])
>  
> -(define_insn "2"
> +(define_expand "2"
> +  [(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
> +(all_convert:V_INT_1REG
> +   (match_operand:V_INT_1REG_ALT 1 "gcn_alu_operand" " v")))]
> +  "")
> +
> +(define_insn "*_sdwa"
>[(set (match_operand:V_INT_1REG 0 "register_operand"  "=v")
> 

Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread chenglulu



在 2024/1/26 下午4:52, Xi Ruoyao 写道:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:


+(define_insn "@load_tls"
    [(set (match_operand:P 0 "register_operand" "=r")
    (unspec:P
        [(match_operand:P 1 "symbolic_operand" "")]
-       UNSPEC_TLS_GD))]
+       UNSPEC_TLS))]

/* snip */


+{
+  enum loongarch_symbol_type symbol_type;
+  gcc_assert (loongarch_symbolic_constant_p (operands[1],
&symbol_type));

/* snip */


+  switch (symbol_type)
+    {
+    case SYMBOL_TLS_LE:
+  return "la.tls.le\t%0,%1";
+    case SYMBOL_TLS_IE:
+  return "la.tls.ie\t%0,%1";
+    case SYMBOL_TLSLDM:
+  return "la.tls.ld\t%0,%1";
+    case SYMBOL_TLSGD:
+  return "la.tls.gd\t%0,%1";

/* snip */


+    default:
+  gcc_unreachable ();
+    }
+}
+  [(set_attr "mode" "")
+   (set_attr "length" "2")])

Should be 8, it's in bytes.


Um, sorry, I meant to use insn_count.



Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu



在 2024/1/26 下午4:49, Xi Ruoyao 写道:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
   1. Add macro support for TLS symbols
   2. Added support for loading __get_tls_addr symbol address using call36.
   3. Merge template got_load_tls_{ld/gd/le/ie}.
   4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.



Re: [PATCH v2 3/5] C: Implement musttail attribute for returns

2024-01-26 Thread Andi Kleen
> > I don't have tests for that but since it's not new behavior I suppose
> > that's sufficient.
> 
> Each attribute should have tests that invalid uses are appropriately 
> diagnosed.  See gcc.dg/c23-attr-fallthrough-2.c for examples of such tests 
> in the case of the [[fallthrough]] attribute.  Some invalid uses may be 
> diagnosed by existing code that's generic across attributes, others 
> require specific code for the individual attribute.

Okay I can add a test for the other statement and declaration cases like
below.

Any other change you need for approval?


> 
> The default parsing of an attribute without an entry in the table of 
> attribute handlers is that arbitrary balanced token sequences are parsed 
> and discarded as arguments.

And it triggers a warning too (see below)

> To diagnose such arguments (in contexts when 
> the attribute is otherwise valid), an entry in the table of attribute 
> handlers is appropriate.

The only valid usage is [[musttail]] return and there is already the default
warning in the other cases. So I don't think an entry in the table is needed.

BTW I noticed that [[musttail]] ; (empty statement with attribute) gives an 
error, which
is probably a (unrelated) bug, afaik that should be legal for C23.

-Andi


t.c:

[[musttail]] int j;
__attribute__((musttail)) int k;

void foo(void)
{
[[musttail]] j++;
[[musttail]] if (k > 0)
[[musttail]] k++;
}


t.c:2:1: warning: ‘musttail’ attribute ignored [-Wattributes]
2 | [[musttail]] int j;
  | ^
t.c:3:1: warning: ‘musttail’ attribute directive ignored [-Wattributes]
3 | __attribute__((musttail)) int k;
  | ^
t.c: In function ‘foo’:
t.c:7:9: warning: ‘musttail’ attribute ignored [-Wattributes]
7 | [[musttail]] j++;
  | ^
t.c:8:9: warning: ‘musttail’ attribute ignored [-Wattributes]
8 | [[musttail]] if (k > 0)
  | ^
t.c:9:17: warning: ‘musttail’ attribute ignored [-Wattributes]
9 | [[musttail]] k++;



Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread chenglulu



在 2024/1/26 下午4:59, chenglulu 写道:


在 2024/1/26 下午4:52, Xi Ruoyao 写道:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:


+(define_insn "@load_tls"
    [(set (match_operand:P 0 "register_operand" "=r")
  (unspec:P
      [(match_operand:P 1 "symbolic_operand" "")]
-        UNSPEC_TLS_GD))]
+        UNSPEC_TLS))]

/* snip */


+{
+  enum loongarch_symbol_type symbol_type;
+  gcc_assert (loongarch_symbolic_constant_p (operands[1],
&symbol_type));

/* snip */


+  switch (symbol_type)
+    {
+    case SYMBOL_TLS_LE:
+  return "la.tls.le\t%0,%1";
+    case SYMBOL_TLS_IE:
+  return "la.tls.ie\t%0,%1";
+    case SYMBOL_TLSLDM:
+  return "la.tls.ld\t%0,%1";
+    case SYMBOL_TLSGD:
+  return "la.tls.gd\t%0,%1";

/* snip */


+    default:
+  gcc_unreachable ();
+    }
+}
+  [(set_attr "mode" "")
+   (set_attr "length" "2")])
When the symbol type is TLS LE and -mcmodel=extreme, 4 instructions are 
generated here, and I will also modify them here.



Should be 8, it's in bytes.


Um, sorry, I meant to use insn_count.




Re: [PATCH v2 3/5] C: Implement musttail attribute for returns

2024-01-26 Thread Joseph Myers
On Fri, 26 Jan 2024, Andi Kleen wrote:

> > > I don't have tests for that but since it's not new behavior I suppose
> > > that's sufficient.
> > 
> > Each attribute should have tests that invalid uses are appropriately 
> > diagnosed.  See gcc.dg/c23-attr-fallthrough-2.c for examples of such tests 
> > in the case of the [[fallthrough]] attribute.  Some invalid uses may be 
> > diagnosed by existing code that's generic across attributes, others 
> > require specific code for the individual attribute.
> 
> Okay I can add a test for the other statement and declaration cases like
> below.
> 
> Any other change you need for approval?

I use testcases as a key part of the review of a patch, to see if the 
behavior is as I'd expect, so will need to see the updated patch series.

As we're in regression-fixing mode for GCC 14, a new feature like this 
will need to wait for consideration until after GCC 14 branches.

> > The default parsing of an attribute without an entry in the table of 
> > attribute handlers is that arbitrary balanced token sequences are parsed 
> > and discarded as arguments.
> 
> And it triggers a warning too (see below)

For attribute arguments, the key test is [[gnu::musttail()]] on a return 
statement where the attribute would be valid were it not for the attribute 
arguments.

> BTW I noticed that [[musttail]] ; (empty statement with attribute) gives an 
> error, which
> is probably a (unrelated) bug, afaik that should be legal for C23.

That's defined in the standard as an attribute declaration, not an 
attribute on a statement (empty statements can't have attributes).  The 
only currently supported attribute valid in an attribute declaration is 
[[fallthrough]].

When you give an attribute in C23 syntax without a namespace (so 
[[musttail]] as opposed to [[gnu::musttail]]), if it's not a known 
standard attribute then it fails the constraint in 6.7.12.1p2, "The 
identifier in a standard attribute shall be one of: [list]".

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Richard Biener
On Fri, 26 Jan 2024, Richard Biener wrote:

> On Wed, 24 Jan 2024, Andrew Stubbs wrote:
> 
> > This is enough to get gfx1100 working for most purposes, on top of the
> > patch that Tobias committed a week or so ago; there are still some test
> > failures to investigate, and probably some tuning to do.
> > 
> > It might also get gfx1030 working too. @Richi, could you test it,
> > please?
> 
> I can report partial success here.  I do see quite some FAILs because of
> 
> /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:
>  
> In function 'accum_._omp_fn.1':^M
> /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38:
>  
> error: unrecognizable insn:^M
> (insn 108 107 109 6 (set (reg:V8SF 849)^M
> (unspec:V8SF [^M
> (reg:V8SF 844 [ vect__43.12_106 ]) repeated x2^M
> (const_int 1 [0x1])^M
> ] UNSPEC_PLUS_DPP_SHR)) 
> "/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29
>  
> discrim 1 -1^M
>  (nil))^M
> during RTL pass: vregs^M
> /space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38:
>  
> internal compiler error: in extract_insn, at recog.cc:2812^M
> 
> there are also quite a number of execution FAILs like
> 
> icv-5.exe: 
> /space/rguenther/src/gcc-autopar_devel/libgomp/plugin/plugin-gcn.c:2462: 
> isa_matches_agent: Assertion `agent_isa_s' failed.
> FAIL: libgomp.c/../libgomp.c-c++-common/icv-5.c execution test
> 
> (the assert in question looks bad - yeah, somehow we got past
> device initialization - not sure how - but end up here)
> 
> Maybe HSA behaves odd here - I didn't constrain the device it should
> choose but it works (most of the time).  GCN_DEBUG prints me all the
> HSA agents available but I don't see any debug on which agent
> is actually initialized during libgomp device init (at least nothing
> I can easily match up).  Maybe sth to improve.
> 
> I'll followup with a test summary once the (serial) run of libgomp
> testing finished.  At least there are quite some number of
> actual kernel executions and PASSing testcases.

=== libgomp Summary ===

# of expected passes29126
# of unexpected failures697
# of unexpected successes   1
# of expected failures  703
# of unresolved testcases   318
# of unsupported tests  766

full summary attached (compressed).  Even compressed libgomp.log is
too big to send.

Richard.

> 
> Richard.
> 
> > I can't test the other multilibs right now. @PA, can you test it please?
> > 
> > I can self-approve the patch, but I'll hold off the commit until the
> > test results come back.
> > 
> > Andrew
> > 
> > gcc/ChangeLog:
> > 
> > * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3.
> > * config/gcn/gcn-valu.md (all_convert): New iterator.
> > (2): New
> > define_expand, and rename the old one to ...
> > (*_sdwa): ... this.
> > (extend2): Likewise, to ...
> > (extend_sdwa): .. this.
> > (*_shift): New.
> > * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly.
> > (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100.
> > * config/gcn/gcn.md (mulhisi3): Disable on RDNA3.
> > (mulqihi3_scalar): Likewise.
> > 
> > libgcc/ChangeLog:
> > 
> > * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3.
> > 
> > libgomp/ChangeLog:
> > 
> > * config/gcn/time.c (RTC_TICKS): Configure RDNA3.
> > (omp_get_wtime): Add RDNA3-compatible variant.
> > * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100.
> > 
> > Signed-off-by:  Andrew Stubbs 
> > ---
> >  gcc/config/gcn/gcn-opts.h |  2 +-
> >  gcc/config/gcn/gcn-valu.md| 41 ---
> >  gcc/config/gcn/gcn.cc | 31 ---
> >  gcc/config/gcn/gcn.md |  4 +--
> >  libgcc/config/gcn/amdgcn_veclib.h |  2 +-
> >  libgomp/config/gcn/time.c | 10 
> >  libgomp/plugin/plugin-gcn.c   |  6 +++--
> >  7 files changed, 77 insertions(+), 19 deletions(-)
> > 
> > diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
> > index 79fbda3ab25..6be2c9204fa 100644
> > --- a/gcc/config/gcn/gcn-opts.h
> > +++ b/gcc/config/gcn/gcn-opts.h
> > @@ -62,7 +62,7 @@ extern enum gcn_isa {
> >  
> >  
> >  #define TARGET_M0_LDS_LIMIT (TARGET_GCN3)
> > -#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS)
> > +#define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS || TARGET_RDNA3)
> >  
> >  #define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF)
> >  
> > diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
> > index 3d5b6271ee6..cd027f8b369 100644
> > --- a/gcc/config/gcn/gcn-valu.md
> > +++ b/gcc/config/gcn/gcn-valu.md
> > @@ -3555,30 +3555,63 @@
> >  ;; }}}
> >  ;; {{{ Int/int conver

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs

On 25/01/2024 23:03, Tobias Burnus wrote:

When targeting AMD GPUs, the LLVM assembler (and linker) are used.

Two days ago LLVM changed the default for theAMDHSA code object version (COV) from 4 to 5. In principle, we do not 
care which COV is used as long as it works; unfortunately, 
"mkoffload.cc" also generates an object file directly, bypassing the AMD 
GPU compiler as it copies debugging data to that file. That object file 
must have the same COV version (ELF ABI version) as compiler + llvm-mc 
assembler generated files. In order to ensure those are the same, this 
patch forces the use of COV 4 instead of using the default. Once GCC 
requires LLVM >= 14 instead of LLVM >= 13.0.1 we could change it. 
(Assuming that COV 5 is sufficiently stable in LLVM 14.) - But for now 
COV 4 will do.

If you wonder how this LLVM issue shows up, simply compile any OpenMP
or OpenACC program with AMD GPU offloading and enable debugging ("-g"),
e.g.
   gcc -fopenmp -g test.f90 -foffload=amdgcn-amdhsa 
-foffload-options=-march=gfx908

With LLVM main (to become LLVM 18), you will then get the error:

   ld: error: incompatible ABI version: /tmp/ccAKx5cz.mkoffload.dbg.o

OK for mainline?


Looks good to me.

The alternative would be to copy the elf flags from another object file; 
that probably has it's own pitfalls.


OK.

Andrew


Re: [PATCH] aarch64: Fix undefinedness while testing the J constraint [PR100204]

2024-01-26 Thread Alex Coplan
On 25/01/2024 11:57, Andrew Pinski wrote:
> The J constraint can invoke undefined behavior due to it taking the
> negative of the ival if ival was HWI_MIN. The fix is simple as casting
> to `unsigned HOST_WIDE_INT` before doing the negative of it. This
> does that.

Thanks for doing this.

> 
> Committed as obvious after build/test for aarch64-linux-gnu.
> 
> gcc/ChangeLog:
> 
>   PR target/100204
>   * config/aarch64/constraints.md (J): Cast to `unsigned HOST_WIDE_INT`
>   before taking the negative of it.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/constraints.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/constraints.md 
> b/gcc/config/aarch64/constraints.md
> index 8566befd727..a2569cea510 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -118,7 +118,7 @@ (define_constraint "Uat"
>  (define_constraint "J"
>   "A constant that can be used with a SUB operation (once negated)."
>   (and (match_code "const_int")
> -  (match_test "aarch64_uimm12_shift (-ival)")))
> +  (match_test "aarch64_uimm12_shift (- (unsigned HOST_WIDE_INT) ival)")))

Sorry for the nitpick, but: I don't think we want a space after the unary -
here (at least according to https://gcc.gnu.org/codingconventions.html).

Alex

>  
>  ;; We can't use the mode of a CONST_INT to determine the context in
>  ;; which it is being used, so we must have a separate constraint for
> -- 
> 2.39.3
> 


Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 07:29, Richard Biener wrote:

On Fri, Jan 26, 2024 at 12:04 AM Tobias Burnus  wrote:


When targeting AMD GPUs, the LLVM assembler (and linker) are used.

Two days ago LLVM changed the default for the AMDHSA code object
version (COV) from 4 to 5.

In principle, we do not care which COV is used as long as it works;
unfortunately, "mkoffload.cc" also generates an object file directly,
bypassing the AMD GPU compiler as it copies debugging data to that
file. That object file must have the same COV version (ELF ABI version)
as compiler + llvm-mc assembler generated files.

In order to ensure those are the same, this patch forces the use of
COV 4 instead of using the default. Once GCC requires LLVM >= 14
instead of LLVM >= 13.0.1 we could change it. (Assuming that COV 5
is sufficiently stable in LLVM 14.) - But for now COV 4 will do.

If you wonder how this LLVM issue shows up, simply compile any OpenMP
or OpenACC program with AMD GPU offloading and enable debugging ("-g"),
e.g.
   gcc -fopenmp -g test.f90 -foffload=amdgcn-amdhsa 
-foffload-options=-march=gfx908

With LLVM main (to become LLVM 18), you will then get the error:

   ld: error: incompatible ABI version: /tmp/ccAKx5cz.mkoffload.dbg.o

OK for mainline?


If you link against prebuilt objects with COV 5 it seems there's no way to
override the COV version GCC uses?  That is, do we want to add
a -mcode-object-version=... option to allow the user to override this
(and ABI_VERSION_SPEC honoring that, if specified and of course
mkoffload following suit)?

Otherwise looks OK in the meantime.


We don't have a stable ABI, so trying to link against foreign binaries 
is already a problem. Most recently, the SIMD clone implementation 
required a change to the procedure calling ABI, the reverse-offload 
changes reimplemented the stack setup, and the low-latency memory 
patches changed the way we use local memories and needed more info 
passed into the device runtime. I expect more of this in future.


Compatibility across GCC versions doesn't really exist, and 
compatibility with LLVM-binaries is a non-starter.


Andrew


Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 09:45, Richard Biener wrote:

On Fri, 26 Jan 2024, Richard Biener wrote:

 === libgomp Summary ===

# of expected passes29126
# of unexpected failures697
# of unexpected successes   1
# of expected failures  703
# of unresolved testcases   318
# of unsupported tests  766

full summary attached (compressed).  Even compressed libgomp.log is
too big to send.

Richard.


I think this is good enough to start with. PA reported clean results for 
everything except gfx900 (looks like an unrelated issue).


I'll go ahead and commit the patch.

Hopefully Tobias's patch has already trimmed all the "-g" failures from 
that list.


Andrew


[PATCH] tree-optimization/113602 - datarefs of non-addressables

2024-01-26 Thread Richard Biener
We can end up creating ADDR_EXPRs of non-addressable entities during
for example vectorization.  The following plugs this in data-ref
analysis when that would create such invalid ADDR_EXPR as part of
analyzing the ref structure.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/113602
* tree-vect-data-ref.cc (dr_analyze_innermost): Fail when
the base object isn't addressable.

* gcc.dg/pr113602.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr113602.c | 10 ++
 gcc/tree-data-ref.cc|  7 ++-
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr113602.c

diff --git a/gcc/testsuite/gcc.dg/pr113602.c b/gcc/testsuite/gcc.dg/pr113602.c
new file mode 100644
index 000..94bfbc91b5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr113602.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target bitint575 } */
+/* { dg-options "-O2 -fno-tree-loop-optimize" } */
+
+_BitInt(503)
+f(void)
+{
+  register _BitInt(503) r asm(""); /* { dg-error "invalid" } */
+  return r;
+}
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index ae55bf6aa48..f37734b5340 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -1182,7 +1182,12 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree 
ref,
   base = TREE_OPERAND (base, 0);
 }
   else
-base = build_fold_addr_expr (base);
+{
+  if (may_be_nonaddressable_p (base))
+   return opt_result::failure_at (stmt,
+  "failed: base not addressable.\n");
+  base = build_fold_addr_expr (base);
+}
 
   if (in_loop)
 {
-- 
2.35.3


Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Richard Biener
On Fri, 26 Jan 2024, Andrew Stubbs wrote:

> On 26/01/2024 09:45, Richard Biener wrote:
> > On Fri, 26 Jan 2024, Richard Biener wrote:
> > 
> >  === libgomp Summary ===
> > 
> > # of expected passes29126
> > # of unexpected failures697
> > # of unexpected successes   1
> > # of expected failures  703
> > # of unresolved testcases   318
> > # of unsupported tests  766
> > 
> > full summary attached (compressed).  Even compressed libgomp.log is
> > too big to send.
> > 
> > Richard.
> 
> I think this is good enough to start with. PA reported clean results for
> everything except gfx900 (looks like an unrelated issue).
> 
> I'll go ahead and commit the patch.
> 
> Hopefully Tobias's patch has already trimmed all the "-g" failures from that
> list.

Should I open a bug for the ICE?  That's responsible for quite a number
of failures as well.

Richard.


Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 10:22, Richard Biener wrote:

On Fri, 26 Jan 2024, Andrew Stubbs wrote:


On 26/01/2024 09:45, Richard Biener wrote:

On Fri, 26 Jan 2024, Richard Biener wrote:

  === libgomp Summary ===

# of expected passes29126
# of unexpected failures697
# of unexpected successes   1
# of expected failures  703
# of unresolved testcases   318
# of unsupported tests  766

full summary attached (compressed).  Even compressed libgomp.log is
too big to send.

Richard.


I think this is good enough to start with. PA reported clean results for
everything except gfx900 (looks like an unrelated issue).

I'll go ahead and commit the patch.

Hopefully Tobias's patch has already trimmed all the "-g" failures from that
list.


Should I open a bug for the ICE?  That's responsible for quite a number
of failures as well.


The broken vector reduction instruction? It's a known issue (RDNA 
doesn't support those instructions anymore, and somehow disabling the 
insn isn't enough to stop them being generated), but it doesn't have a 
tracking number, so why not?


Thanks

Andrew



[PATCH] Avoid assert for unknown device ISAs in GCN libgomp plugin

2024-01-26 Thread Richard Biener
When the agent reports a device ISA we don't support avoid hitting
an assert, instead report the raw integers as error.  I'm not sure
whether -1 is special as I didn't figure where that field is
initialized.  But I guess since agents are not rejected upfront
when registering them I might be able to force execution to an
unsupported one.

An alternative would maybe to change get_agent_info () to return NULL
for unsupported ISAs?

Tested on x86_64-unknown-linux-gnu -> amdgcn-hsa with gfx1060

OK?

Thanks,
Richard.

libgomp/
* plugin/plugin-gcn.c (isa_matches_agent): Avoid asserting we
only get supported device ISAs.  Report raw numbers when not.
---
 libgomp/plugin/plugin-gcn.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index db28781dedb..d8c3907c108 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -2459,13 +2459,15 @@ isa_matches_agent (struct agent_info *agent, Elf64_Ehdr 
*image)
   char msg[120];
   const char *agent_isa_s = isa_hsa_name (agent->device_isa);
   const char *agent_isa_gcc_s = isa_gcc_name (agent->device_isa);
-  assert (agent_isa_s);
-  assert (agent_isa_gcc_s);
-
-  snprintf (msg, sizeof msg,
-   "GCN code object ISA '%s' does not match GPU ISA '%s'.\n"
-   "Try to recompile with '-foffload-options=-march=%s'.\n",
-   isa_s, agent_isa_s, agent_isa_gcc_s);
+  if (agent_isa_s && agent_isa_gcc_s)
+   snprintf (msg, sizeof msg,
+ "GCN code object ISA '%s' does not match GPU ISA '%s'.\n"
+ "Try to recompile with '-foffload-options=-march=%s'.\n",
+ isa_s, agent_isa_s, agent_isa_gcc_s);
+  else
+   snprintf (msg, sizeof msg,
+ "GCN code object ISA '%s' (%d) does not match GPU ISA %d.\n",
+ isa_s, isa_field, agent->device_isa);
 
   hsa_error (msg, HSA_STATUS_ERROR);
   return false;
-- 
2.35.3


Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Tobias Burnus

Hi all,

Andrew Stubbs wrote:

On 26/01/2024 07:29, Richard Biener wrote:
If you link against prebuilt objects with COV 5 it seems there's no 
way to

override the COV version GCC uses?  That is, do we want to add
a -mcode-object-version=... option to allow the user to override this
(and ABI_VERSION_SPEC honoring that, if specified and of course
mkoffload following suit)?


For completeness, I added such a feature, see attachment. (Actually, 
'=0' could be permitted for mkoffload without "-g" debugging enabled.)


However, the real problem is that one usually also has libraries build 
with the default such as libc, libm, libgomp, ... Thus, specifying 
anything else but GCC's default is likely to break.


Hence and also because of the following, I think it doesn't make sense 
to add:


We don't have a stable ABI, so trying to link against foreign binaries 
is already a problem. Most recently, the SIMD clone implementation 
required a change to the procedure calling ABI, the reverse-offload 
changes reimplemented the stack setup, and the low-latency memory 
patches changed the way we use local memories and needed more info 
passed into the device runtime. I expect more of this in future.



PS: The original patch has been committed as r14-8449-g4b5650acb31072.

Tobias
amdgcn: Add -mcode-object-version= to override the default

For fiji, GCC defaults to Code Object V3 and otherwiese to V4;
the -mcode-object-version= flag permits to override it to the
specified version, which is passed on the the assembler.
Using -mcode-object-version=0, no COV is passed to the assembler,
using its default. - Note that all files including libraries must
be build with the same ABI version and that GCN'S mkoffload must
know the version number for handle debugging symbols.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Update for
	-mcode-object-version=
	* config/gcn/gcn.opt (mcode-object-version=): Add.
	* doc/invoke.texi (gcn): Add -mcode-object-version=.
	* config/gcn/mkoffload.cc (copy_early_debug_info,
	main): Handle mcode-object-version=.

 gcc/config/gcn/gcn-hsa.h| 11 +--
 gcc/config/gcn/gcn.opt  |  4 
 gcc/config/gcn/mkoffload.cc | 18 --
 gcc/doc/invoke.texi |  4 
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index e5b93f7d9e5..79ee4171ce2 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -81,8 +81,15 @@ extern unsigned int gcn_local_sym_hash (const char *name);
same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
Note that Fiji is only suppored with LLVM <= 17 as version 3 i no longer
-   supported in LLVM >= 18.  */
-#define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
+   supported in LLVM >= 18.
+
+   This can be overridden by -mcode-object-version; we permit = 0 to fall
+   back to the assembler default - which has issues with mkoffload, see above.
+   Otherwise, we use the provided value.  */
+
+#define ABI_VERSION_SPEC "mcode-object-version=0:;" \
+			 "mcode-object-version=*:--amdhsa-code-object-version=%*;" \
+			 "march=fiji:--amdhsa-code-object-version=3;" \
 			 "!march=*|march=*:--amdhsa-code-object-version=4"
 
 /* Note that the XNACK and SRAM-ECC settings must match those in mkoffload.cc
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 842fd36d25c..874ff085134 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -54,6 +54,10 @@ mtune=
 Target RejectNegative Negative(mtune=) Joined ToLower Enum(gpu_type) Var(gcn_tune) Init(PROCESSOR_FIJI)
 Specify the name of the target GPU.
 
+mcode-object-version=
+Target RejectNegative Joined UInteger
+Override the used code object version
+
 m32
 Target RejectNegative InverseMask(ABI64)
 Generate code for a 32-bit ABI.
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 0d0e7bac9b2..6cbf3dc5873 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -128,6 +128,7 @@ uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU architecture.
 uint32_t elf_flags = EF_AMDGPU_FEATURE_SRAMECC_ANY_V4;
 
 static int gcn_stack_size = 0;  /* Zero means use default.  */
+static int code_object_version = -1;  /* Negative means default.  */
 
 /* Delete tempfiles.  */
 
@@ -351,7 +352,10 @@ copy_early_debug_info (const char *infile, const char *outfile)
 
   /* Patch the correct elf architecture flag into the file.  */
   ehdr.e_ident[7] = ELFOSABI_AMDGPU_HSA;
-  ehdr.e_ident[8] = (elf_arch == EF_AMDGPU_MACH_AMDGCN_GFX803
+  /* The ABI version; code object V1 had no version, V2 is 0, V3 is 1 etc. */
+  ehdr.e_ident[8] = (code_object_version >= 2
+		 ? code_object_version - 2
+		 : elf_arch == EF_AMDGPU_MACH_AMDGCN_GFX803
 		 ? ELFABIVERSION_AMDGPU_HSA_V3
 		 : ELFABIVERSION_AMDGPU_HSA_V4);
   ehdr.e_type = ET_REL;
@@ -972,

Re: [PATCH] Avoid assert for unknown device ISAs in GCN libgomp plugin

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 10:30, Richard Biener wrote:

When the agent reports a device ISA we don't support avoid hitting
an assert, instead report the raw integers as error.  I'm not sure
whether -1 is special as I didn't figure where that field is
initialized.  But I guess since agents are not rejected upfront
when registering them I might be able to force execution to an
unsupported one.

An alternative would maybe to change get_agent_info () to return NULL
for unsupported ISAs?

Tested on x86_64-unknown-linux-gnu -> amdgcn-hsa with gfx1060

OK?


OK, thanks.

Andrew



Thanks,
Richard.

libgomp/
* plugin/plugin-gcn.c (isa_matches_agent): Avoid asserting we
only get supported device ISAs.  Report raw numbers when not.
---
  libgomp/plugin/plugin-gcn.c | 16 +---
  1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index db28781dedb..d8c3907c108 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -2459,13 +2459,15 @@ isa_matches_agent (struct agent_info *agent, Elf64_Ehdr 
*image)
char msg[120];
const char *agent_isa_s = isa_hsa_name (agent->device_isa);
const char *agent_isa_gcc_s = isa_gcc_name (agent->device_isa);
-  assert (agent_isa_s);
-  assert (agent_isa_gcc_s);
-
-  snprintf (msg, sizeof msg,
-   "GCN code object ISA '%s' does not match GPU ISA '%s'.\n"
-   "Try to recompile with '-foffload-options=-march=%s'.\n",
-   isa_s, agent_isa_s, agent_isa_gcc_s);
+  if (agent_isa_s && agent_isa_gcc_s)
+   snprintf (msg, sizeof msg,
+ "GCN code object ISA '%s' does not match GPU ISA '%s'.\n"
+ "Try to recompile with '-foffload-options=-march=%s'.\n",
+ isa_s, agent_isa_s, agent_isa_gcc_s);
+  else
+   snprintf (msg, sizeof msg,
+ "GCN code object ISA '%s' (%d) does not match GPU ISA %d.\n",
+ isa_s, isa_field, agent->device_isa);
  
hsa_error (msg, HSA_STATUS_ERROR);

return false;




[PATCH] Avoid using an unsupported agent when offloading to GCN

2024-01-26 Thread Richard Biener
The following avoids selecting an unsupported agent early, avoiding
later asserts when we rely on it being supported.

tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060

that's the alternative to the other patch.  I do indeed seem to get
the other (unsupported) agent selected somehow after the other supported
agent finished a kernel run.  Not sure if it's the CPU or the IGPU though.

OK?  Which variant?

libgomp/
* plugin/plugin-gcn.c (get_agent_info): When the agent isn't supported
return NULL.
---
 libgomp/plugin/plugin-gcn.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d8c3907c108..f453f630e06 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1036,6 +1036,8 @@ print_kernel_dispatch (struct kernel_dispatch *dispatch, 
unsigned indent)
 /* }}}  */
 /* {{{ Utility functions  */
 
+static const char* isa_hsa_name (int isa);
+
 /* Cast the thread local storage to gcn_thread.  */
 
 static inline struct gcn_thread *
@@ -1589,6 +1591,11 @@ get_agent_info (int n)
   GOMP_PLUGIN_error ("Attempt to use an uninitialized GCN agent.");
   return NULL;
 }
+  if (!isa_hsa_name (hsa_context.agents[n].device_isa))
+{
+  GOMP_PLUGIN_error ("Attempt to use an unsupported GCN agent.");
+  return NULL;
+}
   return &hsa_context.agents[n];
 }
 
-- 
2.35.3


Re: [PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-26 Thread Richard Sandiford
Victor Do Nascimento  writes:
> @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16)
>  END (libat_test_and_set_16)
>  
>  
> +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations,
> +   that is, map it to LSE128, LRCPC or CORE as appropriate.  */
> +
> +ALIAS (libat_exchange_16, LSE128_LRCPC3, LSE128)
> +ALIAS (libat_fetch_or_16, LSE128_LRCPC3, LSE128)
> +ALIAS (libat_fetch_and_16, LSE128_LRCPC3, LSE128)
> +ALIAS (libat_or_fetch_16, LSE128_LRCPC3, LSE128)
> +ALIAS (libat_and_fetch_16, LSE128_LRCPC3, LSE128)
> +ALIAS (libat_load_16, LSE128_LRCPC3, LRCPC3)
> +ALIAS (libat_store_16, LSE128_LRCPC3, LRCPC3)
> +ALIAS (libat_compare_exchange_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_fetch_add_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_add_fetch_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_fetch_sub_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_sub_fetch_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_fetch_xor_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_xor_fetch_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_fetch_nand_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_nand_fetch_16, LSE128_LRCPC3, LSE2)
> +ALIAS (libat_test_and_set_16, LSE128_LRCPC3, LSE2)
> +
>  /* Alias entry points which are the same in LSE2 and LSE128.  */
>  
>  #if !HAVE_FEAT_LSE128
> @@ -734,6 +803,29 @@ ALIAS (libat_fetch_nand_16, LSE128, LSE2)
>  ALIAS (libat_nand_fetch_16, LSE128, LSE2)
>  ALIAS (libat_test_and_set_16, LSE128, LSE2)
>  
> +
> +/* Alias entry points which are the same in LRCPC3 and LSE2.  */
> +
> +#if !HAVE_FEAT_LRCPC3
> +ALIAS (libat_load_16, LRCPC3, LSE2)
> +ALIAS (libat_store_16, LRCPC3, LSE2)
> +#endif
> +ALIAS (libat_exchange_16, LRCPC3, LSE2)
> +ALIAS (libat_fetch_or_16, LRCPC3, LSE2)
> +ALIAS (libat_fetch_and_16, LRCPC3, LSE2)
> +ALIAS (libat_or_fetch_16, LRCPC3, LSE2)
> +ALIAS (libat_and_fetch_16, LRCPC3, LSE2)
> +ALIAS (libat_compare_exchange_16, LRCPC3, LSE2)
> +ALIAS (libat_fetch_add_16, LRCPC3, LSE2)
> +ALIAS (libat_add_fetch_16, LRCPC3, LSE2)
> +ALIAS (libat_fetch_sub_16, LRCPC3, LSE2)
> +ALIAS (libat_sub_fetch_16, LRCPC3, LSE2)
> +ALIAS (libat_fetch_xor_16, LRCPC3, LSE2)
> +ALIAS (libat_xor_fetch_16, LRCPC3, LSE2)
> +ALIAS (libat_fetch_nand_16, LRCPC3, LSE2)
> +ALIAS (libat_nand_fetch_16, LRCPC3, LSE2)
> +ALIAS (libat_test_and_set_16, LRCPC3, LSE2)
> +
>  /* Alias entry points which are the same in baseline and LSE2.  */
>  
>  ALIAS (libat_exchange_16, LSE2, CORE)

Sorry to be awkward, but I think this is becoming a bit unwieldly.
It wasn't too bad using aliases for LSE128->LSE2 fallbacks since LSE128
could optimise a decent number of routines.  But here we're using two
sets of aliases for every function in order to express "LRCPC3 loads and
stores should be used where possible, independently of the choices for
other routines".

This also complicates the ifuncs, since we need to detect LRCPC3+LSE128
as a distinct combination even though none of the underlying routines
depend on both LRCPC3 and LSE128 together.

I think instead we should make the ifunc mechanism more granular,
so that we can have default/lse2/rcpc3 for loads and stores,
default/lse128 for things that LSE128 can do, etc.  I realise that
would require some rework of the generic framework code, but I think
it's still worth doing.

I won't object if another maintainer is happy with the patch in its
current form though.

Thanks,
Richard

> diff --git a/libatomic/config/linux/aarch64/host-config.h 
> b/libatomic/config/linux/aarch64/host-config.h
> index 4e354124063..d03fcfe4a64 100644
> --- a/libatomic/config/linux/aarch64/host-config.h
> +++ b/libatomic/config/linux/aarch64/host-config.h
> @@ -37,9 +37,12 @@ typedef struct __ifunc_arg_t {
>  
>  #ifdef HWCAP_USCAT
>  # if N == 16
> -#  define IFUNC_COND_1   (has_lse128 (hwcap, features))
> -#  define IFUNC_COND_2   (has_lse2 (hwcap, features))
> -#  define IFUNC_NCOND(N) 2
> +#  define IFUNC_COND_1   (has_lse128 (hwcap, features) \
> +  && has_rcpc3 (hwcap, features))
> +#  define IFUNC_COND_2   (has_lse128 (hwcap, features))
> +#  define IFUNC_COND_3   (has_rcpc3  (hwcap, features))
> +#  define IFUNC_COND_4   (has_lse2   (hwcap, features))
> +#  define IFUNC_NCOND(N) 4
>  # else
>  #  define IFUNC_COND_1   (hwcap & HWCAP_ATOMICS)
>  #  define IFUNC_NCOND(N) 1
> @@ -90,6 +93,9 @@ has_lse2 (unsigned long hwcap, const __ifunc_arg_t 
> *features)
>  #define AT_FEAT_FIELD(isar0) (((isar0) >> 20) & 15)
>  
>  /* Ensure backwards compatibility with glibc <= 2.38.  */
> +#ifndef HWCAP2_LRCPC3
> +#define HWCAP2_LRCPC3(1UL << 46)
> +#endif
>  #ifndef HWCAP2_LSE128
>  #define HWCAP2_LSE128(1UL << 47)
>  #endif
> @@ -116,6 +122,27 @@ has_lse128 (unsigned long hwcap, const __ifunc_arg_t 
> *features)
>return false;
>  }
>  
> +/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].  The
> +   expected value is 0b0011.  Check that.  */
> +
> +static inl

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> 
> 在 2024/1/26 下午4:49, Xi Ruoyao 写道:
> > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > v3 -> v4:
> > >    1. Add macro support for TLS symbols
> > >    2. Added support for loading __get_tls_addr symbol address using 
> > > call36.
> > >    3. Merge template got_load_tls_{ld/gd/le/ie}.
> > >    4. Enable explicit reloc for extreme TLS GD/LD with 
> > > -mexplicit-relocs=auto.
> > I've rebased and attached the patch to fix the bad split in -mexplicit-
> > relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
> > tested it seriously though (only tested the added and modified test
> > cases).
> > 
> OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Avoid using an unsupported agent when offloading to GCN

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 10:40, Richard Biener wrote:

The following avoids selecting an unsupported agent early, avoiding
later asserts when we rely on it being supported.

tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060

that's the alternative to the other patch.  I do indeed seem to get
the other (unsupported) agent selected somehow after the other supported
agent finished a kernel run.  Not sure if it's the CPU or the IGPU though.

OK?  Which variant?


So, looking at it again, the original intent of the assert was to alert 
toolchain developers that they missed adding a new name when porting to 
a new device, but I concur that it's not ideal when the assert 
encounters an unknown device in the wild.


However, if we're trying to do something more useful than merely fixing 
an ugly error message, maybe we should look at removing unsupported 
devices in "suitable_hsa_agent_p" instead? Unsupported GPUs wouldn't be 
assigned a device number at all.


Probably devices that are GPUs but skipped because they are unsupported 
should be mentioned on GOMP_DEBUG (as well as GCN_DEBUG)?


The goal should be that folks with your twin-GPU setup shouldn't have to 
work around it, but I don't really want to remove the message for people 
who only have one device but don't realize it is unsupported.


On the other hand, if a user has two devices that *are* supported, but 
the second one is preferred, they'll have to set OMP_DEFAULT_DEVICE 
explicitly, and is this so different?


As a user, WDYT?

Andrew



libgomp/
* plugin/plugin-gcn.c (get_agent_info): When the agent isn't supported
return NULL.
---
  libgomp/plugin/plugin-gcn.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index d8c3907c108..f453f630e06 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1036,6 +1036,8 @@ print_kernel_dispatch (struct kernel_dispatch *dispatch, 
unsigned indent)
  /* }}}  */
  /* {{{ Utility functions  */
  
+static const char* isa_hsa_name (int isa);

+
  /* Cast the thread local storage to gcn_thread.  */
  
  static inline struct gcn_thread *

@@ -1589,6 +1591,11 @@ get_agent_info (int n)
GOMP_PLUGIN_error ("Attempt to use an uninitialized GCN agent.");
return NULL;
  }
+  if (!isa_hsa_name (hsa_context.agents[n].device_isa))
+{
+  GOMP_PLUGIN_error ("Attempt to use an unsupported GCN agent.");
+  return NULL;
+}
return &hsa_context.agents[n];
  }
  




Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu



在 2024/1/26 下午6:57, Xi Ruoyao 写道:

On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:

在 2024/1/26 下午4:49, Xi Ruoyao 写道:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
    1. Add macro support for TLS symbols
    2. Added support for loading __get_tls_addr symbol address using call36.
    3. Merge template got_load_tls_{ld/gd/le/ie}.
    4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.

I temporarily changed my binutils to turn off this function.;-)







[PATCH] c++/modules: Stream additional fields for DECL_STRUCT_FUNCTION [PR113580]

2024-01-26 Thread Nathaniel Shead
This patch just adds enough of the fields from 'function' to fix the ICE
in the linked PR. I suppose there might be more fields from this type
that should be propagated, but I don't know enough to find out which
they might be yet, since a lot of them seem to be only set after
gimplification.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently the DECL_STRUCT_FUNCTION for a declaration is always
reconstructed from scratch. This causes issues though, as some fields
used by other parts of the compiler (in this case, specifically
'function_{start,end}_locus') are then not correctly initialised. This
patch makes sure that these fields are also read and written.

PR c++/113580

gcc/cp/ChangeLog:

* module.cc (struct post_process_data): Create.
(trees_in::post_decls): Use.
(trees_in::post_process): Return entire vector at once.
Change overload to take post_process_data instead of tree.
(trees_out::write_function_def): Write needed flags from
DECL_STRUCT_FUNCTION.
(trees_in::read_function_def): Read them and pass to
post_process.
(module_state::read_cluster): Write flags into cfun.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr113580_a.C: New test.
* g++.dg/modules/pr113580_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc  | 47 ++-
 gcc/testsuite/g++.dg/modules/pr113580_a.C | 10 +
 gcc/testsuite/g++.dg/modules/pr113580_b.C | 10 +
 3 files changed, 58 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr113580_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr113580_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 6176801b7a7..840c7ef6dab 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -2837,6 +2837,13 @@ typedef hash_map,uintptr_t> >
 duplicate_hash_map;
 
+/* Data needed for post-processing.  */
+struct post_process_data {
+  tree decl;
+  location_t start_locus;
+  location_t end_locus;
+};
+
 /* Tree stream reader.  Note that reading a stream doesn't mark the
read trees with TREE_VISITED.  Thus it's quite safe to have
multiple concurrent readers.  Which is good, because lazy
@@ -2848,7 +2855,7 @@ private:
   module_state *state; /* Module being imported.  */
   vec back_refs; /* Back references.  */
   duplicate_hash_map *duplicates;  /* Map from existings to duplicate.  */
-  vec post_decls;/* Decls to post process.  */
+  vec post_decls;   /* Decls to post process.  */
   unsigned unused; /* Inhibit any interior TREE_USED
   marking.  */
 
@@ -2945,16 +2952,16 @@ public:
   tree odr_duplicate (tree decl, bool has_defn);
 
 public:
-  /* Return the next decl to postprocess, or NULL.  */
-  tree post_process ()
+  /* Return the decls to postprocess.  */
+  const vec& post_process ()
   {
-return post_decls.length () ? post_decls.pop () : NULL_TREE;
+return post_decls;
   }
 private:
-  /* Register DECL for postprocessing.  */
-  void post_process (tree decl)
+  /* Register DATA for postprocessing.  */
+  void post_process (post_process_data data)
   {
-post_decls.safe_push (decl);
+post_decls.safe_push (data);
   }
 
 private:
@@ -11667,15 +11674,25 @@ trees_out::write_function_def (tree decl)
   tree_node (cexpr->body);
 }
 
+  function* f = DECL_STRUCT_FUNCTION (decl);
+
   if (streaming_p ())
 {
   unsigned flags = 0;
 
+  if (f)
+   flags |= 2;
   if (DECL_NOT_REALLY_EXTERN (decl))
flags |= 1;
 
   u (flags);
 }
+
+  if (state && f)
+{
+  state->write_location (*this, f->function_start_locus);
+  state->write_location (*this, f->function_end_locus);
+}
 }
 
 void
@@ -11692,6 +11709,8 @@ trees_in::read_function_def (tree decl, tree 
maybe_template)
   tree saved = tree_node ();
   tree context = tree_node ();
   constexpr_fundef cexpr;
+  post_process_data pdata {};
+  pdata.decl = maybe_template;
 
   tree maybe_dup = odr_duplicate (maybe_template, DECL_SAVED_TREE (decl));
   bool installing = maybe_dup && !DECL_SAVED_TREE (decl);
@@ -11708,6 +11727,12 @@ trees_in::read_function_def (tree decl, tree 
maybe_template)
 
   unsigned flags = u ();
 
+  if (flags & 2)
+{
+  pdata.start_locus = state->read_location (*this);
+  pdata.end_locus = state->read_location (*this);
+}
+
   if (get_overrun ())
 return NULL_TREE;
 
@@ -11722,7 +11747,7 @@ trees_in::read_function_def (tree decl, tree 
maybe_template)
SET_DECL_FRIEND_CONTEXT (decl, context);
   if (cexpr.decl)
register_constexpr_fundef (cexpr);
-  post_process (maybe_template);
+  post_process (pdata);
 }
   else if (maybe_dup)
 {
@@ -15100,8 +15125,10 @@ module_state::read_cluster (unsigned snum)
  push_function_context does too much work.   */
   tree old_cfd = current_function_decl;
 

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 10:39, Tobias Burnus wrote:

Hi all,

Andrew Stubbs wrote:

On 26/01/2024 07:29, Richard Biener wrote:
If you link against prebuilt objects with COV 5 it seems there's no 
way to

override the COV version GCC uses?  That is, do we want to add
a -mcode-object-version=... option to allow the user to override this
(and ABI_VERSION_SPEC honoring that, if specified and of course
mkoffload following suit)?


For completeness, I added such a feature, see attachment. (Actually, 
'=0' could be permitted for mkoffload without "-g" debugging enabled.)


However, the real problem is that one usually also has libraries build 
with the default such as libc, libm, libgomp, ... Thus, specifying 
anything else but GCC's default is likely to break.


Hence and also because of the following, I think it doesn't make sense 
to add:


We don't have a stable ABI, so trying to link against foreign binaries 
is already a problem. Most recently, the SIMD clone implementation 
required a change to the procedure calling ABI, the reverse-offload 
changes reimplemented the stack setup, and the low-latency memory 
patches changed the way we use local memories and needed more info 
passed into the device runtime. I expect more of this in future.



PS: The original patch has been committed as r14-8449-g4b5650acb31072.

Tobias


Agreed, there's no point in having a knob that only has one valid 
setting, especially when we want to be able to change that setting 
without breaking third-party scripts that choose to that knob "for 
completeness", or something.


The toolchain can have an opinion about which is the correct COV, and I 
think not relying on the LLVM default choice makes sense also. I think 
COV 5 is only a small update, but there's no reason to imagine that COV6 
will Just Work (COV2 to COV3, and COV3 to COV4 required real effort).


We can move on to COV5 for GCC 15, probably. I'm not aware of any great 
blocker, but it sets a minimum LLVM.


Andrew


Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Tobias Burnus

Andrew Stubbs wrote:
We can move on to COV5 for GCC 15, probably. I'm not aware of any 
great blocker, but it sets a minimum LLVM.


And as our testing hardware showed, it also bumps the minimal ROCm to 
5.2 (as 5.1 fails with COV5).


Otherwise, as mentioned, COV5 was added to LLVM 14, but as we already 
require 13.0.1; thus, we would just have to require a half-a-year newer 
LLVM, which could be indeed fine for GCC 15.


Tobias



[PATCH] Fix architecture support in OMP_OFFLOAD_init_device for gcn

2024-01-26 Thread Richard Biener
The following makes the existing architecture support check work
instead of being optimized away (enum vs. -1).  This avoids
later asserts when we assume such devices are never actually
used.

Tested as previously, now the error is

libgomp: GCN fatal error: Unknown GCN agent architecture
Runtime message: HSA_STATUS_ERROR: A generic error has occurred.

now will figure why we try to initialize that device.

OK?

libgomp/
* plugin/plugin-gcn.c
(EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add.
(isa_code): Return that instead of -1.
(GOMP_OFFLOAD_init_device): Adjust.
---
 libgomp/plugin/plugin-gcn.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index db28781dedb..588358bbbf9 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -384,6 +384,7 @@ struct gcn_image_desc
See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
 
 typedef enum {
+  EF_AMDGPU_MACH_UNSUPPORTED = -1,
   EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
   EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
   EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
@@ -1727,7 +1728,7 @@ isa_code(const char *isa) {
   if (!strncmp (isa, gcn_gfx1100_s, gcn_isa_name_len))
 return EF_AMDGPU_MACH_AMDGCN_GFX1100;
 
-  return -1;
+  return EF_AMDGPU_MACH_UNSUPPORTED;
 }
 
 /* CDNA2 devices have twice as many VGPRs compared to older devices.  */
@@ -3374,7 +3375,7 @@ GOMP_OFFLOAD_init_device (int n)
 return hsa_error ("Error querying the name of the agent", status);
 
   agent->device_isa = isa_code (agent->name);
-  if (agent->device_isa < 0)
+  if (agent->device_isa == EF_AMDGPU_MACH_UNSUPPORTED)
 return hsa_error ("Unknown GCN agent architecture", HSA_STATUS_ERROR);
 
   status = hsa_fns.hsa_agent_get_info_fn (agent->id, 
HSA_AGENT_INFO_VENDOR_NAME,
-- 
2.35.3


Re: [PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-26 Thread Victor Do Nascimento
On 1/26/24 10:53, Richard Sandiford wrote:
> Victor Do Nascimento  writes:
>> @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16)
>>  END (libat_test_and_set_16)
>>  
>>  
>> +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations,
>> +   that is, map it to LSE128, LRCPC or CORE as appropriate.  */
>> +
>> +ALIAS (libat_exchange_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_fetch_or_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_fetch_and_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_or_fetch_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_and_fetch_16, LSE128_LRCPC3, LSE128)
>> +ALIAS (libat_load_16, LSE128_LRCPC3, LRCPC3)
>> +ALIAS (libat_store_16, LSE128_LRCPC3, LRCPC3)
>> +ALIAS (libat_compare_exchange_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_add_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_add_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_sub_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_sub_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_xor_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_xor_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_fetch_nand_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_nand_fetch_16, LSE128_LRCPC3, LSE2)
>> +ALIAS (libat_test_and_set_16, LSE128_LRCPC3, LSE2)
>> +
>>  /* Alias entry points which are the same in LSE2 and LSE128.  */
>>  
>>  #if !HAVE_FEAT_LSE128
>> @@ -734,6 +803,29 @@ ALIAS (libat_fetch_nand_16, LSE128, LSE2)
>>  ALIAS (libat_nand_fetch_16, LSE128, LSE2)
>>  ALIAS (libat_test_and_set_16, LSE128, LSE2)
>>  
>> +
>> +/* Alias entry points which are the same in LRCPC3 and LSE2.  */
>> +
>> +#if !HAVE_FEAT_LRCPC3
>> +ALIAS (libat_load_16, LRCPC3, LSE2)
>> +ALIAS (libat_store_16, LRCPC3, LSE2)
>> +#endif
>> +ALIAS (libat_exchange_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_or_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_and_16, LRCPC3, LSE2)
>> +ALIAS (libat_or_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_and_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_compare_exchange_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_add_16, LRCPC3, LSE2)
>> +ALIAS (libat_add_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_sub_16, LRCPC3, LSE2)
>> +ALIAS (libat_sub_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_xor_16, LRCPC3, LSE2)
>> +ALIAS (libat_xor_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_fetch_nand_16, LRCPC3, LSE2)
>> +ALIAS (libat_nand_fetch_16, LRCPC3, LSE2)
>> +ALIAS (libat_test_and_set_16, LRCPC3, LSE2)
>> +
>>  /* Alias entry points which are the same in baseline and LSE2.  */
>>  
>>  ALIAS (libat_exchange_16, LSE2, CORE)
> 
> Sorry to be awkward, but I think this is becoming a bit unwieldly.
> It wasn't too bad using aliases for LSE128->LSE2 fallbacks since LSE128
> could optimise a decent number of routines.  But here we're using two
> sets of aliases for every function in order to express "LRCPC3 loads and
> stores should be used where possible, independently of the choices for
> other routines".
> 
> This also complicates the ifuncs, since we need to detect LRCPC3+LSE128
> as a distinct combination even though none of the underlying routines
> depend on both LRCPC3 and LSE128 together.
> 
> I think instead we should make the ifunc mechanism more granular,
> so that we can have default/lse2/rcpc3 for loads and stores,
> default/lse128 for things that LSE128 can do, etc.  I realise that
> would require some rework of the generic framework code, but I think
> it's still worth doing.
> 
> I won't object if another maintainer is happy with the patch in its
> current form though.
> 
> Thanks,
> Richard

In all fairness, I have to agree with you.

While I had wanted to be as noninvasive as possible with existing
generic framework code, I do agree that with the addition of RCPC3
following LSE128 we're rapidly approaching a kind of maintainability
"critical mass" regarding combinatorial explosion.

In this patch series I'd already had to expand the number of supported
ifunc alternatives to 4 and with n atomic architectural extensions we'll
end up w/ 2^n possible cases to cater for in our selector alone.

Happy to go back to the drawing board and see how we can add the
flexibility to Libatomic.

As always, thanks for the feedback!

V.


>> diff --git a/libatomic/config/linux/aarch64/host-config.h 
>> b/libatomic/config/linux/aarch64/host-config.h
>> index 4e354124063..d03fcfe4a64 100644
>> --- a/libatomic/config/linux/aarch64/host-config.h
>> +++ b/libatomic/config/linux/aarch64/host-config.h
>> @@ -37,9 +37,12 @@ typedef struct __ifunc_arg_t {
>>  
>>  #ifdef HWCAP_USCAT
>>  # if N == 16
>> -#  define IFUNC_COND_1  (has_lse128 (hwcap, features))
>> -#  define IFUNC_COND_2  (has_lse2 (hwcap, features))
>> -#  define IFUNC_NCOND(N)2
>> +#  define IFUNC_COND_1  (has_lse128 (hwcap, features) \
>> + && has_rcpc3 (hwcap, features))
>> +#  define IFUNC_COND_2  (has_lse128 (hwcap, features))
>> +#  define IFUNC_COND_3  (has_rcpc3  (hwcap, features))
>> +#  define IFUNC_COND_4  (has_lse2   (hwca

Re: [PATCH] Fix architecture support in OMP_OFFLOAD_init_device for gcn

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 11:42, Richard Biener wrote:

The following makes the existing architecture support check work
instead of being optimized away (enum vs. -1).  This avoids
later asserts when we assume such devices are never actually
used.

Tested as previously, now the error is

libgomp: GCN fatal error: Unknown GCN agent architecture
Runtime message: HSA_STATUS_ERROR: A generic error has occurred.

now will figure why we try to initialize that device.

OK?


OK.



libgomp/
* plugin/plugin-gcn.c
(EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add.
(isa_code): Return that instead of -1.
(GOMP_OFFLOAD_init_device): Adjust.
---
  libgomp/plugin/plugin-gcn.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index db28781dedb..588358bbbf9 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -384,6 +384,7 @@ struct gcn_image_desc
 See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
  
  typedef enum {

+  EF_AMDGPU_MACH_UNSUPPORTED = -1,
EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
@@ -1727,7 +1728,7 @@ isa_code(const char *isa) {
if (!strncmp (isa, gcn_gfx1100_s, gcn_isa_name_len))
  return EF_AMDGPU_MACH_AMDGCN_GFX1100;
  
-  return -1;

+  return EF_AMDGPU_MACH_UNSUPPORTED;
  }
  
  /* CDNA2 devices have twice as many VGPRs compared to older devices.  */

@@ -3374,7 +3375,7 @@ GOMP_OFFLOAD_init_device (int n)
  return hsa_error ("Error querying the name of the agent", status);
  
agent->device_isa = isa_code (agent->name);

-  if (agent->device_isa < 0)
+  if (agent->device_isa == EF_AMDGPU_MACH_UNSUPPORTED)
  return hsa_error ("Unknown GCN agent architecture", HSA_STATUS_ERROR);
  
status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_VENDOR_NAME,




[PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Richard Biener
The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.

I'll run a bootstrap with both pending changes and will do
another round of full libgomp testing with this.

OK if that succeeds?

Thanks,
Richard.

libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
---
 libgomp/plugin/plugin-gcn.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 588358bbbf9..88ed77ff049 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
 #undef DLSYM_FN
 }
 
+static gcn_isa isa_code(const char *isa);
+
 /* Return true if the agent is a GPU and can accept of concurrent submissions
from different threads.  */
 
@@ -1443,6 +1445,13 @@ suitable_hsa_agent_p (hsa_agent_t agent)
   switch (device_type)
 {
 case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64];
+   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
+!= HSA_STATUS_SUCCESS)
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ return false;
+  }
   break;
 case HSA_DEVICE_TYPE_CPU:
   if (!support_cpu_devices)
-- 
2.35.3


Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Jakub Jelinek
On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:
> The following avoids registering unsupported GCN offload devices
> when iterating over available ones.  With a Zen4 desktop CPU
> you will have an IGPU (unspported) which will otherwise be made
> available.  This causes testcases like
> libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
> decives to FAIL.
> 
> I'll run a bootstrap with both pending changes and will do
> another round of full libgomp testing with this.
> 
> OK if that succeeds?
> 
> Thanks,
> Richard.
> 
> libgomp/
>   * plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
>   agents with unsupported ISA.
> ---
>  libgomp/plugin/plugin-gcn.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
> index 588358bbbf9..88ed77ff049 100644
> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c
> @@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
>  #undef DLSYM_FN
>  }
>  
> +static gcn_isa isa_code(const char *isa);

Space before ( please.

> +
>  /* Return true if the agent is a GPU and can accept of concurrent submissions
> from different threads.  */
>  
> @@ -1443,6 +1445,13 @@ suitable_hsa_agent_p (hsa_agent_t agent)
>switch (device_type)
>  {
>  case HSA_DEVICE_TYPE_GPU:
> +  {
> + char name[64];
> + if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
> +  != HSA_STATUS_SUCCESS)
> + || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
> +   return false;
> +  }
>break;
>  case HSA_DEVICE_TYPE_CPU:
>if (!support_cpu_devices)

Otherwise it looks reasoanble to me, but let's see what Andrew thinks.

Jakub



Re: [PATCH] genopinit: Split init_all_optabs [PR113575]

2024-01-26 Thread Richard Biener
On Fri, Jan 26, 2024 at 9:17 AM Robin Dapp  wrote:
>
> Hi,
>
> init_all_optabs initializes > 1 patterns for riscv targets.  This
> leads to pathological situations in dataflow analysis (which can occur
> with many adjacent stores).
> To alleviate this this patch makes genopinit split the init_all_optabs
> function into several init_optabs_xx functions that each initialize 1000
> patterns.
>
> With this change insn-opinit.cc's compilation time is reduced from 4+
> minutes to 1:30 and memory consumption decreases from 1.2G to 630M.
>
> Bootstrapped and regtested on x86 and aarch64 (where we do split) and
> on power10 (where we don't).  Regtested on riscv.

OK.

Thanks,
Richard.

> Regards
>  Robin
>
> gcc/ChangeLog:
>
> PR other/113575
>
> * genopinit.cc (main): Split init_all_optabs into functions
> of 1000 patterns each.
> ---
>  gcc/genopinit.cc | 43 ++-
>  1 file changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
> index 88ccafa5b2c..d8682b2a9ad 100644
> --- a/gcc/genopinit.cc
> +++ b/gcc/genopinit.cc
> @@ -367,11 +367,44 @@ main (int argc, const char **argv)
>  fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
>fprintf (s_file, "};\n\n");
>
> -  fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
> *optabs)\n{\n");
> -  fprintf (s_file, "  bool *ena = optabs->pat_enable;\n");
> -  for (i = 0; patterns.iterate (i, &p); ++i)
> -fprintf (s_file, "  ena[%u] = HAVE_%s;\n", i, p->name);
> -  fprintf (s_file, "}\n\n");
> +  /* Some targets like riscv have a large number of patterns.  In order to
> + prevent pathological situations in dataflow analysis split the init
> + function into separate ones that initialize 1000 patterns each.  */
> +
> +  const int patterns_per_function = 1000;
> +
> +  if (patterns.length () > patterns_per_function)
> +{
> +  unsigned num_init_functions
> +   = patterns.length () / patterns_per_function + 1;
> +  for (i = 0; i < num_init_functions; i++)
> +   {
> + fprintf (s_file, "static void\ninit_optabs_%02d "
> +  "(struct target_optabs *optabs)\n{\n", i);
> + fprintf (s_file, "  bool *ena = optabs->pat_enable;\n");
> + unsigned start = i * patterns_per_function;
> + unsigned end = MIN (patterns.length (),
> + (i + 1) * patterns_per_function);
> + for (j = start; j < end; ++j)
> +   fprintf (s_file, "  ena[%u] = HAVE_%s;\n", j, patterns[j].name);
> + fprintf (s_file, "}\n\n");
> +   }
> +
> +  fprintf (s_file, "void\ninit_all_optabs "
> +  "(struct target_optabs *optabs)\n{\n");
> +  for (i = 0; i < num_init_functions; ++i)
> +   fprintf (s_file, "  init_optabs_%02d (optabs);\n", i);
> +  fprintf (s_file, "}\n\n");
> +}
> +  else
> +{
> +  fprintf (s_file, "void\ninit_all_optabs "
> +  "(struct target_optabs *optabs)\n{\n");
> +  fprintf (s_file, "  bool *ena = optabs->pat_enable;\n");
> +  for (i = 0; patterns.iterate (i, &p); ++i)
> +   fprintf (s_file, "  ena[%u] = HAVE_%s;\n", i, p->name);
> +  fprintf (s_file, "}\n\n");
> +}
>
>fprintf (s_file,
>"/* Returns TRUE if the target supports any of the partial 
> vector\n"
> --
> 2.43.0


[patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs (was: [patch] amdgcn: config.gcc - enable gfx1100 multilib; add gfx1100 to docs)

2024-01-26 Thread Tobias Burnus

Hi all, hi Richard & Andrew,

Am 24.01.24 um 17:01 schrieb Tobias Burnus:
This patch obviously depends on Andrew's; he wrote in the previous 
email of this thread regarding his patch:


Andrew Stubbs wrote:

This is enough to get gfx1100 working for most purposes, on top of the
patch that Tobias committed a week or so ago; there are still some test
failures to investigate, and probably some tuning to do.

It might also get gfx1030 working too. @Richi, could you test it,
please?


If gfx1030 doesn't work, I would propose my patch previously in the 
thread, https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html


This patch assumes that both gfx1100 and gfx1030 are now working:

Okay to enable gfx1100 multilib building and to document gfx1100 in 
the manual?

and, with this patch, additionally gfx1030?

OK for mainline, once Andrew's patch is in?

Tobias



Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Tobias Burnus

Jakub Jelinek wrote:

On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:

libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.

...

@@ -1443,6 +1445,13 @@ suitable_hsa_agent_p (hsa_agent_t agent)
switch (device_type)
  {
  case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64];
+   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
+!= HSA_STATUS_SUCCESS)
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ return false;
+  }
break;


I wonder whether we want to add a diagnostic for this or not - like:
GCN_DEBUG ("Ignoring unsupported GPU %s", name)? Or is this more 
confusing than helpful?



  case HSA_DEVICE_TYPE_CPU:
if (!support_cpu_devices)


Otherwise it looks reasoanble to me, but let's see what Andrew thinks.


Likewise: looks reasonable but I also would wait for Andrew.

Tobias


Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

2024-01-26 Thread Tobias Burnus

Now with patch ...

Tobias Burnus wrote:

Hi all, hi Richard & Andrew,

Am 24.01.24 um 17:01 schrieb Tobias Burnus:
This patch obviously depends on Andrew's; he wrote in the previous 
email of this thread regarding his patch:


Andrew Stubbs wrote:

This is enough to get gfx1100 working for most purposes, on top of the
patch that Tobias committed a week or so ago; there are still some test
failures to investigate, and probably some tuning to do.

It might also get gfx1030 working too. @Richi, could you test it,
please?


If gfx1030 doesn't work, I would propose my patch previously in the 
thread, 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html


This patch assumes that both gfx1100 and gfx1030 are now working:

Okay to enable gfx1100 multilib building and to document gfx1100 in 
the manual?

and, with this patch, additionally gfx1030?

OK for mainline, once Andrew's patch is in? 

Tobiasamdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

gcc/ChangeLog:

	* config.gcc (amdgcn-*-*): Add gfx1030 and gfx1100 to
	TM_MULTILIB_CONFIG.
	* doc/install.texi (Configuration amdgcn-*-*): Mention gfx1030/gfx1100.
	* doc/invoke.texi (AMD GCN Options): Add gfx1030 and gfx1100 to
	-march/-mtune.

libgomp/ChangeLog:

	* testsuite/libgomp.c/declare-variant-4.h: Add variant functions
	for gfx1030 and gfx1100.
	* testsuite/libgomp.c/declare-variant-4-gfx1030.c: New test.
	* testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test.

Signed-off-by: Tobias Burnus 

 gcc/config.gcc  |  2 +-
 gcc/doc/install.texi| 12 ++--
 gcc/doc/invoke.texi |  6 ++
 libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c |  8 
 libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c |  8 
 libgomp/testsuite/libgomp.c/declare-variant-4.h | 16 
 6 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2d7d7dd475..a0f9c672308 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4564,7 +4564,7 @@ case "${target}" in
 			TM_MULTILIB_CONFIG=
 			;;
 		xdefault | xyes)
-			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a" | sed "s/${with_arch},\?//;s/,$//"`
+			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100" | sed "s/${with_arch},\?//;s/,$//"`
 			;;
 		*)
 			TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 71593919389..5747b5a12fe 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1258,12 +1258,12 @@ default set of libraries is selected based on the value of
 
 @item amdgcn*-*-*
 @var{list} is a comma separated list of ISA names (allowed values: @code{fiji},
-@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
-include the name of the default ISA, specified via @option{--with-arch}.  If
-@var{list} is empty, then there will be no multilibs and only the default
-run-time library will be built.  If @var{list} is @code{default} or
-@option{--with-multilib-list=} is not specified, then the default set of
-libraries is selected.
+@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030},
+@code{gfx1100}).  It ought not include the name of the default ISA, specified
+via @option{--with-arch}.  If @var{list} is empty, then there will be no
+multilibs and only the default run-time library will be built.  If @var{list}
+is @code{default} or @option{--with-multilib-list=} is not specified, then
+the default set of libraries is selected.
 
 @item arm*-*-*
 @var{list} is a comma separated list of @code{aprofile} and
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6ec56493e59..64c5ed2ffde 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21739,6 +21739,12 @@ Compile for CDNA1 Instinct MI100 series devices (gfx908).
 @item gfx90a
 Compile for CDNA2 Instinct MI200 series devices (gfx90a).
 
+@item gfx1030
+Compile for RDNA2 gfx1030 devices (GFX10 series).
+
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).
+
 @end table
 
 @opindex msram-ecc
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c
new file mode 100644
index 000..d98d5ef54ec
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1030.c
@@ -0,0 +1,8 @@
+/* { dg-do link { target { offload_target_amdgcn } } } */
+/* { dg-additional-options -foffload=amdgcn-amdhsa } */
+/* { dg-additional-options -foffload=-march=gfx1030 } */
+/* { dg-additional-options "-foffload=-fdump-tree-optimized" } */
+
+#include "declare-variant-4.h"
+
+/* { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "= gfx1030 \\(\\);" "optimized" } } */
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c b/libgomp/testsuite/libgomp.c/declare-variant-4-gfx1100.c
new file mode 10

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Richard Biener
On Fri, 26 Jan 2024, Tobias Burnus wrote:

> Jakub Jelinek wrote:
> > On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:
> >> libgomp/
> >>  * plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
> >>  agents with unsupported ISA.
> ...
> >> @@ -1443,6 +1445,13 @@ suitable_hsa_agent_p (hsa_agent_t agent)
> >> switch (device_type)
> >>   {
> >>   case HSA_DEVICE_TYPE_GPU:
> >> +  {
> >> +  char name[64];
> >> +  if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
> >> +   != HSA_STATUS_SUCCESS)
> >> +  || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
> >> +return false;
> >> +  }
> >> break;
> 
> I wonder whether we want to add a diagnostic for this or not - like:
> GCN_DEBUG ("Ignoring unsupported GPU %s", name)? Or is this more confusing
> than helpful?

I can do that, but there's later reasons it might be ignored as well,
so doing this in the caller might be better (you'd get duplicate
diagnostic as well, once for counting once for registering).  I figured
this might better be done as followup?

Richard.

> >>   case HSA_DEVICE_TYPE_CPU:
> >> if (!support_cpu_devices)
> > 
> > Otherwise it looks reasoanble to me, but let's see what Andrew thinks.
> 
> Likewise: looks reasonable but I also would wait for Andrew.
> 
> Tobias
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

2024-01-26 Thread Richard Biener
On Fri, 26 Jan 2024, Tobias Burnus wrote:

> Now with patch ...
> 
> Tobias Burnus wrote:
> > Hi all, hi Richard & Andrew,
> >
> > Am 24.01.24 um 17:01 schrieb Tobias Burnus:
> >> This patch obviously depends on Andrew's; he wrote in the previous email of
> >> this thread regarding his patch:
> >>
> >> Andrew Stubbs wrote:
> >>> This is enough to get gfx1100 working for most purposes, on top of the
> >>> patch that Tobias committed a week or so ago; there are still some test
> >>> failures to investigate, and probably some tuning to do.
> >>>
> >>> It might also get gfx1030 working too. @Richi, could you test it,
> >>> please?
> >
> > If gfx1030 doesn't work, I would propose my patch previously in the thread,
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643835.html
> >
> > This patch assumes that both gfx1100 and gfx1030 are now working:
> >
> >> Okay to enable gfx1100 multilib building and to document gfx1100 in the
> >> manual?
> > and, with this patch, additionally gfx1030?
> >
> > OK for mainline, once Andrew's patch is in? 
> Tobias

Looks good to me.

+@item gfx1030
+Compile for RDNA2 gfx1030 devices (GFX10 series).
+
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).

Btw, "GFX10" series isn't precise as it's only the high-end parts
that are covered by gfx1030, there's gfx103[0-6] where hopefully
at least gfx1031, gfx1032 and gfx1034 (the dGPU variants) are
trivial to support as well(?).  Using gfx103x might be better
that way, OTOH if APU vs dGPU will make a compilation target
difference then gfx103d vs gfx103a maybe?  "GFX10" series might
be also not know to users, but I'm unsure we can list AMD
product names here?

Richard.


Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

2024-01-26 Thread Tobias Burnus

Hi Richard,

Richard Biener wrote:

Looks good to me.
Thanks - I will commit it after lunch to see whether someone else has 
additional comments.

+@item gfx1030
+Compile for RDNA2 gfx1030 devices (GFX10 series).
+
+@item gfx1100
+Compile for RDNA3 gfx1100 devices (GFX11 series).

Btw, "GFX10" series isn't precise as it's only the high-end parts
that are covered by gfx1030, there's gfx103[0-6] where hopefully
at least gfx1031, gfx1032 and gfx1034 (the dGPU variants) are
trivial to support as well(?).

Using gfx103x might be better
that way, OTOH if APU vs dGPU will make a compilation target
difference then gfx103d vs gfx103a maybe?  "GFX10" series might


On the LLVM side (and also for the llvm-mc assembler), they distinguish 
all of the gfx* for the -mcpu= argument. See 
https://llvm.org/docs/AMDGPUUsage.html#id26 for that list.


Thus, I think it makes sense to do the same here.  The last column on 
that page lists the supported hardware but is it neither really up to 
date nor complete.


Thus, I found it easier to just mention gfx1100 as that's unique. On the 
ROCm side, AMD has:


https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

Tobias


Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 12:06, Jakub Jelinek wrote:

On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:

The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.

I'll run a bootstrap with both pending changes and will do
another round of full libgomp testing with this.

OK if that succeeds?

Thanks,
Richard.

libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
---
  libgomp/plugin/plugin-gcn.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 588358bbbf9..88ed77ff049 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
  #undef DLSYM_FN
  }
  
+static gcn_isa isa_code(const char *isa);


Space before ( please.


+
  /* Return true if the agent is a GPU and can accept of concurrent submissions
 from different threads.  */
  
@@ -1443,6 +1445,13 @@ suitable_hsa_agent_p (hsa_agent_t agent)

switch (device_type)
  {
  case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64];
+   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
+!= HSA_STATUS_SUCCESS)
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ return false;
+  }
break;
  case HSA_DEVICE_TYPE_CPU:
if (!support_cpu_devices)


Otherwise it looks reasoanble to me, but let's see what Andrew thinks.


'n' before 'a', please. ;-)

I think we need at least a GCN_DEBUG message when we ignore a GPU 
device. Possibly gomp_debug also.


Andrew


Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Richard Biener
On Fri, 26 Jan 2024, Andrew Stubbs wrote:

> On 26/01/2024 12:06, Jakub Jelinek wrote:
> > On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:
> >> The following avoids registering unsupported GCN offload devices
> >> when iterating over available ones.  With a Zen4 desktop CPU
> >> you will have an IGPU (unspported) which will otherwise be made
> >> available.  This causes testcases like
> >> libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
> >> decives to FAIL.
> >>
> >> I'll run a bootstrap with both pending changes and will do
> >> another round of full libgomp testing with this.
> >>
> >> OK if that succeeds?
> >>
> >> Thanks,
> >> Richard.
> >>
> >> libgomp/
> >>  * plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
> >>  agents with unsupported ISA.
> >> ---
> >>   libgomp/plugin/plugin-gcn.c | 9 +
> >>   1 file changed, 9 insertions(+)
> >>
> >> diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
> >> index 588358bbbf9..88ed77ff049 100644
> >> --- a/libgomp/plugin/plugin-gcn.c
> >> +++ b/libgomp/plugin/plugin-gcn.c
> >> @@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
> >>   #undef DLSYM_FN
> >>   }
> >>   
> >> +static gcn_isa isa_code(const char *isa);
> > 
> > Space before ( please.
> > 
> >> +
> >>   /* Return true if the agent is a GPU and can accept of concurrent
> >>   submissions
> >>  from different threads.  */
> >>   @@ -1443,6 +1445,13 @@ suitable_hsa_agent_p (hsa_agent_t agent)
> >> switch (device_type)
> >>   {
> >>   case HSA_DEVICE_TYPE_GPU:
> >> +  {
> >> +  char name[64];
> >> +  if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
> >> +   != HSA_STATUS_SUCCESS)
> >> +  || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
> >> +return false;
> >> +  }
> >> break;
> >>   case HSA_DEVICE_TYPE_CPU:
> >> if (!support_cpu_devices)
> > 
> > Otherwise it looks reasoanble to me, but let's see what Andrew thinks.
> 
> 'n' before 'a', please. ;-)

?!

> I think we need at least a GCN_DEBUG message when we ignore a GPU device.
> Possibly gomp_debug also.

Like the following?  This will do

GCN debug: HSA run-time initialized for GCN
GCN debug: HSA_SYSTEM_INFO_ENDIANNESS: LITTLE
GCN debug: HSA_SYSTEM_INFO_EXTENSIONS: IMAGES
GCN debug: Ignoring unsupported agent 'gfx1036'
GCN debug: There are 1 GCN GPU devices.
GCN debug: Ignoring unsupported agent 'gfx1036'
GCN debug: HSA_AGENT_INFO_NAME: AMD Ryzen 9 7900X 12-Core Processor
...

for debug it's probably not too imporant to say this twice.

That said, no idea how to do gomp_debug.

OK?

Thanks,
Richard.


>From 7462a8ac70aeebc231c65456b9060d8cbf7d4c50 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 26 Jan 2024 12:57:10 +0100
Subject: [PATCH] Avoid registering unsupported OMP offload devices
To: gcc-patches@gcc.gnu.org

The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.

libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
---
 libgomp/plugin/plugin-gcn.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 588358bbbf9..2a17dc8accc 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
 #undef DLSYM_FN
 }
 
+static gcn_isa isa_code (const char *isa);
+
 /* Return true if the agent is a GPU and can accept of concurrent submissions
from different threads.  */
 
@@ -1443,6 +1445,16 @@ suitable_hsa_agent_p (hsa_agent_t agent)
   switch (device_type)
 {
 case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64] = "nil";
+   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
+!= HSA_STATUS_SUCCESS)
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ {
+   GCN_DEBUG ("Ignoring unsupported agent '%s'\n", name);
+   return false;
+ }
+  }
   break;
 case HSA_DEVICE_TYPE_CPU:
   if (!support_cpu_devices)
-- 
2.35.3



Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Jakub Jelinek
On Fri, Jan 26, 2024 at 03:04:11PM +0100, Richard Biener wrote:
> > > Otherwise it looks reasoanble to me, but let's see what Andrew thinks.
> > 
> > 'n' before 'a', please. ;-)
> 
> ?!

I've misspelled a word.

> @@ -1443,6 +1445,16 @@ suitable_hsa_agent_p (hsa_agent_t agent)
>switch (device_type)
>  {
>  case HSA_DEVICE_TYPE_GPU:
> +  {
> + char name[64] = "nil";
> + if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
> +  != HSA_STATUS_SUCCESS)
> + || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
> +   {
> + GCN_DEBUG ("Ignoring unsupported agent '%s'\n", name);
> + return false;
> +   }

I must say I know nothing about HSA libraries, but generally if a function
that is supposed to fill some buffer fails the content of the buffer is
undefined/unpredictable.
So might be better not to not initialize name before calling the function
(unless it has to be initialized) and strcpy it to nil or something similar
if it fails.

Jakub



[PATCH] debug/103047 - argument order of inlined functions

2024-01-26 Thread Richard Biener
The inliner puts variables for parameters of the inlined functions
in the inline scope in reverse order.  The following reverses them
again so that we get consistent ordering between the
DW_TAG_subprogram DW_TAG_formal_parameter and the
DW_TAG_inlined_subroutine DW_TAG_formal_parameter set.

I failed to create a testcase with regexps since the inline
instances have just abstract origins and so I can't match them up.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR debug/103047
* tree-inline.cc (initialize_inlined_parameters): Reverse
the decl chain of inlined parameters.
---
 gcc/tree-inline.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 1a2541ad952..75c10eb7dfc 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -3684,6 +3684,10 @@ initialize_inlined_parameters (copy_body_data *id, 
gimple *stmt,
   setup_one_parameter (id, p, static_chain, fn, bb, &vars);
 }
 
+  /* Reverse so the variables appear in the correct order in DWARF
+ debug info.  */
+  vars = nreverse (vars);
+
   declare_inline_vars (id->block, vars);
 }
 
-- 
2.35.3


Re: [PATCH] debug/103047 - argument order of inlined functions

2024-01-26 Thread Jakub Jelinek
On Fri, Jan 26, 2024 at 03:16:15PM +0100, Richard Biener wrote:
> The inliner puts variables for parameters of the inlined functions
> in the inline scope in reverse order.  The following reverses them
> again so that we get consistent ordering between the
> DW_TAG_subprogram DW_TAG_formal_parameter and the
> DW_TAG_inlined_subroutine DW_TAG_formal_parameter set.
> 
> I failed to create a testcase with regexps since the inline
> instances have just abstract origins and so I can't match them up.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> OK if that succeeds?
> 
> Thanks,
> Richard.
> 
>   PR debug/103047
>   * tree-inline.cc (initialize_inlined_parameters): Reverse
>   the decl chain of inlined parameters.

LGTM.

Jakub



Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Richard Biener
On Fri, 26 Jan 2024, Jakub Jelinek wrote:

> On Fri, Jan 26, 2024 at 03:04:11PM +0100, Richard Biener wrote:
> > > > Otherwise it looks reasoanble to me, but let's see what Andrew thinks.
> > > 
> > > 'n' before 'a', please. ;-)
> > 
> > ?!
> 
> I've misspelled a word.
> 
> > @@ -1443,6 +1445,16 @@ suitable_hsa_agent_p (hsa_agent_t agent)
> >switch (device_type)
> >  {
> >  case HSA_DEVICE_TYPE_GPU:
> > +  {
> > +   char name[64] = "nil";
> > +   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
> > +!= HSA_STATUS_SUCCESS)
> > +   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
> > + {
> > +   GCN_DEBUG ("Ignoring unsupported agent '%s'\n", name);
> > +   return false;
> > + }
> 
> I must say I know nothing about HSA libraries, but generally if a function
> that is supposed to fill some buffer fails the content of the buffer is
> undefined/unpredictable.
> So might be better not to not initialize name before calling the function
> (unless it has to be initialized) and strcpy it to nil or something similar
> if it fails.

Yeah, sorry.  Here's a proper engineered variant.  I don't expect that
function to ever fail of course.

>From 445891ba57e858d980441bd63249e3bc94632db3 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 26 Jan 2024 12:57:10 +0100
Subject: [PATCH] Avoid registering unsupported OMP offload devices
To: gcc-patches@gcc.gnu.org

The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.

OK?

libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
---
 libgomp/plugin/plugin-gcn.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 588358bbbf9..2771123252a 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
 #undef DLSYM_FN
 }
 
+static gcn_isa isa_code (const char *isa);
+
 /* Return true if the agent is a GPU and can accept of concurrent submissions
from different threads.  */
 
@@ -1443,6 +1445,18 @@ suitable_hsa_agent_p (hsa_agent_t agent)
   switch (device_type)
 {
 case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64];
+   hsa_status_t status
+ = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name);
+   if (status != HSA_STATUS_SUCCESS
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ {
+   GCN_DEBUG ("Ignoring unsupported agent '%s'\n",
+  status == HSA_STATUS_SUCCESS ? name : "invalid");
+   return false;
+ }
+  }
   break;
 case HSA_DEVICE_TYPE_CPU:
   if (!support_cpu_devices)
-- 
2.35.3



Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 14:04, Richard Biener wrote:

On Fri, 26 Jan 2024, Andrew Stubbs wrote:


On 26/01/2024 12:06, Jakub Jelinek wrote:

On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote:

The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.

I'll run a bootstrap with both pending changes and will do
another round of full libgomp testing with this.

OK if that succeeds?

Thanks,
Richard.

libgomp/
  * plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
  agents with unsupported ISA.
---
   libgomp/plugin/plugin-gcn.c | 9 +
   1 file changed, 9 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 588358bbbf9..88ed77ff049 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
   #undef DLSYM_FN
   }
   
+static gcn_isa isa_code(const char *isa);


Space before ( please.


+
   /* Return true if the agent is a GPU and can accept of concurrent
   submissions
  from different threads.  */
   @@ -1443,6 +1445,13 @@ suitable_hsa_agent_p (hsa_agent_t agent)
 switch (device_type)
   {
   case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64];
+   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
+!= HSA_STATUS_SUCCESS)
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ return false;
+  }
 break;
   case HSA_DEVICE_TYPE_CPU:
 if (!support_cpu_devices)


Otherwise it looks reasoanble to me, but let's see what Andrew thinks.


'n' before 'a', please. ;-)


?!


I think we need at least a GCN_DEBUG message when we ignore a GPU device.
Possibly gomp_debug also.


Like the following?  This will do

GCN debug: HSA run-time initialized for GCN
GCN debug: HSA_SYSTEM_INFO_ENDIANNESS: LITTLE
GCN debug: HSA_SYSTEM_INFO_EXTENSIONS: IMAGES
GCN debug: Ignoring unsupported agent 'gfx1036'
GCN debug: There are 1 GCN GPU devices.
GCN debug: Ignoring unsupported agent 'gfx1036'
GCN debug: HSA_AGENT_INFO_NAME: AMD Ryzen 9 7900X 12-Core Processor
...

for debug it's probably not too imporant to say this twice.

That said, no idea how to do gomp_debug.

OK?


I'm fairly comfortable with the repeat in debug output.

I mentioned gomp_debug because the target-independent GOMP_DEBUG=1 is a 
lot less noisy and actually documented where end-users might find it. 
From the plugin you would call GOMP_PLUGIN_debug (there are examples in 
plugin-nvptx.c). Probably the repeat is less welcome in that case 
though, so perhaps good for a follow-up.




Thanks,
Richard.


 From 7462a8ac70aeebc231c65456b9060d8cbf7d4c50 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 26 Jan 2024 12:57:10 +0100
Subject: [PATCH] Avoid registering unsupported OMP offload devices
To: gcc-patches@gcc.gnu.org

The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.

libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
---
  libgomp/plugin/plugin-gcn.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 588358bbbf9..2a17dc8accc 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
  #undef DLSYM_FN
  }
  
+static gcn_isa isa_code (const char *isa);

+
  /* Return true if the agent is a GPU and can accept of concurrent submissions
 from different threads.  */
  
@@ -1443,6 +1445,16 @@ suitable_hsa_agent_p (hsa_agent_t agent)

switch (device_type)
  {
  case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64] = "nil";
+   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
+!= HSA_STATUS_SUCCESS)
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ {
+   GCN_DEBUG ("Ignoring unsupported agent '%s'\n", name);
+   return false;
+ }
+  }
break;
  case HSA_DEVICE_TYPE_CPU:
if (!support_cpu_devices)


Like Jakub says, I think it needs to be like this, to be safe:

  status = hsa_fns.hsa_agent_get_info_fn (...)
  if (status unsuccessful || name unsupported)
if (status successful) output debug
return false

Andrew


Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs

On 26/01/2024 14:21, Richard Biener wrote:

On Fri, 26 Jan 2024, Jakub Jelinek wrote:


On Fri, Jan 26, 2024 at 03:04:11PM +0100, Richard Biener wrote:

Otherwise it looks reasoanble to me, but let's see what Andrew thinks.


'n' before 'a', please. ;-)


?!


I've misspelled a word.


@@ -1443,6 +1445,16 @@ suitable_hsa_agent_p (hsa_agent_t agent)
switch (device_type)
  {
  case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64] = "nil";
+   if ((hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name)
+!= HSA_STATUS_SUCCESS)
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ {
+   GCN_DEBUG ("Ignoring unsupported agent '%s'\n", name);
+   return false;
+ }


I must say I know nothing about HSA libraries, but generally if a function
that is supposed to fill some buffer fails the content of the buffer is
undefined/unpredictable.
So might be better not to not initialize name before calling the function
(unless it has to be initialized) and strcpy it to nil or something similar
if it fails.


Yeah, sorry.  Here's a proper engineered variant.  I don't expect that
function to ever fail of course.


I keep crossing emails with you today. :(

This version looks good to me. It has to work for all versions of ROCm 
from maybe 3.8 (the last version known to work with the Fiji devices) 
onwards to forever.


OK.

Andrew


 From 445891ba57e858d980441bd63249e3bc94632db3 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 26 Jan 2024 12:57:10 +0100
Subject: [PATCH] Avoid registering unsupported OMP offload devices
To: gcc-patches@gcc.gnu.org

The following avoids registering unsupported GCN offload devices
when iterating over available ones.  With a Zen4 desktop CPU
you will have an IGPU (unspported) which will otherwise be made
available.  This causes testcases like
libgomp.c-c++-common/non-rect-loop-1.c which iterate over all
decives to FAIL.

OK?

libgomp/
* plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out
agents with unsupported ISA.
---
  libgomp/plugin/plugin-gcn.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 588358bbbf9..2771123252a 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1427,6 +1427,8 @@ init_hsa_runtime_functions (void)
  #undef DLSYM_FN
  }
  
+static gcn_isa isa_code (const char *isa);

+
  /* Return true if the agent is a GPU and can accept of concurrent submissions
 from different threads.  */
  
@@ -1443,6 +1445,18 @@ suitable_hsa_agent_p (hsa_agent_t agent)

switch (device_type)
  {
  case HSA_DEVICE_TYPE_GPU:
+  {
+   char name[64];
+   hsa_status_t status
+ = hsa_fns.hsa_agent_get_info_fn (agent, HSA_AGENT_INFO_NAME, name);
+   if (status != HSA_STATUS_SUCCESS
+   || isa_code (name) == EF_AMDGPU_MACH_UNSUPPORTED)
+ {
+   GCN_DEBUG ("Ignoring unsupported agent '%s'\n",
+  status == HSA_STATUS_SUCCESS ? name : "invalid");
+   return false;
+ }
+  }
break;
  case HSA_DEVICE_TYPE_CPU:
if (!support_cpu_devices)




Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-01-26 Thread Qing Zhao


> On Jan 26, 2024, at 3:04 AM, Martin Uecker  wrote:
> 
> 
> I haven't looked at the patch, but it sounds you give the result
> the wrong type. Then patching up all use cases instead of the
> type seems wrong.

Yes, this is for resolving a very early gimplification issue as I reported last 
Nov:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html

Since no-one responded at that time, I fixed the issue by replacing the 
ARRAY_REF
With a pointer indirection:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html

The reason for such change is:  return a flexible array member TYPE is not 
allowed
by C language (our gimplification follows this rule), so, we have to return a 
pointer TYPE instead. 

**The new internal function

 .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
ACCESS_MODE, INDEX)

INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)

which returns the "REF_TO_OBJ" same as the 1st argument;

Both the return type and the type of the first argument of this function have 
been converted from 
the incomplete array type to the corresponding pointer type.

As a result, the original ARRAY_REF was converted to an INDIRECT_REF, the 
original INDEX of the ARRAY_REF was lost
when converting from ARRAY_REF to INDIRECT_REF, in order to keep the INDEX for 
bound sanitizer instrumentation, I added
The 6th argument “INDEX”.

What’s your comment and suggestion on this solution?

Thanks.

Qing


> 
> Martin
> 
> 
> Am Donnerstag, dem 25.01.2024 um 20:11 + schrieb Qing Zhao:
>> Thanks a lot for the testing.
>> 
>> Yes, I can repeat the issue with the following small example:
>> 
>> #include 
>> #include 
>> #include 
>> 
>> #define MAX(a, b)  ((a) > (b) ? (a) :  (b))
>> 
>> struct untracked {
>>   int size;
>>   int array[] __attribute__((counted_by (size)));
>> } *a;
>> struct untracked * alloc_buf (int index)
>> {
>>  struct untracked *p;
>>  p = (struct untracked *) malloc (MAX (sizeof (struct untracked),
>>(offsetof (struct untracked, array[0])
>> + (index) * sizeof (int;
>>  p->size = index;
>>  return p;
>> }
>> 
>> int main()
>> {
>>  a = alloc_buf(10);
>> printf ("same_type is %d\n",
>>  (__builtin_types_compatible_p(typeof (a->array), typeof (&(a->array)[0];
>>  return 0;
>> }
>> 
>> 
>> /home/opc/Install/latest-d/bin/gcc -O2 btcp.c
>> same_type is 1
>> 
>> Looks like that the “typeof” operator need to be handled specially in C FE
>> for the new internal function .ACCESS_WITH_SIZE. 
>> 
>> (I have specially handle the operator “offsetof” in C FE already).
>> 
>> Will fix this issue.
>> 
>> Thanks.
>> 
>> Qing
>> 
>>> On Jan 24, 2024, at 7:51 PM, Kees Cook  wrote:
>>> 
>>> On Wed, Jan 24, 2024 at 12:29:51AM +, Qing Zhao wrote:
 This is the 4th version of the patch.
>>> 
>>> Thanks very much for this!
>>> 
>>> I tripped over an unexpected behavioral change that the Linux kernel
>>> depends on:
>>> 
>>> __builtin_types_compatible_p() no longer treats an array marked with
>>> counted_by as different from that array's decayed pointer. Specifically,
>>> the kernel uses these macros:
>>> 
>>> 
>>> /*
>>> * Force a compilation error if condition is true, but also produce a
>>> * result (of value 0 and type int), so the expression can be used
>>> * e.g. in a structure initializer (or where-ever else comma expressions
>>> * aren't permitted).
>>> */
>>> #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>>> 
>>> #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>>> 
>>> /* &a[0] degrades to a pointer: a different type from an array */
>>> #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
>>> 
>>> 
>>> This gets used in various places to make sure we're dealing with an
>>> array for a macro:
>>> 
>>> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + 
>>> __must_be_array(arr))
>>> 
>>> 
>>> So this builds:
>>> 
>>> struct untracked {
>>>   int size;
>>>   int array[];
>>> } *a;
>>> 
>>> __must_be_array(a->array)
>>> => 0 (as expected)
>>> __builtin_types_compatible_p(typeof(a->array), typeof(&(a->array)[0]))
>>> => 0 (as expected, array vs decayed array pointer)
>>> 
>>> 
>>> But if counted_by is added, we get a build failure:
>>> 
>>> struct tracked {
>>>   int size;
>>>   int array[] __counted_by(size);
>>> } *b;
>>> 
>>> __must_be_array(b->array)
>>> => build failure (not expected)
>>> __builtin_types_compatible_p(typeof(b->array), typeof(&(b->array)[0]))
>>> => 1 (not expected, both pointers?)
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Kees Cook
>> 
> 



[PATCH v2] arm: Fix missing bti instruction for virtual thunks

2024-01-26 Thread Richard Ball
v2: Formatting and test options fix.

Adds missing bti instruction at the beginning of a virtual
thunk, when bti is enabled.

gcc/ChangeLog:

* config/arm/arm.cc (arm_output_mi_thunk): Emit
insn for bti_c when bti is enabled.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add v8_1_m_main_pacbti.
* g++.target/arm/bti_thunk.C: New test.diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index e5a944486d7bd583627b0e22dfe8f95862e975bb..c44047c377a802d0c1dc1406df1b88a6b079607b 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -29257,6 +29257,8 @@ arm_output_mi_thunk (FILE *file, tree thunk, HOST_WIDE_INT delta,
   const char *fnname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk));
 
   assemble_start_function (thunk, fnname);
+  if (aarch_bti_enabled ())
+emit_insn (aarch_gen_bti_c ());
   if (TARGET_32BIT)
 arm32_output_mi_thunk (file, thunk, delta, vcall_offset, function);
   else
diff --git a/gcc/testsuite/g++.target/arm/bti_thunk.C b/gcc/testsuite/g++.target/arm/bti_thunk.C
new file mode 100644
index ..23c1acc66629ba5907830ae27de0b29a47032c65
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/bti_thunk.C
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_pacbti_ok } */
+/* { dg-add-options arm_arch_v8_1m_main_pacbti } */
+/* { dg-additional-options "-mbranch-protection=bti" }*/
+
+#include 
+
+struct C18 {
+  virtual void f7();
+};
+
+struct C19 : virtual C18 {
+  virtual void f7();
+};
+
+void C19::f7() {
+  printf("foo\n");
+}
+
+/* { dg-final { scan-assembler-times "\tbti" 2 } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 73360cd3a0d5553ff7586b48e0bb17298e6612f0..3070b51b9b86e1a2bad630a765a075f8b911d2ad 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5524,6 +5524,8 @@ foreach { armfunc armflag armdefs } {
 		__ARM_ARCH_8M_BASE__
 	v8m_main "-march=armv8-m.main+fp -mthumb" __ARM_ARCH_8M_MAIN__
 	v8_1m_main "-march=armv8.1-m.main+fp -mthumb" __ARM_ARCH_8M_MAIN__
+	v8_1m_main_pacbti "-march=armv8.1-m.main+pacbti+fp -mthumb"
+		"__ARM_ARCH_8M_MAIN__ && __ARM_FEATURE_BTI"
 	v9a "-march=armv9-a+simd" __ARM_ARCH_9A__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEFS $armdefs ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {


Re: [PATCH] c++/modules: Stream additional fields for DECL_STRUCT_FUNCTION [PR113580]

2024-01-26 Thread Patrick Palka
On Fri, 26 Jan 2024, Nathaniel Shead wrote:

> This patch just adds enough of the fields from 'function' to fix the ICE
> in the linked PR. I suppose there might be more fields from this type
> that should be propagated, but I don't know enough to find out which
> they might be yet, since a lot of them seem to be only set after
> gimplification.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> Currently the DECL_STRUCT_FUNCTION for a declaration is always
> reconstructed from scratch. This causes issues though, as some fields
> used by other parts of the compiler (in this case, specifically
> 'function_{start,end}_locus') are then not correctly initialised. This
> patch makes sure that these fields are also read and written.

LGTM

> 
>   PR c++/113580
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (struct post_process_data): Create.
>   (trees_in::post_decls): Use.
>   (trees_in::post_process): Return entire vector at once.
>   Change overload to take post_process_data instead of tree.
>   (trees_out::write_function_def): Write needed flags from
>   DECL_STRUCT_FUNCTION.
>   (trees_in::read_function_def): Read them and pass to
>   post_process.
>   (module_state::read_cluster): Write flags into cfun.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/pr113580_a.C: New test.
>   * g++.dg/modules/pr113580_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/module.cc  | 47 ++-
>  gcc/testsuite/g++.dg/modules/pr113580_a.C | 10 +
>  gcc/testsuite/g++.dg/modules/pr113580_b.C | 10 +
>  3 files changed, 58 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr113580_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr113580_b.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 6176801b7a7..840c7ef6dab 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -2837,6 +2837,13 @@ typedef hash_mapsimple_hashmap_traits,uintptr_t> >
>  duplicate_hash_map;
>  
> +/* Data needed for post-processing.  */
> +struct post_process_data {
> +  tree decl;
> +  location_t start_locus;
> +  location_t end_locus;
> +};
> +
>  /* Tree stream reader.  Note that reading a stream doesn't mark the
> read trees with TREE_VISITED.  Thus it's quite safe to have
> multiple concurrent readers.  Which is good, because lazy
> @@ -2848,7 +2855,7 @@ private:
>module_state *state;   /* Module being imported.  */
>vec back_refs;   /* Back references.  */
>duplicate_hash_map *duplicates;/* Map from existings to duplicate.  */
> -  vec post_decls;  /* Decls to post process.  */
> +  vec post_decls; /* Decls to post process.  */
>unsigned unused;   /* Inhibit any interior TREE_USED
>  marking.  */
>  
> @@ -2945,16 +2952,16 @@ public:
>tree odr_duplicate (tree decl, bool has_defn);
>  
>  public:
> -  /* Return the next decl to postprocess, or NULL.  */
> -  tree post_process ()
> +  /* Return the decls to postprocess.  */
> +  const vec& post_process ()
>{
> -return post_decls.length () ? post_decls.pop () : NULL_TREE;
> +return post_decls;
>}
>  private:
> -  /* Register DECL for postprocessing.  */
> -  void post_process (tree decl)
> +  /* Register DATA for postprocessing.  */
> +  void post_process (post_process_data data)
>{
> -post_decls.safe_push (decl);
> +post_decls.safe_push (data);
>}
>  
>  private:
> @@ -11667,15 +11674,25 @@ trees_out::write_function_def (tree decl)
>tree_node (cexpr->body);
>  }
>  
> +  function* f = DECL_STRUCT_FUNCTION (decl);
> +
>if (streaming_p ())
>  {
>unsigned flags = 0;
>  
> +  if (f)
> + flags |= 2;
>if (DECL_NOT_REALLY_EXTERN (decl))
>   flags |= 1;
>  
>u (flags);
>  }
> +
> +  if (state && f)
> +{
> +  state->write_location (*this, f->function_start_locus);
> +  state->write_location (*this, f->function_end_locus);
> +}
>  }
>  
>  void
> @@ -11692,6 +11709,8 @@ trees_in::read_function_def (tree decl, tree 
> maybe_template)
>tree saved = tree_node ();
>tree context = tree_node ();
>constexpr_fundef cexpr;
> +  post_process_data pdata {};
> +  pdata.decl = maybe_template;
>  
>tree maybe_dup = odr_duplicate (maybe_template, DECL_SAVED_TREE (decl));
>bool installing = maybe_dup && !DECL_SAVED_TREE (decl);
> @@ -11708,6 +11727,12 @@ trees_in::read_function_def (tree decl, tree 
> maybe_template)
>  
>unsigned flags = u ();
>  
> +  if (flags & 2)
> +{
> +  pdata.start_locus = state->read_location (*this);
> +  pdata.end_locus = state->read_location (*this);
> +}
> +
>if (get_overrun ())
>  return NULL_TREE;
>  
> @@ -11722,7 +11747,7 @@ trees_in::read_function_def (tree decl, tree 
> maybe_template)
>   SET_DECL_FRIEND_CONTE

Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

2024-01-26 Thread Thomas Schwinge
Hi!

Great progress that you've made!  :-)

On 2024-01-26T13:32:02+0100, Tobias Burnus  wrote:
> Tobias Burnus wrote:
>> Am 24.01.24 um 17:01 schrieb Tobias Burnus:
>>> Okay to enable gfx1100 multilib building and to document gfx1100 in 
>>> the manual?
>>
>> and, with this patch, additionally gfx1030?

> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the 
> value of
>  
>  @item amdgcn*-*-*
>  @var{list} is a comma separated list of ISA names (allowed values: 
> @code{fiji},
> -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
> -include the name of the default ISA, specified via @option{--with-arch}.  If
> -@var{list} is empty, then there will be no multilibs and only the default
> -run-time library will be built.  If @var{list} is @code{default} or
> -@option{--with-multilib-list=} is not specified, then the default set of
> -libraries is selected.
> +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030},
> +@code{gfx1100}).  It ought not include the name of the default ISA, specified
> +via @option{--with-arch}.  If @var{list} is empty, then there will be no
> +multilibs and only the default run-time library will be built.  If @var{list}
> +is @code{default} or @option{--with-multilib-list=} is not specified, then
> +the default set of libraries is selected.

Further down in that file, we state:

@anchor{amdgcn-x-amdhsa}
@heading amdgcn-*-amdhsa
AMD GCN GPU target.

Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, 
[...]

LLVM 13.0.1 may still be fine for gfx1030
('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
tested), but it's not sufficient for gfx1100 anymore:

[...]
checking for suffix of object files... configure: error: in 
`[...]/amdgcn-amdhsa/gfx1100/libgcc':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details
make[1]: *** [Makefile:14105: configure-target-libgcc] Error 1
[...]

'[...]/amdgcn-amdhsa/gfx1100/libgcc/config.log':

[...]
'gfx1100' is not a recognized processor for this target (ignoring processor)
'gfx1100' is not a recognized processor for this target (ignoring processor)
/tmp/ccZdohcj.s:1:17: error: .amdgcn_target directive's target id 
amdgcn-unknown-amdhsa--gfx1100 does not match the specified target id 
amdgcn-unknown-amdhsa--gfx000
.amdgcn_target "amdgcn-unknown-amdhsa--gfx1100"
   ^
[...]

Which version of LLVM should we be recommending?


Grüße
 Thomas


Re: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs

2024-01-26 Thread Richard Biener



> Am 26.01.2024 um 17:22 schrieb Thomas Schwinge :
> 
> Hi!
> 
> Great progress that you've made!  :-)
> 
>> On 2024-01-26T13:32:02+0100, Tobias Burnus  wrote:
>> Tobias Burnus wrote:
>>> Am 24.01.24 um 17:01 schrieb Tobias Burnus:
 Okay to enable gfx1100 multilib building and to document gfx1100 in
 the manual?
>>> 
>>> and, with this patch, additionally gfx1030?
> 
>> amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the 
>> docs
> 
>> --- a/gcc/doc/install.texi
>> +++ b/gcc/doc/install.texi
>> @@ -1258,12 +1258,12 @@ default set of libraries is selected based on the 
>> value of
>> 
>> @item amdgcn*-*-*
>> @var{list} is a comma separated list of ISA names (allowed values: 
>> @code{fiji},
>> -@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}). It ought not
>> -include the name of the default ISA, specified via @option{--with-arch}.  If
>> -@var{list} is empty, then there will be no multilibs and only the default
>> -run-time library will be built.  If @var{list} is @code{default} or
>> -@option{--with-multilib-list=} is not specified, then the default set of
>> -libraries is selected.
>> +@code{gfx900}, @code{gfx906}, @code{gfx908}, @code{gfx90a}, @code{gfx1030},
>> +@code{gfx1100}).  It ought not include the name of the default ISA, 
>> specified
>> +via @option{--with-arch}.  If @var{list} is empty, then there will be no
>> +multilibs and only the default run-time library will be built.  If 
>> @var{list}
>> +is @code{default} or @option{--with-multilib-list=} is not specified, then
>> +the default set of libraries is selected.
> 
> Further down in that file, we state:
> 
>@anchor{amdgcn-x-amdhsa}
>@heading amdgcn-*-amdhsa
>AMD GCN GPU target.
> 
>Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, 
> [...]
> 
> LLVM 13.0.1 may still be fine for gfx1030
> ('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
> tested), but it's not sufficient for gfx1100 anymore:
> 
>[...]
>checking for suffix of object files... configure: error: in 
> `[...]/amdgcn-amdhsa/gfx1100/libgcc':
>configure: error: cannot compute suffix of object files: cannot compile
>See `config.log' for more details
>make[1]: *** [Makefile:14105: configure-target-libgcc] Error 1
>[...]
> 
> '[...]/amdgcn-amdhsa/gfx1100/libgcc/config.log':
> 
>[...]
>'gfx1100' is not a recognized processor for this target (ignoring 
> processor)
>'gfx1100' is not a recognized processor for this target (ignoring 
> processor)
>/tmp/ccZdohcj.s:1:17: error: .amdgcn_target directive's target id 
> amdgcn-unknown-amdhsa--gfx1100 does not match the specified target id 
> amdgcn-unknown-amdhsa--gfx000
>.amdgcn_target "amdgcn-unknown-amdhsa--gfx1100"
>   ^
>[...]
> 
> Which version of LLVM should we be recommending?

A more recent one ;)  unless we know of any fixed bugs in the assembler we’d 
rely on I‘d day the oldest that works „or later“

Richard 

> 
> Grüße
> Thomas


[patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled (was: [patch] amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs)

2024-01-26 Thread Tobias Burnus

Hi,

Thomas Schwinge wrote:
amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to 
the docs

...
Further down in that file, we state:
 @anchor{amdgcn-x-amdhsa}
 @heading amdgcn-*-amdhsa
 AMD GCN GPU target.
 
 Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, [...]


LLVM 13.0.1 may still be fine for gfx1030
('[...]/amdgcn-amdhsa/gfx1030/libgcc' does get built; I've not further
tested), but it's not sufficient for gfx1100 anymore:


Testing with the system compilers here, llvm-mc-14.0.6 also fails while 
llvm-mc-15.0.7 accepts it.



Which version of LLVM should we be recommending?


>= LLVM 15, I think. How about the following wording? It still mentions 
LLVM 13.0.1 for those that really need it but with for the default 
setup, it requires 15+.


Tobias
install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled

gcc/ChangeLog:

	* doc/install.texi (amdgcn): Recommend LLVM 15+ and newlib 4.4+,
	but keep requiring only newlib 4.3+ and, if gfx1100 is disabled,
	LLVM 13.0.1+.

Signed-off-by: Tobias Burnus 

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 5747b5a12fe..c7794439107 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3927,14 +3927,15 @@ This is a synonym for @samp{x86_64-*-solaris2*}.
 @heading amdgcn-*-amdhsa
 AMD GCN GPU target.
 
-Instead of GNU Binutils, you will need to install LLVM 13.0.1, or later, and copy
+Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 @file{bin/llvm-mc} to @file{amdgcn-amdhsa/bin/as},
 @file{bin/lld} to @file{amdgcn-amdhsa/bin/ld},
 @file{bin/llvm-nm} to @file{amdgcn-amdhsa/bin/nm}, and
 @file{bin/llvm-ar} to both @file{bin/amdgcn-amdhsa-ar} and
-@file{bin/amdgcn-amdhsa-ranlib}.
+@file{bin/amdgcn-amdhsa-ranlib}.  Note that LLVM 13.0.1 or LLVM 14 can be used
+by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}.
 
-Use Newlib (4.3.0 or newer).
+Use Newlib (4.3.0 or newer; 4.4.0 or later is recommended).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use


[wwwdocs][patch] gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

2024-01-26 Thread Tobias Burnus

Mention that gfx1030/gfx1100 are now supported.

As noted in another thread, LLVM 15's assembler is now required, before 
LLVM 13.0.1 would do. (Alternatively, disabling gfx1100 support would 
do.) Hence, the added link to the install documentation.


Comments, suggestions?

Tobias
gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

Signed-off-by: Tobias Burnus 

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index a04b62ff..2d777f52 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -329,6 +329,11 @@ a work-in-progress.
 AMD Radeon (GCN)
 
 
+  Initial support for the AMD Radeon gfx1030 (RDNA2) and
+gfx1100 (RDNA3) devices has been added, which required an
+update of the default
+https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa";>build
+requirements for the build.
   Improved register usage and performance on CDNA Instinct MI100
 and MI200 series devices.
   The default device architecture is now gfx900 (Vega).


RE: [PATCH] libstdc++: Make PSTL algorithms accept C++20 iterators [PR110512]

2024-01-26 Thread Dvorskiy, Mikhail
Hi everyone,

Let me explain a reason of the issue connected with _PSTL_USAGE_WARNINGS macro 
with GCC14.

Firstly, there is no such issue on version GCC 13.2.0, because 
_PSTL_PRAGMA_MESSAGE is defined in #pragma message only if _PSTL_USAGE_WARNINGS 
> 0, please have a look:
https://github.com/gcc-mirror/gcc/blob/releases/gcc-13.2.0/libstdc%2B%2B-v3/include/pstl/pstl_config.h#L160

On GCC trunk there is another preprocessor logic - _PSTL_PRAGMA_MESSAGE is 
defined in #pragma message when _PSTL_USAGE_WARNINGS is merely defined.
https://github.com/gcc-mirror/gcc/blame/9693459e030977d6e906ea7eb587ed09ee4fddbd/libstdc%2B%2B-v3/include/pstl/pstl_config.h#L180
 
while _PSTL_USAGE_WARNINGS is defined to 0 by default:
https://github.com/gcc-mirror/gcc/blame/9693459e030977d6e906ea7eb587ed09ee4fddbd/libstdc%2B%2B-v3/include/pstl/pstl_config.h#L29

Best regards,
Mikhail Dvorskiy

-Original Message-
From: Jonathan Wakely  
Sent: Saturday, January 13, 2024 12:20 PM
To: Pilar Latiesa 
Cc: libstd...@gcc.gnu.org; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libstdc++: Make PSTL algorithms accept C++20 iterators 
[PR110512]

On Sat, 13 Jan 2024 at 09:36, Pilar Latiesa  wrote:
>
> Hi Jonathan
>
> Thanks so much for implementing this.
>
> There are a couple of typos in the patch description: 
> 's/C==17RandomAccessIterator/Cpp17RandomAccessIterator/' and 
> 's/__or_/__and_/'.

Thanks for the comments, I'll fix those.

>
> I've applied your patch localy and it works fine for all my use cases, which 
> admitedly simply consist of using views::zip and views::enumerate with 
> std::for_each. My tbb version is 2020.1.

Thanks for checking it, all feedback is useful.

>
> Unrelated to your patch, with GCC 14, I'm getting a ton of notes for code 
> like:
>
> void f(std::vector &v)
>   { std::for_each(std::execution::par, v.begin(), v.end(), [](int &i) 
> { i *= 2; }); }
>
> indicating that some internal functions are not vectorized.
>
> I very much like getting warnings if an algorithm happens to fall back to its 
> serial version, but in this case I didn't even asked for (unseq) 
> vectorization, yet I got bunch of notes: "Vectorized algorithm unimplemented, 
> redirected to serial" and the algorithm itself is indeed parallelized.
>
> The notes are so confusing that I would suggest undefining 
> _PSTL_USAGE_WARNINGS for G++ (in fact, I don't understand the logic that uses 
> this macro in pstl_config.h).

Yeah, I've seen some of those, and I'm afraid I don't have any ideas about them 
for now. Could you report it to bugzilla so we don't forget to look into it? 
Thanks!

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de 
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


Re: [PATCH V3 2/4] RISC-V: Add vector related pipelines

2024-01-26 Thread Edwin Lu



On 1/25/2024 9:06 AM, Robin Dapp wrote:

Thanks, that looks better IMHO.


+;; Copyright (C) 2011-2024 Free Software Foundation, Inc.
+;; Contributed by Andrew Waterman (and...@sifive.com).
+;; Based on MIPS target for GNU compiler.

You might want to change that, as well as the date.  While at
it you can also fix the broken date in my original file ;)



Completely forgot about this. I'll update it :)


+(define_insn_reservation "vec_load" 6
+  (and (eq_attr "is_inorder" "no")
+   (eq_attr "type" "vlde,vldm,vlds,vldux,vldox,vldff,vldr"))
+  "vxu_ooo_issue,vxu_ooo_alu")

I would rather ditch the is_inorder attribute for now and define
"low" latencies as well as reservations explicitly once we're
sure rather than falling back to scheduler defaults.



I think removing the is_inorder attribute should be ok. I added it 
because I wanted to avoid having two matching insn reservations defined 
since matching solely on the type attribute should also match on all 
subsets as well (i.e. if eventually we add an insn reservation checking 
for type "vlde" and tune "generic-ooo", any "vlde" insn would map to 
both reservations)


For now I should just remove the is_inorder attribute. We will update 
the latencies and add new reservations after we know what they should 
be. Is that correct?



OK with those changes.

Regards
  Robin


[middle-end PATCH] Constant fold {-1,-1} << 1 in simplify-rtx.cc

2024-01-26 Thread Roger Sayle

This patch addresses a missed optimization opportunity in the RTL
optimization passes.  The function simplify_const_binary_operation
will constant fold binary operators with two CONST_INT operands,
and those with two CONST_VECTOR operands, but is missing compile-time
evaluation of binary operators with a CONST_VECTOR and a CONST_INT,
such as vector shifts and rotates.

My first version of this patch didn't contain a switch statement to
explicitly check for valid binary opcodes, which bootstrapped and
regression tested fine, but by paranoia has got the better of me,
so this version now checks that VEC_SELECT or some funky (future)
rtx_code doesn't cause problems.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline (in stage 1)?


2024-01-26  Roger Sayle  

gcc/ChangeLog
* simplify-rtx.cc (simplify_const_binary_operation): Constant
fold binary operations where the LHS is CONST_VECTOR and the
RHS is CONST_INT (or CONST_DOUBLE) such as vector shifts.


Thanks in advance,
Roger
--

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index c7215cf..2e2809a 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -5021,6 +5021,60 @@ simplify_const_binary_operation (enum rtx_code code, 
machine_mode mode,
   return gen_rtx_CONST_VECTOR (mode, v);
 }
 
+  if (VECTOR_MODE_P (mode)
+  && GET_CODE (op0) == CONST_VECTOR
+  && (CONST_SCALAR_INT_P (op1) || CONST_DOUBLE_AS_FLOAT_P (op1))
+  && (CONST_VECTOR_DUPLICATE_P (op0)
+ || CONST_VECTOR_NUNITS (op0).is_constant ()))
+{
+  switch (code)
+   {
+   case PLUS:
+   case MINUS:
+   case MULT:
+   case DIV:
+   case MOD:
+   case UDIV:
+   case UMOD:
+   case AND:
+   case IOR:
+   case XOR:
+   case SMIN:
+   case SMAX:
+   case UMIN:
+   case UMAX:
+   case LSHIFTRT:
+   case ASHIFTRT:
+   case ASHIFT:
+   case ROTATE:
+   case ROTATERT:
+   case SS_PLUS:
+   case US_PLUS:
+   case SS_MINUS:
+   case US_MINUS:
+   case SS_ASHIFT:
+   case US_ASHIFT:
+   case COPYSIGN:
+ break;
+   default:
+ return NULL_RTX;
+   }
+
+  unsigned int npatterns = (CONST_VECTOR_DUPLICATE_P (op0)
+   ? CONST_VECTOR_NPATTERNS (op0)
+   : CONST_VECTOR_NUNITS (op0).to_constant ());
+  rtx_vector_builder builder (mode, npatterns, 1);
+  for (unsigned i = 0; i < npatterns; i++)
+   {
+ rtx x = simplify_binary_operation (code, GET_MODE_INNER (mode),
+CONST_VECTOR_ELT (op0, i), op1);
+ if (!x || !valid_for_const_vector_p (mode, x))
+   return 0;
+ builder.quick_push (x);
+   }
+  return builder.build ();
+}
+
   if (SCALAR_FLOAT_MODE_P (mode)
   && CONST_DOUBLE_AS_FLOAT_P (op0) 
   && CONST_DOUBLE_AS_FLOAT_P (op1)


Re: [PATCH V3 3/4] RISC-V: Use default cost model for insn scheduling

2024-01-26 Thread Edwin Lu

On 1/25/2024 9:06 AM, Robin Dapp wrote:


39 additional unique testsuite failures (scan dumps) will still be present.
I don't know how optimal the new output is compared to the old. Should I update
the testcase expected output to match the new scan dumps?


Currently, without vector op latency, the output should come close
to what's normally considered "good" (i.e. minimal number of vsetvls
and so on).  Therefore I'd suggest not to change the scan dumps to
much except when there is a real problem.  If you have a specific
example that you're unsure about we can discuss this on or off list.



That sounds good to me! I don't have a specific example that stands out. 
It's mostly just not finding the expected number of vsetvls or the 
vector insns have been moved earlier or something like that.


Edwin


Re: [PATCH V3 4/4] RISC-V: Enable assert for insn_has_dfa_reservation

2024-01-26 Thread Edwin Lu

On 1/25/2024 9:06 AM, Robin Dapp wrote:

/* If we ever encounter an insn without an insn reservation, trip
   an assert so we can find and fix this problem.  */
-#if 0
+  if (! insn_has_dfa_reservation_p (insn)) {
+print_rtl(stderr, insn);
+fprintf(stderr, "%d", get_attr_type (insn));
+  }
gcc_assert (insn_has_dfa_reservation_p (insn));
-#endif
  
return more - 1;

  }


Oops accidentally left my debugging statements in the patch.



I was thinking about make the gcc_assert a gcc_checking_assert so,
in case we accidentally forget something at any point, it would
only gracefully degrade in a release build.  As we already have
a hard assert for the type the patch (and not many test with
enable checking anyway) this is OK IMHO.

I suppose you tested with all available -mtune options?



I ran the testsuite on all three tunes using linux rv64gcv. generic-ooo 
had some bugs fixed while rocket and sifive-7-series had around 37 new 
scan dump failures which I think is to be expected. No ICE's from the 
hard gcc_assert on any of the tunes so I think it should be fine as is.


Edwin



Re: [PATCH] c++/modules: Stream additional fields for DECL_STRUCT_FUNCTION [PR113580]

2024-01-26 Thread Jason Merrill

On 1/26/24 10:49, Patrick Palka wrote:

On Fri, 26 Jan 2024, Nathaniel Shead wrote:


This patch just adds enough of the fields from 'function' to fix the ICE
in the linked PR. I suppose there might be more fields from this type
that should be propagated, but I don't know enough to find out which
they might be yet, since a lot of them seem to be only set after
gimplification.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently the DECL_STRUCT_FUNCTION for a declaration is always
reconstructed from scratch. This causes issues though, as some fields
used by other parts of the compiler (in this case, specifically
'function_{start,end}_locus') are then not correctly initialised. This
patch makes sure that these fields are also read and written.


LGTM


Yep, OK.

Jason



Re: [PATCH] c++: implement [[gnu::non_owning]] [PR110358]

2024-01-26 Thread Jason Merrill

On 1/25/24 20:37, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Since -Wdangling-reference has false positives that can't be
prevented, we should offer an easy way to suppress the warning.
Currently, that is only possible by using a #pragma, either around the
enclosing class or around the call site.  But #pragma GCC diagnostic tend
to be onerous.  A better solution would be to have an attribute.  Such
an attribute should not be tied to this particular warning though.  [*]

The warning bogusly triggers for classes that are like std::span,
std::reference_wrapper, and std::ranges::ref_view.  The common property
seems to be that these classes are only wrappers around some data.  So
I chose the name non_owning, but I'm not attached to it.  I hope that
in the future the attribute can be used for something other than this
diagnostic.


You decided not to pursue Barry's request for a bool argument to the 
attribute?


Might it be more useful for the attribute to make reference_like_class_p 
return true, so that we still warn about a temporary of another type 
passing through it?


Jason



Re: [PATCH] c++: #pragma doesn't disable -Wunused-label [PR113582]

2024-01-26 Thread Jason Merrill

On 1/25/24 20:38, Marek Polacek wrote:

Low prio and not a regression.  Feel free to ignore till GCC 15.


OK for stage 1.


Bootstrapped/regtested on x86_64-pc-linux-gnu.

-- >8 --
The PR complains that

   void do_something(){
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wunused-label"
 start:;
 #pragma GCC diagnostic pop
   } #1

doesn't work.  That's because we warn_for_unused_label only while we're
in finish_function, meaning we're at #1 where we're outside the #pragma
region.  We can use suppress_warning + warning_suppressed_p to fix this.

Note that I'm not using TREE_USED.  Propagating it in tsubst_stmt/LABEL_EXPR
from decl to label would mean that we don't warn in do_something2, but
I think we want the warning there: we're in a template and the goto is
a discarded statement.

PR c++/113582

gcc/c-family/ChangeLog:

* c-warn.cc (warn_for_unused_label): Don't warn if -Wunused-label has
been suppressed for the label.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_label_for_labeled_statement): suppress_warning
if it's not enabled at input_location.
* pt.cc (tsubst_stmt): Call copy_warning.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wunused-label-4.C: New test.
---
  gcc/c-family/c-warn.cc  |  4 ++-
  gcc/cp/parser.cc|  6 -
  gcc/cp/pt.cc|  9 ---
  gcc/testsuite/g++.dg/warn/Wunused-label-4.C | 29 +
  4 files changed, 42 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wunused-label-4.C

diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index 8168696fa45..fdb5338f6f6 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -2186,7 +2186,9 @@ warn_for_unused_label (tree label)
  {
if (!TREE_USED (label))
  {
-  if (DECL_INITIAL (label))
+  if (warning_suppressed_p (label, OPT_Wunused_label))
+   /* Don't warn.  */;
+  else if (DECL_INITIAL (label))
warning (OPT_Wunused_label, "label %q+D defined but not used", label);
else
warning (OPT_Wunused_label, "label %q+D declared but not defined", 
label);
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 3748ccd49ff..224d47f2f90 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13093,7 +13093,11 @@ cp_parser_label_for_labeled_statement (cp_parser* 
parser, tree attributes)
/* Anything else must be an ordinary label.  */
label = finish_label_stmt (cp_parser_identifier (parser));
if (label && TREE_CODE (label) == LABEL_DECL)
-   FALLTHROUGH_LABEL_P (label) = fallthrough_p;
+   {
+ FALLTHROUGH_LABEL_P (label) = fallthrough_p;
+ if (!warning_enabled_at (input_location, OPT_Wunused_label))
+   suppress_warning (label, OPT_Wunused_label);
+   }
break;
  }
  
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc

index 74013533b0f..af9fd8f6f03 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -18796,11 +18796,12 @@ tsubst_stmt (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
  case LABEL_EXPR:
{
tree decl = LABEL_EXPR_LABEL (t);
-   tree label;
-
-   label = finish_label_stmt (DECL_NAME (decl));
+   tree label = finish_label_stmt (DECL_NAME (decl));
if (TREE_CODE (label) == LABEL_DECL)
- FALLTHROUGH_LABEL_P (label) = FALLTHROUGH_LABEL_P (decl);
+ {
+   FALLTHROUGH_LABEL_P (label) = FALLTHROUGH_LABEL_P (decl);
+   copy_warning (label, decl);
+ }
if (DECL_ATTRIBUTES (decl) != NULL_TREE)
  cplus_decl_attributes (&label, DECL_ATTRIBUTES (decl), 0);
}
diff --git a/gcc/testsuite/g++.dg/warn/Wunused-label-4.C 
b/gcc/testsuite/g++.dg/warn/Wunused-label-4.C
new file mode 100644
index 000..d194f043d21
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-label-4.C
@@ -0,0 +1,29 @@
+// PR c++/113582
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wunused-label" }
+
+template void
+do_something ()
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wunused-label"
+start:
+  if constexpr(B)
+goto start;
+#pragma GCC diagnostic pop
+}
+
+template void
+do_something2 ()
+{
+start: // { dg-warning "defined but not used" }
+  if constexpr(B)
+goto start;
+}
+
+void
+g ()
+{
+  do_something<0>();
+  do_something2<0>();
+}

base-commit: f22a7ae8a96f7e5e330b12bd5045424619aa4926




Re: [PATCH] c-family: Fix ICE with large column number after restoring a PCH [PR105608]

2024-01-26 Thread Jason Merrill

On 12/5/23 20:52, Lewis Hyatt wrote:

Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105608

There are two related issues here really, a regression since GCC 11 where we
can ICE after restoring a PCH, and a deeper issue with bogus locations
assigned to macros that were defined prior to restoring a PCH.  This patch
fixes the ICE regression with a simple change, and I think it's appropriate
for GCC 14 as well as backport to 11, 12, 13. The bad locations (wrong, but
not generally causing an ICE, and mostly affecting only the output of
-Wunused-macros) are not as problematic, and will be harder to fix. I could
take a stab at that for GCC 15. In the meantime the patch adds XFAILed
tests for the wrong locations (as well as passing tests for the regression
fix). Does it look OK please? Bootstrap + regtest all languages on x86-64
Linux. Thanks!


OK for trunk and branches, thanks!


-- >8 --

Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH.  However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.

This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
The new testcase line-map-3.C contains XFAILed examples where the locations
are wrong.

This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long.  This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.

gcc/c-family/ChangeLog:

PR preprocessor/105608
* c-pch.cc (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.

gcc/testsuite/ChangeLog:

PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
* g++.dg/pch/line-map-3.C: New test.
* g++.dg/pch/line-map-3.Hs: New test.
---
  gcc/c-family/c-pch.cc  |  5 ++---
  gcc/testsuite/g++.dg/pch/line-map-1.C  |  4 
  gcc/testsuite/g++.dg/pch/line-map-1.Hs |  1 +
  gcc/testsuite/g++.dg/pch/line-map-2.C  |  6 ++
  gcc/testsuite/g++.dg/pch/line-map-2.Hs |  1 +
  gcc/testsuite/g++.dg/pch/line-map-3.C  | 23 +++
  gcc/testsuite/g++.dg/pch/line-map-3.Hs |  1 +
  7 files changed, 38 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/pch/line-map-1.C
  create mode 100644 gcc/testsuite/g++.dg/pch/line-map-1.Hs
  create mode 100644 gcc/testsuite/g++.dg/pch/line-map-2.C
  create mode 100644 gcc/testsuite/g++.dg/pch/line-map-2.Hs
  create mode 100644 gcc/testsuite/g++.dg/pch/line-map-3.C
  create mode 100644 gcc/testsuite/g++.dg/pch/line-map-3.Hs

diff --git a/gcc/c-family/c-pch.cc b/gcc/c-family/c-pch.cc
index 2f014fca210..9ee6f179002 100644
--- a/gcc/c-family/c-pch.cc
+++ b/gcc/c-family/c-pch.cc
@@ -342,6 +342,8 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
gt_pch_restore (f);
cpp_set_line_map (pfile, line_table);
rebuild_location_adhoc_htab (line_table);
+  line_table->trace_includes = saved_trace_includes;
+  linemap_add (line_table, LC_ENTER, 0, saved_loc.file, saved_loc.line);
  
timevar_push (TV_PCH_CPP_RESTORE);

if (cpp_read_state (pfile, name, f, smd) != 0)
@@ -355,9 +357,6 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
  
fclose (f);
  
-  line_table->trace_includes = saved_trace_includes;

-  linemap_add (line_table, LC_ENTER, 0, saved_loc.file, saved_loc.line);
-
/* Give the front end a chance to take action after a PCH file has
   been loaded.  */
if (lang_post_pch_load)
diff --git a/gcc/testsuite/g++.dg/pch/line-map-1.C 
b/gcc/testsuite/g++.dg/pch/line-map-1.C
new file mode 100644
index 000..9d1ac6d1683
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/line-map-1.C
@@ -0,0 +1,4 @@
+/* PR preprocessor/105608 */
+/* { dg-do compile } */
+#define MACRO_ON_A_LONG_LINE "this line is long enough that it forces the line 
table to create an LC_RENAME map, which formerly triggered an ICE after PCH restore"
+#include "line-map-1.H"
diff --git a/gcc/testsuite/g++.dg/pch/line-map-1.Hs 
b/gcc/testsuite/g++.dg/pch/line-map-1.Hs
new file mode 100644
index 000..3b6178bfae0
--- /dev

Re: [PATCH 2/2] RISC-V/testsuite: Also verify if-conversion runs for pr105314.c

2024-01-26 Thread Maciej W. Rozycki
On Wed, 24 Jan 2024, Jeff Law wrote:

> >   Do we have consensus now to move forward with this change as posted?  I'd
> > like to get these patches ticked off ASAP.
> I think it should move forward.  I think having the RTL tests deals with
> Andrew's concern and the testcase adjustment has value as well.
> 
> I ACK's the RTL tests a few minutes ago and we should consider the 1/2 and 2/2
> of the original OK now as well.

 I have committed both patch series now, thank you for your assistance and 
review.

  Maciej


Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-01-26 Thread Jason Merrill

On 1/25/24 14:18, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13?  This isn't a very satisfactory fix, but at least
it safely fixes these testcases I guess.  Note that there's
implementation disagreement about the second testcase, GCC always
accepted it but Clang/MSVC/icc reject it.


Because of trying to initialize int& from {c}; removing the extra braces 
makes it work everywhore.


https://eel.is/c++draft/dcl.init#list-3.10 says that we always generate 
a prvalue in this case, so perhaps we shouldn't recalculate if the 
initializer is an init-list?


The first testcase is special because it's a C-style cast; seems like 
the maybe_valid = false heuristics should be disabled if c_cast_p.


Jason



Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-01-26 Thread Jason Merrill

On 1/26/24 16:52, Jason Merrill wrote:

On 1/25/24 14:18, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13?  This isn't a very satisfactory fix, but at least
it safely fixes these testcases I guess.  Note that there's
implementation disagreement about the second testcase, GCC always
accepted it but Clang/MSVC/icc reject it.


Because of trying to initialize int& from {c}; removing the extra braces 
makes it work everywhore.


https://eel.is/c++draft/dcl.init#list-3.10 says that we always generate 
a prvalue in this case, so perhaps we shouldn't recalculate if the 
initializer is an init-list?


...but it seems bad to silently bind a const int& to a prvalue instead 
of directly to the reference returned by the operator, as clang does if 
we add const to the second testcase, so I think there's a defect in the 
standard here.


Maybe for now also disable the maybe_valid heuristics in the case of an 
init-list?


The first testcase is special because it's a C-style cast; seems like 
the maybe_valid = false heuristics should be disabled if c_cast_p.


Jason




Re: [PATCH] c++: problematic assert in reference_binding [PR113141]

2024-01-26 Thread Jason Merrill

On 1/26/24 17:11, Jason Merrill wrote:

On 1/26/24 16:52, Jason Merrill wrote:

On 1/25/24 14:18, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/13?  This isn't a very satisfactory fix, but at least
it safely fixes these testcases I guess.  Note that there's
implementation disagreement about the second testcase, GCC always
accepted it but Clang/MSVC/icc reject it.


Because of trying to initialize int& from {c}; removing the extra 
braces makes it work everywhore.


https://eel.is/c++draft/dcl.init#list-3.10 says that we always 
generate a prvalue in this case, so perhaps we shouldn't recalculate 
if the initializer is an init-list?


...but it seems bad to silently bind a const int& to a prvalue instead 
of directly to the reference returned by the operator, as clang does if 
we add const to the second testcase, so I think there's a defect in the 
standard here.


Perhaps bullet 3.9 should change to "...its referenced type is 
reference-related to E or scalar, ..."


Maybe for now also disable the maybe_valid heuristics in the case of an 
init-list?


The first testcase is special because it's a C-style cast; seems like 
the maybe_valid = false heuristics should be disabled if c_cast_p.


Jason






Re: [PATCH] testsuite: Fix vect_long_mult on Power [PR109705]

2024-01-26 Thread Andrew Pinski
On Mon, Jan 15, 2024 at 6:43 PM Kewen.Lin  wrote:
>
> Hi,
>
> As pointed out by the discussion in PR109705, the current
> vect_long_mult effective target check on Power is broken.
> This patch is to fix it accordingly.
>
> With additional change by adding a guard vect_long_mult
> in gcc.dg/vect/pr25413a.c , it's tested well on Power{8,9}
> LE & BE (also on Power10 LE as before).

I see this is still broken for 32bit PowerPC where vect_long_mult
should return true still since long there is 32bit and there is a
32bit vector multiply.
Can someone test (and apply if approved) the attached patch to see if
it fixes pr25413a.c for powerpc*-*-* for 32bit?

Thanks,
Andrew Pinski

>
> I'm going to push this soon.
>
> BR,
> Kewen
> -
> PR testsuite/109705
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp (check_effective_target_vect_long_mult):
> Fix powerpc*-*-* checks.
> ---
>  gcc/testsuite/lib/target-supports.exp | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 81ae92a0266..fac32fb3d0e 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -9073,9 +9073,9 @@ proc check_effective_target_vect_int_mult { } {
>
>  proc check_effective_target_vect_long_mult { } {
>  if { [istarget i?86-*-*] || [istarget x86_64-*-*]
> -|| (([istarget powerpc*-*-*]
> -  && ![istarget powerpc-*-linux*paired*])
> -  && [check_effective_target_ilp32])
> +|| ([istarget powerpc*-*-*]
> + && [check_effective_target_powerpc_vsx_ok]
> + && [check_effective_target_has_arch_pwr10])
>  || [is-effective-target arm_neon]
>  || ([istarget sparc*-*-*] && [check_effective_target_ilp32])
>  || [istarget aarch64*-*-*]
> --
> 2.39.1
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 9fc5aae166d..52a2aec9982 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9086,8 +9086,9 @@ proc check_effective_target_vect_int_mult { } {
 proc check_effective_target_vect_long_mult { } {
 if { [istarget i?86-*-*] || [istarget x86_64-*-*]
 || ([istarget powerpc*-*-*]
- && [check_effective_target_powerpc_vsx_ok]
- && [check_effective_target_has_arch_pwr10])
+ && ([check_effective_target_ilp32]
+ || ([check_effective_target_powerpc_vsx_ok]
+ && [check_effective_target_has_arch_pwr10])))
 || [is-effective-target arm_neon]
 || ([istarget sparc*-*-*] && [check_effective_target_ilp32])
 || ([istarget aarch64*-*-*]


Re: [Committed] RISC-V: Add regression test for vsetvl bug pr113429

2024-01-26 Thread Patrick O'Neill

What target/config are these failures on?
I tried rv64gcv, rv64gc, rv32gcv, and rv32gc with RUNTESTFLAGS="rvv.exp" 
and don't see these failures.


Thanks,
Patrick

On 1/25/24 23:20, juzhe.zh...@rivai.ai wrote:

This patch causes the following regression:

FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O0  (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O1  (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2  (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer 
-finline-functions  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O3 -g  (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -Os  (test for excess 
errors)


I suggest you add :

/* { dg-require-effective-target rv64 } */
/* { dg-require-effective-target riscv_v } */


juzhe.zh...@rivai.ai

*From:* Patrick O'Neill 
*Date:* 2024-01-24 09:20
*To:* juzhe.zh...@rivai.ai; gcc-patches

*CC:* kito.cheng ; law
; rdapp ;
vineetg 
*Subject:* [Committed] RISC-V: Add regression test for vsetvl bug
pr113429

The reduced testcase for pr113429 (cam4 failure) needed additional
modules so it wasn't committed.
The fuzzer found a c testcase that was also fixed with pr113429's fix.
Adding it as a regression test.
PR target/113429
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr113429.c: New test.
Signed-off-by: Patrick O'Neill
---
  .../gcc.target/riscv/rvv/vsetvl/pr113429.c| 70 +++
  1 file changed, 70 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
new file mode 100644
index 000..05c3eeecb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+
+long a;
+int b, c, d, e, f, g;
+short h, i, j;
+static int k = 3;
+static int l = 6;
+int m[5][7];
+signed char n;
+int *const o = &c;
+
+signed char(p)(signed char p1, signed char q) {
+  return p1 / q;
+}
+
+void s(unsigned p1) {
+  b = (b ^ p1) & 255;
+}
+
+static long t() {
+  long u;
+  signed char v;
+  d = 1;
+  for (; d <= 4; d++) {
+j = 0;
+for (; j <= 4; j++) {
+  v = 0;
+  for (; v <= 4; v++) {
+if (m[v][v])
+  continue;
+c = 0;
+for (; c <= 4; c++) {
+  n = 0;
+  for (; n <= 4; n++) {
+int *w = &e;
+long r = v;
+u = r == 0 ? a : a % r;
+h |= u;
+*w = g;
+--m[n][c];
+f &= *o;
+  }
+}
+if (p((i < 3) ^ 9, k))
+  ;
+else if (v)
+  return 0;
+  }
+}
+  }
+  return 1;
+}
+
+static char x() {
+  for (;;) {
+t();
+if (l)
+  return 0;
+  }
+}
+
+int main() {
+  x();
+  s(e & 255);
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
-- 
2.34.1


Re: Re: [Committed] RISC-V: Add regression test for vsetvl bug pr113429

2024-01-26 Thread 钟居哲
newlib rv32gcv



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2024-01-27 08:38
To: juzhe.zh...@rivai.ai; gcc-patches
CC: kito.cheng; law; rdapp; vineetg
Subject: Re: [Committed] RISC-V: Add regression test for vsetvl bug pr113429
What target/config are these failures on?
I tried rv64gcv, rv64gc, rv32gcv, and rv32gc with RUNTESTFLAGS="rvv.exp" and 
don't see these failures.

Thanks,
Patrick
On 1/25/24 23:20, juzhe.zh...@rivai.ai wrote:
This patch causes the following regression:

FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -O3 -g  (test for excess errors)
FAIL: gcc.target/riscv/rvv/vsetvl/pr113429.c   -Os  (test for excess errors)

I suggest you add :

/* { dg-require-effective-target rv64 } */
/* { dg-require-effective-target riscv_v } */



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2024-01-24 09:20
To: juzhe.zh...@rivai.ai; gcc-patches
CC: kito.cheng; law; rdapp; vineetg
Subject: [Committed] RISC-V: Add regression test for vsetvl bug pr113429
The reduced testcase for pr113429 (cam4 failure) needed additional
modules so it wasn't committed.
The fuzzer found a c testcase that was also fixed with pr113429's fix.
Adding it as a regression test.
PR target/113429
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr113429.c: New test.
Signed-off-by: Patrick O'Neill 
---
 .../gcc.target/riscv/rvv/vsetvl/pr113429.c| 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
new file mode 100644
index 000..05c3eeecb94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+
+long a;
+int b, c, d, e, f, g;
+short h, i, j;
+static int k = 3;
+static int l = 6;
+int m[5][7];
+signed char n;
+int *const o = &c;
+
+signed char(p)(signed char p1, signed char q) {
+  return p1 / q;
+}
+
+void s(unsigned p1) {
+  b = (b ^ p1) & 255;
+}
+
+static long t() {
+  long u;
+  signed char v;
+  d = 1;
+  for (; d <= 4; d++) {
+j = 0;
+for (; j <= 4; j++) {
+  v = 0;
+  for (; v <= 4; v++) {
+if (m[v][v])
+  continue;
+c = 0;
+for (; c <= 4; c++) {
+  n = 0;
+  for (; n <= 4; n++) {
+int *w = &e;
+long r = v;
+u = r == 0 ? a : a % r;
+h |= u;
+*w = g;
+--m[n][c];
+f &= *o;
+  }
+}
+if (p((i < 3) ^ 9, k))
+  ;
+else if (v)
+  return 0;
+  }
+}
+  }
+  return 1;
+}
+
+static char x() {
+  for (;;) {
+t();
+if (l)
+  return 0;
+  }
+}
+
+int main() {
+  x();
+  s(e & 255);
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
-- 
2.34.1



[PATCH] RISC-V: Add require-effective-target to pr113429 testcase

2024-01-26 Thread Patrick O'Neill
The pr113429 testcase fails with newlib spike runs. Adding
require-effective-target rv64 and riscv_v fixes the issue.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr113429.c: Add
require-effective-target rv64 and riscv_v

Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc newlib spike.
---
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
index 05c3eeecb94..a7f5db616d8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr113429.c
@@ -1,5 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-require-effective-target riscv_v } */
 
 long a;
 int b, c, d, e, f, g;
-- 
2.34.1



Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread chenglulu



在 2024/1/26 下午6:57, Xi Ruoyao 写道:

On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:

在 2024/1/26 下午4:49, Xi Ruoyao 写道:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
    1. Add macro support for TLS symbols
    2. Added support for loading __get_tls_addr symbol address using call36.
    3. Merge template got_load_tls_{ld/gd/le/ie}.
    4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.



The following situations are not handled in the patch:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc


index 3fab4b64453..6336a9f696f 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,

 {
   if (TARGET_CMODEL_EXTREME)
    {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
+ else
+   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
  insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
    }
   else
@@ -7482,7 +7488,15 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,

   else
 {
   if (TARGET_CMODEL_EXTREME)
-   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+   {
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
+ else
+   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+   }
   else
    loongarch_emit_move (temp1, fnaddr);




Re: [PATCH v4] RISC-V: Implement TLS Descriptors.

2024-01-26 Thread Fangrui Song
On Mon, Dec 4, 2023 at 11:02 PM Tatsuyuki Ishi  wrote:
>
> This implements TLS Descriptors (TLSDESC) as specified in [1].
>
> The 4-instruction sequence is implemented as a single RTX insn for
> simplicity, but this can be revisited later if instruction scheduling or
> more flexible RA is desired.
>
> The default remains to be the traditional TLS model, but can be configured
> with --with_tls={trad,desc}. The choice can be revisited once toolchain
> and libc support ships.

typo: --with-tls={trad,desc}.

The patch looks good to me, but I am not an approver:) I only have
some nitpicky comments.

Reviewed-by: Fangrui Song 

I know that the binutils patch needs some work, but I think it doesn't
have to block this patch.
BTW, Clang got -mtls-dialect=desc today, so one can test GCC by  gcc
-S -fpic -mtls-dialect + clang -c + ld.lld + musl
https://maskray.me/blog/2024-01-23-riscv-tlsdesc-works

I actually tested my ld.lld patch by compiling with GCC and manually
replacing TLS GD code sequence with TLSDESC :)

> [1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373.
>
> gcc/Changelog:
> * config/riscv/riscv.opt: Add -mtls-dialect to configure TLS flavor.
> * config.gcc: Add --with_tls configuration option to change the
> default TLS flavor.
> * config/riscv/riscv.h: Add TARGET_TLSDESC determined from
> -mtls-dialect and with_tls defaults.
> * config/riscv/riscv-opts.h: Define enum riscv_tls_type for the
> two TLS flavors.
> * config/riscv/riscv-protos.h: Define SYMBOL_TLSDESC symbol type.
> * config/riscv/riscv.md: Add instruction sequence for TLSDESC.
> * config/riscv/riscv.cc (riscv_symbol_insns): Add instruction
> sequence length data for TLSDESC.
> (riscv_legitimize_tls_address): Add lowering of TLSDESC.
> * doc/install.texi: Document --with-tls for RISC-V.
> * doc/invoke.texi: Document -mtls-dialect for RISC-V.
> * testsuite/gcc.target/riscv/tls_1.x: Add TLSDESC GD test case.
> * testsuite/gcc.target/riscv/tlsdesc.c: Same as above.
> ---
> No regression in gcc tests for rv64gc, tested alongside the binutils and
> glibc implementation. Tested with --with_tls=desc.
>
> v2: Add with_tls configuration option, and a few readability improvements.
> Added Changelog.
> v3: Add documentation per Kito's suggestion.
> Fix minor issues pointed out by Kito and Jeff.
> Thanks Kito Cheng and Jeff Law for review.
> v4: Add TLSDESC GD assembly test.
> Rebase on top of trunk.
>
>  gcc/config.gcc   | 15 ++-
>  gcc/config/riscv/riscv-opts.h|  6 ++
>  gcc/config/riscv/riscv-protos.h  |  5 +++--
>  gcc/config/riscv/riscv.cc| 24 
>  gcc/config/riscv/riscv.h |  9 +++--
>  gcc/config/riscv/riscv.md| 20 +++-
>  gcc/config/riscv/riscv.opt   | 14 ++
>  gcc/doc/install.texi |  3 +++
>  gcc/doc/invoke.texi  | 13 -
>  gcc/testsuite/gcc.target/riscv/tls_1.x   |  5 +
>  gcc/testsuite/gcc.target/riscv/tlsdesc.c | 12 
>  11 files changed, 115 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/tls_1.x
>  create mode 100644 gcc/testsuite/gcc.target/riscv/tlsdesc.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 748430194f3..8bb22e9f590 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2490,6 +2490,7 @@ riscv*-*-linux*)
> # Force .init_array support.  The configure script cannot always
> # automatically detect that GAS supports it, yet we require it.
> gcc_cv_initfini_array=yes
> +   with_tls=${with_tls:-trad}
> ;;
>  riscv*-*-elf* | riscv*-*-rtems*)
> tm_file="elfos.h newlib-stdint.h ${tm_file} riscv/elf.h"
> @@ -2532,6 +2533,7 @@ riscv*-*-freebsd*)
> # Force .init_array support.  The configure script cannot always
> # automatically detect that GAS supports it, yet we require it.
> gcc_cv_initfini_array=yes
> +   with_tls=${with_tls:-trad}
> ;;
>
>  loongarch*-*-linux*)
> @@ -4658,7 +4660,7 @@ case "${target}" in
> ;;
>
> riscv*-*-*)
> -   supported_defaults="abi arch tune riscv_attribute isa_spec"
> +   supported_defaults="abi arch tune riscv_attribute isa_spec 
> tls"
>
> case "${target}" in
> riscv-* | riscv32*) xlen=32 ;;
> @@ -4788,6 +4790,17 @@ case "${target}" in
> ;;
> esac
> fi
> +   # Handle --with-tls.
> +   case "$with_tls" in
> +   "" \
> +   | trad | desc)
> +   # OK
> +   ;;
> +   *)
> +   echo "Unknown TLS method used in 
> --with-tls=$with

[PATCH] LoongArch: Fix soft-float builds of libffi

2024-01-26 Thread Yang Yujie
This patch correspond to the upstream PR:
https://github.com/libffi/libffi/pull/817

libffi/ChangeLog:

* src/loongarch64/ffi.c: Avoid defining floats
in struct call_context if the ABI is soft-float.
---
 libffi/src/loongarch64/ffi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libffi/src/loongarch64/ffi.c b/libffi/src/loongarch64/ffi.c
index 140be3bc3dc..01c2e18a395 100644
--- a/libffi/src/loongarch64/ffi.c
+++ b/libffi/src/loongarch64/ffi.c
@@ -58,7 +58,9 @@
 */
 typedef struct call_context
 {
+#if !defined(__loongarch_soft_float)
   ABI_FLOAT fa[8];
+#endif
   size_t a[10];
 } call_context;
 
-- 
2.43.0