[PATCH] bitint: Handle VCE from large/huge _BitInt SSA_NAME from load [PR114156]

2024-03-01 Thread Jakub Jelinek
Hi!

When adding checks for the cases in which a VIEW_CONVERT_EXPR from
large/huge _BitInt to vector/complex etc. must not be merged, I missed
the case of loads.  Those are handled differently later.
Anyway, I think the load case is something we can handle just fine,
so the following patch does that instead of preventing the merging in
gimple_lower_bitint; without the merging we'd copy from memory to memory
and do the VCE only on the second copy, and it is just better to VCE the
first one.
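
As a rough source-level picture of what a view conversion straight from
memory means, here is a minimal sketch.  It is only an analogy written in
plain C, not GCC internals: the types wide128 and pair64 and the function
view_convert_from_memory are hypothetical stand-ins for a 128-bit _BitInt
object in memory and for the vector/complex destination type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit _BitInt object living in memory.  */
struct wide128 { unsigned char bytes[16]; };

/* Hypothetical stand-in for the vector/complex destination type.  */
struct pair64 { uint64_t lane[2]; };

/* Reinterpret the 16 bytes of *SRC as a pair64 without changing any bit.
   The bytes are read directly from the source memory; no intermediate
   temporary of the source type is created first.  */
struct pair64
view_convert_from_memory (const struct wide128 *src)
{
  struct pair64 out;
  assert (sizeof out == sizeof *src);
  memcpy (&out, src, sizeof out);
  return out;
}
```

The point of the sketch is only that the same bits are given a different
type in a single step from the original memory location.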

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-01  Jakub Jelinek  

PR middle-end/114156
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Allow
rhs1 of a VCE to have no underlying variable if it is a load and
handle that case.

* gcc.dg/bitint-96.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-02-24 12:44:27.993108306 +0100
+++ gcc/gimple-lower-bitint.cc  2024-02-29 19:28:59.442020619 +0100
@@ -5329,6 +5329,22 @@ bitint_large_huge::lower_stmt (gimple *s
               gimple_assign_set_rhs1 (stmt, rhs1);
               gimple_assign_set_rhs_code (stmt, SSA_NAME);
             }
+          else if (m_names == NULL
+                   || !bitmap_bit_p (m_names, SSA_NAME_VERSION (rhs1)))
+            {
+              gimple *g = SSA_NAME_DEF_STMT (rhs1);
+              gcc_assert (gimple_assign_load_p (g));
+              tree mem = gimple_assign_rhs1 (g);
+              tree ltype = TREE_TYPE (lhs);
+              addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (mem));
+              if (as != TYPE_ADDR_SPACE (ltype))
+                ltype
+                  = build_qualified_type (ltype,
+                                          TYPE_QUALS (ltype)
+                                          | ENCODE_QUAL_ADDR_SPACE (as));
+              rhs1 = build1 (VIEW_CONVERT_EXPR, ltype, mem);
+              gimple_assign_set_rhs1 (stmt, rhs1);
+            }
           else
             {
               int part = var_to_partition (m_map, rhs1);
--- gcc/testsuite/gcc.dg/bitint-96.c.jj 2024-02-29 19:37:27.441032088 +0100
+++ gcc/testsuite/gcc.dg/bitint-96.c    2024-02-29 19:36:34.815753879 +0100
@@ -0,0 +1,17 @@
+/* PR middle-end/114156 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+
+#if __BITINT_MAXWIDTH__ >= 128
+_BitInt(128) a, b;
+#else
+int a, b;
+#endif
+
+void
+foo (void)
+{
+  int u = b;
+  __builtin_memmove (&a, &b, sizeof (a));
+}

Jakub



RE: Re:[PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-03-01 Thread Demin Han
Hi juzhe,

I also thought at first that it was related to commutativity.

The following things led me to do the removal:

1.  No tests fail in regression testing

2.  When I write if (a == 2) and if (2 == a), the results are the same

3.  The vec_duplicate operand is the 5th operand in both the cmp and eqne 
patterns. I think they are equivalent.


From: 钟居哲 
Sent: March 1, 2024 15:24
To: Demin Han ; gcc-patches 

Cc: kito.cheng ; Li, Pan2 ; 
jeffreyalaw ; Robin Dapp 
Subject: Re:[PATCH 1/5] RISC-V: Remove float vector eqne pattern

Hello, han.  Thanks for trying to optimize the code.

But I believe the vector-scalar patterns (eq/ne) you remove in this patch are 
necessary.

This is the story:
1. For commutative RTL codes in GCC like plus, eq, ne, etc.,
we know that semantically both (eq: (reg) (vec_duplicate ... )) and (eq: 
(vec_duplicate ... ) (reg)) are right.
However, as I remember, GCC prefers this order: (eq: (vec_duplicate ... ) 
(reg)).

2. Before this patch, the operand order of the comparisons is as follows
(taking eq and lt as examples):

1). (eq: (vec_duplicate ... ) (reg))  --> commutative
2). (lt: (reg) (vec_duplicate ... ))  --> non-commutative

   The operand orders of these patterns are different.

   So, you see, we have dedicated patterns (which look like duplicates) for 
vector-scalar eq/ne, whereas we unify eq/ne with the other comparisons for 
vector-vector instructions.
   If we unify eq/ne with the other comparisons for vector-scalar instructions 
(like your patch does), we will end up with:

   (eq: (reg) (vec_duplicate ... )) [after this patch] instead of (eq: 
(vec_duplicate ... ) (reg)) [before this patch].

So, I think this patch may not be right.
I may be wrong; Robin/Jeff/Kito, feel free to correct me.



-- Original --
From: "demin.han" <demin@starfivetech.com>
Date: Fri, Mar 1, 2024 02:27 PM
To: "gcc-patches" <gcc-patches@gcc.gnu.org>
Cc: "juzhe.zhong" <juzhe.zh...@rivai.ai>; "kito.cheng" <kito.ch...@gmail.com>;
"Li, Pan2" <pan2...@intel.com>; "jeffreyalaw" <jeffreya...@gmail.com>
Subject: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

We can unify eqne and other comparison operations.

Tested on RV32 and RV64

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove eqne cond
* config/riscv/vector.md (@pred_eqne_scalar): Remove patterns
(*pred_eqne_scalar_merge_tie_mask): Ditto
(*pred_eqne_scalar): Ditto
(*pred_eqne_scalar_narrow): Ditto

Signed-off-by: demin.han <demin@starfivetech.com>
---
 .../riscv/riscv-vector-builtins-bases.cc  |  4 -
 gcc/config/riscv/vector.md| 86 ---
 2 files changed, 90 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index b6f6e4ff37e..d414721ede8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1420,10 +1420,6 @@ public:
 switch (e.op_info->op)
   {
  case OP_TYPE_vf: {
-   if (CODE == EQ || CODE == NE)
- return e.use_compare_insn (CODE, code_for_pred_eqne_scalar (
-e.vector_mode ()));
-   else
  return e.use_compare_insn (CODE, code_for_pred_cmp_scalar (
 e.vector_mode ()));
  }
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ab6e099852d..9210d7c28ad 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7520,92 +7520,6 @@ (define_insn "*pred_cmp_scalar_narrow"
(set_attr "mode" "")
(set_attr "spec_restriction" "none,thv,thv,none,none")])

-(define_expand "@pred_eqne_scalar"
-  [(set (match_operand: 0 "register_operand")
- (if_then_else:
-   (unspec:
- [(match_operand: 1 "vector_mask_operand")
-  (match_operand 6 "vector_length_operand")
-  (match_operand 7 "const_int_operand")
-  (match_operand 8 "const_int_operand")
-  (reg:SI VL_REGNUM)
-  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (match_operator: 3 "equality_operator"
-  [(vec_duplicate:V_VLSF
- (match_operand: 5 "register_operand"))
-   (match_operand:V_VLSF 4 "register_operand")])
-   (match_operand: 2 "vector_merge_operand")))]
-  "TARGET_VECTOR"
-  {})
-
-(define_insn "*pred_eqne_scalar_merge_tie_mask"
-  [(set (match_operand: 0 "register_operand"  "=vm")
- (if_then_else:
-   (unspec:
- [(match_operand: 1 "register_operand" "  0")
-  (match_operand 5 "vector_length_operand" " rK")
-  (match_operand 6 "const_int_operand" "  i")
-  (match_operand 7 "const_int_operand" "  i")
-  (reg:SI VL_REGNUM)
-  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (match_operator: 2 "equality_operator"
-  [(vec_duplicate:V_VLSF
- (match_operand: 4 "register_operand" "  f"))
-   (match_operand:V_VLSF 3 "register_operand"  " vr")])
-   (match_dup 1)))]
-  "TARGET_VECTOR"
-  "vmf%B2.vf\t%0,%3,%4,v0.t"
-  [(set_attr "

RE: Re:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm

2024-03-01 Thread Demin Han
Hi juzhe,

Yes, this patch does not work for comparisons between a vector and a scalar
variable, because the scalar is duplicated in the loop vectorizer pass.
I have not found a solution for that situation, so I solved the vector-imm
comparison first.
Thanks for the reminder; I will try that patch.

Thanks.

From: 钟居哲 
Sent: March 1, 2024 15:49
To: Demin Han ; gcc-patches 

Cc: kito.cheng ; Li, Pan2 ; 
jeffreyalaw ; Robin Dapp ; 
richard.sandiford 
Subject: Re:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec 
and imm

Hi, han. I understand you are trying to optimize vector-splat_vector 
into vector-scalar in the "expand" stage, that is,

vv -> vx or vv -> vf.

It's an issue we have known about for a long time.

This patch transforms vv->vf when the splat vector is duplicated 
from a constant (by recognizing that it is a CONST_VECTOR in the expand stage),
but it can't transform vv->vf when the splat vector is duplicated from a register.

For example, in a[i] = b[i] > x ? c[i] : d[i], x is in a register, so this case 
cannot be optimized with your patch.
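
To make the two cases concrete, here is a minimal sketch of the loops in
question.  The function names are hypothetical and not taken from the patch
or the testsuite; which instructions a given compiler emits for them is
exactly what this thread is discussing, so nothing is claimed about that
here.

```c
#include <assert.h>
#include <stddef.h>

/* Comparison against a loop-invariant scalar held in a register:
   the vectorizer has to splat X from a register.  */
void
select_gt_scalar (float *a, const float *b, const float *c,
                  const float *d, float x, size_t n)
{
  assert (a && b && c && d);
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] > x ? c[i] : d[i];
}

/* Comparison against a compile-time constant: the splat operand is a
   constant vector at expand time.  */
void
select_gt_const (float *a, const float *b, const float *c,
                 const float *d, size_t n)
{
  assert (a && b && c && d);
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] > 2.0f ? c[i] : d[i];
}
```

Both functions compute the same result when x is 2.0f; they differ only in
where the comparison operand comes from.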

Actually, we have a solution that does all possible transformations (including
the case I mentioned above) from vv to vx or vf: the late-combine pass, which
was contributed by Richard Sandiford of Arm: 
https://patchwork.ozlabs.org/project/gcc/patch/mptr0ljn9eh@arm.com/
You can apply this patch and experiment with it locally yourself.

And I believe it will land in GCC 15. So I don't think we need this patch 
to do the optimization.

Thanks.

-- Original --
From: "demin.han" <demin@starfivetech.com>
Date: Fri, Mar 1, 2024 02:27 PM
To: "gcc-patches" <gcc-patches@gcc.gnu.org>
Cc: "juzhe.zhong" <juzhe.zh...@rivai.ai>; "kito.cheng" <kito.ch...@gmail.com>;
"Li, Pan2" <pan2...@intel.com>; "jeffreyalaw" <jeffreya...@gmail.com>
Subject: [PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec
and imm

Currently, the following instructions are generated by autovectorization:
flw
vsetvli
vfmv.v.f
...
vmfxx.vv
Two issues:
  1. Additional vsetvl and vfmv instructions
  2. One more vector register is occupied, which may result in a smaller LMUL

We expect:
flw
...
vmfxx.vf

Tested on RV32 and RV64

gcc/ChangeLog:

* config/riscv/autovec.md: Accept imm
* config/riscv/riscv-v.cc (get_cmp_insn_code): Select scalar pattern
(expand_vec_cmp): Ditto
* config/riscv/riscv.cc (riscv_const_insns): Exclude float mode

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Add new tests

Signed-off-by: demin.han <demin@starfivetech.com>
---
 gcc/config/riscv/autovec.md   |  2 +-
 gcc/config/riscv/riscv-v.cc   | 23 +
 gcc/config/riscv/riscv.cc |  2 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 34 +++
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..6cfb0800c45 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -690,7 +690,7 @@ (define_expand "vec_cmp"
   [(set (match_operand: 0 "register_operand")
  (match_operator: 1 "comparison_operator"
[(match_operand:V_VLSF 2 "register_operand")
-(match_operand:V_VLSF 3 "register_operand")]))]
+(match_operand:V_VLSF 3 "nonmemory_operand")]))]
   "TARGET_VECTOR"
   {
 riscv_vector::expand_vec_cmp_float (operands[0], GET_CODE (operands[1]),
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 14e75b9a117..2a188ac78e0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2610,9 +2610,15 @@ expand_vec_init (rtx target, rtx vals)
 /* Get insn code for corresponding comparison.  */

 static insn_code
-get_cmp_insn_code (rtx_code code, machine_mode mode)
+get_cmp_insn_code (rtx_code code, machine_mode mode, bool scalar_p)
 {
   insn_code icode;
+  if (FLOAT_MODE_P (mode))
+{
+  icode = !scalar_p ? code_for_pred_cmp (mode)
+ : code_for_pred_cmp_scalar (mode);
+  return icode;
+}
   switch (code)
 {
 case EQ:
@@ -2628,10 +2634,7 @@ get_cmp_insn_code (rtx_code code, machine_mode mode)
 case LTU:
 case GE:
 case GEU:
-  if (FLOAT_MODE_P (mode))
- icode = code_for_pred_cmp (mode);
-  else
- icode = code_for_pred_ltge (mode);
+  icode = code_for_pred_ltge (mode);
   break;
 default:
   gcc_unreachable ();
@@ -2757,7 +2760,6 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask,
 {
   machine_mode mask_mode = GET_MODE (target);
   machine_mode data_mode = GET_MODE (op0);
-  insn_code icode = get_cmp_insn_code (code, data_mode);

   if (code == LTGT)
 {
@@ -2765,12 +2767,19 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask,
   rtx gt = gen_reg_rtx (mask_mode);
   expand_vec_cmp (lt, LT, op0, op1, mask, maskoff);
   expand_vec_cmp (gt, GT, op0, op1, mask, maskoff);
- 

Re: [PATCH] RISC-V: Add riscv_vector_cc function attribute

2024-03-01 Thread Kito Cheng
Thanks for your patch! This is generally in good shape, just a few
minor comments :)


> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 2135dfde9c8..afe486ba47b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -6314,6 +6314,18 @@ Permissible values for this parameter are @code{user}, @code{supervisor},
>  and @code{machine}.  If there is no parameter, then it defaults to
>  @code{machine}.
>
> +@cindex @code{riscv_vector_cc} function attribute, RISC-V
> +@item riscv_vector_cc
> +Use this attribute to force the function to use the vector calling
> +convention variant.
> +For more information on riscv_vector_cc, please see
> +@uref{https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67}

Please remove the two lines above; I don't think it's a good idea to
reference a pull request link here :P

> +
> +@smallexample
> +void foo() __attribute__((riscv_vector_cc));
> +[[riscv::vector_cc]] void foo(); // For C++11 and C23
> +@end smallexample
> +
>  @end table
>
>  The following target-specific function attributes are available for the

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c b/gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c
> new file mode 100644
> index 000..7db9d874bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c
> @@ -0,0 +1,117 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O1" } */
> +/* { dg-final { check-function-bodies "**" "" } } */

I would like to avoid scanning the asm body if possible, since it
might cause problems when we improve code generation, so could you try
to scan for .variant_cc\t like
gcc/testsuite/gcc.target/aarch64/pcs_attribute-3.c does?

Then we can also drop -O1 from the options :)

> +
> +#include 

Drop this.


> +void __attribute__((riscv_vector_cc))
> +foo2 (int a)
> +{
> +  int8_t data[1024];

Just use char rather than int8_t; I would like to remove unnecessary
header includes if possible :)


Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-03-01 Thread Tobias Burnus

Hi all, hi John & Thomas

John David Anglin wrote:
> On 2024-02-29 6:02 p.m., Thomas Schwinge wrote:
> > I wonder: shouldn't that cap at 50 threads happen inside libgomp,
> > generally, instead of per test case and user code (!)?
>
> > Per my
> > understanding, OpenMP 'num_threads' specifies a *desired* number of
> > threads; the implementation may limit that value.
> Sounds like a good suggestion.

I concur – if the hardware/OS doesn't support more.

* * *

However – for completeness and to correct a statement: While num_threads 
specifies the desired number of threads, 'strict' will turn this into 
error termination if the implementation cannot fulfill the request.


Namely, "if prescriptiveness is specified as 'strict' and Algorithm 11.1 
would result in a number of threads other than the value of the first 
item of the _nthreads_ list then runtime error termination is performed."


Note that 'strict' for num_threads is new in/since the OpenMP 6.0 draft 
(TR11, I think) and not yet implemented in GCC.


However, I guess that the thread limit also affects 'teams' and nested 
parallelization. And for teams 'num_teams(n)' sets lower = upper value 
to 'n' — Thus, this enforces this number of teams. (While 
'num_teams(m:n)' sets both limits and 'omp_set_num_teams(n)' or 
OMP_NUM_TEAMS=n only set the upper bound).


[As far as I can see, OpenACC always permits an implementation to use 
fewer gangs/workers/vectors if the hardware doesn't support the 
requested number.]
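
For illustration, the non-strict behaviour can be observed with a small
sketch like the following.  The helper name observed_team_size is
hypothetical; the code only uses omp_get_num_threads and the num_threads
clause, and it falls back to a serial run when built without -fopenmp.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: ask for REQUESTED threads and report the team size
   the runtime actually provided.  Without OpenMP support no parallel
   region is created and the result is 1.  */
int
observed_team_size (int requested)
{
  int got = 1;
  assert (requested > 0);
#ifdef _OPENMP
#pragma omp parallel num_threads (requested)
  {
    /* Exactly one thread of the team records the team size.  */
#pragma omp single
    got = omp_get_num_threads ();
  }
#endif
  return got;
}
```

Without 'strict', a result smaller than the requested value is a valid
outcome, so callers should treat the argument as an upper bound rather than
a guarantee.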


Tobias



[PATCH] function: Fix another TYPE_NO_NAMED_ARGS_STDARG_P spot

2024-03-01 Thread Jakub Jelinek
Hi!

When looking at PR114175 (although that bug seems to be now a riscv backend
bug), I've noticed that for the TYPE_NO_NAMED_ARGS_STDARG_P functions which
return value through hidden reference, like
#include <stdarg.h>

struct S { char a[64]; };
int n;

struct S
foo (...)
{
  struct S s = {};
  va_list ap;
  va_start (ap);
  for (int i = 0; i < n; ++i)
    if ((i & 1))
      s.a[0] += va_arg (ap, double);
    else
      s.a[0] += va_arg (ap, int);
  va_end (ap);
  return s;
}
we were incorrectly calling assign_parms_setup_varargs twice, once
at the start of the function and once in
  if (cfun->stdarg && !DECL_CHAIN (parm))
assign_parms_setup_varargs (&all, &data, false);
where parm is the last and only "named" parameter.

The first call, guarded with TYPE_NO_NAMED_ARGS_STDARG_P, was added in
r13-3549 and is needed for int bar (...) etc. functions using
va_start/va_arg/va_end, otherwise the 
  FOR_EACH_VEC_ELT (fnargs, i, parm)
in which the other call is will not iterate at all.  But we shouldn't
be doing that if we have the hidden return pointer.
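
As a rough source-level picture of the hidden return pointer case, the
sketch below spells out by hand what the parameter list effectively looks
like.  This is only an analogy in pre-C23 syntax, not what GCC builds
internally; the names foo_lowered and hidden_ret are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

struct S { char a[64]; };
int n;

/* Hand-written analogue of struct S foo (...) once the return value is
   passed by hidden reference: HIDDEN_RET is then the one and only "named"
   parameter, and the variable arguments follow it.  */
void
foo_lowered (struct S *hidden_ret, ...)
{
  struct S s = { { 0 } };
  va_list ap;
  assert (hidden_ret != NULL);
  va_start (ap, hidden_ret);
  for (int i = 0; i < n; ++i)
    if ((i & 1))
      s.a[0] += va_arg (ap, double);
    else
      s.a[0] += va_arg (ap, int);
  va_end (ap);
  /* The "return" is a store through the hidden pointer.  */
  *hidden_ret = s;
}
```

With a parameter list of this shape, the loop over the arguments does
iterate once, which is why the varargs setup only needs to be done there.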

With the following patch on the above testcase with -O0 -std=c23 the
assembly difference is:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    pushq   %rbx
    subq    $192, %rsp
    .cfi_offset 3, -24
-   movq    %rdi, -192(%rbp)
-   movq    %rsi, -184(%rbp)
-   movq    %rdx, -176(%rbp)
-   movq    %rcx, -168(%rbp)
-   movq    %r8, -160(%rbp)
-   movq    %r9, -152(%rbp)
-   testb   %al, %al
-   je      .L2
-   movaps  %xmm0, -144(%rbp)
-   movaps  %xmm1, -128(%rbp)
-   movaps  %xmm2, -112(%rbp)
-   movaps  %xmm3, -96(%rbp)
-   movaps  %xmm4, -80(%rbp)
-   movaps  %xmm5, -64(%rbp)
-   movaps  %xmm6, -48(%rbp)
-   movaps  %xmm7, -32(%rbp)
-.L2:
    movq    %rdi, -312(%rbp)
    movq    %rdi, -192(%rbp)
    movq    %rsi, -184(%rbp)
    movq    %rdx, -176(%rbp)
    movq    %rcx, -168(%rbp)
    movq    %r8, -160(%rbp)
    movq    %r9, -152(%rbp)
    testb   %al, %al
-   je      .L13
+   je      .L12
    movaps  %xmm0, -144(%rbp)
    movaps  %xmm1, -128(%rbp)
    movaps  %xmm2, -112(%rbp)
    movaps  %xmm3, -96(%rbp)
    movaps  %xmm4, -80(%rbp)
    movaps  %xmm5, -64(%rbp)
    movaps  %xmm6, -48(%rbp)
    movaps  %xmm7, -32(%rbp)
-.L13:
+.L12:
plus some renumbering of labels later on, which clearly shows
that because of this bug, we were saving all the registers twice
rather than once.  With -O2 -std=c23 some of it is DCEd, but we still get
    subq    $160, %rsp
    .cfi_def_cfa_offset 168
-   testb   %al, %al
-   je      .L2
-   movaps  %xmm0, 24(%rsp)
-   movaps  %xmm1, 40(%rsp)
-   movaps  %xmm2, 56(%rsp)
-   movaps  %xmm3, 72(%rsp)
-   movaps  %xmm4, 88(%rsp)
-   movaps  %xmm5, 104(%rsp)
-   movaps  %xmm6, 120(%rsp)
-   movaps  %xmm7, 136(%rsp)
-.L2:
    movq    %rdi, -24(%rsp)
    movq    %rsi, -16(%rsp)
    movq    %rdx, -8(%rsp)
    movq    %rcx, (%rsp)
    movq    %r8, 8(%rsp)
    movq    %r9, 16(%rsp)
    testb   %al, %al
-   je      .L13
+   je      .L12
    movaps  %xmm0, 24(%rsp)
    movaps  %xmm1, 40(%rsp)
    movaps  %xmm2, 56(%rsp)
    movaps  %xmm3, 72(%rsp)
    movaps  %xmm4, 88(%rsp)
    movaps  %xmm5, 104(%rsp)
    movaps  %xmm6, 120(%rsp)
    movaps  %xmm7, 136(%rsp)
-.L13:
+.L12:
difference, i.e. this time not all, but the floating point args
were conditionally all saved twice.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-01  Jakub Jelinek  

* function.cc (assign_parms): Only call assign_parms_setup_varargs
early for TYPE_NO_NAMED_ARGS_STDARG_P functions if fnargs is empty.

--- gcc/function.cc.jj  2024-01-12 13:47:20.834428745 +0100
+++ gcc/function.cc 2024-02-29 21:14:35.275889093 +0100
@@ -3650,7 +3650,8 @@ assign_parms (tree fndecl)
   assign_parms_initialize_all (&all);
   fnargs = assign_parms_augmented_arg_list (&all);
 
-  if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl)))
+  if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl))
+  && fnargs.is_empty ())
 {
   struct assign_parm_data_one data = {};
   assign_parms_setup_varargs (&all, &data, false);

Jakub



Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-03-01 Thread Jakub Jelinek
On Fri, Mar 01, 2024 at 09:29:01AM +0100, Tobias Burnus wrote:
> John David Anglin wrote:
> > On 2024-02-29 6:02 p.m., Thomas Schwinge wrote:
> > > I wonder: shouldn't that cap at 50 threads happen inside libgomp,
> > > generally, instead of per test case and user code (!)?
> 
> > > Per my
> > > understanding, OpenMP 'num_threads' specifies a *desired* number of
> > > threads; the implementation may limit that value.
> > Sounds like a good suggestion.
> 
> I concur – if the hardware/OS doesn't support more.
> 
> * * *
> 
> However – for completeness and to correct a statement: While num_threads
> specifies the desired number of threads, 'strict' will turn this into error
> termination if the implementation cannot fulfilled the request.
> 
> Namely, "if prescriptiveness is specified as 'strict' and Algorithm 11.1
> would result in a number of threads other than the value of the first item
> of the _nthreads_ list then runtime error termination is performed."
> 
> Note that 'strict' for num_threads is new in/since the OpenMP 6.0 draft
> (TR11, I think) and not yet implemented in GCC.

Also note that if hppa-linux really has such low thread limits, we can't
simply try to add some hack in gomp_resolve_num_threads where it would lower
the result if larger than 50, because if the limit is max 50 threads per
process, just doing nested parallelism and asking for 25 threads in the
outer and 4 in the inner in each will run over that limit, or teams 4
with max 25 threads in parallel inside of it, or user can use pthread_create
in the process as well, etc.
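
The arithmetic behind the nested case can be written down as a tiny
sketch.  The helper name nested_thread_total is hypothetical, and the
model is deliberately simple: every outer thread becomes a member of its
own inner team, so the threads alive at once are outer times inner.

```c
#include <assert.h>

/* Hypothetical helper: number of threads alive at once when each of
   OUTER threads starts an inner team of INNER threads.  Each outer
   thread is itself one member of its inner team.  */
int
nested_thread_total (int outer, int inner)
{
  assert (outer > 0 && inner > 0);
  return outer * inner;
}
```

For 25 outer and 4 inner threads this gives 100, twice a 50-thread cap,
even though each individual request stays below 50.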

So, if at all possible, the best thing would be to change the kernel so that
the number of threads is limited just by the memory available for stacks and
a user-tweakable limit.
E.g. on my ws I have
cat /proc/sys/kernel/threads-max
253865

Isn't this just that you have 50 in there?

Jakub



Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-03-01 Thread Helge Deller

On 3/1/24 09:44, Jakub Jelinek wrote:
> On Fri, Mar 01, 2024 at 09:29:01AM +0100, Tobias Burnus wrote:
> > John David Anglin wrote:
> > > On 2024-02-29 6:02 p.m., Thomas Schwinge wrote:
> > > > I wonder: shouldn't that cap at 50 threads happen inside libgomp,
> > > > generally, instead of per test case and user code (!)?
> >
> > > > Per my
> > > > understanding, OpenMP 'num_threads' specifies a *desired* number of
> > > > threads; the implementation may limit that value.
> > > Sounds like a good suggestion.
> >
> > I concur – if the hardware/OS doesn't support more.
> >
> > * * *
> >
> > However – for completeness and to correct a statement: While num_threads
> > specifies the desired number of threads, 'strict' will turn this into error
> > termination if the implementation cannot fulfilled the request.
> >
> > Namely, "if prescriptiveness is specified as 'strict' and Algorithm 11.1
> > would result in a number of threads other than the value of the first item
> > of the _nthreads_ list then runtime error termination is performed."
> >
> > Note that 'strict' for num_threads is new in/since the OpenMP 6.0 draft
> > (TR11, I think) and not yet implemented in GCC.
>
> Also note that if hppa-linux really has such low thread limits, we can't
> simply try to add some hack in gomp_resolve_num_threads where it would lower
> the result if larger than 50, because if the limit is max 50 threads per
> process, just doing nested parallelism and asking for 25 threads in the
> outer and 4 in the inner in each will run over that limit, or teams 4
> with max 25 threads in parallel inside of it, or user can use pthread_create
> in the process as well, etc.
>
> So, if at all possible, the best thing would be to change kernel so that
> the number of threads is limited just by available memory for stacks and
> user tweakable limit.
> E.g. on my ws I have
> cat /proc/sys/kernel/threads-max
> 253865
>
> Isn't this just that you have 50 in there?


On my physical parisc machines I see:

root@panama(4GB RAM):~# cat /proc/sys/kernel/threads-max
39696

root@parisc(8GB RAM):~# cat /proc/sys/kernel/threads-max
63861

So, I wonder where the 50 comes from.
On parisc stacks grow upwards and the initial maximum stack
size is limited, so maybe this influences the 50 ?

Helge


[PATCH v2] RISC-V: Add riscv_vector_cc function attribute

2024-03-01 Thread Li Xu
From: xuli 

By default, the standard vector calling convention variant is only enabled
when a function has a vector argument or return value. However, a user may
also want to call a function without vector arguments or return values from
a vectorized loop, and with the default convention that causes a huge
performance penalty due to vector register store/restore.

So the user can declare a function with the riscv_vector_cc attribute as
shown below, which forces the function to use the standard vector calling
convention variant.

void foo() __attribute__((riscv_vector_cc));
[[riscv::vector_cc]] void foo(); // For C++11 and C23

For more details, please refer to the link below.
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67

gcc/ChangeLog:

* config/riscv/riscv.cc (TARGET_GNU_ATTRIBUTES): Add riscv_vector_cc
attribute to riscv_attribute_table.
(riscv_vector_cc_function_p): Return true if FUNC is a riscv_vector_cc 
function.
(riscv_fntype_abi): Add riscv_vector_cc attribute check.
* doc/extend.texi: Add riscv_vector_cc attribute description.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c: 
New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c: New test.
---
 gcc/config/riscv/riscv.cc | 55 ---
 gcc/doc/extend.texi   | 10 
 .../base/attribute-riscv_vector_cc-error.C| 21 +++
 .../attribute-riscv_vector_cc-callee-saved.c  | 30 ++
 .../base/attribute-riscv_vector_cc-error.c| 11 
 5 files changed, 119 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4100abc9dd1..7f37f231796 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -537,24 +537,52 @@ static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
 /* Defining target-specific uses of __attribute__.  */
-TARGET_GNU_ATTRIBUTES (riscv_attribute_table,
+static const attribute_spec riscv_gnu_attributes[] =
 {
   /* Syntax: { name, min_len, max_len, decl_required, type_required,
   function_type_required, affects_type_identity, handler,
   exclude } */
 
   /* The attribute telling no prologue/epilogue.  */
-  { "naked",   0,  0, true, false, false, false,
-riscv_handle_fndecl_attribute, NULL },
+  {"naked", 0, 0, true, false, false, false, riscv_handle_fndecl_attribute,
+   NULL},
   /* This attribute generates prologue/epilogue for interrupt handlers.  */
-  { "interrupt", 0, 1, false, true, true, false,
-riscv_handle_type_attribute, NULL },
+  {"interrupt", 0, 1, false, true, true, false, riscv_handle_type_attribute,
+   NULL},
 
   /* The following two are used for the built-in properties of the Vector type
  and are not used externally */
   {"RVV sizeless type", 4, 4, false, true, false, true, NULL, NULL},
-  {"RVV type", 0, 0, false, true, false, true, NULL, NULL}
-});
+  {"RVV type", 0, 0, false, true, false, true, NULL, NULL},
+  /* This attribute is used to declare a function, forcing it to use the
+standard vector calling convention variant. Syntax:
+__attribute__((riscv_vector_cc)). */
+  {"riscv_vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_gnu_attribute_table  =
+{
+  "gnu", {riscv_gnu_attributes}
+};
+
+static const attribute_spec riscv_attributes[] =
+{
+  /* This attribute is used to declare a function, forcing it to use the
+ standard vector calling convention variant. Syntax:
+ [[riscv::vector_cc]]. */
+  {"vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_nongnu_attribute_table =
+{
+  "riscv", {riscv_attributes}
+};
+
+static const scoped_attribute_specs *const riscv_attribute_table[] =
+{
+  &riscv_gnu_attribute_table,
+  &riscv_nongnu_attribute_table
+};
 
 /* Order for the CLOBBERs/USEs of gpr_save.  */
 static const unsigned gpr_save_reg_order[] = {
@@ -5425,6 +5453,16 @@ riscv_arguments_is_vector_type_p (const_tree fntype)
   return false;
 }
 
+/* Return true if FUNC is a riscv_vector_cc function.
+   For more details please reference the below link.
+   https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67 */
+static bool
+riscv_vector_cc_function_p (const_tree fntype)
+{
+  return lookup_attribute ("vector_cc", TYPE_ATTRIBUTES (fntype)) != NULL_TREE
+|| lookup_attribute ("riscv_vector_cc", TYPE_ATTRIBUTES (fntype)) != NULL_TREE;
+}
+
 /* Implement TARGET_FNTYP

[PATCH] Allow patterns in SLP reductions

2024-03-01 Thread Richard Biener
The following removes the over-broad rejection of patterns for SLP
reductions which is done by removing them from LOOP_VINFO_REDUCTIONS
during pattern detection.  That's also insufficient in case the
pattern only appears on the reduction path.  Instead this implements
the proper correctness check in vectorizable_reduction and guides
SLP discovery to heuristically avoid forming later invalid groups.

I also couldn't find any testcase that FAILs when allowing the SLP
reductions to form so I've added one.

I came across this for single-lane SLP reductions with the all-SLP
work where we rely on patterns to properly vectorize COND_EXPR
reductions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

Richard.

* tree-vect-patterns.cc (vect_pattern_recog_1): Do not
remove reductions involving patterns.
* tree-vect-loop.cc (vectorizable_reduction): Reject SLP
reduction groups with multiple lane-reducing reductions.
* tree-vect-slp.cc (vect_analyze_slp_instance): When discovering
SLP reduction groups avoid including lane-reducing ones.

* gcc.dg/vect/vect-reduc-sad-9.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c | 68 
 gcc/tree-vect-loop.cc| 15 +
 gcc/tree-vect-patterns.cc| 13 
 gcc/tree-vect-slp.cc | 26 +---
 4 files changed, 101 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
new file mode 100644
index 000..3c6af4510f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
@@ -0,0 +1,68 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-msse4.2" { target { x86_64-*-* i?86-*-* } } } */
+/* { dg-require-effective-target vect_usad_char } */
+
+#include 
+#include "tree-vect.h"
+
+#define N 64
+
+unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
+int abs (int);
+
+/* Sum of absolute differences between arrays of unsigned char types.
+   Detected as a sad pattern.
+   Vectorized on targets that support sad for unsigned chars.  */
+
+__attribute__ ((noinline)) int
+foo (int len, int *res2)
+{
+  int i;
+  int result = 0;
+  int result2 = 0;
+
+  for (i = 0; i < len; i++)
+{
+  /* Make sure we are not using an SLP reduction for this.  */
+  result += abs (X[2*i] - Y[2*i]);
+  result2 += abs (X[2*i + 1] - Y[2*i + 1]);
+}
+
+  *res2 = result2;
+  return result;
+}
+
+
+int
+main (void)
+{
+  int i;
+  int sad;
+
+  check_vect ();
+
+  for (i = 0; i < N/2; i++)
+{
+  X[2*i] = i;
+  Y[2*i] = N/2 - i;
+  X[2*i+1] = i;
+  Y[2*i+1] = 0;
+  __asm__ volatile ("");
+}
+
+
+  int sad2;
+  sad = foo (N/2, &sad2);
+  if (sad != (N/2)*(N/4))
+abort ();
+  if (sad2 != (N/2-1)*(N/2)/2)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 35f1f8c7d42..13dcdba403a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7703,6 +7703,21 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   return false;
 }
 
+  /* Lane-reducing ops also never can be used in a SLP reduction group
+ since we'll mix lanes belonging to different reductions.  But it's
+ OK to use them in a reduction chain or when the reduction group
+ has just one element.  */
+  if (lane_reduc_code_p
+  && slp_node
+  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
+  && SLP_TREE_LANES (slp_node) > 1)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"lane-reducing reduction in reduction group.\n");
+  return false;
+}
+
   /* All uses but the last are expected to be defined in the loop.
  The last use is the reduction variable.  In case of nested cycle this
  assumption is not true: we use reduc_index to record the index of the
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..fe1ffba8688 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -7172,7 +7172,6 @@ vect_pattern_recog_1 (vec_info *vinfo,
  vect_recog_func *recog_func, stmt_vec_info stmt_info)
 {
   gimple *pattern_stmt;
-  loop_vec_info loop_vinfo;
   tree pattern_vectype;
 
   /* If this statement has already been replaced with pattern statements,
@@ -7198,8 +7197,6 @@ vect_pattern_recog_1 (vec_info *vinfo,
   return;
 }
 
-  loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
- 
   /* Fou
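
The expected values checked in the new test, and the reason the two
reductions must not share lanes, can be illustrated with a scalar sketch.
This is not part of the patch; the helper names fill_inputs, sad_pair and
sad_all_lanes are made up for illustration. sad_all_lanes models, under
that assumption, what a lane-reducing operation would compute if lanes of
both reductions were folded together.

```c
#include <assert.h>
#include <stdlib.h>

#define N 64

/* Fill the arrays the same way main () of the test does.  */
static void
fill_inputs (unsigned char *x, unsigned char *y)
{
  for (int i = 0; i < N / 2; i++)
    {
      x[2 * i] = i;
      y[2 * i] = N / 2 - i;
      x[2 * i + 1] = i;
      y[2 * i + 1] = 0;
    }
}

/* Scalar reference for foo (): two independent sums of absolute
   differences, one over the even and one over the odd elements.  */
static void
sad_pair (const unsigned char *x, const unsigned char *y, int len,
          int *even, int *odd)
{
  assert (len >= 0);
  *even = 0;
  *odd = 0;
  for (int i = 0; i < len; i++)
    {
      *even += abs (x[2 * i] - y[2 * i]);
      *odd += abs (x[2 * i + 1] - y[2 * i + 1]);
    }
}

/* Hypothetical model of folding all lanes into one accumulator:
   the even and odd contributions can no longer be told apart.  */
static int
sad_all_lanes (const unsigned char *x, const unsigned char *y, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs (x[i] - y[i]);
  return sum;
}
```

With the test's inputs the even sum matches (N/2)*(N/4) and the odd sum
matches (N/2-1)*(N/2)/2, while the folded sum is only their total.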

Re: [PATCH v2] RISC-V: Add riscv_vector_cc function attribute

2024-03-01 Thread Kito Cheng
LGTM, thanks :)

On Fri, Mar 1, 2024 at 5:10 PM Li Xu  wrote:
>
> From: xuli 
>
> By default, the standard vector calling convention variant is only enabled
> when a function has a vector argument or return value.  However, a user may
> also want to call a function without vector arguments or return value from
> a vectorized loop, which causes a huge performance penalty due to vector
> register store/restore.
>
> So the user can declare the function with the riscv_vector_cc attribute as
> shown below, which forces the function to use the standard vector calling
> convention variant.
>
> void foo() __attribute__((riscv_vector_cc));
> [[riscv::vector_cc]] void foo(); // For C++11 and C23
>
> For more details please reference the below link.
> https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67
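
A minimal usage sketch follows, assuming a compiler that may or may not
know the attribute. The VECTOR_CC macro and the functions scale and
sum_scaled are hypothetical names, not from the patch; the guard on
__riscv_vector is an assumption so that the code still builds where the
vector extension (and hence the attribute) is unavailable.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Only apply the attribute where the target is expected to accept it.  */
#if defined(__riscv_vector) && __has_attribute(riscv_vector_cc)
# define VECTOR_CC __attribute__((riscv_vector_cc))
#else
# define VECTOR_CC
#endif

/* A scalar helper with no vector arguments or return value.  */
VECTOR_CC int scale (int x);

int
scale (int x)
{
  return 3 * x;
}

/* A loop the compiler might vectorize while still calling scale ().  */
static int
sum_scaled (const int *v, size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += scale (v[i]);
  return sum;
}
```

On other targets the macro expands to nothing, so the sketch only shows
where the attribute would be placed.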
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (TARGET_GNU_ATTRIBUTES): Add riscv_vector_cc
> attribute to riscv_attribute_table.
> (riscv_vector_cc_function_p): Return true if FUNC is a 
> riscv_vector_cc function.
> (riscv_fntype_abi): Add riscv_vector_cc attribute check.
> * doc/extend.texi: Add riscv_vector_cc attribute description.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C: New test.
> * gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c: New test.
> * gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c: New test.
> ---
>  gcc/config/riscv/riscv.cc | 55 ---
>  gcc/doc/extend.texi   | 10 
>  .../base/attribute-riscv_vector_cc-error.C| 21 +++
>  .../attribute-riscv_vector_cc-callee-saved.c  | 30 ++
>  .../base/attribute-riscv_vector_cc-error.c| 11 
>  5 files changed, 119 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 4100abc9dd1..7f37f231796 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -537,24 +537,52 @@ static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
>  static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
>
>  /* Defining target-specific uses of __attribute__.  */
> -TARGET_GNU_ATTRIBUTES (riscv_attribute_table,
> +static const attribute_spec riscv_gnu_attributes[] =
>  {
>/* Syntax: { name, min_len, max_len, decl_required, type_required,
>function_type_required, affects_type_identity, handler,
>exclude } */
>
>/* The attribute telling no prologue/epilogue.  */
> -  { "naked",   0,  0, true, false, false, false,
> -riscv_handle_fndecl_attribute, NULL },
> +  {"naked", 0, 0, true, false, false, false, riscv_handle_fndecl_attribute,
> +   NULL},
>/* This attribute generates prologue/epilogue for interrupt handlers.  */
> -  { "interrupt", 0, 1, false, true, true, false,
> -riscv_handle_type_attribute, NULL },
> +  {"interrupt", 0, 1, false, true, true, false, riscv_handle_type_attribute,
> +   NULL},
>
>/* The following two are used for the built-in properties of the Vector type
>   and are not used externally */
>{"RVV sizeless type", 4, 4, false, true, false, true, NULL, NULL},
> -  {"RVV type", 0, 0, false, true, false, true, NULL, NULL}
> -});
> +  {"RVV type", 0, 0, false, true, false, true, NULL, NULL},
> +  /* This attribute is used to declare a function, forcing it to use the
> +standard vector calling convention variant. Syntax:
> +__attribute__((riscv_vector_cc)). */
> +  {"riscv_vector_cc", 0, 0, false, true, true, true, NULL, NULL}
> +};
> +
> +static const scoped_attribute_specs riscv_gnu_attribute_table  =
> +{
> +  "gnu", {riscv_gnu_attributes}
> +};
> +
> +static const attribute_spec riscv_attributes[] =
> +{
> +  /* This attribute is used to declare a function, forcing it to use the
> + standard vector calling convention variant. Syntax:
> + [[riscv::vector_cc]]. */
> +  {"vector_cc", 0, 0, false, true, true, true, NULL, NULL}
> +};
> +
> +static const scoped_attribute_specs riscv_nongnu_attribute_table =
> +{
> +  "riscv", {riscv_attributes}
> +};
> +
> +static const scoped_attribute_specs *const riscv_attribute_table[] =
> +{
> +  &riscv_gnu_attribute_table,
> +  &riscv_nongnu_attribute_table
> +};
>
>  /* Order for the CLOBBERs/USEs of gpr_save.  */
>  static const unsigned gpr_save_reg_order[] = {
> @@ -5425,6 +5453,16 @@ riscv_arguments_is_vector_type_p (const_tree fntype)
>return false;
>  }
>
> +/* Return true if FUNC is a riscv_vector_cc function.
> +   For more details please reference the below link.
> +   https://github.com/riscv-non-i

Re: [PATCH] tree-optimization/110221 - SLP and loop mask/len

2024-03-01 Thread Andre Vieira (lists)

Hi,

Bootstrapped and tested the gcc-13 backport of this on gcc-12 for 
aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu and no regressions.


OK to push to gcc-12 branch?

Kind regards,
Andre Vieira

On 10/11/2023 13:16, Richard Biener wrote:

The following fixes the issue that when SLP stmts are internal defs
but appear invariant because they end up only using invariant defs
then they get scheduled outside of the loop.  This nice optimization
breaks down when loop masks or lens are applied since those are not
explicitly tracked as dependences.  The following makes sure to never
schedule internal defs outside of the vectorized loop when the
loop uses masks/lens.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110221
* tree-vect-slp.cc (vect_schedule_slp_node): When loop
masking / len is applied make sure to not schedule
internal defs outside of the loop.

* gfortran.dg/pr110221.f: New testcase.
---
  gcc/testsuite/gfortran.dg/pr110221.f | 17 +
  gcc/tree-vect-slp.cc | 10 ++
  2 files changed, 27 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/pr110221.f

diff --git a/gcc/testsuite/gfortran.dg/pr110221.f b/gcc/testsuite/gfortran.dg/pr110221.f
new file mode 100644
index 000..8b57384313a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr110221.f
@@ -0,0 +1,17 @@
+C PR middle-end/68146
+C { dg-do compile }
+C { dg-options "-O2 -w" }
+C { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { target avx512f } }
+  SUBROUTINE CJYVB(V,Z,V0,CBJ,CDJ,CBY,CYY)
+  IMPLICIT DOUBLE PRECISION (A,B,G,O-Y)
+  IMPLICIT COMPLEX*16 (C,Z)
+  DIMENSION CBJ(0:*),CDJ(0:*),CBY(0:*)
+  N=INT(V)
+  CALL GAMMA2(VG,GA)
+  DO 65 K=1,N
+CBY(K)=CYY
+65CONTINUE
+  CDJ(0)=V0/Z*CBJ(0)-CBJ(1)
+  DO 70 K=1,N
+70  CDJ(K)=-(K+V0)/Z*CBJ(K)+CBJ(K-1)
+  END
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3e5814c3a31..80e279d8f50 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9081,6 +9081,16 @@ vect_schedule_slp_node (vec_info *vinfo,
/* Emit other stmts after the children vectorized defs which is
 earliest possible.  */
gimple *last_stmt = NULL;
+  if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ {
+   /* But avoid scheduling internal defs outside of the loop when
+  we might have only implicitly tracked loop mask/len defs.  */
+   gimple_stmt_iterator si
+ = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
+   last_stmt = *si;
+ }
bool seen_vector_def = false;
FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)


[PATCH] s390: Streamline vector builtins with LLVM

2024-03-01 Thread Stefan Schulze Frielinghaus
Similar to s390_lcbb, s390_vll, s390_vstl, et al., make use of a
signed vector type for vlbb.  Furthermore, a const void pointer seems
more common, as does an integer for the mask.

For s390_vfi(s,d)b make use of integers for masks, too.

Use unsigned integers for all s390_vlbr/vstbr variants.

Make use of type UV16QI for the length operand of s390_vstrs(,z)(h,f).

Following the Principles of Operation, change from signed to unsigned
type for s390_va(c,cc,ccc)q and s390_vs(,c,bc)biq and s390_vmslg.

Make use of scalar type UINT128 instead of UV16QI for s390_vgfm(,a)g,
and s390_vsumq(f,g).

Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Update to reflect latest
changes.
* config/s390/s390-builtins.def: Streamline vector builtins with
LLVM.
---
 gcc/config/s390/s390-builtin-types.def | 29 +++-
 gcc/config/s390/s390-builtins.def  | 48 +-
 2 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def b/gcc/config/s390/s390-builtin-types.def
index 556104e0e23..ce51ae8cd3f 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -58,6 +58,7 @@ DEF_TYPE (BT_FLT, float_type_node, 0)
 DEF_TYPE (BT_FLTCONST, float_type_node, 1)
 DEF_TYPE (BT_INT, integer_type_node, 0)
 DEF_TYPE (BT_INT128, intTI_type_node, 0)
+DEF_TYPE (BT_INT128CONST, intTI_type_node, 1)
 DEF_TYPE (BT_INTCONST, integer_type_node, 1)
 DEF_TYPE (BT_LONG, long_integer_type_node, 0)
 DEF_TYPE (BT_LONGLONG, long_long_integer_type_node, 0)
@@ -69,6 +70,8 @@ DEF_TYPE (BT_SHORTCONST, short_integer_type_node, 1)
 DEF_TYPE (BT_UCHAR, unsigned_char_type_node, 0)
 DEF_TYPE (BT_UCHARCONST, unsigned_char_type_node, 1)
 DEF_TYPE (BT_UINT, unsigned_type_node, 0)
+DEF_TYPE (BT_UINT128, unsigned_intTI_type_node, 0)
+DEF_TYPE (BT_UINT128CONST, unsigned_intTI_type_node, 1)
 DEF_TYPE (BT_UINT64, c_uint64_type_node, 0)
 DEF_TYPE (BT_UINTCONST, unsigned_type_node, 1)
 DEF_TYPE (BT_ULONG, long_unsigned_type_node, 0)
@@ -79,6 +82,7 @@ DEF_TYPE (BT_USHORTCONST, short_unsigned_type_node, 1)
 DEF_TYPE (BT_VOID, void_type_node, 0)
 DEF_TYPE (BT_VOIDCONST, void_type_node, 1)
 DEF_VECTOR_TYPE (BT_UV16QI, BT_UCHAR, 16)
+DEF_VECTOR_TYPE (BT_UV1TI, BT_UINT128, 1)
 DEF_VECTOR_TYPE (BT_UV2DI, BT_ULONGLONG, 2)
 DEF_VECTOR_TYPE (BT_UV4SI, BT_UINT, 4)
 DEF_VECTOR_TYPE (BT_UV8HI, BT_USHORT, 8)
@@ -93,6 +97,8 @@ DEF_POINTER_TYPE (BT_DBLCONSTPTR, BT_DBLCONST)
 DEF_POINTER_TYPE (BT_DBLPTR, BT_DBL)
 DEF_POINTER_TYPE (BT_FLTCONSTPTR, BT_FLTCONST)
 DEF_POINTER_TYPE (BT_FLTPTR, BT_FLT)
+DEF_POINTER_TYPE (BT_INT128CONSTPTR, BT_INT128CONST)
+DEF_POINTER_TYPE (BT_INT128PTR, BT_INT128)
 DEF_POINTER_TYPE (BT_INTCONSTPTR, BT_INTCONST)
 DEF_POINTER_TYPE (BT_INTPTR, BT_INT)
 DEF_POINTER_TYPE (BT_LONGLONGCONSTPTR, BT_LONGLONGCONST)
@@ -103,6 +109,8 @@ DEF_POINTER_TYPE (BT_SHORTCONSTPTR, BT_SHORTCONST)
 DEF_POINTER_TYPE (BT_SHORTPTR, BT_SHORT)
 DEF_POINTER_TYPE (BT_UCHARCONSTPTR, BT_UCHARCONST)
 DEF_POINTER_TYPE (BT_UCHARPTR, BT_UCHAR)
+DEF_POINTER_TYPE (BT_UINT128CONSTPTR, BT_UINT128CONST)
+DEF_POINTER_TYPE (BT_UINT128PTR, BT_UINT128)
 DEF_POINTER_TYPE (BT_UINT64PTR, BT_UINT64)
 DEF_POINTER_TYPE (BT_UINTCONSTPTR, BT_UINTCONST)
 DEF_POINTER_TYPE (BT_UINTPTR, BT_UINT)
@@ -114,9 +122,11 @@ DEF_POINTER_TYPE (BT_VOIDCONSTPTR, BT_VOIDCONST)
 DEF_POINTER_TYPE (BT_VOIDPTR, BT_VOID)
 DEF_DISTINCT_TYPE (BT_BCHAR, BT_UCHAR)
 DEF_DISTINCT_TYPE (BT_BINT, BT_UINT)
+DEF_DISTINCT_TYPE (BT_BINT128, BT_UINT128)
 DEF_DISTINCT_TYPE (BT_BLONGLONG, BT_ULONGLONG)
 DEF_DISTINCT_TYPE (BT_BSHORT, BT_USHORT)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV16QI, BT_BCHAR, 16)
+DEF_OPAQUE_VECTOR_TYPE (BT_BV1TI, BT_BINT128, 1)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV2DI, BT_BLONGLONG, 2)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV4SI, BT_BINT, 4)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV8HI, BT_BSHORT, 8)
@@ -131,6 +141,7 @@ DEF_FN_TYPE_1 (BT_FN_INT_VOIDPTR, BT_INT, BT_VOIDPTR)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_INT, BT_OV4SI, BT_INT)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_INTCONSTPTR, BT_OV4SI, BT_INTCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_OV4SI, BT_OV4SI, BT_OV4SI)
+DEF_FN_TYPE_1 (BT_FN_UINT128_UINT128, BT_UINT128, BT_UINT128)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_UCHAR, BT_UV16QI, BT_UCHAR)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_UCHARCONSTPTR, BT_UV16QI, BT_UCHARCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_USHORT, BT_UV16QI, BT_USHORT)
@@ -154,7 +165,6 @@ DEF_FN_TYPE_1 (BT_FN_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_1 (BT_FN_V16QI_SCHAR, BT_V16QI, BT_SCHAR)
 DEF_FN_TYPE_1 (BT_FN_V16QI_UCHAR, BT_V16QI, BT_UCHAR)
 DEF_FN_TYPE_1 (BT_FN_V16QI_V16QI, BT_V16QI, BT_V16QI)
-DEF_FN_TYPE_1 (BT_FN_V1TI_V1TI, BT_V1TI, BT_V1TI)
 DEF_FN_TYPE_1 (BT_FN_V2DF_DBL, BT_V2DF, BT_DBL)
 DEF_FN_TYPE_1 (BT_FN_V2DF_DBLCONSTPTR, BT_V2DF, BT_DBLCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_V2DF_FLTCONSTPTR, BT_V2DF, BT_FLTCONSTPTR)
@@ -207,18 +217,18 @@ DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI_OV4SI, BT_OV4SI, BT_OV4SI, BT_OV4SI)
 DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI_UCHAR, BT_OV4SI, BT_OV4SI, B

Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-03-01 Thread mengqinggang

Thanks, I will try to send a new version of the patch next week.


On 2024/2/29 at 2:08 PM, Xi Ruoyao wrote:

On Thu, 2024-02-29 at 09:42 +0800, mengqinggang wrote:

Generate la.tls.desc macro instruction for TLS descriptors model.

la.tls.desc expands to
   pcalau12i $a0, %desc_pc_hi20(a)
   ld.d  $a1, $a0, %desc_ld_pc_lo12(a)
   addi.d    $a0, $a0, %desc_add_pc_lo12(a)
   jirl  $ra, $a1, %desc_call(a)

The default is TLS descriptors, but this can be configured with
-mtls-dialect={desc,trad}.

Please keep trad as the default for now.  Glibc-2.40 will be released
after GCC 14.1 but we don't want to end up in a situation where the
default configuration of the latest GCC release creating something not
working with latest Glibc release.

And there's also musl libc we need to take into account.

Or you can write some autoconf test for if the assembler supports
tlsdesc and check TARGET_GLIBC_MAJOR & TARGET_GLIBC_MINOR for Glibc
version to decide if enable desc by default.  If you want this but don't
have time to implement you can leave trad the default and I'll take care
of this.
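
The selection logic suggested above could look roughly like the sketch
below. Everything in it is hypothetical: the enum, the function name
default_tls_dialect, the as_has_tlsdesc flag standing in for the autoconf
assembler check, and in particular the 2.40 threshold, which is only an
assumption based on the Glibc release mentioned above.

```c
#include <assert.h>

enum tls_dialect { TLS_TRAD, TLS_DESC };

/* Pick the default TLS dialect.  AS_HAS_TLSDESC stands in for the
   result of a configure-time assembler test; GLIBC_MAJOR/GLIBC_MINOR
   stand in for TARGET_GLIBC_MAJOR and TARGET_GLIBC_MINOR.  */
static enum tls_dialect
default_tls_dialect (int as_has_tlsdesc, int glibc_major, int glibc_minor)
{
  /* Assumed threshold: Glibc 2.40 or newer.  */
  int glibc_ok = glibc_major > 2
                 || (glibc_major == 2 && glibc_minor >= 40);

  /* Fall back to the traditional model unless both checks pass.  */
  return (as_has_tlsdesc && glibc_ok) ? TLS_DESC : TLS_TRAD;
}
```

Keeping trad as the fallback means an older assembler or C library never
ends up with descriptors enabled by default.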

/* snip */


+(define_insn "@got_load_tls_desc"
+  [(set (match_operand:P 0 "register_operand" "=r")
+   (unspec:P
+       [(match_operand:P 1 "symbolic_operand" "")]
+       UNSPEC_TLS_DESC))
+    (clobber (reg:SI FCC0_REGNUM))
+    (clobber (reg:SI FCC1_REGNUM))
+    (clobber (reg:SI FCC2_REGNUM))
+    (clobber (reg:SI FCC3_REGNUM))
+    (clobber (reg:SI FCC4_REGNUM))
+    (clobber (reg:SI FCC5_REGNUM))
+    (clobber (reg:SI FCC6_REGNUM))
+    (clobber (reg:SI FCC7_REGNUM))
+    (clobber (reg:SI A1_REGNUM))
+    (clobber (reg:SI RETURN_ADDR_REGNUM))]

Ok, the clobber list is correct.


+  "TARGET_TLS_DESC"
+  "la.tls.desc\t%0,%1"

With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
of la.tls.desc.  As we don't want to add too much code, we can just hard
code the 4 instructions here instead of splitting this insn, with
something like

{ return TARGET_EXPLICIT_RELOCS_ALWAS ? ".." : "la.tls.desc\t%0,%1"; }


+  [(set_attr "got" "load")
+   (set_attr "mode" "")])

We need (set_attr "length" "16") in this list as this actually expands
into 16 bytes.






Re: [PATCH] bitint: Handle VCE from large/huge _BitInt SSA_NAME from load [PR114156]

2024-03-01 Thread Richard Biener
On Fri, 1 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> When adding checks in which case not to merge a VIEW_CONVERT_EXPR from
> large/huge _BitInt to vector/complex etc., I missed the case of loads.
> Those are handled differently later.
> Anyway, I think the load case is something we can handle just fine,
> so the following patch does that instead of preventing the merging
> gimple_lower_bitint; we'd then copy from memory to memory and do the
> vce only on the second one, it is just better to vce the first one.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-03-01  Jakub Jelinek  
> 
>   PR middle-end/114156
>   * gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Allow
>   rhs1 of a VCE to have no underlying variable if it is a load and
>   handle that case.
> 
>   * gcc.dg/bitint-96.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-02-24 12:44:27.993108306 +0100
> +++ gcc/gimple-lower-bitint.cc    2024-02-29 19:28:59.442020619 +0100
> @@ -5329,6 +5329,22 @@ bitint_large_huge::lower_stmt (gimple *s
> gimple_assign_set_rhs1 (stmt, rhs1);
> gimple_assign_set_rhs_code (stmt, SSA_NAME);
>   }
> +   else if (m_names == NULL
> +|| !bitmap_bit_p (m_names, SSA_NAME_VERSION (rhs1)))
> + {
> +   gimple *g = SSA_NAME_DEF_STMT (rhs1);
> +   gcc_assert (gimple_assign_load_p (g));
> +   tree mem = gimple_assign_rhs1 (g);
> +   tree ltype = TREE_TYPE (lhs);
> +   addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (mem));
> +   if (as != TYPE_ADDR_SPACE (ltype))
> + ltype
> +   = build_qualified_type (ltype,
> +   TYPE_QUALS (ltype)
> +   | ENCODE_QUAL_ADDR_SPACE (as));
> +   rhs1 = build1 (VIEW_CONVERT_EXPR, ltype, mem);
> +   gimple_assign_set_rhs1 (stmt, rhs1);
> + }
> else
>   {
> int part = var_to_partition (m_map, rhs1);
> --- gcc/testsuite/gcc.dg/bitint-96.c.jj   2024-02-29 19:37:27.441032088 +0100
> +++ gcc/testsuite/gcc.dg/bitint-96.c  2024-02-29 19:36:34.815753879 +0100
> @@ -0,0 +1,17 @@
> +/* PR middle-end/114156 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2" } */
> +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 128
> +_BitInt(128) a, b;
> +#else
> +int a, b;
> +#endif
> +
> +void
> +foo (void)
> +{
> +  int u = b;
> +  __builtin_memmove (&a, &b, sizeof (a));
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] function: Fix another TYPE_NO_NAMED_ARGS_STDARG_P spot

2024-03-01 Thread Richard Biener
On Fri, 1 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> When looking at PR114175 (although that bug seems to be now a riscv backend
> bug), I've noticed that for the TYPE_NO_NAMED_ARGS_STDARG_P functions which
> return value through hidden reference, like
> #include <stdarg.h>
> 
> struct S { char a[64]; };
> int n;
> 
> struct S
> foo (...)
> {
>   struct S s = {};
>   va_list ap;
>   va_start (ap);
>   for (int i = 0; i < n; ++i)
> if ((i & 1))
>   s.a[0] += va_arg (ap, double);
> else
>   s.a[0] += va_arg (ap, int);
>   va_end (ap);
>   return s;
> }
> we were incorrectly calling assign_parms_setup_varargs twice, once
> at the start of the function and once in
>   if (cfun->stdarg && !DECL_CHAIN (parm))
> assign_parms_setup_varargs (&all, &data, false);
> where parm is the last and only "named" parameter.
> 
> The first call, guarded with TYPE_NO_NAMED_ARGS_STDARG_P, was added in
> r13-3549 and is needed for int bar (...) etc. functions using
> va_start/va_arg/va_end, otherwise the 
>   FOR_EACH_VEC_ELT (fnargs, i, parm)
> in which the other call is will not iterate at all.  But we shouldn't
> be doing that if we have the hidden return pointer.
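
For readers without a C23 compiler, the same va_arg pattern can be tried
with a pre-C23 analogue. This is only a sketch under that assumption: it
adds a named count parameter n (so it is not a TYPE_NO_NAMED_ARGS_STDARG_P
function and does not exercise the fixed code path), and the name sum_args
is made up. It keeps the 64-byte struct return, which is the kind of
return that goes through a hidden reference on x86_64.

```c
#include <assert.h>
#include <stdarg.h>

struct S { char a[64]; };

/* Alternately read int and double arguments, as in the testcase above,
   and accumulate them into the first byte of the returned struct.  */
static struct S
sum_args (int n, ...)
{
  struct S s = { { 0 } };
  va_list ap;

  assert (n >= 0);
  va_start (ap, n);
  for (int i = 0; i < n; ++i)
    if (i & 1)
      s.a[0] += (char) va_arg (ap, double);
    else
      s.a[0] += (char) va_arg (ap, int);
  va_end (ap);
  return s;
}
```

A call such as sum_args (3, 1, 2.0, 3) passes an int, a double and an int
in that order.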
> 
> With the following patch on the above testcase with -O0 -std=c23 the
> assembly difference is:
>   pushq   %rbp
>   .cfi_def_cfa_offset 16
>   .cfi_offset 6, -16
>   movq    %rsp, %rbp
>   .cfi_def_cfa_register 6
>   pushq   %rbx
>   subq    $192, %rsp
>   .cfi_offset 3, -24
> - movq    %rdi, -192(%rbp)
> - movq    %rsi, -184(%rbp)
> - movq    %rdx, -176(%rbp)
> - movq    %rcx, -168(%rbp)
> - movq    %r8, -160(%rbp)
> - movq    %r9, -152(%rbp)
> - testb   %al, %al
> - je      .L2
> - movaps  %xmm0, -144(%rbp)
> - movaps  %xmm1, -128(%rbp)
> - movaps  %xmm2, -112(%rbp)
> - movaps  %xmm3, -96(%rbp)
> - movaps  %xmm4, -80(%rbp)
> - movaps  %xmm5, -64(%rbp)
> - movaps  %xmm6, -48(%rbp)
> - movaps  %xmm7, -32(%rbp)
> -.L2:
>   movq    %rdi, -312(%rbp)
>   movq    %rdi, -192(%rbp)
>   movq    %rsi, -184(%rbp)
>   movq    %rdx, -176(%rbp)
>   movq    %rcx, -168(%rbp)
>   movq    %r8, -160(%rbp)
>   movq    %r9, -152(%rbp)
>   testb   %al, %al
> - je      .L13
> + je      .L12
>   movaps  %xmm0, -144(%rbp)
>   movaps  %xmm1, -128(%rbp)
>   movaps  %xmm2, -112(%rbp)
>   movaps  %xmm3, -96(%rbp)
>   movaps  %xmm4, -80(%rbp)
>   movaps  %xmm5, -64(%rbp)
>   movaps  %xmm6, -48(%rbp)
>   movaps  %xmm7, -32(%rbp)
> -.L13:
> +.L12:
> plus some renumbering of labels later on which clearly shows
> that because of this bug, we were saving all the registers twice
> rather than once.  With -O2 -std=c23 some of it is DCEd, but we still get
>   subq    $160, %rsp
>   .cfi_def_cfa_offset 168
> - testb   %al, %al
> - je      .L2
> - movaps  %xmm0, 24(%rsp)
> - movaps  %xmm1, 40(%rsp)
> - movaps  %xmm2, 56(%rsp)
> - movaps  %xmm3, 72(%rsp)
> - movaps  %xmm4, 88(%rsp)
> - movaps  %xmm5, 104(%rsp)
> - movaps  %xmm6, 120(%rsp)
> - movaps  %xmm7, 136(%rsp)
> -.L2:
>   movq    %rdi, -24(%rsp)
>   movq    %rsi, -16(%rsp)
>   movq    %rdx, -8(%rsp)
>   movq    %rcx, (%rsp)
>   movq    %r8, 8(%rsp)
>   movq    %r9, 16(%rsp)
>   testb   %al, %al
> - je      .L13
> + je      .L12
>   movaps  %xmm0, 24(%rsp)
>   movaps  %xmm1, 40(%rsp)
>   movaps  %xmm2, 56(%rsp)
>   movaps  %xmm3, 72(%rsp)
>   movaps  %xmm4, 88(%rsp)
>   movaps  %xmm5, 104(%rsp)
>   movaps  %xmm6, 120(%rsp)
>   movaps  %xmm7, 136(%rsp)
> -.L13:
> +.L12:
> difference, i.e. this time not all, but the floating point args
> were conditionally all saved twice.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-03-01  Jakub Jelinek  
> 
>   * function.cc (assign_parms): Only call assign_parms_setup_varargs
>   early for TYPE_NO_NAMED_ARGS_STDARG_P functions if fnargs is empty.
> 
> --- gcc/function.cc.jj    2024-01-12 13:47:20.834428745 +0100
> +++ gcc/function.cc   2024-02-29 21:14:35.275889093 +0100
> @@ -3650,7 +3650,8 @@ assign_parms (tree fndecl)
>assign_parms_initialize_all (&all);
>fnargs = assign_parms_augmented_arg_list (&all);
>  
> -  if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl)))
> +  if (TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (fndecl))
> +  && fnargs.is_empty ())
>  {
>struct assign_parm_data_one data = {};
>assign_parms_setup_varargs (&all, &data, false);
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/110221 - SLP and loop mask/len

2024-03-01 Thread Richard Biener
On Fri, 1 Mar 2024, Andre Vieira (lists) wrote:

> Hi,
> 
> Bootstrapped and tested the gcc-13 backport of this on gcc-12 for
> aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu and no regressions.
> 
> OK to push to gcc-12 branch?

OK.

Thanks,
Richard.

> Kind regards,
> Andre Vieira
> 
> On 10/11/2023 13:16, Richard Biener wrote:
> > The following fixes the issue that when SLP stmts are internal defs
> > but appear invariant because they end up only using invariant defs
> > then they get scheduled outside of the loop.  This nice optimization
> > breaks down when loop masks or lens are applied since those are not
> > explicitly tracked as dependences.  The following makes sure to never
> > schedule internal defs outside of the vectorized loop when the
> > loop uses masks/lens.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> >  PR tree-optimization/110221
> >  * tree-vect-slp.cc (vect_schedule_slp_node): When loop
> >  masking / len is applied make sure to not schedule
> >  internal defs outside of the loop.
> > 
> > * gfortran.dg/pr110221.f: New testcase.
> > ---
> >   gcc/testsuite/gfortran.dg/pr110221.f | 17 +
> >   gcc/tree-vect-slp.cc | 10 ++
> >   2 files changed, 27 insertions(+)
> >   create mode 100644 gcc/testsuite/gfortran.dg/pr110221.f
> > 
> > diff --git a/gcc/testsuite/gfortran.dg/pr110221.f b/gcc/testsuite/gfortran.dg/pr110221.f
> > new file mode 100644
> > index 000..8b57384313a
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/pr110221.f
> > @@ -0,0 +1,17 @@
> > +C PR middle-end/68146
> > +C { dg-do compile }
> > +C { dg-options "-O2 -w" }
> > +C { dg-additional-options "-mavx512f --param vect-partial-vector-usage=2" { target avx512f } }
> > +  SUBROUTINE CJYVB(V,Z,V0,CBJ,CDJ,CBY,CYY)
> > +  IMPLICIT DOUBLE PRECISION (A,B,G,O-Y)
> > +  IMPLICIT COMPLEX*16 (C,Z)
> > +  DIMENSION CBJ(0:*),CDJ(0:*),CBY(0:*)
> > +  N=INT(V)
> > +  CALL GAMMA2(VG,GA)
> > +  DO 65 K=1,N
> > +CBY(K)=CYY
> > +65CONTINUE
> > +  CDJ(0)=V0/Z*CBJ(0)-CBJ(1)
> > +  DO 70 K=1,N
> > +70  CDJ(K)=-(K+V0)/Z*CBJ(K)+CBJ(K-1)
> > +  END
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 3e5814c3a31..80e279d8f50 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -9081,6 +9081,16 @@ vect_schedule_slp_node (vec_info *vinfo,
> >   /* Emit other stmts after the children vectorized defs which is
> > earliest possible.  */
> > gimple *last_stmt = NULL;
> > +  if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > +   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > + {
> > +   /* But avoid scheduling internal defs outside of the loop when
> > +  we might have only implicitly tracked loop mask/len defs.  */
> > +   gimple_stmt_iterator si
> > + = gsi_after_labels (LOOP_VINFO_LOOP (loop_vinfo)->header);
> > +   last_stmt = *si;
> > + }
> > bool seen_vector_def = false;
> > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> >if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-optimization/114164 - unsupported SIMD clone call, unsupported VEC_COND

2024-03-01 Thread Richard Biener
The following avoids creating unsupported VEC_COND_EXPRs as part of
SIMD clone call mask argument setup during vectorization which results
in inefficient decomposing of the operation during vector lowering.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Will push on Monday when arm CI is happy.

Richard.

PR tree-optimization/114164
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Fail if
the code generated for mask argument setup is not supported.
---
 gcc/tree-vect-stmts.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index be0e1a9c69d..14a3ffb5f02 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4210,6 +4210,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 " supported for mismatched vector sizes.\n");
  return false;
}
+ if (!expand_vec_cond_expr_p (clone_arg_vectype,
+  arginfo[i].vectype, ERROR_MARK))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+vect_location,
+"cannot compute mask argument for"
+" in-branch vector clones.\n");
+ return false;
+   }
}
  else if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
{
-- 
2.35.3


Re: [PATCH 5/5] RISC-V: Support vmsxx.vx for autovec comparison of vec and imm

2024-03-01 Thread Robin Dapp
Hi Han,

in addition to what Juzhe mentioned (and that late-combine is going
to handle such cases) it should be noted that register pressure
should not be the only consideration here.  Many uarchs have a higher
latency for register-file-crossing moves.  At least without spilling
the vv variant is preferable, with spilling it very much depends.

Regards
 Robin



Re: [PATCH] middle-end/114070 - VEC_COND_EXPR folding

2024-03-01 Thread Richard Biener
On Thu, 29 Feb 2024, Jakub Jelinek wrote:

> On Thu, Feb 29, 2024 at 11:16:54AM +0100, Richard Biener wrote:
> > That said, the quick experiment shows this isn't anything for stage4.
> 
> The earlier the vector lowering is moved in the pass list, the higher
> are the possibilities that match.pd or some other optimization reintroduces
> unsupportable vector operations into the IL.
> 
> Guess your patch looks reasonable.

Pushed.

Thanks,
Richard.

> > >   PR middle-end/114070
> > >   * match.pd ((c ? a : b) op d  -->  c ? (a op d) : (b op d)):
> > >   Allow the folding if before lowering and the current IL
> > >   isn't supported with vcond_mask.
> > > ---
> > >  gcc/match.pd | 18 +++---
> > >  1 file changed, 15 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index f3fffd8dec2..4edba7c84fb 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -5153,7 +5153,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >(op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
> > >(if (TREE_CODE_CLASS (op) != tcc_comparison
> > > || types_match (type, TREE_TYPE (@1))
> > > -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> > > +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> > > +   || (optimize_vectors_before_lowering_p ()
> > > +/* The following is optimistic on the side of non-support, we are
> > > +   missing the legacy vcond{,u,eq} cases.  Do this only when
> > > +   lowering will be able to fixup..  */
> > > +&& !expand_vec_cond_expr_p (TREE_TYPE (@1),
> > > +TREE_TYPE (@0), ERROR_MARK)))
> > > (vec_cond @0 (op! @1 @3) (op! @2 @4
> > >  
> > >  /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
> > > @@ -5161,13 +5167,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >(op (vec_cond:s @0 @1 @2) @3)
> > >(if (TREE_CODE_CLASS (op) != tcc_comparison
> > > || types_match (type, TREE_TYPE (@1))
> > > -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> > > +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> > > +   || (optimize_vectors_before_lowering_p ()
> > > +&& !expand_vec_cond_expr_p (TREE_TYPE (@1),
> > > +TREE_TYPE (@0), ERROR_MARK)))
> > > (vec_cond @0 (op! @1 @3) (op! @2 @3
> > >   (simplify
> > >(op @3 (vec_cond:s @0 @1 @2))
> > >(if (TREE_CODE_CLASS (op) != tcc_comparison
> > > || types_match (type, TREE_TYPE (@1))
> > > -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> > > +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> > > +   || (optimize_vectors_before_lowering_p ()
> > > +&& !expand_vec_cond_expr_p (TREE_TYPE (@1),
> > > +TREE_TYPE (@0), ERROR_MARK)))
> > > (vec_cond @0 (op! @3 @1) (op! @3 @2)
> > >  
> > >  #if GIMPLE
> > > 
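
The fold discussed in this thread, (c ? a : b) op d --> c ? (a op d) :
(b op d), is value-preserving per lane. A scalar sketch with op fixed to
addition, modelling each vector lane as an array element, is shown below.
The function names are made up for illustration and this says nothing
about whether a target can expand the resulting VEC_COND_EXPR, which is
what the patch is about.

```c
#include <assert.h>
#include <stddef.h>

/* Unfolded form: select first, then apply the operation.  */
static void
select_then_add (const int *c, const int *a, const int *b, const int *d,
                 int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (c[i] ? a[i] : b[i]) + d[i];
}

/* Folded form: apply the operation to both arms, then select.  */
static void
add_then_select (const int *c, const int *a, const int *b, const int *d,
                 int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = c[i] ? (a[i] + d[i]) : (b[i] + d[i]);
}
```

Both functions produce the same lanes for any mask, so the choice between
the two forms is about what the target supports, not about the result.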
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-03-01 Thread Maxim Kuvyrkov
> On Feb 29, 2024, at 21:59, Richard Earnshaw (lists) wrote:
> 
> On 29/02/2024 17:55, Andrew Pinski (QUIC) wrote:
>>> -Original Message-
>>> From: Maxim Kuvyrkov 
>>> Sent: Thursday, February 29, 2024 9:46 AM
>>> To: Andrew Pinski (QUIC) 
>>> Cc: Evgeny Karpov ; Andrew Pinski
>>> ; Richard Sandiford ; gcc-
>>> patc...@gcc.gnu.org; 10wa...@gmail.com; m...@harmstone.com; Zac
>>> Walker ; Ron Riddle
>>> ; Radek Barton 
>>> Subject: Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW
>>> environments for AArch64
>>> 
>>> WARNING: This email originated from outside of Qualcomm. Please be wary
>>> of any links or attachments, and do not enable macros.
>>> 
 On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)
>>>  wrote:
 
 
 
> -Original Message-
> From: Evgeny Karpov 
> Sent: Thursday, February 29, 2024 8:46 AM
> To: Andrew Pinski 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
> ; m...@harmstone.com; Zac Walker
> ; Ron Riddle ;
> Radek Barton ; Andrew Pinski (QUIC)
> 
> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
> for AArch64
> 
> Wednesday, February 28, 2024 2:00 AM
> Andrew Pinski wrote:
> 
>> What does this mean with respect to C++ exceptions? Are you using
>> SJLJ exceptions support or the dwarf unwinding ones without SEH
>>> support?
>> I am not sure if SJLJ exceptions is well tested any more in GCC either.
>> 
>> Also I have a question if you ran the full GCC/G++ testsuites and
>> what were the results?
>> If you did run it, did you use a cross compiler or the native
>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions
>>> though)?
> 
> As mentioned in the cover letter and the thread, the current
> contribution covers only the C scope.
> Exception handling is fully disabled for now.
> There is an experimental build with C++ and SEH, however, it is not
> included in the plan for the current contribution.
> 
> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
>>> build
> 
>> If you run using a cross compiler, did you use ssh or some other
>> route to run the applications?
>> 
>> Thanks,
>> Andrew Pinski
> 
> GitHub Actions are used to cross-compile toolchains, packages and
> tests, and execute tests on Windows Arm64.
 
 This does not answer my question because what you are running is just
>>> simple testcases and not the FULL GCC testsuite.
 So again have you ran the GCC testsuite and do you have a dejagnu board to
>>> be able to execute the binaries?
 I think without the GCC testsuite ran to find all of the known failures, 
 you are
>>> going to be running into many issues.
 The GCC testsuite includes many tests for ABI corner cases and many
>>> features that you will most likely not think about testing using your simple
>>> testcases.
 In fact I suspect there will be some of the aarch64 testcases which will 
 need
>>> to be modified for the windows ABI which you have not done yet.
>>> 
>>> Hi Andrew,
>>> 
>>> We (Linaro) have a prototype CI loop setup for testing aarch64-w64-
>>> mingw32, and we have results for gcc-c and libatomic -- see [1].
>>> 
>>> The results are far from clean, but that's expected.  This patch series 
>>> aims at
>>> enabling C hello-world only, and subsequent patch series will improve the
>>> state of the port.
>>> 
>>> [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-
>>> build/6/artifact/artifacts/sumfiles/
>> 
>> Looking at these results, this port is not in any shape or form to be 
>> upstreamed right now. Even simple -g will cause failures.
>> Note we don't need a clean testsuite run but the patch series is not even 
>> allowing enabling hello world due to the -g not being able to be used.
>> 
> 
> It seemed to me as though the patch was posted for comments, not for 
> immediate inclusion.  I agree this isn't ready for committing yet, but 
> neither should the submitters wait until it's perfect before posting it.
> 
> I think it's gcc-15 material, so now is about the right time to be thinking 
> about it.

Hi Andrew,

I agree with Richard.  This patch series is large as is, and it has clear goals:
1. Enable aarch64-w64-mingw32 to compile C-language hello-world.
2. Not regress any other targets.

As far as I know, it achieves both, but both Microsoft and Linaro will do 
additional testing on x86_64-w64-mingw32 to confirm.

There are more features and fixes for aarch64-w64-mingw32 waiting in the 
development repos on github, but I don't see any point in cleaning them up and 
preparing for submission before this already-quite-large patchset is reviewed.

Thank you,

--
Maxim Kuvyrkov
https://www.linaro.org



Re: [PATCH] c++: Ensure DECL_CONTEXT is set for temporary vars [PR114005]

2024-03-01 Thread Jason Merrill

On 2/29/24 16:28, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Alternatively we could update 'DECL_CONTEXT' only for
'make_temporary_var_for_ref_to_temp' in call.cc, as a more targeted
fix, but I felt that this way it'd also fix any other similar issues
that have gone uncaught so far.

-- >8 --

Modules streaming requires DECL_CONTEXT to be set for anything streamed.
This patch ensures that 'create_temporary_var' does set a DECL_CONTEXT
for these variables (such as the backing storage for initializer_lists)
even if not inside a function declaration.

PR c++/114005

gcc/cp/ChangeLog:

* init.cc (create_temporary_var): Set DECL_CONTEXT to
current_namespace if at namespace scope.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr114005_a.C: New test.
* g++.dg/modules/pr114005_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/init.cc| 2 ++
  gcc/testsuite/g++.dg/modules/pr114005_a.C | 8 
  gcc/testsuite/g++.dg/modules/pr114005_b.C | 7 +++
  3 files changed, 17 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr114005_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr114005_b.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index ac37330527e..e6fca7b3226 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4258,6 +4258,8 @@ create_temporary_var (tree type)
DECL_ARTIFICIAL (decl) = 1;
DECL_IGNORED_P (decl) = 1;
DECL_CONTEXT (decl) = current_function_decl;
+  if (!DECL_CONTEXT (decl))
+DECL_CONTEXT (decl) = current_namespace;


Maybe always set it to current_scope () instead of current_function_decl?

OK with that change.

Jason



Re: [PATCH] c++/modules: Stream definitions for implicit instantiations [PR114170]

2024-03-01 Thread Jason Merrill

On 2/29/24 20:08, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

An implicit instantiation has an initializer depending on whether
DECL_INITIALIZED_P is set (like normal VAR_DECLs) which needs to be
written to ensure that consumers of header modules properly emit
definitions for these instantiations. This patch ensures that we
correctly fallback to checking this flag when DECL_INITIAL is not set
for a template instantiation.


Can you say more about how and why DECL_INITIAL and DECL_INITIALIZED_P 
are inconsistent here?



As a drive-by fix, also ensures that the count of initializers matches
the actual number of initializers written. This doesn't seem to be
necessary for correctness in the current testsuite, but feels wrong and
makes debugging harder when initializers aren't properly written for
other reasons.
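
The counting issue described above can be illustrated with a toy model. This is only a sketch, not the module.cc code: `Init`, `count_in_loop_header` and `count_when_written` are hypothetical names, and the `selected` flag merely stands in for TREE_LANG_FLAG_0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of the initializer list.
struct Init { bool selected; };

// Old shape: the counter advances in the loop header, so it counts every
// entry visited, including the ones that are skipped.
unsigned count_in_loop_header (const std::vector<Init> &list)
{
  unsigned count = 0;
  for (std::size_t i = 0; i != list.size (); ++i, ++count)
    if (list[i].selected)
      {
        /* The entry would be written here.  */
      }
  return count;
}

// New shape: the counter advances only when an entry is actually written.
unsigned count_when_written (const std::vector<Init> &list)
{
  unsigned count = 0;
  for (std::size_t i = 0; i != list.size (); ++i)
    if (list[i].selected)
      {
        /* The entry would be written here.  */
        ++count;
      }
  return count;
}

// Small self-check: with one skipped entry the two counts differ.
void self_check ()
{
  std::vector<Init> list = {{true}, {false}, {true}};
  assert (count_in_loop_header (list) == 3);
  assert (count_when_written (list) == 2);
}
```

With no skipped entries both functions agree, which would be consistent with the remark that the mismatch does not show up as a correctness problem in the current testsuite.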

PR c++/114170

gcc/cp/ChangeLog:

* module.cc (has_definition): Fall back to DECL_INITIALIZED_P
when DECL_INITIAL is not set on a template.
(module_state::write_inits): Only increment count when
initializers are actually written.

gcc/testsuite/ChangeLog:

* g++.dg/modules/var-tpl-2_a.H: New test.
* g++.dg/modules/var-tpl-2_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   |  8 +---
  gcc/testsuite/g++.dg/modules/var-tpl-2_a.H | 10 ++
  gcc/testsuite/g++.dg/modules/var-tpl-2_b.C | 10 ++
  3 files changed, 25 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 1b2ba2e0fa8..09578de41ec 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11586,8 +11586,9 @@ has_definition (tree decl)
  
  case VAR_DECL:

if (DECL_LANG_SPECIFIC (decl)
- && DECL_TEMPLATE_INFO (decl))
-   return DECL_INITIAL (decl);
+ && DECL_TEMPLATE_INFO (decl)
+ && DECL_INITIAL (decl))
+   return true;
else
{
  if (!DECL_INITIALIZED_P (decl))
@@ -17528,13 +17529,14 @@ module_state::write_inits (elf_out *to, depset::hash 
&table, unsigned *crc_ptr)
tree list = static_aggregates;
for (int passes = 0; passes != 2; passes++)
  {
-  for (tree init = list; init; init = TREE_CHAIN (init), count++)
+  for (tree init = list; init; init = TREE_CHAIN (init))
if (TREE_LANG_FLAG_0 (init))
  {
tree decl = TREE_VALUE (init);
  
  	dump ("Initializer:%u for %N", count, decl);

sec.tree_node (decl);
+   ++count;
  }
  
list = tls_aggregates;

diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H 
b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
new file mode 100644
index 000..607fc0b808e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
@@ -0,0 +1,10 @@
+// PR c++/114170
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+inline int f() { return 42; }
+
+template
+inline int v = f();
+
+inline int g() { return v; }
diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C 
b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
new file mode 100644
index 000..6d2ef4004e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
@@ -0,0 +1,10 @@
+// PR c++/114170
+// { dg-module-do run }
+// { dg-additional-options "-fmodules-ts" }
+
+import "var-tpl-2_a.H";
+
+int main() {
+  if (v != 42)
+__builtin_abort();
+}




Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Jason Merrill

On 2/29/24 15:56, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

For local enums defined in a non-template function or a function template
instantiation it seems we neglect to make the function depend on the enum
definition, which ultimately causes streaming to fail due to the enum
definition not being streamed before uses of its enumerators are streamed,
as far as I can tell.
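
For readers unfamiliar with the construct, the two situations named above look roughly like this at the source level. This is a sketch only, not the testcase from either PR; the names `f`, `g` and `E` are made up, and the module export/import parts are left out so that it compiles as ordinary C++.

```cpp
#include <cassert>

// Local enum defined in a non-template function; the body uses one of
// its enumerators.
inline int f ()
{
  enum E { e = 42 };
  return e;
}

// Local enum defined in a function template; each instantiation gets
// its own copy of the enum definition.
template <typename T>
int g ()
{
  enum E { e = sizeof (T) };
  return e;
}

// Small self-check of the values the enumerators carry.
void self_check ()
{
  assert (f () == 42);
  assert (g<char> () == 1);
}
```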


I would think that the function doesn't need to depend on the local enum 
in order for the local enum to be streamed before the use of the 
enumerator, which comes after the definition of the enum in the function 
body?


Why isn't streaming the body of the function outputting the enum 
definition before the use of the enumerator?



This was nearly enough to make things work, except we now ran into
issues with the local TYPE/CONST_DECL copies when streaming the
constexpr version of a function body.  It occurred to me that we don't
need to make copies of local types when copying a constexpr function
body; only VAR_DECLs etc need to be copied for sake of recursive
constexpr calls.  So this patch adjusts copy_fn accordingly.


Maybe adjust can_be_nonlocal instead?  It seems unnecessary in general 
to remap types and enumerators for inlining.


Jason



[COMMITTED htdocs] robots.txt: Disallow various wiki actions

2024-03-01 Thread Mark Wielaard
It is fine for robots to crawl the wiki pages, but they should not perform
actions, generate huge diffs, search/highlight pages or generate
calendars.
---
 htdocs/robots.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/robots.txt b/htdocs/robots.txt
index 057c5899..36be4d13 100644
--- a/htdocs/robots.txt
+++ b/htdocs/robots.txt
@@ -14,4 +14,8 @@ Disallow: /bugzilla/show_bug.cgi*ctype=xml*
 Disallow: /bugzilla/attachment.cgi
 Disallow: /bugzilla/showdependencygraph.cgi
 Disallow: /bugzilla/showdependencytree.cgi
+Disallow: /wiki/*?action=*
+Disallow: /wiki/*?diffs=*
+Disallow: /wiki/*?highlight=*
+Disallow: /wiki/*?calparms=*
 Crawl-Delay: 60
-- 
2.43.2



Re: [PATCH] c++: auto(x) partial substitution [PR110025, PR114138]

2024-03-01 Thread Jason Merrill

On 2/29/24 14:17, Patrick Palka wrote:

On Wed, 28 Feb 2024, Jason Merrill wrote:

I wonder about, rather than returning it directly, setting its level to 1 for
the substitution?


Done, that works nicely.


Then I wonder if it would be feasible to give all autos level 0 and adjust it
here?  That's probably not a stage 4 change, though...


It seems feasible.  I experimented doing this in the past[1] and ran
into two complications.  One complication was with constrained auto
deduction, e.g.

   template
   void g() {
 C auto x = ...;
   };

Here the underlying concept-id that we enter satisfaction with is
C where this auto has level one greater than the template
depth, and the argument vector we pass has an extra innermost level
containing the deduced type, so things match up nicely.  This seems
to be the only place where we truly need auto to have a non 0/1 level.
In my WIP patch in that thread I just made do_auto_deduction build the
concept-id C in terms of an auto of the proper level before
entering satisfaction, which was kind of ugly but worked.


So maybe set its level to TMPL_ARGS_DEPTH (targs) after 
add_to_template_args, rather than 1?



The other complication was with Concepts TS extended auto deduction:

   tuple t = tuple{};

because unify_pack_expansion (called from fn_type_unification during
do_auto_deduction) isn't prepared to see a parameter pack of level 0
(unify has no problems with ordinary tparms of level 0 though).  This
shouldn't be too hard to fix though.

How does the following look for trunk and perhaps 13 (there should be
no functional change for code that doesn't use auto(x))?

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587818.html

-- >8 --

PR c++/110025
PR c++/114138

gcc/cp/ChangeLog:

* cp-tree.h (make_cast_auto): Declare.
* parser.cc (cp_parser_functional_cast): Replace a parsed auto
with a level-less one via make_cast_auto.
* pt.cc (find_parameter_packs_r): Don't treat level-less auto
as a type parameter pack.
(tsubst) : Generalized CTAD placeholder
handling to all level-less autos.
(make_cast_auto): Define.
(do_auto_deduction): Handle replacement of a level-less auto.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/auto-fncast16.C: New test.
* g++.dg/cpp23/auto-fncast17.C: New test.
* g++.dg/cpp23/auto-fncast18.C: New test.
---
  gcc/cp/cp-tree.h   |  1 +
  gcc/cp/parser.cc   | 11 
  gcc/cp/pt.cc   | 37 +++-
  gcc/testsuite/g++.dg/cpp23/auto-fncast16.C | 12 
  gcc/testsuite/g++.dg/cpp23/auto-fncast17.C | 15 +
  gcc/testsuite/g++.dg/cpp23/auto-fncast18.C | 69 ++
  6 files changed, 142 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast16.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast17.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast18.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 04c3aa6cd91..6f1da1c7bad 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7476,6 +7476,7 @@ extern tree make_decltype_auto(void);
  extern tree make_constrained_auto (tree, tree);
  extern tree make_constrained_decltype_auto(tree, tree);
  extern tree make_template_placeholder (tree);
+extern tree make_cast_auto (void);
  extern bool template_placeholder_p(tree);
  extern bool ctad_template_p   (tree);
  extern bool unparenthesized_id_or_class_member_access_p (tree);
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 3ee9d49fb8e..3dbe6722ba1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -33314,6 +33314,17 @@ cp_parser_functional_cast (cp_parser* parser, tree 
type)
if (!type)
  type = error_mark_node;
  
+  if (TREE_CODE (type) == TYPE_DECL

+  && is_auto (TREE_TYPE (type)))
+type = TREE_TYPE (type);
+
+  if (is_auto (type)
+  && !AUTO_IS_DECLTYPE (type)
+  && !PLACEHOLDER_TYPE_CONSTRAINTS (type)
+  && !CLASS_PLACEHOLDER_TEMPLATE (type))
+/* auto(x) and auto{x} are represented by level-less auto.  */
+type = make_cast_auto ();
+
if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
  {
cp_lexer_set_source_position (parser->lexer);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 2803824d11e..369e33f23c7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -3921,7 +3921,8 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 parameter pack (14.6.3), or the type-specifier-seq of a type-id that
 is a pack expansion, the invented template parameter is a template
 parameter pack.  */
-  if (ppd->type_pack_expansion_p && is_auto (t))
+  if (ppd->type_pack_expansion_p && is_auto (t)
+ && TEMPLATE_TYPE_LEVEL (t) != 0)
TEMPLATE_TYPE_PARAMETER_PACK (t) = true;
   

Re: [PATCH] dwarf2out: Don't move variable sized aggregates to comdat [PR114015]

2024-03-01 Thread Jason Merrill

On 2/29/24 07:10, Jakub Jelinek wrote:

Hi!

The following testcase ICEs, because we decide to move that
struct { char a[n]; } DW_TAG_structure_type into .debug_types section
/ DW_UT_type DWARF5 unit, but refer from there to a DW_TAG_variable
(created artificially for the array bounds).
Even with non-bitint, I think it is just wrong to use .debug_types
section / DW_UT_type for something that uses DW_OP_fbreg and similar
in it, things clearly dependent on a particular function.
In most cases, is_nested_in_subprogram (die) check results in such
aggregates not being moved, but in the function parameter type case
that is not the case.

The following patch fixes it by returning false from should_move_die_to_comdat
for non-constant sized aggregate types, i.e. when either we gave up on
adding DW_AT_byte_size for it because it wasn't expressible, or when
it is something non-constant (location description, reference, ...).
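
The rule in the paragraph above can be modelled as a small predicate. This is a simplified sketch and not the dwarf2out.cc code: `SizeClass` and `size_allows_move` are hypothetical names, and the two constant attribute classes checked by the patch are collapsed into a single `UnsignedConst` value.

```cpp
#include <cassert>
#include <optional>

// Hypothetical classification of a DW_AT_byte_size value.
enum class SizeClass { UnsignedConst, LocationExpr, DieReference };

// Returns true if the size attribute does not prevent moving the type.
// `byte_size` is empty when no DW_AT_byte_size was emitted at all.
bool size_allows_move (bool is_enumeration,
                       const std::optional<SizeClass> &byte_size)
{
  // Enumeration types are exempt from the size check.
  if (is_enumeration)
    return true;
  // Aggregates must have a size, and it must be a plain constant.
  return byte_size.has_value () && *byte_size == SizeClass::UnsignedConst;
}

// Small self-check covering the three cases named in the thread.
void self_check ()
{
  assert (size_allows_move (false, SizeClass::UnsignedConst));
  assert (!size_allows_move (false, std::nullopt));
  assert (!size_allows_move (false, SizeClass::LocationExpr));
}
```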

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2024-02-29  Jakub Jelinek  

PR debug/114015
* dwarf2out.cc (should_move_die_to_comdat): Return false for
aggregates without DW_AT_byte_size attribute or with non-constant
DW_AT_byte_size.

* gcc.dg/debug/dwarf2/pr114015.c: New test.

--- gcc/dwarf2out.cc.jj 2024-02-17 01:14:48.157790666 +0100
+++ gcc/dwarf2out.cc2024-02-28 17:11:44.259252850 +0100
@@ -8215,6 +8215,15 @@ should_move_die_to_comdat (dw_die_ref di
|| is_nested_in_subprogram (die)
|| contains_subprogram_definition (die))
return false;
+  if (die->die_tag != DW_TAG_enumeration_type)
+   {
+ /* Don't move non-constant size aggregates.  */
+ dw_attr_node *sz = get_AT (die, DW_AT_byte_size);
+ if (sz == NULL
+ || (AT_class (sz) != dw_val_class_unsigned_const
+ && AT_class (sz) != dw_val_class_unsigned_const_implicit))
+   return false;
+   }
return true;
  case DW_TAG_array_type:
  case DW_TAG_interface_type:
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr114015.c.jj 2024-02-28 
17:22:33.206221495 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr114015.c2024-02-28 
17:21:49.357831730 +0100
@@ -0,0 +1,14 @@
+/* PR debug/114015 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-g -fvar-tracking-assignments -fdebug-types-section -w" } */
+
+#if __BITINT_MAXWIDTH__ >= 236
+typedef _BitInt(236) B;
+#else
+typedef _BitInt(63) B;
+#endif
+
+int
+foo (B n, struct { char a[n]; } o)
+{
+}

Jakub





Re: [PATCH] c++/modules: Stream definitions for implicit instantiations [PR114170]

2024-03-01 Thread Nathaniel Shead
On Fri, Mar 01, 2024 at 08:18:09AM -0500, Jason Merrill wrote:
> On 2/29/24 20:08, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > -- >8 --
> > 
> > An implicit instantiation has an initializer depending on whether
> > DECL_INITIALIZED_P is set (like normal VAR_DECLs) which needs to be
> > written to ensure that consumers of header modules properly emit
> > definitions for these instantiations. This patch ensures that we
> > correctly fallback to checking this flag when DECL_INITIAL is not set
> > for a template instantiation.
> 
> Can you say more about how and why DECL_INITIAL and DECL_INITIALIZED_P are
> inconsistent here?

For variables with non-trivial dynamic initialization, DECL_INITIAL can
be empty after 'split_nonconstant_init' but DECL_INITIALIZED_P is still
set; we need to check the latter to determine if we need to go looking
for a definition to emit (often in 'static_aggregates' here). This is
the case in the linked testcase.

However, for template specialisations (not instantiations?) we primarily
care about DECL_INITIAL; if the variable has initialization depending on
a template parameter then we'll need to emit that definition even though
it doesn't yet have DECL_INITIALIZED_P set; this is the case in e.g.

  template <int N> int value = N;
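
To make the two shapes concrete, here is a sketch of both kinds of variable template as ordinary C++17. The template parameter lists are my assumption, since the archive appears to have dropped the angle brackets, and the example only shows the source-level forms; which internal flags end up set on each is as described in the text, not something this code demonstrates.

```cpp
#include <cassert>

inline int f () { return 42; }

// Initializer is a call to a non-constexpr function, so it is not a
// constant expression (the first case discussed above).
template <typename T>
inline int v = f ();

// Initializer depends on the template parameter (the second case).
template <int N>
int value = N;

inline int g () { return v<int>; }

// Small self-check, meant to be called from main, of the observable values.
void self_check ()
{
  assert (g () == 42);
  assert (value<7> == 7);
}
```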

> > As a drive-by fix, also ensures that the count of initializers matches
> > the actual number of initializers written. This doesn't seem to be
> > necessary for correctness in the current testsuite, but feels wrong and
> > makes debugging harder when initializers aren't properly written for
> > other reasons.
> > 
> > PR c++/114170
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (has_definition): Fall back to DECL_INITIALIZED_P
> > when DECL_INITIAL is not set on a template.
> > (module_state::write_inits): Only increment count when
> > initializers are actually written.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/var-tpl-2_a.H: New test.
> > * g++.dg/modules/var-tpl-2_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/module.cc   |  8 +---
> >   gcc/testsuite/g++.dg/modules/var-tpl-2_a.H | 10 ++
> >   gcc/testsuite/g++.dg/modules/var-tpl-2_b.C | 10 ++
> >   3 files changed, 25 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
> >   create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index 1b2ba2e0fa8..09578de41ec 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -11586,8 +11586,9 @@ has_definition (tree decl)
> >   case VAR_DECL:
> > if (DECL_LANG_SPECIFIC (decl)
> > - && DECL_TEMPLATE_INFO (decl))
> > -   return DECL_INITIAL (decl);
> > + && DECL_TEMPLATE_INFO (decl)
> > + && DECL_INITIAL (decl))
> > +   return true;
> > else
> > {
> >   if (!DECL_INITIALIZED_P (decl))
> > @@ -17528,13 +17529,14 @@ module_state::write_inits (elf_out *to, 
> > depset::hash &table, unsigned *crc_ptr)
> > tree list = static_aggregates;
> > for (int passes = 0; passes != 2; passes++)
> >   {
> > -  for (tree init = list; init; init = TREE_CHAIN (init), count++)
> > +  for (tree init = list; init; init = TREE_CHAIN (init))
> > if (TREE_LANG_FLAG_0 (init))
> >   {
> > tree decl = TREE_VALUE (init);
> > dump ("Initializer:%u for %N", count, decl);
> > sec.tree_node (decl);
> > +   ++count;
> >   }
> > list = tls_aggregates;
> > diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H 
> > b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
> > new file mode 100644
> > index 000..607fc0b808e
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
> > @@ -0,0 +1,10 @@
> > +// PR c++/114170
> > +// { dg-additional-options "-fmodule-header" }
> > +// { dg-module-cmi {} }
> > +
> > +inline int f() { return 42; }
> > +
> > +template
> > +inline int v = f();
> > +
> > +inline int g() { return v; }
> > diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C 
> > b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
> > new file mode 100644
> > index 000..6d2ef4004e6
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
> > @@ -0,0 +1,10 @@
> > +// PR c++/114170
> > +// { dg-module-do run }
> > +// { dg-additional-options "-fmodules-ts" }
> > +
> > +import "var-tpl-2_a.H";
> > +
> > +int main() {
> > +  if (v != 42)
> > +__builtin_abort();
> > +}
> 


Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-03-01 Thread Richard Earnshaw (lists)
On 29/02/2024 17:56, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 05:51:03PM +, Richard Earnshaw (lists) wrote:
>> Oh, but wait!  Perhaps that now falls into the initial 'if' clause and we 
>> never reach the point where you pick zero.  So perhaps I'm worrying about 
>> nothing.
> 
> If you are worried about the
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> +  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  n_named_args = 0;
> case in the patch, we know at that point that the initial n_named_args is
> equal to structure_value_addr_parm, so either 0, in that case
> --n_named_args;
> would yield the undesirable negative value, so we want 0 instead; for that
> case we could as well just have ; in there instead of n_named_args = 0;,
> or it is 1, in that case --n_named_args; would turn that into 0.
> 
>   Jakub
> 

No, I was thinking about the case of strict_argument_naming when the first 
argument is the artificial return value pointer.  In that case we'd want 
n_named_args=1.

But I think it's a non-issue as that will be caught by 

  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
   && targetm.calls.strict_argument_naming (args_so_far))
 ;

R.


Re: [PATCH] c++/modules: Stream definitions for implicit instantiations [PR114170]

2024-03-01 Thread Jason Merrill

On 3/1/24 08:41, Nathaniel Shead wrote:

On Fri, Mar 01, 2024 at 08:18:09AM -0500, Jason Merrill wrote:

On 2/29/24 20:08, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

An implicit instantiation has an initializer depending on whether
DECL_INITIALIZED_P is set (like normal VAR_DECLs) which needs to be
written to ensure that consumers of header modules properly emit
definitions for these instantiations. This patch ensures that we
correctly fallback to checking this flag when DECL_INITIAL is not set
for a template instantiation.


Can you say more about how and why DECL_INITIAL and DECL_INITIALIZED_P are
inconsistent here?


For variables with non-trivial dynamic initialization, DECL_INITIAL can
be empty after 'split_nonconstant_init' but DECL_INITIALIZED_P is still
set; we need to check the latter to determine if we need to go looking
for a definition to emit (often in 'static_aggregates' here). This is
the case in the linked testcase.


Ah, right.


However, for template specialisations (not instantiations?) we primarily
care about DECL_INITIAL; if the variable has initialization depending on
a template parameter then we'll need to emit that definition even though
it doesn't yet have DECL_INITIALIZED_P set; this is the case in e.g.

   template <int N> int value = N;


It seems odd that DECL_INITIALIZED_P wouldn't be set for the pattern of 
value.  But given that, the patch makes sense.  Let's add a comment like


/* DECL_INITIALIZED_P might not be set on a dependent VAR_DECL.  */

to explain the DECL_TEMPLATE_INFO special case.  OK with that change.


As a drive-by fix, also ensures that the count of initializers matches
the actual number of initializers written. This doesn't seem to be
necessary for correctness in the current testsuite, but feels wrong and
makes debugging harder when initializers aren't properly written for
other reasons.

PR c++/114170

gcc/cp/ChangeLog:

* module.cc (has_definition): Fall back to DECL_INITIALIZED_P
when DECL_INITIAL is not set on a template.
(module_state::write_inits): Only increment count when
initializers are actually written.

gcc/testsuite/ChangeLog:

* g++.dg/modules/var-tpl-2_a.H: New test.
* g++.dg/modules/var-tpl-2_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/module.cc   |  8 +---
   gcc/testsuite/g++.dg/modules/var-tpl-2_a.H | 10 ++
   gcc/testsuite/g++.dg/modules/var-tpl-2_b.C | 10 ++
   3 files changed, 25 insertions(+), 3 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
   create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 1b2ba2e0fa8..09578de41ec 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11586,8 +11586,9 @@ has_definition (tree decl)
   case VAR_DECL:
 if (DECL_LANG_SPECIFIC (decl)
- && DECL_TEMPLATE_INFO (decl))
-   return DECL_INITIAL (decl);
+ && DECL_TEMPLATE_INFO (decl)
+ && DECL_INITIAL (decl))
+   return true;
 else
{
  if (!DECL_INITIALIZED_P (decl))
@@ -17528,13 +17529,14 @@ module_state::write_inits (elf_out *to, depset::hash 
&table, unsigned *crc_ptr)
 tree list = static_aggregates;
 for (int passes = 0; passes != 2; passes++)
   {
-  for (tree init = list; init; init = TREE_CHAIN (init), count++)
+  for (tree init = list; init; init = TREE_CHAIN (init))
if (TREE_LANG_FLAG_0 (init))
  {
tree decl = TREE_VALUE (init);
dump ("Initializer:%u for %N", count, decl);
sec.tree_node (decl);
+   ++count;
  }
 list = tls_aggregates;
diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H 
b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
new file mode 100644
index 000..607fc0b808e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
@@ -0,0 +1,10 @@
+// PR c++/114170
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+inline int f() { return 42; }
+
+template
+inline int v = f();
+
+inline int g() { return v; }
diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C 
b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
new file mode 100644
index 000..6d2ef4004e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
@@ -0,0 +1,10 @@
+// PR c++/114170
+// { dg-module-do run }
+// { dg-additional-options "-fmodules-ts" }
+
+import "var-tpl-2_a.H";
+
+int main() {
+  if (v != 42)
+__builtin_abort();
+}








Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-03-01 Thread Jakub Jelinek
On Fri, Mar 01, 2024 at 01:53:08PM +, Richard Earnshaw (lists) wrote:
> On 29/02/2024 17:56, Jakub Jelinek wrote:
> > On Thu, Feb 29, 2024 at 05:51:03PM +, Richard Earnshaw (lists) wrote:
> >> Oh, but wait!  Perhaps that now falls into the initial 'if' clause and we 
> >> never reach the point where you pick zero.  So perhaps I'm worrying about 
> >> nothing.
> > 
> > If you are worried about the
> > +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> > +  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
> >  n_named_args = 0;
> > case in the patch, we know at that point that the initial n_named_args is
> > equal to structure_value_addr_parm, so either 0, in that case
> > --n_named_args;
> > would yield the undesirable negative value, so we want 0 instead; for that
> > case we could as well just have ; in there instead of n_named_args = 0;,
> > or it is 1, in that case --n_named_args; would turn that into 0.
> > 
> > Jakub
> > 
> 
> No, I was thinking about the case of strict_argument_naming when the first 
> argument is the artificial return value pointer.  In that case we'd want 
> n_named_args=1.
> 
> But I think it's a non-issue as that will be caught by 
> 
>   if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>&& targetm.calls.strict_argument_naming (args_so_far))
>  ;

Yes, that for strict argument naming and calls to
struct large_struct foo (...);
with the patch we set n_named_args = 1 early:
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
n_named_args = structure_value_addr_parm;
and then
  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
   && targetm.calls.strict_argument_naming (args_so_far))
;
doesn't change it.
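
The case analysis in this subthread can be written down as a toy model. It is only a sketch built from the snippets quoted above, restricted to callees declared as `foo (...)` (TYPE_NO_NAMED_ARGS_STDARG_P); `TargetHooks` and `n_named_args_for_no_named_stdarg` are hypothetical names, and the final fallback branch is my assumption since it is not quoted in the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the two target hooks mentioned in the thread.
struct TargetHooks
{
  bool strict_argument_naming;
  bool pretend_outgoing_varargs_named;
};

// Models n_named_args for a call to a function declared as `T foo (...)`.
// structure_value_addr_parm is 1 when the return value address is passed
// as an artificial first argument, otherwise 0.
int n_named_args_for_no_named_stdarg (int structure_value_addr_parm,
                                      int num_actuals,
                                      const TargetHooks &hooks)
{
  // Early assignment quoted in the thread.
  int n_named_args = structure_value_addr_parm;

  if (hooks.strict_argument_naming)
    ;  // Keep the early value (0 or 1).
  else if (!hooks.pretend_outgoing_varargs_named)
    n_named_args = 0;  // Avoids --n_named_args going negative.
  else
    n_named_args = num_actuals;  // Assumption: treat all args as named.

  return n_named_args;
}

// Small self-check of the cases discussed by Richard and Jakub.
void self_check ()
{
  TargetHooks strict = {true, false};
  TargetHooks plain = {false, false};
  assert (n_named_args_for_no_named_stdarg (1, 3, strict) == 1);
  assert (n_named_args_for_no_named_stdarg (0, 3, strict) == 0);
  assert (n_named_args_for_no_named_stdarg (1, 3, plain) == 0);
}
```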

Jakub



Update GCC 14 OpenACC changes some more (was: [wwwdocs] gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update)

2024-03-01 Thread Thomas Schwinge
Hi!

On 2024-02-27T20:16:52+0100, Tobias Burnus  wrote:
> Minor update for older and more recent changes.
>
> Comments?

> gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update

> Update OpenACC for one new feature (Fortran interface to existing
> C/C++ routines).

> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html

> +OpenACC 3.2: The following API routines are now available in
> +  Fortran using the openacc module or the
> +  open_lib.h header file: acc_alloc,
> +  acc_free, acc_hostptr,
> +  acc_deviceptr, acc_memcpy_to_device,
> +  acc_memcpy_to_device_async,
> +  acc_memcyp_from_device and
> +  acc_memcyp_from_device_async.

Thanks -- but you have to improve your copy'n'paste skills.  ;-P

On top of your wwwdocs commit f92f353bb0e932edba7d063b2609943683cf0a36
"gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update", I've
pushed commit df2bc49fc018c2b1aeb27030fe1967470d0d4ec3
"Update GCC 14 OpenACC changes some more", see attached.


Grüße
 Thomas


From df2bc49fc018c2b1aeb27030fe1967470d0d4ec3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 1 Mar 2024 15:01:54 +0100
Subject: [PATCH] Update GCC 14 OpenACC changes some more

Follow-up to commit f92f353bb0e932edba7d063b2609943683cf0a36
"gcc-14/changes.html + projects/gomp/: OpenMP + OpenACC update":

  - 's%acc_alloc%acc_malloc'
  - add 'acc_map_data' and 'acc_unmap_data'
  - swap 'acc_deviceptr' and 'acc_hostptr'
  - 's%memcyp%memcpy%g'
---
 htdocs/gcc-14/changes.html | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e8004d4a..d88fbc96 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -120,12 +120,14 @@ a work-in-progress.
   constructs.
 OpenACC 3.2: The following API routines are now available in
   Fortran using the openacc module or the
-  openacc_lib.h header file: acc_alloc,
-  acc_free, acc_hostptr,
-  acc_deviceptr, acc_memcpy_to_device,
+  openacc_lib.h header file:
+  acc_malloc, acc_free,
+  acc_map_data, acc_unmap_data,
+  acc_deviceptr, acc_hostptr,
+  acc_memcpy_to_device,
   acc_memcpy_to_device_async,
-  acc_memcyp_from_device, and
-  acc_memcyp_from_device_async.
+  acc_memcpy_from_device, and
+  acc_memcpy_from_device_async.
   
   
   For offload-device code generated via OpenMP and OpenACC, the math
-- 
2.43.0



Re: [Patch] OpenACC: Update libgomp.texi + openacc{.f90,_lib.h} for 3.1 arg-name changes

2024-03-01 Thread Tobias Burnus

Hi Thomas,


Thomas Schwinge wrote:

> On 2024-02-27T20:11:30+0100, Tobias Burnus  wrote:
>
>> The attached patch updates the manual to match OpenACC 3.3
>> specification for the implemented routines.
>
> But not update references to OpenACC 3.3, too?


As the change is not really visible (except when using Fortran 
keywords), it was not really clear to me whether the reference should be 
either changed to *or* augmented by the OpenACC 3.1 *or* 3.3 
specification reference.


What do you prefer? 3.1 or 3.3, in addition or instead of the existing 
2.x (?) references?



> The question is whether we want to do this now, or once we actually
> support 3.1 or 3.3; what was your intention for preparing this now?


Fallout of some bug fixes I intended to do in the .texi file, which in 
turn was a fallout of the trivial addition of the 3.3 interfaces for 
Fortran. Well, then I realized that 3.1 changed the argument names as well.


I think we should at least do the .texi bug fixes. Additionally, those 
'type, dimension(:[,:]...)' look very odd – thus, I would be inclined to 
do those as well.


Otherwise, it is more the question when to break the keyword= API; 
fortunately, it is not an ABI issue as the compiler just uses it to 
reorder the arguments back to the original declaration.



>> NOTE: Those argument names *do* have an effect and can be a breaking
>> change as Fortran permits using the arg name in the call, e.g.,
>>   call acc_copyin(a=myVar)  ! old
>> must now be called either as
>>   call acc_copyin(data_arg=myVar)  ! new
>> or as
>>   call acc_copyin(myVar)  ! works with old and new names
>> As the latter is way more common, the spec change hopefully does not
>> break too many programs.
>
> I wonder: would it happen to be possible via "Fortran interface magic" to
> actually support several variants of named arguments?  I agree we can
> drop any bogus GCC-local variants, but is it possible to support all the
> official variants?


Obviously not, as the default (Fortran + real world) is to use no 
keywords – and then the two variants become ambiguous. Therefore, 
Fortran doesn't permit combining two specific functions that differ 
only in this aspect.


If a real-world program uses the keywords by ill chance, it still had 
the very same problem depending on the compiler version and vendor as 
that's an upstream spec change.


The simple solution on the program side is just to drop the keyword – 
then it will work with either variant.


I think only very few programs are affected – possibly even none. And I 
wonder how other compilers handle this, given that they also started 
implementing (selected) OpenACC 2.7 and 3.x features (including 3.3, as 
real-world programs prove).



>> And, finally, it synced over all named constants from openacc.f90 to
>> config/accel/openacc.f90.
>
> I don't think that's necessary: as I understand, that one's for
> 'acc_on_device' only?


I think you are right — unless 'f951' is run on the device side, which 
won't happen for offloading, the accelerator version of the module file 
is not read – only the host version. The named constants will be 
expanded early to their numeric value and only the procedure calls 
remain. — Of those, only 'acc_on_device' has to be available on the 
device side and — hence, it is used at lto/link time by the device-side 
of the linker (by linking libgomp.a).


Thus, I withdraw this change as not being required, not harming, but 
wasting some GCC-build-time (only) file storage size and CPU cycles.


Tobias


Re: Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-03-01 Thread 钟居哲
+  /* Segment load/store permute cost.  */
+  const int segment_permute_2;
+  const int segment_permute_4;
+  const int segment_permute_8;

Why do we only have 2/4/8?  I think we should have 2/3/4/5/6/7/8.
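
A rough sketch of what a per-group-size cost lookup for NF = 2 to 8 could look like follows. This is not the patch's code: struct segment_costs and segment_permute_cost are hypothetical names, and the array layout is just one possible way to hold seven costs instead of three separate fields.

```c
/* Hypothetical cost table with one permute cost per group size
   NF = 2..8.  The names are illustrative, not GCC identifiers.  */
struct segment_costs
{
  int permute[7];  /* permute[nf - 2] is the cost for group size nf */
};

/* Return the extra permute cost for a segment load/store with the
   given group size, or 0 when the size is outside 2..8 (that is,
   when the access is not treated as a segment access).  */
int
segment_permute_cost (const struct segment_costs *costs, int group_size)
{
  if (group_size < 2 || group_size > 8)
    return 0;
  return costs->permute[group_size - 2];
}
```

With a table like this the switch over 2, 4 and 8 collapses into a single indexed load, and group sizes such as 3 or 5 get their own entries.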


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-02-28 05:27
To: juzhe.zh...@rivai.ai; gcc-patches; palmer; kito.cheng
CC: rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.
> This patch looks odd to me.
> I don't see memrefs in the trunk code.
 
It's on top of the vle/vse offset handling patch from
a while back that I haven't committed yet.
 
> Also, I prefer list all cost in cost tune info for NF = 2 ~ 8 like ARM SVE 
> does:
I don't mind having separate costs for each but I figured they
scale anyway with the number of vectors already.  Attached v2
is more similar to aarch64.
 
Regards
Robin
 
Subject: [PATCH v2] RISC-V: Add initial cost handling for segment
loads/stores.
 
This patch makes segment loads and stores more expensive.  It adds
segment_permute_2 (as well as 4 and 8) cost fields to the common vector
costs and adds handling to adjust_stmt_cost.
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (struct common_vector_cost): Add
segment_permute cost.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
Handle segment loads/stores.
* config/riscv/riscv.cc: Initialize segment_permute_[248] to 1.
---
gcc/config/riscv/riscv-protos.h|   5 +
gcc/config/riscv/riscv-vector-costs.cc | 139 +
gcc/config/riscv/riscv.cc  |   6 ++
3 files changed, 108 insertions(+), 42 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 80efdf2b7e5..9b737aca1a3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -218,6 +218,11 @@ struct common_vector_cost
   const int gather_load_cost;
   const int scatter_store_cost;
+  /* Segment load/store permute cost.  */
+  const int segment_permute_2;
+  const int segment_permute_4;
+  const int segment_permute_8;
+
   /* Cost of a vector-to-scalar operation.  */
   const int vec_to_scalar_cost;
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index adf9c197df5..c8178d71101 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1043,6 +1043,25 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
   return vector_costs::better_main_loop_than_p (other);
}
+/* Returns the group size i.e. the number of vectors to be loaded by a
+   segmented load/store instruction.  Return 0 if it is no segmented
+   load/store.  */
+static int
+segment_loadstore_group_size (enum vect_cost_for_stmt kind,
+   stmt_vec_info stmt_info)
+{
+  if (stmt_info
+  && (kind == vector_load || kind == vector_store)
+  && STMT_VINFO_DATA_REF (stmt_info))
+{
+  stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+  if (stmt_info
+   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+ return DR_GROUP_SIZE (stmt_info);
+}
+  return 0;
+}
+
/* Adjust vectorization cost after calling riscv_builtin_vectorization_cost.
For some statement, we would like to further fine-grain tweak the cost on
top of riscv_builtin_vectorization_cost handling which doesn't have any
@@ -1067,55 +1086,91 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, 
loop_vec_info loop,
 case vector_load:
 case vector_store:
{
-   /* Unit-stride vector loads and stores do not have offset addressing
-  as opposed to scalar loads and stores.
-  If the address depends on a variable we need an additional
-  add/sub for each load/store in the worst case.  */
-   if (stmt_info && stmt_info->stmt)
+   if (stmt_info && stmt_info->stmt && STMT_VINFO_DATA_REF (stmt_info))
{
-   data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
-   class loop *father = stmt_info->stmt->bb->loop_father;
-   if (!loop && father && !father->inner && father->superloops)
+   /* Segment loads and stores.  When the group size is > 1
+ the vectorizer will add a vector load/store statement for
+ each vector in the group.  Here we additionally add permute
+ costs for each.  */
+   /* TODO: Indexed and ordered/unordered cost.  */
+   int group_size = segment_loadstore_group_size (kind, stmt_info);
+   if (group_size > 1)
+ {
+   switch (group_size)
+ {
+ case 2:
+   if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+ stmt_cost += costs->vla->segment_permute_2;
+   else
+ stmt_cost += costs->vls->segment_permute_2;
+   break;
+ case 4:
+   if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+ stmt_cost += costs->vla->segment_permute_4;
+   else
+ stmt_cost += costs->vls->segment_permute_4;
+   break;
+ case 8:
+   if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+ stmt_cost += costs->vla->segment_permute_8;
+   else
+ stmt_cost += costs

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-03-01 Thread Richard Earnshaw (lists)
On 29/02/2024 15:55, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 02:14:05PM +, Richard Earnshaw wrote:
>>> I tried the above on arm, aarch64 and x86_64 and that seems fine,
>>> including the new testcase you added.
>>>
>>
>> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
>> n_named_args entirely, it doesn't need it (I don't think it even existed
>> when the AAPCS code was added).
> 
> So far I've just checked that the new testcase passes not just on
> x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
> with vanilla trunk.
> Haven't posted this patch in patch form, plus while I'm not really sure
> whether setting n_named_args to 0 or not changing in the
> !pretend_outgoing_varargs_named is right, the setting to 0 feels more
> correct to me.  If structure_value_addr_parm is 1, the function effectively
> has a single named argument and then ... args and if the target wants
> n_named_args to be number of named arguments except the last, then that
> should be 0 rather than 1.
> 
> Thus, is the following patch ok for trunk then?
> 
> 2024-02-29  Jakub Jelinek  
> 
>   PR target/107453

PR 114136

Would be more appropriate for this, I think.

Otherwise, OK.

R.

>   * calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
>   n_named_args initially before INIT_CUMULATIVE_ARGS to
>   structure_value_addr_parm rather than 0, after it don't modify
>   it if strict_argument_naming and clear only if
>   !pretend_outgoing_varargs_named.
> 
> --- gcc/calls.cc.jj   2024-01-22 11:48:08.045847508 +0100
> +++ gcc/calls.cc  2024-02-29 16:24:47.799855912 +0100
> @@ -2938,7 +2938,7 @@ expand_call (tree exp, rtx target, int i
>/* Count the struct value address, if it is passed as a parm.  */
>+ structure_value_addr_parm);
>else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> -n_named_args = 0;
> +n_named_args = structure_value_addr_parm;
>else
>  /* If we know nothing, treat all args as named.  */
>  n_named_args = num_actuals;
> @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int i
>   we do not have any reliable way to pass unnamed args in
>   registers, so we must force them into memory.  */
>  
> -  if (type_arg_types != 0
> +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>&& targetm.calls.strict_argument_naming (args_so_far))
>  ;
>else if (type_arg_types != 0
>  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  /* Don't include the last named arg.  */
>  --n_named_args;
> -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> +&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  n_named_args = 0;
>else
>  /* Treat all args as named.  */
> 
>   Jakub
> 



Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm

2024-03-01 Thread Andre Vieira (lists)

Hi Thiago,

Thanks for this, LGTM but I can't approve this, CC'ing Richard.

Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 
'gcc/testsuite' from bullet points 2-4.


Kind regards,
Andre

On 13/01/2024 00:55, Thiago Jung Bauermann wrote:

Since commits 2c3db94d9fd ("c: Turn int-conversion warnings into
permerrors") and 55e94561e97e ("c: Turn -Wimplicit-function-declaration
into a permerror") these tests fail with errors such as:

   FAIL: gcc.target/arm/pr59858.c (test for excess errors)
   FAIL: gcc.target/arm/pr65647.c (test for excess errors)
   FAIL: gcc.target/arm/pr65710.c (test for excess errors)
   FAIL: gcc.target/arm/pr97969.c (test for excess errors)

Here's one example of the excess errors:

   FAIL: gcc.target/arm/pr65647.c (test for excess errors)
   Excess errors:
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:17: error: 
initialization of 'int' from 'int *' makes integer from pointer without a cast 
[-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:51: error: 
initialization of 'int' from 'int *' makes integer from pointer without a cast 
[-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:62: error: 
initialization of 'int' from 'int *' makes integer from pointer without a cast 
[-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:7:48: error: 
initialization of 'int' from 'int *' makes integer from pointer without a cast 
[-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:8:9: error: 
initialization of 'int' from 'int *' makes integer from pointer without a cast 
[-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:24:5: error: 
initialization of 'int' from 'int *' makes integer from pointer without a cast 
[-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:25:5: error: 
initialization of 'int' from 'struct S1 *' makes integer from pointer without a 
cast [-Wint-conversion]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:41:3: error: implicit 
declaration of function 'fn3'; did you mean 'fn2'? 
[-Wimplicit-function-declaration]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:46:3: error: implicit 
declaration of function 'fn5'; did you mean 'fn4'? 
[-Wimplicit-function-declaration]
   /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:57:16: error: implicit 
declaration of function 'fn6'; did you mean 'fn4'? 
[-Wimplicit-function-declaration]

PR rtl-optimization/59858 and PR target/65710 test the fix of an ICE.
PR target/65647 and PR target/97969 test for a compilation infinite loop.

Therefore, add -fpermissive so that the tests behave as they did previously.
Tested on armv8l-linux-gnueabihf.

gcc/testsuite/ChangeLog:
* gcc.target/arm/pr59858.c: Add -fpermissive.
* gcc/testsuite/gcc.target/arm/pr65647.c: Likewise.
* gcc/testsuite/gcc.target/arm/pr65710.c: Likewise.
* gcc/testsuite/gcc.target/arm/pr97969.c: Likewise.
---
  gcc/testsuite/gcc.target/arm/pr59858.c | 2 +-
  gcc/testsuite/gcc.target/arm/pr65647.c | 2 +-
  gcc/testsuite/gcc.target/arm/pr65710.c | 2 +-
  gcc/testsuite/gcc.target/arm/pr97969.c | 2 +-
  4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
b/gcc/testsuite/gcc.target/arm/pr59858.c
index 3360b48e8586..9336edfce277 100644
--- a/gcc/testsuite/gcc.target/arm/pr59858.c
+++ b/gcc/testsuite/gcc.target/arm/pr59858.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
-fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w" 
} */
+/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
-fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w 
-fpermissive" } */
  /* { dg-require-effective-target fpic } */
  /* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft -mfloat-abi=hard" { *-*-* } 
{ "-mfloat-abi=hard" } { "" } } */
  /* { dg-require-effective-target arm_arch_v5te_thumb_ok } */
diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c 
b/gcc/testsuite/gcc.target/arm/pr65647.c
index 26b4e399f6be..3cbf6b804ec0 100644
--- a/gcc/testsuite/gcc.target/arm/pr65647.c
+++ b/gcc/testsuite/gcc.target/arm/pr65647.c
@@ -1,7 +1,7 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target arm_arch_v6m_ok } */
  /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } 
{"-mfloat-abi=soft" } } */
-/* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=soft" } */
+/* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=soft -fpermissive" 
} */
  
  a, b, c, e, g = &e, h, i = 7, l = 1, m, n, o, q = &m, r, s = &r, u, w = 9, x,

y = 6, z, t6 = 7, t8, t9 = 1, t11 = 5, t12 = &t8, t13 = 3, t15,
diff --git a/gcc/testsuite/gcc.target/arm/pr65710.c 
b/gcc/testsuite/gcc.target/arm/pr65710.c
index 103ce1d45f77..4cbf7817af7e 100644
--- a/gcc/testsuite/gcc.target/a

Re: [PATCH] testsuite: Fix fallout of turning warnings into errors on 32-bit Arm

2024-03-01 Thread Richard Earnshaw (lists)
On 01/03/2024 14:23, Andre Vieira (lists) wrote:
> Hi Thiago,
> 
> Thanks for this, LGTM but I can't approve this, CC'ing Richard.
> 
> Do have a nitpick, in the gcc/testsuite/ChangeLog: remove 'gcc/testsuite' 
> from bullet points 2-4.
> 

Yes, this is OK with the change Andre mentioned (your push will fail if you 
don't fix that).

R.

PS, if you've set up GCC git customizations (see 
contrib/gcc-git-customization.sh), you can verify things like this with 'git 
gcc-verify HEAD^..HEAD'


> Kind regards,
> Andre
> 
> On 13/01/2024 00:55, Thiago Jung Bauermann wrote:
>> Since commits 2c3db94d9fd ("c: Turn int-conversion warnings into
>> permerrors") and 55e94561e97e ("c: Turn -Wimplicit-function-declaration
>> into a permerror") these tests fail with errors such as:
>>
>>    FAIL: gcc.target/arm/pr59858.c (test for excess errors)
>>    FAIL: gcc.target/arm/pr65647.c (test for excess errors)
>>    FAIL: gcc.target/arm/pr65710.c (test for excess errors)
>>    FAIL: gcc.target/arm/pr97969.c (test for excess errors)
>>
>> Here's one example of the excess errors:
>>
>>    FAIL: gcc.target/arm/pr65647.c (test for excess errors)
>>    Excess errors:
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:17: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:51: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:6:62: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:7:48: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:8:9: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:24:5: error: 
>> initialization of 'int' from 'int *' makes integer from pointer without a 
>> cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:25:5: error: 
>> initialization of 'int' from 'struct S1 *' makes integer from pointer 
>> without a cast [-Wint-conversion]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:41:3: error: 
>> implicit declaration of function 'fn3'; did you mean 'fn2'? 
>> [-Wimplicit-function-declaration]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:46:3: error: 
>> implicit declaration of function 'fn5'; did you mean 'fn4'? 
>> [-Wimplicit-function-declaration]
>>    /path/gcc.git/gcc/testsuite/gcc.target/arm/pr65647.c:57:16: error: 
>> implicit declaration of function 'fn6'; did you mean 'fn4'? 
>> [-Wimplicit-function-declaration]
>>
>> PR rtl-optimization/59858 and PR target/65710 test the fix of an ICE.
>> PR target/65647 and PR target/97969 test for a compilation infinite loop.
>>
>> Therefore, add -fpermissive so that the tests behave as they did previously.
>> Tested on armv8l-linux-gnueabihf.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/arm/pr59858.c: Add -fpermissive.
>> * gcc/testsuite/gcc.target/arm/pr65647.c: Likewise.
>> * gcc/testsuite/gcc.target/arm/pr65710.c: Likewise.
>> * gcc/testsuite/gcc.target/arm/pr97969.c: Likewise.
>> ---
>>   gcc/testsuite/gcc.target/arm/pr59858.c | 2 +-
>>   gcc/testsuite/gcc.target/arm/pr65647.c | 2 +-
>>   gcc/testsuite/gcc.target/arm/pr65710.c | 2 +-
>>   gcc/testsuite/gcc.target/arm/pr97969.c | 2 +-
>>   4 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c 
>> b/gcc/testsuite/gcc.target/arm/pr59858.c
>> index 3360b48e8586..9336edfce277 100644
>> --- a/gcc/testsuite/gcc.target/arm/pr59858.c
>> +++ b/gcc/testsuite/gcc.target/arm/pr59858.c
>> @@ -1,5 +1,5 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
>> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
>> -fPIC -w" } */
>> +/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb 
>> -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts 
>> -fPIC -w -fpermissive" } */
>>   /* { dg-require-effective-target fpic } */
>>   /* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft 
>> -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
>>   /* { dg-require-effective-target arm_arch_v5te_thumb_ok } */
>> diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c 
>> b/gcc/testsuite/gcc.target/arm/pr65647.c
>> index 26b4e399f6be..3cbf6b804ec0 100644
>> --- a/gcc/testsuite/gcc.target/arm/pr65647.c
>> +++ b/gcc/testsuite/gcc.target/arm/pr65647.c
>> @@ -1,7 +1,7 @@
>>   /* { dg-do compile } */
>>   /* { dg-require-effective-target arm_arch_v6m_ok } */
>>   /* { dg-skip-if

Re: [PATCH] testsuite: Turn errors back into warnings in arm/acle/cde-mve-error-2.c

2024-03-01 Thread Richard Earnshaw (lists)
On 13/01/2024 20:46, Thiago Jung Bauermann wrote:
> Since commit 2c3db94d9fd ("c: Turn int-conversion warnings into
> permerrors") the test fails with errors such as:
> 
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 32)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 33)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 34)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 35)
> ⋮
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 118 (test for 
> warnings, line 117)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 119)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 120 (test for 
> warnings, line 119)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 121)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 122 (test for 
> warnings, line 121)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 123)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   at line 124 (test for 
> warnings, line 123)
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0   (test for errors, line 
> 125)
> ⋮
>   FAIL: gcc.target/arm/acle/cde-mve-error-2.c   -O0  (test for excess errors)
> 
> There's a total of 1016 errors.  Here's a sample of the excess errors:
> 
>   Excess errors:
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:117:31: 
> error: passing argument 2 of '__builtin_arm_vcx1qv16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:119:3: 
> error: passing argument 3 of '__builtin_arm_vcx1qav16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:121:3: 
> error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
>   /path/gcc.git/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c:123:3: 
> error: passing argument 3 of '__builtin_arm_vcx2qv16qi' makes integer from 
> pointer without a cast [-Wint-conversion]
> 
> The test expects these messages to be warnings, not errors.  My first try
> was to change it to expect them as errors instead.  This didn't work, IIUC
> because the error prevents the compiler from continuing processing the file
> and thus other errors which are expected by the test don't get emitted.
> 
> Therefore, add -fpermissive so that the test behaves as it did previously.
> Because of the additional line in the header, I had to adjust the line
> numbers of the expected warnings.
> 
> Tested on armv8l-linux-gnueabihf.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/arm/acle/cde-mve-error-2.c: Add -fpermissive.
> ---
>  .../gcc.target/arm/acle/cde-mve-error-2.c | 63 ++-
>  1 file changed, 32 insertions(+), 31 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c 
> b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
> index 5b7774825442..da283a06a54d 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/cde-mve-error-2.c
> @@ -2,6 +2,7 @@
>  
>  /* { dg-do assemble } */
>  /* { dg-require-effective-target arm_v8_1m_main_cde_mve_fp_ok } */
> +/* { dg-options "-fpermissive" } */
>  /* { dg-add-options arm_v8_1m_main_cde_mve_fp } */
>  
>  /* The error checking files are split since there are three kinds of
> @@ -115,73 +116,73 @@ uint8x16_t test_bad_immediates (uint8x16_t n, 
> uint8x16_t m, int someval,
>  
>/* `imm' is of wrong type.  */
>accum += __arm_vcx1q_u8 (0, "");/* { dg-error 
> {argument 2 to '__builtin_arm_vcx1qv16qi' must be a constant immediate in 
> range \[0-4095\]} } */
> -  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes 
> integer from pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 
> 117 } */
> +  /* { dg-warning {passing argument 2 of '__builtin_arm_vcx1qv16qi' makes 
> integer from pointer without a cast \[-Wint-conversion\]} "" { target *-*-* } 
> 118 } */

Absolute line numbers are a pain, but I think we can use '.-1' (without the 
quotes) in these cases to minimize the churn.

If that works, ok with that change.
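
For illustration, a relative line number in a directive looks roughly like the sketch below. It is a made-up test, not part of cde-mve-error-2.c: the function name and the use of -Wshadow are assumptions chosen only so that the file compiles cleanly without special options.

```c
/* { dg-do compile } */
/* { dg-options "-Wshadow" } */

int
relative_line_demo (int value)
{
  {
    int value = 1;
    /* { dg-warning {shadows a parameter} "" { target *-*-* } .-1 } */
    return value;
  }
}
```

Because '.-1' refers to the line just above the directive, adding a line to the file header does not require renumbering any of the expected diagnostics.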

R.



Re: [PATCH v6 0/5]New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-03-01 Thread Qing Zhao
Ping on this patch set.

Thanks a lot!

Qing
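
For anyone joining the thread late, a minimal usage sketch of the attribute follows. The COUNTED_BY macro, struct packet and packet_alloc are hypothetical names made up for this example; the macro expands to nothing on compilers that do not know counted_by, so the sketch says nothing about what the bounds sanitizer or __builtin_dynamic_object_size will report.

```c
#include <stdlib.h>

/* Use the attribute only where the compiler advertises it; COUNTED_BY
   is a hypothetical helper macro for this example.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct packet
{
  size_t count;                     /* number of elements in payload */
  int payload[] COUNTED_BY (count); /* C99 flexible array member */
};

/* Allocate a packet with room for COUNT elements and record the count
   before the array is ever accessed.  Overflow of the size computation
   is not checked in this sketch.  */
struct packet *
packet_alloc (size_t count)
{
  struct packet *p = malloc (sizeof *p + count * sizeof (int));
  if (p)
    p->count = count;
  return p;
}
```

The point of setting count first is that every later reference to p->payload can then be paired with an up-to-date size.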

> On Feb 16, 2024, at 14:47, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the 6th version of the patch.
> 
> compare with the 5th version, the only difference is:
> 
> 1. Add the 6th argument to .ACCESS_WITH_SIZE
>   to carry the TYPE of the flexible array.
>   Such information is needed during tree-object-size.cc.
> 
>   previously, we use the result type of the routine
>   .ACCESS_WITH_SIZE to decide the element type of the
>   original array, however, the result type of the routine
>   might be changed during tree optimizations due to 
>   possible type casting in the source code.
> 
> 
> compare with the 4th version, the major difference are:
> 
> 1. Change the return type of the routine .ACCESS_WITH_SIZE 
>   FROM:
> Pointer to the type of the element of the flexible array;
>   TO:
> Pointer to the type of the flexible array;
>And then wrap the call with an indirection reference. 
> 
> 2. Adjust all other parts with this change, (this will simplify the bound 
> sanitizer instrument);
> 
> 3. Add the fixes to the kernel building failures, which include:
>A. The operator “typeof” cannot return correct type for a->array; 
>B. The operator “&” cannot return correct address for a->array;
> 
> 4. Correctly handle the case when the value of “counted-by” is zero or 
> negative as following
>   4.1. Update the counted-by doc with the following:
>When the counted-by field is assigned a negative integer value, the 
> compiler will treat the value as zero. 
>   4.2. Adjust __bdos and array bound sanitizer to handle correctly when 
> “counted-by” is zero. 
> 
> 
> It based on the following proposal:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
> Represent the missing dependence for the "counted_by" attribute and its 
> consumers
> 
> **The summary of the proposal is:
> 
> * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size 
> information for every reference to a FAM field;
> * In C FE, Replace every reference to a FAM field whose TYPE has the 
> "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound 
> sanitizer, query the size information or ACCESS_MODE information from the new 
> internal function;
> * When expansing to RTL, replace the internal function with the actual 
> reference to the FAM field;
> * Some adjustment to ipa alias analysis, and other SSA passes to mitigate the 
> impact to the optimizer and code generation.
> 
> 
> **The new internal function
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, TYPE_OF_SIZE, 
> ACCESS_MODE, TYPE_OF_REF)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "REF_TO_OBJ" same as the 1st argument;
> 
> Both the return type and the type of the first argument of this function have 
> been converted from the incomplete array type to the corresponding pointer 
> type.
> 
> The call to .ACCESS_WITH_SIZE is wrapped with an INDIRECT_REF, whose type is 
> the original imcomplete array type.
> 
> Please see the following link for why:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
> 
> 1st argument "REF_TO_OBJ": The reference to the object;
> 2nd argument "REF_TO_SIZE": The reference to the size of the object,
> 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE 
> represents
>   0: unknown;
>   1: the number of the elements of the object type;
>   2: the number of bytes;
> 4th argument "TYPE_OF_SIZE": A constant 0 with the TYPE of the object
>  refed by REF_TO_SIZE
> 5th argument "ACCESS_MODE":
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write
> 6th argument "TYPE_OF_REF": A constant 0 with the pointer TYPE to
>  the original flexible array type.
> 
> ** The Patch sets included:
> 
> 1. Provide counted_by attribute to flexible array member field;
>  which includes:
>  * "counted_by" attribute documentation;
>  * C FE handling of the new attribute;
>syntax checking, error reporting;
>  * testing cases;
> 
> 2. Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.
>  which includes:
>  * The definition of the new internal function .ACCESS_WITH_SIZE in 
> internal-fn.def.
>  * C FE converts every reference to a FAM with "counted_by" attribute to 
> a call to the internal function .ACCESS_WITH_SIZE.
>(build_component_ref in c_typeck.cc)
>This includes the case when the object is statically allocated and 
> initialized.
>In order to make this work, we should update 
> initializer_constant_valid_p_1 and output_constant in varasm.cc to include 
> calls to .ACCESS_WITH_SIZE.
> 
>However, for the reference inside "offsetof", ignore the "counted_by" 
> attribute since it's not useful at al

[PATCH] libstdc++: Use [] for empty argument to automake macro

2024-03-01 Thread Jonathan Wakely
Does this look reasonable to push? There are lots of other uses of
AM_ENABLE_MULTILIB(, ..) in the tree, but I'll leave others to "fix"
those if needed.

-- >8 --

This seems to be needed to build with --enable-maintainer-mode but I
don't understand why it doesn't fail in the normal build.

libstdc++-v3/ChangeLog:

* configure.ac: Use [] for empty argument to AM_ENABLE_MULTILIB
macro.
---
 libstdc++-v3/configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 37396bd6ebb..1d6a94a16f2 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -11,7 +11,7 @@ AC_CONFIG_HEADER(config.h)
 ### am handles this now?  ORIGINAL_LD_FOR_MULTILIBS=$LD
 
 # Find the rest of the source tree framework.
-AM_ENABLE_MULTILIB(, ..)
+AM_ENABLE_MULTILIB([], ..)
 
 # Gets build, host, target, *_vendor, *_cpu, *_os, etc.
 #
-- 
2.43.2



Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-03-01 Thread Robin Dapp
> 2.  When I write if (a == 2) and if (2 == a), the results are
> the same
> 
> 3.  The vec_duplicate operand  is the 5th operand in both cmp and
> eqne patterns. I think they are equal.

A comparison with a constant is always canonicalized to have the
constant second, that's why you won't see a difference.
A vector constant follows the same rule because
swap_commutative_operands_p will place it second.

I'm not sure whether we need the vec_duplicate first, honestly.
I don't remember a canonicalization rule that puts it there.
We do have something for constants and vec_merge.  As long as
things come from expand I think a constant will always be
second and this patch removes the patterns where the duplicate
is first.

Generally with fast math we could invert the condition so
a comparison should be "commutative".  With NaNs I think we
also allow it if the unordered comparisons are supported.
But I'm not even certain that we try something like that with
vectors.  On the other hand - as there is no canonical order
nothing would prevent it from being first in the future?

Will need to think about it some more (and try with NaNs) but
we could try removing the patterns with GCC 15 I suppose.

The rest should still be handled in a more generic fashion.

Regards
 Robin
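
The "constant goes second" rule described above can be pictured with a
toy model.  This is only a sketch and not GCC's RTL code: Operand,
Compare, swap_code and canonicalize are hypothetical names, and nothing
beyond the operand ordering of a scalar comparison is modelled.

```cpp
#include <cassert>
#include <utility>

// Toy stand-ins for RTL operands and comparison codes (hypothetical names).
enum class Code { EQ, NE, LT, GT, LE, GE };

struct Operand {
  bool is_constant;  // true for a literal such as 2
  int id;            // register number or constant value
};

struct Compare {
  Code code;
  Operand lhs;
  Operand rhs;
};

// Comparison code to use once the two operands have traded places.
constexpr Code swap_code(Code c) {
  switch (c) {
    case Code::LT: return Code::GT;
    case Code::GT: return Code::LT;
    case Code::LE: return Code::GE;
    case Code::GE: return Code::LE;
    default:       return c;  // EQ and NE are symmetric
  }
}

// Put a constant operand second and adjust the code so the meaning of
// the comparison is unchanged.
inline Compare canonicalize(Compare c) {
  if (c.lhs.is_constant && !c.rhs.is_constant) {
    std::swap(c.lhs, c.rhs);
    c.code = swap_code(c.code);
  }
  return c;
}
```

In this model (2 == a) and (a == 2) canonicalize to the same Compare,
which is why the two source spellings cannot be told apart later on.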



Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-03-01 Thread Richard Earnshaw (lists)
On 01/03/2024 04:38, Alexandre Oliva wrote:
> Hello, Matthew,
> 
> Thanks for the review.

For closure, Jakub has just pushed a patch to the generic code, so I don't 
think we need this now.

R.

> 
> On Feb 26, 2024, Matthew Malcomson  wrote:
> 
>> I think you're right that the AAPCS32 requires all arguments to be passed in
>> registers for this testcase.
>> (Nit on the commit-message: It says that your reading of the AAPCS32
>> suggests
>> that the *caller* is correct -- I believe based on the change you
>> suggested you
>> meant *callee* is correct in expecting arguments in registers.)
> 
> Ugh, yeah, sorry about the typo.
> 
>> The approach you suggest looks OK to me -- I do notice that it doesn't
>> fix the
>> legacy ABI's of `atpcs` and `apcs` and guess it would be nicer to have them
>> working at the same time though would defer to maintainers on how
>> important that
>> is.
>> (For the benefit of others reading) I don't believe there is any ABI concern
>> with this since it's fixing something that is currently not working at
>> all and
>> only applies to c23 (so a change shouldn't have too much of an impact).
> 
>> You mention you chose to make the change in the arm backend rather
>> than general
>> code due to hesitancy to change the generic ABI-affecting code. That makes
>> sense to me, certainly at this late stage in the development cycle.
> 
> *nod* I wrote the patch in the following context: I hit the problem on
> the very first toolchain I started transitioning to gcc-13.  I couldn't
> really fathom the notion that this breakage could have survived an
> entire release cycle if it affected many targets, and sort of held on to
> an assumption that the abi used by our arm-eabi toolchain had to be an
> uncommon one.
> 
> All of this hypothesizing falls apart by the now apparent knowledge that
> the test is failing elsewhere as well, even on other ARM ABIs, it just
> hadn't been addressed yet.  I'm glad we're getting there :-)
> 
>> From a quick check on c23-stdarg-4.c it does look like the below
>> change ends up
>> with the same codegen as your patch (except in the case of those
>> legacy ABI's,
>> where the below does make the caller and callee ABI match AFAICT):
> 
>> ```
>>   diff --git a/gcc/calls.cc b/gcc/calls.cc
>>   index 01f44734743..0b302f633ed 100644
>>   --- a/gcc/calls.cc
>>   +++ b/gcc/calls.cc
>>   @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int ignore)
>>     we do not have any reliable way to pass unnamed args in
>>     registers, so we must force them into memory.  */
> 
>>   -  if (type_arg_types != 0
>>   +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>>  && targetm.calls.strict_argument_naming (args_so_far))
>>    ;
>>  else if (type_arg_types != 0
>>  && ! targetm.calls.pretend_outgoing_varargs_named
>> (args_so_far))
>>    /* Don't include the last named arg.  */
>>    --n_named_args;
>>   -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>>   +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>>   +    && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>>    n_named_args = 0;
>>  else
>>    /* Treat all args as named.  */
>> ```
> 
>> Do you agree that this makes sense (i.e. is there something I'm
>> completely missing)?
> 
> Yeah, your argument is quite convincing, and the target knobs are indeed
> in line with the change you suggest, whereas the current code seems to
> deviate from them.
> 
> With my ABI designer hat on, however, I see that there's room for ABIs
> to make decisions about 0-args stdargs that go differently from stdargs
> with leading named args, from prototyped functions, and even from
> prototypeless functions, and we might end up needing more knobs to deal
> with such custom decisions.  We can cross that bridge if/when we get to
> it, though.
> 
>> (lm32 mcore msp430 gcn cris fr30 frv h8300 arm v850 rx pru)
> 
> Interesting that ppc64le is not on your list.  There's PR107453 about
> that, and another thread is discussing a fix for it that is somewhat
> different from what you propose (presumably because the way the problem
> manifests on ppc64le is different), but it also tweaks expand_call.
> 
> I'll copy you when following up there.
> 



Re: [PATCH] arm: Fixed C23 call compatibility with arm-none-eabi

2024-03-01 Thread Richard Earnshaw (lists)
On 19/02/2024 09:13, Torbjörn SVENSSON wrote:
> Ok for trunk and releases/gcc-13?
> Regtested on top of 945cb8490cb for arm-none-eabi, without any regression.
> 
> Backporting to releases/gcc-13 will change -std=c23 to -std=c2x.

Jakub has just pushed a different fix for this, so I don't think we need this 
now.

R.


> 
> --
> 
> In commit 4fe34cdcc80ac225b80670eabc38ac5e31ce8a5a, -std=c23 support was
> introduced to support functions without any named arguments.  For
> arm-none-eabi, this is not as simple as placing all arguments on the
> stack.  Align the caller to use r0, r1, r2 and r3 for arguments even for
> functions without any named arguments, as specified in the AAPCS.
> 
> Verify that the generic test cases have the arguments in the right
> order and add ARM-specific test cases.
> 
> gcc/ChangeLog:
> 
>   * calls.h: Added the type of the function to function_arg_info.
>   * calls.cc: Save the type of the function.
>   * config/arm/arm.cc: Check in the AAPCS layout function if
>   function has no named args.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/torture/c23-stdarg-split-1a.c: Detect out of order
>   arguments.
>   * gcc.dg/torture/c23-stdarg-split-1b.c: Likewise.
>   * gcc.target/arm/aapcs/align_vaarg3.c: New test.
>   * gcc.target/arm/aapcs/align_vaarg4.c: New test.
> 
> Signed-off-by: Torbjörn SVENSSON 
> Co-authored-by: Yvan ROUX 
> ---
>  gcc/calls.cc  |  2 +-
>  gcc/calls.h   | 20 --
>  gcc/config/arm/arm.cc | 13 ---
>  .../gcc.dg/torture/c23-stdarg-split-1a.c  |  4 +-
>  .../gcc.dg/torture/c23-stdarg-split-1b.c  | 15 +---
>  .../gcc.target/arm/aapcs/align_vaarg3.c   | 37 +++
>  .../gcc.target/arm/aapcs/align_vaarg4.c   | 31 
>  7 files changed, 102 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg4.c
> 
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 01f44734743..a1cc283b952 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1376,7 +1376,7 @@ initialize_argument_information (int num_actuals 
> ATTRIBUTE_UNUSED,
>with those made by function.cc.  */
>  
>/* See if this argument should be passed by invisible reference.  */
> -  function_arg_info arg (type, argpos < n_named_args);
> +  function_arg_info arg (type, fntype, argpos < n_named_args);
>if (pass_by_reference (args_so_far_pnt, arg))
>   {
> const bool callee_copies
> diff --git a/gcc/calls.h b/gcc/calls.h
> index 464a4e34e33..88836559ebe 100644
> --- a/gcc/calls.h
> +++ b/gcc/calls.h
> @@ -35,24 +35,33 @@ class function_arg_info
>  {
>  public:
>function_arg_info ()
> -: type (NULL_TREE), mode (VOIDmode), named (false),
> +: type (NULL_TREE), fntype (NULL_TREE), mode (VOIDmode), named (false),
>pass_by_reference (false)
>{}
>  
>/* Initialize an argument of mode MODE, either before or after promotion.  
> */
>function_arg_info (machine_mode mode, bool named)
> -: type (NULL_TREE), mode (mode), named (named), pass_by_reference (false)
> +: type (NULL_TREE), fntype (NULL_TREE), mode (mode), named (named),
> +pass_by_reference (false)
>{}
>  
>/* Initialize an unpromoted argument of type TYPE.  */
>function_arg_info (tree type, bool named)
> -: type (type), mode (TYPE_MODE (type)), named (named),
> +: type (type), fntype (NULL_TREE), mode (TYPE_MODE (type)), named 
> (named),
>pass_by_reference (false)
>{}
>  
> +  /* Initialize an unpromoted argument of type TYPE with a known function 
> type
> + FNTYPE.  */
> +  function_arg_info (tree type, tree fntype, bool named)
> +: type (type), fntype (fntype), mode (TYPE_MODE (type)), named (named),
> +pass_by_reference (false)
> +  {}
> +
>/* Initialize an argument with explicit properties.  */
>function_arg_info (tree type, machine_mode mode, bool named)
> -: type (type), mode (mode), named (named), pass_by_reference (false)
> +: type (type), fntype (NULL_TREE), mode (mode), named (named),
> +pass_by_reference (false)
>{}
>  
>/* Return true if the gimple-level type is an aggregate.  */
> @@ -96,6 +105,9 @@ public:
>   libgcc support functions).  */
>tree type;
>  
> +  /* The type of the function that has this argument, or null if not known.  
> */
> +  tree fntype;
> +
>/* The mode of the argument.  Depending on context, this might be
>   the mode of the argument type or the mode after promotion.  */
>machine_mode mode;
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 1cd69268ee9..98e149e5b7e 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -7006,7 +7006,7 @@ aapcs_libcall_value (machine_mode mode)
> numbers referred to here are those in the AAPCS.

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Jason Merrill wrote:

> On 2/29/24 15:56, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > OK for trunk?
> > 
> > -- >8 --
> > 
> > For local enums defined in a non-template function or a function template
> > instantiation it seems we neglect to make the function depend on the enum
> > definition, which ultimately causes streaming to fail due to the enum
> > definition not being streamed before uses of its enumerators are streamed,
> > as far as I can tell.
> 
> I would think that the function doesn't need to depend on the local enum in
> order for the local enum to be streamed before the use of the enumerator,
> which comes after the definition of the enum in the function body?
> 
> Why isn't streaming the body of the function outputting the enum definition
> before the use of the enumerator?

IIUC (based on observing the behavior for local classes) streaming the
definition of a local class/enum as part of the function definition is
what we want to avoid; we want to treat a local type definition as a
logically separate definition and stream it separately (similar
to class defns vs member defns I guess).  And by not registering a dependency
between the function and the local enum, we end up never streaming out
the local enum definition separately and instead stream it out as part
of the function definition (accidentally) which we then can't stream in
properly.

Perhaps the motivation for treating local type definitions as logically
separate from the function definition is because they can leak out of a
function with a deduced return type:

  auto f() {
struct A { };
return A();
  }

  using type = decltype(f()); // refers directly to f()::A

It's also consistent with templated local types getting their own
TEMPLATE_DECL (which Nathan revisited in r11-4529-g9703b8d98c116e).
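
The same leak can happen with a local enum, which is the case this
patch is about.  A minimal sketch, with pick, Color and color_t made up
for illustration (it says nothing about how modules stream the type):

```cpp
#include <cassert>
#include <type_traits>

// A local enum escaping through a deduced return type.
inline auto pick() {
  enum Color { red, green };
  return green;
}

// Refers directly to pick()::Color from outside the function body.
using color_t = decltype(pick());

static_assert(std::is_enum<color_t>::value,
              "the local enum type is usable by callers");

// Callers can hold and convert values of the local type even though
// they cannot spell its name.
inline int color_value() {
  color_t c = pick();
  return static_cast<int>(c);  // green is the second enumerator
}
```

Since color_t is usable without ever looking inside pick(), it is at
least plausible that the enum wants to be its own entity rather than a
detail of the function body.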

> 
> > This was nearly enough to make things work, except we now ran into
> > issues with the local TYPE/CONST_DECL copies when streaming the
> > constexpr version of a function body.  It occurred to me that we don't
> > need to make copies of local types when copying a constexpr function
> > body; only VAR_DECLs etc need to be copied for sake of recursive
> > constexpr calls.  So this patch adjusts copy_fn accordingly.
> 
> Maybe adjust can_be_nonlocal instead?  It seems unnecessary in general to
> remap types and enumerators for inlining.

That seems to work nicely too.  I'm testing a patch to that effect with
all front ends and will report back.

> 
> Jason
> 
> 



Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-03-01 Thread Robin Dapp
> +  /* Segment load/store permute cost.  */
> +  const int segment_permute_2;
> +  const int segment_permute_4;
> +  const int segment_permute_8;
> 
> Why do we only have 2/4/8, I think we should have 2/3/4/5/6/7/8

No idea why I posted that (wrong) version, I used it for
some testing locally.  Attached is the proper version, still
called it v3...

Regards
 Robin

Subject: [PATCH v3] RISC-V: Add initial cost handling for segment
 loads/stores.

This patch makes segment loads and stores more expensive.  It adds
segment_permute_2 through segment_permute_8 cost fields to the common
vector costs and handles them in adjust_stmt_cost.
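
As a rough sketch of the idea (not the patch's code): each vector
load/store that belongs to a segment access gets an extra permute cost
selected by the group size.  The struct segment_costs and the function
segment_permute_cost below are hypothetical; the patch uses seven named
fields and a switch rather than an array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical cost table.  Index 0 holds the cost for a group of 2,
// index 6 the cost for a group of 8.
struct segment_costs {
  std::array<int, 7> permute;
};

// Extra cost for one vector load/store that is part of a segment access
// made up of GROUP_SIZE vectors.  Sizes outside 2..8 add nothing.
constexpr int segment_permute_cost(const segment_costs &costs,
                                   int group_size) {
  return (group_size >= 2 && group_size <= 8)
           ? costs.permute[static_cast<std::size_t>(group_size - 2)]
           : 0;
}

// Base statement cost plus the segment permute cost, if any.
constexpr int adjusted_stmt_cost(int stmt_cost, const segment_costs &costs,
                                 int group_size) {
  return stmt_cost + segment_permute_cost(costs, group_size);
}
```

With every entry set to 1, as in the initial riscv.cc values, a load
from a group of 4 costs one unit more than a plain vector load.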

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct common_vector_cost): Add
segment_permute cost.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
Handle segment loads/stores.
* config/riscv/riscv.cc: Initialize segment_permute_[2-8] to 1.
---
 gcc/config/riscv/riscv-protos.h|   9 ++
 gcc/config/riscv/riscv-vector-costs.cc | 163 ++---
 gcc/config/riscv/riscv.cc  |  14 +++
 3 files changed, 144 insertions(+), 42 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 80efdf2b7e5..90d1fcbb3b1 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -218,6 +218,15 @@ struct common_vector_cost
   const int gather_load_cost;
   const int scatter_store_cost;
 
+  /* Segment load/store permute cost.  */
+  const int segment_permute_2;
+  const int segment_permute_3;
+  const int segment_permute_4;
+  const int segment_permute_5;
+  const int segment_permute_6;
+  const int segment_permute_7;
+  const int segment_permute_8;
+
   /* Cost of a vector-to-scalar operation.  */
   const int vec_to_scalar_cost;
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index adf9c197df5..f4da213fe14 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1043,6 +1043,25 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
   return vector_costs::better_main_loop_than_p (other);
 }
 
+/* Returns the group size i.e. the number of vectors to be loaded by a
+   segmented load/store instruction.  Return 0 if it is no segmented
+   load/store.  */
+static int
+segment_loadstore_group_size (enum vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info)
+{
+  if (stmt_info
+  && (kind == vector_load || kind == vector_store)
+  && STMT_VINFO_DATA_REF (stmt_info))
+{
+  stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+  if (stmt_info
+ && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+   return DR_GROUP_SIZE (stmt_info);
+}
+  return 0;
+}
+
 /* Adjust vectorization cost after calling riscv_builtin_vectorization_cost.
For some statement, we would like to further fine-grain tweak the cost on
top of riscv_builtin_vectorization_cost handling which doesn't have any
@@ -1067,55 +1086,115 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, 
loop_vec_info loop,
 case vector_load:
 case vector_store:
{
- /* Unit-stride vector loads and stores do not have offset addressing
-as opposed to scalar loads and stores.
-If the address depends on a variable we need an additional
-add/sub for each load/store in the worst case.  */
- if (stmt_info && stmt_info->stmt)
+ if (stmt_info && stmt_info->stmt && STMT_VINFO_DATA_REF (stmt_info))
{
- data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
- class loop *father = stmt_info->stmt->bb->loop_father;
- if (!loop && father && !father->inner && father->superloops)
+ /* Segment loads and stores.  When the group size is > 1
+the vectorizer will add a vector load/store statement for
+each vector in the group.  Here we additionally add permute
+costs for each.  */
+ /* TODO: Indexed and ordered/unordered cost.  */
+ int group_size = segment_loadstore_group_size (kind, stmt_info);
+ if (group_size > 1)
+   {
+ switch (group_size)
+   {
+   case 2:
+ if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+   stmt_cost += costs->vla->segment_permute_2;
+ else
+   stmt_cost += costs->vls->segment_permute_2;
+ break;
+   case 3:
+ if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+   stmt_cost += costs->vla->segment_permute_3;
+ else
+   stmt_cost += costs->vls->segment_permute_3;
+ break;
+   case 4:
+ if (ri

[PATCH] libstdc++: Better diagnostics for std::format errors

2024-03-01 Thread Jonathan Wakely
Does the text of these new diagnostics look good?

There are of course other ways for a type to be not-formattable (e.g.
the formatter::format member doesn't return the right type or has some
other kind of incorrect signature, or the formatter::parse member isn't
constexpr) but we can't predict/detect them all reliably. This just
attempts to give a user-friendly explanation for a couple of common
mistakes. It should not have any false positives, because the
basic_format_arg constructor requires __formattable_with<_Tp, _Context>
so if either of these assertions fails, constructing __arg will fail
too.  The static_assert only adds a more readable error for a
compilation that's going to fail anyway.

Tested x86_64-linux.

-- >8 --

This adds two new static_assert messages to the internals of
std::make_format_args to give better diagnostics for invalid format
args. Rather than just getting an error saying that basic_format_arg
cannot be constructed, we get more specific errors for the cases where
std::formatter isn't specialized for the type at all, and where it's
specialized but only meets the BasicFormatter requirements and so can
only format non-const arguments.
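
The distinction between the two cases can be modelled outside the
library.  The following is a simplified C++17 sketch, not the libstdc++
implementation: can_format, classify and the toy formatter types are
hypothetical, and the toy format() takes no format context.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detects whether a const F has a member format() callable with an
// lvalue of type T (T may itself be const).
template<typename F, typename T, typename = void>
struct can_format : std::false_type {};

template<typename F, typename T>
struct can_format<F, T,
    std::void_t<decltype(std::declval<const F&>().format(std::declval<T&>()))>>
  : std::true_type {};

enum class arg_status { ok, must_be_non_const, not_formattable };

// Mirrors the order of the checks: fine as-is, fine only when the
// argument is non-const, or not formattable at all.
template<typename F, typename T>
constexpr arg_status classify() {
  using Tq = std::remove_const_t<T>;
  if constexpr (can_format<F, T>::value)
    return arg_status::ok;
  else if constexpr (std::is_const_v<T> && can_format<F, Tq>::value)
    return arg_status::must_be_non_const;
  else
    return arg_status::not_formattable;
}

struct Y {};
// Accepts const arguments.
struct good_formatter  { int format(const Y&) const { return 0; } };
// Only accepts non-const lvalues, like a BasicFormatter-only type.
struct basic_formatter { int format(Y&) const { return 0; } };
// No format() member at all.
struct no_formatter {};
```

Here classify<basic_formatter, const Y>() yields must_be_non_const,
which corresponds to the second of the two new diagnostics.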

Also add a test for the existing static_assert when constructing a
format_string for non-formattable args.

libstdc++-v3/ChangeLog:

* include/std/format (_Arg_store::_S_make_elt): Add two
static_assert checks to give more user-friendly error messages.
* testsuite/lib/prune.exp (libstdc++-dg-prune): Prune another
form of "in requirements with" note.
* testsuite/std/format/arguments/args_neg.cc: Check for
user-friendly diagnostics for non-formattable types.
* testsuite/std/format/string_neg.cc: Likewise.
---
 libstdc++-v3/include/std/format   | 13 +++
 libstdc++-v3/testsuite/lib/prune.exp  |  1 +
 .../std/format/arguments/args_neg.cc  | 34 ++-
 .../testsuite/std/format/string_neg.cc|  4 +++
 4 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index ee189f9086c..1e839e88db4 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3704,6 +3704,19 @@ namespace __format
static _Element_t
_S_make_elt(_Tp& __v)
{
+ using _Tq = remove_const_t<_Tp>;
+ using _CharT = typename _Context::char_type;
+ static_assert(is_default_constructible_v<formatter<_Tq, _CharT>>,
+   "std::formatter must be specialized for the type "
+   "of each format arg");
+ using __format::__formattable_with;
+ if constexpr (is_const_v<_Tp>)
+   if constexpr (!__formattable_with<_Tp, _Context>)
+ if constexpr (__formattable_with<_Tq, _Context>)
+   static_assert(__formattable_with<_Tp, _Context>,
+ "format arg must be non-const because its "
+ "std::formatter specialization has a "
+ "non-const reference parameter");
  basic_format_arg<_Context> __arg(__v);
  if constexpr (_S_values_only)
return __arg._M_val;
diff --git a/libstdc++-v3/testsuite/lib/prune.exp 
b/libstdc++-v3/testsuite/lib/prune.exp
index 24a15ccad22..071dcf34c1e 100644
--- a/libstdc++-v3/testsuite/lib/prune.exp
+++ b/libstdc++-v3/testsuite/lib/prune.exp
@@ -54,6 +54,7 @@ proc libstdc++-dg-prune { system text } {
 regsub -all "(^|\n)\[^\n\]*:   . skipping \[0-9\]* instantiation contexts 
\[^\n\]*" $text "" text
 regsub -all "(^|\n)\[^\n\]*:   in .constexpr. expansion \[^\n\]*" $text "" 
text
 regsub -all "(^|\n)\[^\n\]*:   in requirements  .with\[^\n\]*" $text "" 
text
+regsub -all "(^|\n)\[^\n\]*:   in requirements with\[^\n\]*" $text "" text
 regsub -all "(^|\n)inlined from \[^\n\]*" $text "" text
 # Why doesn't GCC need these to strip header context?
 regsub -all "(^|\n)In file included from \[^\n\]*" $text "" text
diff --git a/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc 
b/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc
index 16ac3040146..ded56fe63ab 100644
--- a/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc
+++ b/libstdc++-v3/testsuite/std/format/arguments/args_neg.cc
@@ -6,7 +6,39 @@
 
 std::string rval() { return "path/etic/experience"; }
 
-void f()
+void test_rval()
 {
   (void)std::make_format_args(rval()); // { dg-error "cannot bind non-const 
lvalue reference" }
 }
+
+void test_missing_specialization()
+{
+  struct X { };
+  X x;
+  (void)std::make_format_args(x); // { dg-error "here" }
+// { dg-error "std::formatter must be specialized" "" { target *-*-* } 0 }
+}
+
+struct Y { };
+template<> class std::formatter {
+public:
+  constexpr typename format_parse_context::iterator
+  parse(format_parse_context& c)
+  { return c.begin(); }
+
+  template
+  typename C::iterator format(Y&, C&) const;
+};
+
+void test(std:

[PATCH] libstdc++: Add missing std::tuple constructor [PR114147]

2024-03-01 Thread Jonathan Wakely
This fixes a regression on all active branches.

Tested aarch64-linux.  

-- >8 --

I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.
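
The pattern behind the bug, sketched on a toy type rather than on
std::tuple (C++17; holder, tag_t, implicit_probe and
is_implicitly_default_constructible are hypothetical names, with tag_t
standing in for the allocator_arg_t and allocator arguments):

```cpp
#include <cassert>
#include <type_traits>

// Probe: is copy-list-initialization from {} valid for T, i.e. is T
// implicitly default constructible?
template<typename T> void implicit_probe(const T&);

template<typename T, typename = void>
struct is_implicitly_default_constructible : std::false_type {};

template<typename T>
struct is_implicitly_default_constructible<
    T, std::void_t<decltype(implicit_probe<T>({}))>> : std::true_type {};

struct tag_t {};

template<typename T>
struct holder {
  // Non-explicit overload, constrained on implicit default construction.
  template<typename U = T,
           std::enable_if_t<is_implicitly_default_constructible<U>::value,
                            bool> = true>
  holder(tag_t) : value() {}

  // Counterpart with the negated constraint.  If this overload is left
  // out, a T whose default constructor is explicit cannot be constructed
  // from tag_t at all.
  template<typename U = T,
           std::enable_if_t<!is_implicitly_default_constructible<U>::value,
                            bool> = false>
  explicit holder(tag_t) : value() {}

  T value;
};

struct X { explicit X() {} };
```

With both overloads present, holder<X> h(tag_t{}) compiles while the
implicit conversion from tag_t to holder<X> stays disabled.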

libstdc++-v3/ChangeLog:

PR libstdc++/114147
* include/std/tuple (tuple::tuple(allocator_arg_t, const Alloc&)):
Add missing overload of allocator-extended default constructor.
(tuple::tuple(allocator_arg_t, const Alloc&)): Likewise.
* testsuite/20_util/tuple/cons/114147.cc: New test.
---
 libstdc++-v3/include/std/tuple| 14 ++
 .../testsuite/20_util/tuple/cons/114147.cc| 15 +++
 2 files changed, 29 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/20_util/tuple/cons/114147.cc

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 9c89c13ab84..3065058e184 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -1550,6 +1550,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
tuple(allocator_arg_t __tag, const _Alloc& __a)
: _Inherited(__tag, __a) { }
 
+  template::value> = false>
+   _GLIBCXX20_CONSTEXPR
+   explicit
+   tuple(allocator_arg_t __tag, const _Alloc& __a)
+   : _Inherited(__tag, __a) { }
+
   template= 1),
   _ImplicitCtor<_NotEmpty, const _Elements&...> = true>
_GLIBCXX20_CONSTEXPR
@@ -2198,6 +2205,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
tuple(allocator_arg_t __tag, const _Alloc& __a)
: _Inherited(__tag, __a) { }
 
+  template::value, _T1, _T2> = false>
+   _GLIBCXX20_CONSTEXPR
+   explicit
+   tuple(allocator_arg_t __tag, const _Alloc& __a)
+   : _Inherited(__tag, __a) { }
+
   template = true>
_GLIBCXX20_CONSTEXPR
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/114147.cc 
b/libstdc++-v3/testsuite/20_util/tuple/cons/114147.cc
new file mode 100644
index 000..916e7204964
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/114147.cc
@@ -0,0 +1,15 @@
+// { dg-do compile { target c++11 } }
+
+// PR libstdc++/114147
+// tuple allocator-extended ctor requires non-explicit default ctor
+
+#include 
+#include 
+
+struct X { explicit X(); };
+
+std::allocator a;
+std::tuple t0(std::allocator_arg, a);
+std::tuple t1(std::allocator_arg, a);
+std::tuple t2(std::allocator_arg, a);
+std::tuple t3(std::allocator_arg, a);
-- 
2.43.2



Re: [PATCH] c++: auto(x) partial substitution [PR110025, PR114138]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Jason Merrill wrote:

> On 2/29/24 14:17, Patrick Palka wrote:
> > On Wed, 28 Feb 2024, Jason Merrill wrote:
> > > I wonder about, rather than returning it directly, setting its level to 1
> > > for
> > > the substitution?
> > 
> > Done, that works nicely.
> > 
> > > Then I wonder if it would be feasible to give all autos level 0 and adjust
> > > it
> > > here?  That's probably not a stage 4 change, though...
> > 
> > It seems feasible.  I experimented doing this in the past[1] and ran
> > into two complications.  One complication was with constrained auto
> > deduction, e.g.
> > 
> >template
> >void g() {
> >  C auto x = ...;
> >};
> > 
> > Here the underlying concept-id that we enter satisfaction with is
> > C where this auto has level one greater than the template
> > depth, and the argument vector we pass has an extra innermost level
> > containing the deduced type, so things match up nicely.  This seems
> > to be the only place where we truly need auto to have a non 0/1 level.
> > In my WIP patch in that thread I just made do_auto_deduction build the
> > concept-id C in terms of an auto of the proper level before
> > entering satisfaction, which was kind of ugly but worked.
> 
> So maybe set its level to TMPL_ARGS_DEPTH (targs) after add_to_template_args,
> rather than 1?

AFAICT in-place type modification in this case would be unsafe or at
least difficult to reason about due to the satisfaction/normalization
caches.  We would cache the result as if the auto had the nonzero level
and then (presumably) reset its level back to 0 afterward, leaving the
hash tables in an inconsistent state.

> 
> > The other complication was with Concepts TS extended auto deduction:
> > 
> >tuple t = tuple{};
> > 
> > because unify_pack_expansion (called from fn_type_unification during
> > do_auto_deduction) isn't prepared to see a parameter pack of level 0
> > (unify has no problems with ordinary tparms of level 0 though).  This
> > shouldn't be too hard to fix though.
> > 
> > How does the following look for trunk and perhaps 13 (there should be
> > no functional change for code that doesn't use auto(x))?
> > 
> > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587818.html
> > 
> > -- >8 --
> > 
> > PR c++/110025
> > PR c++/114138
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (make_cast_auto): Declare.
> > * parser.cc (cp_parser_functional_cast): Replace a parsed auto
> > with a level-less one via make_cast_auto.
> > * pt.cc (find_parameter_packs_r): Don't treat level-less auto
> > as a type parameter pack.
> > (tsubst) : Generalized CTAD placeholder
> > handling to all level-less autos.
> > (make_cast_auto): Define.
> > (do_auto_deduction): Handle replacement of a level-less auto.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp23/auto-fncast16.C: New test.
> > * g++.dg/cpp23/auto-fncast17.C: New test.
> > * g++.dg/cpp23/auto-fncast18.C: New test.
> > ---
> >   gcc/cp/cp-tree.h   |  1 +
> >   gcc/cp/parser.cc   | 11 
> >   gcc/cp/pt.cc   | 37 +++-
> >   gcc/testsuite/g++.dg/cpp23/auto-fncast16.C | 12 
> >   gcc/testsuite/g++.dg/cpp23/auto-fncast17.C | 15 +
> >   gcc/testsuite/g++.dg/cpp23/auto-fncast18.C | 69 ++
> >   6 files changed, 142 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast16.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast17.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast18.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 04c3aa6cd91..6f1da1c7bad 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -7476,6 +7476,7 @@ extern tree make_decltype_auto
> > (void);
> >   extern tree make_constrained_auto (tree, tree);
> >   extern tree make_constrained_decltype_auto(tree, tree);
> >   extern tree make_template_placeholder (tree);
> > +extern tree make_cast_auto (void);
> >   extern bool template_placeholder_p(tree);
> >   extern bool ctad_template_p   (tree);
> >   extern bool unparenthesized_id_or_class_member_access_p (tree);
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index 3ee9d49fb8e..3dbe6722ba1 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -33314,6 +33314,17 @@ cp_parser_functional_cast (cp_parser* parser, tree
> > type)
> > if (!type)
> >   type = error_mark_node;
> >   +  if (TREE_CODE (type) == TYPE_DECL
> > +  && is_auto (TREE_TYPE (type)))
> > +type = TREE_TYPE (type);
> > +
> > +  if (is_auto (type)
> > +  && !AUTO_IS_DECLTYPE (type)
> > +  && !PLACEHOLDER_TYPE_CONSTRAINTS (type)
> > +  && !CLASS_PLACEHOLDER_TEMPLATE (type))
> > +/* auto(x) and auto{x} are represented by level-less auto.  */
> > +type = make

[PATCH] s390: Streamline NNPA builtins with POP mnemonics

2024-03-01 Thread Stefan Schulze Frielinghaus
At the moment there are no extended mnemonics for vclfn(h,l) and vcrnf
defined in the Principles of Operation.  Thus, remove the suffix "s"
from the builtins and expanders and introduce a further operand for the
data type.

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Update to reflect latest
changes.
* config/s390/s390-builtins.def: Remove suffix s from
s390_vclfn(h,l)s and s390_vcrnfs.
* config/s390/s390.md: Similar, remove suffix s from unspec
definitions.
* config/s390/vecintrin.h (vec_extend_to_fp32_hi): Redefine.
(vec_extend_to_fp32_lo): Redefine.
(vec_round_from_fp32): Redefine.
* config/s390/vx-builtins.md (vclfnhs_v8hi): Remove suffix s.
(vclfnh_v8hi): Add with extra operand.
(vclfnls_v8hi): Remove suffix s.
(vclfnl_v8hi): Add with extra operand.
(vcrnfs_v8hi): Remove suffix s.
(vcrnf_v8hi): Add with extra operand.
---
OK for mainline?

 gcc/config/s390/s390-builtin-types.def |  4 ++--
 gcc/config/s390/s390-builtins.def  |  6 +++---
 gcc/config/s390/s390.md|  6 +++---
 gcc/config/s390/vecintrin.h|  6 +++---
 gcc/config/s390/vx-builtins.md | 27 ++
 5 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index ce51ae8cd3f..c3d09b42835 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -273,7 +273,6 @@ DEF_FN_TYPE_2 (BT_FN_V2DI_V2DF_V2DF, BT_V2DI, BT_V2DF, 
BT_V2DF)
 DEF_FN_TYPE_2 (BT_FN_V2DI_V2DI_V2DI, BT_V2DI, BT_V2DI, BT_V2DI)
 DEF_FN_TYPE_2 (BT_FN_V2DI_V4SI_V4SI, BT_V2DI, BT_V4SI, BT_V4SI)
 DEF_FN_TYPE_2 (BT_FN_V4SF_FLT_INT, BT_V4SF, BT_FLT, BT_INT)
-DEF_FN_TYPE_2 (BT_FN_V4SF_UV8HI_UINT, BT_V4SF, BT_UV8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_V4SF, BT_V4SF, BT_V4SF, BT_V4SF)
 DEF_FN_TYPE_2 (BT_FN_V4SI_BV4SI_V4SI, BT_V4SI, BT_BV4SI, BT_V4SI)
@@ -324,7 +323,6 @@ DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_USHORT_INT, BT_UV8HI, 
BT_UV8HI, BT_USHORT, BT_I
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
BT_INT)
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
BT_UV8HI)
-DEF_FN_TYPE_3 (BT_FN_UV8HI_V4SF_V4SF_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, BT_UINT)
 DEF_FN_TYPE_3 (BT_FN_V16QI_UV16QI_UV16QI_INTPTR, BT_V16QI, BT_UV16QI, 
BT_UV16QI, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_INTPTR, BT_V16QI, BT_V16QI, BT_V16QI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_V16QI, BT_V16QI, BT_V16QI, BT_V16QI, 
BT_V16QI)
@@ -340,6 +338,7 @@ DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_INT_INTPTR, BT_V2DI, 
BT_V2DF, BT_INT, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_V2DF_INTPTR, BT_V2DI, BT_V2DF, BT_V2DF, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V2DI_V2DI_V2DI_INTPTR, BT_V2DI, BT_V2DI, BT_V2DI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V2DI_V4SI_V4SI_V2DI, BT_V2DI, BT_V4SI, BT_V4SI, BT_V2DI)
+DEF_FN_TYPE_3 (BT_FN_V4SF_UV8HI_UINT_UINT, BT_V4SF, BT_UV8HI, BT_UINT, BT_UINT)
 DEF_FN_TYPE_3 (BT_FN_V4SF_V2DF_INT_INT, BT_V4SF, BT_V2DF, BT_INT, BT_INT)
 DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_FLT_INT, BT_V4SF, BT_V4SF, BT_FLT, BT_INT)
 DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_UCHAR_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR, 
BT_UCHAR)
@@ -377,6 +376,7 @@ DEF_FN_TYPE_4 (BT_FN_UV4SI_UV4SI_UV4SI_UINTCONSTPTR_UCHAR, 
BT_UV4SI, BT_UV4SI, B
 DEF_FN_TYPE_4 (BT_FN_UV4SI_UV4SI_UV4SI_UV4SI_INT, BT_UV4SI, BT_UV4SI, 
BT_UV4SI, BT_UV4SI, BT_INT)
 DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_INT_INTPTR, BT_UV8HI, BT_UV8HI, 
BT_UV8HI, BT_INT, BT_INTPTR)
 DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, 
BT_UV8HI, BT_UV8HI, BT_INT)
+DEF_FN_TYPE_4 (BT_FN_UV8HI_V4SF_V4SF_UINT_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, 
BT_UINT, BT_UINT)
 DEF_FN_TYPE_4 (BT_FN_VOID_UV2DI_UV2DI_ULONGLONGPTR_ULONGLONG, BT_VOID, 
BT_UV2DI, BT_UV2DI, BT_ULONGLONGPTR, BT_ULONGLONG)
 DEF_FN_TYPE_4 (BT_FN_VOID_UV4SI_UV4SI_UINTPTR_ULONGLONG, BT_VOID, BT_UV4SI, 
BT_UV4SI, BT_UINTPTR, BT_ULONGLONG)
 DEF_FN_TYPE_4 (BT_FN_VOID_V4SI_V4SI_INTPTR_ULONGLONG, BT_VOID, BT_V4SI, 
BT_V4SI, BT_INTPTR, BT_ULONGLONG)
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index 02ff516c677..0d4e20ea425 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -3025,10 +3025,10 @@ B_DEF  (s390_vstrszf,vstrszv4si,
0,
 
 /* arch 14 builtins */
 
-B_DEF  (s390_vclfnhs,vclfnhs_v8hi,  0, 
 B_NNPA, O2_U4,  BT_FN_V4SF_UV8HI_UINT)
-B_DEF  (s390_vclfnls,vclfnls_v8hi,  0, 
 B_NNPA, O2_U4,  BT_FN_V4SF_UV8HI_UINT)
+B_DEF  (s390_vclfnh, vclfnh_v8hi,   0, 
 B

[Patch] invoke.texi: Add note that -foffload= does not affect device detection

2024-03-01 Thread Tobias Burnus

Not very often, but I do keep running into issues (fails, segfaults)
when testing programs compiled with a GCC that has no offloading
configured and then run against the system libraries.  That's equivalent
to having the system compiler (or any offload-enabled compiler) and
compiling with -foffload=disable.

The problem is that while the program only contains host code,
the run-time library still initializes devices when an API
routine - such as omp_get_num_devices - is invoked. This can
lead to odd bugs as target regions, obviously, will use host
fallback (for any device number) but the API routines will
happily operate on the actual devices, which can lead to odd
errors.

(The same issue occurs when compiling for one offload target type
and running on a system that has devices of another type.)

I assume that that's not a very common problem, but it can be
rather confusing when hitting this issue.

Maybe the proposed wording will help others to avoid this pitfall.
(Or is this superfluous because -foffload= is not used much and, even if it is,
no one then remembers or finds this note?)

Thoughts?

* * *

It was not clear to me how to refer to libgomp.texi:
- Should it be 'libgomp', as 'info libgomp' or the URL
  https://gcc.gnu.org/onlinedocs/libgomp/ (or the filename of the PDF) implies?
- Or 'GNU Offloading and Multi Processing Runtime Library Manual',
  as it is named where linked from https://gcc.gnu.org/onlinedocs and on the
  title page of the PDF - but that name is not repeated in the info file or
  the HTML file.
- Or even 'GNU libgomp' to mirror a substring in the <title> of the HTML file.
I have now ended up referring to that document only implicitly.

Aside: Shouldn't all the HTML documents start with a  and  before
the table of content? Currently, it has:
  Top (GNU libgomp)
and the body starts with
  Short Table of Contents

Tobias

PS: In the testsuite, it mostly happens when iterating over
omp_get_num_devices() or when mixing calls to API routines with
device code ('omp target', compute constructs).

invoke.texi: Add note that -foffload= does not affect device detection

gcc/ChangeLog:

	* doc/invoke.texi (-foffload): Add note that the flag does not
	affect whether offload devices are detected.

 gcc/doc/invoke.texi | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dc5fd863ca4..4153863020b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2736,38 +2736,45 @@ targets using ms-abi.
 
 @opindex foffload
 @cindex Offloading targets
 @cindex OpenACC offloading targets
 @cindex OpenMP offloading targets
 @item -foffload=disable
 @itemx -foffload=default
 @itemx -foffload=@var{target-list}
 Specify for which OpenMP and OpenACC offload targets code should be generated.
 The default behavior, equivalent to @option{-foffload=default}, is to generate
 code for all supported offload targets.  The @option{-foffload=disable} form
 generates code only for the host fallback, while
 @option{-foffload=@var{target-list}} generates code only for the specified
 comma-separated list of offload targets.
 
 Offload targets are specified in GCC's internal target-triplet format. You can
 run the compiler with @option{-v} to show the list of configured offload targets
 under @code{OFFLOAD_TARGET_NAMES}.
 
+Note that this option does not affect the available offload devices detected by
+the run-time library and, hence, the values returned by the OpenMP/OpenACC API
+routines or access to devices using those routines.  The run-time library
+itself can be tuned using environment variables; in particular, to fully disable
+the device detection, set the @code{OMP_TARGET_OFFLOAD} environment variable to
+@code{disabled}.
+
 @opindex foffload-options
 @cindex Offloading options
 @cindex OpenACC offloading options
 @cindex OpenMP offloading options
 @item -foffload-options=@var{options}
 @itemx -foffload-options=@var{target-triplet-list}=@var{options}
 
 With @option{-foffload-options=@var{options}}, GCC passes the specified
 @var{options} to the compilers for all enabled offloading targets.  You can
 specify options that apply only to a specific target or targets by using
 the @option{-foffload-options=@var{target-list}=@var{options}} form.  The
 @var{target-list} is a comma-separated list in the same format as for the
 @option{-foffload=} option.
 
 Typical command lines are
 
 @smallexample
 -foffload-options='-fno-math-errno -ffinite-math-only' -foffload-options=nvptx-none=-latomic
 -foffload-options=amdgcn-amdhsa=-march=gfx906


Re: [PATCH] c++: auto(x) partial substitution [PR110025, PR114138]

2024-03-01 Thread Jason Merrill

On 3/1/24 10:17, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 2/29/24 14:17, Patrick Palka wrote:

On Wed, 28 Feb 2024, Jason Merrill wrote:

I wonder about, rather than returning it directly, setting its level to 1
for
the substitution?


Done, that works nicely.


Then I wonder if it would be feasible to give all autos level 0 and adjust
it
here?  That's probably not a stage 4 change, though...


It seems feasible.  I experimented doing this in the past[1] and ran
into two complications.  One complication was with constrained auto
deduction, e.g.

template
void g() {
  C auto x = ...;
};

Here the underlying concept-id that we enter satisfaction with is
C where this auto has level one greater than the template
depth, and the argument vector we pass has an extra innermost level
containing the deduced type, so things match up nicely.  This seems
to be the only place where we truly need auto to have a non 0/1 level.
In my WIP patch in that thread I just made do_auto_deduction build the
concept-id C in terms of an auto of the proper level before
entering satisfaction, which was kind of ugly but worked.


So maybe set its level to TMPL_ARGS_DEPTH (targs) after add_to_template_args,
rather than 1?


AFAICT in-place type modification in this case would be unsafe or at
least difficult to reason about due to the satisfaction/normalization
caches.  We would cache the result as if the auto had the nonzero level
and then (presumably) reset its level back to 0 afterward, leaving the
hash tables in an inconsistent state.


Hmm, sure.

Anyway, the second patch is OK.

Jason



[COMMITTED] contrib: mklog: Use present tense in ChangeLog

2024-03-01 Thread Bernhard Reutner-Fischer
Hi!

installed as r14-9256

> diff --git a/contrib/mklog.py b/contrib/mklog.py
> index d764fb41f99..7d8d554b15e 100755
> --- a/contrib/mklog.py
> +++ b/contrib/mklog.py
> @@ -277,7 +277,7 @@ def generate_changelog(data, no_functions=False, 
> fill_pr_titles=False,
>  # it used to be path.source_file[2:]
>  relative_path = 
> get_rel_path_if_prefixed(file.source_file[2:],
>   changelog)
> -out = append_changelog_line(out, relative_path, 'Moved 
> to...')
> +out = append_changelog_line(out, relative_path, 'Move to...')
>  new_path = get_rel_path_if_prefixed(file.target_file[2:],
>  changelog)
>  out += f'\t* {new_path}: ...here.\n'
> 
> 
> cheers



Re: [COMMITTED htdocs] robots.txt: Disallow various wiki actions

2024-03-01 Thread Gerald Pfeifer
On Fri, 1 Mar 2024, Mark Wielaard wrote:
> It is fine for robots to crawl the wiki pages, but they should perform
> actions, generate huge diffs, search/highlight pages or generate
> calendars.

s/should/should not/ :-)

I see your patch does exactly that - thank you!

Gerald


Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Jason Merrill

On 3/1/24 10:00, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 2/29/24 15:56, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

For local enums defined in a non-template function or a function template
instantiation it seems we neglect to make the function depend on the enum
definition, which ultimately causes streaming to fail due to the enum
definition not being streamed before uses of its enumerators are streamed,
as far as I can tell.


I would think that the function doesn't need to depend on the local enum in
order for the local enum to be streamed before the use of the enumerator,
which comes after the definition of the enum in the function body?

Why isn't streaming the body of the function outputting the enum definition
before the use of the enumerator?


IIUC (based on observing the behavior for local classes) streaming the
definition of a local class/enum as part of the function definition is
what we want to avoid; we want to treat a local type definition as a
logically separate definition and stream it separately (similar
to class defns vs member defns I guess).  And by not registering a dependency
between the function and the local enum, we end up never streaming out
the local enum definition separately and instead stream it out as part
of the function definition (accidentally) which we then can't stream in
properly.

Perhaps the motivation for treating local type definitions as logically
separate from the function definition is that they can leak out of a
function with a deduced return type:

   auto f() {
     struct A { };
     return A();
   }

   using type = decltype(f()); // refers directly to f()::A


Yes, I believe that's what modules.cc refers to as a "voldemort".

But for non-voldemort local types, the declaration of the function 
doesn't depend on them, only the definition.  Why doesn't streaming them 
in the definition work properly?
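
For readers following along, the distinction between a "voldemort" local type and an ordinary local enum can be sketched in plain C++14. This is only an illustration with made-up names (`f`, `g`, `A`, `E`, `g_returns_two`); it is not taken from the patch or its testcases and says nothing about how modules.cc streams either kind of entity.

```cpp
#include <type_traits>

// A "voldemort" type: A is local to f, yet it escapes through the
// deduced return type, so code outside f can reach it via decltype.
inline auto f() {
  struct A { int value = 42; };
  return A();
}

using type = decltype(f());  // refers directly to f()::A

// A non-voldemort local enum: E is used only inside the body, so the
// declaration "int g()" does not involve it; only the definition does.
inline int g() {
  enum E { zero, one, two };
  return two;
}

static_assert(std::is_class<type>::value,
              "f()::A is reachable from outside f");
static_assert(std::is_same<decltype(g()), int>::value,
              "g's declared type does not mention its local enum");

// Small helper (hypothetical) so the behaviour can be checked at run time.
inline bool g_returns_two() { return g() == 2; }
```

In the first function the local type is part of the function's declared type, while in the second it only appears once the body is seen, which is the case Jason's question is about.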


Jason



Re: [PATCH v6 1/5] arm: Add define_attr to create a mapping between MVE predicated and unpredicated insns

2024-03-01 Thread Richard Earnshaw (lists)
On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch adds an attribute to the MVE md patterns to be able to identify
> predicable MVE instructions and what their predicated and unpredicated
> variants are.  This attribute is used to encode the icode of the unpredicated
> variant of an instruction in its predicated variant.
> 
> This will make it possible for us to transform VPT-predicated insns in
> the insn chain into their unpredicated equivalents when transforming the loop
> into a MVE Tail-Predicated Low Overhead Loop. For example:
> `mve_vldrbq_z_ -> mve_vldrbq_`.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.md (mve_unpredicated_insn): New attribute.
>   * config/arm/arm.h (MVE_VPT_PREDICATED_INSN_P): New define.
>   (MVE_VPT_UNPREDICATED_INSN_P): Likewise.
>   (MVE_VPT_PREDICABLE_INSN_P): Likewise.
>   * config/arm/vec-common.md (mve_vshlq_): Add attribute.
>   * config/arm/mve.md (arm_vcx1q_p_v16qi): Add attribute.
>   (arm_vcx1qv16qi): Likewise.
>   (arm_vcx1qav16qi): Likewise.
>   (arm_vcx1qv16qi): Likewise.
>   (arm_vcx2q_p_v16qi): Likewise.
>   (arm_vcx2qv16qi): Likewise.
>   (arm_vcx2qav16qi): Likewise.
>   (arm_vcx2qv16qi): Likewise.
>   (arm_vcx3q_p_v16qi): Likewise.
>   (arm_vcx3qv16qi): Likewise.
>   (arm_vcx3qav16qi): Likewise.
>   (arm_vcx3qv16qi): Likewise.
>   (@mve_q_): Likewise.
>   (@mve_q_int_): Likewise.
>   (@mve_q_v4si): Likewise.
>   (@mve_q_n_): Likewise.
>   (@mve_q_r_): Likewise.
>   (@mve_q_f): Likewise.
>   (@mve_q_m_): Likewise.
>   (@mve_q_m_n_): Likewise.
>   (@mve_q_m_r_): Likewise.
>   (@mve_q_m_f): Likewise.
>   (@mve_q_int_m_): Likewise.
>   (@mve_q_p_v4si): Likewise.
>   (@mve_q_p_): Likewise.
>   (@mve_q_): Likewise.
>   (@mve_q_f): Likewise.
>   (@mve_q_m_): Likewise.
>   (@mve_q_m_f): Likewise.
>   (mve_vq_f): Likewise.
>   (mve_q): Likewise.
>   (mve_q_f): Likewise.
>   (mve_vadciq_v4si): Likewise.
>   (mve_vadciq_m_v4si): Likewise.
>   (mve_vadcq_v4si): Likewise.
>   (mve_vadcq_m_v4si): Likewise.
>   (mve_vandq_): Likewise.
>   (mve_vandq_f): Likewise.
>   (mve_vandq_m_): Likewise.
>   (mve_vandq_m_f): Likewise.
>   (mve_vandq_s): Likewise.
>   (mve_vandq_u): Likewise.
>   (mve_vbicq_): Likewise.
>   (mve_vbicq_f): Likewise.
>   (mve_vbicq_m_): Likewise.
>   (mve_vbicq_m_f): Likewise.
>   (mve_vbicq_m_n_): Likewise.
>   (mve_vbicq_n_): Likewise.
>   (mve_vbicq_s): Likewise.
>   (mve_vbicq_u): Likewise.
>   (@mve_vclzq_s): Likewise.
>   (mve_vclzq_u): Likewise.
>   (@mve_vcmp_q_): Likewise.
>   (@mve_vcmp_q_n_): Likewise.
>   (@mve_vcmp_q_f): Likewise.
>   (@mve_vcmp_q_n_f): Likewise.
>   (@mve_vcmp_q_m_f): Likewise.
>   (@mve_vcmp_q_m_n_): Likewise.
>   (@mve_vcmp_q_m_): Likewise.
>   (@mve_vcmp_q_m_n_f): Likewise.
>   (mve_vctpq): Likewise.
>   (mve_vctpq_m): Likewise.
>   (mve_vcvtaq_): Likewise.
>   (mve_vcvtaq_m_): Likewise.
>   (mve_vcvtbq_f16_f32v8hf): Likewise.
>   (mve_vcvtbq_f32_f16v4sf): Likewise.
>   (mve_vcvtbq_m_f16_f32v8hf): Likewise.
>   (mve_vcvtbq_m_f32_f16v4sf): Likewise.
>   (mve_vcvtmq_): Likewise.
>   (mve_vcvtmq_m_): Likewise.
>   (mve_vcvtnq_): Likewise.
>   (mve_vcvtnq_m_): Likewise.
>   (mve_vcvtpq_): Likewise.
>   (mve_vcvtpq_m_): Likewise.
>   (mve_vcvtq_from_f_): Likewise.
>   (mve_vcvtq_m_from_f_): Likewise.
>   (mve_vcvtq_m_n_from_f_): Likewise.
>   (mve_vcvtq_m_n_to_f_): Likewise.
>   (mve_vcvtq_m_to_f_): Likewise.
>   (mve_vcvtq_n_from_f_): Likewise.
>   (mve_vcvtq_n_to_f_): Likewise.
>   (mve_vcvtq_to_f_): Likewise.
>   (mve_vcvttq_f16_f32v8hf): Likewise.
>   (mve_vcvttq_f32_f16v4sf): Likewise.
>   (mve_vcvttq_m_f16_f32v8hf): Likewise.
>   (mve_vcvttq_m_f32_f16v4sf): Likewise.
>   (mve_vdwdupq_m_wb_u_insn): Likewise.
>   (mve_vdwdupq_wb_u_insn): Likewise.
>   (mve_veorq_s>): Likewise.
>   (mve_veorq_u>): Likewise.
>   (mve_veorq_f): Likewise.
>   (mve_vidupq_m_wb_u_insn): Likewise.
>   (mve_vidupq_u_insn): Likewise.
>   (mve_viwdupq_m_wb_u_insn): Likewise.
>   (mve_viwdupq_wb_u_insn): Likewise.
>   (mve_vldrbq_): Likewise.
>   (mve_vldrbq_gather_offset_): Likewise.
>   (mve_vldrbq_gather_offset_z_): Likewise.
>   (mve_vldrbq_z_): Likewise.
>   (mve_vldrdq_gather_base_v2di): Likewise.
>   (mve_vldrdq_gather_base_wb_v2di_insn): Likewise.
>   (mve_vldrdq_gather_base_wb_z_v2di_insn): Likewise.
>   (mve_vldrdq_gather_base_z_v2di): Likewise.
>   (mve_vldrdq_gather_offset_v2di): Likewise.
>   (mve_vldrdq_gather_offset_z_v2di): Likewise.
>   (mve_vldrdq_gather_shifted_offset_v2di): Likewise.
>   (mve_vldrdq_gather_shifted_offset_z_v2di): Likewise.
>   (mve_vldrhq_): Likewise.
>   

Re: [PATCH v6 2/5] arm: Annotate instructions with mve_safe_imp_xlane_pred

2024-03-01 Thread Richard Earnshaw (lists)
On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch annotates some MVE across lane instructions with a new attribute.
> We use this attribute to let the compiler know that these instructions can be
> safely implicitly predicated when tail predicating if their operands are
> guaranteed to have zeroed tail predicated lanes.  These instructions were
> selected because having the value 0 in those lanes or 'tail-predicating' those
> lanes have the same effect.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.md (mve_safe_imp_xlane_pred): New attribute.
>   * config/arm/iterators.md (mve_vmaxmin_safe_imp): New iterator
>   attribute.
>   * config/arm/mve.md (vaddvq_s, vaddvq_u, vaddlvq_s, vaddlvq_u,
>   vaddvaq_s, vaddvaq_u, vmaxavq_s, vmaxvq_u, vmladavq_s, vmladavq_u,
>   vmladavxq_s, vmlsdavq_s, vmlsdavxq_s, vaddlvaq_s, vaddlvaq_u,
>   vmlaldavq_u, vmlaldavq_s, vmlaldavq_u, vmlaldavxq_s, vmlsldavq_s,
>   vmlsldavxq_s, vrmlaldavhq_u, vrmlaldavhq_s, vrmlaldavhxq_s,
>   vrmlsldavhq_s, vrmlsldavhxq_s, vrmlaldavhaq_s, vrmlaldavhaq_u,
>   vrmlaldavhaxq_s, vrmlsldavhaq_s, vrmlsldavhaxq_s, vabavq_s, vabavq_u,
>   vmladavaq_u, vmladavaq_s, vmladavaxq_s, vmlsdavaq_s, vmlsdavaxq_s,
>   vmlaldavaq_s, vmlaldavaq_u, vmlaldavaxq_s, vmlsldavaq_s,
>   vmlsldavaxq_s): Added mve_safe_imp_xlane_pred.
> ---
>  gcc/config/arm/arm.md   |  6 ++
>  gcc/config/arm/iterators.md |  8 
>  gcc/config/arm/mve.md   | 12 
>  3 files changed, 26 insertions(+)
> 
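
As an aside for readers, the claim in the quoted description (a zero in the tail lanes has the same effect as predicating those lanes off) can be checked with a scalar model of an across-lane add. This is a sketch only: `kLanes`, `Vec`, `add_across` and `add_across_predicated` are hypothetical names, and the code models the arithmetic, not the MVE intrinsics or the md patterns.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // hypothetical vector width
using Vec = std::array<std::int32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Scalar model of a predicated across-lane add: only lanes whose
// predicate bit is set contribute to the sum.
inline std::int32_t add_across_predicated(const Vec &v, const Pred &p) {
  std::int32_t sum = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (p[i])
      sum += v[i];
  return sum;
}

// Scalar model of the unpredicated form: every lane contributes.
inline std::int32_t add_across(const Vec &v) {
  std::int32_t sum = 0;
  for (std::int32_t lane : v)
    sum += lane;
  return sum;
}
```

With a tail predicate that disables the last two lanes, summing `{5, 6, 7, 8}` under the predicate gives the same result as summing `{5, 6, 0, 0}` without one, because zero is the identity for addition.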

OK

R.


Re: [PATCH v6 3/5] arm: Fix a wrong attribute use and remove unused unspecs and iterators

2024-03-01 Thread Richard Earnshaw (lists)
On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch fixes the erroneous use of a mode attribute without a mode iterator
> in the pattern and removes unused unspecs and iterators.
> 
> gcc/ChangeLog:
> 
>   * config/arm/iterators.md (supf): Remove VMLALDAVXQ_U, VMLALDAVXQ_P_U,
>   VMLALDAVAXQ_U cases.
>   (VMLALDAVXQ): Remove iterator.
>   (VMLALDAVXQ_P): Likewise.
>   (VMLALDAVAXQ): Likewise.
>   * config/arm/mve.md (mve_vstrwq_p_fv4sf): Replace use of 
>   mode iterator attribute with V4BI mode.
>   * config/arm/unspecs.md (VMLALDAVXQ_U, VMLALDAVXQ_P_U,
>   VMLALDAVAXQ_U): Remove unused unspecs.
> ---
>  gcc/config/arm/iterators.md | 9 +++--
>  gcc/config/arm/mve.md   | 2 +-
>  gcc/config/arm/unspecs.md   | 3 ---
>  3 files changed, 4 insertions(+), 10 deletions(-)
> 

OK

R.


Re: [PATCH] c++, v2: Fix up decltype of non-dependent structured binding decl in template [PR92687]

2024-03-01 Thread Jason Merrill

On 3/1/24 02:55, Jakub Jelinek wrote:

On Thu, Feb 29, 2024 at 12:50:47PM +0100, Jakub Jelinek wrote:

finish_decltype_type uses DECL_HAS_VALUE_EXPR_P (expr) check for
DECL_DECOMPOSITION_P (expr) to determine if it is
array/struct/vector/complex etc. subobject proxy case vs. structured
binding using std::tuple_{size,element}.
For non-templates or when templates are already instantiated, that works
correctly, finalized DECL_DECOMPOSITION_P non-base vars indeed have
DECL_VALUE_EXPR in the former case and don't have it in the latter.
It works fine for dependent structured bindings as well, cp_finish_decomp in
that case creates DECLTYPE_TYPE tree and defers the handling until
instantiation.
As the testcase shows, this doesn't work for the non-dependent structured
binding case in templates, because DECL_HAS_VALUE_EXPR_P is set in that case
always; cp_finish_decomp ends with:
  if (processing_template_decl)
    {
      for (unsigned int i = 0; i < count; i++)
        if (!DECL_HAS_VALUE_EXPR_P (v[i]))
          {
            tree a = build_nt (ARRAY_REF, decl, size_int (i),
                               NULL_TREE, NULL_TREE);
            SET_DECL_VALUE_EXPR (v[i], a);
            DECL_HAS_VALUE_EXPR_P (v[i]) = 1;
          }
    }
and those artificial ARRAY_REFs are used in various places during
instantiation to find out what base the DECL_DECOMPOSITION_P VAR_DECLs
have and their positions.
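
To make the two cases concrete, here is a small C++17 sketch of what decltype is expected to yield for a structured binding outside of templates. The names (`Point`, `struct_case_ok`, `tuple_case_ok`) are invented for illustration and this is not the decomp59.C testcase; it deliberately avoids the in-template situation that the PR is about.

```cpp
#include <tuple>
#include <type_traits>

struct Point { int x; long y; };

// Subobject proxy case: the bindings name members of a hidden copy of
// the struct, and decltype yields the declared member type.
inline bool struct_case_ok() {
  Point p{1, 2};
  auto [a, b] = p;
  static_assert(std::is_same<decltype(a), int>::value, "member type");
  static_assert(std::is_same<decltype(b), long>::value, "member type");
  return a == 1 && b == 2;
}

// Tuple case: the bindings are found through std::tuple_size and
// std::tuple_element, and decltype yields the tuple_element type,
// reference qualifiers included.
inline bool tuple_case_ok() {
  long storage = 7;
  std::tuple<int, long&> t(3, storage);
  auto [c, d] = t;
  static_assert(std::is_same<decltype(c), int>::value, "tuple_element<0>");
  static_assert(std::is_same<decltype(d), long&>::value, "tuple_element<1>");
  d = 9;  // writes through the reference element to storage
  return c == 3 && storage == 9;
}
```

The second function is the situation where consulting the value expression alone is not enough to pick the right answer, since the result has to come from the tuple_element type.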



Another option would be to change
  tree
  lookup_decomp_type (tree v)
  {
-  return *decomp_type_table->get (v);
+  if (tree *slot = decomp_type_table->get (v))
+return *slot;
+  return NULL_TREE;
  }

and in finish_decltype_type, either just in the ptds.saved case or always,
try lookup_decomp_type; if it returns non-NULL, return what it returned,
otherwise return unlowered_expr_type (expr).  I guess that would be cleaner.
I thought it would be more costly due to the hash table lookup, but now that
I think about it again, DECL_VALUE_EXPR is a hash table lookup as well.
So maybe then
+ if (ptds.saved)
+   {
+ gcc_checking_assert (DECL_HAS_VALUE_EXPR_P (expr));
+ /* DECL_HAS_VALUE_EXPR_P is always set if
+processing_template_decl.  If lookup_decomp_type
+returns non-NULL, it is the tuple case.  */
+ if (tree ret = lookup_decomp_type (expr))
+   return ret;
+   }
  if (DECL_HAS_VALUE_EXPR_P (expr))
/* Expr is an array or struct subobject proxy, handle
   bit-fields properly.  */
return unlowered_expr_type (expr);
  else
/* Expr is a reference variable for the tuple case.  */
return lookup_decomp_type (expr);
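
The shape of that alternative (a lookup that reports "no entry" instead of dereferencing a missing slot, with the caller falling back to another answer) can be sketched outside of GCC as follows. Everything here is a stand-in: `TypeTable`, `lookup_or_null` and `type_of` are hypothetical names, and `std::unordered_map` merely plays the role of the side table; none of this is GCC's hash-map API.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a side table keyed by declaration name.
using TypeTable = std::unordered_map<std::string, std::string>;

// Returns a pointer to the stored value, or nullptr when the key is
// absent, instead of dereferencing a missing slot.
inline const std::string *lookup_or_null(const TypeTable &table,
                                         const std::string &key) {
  auto it = table.find(key);
  if (it != table.end())
    return &it->second;
  return nullptr;
}

// Caller side: a non-null result is used directly, otherwise the
// caller falls back to an answer computed some other way.
inline std::string type_of(const TypeTable &table, const std::string &key,
                           const std::string &fallback) {
  if (const std::string *slot = lookup_or_null(table, key))
    return *slot;
  return fallback;
}
```

The design point is that the presence or absence of a table entry itself carries the information, so no extra flag is needed to tell the two cases apart.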


Here is a variant of the patch which does that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


Or the other version, or adding some flag to the DECL_DECOMPOSITION_P
decls?

2024-03-01  Jakub Jelinek  

PR c++/92687
* decl.cc (lookup_decomp_type): Return NULL_TREE if decomp_type_table
doesn't have entry for V.
* semantics.cc (finish_decltype_type): If ptds.saved, assert
DECL_HAS_VALUE_EXPR_P is true and decide on tuple vs. non-tuple based
on if lookup_decomp_type is NULL or not.

* g++.dg/cpp1z/decomp59.C: New test.

--- gcc/cp/decl.cc.jj   2024-02-28 23:20:01.004751204 +0100
+++ gcc/cp/decl.cc  2024-02-29 20:03:11.087218176 +0100
@@ -9262,7 +9262,9 @@ static GTY((cache)) decl_tree_cache_map
  tree
  lookup_decomp_type (tree v)
  {
-  return *decomp_type_table->get (v);
+  if (tree *slot = decomp_type_table->get (v))
+return *slot;
+  return NULL_TREE;
  }
  
  /* Mangle a decomposition declaration if needed.  Arguments like

--- gcc/cp/semantics.cc.jj  2024-02-28 22:57:08.101800588 +0100
+++ gcc/cp/semantics.cc 2024-02-29 20:04:51.936880622 +0100
@@ -11804,6 +11804,15 @@ finish_decltype_type (tree expr, bool id
 access expression).  */
if (DECL_DECOMPOSITION_P (expr))
{
+ if (ptds.saved)
+   {
+ gcc_checking_assert (DECL_HAS_VALUE_EXPR_P (expr));
+ /* DECL_HAS_VALUE_EXPR_P is always set if
+processing_template_decl.  If lookup_decomp_type
+returns non-NULL, it is the tuple case.  */
+ if (tree ret = lookup_decomp_type (expr))
+   return ret;
+   }
  if (DECL_HAS_VALUE_EXPR_P (expr))
/* Expr is an array or struct subobject proxy, handle
   bit-fields properly.  */
--- gcc/testsuite/g++.dg/cpp1z/decomp59.C.jj    2024-02-29 20:02:17.467929327 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp59.C   2024-02-29 20:02:17.467929327 +0100
@@ -0,0 +1,63 @@
+// PR c++/92687
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+namespace std {
+  template struct tuple_size;
+  template struct tuple_element;
+}
+
+struct A {
+  int i;
+  template  int& get() { return i; }
+};
+

Re: [PATCH v6 4/5] doloop: Add support for predicated vectorized loops

2024-03-01 Thread Richard Earnshaw (lists)
On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch adds support in the target-agnostic doloop pass for the detection
> of predicated vectorized hardware loops.  Arm is currently the only target
> that will make use of this feature.
> 
> gcc/ChangeLog:
> 
>   * df-core.cc (df_bb_regno_only_def_find): New helper function.
>   * df.h (df_bb_regno_only_def_find): Declare new function.
>   * loop-doloop.cc (doloop_condition_get): Add support for detecting
>   predicated vectorized hardware loops.
>   (doloop_modify): Add support for GTU condition checks.
>   (doloop_optimize): Update costing computation to support alterations to
>   desc->niter_expr by the backend.
> 
> Co-authored-by: Stam Markianos-Wright 
> ---
>  gcc/df-core.cc |  15 +
>  gcc/df.h   |   1 +
>  gcc/loop-doloop.cc | 164 +++--
>  3 files changed, 113 insertions(+), 67 deletions(-)
> 

As discussed, I think we should wait for gcc-15 for this[*]; I know it was 
initially submitted during stage1 but it's had to go through a lot of revision 
since then and we're very close to wanting to cut the release branch.

R.

[*] Unless an independent reviewer wants to sign this off anyway.


Re: [PATCH v6 5/5] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2024-03-01 Thread Richard Earnshaw (lists)
On 27/02/2024 13:56, Andre Vieira wrote:
> 
> This patch adds support for MVE Tail-Predicated Low Overhead Loops by using
> the doloop functionality added to support predicated vectorized hardware loops.
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-protos.h (arm_target_bb_ok_for_lob): Change
>   declaration to pass basic_block.
>   (arm_attempt_dlstp_transform): New declaration.
>   * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): Define targethook.
>   (TARGET_PREDICT_DOLOOP_P): Likewise.
>   (arm_target_bb_ok_for_lob): Adapt condition.
>   (arm_mve_get_vctp_lanes): New function.
>   (arm_dl_usage_type): New internal enum.
>   (arm_get_required_vpr_reg): New function.
>   (arm_get_required_vpr_reg_param): New function.
>   (arm_get_required_vpr_reg_ret_val): New function.
>   (arm_mve_get_loop_vctp): New function.
>   (arm_mve_insn_predicated_by): New function.
>   (arm_mve_across_lane_insn_p): New function.
>   (arm_mve_load_store_insn_p): New function.
>   (arm_mve_impl_pred_on_outputs_p): New function.
>   (arm_mve_impl_pred_on_inputs_p): New function.
>   (arm_last_vect_def_insn): New function.
>   (arm_mve_impl_predicated_p): New function.
>   (arm_mve_check_reg_origin_is_num_elems): New function.
>   (arm_mve_dlstp_check_inc_counter): New function.
>   (arm_mve_dlstp_check_dec_counter): New function.
>   (arm_mve_loop_valid_for_dlstp): New function.
>   (arm_predict_doloop_p): New function.
>   (arm_loop_unroll_adjust): New function.
>   (arm_emit_mve_unpredicated_insn_to_seq): New function.
>   (arm_attempt_dlstp_transform): New function.
>   * config/arm/arm.opt (mdlstp): New option.
>   * config/arm/iterators.md (dlstp_elemsize, letp_num_lanes,
>   letp_num_lanes_neg, letp_num_lanes_minus_1): New attributes.
>   (DLSTP, LETP): New iterators.
>   (predicated_doloop_end_internal): New pattern.
>   (dlstp_insn): New pattern.
>   * config/arm/thumb2.md (doloop_end): Adapt to support tail-predicated
>   loops.
>   (doloop_begin): Likewise.
>   * config/arm/types.md (mve_misc): New mve type to represent
>   predicated_loop_end insn sequences.
>   * config/arm/unspecs.md:
>   (DLSTP8, DLSTP16, DLSTP32, DLSTP64,
>   LETP8, LETP16, LETP32, LETP64): New unspecs for DLSTP and LETP.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/lob.h: Add new helpers.
>   * gcc.target/arm/lob1.c: Use new helpers.
>   * gcc.target/arm/lob6.c: Likewise.
>   * gcc.target/arm/dlstp-compile-asm-1.c: New test.
>   * gcc.target/arm/dlstp-compile-asm-2.c: New test.
>   * gcc.target/arm/dlstp-compile-asm-3.c: New test.
>   * gcc.target/arm/dlstp-int8x16.c: New test.
>   * gcc.target/arm/dlstp-int8x16-run.c: New test.
>   * gcc.target/arm/dlstp-int16x8.c: New test.
>   * gcc.target/arm/dlstp-int16x8-run.c: New test.
>   * gcc.target/arm/dlstp-int32x4.c: New test.
>   * gcc.target/arm/dlstp-int32x4-run.c: New test.
>   * gcc.target/arm/dlstp-int64x2.c: New test.
>   * gcc.target/arm/dlstp-int64x2-run.c: New test.
>   * gcc.target/arm/dlstp-invalid-asm.c: New test.
> 
> Co-authored-by: Stam Markianos-Wright 
> ---
>  gcc/config/arm/arm-protos.h   |4 +-
>  gcc/config/arm/arm.cc | 1249 -
>  gcc/config/arm/arm.opt|3 +
>  gcc/config/arm/iterators.md   |   15 +
>  gcc/config/arm/mve.md |   50 +
>  gcc/config/arm/thumb2.md  |  138 +-
>  gcc/config/arm/types.md   |6 +-
>  gcc/config/arm/unspecs.md |   14 +-
>  gcc/testsuite/gcc.target/arm/lob.h|  128 +-
>  gcc/testsuite/gcc.target/arm/lob1.c   |   23 +-
>  gcc/testsuite/gcc.target/arm/lob6.c   |8 +-
>  .../gcc.target/arm/mve/dlstp-compile-asm-1.c  |  146 ++
>  .../gcc.target/arm/mve/dlstp-compile-asm-2.c  |  749 ++
>  .../gcc.target/arm/mve/dlstp-compile-asm-3.c  |   46 +
>  .../gcc.target/arm/mve/dlstp-int16x8-run.c|   44 +
>  .../gcc.target/arm/mve/dlstp-int16x8.c|   31 +
>  .../gcc.target/arm/mve/dlstp-int32x4-run.c|   45 +
>  .../gcc.target/arm/mve/dlstp-int32x4.c|   31 +
>  .../gcc.target/arm/mve/dlstp-int64x2-run.c|   48 +
>  .../gcc.target/arm/mve/dlstp-int64x2.c|   28 +
>  .../gcc.target/arm/mve/dlstp-int8x16-run.c|   44 +
>  .../gcc.target/arm/mve/dlstp-int8x16.c|   32 +
>  .../gcc.target/arm/mve/dlstp-invalid-asm.c|  521 +++
>  23 files changed, 3321 insertions(+), 82 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/dlstp-int

Re: [PATCH v4] aarch64,arm: Move branch-protection data to targets

2024-03-01 Thread Richard Earnshaw (lists)
On 11/01/2024 14:35, Szabolcs Nagy wrote:
> The branch-protection types are target specific, not the same on arm
> and aarch64.  This currently affects pac-ret+b-key, but there will be
> a new type on aarch64 that is not relevant for arm.
> 
> After the move, change aarch_ identifiers to aarch64_ or arm_ as
> appropriate.
> 
> Refactor aarch_validate_mbranch_protection to take the target specific
> branch-protection types as an argument.
> 
> In case of invalid input currently no hints are provided: the way
> branch-protection types and subtypes can be mixed makes it difficult
> without causing confusion.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.md: Rename aarch_ to aarch64_.
>   * config/aarch64/aarch64.opt: Likewise.
>   * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Likewise.
>   * config/aarch64/aarch64.cc (aarch64_expand_prologue): Likewise.
>   (aarch64_expand_epilogue): Likewise.
>   (aarch64_post_cfi_startproc): Likewise.
>   (aarch64_handle_no_branch_protection): Copy and rename.
>   (aarch64_handle_standard_branch_protection): Likewise.
>   (aarch64_handle_pac_ret_protection): Likewise.
>   (aarch64_handle_pac_ret_leaf): Likewise.
>   (aarch64_handle_pac_ret_b_key): Likewise.
>   (aarch64_handle_bti_protection): Likewise.
>   (aarch64_override_options): Update branch protection validation.
>   (aarch64_handle_attr_branch_protection): Likewise.
>   * config/arm/aarch-common-protos.h (aarch_validate_mbranch_protection):
>   Pass branch protection type description as argument.
>   (struct aarch_branch_protect_type): Move from aarch-common.h.
>   * config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
>   Remove.
>   (aarch_handle_standard_branch_protection): Remove.
>   (aarch_handle_pac_ret_protection): Remove.
>   (aarch_handle_pac_ret_leaf): Remove.
>   (aarch_handle_pac_ret_b_key): Remove.
>   (aarch_handle_bti_protection): Remove.
>   (aarch_validate_mbranch_protection): Pass branch protection type
>   description as argument.
>   * config/arm/aarch-common.h (enum aarch_key_type): Remove.
>   (struct aarch_branch_protect_type): Remove.
>   * config/arm/arm-c.cc (arm_cpu_builtins): Remove aarch_ra_sign_key.
>   * config/arm/arm.cc (arm_handle_no_branch_protection): Copy and rename.
>   (arm_handle_standard_branch_protection): Likewise.
>   (arm_handle_pac_ret_protection): Likewise.
>   (arm_handle_pac_ret_leaf): Likewise.
>   (arm_handle_bti_protection): Likewise.
>   (arm_configure_build_target): Update branch protection validation.
>   * config/arm/arm.opt: Remove aarch_ra_sign_key.
> ---
> v4:
> - pass types as argument to validation.
> - make target specific types data static.
> 
>  gcc/config/aarch64/aarch64-c.cc  |  4 +-
>  gcc/config/aarch64/aarch64.cc| 75 
>  gcc/config/aarch64/aarch64.md|  2 +-
>  gcc/config/aarch64/aarch64.opt   |  2 +-
>  gcc/config/arm/aarch-common-protos.h | 19 ++-
>  gcc/config/arm/aarch-common.cc   | 71 --
>  gcc/config/arm/aarch-common.h| 20 
>  gcc/config/arm/arm-c.cc  |  2 -
>  gcc/config/arm/arm.cc| 55 +---
>  gcc/config/arm/arm.opt   |  3 --
>  10 files changed, 145 insertions(+), 108 deletions(-)
> 


OK

R.
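
For readers skimming the thread, the refactoring described in the quoted patch (validation takes the target-specific type table as an argument) can be pictured with the sketch below. It is only an illustration under stated assumptions: `protect_type`, `validate`, `aarch64_like` and `arm_like` are hypothetical names, the tables are simplified, and none of this is the actual code in aarch-common.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified description of one branch-protection type.
struct protect_type {
  const char *name;
  std::vector<const char *> subtypes;  // names accepted right after this type
};

// Split "a+b+c" into {"a", "b", "c"}.
static std::vector<std::string> split_plus(const std::string &s) {
  std::vector<std::string> out;
  std::size_t start = 0;
  for (;;) {
    std::size_t pos = s.find('+', start);
    if (pos == std::string::npos) {
      out.push_back(s.substr(start));
      return out;
    }
    out.push_back(s.substr(start, pos - start));
    start = pos + 1;
  }
}

// Validate SPEC against a caller-supplied table, so each target can pass
// its own set of types instead of sharing one common table.
bool validate(const std::string &spec, const std::vector<protect_type> &types) {
  const protect_type *current = nullptr;
  for (const std::string &tok : split_plus(spec)) {
    bool is_subtype = false;
    if (current)
      for (const char *sub : current->subtypes)
        if (tok == sub)
          is_subtype = true;
    if (is_subtype)
      continue;  // token refines the current type
    current = nullptr;
    for (const protect_type &t : types)
      if (tok == t.name)
        current = &t;
    if (!current)
      return false;  // neither a subtype nor a known type
  }
  return true;
}

// Illustrative tables only: the second one has no "b-key" subtype.
const std::vector<protect_type> aarch64_like = {
    {"pac-ret", {"leaf", "b-key"}}, {"bti", {}}};
const std::vector<protect_type> arm_like = {
    {"pac-ret", {"leaf"}}, {"bti", {}}};

// Small self-check exercising both tables.
inline void check_tables() {
  assert(validate("pac-ret+leaf+b-key", aarch64_like));
  assert(!validate("pac-ret+b-key", arm_like));
}
```

With this shape, adding a type that only one target understands means editing only that target's table; the shared validation routine stays unchanged.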



[PATCH] s390: Deprecate some vector builtins

2024-03-01 Thread Stefan Schulze Frielinghaus
According to IBM Open XL C/C++ for z/OS version 1.1, the builtins

- vec_permi
- vec_ctd
- vec_ctsl
- vec_ctul
- vec_ld2f
- vec_st2f

are deprecated.  Also deprecate helper builtins vec_ctd_s64 and
vec_ctd_u64.

Furthermore, the overloads of vec_insert which make use of a bool vector
are deprecated, too.

gcc/ChangeLog:

* config/s390/s390-builtins.def (vec_permi): Deprecate.
(vec_ctd): Deprecate.
(vec_ctd_s64): Deprecate.
(vec_ctd_u64): Deprecate.
(vec_ctsl): Deprecate.
(vec_ctul): Deprecate.
(vec_ld2f): Deprecate.
(vec_st2f): Deprecate.
(vec_insert): Deprecate overloads with bool vectors.
---
 Ok for mainline?

 gcc/config/s390/s390-builtins.def | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/s390/s390-builtins.def b/gcc/config/s390/s390-builtins.def
index 680a038fa4b..54f400ceb5a 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -416,16 +416,16 @@ B_DEF  (s390_vec_splat_s64, vec_splatsv2di,     0,
 OB_DEF (s390_vec_insert,s390_vec_insert_s8, s390_vec_insert_dbl,B_VX,   BT_FN_OV4SI_INT_OV4SI_INT)
 OB_DEF_VAR (s390_vec_insert_s8, s390_vlvgb, 0,  O3_ELEM,BT_OV_V16QI_SCHAR_V16QI_INT)
 OB_DEF_VAR (s390_vec_insert_u8, s390_vlvgb, 0,  O3_ELEM,BT_OV_UV16QI_UCHAR_UV16QI_INT)
-OB_DEF_VAR (s390_vec_insert_b8, s390_vlvgb, 0,  O3_ELEM,BT_OV_UV16QI_UCHAR_BV16QI_INT)
+OB_DEF_VAR (s390_vec_insert_b8, s390_vlvgb, B_DEP,  O3_ELEM,BT_OV_UV16QI_UCHAR_BV16QI_INT)
 OB_DEF_VAR (s390_vec_insert_s16,s390_vlvgh, 0,  O3_ELEM,BT_OV_V8HI_SHORT_V8HI_INT)
 OB_DEF_VAR (s390_vec_insert_u16,s390_vlvgh, 0,  O3_ELEM,BT_OV_UV8HI_USHORT_UV8HI_INT)
-OB_DEF_VAR (s390_vec_insert_b16,s390_vlvgh, 0,  O3_ELEM,BT_OV_UV8HI_USHORT_BV8HI_INT)
+OB_DEF_VAR (s390_vec_insert_b16,s390_vlvgh, B_DEP,  O3_ELEM,BT_OV_UV8HI_USHORT_BV8HI_INT)
 OB_DEF_VAR (s390_vec_insert_s32,s390_vlvgf, 0,  O3_ELEM,BT_OV_V4SI_INT_V4SI_INT)
 OB_DEF_VAR (s390_vec_insert_u32,s390_vlvgf, 0,  O3_ELEM,BT_OV_UV4SI_UINT_UV4SI_INT)
-OB_DEF_VAR (s390_vec_insert_b32,s390_vlvgf, 0,  O3_ELEM,BT_OV_UV4SI_UINT_BV4SI_INT)
+OB_DEF_VAR (s390_vec_insert_b32,s390_vlvgf, B_DEP,  O3_ELEM,BT_OV_UV4SI_UINT_BV4SI_INT)
 OB_DEF_VAR (s390_vec_insert_s64,s390_vlvgg, 0,  O3_ELEM,BT_OV_V2DI_LONGLONG_V2DI_INT)
 OB_DEF_VAR (s390_vec_insert_u64,s390_vlvgg, 0,  O3_ELEM,BT_OV_UV2DI_ULONGLONG_UV2DI_INT)
-OB_DEF_VAR (s390_vec_insert_b64,s390_vlvgg, 0,  O3_ELEM,BT_OV_UV2DI_ULONGLONG_BV2DI_INT)
+OB_DEF_VAR (s390_vec_insert_b64,s390_vlvgg, B_DEP,  O3_ELEM,BT_OV_UV2DI_ULONGLONG_BV2DI_INT)
 OB_DEF_VAR (s390_vec_insert_flt,s390_vlvgf_flt, B_VXE,  O3_ELEM,BT_OV_V4SF_FLT_V4SF_INT) /* vlvgf */
 OB_DEF_VAR (s390_vec_insert_dbl,s390_vlvgg_dbl, 0,  O3_ELEM,BT_OV_V2DF_DBL_V2DF_INT) /* vlvgg */
 
@@ -658,7 +658,7 @@ OB_DEF_VAR (s390_vec_perm_dbl,  s390_vperm, 0,
 
 B_DEF  (s390_vperm, vec_permv16qi,  0,  B_VX,   0,  BT_FN_UV16QI_UV16QI_UV16QI_UV16QI)
 
-OB_DEF (s390_vec_permi, s390_vec_permi_s64, s390_vec_permi_dbl, B_VX,   BT_FN_OV4SI_OV4SI_OV4SI_INT)
+OB_DEF (s390_vec_permi, s390_vec_permi_s64, s390_vec_permi_dbl, B_DEP | B_VX,   BT_FN_OV4SI_OV4SI_OV4SI_INT)
 OB_DEF_VAR (s390_vec_permi_s64, s390_vpdi,  0,  O3_U2,  BT_OV_V2DI_V2DI_V2DI_INT)
 OB_DEF_VAR (s390_vec_permi_b64, s390_vpdi,  0,  O3_U2,  BT_OV_BV2DI_BV2DI_BV2DI_INT)
 OB_DEF_VAR (s390_vec_permi_u64, s390_vpdi,  0,  O3_U2,  BT_OV_UV2DI_UV2DI_UV2DI_INT)
@@ -2806,7 +2806,7 @@ OB_DEF (s390_vec_any_ngt,   s390_vec_any_ngt_flt,s390_vec_any_ngt_db
 OB_DEF_VAR (s390_vec_any_ngt_flt,   vec_any_unlev4sf,   B_VXE,  0,  BT_OV_INT_V4SF_V4SF)
 OB_DEF_VAR (s390_vec_any_ngt_dbl,   vec_any_unlev2df,   0,  0,  BT_OV_INT_V2DF_V2DF)
 
-OB_DEF (s390_vec_ctd,   s390_vec_ctd_s64,   s390_vec_ctd_u64,   B_VX,   BT_FN

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Jason Merrill

On 3/1/24 10:32, Jason Merrill wrote:
> On 3/1/24 10:00, Patrick Palka wrote:
> > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > 
> > > On 2/29/24 15:56, Patrick Palka wrote:
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > OK for trunk?
> > > > 
> > > > -- >8 --
> > > > 
> > > > For local enums defined in a non-template function or a function
> > > > template instantiation it seems we neglect to make the function
> > > > depend on the enum definition, which ultimately causes streaming to
> > > > fail due to the enum definition not being streamed before uses of
> > > > its enumerators are streamed, as far as I can tell.
> > > 
> > > I would think that the function doesn't need to depend on the local
> > > enum in order for the local enum to be streamed before the use of the
> > > enumerator, which comes after the definition of the enum in the
> > > function body?
> > > 
> > > Why isn't streaming the body of the function outputting the enum
> > > definition before the use of the enumerator?
> > 
> > IIUC (based on observing the behavior for local classes) streaming the
> > definition of a local class/enum as part of the function definition is
> > what we want to avoid; we want to treat a local type definition as a
> > logically separate definition and stream it separately (similar
> > to class defns vs member defns I guess).  And by not registering a
> > dependency between the function and the local enum, we end up never
> > streaming out the local enum definition separately and instead stream
> > it out as part of the function definition (accidentally) which we then
> > can't stream in properly.
> > 
> > Perhaps the motivation for treating local type definitions as logically
> > separate from the function definition is because they can leak out of a
> > function with a deduced return type:
> > 
> >    auto f() {
> >      struct A { };
> >      return A();
> >    }
> > 
> >    using type = decltype(f()); // refers directly to f()::A
> 
> Yes, I believe that's what modules.cc refers to as a "voldemort".
> 
> But for non-voldemort local types, the declaration of the function
> doesn't depend on them, only the definition.  Why doesn't streaming them
> in the definition work properly?

And does your 99426 patch address that problem?

Jason



Re: [Patch] OpenMP/C++: Fix (first)private clause with member variables [PR110347]

2024-03-01 Thread Tobias Burnus

Jakub Jelinek wrote:
> As discussed on IRC, I believe not disregarding the capture proxies in
> target regions if they shouldn't be shared is always wrong, but also the
> gimplify.cc suggestion was incorrect.
> 
> The thing is that at the place where the omp_disregard_value_expr call
> is done currently for target region flags is always in_code ? GOVD_SEEN : 0
> so by testing flags & anything we actually don't differentiate between
> privatized vars and mapped vars.  So, it needs to be moved after we
> actually compute the flags, similarly how we do it for non-target.
> ...

I have now added Jakub's updated gimplify.cc patch, renamed the test
files, added the proposed lambda test case as well, added a missing
line break, and updated target-lambda-1.C to also work with shared
memory.

I think the patch should be good, having tested it with offloading here,
with Jakub also testing it on his side.


Final comments, suggestions, remarks?

Tobias
OpenMP/C++: Fix (first)private clause with member variables [PR110347]

OpenMP permits '(first)private' for C++ member variables, which GCC handles
by tagging those by DECL_OMP_PRIVATIZED_MEMBER, adding a temporary VAR_DECL
and DECL_VALUE_EXPR pointing to the 'this->member_var' in the C++ front end.
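
As a minimal sketch of the kind of source this concerns (not one of the new test cases; `Counter` and `next` are hypothetical names chosen for illustration), a member variable can be named in a 'firstprivate' clause on 'target' inside a non-static member function. The region only reads the private copy, so the result is the same whether or not the pragma is honoured:

```cpp
#include <cassert>

// Hypothetical class, for illustration only.
struct Counter {
  int x = 5;

  int next() {
    int r = 0;
    // 'x' is a non-static data member; inside the region the private copy
    // is used rather than this->x.  Without -fopenmp the pragma is ignored
    // and the block simply runs on the host.
    #pragma omp target firstprivate(x) map(from: r)
    {
      r = x + 1;
    }
    return r;
  }
};

// Small self-check: the member itself is left untouched.
inline void check_counter() {
  Counter c;
  assert(c.next() == 6);
  assert(c.x == 5);
}
```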

The idea is that in omp-low.cc, the DECL_VALUE_EXPR is used before the
region (for 'firstprivate'; ignored for 'private') while in the region,
the DECL itself is used.

In gimplify, the value expansion is suppressed and deferred if the
  lang_hooks.decls.omp_disregard_value_expr (decl, shared)
returns true - which is never the case if 'shared' is true. In OpenMP 4.5,
only 'map' and 'use_device_ptr' were permitted for the 'target' directive.
When OpenMP 5.0's 'private'/'firstprivate' clauses were added, the
update that the 'shared' argument could now be false was missed. The
respective check has now been added.
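
The change to the 'shared' argument can be modelled with a few bit flags. This is a sketch only: the enumerator values are illustrative stand-ins rather than the real GOVD_* definitions from gimplify.cc, and `initial_flags`, `shared_before` and `shared_after` are hypothetical helper names.

```cpp
#include <cassert>

// Illustrative stand-ins for the GOVD_* bits; values are assumptions.
enum : unsigned {
  SEEN = 1u << 0,
  PRIVATE = 1u << 3,
  FIRSTPRIVATE = 1u << 4,
  MAP = 1u << 5
};

// At the old call site the flags were only "in_code ? GOVD_SEEN : 0".
constexpr unsigned initial_flags(bool in_code) { return in_code ? SEEN : 0u; }

// Before: OpenMP target regions always passed shared = true.
constexpr bool shared_before(unsigned) { return true; }

// After: shared is false once the computed flags say the variable is
// privatized, mirroring
//   (nflags & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE)) == 0
constexpr bool shared_after(unsigned nflags) {
  return (nflags & (PRIVATE | FIRSTPRIVATE)) == 0;
}

// The early flags never carry privatization bits, so testing them there
// cannot tell privatized and mapped variables apart.
static_assert(shared_after(initial_flags(true)), "early flags look shared");
static_assert(shared_after(initial_flags(false)), "early flags look shared");

// Small self-check on the computed flags.
inline void check_flags() {
  assert(shared_before(SEEN | FIRSTPRIVATE));   // old: still treated as shared
  assert(!shared_after(SEEN | FIRSTPRIVATE));   // new: recognized as private
  assert(shared_after(SEEN | MAP));             // mapped vars stay shared
}
```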

2024-03-01  Jakub Jelinek  
	Tobias Burnus  

	PR c++/110347

gcc/ChangeLog:

	* gimplify.cc (omp_notice_variable): Fix 'shared' arg to
	lang_hooks.decls.omp_disregard_value_expr for
	(first)private in target regions.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-lambda-3.C: Moved from
	gcc/testsuite/g++.dg/gomp/ and fixed is-mapped handling.
	* testsuite/libgomp.c++/target-lambda-1.C: Modify to
	also work without offloading.
	* testsuite/libgomp.c++/firstprivate-1.C: New test.
	* testsuite/libgomp.c++/firstprivate-2.C: New test.
	* testsuite/libgomp.c++/private-1.C: New test.
	* testsuite/libgomp.c++/private-2.C: New test.
	* testsuite/libgomp.c++/target-lambda-4.C: New test.
	* testsuite/libgomp.c++/use_device_ptr-1.C: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/target-lambda-1.C: Moved to become a
	run-time test under testsuite/libgomp.c++.

Co-authored-by: Tobias Burnus 

 gcc/gimplify.cc  |  20 +-
 gcc/testsuite/g++.dg/gomp/target-lambda-1.C  |  94 ---
 libgomp/testsuite/libgomp.c++/firstprivate-1.C   | 305 +++
 libgomp/testsuite/libgomp.c++/firstprivate-2.C   | 125 ++
 libgomp/testsuite/libgomp.c++/private-1.C| 247 ++
 libgomp/testsuite/libgomp.c++/private-2.C| 117 +
 libgomp/testsuite/libgomp.c++/target-lambda-1.C  |  15 +-
 libgomp/testsuite/libgomp.c++/target-lambda-3.C  | 104 
 libgomp/testsuite/libgomp.c++/target-lambda-4.C  |  41 +++
 libgomp/testsuite/libgomp.c++/use_device_ptr-1.C | 126 ++
 10 files changed, 1089 insertions(+), 105 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7f79b3cc7e6..6ebca964cb2 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8144,13 +8144,6 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
   if ((ctx->region_type & ORT_TARGET) != 0)
 {
-  if (ctx->region_type & ORT_ACC)
-	/* For OpenACC, as remarked above, defer expansion.  */
-	shared = false;
-  else
-	shared = true;
-
-  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);
   if (n == NULL)
 	{
 	  unsigned nflags = flags;
@@ -8275,9 +8268,22 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 	}
 	found_outer:
 	  omp_add_variable (ctx, decl, nflags);
+	  if (ctx->region_type & ORT_ACC)
+	/* For OpenACC, as remarked above, defer expansion.  */
+	shared = false;
+	  else
+	shared = (nflags & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE)) == 0;
+	  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);
 	}
   else
 	{
+	  if (ctx->region_type & ORT_ACC)
+	/* For OpenACC, as remarked above, defer expansion.  */
+	shared = false;
+	  else
+	shared = ((n->value | flags)
+		  & (GOVD_PRIVATE | GOVD_FIRSTPRIVATE)) == 0;
+	  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);
 	  /* If nothing changed, there's nothing left to do.  */
 	  if ((n->value & flags) == flags)
 	return ret;
diff --git a/gcc/testsuite/

Re: [Patch] OpenMP/C++: Fix (first)private clause with member variables [PR110347]

2024-03-01 Thread Jakub Jelinek
On Fri, Mar 01, 2024 at 05:19:29PM +0100, Tobias Burnus wrote:
> Jakub Jelinek wrote:
> > As discussed on IRC, I believe not disregarding the capture proxies in
> > target regions if they shouldn't be shared is always wrong, but also the
> > gimplify.cc suggestion was incorrect.
> > 
> > The thing is that at the place where the omp_disregard_value_expr call
> > is done currently for target region flags is always in_code ? GOVD_SEEN : 0
> > so by testing flags & anything we actually don't differentiate between
> > privatized vars and mapped vars.  So, it needs to be moved after we
> > actually compute the flags, similarly how we do it for non-target.
> ...
> 
> I have now added Jakub's updated gimplify.cc patch, renamed the test
> files, added the proposed lambda test case as well, added a missing line
> break, and updated target-lambda-1.C to also work with shared memory.
> 
> I think the patch should be good, having tested it with offloading here,
> with Jakub also testing it on his side.
> 
> Final comments, suggestions, remarks?

LGTM, thanks.
Just please mention those FIXMEs somewhere in PR113436, so that when that
bug is fixed we don't forget to remove those #if 0s.

Jakub



Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Jason Merrill wrote:

> On 3/1/24 10:00, Patrick Palka wrote:
> > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > 
> > > On 2/29/24 15:56, Patrick Palka wrote:
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > OK for trunk?
> > > > 
> > > > -- >8 --
> > > > 
> > > > For local enums defined in a non-template function or a function
> > > > template
> > > > instantiation it seems we neglect to make the function depend on the
> > > > enum
> > > > definition, which ultimately causes streaming to fail due to the enum
> > > > definition not being streamed before uses of its enumerators are
> > > > streamed,
> > > > as far as I can tell.
> > > 
> > > I would think that the function doesn't need to depend on the local enum
> > > in
> > > order for the local enum to be streamed before the use of the enumerator,
> > > which comes after the definition of the enum in the function body?
> > > 
> > > Why isn't streaming the body of the function outputting the enum
> > > definition
> > > before the use of the enumerator?
> > 
> > IIUC (based on observing the behavior for local classes) streaming the
> > definition of a local class/enum as part of the function definition is
> > what we want to avoid; we want to treat a local type definition as a
> > logically separate definition and stream it separately (similar
> > to class defns vs member defns I guess).  And by not registering a
> > dependency
> > between the function and the local enum, we end up never streaming out
> > the local enum definition separately and instead stream it out as part
> > of the function definition (accidentally) which we then can't stream in
> > properly.
> > 
> > Perhaps the motivation for treating local type definitions as logically
> > separate from the function definition is because they can leak out of a
> > function with a deduced return type:
> > 
> >auto f() {
> >  struct A { };
> >  return A();
> >}
> > 
> >using type = decltype(f()); // refers directly to f()::A
> 
> Yes, I believe that's what modules.cc refers to as a "voldemort".
> 
> But for non-voldemort local types, the declaration of the function doesn't
> depend on them, only the definition.  Why doesn't streaming them in the
> definition work properly?

I should note that for a templated local type we already always add a
dependency between the function template _pattern_ and the local type
_pattern_ and therefore always stream the local type pattern separately
(even if it's not actually a voldemort), thanks to the
TREE_CODE (decl) == TEMPLATE_DECL case guarding the add_dependency call
(inside a template pattern we see the TEMPLATE_DECL of the local
TYPE_DECL).  The dependency is missing only when the function is a
non-template or non-template-pattern.  My patch makes us consistently
add the dependency and in turn consistently stream the definitions
separately.

(For a local _class_, in the non-template and non-template-pattern case
we currently add a dependency between the function and the
injected-class-name of the class as opposed to the class itself, which
seems quite accidental but suffices.  And that's why only local enums
are problematic currently.  After my patch we instead add a dependency
to the local class itself.)
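
To make the cases concrete, here is a small sketch of the three shapes of local type discussed so far. The names `pick`, `size_tag` and `make` are made up for illustration; these are not the PR testcases, and in a real module they would sit in an exported interface.

```cpp
#include <cassert>

// Non-template function with a local enum: the case where the dependency
// on the enum definition is currently missing.
inline int pick() {
  enum E { lo = 1, hi = 2 };
  return hi;
}

// Function template with a local enum: inside the pattern the local type
// is seen via its TEMPLATE_DECL, so the dependency is already added there;
// the instantiation size_tag<char> behaves like the non-template case.
template <class T>
int size_tag() {
  enum E { bytes = sizeof(T) };
  return bytes;
}

// Local class that escapes through a deduced return type (a "voldemort").
inline auto make() {
  struct A { int v = 7; };
  return A();
}

// Small self-check that the three functions behave as written.
inline void check_local_types() {
  assert(pick() == 2);
  assert(size_tag<char>() == 1);
  assert(make().v == 7);
}
```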

Part of the puzzle of why we don't/can't stream them as part of the
function definition is that we don't mark the enumerators for
by-value walking when marking the function definition.  So when
streaming out the enumerator definition we stream out _references_
to the enumerators (tt_const_decl tags) instead of the actual
definitions which breaks stream-in.

The best place to mark local types for by-value walking would be
in trees_out::mark_function_def which is suspiciously empty!  I
experimented with (never mind that it only marks the outermost block's
types):

@@ -11713,8 +11713,12 @@ trees_out::write_function_def (tree decl)
 }
 
 void
-trees_out::mark_function_def (tree)
+trees_out::mark_function_def (tree decl)
 {
+  tree initial = DECL_INITIAL (decl);
+  for (tree var = BLOCK_VARS (initial); var; var = DECL_CHAIN (var))
+if (DECL_IMPLICIT_TYPEDEF_P (var))
+  mark_declaration (var, true);
 }

Which actually fixes the non-template PR104919 testcase, but it
breaks streaming of templated local types wherein we run into
the sanity check:

@@ -7677,16 +7677,6 @@ trees_out::decl_value (tree decl, depset *dep)
 
   merge_kind mk = get_merge_kind (decl, dep);
 
!  if (CHECKING_P)
!{
!  /* Never start in the middle of a template.  */
!  int use_tpl = -1;
!  if (tree ti = node_template_info (decl, use_tpl))
!   gcc_checking_assert (TREE_CODE (TI_TEMPLATE (ti)) == OVERLOAD
!|| TREE_CODE (TI_TEMPLATE (ti)) == FIELD_DECL
!|| (DECL_TEMPLATE_RESULT (TI_TEMPLATE (ti))
!!= decl));
!}
 
   if (streaming_p ())
 {

If we try to work around this sanity check by only marking local

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Jason Merrill wrote:

> On 3/1/24 10:32, Jason Merrill wrote:
> > On 3/1/24 10:00, Patrick Palka wrote:
> > > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > > 
> > > > On 2/29/24 15:56, Patrick Palka wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > OK for trunk?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > For local enums defined in a non-template function or a function
> > > > > template
> > > > > instantiation it seems we neglect to make the function depend on the
> > > > > enum
> > > > > definition, which ultimately causes streaming to fail due to the enum
> > > > > definition not being streamed before uses of its enumerators are
> > > > > streamed,
> > > > > as far as I can tell.
> > > > 
> > > > I would think that the function doesn't need to depend on the local enum
> > > > in
> > > > order for the local enum to be streamed before the use of the
> > > > enumerator,
> > > > which comes after the definition of the enum in the function body?
> > > > 
> > > > Why isn't streaming the body of the function outputting the enum
> > > > definition
> > > > before the use of the enumerator?
> > > 
> > > IIUC (based on observing the behavior for local classes) streaming the
> > > definition of a local class/enum as part of the function definition is
> > > what we want to avoid; we want to treat a local type definition as a
> > > logically separate definition and stream it separately (similar
> > > to class defns vs member defns I guess).  And by not registering a
> > > dependency
> > > between the function and the local enum, we end up never streaming out
> > > the local enum definition separately and instead stream it out as part
> > > of the function definition (accidentally) which we then can't stream in
> > > properly.
> > > 
> > > Perhaps the motivation for treating local type definitions as logically
> > > separate from the function definition is because they can leak out of a
> > > function with a deduced return type:
> > > 
> > >    auto f() {
> > >  struct A { };
> > >  return A();
> > >    }
> > > 
> > >    using type = decltype(f()); // refers directly to f()::A
> > 
> > Yes, I believe that's what modules.cc refers to as a "voldemort".
> > 
> > But for non-voldemort local types, the declaration of the function doesn't
> > depend on them, only the definition.  Why doesn't streaming them in the
> > definition work properly?
> 
> And does your 99426 patch address that problem?

I don't think so, that patch should only affect declaration merging (of
a streamed-in local type with the corresponding in-TU local type after
their containing function is merged).


> > This was nearly enough to make things work, except we now ran into
> > issues with the local TYPE/CONST_DECL copies when streaming the
> > constexpr version of a function body.  It occurred to me that we don't
> > need to make copies of local types when copying a constexpr function
> > body; only VAR_DECLs etc need to be copied for sake of recursive
> > constexpr calls.  So this patch adjusts copy_fn accordingly.
>
> Maybe adjust can_be_nonlocal instead?  It seems unnecessary in general
> to remap types and enumerators for inlining.

Unfortunately this approach caused a bootstrap failure with Ada:

raised STORAGE_ERROR : stack overflow or erroneous memory access

The patch was

--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -725,6 +725,9 @@ can_be_nonlocal (tree decl, copy_body_data *id)
   if (TREE_CODE (decl) == FUNCTION_DECL)
 return true;
 
+  if (TREE_CODE (decl) == TYPE_DECL || TREE_CODE (decl) == CONST_DECL)
+return true;
+
   /* Local static vars must be non-local or we get multiple declaration
  problems.  */
   if (VAR_P (decl) && !auto_var_in_fn_p (decl, id->src_fn))


> 
> Jason
> 
> 

[patch,avr,applied] Adjust help messages.

2024-03-01 Thread Georg-Johann Lay

This patch unifies help screen messages.

Johann

--

AVR: Overhaul help screen

gcc/
* config/avr/avr.opt: Overhaul help screen.

diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index ea35b7d5b4e..c3ca8379ee3 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -20,27 +20,27 @@
 
 mcall-prologues
 Target Mask(CALL_PROLOGUES) Optimization
-Use subroutines for function prologues and epilogues.
+Optimization. Use subroutines for function prologues and epilogues.
 
 mmcu=
 Target RejectNegative Joined Var(avr_mmcu) MissingArgError(missing device or architecture after %qs)
--mmcu=MCU	Select the target MCU.
+-mmcu=	Select the target MCU.
 
 mgas-isr-prologues
 Target Var(avr_gasisr_prologues) UInteger Init(0) Optimization
-Allow usage of __gcc_isr pseudo instructions in ISR prologues and epilogues.
+Optimization. Allow usage of __gcc_isr pseudo instructions in ISR prologues and epilogues.
 
 mn-flash=
 Target RejectNegative Joined Var(avr_n_flash) UInteger Init(-1)
-Set the number of 64 KiB flash segments.
+This option is used internally. Set the number of 64 KiB flash segments.
 
 mskip-bug
 Target Mask(SKIP_BUG)
-Indicate presence of a processor erratum.
+This option is used internally. Indicate presence of a processor erratum.  Do not skip 32-bit instructions.
 
 mrmw
 Target Mask(RMW)
-Enable Read-Modify-Write (RMW) instructions support/use.
+This option is used internally. Enable Read-Modify-Write (RMW) instructions support/use.
 
 mdeb
 Target Undocumented Mask(ALL_DEBUG)
@@ -50,7 +50,7 @@ Target RejectNegative Joined Undocumented Var(avr_log_details)
 
 mshort-calls
 Target RejectNegative Mask(SHORT_CALLS)
-Use RJMP / RCALL even though CALL / JMP are available.
+This option is used internally for multilib generation and selection.  Assume RJMP / RCALL can target all program memory.
 
 mint8
 Target Mask(INT8)
@@ -62,11 +62,11 @@ Change the stack pointer without disabling interrupts.
 
 mbranch-cost=
 Target Joined RejectNegative UInteger Var(avr_branch_cost) Init(0) Optimization
-Set the branch costs for conditional branch instructions.  Reasonable values are small, non-negative integers.  The default branch cost is 0.
+-mbranch-cost=	Optimization. Set the branch costs for conditional branch instructions.  Reasonable values are small, non-negative integers.  The default branch cost is 0.
 
 mmain-is-OS_task
 Target Mask(MAIN_IS_OS_TASK) Optimization
-Treat main as if it had attribute OS_task.
+Optimization. Treat main as if it had attribute OS_task.
 
 morder1
 Target Undocumented Mask(ORDER_1)
@@ -80,7 +80,7 @@ Change only the low 8 bits of the stack pointer.
 
 mrelax
 Target Optimization
-Relax branches.
+Optimization. Relax branches.
 
 mpmem-wrap-around
 Target
@@ -88,15 +88,15 @@ Make the linker relaxation machine assume that a program counter wrap-around occ
 
 maccumulate-args
 Target Mask(ACCUMULATE_OUTGOING_ARGS) Optimization
-Accumulate outgoing function arguments and acquire/release the needed stack space for outgoing function arguments in function prologue/epilogue.  Without this option, outgoing arguments are pushed before calling a function and popped afterwards.  This option can lead to reduced code size for functions that call many functions that get their arguments on the stack like, for example printf.
+Optimization. Accumulate outgoing function arguments and acquire/release the needed stack space for outgoing function arguments in function prologue/epilogue.  Without this option, outgoing arguments are pushed before calling a function and popped afterwards.  This option can lead to reduced code size for functions that call many functions that get their arguments on the stack like, for example printf.
 
 mstrict-X
 Target Var(avr_strict_X) Init(0) Optimization
-When accessing RAM, use X as imposed by the hardware, i.e. just use pre-decrement, post-increment and indirect addressing with the X register.  Without this option, the compiler may assume that there is an addressing mode X+const similar to Y+const and Z+const and emit instructions to emulate such an addressing mode for X.
+Optimization. When accessing RAM, use X as imposed by the hardware, i.e. just use pre-decrement, post-increment and indirect addressing with the X register.  Without this option, the compiler may assume that there is an addressing mode X+const similar to Y+const and Z+const and emit instructions to emulate such an addressing mode for X.
 
 mflmap
 Target Var(avr_flmap) Init(0)
-The device has the bitfield NVMCTRL_CTRLB.FLMAP.  This option is used internally.
+This option is used internally. The device has the bitfield NVMCTRL_CTRLB.FLMAP.
 
 mrodata-in-ram
 Target Var(avr_rodata_in_ram) Init(-1)
@@ -105,15 +105,15 @@ The device has the .rodata section located in the RAM area.
 ;; For rationale behind -msp8 see explanation in avr.h.
 msp8
 Target RejectNegative Var(avr_sp8) Init(0)
-The device has no SPH special function register. This option will be overridden by the compile

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Patrick Palka wrote:

> On Fri, 1 Mar 2024, Jason Merrill wrote:
> 
> > On 3/1/24 10:00, Patrick Palka wrote:
> > > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > > 
> > > > On 2/29/24 15:56, Patrick Palka wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > OK for trunk?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > For local enums defined in a non-template function or a function
> > > > > template
> > > > > instantiation it seems we neglect to make the function depend on the
> > > > > enum
> > > > > definition, which ultimately causes streaming to fail due to the enum
> > > > > definition not being streamed before uses of its enumerators are
> > > > > streamed,
> > > > > as far as I can tell.
> > > > 
> > > > I would think that the function doesn't need to depend on the local enum
> > > > in
> > > > order for the local enum to be streamed before the use of the 
> > > > enumerator,
> > > > which comes after the definition of the enum in the function body?
> > > > 
> > > > Why isn't streaming the body of the function outputting the enum
> > > > definition
> > > > before the use of the enumerator?
> > > 
> > > IIUC (based on observing the behavior for local classes) streaming the
> > > definition of a local class/enum as part of the function definition is
> > > what we want to avoid; we want to treat a local type definition as a
> > > logically separate definition and stream it separately (similar
> > > to class defns vs member defns I guess).  And by not registering a
> > > dependency
> > > between the function and the local enum, we end up never streaming out
> > > the local enum definition separately and instead stream it out as part
> > > of the function definition (accidentally) which we then can't stream in
> > > properly.
> > > 
> > > Perhaps the motivation for treating local type definitions as logically
> > > separate from the function definition is because they can leak out of a
> > > function with a deduced return type:
> > > 
> > >auto f() {
> > >  struct A { };
> > >  return A();
> > >}
> > > 
> > >using type = decltype(f()); // refers directly to f()::A
> > 
> > Yes, I believe that's what modules.cc refers to as a "voldemort".
> > 
> > But for non-voldemort local types, the declaration of the function doesn't
> > depend on them, only the definition.  Why doesn't streaming them in the
> > definition work properly?
> 
> I should note that for a templated local type we already always add a
> dependency between the function template _pattern_ and the local type
> _pattern_ and therefore always stream the local type pattern separately
> (even if it's not actually a voldemort), thanks to the
> TREE_CODE (decl) == TEMPLATE_DECL case guarding the add_dependency call
> (inside a template pattern we see the TEMPLATE_DECL of the local
> TYPE_DECL).  The dependency is missing only when the function is a
> non-template or non-template-pattern.  My patch makes us consistently
> add the dependency and in turn consistently stream the definitions
> separately.
> 
> (For a local _class_, in the non-template and non-template-pattern case
> we currently add a dependency between the function and the
> injected-class-name of the class as opposed to the class itself, which
> seems quite accidental but suffices.  And that's why only local enums
> are problematic currently.  After my patch we instead add a dependency
> to the local class itself.)
> 
> Part of the puzzle of why we don't/can't stream them as part of the
> function definition is because we don't mark the enumerators for
> by-value walking when marking the function definition.  So when
> streaming out the enumerator definition we stream out _references_
> to the enumerators (tt_const_decl tags) instead of the actual
> definitions which breaks stream-in.
> 
> The best place to mark local types for by-value walking would be
> in trees_out::mark_function_def which is suspiciously empty!  I
> experimented with (never mind that it only marks the outermost block's
> types):
> 
> @@ -11713,8 +11713,12 @@ trees_out::write_function_def (tree decl)
>  }
>  
>  void
> -trees_out::mark_function_def (tree)
> +trees_out::mark_function_def (tree decl)
>  {
> +  tree initial = DECL_INITIAL (decl);
> +  for (tree var = BLOCK_VARS (initial); var; var = DECL_CHAIN (var))
> +if (DECL_IMPLICIT_TYPEDEF_P (var))
> +  mark_declaration (var, true);
>  }
> 
> Which actually fixes the non-template PR104919 testcase, but it
> breaks streaming of templated local types wherein we run into
> the sanity check:
> 
> @@ -7677,16 +7677,6 @@ trees_out::decl_value (tree decl, depset *dep)
>  
>merge_kind mk = get_merge_kind (decl, dep);
>  
> !  if (CHECKING_P)
> !{
> !  /* Never start in the middle of a template.  */
> !  int use_tpl = -1;
> !  if (tree ti = node_template_info (decl, use_tpl))
> ! gcc_checking_assert (TREE_CODE (TI_TEMPLATE (ti)) == OVERLOAD
> !

[PATCH v4] c++: implement [[gnu::non_owning]] [PR110358]

2024-03-01 Thread Marek Polacek
On Thu, Feb 29, 2024 at 07:30:02PM -0500, Jason Merrill wrote:
> On 2/29/24 19:12, Marek Polacek wrote:
> > On Wed, Feb 28, 2024 at 06:03:54PM -0500, Jason Merrill wrote:
> > 
> > > Hmm, if we're also going to allow the attribute to be applied to a function,
> > > the name doesn't make so much sense.  For a class, it says that the class
> > > refers to its initializer; for a function, it says that the function return
> > > value *doesn't* refer to its argument.
> > 
> > Yeah, that's a fair point; I guess "non_owning" would be too perplexing.
> > 
> > > If we want something that can apply to both classes and functions, we're
> > > probably back to an attribute that just suppresses the warning, with a
> > > different name.
> > > 
> > > Or I guess we could have two attributes, but that seems like a lot.
> > > 
> > > WDYT?
> > 
> > I think we don't want two separate attributes, and we do want that one
> > attribute to apply to both fns and classes.  We could implement something
> > like
> > 
> >[[gnu::no_warning("Wdangling-reference")]]
> >[[gnu::no_warning("Wdangling-reference", bool)]]
> > 
> > but first, that's a lot of typing, second, it would be confusing because
> > it wouldn't work for any other warning.  We already have [[unused]] and
> > [[maybe_unused]] whose effect is to suppress a warning.  I think our
> > best bet is to do the most straightforward thing: [[gnu::no_dangling]],
> > which this patch implements.  I didn't call it no_dangling_reference in
> > the hope that it can, some day, be also used for some -Wdangling-pointer
> > purposes.
> > 
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > Since -Wdangling-reference has false positives that can't be
> > prevented, we should offer an easy way to suppress the warning.
> > Currently, that is only possible by using a #pragma, either around the
> > enclosing class or around the call site.  But #pragma GCC diagnostic tends
> > to be onerous.  A better solution would be to have an attribute.
> > 
> > To that end, this patch adds a new attribute, [[gnu::no_dangling]].
> > This attribute takes an optional bool argument to support cases like:
> > 
> >template <typename T>
> >struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
> >   // ...
> >};
> > 
> > PR c++/110358
> > PR c++/109642
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (no_dangling_p): New.
> > (reference_like_class_p): Use it.
> > (do_warn_dangling_reference): Use it.  Don't warn when the function
> > or its enclosing class has attribute gnu::no_dangling.
> > * tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
> > (handle_no_dangling_attribute): New.
> > 
> > gcc/ChangeLog:
> > 
> > * doc/extend.texi: Document gnu::no_dangling.
> > * doc/invoke.texi: Mention that gnu::no_dangling disables
> > -Wdangling-reference.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/ext/attr-no-dangling1.C: New test.
> > * g++.dg/ext/attr-no-dangling2.C: New test.
> > * g++.dg/ext/attr-no-dangling3.C: New test.
> > * g++.dg/ext/attr-no-dangling4.C: New test.
> > * g++.dg/ext/attr-no-dangling5.C: New test.
> > * g++.dg/ext/attr-no-dangling6.C: New test.
> > * g++.dg/ext/attr-no-dangling7.C: New test.
> > * g++.dg/ext/attr-no-dangling8.C: New test.
> > * g++.dg/ext/attr-no-dangling9.C: New test.
> > ---
> >   gcc/cp/call.cc   | 38 ++--
> >   gcc/cp/tree.cc   | 26 
> >   gcc/doc/extend.texi  | 21 +++
> >   gcc/doc/invoke.texi  | 21 +++
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling1.C | 38 
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling2.C | 29 +
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling3.C | 24 
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling4.C | 14 +
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling5.C | 31 ++
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling6.C | 65 
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling7.C | 31 ++
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling8.C | 30 +
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling9.C | 25 
> >   13 files changed, 387 insertions(+), 6 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling2.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling3.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling4.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling5.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling6.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling7.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling8.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling9.C
> > 
> > diff --git a/gcc/cp/call.cc

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Jason Merrill

On 3/1/24 11:45, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


This was nearly enough to make things work, except we now ran into
issues with the local TYPE/CONST_DECL copies when streaming the
constexpr version of a function body.  It occurred to me that we don't
need to make copies of local types when copying a constexpr function
body; only VAR_DECLs etc need to be copied for sake of recursive
constexpr calls.  So this patch adjusts copy_fn accordingly.


Maybe adjust can_be_nonlocal instead?  It seems unnecessary in general
to remap types and enumerators for inlining.


Unfortunately this approach caused a bootstrap failure with Ada:

raised STORAGE_ERROR : stack overflow or erroneous memory access

The patch was

--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -725,6 +725,9 @@ can_be_nonlocal (tree decl, copy_body_data *id)
if (TREE_CODE (decl) == FUNCTION_DECL)
  return true;
  
+  if (TREE_CODE (decl) == TYPE_DECL || TREE_CODE (decl) == CONST_DECL)
+    return true;


Hmm, maybe a problem with a local type whose size depends on a local 
variable, so this would need to exclude that case.  I suppose an 
enumerator could also have a value of sizeof(local-var), even in C++.
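
As a rough illustration of the second case mentioned here, the following sketch shows a local enum whose enumerator is computed from the size of a local variable. The function and variable names are made up for this example, and the thread does not establish that this exact form triggers the remapping problem; it only shows that such enumerators are valid C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: a local enumerator whose value is derived from
// the size of a local variable in the same function.
std::size_t local_enum_from_sizeof() {
    long long buffer[4] = {};
    // sizeof is an unevaluated operand, so this is a constant expression
    // even though `buffer` is a local variable.
    enum Local { size_of_buffer = sizeof(buffer) };
    (void)buffer;  // silence unused-variable warnings
    return size_of_buffer;
}
```

With a GNU variable-length array in place of `buffer`, the size would no longer be a compile-time constant, which is closer to the local-type case described above.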


Jason



Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Jason Merrill

On 3/1/24 12:08, Patrick Palka wrote:

On Fri, 1 Mar 2024, Patrick Palka wrote:


On Fri, 1 Mar 2024, Jason Merrill wrote:


On 3/1/24 10:00, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 2/29/24 15:56, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

For local enums defined in a non-template function or a function
template
instantiation it seems we neglect to make the function depend on the
enum
definition, which ultimately causes streaming to fail due to the enum
definition not being streamed before uses of its enumerators are
streamed,
as far as I can tell.


I would think that the function doesn't need to depend on the local enum
in
order for the local enum to be streamed before the use of the enumerator,
which comes after the definition of the enum in the function body?

Why isn't streaming the body of the function outputting the enum
definition
before the use of the enumerator?


IIUC (based on observing the behavior for local classes) streaming the
definition of a local class/enum as part of the function definition is
what we want to avoid; we want to treat a local type definition as a
logically separate definition and stream it separately (similar
to class defns vs member defns I guess).  And by not registering a
dependency
between the function and the local enum, we end up never streaming out
the local enum definition separately and instead stream it out as part
of the function definition (accidentally) which we then can't stream in
properly.

Perhaps the motivation for treating local type definitions as logically
separate from the function definition is because they can leak out of a
function with a deduced return type:

auto f() {
  struct A { };
  return A();
}

using type = decltype(f()); // refers directly to f()::A


Yes, I believe that's what modules.cc refers to as a "voldemort".
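
A minimal, module-free sketch of such a type, assuming nothing beyond standard C++14; the names `make_hidden`, `hidden_type` and `read_hidden_value` are invented for illustration:

```cpp
#include <cassert>
#include <type_traits>

// The local struct A cannot be named outside make_hidden(), yet it escapes
// through the deduced return type.
auto make_hidden() {
    struct A { int value = 42; };
    return A();
}

// decltype recovers the otherwise unnameable local type.
using hidden_type = decltype(make_hidden());

static_assert(std::is_class<hidden_type>::value,
              "the alias refers to the local class type");

// Code outside the function can now declare objects of the local type.
int read_hidden_value() {
    hidden_type object;
    return object.value;
}
```

Because the declaration of `make_hidden` itself depends on the local type here, the type has to be available to anything that sees the function's declaration, not just its definition.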

But for non-voldemort local types, the declaration of the function doesn't
depend on them, only the definition.  Why doesn't streaming them in the
definition work properly?


I should note that for a templated local type we already always add a
dependency between the function template _pattern_ and the local type
_pattern_ and therefore always stream the local type pattern separately
(even if it's not actually a voldemort), thanks to the
TREE_CODE (decl) == TEMPLATE_DECL
case guarding the add_dependency call (inside a template pattern we
see the TEMPLATE_DECL of the local TYPE_DECL).  The dependency is
missing only when the function is a non-template or non-template-pattern.
My patch makes us consistently add the dependency and in turn consistently 
stream the definitions separately.

(For a local _class_, in the non-template and non-template-pattern case
we currently add a dependency between the function and the
injected-class-name of the class as opposed to the class itself, which
seems quite accidental but suffices.  And that's why only local enums
are problematic currently.  After my patch we instead add a dependency
to the local class itself.)

Part of the puzzle of why we don't/can't stream them as part of the
function definition is because we don't mark the enumerators for
by-value walking when marking the function definition.  So when
streaming out the enumerator definition we stream out _references_
to the enumerators (tt_const_decl tags) instead of the actual
definitions which breaks stream-in.

The best place to mark local types for by-value walking would be
in trees_out::mark_function_def which is suspiciously empty!  I
experimented with (never mind that it only marks the outermost block's
types):

@@ -11713,8 +11713,12 @@ trees_out::write_function_def (tree decl)
  }
  
  void
-trees_out::mark_function_def (tree)
+trees_out::mark_function_def (tree decl)
  {
+  tree initial = DECL_INITIAL (decl);
+  for (tree var = BLOCK_VARS (initial); var; var = DECL_CHAIN (var))
+if (DECL_IMPLICIT_TYPEDEF_P (var))
+  mark_declaration (var, true);
  }

Which actually fixes the non-template PR104919 testcase, but it
breaks streaming of templated local types wherein we run into
the sanity check:

@@ -7677,16 +7677,6 @@ trees_out::decl_value (tree decl, depset *dep)
  
merge_kind mk = get_merge_kind (decl, dep);
  
!  if (CHECKING_P)
!{
!  /* Never start in the middle of a template.  */
!  int use_tpl = -1;
!  if (tree ti = node_template_info (decl, use_tpl))
!   gcc_checking_assert (TREE_CODE (TI_TEMPLATE (ti)) == OVERLOAD
!|| TREE_CODE (TI_TEMPLATE (ti)) == FIELD_DECL
!|| (DECL_TEMPLATE_RESULT (TI_TEMPLATE (ti))
!!= decl));
!}
  
if (streaming_p ())
  {

If we try to work around this sanity check by only marking local types
when inside a non-template and non-template-pattern (i.e. inside an
instantiation):

@@ -11713,8 +11713,16 @@ trees_out::write_function_def (tree decl)
  }
 

Re: [PATCH v4] c++: implement [[gnu::non_owning]] [PR110358]

2024-03-01 Thread Jason Merrill

On 3/1/24 12:39, Marek Polacek wrote:

On Thu, Feb 29, 2024 at 07:30:02PM -0500, Jason Merrill wrote:

On 2/29/24 19:12, Marek Polacek wrote:

On Wed, Feb 28, 2024 at 06:03:54PM -0500, Jason Merrill wrote:


Hmm, if we're also going to allow the attribute to be applied to a function,
the name doesn't make so much sense.  For a class, it says that the class
refers to its initializer; for a function, it says that the function return
value *doesn't* refer to its argument.


Yeah, that's a fair point; I guess "non_owning" would be too perplexing.


If we want something that can apply to both classes and functions, we're
probably back to an attribute that just suppresses the warning, with a
different name.

Or I guess we could have two attributes, but that seems like a lot.

WDYT?


I think we don't want two separate attributes, and we do want that one
attribute to apply to both fns and classes.  We could implement something
like

[[gnu::no_warning("Wdangling-reference")]]
[[gnu::no_warning("Wdangling-reference", bool)]]

but first, that's a lot of typing, second, it would be confusing because
it wouldn't work for any other warning.  We already have [[unused]] and
[[maybe_unused]] whose effect is to suppress a warning.  I think our
best bet is to do the most straightforward thing: [[gnu::no_dangling]],
which this patch implements.  I didn't call it no_dangling_reference in
the hope that it can, some day, be also used for some -Wdangling-pointer
purposes.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Since -Wdangling-reference has false positives that can't be
prevented, we should offer an easy way to suppress the warning.
Currently, that is only possible by using a #pragma, either around the
enclosing class or around the call site.  But #pragma GCC diagnostic tends
to be onerous.  A better solution would be to have an attribute.

To that end, this patch adds a new attribute, [[gnu::no_dangling]].
This attribute takes an optional bool argument to support cases like:

template <typename T>
struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
   // ...
};
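
For the function case, a usage sketch might look like the following. The `Key`, `lookup` and `second_element` names are hypothetical, and whether a given GCC version would actually warn at this call site without the attribute depends on the warning's heuristics, so treat this as an assumption rather than a reproduction of a testcase from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Key { std::size_t index; };

// The returned reference points into `store`, not into the temporary Key,
// so a -Wdangling-reference warning at the call site would be a false
// positive.  Compilers that do not know the attribute are expected to
// ignore it, possibly with a warning about an unknown attribute.
[[gnu::no_dangling]]
const int& lookup(const std::vector<int>& store, const Key& key) {
    return store[key.index];
}

int second_element() {
    std::vector<int> store{10, 20, 30};
    // The temporary Key is destroyed at the end of this statement, but
    // `ref` still refers to an element of `store`.
    const int& ref = lookup(store, Key{1});
    return ref;
}
```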

PR c++/110358
PR c++/109642

gcc/cp/ChangeLog:

* call.cc (no_dangling_p): New.
(reference_like_class_p): Use it.
(do_warn_dangling_reference): Use it.  Don't warn when the function
or its enclosing class has attribute gnu::no_dangling.
* tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
(handle_no_dangling_attribute): New.

gcc/ChangeLog:

* doc/extend.texi: Document gnu::no_dangling.
* doc/invoke.texi: Mention that gnu::no_dangling disables
-Wdangling-reference.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-no-dangling1.C: New test.
* g++.dg/ext/attr-no-dangling2.C: New test.
* g++.dg/ext/attr-no-dangling3.C: New test.
* g++.dg/ext/attr-no-dangling4.C: New test.
* g++.dg/ext/attr-no-dangling5.C: New test.
* g++.dg/ext/attr-no-dangling6.C: New test.
* g++.dg/ext/attr-no-dangling7.C: New test.
* g++.dg/ext/attr-no-dangling8.C: New test.
* g++.dg/ext/attr-no-dangling9.C: New test.
---
   gcc/cp/call.cc   | 38 ++--
   gcc/cp/tree.cc   | 26 
   gcc/doc/extend.texi  | 21 +++
   gcc/doc/invoke.texi  | 21 +++
   gcc/testsuite/g++.dg/ext/attr-no-dangling1.C | 38 
   gcc/testsuite/g++.dg/ext/attr-no-dangling2.C | 29 +
   gcc/testsuite/g++.dg/ext/attr-no-dangling3.C | 24 
   gcc/testsuite/g++.dg/ext/attr-no-dangling4.C | 14 +
   gcc/testsuite/g++.dg/ext/attr-no-dangling5.C | 31 ++
   gcc/testsuite/g++.dg/ext/attr-no-dangling6.C | 65 
   gcc/testsuite/g++.dg/ext/attr-no-dangling7.C | 31 ++
   gcc/testsuite/g++.dg/ext/attr-no-dangling8.C | 30 +
   gcc/testsuite/g++.dg/ext/attr-no-dangling9.C | 25 
   13 files changed, 387 insertions(+), 6 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling2.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling3.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling4.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling5.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling6.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling7.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling8.C
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling9.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index c40ef2e3028..9e4c8073600 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14033,11 +14033,7 @@ std_pair_ref_ref_p (tree t)
 return true;
   }
-/* Return true if a class CTYPE is either std::reference_wrapper or
-   std::ref_view, or a reference wrapper class.  We conside

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Jason Merrill wrote:

> On 3/1/24 12:08, Patrick Palka wrote:
> > On Fri, 1 Mar 2024, Patrick Palka wrote:
> > 
> > > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > > 
> > > > On 3/1/24 10:00, Patrick Palka wrote:
> > > > > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > > > > 
> > > > > > On 2/29/24 15:56, Patrick Palka wrote:
> > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > > > OK for trunk?
> > > > > > > 
> > > > > > > -- >8 --
> > > > > > > 
> > > > > > > For local enums defined in a non-template function or a function
> > > > > > > template
> > > > > > > instantiation it seems we neglect to make the function depend on
> > > > > > > the
> > > > > > > enum
> > > > > > > definition, which ultimately causes streaming to fail due to the
> > > > > > > enum
> > > > > > > definition not being streamed before uses of its enumerators are
> > > > > > > streamed,
> > > > > > > as far as I can tell.
> > > > > > 
> > > > > > I would think that the function doesn't need to depend on the local
> > > > > > enum
> > > > > > in
> > > > > > order for the local enum to be streamed before the use of the
> > > > > > enumerator,
> > > > > > which comes after the definition of the enum in the function body?
> > > > > > 
> > > > > > Why isn't streaming the body of the function outputting the enum
> > > > > > definition
> > > > > > before the use of the enumerator?
> > > > > 
> > > > > IIUC (based on observing the behavior for local classes) streaming the
> > > > > definition of a local class/enum as part of the function definition is
> > > > > what we want to avoid; we want to treat a local type definition as a
> > > > > logically separate definition and stream it separately (similar
> > > > > to class defns vs member defns I guess).  And by not registering a
> > > > > dependency
> > > > > between the function and the local enum, we end up never streaming out
> > > > > the local enum definition separately and instead stream it out as part
> > > > > of the function definition (accidentally) which we then can't stream
> > > > > in
> > > > > properly.
> > > > > 
> > > > > Perhaps the motivation for treating local type definitions as
> > > > > logically
> > > > > separate from the function definition is because they can leak out of
> > > > > a
> > > > > function with a deduced return type:
> > > > > 
> > > > > auto f() {
> > > > >   struct A { };
> > > > >   return A();
> > > > > }
> > > > > 
> > > > > using type = decltype(f()); // refers directly to f()::A
> > > > 
> > > > Yes, I believe that's what modules.cc refers to as a "voldemort".
> > > > 
> > > > But for non-voldemort local types, the declaration of the function
> > > > doesn't
> > > > depend on them, only the definition.  Why doesn't streaming them in the
> > > > definition work properly?
> > > 
> > > I should note that for a templated local type we already always add a
> > > dependency between the function template _pattern_ and the local type
> > > _pattern_ and therefore always stream the local type pattern separately
> > > (even if it's not actually a voldemort), thanks to the TREE_CODE (decl) ==
> > > TEMPLATE_DECL
> > > case guarding the add_dependency call (inside a template pattern we
> > > see the TEMPLATE_DECL of the local TYPE_DECL).  The dependency is
> > > missing only when the function is a non-template or non-template-pattern.
> > > My patch makes us consistently add the dependency and in turn consistently
> > > stream the definitions separately.
> > > 
> > > (For a local _class_, in the non-template and non-template-pattern case
> > > we currently add a dependency between the function and the
> > > injected-class-name of the class as opposed to the class itself, which
> > > seems quite accidental but suffices.  And that's why only local enums
> > > are problematic currently.  After my patch we instead add a dependency
> > > to the local class itself.)
> > > 
> > > Part of the puzzle of why we don't/can't stream them as part of the
> > > function definition is because we don't mark the enumerators for
> > > by-value walking when marking the function definition.  So when
> > > streaming out the enumerator definition we stream out _references_
> > > to the enumerators (tt_const_decl tags) instead of the actual
> > > definitions which breaks stream-in.
> > > 
> > > The best place to mark local types for by-value walking would be
> > > in trees_out::mark_function_def which is suspiciously empty!  I
> > > experimented with (never mind that it only marks the outermost block's
> > > types):
> > > 
> > > @@ -11713,8 +11713,12 @@ trees_out::write_function_def (tree decl)
> > >   }
> > > void
> > > -trees_out::mark_function_def (tree)
> > > +trees_out::mark_function_def (tree decl)
> > >   {
> > > +  tree initial = DECL_INITIAL (decl);
> > > +  for (tree var = BLOCK_VARS (initial); var; var = DECL_CHAIN (var))
> > > +if (DECL_IMPLICIT_TYPEDEF_P (var))
> > > +  mark_declaration (v

Re: [PATCH v3] RISC-V: Introduce gcc option mrvv-vector-bits for RVV

2024-03-01 Thread Vineet Gupta
Hi Pan,

On 2/28/24 17:23, Li, Pan2 wrote:
>
> Personally I prefer to remove --param=riscv-autovec-preference=none and only
> allow mrvv-vector-bits, to avoid the tricky (maybe) semantics of the none
> preference. However, let's wait for a while in case there are some comments
> from others.
>

We are very interested in this topic. Could you please CC me and Palmer
for future versions of the patchset.

Thx,
-Vineet

>
> Pan
>
> *From:* Kito Cheng
> *Sent:* Wednesday, February 28, 2024 10:55 PM
> *To:* 钟居哲
> *Cc:* Li, Pan2; gcc-patches; Wang, Yanzhang; rdapp.gcc; Jeff Law
> *Subject:* Re: Re: [PATCH v3] RISC-V: Introduce gcc option
> mrvv-vector-bits for RVV
>
> Hmm, maybe only keep --param=riscv-autovec-preference=none and remove
> the other two if we think that might still be useful? But anyway I have no
> strong opinion on keeping that; I mean I am OK with removing the whole
> --param=riscv-autovec-preference.
>
> 钟居哲 wrote on Wednesday, February 28, 2024 at 21:59:
>
> I think it makes more sense to remove
> --param=riscv-autovec-preference and add -mrvv-vector-bits
>
> juzhe.zh...@rivai.ai
>
> *From:* Kito Cheng
> *Date:* 2024-02-28 20:56
> *To:* pan2.li
> *CC:* gcc-patches; juzhe.zhong; yanzhang.wang; rdapp.gcc; jeffreyalaw
> *Subject:* Re: [PATCH v3] RISC-V: Introduce gcc option
> mrvv-vector-bits for RVV
>
> Take one more look, I think this option should work and integrate with
> --param=riscv-autovec-preference= since they have similar but slightly
> different jobs.
>
> We have 3 values for --param=riscv-autovec-preference=: none, scalable
> and fixed-vlmax.
>
> -mrvv-vector-bits=scalable works like
> --param=riscv-autovec-preference=scalable and
> -mrvv-vector-bits=zvl works like
> --param=riscv-autovec-preference=fixed-vlmax.
>
> So I think...we need to do some conflict checks, like:
>
> -mrvv-vector-bits=zvl can't work with
> --param=riscv-autovec-preference=scalable
>
> -mrvv-vector-bits=scalable can't work with
> --param=riscv-autovec-preference=fixed-vlmax
>
> but it may not be just an alias, since there are some useful
> combinations like:
>
> -mrvv-vector-bits=zvl with --param=riscv-autovec-preference=none:
> NO auto vectorization, but intrinsic code could still benefit from the
> -mrvv-vector-bits=zvl option.
>
> -mrvv-vector-bits=scalable with --param=riscv-autovec-preference=none:
> should still work for VLS code gen, but just disables auto
> vectorization per the option semantics.
>
> However, there is something here that needs a fix, since
> --param=riscv-autovec-preference=none still disables VLS code gen for
> now; you can see an example here:
> https://godbolt.org/z/fMTr3eW7K
>
> But I think it's really the right behavior here, this part might need
> to be fixed in vls_mode_valid_p and some other places.
>
> Anyway I think we need to check all use sites of RVV_FIXED_VLMAX and
> RVV_SCALABLE, and need to make sure all use sites of RVV_FIXED_VLMAX
> are also checked with RVV_VECTOR_BITS_ZVL.
>
> > -/* Return the VLEN value associated with -march.
> > +static int
> > +riscv_convert_vector_bits (int min_vlen)
>
> Not sure if we really need this function, it seems it always
> returns min_vlen?
>
> > +{
> > +  int rvv_bits = 0;
> > +
> > +  switch (rvv_vector_bits)
> > +    {
> > +  case RVV_VECTOR_BITS_ZVL:
> > +  case RVV_VECTOR_BITS_SCALABLE:
> > +   rvv_bits = min_vlen;
> > +   break;
> > +  default:
> > +   gcc_unreachable ();
> > +    }
> > +
> > +  return rvv_bits;
> > +}
> > +
> > +/* Return the VLEN value associated with -march and -mwrvv-vector-bits.
>
>
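
The two conflicts Kito lists in the quoted message could be sketched roughly as below. This is only an illustration: the enum names and `check_rvv_options` are hypothetical stand-ins and not the enumerators or option-handling code in GCC's RISC-V backend.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the option values discussed in the thread.
enum class AutovecPreference { None, Scalable, FixedVlmax };
enum class VectorBits { Scalable, Zvl };

// Returns an empty string when the combination is acceptable, otherwise a
// short diagnostic.  Only the two conflicts named in the thread are
// rejected; every combination with the "none" preference is allowed.
std::string check_rvv_options(VectorBits bits, AutovecPreference pref) {
    if (bits == VectorBits::Zvl && pref == AutovecPreference::Scalable)
        return "-mrvv-vector-bits=zvl conflicts with "
               "--param=riscv-autovec-preference=scalable";
    if (bits == VectorBits::Scalable && pref == AutovecPreference::FixedVlmax)
        return "-mrvv-vector-bits=scalable conflicts with "
               "--param=riscv-autovec-preference=fixed-vlmax";
    return "";
}
```

If the param were removed entirely, as suggested elsewhere in the thread, a check like this would no longer be needed.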



[PATCH] testsuite: ctf: make array in ctf-file-scope-1 fixed length

2024-03-01 Thread David Faust
The array member of struct SFOO in the ctf-file-scope-1 caused the test
to fail for the BPF target, since BPF does not support dynamic stack
allocation. The array does not need to be variable length for the sake of
the test, so make it fixed length instead to allow the test to run
successfully for the bpf-unknown-none target.

Tested on x86_64-linux-gnu, and on x86_64-linux-gnu host for
bpf-unknown-none target.

gcc/testsuite/

* gcc.dg/debug/ctf/ctf-file-scope-1.c (SFOO): Make array member
fixed-length.
---
 gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c 
b/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c
index a683113e505..ddfb31da405 100644
--- a/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c
+++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c
@@ -9,7 +9,7 @@
 
 int foo (int n)
 {
-  typedef struct { int a[n]; } SFOO;
+  typedef struct { int a[6]; } SFOO;
 
   SFOO a;
   __attribute__ ((noinline)) SFOO gfoo (void) { return a; }
-- 
2.43.0



Re: [PATCH] testsuite: ctf: make array in ctf-file-scope-1 fixed length

2024-03-01 Thread Indu Bhagat

On 3/1/24 11:01, David Faust wrote:

The array member of struct SFOO in the ctf-file-scope-1 caused the test
to fail for the BPF target, since BPF does not support dynamic stack
allocation. The array does not need to be variable length for the sake of
the test, so make it fixed length instead to allow the test to run
successfully for the bpf-unknown-none target.

Tested on x86_64-linux-gnu, and on x86_64-linux-gnu host for
bpf-unknown-none target.



LGTM.
Thanks!


gcc/testsuite/

* gcc.dg/debug/ctf/ctf-file-scope-1.c (SFOO): Make array member
fixed-length.
---
  gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c 
b/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c
index a683113e505..ddfb31da405 100644
--- a/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c
+++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf-file-scope-1.c
@@ -9,7 +9,7 @@
  
  int foo (int n)
  {
-  typedef struct { int a[n]; } SFOO;
+  typedef struct { int a[6]; } SFOO;
  
    SFOO a;
    __attribute__ ((noinline)) SFOO gfoo (void) { return a; }




Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Jason Merrill

On 3/1/24 13:28, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 3/1/24 12:08, Patrick Palka wrote:

On Fri, 1 Mar 2024, Patrick Palka wrote:


On Fri, 1 Mar 2024, Jason Merrill wrote:


On 3/1/24 10:00, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 2/29/24 15:56, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

For local enums defined in a non-template function or a function
template
instantiation it seems we neglect to make the function depend on
the
enum
definition, which ultimately causes streaming to fail due to the
enum
definition not being streamed before uses of its enumerators are
streamed,
as far as I can tell.


I would think that the function doesn't need to depend on the local
enum
in
order for the local enum to be streamed before the use of the
enumerator,
which comes after the definition of the enum in the function body?

Why isn't streaming the body of the function outputting the enum
definition
before the use of the enumerator?


IIUC (based on observing the behavior for local classes) streaming the
definition of a local class/enum as part of the function definition is
what we want to avoid; we want to treat a local type definition as a
logically separate definition and stream it separately (similar
to class defns vs member defns I guess).  And by not registering a
dependency
between the function and the local enum, we end up never streaming out
the local enum definition separately and instead stream it out as part
of the function definition (accidentally) which we then can't stream
in
properly.

Perhaps the motivation for treating local type definitions as
logically
separate from the function definition is because they can leak out of
a
function with a deduced return type:

 auto f() {
   struct A { };
   return A();
 }

 using type = decltype(f()); // refers directly to f()::A


Yes, I believe that's what modules.cc refers to as a "voldemort".

But for non-voldemort local types, the declaration of the function
doesn't
depend on them, only the definition.  Why doesn't streaming them in the
definition work properly?


I should note that for a templated local type we already always add a
dependency between the function template _pattern_ and the local type
_pattern_ and therefore always stream the local type pattern separately
(even if it's not actually a voldemort), thanks to the TREE_CODE (decl) ==
TEMPLATE_DECL
case guarding the add_dependency call (inside a template pattern we
see the TEMPLATE_DECL of the local TYPE_DECL).  The dependency is
missing only when the function is a non-template or non-template-pattern.
My patch makes us consistently add the dependency and in turn consistently
stream the definitions separately.

(For a local _class_, in the non-template and non-template-pattern case
we currently add a dependency between the function and the
injected-class-name of the class as opposed to the class itself, which
seems quite accidental but suffices.  And that's why only local enums
are problematic currently.  After my patch we instead add a dependency
to the local class itself.)

Part of the puzzle of why we don't/can't stream them as part of the
function definition is because we don't mark the enumerators for
by-value walking when marking the function definition.  So when
streaming out the enumerator definition we stream out _references_
to the enumerators (tt_const_decl tags) instead of the actual
definitions which breaks stream-in.

The best place to mark local types for by-value walking would be
in trees_out::mark_function_def which is suspiciously empty!  I
experimented with (never mind that it only marks the outermost block's
types):

@@ -11713,8 +11713,12 @@ trees_out::write_function_def (tree decl)
   }
 void
-trees_out::mark_function_def (tree)
+trees_out::mark_function_def (tree decl)
   {
+  tree initial = DECL_INITIAL (decl);
+  for (tree var = BLOCK_VARS (initial); var; var = DECL_CHAIN (var))
+if (DECL_IMPLICIT_TYPEDEF_P (var))
+  mark_declaration (var, true);
   }

Which actually fixes the non-template PR104919 testcase, but it
breaks streaming of templated local types wherein we run into
the sanity check:

@@ -7677,16 +7677,6 @@ trees_out::decl_value (tree decl, depset *dep)
   merge_kind mk = get_merge_kind (decl, dep);
   !  if (CHECKING_P)
!{
!  /* Never start in the middle of a template.  */
!  int use_tpl = -1;
!  if (tree ti = node_template_info (decl, use_tpl))
!   gcc_checking_assert (TREE_CODE (TI_TEMPLATE (ti)) == OVERLOAD
!|| TREE_CODE (TI_TEMPLATE (ti)) == FIELD_DECL
!|| (DECL_TEMPLATE_RESULT (TI_TEMPLATE (ti))
!!= decl));
!}
   if (streaming_p ())
   {

If we try to work around this sanity check by only marking local types
when inside a non-template and non-template-pattern (i.e. inside an
ins

Re: [PATCH] c++/modules: relax diagnostic about GMF contents

2024-03-01 Thread Jason Merrill

On 2/15/24 16:51, Patrick Palka wrote:

On Thu, 15 Feb 2024, Jason Merrill wrote:


Relaxing to pedwarn is fine, but I think it should be on by default, not just
with -pedantic.  So it should get a new option.


Ah, like so?  I'm not sure about naming the option Wmodules-gmf-contents
vs just Wgmf-contents, or something else...


Maybe -Wglobal-module?  OK with that change.

Jason



Re: [PATCH] c++/modules: complete_vars ICE with non-exported constexpr var

2024-03-01 Thread Jason Merrill

On 2/26/24 15:52, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Here during stream-in of the non-exported constexpr var 'a' we call
maybe_register_incomplete_var, which ends up taking the second branch
and pushing {a, NULL_TREE} onto incomplete_vars.  We later ICE from
complete_vars due to this NULL_TREE class context.

Judging by the two commits that introduced/modified this part of
maybe_register_incomplete_var, r196852 and r214333, ISTM this branch
is really only concerned with constexpr static data members (whose
initializer may contain a pointer-to-member for a currently open class).
So this patch restricts this branch accordingly so it's not inadvertently
taken during stream-in.
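
As a rough illustration of the kind of declaration this branch is meant for,
a constexpr static data member can be initialized with a pointer-to-member of
the class that is still being defined.  The sketch below is an assumption
about a representative case, not the testcase from either commit; S, i, pm
and read_i are hypothetical names.

```cpp
// S is incomplete at the point where pm is initialized; it only becomes
// complete at the closing brace of the class definition.
struct S {
  int i;
  static constexpr int S::*pm = &S::i;  // pointer-to-member of the open class
};

// Once S is complete, the pointer-to-member can be used normally.
constexpr int read_i(const S &s) { return s.*S::pm; }
```

A namespace-scope constexpr variable streamed in from a module has no such
enclosing open class, which is why the restriction above keeps it out of
this branch.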

gcc/cp/ChangeLog:

* decl.cc (maybe_register_incomplete_var): Restrict second
branch to static data members from a currently open class.

gcc/testsuite/ChangeLog:

* g++.dg/modules/cexpr-4_a.C: New test.
* g++.dg/modules/cexpr-4_b.C: New test.
---
  gcc/cp/decl.cc   |  2 ++
  gcc/testsuite/g++.dg/modules/cexpr-4_a.C | 12 
  gcc/testsuite/g++.dg/modules/cexpr-4_b.C |  6 ++
  3 files changed, 20 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-4_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-4_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index e47f694e4e5..82b5bc83927 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -18976,6 +18976,8 @@ maybe_register_incomplete_var (tree var)
  vec_safe_push (incomplete_vars, iv);
}
else if (!(DECL_LANG_SPECIFIC (var) && DECL_TEMPLATE_INFO (var))
+  && DECL_CLASS_SCOPE_P (var)
+  && currently_open_class (DECL_CONTEXT (var))


I think TYPE_BEING_DEFINED (as used a few lines up) would be a bit 
better than currently_open_class?  OK with that change.


Jason



[PATCH v5] c++: implement [[gnu::non_owning]] [PR110358]

2024-03-01 Thread Marek Polacek
On Fri, Mar 01, 2024 at 01:19:40PM -0500, Jason Merrill wrote:
> On 3/1/24 12:39, Marek Polacek wrote:
> >   @option{-Wdangling-reference} also warns about code like
> >   @smallexample
> > @@ -3932,6 +3935,10 @@ struct Span @{
> >   as @code{std::span}-like; that is, the class is a non-union class
> >   that has a pointer data member and a trivial destructor.
> > +The warning can be disabled by using the @code{gnu::no_dangling} attribute
> > +on a function (@pxref{Common Function Attributes}), or a class type
> > +(@pxref{C++ Attributes}).
> 
> It seems surprising that one is in a generic attributes section and the
> other in the C++-specific section.  Maybe both uses could be covered in the
> C++ attributes section?

Arg yes, definitely.  Done here.
 
> >   This warning is enabled by @option{-Wall}.
> >   @opindex Wdelete-non-virtual-dtor
> > diff --git a/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C b/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
> > new file mode 100644
> > index 000..02eabbc5003
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
> > @@ -0,0 +1,38 @@
> > +// { dg-do compile { target c++11 } }
> > +// { dg-options "-Wdangling-reference" }
> > +
> > +int g = 42;
> > +
> > +struct [[gnu::no_dangling]] A {
> > +  int *i;
> > +  int &foo() { return *i; }
> > +};
> > +
> > +struct A2 {
> > +  int *i;
> > +  [[gnu::no_dangling]] int &foo() { return *i; }
> > +  [[gnu::no_dangling]] static int &bar (const int &) { return *&g; }
> > +};
> > +
> > +union [[gnu::no_dangling]] U { };
> > +
> > +A a() { return A{&g}; }
> > +A2 a2() { return A2{&g}; }
> > +
> > +class X { };
> > +const X x1;
> > +const X x2;
> > +
> > +[[gnu::no_dangling]] const X& get(const int& i)
> > +{
> > +   return i == 0 ? x1 : x2;
> > +}
> > +
> > +void
> > +test ()
> > +{
> > +  [[maybe_unused]] const X& x = get (10);  // { dg-bogus "dangling" }
> > +  [[maybe_unused]] const int &i = a().foo();   // { dg-bogus "dangling" }
> > +  [[maybe_unused]] const int &j = a2().foo();  // { dg-bogus "dangling" }
> > +  [[maybe_unused]] const int &k = a2().bar(10);// { dg-bogus "dangling" }
> > +}
> 
> Do you want to add destructors to A/A2 like you did in other tests?

Added.  I think this test predates the recent heuristic.

Ok for trunk?

-- >8 --
Since -Wdangling-reference has false positives that can't be
prevented, we should offer an easy way to suppress the warning.
Currently, that is only possible by using a #pragma, either around the
enclosing class or around the call site.  But #pragma GCC diagnostic tends
to be onerous.  A better solution would be to have an attribute.

To that end, this patch adds a new attribute, [[gnu::no_dangling]].
This attribute takes an optional bool argument to support cases like:

  template <typename T>
  struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
 // ...
  };
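
To make the conditional form concrete, here is a small sketch of how such a
class might be used.  Ref, get and read_through_temporary are hypothetical
names chosen for the example, and the sketch assumes a compiler that
implements the attribute as described in this patch; compilers without it
would normally just warn about an unknown attribute.

```cpp
#include <type_traits>

int g = 42;

// With T a reference type the attribute argument is true, so the class is
// marked no_dangling; for any other T the argument is false.
// std::is_reference<T>::value is the pre-C++17 spelling of
// std::is_reference_v<T>.
template <typename T>
struct [[gnu::no_dangling(std::is_reference<T>::value)]] Ref {
  T target;
  // Returns a reference that is unrelated to the parameter's temporary.
  int &get(const int &) { return g; }
};

int read_through_temporary() {
  // 10 is a temporary bound to the const int& parameter, the kind of
  // pattern -Wdangling-reference looks at; r actually refers to g.
  const int &r = Ref<int &>{g}.get(10);
  return r;
}
```

Tying the argument to a trait lets one template opt out of the warning only
for the instantiations that really behave like references.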

PR c++/110358
PR c++/109642

gcc/cp/ChangeLog:

* call.cc (no_dangling_p): New.
(reference_like_class_p): Use it.
(do_warn_dangling_reference): Use it.  Don't warn when the function
or its enclosing class has attribute gnu::no_dangling.
* tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
(handle_no_dangling_attribute): New.

gcc/ChangeLog:

* doc/extend.texi: Document gnu::no_dangling.
* doc/invoke.texi: Mention that gnu::no_dangling disables
-Wdangling-reference.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-no-dangling1.C: New test.
* g++.dg/ext/attr-no-dangling2.C: New test.
* g++.dg/ext/attr-no-dangling3.C: New test.
* g++.dg/ext/attr-no-dangling4.C: New test.
* g++.dg/ext/attr-no-dangling5.C: New test.
* g++.dg/ext/attr-no-dangling6.C: New test.
* g++.dg/ext/attr-no-dangling7.C: New test.
* g++.dg/ext/attr-no-dangling8.C: New test.
* g++.dg/ext/attr-no-dangling9.C: New test.
---
 gcc/cp/call.cc   | 38 ++--
 gcc/cp/tree.cc   | 26 
 gcc/doc/extend.texi  | 47 ++
 gcc/doc/invoke.texi  |  6 ++
 gcc/testsuite/g++.dg/ext/attr-no-dangling1.C | 40 
 gcc/testsuite/g++.dg/ext/attr-no-dangling2.C | 29 +
 gcc/testsuite/g++.dg/ext/attr-no-dangling3.C | 24 
 gcc/testsuite/g++.dg/ext/attr-no-dangling4.C | 14 +
 gcc/testsuite/g++.dg/ext/attr-no-dangling5.C | 31 ++
 gcc/testsuite/g++.dg/ext/attr-no-dangling6.C | 65 
 gcc/testsuite/g++.dg/ext/attr-no-dangling7.C | 31 ++
 gcc/testsuite/g++.dg/ext/attr-no-dangling8.C | 30 +
 gcc/testsuite/g++.dg/ext/attr-no-dangling9.C | 25 
 13 files changed, 400 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling2.C
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling3.C
 create mo

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Jason Merrill wrote:

> On 3/1/24 13:28, Patrick Palka wrote:
> > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > 
> > > On 3/1/24 12:08, Patrick Palka wrote:
> > > > On Fri, 1 Mar 2024, Patrick Palka wrote:
> > > > 
> > > > > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > > > > 
> > > > > > On 3/1/24 10:00, Patrick Palka wrote:
> > > > > > > On Fri, 1 Mar 2024, Jason Merrill wrote:
> > > > > > > 
> > > > > > > > On 2/29/24 15:56, Patrick Palka wrote:
> > > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > > > > > > > > OK for trunk?
> > > > > > > > > 
> > > > > > > > > -- >8 --
> > > > > > > > > 
> > > > > > > > > > For local enums defined in a non-template function or a function
> > > > > > > > > > template instantiation it seems we neglect to make the function
> > > > > > > > > > depend on the enum definition, which ultimately causes streaming to
> > > > > > > > > > fail due to the enum definition not being streamed before uses of
> > > > > > > > > > its enumerators are streamed, as far as I can tell.
> > > > > > > > 
> > > > > > > > > I would think that the function doesn't need to depend on the local
> > > > > > > > > enum in order for the local enum to be streamed before the use of
> > > > > > > > > the enumerator, which comes after the definition of the enum in the
> > > > > > > > > function body?
> > > > > > > > > 
> > > > > > > > > Why isn't streaming the body of the function outputting the enum
> > > > > > > > > definition before the use of the enumerator?
> > > > > > > 
> > > > > > > > IIUC (based on observing the behavior for local classes) streaming the
> > > > > > > > definition of a local class/enum as part of the function definition is
> > > > > > > > what we want to avoid; we want to treat a local type definition as a
> > > > > > > > logically separate definition and stream it separately (similar to
> > > > > > > > class defns vs member defns I guess).  And by not registering a
> > > > > > > > dependency between the function and the local enum, we end up never
> > > > > > > > streaming out the local enum definition separately and instead stream
> > > > > > > > it out as part of the function definition (accidentally) which we
> > > > > > > > then can't stream in properly.
> > > > > > > 
> > > > > > > > Perhaps the motivation for treating local type definitions as
> > > > > > > > logically separate from the function definition is because they can
> > > > > > > > leak out of a function with a deduced return type:
> > > > > > > 
> > > > > > >  auto f() {
> > > > > > > >    struct A { };
> > > > > > > >    return A();
> > > > > > >  }
> > > > > > > 
> > > > > > >  using type = decltype(f()); // refers directly to f()::A
> > > > > > 
> > > > > > Yes, I believe that's what modules.cc refers to as a "voldemort".
> > > > > > 
> > > > > > > But for non-voldemort local types, the declaration of the function
> > > > > > > doesn't depend on them, only the definition.  Why doesn't streaming
> > > > > > > them in the definition work properly?
> > > > > 
> > > > > > I should note that for a templated local type we already always add a
> > > > > > dependency between the function template _pattern_ and the local type
> > > > > > _pattern_ and therefore always stream the local type pattern separately
> > > > > > (even if it's not actually a voldemort), thanks to the
> > > > > > TREE_CODE (decl) == TEMPLATE_DECL case guarding the add_dependency call
> > > > > > (inside a template pattern we see the TEMPLATE_DECL of the local
> > > > > > TYPE_DECL).  The dependency is missing only when the function is a
> > > > > > non-template or non-template-pattern.  My patch makes us consistently
> > > > > > add the dependency and in turn consistently stream the definitions
> > > > > > separately.
> > > > > 
> > > > > > (For a local _class_, in the non-template and non-template-pattern case
> > > > > we currently add a dependency between the function and the
> > > > > injected-class-name of the class as opposed to the class itself, which
> > > > > seems quite accidental but suffices.  And that's why only local enums
> > > > > are problematic currently.  After my patch we instead add a dependency
> > > > > to the local class itself.)
> > > > > 
> > > > > Part of the puzzle of why we don't/can't stream them as part of the
> > > > > function definition is because we don't mark the enumerators for
> > > > > by-value walking when marking the function definition.  So when
> > > > > streaming out the enumerator definition we stream out _references_
> > > > > to the 

Re: [PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-03-01 Thread Jason Merrill

On 3/1/24 14:34, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 3/1/24 13:28, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 3/1/24 12:08, Patrick Palka wrote:

On Fri, 1 Mar 2024, Patrick Palka wrote:


On Fri, 1 Mar 2024, Jason Merrill wrote:


On 3/1/24 10:00, Patrick Palka wrote:

On Fri, 1 Mar 2024, Jason Merrill wrote:


On 2/29/24 15:56, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

For local enums defined in a non-template function or a function template
instantiation it seems we neglect to make the function depend on the enum
definition, which ultimately causes streaming to fail due to the enum
definition not being streamed before uses of its enumerators are streamed,
as far as I can tell.


I would think that the function doesn't need to depend on the local enum in
order for the local enum to be streamed before the use of the enumerator,
which comes after the definition of the enum in the function body?

Why isn't streaming the body of the function outputting the enum definition
before the use of the enumerator?


IIUC (based on observing the behavior for local classes) streaming the
definition of a local class/enum as part of the function definition is
what we want to avoid; we want to treat a local type definition as a
logically separate definition and stream it separately (similar to class
defns vs member defns I guess).  And by not registering a dependency
between the function and the local enum, we end up never streaming out
the local enum definition separately and instead stream it out as part
of the function definition (accidentally) which we then can't stream in
properly.

Perhaps the motivation for treating local type definitions as logically
separate from the function definition is because they can leak out of a
function with a deduced return type:

  auto f() {
    struct A { };
    return A();
  }

  using type = decltype(f()); // refers directly to f()::A


Yes, I believe that's what modules.cc refers to as a "voldemort".

But for non-voldemort local types, the declaration of the function doesn't
depend on them, only the definition.  Why doesn't streaming them in the
definition work properly?


I should note that for a templated local type we already always add a
dependency between the function template _pattern_ and the local type
_pattern_ and therefore always stream the local type pattern separately
(even if it's not actually a voldemort), thanks to the
TREE_CODE (decl) == TEMPLATE_DECL case guarding the add_dependency call
(inside a template pattern we see the TEMPLATE_DECL of the local TYPE_DECL).
The dependency is missing only when the function is a non-template or
non-template-pattern.  My patch makes us consistently add the dependency and
in turn consistently stream the definitions separately.

(For a local _class_, in the non-template and non-template-pattern case
we currently add a dependency between the function and the
injected-class-name of the class as opposed to the class itself, which
seems quite accidental but suffices.  And that's why only local enums
are problematic currently.  After my patch we instead add a dependency
to the local class itself.)

Part of the puzzle of why we don't/can't stream them as part of the
function definition is because we don't mark the enumerators for
by-value walking when marking the function definition.  So when
streaming out the enumerator definition we stream out _references_
to the enumerators (tt_const_decl tags) instead of the actual
definitions which breaks stream-in.

The best place to mark local types for by-value walking would be
in trees_out::mark_function_def which is suspiciously empty!  I
experimented with (never mind that it only marks the outermost block's
types):

@@ -11713,8 +11713,12 @@ trees_out::write_function_def (tree decl)
}
  void
-trees_out::mark_function_def (tree)
+trees_out::mark_function_def (tree decl)
{
+  tree initial = DECL_INITIAL (decl);
+  for (tree var = BLOCK_VARS (initial); var; var = DECL_CHAIN (var))
+if (DECL_IMPLICIT_TYPEDEF_P (var))
+  mark_declaration (var, true);
}

Which actually fixes the non-template PR104919 testcase, but it
breaks streaming of templated local types wherein we run into
the sanity check:

@@ -7677,16 +7677,6 @@ trees_out::decl_value (tree decl, depset *dep)
merge_kind mk = get_merge_kind (decl, dep);
!  if (CHECKING_P)
!{
!  /* Never start in the middle of a template.  */
!  int use_tpl = -1;
!  if (tree ti = node_template_info (decl, use_tpl))
!   gcc_checking_assert (TREE_CODE (TI_TEMPLATE (ti)) == OVERLOAD
!|| TREE_CODE (TI_TEMPLATE (ti)) == FIELD_DECL
!|| (DECL_TEMPLATE_RESULT (TI_TEMPLATE (ti))
!!= decl));
!}
if (streaming_p ())
{

If we try to work around this sanity check by o

Re: [PATCH v5] c++: implement [[gnu::non_owning]] [PR110358]

2024-03-01 Thread Jason Merrill

On 3/1/24 14:24, Marek Polacek wrote:

On Fri, Mar 01, 2024 at 01:19:40PM -0500, Jason Merrill wrote:

On 3/1/24 12:39, Marek Polacek wrote:

   @option{-Wdangling-reference} also warns about code like
   @smallexample
@@ -3932,6 +3935,10 @@ struct Span @{
   as @code{std::span}-like; that is, the class is a non-union class
   that has a pointer data member and a trivial destructor.
+The warning can be disabled by using the @code{gnu::no_dangling} attribute
+on a function (@pxref{Common Function Attributes}), or a class type
+(@pxref{C++ Attributes}).


It seems surprising that one is in a generic attributes section and the
other in the C++-specific section.  Maybe both uses could be covered in the
C++ attributes section?


Arg yes, definitely.  Done here.
  

   This warning is enabled by @option{-Wall}.
   @opindex Wdelete-non-virtual-dtor
diff --git a/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C b/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
new file mode 100644
index 000..02eabbc5003
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
@@ -0,0 +1,38 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wdangling-reference" }
+
+int g = 42;
+
+struct [[gnu::no_dangling]] A {
+  int *i;
+  int &foo() { return *i; }
+};
+
+struct A2 {
+  int *i;
+  [[gnu::no_dangling]] int &foo() { return *i; }
+  [[gnu::no_dangling]] static int &bar (const int &) { return *&g; }
+};
+
+union [[gnu::no_dangling]] U { };
+
+A a() { return A{&g}; }
+A2 a2() { return A2{&g}; }
+
+class X { };
+const X x1;
+const X x2;
+
+[[gnu::no_dangling]] const X& get(const int& i)
+{
+   return i == 0 ? x1 : x2;
+}
+
+void
+test ()
+{
+  [[maybe_unused]] const X& x = get (10);  // { dg-bogus "dangling" }
+  [[maybe_unused]] const int &i = a().foo();   // { dg-bogus "dangling" }
+  [[maybe_unused]] const int &j = a2().foo();  // { dg-bogus "dangling" }
+  [[maybe_unused]] const int &k = a2().bar(10);// { dg-bogus "dangling" }
+}


Do you want to add destructors to A/A2 like you did in other tests?


Added.  I think this test predates the recent heuristic.

Ok for trunk?

-- >8 --
Since -Wdangling-reference has false positives that can't be
prevented, we should offer an easy way to suppress the warning.
Currently, that is only possible by using a #pragma, either around the
enclosing class or around the call site.  But #pragma GCC diagnostic tends
to be onerous.  A better solution would be to have an attribute.

To that end, this patch adds a new attribute, [[gnu::no_dangling]].
This attribute takes an optional bool argument to support cases like:

   template <typename T>
   struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
  // ...
   };

PR c++/110358
PR c++/109642

gcc/cp/ChangeLog:

* call.cc (no_dangling_p): New.
(reference_like_class_p): Use it.
(do_warn_dangling_reference): Use it.  Don't warn when the function
or its enclosing class has attribute gnu::no_dangling.
* tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
(handle_no_dangling_attribute): New.

gcc/ChangeLog:

* doc/extend.texi: Document gnu::no_dangling.
* doc/invoke.texi: Mention that gnu::no_dangling disables
-Wdangling-reference.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-no-dangling1.C: New test.
* g++.dg/ext/attr-no-dangling2.C: New test.
* g++.dg/ext/attr-no-dangling3.C: New test.
* g++.dg/ext/attr-no-dangling4.C: New test.
* g++.dg/ext/attr-no-dangling5.C: New test.
* g++.dg/ext/attr-no-dangling6.C: New test.
* g++.dg/ext/attr-no-dangling7.C: New test.
* g++.dg/ext/attr-no-dangling8.C: New test.
* g++.dg/ext/attr-no-dangling9.C: New test.
---
  gcc/cp/call.cc   | 38 ++--
  gcc/cp/tree.cc   | 26 
  gcc/doc/extend.texi  | 47 ++
  gcc/doc/invoke.texi  |  6 ++
  gcc/testsuite/g++.dg/ext/attr-no-dangling1.C | 40 
  gcc/testsuite/g++.dg/ext/attr-no-dangling2.C | 29 +
  gcc/testsuite/g++.dg/ext/attr-no-dangling3.C | 24 
  gcc/testsuite/g++.dg/ext/attr-no-dangling4.C | 14 +
  gcc/testsuite/g++.dg/ext/attr-no-dangling5.C | 31 ++
  gcc/testsuite/g++.dg/ext/attr-no-dangling6.C | 65 
  gcc/testsuite/g++.dg/ext/attr-no-dangling7.C | 31 ++
  gcc/testsuite/g++.dg/ext/attr-no-dangling8.C | 30 +
  gcc/testsuite/g++.dg/ext/attr-no-dangling9.C | 25 
  13 files changed, 400 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling2.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling3.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling4.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling5.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-n

[PATCH] gcc_update: Add missing generated files

2024-03-01 Thread Jonathan Wakely
Is this OK for trunk?

-- >8 --

I'm seeing errors for --enable-maintainer-mode builds due to incorrectly
regenerating these files. They should be touched by gcc_update so they
aren't regenerated unnecessarily.

contrib/ChangeLog:

* gcc_update: Add more generated files in libcc1, lto-plugin,
fixincludes, and libstdc++-v3.
---
 contrib/gcc_update | 8 
 1 file changed, 8 insertions(+)

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 774c926e723..fac86d0e33e 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -70,6 +70,7 @@ files_and_dependencies () {
 # fixincludes
 fixincludes/configure: fixincludes/configure.ac fixincludes/aclocal.m4
 fixincludes/config.h.in: fixincludes/configure.ac fixincludes/aclocal.m4
+fixincludes/fixincl.x: fixincludes/inclhack.def fixincludes/fixincl.tpl
 # intl library
 intl/plural.c: intl/plural.y
 intl/plural-config.h: intl/plural.y
@@ -106,6 +107,7 @@ gcc/testsuite/gcc.dg/cpp/_Pragma3.c: gcc/testsuite/gcc.dg/cpp/mi1c.h
 # direct2s.c:35: warning: current file is older than direct2.c
 gcc/testsuite/gcc.dg/cpp/direct2s.c: gcc/testsuite/gcc.dg/cpp/direct2.c
 # lto-plugin
+lto-plugin/aclocal.m4: lto-plugin/configure.ac
 lto-plugin/configure: lto-plugin/configure.ac lto-plugin/aclocal.m4
 lto-plugin/Makefile.in: lto-plugin/Makefile.am lto-plugin/aclocal.m4
 # tools
@@ -186,7 +188,13 @@ libphobos/config.h.in: libphobos/configure.ac libphobos/aclocal.m4
 libphobos/configure: libphobos/configure.ac libphobos/aclocal.m4
 libphobos/src/Makefile.in: libphobos/src/Makefile.am libphobos/aclocal.m4
 libphobos/testsuite/Makefile.in: libphobos/testsuite/Makefile.am libphobos/aclocal.m4
+libstdc++-v3/aclocal.m4: libstdc++-v3/configure.ac libstdc++-v3/acinclude.m4
+libstdc++-v3/Makefile.in: libstdc++-v3/Makefile.am libstdc++-v3/aclocal.m4
+libstdc++-v3/configure: libstdc++-v3/configure.ac libstdc++-v3/acinclude.m4
 libstdc++-v3/include/bits/version.h: libstdc++-v3/include/bits/version.def libstdc++-v3/include/bits/version.tpl
+libcc1/aclocal.m4: libcc1/configure.ac
+libcc1/Makefile.in: libcc1/Makefile.am libcc1/configure.ac libcc1/aclocal.m4
+libcc1/configure: libcc1/configure.ac
 # Top level
 Makefile.in: Makefile.tpl Makefile.def
 configure: configure.ac config/acx.m4
-- 
2.43.2



Re: [PATCH v5] c++: implement [[gnu::non_owning]] [PR110358]

2024-03-01 Thread Patrick Palka
On Fri, 1 Mar 2024, Jason Merrill wrote:

> On 3/1/24 14:24, Marek Polacek wrote:
> > On Fri, Mar 01, 2024 at 01:19:40PM -0500, Jason Merrill wrote:
> > > On 3/1/24 12:39, Marek Polacek wrote:
> > > >@option{-Wdangling-reference} also warns about code like
> > > >@smallexample
> > > > @@ -3932,6 +3935,10 @@ struct Span @{
> > > >as @code{std::span}-like; that is, the class is a non-union class
> > > >that has a pointer data member and a trivial destructor.
> > > > +The warning can be disabled by using the @code{gnu::no_dangling} attribute
> > > > +on a function (@pxref{Common Function Attributes}), or a class type
> > > > +(@pxref{C++ Attributes}).
> > > 
> > > It seems surprising that one is in a generic attributes section and the
> > > other in the C++-specific section.  Maybe both uses could be covered in
> > > the C++ attributes section?
> > 
> > Arg yes, definitely.  Done here.
> >   
> > > >This warning is enabled by @option{-Wall}.
> > > >@opindex Wdelete-non-virtual-dtor
> > > > diff --git a/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C b/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
> > > > new file mode 100644
> > > > index 000..02eabbc5003
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
> > > > @@ -0,0 +1,38 @@
> > > > +// { dg-do compile { target c++11 } }
> > > > +// { dg-options "-Wdangling-reference" }
> > > > +
> > > > +int g = 42;
> > > > +
> > > > +struct [[gnu::no_dangling]] A {
> > > > +  int *i;
> > > > +  int &foo() { return *i; }
> > > > +};
> > > > +
> > > > +struct A2 {
> > > > +  int *i;
> > > > +  [[gnu::no_dangling]] int &foo() { return *i; }
> > > > +  [[gnu::no_dangling]] static int &bar (const int &) { return *&g; }
> > > > +};
> > > > +
> > > > +union [[gnu::no_dangling]] U { };
> > > > +
> > > > +A a() { return A{&g}; }
> > > > +A2 a2() { return A2{&g}; }
> > > > +
> > > > +class X { };
> > > > +const X x1;
> > > > +const X x2;
> > > > +
> > > > +[[gnu::no_dangling]] const X& get(const int& i)
> > > > +{
> > > > +   return i == 0 ? x1 : x2;
> > > > +}
> > > > +
> > > > +void
> > > > +test ()
> > > > +{
> > > > +  [[maybe_unused]] const X& x = get (10);  // { dg-bogus "dangling" }
> > > > +  [[maybe_unused]] const int &i = a().foo();   // { dg-bogus "dangling" }
> > > > +  [[maybe_unused]] const int &j = a2().foo();  // { dg-bogus "dangling" }
> > > > +  [[maybe_unused]] const int &k = a2().bar(10);// { dg-bogus "dangling" }
> > > > +}
> > > 
> > > Do you want to add destructors to A/A2 like you did in other tests?
> > 
> > Added.  I think this test predates the recent heuristic.
> > 
> > Ok for trunk?
> > 
> > -- >8 --
> > Since -Wdangling-reference has false positives that can't be
> > prevented, we should offer an easy way to suppress the warning.
> > Currently, that is only possible by using a #pragma, either around the
> > enclosing class or around the call site.  But #pragma GCC diagnostic tends
> > to be onerous.  A better solution would be to have an attribute.
> > 
> > To that end, this patch adds a new attribute, [[gnu::no_dangling]].
> > This attribute takes an optional bool argument to support cases like:
> > 
> >    template <typename T>
> >    struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
> >      // ...
> >    };
> > 
> > PR c++/110358
> > PR c++/109642
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (no_dangling_p): New.
> > (reference_like_class_p): Use it.
> > (do_warn_dangling_reference): Use it.  Don't warn when the function
> > or its enclosing class has attribute gnu::no_dangling.
> > * tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
> > (handle_no_dangling_attribute): New.
> > 
> > gcc/ChangeLog:
> > 
> > * doc/extend.texi: Document gnu::no_dangling.
> > * doc/invoke.texi: Mention that gnu::no_dangling disables
> > -Wdangling-reference.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/ext/attr-no-dangling1.C: New test.
> > * g++.dg/ext/attr-no-dangling2.C: New test.
> > * g++.dg/ext/attr-no-dangling3.C: New test.
> > * g++.dg/ext/attr-no-dangling4.C: New test.
> > * g++.dg/ext/attr-no-dangling5.C: New test.
> > * g++.dg/ext/attr-no-dangling6.C: New test.
> > * g++.dg/ext/attr-no-dangling7.C: New test.
> > * g++.dg/ext/attr-no-dangling8.C: New test.
> > * g++.dg/ext/attr-no-dangling9.C: New test.
> > ---
> >   gcc/cp/call.cc   | 38 ++--
> >   gcc/cp/tree.cc   | 26 
> >   gcc/doc/extend.texi  | 47 ++
> >   gcc/doc/invoke.texi  |  6 ++
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling1.C | 40 
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling2.C | 29 +
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling3.C | 24 
> >   gcc/testsuite/g++.dg/ext/attr-no-dangling4.C | 14 +
> >   gcc/testsuite/g++.dg/e

Re: [PATCH v5] c++: implement [[gnu::non_owning]] [PR110358]

2024-03-01 Thread Jason Merrill

On 3/1/24 16:23, Patrick Palka wrote:


Sorry for not asking this sooner, but does it matter whether we attach
the attribute to the function type rather than the function declaration?
I noticed e.g. nodiscard gets attached to the decl.

And we document it as a function attribute despite attaching it to the
function type.


I think it doesn't matter much, some attributes are represented on the 
type and some on the decl.  Might be a bit better on the decl but I 
wasn't worrying about it.


Jason



[14 regression] Fix insn types in risc-v port

2024-03-01 Thread Jeff Law
So one of the broad goals we've had over the last few months has been to 
ensure that every insn has a scheduling type and that every insn is 
associated with an insn reservation in the scheduler.


This avoids some amazingly bad behavior in the scheduler.  I won't go 
through the gory details.


I was recently analyzing a code quality regression with dhrystone (ugh!) 
and one of the issues was poor scheduling which lengthened the lifetime 
of a pseudo and ultimately resulted in needing an additional callee 
saved register save/restore.


This was ultimately tracked down to incorrect types on a few patterns.  So 
I did an audit of all the patterns that had types added/changed as part 
of this effort and found a variety of problems, primarily in the various 
move patterns and extension patterns.  This is a regression relative to 
gcc-13.



Naturally the change in types affects scheduling, which in turn changes 
the precise code we generate and causes some testsuite fallout.


I considered updating the regexps since the change in the resulting 
output is pretty consistent.  But of course the test would still be 
sensitive to things like load latency.  So instead I just turned off the 
2nd phase scheduler in the affected tests.


Bootstrapped and regression tested on rv64gc-linux-gnu.

Pushing to the trunk.

jeff
gcc
* config/riscv/riscv.md (zero_extendqi2_internal): Fix
type attribute.
(extendsidi2_internal, movhf_hardfloat, movhf_softfloat): Likewise.
(movdi_32bit, movdi_64bit, movsi_internal): Likewise.
(movhi_internal, movqi_internal): Likewise.
(movsf_softfloat, movsf_hardfloat): Likewise.
(movdf_hardfloat_rv32, movdf_hardfloat_rv64): Likewise.
(movdf_softfloat): Likewise.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Turn off
second phase scheduler.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: Likewise.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 1fec13092e2..b16ed97909c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1836,7 +1836,7 @@ (define_insn "*zero_extendqi2_internal"
andi\t%0,%1,0xff
lbu\t%0,%1"
   [(set_attr "move_type" "andi,load")
-   (set_attr "type" "multi")
+   (set_attr "type" "arith,load")
(set_attr "mode" "")])
 
 ;;
@@ -1861,7 +1861,7 @@ (define_insn "*extendsidi2_internal"
sext.w\t%0,%1
lw\t%0,%1"
   [(set_attr "move_type" "move,load")
-   (set_attr "type" "multi")
+   (set_attr "type" "move,load")
(set_attr "mode" "DI")])
 
 (define_expand "extend2"
@@ -1938,7 +1938,7 @@ (define_insn "*movhf_hardfloat"
|| reg_or_0_operand (operands[1], HFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" 
"fmove,fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
-   (set_attr "type" "fmove")
+   (set_attr "type" 
"fmove,fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
(set_attr "mode" "HF")])
 
 (define_insn "*movhf_softfloat"
@@ -1949,7 +1949,7 @@ (define_insn "*movhf_softfloat"
|| reg_or_0_operand (operands[1], HFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" "fmove,move,load,store,mtc,mfc")
-   (set_attr "type" "fmove")
+   (set_attr "type" "fmove,move,load,store,mtc,mfc")
(set_attr "mode" "HF")])
 
 (define_insn "*movhf_softfloat_boxing"
@@ -2182,7 +2182,7 @@ (define_insn "*movdi_32bit"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" 
"move,const,load,store,mtc,fpload,mfc,fmove,fpstore,rdvlenb")
(set_attr "mode" "DI")
-   (set_attr "type" "move")
+   (set_attr "type" "move,move,load,store,move,fpload,move,fmove,fpstore,move")
(set_attr "ext" "base,base,base,base,d,d,d,d,d,vector")])
 
 (define_insn "*movdi_64bit"
@@ -2194,7 +2194,7 @@ (define_insn "*movdi_64bit"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" 
"move,const,load,store,mtc,fpload,mfc,fmove,fpstore,rdvlenb")
(set_attr "mode" "DI")
-   (set_attr "type" "move")
+   (set_attr "type" "move,move,load,store,mtc,fpload,mfc,fmove,fpstore,move")
(set_attr "ext" "base,base,base,base,d,d,d,d,d,vector")])
 
 ;; 32-bit Integer moves
@@ -2217,7 +2217,7 @@ (define_insn "*movsi_internal"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" 
"move,const,load,store,mtc,fpload,mfc,fpstore,rdvlenb")
(set_attr "mode" "SI")
-   (set_attr "type" "move")
+   (set_attr "type" "move,move,load,store,mtc,fpload,mfc,fps

[PATCH] middle-end: Fix dominator information with loop duplication PR114197

2024-03-01 Thread Edwin Lu
When adding the new_preheader to the cfg, only the new_preheader's dominator
information is updated. If one of the new basic block's children was part
of the original cfg and adding new_preheader to the cfg introduces another path
to that child, the child's dominator information will not be updated. This may
cause verify_dominator's assertion to fail.

Force recalculating dominators for all duplicated basic blocks and their
successors when updating new_preheader's dominator information.

PR 114197

gcc/ChangeLog:

* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Recalculate dominator info when adding new_preheader to cfg.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr114197.c: New test.

Signed-off-by: Edwin Lu 
---
 gcc/testsuite/gcc.dg/vect/pr114197.c | 18 ++
 gcc/tree-vect-loop-manip.cc  | 17 -
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr114197.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114197.c 
b/gcc/testsuite/gcc.dg/vect/pr114197.c
new file mode 100644
index 000..b1fb807729c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114197.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+
+#pragma pack(push)
+struct a {
+  volatile signed b : 8;
+};
+#pragma pack(pop)
+int c;
+static struct a d = {5};
+void e() {
+f:
+  for (c = 8; c < 55; ++c)
+if (!d.b)
+  goto f;
+}
+
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index f72da915103..0f3a489e78c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1840,7 +1840,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
}
 
   if (was_imm_dom || duplicate_outer_loop)
-   set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
+   {
+ set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
+
+ /* Update the dominator info for children of duplicated bbs.  */
+ for (unsigned i = 0; i < scalar_loop->num_nodes; i++)
+   {
+ basic_block dom_bb = NULL;
+ edge e;
+ edge_iterator ei;
+ FOR_EACH_EDGE (e, ei, new_bbs[i]->succs)
+   {
+ dom_bb = recompute_dominator (CDI_DOMINATORS, e->dest);
+ set_immediate_dominator (CDI_DOMINATORS, e->dest, dom_bb);
+   }
+   }
+   }
 
   /* And remove the non-necessary forwarder again.  Keep the other
  one so we have a proper pre-header for the loop at the exit edge.  */
-- 
2.34.1
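
As an illustration of the failure mode described in the commit message, the
following standalone C++ sketch (a toy CFG, not GCC's data structures)
computes immediate dominators before and after a new block adds a second path
to an existing block.  Everything here is an assumption made for the example:
the node numbering, the helpers dominator_sets, immediate_dominator,
cfg_before and cfg_after, and the limit of fewer than 32 nodes.  Node 4 plays
the role of new_preheader and node 3 is the pre-existing child.

```cpp
#include <cstdint>
#include <vector>

// Toy CFG: node 0 is the entry, preds[v] lists the predecessors of node v.
// Assumes fewer than 32 nodes, all reachable from the entry.
using Preds = std::vector<std::vector<int>>;

// Iterative data-flow solution: dom(entry) = {entry} and
// dom(v) = {v} | intersection of dom(p) over all predecessors p.
std::vector<std::uint32_t> dominator_sets(const Preds &preds)
{
  const int n = static_cast<int>(preds.size());
  const std::uint32_t all = (1u << n) - 1;
  std::vector<std::uint32_t> dom(n, all);
  dom[0] = 1u;
  for (bool changed = true; changed;)
    {
      changed = false;
      for (int v = 1; v < n; ++v)
        {
          std::uint32_t meet = all;
          for (int p : preds[v])
            meet &= dom[p];
          const std::uint32_t next = meet | (1u << v);
          if (next != dom[v])
            {
              dom[v] = next;
              changed = true;
            }
        }
    }
  return dom;
}

// The immediate dominator of v is the strict dominator whose own dominator
// set is exactly the set of strict dominators of v.  Returns -1 for entry.
int immediate_dominator(const Preds &preds, int v)
{
  const std::vector<std::uint32_t> dom = dominator_sets(preds);
  const std::uint32_t strict = dom[v] & ~(1u << v);
  for (int d = 0; d < static_cast<int>(dom.size()); ++d)
    if (dom[d] == strict)
      return d;
  return -1;
}

// Original chain 0 -> 1 -> 2 -> 3.
Preds cfg_before()
{
  Preds p(4);
  p[1] = {0};
  p[2] = {1};
  p[3] = {2};
  return p;
}

// Same chain plus a new block 4 with edges 0 -> 4 and 4 -> 3.
Preds cfg_after()
{
  Preds p(5);
  p[1] = {0};
  p[2] = {1};
  p[3] = {2, 4};
  p[4] = {0};
  return p;
}
```

In this toy graph the immediate dominator of node 3 is node 2 before the
change and node 0 afterwards, so setting only node 4's dominator would leave
node 3 with stale information.  That is the kind of child the loop over
new_bbs[i]->succs is meant to catch.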



Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-03-01 Thread John David Anglin

On 2024-03-01 3:44 a.m., Jakub Jelinek wrote:

> Isn't this just that you have 50 in there?

No.  It's okay.

The problem is that we run out of memory because of a "ulimit -s 81920"
statement that I had in .bashrc.  The tests pass with the default stack
allocation.

clone(child_stack=0x3191040, 
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, 
parent_tid=[1108], tls=0x81918c0, child_tidptr=0x8191468) = 1108

rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
mmap2(NULL, 83890176, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 
-1 ENOMEM (Cannot allocate memory)

Will revert change to tests.

Dave

--
John David Anglin  dave.ang...@bell.net



[PATCH 03/11] gcc/doc/extend.texi: Add documentation for __is_bounded_array

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_bounded_array): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5aeb9bdd47a..4c8c0631ca7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29596,6 +29596,11 @@ type (disregarding cv-qualifiers), @var{derived_type} 
shall be a complete
 type.  A diagnostic is produced if this requirement is not met.
 @enddefbuiltin
 
+@defbuiltin{bool __is_bounded_array (@var{type})}
+If @var{type} is an array type of known bound ([dcl.array])
+the trait is @code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_class (@var{type})}
 If @var{type} is a cv-qualified class type, and not a union type
 ([basic.compound]) the trait is @code{true}, else it is @code{false}.
-- 
2.44.0
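
For readers who want to see the documented behaviour without a compiler that
has the builtin, here is a small library-level C++ sketch of the same
semantics.  The name is_bounded_array_sketch is hypothetical, and the
assumption that the builtin agrees with this partial specialization on every
type is mine, not something the patch states.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical library-level equivalent; this is not the builtin itself.
template <typename T>
struct is_bounded_array_sketch : std::false_type {};

// Only arrays with a known bound match T[N].
template <typename T, std::size_t N>
struct is_bounded_array_sketch<T[N]> : std::true_type {};

constexpr bool known_bound = is_bounded_array_sketch<int[3]>::value;
constexpr bool unknown_bound = is_bounded_array_sketch<int[]>::value;
constexpr bool nested_array = is_bounded_array_sketch<int[2][3]>::value;
constexpr bool pointer_to_array = is_bounded_array_sketch<int (*)[3]>::value;

static_assert(known_bound, "int[3] has a known bound");
static_assert(!unknown_bound, "int[] has no known bound");
```

Note that a pointer to an array is not itself an array, so pointer_to_array
is false.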



[PATCH 09/11] gcc/doc/extend.texi: Add documentation for __is_reference

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_reference): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 05f864e3dd5..d36707fcdf3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29679,6 +29679,11 @@ is @code{true}, else it is @code{false}.
 Requires: If @var{type} is a non-union class type, it shall be a complete type.
 @enddefbuiltin
 
+@defbuiltin{bool __is_reference (@var{type})}
+If @var{type} is a reference type ([dcl.ref]) the trait is @code{true},
+else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_standard_layout (@var{type})}
 If @var{type} is a standard-layout type ([basic.types]) the trait is
 @code{true}, else it is @code{false}.
-- 
2.44.0
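
A quick way to picture the documented result is through the standard
std::is_reference trait, which I assume the builtin is meant to agree with
(the patch itself does not say so).  The variable names below are only for
illustration.

```cpp
#include <type_traits>

// Both lvalue and rvalue references are reference types.
constexpr bool lvalue_ref_is_reference = std::is_reference<int &>::value;
constexpr bool rvalue_ref_is_reference = std::is_reference<int &&>::value;

// Objects and pointers are not.
constexpr bool int_is_reference = std::is_reference<int>::value;
constexpr bool pointer_is_reference = std::is_reference<int *>::value;

static_assert(lvalue_ref_is_reference, "int& is a reference type");
static_assert(!int_is_reference, "int is not a reference type");
```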



[PATCH 07/11] gcc/doc/extend.texi: Add documentation for __is_member_pointer

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_member_pointer): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index df2df98567a..08276f734f2 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29656,6 +29656,11 @@ If @var{type} is a pointer to member object 
([dcl.mptr]) the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{bool __is_member_pointer (@var{type})}
+If @var{type} is a pointer to member ([dcl.mptr]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0
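
The three member-pointer traits in this series are easy to mix up, so here is
a short C++ sketch using the standard library traits.  It assumes the builtins
agree with std::is_member_pointer, std::is_member_object_pointer and
std::is_member_function_pointer; the struct S and the aliases are made up for
the example.

```cpp
#include <type_traits>

struct S
{
  int field;
  int method(int x) { return field + x; }
};

using ObjPtr = int S::*;        // pointer to member object
using FnPtr = int (S::*)(int);  // pointer to member function

// Both kinds are pointers to member.
constexpr bool obj_is_member = std::is_member_pointer<ObjPtr>::value;
constexpr bool fn_is_member = std::is_member_pointer<FnPtr>::value;

// The narrower traits split them apart.
constexpr bool obj_is_member_object = std::is_member_object_pointer<ObjPtr>::value;
constexpr bool fn_is_member_object = std::is_member_object_pointer<FnPtr>::value;
constexpr bool fn_is_member_function = std::is_member_function_pointer<FnPtr>::value;

// An ordinary pointer is not a pointer to member.
constexpr bool plain_pointer_is_member = std::is_member_pointer<int *>::value;
```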



[PATCH 06/11] gcc/doc/extend.texi: Add documentation for __is_member_object_pointer

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_member_object_pointer): New
documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 9361b425ba1..df2df98567a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29651,6 +29651,11 @@ If @var{type} is a pointer to member function 
([dcl.mptr]) the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{bool __is_member_object_pointer (@var{type})}
+If @var{type} is a pointer to member object ([dcl.mptr]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0



[PATCH 01/11] gcc/doc/extend.texi: Sort built-in traits alphabetically

2024-03-01 Thread Ken Matsui
This patch sorts built-in traits alphabetically for better codebase
consistency and easier future integration of changes.

gcc/ChangeLog:

* doc/extend.texi (Type Traits): Sort built-in traits
alphabetically.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 62 ++---
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f679c81acf2..b13f9d6f934 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29499,15 +29499,6 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
-@defbuiltin{bool __has_nothrow_copy (@var{type})}
-If @code{__has_trivial_copy (type)} is @code{true} then the trait is
-@code{true}, else if @var{type} is a cv-qualified class or union type
-with copy constructors that are known not to throw an exception then
-the trait is @code{true}, else it is @code{false}.
-Requires: @var{type} shall be a complete type, (possibly cv-qualified)
-@code{void}, or an array of unknown bound.
-@enddefbuiltin
-
 @defbuiltin{bool __has_nothrow_constructor (@var{type})}
 If @code{__has_trivial_constructor (type)} is @code{true} then the trait
 is @code{true}, else if @var{type} is a cv class or union type (or array
@@ -29517,6 +29508,15 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
+@defbuiltin{bool __has_nothrow_copy (@var{type})}
+If @code{__has_trivial_copy (type)} is @code{true} then the trait is
+@code{true}, else if @var{type} is a cv-qualified class or union type
+with copy constructors that are known not to throw an exception then
+the trait is @code{true}, else it is @code{false}.
+Requires: @var{type} shall be a complete type, (possibly cv-qualified)
+@code{void}, or an array of unknown bound.
+@enddefbuiltin
+
 @defbuiltin{bool __has_trivial_assign (@var{type})}
 If @var{type} is @code{const}- qualified or is a reference type then
 the trait is @code{false}.  Otherwise if @code{__is_trivial (type)} is
@@ -29527,15 +29527,6 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
-@defbuiltin{bool __has_trivial_copy (@var{type})}
-If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
-type then the trait is @code{true}, else if @var{type} is a cv class
-or union type with a trivial copy constructor ([class.copy]) then the trait
-is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
-a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
-bound.
-@enddefbuiltin
-
 @defbuiltin{bool __has_trivial_constructor (@var{type})}
 If @code{__is_trivial (type)} is @code{true} then the trait is @code{true},
 else if @var{type} is a cv-qualified class or union type (or array thereof)
@@ -29545,6 +29536,15 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
+@defbuiltin{bool __has_trivial_copy (@var{type})}
+If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
+type then the trait is @code{true}, else if @var{type} is a cv class
+or union type with a trivial copy constructor ([class.copy]) then the trait
+is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
+a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
+bound.
+@enddefbuiltin
+
 @defbuiltin{bool __has_trivial_destructor (@var{type})}
 If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference type
 then the trait is @code{true}, else if @var{type} is a cv class or union
@@ -29560,6 +29560,13 @@ If @var{type} is a class type with a virtual destructor
 Requires: If @var{type} is a non-union class type, it shall be a complete type.
 @enddefbuiltin
 
+@defbuiltin{bool __integer_pack (@var{length})}
+When used as the pattern of a pack expansion within a template
+definition, expands to a template argument pack containing integers
+from @code{0} to @code{@var{length}-1}.  This is provided for
+efficient implementation of @code{std::make_integer_sequence}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_abstract (@var{type})}
 If @var{type} is an abstract class ([class.abstract]) then the trait
 is @code{true}, else it is @code{false}.
@@ -29589,12 +29596,6 @@ If @var{type} is a cv-qualified class type, and not a 
union type
 ([basic.compound]) the trait is @code{true}, else it is @code{false}.
 @enddefbuiltin
 
-@c FIXME Commented out for GCC 13, discuss user interface for GCC 14.
-@c @defbuiltin{bool __is_deducible (@var{template}, @var{type})}
-@c If template arguments for @code{template} can be deduced from
-@c @code{type} or obtained from default template arguments.
-@c @enddefbuiltin
-
 @defbuiltin{bool __is_empty (@var{type})}
 If @code{__is_class (type)} is @code{false}

Re: [PATCH 5/5] RISC-V: Support vmsxx.vx for autovec comparison of vec and imm

2024-03-01 Thread Andrew Waterman
On Fri, Mar 1, 2024 at 4:07 AM Robin Dapp  wrote:
>
> Hi Han,
>
> in addition to what Juzhe mentioned (and that late-combine is going
> to handle such cases) it should be noted that register pressure
> should not be the only consideration here.  Many uarchs have a higher
> latency for register-file-crossing moves.  At least without spilling
> the vv variant is preferable, with spilling it very much depends.

And of course there are uarches for which this is not the case (e.g.
post-commit decoupled vector unit), in which case the .vx and .vf
versions are preferable to the .vv form regardless of vector register
pressure, because they reduce vector regfile access energy (especially
if a splat can be avoided).  So it's a job for -mtune.

>
>
> Regards
>  Robin
>


[PATCH 04/11] gcc/doc/extend.texi: Add documentation for __is_function

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_function): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4c8c0631ca7..8ad88516c04 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29628,6 +29628,11 @@ is @code{true}, else it is @code{false}.
 Requires: If @var{type} is a class type, it shall be a complete type.
 @enddefbuiltin
 
+@defbuiltin{bool __is_function (@var{type})}
+If @var{type} is a function type ([dcl.fct]) the trait is @code{true},
+else it is @code{false}.
+@enddefbuiltin
+
 @c FIXME Commented out for GCC 13, discuss user interface for GCC 14.
 @c @defbuiltin{bool __is_deducible (@var{template}, @var{type})}
 @c If template arguments for @code{template} can be deduced from
-- 
2.44.0
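
One detail worth spelling out is that a function type is distinct from a
pointer or reference to a function.  The sketch below uses std::is_function,
on the assumption that the builtin gives the same answers; the variable names
are illustrative only.

```cpp
#include <type_traits>

// A function type such as int(int) is a function.
constexpr bool function_type_is_function = std::is_function<int(int)>::value;

// Pointers and references to functions are not function types themselves.
constexpr bool function_pointer_is_function = std::is_function<int (*)(int)>::value;
constexpr bool function_reference_is_function = std::is_function<int (&)(int)>::value;

// Neither is an ordinary object type.
constexpr bool int_is_function = std::is_function<int>::value;

static_assert(function_type_is_function, "int(int) is a function type");
```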



[PATCH 05/11] gcc/doc/extend.texi: Add documentation for __is_member_function_pointer

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_member_function_pointer): New
documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8ad88516c04..9361b425ba1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29646,6 +29646,11 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
+@defbuiltin{bool __is_member_function_pointer (@var{type})}
+If @var{type} is a pointer to member function ([dcl.mptr]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0



[PATCH 11/11] gcc/doc/extend.texi: Add documentation for __remove_pointer

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__remove_pointer): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fb2614176e5..1705ed93934 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29708,6 +29708,11 @@ If @var{type} is a cv union type ([basic.compound]) 
the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{@var{type} __remove_pointer (@var{type} @var{t})}
+If @var{type} is a pointer type ([dcl.ptr]) then the trait is the @var{type}
+pointed to by @var{t}, else it is @var{t}.
+@enddefbuiltin
+
 @defbuiltin{bool __underlying_type (@var{type})}
 The underlying type of @var{type}.
 Requires: @var{type} shall be an enumeration type ([dcl.enum]).
-- 
2.44.0
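
To show what "the type pointed to" means in practice, here is a C++ sketch
based on std::remove_pointer.  That the builtin matches the standard trait in
each of these cases is an assumption on my part, and the variable names are
only for illustration.

```cpp
#include <type_traits>

// Exactly one level of pointer is removed.
constexpr bool strips_one_level
  = std::is_same<std::remove_pointer<int **>::type, int *>::value;

// A non-pointer type is returned unchanged.
constexpr bool keeps_non_pointer
  = std::is_same<std::remove_pointer<int>::type, int>::value;

// Qualifiers on the pointer itself are dropped with it ...
constexpr bool strips_const_pointer
  = std::is_same<std::remove_pointer<int *const>::type, int>::value;

// ... while qualifiers on the pointee are kept.
constexpr bool keeps_pointee_const
  = std::is_same<std::remove_pointer<const int *>::type, const int>::value;
```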



[PATCH 08/11] gcc/doc/extend.texi: Add documentation for __is_object

2024-03-01 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_object): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 08276f734f2..05f864e3dd5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29661,6 +29661,11 @@ If @var{type} is a pointer to member ([dcl.mptr]) the 
trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{bool __is_object (@var{type})}
+If @var{type} is an object type ([basic.types]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0
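
As a reminder of what [basic.types] counts as an object type, here is a C++
sketch using std::is_object, which I assume the builtin agrees with.  An
object type is any type that is not a function type, a reference type, or
(cv) void; the variable names are made up for the example.

```cpp
#include <type_traits>

// Scalars, pointers and arrays (even of unknown bound) are object types.
constexpr bool int_is_object = std::is_object<int>::value;
constexpr bool pointer_is_object = std::is_object<int *>::value;
constexpr bool array_is_object = std::is_object<int[]>::value;

// References, functions and void are not.
constexpr bool reference_is_object = std::is_object<int &>::value;
constexpr bool function_is_object = std::is_object<int(int)>::value;
constexpr bool void_is_object = std::is_object<void>::value;

static_assert(int_is_object, "int is an object type");
static_assert(!void_is_object, "void is not an object type");
```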


