Re: [PATCH v1] Doc: Add doc for standard name mask_len_strided_load{store}m

2024-10-30 Thread Richard Biener
On Wed, Oct 30, 2024 at 2:39 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to add doc for the below 2 standard names.
>
> 1. strided load: v = mask_len_strided_load (ptr, stride, mask, len, bias)
> 2. strided store: mask_len_strided_store (ptr, stride, v, mask, len, bias)
>
> gcc/ChangeLog:
>
> * doc/md.texi: Add doc for mask_len_strided_load{store}.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Juzhe-Zhong 
> ---
>  gcc/doc/md.texi | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 6d9c8643739..83036383fe1 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5135,6 +5135,20 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.

wouldn't it be zero as base?

> +For each element index i load address is operand 1 + @var{i} * operand 2.

the load address

Otherwise OK.

Thanks,
Richard.

> +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
> 5) elements from memory.
> +Element @var{i} of the mask (operand 3) is set if element @var{i} of the 
> result should
> +be loaded from memory and clear if element @var{i} of the result should be 
> zero.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5172,6 +5186,19 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +Operand 2 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 1 as step.
> +For each element index i store address is operand 0 + @var{i} * operand 1.
> +Similar to mask_len_store, the instruction stores at most (operand 4 + 
> operand 5) elements of
> +mask (operand 3) to memory.  Element @var{i} of the mask is set if element 
> @var{i} of (operand 3)
> +should be stored.  Mask elements @var{i} with @var{i} > (operand 4 + operand 
> 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> --
> 2.43.0
>
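
The addressing rule in the proposed text (element i lives at base + i * stride, at most len + bias elements are touched, the mask selects the active ones) can be modelled in scalar code. This is only a sketch under assumptions: the function names are hypothetical, elements are fixed at 32 bits, the stride is taken in bytes, masked-off load elements are zeroed as the patch text states, and elements past len + bias are simply left alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of a masked, length-limited strided load.
// Not a GCC API; it only mirrors the wording of the proposed documentation.
static void
strided_load_model (std::int32_t *dst, const unsigned char *base,
                    std::ptrdiff_t stride, const bool *mask,
                    std::size_t len, std::ptrdiff_t bias)
{
  const std::size_t n
    = static_cast<std::size_t> (static_cast<std::ptrdiff_t> (len) + bias);
  for (std::size_t i = 0; i < n; ++i)
    {
      if (mask[i])
        // Address of element i is base + i * stride (stride in bytes).
        std::memcpy (&dst[i], base + static_cast<std::ptrdiff_t> (i) * stride,
                     sizeof dst[i]);
      else
        dst[i] = 0;  // Assumption taken from the patch text: inactive -> zero.
    }
}

// Hypothetical scalar model of the matching strided store.
static void
strided_store_model (unsigned char *base, std::ptrdiff_t stride,
                     const std::int32_t *src, const bool *mask,
                     std::size_t len, std::ptrdiff_t bias)
{
  const std::size_t n
    = static_cast<std::size_t> (static_cast<std::ptrdiff_t> (len) + bias);
  for (std::size_t i = 0; i < n; ++i)
    if (mask[i])
      std::memcpy (base + static_cast<std::ptrdiff_t> (i) * stride, &src[i],
                   sizeof src[i]);
}

// Small usage example: load every second 32-bit element (stride of 8 bytes).
void
demo_strided_models ()
{
  std::int32_t mem[8] = {10, 11, 12, 13, 14, 15, 16, 17};
  const bool mask[4] = {true, false, true, true};
  std::int32_t v[4] = {-1, -1, -1, -1};
  strided_load_model (v, reinterpret_cast<const unsigned char *> (mem), 8,
                      mask, 3, 0);
  assert (v[0] == 10 && v[1] == 0 && v[2] == 14);
  assert (v[3] == -1);  // Beyond len + bias: untouched in this model.
}
```

Seen this way, the offset vector of the equivalent gather is { 0, stride, 2 * stride, ... }, which matches the review comment that the series should start at zero rather than at the base address.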


[committed] i386: Use assign_stack_temp instead of assign_386_stack_local with SLOT_TEMP

2024-10-30 Thread Uros Bizjak
It is better to use assign_stack_temp instead of assign_386_stack_local
with SLOT_TEMP because assign_stack_temp also shares sub-space of stack
slots (e.g. HImode temp shares stack slot with SImode stack slot).

Use assign_386_stack_local only for special stack slots (SLOT_STV_TEMP that
can be nested inside other stack temp access, SLOT_FLOATxFDI_387 that has
relaxed alignment constraint) or slots that can't be shared (SLOT_CW_*).

The patch removes SLOT_TEMP. assign_stack_temp should be used instead.

gcc/ChangeLog:

* config/i386/i386.h (enum ix86_stack_slot): Remove SLOT_TEMP.
* config/i386/i386-expand.cc (ix86_expand_builtin)
<case IX86_BUILTIN_LDMXCSR>: Use assign_stack_temp instead of
assign_386_stack_local with SLOT_TEMP.
<case IX86_BUILTIN_STMXCSR>: Ditto.
(ix86_expand_divmod_libfunc): Ditto.
* config/i386/i386.md (floatunssi2): Ditto.
* config/i386/sync.md (atomic_load): Ditto.
(atomic_store): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 36011cc6b35..0de0e842731 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -13738,13 +13738,13 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
subtarget,
 
 case IX86_BUILTIN_LDMXCSR:
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
-  target = assign_386_stack_local (SImode, SLOT_TEMP);
+  target = assign_stack_temp (SImode, GET_MODE_SIZE (SImode));
   emit_move_insn (target, op0);
   emit_insn (gen_sse_ldmxcsr (target));
   return 0;
 
 case IX86_BUILTIN_STMXCSR:
-  target = assign_386_stack_local (SImode, SLOT_TEMP);
+  target = assign_stack_temp (SImode, GET_MODE_SIZE (SImode));
   emit_insn (gen_sse_stmxcsr (target));
   return copy_to_mode_reg (SImode, target);
 
@@ -25743,7 +25743,7 @@ ix86_expand_divmod_libfunc (rtx libfunc, machine_mode 
mode,
rtx op0, rtx op1,
rtx *quot_p, rtx *rem_p)
 {
-  rtx rem = assign_386_stack_local (mode, SLOT_TEMP);
+  rtx rem = assign_stack_temp (mode, GET_MODE_SIZE (mode));
 
   rtx quot = emit_library_call_value (libfunc, NULL_RTX, LCT_NORMAL,
  mode, op0, mode, op1, mode,
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 2dcd8803a08..51934400951 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2539,8 +2539,7 @@ enum ix86_fpcmp_strategy {
 
 enum ix86_stack_slot
 {
-  SLOT_TEMP = 0,
-  SLOT_CW_STORED,
+  SLOT_CW_STORED = 0,
   SLOT_CW_ROUNDEVEN,
   SLOT_CW_TRUNC,
   SLOT_CW_FLOOR,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e4d1c56ea54..fb6aaa81505 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6282,7 +6282,7 @@ (define_expand "floatunssi2"
 {
   emit_insn (gen_floatunssi2_i387_with_xmm
  (operands[0], operands[1],
-  assign_386_stack_local (DImode, SLOT_TEMP)));
+  assign_stack_temp (DImode, GET_MODE_SIZE (DImode))));
   DONE;
 }
   if (!TARGET_AVX512F)
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index f2b3ba0aa7a..f03d418c369 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -170,7 +170,7 @@ (define_expand "atomic_load"
   if (mode == DImode && !TARGET_64BIT)
 emit_insn (gen_atomic_loaddi_fpu
   (operands[0], operands[1],
-   assign_386_stack_local (DImode, SLOT_TEMP)));
+   assign_stack_temp (DImode, GET_MODE_SIZE (DImode))));
   else
 {
   rtx dst = operands[0];
@@ -251,7 +251,7 @@ (define_expand "atomic_store"
 out to be significantly larger than this plus a barrier.  */
   emit_insn (gen_atomic_storedi_fpu
 (operands[0], operands[1],
- assign_386_stack_local (DImode, SLOT_TEMP)));
+ assign_stack_temp (DImode, GET_MODE_SIZE (DImode))));
 }
   else
 {


Re: [PATCH 1/5] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-10-30 Thread Richard Biener
On Wed, Oct 30, 2024 at 3:09 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > You are testing GENERIC folding, so gcc.dg/ is a better location, not 
> > tree-ssa/
>
> Sure, will move all test files to there.
>
> > I wonder if the simplification is already applied by the frontend and thus
> > .original shows the simplified form or only .gimple?
>
> Yes, you are right, the .original shows the simplified form. Take below code 
> as example:
>
>6   │ #define T uint8_t
>7   │
>8   │ T sat_add_u_1 (T x, T y)
>9   │ {
>   10   │   return (T)(x + y) < x ? -1 : (x + y);
>   11   │ }
>
> We have .original similar as below:
>
>6   │ {
>7   │   return (unsigned char) x + (unsigned char) y | -(uint8_t) 
> ((unsigned char) x + (unsigned char) y < (unsigned char) x);
>8   │ }
>
> > Did you check the simplification applies when writing as
> >
> >  if (x + y < x)
> >return -1;
> >  else
> >return x + y;
> > ?  If it doesn't then removing the match is likely premature.  I think 
> > phiopt
> > should be able to perform the matching here (it does the COND_EXPR
> > building).
>
> The form above doesn't hit the simplification after .gimple but the .phiopt2 
> performs
> the simplify and then we have below gimple after .phiopt2.
>
>   10   │ COND_EXPR in block 2 and PHI in block 4 converted to straightline 
> code.
>   11   │ Merging blocks 2 and 4
>   12   │ fix_loop_structure: fixing up loops for function
>   13   │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
>   14   │ {
>   15   │   unsigned char _1;
>   16   │   _Bool _6;
>   17   │   unsigned char _7;
>   18   │   unsigned char _8;
>   19   │   unsigned char _9;
>   20   │
>   21   │[local count: 1073741824]:
>   22   │   _1 = x_3(D) + y_4(D);
>   23   │   _6 = _1 < x_3(D);
>   24   │   _7 = (unsigned char) _6;
>   25   │   _8 = -_7;
>   26   │   _9 = _1 | _8;
>   27   │   return _9;
>   28   │
>   29   │ }
>
> BTW, given sorts of forms will be simplified to the "cheap" one, do we have 
> some best
> practice to avoid the "cheap" form duplication?  Currently for each 
> simplification we have
> one copy of the "cheap" form. The "switch" in match.pd looks like not 
> applicable here.
> I also have a try similar as below but it will generate invalid tree for some 
> cases.
>
> (simplify (cond (gt @0 (plus:c@2 @0 @1)) integer_minus_onep @2)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1))
> -  (bit_ior @2 (negate (convert (lt @2 @0))))))
> +  { build_cheap_unsigned_int_sat_add (@0, @1, type); }))

That won't work for GIMPLE.

I was originally thinking we either canonicalize all forms to a single
"simple cheap"
form and only match that to .SAT_* later or we generate the
"simple cheap" form
at the point we'd introduce .SAT_* as fallback in a programmatic way
(do code-gen
from the pass and not via match.pd simplify patterns).  The latter
variant is more
complicated as we'd need to duplicate things in the vectorizer and late.  The
former variant indeed needs to duplicate the simplify result expression for each
matched source (but you could use preprocessor macros if it doesn't
get too ugly).

Richard.
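
For reference, the equivalence between the branch form 4 and the "cheap" branchless form discussed in this thread can be checked exhaustively for uint8_t in plain code. This is only a sketch: the helper names are made up for illustration, and it says nothing about how match.pd or phiopt arrive at the result.

```cpp
#include <cassert>
#include <cstdint>

using T = std::uint8_t;

// Branch form 4 from the patch description: (X + Y) < X ? -1 : (X + Y).
static T
sat_add_branch (T x, T y)
{
  return static_cast<T> (x + y) < x ? static_cast<T> (-1)
                                    : static_cast<T> (x + y);
}

// Branchless form: (X + Y) | -((X + Y) < X).
static T
sat_add_branchless (T x, T y)
{
  const T sum = static_cast<T> (x + y);
  // The comparison yields 0 or 1; negating it gives an all-zeros or
  // all-ones mask once truncated back to T.
  return static_cast<T> (sum | -static_cast<T> (sum < x));
}

// Compare both forms over every pair of 8-bit inputs.
bool
sat_add_forms_agree ()
{
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      if (sat_add_branch (static_cast<T> (x), static_cast<T> (y))
          != sat_add_branchless (static_cast<T> (x), static_cast<T> (y)))
        return false;
  return true;
}

void
demo_sat_add ()
{
  assert (sat_add_forms_agree ());
  assert (sat_add_branchless (200, 100) == 255);  // Saturates.
  assert (sat_add_branchless (3, 4) == 7);        // No overflow.
}
```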

>
> +tree
> +build_cheap_unsigned_int_sat_add (tree op_0, tree op_1, tree type)
> +{
> +  /* (bit_ior @2 (negate (convert (lt @2 @0)))) */
> +  return build2 (BIT_IOR_EXPR, type,
> +build2 (PLUS_EXPR, type, op_0, op_1),
> +build1 (NEGATE_EXPR, type,
> +build1 (NOP_EXPR, type,
> +build2 (LT_EXPR, boolean_type_node,
> +build2 (PLUS_EXPR, type, op_0, op_1),
> +op_0))));
> +}
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 29, 2024 9:06 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Andrew 
> Pinski 
> Subject: Re: [PATCH 1/5] Match: Simplify branch form 4 of unsigned SAT_ADD 
> into branchless
>
> On Tue, Oct 29, 2024 at 9:27 AM  wrote:
> >
> > From: Pan Li 
> >
> > There are sorts of forms for the unsigned SAT_ADD.  Some of them are
> > complicated while others are cheap.  This patch would like to simplify
> > the complicated form into the cheap ones.  For example as below:
> >
> > From the form 4 (branch):
> >   SAT_U_ADD = (X + Y) < x ? -1 : (X + Y).
> >
> > To (branchless):
> >   SAT_U_ADD = (X + Y) | - ((X + Y) < X).
> >
> >   #define T uint8_t
> >
> >   T sat_add_u_1 (T x, T y)
> >   {
> > return (T)(x + y) < x ? -1 : (x + y);
> >   }
> >
> > Before this patch:
> >1   │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> >2   │ {
> >3   │   uint8_t D.2809;
> >4   │
> >5   │   _1 = x + y;
> >6   │   if (x <= _1) goto ; else goto ;
> >7   │   :
> >8   │   D.2809 = x + y;
> >9   │   goto ;
> >   10   │   :
> >   11   │   D.2809

*PING* [PATCH 0/7] fortran: Inline MINLOC/MAXLOC with DIM [PR90608]

2024-10-30 Thread Mikael Morin

*PING*

The first series of patches was pushed, the second (and last) one [1][2] 
is awaiting review.


[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665360.html
[2] https://gcc.gnu.org/pipermail/fortran/2024-October/061180.html


Re: [PATCH v8] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-10-30 Thread Simon Martin
Friendly ping. Thanks! Simon

On 16 Oct 2024, at 17:43, Simon Martin wrote:

> Hi Jason,
>
> On 12 Oct 2024, at 4:51, Jason Merrill wrote:
>
>> On 10/11/24 7:02 AM, Simon Martin wrote:
>>> Hi Jason,
>>>
>>> On 11 Oct 2024, at 0:35, Jason Merrill wrote:
>>>
 On 10/7/24 3:35 PM, Simon Martin wrote:
> On 7 Oct 2024, at 18:58, Jason Merrill wrote:
>> On 10/7/24 11:27 AM, Simon Martin wrote:

>>> /* Now give a warning for all base functions without overriders,
>>>as they are hidden.  */
>>> for (tree base_fndecl : base_fndecls)
>>> + {
>>> +   if (!base_fndecl || overriden_base_fndecls.contains
>>> (base_fndecl))
>>> + continue;
>>> +   tree *hider = hidden_base_fndecls.get (base_fndecl);
>>> +   if (hider)
>>
>> How about looping over hidden_base_fndecls instead of base_fndecls?

> Unfortunately it does not work because a given base method can be
> hidden
> by one overload and overriden by another, in which case we don’t
> want
> to warn (see for example AA:foo(int) in Woverloaded-virt7.C). So we
>
> need
> to take both collections into account.

 Yes, you'd still need to check overridden_base_fndecls.contains, but
>
 that doesn't seem any different iterating over hidden_base_fndecls
 instead of base_fndecls.
>>> Sure, and I guess iterating over hidden_base_fndecls is more coherent
>
>>>
>>> with what the warning is about. Changed in the attached updated patch,
>>> successfully tested on x86_64-pc-linux-gnu. OK?
>>
>> OK, thanks.
> As you know the patch had to be reverted due to PR117114, that
> highlighted a bunch of issues with comparing DECL_VINDEXes: it might
> give false positives in case of multiple inheritance (the case in
> PR117114), but also if there’s single inheritance but the hierarchy has
> more than two levels (another issue I found while bootstrapping with
> rust enabled).
>
> The attached updated patch introduces an overrides_p function, based on
> the existing check_final_overrider, and uses it when the signatures match.
>
> It’s been successfully tested on x86_64-pc-linux-gnu, and bootstrap
> works fine with --enable-languages=all (and rust properly configured, so
> included here). OK for trunk?
>
> Thanks, Simon
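
The situation described earlier in this thread, a base method that is hidden by one overload in the derived class but overridden by another, looks roughly like the sketch below. The class and function names are invented for illustration and this is not the Woverloaded-virt7.C testcase; the point is only that A::foo (int) has an overrider, so warning about it being hidden would be wrong.

```cpp
#include <cassert>

struct A
{
  virtual ~A () = default;
  virtual int foo (int) { return 1; }
};

struct B : A
{
  int foo (int) override { return 2; }   // Overrides A::foo (int).
  int foo (const char *) { return 3; }   // Different signature: hides by name.
};

void
demo_hidden_but_overridden ()
{
  B b;
  A &a = b;
  assert (a.foo (7) == 2);    // Virtual dispatch reaches the overrider.
  assert (b.foo (7) == 2);    // Overload resolution picks foo (int).
  assert (b.foo ("x") == 3);  // The unrelated overload is still callable.
}
```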



Re: [PATCH] c++: Relax checking assert about elision to support -fno-elide-constructors [PR114619]

2024-10-30 Thread Simon Martin
On 19 Oct 2024, at 11:09, Simon Martin wrote:

> We currently ICE in checking mode with cxx_dialect < 17 on the 
> following
> valid code
>
> === cut here ===
> struct X {
>   X(const X&) {}
> };
> extern X x;
> void foo () {
>   new X[1]{x};
> }
> === cut here ===
>
> The problem is that cp_gimplify_expr gcc_checking_asserts that a
> TARGET_EXPR is not TARGET_EXPR_ELIDING_P (or cannot be elided), while 
> in
> this case with cxx_dialect < 17, it is TARGET_EXPR_ELIDING_P but we 
> have
> not even tried to elide.
>
> This patch relaxes that gcc_checking_assert to not fail when using
> cxx_dialect < 17 and -fno-elide-constructors (I considered being more
> clever at setting TARGET_EXPR_ELIDING_P appropriately but it looks 
> more
> risky and not worth the extra complexity for a checking assert).
>
> Successfully tested on x86_64-pc-linux-gnu.
Friendly ping. Thanks!

>
>   PR c++/114619
>
> gcc/cp/ChangeLog:
>
>   * cp-gimplify.cc (cp_gimplify_expr): Relax gcc_checking_assert
>   to support the usage of -fno-elide-constructors with c++ < 17.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/init/no-elide3.C: New test.
>
> ---
>  gcc/cp/cp-gimplify.cc |  7 ++-
>  gcc/testsuite/g++.dg/init/no-elide3.C | 11 +++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/init/no-elide3.C
>
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index 003e68f1ea7..354ea73c63b 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -908,7 +908,12 @@ cp_gimplify_expr (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p)
>gimplify_init_ctor_preeval can materialize subobjects of a 
> CONSTRUCTOR
>on the rhs of an assignment, as in constexpr-aggr1.C.  */
>gcc_checking_assert (!TARGET_EXPR_ELIDING_P (*expr_p)
> -|| !TREE_ADDRESSABLE (TREE_TYPE (*expr_p)));
> +|| !TREE_ADDRESSABLE (TREE_TYPE (*expr_p))
> +/* If we explicitly asked not to elide and it's not
> +   required by the standard, we can't expect elision
> +   to have happened.  */
> +|| (cxx_dialect < cxx17
> +&& !flag_elide_constructors));
>ret = GS_UNHANDLED;
>break;
>
> diff --git a/gcc/testsuite/g++.dg/init/no-elide3.C 
> b/gcc/testsuite/g++.dg/init/no-elide3.C
> new file mode 100644
> index 000..9377d9f0161
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/init/no-elide3.C
> @@ -0,0 +1,11 @@
> +// PR c++/114619
> +// { dg-do "compile" { target c++11 } }
> +// { dg-options "-fno-elide-constructors" }
> +
> +struct X {
> +  X(const X&) {}
> +};
> +extern X x;
> +void foo () {
> +  new X[1]{x};
> +}
> -- 
> 2.44.0


[COMMITTED] [wwwdocs] index: GCC developer room at FOSDEM 2025: Call for Participation open

2024-10-30 Thread Marc Poulhiès
---
 htdocs/index.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/index.html b/htdocs/index.html
index b303744c..d20c7348 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -55,6 +55,10 @@ mission statement.
 News
 
 
+https://inbox.sourceware.org/gcc/875xparmhn@kataplop.net/ GCC developer room at FOSDEM 2025: Call for Participation open
+[2023-10-30]
+FOSDEM 2025: Brussels, Belgium, February 1-2 2025
+
 GCC 14.2 released
 [2024-08-01]
 
-- 
2.42.0



[committed] [PATCH] arm: [MVE intrinsics] Remove unused builtins qualifiers

2024-10-30 Thread Christophe Lyon
After the re-implementation of MVE vld/vst intrinsics, a few builtins
qualifiers became useless.

This patch removes them to restore bootstrap (otherwise the build
fails because of 'defined but not used' errors).

gcc/ChangeLog:

* config/arm/arm-builtins.cc (STRS_QUALIFIERS): Delete.
(STRU_QUALIFIERS): Delete.
(STRS_P_QUALIFIERS): Delete.
(STRU_P_QUALIFIERS): Delete.
(LDRS_QUALIFIERS): Delete.
(LDRU_QUALIFIERS): Delete.
(LDRS_Z_QUALIFIERS): Delete.
(LDRU_Z_QUALIFIERS): Delete.
---
 gcc/config/arm/arm-builtins.cc | 41 --
 1 file changed, 41 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 0f16503e92d..6ee1563c02f 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -610,16 +610,6 @@ 
arm_quadop_unone_unone_unone_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_unone_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_strs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_pointer, qualifier_none };
-#define STRS_QUALIFIERS (arm_strs_qualifiers)
-
-static enum arm_type_qualifiers
-arm_stru_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_pointer, qualifier_unsigned };
-#define STRU_QUALIFIERS (arm_stru_qualifiers)
-
 static enum arm_type_qualifiers
 arm_strss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer, qualifier_unsigned,
@@ -643,17 +633,6 @@ arm_strsbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned};
 #define STRSBU_QUALIFIERS (arm_strsbu_qualifiers)
 
-static enum arm_type_qualifiers
-arm_strs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_pointer, qualifier_none, qualifier_predicate};
-#define STRS_P_QUALIFIERS (arm_strs_p_qualifiers)
-
-static enum arm_type_qualifiers
-arm_stru_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_pointer, qualifier_unsigned,
-  qualifier_predicate};
-#define STRU_P_QUALIFIERS (arm_stru_p_qualifiers)
-
 static enum arm_type_qualifiers
 arm_strsu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_pointer, qualifier_unsigned,
@@ -688,16 +667,6 @@ arm_ldrgs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_pointer, qualifier_unsigned};
 #define LDRGS_QUALIFIERS (arm_ldrgs_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ldrs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_pointer};
-#define LDRS_QUALIFIERS (arm_ldrs_qualifiers)
-
-static enum arm_type_qualifiers
-arm_ldru_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_pointer};
-#define LDRU_QUALIFIERS (arm_ldru_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ldrgbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate};
@@ -732,16 +701,6 @@ arm_ldrgu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_predicate};
 #define LDRGU_Z_QUALIFIERS (arm_ldrgu_z_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ldrs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_pointer, qualifier_predicate};
-#define LDRS_Z_QUALIFIERS (arm_ldrs_z_qualifiers)
-
-static enum arm_type_qualifiers
-arm_ldru_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_pointer, qualifier_predicate};
-#define LDRU_Z_QUALIFIERS (arm_ldru_z_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
-- 
2.34.1



Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling aggressiveness

2024-10-30 Thread Richard Sandiford
Vineet Gupta  writes:
> On 10/29/24 11:51, Wilco Dijkstra wrote:
>> Hi Vineet,
>>> I agree the NARROW/WIDE stuff is obfuscating things in technicalities.
>> Is there evidence this change would make things significantly worse for
>> some targets? 
>
> Honestly I don't think this needs to be behind any toggle or made optional at 
> all. The old algorithm was overly eager in spilling. But per last
> discussion with Richard [1] at least back in 2012 for some in-order arm32 
> core this was better. And also that's where the wide vs. narrow discussions
> came up and that it really mattered, as far as I understood.

Right, that's the key.  The current algorithm was tuned on an in-order
core for which GCC already had a relatively accurate pipeline model.
The question is whether this is better on a core like that: that is,
on an in-order core for which GCC has a relatively accurate pipeline model.
No amount of benchmarking on out-of-order cores will answer that.

Somewhat surprisingly, we don't AFAIK have a target hook for "is the
current target out-of-order?".  Why not make the target hook that
instead?  I think everyone agrees (including me in the previous
thread) that the current behaviour isn't right for OoO cores.

If someone has an OoO core that for some reason prefers the current
approach (unlikely), we can decide what to do then.  But in the meantime,
keying off OoO-ness seems simpler and easier to document.

Thanks,
Richard


Re: [PATCH v4 3/7] OpenMP: C front-end support for dispatch + adjust_args

2024-10-30 Thread Paul-Antoine Arras

On 24/10/2024 13:42, Tobias Burnus wrote:

Hi,

some more comments:

Paul-Antoine Arras wrote:

Here is an updated patch following these comments.
 gcc/testsuite/ChangeLog:
 
 * gcc.dg/gomp/adjust-args-1.c: New test.

 * gcc.dg/gomp/dispatch-1.c: New test.


The ChangeLog does not mention libgomp/testsuite/libgomp.c/dispatch-1.c 
and libgomp/testsuite/libgomp.c/dispatch-2.c, which are part of this patch.


But there is reason to move them to 5/7: I think we also need a run test 
for C++ to make sure that it works, i.e. moving them to libgomp.c-c++- 
common/ makes sense, which in turn requires the 4/7 C++ FE patch.


Agreed. That's actually what I did in my local tree but forgot to move 
it to the right commit.



* * *


+/* Parse a function dispatch structured block:
+
+lvalue-expression = target-call ( [expression-list] );
+or
+target-call ( [expression-list] );
+
+   Adapted from c_parser_expr_no_commas.
+*/


Can you expand the description, e.g. like:

Adapted from c_parser_expr_no_commas and
c_parser_postfix_expression (CPP_NAME/C_ID_ID) for the
function name).


Done.


And:


+static tree
+c_parser_omp_dispatch_body (c_parser *parser)
+{

...

+  /* Parse function name.  */
+  if (!c_parser_next_token_is (parser, CPP_NAME))
+{
+  c_parser_error (parser, "expected a function name");
+  rhs.set_error ();
+  return rhs.value;
+}
+  expr_loc = c_parser_peek_token (parser)->location;
+  tree id = c_parser_peek_token (parser)->value;
+  c_parser_consume_token (parser);
+  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+return error_mark_node;
+
+  rhs.value = build_external_ref (expr_loc, id, true, &rhs.original_type);
+  set_c_expr_source_range (&rhs, tok_range);
+  /* Parse argument list.  */


Wouldn't it be way easier and future proof (less code duplication)
to just do the following:

Note:
* rhs.set_error() / return rhs.value; → error_mark_node;
* 'c_parser_require (parser, CPP_OPEN_PAREN' → c_parser_next_token_is
* And then checking that the returned code is the expected CALL.

   /* Parse function name.  */
   if (!c_parser_next_token_is (parser, CPP_NAME))
 {
   c_parser_error (parser, "expected a function name");
   return error_mark_node;
 }
   expr_loc = c_parser_peek_token (parser)->location;
   tree id = c_parser_peek_token (parser)->value;
   c_parser_consume_token (parser);
   if (!c_parser_next_token_is (parser, CPP_OPEN_PAREN))
 {
   c_parser_error (parser, "expected a function name");
   return error_mark_node;
 }

   rhs.value = build_external_ref (expr_loc, id, true, &rhs.original_type);
   set_c_expr_source_range (&rhs, tok_range);
   
   /* Parse argument list.  */

   rhs = c_parser_postfix_expression_after_primary
 (parser, EXPR_LOC_OR_LOC (rhs.value, expr_loc), rhs);
   if (TREE_CODE (rhs.value) != CALL_EXPR)
 {
   error_at (EXPR_LOC_OR_LOC (rhs.value, expr_loc),
 "expected target-function call");
   return error_mark_node;
 }


Yes, much better indeed!


* * *

Testing it shows then the following error:

dispatch3.c:11:13: error: expected target-function call
11 |x = bar()[0];
   |~^~~

which seems to be reasonable.

* * *


  #define OMP_DECLARE_SIMD_CLAUSE_MASK  \
@@ -25242,77 +25512,223 @@ c_finish_omp_declare_variant (c_parser *parser, tree 
fndecl, tree parms)


[...]

The old code did:

  tree ctx = c_parser_omp_context_selector_specification (parser, parms);
  if (ctx == error_mark_node)
    goto fail;
  ctx = omp_check_context_selector (match_loc, ctx);
  if (ctx != error_mark_node && variant != error_mark_node)
    {
      if (TREE_CODE (variant) != FUNCTION_DECL)


i.e. it effectively finished parsing (except for the ')'), then checked 
the context selector and, finally, the variant declaration. The new code 
is similar, except that the variant declaration check is inside the 
'match(…)' check, and is missing when processing 'adjust_args'. 
Either all parsing needs to happen before this, or the handling needs 
to be put next to the relevant item.

NOTE: You need to ensure that the location is still okay if you move 
the checking after parsing.

* * *


Added a check for error_mark after parsing, just before the list loop.

As mentioned in the older 2/7 review, this will fail (ICE) if the 
variant has issues (e.g. variant function not declared) as the 
adjust_args code assumes that it is valid. → testcase in that email 
(+ some off-list communication)

* * *

As mentioned off list, the following fails as well:


void variant_fn();  // Assume C < C23; in C++/C23 it becomes the same as 
'…(void)'.

#pragma omp declare variant(variant_fn) match(construct={dispatch}) 
adjust_args(need_device_ptr: x,y)
void bar(int *x, int *y);

void sub(int *x, int *y)
{
   #pragma omp dispatch is_device_ptr(x)
  bar(x, y);
}


Issue: the host-to-device pointer conversion is lacking for 'varia

Re: [PATCH] Fortran: fix several front-end memleaks

2024-10-30 Thread Harald Anlauf

On 29.10.24 at 23:06, Jerry D wrote:

On 10/29/24 2:00 PM, Harald Anlauf wrote:

Dear all,

while looking at the recent testcase gfortran.dg/pr115070.f90 with f951
running under valgrind, I noticed minor front-end memleaks of gfc_expr's
that are probably fallout from a code refactoring, which are fixed by
the attached.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald



Yes OK for mainline.


Pushed as r15-4774-gb8291710e3a6d9.

Thanks,
Harald


Thanks,

Jerry






Re: [PATCH] genmatch: Fix build on hppa64-hpux [PR117348]

2024-10-30 Thread Richard Biener
On Wed, 30 Oct 2024, Jakub Jelinek wrote:

> Hi!
> 
> Apparently autoconf defines the HAVE_DECL_* macros to 0
> rather than not defining them at all, so defined(HAVE_DECL_FMEMOPEN)
> test doesn't do much.
> 
> The following patch fixes it by testing HAVE_DECL_FMEMOPEN
> for being non-zero instead.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-10-30  Jakub Jelinek  
> 
>   PR middle-end/117348
>   * genmatch.cc: Replace defined(HAVE_DECL_FMEMOPEN)
>   test with HAVE_DECL_FMEMOPEN.
> 
> --- gcc/genmatch.cc.jj   2024-10-27 16:44:30.792004967 +0100
> +++ gcc/genmatch.cc   2024-10-29 18:04:52.888907269 +0100
> @@ -585,7 +585,7 @@ diag_vfprintf (FILE *f, int err_no, cons
>fprintf (f, "%s", q);
>  }
>  
> -#if defined(GENMATCH_SELFTESTS) && defined(HAVE_DECL_FMEMOPEN)
> +#if defined(GENMATCH_SELFTESTS) && HAVE_DECL_FMEMOPEN
>  #pragma GCC diagnostic push
>  #pragma GCC diagnostic ignored "-Wsuggest-attribute=format"
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
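
The difference between the two preprocessor tests in this fix can be seen in isolation. In the sketch below HAVE_DECL_EXAMPLE is a made-up macro standing in for one that a configure script has defined to 0 because the declaration is missing; it is not a real autoconf result.

```cpp
// Hypothetical stand-in for a HAVE_DECL_* macro defined to 0 by configure.
#define HAVE_DECL_EXAMPLE 0

// Old-style test: true whenever the macro exists, whatever its value.
#if defined(HAVE_DECL_EXAMPLE)
constexpr bool defined_test = true;
#else
constexpr bool defined_test = false;
#endif

// New-style test: true only when the macro expands to a non-zero value.
#if HAVE_DECL_EXAMPLE
constexpr bool value_test = true;
#else
constexpr bool value_test = false;
#endif

static_assert (defined_test, "defined() sees the macro even though it is 0");
static_assert (!value_test, "testing the value correctly reports 'missing'");
```

With the defined() form the guarded code is compiled even on a host that lacks the declaration, which is the kind of build failure the patch addresses.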


Re: [PATCH 2/2] Add a new permute optimization step in SLP

2024-10-30 Thread Christoph Müllner
On Fri, Oct 18, 2024 at 1:08 PM Richard Biener  wrote:
>
> On Fri, 18 Oct 2024, Tamar Christina wrote:
>
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, October 18, 2024 11:03 AM
> > > To: Tamar Christina 
> > > Cc: Christoph Müllner ; 
> > > gcc-patches@gcc.gnu.org;
> > > Philipp Tomsich ; Jeff Law 
> > > ;
> > > Robin Dapp 
> > > Subject: RE: [PATCH 2/2] Add a new permute optimization step in SLP
> > >
> > > On Thu, 17 Oct 2024, Tamar Christina wrote:
> > >
> > > > Hi Christoph,
> > > >
> > > > > -Original Message-
> > > > > From: Christoph Müllner 
> > > > > Sent: Tuesday, October 15, 2024 3:57 PM
> > > > > To: gcc-patches@gcc.gnu.org; Philipp Tomsich 
> > > > > ;
> > > Tamar
> > > > > Christina ; Richard Biener 
> > > > > 
> > > > > Cc: Jeff Law ; Robin Dapp
> > > ;
> > > > > Christoph Müllner 
> > > > > Subject: [PATCH 2/2] Add a new permute optimization step in SLP
> > > > >
> > > > > > This commit adds a new permute optimization step after running SLP
> > > > > > vectorization.
> > > > > > Although there are existing places where individual or nested permutes
> > > > > > can be optimized, there are cases where independent permutes can be
> > > > > > optimized, which cannot be expressed in the current pattern matching
> > > > > > framework.
> > > > > > The optimization step is run at the end so that permutes from
> > > > > > completely different SLP builds can be optimized.
> > > > > >
> > > > > > The initial optimizations implemented can detect some cases where
> > > > > > different "select permutes" (permutes that only use some of the
> > > > > > incoming vector lanes) can be co-located in a single permute. This can
> > > > > > optimize some cases where two_operator SLP nodes have duplicate
> > > > > > elements.
> > > > >
> > > > > Bootstrapped and reg-tested on AArch64 (C, C++, Fortran).
> > > > >
> > > > > Manolis Tsamis was the patch's initial author before I took it over.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > > * tree-vect-slp.cc (get_tree_def): Return the definition of a name.
> > > > > (recognise_perm_binop_perm_pattern): Helper function.
> > > > > (vect_slp_optimize_permutes): New permute optimization step.
> > > > > (vect_slp_function): Run the new permute optimization step.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.dg/vect/slp-perm-14.c: New test.
> > > > > * gcc.target/aarch64/sve/slp-perm-14.c: New test.
> > > > >
> > > > > Signed-off-by: Christoph Müllner 
> > > > > ---
> > > > >  gcc/testsuite/gcc.dg/vect/slp-perm-14.c   |  42 +++
> > > > >  .../gcc.target/aarch64/sve/slp-perm-14.c  |   3 +
> > > > >  gcc/tree-vect-slp.cc  | 248 
> > > > > ++
> > > > >  3 files changed, 293 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-perm-14.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp-perm-14.c
> > > > >
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
> > > > > b/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
> > > > > new file mode 100644
> > > > > index 000..f56e3982a62
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-14.c
> > > > > @@ -0,0 +1,42 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-additional-options "-O3 -fdump-tree-slp1-details" } */
> > > > > +
> > > > > > +#include <stdint.h>
> > > > > +
> > > > > +#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
> > > > > +int t0 = s0 + s1;\
> > > > > +int t1 = s0 - s1;\
> > > > > +int t2 = s2 + s3;\
> > > > > +int t3 = s2 - s3;\
> > > > > +d0 = t0 + t2;\
> > > > > +d1 = t1 + t3;\
> > > > > +d2 = t0 - t2;\
> > > > > +d3 = t1 - t3;\
> > > > > +}
> > > > > +
> > > > > +int
> > > > > > +x264_pixel_satd_8x4_simplified (uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2)
> > > > > +{
> > > > > +  uint32_t tmp[4][4];
> > > > > +  uint32_t a0, a1, a2, a3;
> > > > > +  int sum = 0;
> > > > > +
> > > > > +  for (int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2)
> > > > > +{
> > > > > +  a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
> > > > > +  a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
> > > > > +  a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
> > > > > +  a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
> > > > > > +  HADAMARD4(tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0, a1, a2, a3);
> > > > > +}
> > > > > +
> > > > > +  for (int i = 0; i < 4; i++)
> > > > > +{
> > > > > > +  HADAMARD4(a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
> > > > > +  sum += a0 + a1 + a2 + a3;
> > > > > +}
> > > > > +
> > > > > +  return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
> > > > > +}
> > > > > +
> > > > > > +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "slp1" } } */
> > >

Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling agressiveness

2024-10-30 Thread Jeff Law




On 10/30/24 4:05 AM, Richard Sandiford wrote:

Vineet Gupta  writes:

On 10/29/24 11:51, Wilco Dijkstra wrote:

Hi Vineet,

I agree the NARROW/WIDE stuff is obfuscating things in technicalities.

Is there evidence this change would make things significantly worse for
some targets?


Honestly I don't think this needs to be behind any toggle or made optional at 
all. The old algorithm was overly eager in spilling. But per last
discussion with Richard [1] at least back in 2012 for some in-order arm32 core 
this was better. And also that's where the wide vs. narrow discussions
came up and that it really mattered, as far as I understood.


Right, that's the key.  The current algorithm was tuned on an in-order
core for which GCC already had a relatively accurate pipeline model.
The question is whether this is better on a core like that: that is,
on an in-order core for which GCC has a relatively accurate pipeline model.
No amount of benchmarking on out-of-order cores will answer that.

Somewhat surprisingly, we don't AFAIK have a target hook for "is the
current target out-of-order?".  Why not make the target hook that
instead?  I think everyone agrees (including me in the previous
thread) that the current behaviour isn't right for OoO cores.

If someone has an OoO core that for some reason prefers the current
approach (unlikely), we can decide what to do then.  But in the meantime,
keying off OoO-ness seems simpler and easier to document.

But the data from the BPI (spacemit k1 chip) comes from an in-order core.
Granted, we don't have a good model of its pipeline, but it's definitely
in-order.


jeff



Re: [PATCH v2] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-10-30 Thread Patrick Palka
On Tue, 29 Oct 2024, Marek Polacek wrote:

> On Tue, Oct 22, 2024 at 07:42:57PM -0400, Jason Merrill wrote:
> > On 10/22/24 3:22 PM, Marek Polacek wrote:
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > -- >8 --
> > > This patch implements C++26 Pack Indexing, as described in
> > > .
> > 
> > Great!
> > 
> > > The issue discussing how to mangle pack indexes has not been resolved
> > > yet  and I've
> > > made no attempt to address it so far.
> > > 
> > > Rather than introducing a new template code for a pack indexing, I'm
> > > adding a new operand to EXPR_PACK_EXPANSION to store the index; for
> > > TYPE_PACK_EXPANSION, I'm stashing the index into TYPE_VALUES_RAW.  This
> > > feature is akin to __type_pack_element, so they can share the element
> > > extraction part.
> 
> In v2, I'm using two new codes, as discussed elsewhere.
> 
> > > A pack indexing in a decltype proved to be a bit tricky; eventually,
> > > I've added PACK_EXPANSION_PARENTHESIZED_P -- while parsing, we can't
> > > really tell what it's going to expand to.
> > 
> > As I comment below, I think we should have enough information while parsing;
> > what it expands to doesn't matter.
> 
> Yup; I must not have realized that pack-index-expression is a product of
> id-expression.
>  
> > > With this feature, it's valid to write something like
> > > 
> > >using U = tmpl;
> > > 
> > > where we first expand the template argument into
> > > 
> > >Ts...[Is#0], Ts...[Is#1], ...
> > > 
> > > and then substitute each individual pack index.
> > > 
> > > I have no test for the module.cc code, that is just guesswork.
> > 
> > Looks straightforward enough.
> 
> It was.  I made sure with an assert that the new code is exercised.
>  
> > > @@ -2605,6 +2605,8 @@ write_type (tree type)
> > >  case TYPE_PACK_EXPANSION:
> > >write_string ("Dp");
> > >write_type (PACK_EXPANSION_PATTERN (type));
> > > +   /* TODO: Mangle PACK_EXPANSION_INDEX
> > > +    */
> > 
> > Could we warn about this so it doesn't get forgotten?  And similarly in
> > write_expression?
> 
> There is now a new sorry.
>  
> > > @@ -3952,7 +3953,11 @@ find_parameter_packs_r (tree *tp, int 
> > > *walk_subtrees, void* data)
> > > break;
> > >   case VAR_DECL:
> > > -  if (DECL_PACK_P (t))
> > > +  /* We can have
> > > +T...[0] a;
> > > +(T...[0])(a); // #1
> > > +  where the 'a' in #1 is not a bare parameter pack.  */
> > > +  if (DECL_PACK_P (t) && !PACK_EXPANSION_INDEX (TREE_TYPE (t)))
> > 
> > Seems like the INDEX check should move into DECL_PACK_P?
> > 
> > Why doesn't this apply to PARM_DECL above?
> 
> I think this is now moot.
>  
> > > @@ -13946,6 +13969,10 @@ tsubst_pack_expansion (tree t, tree args, 
> > > tsubst_flags_t complain,
> > > && PACK_EXPANSION_P (TREE_VEC_ELT (result, 0)))
> > >   return TREE_VEC_ELT (result, 0);
> > > +  /* C++26 Pack Indexing.  */
> > > +  if (index)
> > > +return pack_index_element (index, result, complain);
> > 
> > Could we only compute the desired element rather than computing all of them
> > and selecting the desired one?
> 
> I don't think so.  Especially now that the PACK_EXPANSION_P is just one
> operand of a PACK_INDEX_*, and tsubst_pack_expansion is agnostic about
> whether the expansion is part of a pack index.
>  
> > > @@ -16897,17 +16924,23 @@ tsubst (tree t, tree args, tsubst_flags_t 
> > > complain, tree in_decl)
> > >   ctx = tsubst_pack_expansion (ctx, args,
> > >complain | tf_qualifying_scope,
> > >in_decl);
> > > - if (ctx == error_mark_node
> > > - || TREE_VEC_LENGTH (ctx) > 1)
> > > + if (ctx == error_mark_node)
> > > return error_mark_node;
> > > - if (TREE_VEC_LENGTH (ctx) == 0)
> > > + /* If there was a pack-index-specifier, we won't get a TREE_VEC,
> > > +just the single element.  */
> > > + if (TREE_CODE (ctx) == TREE_VEC)
> > > {
> > > - if (complain & tf_error)
> > > -   error ("%qD is instantiated for an empty pack",
> > > -  TYPENAME_TYPE_FULLNAME (t));
> > > - return error_mark_node;
> > > + if (TREE_VEC_LENGTH (ctx) > 1)
> > > +   return error_mark_node;
> > 
> > This is preexisting, but it seems like we're missing a call to error() in
> > this case.
> 
> Added.
>  
> > > @@ -17041,13 +17074,20 @@ tsubst (tree t, tree args, tsubst_flags_t 
> > > complain, tree in_decl)
> > >   else
> > > {
> > >   bool id = DECLTYPE_TYPE_ID_EXPR_OR_MEMBER_ACCESS_P (t);
> > > - if (id && TREE_CODE (DECLTYPE_TYPE_EXPR (t)) == BIT_NOT_EXPR
> > > - && EXPR_P (type))
> > > + tree op = DECLTYPE_TYPE_EXPR (t)

RE: [PATCH 1/5] Match: Simplify branch form 4 of unsigned SAT_ADD into branchless

2024-10-30 Thread Li, Pan2
Thanks Richard for comments.

> The
> former variant indeed needs to duplicate the simplify result expression for 
> each
> matched source (but you could use preprocessor macros if it doesn't
> get too ugly).

Got it, will have a try similar to #define cheap_sat_u_add (op_0, op_1) ... in 
v2.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, October 30, 2024 5:23 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Andrew Pinski 

Subject: Re: [PATCH 1/5] Match: Simplify branch form 4 of unsigned SAT_ADD into 
branchless

On Wed, Oct 30, 2024 at 3:09 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > You are testing GENERIC folding, so gcc.dg/ is a better location, not 
> > tree-ssa/
>
> Sure, will move all test files to there.
>
> > I wonder if the simplification is already applied by the frontend and thus
> > .original shows the simplified form or only .gimple?
>
> Yes, you are right, the .original dump shows the simplified form. Take the
> code below as an example:
>
> #define T uint8_t
>
> T sat_add_u_1 (T x, T y)
> {
>   return (T)(x + y) < x ? -1 : (x + y);
> }
>
> The .original dump looks similar to this:
>
> {
>   return (unsigned char) x + (unsigned char) y | -(uint8_t) ((unsigned char) x + (unsigned char) y < (unsigned char) x);
> }
>
> > Did you check the simplification applies when writing as
> >
> >  if (x + y < x)
> >return -1;
> >  else
> >return x + y;
> > ?  If it doesn't then removing the match is likely premature.  I think 
> > phiopt
> > should be able to perform the matching here (it does the COND_EXPR
> > building).
>
> The form above doesn't hit the simplification in the .gimple dump, but phiopt2
> performs the simplification, and we get the GIMPLE below after .phiopt2.
>
> COND_EXPR in block 2 and PHI in block 4 converted to straightline code.
> Merging blocks 2 and 4
> fix_loop_structure: fixing up loops for function
> uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   _Bool _6;
>   unsigned char _7;
>   unsigned char _8;
>   unsigned char _9;
>
>   [local count: 1073741824]:
>   _1 = x_3(D) + y_4(D);
>   _6 = _1 < x_3(D);
>   _7 = (unsigned char) _6;
>   _8 = -_7;
>   _9 = _1 | _8;
>   return _9;
>
> }
>
> BTW, given that several forms will be simplified to the "cheap" one, is there
> a best practice for avoiding duplication of the "cheap" form?  Currently each
> simplification carries its own copy of the "cheap" form. The "switch" in
> match.pd does not look applicable here.
> I also tried something like the below, but it generates an invalid tree in
> some cases.
>
> (simplify (cond (gt @0 (plus:c@2 @0 @1)) integer_minus_onep @2)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1))
> -  (bit_ior @2 (negate (convert (lt @2 @0))))))
> +  { build_cheap_unsigned_int_sat_add (@0, @1, type); }))

That won't work for GIMPLE.

I was originally thinking we either canonicalize all forms to a single
"simple cheap" form and only match that to .SAT_* later, or we generate the
"simple cheap" form at the point we'd introduce .SAT_* as a fallback in a
programmatic way (do code-gen from the pass and not via match.pd simplify
patterns).  The latter variant is more complicated as we'd need to duplicate
things in the vectorizer and late.  The former variant indeed needs to
duplicate the simplify result expression for each matched source (but you
could use preprocessor macros if it doesn't get too ugly).

Richard.
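
For readers following the thread, here is a minimal C sketch of the equivalence under discussion. It assumes 8-bit unsigned operands as in the quoted sat_add_u_1 example; the function names are hypothetical and exist only for this illustration, and the explicit (uint8_t) casts are added so that the comparison is done in the narrow type rather than after integer promotion.

```c
#include <assert.h>
#include <stdint.h>

/* Branch form written with the conditional operator.  */
uint8_t
sat_add_cond (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) < x ? (uint8_t) -1 : (uint8_t) (x + y);
}

/* Branch form written as if/else.  */
uint8_t
sat_add_if_else (uint8_t x, uint8_t y)
{
  if ((uint8_t) (x + y) < x)
    return (uint8_t) -1;
  else
    return (uint8_t) (x + y);
}

/* "Cheap" branchless form: sum | -(sum < x).  */
uint8_t
sat_add_branchless (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | -(uint8_t) (sum < x));
}

/* Exhaustively compare the three forms over all 8-bit inputs.  */
void
check_sat_add_forms (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t expected = (x + y > UINT8_MAX) ? UINT8_MAX : (uint8_t) (x + y);
        assert (sat_add_cond (x, y) == expected);
        assert (sat_add_if_else (x, y) == expected);
        assert (sat_add_branchless (x, y) == expected);
      }
}
```

Calling check_sat_add_forms () from a test driver exercises all 65536 input pairs. This only checks the source-level equivalence; it says nothing about which pass performs the rewrite.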

>
> +tree
> +build_cheap_unsigned_int_sat_add (tree op_0, tree op_1, tree type)
> +{
> +  /* (bit_ior @2 (negate (convert (lt @2 @0)) */
> +  return build2 (BIT_IOR_EXPR, type,
> +build2 (PLUS_EXPR, type, op_0, op_1),
> +build1 (NEGATE_EXPR, type,
> +build1 (NOP_EXPR, type,
> +build2 (LT_EXPR, boolean_type_node,
> +build2 (PLUS_EXPR, type, op_0, op_1),
> +op_0))));
> +}
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 29, 2024 9:06 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Andrew 
> Pinski 
> Subject: Re: [PATCH 1/5] Match: Simplify branch form 4 of unsigned SAT_ADD 
> into branchless
>
> On Tue, Oct 29, 2024 at 9:27 AM  wrote:
> >
> > From: Pan Li 
> >
> > There are sorts of forms for the unsigned SAT_ADD.  Some of them are
> > complicated while others are cheap.  This patch would like to simplify
> > the co

Re: [PATCH] c: Diagnose char argument to __builtin_stdc_*

2024-10-30 Thread Joseph Myers
On Wed, 30 Oct 2024, Jakub Jelinek wrote:

> 2024-10-30  Jakub Jelinek  
> 
> gcc/c/
>   * c-parser.cc (c_parser_postfix_expression): Diagnose if
>   first __builtin_stdc_* argument has char type even when
>   -funsigned-char.
> gcc/testsuite/
>   * gcc.dg/builtin-stdc-bit-3.c: New test.
>   * gcc.dg/builtin-stdc-rotate-3.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] Allow BB vectorisation of scalar loop when ifcvt versioned loop is not vectorized

2024-10-30 Thread Richard Biener
On Wed, Oct 30, 2024 at 8:47 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> > On 29 Oct 2024, at 8:33 pm, Richard Biener  
> > wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Oct 29, 2024 at 9:24 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >> Thanks for the review.
> >>
> >>> On 28 Oct 2024, at 9:18 pm, Richard Biener  
> >>> wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Mon, Oct 28, 2024 at 9:35 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Hi,
> 
>  When ifcvt versions a loop, it sets dont_vectorize on the scalar loop. If
>  the vector loop is not vectorized and is removed, the scalar loop is still
>  left with dont_vectorize set. As a result, BB vectorization will not happen.
> 
>  This patch adds a new attribute called dont_loop_vectorize (distinct from
>  the general dont_vectorize) specifically for versioned loops. BB
>  vectorization does not need to honour this and can still vectorize.
> 
>  Bootstrapped and regression tested on aarch64-linux-gnu with no new
>  regressions.
> 
>  Is this OK?
> >>>
> >>> I believe if-conversion never versions a loop that has
> >>> ->dont_vectorize set so when
> >>> the vectorizer elides the .LOOP_VECTORIZED test it can simply clear
> >>> the flag again.
> >>>
> >>> I don't like adding a new flag, instead if the above doesn't work, the
> >>> vectorizer
> >>> should be changed how it identifies loop candidates, not relying on this 
> >>> flag.
> >>
> >> Here is a version where I am resetting dont_vectorize while folding
> >> fold_loop_internal_call with false.
> >>
> >> Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK?
> >
> > Hmm, I'm not sure this way is super reliable given taken_edge->dest
> > could be a preheader block.  I see we have many calls to
> > fold_loop_internal_call - one even looks likely to break with your patch:
> >
> >  /* If we are going to vectorize outer loop, prevent vectorization
> > of the inner loop in the scalar loop - either the scalar loop is
> > thrown away, so it is a wasted work, or is used only for
> > a few iterations.  */
> >  if (scalar_loop->inner)
> >{
> >  gimple *g = vect_loop_vectorized_call (scalar_loop->inner);
> >  if (g)
> >{
> >  arg = gimple_call_arg (g, 0);
> >  get_loop (fun, tree_to_shwi (arg))->dont_vectorize = true;
> >  fold_loop_internal_call (g, boolean_false_node);
> >
> > does it work if you just do
> >
> > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > index af112f212fe..16fa0ec1bb7 100644
> > --- a/gcc/tree-vectorizer.cc
> > +++ b/gcc/tree-vectorizer.cc
> > @@ -1326,6 +1326,7 @@ pass_vectorize::execute (function *fun)
> >if (g)
> >  {
> >fold_loop_internal_call (g, boolean_false_node);
> > +   loop->dont_vectorize = false;
> >ret |= TODO_cleanup_cfg;
> >g = NULL;
> >  }
> > @@ -1335,6 +1336,7 @@ pass_vectorize::execute (function *fun)
> >if (g)
> >  {
> >fold_loop_internal_call (g, boolean_false_node);
> > +   loop->dont_vectorize = false;
> >ret |= TODO_cleanup_cfg;
> >  }
> >  }
> >
> > ?
>
> Yes, this works. Bootstrapped and regression tested on aarch64-linux-gnu. Is 
> this OK?

OK.

Thanks,
Richard.

> Thanks,
> Kugan
>
>
>
> >
> >> Thanks,
> >> Kugan
> >>
> >>
> >>>
> >>> Richard.
> >>>
>  Thanks,
>  Kugan
>
>


Ping^3 [PATCH 0/2] Prime path coverage to gcc/gcov

2024-10-30 Thread Jørgen Kvalsvik

Ping.

On 10/21/24 15:21, Jørgen Kvalsvik wrote:

Ping.

On 10/10/24 10:08, Jørgen Kvalsvik wrote:

Ping.

On 10/3/24 12:46, Jørgen Kvalsvik wrote:

This is both a ping and a minor update. A few of the patches from the
previous set have been merged, but the big feature still needs review.

Since then it has been quiet, but there are two notable changes:

1. The --prime-paths-{lines,source} flags take an optional argument to
    print covered or uncovered paths, or both. By default, uncovered
    paths are printed like before.
2. Fixed a bad vector access when independent functions share compiler
    generated statements. A reproducing case is in gcov-23.C which
    relied on printing the uncovered path of multiple destructors of
    static objects.

Jørgen Kvalsvik (2):
   gcov: branch, conds, calls in function summaries
   Add prime path coverage to gcc/gcov

  gcc/Makefile.in    |    6 +-
  gcc/builtins.cc    |    2 +-
  gcc/collect2.cc    |    5 +-
  gcc/common.opt |   16 +
  gcc/doc/gcov.texi  |  184 +++
  gcc/doc/invoke.texi    |   36 +
  gcc/gcc.cc |    4 +-
  gcc/gcov-counter.def   |    3 +
  gcc/gcov-io.h  |    3 +
  gcc/gcov.cc    |  531 ++-
  gcc/ipa-inline.cc  |    2 +-
  gcc/passes.cc  |    4 +-
  gcc/path-coverage.cc   |  782 +
  gcc/prime-paths.cc | 2031 
  gcc/profile.cc |    6 +-
  gcc/selftest-run-tests.cc  |    1 +
  gcc/selftest.h |    1 +
  gcc/testsuite/g++.dg/gcov/gcov-22.C    |  170 ++
  gcc/testsuite/g++.dg/gcov/gcov-23-1.h  |    9 +
  gcc/testsuite/g++.dg/gcov/gcov-23-2.h  |    9 +
  gcc/testsuite/g++.dg/gcov/gcov-23.C    |   30 +
  gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-31.c |   35 +
  gcc/testsuite/gcc.misc-tests/gcov-32.c |   24 +
  gcc/testsuite/gcc.misc-tests/gcov-33.c |   27 +
  gcc/testsuite/gcc.misc-tests/gcov-34.c |   29 +
  gcc/testsuite/lib/gcov.exp |  118 +-
  gcc/tree-profile.cc    |   11 +-
  29 files changed, 5795 insertions(+), 22 deletions(-)
  create mode 100644 gcc/path-coverage.cc
  create mode 100644 gcc/prime-paths.cc
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-1.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-2.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23.C
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-31.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-32.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-33.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-34.c









[pushed] c++, contracts: Only check contracts attributes [PR116607].

2024-10-30 Thread Iain Sandoe
Tested on x86_64-darwin, powerpc64-linux, pushed to trunk as 
trivial/obvious, thanks.
Iain

--- 8< ---

The ICE described in the PR is caused by not filtering out non-
contract attributes before making the has_active_contract_condition
test.  Fixed, as suggested by Andrew Pinski, by just using the
existing CONTRACT_CHAIN () macro to advance through the list.

PR c++/116607

gcc/cp/ChangeLog:

* contracts.cc (has_active_contract_condition): Use the
CONTRACT_CHAIN macro to advance through the attribute list.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr116607.C: New test.

Signed-off-by: Iain Sandoe 
---
 gcc/cp/contracts.cc   | 2 +-
 gcc/testsuite/g++.dg/contracts/pr116607.C | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/contracts/pr116607.C

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 2a55b87bd03..113469b49f7 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -1494,7 +1494,7 @@ contract_active_p (tree contract)
 static bool
 has_active_contract_condition (tree d, tree_code c)
 {
-  for (tree as = DECL_CONTRACTS (d) ; as != NULL_TREE; as = TREE_CHAIN (as))
+  for (tree as = DECL_CONTRACTS (d) ; as != NULL_TREE; as = CONTRACT_CHAIN (as))
 {
   tree contract = TREE_VALUE (TREE_VALUE (as));
   if (TREE_CODE (contract) == c && contract_active_p (contract))
diff --git a/gcc/testsuite/g++.dg/contracts/pr116607.C b/gcc/testsuite/g++.dg/contracts/pr116607.C
new file mode 100644
index 000..726a5bcf646
--- /dev/null
+++ b/gcc/testsuite/g++.dg/contracts/pr116607.C
@@ -0,0 +1,6 @@
+// { dg-options "-std=c++20 -fcontracts " }
+struct a {
+  __attribute__((no_sanitize("")))
+  int f(int) [[pre:true]];
+};
+int a::f(int) { return 0; }
\ No newline at end of file
-- 
2.39.2 (Apple Git-143)



[PATCH v4] [aarch64] Fix function multiversioning dispatcher link error with LTO

2024-10-30 Thread Yangyu Chen
We forgot to apply DECL_EXTERNAL to the __init_cpu_features_resolver decl. When
building with LTO, the linker cannot find the
__init_cpu_features_resolver.lto_priv* symbol, causing a link error.

This patch fixes that by setting DECL_EXTERNAL on the decl. To avoid a "used
but never defined" warning for this symbol, we also set TREE_PUBLIC on the
decl and give it hidden visibility. The __aarch64_cpu_features identifier is
fixed in the same way.

Minimal steps to reproduce the bug:

echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
echo 'void func1();void func2();int main(){func1();func2();return 0;}' > main.c
gcc -flto -c 1.c 2.c
gcc -flto main.c 1.o 2.o

Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")
Signed-off-by: Yangyu Chen 

gcc/ChangeLog:

* config/aarch64/aarch64.cc (dispatch_function_versions): Add
DECL_EXTERNAL, TREE_PUBLIC and hidden DECL_VISIBILITY to
__init_cpu_features_resolver and __aarch64_cpu_features.
---
 gcc/config/aarch64/aarch64.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5770491b30c..2b2d5b9e390 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20437,6 +20437,10 @@ dispatch_function_versions (tree dispatch_decl,
   tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
   tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
  init_fn_id, init_fn_type);
+  DECL_EXTERNAL (init_fn_decl) = 1;
+  TREE_PUBLIC (init_fn_decl) = 1;
+  DECL_VISIBILITY (init_fn_decl) = VISIBILITY_HIDDEN;
+  DECL_VISIBILITY_SPECIFIED (init_fn_decl) = 1;
   tree arg1 = DECL_ARGUMENTS (dispatch_decl);
   tree arg2 = TREE_CHAIN (arg1);
   ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
@@ -20456,6 +20460,9 @@ dispatch_function_versions (tree dispatch_decl,
get_identifier ("__aarch64_cpu_features"),
global_type);
   DECL_EXTERNAL (global_var) = 1;
+  TREE_PUBLIC (global_var) = 1;
+  DECL_VISIBILITY (global_var) = VISIBILITY_HIDDEN;
+  DECL_VISIBILITY_SPECIFIED (global_var) = 1;
   tree mask_var = create_tmp_var (long_long_unsigned_type_node);
 
   tree component_expr = build3 (COMPONENT_REF, long_long_unsigned_type_node,
-- 
2.47.0



[PATCH v2] phi-opt: Add missed optimization for "(cond | (a != b)) ? b : a"

2024-10-30 Thread Jovan Vukic
Thanks for the feedback on the first version of the patch. Accordingly:

I have corrected the code formatting as requested. I added new tests to
the existing file phi-opt-11.c, instead of creating a new one.

I performed testing before and after applying the patch on the x86
architecture, and I confirm that there are no new regressions.

The logic and general code of the patch itself have not been changed.

> So the A EQ/NE B expression, we can reverse A and B in the expression
> and still get the same result. But don't we have to be more careful for
> the TRUE/FALSE arms of the ternary? For BIT_AND we need ? a : b for
> BIT_IOR we need ? b : a.
>
> I don't see that gets verified in the existing code or after your
> change. I suspect I'm just missing something here. Can you clarify how
> we verify that BIT_AND gets ? a : b for the true/false arms and that
> BIT_IOR gets ? b : a for the true/false arms?

I did not communicate this clearly last time, but the existing optimization
simplifies the expression "(cond & (a == b)) ? a : b" to the simpler "b".
Similarly, the expression "(cond & (a == b)) ? b : a" simplifies to "a".

Thus, the existing and my optimization perform the following
simplifications:

(cond & (a == b)) ? a : b -> b
(cond & (a == b)) ? b : a -> a
(cond | (a != b)) ? a : b -> a
(cond | (a != b)) ? b : a -> b

For this reason, for BIT_AND_EXPR when we have A EQ B, it is sufficient to
confirm that one operand matches the true/false arm and the other matches
the false/true arm. In both cases, we simplify the expression to the third
operand of the ternary operation (i.e., OP0 ? OP1 : OP2 simplifies to OP2).
This is achieved in the value_replacement function after successfully
setting the value of *code within the rhs_is_fed_for_value_replacement
function to EQ_EXPR.

For BIT_IOR_EXPR, the same check is performed for A NE B, except now
*code remains NE_EXPR, and then value_replacement returns the second
operand (i.e., OP0 ? OP1 : OP2 simplifies to OP1).
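
The four simplifications listed above can be checked at the source level with a small C sketch. It is only an illustration, not part of the patch: the function names are hypothetical, and cond stands in for an arbitrary 0/1 condition such as (a > c).

```c
#include <assert.h>

/* The four shapes from the list above.  */
int and_eq_true_a (int cond, int a, int b) { return (cond & (a == b)) ? a : b; }
int and_eq_true_b (int cond, int a, int b) { return (cond & (a == b)) ? b : a; }
int ior_ne_true_a (int cond, int a, int b) { return (cond | (a != b)) ? a : b; }
int ior_ne_true_b (int cond, int a, int b) { return (cond | (a != b)) ? b : a; }

/* Check each shape against the operand it is expected to fold to.  */
void
check_value_replacement (void)
{
  for (int cond = 0; cond <= 1; cond++)
    for (int a = -2; a <= 2; a++)
      for (int b = -2; b <= 2; b++)
        {
          /* BIT_AND with a == b: result is the false arm (OP2).  */
          assert (and_eq_true_a (cond, a, b) == b);
          assert (and_eq_true_b (cond, a, b) == a);
          /* BIT_IOR with a != b: result is the true arm (OP1).  */
          assert (ior_ne_true_a (cond, a, b) == a);
          assert (ior_ne_true_b (cond, a, b) == b);
        }
}
```

The reasoning is the same in every case: whenever the arm that would otherwise be dropped is selected, a == b must hold, so both arms have the same value.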

2024-10-30  Jovan Vukic  

gcc/ChangeLog:

* tree-ssa-phiopt.cc
(rhs_is_fed_for_value_replacement): Add a new optimization opportunity
for BIT_IOR_EXPR and a != b.
(operand_equal_for_value_replacement): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-11.c: Add more tests.


---
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c | 31 +-
 gcc/tree-ssa-phiopt.cc | 48 ++
 2 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
index 14c82cd5216..d1e284c5325 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O1 -fdump-tree-phiopt2 -fdump-tree-optimized --param logical-op-non-short-circuit=1" } */
 
 int f(int a, int b, int c)
 {
@@ -22,4 +22,33 @@ int h(int a, int b, int c, int d)
  return a;
 }
 
+int i(int a, int b, int c)
+{
+  if ((a > c) & (a == b))
+return a;
+  return b;
+}
+
+int j(int a, int b, int c)
+{
+  if ((a > c) & (a == b))
+return b;
+  return a;
+}
+
+int k(int a, int b, int c)
+{
+  if ((a > c) | (a != b))
+return b;
+  return a;
+}
+
+int l(int a, int b, int c)
+{
+  if ((a > c) | (a != b))
+return a;
+  return b;
+}
+
+/* { dg-final { scan-tree-dump-times "if" 0 "phiopt2" } } */
 /* { dg-final { scan-tree-dump-times "if" 0 "optimized" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index cffafe101a4..61b33bfc361 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1078,17 +1078,18 @@ jump_function_from_stmt (tree *arg, gimple *stmt)
   return false;
 }
 
-/* RHS is a source argument in a BIT_AND_EXPR which feeds a conditional
-   of the form SSA_NAME NE 0.
+/* RHS is a source argument in a BIT_AND_EXPR or BIT_IOR_EXPR which feeds
+   a conditional of the form SSA_NAME NE 0.
 
-   If RHS is fed by a simple EQ_EXPR comparison of two values, see if
-   the two input values of the EQ_EXPR match arg0 and arg1.
+   If RHS is fed by a simple EQ_EXPR or NE_EXPR comparison of two values,
+   see if the two input values of the comparison match arg0 and arg1.
 
If so update *code and return TRUE.  Otherwise return FALSE.  */
 
 static bool
 rhs_is_fed_for_value_replacement (const_tree arg0, const_tree arg1,
-

[PATCH] RISC-V: Fix gcc.target/riscv/rvv/base/cpymem-1.c f3

2024-10-30 Thread Craig Blackmore
The function body checks for f3 only ran with -mcmodel explicitly set
which meant I missed a regression in my local testing of:

  commit b039d06c9a810a3fab4c5eb9d50b0c7aff94b2d8
  Author: Craig Blackmore 
  Date:   Fri Oct 18 09:17:21 2024 -0600

  [PATCH 3/7] RISC-V: Fix vector memcpy smaller LMUL generation

The failure showed up in the rivos CI and it is due to f3 now using
LMUL m1 instead of m8.

I have reworked the test to make it more robust and maintainable.  This
allowed most of the special casing of command line arguments to be
removed.  It also fixes an issue where some targets would enable
multiple versions of the function body check e.g. `-march=rv32gcv
-mcmodel=medany`.
---
 .../gcc.target/riscv/rvv/base/cpymem-1.c  | 107 --
 1 file changed, 48 insertions(+), 59 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
index 6edb4c9253a..81d14d83633 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
@@ -9,6 +9,8 @@
 extern void *memcpy(void *__restrict dest, const void *__restrict src, 
__SIZE_TYPE__ n);
 #endif
 
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
 /* memcpy should be implemented using the cpymem pattern.
 ** f1:
 XX \.L\d+: # local label is ignored
@@ -50,70 +52,57 @@ void f2 (__INT32_TYPE__* a, __INT32_TYPE__* b, int l)
Use extern here so that we get a known alignment, lest
DATA_ALIGNMENT force us to make the scan pattern accomodate
code for different alignments depending on word size.
-** f3: { target { { any-opts "-mcmodel=medlow" } && { no-opts 
"-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl1024b" "-mrvv-max-lmul=dynamic" 
"-mrvv-max-lmul=m2" "-mrvv-max-lmul=m4" "-mrvv-max-lmul=m8" 
"-mrvv-vector-bits=zvl" } } }
-**lui\s+[ta][0-7],%hi\(a_a\)
-**addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
-**lui\s+[ta][0-7],%hi\(a_b\)
-**addi\s+a4,[ta][0-7],%lo\(a_b\)
-**vsetivli\s+zero,16,e32,m8,ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medlow -mrvv-vector-bits=zvl" 
"-mcmodel=medlow -march=rv64gcv_zvl512b -mrvv-vector-bits=zvl" } && { no-opts 
"-march=rv64gcv_zvl1024b" } } }
-**lui\s+[ta][0-7],%hi\(a_a\)
-**lui\s+[ta][0-7],%hi\(a_b\)
-**addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
-**addi\s+a4,[ta][0-7],%lo\(a_b\)
-**vl(1|4|2)re32\.v\s+v\d+,0\([ta][0-7]\)
-**vs(1|4|2)r\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medlow -march=rv64gcv_zvl1024b" 
"-mcmodel=medlow -march=rv64gcv_zvl512b" } && { no-opts "-mrvv-vector-bits=zvl" 
} } }
-**lui\s+[ta][0-7],%hi\(a_a\)
-**lui\s+[ta][0-7],%hi\(a_b\)
-**addi\s+a4,[ta][0-7],%lo\(a_b\)
-**vsetivli\s+zero,16,e32,(m1|m4|mf2),ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medany" } && { no-opts 
"-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl256b" "-march=rv64gcv_zvl1024b" 
"-mrvv-max-lmul=dynamic" "-mrvv-max-lmul=m8" "-mrvv-max-lmul=m4" 
"-mrvv-vector-bits=zvl" } } }
-**lla\s+[ta][0-7],a_a
-**lla\s+[ta][0-7],a_b
-**vsetivli\s+zero,16,e32,m8,ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medany"  } && { no-opts 
"-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl256b" "-march=rv64gcv" 
"-march=rv64gc_zve64d" "-march=rv64gc_zve32f" } } }
-**lla\s+[ta][0-7],a_b
-**vsetivli\s+zero,16,e32,m(f2|1|4),ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**lla\s+[ta][0-7],a_a
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
+** f3: { target { no-opts "-mrvv-vector-bits=zvl" } }
+**  (
+**  lui\s+[ta][0-7],%hi\(a_a\)
+**  lui\s+[ta][0-7],%hi\(a_b\)
+**  addi\s+[ta][0-7],[ta][0-7],%lo\(a_b\)
+**  vsetivli\s+zero,4,e32,m1,ta,ma
+**  |
+**  lui\s+[ta][0-7],%hi\(a_a\)
+**  lui\s+[ta][0-7],%hi\(a_b\)
+**  li\s+[ta][0-7],\d+
+**  addi\s+[ta][0-7],[ta][0-7],%lo\(a_b\)
+**  vsetvli\s+zero,[ta][0-7],e32,m1,ta,ma
+**  |
+**  lla\s+[ta][0-7],a_b
+**  vsetivli\s+zero,4,e32,m1,ta,ma
+**  |
+**  li\s+[ta][0-7],\d+
+**  lla\s+[ta][0-7],a_b
+**  vsetvli\s+zero,[ta][0-7],e32,m1,ta,ma
+**  |
+**  lla\s+[ta][0-7],a_b
+**  li\s+[ta][0-7],32
+**  vsetvli\s+zero,[ta][0-7],e32,m1,ta,ma
+**  )
+**  vle32.v\s+v\d+,0\([ta][0-7]\)
+**  (
+**  addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
+**  |
+**  lla\s+[ta][0-7],a_a
+**  )
+**  vse32.v\s+v\d+,0\([ta][0-7]\)
+**  ret
 */
 
 /*
-** f3: { target { { any-opts "-mcmodel=medany -mrvv-vector-bits=zvl" } && { 
no-opts "-march=rv64gcv_zvl1024b" } } }
-**lla\s+[ta][0-7],a_a
-**lla\s+[ta][0-7],a_b
-*

Re: [PATCH #1/7] allow vuses in ifcombine blocks (was: Re: [PATCH] fold fold_truth_andor field merging into ifcombine)

2024-10-30 Thread Richard Biener
On Fri, Oct 25, 2024 at 4:39 PM Alexandre Oliva  wrote:
>
>
> Disallowing vuses in blocks for ifcombine is too strict, and it
> prevents usefully moving fold_truth_andor into ifcombine.  That
> tree-level folder has long ifcombined loads, absent other relevant
> side effects.

OK.

Richard.
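
The effect of the one-line change can be sketched with a toy model.  This
is only an illustration, not GCC code: toy_stmt and
toy_bb_no_side_effects_p are hypothetical names, and the two flags stand in
for what gimple_vuse and gimple_vdef report (a statement that reads memory
carries a virtual use; one that may write memory also carries a virtual
definition).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a GIMPLE statement; not a GCC type.
struct toy_stmt
{
  bool has_vuse;    // statement reads (or writes) memory
  bool has_vdef;    // statement may write memory
  bool could_trap;  // e.g. a possibly faulting operation
};

// Models the relaxed check: blocks with loads are accepted,
// blocks with stores or trapping statements are not.
bool toy_bb_no_side_effects_p (const std::vector<toy_stmt> &bb)
{
  for (const toy_stmt &s : bb)
    {
      // In this model a vdef never appears without a vuse.
      assert (!s.has_vdef || s.has_vuse);
      if (s.could_trap || s.has_vdef)
        return false;
    }
  return true;
}
```

Under the old check, which rejected any statement with a vuse, a block
containing only loads would have been refused as well.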

>
> for  gcc/ChangeLog
>
> * tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
> but not vdefs.
> ---
>  gcc/tree-ssa-ifcombine.cc |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index 6a3bc99190d9e..ed20a231951a3 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -129,7 +129,7 @@ bb_no_side_effects_p (basic_block bb)
>enum tree_code rhs_code;
>if (gimple_has_side_effects (stmt)
>   || gimple_could_trap_p (stmt)
> - || gimple_vuse (stmt)
> + || gimple_vdef (stmt)
>   /* We need to rewrite stmts with undefined overflow to use
>  unsigned arithmetic but cannot do so for signed division.  */
>   || ((ass = dyn_cast <gassign *> (stmt))
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH #3/7] introduce ifcombine_replace_cond (was: Re: [PATCH] fold fold_truth_andor field merging into ifcombine)

2024-10-30 Thread Richard Biener
On Fri, Oct 25, 2024 at 4:39 PM Alexandre Oliva  wrote:
>
>
> Refactor ifcombine_ifandif, moving the common code from the various
> paths that apply the combined condition to a new function.
>
>
> for  gcc/ChangeLog
>
> * tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
> of...
> (ifcombine_ifandif): ... this.
> ---
>  gcc/tree-ssa-ifcombine.cc |  137 
> +
>  1 file changed, 65 insertions(+), 72 deletions(-)
>
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index 0a2ba970548c8..6dcf5e6efe1de 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -399,6 +399,51 @@ update_profile_after_ifcombine (basic_block 
> inner_cond_bb,
>outer2->probability = profile_probability::never ();
>  }
>
> +/* Replace the conditions in INNER_COND with COND.
> +   Replace OUTER_COND with a constant.  */
> +
> +static bool
> +ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
> +   gcond *outer_cond, bool outer_inv,
> +   tree cond, bool must_canon, tree cond2)
> +{
> +  bool result_inv = inner_inv;
> +
> +  gcc_checking_assert (!cond2);
> +
> +  if (result_inv)
> +cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
> +
> +  if (tree tcanon = canonicalize_cond_expr_cond (cond))
> +cond = tcanon;
> +  else if (must_canon)
> +return false;
> +
> +{

no need for this brace pair?

OK with it dropped.

Richard.

> +  if (!is_gimple_condexpr_for_cond (cond))
> +   {
> + gimple_stmt_iterator gsi = gsi_for_stmt (inner_cond);
> + cond = force_gimple_operand_gsi_1 (&gsi, cond,
> +is_gimple_condexpr_for_cond,
> +NULL, true, GSI_SAME_STMT);
> +   }
> +  gimple_cond_set_condition_from_tree (inner_cond, cond);
> +  update_stmt (inner_cond);
> +
> +  /* Leave CFG optimization to cfg_cleanup.  */
> +  gimple_cond_set_condition_from_tree (outer_cond,
> +  outer_inv
> +  ? boolean_false_node
> +  : boolean_true_node);
> +  update_stmt (outer_cond);
> +}
> +
> +  update_profile_after_ifcombine (gimple_bb (inner_cond),
> + gimple_bb (outer_cond));
> +
> +  return true;
> +}
> +
>  /* If-convert on a and pattern with a common else block.  The inner
> if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
> inner_inv, outer_inv indicate whether the conditions are inverted.
> @@ -408,7 +453,6 @@ static bool
>  ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
>basic_block outer_cond_bb, bool outer_inv)
>  {
> -  bool result_inv = inner_inv;
>gimple_stmt_iterator gsi;
>tree name1, name2, bit1, bit2, bits1, bits2;
>
> @@ -446,26 +490,13 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool 
> inner_inv,
>t2 = fold_build2 (BIT_AND_EXPR, TREE_TYPE (name1), name1, t);
>t2 = force_gimple_operand_gsi (&gsi, t2, true, NULL_TREE,
>  true, GSI_SAME_STMT);
> -  t = fold_build2 (result_inv ? NE_EXPR : EQ_EXPR,
> -  boolean_type_node, t2, t);
> -  t = canonicalize_cond_expr_cond (t);
> -  if (!t)
> -   return false;
> -  if (!is_gimple_condexpr_for_cond (t))
> -   {
> - gsi = gsi_for_stmt (inner_cond);
> - t = force_gimple_operand_gsi_1 (&gsi, t, 
> is_gimple_condexpr_for_cond,
> - NULL, true, GSI_SAME_STMT);
> -   }
> -  gimple_cond_set_condition_from_tree (inner_cond, t);
> -  update_stmt (inner_cond);
>
> -  /* Leave CFG optimization to cfg_cleanup.  */
> -  gimple_cond_set_condition_from_tree (outer_cond,
> -   outer_inv ? boolean_false_node : boolean_true_node);
> -  update_stmt (outer_cond);
> +  t = fold_build2 (EQ_EXPR, boolean_type_node, t2, t);
>
> -  update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
> +  if (!ifcombine_replace_cond (inner_cond, inner_inv,
> +  outer_cond, outer_inv,
> +  t, true, NULL_TREE))
> +   return false;
>
>if (dump_file)
> {
> @@ -485,9 +516,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool 
> inner_inv,
>   In that case remove the outer test and change the inner one to
>   test for name & (bits1 | bits2) != 0.  */
>else if (recognize_bits_test (inner_cond, &name1, &bits1, !inner_inv)
> -  && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
> +  && recognize_bits_test (outer_cond, &name2, &bits2, !outer_inv))
>  {
> -  gimple_stmt_iterator gsi;
>tree t;
>
>if ((TREE_CODE (name1) == SSA_NAME
> @@ -530,33 +560,14 @@ ifcombine_ifandif (bas

Re: [PATCH v4 3/7] OpenMP: C front-end support for dispatch + adjust_args

2024-10-30 Thread Tobias Burnus

Hi,

Paul-Antoine Arras wrote:

On 24/10/2024 13:42, Tobias Burnus wrote:
But there is a reason to move them to 5/7: I think we also need a run 
test for C++ to make sure that it works, i.e. moving them to 
libgomp.c-c++-common/ makes sense, which in turn requires the 4/7 
C++ FE patch.


I still need to look at 4/7 (C++) and 5/7 (tests for C and C++) [either 
before or after you post the new version].


* * *

However, this 3/7 patch LGTM :-)

One comment: For the < C23 testcase, can you add, e.g., -std=gnu17 ? The 
reason is that Joseph plans to switch to -std=gnu23 by default and 
already modified existing testcases, e.g. r15-4391-g9fb5348e302102  
"testsuite: Prepare for -std=gnu23 default" – and 
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665612.html


* * *

In summary, Patches 1 to 3 are now approved :-)

For 2, I expect a follow-up so that for a known NULL ptr value (0L, nullptr, 
(void*)0L, absent argument in Fortran, absent argument in C++ with '= 
NULL/nullptr' parameter default), the need_device_ptr conversion will 
skip the __builtin_omp_get_mapped_ptr call (as it will just return 
NULL); but that can be done later :-) [Cf. also my comment to the 4/7 
patch.]


* * *

Thanks,

Tobias



Re: [PATCH #4/7] adjust update_profile_after_ifcombine for noncontiguous ifcombine (was: Re: [PATCH] fold fold_truth_andor field merging into ifcombine)

2024-10-30 Thread Richard Biener
On Fri, Oct 25, 2024 at 4:39 PM Alexandre Oliva  wrote:
>
>
> Prepare for ifcombining noncontiguous blocks, adding (still unused)
> logic to the ifcombine profile updater to handle such cases.
>
>
> for  gcc/ChangeLog
>
> * tree-ssa-ifcombine.cc (known_succ_p): New.
> (update_profile_after_ifcombine): Handle noncontiguous blocks.
> ---
>  gcc/tree-ssa-ifcombine.cc |  109 
> +++--
>  1 file changed, 85 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index 6dcf5e6efe1de..b5b72be29bbf9 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -49,6 +49,21 @@ along with GCC; see the file COPYING3.  If not see
>  false) >= 2)
>  #endif
>
> +/* Return FALSE iff the COND_BB ends with a conditional whose result is not a
> +   known constant.  */
> +
> +static bool
> +known_succ_p (basic_block cond_bb)
> +{
> +  gcond *cond = safe_dyn_cast <gcond *> (*gsi_last_bb (cond_bb));
> +
> +  if (!cond)
> +return true;
> +
> +  return (CONSTANT_CLASS_P (gimple_cond_lhs (cond))
> + && CONSTANT_CLASS_P (gimple_cond_rhs (cond)));
> +}
> +

It now occurs to me that you could use

   find_taken_edge  (cond_bb, NULL_TREE) != NULL

in place of known_succ_p.

>  /* This pass combines COND_EXPRs to simplify control flow.  It
> currently recognizes bit tests and comparisons in chains that
> represent logical and or logical or of two COND_EXPRs.
> @@ -356,14 +371,28 @@ recognize_bits_test (gcond *cond, tree *name, tree 
> *bits, bool inv)
>  }
>
>
> -/* Update profile after code in outer_cond_bb was adjusted so
> -   outer_cond_bb has no condition.  */
> +/* Update profile after code in either outer_cond_bb or inner_cond_bb was
> +   adjusted so that it has no condition.  */
>
>  static void
>  update_profile_after_ifcombine (basic_block inner_cond_bb,
> basic_block outer_cond_bb)

I would hope that Honza can take a look here - in his absence, OK once the
rest is approved.

Richard.
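
For the contiguous case, the probability merge that
update_profile_after_ifcombine performs can be worked through with plain
numbers.  The sketch below is an illustration under stated assumptions:
doubles stand in for profile_probability, and merged_probs and
merge_outer_into_inner are hypothetical names, not part of the patch.

```cpp
#include <cassert>

// Result of folding the outer condition's "skip" edge into the inner block.
struct merged_probs
{
  double inner_taken;      // inner edge to the common destination
  double inner_not_taken;  // the other inner edge
};

// OUTER2 is the outer edge that bypasses the inner block, OUTER_TO_INNER the
// outer edge that enters it, INNER_TAKEN the inner edge sharing OUTER2's
// destination.  Plain doubles replace profile_probability here.
merged_probs merge_outer_into_inner (double outer2, double outer_to_inner,
                                     double inner_taken)
{
  assert (outer2 >= 0.0 && outer_to_inner >= 0.0);
  assert (inner_taken >= 0.0 && inner_taken <= 1.0);

  // If inner_taken is already "always", leave it alone, as the patch does;
  // otherwise add the bypass path to the path through the inner block.
  if (inner_taken != 1.0)
    inner_taken = outer2 + outer_to_inner * inner_taken;

  return { inner_taken, 1.0 - inner_taken };
}
```

With outer2 = 0.25, outer_to_inner = 0.75 and inner_taken = 0.4, the merged
inner_taken comes to about 0.55 and inner_not_taken to about 0.45.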

>  {
> -  edge outer_to_inner = find_edge (outer_cond_bb, inner_cond_bb);
> +  /* In the following we assume that inner_cond_bb has single predecessor.  
> */
> +  gcc_assert (single_pred_p (inner_cond_bb));
> +
> +  basic_block outer_to_inner_bb = inner_cond_bb;
> +  profile_probability prob = profile_probability::always ();
> +  for (;;)
> +{
> +  basic_block parent = single_pred (outer_to_inner_bb);
> +  prob *= find_edge (parent, outer_to_inner_bb)->probability;
> +  if (parent == outer_cond_bb)
> +   break;
> +  outer_to_inner_bb = parent;
> +}
> +
> +  edge outer_to_inner = find_edge (outer_cond_bb, outer_to_inner_bb);
>edge outer2 = (EDGE_SUCC (outer_cond_bb, 0) == outer_to_inner
>  ? EDGE_SUCC (outer_cond_bb, 1)
>  : EDGE_SUCC (outer_cond_bb, 0));
> @@ -374,29 +403,61 @@ update_profile_after_ifcombine (basic_block 
> inner_cond_bb,
>  std::swap (inner_taken, inner_not_taken);
>gcc_assert (inner_taken->dest == outer2->dest);
>
> -  /* In the following we assume that inner_cond_bb has single predecessor.  
> */
> -  gcc_assert (single_pred_p (inner_cond_bb));
> -
> -  /* Path outer_cond_bb->(outer2) needs to be merged into path
> - outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
> - and probability of inner_not_taken updated.  */
> -
> -  inner_cond_bb->count = outer_cond_bb->count;
> +  if (outer_to_inner_bb == inner_cond_bb
> +  && known_succ_p (outer_cond_bb))
> +{
> +  /* Path outer_cond_bb->(outer2) needs to be merged into path
> +outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
> +and probability of inner_not_taken updated.  */
> +
> +  inner_cond_bb->count = outer_cond_bb->count;
> +
> +  /* Handle special case where inner_taken probability is always. In this
> +case we know that the overall outcome will be always as well, but
> +combining probabilities will be conservative because it does not know
> +that outer2->probability is inverse of
> +outer_to_inner->probability.  */
> +  if (inner_taken->probability == profile_probability::always ())
> +   ;
> +  else
> +   inner_taken->probability = outer2->probability
> + + outer_to_inner->probability * inner_taken->probability;
> +  inner_not_taken->probability = profile_probability::always ()
> +   - inner_taken->probability;
>
> -  /* Handle special case where inner_taken probability is always. In this 
> case
> - we know that the overall outcome will be always as well, but combining
> - probabilities will be conservative because it does not know that
> - outer2->probability is inverse of outer_to_inner->probability.  */
> -  if (inner_taken->probability == profile_probability::always ())
> -;
> +  outer_to_inner->probability = profile_probability::always ();
> +  outer2->probability 

Re: [pushed: r15-4760] diagnostics: support multiple output formats simultaneously [PR116613]

2024-10-30 Thread David Malcolm
On Wed, 2024-10-30 at 12:22 +, Jonathan Wakely wrote:
> On 29/10/24 19:19 -0400, David Malcolm wrote:
> > This patch generalizes diagnostic_context so that rather than having
> > a single output format, it has a vector of zero or more.
> 
> [snip]
> 
> > +/* Class for parsing the arguments of -fdiagnostics-add-output= and
> > +   -fdiagnostics-set-output=, and making diagnostic_output_format
> > +   instances (or issuing errors).  */
> > +
> > +class output_factory
> > +{
> > +public:
> > +  class handler
> > +  {
> > +  public:
> > +    handler (std::string name) : m_name (name) {}
> 
> How long are these names?
> 
> If they don't fit in 15 chars, then this should be std::move(name).
> 
> So for a name like "sarif:version=2.1" it should be moved, otherwise
> you make a deep copy and reallocate a new string.

These are the names of the output schemes, which currently are just
"text" and "sarif" [1], well under 15 chars - but there might be other
schemes with longer names in the future [2] so I can use std::move here
(and reinforce the habit of doing it).
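
A minimal sketch of that change, using a stand-in class rather than the
real output_factory::handler (scheme_handler, name () and make_handler are
hypothetical names for illustration): the string is taken by value and then
moved into the member, so a name that does not fit the small-string buffer
is not copied a second time.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for output_factory::handler; only the constructor
// pattern is taken from the patch under discussion.
class scheme_handler
{
public:
  // NAME is a by-value sink parameter; moving it into m_name avoids a
  // second deep copy when the string is heap-allocated.
  explicit scheme_handler (std::string name) : m_name (std::move (name)) {}

  const std::string &name () const { return m_name; }

private:
  std::string m_name;
};

// Build a handler from a scheme name such as "experimental-html".
scheme_handler make_handler (const char *scheme)
{
  assert (scheme != nullptr);
  return scheme_handler (std::string (scheme));
}
```

For short names such as "text" or "sarif" the move costs nothing extra, so
using it unconditionally is harmless.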

Dave

[1] see
https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-add-output

[2] I'm playing with "experimental-html" which is 17 chars



Re: [PATCH #5/7] extend ifcombine_replace_cond to handle noncontiguous ifcombine (was: Re: [PATCH] fold fold_truth_andor field merging into ifcombine)

2024-10-30 Thread Richard Biener
On Fri, Oct 25, 2024 at 4:39 PM Alexandre Oliva  wrote:
>
>
> Prepare to handle noncontiguous ifcombine, introducing logic to modify
> the outer condition when needed.  There are two cases worth
> mentioning:
>
> - when blocks are noncontiguous, we have to place the combined
>   condition in the outer block to avoid pessimizing carefully crafted
>   short-circuited tests;
>
> - even when blocks are contiguous, we prepare for situations in which
>   the combined condition has two tests, one to be placed in outer and
>   the other in inner.  This circumstance will not come up when
>   noncontiguous ifcombine is first enabled, but it will when
>   an improved fold_truth_andor is integrated with ifcombine.
>
> Combining the condition from inner into outer may require moving SSA
> DEFs used in the inner condition, and the changes implement this as
> well.
>
>
> for  gcc/ChangeLog
>
> * tree-ssa-ifcombine.cc: Include bitmap.h.
> (ifcombine_mark_ssa_name): New.
> (struct ifcombine_mark_ssa_name_t): New.
> (ifcombine_mark_ssa_name_walk): New.
> (ifcombine_replace_cond): Prepare to handle noncontiguous and
> split-condition ifcombine.
> ---
>  gcc/tree-ssa-ifcombine.cc |  173 
> -
>  1 file changed, 168 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index b5b72be29bbf9..71c7c9074e94a 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa.h"
>  #include "attribs.h"
>  #include "asan.h"
> +#include "bitmap.h"
>
>  #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
>  #define LOGICAL_OP_NON_SHORT_CIRCUIT \
> @@ -460,17 +461,57 @@ update_profile_after_ifcombine (basic_block 
> inner_cond_bb,
>  }
>  }
>
> -/* Replace the conditions in INNER_COND with COND.
> -   Replace OUTER_COND with a constant.  */
> +/* Set NAME's bit in USED if OUTER dominates it.  */
> +
> +static void
> +ifcombine_mark_ssa_name (bitmap used, tree name, basic_block outer)
> +{
> +  if (SSA_NAME_IS_DEFAULT_DEF (name))
> +return;
> +
> +  gimple *def = SSA_NAME_DEF_STMT (name);
> +  basic_block bb = gimple_bb (def);
> +  if (!dominated_by_p (CDI_DOMINATORS, bb, outer))
> +return;
> +
> +  bitmap_set_bit (used, SSA_NAME_VERSION (name));
> +}
> +
> +/* Data structure passed to ifcombine_mark_ssa_name.  */
> +struct ifcombine_mark_ssa_name_t
> +{
> +  /* SSA_NAMEs that have been referenced.  */
> +  bitmap used;
> +  /* Dominating block of DEFs that might need moving.  */
> +  basic_block outer;
> +};
> +
> +/* Mark in DATA->used any SSA_NAMEs used in *t.  */
> +
> +static tree
> +ifcombine_mark_ssa_name_walk (tree *t, int *, void *data_)
> +{
> +  ifcombine_mark_ssa_name_t *data = (ifcombine_mark_ssa_name_t *)data_;
> +
> +  if (*t && TREE_CODE (*t) == SSA_NAME)
> +ifcombine_mark_ssa_name (data->used, *t, data->outer);
> +
> +  return NULL;
> +}
> +
> +/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
> +   COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
> +   replaced with a constant, but if there are intervening blocks, it's best 
> to
> +   adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND.  */
>
>  static bool
>  ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
> gcond *outer_cond, bool outer_inv,
> tree cond, bool must_canon, tree cond2)
>  {
> -  bool result_inv = inner_inv;
> -
> -  gcc_checking_assert (!cond2);
> +  bool outer_p = cond2 || (single_pred (gimple_bb (inner_cond))
> +  != gimple_bb (outer_cond));
> +  bool result_inv = outer_p ? outer_inv : inner_inv;
>
>if (result_inv)
>  cond = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
> @@ -480,6 +521,128 @@ ifcombine_replace_cond (gcond *inner_cond, bool 
> inner_inv,
>else if (must_canon)
>  return false;
>
> +  if (outer_p)
> +{
> +  {
> +   auto_bitmap used;

As you are only doing bitmap_set_bit/bitmap_bit_p consider doing

   bitmap_tree_view (used);

to get O(log N) worst-case behavior rather than O(N), not that I expect it
to make a difference in practice.  But we don't have any artificial limit
on the number of stmts in the middle block, right?

Otherwise OK (tree view at your discretion).

Thanks,
Richard.

> +   basic_block outer_bb = gimple_bb (outer_cond);
> +
> +   /* Mark SSA DEFs that are referenced by cond and may thus need to be
> +  moved to outer.  */
> +   {
> + ifcombine_mark_ssa_name_t data = { used, outer_bb };
> + walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
> +   }
> +
> +   if (!bitmap_empty_p (used))
> + {
> +   /* Iterate up from inner_cond, moving DEFs identified as used by
> +  cond, and marking USEs in the DEFs for moving as well

Re: [PATCH #2/7] drop redundant ifcombine_ifandif parm (was: Re: [PATCH] fold fold_truth_andor field merging into ifcombine)

2024-10-30 Thread Richard Biener
On Fri, Oct 25, 2024 at 4:39 PM Alexandre Oliva  wrote:
>
>
> In preparation to changes that may modify both inner and outer
> conditions in ifcombine, drop the redundant parameter result_inv, that
> is always identical to inner_inv.

OK.

>
> for  gcc/ChangeLog
>
> * tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
> result_inv parm.  Adjust all callers.
> ---
>  gcc/tree-ssa-ifcombine.cc |   18 +++---
>  1 file changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index ed20a231951a3..0a2ba970548c8 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -401,14 +401,14 @@ update_profile_after_ifcombine (basic_block 
> inner_cond_bb,
>
>  /* If-convert on a and pattern with a common else block.  The inner
> if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
> -   inner_inv, outer_inv and result_inv indicate whether the conditions
> -   are inverted.
> +   inner_inv, outer_inv indicate whether the conditions are inverted.
> Returns true if the edges to the common else basic-block were merged.  */
>
>  static bool
>  ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
> -  basic_block outer_cond_bb, bool outer_inv, bool result_inv)
> +  basic_block outer_cond_bb, bool outer_inv)
>  {
> +  bool result_inv = inner_inv;
>gimple_stmt_iterator gsi;
>tree name1, name2, bit1, bit2, bits1, bits2;
>
> @@ -693,8 +693,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, 
> basic_block outer_cond_bb,
>
>  ...
> */
> -  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false,
> -   false);
> +  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false);
>  }
>
>/* And a version where the outer condition is negated.  */
> @@ -711,8 +710,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, 
> basic_block outer_cond_bb,
>
>  ...
> */
> -  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true,
> -   false);
> +  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true);
>  }
>
>/* The || form is characterized by a common then_bb with the
> @@ -731,8 +729,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, 
> basic_block outer_cond_bb,
>
>  ...
> */
> -  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true,
> -   true);
> +  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true);
>  }
>
>/* And a version where the outer condition is negated.  */
> @@ -748,8 +745,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, 
> basic_block outer_cond_bb,
>
>  ...
> */
> -  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false,
> -   true);
> +  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false);
>  }
>
>return false;
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH v3] [aarch64] Fix function multiversioning dispatcher link error with LTO

2024-10-30 Thread Yangyu Chen



> On Oct 30, 2024, at 19:59, Richard Sandiford  
> wrote:
> 
> Yangyu Chen  writes:
>> We forgot to apply DECL_EXTERNAL to __init_cpu_features_resolver decl. When
>> building with LTO, the linker cannot find the
>> __init_cpu_features_resolver.lto_priv* symbol, causing the link error.
>> 
>> This patch gets this fixed by adding DECL_EXTERNAL to the decl. To avoid used
>> but never defined warning for this symbol, we also mark TREE_PUBLIC to the decl.
>> We should also mark the decl having hidden visibility. And fix the attribute in
>> the same way for __aarch64_cpu_features identifier.
>> 
>> Minimal steps to reproduce the bug:
>> 
>> echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
>> echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
>> echo 'void func1();void func2();int main(){func1();func2();return 0;}' > main.c
>> gcc -flto -c 1.c 2.c
>> gcc -flto main.c 1.o 2.o
>> 
>> Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")
>> 
>> gcc/ChangeLog:
>> 
>> * config/aarch64/aarch64.cc (dispatch_function_versions): Adding
>> DECL_EXTERNAL, TREE_PUBLIC and hidden DECL_VISIBILITY to
>> __init_cpu_features_resolver and __aarch64_cpu_features.
> 
> Thanks, LGTM.  I've tested this locally and was about to push, but then
> realised: since you've already contributed changes (great!), it probably
> wouldn't be acceptable to treat it as trivial for copyright purposes.
> Could you confirm that you're contributing under the DCO:
> https://gcc.gnu.org/dco.html ?  If so, could you repost with a
> Signed-off-by?
> 
> Sorry for the administrivia.

I added signed-off-by and revised to v4:

https://patchwork.sourceware.org/project/gcc/patch/tencent_f08be088f6b1e3152e508c63c870e31cd...@qq.com/

Since this is a fix patch, it should also be committed to the
releases/gcc-14 branch.

Thanks,
Yangyu Chen

> 
> Richard
> 
>> ---
>> gcc/config/aarch64/aarch64.cc | 7 +++
>> 1 file changed, 7 insertions(+)
>> 
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 5770491b30c..2b2d5b9e390 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -20437,6 +20437,10 @@ dispatch_function_versions (tree dispatch_decl,
>>   tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
>>   tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
>>   init_fn_id, init_fn_type);
>> +  DECL_EXTERNAL (init_fn_decl) = 1;
>> +  TREE_PUBLIC (init_fn_decl) = 1;
>> +  DECL_VISIBILITY (init_fn_decl) = VISIBILITY_HIDDEN;
>> +  DECL_VISIBILITY_SPECIFIED (init_fn_decl) = 1;
>>   tree arg1 = DECL_ARGUMENTS (dispatch_decl);
>>   tree arg2 = TREE_CHAIN (arg1);
>>   ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
>> @@ -20456,6 +20460,9 @@ dispatch_function_versions (tree dispatch_decl,
>> get_identifier ("__aarch64_cpu_features"),
>> global_type);
>>   DECL_EXTERNAL (global_var) = 1;
>> +  TREE_PUBLIC (global_var) = 1;
>> +  DECL_VISIBILITY (global_var) = VISIBILITY_HIDDEN;
>> +  DECL_VISIBILITY_SPECIFIED (global_var) = 1;
>>   tree mask_var = create_tmp_var (long_long_unsigned_type_node);
>> 
>>   tree component_expr = build3 (COMPONENT_REF, long_long_unsigned_type_node,




jit backports to GCC 14

2024-10-30 Thread David Malcolm
I've backported the following patches from trunk to releases/gcc-14

testsuite, jit: fix test-error-pr63969-missing-driver.c
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665552.html
  Trunk: r15-4360-gf8dcb559e615db.
  GCC 14: r14-10854-g771873f0a95162

jit: reset state in varasm.cc [PR117275]
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666263.html
  Trunk: r15-4580-g779c0390e3b57d.
  GCC 14: r14-10855-g70f911bf547326

jit: fix leak of pending_assemble_externals_set [PR117275]
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666732.html
  Trunk: r15-4739-g7f41203f08b994.
  GCC 14: r14-10856-gacc0b9ff9cf1bc

Dave



Re: [PATCH 2/4] RISC-V: Implement TARGET_SCHED_PRESSURE_PREFER_NARROW [PR/114729]

2024-10-30 Thread Jeff Law




On 10/20/24 1:40 PM, Vineet Gupta wrote:

This inhibits sched1 aggressive spilling on RISC-V (see prev commit for
details of what the hook does).

On RISC-V (BPI-F3) we see good results.
(Build: -Ofast -march=rv64gcv_zba_zbb_zbs)

   Before:
   -------
   Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

        4,769,844.10 msec task-clock:u                          #    1.000 CPUs utilized
               6,029      context-switches:u                    #    1.264 /sec
                   0      cpu-migrations:u                      #    0.000 /sec
             201,468      page-faults:u                         #   42.238 /sec
   7,631,707,552,979      cycles:u                              #    1.600 GHz
   2,630,225,489,010      instructions:u                        #    0.34  insn per cycle
      10,592,305,077      branches:u                            #    2.221 M/sec
          16,274,388      branch-misses:u                       #    0.15% of all branches

   After:
   ------
   Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

        4,471,770.20 msec task-clock:u                          #    0.998 CPUs utilized
             159,245      context-switches:u                    #   35.611 /sec
                   2      cpu-migrations:u                      #    0.000 /sec
             204,065      page-faults:u                         #   45.634 /sec
   7,153,778,156,281      cycles:u         ( 6% faster)         #    1.600 GHz
   2,143,115,846,207      instructions:u   (18.5% fewer)        #    0.30  insn per cycle
      10,592,316,035      branches:u                            #    2.369 M/sec
          17,229,411      branch-misses:u                       #    0.16% of all branches

Similarly, good results on Cactu on aarch64 as well (qemu dynamic icounts only)
(Build: -march=armv9-a+sve2)

   Before: 1,382,403,783,566
   After:  1,264,869,192,921 (8.5% improv)

gcc/ChangeLog:
PR target/114729
* config/riscv/riscv.cc (TARGET_SCHED_PRESSURE_PREFER_NARROW):
Define to true.

gcc/testsuite/ChangeLog:
PR target/114729
* gcc.target/riscv/riscv.exp: Enable new tests to build.
* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.

This is fine once we finalize naming for patch #1 and update this patch 
for whatever final name is selected.


I worry ever-so-slightly about the testcase being overly-sensitive to 
unrelated changes, but not overly so.  If it turns out to require 
regular twiddling, then we'll deal with it at that time.



jeff



Re: [PATCH 4/4] sched1: model: ICE on infinite loops in predecessor promotion (Not for Merge)

2024-10-30 Thread Jeff Law




On 10/20/24 1:40 PM, Vineet Gupta wrote:

This is just a testing hack in case someone runs into infinite loops
with model schedule change. I did run into quite a few during the course
of development and instead of sched trace files eating up the disk,
better to ICE and abort.

gcc/ChangeLog:

* haifa-sched.cc (model_promote_predecessors): Add infinite
looping checks.

Of course this could fail on really big single block functions.  IIRC 
fpppp from older versions of spec was on the order of 10k fp loads/stores 
on targets with 32 double precision registers, plus whatever arithmetic 
instructions were needed.


jeff



Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling agressiveness

2024-10-30 Thread Vineet Gupta
On 10/30/24 10:25, Jeff Law wrote:
> On 10/30/24 9:31 AM, Richard Sandiford wrote:
>> That might need some finessing of the name.  But I think the concept
>> is right.  I'd rather base the hook (or param) on a general concept
>> like that rather than a specific "wide vs narrow" thing.
> Agreed.  Naming was my real only concern about the first patch.

We are leaning towards
  - TARGET_SCHED_PRESSURE_SPILL_AGGRESSIVE
  - targetm.sched.pressure_spill_aggressive

Targets could wire them up however they like.

>>> I still see Vineet's data as compelling, even with GIGO concern.
>> Do you mean the reduction in dynamic instruction counts?  If so,
>> that isn't what the algorithm is aiming to reduce.  Like I mentioned
>> in the previous thread, trying to minimise dynamic instruction counts
>> was also harmful for the core & benchmarks I was looking at.
>> We just ended up with lots of pipeline bubbles that could be
>> alleviated by judicious spilling.
> Vineet showed significant cycle and icount improvements.  I'm much more 
> interested in the former :-)

The initial premise indeed was icounts but with recent access to some credible 
hardware I'm all for perf measurement now.

Please look at patch 2/4 [1] for actual perf data, both cycles and instructions.
I kept 1/4, which introduces the hook, separate from 2/4, which implements the 
hook for RISC-V.

    [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665945.html

As Jeff mentioned, on an in-order RISC-V core we are seeing a 6% cycle 
improvement from the hook and another 6% cycle improvement from patch 3/4.

Also, Wilco gave this a spin on a high-end OoO Neoverse and seems to be seeing 
a 20% improvement, which I gather is in cycles.


> I'm planning to run it on our internal design, but it's not the top of 
> the priority list and it's a scarce resource right now...  I fully 
> expect it'll show a cycle improvement there too, though probably much 
> smaller than the improvement seen on that spacemit k1 design.

That would be another out of order data point.

>> I'm not saying that the algorithm gets the decision right for cactu
>> when tuning for in-order CPU X and running on that same CPU X.
>> But it seems like that combination hasn't been tried, and that,
>> even on the combinations that the patch has been tried on, the cactu
>> justification is based on static properties of the binary rather than
>> a particular runtime improvement (Y% faster).

I'd requested Wilco to possibly try this on some in-order arm cores.

I have a couple of RPIs lying around. The RPI3 seems to have an A53, which per 
the specs is a partial dual-issue yet in-order core. I also presume gcc has a
fairly accurate model of the A53 pipeline.
I'm planning to bake a vanilla -mcpu="cortex-a53+neon" build and give that a 
spin. Do I need to do anything extra for cpu/uarch?

>> To be clear, the two paragraphs above are trying to explain why I think
>> this should be behind a hook or param rather than unconditional.  The
>> changes themselves look fine, and incorporate the suggestions from the
>> previous thread (thanks!).
> Thanks for that clarifying statement.  I actually think we're broadly in 
> agreement here -- keep it as a hook/param rather than making it 
> unconditional.
>
> Assuming we keep it as a hook/param, opt-in & come up with better 
> name/docs, any objections from your side?

Yes and also please take a look at 3/4 which is a different fix altogether and 
not gated behind any hook.

Thx,
-Vineet


[PATCH 3/5] ctf: translate annotation DIEs to internal ctf

2024-10-30 Thread David Faust
Translate DW_TAG_GNU_annotation DIEs created for C attributes
btf_decl_tag and btf_type_tag into an in-memory representation in the
CTF/BTF container.  They will be output in BTF as BTF_KIND_DECL_TAG and
BTF_KIND_TYPE_TAG records.

The new CTF kinds used to represent these annotations, CTF_K_DECL_TAG
and CTF_K_TYPE_TAG, are expected to be formalized in the next version of
the CTF specification.  For now they only exist in memory as a
translation step to BTF, and are not emitted when generating CTF
information.

gcc/

* ctfc.cc (ctf_dtu_d_union_selector): Handle CTF_K_DECL_TAG and
CTF_K_TYPE_TAG.
(ctf_add_type_tag, ctf_add_decl_tag): New.
(ctf_add_variable): Return the new ctf_dvdef_ref rather than zero.
(new_ctf_container): Initialize new members.
(ctfc_delete_container): Deallocate new members.
* ctfc.h (ctf_dvdef, ctf_dvdef_t, ctf_dvdef_ref): Move forward
declarations earlier in file.
(ctf_tag_t): New typedef.
(ctf_dtdef): Add ctf_tag_t member to dtd_u union.
(ctf_dtu_d_union_enum): Add new CTF_DTU_D_TAG enumerator.
(ctf_container): Add ctfc_tags vector and ctfc_tags_map hash_map
members.
(ctf_add_type_tag, ctf_add_decl_tag): New function protos.
(ctf_add_variable): Change prototype return type to ctf_dvdef_ref.
* dwarf2ctf.cc (gen_ctf_type_tags, gen_ctf_decl_tags)
(gen_ctf_decl_tags_for_var): New static functions.
(gen_ctf_modifier_type): Handle type tags on types with cv-quals.
(gen_ctf_sou_type): Handle decl tags.
(gen_ctf_function_type): Likewise.
(gen_ctf_variable): Likewise.
(gen_ctf_function): Likewise.
(is_cvr_die): New helper function.
(gen_ctf_type): Handle type tags.

include/

* ctf.h (CTF_K_DECL_TAG, CTF_K_TYPE_TAG): New defines.
---
 gcc/ctfc.cc  |  66 -
 gcc/ctfc.h   |  41 +--
 gcc/dwarf2ctf.cc | 180 +--
 include/ctf.h|   4 ++
 4 files changed, 277 insertions(+), 14 deletions(-)

diff --git a/gcc/ctfc.cc b/gcc/ctfc.cc
index 8f531ffebf8..8fca36caa1e 100644
--- a/gcc/ctfc.cc
+++ b/gcc/ctfc.cc
@@ -107,6 +107,9 @@ ctf_dtu_d_union_selector (ctf_dtdef_ref ctftype)
   return CTF_DTU_D_ARGUMENTS;
 case CTF_K_SLICE:
   return CTF_DTU_D_SLICE;
+case CTF_K_DECL_TAG:
+case CTF_K_TYPE_TAG:
+  return CTF_DTU_D_TAG;
 default:
   /* The largest member as default.  */
   return CTF_DTU_D_ARRAY;
@@ -445,6 +448,54 @@ ctf_add_reftype (ctf_container_ref ctfc, uint32_t flag, 
ctf_dtdef_ref ref,
   return dtd;
 }
 
+ctf_dtdef_ref
+ctf_add_type_tag (ctf_container_ref ctfc, uint32_t flag, const char *value,
+ ctf_dtdef_ref ref_dtd)
+{
+  ctf_dtdef_ref dtd;
+   /* Create a DTD for the tag, but do not place it in the regular types list;
+  ctf format does not (yet) encode tags.  */
+  dtd = ggc_cleared_alloc ();
+
+  dtd->dtd_name = ctf_add_string (ctfc, value, &(dtd->dtd_data.ctti_name),
+ CTF_AUX_STRTAB);
+  /* Tags uniquely have a one-to-many relationship where one die may translate
+ to many ctf types.  Therefore we cannot key the tags by die.  */
+  dtd->dtd_key = NULL;
+  dtd->ref_type = ref_dtd;
+  dtd->dtd_data.ctti_info = CTF_TYPE_INFO (CTF_K_TYPE_TAG, flag, 0);
+  dtd->dtd_u.dtu_tag.ref_var = NULL; /* Not used for type tags.  */
+  dtd->dtd_u.dtu_tag.component_idx = -1U; /* Not used for type tags.  */
+
+  /* Insert tag directly into the tag list.  Type ID will be assigned later.  
*/
+  vec_safe_push (ctfc->ctfc_tags, dtd);
+  return dtd;
+}
+
+ctf_dtdef_ref
+ctf_add_decl_tag (ctf_container_ref ctfc, uint32_t flag, const char *value,
+ ctf_dtdef_ref ref_dtd, uint32_t comp_idx)
+{
+   ctf_dtdef_ref dtd;
+   /* Create a DTD for the tag, but do not place it in the regular types list;
+  ctf format does not (yet) encode tags.  */
+  dtd = ggc_cleared_alloc ();
+
+  dtd->dtd_name = ctf_add_string (ctfc, value, &(dtd->dtd_data.ctti_name),
+ CTF_AUX_STRTAB);
+  /* Tags uniquely have a one-to-many relationship where one die may translate
+ to many ctf types.  Therefore we cannot key the tags by die.  */
+  dtd->dtd_key = NULL;
+  dtd->ref_type = ref_dtd;
+  dtd->dtd_data.ctti_info = CTF_TYPE_INFO (CTF_K_DECL_TAG, flag, 0);
+  dtd->dtd_u.dtu_tag.ref_var = NULL;
+  dtd->dtd_u.dtu_tag.component_idx = comp_idx;
+
+  /* Insert tag directly into the tag list.  Type ID will be assigned later.  
*/
+  vec_safe_push (ctfc->ctfc_tags, dtd);
+  return dtd;
+}
+
 ctf_dtdef_ref
 ctf_add_forward (ctf_container_ref ctfc, uint32_t flag, const char * name,
 uint32_t kind, dw_die_ref die)
@@ -691,12 +742,12 @@ ctf_add_member_offset (ctf_container_ref ctfc, dw_die_ref 
sou,
   return 0;
 }
 
-int
+ctf_dvdef_ref
 ctf_add_variable (ctf_container_ref ctfc, const char * name, ctf_dtdef_ref ref,
 

[PATCH 1/5] c-family: add btf_type_tag and btf_decl_tag attributes

2024-10-30 Thread David Faust
Add two new c-family attributes, "btf_decl_tag" and "btf_type_tag" along
with a simple shared handler for them.

gcc/c-family/

* c-attribs.cc (c_common_attribute_table): Add btf_decl_tag and
btf_type_tag attributes.
(handle_btf_tag_attribute): New handler for both new attributes.
---
 gcc/c-family/c-attribs.cc | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 4dd2eecbea5..76374413f9e 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -187,6 +187,8 @@ static tree handle_fd_arg_attribute (tree *, tree, tree, 
int, bool *);
 static tree handle_flag_enum_attribute (tree *, tree, tree, int, bool *);
 static tree handle_null_terminated_string_arg_attribute (tree *, tree, tree, 
int, bool *);
 
+static tree handle_btf_tag_attribute (tree *, tree, tree, int, bool *);
+
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)  \
   { name, function, type, variable }
@@ -635,7 +637,11 @@ const struct attribute_spec c_common_gnu_attributes[] =
   { "flag_enum", 0, 0, false, true, false, false,
  handle_flag_enum_attribute, NULL },
   { "null_terminated_string_arg", 1, 1, false, true, true, false,
- handle_null_terminated_string_arg_attribute, NULL}
+ handle_null_terminated_string_arg_attribute, 
NULL},
+  { "btf_type_tag",  1, 1, false, true, false, false,
+ handle_btf_tag_attribute, NULL},
+  { "btf_decl_tag",  1, 1, true, false, false, false,
+ handle_btf_tag_attribute, NULL}
 };
 
 const struct scoped_attribute_specs c_common_gnu_attribute_table =
@@ -5069,6 +5075,23 @@ handle_null_terminated_string_arg_attribute (tree *node, 
tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle the "btf_decl_tag" and "btf_type_tag" attributes.  */
+
+static tree
+handle_btf_tag_attribute (tree * ARG_UNUSED (node), tree name, tree args,
+ int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (!args)
+*no_add_attrs = true;
+  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  error ("%qE attribute requires a string", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* Handle the "nonstring" variable attribute.  */
 
 static tree
-- 
2.45.2



[PATCH 5/5] doc: document btf_type_tag and btf_decl_tag attributes

2024-10-30 Thread David Faust
gcc/

* doc/extend.texi (Common Variable Attributes): Document new
btf_decl_tag attribute.
(Common Type Attributes): Document new btf_type_tag attribute.
---
 gcc/doc/extend.texi | 68 +
 1 file changed, 68 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 42bd567119d..fd8f2425947 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7920,6 +7920,41 @@ align them on any target.
 The @code{aligned} attribute can also be used for functions
 (@pxref{Common Function Attributes}.)
 
+@cindex @code{btf_decl_tag} variable attribute
+@item btf_decl_tag (@var{argument})
+The @code{btf_decl_tag} attribute may be used to associate variable
+declarations, struct or union member declarations, function
+declarations, or function parameter declarations with arbitrary strings.
+These strings are not interpreted by the compiler in any way, and have
+no effect on code generation.  Instead, these user-provided strings
+are recorded in DWARF (via @code{DW_AT_GNU_annotation} and
+@code{DW_TAG_GNU_annotation} extensions) and BTF information (via
+@code{BTF_KIND_DECL_TAG} records), and associated to the attributed
+declaration.  If neither DWARF nor BTF information is generated, the
+attribute has no effect.
+
+The argument is treated as an ordinary string in the source language
+with no additional special rules.
+
+The attribute may be supplied multiple times for a single declaration,
+in which case each distinct argument string will be recorded in a
+separate DIE or BTF record, each associated to the declaration.  For
+a single declaration with multiple @code{btf_decl_tag} attributes,
+the order of the @code{DW_TAG_GNU_annotation} DIEs produced is not
+guaranteed to maintain the order of attributes in the source code.
+
+For example:
+
+@smallexample
+int * foo __attribute__ ((btf_decl_tag ("__percpu")));
+@end smallexample
+
+@noindent
+when compiled with @code{-gbtf} results in an additional
+@code{BTF_KIND_DECL_TAG} BTF record being emitted in the BTF info,
+associating the string ``__percpu'' with the normal @code{BTF_KIND_VAR}
+record for the variable ``foo''.
+
 @cindex @code{counted_by} variable attribute
 @item counted_by (@var{count})
 The @code{counted_by} attribute may be attached to the C99 flexible array
@@ -9109,6 +9144,39 @@ is given by the product of arguments 1 and 2, and that
 @code{malloc_type}, like the standard C function @code{malloc},
 returns an object whose size is given by argument 1 to the function.
 
+@cindex @code{btf_type_tag} type attribute
+@item btf_type_tag (@var{argument})
+The @code{btf_type_tag} attribute may be used to associate (to ``tag'')
+particular types with arbitrary string annotations.  These annotations
+are recorded in debugging info by supported debug formats, currently
+DWARF (via @code{DW_AT_GNU_annotation} and @code{DW_TAG_GNU_annotation}
+extensions) and BTF (via @code{BTF_KIND_TYPE_TAG} records).  These
+annotation strings are not interpreted by the compiler in any way, and
+have no effect on code generation.  If neither DWARF nor BTF
+information is generated, the attribute has no effect.
+
+The argument is treated as an ordinary string in the source language
+with no additional special rules.
+
+The attribute may be supplied multiple times for a single declaration,
+in which case each distinct argument string will be recorded in a
+separate DIE or BTF record, each associated to the type.  For a single
+type with multiple @code{btf_type_tag} attributes, the order of the
+@code{DW_TAG_GNU_annotation} DIEs produced is not guaranteed to
+maintain the order of attributes in the source code.
+
+For example:
+
+@smallexample
+int * __attribute__ ((btf_type_tag ("__user"))) foo;
+@end smallexample
+
+@noindent
+associates the string ``__user'' to the pointer-to-integer type of
+the declaration.  This string will be recorded in DWARF and/or BTF
+information associated with the appropriate pointer type DIE or
+@code{BTF_KIND_PTR} record.
+
 @cindex @code{copy} type attribute
 @item copy
 @itemx copy (@var{expression})
-- 
2.45.2



[PATCH 4/5] btf: generate and output DECL_TAG and TYPE_TAG records

2024-10-30 Thread David Faust
Support the btf_decl_tag and btf_type_tag attributes in BTF by creating
and emitting BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records,
respectively, for them.

Some care is required when -gprune-btf is in effect to avoid emitting
decl or type tags for declarations or types which have been pruned and
will not be emitted in BTF.

gcc/
* btfout.cc (get_btf_kind): Handle DECL_TAG and TYPE_TAG kinds.
(btf_calc_num_vbytes): Likewise.
(btf_asm_type): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(output_btf_tags): New.
(btf_output): Call it here.
(btf_add_used_type): Replace with simple wrapper around...
(btf_add_used_type_1): ...the implementation.  Handle
BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
(btf_add_vars): Update btf_add_used_type call.
(btf_assign_tag_ids): New.
(btf_mark_type_used): Update btf_add_used_type call.
(btf_collect_pruned_types): Likewise.  Handle type and decl tags.
(btf_finish): Call btf_assign_tag_ids.

gcc/testsuite/
* gcc.dg/debug/btf/btf-decl-tag-1.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-2.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-3.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-4.c: New test.
* gcc.dg/debug/btf/btf-type-tag-1.c: New test.
* gcc.dg/debug/btf/btf-type-tag-2.c: New test.
* gcc.dg/debug/btf/btf-type-tag-3.c: New test.
* gcc.dg/debug/btf/btf-type-tag-4.c: New test.
* gcc.dg/debug/btf/btf-type-tag-c2x-1.c: New test.

include/
* btf.h (BTF_KIND_DECL_TAG, BTF_KIND_TYPE_TAG): New defines.
(struct btf_decl_tag): New.
---
 gcc/btfout.cc | 176 +++---
 .../gcc.dg/debug/btf/btf-decl-tag-1.c |  14 ++
 .../gcc.dg/debug/btf/btf-decl-tag-2.c |  22 +++
 .../gcc.dg/debug/btf/btf-decl-tag-3.c |  22 +++
 .../gcc.dg/debug/btf/btf-decl-tag-4.c |  34 
 .../gcc.dg/debug/btf/btf-type-tag-1.c |  27 +++
 .../gcc.dg/debug/btf/btf-type-tag-2.c |  17 ++
 .../gcc.dg/debug/btf/btf-type-tag-3.c |  21 +++
 .../gcc.dg/debug/btf/btf-type-tag-4.c |  25 +++
 .../gcc.dg/debug/btf/btf-type-tag-c2x-1.c |  23 +++
 include/btf.h |  14 ++
 11 files changed, 371 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-c2x-1.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 083ca48d627..e8190f685f9 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -141,6 +141,8 @@ get_btf_kind (uint32_t ctf_kind)
 case CTF_K_VOLATILE: return BTF_KIND_VOLATILE;
 case CTF_K_CONST:return BTF_KIND_CONST;
 case CTF_K_RESTRICT: return BTF_KIND_RESTRICT;
+case CTF_K_DECL_TAG: return BTF_KIND_DECL_TAG;
+case CTF_K_TYPE_TAG: return BTF_KIND_TYPE_TAG;
 default:;
 }
   return BTF_KIND_UNKN;
@@ -217,6 +219,7 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
 case BTF_KIND_CONST:
 case BTF_KIND_RESTRICT:
 case BTF_KIND_FUNC:
+case BTF_KIND_TYPE_TAG:
 /* These kinds have no vlen data.  */
   break;
 
@@ -256,6 +259,10 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
   vlen_bytes += vlen * sizeof (struct btf_var_secinfo);
   break;
 
+case BTF_KIND_DECL_TAG:
+  vlen_bytes += sizeof (struct btf_decl_tag);
+  break;
+
 default:
   break;
 }
@@ -452,6 +459,20 @@ btf_asm_type (ctf_dtdef_ref dtd)
 and should write 0.  */
   dw2_asm_output_data (4, 0, "(unused)");
   return;
+case BTF_KIND_DECL_TAG:
+  {
+   if (dtd->ref_type)
+ break;
+   else if (dtd->dtd_u.dtu_tag.ref_var)
+ {
+   /* ref_type is NULL for decl tag attached to a variable.  */
+   ctf_dvdef_ref dvd = dtd->dtd_u.dtu_tag.ref_var;
+   dw2_asm_output_data (4, dvd->dvd_id,
+"btt_type: (BTF_KIND_VAR '%s')",
+dvd->dvd_name);
+   return;
+ }
+  }
 default:
   break;
 }
@@ -801,6 +822,12 @@ output_asm_btf_vlen_bytes (ctf_container_ref ctfc, 
ctf_dtdef_ref dtd)
 at this point.  */
   gcc_unreachable ();
 
+case BTF_KIND_DECL_TAG:
+  dw2_asm_output_data (4, dtd->dtd_u.dtu_tag.component_idx,
+  "component_idx=%d",
+  dtd->dtd_u.dtu_tag.component_idx);
+  break;
+
 

Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread David Malcolm
On Wed, 2024-10-30 at 15:53 +, Qing Zhao wrote:
> 
> 
> > On Oct 30, 2024, at 10:48, David Malcolm 
> > wrote:
> > 
> > On Wed, 2024-10-30 at 14:34 +, Sam James wrote:
> > > Qing Zhao  writes:
> > > 
> > > > Control this with a new option -fdiagnostics-details.
> > > > 

[...]

> 
> I have a question on the changes to the “warning_at”: (there are a
> lot of such changes for -Warray-bounds and -Wstringop-**)
> 
> -   warned = warning_at (location, OPT_Warray_bounds_,
> +   {
> + rich_location *richloc
> +   = build_rich_location_with_diagnostic_path (location,
> stmt);
> + warned = warning_at (richloc, OPT_Warray_bounds_,
> 
> The above is the current change.
> 
> My concern with this change is: 
> even when -fdiagnostics_details is NOT on, the rich_location is
> created. 

A rich_location instance is always constructed when emitting
diagnostics; warning_at with a location_t simply makes a rich_location
on the stack.
> 
> How much is the additional overhead when using “rich_location *”
> other than “location_t” as the 1st argument of warning_at?

The warning_at overload taking a rich_location * takes a borrowed
pointer to a rich_location; it doesn't take ownership.  Hence, as
written, the patch has a memory leak: every call to
build_rich_location_with_diagnostic_path is using "new" to make a new
rich_location instance on the heap, and they aren't being deleted.

> 
> Should I control the creation of “rich_location" with the flag
> “flag_diagnostics_details” (Similar as I control the creation of
> “move_history” data structure with the flag
> “flag_diagnostics_details”? 
> 
> If so, how should I do it? Do you have a suggestion on a clean and
> simply coding here (Sorry for the stupid question on this)

You can probably do all of this on the stack; make a new rich_location
subclass, with something like:

class rich_location_with_details : public gcc_rich_location
{
public:
  rich_location_with_details (location_t location, gimple *stmt);

private:
  class deferred_move_history_path {
  public:
 deferred_move_history_path (location_t location, gimple *stmt)
 : m_location (location), m_stmt (stmt)
 {
 }

 std::unique_ptr
 make_path () const final override;
     /* TODO: you'll need to implement this; it will be called on
        demand if a diagnostic is actually emitted for this
        rich_location.  */

location_t m_location;
gimple *m_stmt;
  } m_deferred_move_history_path;
};

rich_location_with_details::
rich_location_with_details (location_t location, gimple *stmt)
: gcc_rich_location (location),
  m_deferred_move_history_path (location, stmt)
{
  set_path (&m_deferred_move_history_path);
}

using class deferred_diagnostic_path from the attached patch (caveat: I
haven't tried bootstrapping it yet).

With that support subclass, you should be able to do something like
this to make them on the stack:

   rich_location_with_details richloc (location, stmt);
   warned = warning_at (&richloc, OPT_Warray_bounds_,
"array subscript %E is outside array"
" bounds of %qT", low_sub_org, artype);

and no work will be done for path creation unless and until a
diagnostic is actually emitted for richloc - the richloc ctor will just
initialize the vtable and some location_t/gimple * fields, which ought
to be very cheap for the "warning is disabled" case.
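
The deferral idea can be shown outside GCC with a small standalone sketch.
Everything here is a hypothetical stand-in (path_model, lazy_location,
emit_warning and the g_paths_built counter are not GCC names); it only
illustrates that the expensive path is built on first use, and never built
when the warning is disabled:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an expensive-to-build diagnostic path.
struct path_model
{
  std::string description;
};

// Counts how often the expensive computation ran (for illustration only).
static int g_paths_built = 0;

// Hypothetical stand-in for a location object that builds its path lazily.
class lazy_location
{
public:
  explicit lazy_location (int line) : m_line (line) {}

  // Build the path on first use; later calls reuse the cached result.
  const path_model &get_path ()
  {
    if (!m_path)
      {
        ++g_paths_built;
        m_path = std::make_unique<path_model> ();
        m_path->description
          = "events leading to line " + std::to_string (m_line);
      }
    return *m_path;
  }

private:
  int m_line;
  std::unique_ptr<path_model> m_path; // empty until a diagnostic needs it
};

// Hypothetical stand-in for warning_at: borrows the location (no ownership)
// and touches the path only when the warning is enabled.
bool emit_warning (lazy_location *loc, bool warning_enabled)
{
  assert (loc != nullptr);
  if (!warning_enabled)
    return false; // cheap exit: the path is never built
  return !loc->get_path ().description.empty ();
}
```

Because the location object lives on the caller's stack and is only
borrowed, nothing is leaked, and the disabled case costs one constructor
call plus a branch.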

I'll try bootstrapping the attached patch.

Hope this makes sense.
Dave
From a45e3718315ed1c2e2242f76c6af3aa5d646636a Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Wed, 30 Oct 2024 14:03:52 -0400
Subject: [PATCH] diagnostics: add class deferred_diagnostic_path

This patch adds a new class deferred_diagnostic_path for
use when creating rich_location instances, to allow deferring
expensive computations until the path is actually used (when
a diagnostic using the rich_location is emitted).

gcc/ChangeLog:
	* Makefile.in (OBJS): Add deferred-diagnostic-path.o.
	* deferred-diagnostic-path.cc: New file.
	* deferred-diagnostic-path.h: New file.
	* selftest-diagnostic.cc: Include "diagnostic-format.h".
	(test_diagnostic_context::test_diagnostic_context): Turn off
	flushing for the output format's printer.
	* selftest-run-tests.cc (selftest::run_tests): Call
	selftest::deferred_diagnostic_path_cc_tests.
	* selftest.h (selftest::deferred_diagnostic_path_cc_tests): New decl.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in |   1 +
 gcc/deferred-diagnostic-path.cc | 206 
 gcc/deferred-diagnostic-path.h  |  56 +
 gcc/selftest-diagnostic.cc  |   2 +
 gcc/selftest-run-tests.cc   |   1 +
 gcc/selftest.h  |   1 +
 6 files changed, 267 insertions(+)
 create mode 100644 gcc/deferred-diagnostic-path.cc
 create mode 100644 gcc/deferred-diagnostic-path.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 798d4302fa78..2d93d6451e20 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@

Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread David Malcolm
On Wed, 2024-10-30 at 17:33 +, Sam James wrote:
> Qing Zhao  writes:
> 
> > > On Oct 30, 2024, at 10:48, David Malcolm 
> > > wrote:
> > > 
> > > On Wed, 2024-10-30 at 14:34 +, Sam James wrote:
> > > > Qing Zhao  writes:
> > > > 
> > > > > Control this with a new option -fdiagnostics-details.
> > > > > 
> > > > > [...]
> > > > 
> > > > The patch doesn't apply for me on very latest trunk -- I think
> > > > David's
> > > > recent diag refactoring means it needs a slight rebase. Could
> > > > you
> > > > send
> > > > that?
> > > 
> > > If it's broken, it was probably by:
> > > 
> > > r15-4610 ("Use unique_ptr in more places in
> > > pretty_printer/diagnostics
> > > [PR116613]")
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bf43fe6aa966eaf397ea3b8ebd6408d3d124e285
> > 
> > Yes, due to the following change in the above commit:
> > 
> > diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> > index
> > 62034c32b4aff32cdf2cb051bf9d0803b4730b3f..a12a2e1afba15ba16f6ade624
> > cde3e60907ba5d2 100644 (file)
> > --- a/gcc/toplev.cc
> > +++ b/gcc/toplev.cc
> > @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not
> > see
> >  #include "cgraph.h"
> >  #include "coverage.h"
> >  #include "diagnostic.h"
> > +#include "pretty-print-urlifier.h"
> >  #include "varasm.h"
> >  #include "tree-inline.h"
> >  #include "realmpfr.h"  /* For GMP/MPFR/MPC versions, in
> > print_version.  */
> > 
> > 
> > [...]
> > > 
> 
> To continue testing, I am using the attached hacked up patches

Thanks; FWIW the fixes in those patches look correct to me.

Dave



Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-30 Thread Andi Kleen
On Wed, Oct 23, 2024 at 02:56:51PM +0200, Richard Biener wrote:
> On Wed, Oct 9, 2024 at 6:18 PM Andi Kleen  wrote:
> >
> > From: Andi Kleen 
> >
> > Retrieving sys/user time in timevars is quite expensive because it
> > always needs a system call. Only getting the wall time is much
> > cheaper because operating systems have optimized paths for this.
> >
> > The sys time isn't that interesting for a compiler and wall time
> > is usually close to user time except when the system is overloaded.
> > On the other hand when it is not wall time is more accurate because
> > it has less overhead.
> >
> > For building tramp3d with -O0 the -ftime-report overhead drops from
> > 18% to 3%. For -O2 it drops from 8% to not measurable.
> >
> > I changed the code to use gettimeofday as a fallback for clock_gettime
> > CLOCK_MONOTONIC.  If a host has neither of those the time will not
> > be measured. Previously clock was the fallback.
> 
> OK for trunk if there's no serious objection until mid next week.

I committed the patch now.

-Andi
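
The fallback order described in the quoted commit message (monotonic clock
first, then gettimeofday, otherwise no measurement) can be sketched roughly
as follows.  This is a minimal POSIX-only illustration, not the code from
the patch; wall_seconds and elapsed_since are hypothetical names, and the
negative return value for "not measured" is an assumption of this sketch:

```cpp
#include <cassert>
#include <sys/time.h>
#include <time.h>

// Return wall-clock time in seconds, preferring the monotonic clock.
// Falls back to gettimeofday; returns -1.0 if neither source works,
// meaning "time is not measured".
double wall_seconds ()
{
#ifdef CLOCK_MONOTONIC
  struct timespec ts;
  if (clock_gettime (CLOCK_MONOTONIC, &ts) == 0)
    return ts.tv_sec + ts.tv_nsec * 1e-9;
#endif
  struct timeval tv;
  if (gettimeofday (&tv, nullptr) == 0)
    return tv.tv_sec + tv.tv_usec * 1e-6;
  return -1.0;
}

// Seconds elapsed since a value previously returned by wall_seconds.
double elapsed_since (double start)
{
  assert (start >= 0.0); // a negative start means no clock was available
  return wall_seconds () - start;
}
```

Neither call needs the per-process accounting that user/sys time requires,
which is the overhead the commit message is concerned with.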


[Patch, fortran] PR115700 - comment 5: uninitialized string length in ASSOCIATE

2024-10-30 Thread Paul Richard Thomas
This wrinkle to PR115700 came about because the associate-name string
length was not being initialized when an array selector had a substring
reference with a non-constant start or end. This, of course, caused
subsequent references to fail.
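
The check at the heart of the fix is a walk over the selector's reference
chain looking for a substring reference whose start or end is not a
constant.  A standalone sketch of that walk, using simplified hypothetical
types (expr, ref and the function names below are stand-ins, not gfortran's
actual definitions):

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins for gfortran's reference chain.
enum ref_kind { REF_ARRAY, REF_SUBSTRING };
enum expr_kind { EXPR_CONSTANT, EXPR_VARIABLE };

struct expr
{
  expr_kind kind;
};

struct ref
{
  ref_kind type;
  const expr *start; // may be null when the bound is omitted
  const expr *end;   // may be null when the bound is omitted
  const ref *next;
};

// True if either bound of a substring reference is present and non-constant.
bool has_nonconstant_bound (const ref &r)
{
  assert (r.type == REF_SUBSTRING); // caller checks the kind first
  return (r.start && r.start->kind != EXPR_CONSTANT)
         || (r.end && r.end->kind != EXPR_CONSTANT);
}

// Return the first substring reference with a non-constant bound, or
// nullptr if the chain has none.
const ref *find_nonconstant_substring (const ref *chain)
{
  for (const ref *r = chain; r; r = r->next)
    if (r->type == REF_SUBSTRING && has_nonconstant_bound (*r))
      return r;
  return nullptr;
}
```

A non-null result corresponds to the case where the string length cannot be
taken from a constant and has to be set up at run time.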

The ChangeLog provides an adequate explanation of the attached patch.

OK for mainline and backporting to 14-branch?

Paul
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 565d4aa5fe9..8045deddd8a 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -6153,6 +6153,15 @@ resolve_variable (gfc_expr *e)
 	  e->ref = newref;
 	}
 }
+  else if (sym->assoc && sym->ts.type == BT_CHARACTER && sym->ts.deferred)
+{
+  gfc_ref *ref;
+  for (ref = e->ref; ref; ref = ref->next)
+	if (ref->type == REF_SUBSTRING)
+	  break;
+  if (ref == NULL)
+	e->ts = sym->ts;
+}
 
   if (e->ref && !gfc_resolve_ref (e))
 return false;
@@ -9871,6 +9880,15 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
   /* Fix up the type-spec for CHARACTER types.  */
   if (sym->ts.type == BT_CHARACTER && !sym->attr.select_type_temporary)
 {
+  gfc_ref *ref;
+  for (ref = target->ref; ref; ref = ref->next)
+	if (ref->type == REF_SUBSTRING
+	&& ((ref->u.ss.start
+		 && ref->u.ss.start->expr_type != EXPR_CONSTANT)
+		|| (ref->u.ss.end
+		&& ref->u.ss.end->expr_type != EXPR_CONSTANT)))
+	  break;
+
   if (!sym->ts.u.cl)
 	sym->ts.u.cl = target->ts.u.cl;
 
@@ -9889,9 +9907,10 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
 		gfc_get_int_expr (gfc_charlen_int_kind, NULL,
   target->value.character.length);
 	}
-  else if ((!sym->ts.u.cl->length
-		|| sym->ts.u.cl->length->expr_type != EXPR_CONSTANT)
+  else if (((!sym->ts.u.cl->length
+		 || sym->ts.u.cl->length->expr_type != EXPR_CONSTANT)
 		&& target->expr_type != EXPR_VARIABLE)
+	   || ref)
 	{
 	  if (!sym->ts.deferred)
 	{
@@ -9901,7 +9920,10 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
 
 	  /* This is reset in trans-stmt.cc after the assignment
 	 of the target expression to the associate name.  */
-	  sym->attr.allocatable = 1;
+	  if (ref && sym->as)
+	sym->attr.pointer = 1;
+	  else
+	sym->attr.allocatable = 1;
 	}
 }
 
@@ -11508,8 +11530,9 @@ resolve_block_construct (gfc_code* code)
 {
   gfc_namespace *ns = code->ext.block.ns;
 
-  /* For an ASSOCIATE block, the associations (and their targets) are already
- resolved during resolve_symbol. Resolve the BLOCK's namespace.  */
+  /* For an ASSOCIATE block, the associations (and their targets) will be
+ resolved by gfc_resolve_symbol, during resolution of the BLOCK's
+ namespace.  */
   gfc_resolve (ns);
 }
 
diff --git a/gcc/testsuite/gfortran.dg/associate_70.f90 b/gcc/testsuite/gfortran.dg/associate_70.f90
new file mode 100644
index 000..397754c0b52
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_70.f90
@@ -0,0 +1,40 @@
+! { dg-do run }
+! { dg-options "-Wuninitialized" }
+!
+! Test fix for PR115700 comment 5, in which ‘.tmp4’ is used uninitialized and
+! both normal and scalarized array references did not work correctly.
+!
+! Contributed by Harald Anlauf  
+!
+  character(4), dimension(3) :: chr = ['abcd', 'efgh', 'ijkl']
+  call mvce (chr)
+  if (any (chr /= ['ABcd', 'EFgh', 'IJkl'])) stop 1
+contains
+  subroutine mvce(x)
+implicit none
+character(len=*), dimension(:), intent(inOUT), target :: x
+integer :: i
+i = len(x)
+
+! This was broken
+associate (tmp1 => x(:)(1:i/2))
+  if (len (tmp1) /= i/2) stop 2
+  if (tmp1(2) /= 'ef') stop 3
+  if (any (tmp1 /= ['ab', 'ef', 'ij'])) stop 4
+  tmp1 = ['AB','EF','IJ']
+end associate
+
+! Retest things that worked previously.
+associate (tmp2 => x(:)(1:2))
+  if (len (tmp2) /= i/2) stop 5
+  if (tmp2(2) /= 'EF') stop 6
+  if (any (tmp2 /= ['AB','EF','IJ'])) stop 7
+end associate
+
+associate (tmp3 => x(3)(1:i/2))
+  if (len (tmp3) /= i/2) stop 8
+  if (tmp3 /= 'IJ') stop 9
+end associate
+
+  end subroutine mvce
+end


Change.Logs
Description: Binary data


Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Qing Zhao


> On Oct 30, 2024, at 10:34, Sam James  wrote:
> 
> Qing Zhao  writes:
> 
>> Control this with a new option -fdiagnostics-details.
>> 
>> [...]
> 
> The patch doesn't apply for me on very latest trunk -- I think David's
> recent diag refactoring means it needs a slight rebase. Could you send
> that?
I rebased the patch set against a later trunk, but it looks like it was not recent enough.
I will rebase it on the latest one and resend the patch.

Sorry about this.

Qing
> 



Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling agressiveness

2024-10-30 Thread Richard Sandiford
Jeff Law  writes:
> On 10/30/24 8:44 AM, Richard Sandiford wrote:
>
>>> But the data from the BPI (spacemit k1 chip) is an in-order core.
>>> Granted we don't have a good model of its pipeline, but it's definitely
>>> in-order.
>> 
>> Damn :)  (I did try to clarify what was being tested earlier, but the
>> response wasn't clear.)
>> 
>> So how representative is the DFA model being used for the BPI?
>> Is it more "pretty close, but maybe different in a few minor details"?
>> Or is it more "we're just using an existing DFA model for a different
>> core and hoping for the best"?  Is the issue width accurate?
>> 
>> If we're scheduling for an in-order core without an accurate pipeline
>> model then that feels like the first thing to fix.  Otherwise we're
>> in danger of GIGO.
> GIGO is a risk here -- there really isn't good data on the pipeline for 
> that chip, especially on the FP side.  I don't really have a good way to 
> test this on an in-order RISC-V target where there is a reasonable DFA 
> model.

OK (and yeah, I can sympathise).  But I think there's an argument that,
if you're scheduling for one in-order core using the pipeline of an
unrelated core, that's effectively scheduling for the core as though
it were out-of-order.  In other words, the property we care about
isn't so much whether the processor itself is in-order (a statement
about the uarch), but whether we are trying to schedule for a particular
in-order pipeline (a statement about what GCC is doing or knows about).
I'd argue that in the case you describe, we're not trying to schedule
for a particular in-order pipeline.

That might need some finessing of the name.  But I think the concept
is right.  I'd rather base the hook (or param) on a general concept
like that rather than a specific "wide vs narrow" thing.
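
To make the hook-versus-unconditional distinction concrete, here is a minimal
sketch.  It is not GCC code: sched_hooks, spill_aggressively and both target
functions are hypothetical names, and the default shown (keep the existing
behaviour) is an assumption about how such a hook might be wired up.

```cpp
#include <cassert>

// Hypothetical stand-in for a target hook table; not GCC's real targetm.
struct sched_hooks
{
  // Should pressure-aware scheduling accept spills to avoid pipeline
  // bubbles?  The question is about what the compiler is modelling,
  // not about the core's microarchitecture as such.
  bool (*spill_aggressively) ();
};

// Assumed default: keep the existing behaviour.
static bool default_spill_aggressively () { return true; }

// A hypothetical target without an accurate in-order pipeline model
// opts out.
static bool no_model_spill_aggressively () { return false; }

// Generic code consults the hook instead of hard-coding a policy.
static bool accept_spill_for_schedule (const sched_hooks &hooks)
{
  return hooks.spill_aggressively ();
}

// Small self-check of the dispatch.
static void check_hook_dispatch ()
{
  sched_hooks generic = { default_spill_aggressively };
  sched_hooks no_model = { no_model_spill_aggressively };
  assert (accept_spill_for_schedule (generic));
  assert (!accept_spill_for_schedule (no_model));
}
```

The point of the indirection is that a target answers a general question
once, and the scheduler never needs to know whether the core is "wide" or
"narrow".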

> I still see Vineet's data as compelling, even with GIGO concern.

Do you mean the reduction in dynamic instruction counts?  If so,
that isn't what the algorithm is aiming to reduce.  Like I mentioned
in the previous thread, trying to minimise dynamic instruction counts
was also harmful for the core & benchmarks I was looking at.
We just ended up with lots of pipeline bubbles that could be
alleviated by judicious spilling.

I'm not saying that the algorithm gets the decision right for cactu
when tuning for in-order CPU X and running on that same CPU X.
But it seems like that combination hasn't been tried, and that,
even on the combinations that the patch has been tried on, the cactu
justification is based on static properties of the binary rather than
a particular runtime improvement (Y% faster).

To be clear, the two paragraphs above are trying to explain why I think
this should be behind a hook or param rather than unconditional.  The
changes themselves look fine, and incorporate the suggestions from the
previous thread (thanks!).

Richard


[PATCH v1 2/2] aarch64: specify fpm mode in function instances and groups

2024-10-30 Thread Claudio Bantaloukas

Some intrinsics require setting the fpm register before calling the
specific asm opcode required.
In order to simplify review, this patch:
- adds the fpm_mode_index attribute to function_group_info and
  function_instance objects
- updates existing initialisations and call sites.
- updates equality and hash operations
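
A small sketch of why equality and hashing have to change together when a
new field is added.  This is an illustration only, not the patch itself:
instance_sketch is a simplified, hypothetical stand-in for function_instance,
FPM_unused is taken from the diff, and FPM_example_set is a made-up second
enumerator.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// FPM_unused appears in the patch; FPM_example_set is hypothetical.
enum fpm_mode_index { FPM_unused, FPM_example_set };

// Simplified, hypothetical stand-in for function_instance.
struct instance_sketch
{
  std::string base_name;
  fpm_mode_index fpm_mode;

  // Two instances that differ only in fpm_mode must not compare equal.
  bool operator== (const instance_sketch &other) const
  {
    return base_name == other.base_name && fpm_mode == other.fpm_mode;
  }

  // Mix the mode into the hash so that such instances do not share a
  // hash value by construction.
  std::size_t hash () const
  {
    std::size_t h = std::hash<std::string> () (base_name);
    return h * 31 + static_cast<std::size_t> (fpm_mode);
  }
};

// Small self-check of the two operations.
static void check_instance_sketch ()
{
  instance_sketch a = { "svneg", FPM_unused };
  instance_sketch b = { "svneg", FPM_unused };
  instance_sketch c = { "svneg", FPM_example_set };
  assert (a == b && a.hash () == b.hash ());
  assert (!(a == c) && a.hash () != c.hash ());
}
```

If only operator== were updated, a hash table keyed on the instance would
still work but would see avoidable collisions; if only the hash were updated,
lookups could return an instance with the wrong mode.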

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svdiv_impl): Specify FPM_unused when folding.
(svmul_impl): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def
(svreinterpret): Specify FPM_unused mode
* config/aarch64/aarch64-sve-builtins-shapes.cc
(build_one): Use the group fpm_mode when creating function instances.
* config/aarch64/aarch64-sve-builtins-sme.def
(DEF_SME_FUNCTION): specify FPM_unset mode
(DEF_SME_ZA_FUNCTION_GS): Allow specifying fpm mode
(DEF_SME_ZA_FUNCTION): specify FPM_unset mode
(svadd,svadd_write,svdot, svdot_lane, svluti2_lane_zt, svluti4_lane_zt,
svmla, svmla_lane, svmls, svmls_lane, svread, svread_hor, svread_ver,
svsub, svsub_write, svsudot, svsudot_lane, svsuvdot_lane, svusdot,
svusdot_lane, svusvdot_lane, svvdot_lane, svwrite, svwrite_hor,
svwrite_ver): Likewise
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svaba_impl, svqrshl_impl, svqshl_impl,svrshl_impl, svsra_impl):
Specify FPM_unused when folding.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svadd, svclamp, svcvt, svcvtn, svld1, svldnt1, svmax, svmaxnm, svmin,
svminnm, svpext_lane, svqcvt, svqcvtn, svqdmulh, svqrshr, svqrshrn,
svqrshru, svqrshrun, svrinta, svrintm, svrintn, svrintp, svrshl, svsel,
svst1, svstnt1, svunpk, svuzp, svuzpq, svwhilege, svwhilegt, svwhilele,
svwhilelt, svzip, svzipq): Likewise
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Set
fpm_mode on all elements.
(neon_sve_function_groups, sme_function_groups): Likewise.
(function_instance::hash): Include fpm_mode in hash.
(function_builder::add_overloaded_functions): Use the group fpm mode.
(function_resolver::lookup_form): Use the function instance fpm_mode
when looking up a function.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_FUNCTION_GS): add argument.
(DEF_SVE_FUNCTION): specify FPM_unset mode.
* config/aarch64/aarch64-sve-builtins.h (fpm_mode_index): New.
(function_group_info): Add fpm_mode.
(function_instance): Likewise.
(function_instance::operator==): Handle fpm_mode.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |  15 +-
 .../aarch64/aarch64-sve-builtins-base.def |   2 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc|   3 +-
 .../aarch64/aarch64-sve-builtins-sme.def  | 130 ++
 .../aarch64/aarch64-sve-builtins-sve2.cc  |  20 ++-
 .../aarch64/aarch64-sve-builtins-sve2.def |  96 +++--
 gcc/config/aarch64/aarch64-sve-builtins.cc|  21 +--
 gcc/config/aarch64/aarch64-sve-builtins.def   |   4 +-
 gcc/config/aarch64/aarch64-sve-builtins.h |  25 +++-
 9 files changed, 185 insertions(+), 131 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index fe16d93adcd..47d9a01c3dc 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -774,7 +774,8 @@ public:
   {
 	function_instance instance ("svneg", functions::svneg,
 shapes::unary, MODE_none,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	gcall *call = f.redirect_call (instance);
 	unsigned offset_index = 0;
 	if (f.pred == PRED_m)
@@ -802,7 +803,8 @@ public:
   {
 	function_instance instance ("svlsr", functions::svlsr,
 shapes::binary_uint_opt_n, MODE_n,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	call = f.redirect_call (instance);
 	tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : op2_cst;
 	new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
@@ -815,7 +817,8 @@ public:
 
 	function_instance instance ("svasrd", functions::svasrd,
 shapes::shift_right_imm, MODE_n,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	call = f.redirect_call (instance);
 	new_divisor = wide_int_to_tree (scalar_types[VECTOR_TYPE_svuint64_t],
 	tree_log2 (op2_cst));
@@ -2092,7 +2095,8 @@ public:
   {
 	function_instance instance ("svneg", functions::svneg,
 shapes::unary, MODE_none,
-f.type_suffix_ids, GROUP_none, f.pred);
+f.type_suffix_ids, GROUP_none, f.pred,
+FPM_unused);
 	gcall *call = f.redirect_call (instance);
 	unsigned offset_index = 0;
 	if (f.pred == PRED_m)
@@ -2133,7 +2137,8 @

[PATCH v1 0/2] aarch64: Add fp8 sve foundation

2024-10-30 Thread Claudio Bantaloukas


The ACLE defines a new set of fp8 vector types and intrinsics that operate on
these, some of them operating on the vectors as if they were bags of bits and
some requiring an additional argument of type fpm_t.
The following two patches introduce:
- the types
- intrinsics that operate without the fpm_t type
- foundational changes that will be used to implement intrinsics requiring an
  fpm_t argument at the end.

Is this OK for master? I do not have commit rights yet; if it is OK, could someone 
commit it on my behalf?

Regression tested on aarch64-unknown-linux-gnu.

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (2):
  aarch64: Add basic svmfloat8_t support to arm_sve.h
  aarch64: specify fpm mode in function instances and groups

 .../aarch64/aarch64-sve-builtins-base.cc  |  15 +-
 .../aarch64/aarch64-sve-builtins-base.def |   2 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc|   3 +-
 .../aarch64/aarch64-sve-builtins-sme.def  | 130 
 .../aarch64/aarch64-sve-builtins-sve2.cc  |  20 +-
 .../aarch64/aarch64-sve-builtins-sve2.def |  96 +++---
 gcc/config/aarch64/aarch64-sve-builtins.cc|  29 +-
 gcc/config/aarch64/aarch64-sve-builtins.def   |   7 +-
 gcc/config/aarch64/aarch64-sve-builtins.h |  26 +-
 .../aarch64/sve/acle/general-c++/mangle_1.C   |   2 +
 .../aarch64/sve/acle/general-c++/mangle_2.C   |   2 +
 .../aarch64/sve/acle/asm/clasta_mf8.c |  52 +++
 .../aarch64/sve/acle/asm/clastb_mf8.c |  52 +++
 .../aarch64/sve/acle/asm/create2_1.c  |  15 +
 .../aarch64/sve/acle/asm/create3_1.c  |  11 +
 .../aarch64/sve/acle/asm/create4_1.c  |  12 +
 .../aarch64/sve/acle/asm/dup_lane_mf8.c   | 124 
 .../gcc.target/aarch64/sve/acle/asm/dup_mf8.c |  31 ++
 .../aarch64/sve/acle/asm/dupq_lane_mf8.c  |  48 +++
 .../gcc.target/aarch64/sve/acle/asm/ext_mf8.c |  73 +
 .../aarch64/sve/acle/asm/get2_mf8.c   |  55 
 .../aarch64/sve/acle/asm/get3_mf8.c   | 108 +++
 .../aarch64/sve/acle/asm/get4_mf8.c   | 179 +++
 .../aarch64/sve/acle/asm/insr_mf8.c   |  22 ++
 .../aarch64/sve/acle/asm/lasta_mf8.c  |  12 +
 .../aarch64/sve/acle/asm/lastb_mf8.c  |  12 +
 .../gcc.target/aarch64/sve/acle/asm/ld1_mf8.c | 162 ++
 .../aarch64/sve/acle/asm/ld1ro_mf8.c  | 121 +++
 .../aarch64/sve/acle/asm/ld1rq_mf8.c  | 137 
 .../gcc.target/aarch64/sve/acle/asm/ld2_mf8.c | 204 
 .../gcc.target/aarch64/sve/acle/asm/ld3_mf8.c | 246 +++
 .../gcc.target/aarch64/sve/acle/asm/ld4_mf8.c | 290 +
 .../aarch64/sve/acle/asm/ldff1_mf8.c  |  91 ++
 .../aarch64/sve/acle/asm/ldnf1_mf8.c  | 155 +
 .../aarch64/sve/acle/asm/ldnt1_mf8.c  | 162 ++
 .../gcc.target/aarch64/sve/acle/asm/len_mf8.c |  12 +
 .../aarch64/sve/acle/asm/reinterpret_bf16.c   |  17 +
 .../aarch64/sve/acle/asm/reinterpret_f16.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_f32.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_f64.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_mf8.c| 297 ++
 .../aarch64/sve/acle/asm/reinterpret_s16.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_s32.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_s64.c|  17 +
 .../aarch64/sve/acle/asm/reinterpret_s8.c |  17 +
 .../aarch64/sve/acle/asm/reinterpret_u16.c|  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u32.c|  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u64.c|  28 ++
 .../aarch64/sve/acle/asm/reinterpret_u8.c |  28 ++
 .../gcc.target/aarch64/sve/acle/asm/rev_mf8.c |  21 ++
 .../gcc.target/aarch64/sve/acle/asm/sel_mf8.c |  30 ++
 .../aarch64/sve/acle/asm/set2_mf8.c   |  41 +++
 .../aarch64/sve/acle/asm/set3_mf8.c   |  63 
 .../aarch64/sve/acle/asm/set4_mf8.c   |  87 +
 .../aarch64/sve/acle/asm/splice_mf8.c |  33 ++
 .../gcc.target/aarch64/sve/acle/asm/st1_mf8.c | 162 ++
 .../gcc.target/aarch64/sve/acle/asm/st2_mf8.c | 204 
 .../gcc.target/aarch64/sve/acle/asm/st3_mf8.c | 246 +++
 .../gcc.target/aarch64/sve/acle/asm/st4_mf8.c | 290 +
 .../aarch64/sve/acle/asm/stnt1_mf8.c  | 162 ++
 .../gcc.target/aarch64/sve/acle/asm/tbl_mf8.c |  30 ++
 .../aarch64/sve/acle/asm/trn1_mf8.c   |  30 ++
 .../aarch64/sve/acle/asm/trn1q_mf8.c  |  32 ++
 .../aarch64/sve/acle/asm/trn2_mf8.c   |  30 ++
 .../aarch64/sve/acle/asm/trn2q_mf8.c  |  32 ++
 .../aarch64/sve/acle/asm/undef2_1.c   |   7 +
 .../aarch64/sve/acle/asm/undef3_1.c   |   7 +
 .../aarch64/sve/acle/asm/undef4_1.c   |   7 +
 .../gcc.target/aarch64/sve/acle/asm/undef_1.c |   7 +
 .../aarch64/sve/acle/asm/uzp1_mf8.c   |  30 ++
 .../aarch64/sve/acle/asm/uzp1q_mf8.c  |  32 ++
 .../aarch64/sve/acle/asm/uzp2_mf8.c   |  30 ++
 .../aarch64/sve/acle/asm/uzp2q_mf8.c  |  32 ++
 ..

Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Sam James
Qing Zhao  writes:

> Control this with a new option -fdiagnostics-details.
>
> $ cat t.c
> extern void warn(void);
> static inline void assign(int val, int *regs, int *index)
> {
>   if (*index >= 4)
> warn();
>   *regs = val;
> }
> struct nums {int vals[4];};
>
> void sparx5_set (int *ptr, struct nums *sg, int index)
> {
>   int *val = &sg->vals[index];
>
>   assign(0,ptr, &index);
>   assign(*val, ptr, &index);
> }
>
> $ gcc -Wall -O2  -c -o t.o t.c
> t.c: In function ‘sparx5_set’:
> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
> [-Warray-bounds=]
>12 |   int *val = &sg->vals[index];
>   |   ^~~
> t.c:8:18: note: while referencing ‘vals’
> 8 | struct nums {int vals[4];};
>   |  ^~~~
>
> In the above, although the warning is correct in theory, the warning message
> itself is confusing to the end-user since there is information that cannot
> be connected to the source code directly.
>
> It will be a nice improvement to add more information in the warning message
> to report where such an index value comes from.
>
> In order to achieve this, we add a new data structure "move_history" to record
> 1. the "condition" that triggers the code movement;
> 2. whether the code movement is on the true path of the "condition";
> 3. the "compiler transformation" that triggers the code movement.
>
> Whenever there is a code movement along control flow graph due to some
> specific transformations, such as jump threading, path isolation, tree
> sinking, etc., a move_history structure is created and attached to the
> moved gimple statement.
>
> During array out-of-bound checking or -Wstringop-* warning checking, the
> "move_history" that was attached to the gimple statement is used to form
> a sequence of diagnostic events that are added to the corresponding rich
> location to be used to report the warning message.
>
> This behavior is controlled by the new option -fdiagnostics-details
> which is off by default.
>
> With this change, by adding -fdiagnostics-details,
> the warning message for the above testing case is now:
>
> $ gcc -Wall -O2 -fdiagnostics-details -c -o t.o t.c
> t.c: In function ‘sparx5_set’:
> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
> [-Warray-bounds=]
>12 |   int *val = &sg->vals[index];
>   |   ^~~
>   ‘sparx5_set’: events 1-2
> 4 |   if (*index >= 4)
>   |  ^
>   |  |
>   |  (1) when the condition is evaluated to true
> ..
>12 |   int *val = &sg->vals[index];
>   |   ~~~
>   |   |
>   |   (2) out of array bounds here
> t.c:8:18: note: while referencing ‘vals’
> 8 | struct nums {int vals[4];};
>   |  ^~~~
>
>   PR tree-optimization/109071
>
> gcc/ChangeLog:
>
>   * Makefile.in (OBJS): Add diagnostic-move-history.o
>   and move-history-diagnostic-path.o.
>   * gcc/common.opt (fdiagnostics-details): New option.
>   * gcc/doc/invoke.texi (fdiagnostics-details): Add
>   documentation for the new option.
>   * gimple-array-bounds.cc (build_rich_location_with_diagnostic_path):
>   New function.
>   (check_out_of_bounds_and_warn): Add one new parameter. Use rich
>   location with move_history_diagnostic_path for warning_at.
>   (array_bounds_checker::check_array_ref): Use rich location with
>   move_history_diagnostic_path for warning_at.
>   (array_bounds_checker::check_mem_ref): Add one new parameter.
>   Use rich location with move_history_diagnostic_path for warning_at.
>   (array_bounds_checker::check_addr_expr): Use rich location with
>   move_history_diagnostic_path for warning_at.
>   (array_bounds_checker::check_array_bounds): Call check_mem_ref with
>   one more parameter.
>   * gimple-array-bounds.h: Update prototype for check_mem_ref.
>   * gimple-iterator.cc (gsi_remove): (gsi_remove): Remove the move
>   history when removing the gimple.
>   * gimple-pretty-print.cc (pp_gimple_stmt_1): Emit MV_H marking
>   if the gimple has a move_history.
>   * gimple-ssa-isolate-paths.cc (isolate_path): Set move history
>   for the gimples of the duplicated blocks.
>   * gimple-ssa-warn-restrict.cc (maybe_diag_access_bounds): Use
>   rich location with move_history_diagnostic_path for warning_at.
> * gimple-ssa-warn-access.cc (warn_string_no_nul): Likewise.
> (maybe_warn_nonstring_arg): Likewise.
> (maybe_warn_for_bound): Likewise.
> (warn_for_access): Likewise.
> (check_access): Likewise.
> (pass_waccess::check_strncat): Likewise.
> (pass_waccess::maybe_check_access_sizes): Likewise.
> * tree-ssa-sink.cc (sink_code_in_bb): Create move_history for
>   stmt when it is sinked.
>   * toplev.cc (toplev::finalize):  Call move_history_finalize.
>   * t

Re: [Patch, fortran] PR115700 - comment 5: uninitialized string length in ASSOCIATE

2024-10-30 Thread Steve Kargl
On Wed, Oct 30, 2024 at 04:41:40PM +, Paul Richard Thomas wrote:
> This wrinkle to PR115700 came about because the associate-name string
> length was not being initialized, when an array selector had a substring
> reference with non-constant start or end. This, of course, caused
> subsequent references to fail.
> 
> The ChangeLog provides an adequate explanation of the attached patch.
> 
> OK for mainline and backporting to 14-branch?
> 

Yes.  Thanks for the patch.

-- 
Steve


Re: [PATCH] libgo: Use stub syscall on GNU/Hurd

2024-10-30 Thread Ian Lance Taylor
On Tue, Oct 29, 2024 at 2:04 PM Samuel Thibault 
wrote:

>
> * libgo/go/syscall/syscall_funcs.go: Do not build on GNU/Hurd.
> * libgo/go/syscall/syscall_funcs_stubs.go: Build on GNU/Hurd.
> * libgo/runtime/go-nosys.c: Do not produce syscall() stub on
> GNU/Hurd.
>

Thanks.  Committed as follows.

Ian
3ca4f43f50a3e8c7a398ea797220808b3318
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index c39aca9b1b0..59badf80f40 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-092668d6ce6d7b3aff6797247cd53dc44319c558
+f9ea9801058aa98a421784da12b76cda0b4c6cf2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/syscall/syscall_funcs.go 
b/libgo/go/syscall/syscall_funcs.go
index a906fa5a42e..fc14cb18286 100644
--- a/libgo/go/syscall/syscall_funcs.go
+++ b/libgo/go/syscall/syscall_funcs.go
@@ -2,8 +2,8 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build darwin || dragonfly || freebsd || hurd || linux || netbsd || 
openbsd || solaris
-// +build darwin dragonfly freebsd hurd linux netbsd openbsd solaris
+//go:build darwin || dragonfly || freebsd || linux || netbsd || openbsd || 
solaris
+// +build darwin dragonfly freebsd linux netbsd openbsd solaris
 
 package syscall
 
diff --git a/libgo/go/syscall/syscall_funcs_stubs.go 
b/libgo/go/syscall/syscall_funcs_stubs.go
index 11f12bd9ae3..e37a6483b02 100644
--- a/libgo/go/syscall/syscall_funcs_stubs.go
+++ b/libgo/go/syscall/syscall_funcs_stubs.go
@@ -2,8 +2,8 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-//go:build aix || rtems
-// +build aix rtems
+//go:build aix || hurd || rtems
+// +build aix hurd rtems
 
 // These are stubs.
 
diff --git a/libgo/runtime/go-nosys.c b/libgo/runtime/go-nosys.c
index 30222df7815..cd3e7664ca0 100644
--- a/libgo/runtime/go-nosys.c
+++ b/libgo/runtime/go-nosys.c
@@ -504,7 +504,7 @@ strerror_r (int errnum, char *buf, size_t buflen)
 
 #endif /* ! HAVE_STRERROR_R */
 
-#ifndef HAVE_SYSCALL
+#if !defined(HAVE_SYSCALL) && !defined(__GNU__) /* GNU/Hurd already has a stub 
*/
 int
 syscall(int number __attribute__ ((unused)), ...)
 {


Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling agressiveness

2024-10-30 Thread Richard Sandiford
Vineet Gupta  writes:
> On 10/30/24 10:25, Jeff Law wrote:
>> On 10/30/24 9:31 AM, Richard Sandiford wrote:
>>> That might need some finessing of the name.  But I think the concept
>>> is right.  I'd rather base the hook (or param) on a general concept
>>> like that rather than a specific "wide vs narrow" thing.
>> Agreed.  Naming was my real only concern about the first patch.
>
> We are leaning towards
>   - TARGET_SCHED_PRESSURE_SPILL_AGGRESSIVE
>   - targetm.sched.pressure_spill_aggressive
>
> Targets could wire them up however they like
>
>>>> I still see Vineet's data as compelling, even with GIGO concern.
>>> Do you mean the reduction in dynamic instruction counts?  If so,
>>> that isn't what the algorithm is aiming to reduce.  Like I mentioned
>>> in the previous thread, trying to minimise dynamic instruction counts
>>> was also harmful for the core & benchmarks I was looking at.
>>> We just ended up with lots of pipeline bubbles that could be
>>> alleviated by judicious spilling.
>> Vineet showed significant cycle and icount improvements.  I'm much more 
>> interested in the former :-)
>
> The initial premise indeed was icounts but with recent access to some 
> credible hardware I'm all for perf measurement now.
>
> Please look at patch 2/4 [1] for actual perf data both cycles and 
> instructions.
> I kept 1/4 introducing hook separate from 2/4 which implements the hook for 
> RISC-V.
>
>     [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665945.html

Ah, sorry, I was indeed going only from the description in 1/4.
I've not had time to look at the rest of the series yet.

> As Jeff mentioned, on an in-order RISC-V core we are seeing 6% cycle 
> improvements from the hook and another 6% cycle improvement from patch 3/4

Sounds good!

> Also Wilco gave this a spin on high end OoO Neoverse and seems to be seeing 
> 20% improvement which I gather is cycles.

Yeah, it's common ground that we should change this for OoO cores.

>>> I'm not saying that the algorithm gets the decision right for cactu
>>> when tuning for in-order CPU X and running on that same CPU X.
>>> But it seems like that combination hasn't been tried, and that,
>>> even on the combinations that the patch has been tried on, the cactu
>>> justification is based on static properties of the binary rather than
>>> a particular runtime improvement (Y% faster).
>
> I'd requested Wilco to possibly try this on some in-order arm cores.

OK.  FWIW, I think the original testing was on Cortex-A9 or Cortex-A15.
It was also heavy on filters, such as yiq.

But is this about making the argument in favour of an unconditional change?
If so, I don't think it's necessary to front-load this testing.  Like I said
in my reply to Jeff, that can happen naturally if all major targets move
to the new behaviour.  And for a hook/param approach, we already have
enough data to justify the patch.

Thanks,
Richard


Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Qing Zhao


> On Oct 30, 2024, at 13:48, Sam James  wrote:
> 
> Qing Zhao  writes:
> 
>> Control this with a new option -fdiagnostics-details.
>> 
>> $ cat t.c
>> extern void warn(void);
>> static inline void assign(int val, int *regs, int *index)
>> {
>>  if (*index >= 4)
>>warn();
>>  *regs = val;
>> }
>> struct nums {int vals[4];};
>> 
>> void sparx5_set (int *ptr, struct nums *sg, int index)
>> {
>>  int *val = &sg->vals[index];
>> 
>>  assign(0,ptr, &index);
>>  assign(*val, ptr, &index);
>> }
>> 
>> $ gcc -Wall -O2  -c -o t.o t.c
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
>> [-Warray-bounds=]
>>   12 |   int *val = &sg->vals[index];
>>  |   ^~~
>> t.c:8:18: note: while referencing ‘vals’
>>8 | struct nums {int vals[4];};
>>  |  ^~~~
>> 
>> In the above, although the warning is correct in theory, the warning message
>> itself is confusing to the end-user since there is information that cannot
>> be connected to the source code directly.
>> 
>> It will be a nice improvement to add more information in the warning message
>> to report where such an index value comes from.
>> 
>> In order to achieve this, we add a new data structure "move_history" to 
>> record
>> 1. the "condition" that triggers the code movement;
>> 2. whether the code movement is on the true path of the "condition";
>> 3. the "compiler transformation" that triggers the code movement.
>> 
>> Whenever there is a code movement along control flow graph due to some
>> specific transformations, such as jump threading, path isolation, tree
>> sinking, etc., a move_history structure is created and attached to the
>> moved gimple statement.
>> 
>> During array out-of-bound checking or -Wstringop-* warning checking, the
>> "move_history" that was attached to the gimple statement is used to form
>> a sequence of diagnostic events that are added to the corresponding rich
>> location to be used to report the warning message.
>> 
>> This behavior is controlled by the new option -fdiagnostics-details
>> which is off by default.
>> 
>> With this change, by adding -fdiagnostics-details,
>> the warning message for the above testing case is now:
>> 
>> $ gcc -Wall -O2 -fdiagnostics-details -c -o t.o t.c
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
>> [-Warray-bounds=]
>>   12 |   int *val = &sg->vals[index];
>>  |   ^~~
>>  ‘sparx5_set’: events 1-2
>>4 |   if (*index >= 4)
>>  |  ^
>>  |  |
>>  |  (1) when the condition is evaluated to true
>> ..
>>   12 |   int *val = &sg->vals[index];
>>  |   ~~~
>>  |   |
>>  |   (2) out of array bounds here
>> t.c:8:18: note: while referencing ‘vals’
>>8 | struct nums {int vals[4];};
>>  |  ^~~~
>> 
>> PR tree-optimization/109071
>> 
>> gcc/ChangeLog:
>> 
>> * Makefile.in (OBJS): Add diagnostic-move-history.o
>> and move-history-diagnostic-path.o.
>> * gcc/common.opt (fdiagnostics-details): New option.
>> * gcc/doc/invoke.texi (fdiagnostics-details): Add
>> documentation for the new option.
>> * gimple-array-bounds.cc (build_rich_location_with_diagnostic_path):
>> New function.
>> (check_out_of_bounds_and_warn): Add one new parameter. Use rich
>> location with move_history_diagnostic_path for warning_at.
>> (array_bounds_checker::check_array_ref): Use rich location with
>> move_history_diagnostic_path for warning_at.
>> (array_bounds_checker::check_mem_ref): Add one new parameter.
>> Use rich location with move_history_diagnostic_path for warning_at.
>> (array_bounds_checker::check_addr_expr): Use rich location with
>> move_history_diagnostic_path for warning_at.
>> (array_bounds_checker::check_array_bounds): Call check_mem_ref with
>> one more parameter.
>> * gimple-array-bounds.h: Update prototype for check_mem_ref.
>> * gimple-iterator.cc (gsi_remove): (gsi_remove): Remove the move
>> history when removing the gimple.
>> * gimple-pretty-print.cc (pp_gimple_stmt_1): Emit MV_H marking
>> if the gimple has a move_history.
>> * gimple-ssa-isolate-paths.cc (isolate_path): Set move history
>> for the gimples of the duplicated blocks.
>> * gimple-ssa-warn-restrict.cc (maybe_diag_access_bounds): Use
>> rich location with move_history_diagnostic_path for warning_at.
>>* gimple-ssa-warn-access.cc (warn_string_no_nul): Likewise.
>>(maybe_warn_nonstring_arg): Likewise.
>>(maybe_warn_for_bound): Likewise.
>>(warn_for_access): Likewise.
>>(check_access): Likewise.
>>(pass_waccess::check_strncat): Likewise.
>>(pass_waccess::maybe_check_access_sizes): Likewise.
>>* tree-ssa-sink.cc (sink_code_in_bb): Create move_history for
>> stmt when it is sinked.
>> * toplev.cc (toplev::finalize):  Call move_history_finalize.
>> * tree-ssa-threadupd

Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Qing Zhao
Hi, Andi

> On Oct 30, 2024, at 12:15, Andi Kleen  wrote:
> 
> Qing Zhao  writes:
> 
>> Control this with a new option -fdiagnostics-details.
> 
> It would be useful to be also able to print the inline call stack,
> maybe with a separate option.

Thank you for the suggestion. 
Yes, the inline call stack would also help users understand the 
warning message better.

I noticed that some inlining information is already issued 
when reporting a warning. For example, 
for the test case of PR115274, compiled with -O2 -Wall, we get:

$ cat t_115274.c
#include <string.h>
char *c;
void a();
int b(char *d) { return strlen(d); }
void e() {
  long f = 1;
  f = b(c + f);
  if (c == 0)
    a(f);
}

$/home/opc/Install/latest-d/bin/gcc -O2 -Wall t_115274.c
In function ‘b’,
inlined from ‘e’ at t_115274.c:7:7:
t_115274.c:4:25: warning: ‘strlen’ reading 1 or more bytes from a region of 
size 0 [-Wstringop-overread]
4 | int b(char *d) { return strlen(d); }
  | ^
In function ‘e’:
cc1: note: source object is likely at address zero.

I found that the following routine in gcc/langhooks.cc 
reports the inlining information when reporting an error.

/* The default function to print out name of current function that caused
   an error.  */
void
lhd_print_error_function (diagnostic_text_output_format &text_output,
  const char *file,
  const diagnostic_info *diagnostic)

So, I am wondering whether there is already a utility routine we 
can use to report the inlining chain for a given location?

Thanks.

Qing


> 
> In some array bounds cases I looked at the problem was hidden in some inlines
> and it wasn't trivial to figure it out.
> 
> I wrote this patch for it at some point.
> 
> 
>Print inline stack for warn access warnings
> 
>The warnings reported by gimple-ssa-warn-access often depend on the
>caller with inlining, and when there are a lot of callers it can be
>difficult to figure out which caller triggered a warning.
> 
>Print the function context including inline stack for these
>warnings.
> 
>gcc/ChangeLog:
> 
>* gimple-ssa-warn-access.cc (maybe_inform_function): New
>function to report function context.
>(warn_string_no_nul): Use maybe_inform_function.
>(maybe_warn_nonstring_arg): Dito.
>(maybe_warn_for_bound): Dito.
>(warn_for_access): Dito.
>(check_access): Dito.
>(warn_dealloc_offset): Dito.
>(maybe_warn_alloc_args_overflow): Dito.
>(pass_waccess::check_strncat): Dito.
>(pass_waccess::maybe_check_access_sizes): Dito.
>(pass_waccess::maybe_check_dealloc_call): Dito.
>(pass_waccess::warn_invalid_pointer): Dito.
>(maybe_warn_mismatched_realloc): Dito.
>(pass_waccess::check_dangling_stores): Dito.
>(pass_waccess::execute): Reset last_function variable.
> 
> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> index 61f9f0f3d310..94c043531988 100644
> --- a/gcc/gimple-ssa-warn-access.cc
> +++ b/gcc/gimple-ssa-warn-access.cc
> @@ -125,6 +125,21 @@ call_arg (tree expr, unsigned argno)
>   return CALL_EXPR_ARG (expr, argno);
> }
> 
> +/* Already printed inform for the function.  */
> +static bool printed_function;
> +
> +/* Inform about the function stack unless warning is suppressed at LOC
> +   with opt code OPT.  */
> +static void
> +maybe_inform_function (location_t loc, int opt)
> +{
> +  if (printed_function)
> +return;
> +  printed_function = true;
> +  if (!warning_suppressed_at (loc, (opt_code)opt))
> +inform (DECL_SOURCE_LOCATION (cfun->decl), "in function %qD", 
> cfun->decl);
> +}
> +
> /* For a call EXPR at LOC to a function FNAME that expects a string
>in the argument ARG, issue a diagnostic due to it being a called
>with an argument that is a character array with no terminating
> @@ -162,6 +177,8 @@ warn_string_no_nul (location_t loc, GimpleOrTree expr, 
> const char *fname,
> 
>   auto_diagnostic_group d;
> 
> +  maybe_inform_function (loc, opt);
> +
>   const tree maxobjsize = max_object_size ();
>   const wide_int maxsiz = wi::to_wide (maxobjsize);
>   if (expr)
> @@ -485,6 +502,7 @@ maybe_warn_nonstring_arg (tree fndecl, GimpleOrTree exp)
>   if (tree_int_cst_lt (maxobjsize, bndrng[0]))
> {
>  bool warned = false;
> +  maybe_inform_function (loc, OPT_Wstringop_overread);
>  if (tree_int_cst_equal (bndrng[0], bndrng[1]))
>warned = warning_at (loc, OPT_Wstringop_overread,
> "%qD specified bound %E "
> @@ -638,6 +656,7 @@ maybe_warn_nonstring_arg (tree fndecl, GimpleOrTree exp)
>   auto_diagnostic_group d;
>   if (wi::ltu_p (asize, wibnd))
> {
> +  maybe_inform_function (loc, OPT_Wstringop_overread);
>  if (bndrng[0] == bndrng[1])
>warned = warning_at (loc, OPT_Wstringop_overre

Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Sam James
David Malcolm  writes:

> On Wed, 2024-10-30 at 17:33 +, Sam James wrote:
>> Qing Zhao  writes:
>> 
>> > > On Oct 30, 2024, at 10:48, David Malcolm 
>> > > wrote:
>> > > 
>> > > On Wed, 2024-10-30 at 14:34 +, Sam James wrote:
>> > > > Qing Zhao  writes:
>> > > > 
>> > > > > Control this with a new option -fdiagnostics-details.
>> > > > > 
>> > > > > [...]
>> > > > 
>> > > > The patch doesn't apply for me on very latest trunk -- I think
>> > > > David's
>> > > > recent diag refactoring means it needs a slight rebase. Could
>> > > > you
>> > > > send
>> > > > that?
>> > > 
>> > > If it's broken, it was probably by:
>> > > 
>> > > r15-4610 ("Use unique_ptr in more places in
>> > > pretty_printer/diagnostics
>> > > [PR116613]")
>> > > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bf43fe6aa966eaf397ea3b8ebd6408d3d124e285
>> > 
>> > Yes, due to the following change in the above commit:
>> > 
>> > diff --git a/gcc/toplev.cc b/gcc/toplev.cc
>> > index
>> > 62034c32b4aff32cdf2cb051bf9d0803b4730b3f..a12a2e1afba15ba16f6ade624
>> > cde3e60907ba5d2 100644 (file)
>> > --- a/gcc/toplev.cc
>> > +++ b/gcc/toplev.cc
>> > @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not
>> > see
>> >  #include "cgraph.h"
>> >  #include "coverage.h"
>> >  #include "diagnostic.h"
>> > +#include "pretty-print-urlifier.h"
>> >  #include "varasm.h"
>> >  #include "tree-inline.h"
>> >  #include "realmpfr.h"  /* For GMP/MPFR/MPC versions, in
>> > print_version.  */
>> > 
>> > 
>> > [...]
>> > > 
>> 
>> To continue testing, I am using the attached hacked up patches
>
> Thanks; FWIW the fixes in those patches look correct to me.

Thanks! Need a bit more confidence I think.

>
> Dave


[committed] aarch64: Assume alias conflict if common address reg changes [PR116783]

2024-10-30 Thread Alex Coplan
Hi,

This is a backport of the PR116783 fix to GCC 14.  It was pre-approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665097.html

The only intended non-contextual difference w.r.t. the patch on trunk is
that the test no longer needs -fno-late-combine-instructions on the 14
branch (I verified that it failed there without the change to
aarch64-ldp-fusion.cc).

Bootstrapped/regtested on aarch64-linux-gnu (all languages), no
regressions.  Pushed to the 14 branch.

Thanks,
Alex

---

As the PR shows, pair fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn.  This led to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assumes an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).
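
The check described above can be sketched outside of GCC as a merge-style
walk over two use lists sorted by register number.  This is only an
illustration: reg_use, def_id and addr_reg_conflict are hypothetical
stand-ins rather than RTL-SSA types, each list is assumed to hold at most
one entry per register, and the filtering of note-only and non-address uses
done by the real patch is left out.

```cpp
#include <vector>

// Hypothetical stand-in for an RTL-SSA use: a register number plus an
// identifier for the definition that reaches this use.
struct reg_use
{
  unsigned regno;
  int def_id;
};

// Both vectors are assumed to be sorted by regno, with at most one entry
// per register.  Return true if some register appears in both lists but
// the two uses see different definitions of it.
bool
addr_reg_conflict (const std::vector<reg_use> &cand_uses,
		   const std::vector<reg_use> &insn_uses)
{
  auto cand = cand_uses.begin ();
  auto insn = insn_uses.begin ();
  while (cand != cand_uses.end () && insn != insn_uses.end ())
    {
      if (insn->regno > cand->regno)
	++cand;			// candidate list is behind, advance it
      else if (insn->regno < cand->regno)
	++insn;			// walking insn list is behind, advance it
      else
	{
	  // Common address register: conflict if the defs differ.
	  if (insn->def_id != cand->def_id)
	    return true;
	  ++cand;
	  ++insn;
	}
    }
  return false;
}
```

With a conflict reported, the caller would skip alias analysis and treat the
pair as aliasing, which is the conservative direction.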

This is a backport (but not a straight cherry pick) of
r15-4518-gc0e54ce1999ccf2241f74c5188b11b92e5aedc1f.

gcc/ChangeLog:

PR rtl-optimization/116783
* config/aarch64/aarch64-ldp-fusion.cc
(def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(ldp_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 1fc25e389cf..f32d30d54c5 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -2173,11 +2173,80 @@ protected:
 
   def_iter_t def_iter;
   insn_info *limit;
-  def_walker (def_info *def, insn_info *limit) :
-def_iter (def), limit (limit) {}
+
+  // Array of register uses from the candidate insn which occur in MEMs.
+  use_array cand_addr_uses;
+
+  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
+def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
 
   virtual bool iter_valid () const { return *def_iter; }
 
+  // Implemented in {load,store}_walker.
+  virtual bool alias_conflict_p (int &budget) const = 0;
+
+  // Return true if the current (walking) INSN () uses a register R inside a
+  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
+  // those uses see different definitions of that register.  In this case we
+  // can't rely on RTL alias analysis, and for now we conservatively assume that
+  // there is an alias conflict.  See PR116783.
+  bool addr_reg_conflict_p () const
+  {
+use_array curr_insn_uses = insn ()->uses ();
+auto cand_use_iter = cand_addr_uses.begin ();
+auto insn_use_iter = curr_insn_uses.begin ();
+while (cand_use_iter != cand_addr_uses.end ()
+  && insn_use_iter != curr_insn_uses.end ())
+  {
+   auto insn_use = *insn_use_iter;
+   auto cand_use = *cand_use_iter;
+   if (insn_use->regno () > cand_use->regno ())
+ cand_use_iter++;
+   else if (insn_use->regno () < cand_use->regno ())
+ insn_use_iter++;
+   else
+ {
+   // As it stands I believe the alias code (memory_modified_in_insn_p)
+   // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
+   // be safe to skip over uses that only occur in notes.
+   if (insn_use->includes_address_uses ()
+   && !insn_use->only_occurs_in_notes ()
+   && insn_use->def () != cand_use->def ())
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"assuming aliasing of cand i%d and i%d:\n"
+"-> insns see different defs of common addr reg r%u\n"
+"-> ",
+cand_use->insn ()->uid (), insn_use->insn ()->uid (),
+insn_use->regno ());
+
+   // Note that while the following sequence could be made more
+   // concise by eliding pp_string calls into the pp_printf
+   // calls, doing so triggers -Wformat-diag.
+   pretty_printer pp;
+

Re: [PATCH] c++: Fix crash during NRV optimization with invalid input [PR117099]

2024-10-30 Thread Simon Martin
[ Resending since this was somehow sent in HTML mode and was scrubbed ]

On 30 Oct 2024, at 17:16, Simon Martin wrote:

> Hi,
>
> Just closing the loop on this...
>
> On 19 Oct 2024, at 11:57, Iain Sandoe wrote:
>
> On 19 Oct 2024, at 10:16, Simon Martin  wrote:
>
> On 18 Oct 2024, at 10:55, Sam James wrote:
>
> Simon Martin  writes:
>
> Hi Sam,
>
> Hi Simon,
>
> On 16 Oct 2024, at 22:06, Sam James wrote:
>
> Simon Martin  writes:
>
> We ICE upon the following invalid code because we end up calling
> finalize_nrv_r with a RETURN_EXPR with no operand.
>
> === cut here ===
> struct X {
>  ~X();
> };
> X test(bool b) {
>  {
>  X x;
>  return x;
>  }
>  if (!(b)) return;
> }
> === cut here ===
>
> This patch fixes this by simply returning error_mark_node when
> detecting
> a void return in a function returning non-void.
>
> Successfully tested on x86_64-pc-linux-gnu.
>
> PR c++/117099
>
> gcc/cp/ChangeLog:
>
> * typeck.cc (check_return_expr): Return error_mark_node upon
> void return for function returning non-void.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/parse/crash77.C: New test.
>
> ---
> gcc/cp/typeck.cc | 1 +
> gcc/testsuite/g++.dg/parse/crash77.C | 14 ++
> 2 files changed, 15 insertions(+)
> create mode 100644 gcc/testsuite/g++.dg/parse/crash77.C
>
> diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
> index 71d879abef1..22a6ec9a185 100644
> --- a/gcc/cp/typeck.cc
> +++ b/gcc/cp/typeck.cc
> @@ -11238,6 +11238,7 @@ check_return_expr (tree retval, bool
> *no_warning, bool *dangling)
>  RETURN_EXPR to avoid control reaches end of non-void function
>  warnings in tree-cfg.cc. */
>  *no_warning = true;
> + return error_mark_node;
>  }
>  /* Check for a return statement with a value in a function that
>  isn't supposed to return a value. */
> diff --git a/gcc/testsuite/g++.dg/parse/crash77.C
> b/gcc/testsuite/g++.dg/parse/crash77.C
> new file mode 100644
> index 000..d3f0ae6a877
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/parse/crash77.C
> @@ -0,0 +1,14 @@
> +// PR c++/117099
> +// { dg-compile }
>
> dg-do compile
>
> Aarg, of course, thanks for spotting this! Fixed in the attached
> version.
>
> +
> +struct X {
> + ~X();
> +};
> +
> +X test(bool b) {
> + {
> + X x;
> + return x;
> + }
> + if (!(b)) return; // { dg-error "return-statement with no value" }
> +}
> -- 
> 2.44.0
>
> BTW, the line-endings on this seem a bit odd. Did you use
> git-send-email?
>
> I did use git-send-email indeed. What oddities do you see with line
> endings?
> cat -A over the patch file looks good.
>
> Weird -- if I open your original email in mu4e, I see a bunch of ^M at
> the end of the lines.
>
> Strange. FWIW I’m generating and sending the patches from a macOS box,
> and there might be some weirdness coming from that. I’ll check and try
> to fix.
>
> macOS Mail messes up whitespace, for sure (but I’ve not seen it append ).
> AFAIK “git send-email” works fine (at least no-one has reported a problem).
>
> maybe the editor in use thinks the target format is Windows and has changed
> line endings accordingly?
>
> I investigated further and the issue was actually linked to the AWS
> WorkMail SMTP server: when I use the native AWS SES one instead, my test
> messages don’t have any =0D=0A anymore.
>
I’ve permanently updated my configuration to use SES, and all my future
git send-email’s should be good (note that this was purely an SMTP issue,
and the commits that I actually pushed never had any \r).
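
For anyone wanting to check a saved patch or message for the same problem, a
small sketch of the "cat -A" style check mentioned earlier in the thread
follows.  It is an assumption-laden illustration: count_crlf_lines is a
hypothetical helper name, and it only looks for a raw carriage return at the
end of each line (it does not decode quoted-printable "=0D=0A" sequences).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Count the lines of TEXT whose last character, before the newline if
// there is one, is a carriage return.
std::size_t
count_crlf_lines (const std::string &text)
{
  std::istringstream in (text);
  std::string line;
  std::size_t count = 0;
  while (std::getline (in, line))
    if (!line.empty () && line.back () == '\r')
      ++count;
  return count;
}
```

A non-zero result on a patch that was generated with LF line endings would
point at something in the sending path rewriting them.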

Thanks, Simon



Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Andi Kleen
Qing Zhao  writes:

> Control this with a new option -fdiagnostics-details.

It would be useful to be also able to print the inline call stack,
maybe with a separate option.

In some array bounds cases I looked at the problem was hidden in some inlines
and it wasn't trivial to figure it out.

I wrote this patch for it at some point.


Print inline stack for warn access warnings

The warnings reported by gimple-ssa-warn-access often depend on the
caller with inlining, and when there are a lot of callers it can be
difficult to figure out which caller triggered a warning.

Print the function context including inline stack for these
warnings.
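
The "print the context once per function" idea can be modelled outside of
GCC roughly as follows.  This is a sketch under assumptions: notes,
begin_function and the bool "suppressed" parameter are hypothetical
stand-ins for inform (), the per-function reset in the pass, and
warning_suppressed_at () respectively.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the diagnostic sink.
static std::vector<std::string> notes;

// Set once a note has been attempted for the current function.
static bool printed_function;

// Assumed to be called when the pass starts on a new function.
void
begin_function ()
{
  printed_function = false;
}

// Emit the function context at most once per function, unless the
// warning is suppressed.
void
maybe_inform_function (const std::string &fn, bool suppressed)
{
  if (printed_function)
    return;
  printed_function = true;
  if (!suppressed)
    notes.push_back ("in function '" + fn + "'");
}
```

One property of this shape that may be worth a look: the flag is set before
the suppression check, so if the first warning in a function is suppressed,
a later unsuppressed warning in the same function gets no context note.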

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (maybe_inform_function): New
function to report function context.
(warn_string_no_nul): Use maybe_inform_function.
(maybe_warn_nonstring_arg): Ditto.
(maybe_warn_for_bound): Ditto.
(warn_for_access): Ditto.
(check_access): Ditto.
(warn_dealloc_offset): Ditto.
(maybe_warn_alloc_args_overflow): Ditto.
(pass_waccess::check_strncat): Ditto.
(pass_waccess::maybe_check_access_sizes): Ditto.
(pass_waccess::maybe_check_dealloc_call): Ditto.
(pass_waccess::warn_invalid_pointer): Ditto.
(maybe_warn_mismatched_realloc): Ditto.
(pass_waccess::check_dangling_stores): Ditto.
(pass_waccess::execute): Reset printed_function variable.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 61f9f0f3d310..94c043531988 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -125,6 +125,21 @@ call_arg (tree expr, unsigned argno)
   return CALL_EXPR_ARG (expr, argno);
 }
 
+/* Already printed inform for the function.  */
+static bool printed_function;
+
+/* Inform about the function stack unless warning is suppressed at LOC
+   with opt code OPT.  */
+static void
+maybe_inform_function (location_t loc, int opt)
+{
+  if (printed_function)
+return;
+  printed_function = true;
+  if (!warning_suppressed_at (loc, (opt_code)opt))
+inform (DECL_SOURCE_LOCATION (cfun->decl), "in function %qD", cfun->decl);
+}
+
 /* For a call EXPR at LOC to a function FNAME that expects a string
in the argument ARG, issue a diagnostic due to it being a called
with an argument that is a character array with no terminating
@@ -162,6 +177,8 @@ warn_string_no_nul (location_t loc, GimpleOrTree expr, const char *fname,
 
   auto_diagnostic_group d;
 
+  maybe_inform_function (loc, opt);
+
   const tree maxobjsize = max_object_size ();
   const wide_int maxsiz = wi::to_wide (maxobjsize);
   if (expr)
@@ -485,6 +502,7 @@ maybe_warn_nonstring_arg (tree fndecl, GimpleOrTree exp)
   if (tree_int_cst_lt (maxobjsize, bndrng[0]))
{
  bool warned = false;
+ maybe_inform_function (loc, OPT_Wstringop_overread);
  if (tree_int_cst_equal (bndrng[0], bndrng[1]))
warned = warning_at (loc, OPT_Wstringop_overread,
 "%qD specified bound %E "
@@ -638,6 +656,7 @@ maybe_warn_nonstring_arg (tree fndecl, GimpleOrTree exp)
   auto_diagnostic_group d;
   if (wi::ltu_p (asize, wibnd))
{
+ maybe_inform_function (loc, OPT_Wstringop_overread);
  if (bndrng[0] == bndrng[1])
warned = warning_at (loc, OPT_Wstringop_overread,
 "%qD argument %i declared attribute "
@@ -723,6 +742,7 @@ maybe_warn_for_bound (opt_code opt, location_t loc, GimpleOrTree exp, tree func,
   auto_diagnostic_group d;
   if (tree_int_cst_lt (maxobjsize, bndrng[0]))
{
+ maybe_inform_function (loc, opt);
  if (bndrng[0] == bndrng[1])
warned = (func
  ? warning_at (loc, opt,
@@ -760,7 +780,9 @@ maybe_warn_for_bound (opt_code opt, location_t loc, GimpleOrTree exp, tree func,
   else if (!size || tree_int_cst_le (bndrng[0], size))
return false;
   else if (tree_int_cst_equal (bndrng[0], bndrng[1]))
-   warned = (func
+   {
+ maybe_inform_function (loc, opt);
+ warned = (func
  ? warning_at (loc, opt,
(maybe
 ? G_("%qD specified bound %E may exceed "
@@ -775,8 +797,11 @@ maybe_warn_for_bound (opt_code opt, location_t loc, GimpleOrTree exp, tree func,
 : G_("specified bound %E exceeds "
  "source size %E")),
bndrng[0], size));
+   }
   else
-   warned = (func
+   {
+ maybe_inform_function (loc, opt);
+ warned = (func
  ? warning_at (loc, opt,
(maybe
 ? G_("%qD specified bound [%E, %E] may "
@@ -791,6 +816,7 @@ maybe

Re: [PATCH] gimple: Remove special handling of COND_EXPR for COMPARISON_CLASS_P [PR116949, PR114785]

2024-10-30 Thread Richard Biener
On Wed, Oct 30, 2024 at 1:56 AM Andrew Pinski  wrote:
>
> After r13-707-g68e0063397ba82, COND_EXPR for gimple assign no longer could contain a comparison.
> The vectorizer was building gimple assigns with comparison until r15-4695-gd17e672ce82e69
> (which added an assert to make sure it no longer builds it).
>
> So let's remove the special handling of COND_EXPR in a few places and add an assert to
> gimple_build_assign_1 to make sure we don't build a gimple assign any more with a comparison.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

The biggest offender still present is phiopt building a GENERIC
comparison for the
piecewise COND_EXPR simplification in gimple_simplify_phiopt (so
genmatch needs to
create both GIMPLE and GENERIC match variants for COND_EXPR conditions).
maybe_fold_comparisons_from_match_pd uses on-stack temporary GIMPLE for
a similar (more complex) case.

Richard.
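
As a source-level analogy for the invariant the patch asserts (not GIMPLE
itself, and the function names are made up for illustration): the condition
of a COND_EXPR assign is expected to be a separately computed value, roughly
"_1 = a < b; x = _1 ? c : d;" in dump notation, rather than an embedded
comparison such as "x = a < b ? c : d;".

```cpp
// Comparison embedded directly in the conditional: the shape a gimple
// assign with COND_EXPR is no longer expected to have.
int
select_embedded (int a, int b, int c, int d)
{
  return a < b ? c : d;
}

// Comparison computed first into its own boolean, which the conditional
// then consumes: the analogue of the expected shape.
int
select_split (int a, int b, int c, int d)
{
  bool t = a < b;
  return t ? c : d;
}
```

Both functions compute the same result; the difference is only in where
the comparison lives, which is what the removed special cases had to
account for.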

> gcc/ChangeLog:
>
> PR middle-end/114785
> PR middle-end/116949
> * gimple-match-exports.cc (maybe_push_res_to_seq): Remove special
> handling of COMPARISON_CLASS_P in COND_EXPR/VEC_COND_EXPR.
> (gimple_extract): Likewise.
> * gimple-walk.cc (walk_stmt_load_store_addr_ops): Likewise.
> * gimple.cc (gimple_build_assign_1):
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-match-exports.cc | 12 +---
>  gcc/gimple-walk.cc  | 11 ---
>  gcc/gimple.cc   |  3 +++
>  3 files changed, 4 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index 77d225825cf..bc8038c19f0 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -489,12 +489,6 @@ maybe_push_res_to_seq (gimple_match_op *res_op, 
> gimple_seq *seq, tree res)
> && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[i]))
>return NULL_TREE;
>
> -  if (num_ops > 0 && COMPARISON_CLASS_P (ops[0]))
> -for (unsigned int i = 0; i < 2; ++i)
> -  if (TREE_CODE (TREE_OPERAND (ops[0], i)) == SSA_NAME
> - && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (TREE_OPERAND (ops[0], i)))
> -   return NULL_TREE;
> -
>if (res_op->code.is_tree_code ())
>  {
>auto code = tree_code (res_op->code);
> @@ -786,11 +780,7 @@ gimple_extract (gimple *stmt, gimple_match_op *res_op,
> }
>   case GIMPLE_TERNARY_RHS:
> {
> - tree rhs1 = gimple_assign_rhs1 (stmt);
> - if (code == COND_EXPR && COMPARISON_CLASS_P (rhs1))
> -   rhs1 = valueize_condition (rhs1);
> - else
> -   rhs1 = valueize_op (rhs1);
> + tree rhs1 = valueize_op (gimple_assign_rhs1 (stmt));
>   tree rhs2 = valueize_op (gimple_assign_rhs2 (stmt));
>   tree rhs3 = valueize_op (gimple_assign_rhs3 (stmt));
>   res_op->set_op (code, type, rhs1, rhs2, rhs3);
> diff --git a/gcc/gimple-walk.cc b/gcc/gimple-walk.cc
> index 9f768ca20fd..00520319aa9 100644
> --- a/gcc/gimple-walk.cc
> +++ b/gcc/gimple-walk.cc
> @@ -835,17 +835,6 @@ walk_stmt_load_store_addr_ops (gimple *stmt, void *data,
> ;
>   else if (TREE_CODE (op) == ADDR_EXPR)
> ret |= visit_addr (stmt, TREE_OPERAND (op, 0), op, data);
> - /* COND_EXPR and VCOND_EXPR rhs1 argument is a comparison
> -tree with two operands.  */
> - else if (i == 1 && COMPARISON_CLASS_P (op))
> -   {
> - if (TREE_CODE (TREE_OPERAND (op, 0)) == ADDR_EXPR)
> -   ret |= visit_addr (stmt, TREE_OPERAND (TREE_OPERAND (op, 0),
> -  0), op, data);
> - if (TREE_CODE (TREE_OPERAND (op, 1)) == ADDR_EXPR)
> -   ret |= visit_addr (stmt, TREE_OPERAND (TREE_OPERAND (op, 1),
> -  0), op, data);
> -   }
> }
>  }
>else if (gcall *call_stmt = dyn_cast  (stmt))
> diff --git a/gcc/gimple.cc b/gcc/gimple.cc
> index eeb1badff5f..f7b313be40e 100644
> --- a/gcc/gimple.cc
> +++ b/gcc/gimple.cc
> @@ -475,6 +475,9 @@ gimple_build_assign_1 (tree lhs, enum tree_code subcode, 
> tree op1,
>  gimple_build_with_ops_stat (GIMPLE_ASSIGN, (unsigned)subcode, num_ops
> PASS_MEM_STAT));
>gimple_assign_set_lhs (p, lhs);
> +  /* For COND_EXPR, op1 should not be a comparison. */
> +  if (op1 && subcode == COND_EXPR)
> +gcc_assert (!COMPARISON_CLASS_P  (op1));
>gimple_assign_set_rhs1 (p, op1);
>if (op2)
>  {
> --
> 2.43.0
>


Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Sam James
Qing Zhao  writes:

>> On Oct 30, 2024, at 10:48, David Malcolm  wrote:
>> 
>> On Wed, 2024-10-30 at 14:34 +, Sam James wrote:
>>> Qing Zhao  writes:
>>> 
>>>> Control this with a new option -fdiagnostics-details.
>>>> 
>>>> [...]
>>> 
>>> The patch doesn't apply for me on very latest trunk -- I think
>>> David's
>>> recent diag refactoring means it needs a slight rebase. Could you
>>> send
>>> that?
>> 
>> If it's broken, it was probably by:
>> 
>> r15-4610 ("Use unique_ptr in more places in pretty_printer/diagnostics
>> [PR116613]")
>> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bf43fe6aa966eaf397ea3b8ebd6408d3d124e285
>
> Yes, due to the following change in the above commit:
>
> diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> index 62034c32b4aff32cdf2cb051bf9d0803b4730b3f..a12a2e1afba15ba16f6ade624cde3e60907ba5d2 100644 (file)
> --- a/gcc/toplev.cc
> +++ b/gcc/toplev.cc
> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cgraph.h"
>  #include "coverage.h"
>  #include "diagnostic.h"
> +#include "pretty-print-urlifier.h"
>  #include "varasm.h"
>  #include "tree-inline.h"
>  #include "realmpfr.h"  /* For GMP/MPFR/MPC versions, in print_version.  */
>
>
> [...]
>> 

To continue testing, I am using the attached hacked up patches (David,
please don't hurt me, it was minimal "just get it building" x "do more
than needed to reduce iterations" ;)).

>From f0d521fb56035e71a2b7da3a6c524abab811b42b Mon Sep 17 00:00:00 2001
Message-ID: 
In-Reply-To: 
References: 
From: Sam James 
Date: Wed, 30 Oct 2024 16:15:55 +
Subject: [PATCH 1/2] gcc: add INCLUDE_MEMORY to diagnostic-move-history (and
 friends)

gcc/ChangeLog:

	* diagnostic-move-history.cc (INCLUDE_MEMORY): Define INCLUDE_MEMORY.
	* move-history-diagnostic-path.cc (INCLUDE_MEMORY): Ditto.
	* move-history-diagnostic-path.h (INCLUDE_MEMORY): Ditto.
---
 gcc/diagnostic-move-history.cc  | 1 +
 gcc/gimple-array-bounds.cc  | 1 +
 gcc/move-history-diagnostic-path.cc | 1 +
 gcc/move-history-diagnostic-path.h  | 1 +
 4 files changed, 4 insertions(+)

diff --git a/gcc/diagnostic-move-history.cc b/gcc/diagnostic-move-history.cc
index e4c471ab50f..49adeac1094 100644
--- a/gcc/diagnostic-move-history.cc
+++ b/gcc/diagnostic-move-history.cc
@@ -18,6 +18,7 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
+#define INCLUDE_MEMORY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 464dafa6555..a0b04ed0bc5 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_MEMORY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/move-history-diagnostic-path.cc b/gcc/move-history-diagnostic-path.cc
index ab29893d1f6..15034616be7 100644
--- a/gcc/move-history-diagnostic-path.cc
+++ b/gcc/move-history-diagnostic-path.cc
@@ -18,6 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_MEMORY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
diff --git a/gcc/move-history-diagnostic-path.h b/gcc/move-history-diagnostic-path.h
index d04337ea377..dd27cec 100644
--- a/gcc/move-history-diagnostic-path.h
+++ b/gcc/move-history-diagnostic-path.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_MOVE_HISTORY_DIAGNOSTIC_PATH_H
 #define GCC_MOVE_HISTORY_DIAGNOSTIC_PATH_H
 
+#define INCLUDE_MEMORY
 #include "diagnostic-path.h"
 #include "simple-diagnostic-path.h"
 #include "diagnostic-move-history.h"
-- 
2.47.0

>From 4933baa95dd6994443e299606e4dbfc0bad67be0 Mon Sep 17 00:00:00 2001
Message-ID: <4933baa95dd6994443e299606e4dbfc0bad67be0.1730309582.git@gentoo.org>
In-Reply-To: 
References: 
From: Sam James 
Date: Wed, 30 Oct 2024 17:12:08 +
Subject: [PATCH 2/2] gcc: adapt to m_printer change

gcc/ChangeLog:

	* move-history-diagnostic-path.cc (build_rich_location_with_diagnostic_path): Use get_reference_printer.
---
 gcc/move-history-diagnostic-path.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/move-history-diagnostic-path.cc b/gcc/move-history-diagnostic-path.cc
index 15034616be7..12de0050bb0 100644
--- a/gcc/move-history-diagnostic-path.cc
+++ b/gcc/move-history-diagnostic-path.cc
@@ -97,7 +97,7 @@ build_rich_location_with_diagnostic_path (location_t location, gimple *stmt)
 
   move_history_t mv_history = stmt ? get_move_history (stmt) : NULL;
   move_history_diagnostic_path *path
-= new move_history_diagnostic_path (global_dc->m_printer,
+= new move_history_diagnostic_path (global_dc->get_reference_printer (),
 	mv_history, location);
   path->po

Re: [PATCH v4 3/7] OpenMP: C front-end support for dispatch + adjust_args

2024-10-30 Thread Paul-Antoine Arras

On 30/10/2024 15:08, Tobias Burnus wrote:
I still need to look at 4/7 (C++) and 5/7 (tests for C and C++) [either
before or after you posted the new version].


I sent a revised C++ patch a few moments ago.


* * *

However, this 3/7 patch LGTM 🙂

One comment: For the < C23 testcase, can you add, e.g., -std=gnu17 ? The
reason is that Joseph plans to switch to -std=gnu23 by default and
already modified existing testcases, e.g. r15-4391-g9fb5348e302102
"testsuite: Prepare for -std=gnu23 default" – and
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665612.html


Done.


* * *

In summary, Patches 1 to 3 are now approved 🙂

For 2, I expect a follow-up in which, for a known NULL ptr value (0L, nullptr,
(void*)0L, absent argument in Fortran, absent argument in C++ with '=
NULL/nullptr' parameter default), the need_device_ptr conversion will
skip the __builtin_omp_get_mapped_ptr call (as it will just return
NULL); but that can be done later 🙂 [Cf. also my comment to the 4/7
patch.]


I attached an updated ME patch along with the C++ patch.
--
PA


Re: [PATCH v3 0/2][RFC]Provide more contexts for -Warray-bounds warning messages

2024-10-30 Thread Sam James
Qing Zhao  writes:

> Hi,
>
> This is the 3rd version of the patch for fixing PR109071.
>
> Thanks a lot for Sam James's help to test the previous 2nd version of
> the patch on a lot of packages in the wild and provide detailed analysis
> and filed new bugs. (PR117179, PR117180, etc). 

Thank you Qing!

>
> From his testing results, we all feel that this work will be a general
> improvement in this area and will be very helpful to the end-users.
>

Absolutely. Both in terms of improving safety as the whole point of
these warnings is, but also stopping users from being panicked. They
sometimes believe the warnings imply miscompilation because they can't
understand how they would happen otherwise.

An earlier version of these patches already helped find a real bug in
GNU wget.

I will be testing these patches and reporting any issues I see (either
in GCC or to projects).


> In this patch, we try to provide more contexts for -Warray-bounds, -Wstringop-*
> warning messages due to code movements from various compiler transformations.
>
> Control this with a new option -fdiagnostics-details.
>
> Compared to the 2nd version: 
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657150.html
>
> Which is limited to fix PR109071, there are the following major improvement
> to the patch:
>
> 1. All the following current open PRs were identified as the duplications
>of PR109071 and were studied and fixed by this 3rd version of the patch.
>each testing case was added to the patch as a unit-test case:
>
> PR88771
> PR85788
> PR108770
> PR106762
> PR115274
> PR117179
>
> 2. Change the name of the new option from -fdiagnostics-explain-harder
>to -fdiagnostics-details;

The new name sounds good to me.

>
> 3. Change the name of the new data structure from "copy_history" to
>"move_history" due to the following reason:
>
>The key feature of the compiler transformation that might provide more
>accurate information to the value-range analysis is: moving a statement
>from the joint point of one specific condition to this condition's
>TRUE or FALSE path.  
>
>For example, the threadjump and isolate-path transformations duplicate
>the original basic block "Ba", which sits at the joint point of a
>condition "cond", into "Ba'", and then move the two basic blocks "Ba'"
>and "Ba" to the TRUE path and FALSE path of the condition "cond"; on the
>other hand, the tree sink transformation just moves some of the statements
>from the joint point of a condition "cond" to one specific path of this
>condition.
>
>So, the new data structure "move_history" will include the following
>information:
>A. the "condition" that triggers the code movement;
>B. whether the code movement is on the true path of the "condition";
>C. the "compiler transformation" that triggers the code movement.
>
> 4. In addition to backward threadjump, this patch can handle more compiler
>transformations: 
>A. forward threadjump;
>B. isolate-path;
>C. tree-sinking;
>
> 5. In addition to -Warray-bounds, making -Wstringop-* work as well.

stringop* are really the most notorious for this so this is very
welcome.
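
The three pieces of information listed under item 3 (the condition, whether
the moved code is on its true path, and the transformation responsible)
could be pictured with a record like the one below.  This is only a sketch
to make the description concrete: move_record, move_reason and describe are
hypothetical names, the condition is a plain string rather than a GIMPLE
statement, and none of it is taken from the actual patch.

```cpp
#include <string>

// Transformations named in the cover letter (item 4 plus backward
// threadjump).
enum class move_reason
{
  forward_threadjump,
  backward_threadjump,
  isolate_path,
  tree_sinking
};

// One entry of a statement's move history.
struct move_record
{
  std::string condition;	// A. the condition that triggered the move
  bool on_true_path;		// B. moved to the condition's true path?
  move_reason reason;		// C. the transformation that moved the code
};

const char *
reason_name (move_reason r)
{
  switch (r)
    {
    case move_reason::forward_threadjump: return "forward threadjump";
    case move_reason::backward_threadjump: return "backward threadjump";
    case move_reason::isolate_path: return "isolate-path";
    case move_reason::tree_sinking: return "tree-sinking";
    }
  return "unknown";
}

// Render one entry as text, e.g. for a debugging dump.
std::string
describe (const move_record &r)
{
  return std::string ("moved to the ") + (r.on_true_path ? "true" : "false")
	 + " path of '" + r.condition + "' by " + reason_name (r.reason);
}
```

A diagnostic could then walk a chain of such records to explain under which
conditions the reported access is reached.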

>
> 6. Adding debugging mechanism to the new data structure “move_history”;
> 7. Adding all the testing cases of the duplicated bugs as the testing cases.

Can you tag each of those PRs in the ChangeLog so the commit hook
updates those too?

> 8. More detailed comments in the patch;
>
> bootstrapping and regression testing on both x86 and aarch64.
>
> Please let me know any comment and suggestion.
>
> Thanks.
>
> Qing
>
> Qing Zhao (2):
>   Provide more contexts for -Warray-bounds, -Wstringop-* warning
> messages due to code movements from compiler transformation
> [PR109071]
>   Add debugging for move history.


[PATCH v2] RISC-V: Fix gcc.target/riscv/rvv/base/cpymem-1.c f3

2024-10-30 Thread Craig Blackmore
The function body checks for f3 only ran with -mcmodel explicitly set
which meant I missed a regression in my local testing of:

  commit b039d06c9a810a3fab4c5eb9d50b0c7aff94b2d8
  Author: Craig Blackmore 
  Date:   Fri Oct 18 09:17:21 2024 -0600

  [PATCH 3/7] RISC-V: Fix vector memcpy smaller LMUL generation

The failure showed up in the rivos CI and it is due to f3 now using
LMUL m1 instead of m8.

I have reworked the test to make it more robust and maintainable.  This
allowed most of the special casing of command line arguments to be
removed.  It also fixes an issue where some targets would enable
multiple versions of the function body check e.g. `-march=rv32gcv
-mcmodel=medany`.
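
As a rough guide to why the reworked pattern accepts both a vsetivli form
and a li + vsetvli form, the arithmetic can be sketched as below.  The
assumption here (the test body is not fully visible in this mail) is that
f3 copies MIN_VECTOR_BYTES bytes as 32-bit elements; the helper names are
made up for illustration.  vsetivli encodes its AVL as a 5-bit unsigned
immediate, so counts of 32 or more need the length in a register.

```cpp
// Bytes in one vector register at the minimum VLEN (given in bits); this
// mirrors MIN_VECTOR_BYTES = __riscv_v_min_vlen / 8.
constexpr unsigned
min_vector_bytes (unsigned min_vlen_bits)
{
  return min_vlen_bits / 8;
}

// Number of e32 elements that fill one such register.
constexpr unsigned
e32_elements (unsigned min_vlen_bits)
{
  return min_vector_bytes (min_vlen_bits) / 4;
}

// True if AVL can be encoded directly in vsetivli's 5-bit immediate.
constexpr bool
fits_vsetivli (unsigned avl)
{
  return avl < 32;
}
```

For a minimum VLEN of 128 this gives 4 elements, matching the
"vsetivli zero,4,e32,m1" alternative; for 1024 it gives 32, which no longer
fits the immediate and matches the "li ...,32" followed by vsetvli
alternative.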

Changes since v1: Added missing ChangeLog.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cpymem-1.c: Fix and rework f3.
---
 .../gcc.target/riscv/rvv/base/cpymem-1.c  | 107 --
 1 file changed, 48 insertions(+), 59 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
index 6edb4c9253a..81d14d83633 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-1.c
@@ -9,6 +9,8 @@
 extern void *memcpy(void *__restrict dest, const void *__restrict src, __SIZE_TYPE__ n);
 #endif
 
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
 /* memcpy should be implemented using the cpymem pattern.
 ** f1:
 XX \.L\d+: # local label is ignored
@@ -50,70 +52,57 @@ void f2 (__INT32_TYPE__* a, __INT32_TYPE__* b, int l)
Use extern here so that we get a known alignment, lest
DATA_ALIGNMENT force us to make the scan pattern accomodate
code for different alignments depending on word size.
-** f3: { target { { any-opts "-mcmodel=medlow" } && { no-opts 
"-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl1024b" "-mrvv-max-lmul=dynamic" 
"-mrvv-max-lmul=m2" "-mrvv-max-lmul=m4" "-mrvv-max-lmul=m8" 
"-mrvv-vector-bits=zvl" } } }
-**lui\s+[ta][0-7],%hi\(a_a\)
-**addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
-**lui\s+[ta][0-7],%hi\(a_b\)
-**addi\s+a4,[ta][0-7],%lo\(a_b\)
-**vsetivli\s+zero,16,e32,m8,ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medlow -mrvv-vector-bits=zvl" 
"-mcmodel=medlow -march=rv64gcv_zvl512b -mrvv-vector-bits=zvl" } && { no-opts 
"-march=rv64gcv_zvl1024b" } } }
-**lui\s+[ta][0-7],%hi\(a_a\)
-**lui\s+[ta][0-7],%hi\(a_b\)
-**addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
-**addi\s+a4,[ta][0-7],%lo\(a_b\)
-**vl(1|4|2)re32\.v\s+v\d+,0\([ta][0-7]\)
-**vs(1|4|2)r\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medlow -march=rv64gcv_zvl1024b" 
"-mcmodel=medlow -march=rv64gcv_zvl512b" } && { no-opts "-mrvv-vector-bits=zvl" 
} } }
-**lui\s+[ta][0-7],%hi\(a_a\)
-**lui\s+[ta][0-7],%hi\(a_b\)
-**addi\s+a4,[ta][0-7],%lo\(a_b\)
-**vsetivli\s+zero,16,e32,(m1|m4|mf2),ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medany" } && { no-opts 
"-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl256b" "-march=rv64gcv_zvl1024b" 
"-mrvv-max-lmul=dynamic" "-mrvv-max-lmul=m8" "-mrvv-max-lmul=m4" 
"-mrvv-vector-bits=zvl" } } }
-**lla\s+[ta][0-7],a_a
-**lla\s+[ta][0-7],a_b
-**vsetivli\s+zero,16,e32,m8,ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
-*/
-
-/*
-** f3: { target { { any-opts "-mcmodel=medany"  } && { no-opts 
"-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl256b" "-march=rv64gcv" 
"-march=rv64gc_zve64d" "-march=rv64gc_zve32f" } } }
-**lla\s+[ta][0-7],a_b
-**vsetivli\s+zero,16,e32,m(f2|1|4),ta,ma
-**vle32.v\s+v\d+,0\([ta][0-7]\)
-**lla\s+[ta][0-7],a_a
-**vse32\.v\s+v\d+,0\([ta][0-7]\)
-**ret
+** f3: { target { no-opts "-mrvv-vector-bits=zvl" } }
+**  (
+**  lui\s+[ta][0-7],%hi\(a_a\)
+**  lui\s+[ta][0-7],%hi\(a_b\)
+**  addi\s+[ta][0-7],[ta][0-7],%lo\(a_b\)
+**  vsetivli\s+zero,4,e32,m1,ta,ma
+**  |
+**  lui\s+[ta][0-7],%hi\(a_a\)
+**  lui\s+[ta][0-7],%hi\(a_b\)
+**  li\s+[ta][0-7],\d+
+**  addi\s+[ta][0-7],[ta][0-7],%lo\(a_b\)
+**  vsetvli\s+zero,[ta][0-7],e32,m1,ta,ma
+**  |
+**  lla\s+[ta][0-7],a_b
+**  vsetivli\s+zero,4,e32,m1,ta,ma
+**  |
+**  li\s+[ta][0-7],\d+
+**  lla\s+[ta][0-7],a_b
+**  vsetvli\s+zero,[ta][0-7],e32,m1,ta,ma
+**  |
+**  lla\s+[ta][0-7],a_b
+**  li\s+[ta][0-7],32
+**  vsetvli\s+zero,[ta][0-7],e32,m1,ta,ma
+**  )
+**  vle32.v\s+v\d+,0\([ta][0-7]\)
+**  (
+**  addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
+**  |
+**  lla\s+[ta][0-7],a_a
+**  )
+**  vse32.v\s+v\d+,0\([ta][0-7]\)
+**  ret
 */
 
 /*
-** f3: { target { { any-opts "-mcmodel=meda

[Patch] OpenMP/C++: Fix declare variant with reference-returning functions

2024-10-30 Thread Tobias Burnus

Before the patch, the included testcase fails with:

declare-variant-9.C:4:29: error: could not find variant declaration
4 | #pragma omp declare variant(variant_fn) match(user={condition(1)})
  | ^~

Comments, remarks, suggestions before I commit it?

Tobias
OpenMP/C++: Fix declare variant with reference-returning functions

gcc/cp/ChangeLog:

	* decl.cc (omp_declare_variant_finalize_one): Strip indirect ref
	around function decl when processing variant function.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/declare-variant-9.C: New test.

 gcc/cp/decl.cc|  3 +++
 gcc/testsuite/g++.dg/gomp/declare-variant-9.C | 14 ++
 2 files changed, 17 insertions(+)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 0bc320a2b39..b638f3af294 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8375,6 +8375,9 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
   if (variant == error_mark_node && !processing_template_decl)
 return true;
 
+  if (TREE_CODE (variant) == INDIRECT_REF)
+variant = TREE_OPERAND (variant, 0);
+
   variant = cp_get_callee_fndecl_nofold (variant);
   input_location = save_loc;
 
diff --git a/gcc/testsuite/g++.dg/gomp/declare-variant-9.C b/gcc/testsuite/g++.dg/gomp/declare-variant-9.C
new file mode 100644
index 000..7856f7c40cf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/declare-variant-9.C
@@ -0,0 +1,14 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+int &variant_fn();
+
+#pragma omp declare variant(variant_fn) match(user={condition(1)})
+int &bar();
+
+void sub(int &a)
+{
+ bar();
+ a = bar(); 
+}
+
+/* { dg-final { scan-tree-dump "  variant_fn \\(\\);" "gimple" } } */
+/* { dg-final { scan-tree-dump "  _1 = variant_fn \\(\\);" "gimple" } } */
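
The fix above can be illustrated outside GCC with a toy expression tree. This is only a sketch under stated assumptions: `Node`, `Code`, `callee_decl` and `callee_decl_stripping_deref` are hypothetical stand-ins and not GCC's tree API. It shows why a callee lookup that expects a call expression fails when the call to a reference-returning function is wrapped in a dereference node, and how stripping that wrapper first restores the lookup.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for tree codes; hypothetical, not the real GCC API.
enum class Code { FunctionDecl, CallExpr, IndirectRef };

struct Node {
  Code code;
  std::string name;           // only meaningful for FunctionDecl
  std::shared_ptr<Node> op0;  // callee of a CallExpr, operand of an IndirectRef
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make_decl (const std::string &name)
{
  return std::make_shared<Node> (Node{Code::FunctionDecl, name, nullptr});
}

NodePtr make_call (NodePtr callee)
{
  return std::make_shared<Node> (Node{Code::CallExpr, "", std::move (callee)});
}

NodePtr make_deref (NodePtr operand)
{
  return std::make_shared<Node> (Node{Code::IndirectRef, "", std::move (operand)});
}

// Returns the called declaration, or nullptr if EXPR is not a direct call.
const Node *callee_decl (const NodePtr &expr)
{
  if (!expr || expr->code != Code::CallExpr)
    return nullptr;
  if (!expr->op0 || expr->op0->code != Code::FunctionDecl)
    return nullptr;
  return expr->op0.get ();
}

// Same lookup, but first strips one dereference wrapper, mirroring the
// INDIRECT_REF check added by the patch.
const Node *callee_decl_stripping_deref (NodePtr expr)
{
  if (expr && expr->code == Code::IndirectRef)
    expr = expr->op0;
  return callee_decl (expr);
}
```

With `make_deref (make_call (make_decl ("variant_fn")))` as input, the plain lookup returns a null pointer while the stripping variant finds `variant_fn`.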


Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling agressiveness

2024-10-30 Thread Richard Sandiford
Jeff Law  writes:
> On 10/30/24 9:31 AM, Richard Sandiford wrote:
>
>> 
>> OK (and yeah, I can sympathise).  But I think there's an argument that,
>> if you're scheduling for one in-order core using the pipeline of an
>> unrelated core, that's effectively scheduling for the core as though
>> it were out-of-order.  In other words, the property we care about
>> isn't so much whether the processor itself is in-order (a statement
>> about the uarch), but whether we're trying to schedule for a particular
>> in-order pipeline (a statement about what GCC is doing or knows about).
>> I'd argue that in the case you describe, we're not trying to schedule
>> for a particular in-order pipeline.
> I can see that point.
>
>> 
>> That might need some finessing of the name.  But I think the concept
>> is right.  I'd rather base the hook (or param) on a general concept
>> like that rather than a specific "wide vs narrow" thing.
> Agreed.  Naming was my only real concern about the first patch.
>
>> 
>>> I still see Vineet's data as compelling, even with GIGO concern.
>> 
>> Do you mean the reduction in dynamic instruction counts?  If so,
>> that isn't what the algorithm is aiming to reduce.  Like I mentioned
>> in the previous thread, trying to minimise dynamic instruction counts
>> was also harmful for the core & benchmarks I was looking at.
>> We just ended up with lots of pipeline bubbles that could be
>> alleviated by judicious spilling.
> Vineet showed significant cycle and icount improvements.  I'm much more 
> interested in the former :-)
>
> I'm planning to run it on our internal design, but it's not the top of 
> the priority list and it's a scarce resource right now...  I fully 
> expect it'll show a cycle improvement there too, though probably much 
> smaller than the improvement seen on that spacemit k1 design.
>
>> 
>> I'm not saying that the algorithm gets the decision right for cactu
>> when tuning for in-order CPU X and running on that same CPU X.
>> But it seems like that combination hasn't been tried, and that,
>> even on the combinations that the patch has been tried on, the cactu
>> justification is based on static properties of the binary rather than
>> a particular runtime improvement (Y% faster).
>> 
>> To be clear, the two paragraphs above are trying to explain why I think
>> this should be behind a hook or param rather than unconditional.  The
>> changes themselves look fine, and incorporate the suggestions from the
>> previous thread (thanks!).
> Thanks for that clarifying statement.  I actually think we're broadly in 
> agreement here -- keep it as a hook/param rather than making it 
> unconditional.

Yeah, agreed.

> Assuming we keep it as a hook/param, opt-in & come up with better 
> name/docs, any objections from your side?

No, seems fine to me.

I'm kind-of leaning towards a --param.  The hook definition would logically
be determined by -mtune (at least on targets like aarch32 that do have
meaningful in-order scheduling descriptions -- I can imagine that for
aarch64 we'd set it unconditionally).  But that wouldn't capture the
case above, where you're tuning for a different core from the one
that will actually be used.

How about:

--param cycle-accurate-model

but with the description:

  Whether GCC should assume that the scheduling description is mostly
  a cycle-accurate model of the target processor, in the absence of cache
  misses.  Nonzero usually means that the selected scheduling model
  describes an in-order processor, that the scheduling model accurately
  predicts pipeline bubbles in the absence of cache misses, and that GCC
  should assume that the scheduling model matches the target that the code
  is intended to run on.

(with better word-smithing)?

I suppose it should initially default to 1, but we could flip that later
if all major targets set it to 0.  (Or we could take that as proof that
the old approach isn't needed and just remove the --param.)

A param would also be cheaper to test.

Thanks,
Richard
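
One way to picture what such a param could gate is sketched below. This is an illustration only: `Candidate`, `prefer` and the tie-breaking order are assumptions made for this sketch, not the scheduler's actual heuristics or any code from the patch. The idea is that when the scheduling description is trusted as cycle-accurate, predicted stalls dominate the choice even at the cost of spills; otherwise spills dominate.

```cpp
// Hypothetical summary of one candidate schedule.
struct Candidate {
  int predicted_stall_cycles;  // bubbles predicted by the scheduling model
  int spills;                  // spills the schedule is expected to cause
};

// Returns true if A should be preferred over B.  CYCLE_ACCURATE_MODEL
// plays the role of the proposed --param (assumed semantics).
bool prefer (const Candidate &a, const Candidate &b, bool cycle_accurate_model)
{
  if (cycle_accurate_model)
    {
      // Trust the model: fewer predicted stalls wins, spills break ties.
      if (a.predicted_stall_cycles != b.predicted_stall_cycles)
        return a.predicted_stall_cycles < b.predicted_stall_cycles;
      return a.spills < b.spills;
    }
  // Model not trusted: fewer spills wins, predicted stalls break ties.
  if (a.spills != b.spills)
    return a.spills < b.spills;
  return a.predicted_stall_cycles < b.predicted_stall_cycles;
}
```

A schedule with no predicted stalls but two spills beats a stall-heavy, spill-free one only when the flag is set.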


Re: [PATCH v3 2/2][RFC] Add debugging for move history.

2024-10-30 Thread Sam James
Qing Zhao  writes:

> gcc/ChangeLog:
>
>   * diagnostic-move-history.cc (dump_move_history): New routine.
>   (dump_move_history_for): Likewise.
>   (debug_mv_h): Likewise.
>   * diagnostic-move-history.h (dump_move_history): New prototype.
>   (dump_move_history_for): Likewise.
>   * gimple-ssa-isolate-paths.cc (isolate_path): Add debugging message
>   when setting move history for statements.
>   * tree-ssa-sink.cc (sink_code_in_bb): Likewise.
>   * tree-ssa-threadupdate.cc (ssa_redirect_edges): Likewise.
>   (back_jt_path_registry::duplicate_thread_path): Likewise.
> ---
>  gcc/diagnostic-move-history.cc  | 67 +
>  gcc/diagnostic-move-history.h   |  2 +
>  gcc/gimple-ssa-isolate-paths.cc | 10 +
>  gcc/tree-ssa-sink.cc|  3 ++
>  gcc/tree-ssa-threadupdate.cc| 18 +
>  5 files changed, 100 insertions(+)
>
> diff --git a/gcc/diagnostic-move-history.cc b/gcc/diagnostic-move-history.cc
> index b0e8308dbf6b..e4c471ab50f3 100644
> --- a/gcc/diagnostic-move-history.cc
> +++ b/gcc/diagnostic-move-history.cc
> @@ -24,6 +24,7 @@
>  #include "backend.h"
>  #include "tree.h"
>  #include "gimple.h"
> +#include "tree-pretty-print.h"
>  #include "gimple-iterator.h"
>  #include "cfganal.h"
>  #include "diagnostic-move-history.h"
> @@ -262,3 +263,69 @@ set_move_history_to_stmts_in_bb (basic_block bb, edge 
> entry,
>  
>return true;
>  }
> +
> +/* Dump the move_history data structure MV_HISTORY.  */
> +
> +void
> +dump_move_history (FILE *file, move_history_t mv_history)
> +{
> +  fprintf (file, "The move history is: \n");

"is:\n"

> +  if (!mv_history)
> +{
> +  fprintf (file, "No move history.\n");
> +  return;
> +}
> +
> +  for (move_history_t cur_ch = mv_history; cur_ch;
> +   cur_ch = cur_ch->prev_move)
> +{
> +  expanded_location exploc_cond = expand_location (cur_ch->condition);
> +
> +  if (exploc_cond.file)
> + fprintf (file, "[%s:", exploc_cond.file);
> +  fprintf (file, "%d, ", exploc_cond.line);
> +  fprintf (file, "%d] ", exploc_cond.column);
> +
> +  fprintf (file, "%s ", cur_ch->is_true_path ? "true" : "false");
> +  const char *reason = NULL;
> +  switch (cur_ch->reason)
> + {
> + case COPY_BY_THREAD_JUMP:
> +   reason = "copy_by_thread_jump";
> +   break;
> + case COPY_BY_ISOLATE_PATH:
> +   reason = "copy_by_isolate_path";
> +   break;
> + case MOVE_BY_SINK:
> +   reason = "move_by_sink";
> +   break;
> + default:
> +   reason = "UNKNOWN";
> +   break;
> + }
> +  fprintf (file, "%s \n", reason);

"%s\n"

> +}
> +}
> +
> +/* Dump the move_history date structure attached to the gimple STMT.  */

date -> data

> +void
> +dump_move_history_for (FILE *file, const gimple *stmt)
> +{
> +  move_history_t mv_history = get_move_history (stmt);
> +  if (!mv_history)
> +fprintf (file, "No move history.\n");
> +  else
> +dump_move_history (file, mv_history);
> +}
> +
> +DEBUG_FUNCTION void
> +debug_mv_h (const move_history_t mv_history)
> +{
> +  dump_move_history (stderr, mv_history);
> +}
> +
> +DEBUG_FUNCTION void
> +debug_mv_h (const gimple * stmt)
> +{
> +  dump_move_history_for (stderr, stmt);
> +}
> [...]
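
For readers without the rest of the series, the dump routine can be sketched as a standalone program. The types below (`MoveHistory`, `MoveReason`) are simplified stand-ins assumed for this sketch, not the patch's `move_history_t`, and the output goes to a string instead of a `FILE *` so it is easy to check. The format strings already include the review comments above (no trailing space before the newline).

```cpp
#include <sstream>
#include <string>

// Simplified stand-ins for the patch's data structure; hypothetical.
enum class MoveReason { CopyByThreadJump, CopyByIsolatePath, MoveBySink };

struct MoveHistory {
  std::string file;             // location of the condition
  int line;
  int column;
  bool is_true_path;            // moved to the TRUE path of the condition?
  MoveReason reason;            // transformation that caused the move
  const MoveHistory *prev_move; // older entry, or nullptr
};

const char *reason_name (MoveReason reason)
{
  switch (reason)
    {
    case MoveReason::CopyByThreadJump:  return "copy_by_thread_jump";
    case MoveReason::CopyByIsolatePath: return "copy_by_isolate_path";
    case MoveReason::MoveBySink:        return "move_by_sink";
    }
  return "UNKNOWN";
}

// Walks the chain from the newest entry to the oldest and formats each one.
std::string dump_move_history (const MoveHistory *history)
{
  std::ostringstream out;
  out << "The move history is:\n";
  if (!history)
    {
      out << "No move history.\n";
      return out.str ();
    }
  for (const MoveHistory *cur = history; cur; cur = cur->prev_move)
    out << '[' << cur->file << ':' << cur->line << ", " << cur->column << "] "
        << (cur->is_true_path ? "true" : "false") << ' '
        << reason_name (cur->reason) << '\n';
  return out.str ();
}
```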


Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Qing Zhao
Thanks, Sam.

Yes, the changes you made are exactly what I made in my local tree for the 
rebase.
All the new test cases passed.
I am running the complete regression test on x86 and also a bootstrap on 
aarch64 right now.

Qing

> On Oct 30, 2024, at 13:33, Sam James  wrote:
> 
> Qing Zhao  writes:
> 
>>> On Oct 30, 2024, at 10:48, David Malcolm  wrote:
>>> 
>>> On Wed, 2024-10-30 at 14:34 +, Sam James wrote:
>>>> Qing Zhao  writes:
>>>>
>>>>> Control this with a new option -fdiagnostics-details.
>>>>>
>>>>> [...]
>>>>
>>>> The patch doesn't apply for me on very latest trunk -- I think David's
>>>> recent diag refactoring means it needs a slight rebase. Could you send
>>>> that?
>>> 
>>> If it's broken, it was probably by:
>>> 
>>> r15-4610 ("Use unique_ptr in more places in pretty_printer/diagnostics
>>> [PR116613]")
>>> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bf43fe6aa966eaf397ea3b8ebd6408d3d124e285
>> 
>> Yes, due to the following change in the above commit:
>> 
>> diff --git a/gcc/toplev.cc b/gcc/toplev.cc
>> index 
>> 62034c32b4aff32cdf2cb051bf9d0803b4730b3f..a12a2e1afba15ba16f6ade624cde3e60907ba5d2
>>  100644 (file)
>> --- a/gcc/toplev.cc
>> +++ b/gcc/toplev.cc
>> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>> #include "cgraph.h"
>> #include "coverage.h"
>> #include "diagnostic.h"
>> +#include "pretty-print-urlifier.h"
>> #include "varasm.h"
>> #include "tree-inline.h"
>> #include "realmpfr.h"  /* For GMP/MPFR/MPC versions, in print_version.  */
>> 
>> 
>> [...]
>>> 
> 
> To continue testing, I am using the attached hacked up patches (David,
> please don't hurt me, it was minimal "just get it building" x "do more
> than needed to reduce iterations" ;)).
> 
> From f0d521fb56035e71a2b7da3a6c524abab811b42b Mon Sep 17 00:00:00 2001
> Message-ID: 
> 
> In-Reply-To: 
> References: 
> From: Sam James 
> Date: Wed, 30 Oct 2024 16:15:55 +
> Subject: [PATCH 1/2] gcc: add INCLUDE_MEMORY to diagnostic-move-history (and
> friends)
> 
> gcc/ChangeLog:
> 
> * diagnostic-move-history.cc (INCLUDE_MEMORY): Define INCLUDE_MEMORY.
> * move-history-diagnostic-path.cc (INCLUDE_MEMORY): Ditto.
> * move-history-diagnostic-path.h (INCLUDE_MEMORY): Ditto.
> ---
> gcc/diagnostic-move-history.cc  | 1 +
> gcc/gimple-array-bounds.cc  | 1 +
> gcc/move-history-diagnostic-path.cc | 1 +
> gcc/move-history-diagnostic-path.h  | 1 +
> 4 files changed, 4 insertions(+)
> 
> diff --git a/gcc/diagnostic-move-history.cc b/gcc/diagnostic-move-history.cc
> index e4c471ab50f..49adeac1094 100644
> --- a/gcc/diagnostic-move-history.cc
> +++ b/gcc/diagnostic-move-history.cc
> @@ -18,6 +18,7 @@
>along with GCC; see the file COPYING3.  If not see
>.  */
> 
> +#define INCLUDE_MEMORY
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
> index 464dafa6555..a0b04ed0bc5 100644
> --- a/gcc/gimple-array-bounds.cc
> +++ b/gcc/gimple-array-bounds.cc
> @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public 
> License
> along with GCC; see the file COPYING3.  If not see
> .  */
> 
> +#define INCLUDE_MEMORY
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> diff --git a/gcc/move-history-diagnostic-path.cc 
> b/gcc/move-history-diagnostic-path.cc
> index ab29893d1f6..15034616be7 100644
> --- a/gcc/move-history-diagnostic-path.cc
> +++ b/gcc/move-history-diagnostic-path.cc
> @@ -18,6 +18,7 @@ You should have received a copy of the GNU General Public 
> License
> along with GCC; see the file COPYING3.  If not see
> .  */
> 
> +#define INCLUDE_MEMORY
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> diff --git a/gcc/move-history-diagnostic-path.h 
> b/gcc/move-history-diagnostic-path.h
> index d04337ea377..dd27cec 100644
> --- a/gcc/move-history-diagnostic-path.h
> +++ b/gcc/move-history-diagnostic-path.h
> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
> #ifndef GCC_MOVE_HISTORY_DIAGNOSTIC_PATH_H
> #define GCC_MOVE_HISTORY_DIAGNOSTIC_PATH_H
> 
> +#define INCLUDE_MEMORY
> #include "diagnostic-path.h"
> #include "simple-diagnostic-path.h"
> #include "diagnostic-move-history.h"
> -- 
> 2.47.0
> 
> From 4933baa95dd6994443e299606e4dbfc0bad67be0 Mon Sep 17 00:00:00 2001
> Message-ID: 
> <4933baa95dd6994443e299606e4dbfc0bad67be0.1730309582.git@gentoo.org>
> In-Reply-To: 
> References: 
> From: Sam James 
> Date: Wed, 30 Oct 2024 17:12:08 +
> Subject: [PATCH 2/2] gcc: adapt to m_printer change
> 
> gcc/ChangeLog:
> 
> * move-history-diagnostic-path.cc (build_rich_location_with_diagnostic_path): 
> Use get_reference_printer.
> ---
> gcc/move-history-diagnostic-path.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/move-history-diagnostic-path.cc 
> b/gcc/move-history-diagnostic-path.cc
> index 1503461

Re: [PATCH v3 0/2][RFC]Provide more contexts for -Warray-bounds warning messages

2024-10-30 Thread Qing Zhao


> On Oct 30, 2024, at 13:38, Sam James  wrote:
> 
> 
> Absolutely. Both in terms of improving safety as the whole point of
> these warnings is, but also stopping users from being panicked. They
> sometimes believe the warnings imply miscompilation because they can't
> understand how they would happen otherwise.
> 
> An earlier version of these patches already helped find a real bug in
> GNU wget.
> 
> I will be testing these patches and reporting any issues I see (either
> in GCC or to projects).

Thanks a lot for the help!
> 
> 
>> In this patch, we try to provide more contexts for -Warray-bounds, 
>> -Wstringop-*
>> warning messages due to code movements from various compiler transformations.
>> 
>> Control this with a new option -fdiagnostics-details.
>> 
>> Compared to the 2nd version: 
>> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657150.html
>> 
>> which was limited to fixing PR109071, this version has the following major
>> improvements:
>> 
>> 1. All the following currently open PRs were identified as duplicates
>>   of PR109071 and were studied and fixed by this 3rd version of the patch.
>>   Each test case was added to the patch as a unit-test case:
>> 
>> PR88771
>> PR85788
>> PR108770
>> PR106762
>> PR115274
>> PR117179
>> 
>> 2. Change the name of the new option from -fdiagnostics-explain-harder
>>   to -fdiagnostics-details;
> 
> The new name sounds good to me.
> 
>> 
>> 3. Change the name of the new data structure from "copy_history" to
>>   "move_history" due to the following reason:
>> 
>>   The key feature of the compiler transformation that might provide more
>>   accurate information to the value-range analysis is: moving a statement
>>   from the join point of one specific condition to this condition's
>>   TRUE or FALSE path.
>> 
>>   For example, the threadjump and isolate-path transformations duplicate
>>   the original basic block "Ba", which sits at the join point of a
>>   condition "cond", into "Ba'", and then move the two basic blocks "Ba'"
>>   and "Ba" to the TRUE and FALSE paths of the condition "cond"; on the
>>   other hand, the tree-sink transformation just moves some of the
>>   statements from the join point of a condition "cond" to one specific
>>   path of this condition.
>> 
>>   So, the new data structure "move_history" will include the following
>>   information:
>>   A. the "condition" that triggers the code movement;
>>   B. whether the code movement is on the true path of the "condition";
>>   C. the "compiler transformation" that triggers the code movement.
>> 
>> 4. In addition to backward threadjump, this patch can handle more compiler
>>   transformations: 
>>   A. forward threadjump;
>>   B. isolate-path;
>>   C. tree-sinking;
>> 
>> 5. In addition to -Warray-bounds, making -Wstringop-* work as well.
> 
> stringop* are really the most notorious for this so this is very
> welcome.
> 
>> 
>> 6. Adding debugging mechanism to the new data structure “move_history”;
>> 7. Adding all the testing cases of the duplicated bugs as the testing cases.
> 
> Can you tag each of those PRs in the ChangeLog so the commit hook
> updates those too?

Sure, I will add them in the next version.

Qing



Re: [PATCH] c++: Fix crash during NRV optimization with invalid input [PR117099]

2024-10-30 Thread Simon Martin
Hi,

Just closing the loop on this...

On 19 Oct 2024, at 11:57, Iain Sandoe wrote:

On 19 Oct 2024, at 10:16, Simon Martin  wrote:

On 18 Oct 2024, at 10:55, Sam James wrote:

Simon Martin  writes:

Hi Sam,

Hi Simon,

On 16 Oct 2024, at 22:06, Sam James wrote:

Simon Martin  writes:

We ICE upon the following invalid code because we end up calling
finalize_nrv_r with a RETURN_EXPR with no operand.

=== cut here ===
struct X {
 ~X();
};
X test(bool b) {
 {
 X x;
 return x;
 }
 if (!(b)) return;
}
=== cut here ===

This patch fixes this by simply returning error_mark_node when
detecting
a void return in a function returning non-void.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/117099

gcc/cp/ChangeLog:

* typeck.cc (check_return_expr): Return error_mark_node upon
void return for function returning non-void.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash77.C: New test.

---
gcc/cp/typeck.cc | 1 +
gcc/testsuite/g++.dg/parse/crash77.C | 14 ++
2 files changed, 15 insertions(+)
create mode 100644 gcc/testsuite/g++.dg/parse/crash77.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 71d879abef1..22a6ec9a185 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -11238,6 +11238,7 @@ check_return_expr (tree retval, bool
*no_warning, bool *dangling)
 RETURN_EXPR to avoid control reaches end of non-void function
 warnings in tree-cfg.cc. */
 *no_warning = true;
+ return error_mark_node;
 }
 /* Check for a return statement with a value in a function that
 isn't supposed to return a value. */
diff --git a/gcc/testsuite/g++.dg/parse/crash77.C
b/gcc/testsuite/g++.dg/parse/crash77.C
new file mode 100644
index 000..d3f0ae6a877
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash77.C
@@ -0,0 +1,14 @@
+// PR c++/117099
+// { dg-compile }

dg-do compile

Aarg, of course, thanks for spotting this! Fixed in the attached
version.

+
+struct X {
+ ~X();
+};
+
+X test(bool b) {
+ {
+ X x;
+ return x;
+ }
+ if (!(b)) return; // { dg-error "return-statement with no value" }
+}
-- 
2.44.0

BTW, the line-endings on this seem a bit odd. Did you use
git-send-email?

I did use git-send-email indeed. What oddities do you see with line
endings?
cat -A over the patch file looks good.

Weird -- if I open your original email in mu4e, I see a bunch of ^M at
the end of the lines.

Strange. FWIW I’m generating and sending the patches from a MacOS box,
and there might be some weirdness coming from that. I’ll check and try
to fix.

macOS Mail messes up whitespace, for sure (but I’ve not seen it append ).
AFAIK “git send-email” works fine (at least no-one has reported a problem).

maybe the editor in use thinks the target format is Windows and has changed
line endings accordingly?

I investigated further and the issue was actually linked to the AWS
WorkMail SMTP server: when I use the native AWS SES one instead, my test
messages don’t have any =0D=0A anymore.

I’ve permanently updated my configuration to use SES, and all my future
git send-email’s should be good (note that this was purely a SMTP issue,
and the commits that I actually pushed never had any \r).

Thanks, Simon



Re: [PATCH v4 4/7] OpenMP: C++ front-end support for dispatch + adjust_args

2024-10-30 Thread Paul-Antoine Arras

On 24/10/2024 16:10, Tobias Burnus wrote:

Hi PA;

only playing around quickly and glancing at the patch; I need to have a
real look at this later.

Paul-Antoine Arras:
This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits comprised in the corresponding C front
end patch for pragmas and attributes.


Regarding the parsing, I am wondering whether you could do the same as
proposed for the C parser, i.e. instead of swallowing '(' just checking
whether it is there - and then call the normal parser, followed by 
checking that it is only the call and not expressions involving that call.


In C++, there is no equivalent to 
c_parser_postfix_expression_after_primary. So I am now calling 
cp_parser_postfix_expression, which parses not only the argument list 
but the whole function call.



* * *

Starting with playing around a bit:

int variant_fn(int *, int * = nullptr);

#pragma omp declare variant(variant_fn) match(construct={dispatch}) 
adjust_args(need_device_ptr : x, y)

int bar(int *x, int *y = nullptr);

void sub(int *a, int *b)
{
   int x;
   #pragma omp dispatch
    x = bar(a);
}

     D.2973 = __builtin_omp_get_default_device ();
     D.2974 = __builtin_omp_get_mapped_ptr (0B, D.2973);
     D.2975 = __builtin_omp_get_mapped_ptr (a, D.2973);
     x = variant_fn (D.2975, D.2974);

This code should work, but converting the NULL pointer is a bit
pointless and OpenMP (current 6.0 draft) states:

"For each adjust_args clause that is present on the selected function
variant, the adjustment operation specified by the adjust-op modifier is
applied to each argument specified in the clause before being passed to
the selected function variant. Any argument specified in the clause that
does not exist at a given function call site is ignored."


Removed pointless conversion in the attached patch.
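
The quoted rule can be sketched as a small filter. This is an assumed model, not the front-end code from the patch: `args_to_adjust` is a hypothetical helper, parameters are identified by 0-based position, and an argument is taken to "exist at the call site" when it was written explicitly rather than supplied by a default.

```cpp
#include <cstddef>
#include <vector>

// CLAUSE_PARAMS holds the 0-based positions of the parameters named in an
// adjust_args clause; EXPLICIT_ARGS_AT_CALL is how many arguments the call
// site actually wrote.  Returns the positions that need adjusting; named
// parameters with no argument at the call site are ignored.
std::vector<std::size_t>
args_to_adjust (const std::vector<std::size_t> &clause_params,
                std::size_t explicit_args_at_call)
{
  std::vector<std::size_t> result;
  for (std::size_t pos : clause_params)
    if (pos < explicit_args_at_call)
      result.push_back (pos);
  return result;
}
```

For the example above, the clause names `x` and `y` (positions 0 and 1) and the call `bar(a)` passes one argument, so only position 0 would be converted.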


* * *

The following testcase produces an odd error in the C++ FE:

int& variant_fn();

#pragma omp declare variant(variant_fn) match(construct={dispatch})
int& bar();

void sub(int a)
{
   #pragma omp dispatch
     bar();
   #pragma omp dispatch
     a = bar();
}


I can reproduce but it predates my patch set. Should I try to fix it now?


* * *

Tobias



--
PA

commit 80ff2a257b07206c15d3a2d0200dba7e8234d87e
Author: Paul-Antoine Arras 
Date:   Fri May 24 18:38:07 2024 +0200

OpenMP: C++ front-end support for dispatch + adjust_args

This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits comprised in the corresponding C front
end patch for pragmas and attributes.

Additional C/C++ common testcases are provided in a subsequent patch in the
series.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Set adjust_args
need_device_ptr attribute.
* parser.cc (cp_parser_direct_declarator): Update call to
cp_parser_late_return_type_opt.
(cp_parser_late_return_type_opt): Add parameter. Update call to
cp_parser_late_parsing_omp_declare_simd.
(cp_parser_omp_clause_name): Handle nocontext and novariants clauses.
(cp_parser_omp_clause_novariants): New function.
(cp_parser_omp_clause_nocontext): Likewise.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NOVARIANTS and
PRAGMA_OMP_CLAUSE_NOCONTEXT.
(cp_parser_omp_dispatch_body): New function, inspired from
cp_parser_assignment_expression and cp_parser_postfix_expression.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(cp_parser_omp_dispatch): New function.
(cp_finish_omp_declare_variant): Add parameter. Handle adjust_args
clause.
(cp_parser_late_parsing_omp_declare_simd): Add parameter. Update calls
to cp_finish_omp_declare_variant and cp_finish_omp_declare_variant.
(cp_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/adjust-args-1.C: New test.
* g++.dg/gomp/adjust-args-2.C: New test.
* g++.dg/gomp/dispatch-1.C: New test.
* g++.dg/gomp/dispatch-2.C: New test.
* g++.dg/gomp/dispatch-3.C: New test.

diff --git gcc/cp/decl.cc gcc/cp/decl.cc
index 0c5b5c06a12..de8d088e69c 100644
--- gcc/cp/decl.cc
+++ gcc/cp/decl.cc
@@ -8403,6 +8403,13 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
 	  if (!omp_context_selector_matches (ctx))
 	return true;
 	  TREE_PURPOSE (TREE_VALUE (attr)) = variant;
+
+	  // Prepend adjust_args list to variant attributes
+	  tree adjust_args_list = TREE_CHAIN (TREE_CHAIN (chain));
+	  if (adjust_args

[PATCH] aarch64: Forbid F64MM permutes in streaming mode

2024-10-30 Thread Richard Sandiford
The current code was based on an early version of the SME spec,
which allowed the .Q forms of TRN1, TRN2, UZP1, UZP2, ZIP1, and ZIP2
to be used in streaming mode.  We should now forbid them instead;
see 
https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/TRN1--TRN2--vectors---Interleave-even-or-odd-elements-from-two-vectors-?lang=en
and the corresponding entries for the others.

Tested on aarch64-linux-gnu.  I'm planning to push to trunk and gcc-14
branch tomorrow evening if there are no comments before then.

Richard


gcc/
* config/aarch64/aarch64-sve-builtins-base.def (svtrn1q, svtrn2q)
(svuzp1q, svuzp2q, svzip1q, svzip2q): Require SM_OFF.

gcc/testsuite/
* g++.target/aarch64/sve/aarch64-ssve.exp: Add tests for trn[12]q,
uzp[12].c, and zip[12]q.
* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Skip for
STREAMING_COMPATIBLE.
* gcc.target/aarch64/sve/acle/asm/trn1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f64.c: Likewise.
 

Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Qing Zhao


> On Oct 30, 2024, at 10:48, David Malcolm  wrote:
> 
> On Wed, 2024-10-30 at 14:34 +, Sam James wrote:
>> Qing Zhao  writes:
>> 
>>> Control this with a new option -fdiagnostics-details.
>>> 
>>> [...]
>> 
>> The patch doesn't apply for me on very latest trunk -- I think
>> David's
>> recent diag refactoring means it needs a slight rebase. Could you
>> send
>> that?
> 
> If it's broken, it was probably by:
> 
> r15-4610 ("Use unique_ptr in more places in pretty_printer/diagnostics
> [PR116613]")
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bf43fe6aa966eaf397ea3b8ebd6408d3d124e285

Yes, due to the following change in the above commit:

diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index 
62034c32b4aff32cdf2cb051bf9d0803b4730b3f..a12a2e1afba15ba16f6ade624cde3e60907ba5d2
 100644 (file)
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "coverage.h"
 #include "diagnostic.h"
+#include "pretty-print-urlifier.h"
 #include "varasm.h"
 #include "tree-inline.h"
 #include "realmpfr.h"  /* For GMP/MPFR/MPC versions, in print_version.  */



> 
> and/or 
> 
> r15-4617 ("analyzer: avoid implicit use of global_dc's pretty_printer
> [PR116613]") 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666390.html
> 
> which both made small changes to the internal interface for creating
> events in a diagnostic path, and possibly by:
> 
> r15-4760 ("diagnostics: support multiple output formats simultaneously
> [PR116613]")
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666807.html
> 
> which made big changes to diagnostic_context internally.

I have just rebased the patch against today's trunk.
I am rerunning the bootstrap and regression tests now.

> 
> Sorry about this; let me know if you want help debugging/fixing things.

I have a question about the changes to the “warning_at” calls (there are many
such changes for -Warray-bounds and -Wstringop-*):

-   warned = warning_at (location, OPT_Warray_bounds_,
+   {
+ rich_location *richloc
+   = build_rich_location_with_diagnostic_path (location, stmt);
+ warned = warning_at (richloc, OPT_Warray_bounds_,

The above is the current change.

My concern with this change is that the rich_location is created even when
-fdiagnostics-details is NOT on.

How much additional overhead is there in passing a “rich_location *” rather than
a “location_t” as the first argument of warning_at?

Should I control the creation of the “rich_location” with the flag
“flag_diagnostics_details”, similar to how I control the creation of the
“move_history” data structure with that flag?

If so, how should I do it? Do you have a suggestion for a clean and simple
way to code this? (Sorry for the stupid question.)

Thanks a lot for the help.

Qing



> Dave
> 



Re: [PATCH v2 2/2] Match: make SAT_ADD case 7 commutative

2024-10-30 Thread Akram Ahmad

On 29/10/2024 12:48, Richard Biener wrote:

On Mon, Oct 28, 2024 at 4:45 PM Akram Ahmad  wrote:

Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1
being commutative.

The pattern for case 7 currently does not accept the alternative
where Y is used in the condition. Therefore, this commit adds the
commutative property to this case which causes more valid cases of
unsigned saturating arithmetic to be recognised.
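
The equivalence claimed above can be checked exhaustively for an 8-bit type. The following is only a sketch for illustration, not part of the patch: the function names are made up here, and the reference implementation simply widens and clamps.

```c
#include <stdint.h>

/* Case 7 with X in the condition: X <= (X + Y) ? (X + Y) : -1.  */
static uint8_t
sat_add_cond_x (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);	/* Wraps modulo 256.  */
  return x <= sum ? sum : (uint8_t) -1;
}

/* The same form with Y used in the condition instead of X.  */
static uint8_t
sat_add_cond_y (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return y <= sum ? sum : (uint8_t) -1;
}

/* Reference: add in a wider type and clamp to the 8-bit maximum.  */
static uint8_t
sat_add_ref (uint8_t x, uint8_t y)
{
  unsigned int wide = (unsigned int) x + (unsigned int) y;
  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}

/* Return the number of input pairs on which the three forms disagree.  */
static unsigned int
count_mismatches (void)
{
  unsigned int bad = 0;
  for (unsigned int x = 0; x <= UINT8_MAX; x++)
    for (unsigned int y = 0; y <= UINT8_MAX; y++)
      {
	uint8_t r = sat_add_ref ((uint8_t) x, (uint8_t) y);
	if (sat_add_cond_x ((uint8_t) x, (uint8_t) y) != r
	    || sat_add_cond_y ((uint8_t) x, (uint8_t) y) != r)
	  bad++;
      }
  return bad;
}
```

When the unsigned sum wraps, it is smaller than both operands, so testing either operand against the sum detects the overflow.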

Before:
  
  _1 = BIT_FIELD_REF ;
  sum_5 = _1 + a_4(D);
  if (a_4(D) <= sum_5)
goto ; [INV]
  else
goto ; [INV]

   :

   :
  _2 = PHI <255(3), sum_5(2)>
  return _2;

After:
[local count: 1073741824]:
   _1 = BIT_FIELD_REF ;
   _2 = .SAT_ADD (_1, a_4(D)); [tail call]
   return _2;

This passes the aarch64-none-linux-gnu regression tests with no new
failures. The tests written in this patch will fail on targets which
do not implement the standard names for IFN SAT_ADD.

gcc/ChangeLog:

 * match.pd: Modify existing case for SAT_ADD.

gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test.
 * gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test.
 * gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test.
 * gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test.
---
  gcc/match.pd  |  4 ++--
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 21 +++
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 21 +++
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 21 +++
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 21 +++
  5 files changed, 86 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fc5efa6247..98c50ab097f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
 SAT_ADD = (X + Y) | -((X + Y) < X)  */
  (match (usadd_left_part_1 @0 @1)
- (plus:c @0 @1)
+ (plus @0 @1)
   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
&& types_match (type, @0, @1

@@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* Unsigned saturation add, case 7 (branch with le):
 SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
  (match (unsigned_integer_sat_add @0 @1)
- (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+ (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep))

  /* Unsigned saturation add, case 8 (branch with gt):
 SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c 
b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
new file mode 100644
index 000..0202c70cc83
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */

The testcases will FAIL unless the target has support for .SAT_ADD - you want to
add proper effective target tests here.

The match.pd part looks OK to me.

Richard.


Hi Richard,

In that case, I assume this also applies to the tests written for the
SAT_SUB pattern?


Many thanks,

Akram




\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c 
b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
new file mode 100644
index 000..34c80ba3854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c 
b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
new file mode 100644
index 000..0718cb566d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+#define UMAX (

[PATCH 0/5] Add btf_decl_tag and btf_type_tag C attributes

2024-10-30 Thread David Faust
This patch series adds support for the btf_decl_tag and btf_type_tag attributes
to GCC. This entails:

- Two new C-family attributes that make it possible to associate ("tag") particular
  declarations and types with arbitrary strings. As explained below, this is
  intended to be used to, for example, characterize certain pointer types.  A
  single declaration or type may have multiple occurrences of these attributes.

- The conveyance of that information in the DWARF output in the form of a new
  DIE: DW_TAG_GNU_annotation, and a new attribute: DW_AT_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
  kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. These BTF
  kinds are already supported by LLVM and other tools in the BPF ecosystem.

Both of these attributes are already supported by clang, and are beginning to
be used in various ways by eBPF users and inside the Linux kernel.

Purpose
=======

1)  Addition of C-family language constructs (attributes) to specify free-text
tags on certain language elements, such as struct fields.

The purpose of these annotations is to provide additional information about
types, variables, and function parameters of interest to the kernel. A
driving use case is to tag pointer types within the Linux kernel and eBPF
programs with additional semantic information, such as '__user' or '__rcu'.

For example, consider the Linux kernel function do_execve with the
following declaration:

  static int do_execve(struct filename *filename,
 const char __user *const __user *__argv,
 const char __user *const __user *__envp);

Here, __user could be defined with these annotations to record semantic
information about the pointer parameters (e.g., they are user-provided) in
DWARF and BTF information. Other kernel facilities such as the eBPF verifier
can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

The main motivation for emitting the tags in DWARF is that the Linux kernel
generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF                  BTF   +----------+
    | pahole |-------> vmlinux.btf ------->| verifier |
    +--------+                             +----------+
        ^                                        ^
        |                                        |
  DWARF |                                    BTF |
        |                                        |
     vmlinux                              +-------------+
     module1.ko                           | BPF program |
     module2.ko                           +-------------+
       ...

This is because:

a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

b)  GCC can generate BTF for whatever target with -gbtf, but there is no
support for linking/deduplicating BTF in the linker.

In the scenario above, the verifier needs access to the pointer tags of
both the kernel types/declarations (conveyed in the DWARF and translated
to BTF by pahole) and those of the BPF program (available directly in BTF).

Another motivation for having the tag information in DWARF, unrelated to
BPF and BTF, is that the drgn project (another DWARF consumer) also wants
to benefit from these tags in order to differentiate between different
kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

This is easy: the main purpose of having this info in BTF is for the
compiled eBPF programs. The kernel verifier can then access the tags
of pointers used by the eBPF programs.

For more information about these tags and the motivation behind them, please
refer to the following Linux kernel discussions: [1], [2], [3].
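
As a rough illustration of item 1) above, the attributes might be used as in the sketch below. The wrapper macros (TYPE_TAG, DECL_TAG, USER_PTR), the struct, and the tag strings are hypothetical and not taken from the series; the macros expand to nothing on compilers that do not report support for the attributes, so the code should still build there.

```c
#include <stddef.h>

/* Hypothetical wrapper macros: use the attribute only when the compiler
   reports that it supports it.  */
#if defined (__has_attribute)
# if __has_attribute (btf_type_tag)
#  define TYPE_TAG(s) __attribute__ ((btf_type_tag (s)))
# endif
# if __has_attribute (btf_decl_tag)
#  define DECL_TAG(s) __attribute__ ((btf_decl_tag (s)))
# endif
#endif
#ifndef TYPE_TAG
# define TYPE_TAG(s)
#endif
#ifndef DECL_TAG
# define DECL_TAG(s)
#endif

/* Plays the role that '__user' plays in the kernel example.  */
#define USER_PTR TYPE_TAG ("user")

struct request
{
  /* A declaration tag on a struct field (tag string is made up).  */
  int id DECL_TAG ("request_id");
  /* A type tag on the pointee type of a pointer.  */
  const char USER_PTR *buf;
};

/* The tags do not change how the code behaves; they are only meant to be
   recorded in the debug information.  */
static int
first_byte (const char USER_PTR *p)
{
  return p[0];
}
```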

DWARF Representation
====================

Compared to prior iterations of this work, this patch series introduces a new
DWARF representation meant to address issues in the previous format. The format
is detailed below.

New DWARF extension: DW_TAG_GNU_annotation.  These DIEs encode the annotation
information.  They exist near the top level of the DIE tree as children of the
compilation unit DIE.  The user-supplied annotations ("tags") are encoded via
DW_AT_name and DW_AT_const_value.  DW_AT_name holds the name of the attribute
which is the source of the annotation (currently only "btf_type_tag" or
"btf_decl_tag").  DW_AT_const_value holds the arbitrary user string from the
attribute argument.

  DW_TAG_GNU_annotation
DW_AT_name: "btf_decl_tag" or "btf_type_tag"
DW_AT_const_value: 
DW_AT_GNU_annotation: see below.

New DWARF extension: DW_AT_GNU_annotation.  If present, the
DW_AT_GNU_annotation attribute is a reference to a DW_TAG_GNU_annotation DIE
holding annotations for the object.

If a single declaration or type at the language level has multiple occurrences
of btf_decl_tag 

Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling agressiveness

2024-10-30 Thread Vineet Gupta
Hi Richard,

Apologies, I replied without first checking for another update on the thread.

On 10/30/24 11:35, Richard Sandiford wrote:
>>> I'm not saying that the algorithm gets the decision right for cactu
>>> when tuning for in-order CPU X and running on that same CPU X.
>>> But it seems like that combination hasn't been tried, and that,
>>> even on the combinations that the patch has been tried on, the cactu
>>> justification is based on static properties of the binary rather than
>>> a particular runtime improvement (Y% faster).
>> I'd requested Wilco to possibly try this on some in-order arm cores.
> OK.  FWIW, I think the original testing was on Cortex-A9 or Cortex-A15,
> It was also heavy on filters, such as yiq.
>
> But is this about making the argument in favour of an unconditional change?

Not really. This is just to get some additional data while we are on the topic 
and also help inform any future changes in the area.

> If so, I don't think it's necessary to front-load this testing.  Like I said
> in my reply to Jeff, that can happen naturally if all major targets move
> to the new behaviour.  And for a hook/param approach, we already have
> enough data to justify the patch.

Right, we can proceed with the param approach as you suggested.

Thx for taking a look. I look fwd to your comments on 3/4.

-Vineet


[PATCH 1/3] aarch64: Move ENTRY_VHSDF to aarch64-simd-pragma-builtins.def

2024-10-30 Thread Richard Sandiford
It's more convenient for later patches if we only define ENTRY_VHSDF
once, in the .def file.  Then the only macro that needs to be defined
before including the file is ENTRY itself.

The patch also moves the architecture requirements out of the
individual ENTRY invocations into a block-level definition of
REQUIRED_EXTENSIONS.  This reduces cut-&-paste a little and makes
things more consistent with aarch64-sve-builtins*.def.
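
The layout described above is the usual X-macro pattern. Below is a self-contained sketch of it, under some simplifying assumptions: a list macro (BUILTIN_ENTRIES) stands in for the .def file that the real code includes repeatedly, and the flag value, struct, and unspec numbers are invented for illustration.

```c
#include <string.h>

/* Hypothetical feature flag standing in for AARCH64_FL_FAMINMAX.  */
enum { FL_FAMINMAX = 1 << 0 };

/* Stand-in for the .def file.  ENTRY_VHSDF is defined once, next to the
   entries, purely in terms of ENTRY.  */
#define ENTRY_VHSDF(NAME, UNSPEC) \
  ENTRY (NAME##_f16, V4HF, UNSPEC) \
  ENTRY (NAME##q_f16, V8HF, UNSPEC) \
  ENTRY (NAME##_f32, V2SF, UNSPEC) \
  ENTRY (NAME##q_f32, V4SF, UNSPEC) \
  ENTRY (NAME##q_f64, V2DF, UNSPEC)

#define BUILTIN_ENTRIES \
  ENTRY_VHSDF (vamax, 1) \
  ENTRY_VHSDF (vamin, 2)

/* First use: only ENTRY needs to be defined to build the enum.  */
#define ENTRY(N, M, U) CODE_##N,
enum builtin_code { BUILTIN_ENTRIES CODE_MAX };
#undef ENTRY

struct builtin_data
{
  const char *name;
  const char *mode;
  int unspec;
  unsigned int required_extensions;
};

/* Second use: ENTRY picks up the block-level REQUIRED_EXTENSIONS instead
   of taking the features as an argument.  */
#define REQUIRED_EXTENSIONS FL_FAMINMAX
#define ENTRY(N, M, U) { #N, #M, U, REQUIRED_EXTENSIONS },
static const struct builtin_data builtin_table[] = { BUILTIN_ENTRIES };
#undef ENTRY
#undef REQUIRED_EXTENSIONS
```

Because both expansions walk the same list, the enum values index the table directly, and each consumer only has to supply its own ENTRY.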

gcc/
* config/aarch64/aarch64-builtins.cc (ENTRY): Remove the features
argument and get the features from REQUIRED_EXTENSIONS instead.
(ENTRY_VHSDF): Move definition to...
* config/aarch64/aarch64-simd-pragma-builtins.def: ...here.
Move the architecture requirements to REQUIRED_EXTENSIONS.
---
 gcc/config/aarch64/aarch64-builtins.cc| 22 +++
 .../aarch64/aarch64-simd-pragma-builtins.def  | 14 ++--
 2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 86d96e47f01..480ac223d86 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -780,17 +780,9 @@ typedef struct
   AARCH64_SIMD_BUILTIN_##T##_##N##A,
 
 #undef ENTRY
-#define ENTRY(N, S, M, U, F) \
+#define ENTRY(N, S, M, U) \
   AARCH64_##N,
 
-#undef ENTRY_VHSDF
-#define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC, EXTENSIONS) \
-  AARCH64_##NAME##_f16, \
-  AARCH64_##NAME##q_f16, \
-  AARCH64_##NAME##_f32, \
-  AARCH64_##NAME##q_f32, \
-  AARCH64_##NAME##q_f64,
-
 enum aarch64_builtins
 {
   AARCH64_BUILTIN_MIN,
@@ -1602,16 +1594,8 @@ enum class aarch64_builtin_signatures
 };
 
 #undef ENTRY
-#define ENTRY(N, S, M, U, F) \
-  {#N, aarch64_builtin_signatures::S, E_##M##mode, U, F},
-
-#undef ENTRY_VHSDF
-#define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC, EXTENSIONS) \
-  ENTRY (NAME##_f16, SIGNATURE, V4HF, UNSPEC, EXTENSIONS) \
-  ENTRY (NAME##q_f16, SIGNATURE, V8HF, UNSPEC, EXTENSIONS) \
-  ENTRY (NAME##_f32, SIGNATURE, V2SF, UNSPEC, EXTENSIONS) \
-  ENTRY (NAME##q_f32, SIGNATURE, V4SF, UNSPEC, EXTENSIONS) \
-  ENTRY (NAME##q_f64, SIGNATURE, V2DF, UNSPEC, EXTENSIONS)
+#define ENTRY(N, S, M, U) \
+  {#N, aarch64_builtin_signatures::S, E_##M##mode, U, REQUIRED_EXTENSIONS},
 
 /* Initialize pragma builtins.  */
 
diff --git a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def 
b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def
index f432185be46..9d530fc45d4 100644
--- a/gcc/config/aarch64/aarch64-simd-pragma-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-pragma-builtins.def
@@ -18,6 +18,16 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
+#undef ENTRY_VHSDF
+#define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC) \
+  ENTRY (NAME##_f16, SIGNATURE, V4HF, UNSPEC) \
+  ENTRY (NAME##q_f16, SIGNATURE, V8HF, UNSPEC) \
+  ENTRY (NAME##_f32, SIGNATURE, V2SF, UNSPEC) \
+  ENTRY (NAME##q_f32, SIGNATURE, V4SF, UNSPEC) \
+  ENTRY (NAME##q_f64, SIGNATURE, V2DF, UNSPEC)
+
 // faminmax
-ENTRY_VHSDF (vamax, binary, UNSPEC_FAMAX, AARCH64_FL_FAMINMAX)
-ENTRY_VHSDF (vamin, binary, UNSPEC_FAMIN, AARCH64_FL_FAMINMAX)
+#define REQUIRED_EXTENSIONS AARCH64_FL_FAMINMAX
+ENTRY_VHSDF (vamax, binary, UNSPEC_FAMAX)
+ENTRY_VHSDF (vamin, binary, UNSPEC_FAMIN)
+#undef REQUIRED_EXTENSIONS
-- 
2.25.1



[PATCH 0/3] aarch64: Allow separate SVE and SME feature requirements

2024-10-30 Thread Richard Sandiford
Currently we represent architecture requirements using a single bitmask
of features.  However, some of the new extensions have different
requirements in non-streaming mode compared to streaming mode.
This series adds support for that and applies it to FAMINMAX.

Tested on aarch64-linux-gnu.  Since we have quite a bit of work gated
behind this, I'm planning to commit tomorrow evening (UTC) if there are
no comments before then, but please let me know if you'd like more time
to review.

Richard

Richard Sandiford (3):
  aarch64: Move ENTRY_VHSDF to aarch64-simd-pragma-builtins.def
  aarch64: Record separate streaming and non-streaming ISA requirements
  aarch64: Require SVE2 and/or SME2 for SVE FAMINMAX intrinsics

 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-builtins.cc| 142 +-
 gcc/config/aarch64/aarch64-protos.h   |  87 ++-
 .../aarch64/aarch64-simd-pragma-builtins.def  |  14 +-
 .../aarch64/aarch64-sve-builtins-base.cc  |   4 -
 .../aarch64/aarch64-sve-builtins-base.def |  29 +---
 .../aarch64/aarch64-sve-builtins-sme.def  |  30 ++--
 .../aarch64/aarch64-sve-builtins-sve2.cc  |   4 +
 .../aarch64/aarch64-sve-builtins-sve2.def |  48 +++---
 gcc/config/aarch64/aarch64-sve-builtins.cc|  51 ---
 gcc/config/aarch64/aarch64-sve-builtins.h |  13 +-
 .../aarch64/sve/acle/general/amin_1.c |   9 ++
 .../aarch64/sve2/acle/asm/amax_f16.c  |   5 +-
 .../aarch64/sve2/acle/asm/amax_f32.c  |   5 +-
 .../aarch64/sve2/acle/asm/amax_f64.c  |   5 +-
 .../aarch64/sve2/acle/asm/amin_f16.c  |   5 +-
 .../aarch64/sve2/acle/asm/amin_f32.c  |   5 +-
 .../aarch64/sve2/acle/asm/amin_f64.c  |   5 +-
 18 files changed, 282 insertions(+), 181 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/amin_1.c

-- 
2.25.1



[PATCH 2/3] aarch64: Record separate streaming and non-streaming ISA requirements

2024-10-30 Thread Richard Sandiford
For some upcoming extensions, we need to add intrinsics whose
ISA requirements differ between streaming mode and non-streaming mode.
This patch tries to generalise the infrastructure to support that:

- Rather than have a single set of feature flags, the patch uses a
  separate set for sm_off (non-streaming, PSTATE.SM==0) and sm_on
  (streaming, PSTATE.SM==1).

- The sm_off set is zero if the intrinsic is streaming-only.
  Otherwise it is AARCH64_FL_SM_OFF | .

- Similarly, the sm_on set is zero if the intrinsic is non-streaming-only.
  Otherwise it is AARCH64_FL_SM_ON | .  AARCH64_FL_SME is
  taken as given in streaming mode.

- Streaming-compatible code must satisfy both sets of requirements.

There should be no functional change.
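
The scheme in the list above could be modelled roughly as follows. This is a simplified sketch, not the code in the patch: the type, flag, and function names are hypothetical, and the mode bits are treated as implied by the mode being checked.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t feature_flags;

/* Hypothetical flag bits; the real AARCH64_FL_* values differ.  */
#define FL_SM_OFF ((feature_flags) 1 << 0)
#define FL_SM_ON  ((feature_flags) 1 << 1)
#define FL_EXT_A  ((feature_flags) 1 << 2)
#define FL_EXT_B  ((feature_flags) 1 << 3)

/* One requirement set per PSTATE.SM value.  A zero set means that the
   intrinsic is not available in that mode at all.  */
struct required_extensions
{
  feature_flags sm_off;	/* Non-streaming, PSTATE.SM == 0.  */
  feature_flags sm_on;	/* Streaming, PSTATE.SM == 1.  */
};

enum sm_state { SM_OFF, SM_ON, SM_COMPATIBLE };

/* Return true if code compiled for STATE with the features in ENABLED
   may use an intrinsic whose requirements are REQ.  */
static bool
extensions_ok (struct required_extensions req, enum sm_state state,
	       feature_flags enabled)
{
  feature_flags need;

  switch (state)
    {
    case SM_OFF:
      if (req.sm_off == 0)
	return false;
      need = req.sm_off;
      break;
    case SM_ON:
      if (req.sm_on == 0)
	return false;
      need = req.sm_on;
      break;
    default:
      /* Streaming-compatible code must satisfy both sets.  */
      if (req.sm_off == 0 || req.sm_on == 0)
	return false;
      need = req.sm_off | req.sm_on;
      break;
    }

  /* The mode bits are implied by STATE in this sketch.  */
  need &= ~(FL_SM_OFF | FL_SM_ON);
  return (need & ~enabled) == 0;
}
```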

gcc/
* config.gcc (aarch64*-*-*): Add aarch64-protos.h to target_gtfiles.
* config/aarch64/aarch64-protos.h
(aarch64_required_extensions): New structure.
(aarch64_check_required_extensions): Change the type of the
required_extensions parameter from aarch64_feature_flags to
aarch64_required_extensions.
* config/aarch64/aarch64-sve-builtins.h
(function_builder::add_unique_function): Likewise.
(function_builder::add_overloaded_function): Likewise.
(function_builder::get_attributes): Likewise.
(function_builder::add_function): Likewise.
(function_group_info): Change the type of required_extensions
in the same way.
* config/aarch64/aarch64-builtins.cc
(aarch64_pragma_builtins_data::required_extensions): Change the type
from aarch64_feature_flags to aarch64_required_extensions.
(aarch64_check_required_extensions): Likewise change the type
of the required_extensions parameter.  Separate the requirements
for non-streaming mode and streaming mode, ORing them together
for streaming-compatible mode.
(aarch64_general_required_extensions): New function.
(aarch64_general_check_builtin_call): Use it.
* config/aarch64/aarch64-sve-builtins.cc
(registered_function::required_extensions): Change the type
from aarch64_feature_flags to aarch64_required_extensions.
(DEF_NEON_SVE_FUNCTION, DEF_SME_ZA_FUNCTION_GS): Update accordingly.
(function_builder::get_attributes): Change the type of the
required_extensions parameter from aarch64_feature_flags to
aarch64_required_extensions.
(function_builder::add_function): Likewise.
(function_builder::add_unique_function): Likewise.
(function_builder::add_overloaded_function): Likewise.
* config/aarch64/aarch64-simd-pragma-builtins.def: Update
REQUIRED_EXTENSIONS definitions to use aarch64_required_extensions.
* config/aarch64/aarch64-sve-builtins-base.def: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def: Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-builtins.cc| 122 ++
 gcc/config/aarch64/aarch64-protos.h   |  87 -
 .../aarch64/aarch64-simd-pragma-builtins.def  |   2 +-
 .../aarch64/aarch64-sve-builtins-base.def |  26 ++--
 .../aarch64/aarch64-sve-builtins-sme.def  |  30 ++---
 .../aarch64/aarch64-sve-builtins-sve2.def |  41 ++
 gcc/config/aarch64/aarch64-sve-builtins.cc|  51 +---
 gcc/config/aarch64/aarch64-sve-builtins.h |  13 +-
 9 files changed, 226 insertions(+), 148 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e2ed3b309cc..c3531e56c9d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -352,7 +352,7 @@ aarch64*-*-*)
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o 
aarch64-early-ra.o aarch64-ldp-fusion.o"
-   target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.h 
\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
+   target_gtfiles="\$(srcdir)/config/aarch64/aarch64-protos.h 
\$(srcdir)/config/aarch64/aarch64-builtins.h 
\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
 alpha*-*-*)
diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 480ac223d86..97bde7c15d3 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1595,7 +1595,8 @@ enum class aarch64_builtin_signatures
 
 #undef EN

[PATCH 2/5] dwarf: create annotation DIEs for btf tags

2024-10-30 Thread David Faust
The btf_decl_tag and btf_type_tag attributes provide a means to annotate
declarations and types respectively with arbitrary user provided
strings.  These strings are recorded in debug information for
post-compilation uses, and despite the name they are meant to be
recorded in DWARF as well as BTF.  New DWARF extensions
DW_TAG_GNU_annotation and DW_AT_GNU_annotation are used to represent
these user annotations in DWARF.

This patch introduces the new DWARF extension DIE and attribute, and
generates them as necessary to represent user annotations from
btf_decl_tag and btf_type_tag.

The format of the new DIE is as follows:

  DW_TAG_GNU_annotation
DW_AT_name: "btf_decl_tag" or "btf_type_tag"
DW_AT_const_value: 
DW_AT_GNU_annotation: 

DW_AT_GNU_annotation is a new attribute extension used to refer to these
new annotation DIEs.  If non-null in any given declaration or type DIE,
it is a reference to a DW_TAG_GNU_annotation DIE holding an annotation
for that declaration or type.  In addition, the DW_TAG_GNU_annotation
DIEs may also have a non-null DW_AT_GNU_annotation, referring to another
annotation DIE.  This allows chains of annotation DIEs to be formed,
such as in the case where a single declaration has multiple instances of
btf_decl_tag with different string annotations.
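
From a consumer's point of view, such a chain might be walked as in the sketch below. The struct is a hypothetical in-memory model of a DW_TAG_GNU_annotation DIE, not a type from GCC or from any DWARF library, and the order in which tags appear along the chain is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one DW_TAG_GNU_annotation DIE.  */
struct annotation_die
{
  const char *name;			/* DW_AT_name.  */
  const char *value;			/* DW_AT_const_value.  */
  const struct annotation_die *next;	/* DW_AT_GNU_annotation, or NULL.  */
};

/* Follow the chain starting at FIRST and store in OUT the values of up to
   MAX annotations whose name is ATTR_NAME.  Return the number stored.  */
static size_t
collect_annotations (const struct annotation_die *first,
		     const char *attr_name, const char **out, size_t max)
{
  size_t n = 0;
  for (const struct annotation_die *d = first; d != NULL; d = d->next)
    if (strcmp (d->name, attr_name) == 0 && n < max)
      out[n++] = d->value;
  return n;
}
```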

gcc/

* dwarf2out.cc (struct annotation_node, struct annotation_node_hasher)
(btf_tag_htab): New ancillary structures and hash table.
(annotation_node_hasher::hash, annotation_node_hasher::equal): New.
(hash_btf_tag, gen_btf_tag_dies, gen_btf_type_tag_dies)
(gen_btf_decl_tag_dies): New functions.
(modified_type_die): Handle btf_type_tag attribute.
(gen_formal_parameter_die): Call gen_btf_decl_tags for the parameter.
(gen_decl_die): Call gen_btf_decl_tags for the decl.
(dwarf2out_early_finish): Empty btf_tag_htab hash table.
(dwarf2out_cc_finalize): Delete btf_tag_htab hash table.

include/

* dwarf2.def (DW_TAG_GNU_annotation): New DWARF extension.
(DW_AT_GNU_annotation): Likewise.

gcc/testsuite/

* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c: New test.
---
 gcc/dwarf2out.cc  | 253 +-
 .../debug/dwarf2/dwarf-btf-decl-tag-1.c   |  11 +
 .../debug/dwarf2/dwarf-btf-decl-tag-2.c   |  25 ++
 .../debug/dwarf2/dwarf-btf-decl-tag-3.c   |  21 ++
 .../debug/dwarf2/dwarf-btf-type-tag-1.c   |  10 +
 .../debug/dwarf2/dwarf-btf-type-tag-2.c   |  31 +++
 .../debug/dwarf2/dwarf-btf-type-tag-3.c   |  15 ++
 include/dwarf2.def|   4 +
 8 files changed, 366 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 38aedb64470..9f95539062c 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -3696,6 +3696,32 @@ static bool frame_pointer_fb_offset_valid;
 
 static vec base_types;
 
+/* A cached btf_type_tag or btf_decl_tag user annotation.  */
+struct GTY ((for_user)) annotation_node
+{
+  const char *name;
+  const char *value;
+  hashval_t hash;
+  dw_die_ref die;
+  struct annotation_node *next;
+};
+
+struct annotation_node_hasher : ggc_ptr_hash<annotation_node>
+{
+  typedef const struct annotation_node *compare_type;
+
+  static hashval_t hash (struct annotation_node *);
+  static bool equal (const struct annotation_node *,
+const struct annotation_node *);
+};
+
+/* A hash table of tag annotation nodes for btf_type_tag and btf_decl_tag C
+   attributes.  DIEs for these user annotations may be reused if they are
+   structurally equivalent; this hash table is used to ensure the DIEs are
+   reused wherever possible.  */
+static GTY (()) hash_table<annotation_node_hasher> *btf_tag_htab;
+
+
 /* Flags to represent a set of attribute classes for attributes that represent
a scalar value (bounds, pointers, ...).  */
 enum dw_scalar_form
@@ -13649,6 +13675,168 @@ long_double_as_float128 (tree type)
   return NULL_TREE;
 }
 
+
+hashval_t
+annotation_node_hasher::hash (struct annotation_node *node)
+{
+  return node->hash;
+}
+
+bool
+annotation_node_hasher::equal (const struct annotation_node *node1,
+  const struct annotation_node *node

[PATCH 3/3] aarch64: Require SVE2 and/or SME2 for SVE FAMINMAX intrinsics

2024-10-30 Thread Richard Sandiford
After the previous patch, we can now accurately model the ISA
requirements for the SVE FAMINMAX intrinsics.  They can be used
in non-streaming mode if TARGET_SVE2 and in streaming mode if
TARGET_SME2 (with both cases also requiring TARGET_FAMINMAX).
They can be used in streaming-compatible mode if TARGET_SVE2
&& TARGET_SME2.
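
Restated as a predicate, the rule above looks like the following sketch. The enum and the boolean parameters are stand-ins for the TARGET_* conditions; nothing here is taken from the patch itself.

```c
#include <stdbool.h>

enum pstate_sm { SM_OFF, SM_ON, SM_COMPATIBLE };

/* Return true if the SVE FAMINMAX intrinsics may be used in MODE, given
   whether SVE2, SME2 and FAMINMAX are enabled.  */
static bool
sve_faminmax_available (enum pstate_sm mode, bool sve2, bool sme2,
			bool faminmax)
{
  if (!faminmax)
    return false;

  switch (mode)
    {
    case SM_OFF:
      return sve2;		/* Non-streaming: needs SVE2.  */
    case SM_ON:
      return sme2;		/* Streaming: needs SME2.  */
    case SM_COMPATIBLE:
      return sve2 && sme2;	/* Streaming-compatible: needs both.  */
    }
  return false;
}
```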

Also, Kyrill pointed out in the original review of the FAMINMAX
support that it would be more consistent to define the rtl patterns
in aarch64-sve2.md rather than aarch64-sve.md, so the pushed patch
did that.  This patch moves the definitions of the intrinsics to
the sve2 files too, for consistency.

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svamax, svamin): Move
definitions to...
* config/aarch64/aarch64-sve-builtins-sve2.cc: ...here.
* config/aarch64/aarch64-sve-builtins-base.def (svamax, svamin): Move
definitions to...
* config/aarch64/aarch64-sve-builtins-sve2.def: ...here.  Require
SME2 in streaming mode.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/amin_1.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f16.c: Enable sve2 and
(for streaming mode) sme2.
* gcc.target/aarch64/sve2/acle/asm/amax_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amax_f64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amin_f16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amin_f32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/amin_f64.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve-builtins-base.cc  | 4 
 gcc/config/aarch64/aarch64-sve-builtins-base.def | 5 -
 gcc/config/aarch64/aarch64-sve-builtins-sve2.cc  | 4 
 gcc/config/aarch64/aarch64-sve-builtins-sve2.def | 7 +++
 .../gcc.target/aarch64/sve/acle/general/amin_1.c | 9 +
 .../gcc.target/aarch64/sve2/acle/asm/amax_f16.c  | 5 -
 .../gcc.target/aarch64/sve2/acle/asm/amax_f32.c  | 5 -
 .../gcc.target/aarch64/sve2/acle/asm/amax_f64.c  | 5 -
 .../gcc.target/aarch64/sve2/acle/asm/amin_f16.c  | 5 -
 .../gcc.target/aarch64/sve2/acle/asm/amin_f32.c  | 5 -
 .../gcc.target/aarch64/sve2/acle/asm/amin_f64.c  | 5 -
 11 files changed, 44 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/amin_1.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index fe16d93adcd..1c9f515a52c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -3184,10 +3184,6 @@ FUNCTION (svadrb, svadr_bhwd_impl, (0))
 FUNCTION (svadrd, svadr_bhwd_impl, (3))
 FUNCTION (svadrh, svadr_bhwd_impl, (1))
 FUNCTION (svadrw, svadr_bhwd_impl, (2))
-FUNCTION (svamax, cond_or_uncond_unspec_function,
- (UNSPEC_COND_FAMAX, UNSPEC_FAMAX))
-FUNCTION (svamin, cond_or_uncond_unspec_function,
- (UNSPEC_COND_FAMIN, UNSPEC_FAMIN))
 FUNCTION (svand, rtx_code_function, (AND, AND))
 FUNCTION (svandv, reduction, (UNSPEC_ANDV))
 FUNCTION (svasr, rtx_code_function, (ASHIFTRT, ASHIFTRT))
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def 
b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index edfe2574507..da2a0e41aa5 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -368,8 +368,3 @@ DEF_SVE_FUNCTION (svuzp2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2q, binary, all_data, none)
 #undef REQUIRED_EXTENSIONS
-
-#define REQUIRED_EXTENSIONS ssve (AARCH64_FL_FAMINMAX)
-DEF_SVE_FUNCTION (svamax, binary_opt_single_n, all_float, mxz)
-DEF_SVE_FUNCTION (svamin, binary_opt_single_n, all_float, mxz)
-#undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
index d29c2209fdf..64f86035c30 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
@@ -591,6 +591,10 @@ FUNCTION (svaesd, fixed_insn_function, 
(CODE_FOR_aarch64_sve2_aesd))
 FUNCTION (svaese, fixed_insn_function, (CODE_FOR_aarch64_sve2_aese))
 FUNCTION (svaesimc, fixed_insn_function, (CODE_FOR_aarch64_sve2_aesimc))
 FUNCTION (svaesmc, fixed_insn_function, (CODE_FOR_aarch64_sve2_aesmc))
+FUNCTION (svamax, cond_or_uncond_unspec_function,
+ (UNSPEC_COND_FAMAX, UNSPEC_FAMAX))
+FUNCTION (svamin, cond_or_uncond_unspec_function,
+ (UNSPEC_COND_FAMIN, UNSPEC_FAMIN))
 FUNCTION (svbcax, CODE_FOR_MODE0 (aarch64_sve2_bcax),)
 FUNCTION (svbdep, unspec_based_function, (UNSPEC_BDEP, UNSPEC_BDEP, -1))
 FUNCTION (svbext, unspec_based_function, (UNSPEC_BEXT, UNSPEC_BEXT, -1))
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def 
b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index 345a7621b6f..e4021559f36 100644
--- a

[committed] c: Do not document C23 support as experimental and incomplete

2024-10-30 Thread Joseph Myers
Since C23 support is substantially feature-complete, update
documentation to no longer refer to it as experimental and incomplete.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/
* doc/cpp.texi (__STDC_VERSION__): Do not refer to C23 support as
experimental.
* doc/invoke.texi (std=c23, std=gnu23): Do not document as
experimental and incomplete.
* doc/standards.texi: Do not refer to C23 support as experimental
and incomplete.

gcc/c-family/
* c.opt (std=c23, std=gnu23, std=iso9899:2024): Do not mark as
experimental and incomplete.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 3fd331cda82..e2c01083aec 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2590,7 +2590,7 @@ Conform to the ISO 2017 C standard (published in 2018).
 
 std=c23
 C ObjC
-Conform to the ISO 2023 C standard draft (expected to be published in 2024) 
(experimental and incomplete support).
+Conform to the ISO 2023 C standard draft (expected to be published in 2024).
 
 std=c2x
 C ObjC Alias(std=c23)
@@ -2692,7 +2692,7 @@ Conform to the ISO 2017 C standard (published in 2018) 
with GNU extensions.
 
 std=gnu23
 C ObjC
-Conform to the ISO 2023 C standard draft (expected to be published in 2024) 
with GNU extensions (experimental and incomplete support).
+Conform to the ISO 2023 C standard draft (expected to be published in 2024) 
with GNU extensions.
 
 std=gnu2x
 C ObjC Alias(std=gnu23)
@@ -2748,7 +2748,7 @@ Conform to the ISO 2017 C standard (published in 2018).
 
 std=iso9899:2024
 C ObjC Alias(std=c23)
-Conform to the ISO 2023 C standard draft (expected to be published in 2024) 
(experimental and incomplete support).
+Conform to the ISO 2023 C standard draft (expected to be published in 2024).
 
 stdlib=
 Driver C++ ObjC++ Common Condition(ENABLE_STDLIB_OPTION) Var(flag_stdlib_kind) 
Joined Enum(stdlib_kind) RejectNegative Init(1)
diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index db3a075c5a9..a83aa263df0 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -1887,7 +1887,7 @@ the 1999 revision of the C standard; the value 
@code{201112L}
 signifies the 2011 revision of the C standard; the value
 @code{201710L} signifies the 2017 revision of the C standard (which is
 otherwise identical to the 2011 version apart from correction of
-defects).  The value @code{202311L} is used for the experimental
+defects).  The value @code{202311L} is used for the
 @option{-std=c23} and @option{-std=gnu23} modes.  An unspecified value
 larger than @code{202311L} is used for the experimental
 @option{-std=c2y} and @option{-std=gnu2y} modes.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 07920e07b4d..19c148a7d6f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2480,8 +2480,7 @@ applied with @option{-std=c11}) and a new value of
 @itemx c2x
 @itemx iso9899:2024
 ISO C23, the 2023 revision of the ISO C standard (expected to be
-published in 2024).  The support for this version is experimental and
-incomplete.  The name @samp{c2x} is deprecated.
+published in 2024).  The name @samp{c2x} is deprecated.
 
 @item c2y
 The next version of the ISO C standard, still under development.  The
@@ -2506,8 +2505,7 @@ GNU dialect of ISO C17.  This is the default for C code.
 
 @item gnu23
 @itemx gnu2x
-GNU dialect of ISO C23.  The support for this version is experimental
-and incomplete.  The name @samp{gnu2x} is deprecated.
+GNU dialect of ISO C23.  The name @samp{gnu2x} is deprecated.
 
 @item gnu2y
 The next version of the ISO C standard, still under development, plus
diff --git a/gcc/doc/standards.texi b/gcc/doc/standards.texi
index 484fbb10352..4ef28e910d7 100644
--- a/gcc/doc/standards.texi
+++ b/gcc/doc/standards.texi
@@ -115,10 +115,10 @@ known as @dfn{C17} and is supported with 
@option{-std=c17} or
 @option{-std=c11}, and the only difference between the options is the
 value of @code{__STDC_VERSION__}.
 
-A fifth version of the C standard, known as @dfn{C23}, is under
-development and expected to be published in 2024 as ISO/IEC 9899:2024.
+A fifth version of the C standard, known as @dfn{C23}, is
+expected to be published in 2024 as ISO/IEC 9899:2024.
 (While in development, drafts of this standard version were referred
-to as @dfn{C2X}.)  Experimental and incomplete support for this is
+to as @dfn{C2X}.)  Support for this is
 enabled with @option{-std=c23} or @option{-std=iso9899:2024}.
 
 A further version of the C standard, known as @dfn{C2Y}, is under

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [Patch, fortran] PR115700 - comment 5: uninitialized string length in ASSOCIATE

2024-10-30 Thread Jerry D

On 10/30/24 9:58 AM, Steve Kargl wrote:

On Wed, Oct 30, 2024 at 04:41:40PM +, Paul Richard Thomas wrote:

This wrinkle to PR115700 came about because the associate-name string
length was not being initialized, when an array selector had a substring
reference with non-constant start or end. This, of course, caused
subsequent references to fail.

The ChangeLog provides an adequate explanation of the attached patch.

OK for mainline and backporting to 14-branch?



Yes.  Thanks for the patch.



The comment in the test case refers to a tmp4. There is no tmp4 
referenced in the test case.


Jerry


Re: [PATCH 6/7] RISC-V: Make vectorized memset handle more cases

2024-10-30 Thread Craig Blackmore



On 29/10/2024 15:09, Jeff Law wrote:



On 10/29/24 7:59 AM, Craig Blackmore wrote:


On 19/10/2024 14:05, Jeff Law wrote:



On 10/18/24 7:12 AM, Craig Blackmore wrote:
`expand_vec_setmem` only generated vectorized memset if it fitted 
into a

single vector store.  Extend it to generate a loop for longer and
unknown lengths.

The test cases now use -O1 so that they are not sensitive to 
scheduling.


gcc/ChangeLog:

* config/riscv/riscv-string.cc
(use_vector_stringop_p): Add comment.
(expand_vec_setmem): Use use_vector_stringop_p instead of
check_vectorise_memory_operation.  Add loop generation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/setmem-1.c: Use -O1.  Expect a loop
instead of a libcall.  Add test for unknown length.
* gcc.target/riscv/rvv/base/setmem-2.c: Likewise.
* gcc.target/riscv/rvv/base/setmem-3.c: Likewise and expect 
smaller

lmul.
So why handle memset differently than the other mem* routines where 
we limit ourselves to what we can handle without needing loops?


My suspicion is that once we're moving enough data that we can't do 
it with a single big lmul store that calling out to the library 
variant probably isn't a big deal for memset.  Do you have data 
which suggests otherwise?
I don't have data for this yet. My thinking was that the glibc and 
newlib memset implementations are scalar and they also do byte stores 
to reach alignment which is unnecessary on fast unaligned access 
targets.
Consider that a short term problem, at least for glibc.  I've got the 
magic ifunc bits which introduce vector versions and also check for 
fast unaligned support.    Does that change the calculus in your mind?



Yes, with those bits in place it would seem less of an obvious win.




This patch may still be useful in the meantime if I removed the loop 
generation parts as it would still allow us to generate vector setmem 
for smaller lengths than currently allowed.
Yea, which would unblock #7 of the series.  Then we could circle back 
on whether or not we should let setmem loop when expanded by the 
compiler?



Ok, I'll follow up with a non-loop version of this patch.

Thanks,

Craig


Jeff



Re: [PATCH 1/4] sched1: hookize pressure scheduling spilling aggressiveness

2024-10-30 Thread Jeff Law




On 10/30/24 9:31 AM, Richard Sandiford wrote:



OK (and yeah, I can sympathise).  But I think there's an argument that,
if you're scheduling for one in-order core using the pipeline of an
unrelated core, that's effectively scheduling for the core as though
it were out-of-order.  In other words, the property we care about
isn't so much whether the processor itself is in-order (a statement
about the uarch), but whether we are trying to schedule for a particular
in-order pipeline (a statement about what GCC is doing or knows about).
I'd argue that in the case you describe, we're not trying to schedule
for a particular in-order pipeline.

I can see that point.



That might need some finessing of the name.  But I think the concept
is right.  I'd rather base the hook (or param) on a general concept
like that rather than a specific "wide vs narrow" thing.

Agreed.  Naming was my only real concern about the first patch.




I still see Vineet's data as compelling, even with GIGO concern.


Do you mean the reduction in dynamic instruction counts?  If so,
that isn't what the algorithm is aiming to reduce.  Like I mentioned
in the previous thread, trying to minimise dynamic instruction counts
was also harmful for the core & benchmarks I was looking at.
We just ended up with lots of pipeline bubbles that could be
alleviated by judicious spilling.
Vineet showed significant cycle and icount improvements.  I'm much more 
interested in the former :-)


I'm planning to run it on our internal design, but it's not the top of 
the priority list and it's a scarce resource right now...  I fully 
expect it'll show a cycle improvement there too, though probably much 
smaller than the improvement seen on that spacemit k1 design.




I'm not saying that the algorithm gets the decision right for cactu
when tuning for in-order CPU X and running on that same CPU X.
But it seems like that combination hasn't been tried, and that,
even on the combinations that the patch has been tried on, the cactu
justification is based on static properties of the binary rather than
a particular runtime improvement (Y% faster).

To be clear, the two paragraphs above are trying to explain why I think
this should be behind a hook or param rather than unconditional.  The
changes themselves look fine, and incorporate the suggestions from the
previous thread (thanks!).
Thanks for that clarifying statement.  I actually think we're broadly in 
agreement here -- keep it as a hook/param rather than making it 
unconditional.


Assuming we keep it as a hook/param, opt-in & come up with better 
name/docs, any objections from your side?


jeff



Re: [PATCH v2 7/8] i386: Add else operand to masked loads.

2024-10-30 Thread Robin Dapp
> Could you just try the below change?

They work, including them in v3.

-- 
Regards
 Robin



Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Qing Zhao
Hi, David,

> On Oct 30, 2024, at 14:54, David Malcolm  wrote:
> 
> On Wed, 2024-10-30 at 15:53 +, Qing Zhao wrote:
>> 
>> 
>>> On Oct 30, 2024, at 10:48, David Malcolm 
>>> wrote:
>>> 
>>> On Wed, 2024-10-30 at 14:34 +, Sam James wrote:
 Qing Zhao  writes:
 
> Control this with a new option -fdiagnostics-details.
> 
> 
> [...]
> 
>> 
>> I have a question on the changes to the “warning_at”: (there are a
>> lot of such changes for -Warray-bounds and -Wstringop-**)
>> 
>> -   warned = warning_at (location, OPT_Warray_bounds_,
>> +   {
>> + rich_location *richloc
>> +   = build_rich_location_with_diagnostic_path (location,
>> stmt);
>> + warned = warning_at (richloc, OPT_Warray_bounds_,
>> 
>> The above is the current change.
>> 
>> My concern with this change is: 
>> even when -fdiagnostics-details is NOT on, the rich_location is
>> created.
> 
> A rich_location instance is always constructed when emitting
> diagnostics; warning_at with a location_t simply makes a rich_location
> on the stack.

Okay, I see. Thanks for the explanation.
>> 
>> How much is the additional overhead when using “rich_location *”
>> other than “location_t” as the 1st argument of warning_at?
> 
> The warning_at overload taking a rich_location * takes a borrowed
> pointer to a rich_location; it doesn't take ownership.  Hence, as
> written, the patch has a memory leak: every call to
> build_rich_location_with_diagnostic_path is using "new" to make a new
> rich_location instance on the heap, and they aren't being deleted.
Oops, good catch!
> 
>> 
>> Should I control the creation of “rich_location" with the flag
>> “flag_diagnostics_details” (Similar as I control the creation of
>> “move_history” data structure with the flag
>> “flag_diagnostics_details”? 
>> 
>> If so, how should I do it? Do you have a suggestion for a clean and
>> simple way to code this? (Sorry for the stupid question on this.)
> 
> You can probably do all of this on the stack; make a new rich_location
> subclass, with something like:
> 
> class rich_location_with_details : public gcc_rich_location
> {
> public:
>  rich_location_with_details (location_t location, gimple *stmt);
> 
> private:
>  class deferred_move_history_path {
>  public:
> deferred_move_history_path (location_t location, gimple *stmt)
> : m_location (location), m_stmt (stmt)
> {
> }
> 
> std::unique_ptr<diagnostic_path>
> make_path () const final override;
> /* TODO: you'll need to implement this; it will be called on
>demand if a diagnostic is actually emitted for this
>rich_location.  */

What do you mean by “it will be called on demand if a diagnostic is actually 
emitted”? 
Do I need to do anything special in the code to call this “make_path”? 
> 
>location_t m_location;
>gimple *m_stmt;
>  } m_deferred_move_history_path;
> };
> 
> rich_location_with_details::
> rich_location_with_details (location_t location, gimple *stmt)
> : gcc_rich_location (location),
>  m_deferred_move_history_path (location, stmt)
> {
>  set_path (&m_deferred_move_history_path);
> }
> 
> using class deferred_diagnostic_path from the attached patch (caveat: I
> haven't tried bootstrapping it yet).
So, I also need to add the new class “deferred_diagnostic_path”?
> 
> With that support subclass, you should be able to do something like
> this to make them on the stack:
> 
>   rich_location_with_details richloc (location, stmt);
>   warned = warning_at (&richloc, OPT_Warray_bounds_,
>"array subscript %E is outside array"
>" bounds of %qT", low_sub_org, artype);
> 
> and no work will be done for path creation unless and until a
> diagnostic is actually emitted for richloc - the richloc ctor will just
> initialize the vtable and some location_t/gimple * fields, which ought
> to be very cheap for the "warning is disabled" case.
> 
> I'll try bootstrapping the attached patch.

Will you commit the attached patch? (A little confused…)

Qing
> 
> Hope this makes sense.
> Dave
> <0001-diagnostics-add-class-deferred_diagnostic_path.patch>
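The deferred path creation discussed above can be illustrated outside GCC with a toy model. This is only a sketch and not GCC code: `toy_path`, `toy_deferred_path`, and `toy_warning_at` are hypothetical stand-ins for a diagnostic path, a deferred path object, and `warning_at`. It shows the two properties under discussion: constructing the object on the stack only stores a couple of cheap fields, and the expensive path is built at most once, and only when the warning is actually emitted.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an expensive-to-build diagnostic path.
struct toy_path {
  std::string description;
};

// Hypothetical stand-in for a deferred path: the constructor only stores
// cheap fields; the path itself is built on first request.
class toy_deferred_path {
 public:
  toy_deferred_path(int location, int stmt_id)
      : m_location(location), m_stmt_id(stmt_id) {}

  const toy_path &get_or_make_path() {
    if (!m_path) {
      ++m_build_count;
      m_path.reset(new toy_path{"moved from stmt " +
                                std::to_string(m_stmt_id) + " at location " +
                                std::to_string(m_location)});
    }
    return *m_path;
  }

  // Number of times the expensive path was actually built.
  int build_count() const { return m_build_count; }

 private:
  int m_location;
  int m_stmt_id;
  int m_build_count = 0;
  std::unique_ptr<toy_path> m_path;
};

// Hypothetical stand-in for warning_at: it borrows the path object (no
// ownership transfer) and touches the path only if the warning is enabled.
bool toy_warning_at(toy_deferred_path &path, bool warning_enabled,
                    std::string *out) {
  assert(out != nullptr);
  if (!warning_enabled) return false;
  *out = "warning: array subscript out of bounds (" +
         path.get_or_make_path().description + ")";
  return true;
}
```

Because the object lives on the caller's stack and is only borrowed, nothing is heap-allocated unless the path is requested, which is the property that avoids the leak mentioned earlier in the thread.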



[PATCH v3] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-10-30 Thread Marek Polacek
On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote:
> On Tue, 29 Oct 2024, Marek Polacek wrote:
> > +/* Substitute ARGS into T, which is a pack index (i.e., PACK_INDEX_TYPE or
> > +   PACK_INDEX_EXPR).  Returns a single type or expression, a PACK_INDEX_*
> > +   node if only a partial substitution could be performed, or 
> > ERROR_MARK_NODE
> > +   if there was an error.  */
> > +
> > +tree
> > +tsubst_pack_index (tree t, tree args, tsubst_flags_t complain, tree 
> > in_decl)
> > +{
> > +  tree index = tsubst_expr (PACK_INDEX_INDEX (t), args, complain, in_decl);
> > +  if (value_dependent_expression_p (index))
> > +return t;
> 
> In the dependent case I think we want to return a partially instantiated
> PACK_INDEX_* rather than the original one to correctly handle them inside
> a generic lambda, e.g.:
> 
>   template<auto... Vs>
>   constexpr auto f() {
>     return []<int N>() { return Vs...[N]; }.template operator()<1>();
>   }
>   static_assert(f<1, 2, 3>() == 2);
> 
>   template<int N>
>   constexpr auto g() {
>     return []<auto... Vs>() { return Vs...[N]; }.template operator()<1, 2, 3>();
>   }
>   static_assert(g<1>() == 2);

Thanks a lot for the testcase!  This patch fixes tsubst_pack_index to
actually perform partial substitution if we can't get the element yet.

I've also adjusted p_c_e_1 to always return true for PACK_INDEX_EXPR.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements C++26 Pack Indexing, as described in
.

The issue discussing how to mangle pack indexes has not been resolved
yet  and I've
made no attempt to address it so far.

Unlike v1, which used augmented TYPE/EXPR_PACK_EXPANSION codes, this
version introduces two new codes: PACK_INDEX_EXPR and PACK_INDEX_TYPE.
Both carry two operands: the pack expansion and the index.  They are
handled in tsubst_pack_index: substitute the index and the pack and
then extract the element from the vector (if possible).

To handle pack indexing in a decltype or with decltype(auto), there is
also the new PACK_INDEX_PARENTHESIZED_P flag.

With this feature, it's valid to write something like

  using U = tmpl<Ts...[Is]...>;

where we first expand the template argument into

  Ts...[Is#0], Ts...[Is#1], ...

and then substitute each individual pack index.
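For readers without a C++26 compiler, the effect of indexing a type pack with a pack of indices can be approximated in C++14 with `std::tuple_element`. The sketch below is an emulation for illustration only, not what the patch implements; `tmpl`, `pack_element`, and `pick_types` are hypothetical names. Each index `Is#k` selects one element of `Ts`, mirroring the expansion `Ts...[Is#0], Ts...[Is#1], ...` described above.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical template that receives the selected types.
template <typename... Us>
struct tmpl {};

// Pre-C++26 emulation: pick element I of the pack Ts via std::tuple_element.
template <std::size_t I, typename... Ts>
struct pack_element {
  using type = typename std::tuple_element<I, std::tuple<Ts...>>::type;
};

// Expand a pack of indices Is over a pack of types Ts: the outer expansion
// runs over Is, and every instance sees the whole of Ts.
template <typename Indices, typename... Ts>
struct pick_types;

template <std::size_t... Is, typename... Ts>
struct pick_types<std::index_sequence<Is...>, Ts...> {
  using type = tmpl<typename pack_element<Is, Ts...>::type...>;
};
```

With this emulation, `pick_types<std::index_sequence<2, 0>, int, char, double>::type` names `tmpl<double, int>`; the native syntax expresses the same selection without the helper templates.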

PR c++/113798

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) :
New case.
* cp-objcp-common.cc (cp_common_init_ts): Mark PACK_INDEX_TYPE and
PACK_INDEX_EXPR.
* cp-tree.def (PACK_INDEX_TYPE): New.
(PACK_INDEX_EXPR): New.
* cp-tree.h (WILDCARD_TYPE_P): Also check PACK_INDEX_P.
(PACK_INDEX_CHECK): Define.
(PACK_INDEX_P): Define.
(PACK_INDEX_PACK): Define.
(PACK_INDEX_INDEX): Define.
(PACK_INDEX_PARENTHESIZED_P): Define.
(make_pack_index): Declare.
(pack_index_element): Declare.
* cxx-pretty-print.cc (cxx_pretty_printer::expression) : New case.
(cxx_pretty_printer::type_id) : New case.
* error.cc (dump_type) : New case.
(dump_type_prefix): Handle PACK_INDEX_TYPE.
(dump_type_suffix): Likewise.
(dump_expr) : New case.
* mangle.cc (write_type) : New case.
* module.cc (trees_out::type_node) : New case.
(trees_in::tree_node) : New case.
* parser.cc (cp_parser_pack_index): New.
(cp_parser_primary_expression): Handle a C++26 pack-index-expression.
(cp_parser_unqualified_id): Handle a C++26 pack-index-specifier.
(cp_parser_nested_name_specifier_opt): See if a pack-index-specifier
follows.  Handle a C++26 pack-index-specifier.
(cp_parser_decltype_expr): Set id_expression_or_member_access_p for
pack indexing.
(cp_parser_mem_initializer_id): Handle a C++26 pack-index-specifier.
(cp_parser_simple_type_specifier): Likewise.
(cp_parser_base_specifier): Likewise.
* pt.cc (iterative_hash_template_arg) : New case.
(find_parameter_packs_r) : New
case.
(make_pack_index): New.
(tsubst_pack_index): New.
(tsubst): Avoid tsubst on PACK_INDEX_TYPE.
: Add a call to error.
: Check PACK_INDEX_PARENTHESIZED_P.
: New case.
(tsubst_expr) : New case.
(dependent_type_p_r): Return true for PACK_INDEX_TYPE.
(type_dependent_expression_p): Return true for PACK_INDEX_EXPR.
* ptree.cc (cxx_print_type) : New case.
* semantics.cc (finish_parenthesized_expr): Set
PACK_INDEX_PARENTHESIZED_P for PACK_INDEX_P.
(finish_type_pack_element): Adjust error messages.
(pack_index_element): New.
* tree.cc (cp_tree_equal) : New case.
(cp_walk_subtrees) : New case.
* typeck.cc (structural_comptypes) : New case.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
dg-prune-outp

[PATCH v2 08/10] Test: Add testcases for form 6 of unsigned integer SAT_ADD simplify

2024-10-30 Thread pan2 . li
From: Pan Li 

The phiopt2 pass will also try the gimple_simplify for the form 6
of unsigned integer SAT_ADD.  Thus add the testcase to make sure
it will be performed in phiopt2 pass.

gcc/testsuite/ChangeLog:

* gcc.dg/sat_arith_simplify.h: Add test helper macros.
* gcc.dg/sat_u_add-simplify-6-u16.c: New test.
* gcc.dg/sat_u_add-simplify-6-u32.c: New test.
* gcc.dg/sat_u_add-simplify-6-u64.c: New test.
* gcc.dg/sat_u_add-simplify-6-u8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.dg/sat_arith_simplify.h   |  9 +
 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u16.c | 11 +++
 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u32.c | 11 +++
 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u64.c | 11 +++
 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u8.c  | 11 +++
 5 files changed, 53 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u8.c

diff --git a/gcc/testsuite/gcc.dg/sat_arith_simplify.h 
b/gcc/testsuite/gcc.dg/sat_arith_simplify.h
index 34fae32ae3a..d577adb9a88 100644
--- a/gcc/testsuite/gcc.dg/sat_arith_simplify.h
+++ b/gcc/testsuite/gcc.dg/sat_arith_simplify.h
@@ -34,4 +34,13 @@ T sat_u_add_##T##_1 (T x, T y)  \
 return -1;  \
 }
 
+#define DEF_SAT_U_ADD_6(T)  \
+T sat_u_add_##T##_6 (T x, T y)  \
+{   \
+  if ((T)(x + y) < x)   \
+return -1;  \
+  else  \
+return x + y;   \
+}
+
 #endif
diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u16.c 
b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u16.c
new file mode 100644
index 000..83e6a8993b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u16.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2-details" } */
+
+#include <stdint.h>
+#include "sat_arith_simplify.h"
+
+DEF_SAT_U_ADD_6 (uint16_t)
+
+/* { dg-final { scan-tree-dump-not " if " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " else " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " goto " "phiopt2" } } */
diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u32.c 
b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u32.c
new file mode 100644
index 000..622206486e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u32.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2-details" } */
+
+#include <stdint.h>
+#include "sat_arith_simplify.h"
+
+DEF_SAT_U_ADD_6 (uint32_t)
+
+/* { dg-final { scan-tree-dump-not " if " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " else " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " goto " "phiopt2" } } */
diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u64.c 
b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u64.c
new file mode 100644
index 000..f8ea3c8767a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u64.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2-details" } */
+
+#include <stdint.h>
+#include "sat_arith_simplify.h"
+
+DEF_SAT_U_ADD_6 (uint64_t)
+
+/* { dg-final { scan-tree-dump-not " if " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " else " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " goto " "phiopt2" } } */
diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u8.c 
b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u8.c
new file mode 100644
index 000..bcd136d899f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-6-u8.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2-details" } */
+
+#include <stdint.h>
+#include "sat_arith_simplify.h"
+
+DEF_SAT_U_ADD_6 (uint8_t)
+
+/* { dg-final { scan-tree-dump-not " if " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " else " "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not " goto " "phiopt2" } } */
-- 
2.43.0
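As a rough illustration of what the tests above check, form 6 detects unsigned wrap-around with `(T)(x + y) < x` and saturates to the all-ones value. The sketch below restates that form as a function template and compares it with a branchless formulation. The names are illustrative, and the branchless version is only an assumed equivalent expression, not a quote of what the phiopt2 dump contains.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Form 6 as in the test helper macro: branch on wrap-around of x + y.
template <typename T>
T sat_u_add_form6(T x, T y) {
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  T sum = static_cast<T>(x + y);
  if (sum < x) return std::numeric_limits<T>::max();
  return sum;
}

// Branchless equivalent (an assumption for illustration): OR the wrapped
// sum with an all-ones mask when the addition overflowed.
template <typename T>
T sat_u_add_branchless(T x, T y) {
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  T sum = static_cast<T>(x + y);
  T overflow_mask = static_cast<T>(-static_cast<T>(sum < x));
  return static_cast<T>(sum | overflow_mask);
}
```

The explicit casts back to `T` matter for the narrow types: `uint8_t` and `uint16_t` operands are promoted to `int` before the addition, so the wrap-around only appears after truncation, which is why the macro also writes `(T)(x + y)`.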



[PATCH] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2024-10-30 Thread Yuta Mukai (Fujitsu)
Hello,

This patch adds initial support for FUJITSU-MONAKA CPU, which we are developing.
Here are the slides for the CPU:
https://www.fujitsu.com/downloads/SUPER/topics/isc24/next-arm-based-processor-fujitsu-monaka-and-its-software-ecosystem.pdf

Bootstrapped/regtested on aarch64-unknown-linux-gnu.

We will post a patch for backporting to GCC 14 later.

We would be grateful if someone could push this on our behalf, as we do not 
have write access.

Thanks,
Yuta
--
Yuta Mukai
Fujitsu Limited



0001-aarch64-Add-support-for-FUJITSU-MONAKA-mcpu-fujitsu-.patch
Description: 0001-aarch64-Add-support-for-FUJITSU-MONAKA-mcpu-fujitsu-.patch


Re: [PATCH v3] [aarch64] Fix function multiversioning dispatcher link error with LTO

2024-10-30 Thread Richard Sandiford
Yangyu Chen  writes:
> We forgot to apply DECL_EXTERNAL to __init_cpu_features_resolver decl. When
> building with LTO, the linker cannot find the
> __init_cpu_features_resolver.lto_priv* symbol, causing the link error.
>
> This patch gets this fixed by adding DECL_EXTERNAL to the decl. To avoid used
> but never defined warning for this symbol, we also mark TREE_PUBLIC to the 
> decl.
> We should also mark the decl having hidden visibility. And fix the attribute 
> in
> the same way for __aarch64_cpu_features identifier.
>
> Minimal steps to reproduce the bug:
>
> echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
> echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
> echo 'void func1();void func2();int main(){func1();func2();return 0;}' > 
> main.c
> gcc -flto -c 1.c 2.c
> gcc -flto main.c 1.o 2.o
>
> Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (dispatch_function_versions): Adding
>   DECL_EXTERNAL, TREE_PUBLIC and hidden DECL_VISIBILITY to
>   __init_cpu_features_resolver and __aarch64_cpu_features.

Thanks, LGTM.  I've tested this locally and was about to push, but then
realised: since you've already contributed changes (great!), it probably
wouldn't be acceptable to treat it as trivial for copyright purposes.
Could you confirm that you're contributing under the DCO:
https://gcc.gnu.org/dco.html ?  If so, could you repost with a
Signed-off-by?

Sorry for the administrivia.

Richard

> ---
>  gcc/config/aarch64/aarch64.cc | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5770491b30c..2b2d5b9e390 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -20437,6 +20437,10 @@ dispatch_function_versions (tree dispatch_decl,
>tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
>tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
> init_fn_id, init_fn_type);
> +  DECL_EXTERNAL (init_fn_decl) = 1;
> +  TREE_PUBLIC (init_fn_decl) = 1;
> +  DECL_VISIBILITY (init_fn_decl) = VISIBILITY_HIDDEN;
> +  DECL_VISIBILITY_SPECIFIED (init_fn_decl) = 1;
>tree arg1 = DECL_ARGUMENTS (dispatch_decl);
>tree arg2 = TREE_CHAIN (arg1);
>ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
> @@ -20456,6 +20460,9 @@ dispatch_function_versions (tree dispatch_decl,
>   get_identifier ("__aarch64_cpu_features"),
>   global_type);
>DECL_EXTERNAL (global_var) = 1;
> +  TREE_PUBLIC (global_var) = 1;
> +  DECL_VISIBILITY (global_var) = VISIBILITY_HIDDEN;
> +  DECL_VISIBILITY_SPECIFIED (global_var) = 1;
>tree mask_var = create_tmp_var (long_long_unsigned_type_node);
>  
>tree component_expr = build3 (COMPONENT_REF, long_long_unsigned_type_node,


[PATCH v2] Doc: Add doc for standard name mask_len_strided_load{store}m

2024-10-30 Thread pan2 . li
From: Pan Li 

This patch adds documentation for the following two standard names.

1. strided load: v = mask_len_strided_load (ptr, stride, mask, len, bias)
2. strided store: mask_len_strided_store (ptr, stride, v, mask, len, bias)

gcc/ChangeLog:

* doc/md.texi: Add doc for mask_len_strided_load{store}.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
---
 gcc/doc/md.texi | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6d9c8643739..25ded86f0d1 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5135,6 +5135,20 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is
the bias operand.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with zero as base and 
operand 2 as step.
+For each element index @var{i} the load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the 
result should
+be loaded from memory and clear if element @var{i} of the result should be 
zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5172,6 +5186,19 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode @var{m} into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode 
@var{m}.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is
the bias operand.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with zero as base and 
operand 1 as step.
+For each element index @var{i} the store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 
5) elements of
+(operand 2) to memory.  Element @var{i} of the mask (operand 3) is set if element 
@var{i} of (operand 2)
+should be stored.  Mask elements @var{i} with @var{i} > (operand 4 + operand 
5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
-- 
2.43.0
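To make the documented semantics of the strided load concrete, here is a scalar reference model. It is a sketch under stated assumptions, not code from GCC: the function name and parameter names are hypothetical, the element type is fixed to 32-bit integers, the stride is taken to be in address units (bytes) because the text computes the address as operand 1 + i * operand 2, and elements at or beyond (len + bias) are simply left as zero here as a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar reference model of the documented semantics (illustrative only).
//   base         ~ operand 1 (scalar base address)
//   stride_bytes ~ operand 2 (scalar stride, applied to the address)
//   mask         ~ operand 3, len ~ operand 4, bias ~ operand 5
std::vector<std::int32_t> strided_load_model(const unsigned char *base,
                                             std::ptrdiff_t stride_bytes,
                                             const std::vector<bool> &mask,
                                             std::ptrdiff_t len,
                                             std::ptrdiff_t bias) {
  assert(len + bias >= 0);
  std::vector<std::int32_t> result(mask.size(), 0);
  const std::size_t active = static_cast<std::size_t>(len + bias);
  for (std::size_t i = 0; i < mask.size() && i < active; ++i) {
    if (!mask[i]) continue;  // masked-off element stays zero
    // Address of element i is base + i * stride, as in the documentation.
    std::memcpy(&result[i],
                base + static_cast<std::ptrdiff_t>(i) * stride_bytes,
                sizeof(std::int32_t));
  }
  return result;
}
```

For example, with a buffer of consecutive 32-bit integers and a stride of 8 bytes, the model reads every second integer, which is the same access pattern a gather load would produce with the offset vector 0, 8, 16, and so on.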



Re: [pushed: r15-4760] diagnostics: support multiple output formats simultaneously [PR116613]

2024-10-30 Thread Jonathan Wakely

On 29/10/24 19:19 -0400, David Malcolm wrote:

This patch generalizes diagnostic_context so that rather than having
a single output format, it has a vector of zero or more.


[snip]


+/* Class for parsing the arguments of -fdiagnostics-add-output= and
+   -fdiagnostics-set-output=, and making diagnostic_output_format
+   instances (or issuing errors).  */
+
+class output_factory
+{
+public:
+  class handler
+  {
+  public:
+handler (std::string name) : m_name (name) {}


How long are these names?

If they don't fit in 15 chars, then this should be std::move(name).

So for a name like "sarif:version=2.1" it should be moved, otherwise
you make a deep copy and reallocate a new string.
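A minimal sketch of the suggested change, using a hypothetical stand-in class rather than the real `output_factory::handler`: the constructor still takes the string by value, but moves it into the member instead of copying it a second time.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the reviewed class; only the constructor matters.
class toy_handler {
 public:
  // Take by value, then move into the member.  An rvalue argument is moved
  // all the way through; an lvalue argument is copied once into `name` and
  // then moved, instead of being copied twice.
  explicit toy_handler(std::string name) : m_name(std::move(name)) {
    assert(!m_name.empty());  // this sketch assumes non-empty names
  }

  const std::string &get_name() const { return m_name; }

 private:
  std::string m_name;
};
```

For strings short enough to fit in the implementation's small-string buffer the move and the copy cost about the same; the benefit shows up for longer names, where the move transfers the heap buffer instead of allocating a new one.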




[PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-10-30 Thread Evgeny Karpov
Tuesday, October 29, 2024
Richard Sandiford  wrote:

> Hmm, I see.  I think this is surprising enough that it would be worth
> a comment.  How about:
>
>  /* Since the assembly directive only specifies a size, and not an
> alignment, we need to follow the default ASM_OUTPUT_LOCAL behavior
> and round the size up to at least a multiple of BIGGEST_ALIGNMENT bits,
> so that each uninitialized object starts on such a boundary.
> However, we also want to allow the alignment (and thus minimum size)
> to exceed BIGGEST_ALIGNMENT.  */

Thanks for the suggestion. It will be included in the next version of the patch.

> But how does using a larger size force the linker to assign a larger
> alignment than BIGGEST_ALIGNMENT?  Is there a second limit in play?
> 
> Or does this patch not guarantee that the ffmpeg variable gets the
> alignment it wants?  Is it just about suppressing the error?
> 
> If it's just about suppressing the error without guaranteeing the
> requested alignment, then, yeah, I think patching ffmpeg would
> be better.  If the patch does guarantee the alignment, then the
> patch seems ok, but I think the comment should explain how, and
> explain why BIGGEST_ALIGNMENT isn't larger.

It looks like it generates the expected assembly code for the alignments
and the correct object file, and it should be the expected code for FFmpeg.

The alignment cannot be larger than 8192, otherwise, it will generate an error.

error: requested alignment ‘16384’ exceeds object file maximum 8192
   16 | float __attribute__((aligned (1 << 14))) large_aligned_array10[3];

Regards,
Evgeny


Here is an example:

float large_aligned_array[3];
float __attribute__((aligned (8))) large_aligned_array2[3];
float __attribute__((aligned (16))) large_aligned_array3[3];
float __attribute__((aligned (32))) large_aligned_array4[3];
float __attribute__((aligned (64))) large_aligned_array5[3];
float __attribute__((aligned (128))) large_aligned_array6[3];
float __attribute__((aligned (256))) large_aligned_array7[3];
float __attribute__((aligned (512))) large_aligned_array8[3];
float __attribute__((aligned (1024))) large_aligned_array9[3];


.align  3
.def    large_aligned_array;    .scl    3;  .type   0;  .endef
large_aligned_array:
.space  12  // skip

.global large_aligned_array2
.align  3
.def    large_aligned_array2;   .scl    3;  .type   0;  .endef
large_aligned_array2:
.space  12  // skip

.global large_aligned_array3
.align  4
.def    large_aligned_array3;   .scl    3;  .type   0;  .endef
large_aligned_array3:
.space  12  // skip

.global large_aligned_array4
.align  5
.def    large_aligned_array4;   .scl    3;  .type   0;  .endef
large_aligned_array4:
.space  12  // skip

.global large_aligned_array5
.align  6
.def    large_aligned_array5;   .scl    3;  .type   0;  .endef
large_aligned_array5:
.space  12  // skip

.global large_aligned_array6
.align  7
.def    large_aligned_array6;   .scl    3;  .type   0;  .endef
large_aligned_array6:
.space  12  // skip

.global large_aligned_array7
.align  8
.def    large_aligned_array7;   .scl    3;  .type   0;  .endef
large_aligned_array7:
.space  12  // skip

.global large_aligned_array8
.align  9
.def    large_aligned_array8;   .scl    3;  .type   0;  .endef
large_aligned_array8:
.space  12  // skip

.global large_aligned_array9
.align  10
.def    large_aligned_array9;   .scl    3;  .type   0;  .endef
large_aligned_array9:
.space  12  // skip


Symbols in the object file also look good.

015  SECT2  notype   External | large_aligned_array
016 0010 SECT2  notype   External | large_aligned_array2
017 0020 SECT2  notype   External | large_aligned_array3
018 0040 SECT2  notype   External | large_aligned_array4
019 0080 SECT2  notype   External | large_aligned_array5
01A 0100 SECT2  notype   External | large_aligned_array6
01B 0200 SECT2  notype   External | large_aligned_array7
01C 0400 SECT2  notype   External | large_aligned_array8
01D 0800 SECT2  notype   External | large_aligned_array9
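
The symbol values listed above are consistent with each object being placed at the next multiple of its alignment. The following is a minimal sketch of that arithmetic, assuming the objects are laid out sequentially in declaration order, the first one at section offset 0, and each occupying the 12 bytes given by .space; the names align_up and place_objects are hypothetical, and the size rounding discussed in the review comment is not modelled.

```cpp
#include <array>
#include <cstddef>

// Round OFFSET up to the next multiple of ALIGN.
constexpr std::size_t
align_up (std::size_t offset, std::size_t align)
{
  return (offset + align - 1) / align * align;
}

// Assumed layout model: sequential placement starting at offset 0,
// each object 12 bytes long.
constexpr std::array<std::size_t, 9>
place_objects ()
{
  // Byte alignments taken from the .align directives (2^3 up to 2^10).
  constexpr std::array<std::size_t, 9> aligns
    = {8, 8, 16, 32, 64, 128, 256, 512, 1024};
  std::array<std::size_t, 9> offsets = {};
  std::size_t next_free = 0;
  for (std::size_t i = 0; i < aligns.size (); ++i)
    {
      offsets[i] = align_up (next_free, aligns[i]);
      next_free = offsets[i] + 12;
    }
  return offsets;
}

static_assert (place_objects ()[1] == 0x10, "large_aligned_array2");
static_assert (place_objects ()[4] == 0x80, "large_aligned_array5");
static_assert (place_objects ()[8] == 0x800, "large_aligned_array9");
```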


RE: [PATCH v1] Doc: Add doc for standard name mask_len_strided_load{store}m

2024-10-30 Thread Li, Pan2
>> +Load several separate memory locations into a destination vector of mode 
>> @var{m}.
>> +Operand 0 is a destination vector of mode @var{m}.
>> +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
>> Pmode.
>> +operand 3 is mask operand, operand 4 is length operand and operand 5 is 
>> bias operand.
>> +The instruction can be seen as a special case of 
>> @code{mask_len_gather_load@var{m}@var{n}}
>> +with an offset vector that is a @code{vec_series} with operand 1 as base 
>> and operand 2 as step.
> wouldn't it be zero as base?
Yes, the base of vec_series should be zero.

>> +For each element index i load address is operand 1 + @var{i} * operand 2.
> the load address
Sure, will update in v2.

Pan



[PATCH] c++: Fix ICE on constexpr virtual function [PR117317]

2024-10-30 Thread Jakub Jelinek
Hi!

Since C++20 virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for those,
defer their output and decide again at_eof.
On the following testcases we ICE though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case calls emit_associated_thunks,
where use_thunk, which it calls, asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.

The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note, the second testcase ICEs already since r0-110035 with -std=c++0x
before it gets a chance to diagnose constexpr virtual method.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and eventually for backports?

2024-10-30  Jakub Jelinek  

PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.

* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.

--- gcc/cp/semantics.cc.jj  2024-10-25 10:00:29.433768358 +0200
+++ gcc/cp/semantics.cc 2024-10-29 13:10:32.234068524 +0100
@@ -5150,7 +5150,10 @@ emit_associated_thunks (tree fn)
  enabling you to output all the thunks with the function itself.  */
   if (DECL_VIRTUAL_P (fn)
   /* Do not emit thunks for extern template instantiations.  */
-  && ! DECL_REALLY_EXTERN (fn))
+  && ! DECL_REALLY_EXTERN (fn)
+  /* Do not emit thunks for tentative decls, those will be processed
+again at_eof if really needed.  */
+  && (DECL_INTERFACE_KNOWN (fn) || !DECL_DEFER_OUTPUT (fn)))
 {
   tree thunk;
 
--- gcc/testsuite/g++.dg/cpp2a/pr117317-1.C.jj  2024-10-29 13:12:23.373519669 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/pr117317-1.C 2024-10-29 13:12:18.223591437 
+0100
@@ -0,0 +1,19 @@
+// PR c++/117317
+// { dg-do compile { target c++20 } }
+
+struct C {
+  constexpr bool operator== (const C &b) const { return foo (); }
+  constexpr virtual bool foo () const = 0;
+};
+class A : public C {};
+class B : public C {};
+template <int N>
+struct D : A, B
+{
+  constexpr bool operator== (const D &) const = default;
+  constexpr bool foo () const override { return true; }
+};
+struct E : D<1> {};
+constexpr E e;
+constexpr E f;
+static_assert (e == f, "");
--- gcc/testsuite/g++.dg/cpp2a/pr117317-2.C.jj  2024-10-29 13:16:10.101359947 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/pr117317-2.C 2024-10-29 13:16:15.981278003 
+0100
@@ -0,0 +1,15 @@
+// PR c++/117317
+// { dg-do compile { target c++20 } }
+
+struct C {
+  constexpr virtual bool foo () const = 0;
+};
+struct A : public C {};
+struct B : public C {};
+template <int N>
+struct D : A, B
+{
+  constexpr bool foo () const override { return true; }
+};
+constexpr D<0> d;
+static_assert (d.foo (), "");

Jakub



[PATCH] c: Diagnose char argument to __builtin_stdc_*

2024-10-30 Thread Jakub Jelinek
Hi!

When working on __builtin_stdc_rotate_*, I've noticed that while the
second argument to those is explicitly allowed to have char type,
the first argument to all the stdc_* type-generic functions is
- standard unsigned integer type, excluding bool;
- extended unsigned integer type;
- or, bit-precise unsigned integer type whose width matches a standard
  or extended integer type, excluding bool.
but the __builtin_stdc_* lowering code was diagnosing just
!INTEGRAL_TYPE_P
ENUMERAL_TYPE
BOOLEAN_TYPE
!TYPE_UNSIGNED
Now, with -funsigned-char plain char type is TYPE_UNSIGNED, yet it isn't
allowed because it isn't standard unsigned integer type, nor
extended unsigned integer type, nor bit-precise unsigned integer type.

The following patch diagnoses char arguments and adds testsuite coverage
for that.
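
The distinction the patch relies on, that plain char is its own type even when it is unsigned, can be illustrated in C++ with type traits. This is only a rough analogue: the trait name is_stdc_bit_arg_v is hypothetical, the builtins themselves are handled in the C parser, and the sketch ignores bit-precise types and C++'s other character types.

```cpp
#include <type_traits>

// Hypothetical trait approximating "unsigned integer type, excluding bool",
// with plain char rejected explicitly.
template <typename T>
inline constexpr bool is_stdc_bit_arg_v
  = std::is_integral_v<T>
    && std::is_unsigned_v<T>
    && !std::is_same_v<std::remove_cv_t<T>, bool>
    && !std::is_same_v<std::remove_cv_t<T>, char>;

// Plain char is a distinct type from unsigned char, whatever its signedness.
static_assert (!std::is_same_v<char, unsigned char>);

static_assert (is_stdc_bit_arg_v<unsigned char>);
static_assert (is_stdc_bit_arg_v<unsigned int>);
static_assert (!is_stdc_bit_arg_v<char>);
static_assert (!is_stdc_bit_arg_v<bool>);
static_assert (!is_stdc_bit_arg_v<int>);
```

A check on signedness alone would accept char under -funsigned-char, which is why the explicit comparison against the char type is needed.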

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or should I make it a pedwarn instead?

2024-10-30  Jakub Jelinek  

gcc/c/
* c-parser.cc (c_parser_postfix_expression): Diagnose if
first __builtin_stdc_* argument has char type even when
-funsigned-char.
gcc/testsuite/
* gcc.dg/builtin-stdc-bit-3.c: New test.
* gcc.dg/builtin-stdc-rotate-3.c: New test.

--- gcc/c/c-parser.cc.jj2024-10-29 09:06:12.976008357 +0100
+++ gcc/c/c-parser.cc   2024-10-29 16:45:45.770813360 +0100
@@ -12382,6 +12382,14 @@ c_parser_postfix_expression (c_parser *p
expr.set_error ();
break;
  }
+   if (TYPE_MAIN_VARIANT (TREE_TYPE (arg_p->value))
+   == char_type_node)
+ {
+   error_at (loc, "argument 1 in call to function "
+ "%qs has %<char%> type", name);
+   expr.set_error ();
+   break;
+ }
tree arg = arg_p->value;
tree type = TYPE_MAIN_VARIANT (TREE_TYPE (arg));
/* Expand:
--- gcc/testsuite/gcc.dg/builtin-stdc-bit-3.c.jj2024-10-29 
16:48:59.186127709 +0100
+++ gcc/testsuite/gcc.dg/builtin-stdc-bit-3.c   2024-10-29 16:49:56.188336214 
+0100
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-funsigned-char" } */
+
+void
+foo (void)
+{
+  __builtin_stdc_leading_zeros ((char) 0); /* { dg-error "argument 
1 in call to function '__builtin_stdc_leading_zeros' has 'char' type" } */
+  __builtin_stdc_leading_ones ((char) 0);  /* { dg-error "argument 
1 in call to function '__builtin_stdc_leading_ones' has 'char' type" } */
+  __builtin_stdc_trailing_zeros ((char) 0);/* { dg-error "argument 
1 in call to function '__builtin_stdc_trailing_zeros' has 'char' type" } */
+  __builtin_stdc_trailing_ones ((char) 0); /* { dg-error "argument 
1 in call to function '__builtin_stdc_trailing_ones' has 'char' type" } */
+  __builtin_stdc_first_leading_zero ((char) 0);/* { dg-error 
"argument 1 in call to function '__builtin_stdc_first_leading_zero' has 'char' 
type" } */
+  __builtin_stdc_first_leading_one ((char) 0); /* { dg-error "argument 
1 in call to function '__builtin_stdc_first_leading_one' has 'char' type" } */
+  __builtin_stdc_first_trailing_zero ((char) 0);   /* { dg-error "argument 
1 in call to function '__builtin_stdc_first_trailing_zero' has 'char' type" } */
+  __builtin_stdc_first_trailing_one ((char) 0);/* { dg-error 
"argument 1 in call to function '__builtin_stdc_first_trailing_one' has 'char' 
type" } */
+  __builtin_stdc_count_zeros ((char) 0);   /* { dg-error "argument 
1 in call to function '__builtin_stdc_count_zeros' has 'char' type" } */
+  __builtin_stdc_count_ones ((char) 0);/* { dg-error 
"argument 1 in call to function '__builtin_stdc_count_ones' has 'char' type" } 
*/
+  __builtin_stdc_has_single_bit ((char) 0);/* { dg-error "argument 
1 in call to function '__builtin_stdc_has_single_bit' has 'char' type" } */
+  __builtin_stdc_bit_width ((char) 0); /* { dg-error "argument 
1 in call to function '__builtin_stdc_bit_width' has 'char' type" } */
+  __builtin_stdc_bit_floor ((char) 0); /* { dg-error "argument 
1 in call to function '__builtin_stdc_bit_floor' has 'char' type" } */
+  __builtin_stdc_bit_ceil ((char) 0);  /* { dg-error "argument 
1 in call to function '__builtin_stdc_bit_ceil' has 'char' type" } */
+}
--- gcc/testsuite/gcc.dg/builtin-stdc-rotate-3.c.jj 2024-10-29 
16:48:55.506178811 +0100
+++ gcc/testsuite/gcc.dg/builtin-stdc-rotate-3.c2024-10-29 
16:50:15.338070312 +0100
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-funsigned-char" } */
+
+void
+foo (void)
+{
+  __builtin_stdc_rotate_left ((char) 0, 0);/* { dg-error 
"argument 1 in call to function '__builtin_stdc_rotate_left' has 'char' type" } 
*/
+  __builtin_stdc_rotate_right ((char) 0, 0);   /* { dg-error 
"argument 1 in call to function '__builtin_stdc_ro

[PATCH v3 2/2][RFC] Add debugging for move history.

2024-10-30 Thread Qing Zhao
gcc/ChangeLog:

* diagnostic-move-history.cc (dump_move_history): New routine.
(dump_move_history_for): Likewise.
(debug_mv_h): Likewise.
* diagnostic-move-history.h (dump_move_history): New prototype.
(dump_move_history_for): Likewise.
* gimple-ssa-isolate-paths.cc (isolate_path): Add debugging message
when setting move history for statements.
* tree-ssa-sink.cc (sink_code_in_bb): Likewise.
* tree-ssa-threadupdate.cc (ssa_redirect_edges): Likewise.
(back_jt_path_registry::duplicate_thread_path): Likewise.
---
 gcc/diagnostic-move-history.cc  | 67 +
 gcc/diagnostic-move-history.h   |  2 +
 gcc/gimple-ssa-isolate-paths.cc | 10 +
 gcc/tree-ssa-sink.cc|  3 ++
 gcc/tree-ssa-threadupdate.cc| 18 +
 5 files changed, 100 insertions(+)

diff --git a/gcc/diagnostic-move-history.cc b/gcc/diagnostic-move-history.cc
index b0e8308dbf6b..e4c471ab50f3 100644
--- a/gcc/diagnostic-move-history.cc
+++ b/gcc/diagnostic-move-history.cc
@@ -24,6 +24,7 @@
 #include "backend.h"
 #include "tree.h"
 #include "gimple.h"
+#include "tree-pretty-print.h"
 #include "gimple-iterator.h"
 #include "cfganal.h"
 #include "diagnostic-move-history.h"
@@ -262,3 +263,69 @@ set_move_history_to_stmts_in_bb (basic_block bb, edge 
entry,
 
   return true;
 }
+
+/* Dump the move_history data structure MV_HISTORY.  */
+
+void
+dump_move_history (FILE *file, move_history_t mv_history)
+{
+  fprintf (file, "The move history is: \n");
+  if (!mv_history)
+{
+  fprintf (file, "No move history.\n");
+  return;
+}
+
+  for (move_history_t cur_ch = mv_history; cur_ch;
+   cur_ch = cur_ch->prev_move)
+{
+  expanded_location exploc_cond = expand_location (cur_ch->condition);
+
+  if (exploc_cond.file)
+   fprintf (file, "[%s:", exploc_cond.file);
+  fprintf (file, "%d, ", exploc_cond.line);
+  fprintf (file, "%d] ", exploc_cond.column);
+
+  fprintf (file, "%s ", cur_ch->is_true_path ? "true" : "false");
+  const char *reason = NULL;
+  switch (cur_ch->reason)
+   {
+   case COPY_BY_THREAD_JUMP:
+ reason = "copy_by_thread_jump";
+ break;
+   case COPY_BY_ISOLATE_PATH:
+ reason = "copy_by_isolate_path";
+ break;
+   case MOVE_BY_SINK:
+ reason = "move_by_sink";
+ break;
+   default:
+ reason = "UNKNOWN";
+ break;
+   }
+  fprintf (file, "%s \n", reason);
+}
+}
+
+/* Dump the move_history data structure attached to the gimple STMT.  */
+void
+dump_move_history_for (FILE *file, const gimple *stmt)
+{
+  move_history_t mv_history = get_move_history (stmt);
+  if (!mv_history)
+fprintf (file, "No move history.\n");
+  else
+dump_move_history (file, mv_history);
+}
+
+DEBUG_FUNCTION void
+debug_mv_h (const move_history_t mv_history)
+{
+  dump_move_history (stderr, mv_history);
+}
+
+DEBUG_FUNCTION void
+debug_mv_h (const gimple * stmt)
+{
+  dump_move_history_for (stderr, stmt);
+}
diff --git a/gcc/diagnostic-move-history.h b/gcc/diagnostic-move-history.h
index cac9cb1e2675..0133f379dbbd 100644
--- a/gcc/diagnostic-move-history.h
+++ b/gcc/diagnostic-move-history.h
@@ -88,5 +88,7 @@ extern bool set_move_history_to_stmt (gimple *, edge,
of the entry edge.  */
 extern bool set_move_history_to_stmts_in_bb (basic_block, edge,
 bool, enum move_reason);
+extern void dump_move_history (FILE *, move_history_t);
+extern void dump_move_history_for (FILE *, const gimple *);
 
 #endif // DIAGNOSTIC_MOVE_HISTORY_H_INCLUDED
diff --git a/gcc/gimple-ssa-isolate-paths.cc b/gcc/gimple-ssa-isolate-paths.cc
index a79b512f63bd..0a6520ee8311 100644
--- a/gcc/gimple-ssa-isolate-paths.cc
+++ b/gcc/gimple-ssa-isolate-paths.cc
@@ -176,6 +176,16 @@ isolate_path (basic_block bb, basic_block duplicate,
  incoming edge.  */
   if (flag_diagnostics_details)
 {
+  if (dump_file)
+   {
+ fprintf (dump_file, "Set move history for stmts of B[%d]"
+  " as not on the destination of the edge\n",
+  bb->index);
+ fprintf (dump_file, "Set move history for stmts of B[%d]"
+  " as on the destination of the edge\n",
+  duplicate->index);
+   }
+
   set_move_history_to_stmts_in_bb (bb, e, false, COPY_BY_ISOLATE_PATH);
   set_move_history_to_stmts_in_bb (duplicate, e,
   true, COPY_BY_ISOLATE_PATH);
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2fddb1a63268..e7ee445feb69 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -718,6 +718,9 @@ sink_code_in_bb (basic_block bb, virtual_operand_live 
&vop_live)
{
  edge entry = find_edge (bb, gsi_bb (togsi));
  set_move_history_to_stmt (stmt, entry, true, MOVE_BY_SINK);
+ if (dump_file)
+   fprintf (dump_file, " Set 

[PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Qing Zhao
Control this with a new option -fdiagnostics-details.

$ cat t.c
extern void warn(void);
static inline void assign(int val, int *regs, int *index)
{
  if (*index >= 4)
    warn();
  *regs = val;
}
struct nums {int vals[4];};

void sparx5_set (int *ptr, struct nums *sg, int index)
{
  int *val = &sg->vals[index];

  assign(0,ptr, &index);
  assign(*val, ptr, &index);
}

$ gcc -Wall -O2  -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

In the above, although the warning is correct in theory, the warning message
itself is confusing to the end user, since it contains information that cannot
be connected to the source code directly.

It would be a nice improvement to add more information to the warning message
to report where such an index value comes from.

In order to achieve this, we add a new data structure "move_history" to record
1. the "condition" that triggers the code movement;
2. whether the code movement is on the true path of the "condition";
3. the "compiler transformation" that triggers the code movement.

Whenever there is a code movement along control flow graph due to some
specific transformations, such as jump threading, path isolation, tree
sinking, etc., a move_history structure is created and attached to the
moved gimple statement.
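
The record described above can be pictured as a small linked list. The following is a simplified standalone sketch, not the patch's definition: the field and enumerator names follow those used by the dump routine in patch 2/2, while condition_line (a stand-in for the condition's location), reason_name and history_length are hypothetical.

```cpp
#include <string_view>

enum move_reason
{
  COPY_BY_THREAD_JUMP,
  COPY_BY_ISOLATE_PATH,
  MOVE_BY_SINK
};

struct move_history
{
  unsigned condition_line;        // stand-in for the condition's location
  bool is_true_path;              // moved along the true path of the condition?
  move_reason reason;             // transformation that moved the statement
  const move_history *prev_move;  // older record, or nullptr
};

// Map a reason to the string used when dumping.
constexpr std::string_view
reason_name (move_reason r)
{
  switch (r)
    {
    case COPY_BY_THREAD_JUMP: return "copy_by_thread_jump";
    case COPY_BY_ISOLATE_PATH: return "copy_by_isolate_path";
    case MOVE_BY_SINK: return "move_by_sink";
    }
  return "UNKNOWN";
}

// Count the records reachable from H, newest first.
constexpr unsigned
history_length (const move_history *h)
{
  unsigned n = 0;
  for (; h; h = h->prev_move)
    ++n;
  return n;
}
```

Each further movement of the same statement would prepend a new record, so walking prev_move visits the movements from newest to oldest.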

During array out-of-bound checking or -Wstringop-* warning checking, the
"move_history" that was attached to the gimple statement is used to form
a sequence of diagnostic events that are added to the corresponding rich
location to be used to report the warning message.

This behavior is controlled by the new option -fdiagnostics-details
which is off by default.

With this change, by adding -fdiagnostics-details,
the warning message for the above testing case is now:

$ gcc -Wall -O2 -fdiagnostics-details -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
  ‘sparx5_set’: events 1-2
4 |   if (*index >= 4)
  |  ^
  |  |
  |  (1) when the condition is evaluated to true
..
   12 |   int *val = &sg->vals[index];
  |   ~~~
  |   |
  |   (2) out of array bounds here
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

PR tree-optimization/109071

gcc/ChangeLog:

* Makefile.in (OBJS): Add diagnostic-move-history.o
and move-history-diagnostic-path.o.
* gcc/common.opt (fdiagnostics-details): New option.
* gcc/doc/invoke.texi (fdiagnostics-details): Add
documentation for the new option.
* gimple-array-bounds.cc (build_rich_location_with_diagnostic_path):
New function.
(check_out_of_bounds_and_warn): Add one new parameter. Use rich
location with move_history_diagnostic_path for warning_at.
(array_bounds_checker::check_array_ref): Use rich location with
move_history_diagnostic_path for warning_at.
(array_bounds_checker::check_mem_ref): Add one new parameter.
Use rich location with move_history_diagnostic_path for warning_at.
(array_bounds_checker::check_addr_expr): Use rich location with
move_history_diagnostic_path for warning_at.
(array_bounds_checker::check_array_bounds): Call check_mem_ref with
one more parameter.
* gimple-array-bounds.h: Update prototype for check_mem_ref.
* gimple-iterator.cc (gsi_remove): Remove the move
history when removing the gimple.
* gimple-pretty-print.cc (pp_gimple_stmt_1): Emit MV_H marking
if the gimple has a move_history.
* gimple-ssa-isolate-paths.cc (isolate_path): Set move history
for the gimples of the duplicated blocks.
* gimple-ssa-warn-restrict.cc (maybe_diag_access_bounds): Use
rich location with move_history_diagnostic_path for warning_at.
* gimple-ssa-warn-access.cc (warn_string_no_nul): Likewise.
(maybe_warn_nonstring_arg): Likewise.
(maybe_warn_for_bound): Likewise.
(warn_for_access): Likewise.
(check_access): Likewise.
(pass_waccess::check_strncat): Likewise.
(pass_waccess::maybe_check_access_sizes): Likewise.
* tree-ssa-sink.cc (sink_code_in_bb): Create move_history for
stmt when it is sunk.
* toplev.cc (toplev::finalize):  Call move_history_finalize.
* tree-ssa-threadupdate.cc (ssa_redirect_edges): Create move_history
for stmts when they are duplicated.
(back_jt_path_registry::duplicate_thread_path): Likewise.
   

Re: [PATCH v4] [aarch64] Fix function multiversioning dispatcher link error with LTO

2024-10-30 Thread Richard Sandiford
Yangyu Chen  writes:
> We forgot to apply DECL_EXTERNAL to __init_cpu_features_resolver decl. When
> building with LTO, the linker cannot find the
> __init_cpu_features_resolver.lto_priv* symbol, causing the link error.
>
> This patch gets this fixed by adding DECL_EXTERNAL to the decl. To avoid used
> but never defined warning for this symbol, we also mark TREE_PUBLIC to the 
> decl.
> We should also mark the decl having hidden visibility. And fix the attribute 
> in
> the same way for __aarch64_cpu_features identifier.
>
> Minimal steps to reproduce the bug:
>
> echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
> echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
> echo 'void func1();void func2();int main(){func1();func2();return 0;}' > 
> main.c
> gcc -flto -c 1.c 2.c
> gcc -flto main.c 1.o 2.o
>
> Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")
> Signed-off-by: Yangyu Chen 
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (dispatch_function_versions): Adding
>   DECL_EXTERNAL, TREE_PUBLIC and hidden DECL_VISIBILITY to
>   __init_cpu_features_resolver and __aarch64_cpu_features.

Thanks, pushed to trunk.  I'll push to GCC 14 branch tomorrow when
testing & pushing another patch.

Richard

> ---
>  gcc/config/aarch64/aarch64.cc | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5770491b30c..2b2d5b9e390 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -20437,6 +20437,10 @@ dispatch_function_versions (tree dispatch_decl,
>tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
>tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
> init_fn_id, init_fn_type);
> +  DECL_EXTERNAL (init_fn_decl) = 1;
> +  TREE_PUBLIC (init_fn_decl) = 1;
> +  DECL_VISIBILITY (init_fn_decl) = VISIBILITY_HIDDEN;
> +  DECL_VISIBILITY_SPECIFIED (init_fn_decl) = 1;
>tree arg1 = DECL_ARGUMENTS (dispatch_decl);
>tree arg2 = TREE_CHAIN (arg1);
>ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
> @@ -20456,6 +20460,9 @@ dispatch_function_versions (tree dispatch_decl,
>   get_identifier ("__aarch64_cpu_features"),
>   global_type);
>DECL_EXTERNAL (global_var) = 1;
> +  TREE_PUBLIC (global_var) = 1;
> +  DECL_VISIBILITY (global_var) = VISIBILITY_HIDDEN;
> +  DECL_VISIBILITY_SPECIFIED (global_var) = 1;
>tree mask_var = create_tmp_var (long_long_unsigned_type_node);
>  
>tree component_expr = build3 (COMPONENT_REF, long_long_unsigned_type_node,


Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Sam James
Qing Zhao  writes:

> Control this with a new option -fdiagnostics-details.
>
> [...]

The patch doesn't apply for me on very latest trunk -- I think David's
recent diag refactoring means it needs a slight rebase. Could you send
that?



[PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-10-30 Thread Evgeny Karpov
> Symbols in the object file also look good.
> 
> 015  SECT2  notype   External | large_aligned_array
> 016 0010 SECT2  notype   External | large_aligned_array2
> 017 0020 SECT2  notype   External | large_aligned_array3
> 018 0040 SECT2  notype   External | large_aligned_array4

Here is another example that shows it works correctly at link time.

Regards,
Evgeny


struct T {
  char v1[25];
  char v2 __attribute__((aligned (8)));
  char v3 __attribute__((aligned (16)));
  char v4 __attribute__((aligned (32)));
  char v5 __attribute__((aligned (64)));
  char v6 __attribute__((aligned (128)));
  char v7 __attribute__((aligned (256)));
  char v8 __attribute__((aligned (512)));
};

v1: 
v2: 0020
v3: 0030
v4: 0040
v5: 0080
v6: 0100
v7: 0200
v8: 0400
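
The values listed above are consistent with the member offsets that the alignment attributes imply. A minimal C++ sketch of the same struct, using alignas in place of the GNU attribute syntax and assuming the listed values are offsets from the start of the object:

```cpp
#include <cstddef>

struct T
{
  char v1[25];
  alignas (8) char v2;
  alignas (16) char v3;
  alignas (32) char v4;
  alignas (64) char v5;
  alignas (128) char v6;
  alignas (256) char v7;
  alignas (512) char v8;
};

// Each member starts at the next multiple of its alignment.
static_assert (offsetof (T, v2) == 0x20, "v2");
static_assert (offsetof (T, v3) == 0x30, "v3");
static_assert (offsetof (T, v4) == 0x40, "v4");
static_assert (offsetof (T, v5) == 0x80, "v5");
static_assert (offsetof (T, v6) == 0x100, "v6");
static_assert (offsetof (T, v7) == 0x200, "v7");
static_assert (offsetof (T, v8) == 0x400, "v8");
```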

