Re: [wwwdocs] Document support for znver4 in gcc-13/changes.html

2023-03-24 Thread Richard Biener via Gcc-patches
On Thu, Mar 23, 2023 at 7:05 PM Martin Jambor  wrote:
>
> Hello,
>
> On Wed, Mar 22 2023, Richard Biener wrote:
> > On Tue, Mar 21, 2023 at 8:25 PM Martin Jambor  wrote:
> >>
> >> Hello,
> >>
> >> is the following item documenting that gcc13 can generate code for Zen 4
> >> OK for the changes.html file on the web?
> >
> > OK.
>
> thanks.
>
> > Note the gcc-12 changes for the upcoming 12.3 need something similar
> > as most of the changes were backported.
>
> Like this?

Yes.  Thanks,
Richard.

> Thanks,
>
> Martin
>
>
>
> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
> index d565c217..0854b83b 100644
> --- a/htdocs/gcc-12/changes.html
> +++ b/htdocs/gcc-12/changes.html
> @@ -1144,6 +1144,17 @@ are not listed here).
>  Note: GCC 12.3 has not been released yet, so this section is a
>  work-in-progress.
>
> +Target Specific Changes
> +
> +x86-64
> +
> +  GCC now supports AMD CPUs based on the znver4 core
> +via -march=znver4.  The switch makes GCC consider
> +using 512 bit vectors when auto-vectorizing.
> +  
> +
> +
> +
>  This is the  href="https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=12.3";>list
>  of problem reports (PRs) from GCC's bug tracking system that are
>  known to be fixed in the 12.3 release. This list might not be


Re: [PATCH] PR tree-optimization/109238 - Ranger cache dominator queries should ignore backedges.

2023-03-24 Thread Richard Biener via Gcc-patches
On Thu, Mar 23, 2023 at 7:37 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> Detailed info in the PR.
>
> As we walk the DOM tree to calculate ranges, any block with multiple
> predecessors is processed by evaluating and unioning incoming values.
> This catches more complex cases where the dominator node itself may not
> carry range adjustments that we care about.
>
> What was missing was the "quick check" doesn't propagate any info. If
> the edge we check is dominated by this block (ie, its a back edge), then
> no additional useful information can be provided as it just leads back
> to where we currently are.   Only edges which are not dominated by the
> current block need be checked.
>
> The issue arose because this "quick check" mechanism gives up on complex
> cases and returns VARYING... so any backedge would union the real value
> from the dominators with the failed result "VARYING" from that edge, and
> we get VARYING instead of the correct result.
>
> The patch simply checks if the current block dominates the predecessor
> of an edge before launching the query.
>
> Performance impact in negligible. slight slowdown for the check, slight
> speedup by doing less work.. its a wash.
>
> Bootstraps on x86_64-pc-linux-gnu with no regressions.
>
> Ok for trunk?

LGTM with the testcase Jakub provided.

Richard.

> Andrew
>
> PS I have not managed to produce a reduced testcase yet.. If I do I will
> supply it.
>
>
>


Re: Should -ffp-contract=off the default on GCC?

2023-03-24 Thread Fangrui Song via Gcc-patches
On Wed, Mar 22, 2023 at 8:52 AM Qing Zhao via Gcc-patches
 wrote:
>
>
>
> > On Mar 22, 2023, at 9:57 AM, Richard Biener via Gcc-patches 
> >  wrote:
> >
> > On Wed, Mar 22, 2023 at 1:26 PM Alexander Monakov  
> > wrote:
> >>
> >>
> >> On Wed, 22 Mar 2023, Richard Biener wrote:
> >>
> >>> I think it's even less realistic to expect users to know the details of
> >>> floating-point math.  So I doubt any such sentence will be helpful
> >>> besides spreading some FUD?
> >>
> >> I think it's closer to "fundamental notions" rather than "details". For
> >> users who bother to read the GCC manual there's a decent chance it wouldn't
> >> be for naught.
> >>
> >> For documentation, I was thinking
> >>
> >>  Together with -fexcess-precision=standard, -ffp-contract=off
> >>  is necessary to ensure that rounding of intermediate results to precision
> >>  implied by the source code and the FLT_EVAL_METHOD macro is not
> >>  omitted by the compiler.
> >
> > that sounds good to me
>
> Shall we add such clarification to our Gcc13 doc? That should be helpful if 
> we keep the currently default.
>
> Qing
> >
> >> Alexander
>

While updating the documentation, consider adding information that
#pragma STDC FP_CONTRACT OFF is ignored with -ffp-contract=fast.

This surprising behavior motivated Clang to add
-Xclang=-ffp-contract=fast-honor-pragmas
(https://discourse.llvm.org/t/fp-contract-fast-and-pragmas/58529).



-- 
宋方睿


[committed] testsuite: Fix up gcc.target/i386/pr109137.c testcase [PR109137]

2023-03-24 Thread Jakub Jelinek via Gcc-patches
Hi!
On Wed, Mar 22, 2023 at 01:37:39PM -0400, Vladimir Makarov via Gcc-patches 
wrote:
> * gcc.target/i386/pr109137.c: New.

The testcase has a couple of small problems:
1) had -m32 in dg-options, that should never be done, instead the test
   should be guarded on ia32
2) adds -fPIC unconditionally (that should be guarded on fpic effective
   target)
3) using #include  for a RA test seems unnecessary, __builtin_memset
   handles it without the header

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2023-03-24  Jakub Jelinek  

PR target/109137
* gcc.target/i386/pr109137.c: Remove -m32 from dg-options, instead
require ia32 effective target.  Only add -fPIC for fpic effective
target.  Remove #include , use __builtin_memset instead of
memset.

--- gcc/testsuite/gcc.target/i386/pr109137.c.jj 2023-03-23 10:00:58.776093791 
+0100
+++ gcc/testsuite/gcc.target/i386/pr109137.c2023-03-23 19:16:37.858850237 
+0100
@@ -1,6 +1,8 @@
-/* { dg-do compile } */
-/* { dg-options "-m32 -O3 -march=znver1 -fPIC -mfpmath=sse -w" } */
-#include 
+/* PR target/109137 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O3 -march=znver1 -mfpmath=sse -w" } */
+/* { dg-additional-options "-fPIC" { target fpic } } */
+
 typedef struct {
   char bytestream_end;
 } CABACContext;
@@ -12,7 +14,7 @@ typedef struct {
 } H264SliceContext;
 H264SliceContext ff_h264_decode_mb_cabac_sl;
 void ff_h264_decode_mb_cabac(void) {
-  memset((void*)ff_h264_decode_mb_cabac_h_0, 6, 48);
+  __builtin_memset((void*)ff_h264_decode_mb_cabac_h_0, 6, 48);
   int i;
   for (;; i++) {
 __asm__(""/* { dg-error "'asm' operand has impossible constraints" } */
@@ -25,4 +27,3 @@ void ff_h264_decode_mb_cabac(void) {
 decode_cabac_mb_intra4x4_pred_mode_mode;
   }
 }
-


Jakub



[PATCH] builtins: Fix up ICE in inline_string_cmp [PR109258]

2023-03-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The PR109086 r13-6690 inline_string_cmp change to
  if (diff != result)
emit_move_insn (result, diff);
regressed
FAIL: go.test/test/fixedbugs/bug207.go,  -O2 -g  (internal compiler error: in 
emit_move_insn, at expr.cc:4224)
The problem is the Go FE doesn't mark __builtin_memcmp as pure (I'll also
send patch for that) and so result is const0_rtx when the call lost its lhs
and the above move ICEs because moving something into const0_rtx is obviously
invalid.
I think it is better not to rely on all FEs having these *cmp functions
pure anD DCE being performed.  The following patch just punts from the
inline expansion in that case, so we just emit normal library call.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-24  Jakub Jelinek  

PR middle-end/109258
* builtins.cc (inline_expand_builtin_bytecmp): Return NULL_RTX early
if target == const0_rtx.

--- gcc/builtins.cc.jj  2023-03-23 10:00:58.308100548 +0100
+++ gcc/builtins.cc 2023-03-23 11:05:38.308135309 +0100
@@ -7178,8 +7178,8 @@ inline_expand_builtin_bytecmp (tree exp,
   bool is_ncmp = (fcode == BUILT_IN_STRNCMP || fcode == BUILT_IN_MEMCMP);
 
   /* Do NOT apply this inlining expansion when optimizing for size or
- optimization level below 2.  */
-  if (optimize < 2 || optimize_insn_for_size_p ())
+ optimization level below 2 or if unused *cmp hasn't been DCEd.  */
+  if (optimize < 2 || optimize_insn_for_size_p () || target == const0_rtx)
 return NULL_RTX;
 
   gcc_checking_assert (fcode == BUILT_IN_STRCMP

Jakub



[PATCH] go: Fix up go.test/test/fixedbugs/bug207.go failure [PR109258]

2023-03-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The PR109086 r13-6690 inline_string_cmp change to
  if (diff != result)
emit_move_insn (result, diff);
regressed
FAIL: go.test/test/fixedbugs/bug207.go,  -O2 -g  (internal compiler error: in 
emit_move_insn, at expr.cc:4224)
The problem is the Go FE doesn't mark __builtin_memcmp as pure as other FEs,
so we ended up with
  __builtin_memcmp (whatever, whateverelse, somesize);
in the IL before expansion and the expansion ICEd on it.
As the builtin calls a library function which is pure or is inline expanded
as such, not marking it pure is an unnecessary pessimization from the FE
side, keeping such dead calls in the IL if they aren't needed will not help
anything.

The following patch fixes that.  Initially I've added just DECL_PURE_P to
it, but that unfortunately broke bootstrap, for __builtin_memcmp there is
also __builtin_memcmp_eq registered by the middle-end code if not registered
earlier and that one is registered with the usual flags (pure, nothrow,
leaf), so if __builtin_memcmp from FE was just pure, it would appear in the
IL as that it can raise exceptions and when folded into __builtin_memcmp_eq
all of sudden it couldn't and we'd ICE in verification.

I think tons of functions should have builtin_nothrow as well, but changing
that wasn't necessary for this fix.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-24  Jakub Jelinek  

PR middle-end/109258
* go-gcc.cc (Gcc_backend): Add new static data members builtin_pure
and builtin_nothrow.
(Gcc_backend::Gcc_backend): Pass builtin_pure | builtin_nothrow for
BUILT_IN_MEMCMP.
(Gcc_backend::define_builtin): Handle builtin_pure and builtin_nothrow
in flags.

--- gcc/go/go-gcc.cc.jj 2023-01-18 12:22:10.396234744 +0100
+++ gcc/go/go-gcc.cc2023-03-23 23:48:11.120064331 +0100
@@ -543,6 +543,8 @@ private:
   static const int builtin_const = 1 << 0;
   static const int builtin_noreturn = 1 << 1;
   static const int builtin_novops = 1 << 2;
+  static const int builtin_pure = 1 << 3;
+  static const int builtin_nothrow = 1 << 4;
 
   void
   define_builtin(built_in_function bcode, const char* name, const char* 
libname,
@@ -601,7 +603,7 @@ Gcc_backend::Gcc_backend()
const_ptr_type_node,
size_type_node,
NULL_TREE),
-  0);
+  builtin_pure | builtin_nothrow);
 
   // We use __builtin_memmove for copying data.
   this->define_builtin(BUILT_IN_MEMMOVE, "__builtin_memmove", "memmove",
@@ -3596,6 +3598,10 @@ Gcc_backend::define_builtin(built_in_fun
   libname, NULL_TREE);
   if ((flags & builtin_const) != 0)
 TREE_READONLY(decl) = 1;
+  if ((flags & builtin_pure) != 0)
+DECL_PURE_P(decl) = 1;
+  if ((flags & builtin_nothrow) != 0)
+TREE_NOTHROW (decl) = 1;
   if ((flags & builtin_noreturn) != 0)
 TREE_THIS_VOLATILE(decl) = 1;
   if ((flags & builtin_novops) != 0)
@@ -3608,6 +3614,10 @@ Gcc_backend::define_builtin(built_in_fun
  NULL, NULL_TREE);
   if ((flags & builtin_const) != 0)
TREE_READONLY(decl) = 1;
+  if ((flags & builtin_pure) != 0)
+   DECL_PURE_P(decl) = 1;
+  if ((flags & builtin_nothrow) != 0)
+   TREE_NOTHROW (decl) = 1;
   if ((flags & builtin_noreturn) != 0)
TREE_THIS_VOLATILE(decl) = 1;
   if ((flags & builtin_novops) != 0)

Jakub



Re: [PATCH] builtins: Fix up ICE in inline_string_cmp [PR109258]

2023-03-24 Thread Richard Biener via Gcc-patches
On Fri, 24 Mar 2023, Jakub Jelinek wrote:

> Hi!
> 
> The PR109086 r13-6690 inline_string_cmp change to
>   if (diff != result)
>   emit_move_insn (result, diff);
> regressed
> FAIL: go.test/test/fixedbugs/bug207.go,  -O2 -g  (internal compiler error: in 
> emit_move_insn, at expr.cc:4224)
> The problem is the Go FE doesn't mark __builtin_memcmp as pure (I'll also
> send patch for that) and so result is const0_rtx when the call lost its lhs
> and the above move ICEs because moving something into const0_rtx is obviously
> invalid.
> I think it is better not to rely on all FEs having these *cmp functions
> pure anD DCE being performed.  The following patch just punts from the
> inline expansion in that case, so we just emit normal library call.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-03-24  Jakub Jelinek  
> 
>   PR middle-end/109258
>   * builtins.cc (inline_expand_builtin_bytecmp): Return NULL_RTX early
>   if target == const0_rtx.
> 
> --- gcc/builtins.cc.jj2023-03-23 10:00:58.308100548 +0100
> +++ gcc/builtins.cc   2023-03-23 11:05:38.308135309 +0100
> @@ -7178,8 +7178,8 @@ inline_expand_builtin_bytecmp (tree exp,
>bool is_ncmp = (fcode == BUILT_IN_STRNCMP || fcode == BUILT_IN_MEMCMP);
>  
>/* Do NOT apply this inlining expansion when optimizing for size or
> - optimization level below 2.  */
> -  if (optimize < 2 || optimize_insn_for_size_p ())
> + optimization level below 2 or if unused *cmp hasn't been DCEd.  */
> +  if (optimize < 2 || optimize_insn_for_size_p () || target == const0_rtx)
>  return NULL_RTX;
>  
>gcc_checking_assert (fcode == BUILT_IN_STRCMP
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH 1/2] Disallow -gno-dwarf, gno-dwarf-N, -gno-gdb and -gno-vms

2023-03-24 Thread Richard Biener via Gcc-patches
The following adds RejectNegative to the gdwarf, gdwarf-, ggdb and gvms
options since the current behavior is to treat the negative variant
the same as the positive variant.  In particular -ggdb -gno-gdb
do not cancel, and plain -gno-dwarf will enable (dwarf!) debug output.

Rejecting the negative forms avoids interpreting sensible behavior
to combinations of options like -gdwarf-5 -gno-dwarf-3 and sticks to
the behavior that later -g options simply override earlier ones and
the only negative form is -g0.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

* common.opt (gdwarf): Add RejectNegative.
(gdwarf-): Likewise.
(ggdb): Likewise.
(gvms): Likewise.
---
 gcc/common.opt | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index e558385c7f4..4546acb5b81 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3349,11 +3349,11 @@ Common Driver RejectNegative JoinedOrMissing
 Generate BTF debug information at default level.
 
 gdwarf
-Common Driver JoinedOrMissing Negative(gdwarf-)
+Common Driver JoinedOrMissing Negative(gdwarf-) RejectNegative
 Generate debug information in default version of DWARF format.
 
 gdwarf-
-Common Driver Joined UInteger Var(dwarf_version) Init(DWARF_VERSION_DEFAULT)
+Common Driver Joined UInteger Var(dwarf_version) Init(DWARF_VERSION_DEFAULT) 
RejectNegative
 Generate debug information in DWARF v2 (or later) format.
 
 gdwarf32
@@ -3365,7 +3365,7 @@ Common Driver Var(dwarf_offset_size,8) RejectNegative
 Use 64-bit DWARF format when emitting DWARF debug information.
 
 ggdb
-Common Driver JoinedOrMissing
+Common Driver JoinedOrMissing RejectNegative
 Generate debug information in default extended format.
 
 ginline-points
@@ -3432,7 +3432,7 @@ gvariable-location-views=incompat5
 Common Driver RejectNegative Var(debug_variable_location_views, -1) Init(2)
 
 gvms
-Common Driver JoinedOrMissing
+Common Driver JoinedOrMissing RejectNegative
 Generate debug information in VMS format.
 
 gxcoff
-- 
2.35.3



[PATCH 2/2] Remove Negative(gwarf-) from gdwarf

2023-03-24 Thread Richard Biener via Gcc-patches
Prior to the removal of STABS support the gdwarf, gstabs, ... options
formed a cycle with their Negative(..) option attribute.  But that
didn't actually have any effect since most of the options also
are Joined or JoinedOrMissing for which there's no pruning of options
and so once ran into the set_debug_level diagnostics reporting
conflicting debug formats.

The following removes the remains of that cycle, which is a
Negative option from gdwarf to gdwarf-.  With RejectNegative
added the expected effect of -gdwarf-4 -gdwarf would be to
enable DWARF5 support (but this doesn't happen for some reason).
I think the more sensible behavior is that seen and implemented
in opts.cc, the more specific -gdwarf-4 determines the DWARF level
and a later or earlier -gdwarf becomes a no-op.  So the
Negative(..) annotation on gdwarf is just confusing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

* common.opt (gdwarf): Remove Negative(gdwarf-).
---
 gcc/common.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4546acb5b81..862c474d3c8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3349,7 +3349,7 @@ Common Driver RejectNegative JoinedOrMissing
 Generate BTF debug information at default level.
 
 gdwarf
-Common Driver JoinedOrMissing Negative(gdwarf-) RejectNegative
+Common Driver JoinedOrMissing RejectNegative
 Generate debug information in default version of DWARF format.
 
 gdwarf-
-- 
2.35.3


Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Richard Biener via Gcc-patches
On Mon, 20 Mar 2023, Richard Biener wrote:

> On Fri, 17 Mar 2023, Jakub Jelinek wrote:
> 
> > On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote:
> > > > +   /* Drop the const attribute from the call type (the pure
> > > > +  attribute is not available on types).  */
> > > > +   tree fntype = gimple_call_fntype (call);
> > > > +   if (fntype && TYPE_READONLY (fntype))
> > > > + gimple_call_set_fntype
> > > > +   (call, build_qualified_type (fntype, (TYPE_QUALS 
> > > > (fntype)
> > > > + & 
> > > > ~TYPE_QUAL_CONST)));
> > > 
> > > Sorry, now I am bit confused on why Jakub's fix did not need similar
> > > fixup.  The flag is only set during the profiling stage and thus I would
> > > expect it to still run into the problem that VOPs are missing.
> > > Is it only becuase we do not sanity check?
> > 
> > My patch started from this point ignoring all TYPE_READONLY bits on
> > all FUNCTION_TYPE/METHOD_TYPEs, while Richi's patch instead makes it
> > explicit in the IL, TYPE_READONLY is honored as before but it shouldn't
> > show up in any of the gimple_call_fntype types unless it is a direct
> > call to a const function for which we don't have a body.
> > 
> > In either case, vops are added on the update_stmt a few lines later.
> > 
> > > Here is a testcase that shows the problem:
> > > 
> > > include 
> > > float c;
> > > 
> > > __attribute__ ((const))
> > > float
> > > instrument(float v)
> > > {
> > > return sin (v);
> > > }
> > > __attribute__ ((no_profile_instrument_function,const,noinline))
> > > float noinstrument (float v)
> > > {
> > > return instrument (v);
> > > }
> > > 
> > > m()
> > > {
> > > c+=instrument(c);
> > > if (!__builtin_expect (c,1))
> > > {
> > >   c+=noinstrument (c);
> > > }
> > > c+=instrument(c);
> > > }
> > > main()
> > > {
> > > m();
> > > }
> > > 
> > > Compiling 
> > > gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm 
> > > -ftest-coverage -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
> > > makes gcov to report 3 executions on instrument while with -O3 it is 2.
> 
> With my proposed patch it works fine and reports 3 executions on
> 'instrument' with both -O0 and -O3.  I checked it indeed reports only
> 2 executions with GCC 12.
> 
> So it seems the patch is a progression in general?
> 
> Thus, OK?

Honza - ping, are you fine with the patch in 
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614074.html?

Thanks,
Richard.


[committed] testsuite: Add testcase for already fixed PR [PR99739]

2023-03-24 Thread Jakub Jelinek via Gcc-patches
Hi!

This PR was fixed by r13-1268-g8c99e307b20, I'm adding testcase
to make sure we don't regress on it in the future.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious.

2023-03-24  Jakub Jelinek  

PR tree-optimization/99739
* gcc.dg/tree-ssa/pr99739.c: New test.

--- gcc/testsuite/gcc.dg/tree-ssa/pr99739.c.jj  2023-03-23 19:08:39.528651948 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr99739.c 2023-03-23 19:08:31.158770968 
+0100
@@ -0,0 +1,40 @@
+/* PR tree-optimization/99739 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "__builtin_abort \\\(\\\);" "optimized" } } 
*/
+
+static inline int
+foo (int i, int j, int k)
+{
+  int x = 1;
+  if (i && j && k)
+x = 2;
+  if (i && j && k)
+return x;
+  return -1;
+}
+
+void
+bar (int i, int j, int k)
+{
+  if (foo (i, j, k) == 1)
+__builtin_abort ();
+}
+
+static inline int
+baz (int i, int j, int k)
+{
+  int x = 1;
+  if (i && j && k)
+x = 2;
+  if (i && k && j)
+return x;
+  return -1;
+}
+
+void
+qux (int i, int j, int k)
+{
+  if (baz (i, j, k) == 1)
+__builtin_abort ();
+}

Jakub



Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Jan Hubicka via Gcc-patches
> On Fri, 17 Mar 2023, Jakub Jelinek wrote:
> 
> > On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote:
> > > > +   /* Drop the const attribute from the call type (the pure
> > > > +  attribute is not available on types).  */
> > > > +   tree fntype = gimple_call_fntype (call);
> > > > +   if (fntype && TYPE_READONLY (fntype))
> > > > + gimple_call_set_fntype
> > > > +   (call, build_qualified_type (fntype, (TYPE_QUALS 
> > > > (fntype)
> > > > + & 
> > > > ~TYPE_QUAL_CONST)));
> > > 
> > > Sorry, now I am bit confused on why Jakub's fix did not need similar
> > > fixup.  The flag is only set during the profiling stage and thus I would
> > > expect it to still run into the problem that VOPs are missing.
> > > Is it only becuase we do not sanity check?
> > 
> > My patch started from this point ignoring all TYPE_READONLY bits on
> > all FUNCTION_TYPE/METHOD_TYPEs, while Richi's patch instead makes it
> > explicit in the IL, TYPE_READONLY is honored as before but it shouldn't
> > show up in any of the gimple_call_fntype types unless it is a direct
> > call to a const function for which we don't have a body.
> > 
> > In either case, vops are added on the update_stmt a few lines later.
> > 
> > > Here is a testcase that shows the problem:
> > > 
> > > include 
> > > float c;
> > > 
> > > __attribute__ ((const))
> > > float
> > > instrument(float v)
> > > {
> > > return sin (v);
> > > }
> > > __attribute__ ((no_profile_instrument_function,const,noinline))
> > > float noinstrument (float v)
> > > {
> > > return instrument (v);
> > > }
> > > 
> > > m()
> > > {
> > > c+=instrument(c);
> > > if (!__builtin_expect (c,1))
> > > {
> > >   c+=noinstrument (c);
> > > }
> > > c+=instrument(c);
> > > }
> > > main()
> > > {
> > > m();
> > > }
> > > 
> > > Compiling 
> > > gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm 
> > > -ftest-coverage -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
> > > makes gcov to report 3 executions on instrument while with -O3 it is 2.
> 
> With my proposed patch it works fine and reports 3 executions on
> 'instrument' with both -O0 and -O3.  I checked it indeed reports only
> 2 executions with GCC 12.
> 
> So it seems the patch is a progression in general?
> 
> Thus, OK?
OK,
thanks!
Honza
> 
> Thanks,
> Richard.


Re: [PATCHv4] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2023-03-24 Thread Andrew Pinski via Gcc-patches
On Fri, Mar 3, 2023 at 10:28 AM Andrew Pinski  wrote:
>
> On Thu, Feb 9, 2023 at 7:54 PM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > The problem here is that aarch64_expand_setmem does not change the alignment
> > for strict alignment case.
> > This is version 4 of the fix, major changes from the last version is fixing
> > the way store pairs are handled which allows handling of storing 2 SI mode
> > at a time.
> > This also adds a testcase to show a case with -mstrict-align we can do
> > the store word pair stores.
> >
> > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> Ping?

Ping?


>
> >
> > PR target/103100
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc (aarch64_gen_store_pair):
> > Add support for SImode.
> > (aarch64_set_one_block_and_progress_pointer):
> > Add use_pair argument and rewrite and simplifying the
> > code.
> > (aarch64_can_use_pair_load_stores): New function.
> > (aarch64_expand_setmem): Rewrite mode selection to
> > better handle strict alignment and non ld/stp pair case.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/memset-strict-align-1.c: Update test.
> > Reduce the size down to 207 and make s1 global and aligned
> > to 16 bytes.
> > * gcc.target/aarch64/memset-strict-align-2.c: New test.
> > * gcc.target/aarch64/memset-strict-align-3.c: New test.
> > ---
> >  gcc/config/aarch64/aarch64.cc | 136 ++
> >  .../aarch64/memset-strict-align-1.c   |  19 ++-
> >  .../aarch64/memset-strict-align-2.c   |  14 ++
> >  .../aarch64/memset-strict-align-3.c   |  15 ++
> >  4 files changed, 113 insertions(+), 71 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-3.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 5c40b6ed22a..3eaf9bd608a 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -8850,6 +8850,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, 
> > rtx reg1, rtx mem2,
> >  {
> >switch (mode)
> >  {
> > +case E_SImode:
> > +  return gen_store_pair_sw_sisi (mem1, reg1, mem2, reg2);
> > +
> >  case E_DImode:
> >return gen_store_pair_dw_didi (mem1, reg1, mem2, reg2);
> >
> > @@ -24896,42 +24899,49 @@ aarch64_expand_cpymem (rtx *operands)
> > SRC is a register we have created with the duplicated value to be set.  
> > */
> >  static void
> >  aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
> > -   machine_mode mode)
> > +   machine_mode mode, bool 
> > use_pairs)
> >  {
> > +  rtx reg = src;
> >/* If we are copying 128bits or 256bits, we can do that straight from
> >   the SIMD register we prepared.  */
> > -  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> > -{
> > -  mode = GET_MODE (src);
> > -  /* "Cast" the *dst to the correct mode.  */
> > -  *dst = adjust_address (*dst, mode, 0);
> > -  /* Emit the memset.  */
> > -  emit_insn (aarch64_gen_store_pair (mode, *dst, src,
> > -aarch64_progress_pointer (*dst), 
> > src));
> > -
> > -  /* Move the pointers forward.  */
> > -  *dst = aarch64_move_pointer (*dst, 32);
> > -  return;
> > -}
> >if (known_eq (GET_MODE_BITSIZE (mode), 128))
> > -{
> > -  /* "Cast" the *dst to the correct mode.  */
> > -  *dst = adjust_address (*dst, GET_MODE (src), 0);
> > -  /* Emit the memset.  */
> > -  emit_move_insn (*dst, src);
> > -  /* Move the pointers forward.  */
> > -  *dst = aarch64_move_pointer (*dst, 16);
> > -  return;
> > -}
> > -  /* For copying less, we have to extract the right amount from src.  */
> > -  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
> > +mode = GET_MODE(src);
> > +  else
> > +/* For copying less, we have to extract the right amount from src.  */
> > +reg = lowpart_subreg (mode, src, GET_MODE (src));
> >
> >/* "Cast" the *dst to the correct mode.  */
> >*dst = adjust_address (*dst, mode, 0);
> >/* Emit the memset.  */
> > -  emit_move_insn (*dst, reg);
> > +  if (use_pairs)
> > +emit_insn (aarch64_gen_store_pair (mode, *dst, reg,
> > +  aarch64_progress_pointer (*dst),
> > +  reg));
> > +  else
> > +emit_move_insn (*dst, reg);
> > +
> >/* Move the pointer forward.  */
> >*dst = aarch64_progress_pointer (*dst);
> > +  if (use_pairs)
> > +*dst = aarch64_progress_pointer (*dst);
> > +}
> > +
> > +/* Returns true if size can be used as a store/load pair.
> > +   This is a helper function for aarch64_expand_setmem and others. */
> > 

[wwwdocs] Mention the GNU C enum changes in gcc-13/changes.html

2023-03-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch attempts to document the PR36113 changes to the
GNU C extension support of enumerators which don't fit into int.

Ok for wwwdocs?

Shall we mention it in porting_to.html as well?
The only known affected package is (was?) the Linux kernel.

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index f8e9560c..d2035435 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -59,6 +59,22 @@ a work-in-progress.
   https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/ARM-iWMMXt-Built-in-Functions.html";>
   iWMMXt built-in functions.
 
+The behavior of the GNU C extension support of enumerators which
+  don't fit into int has been changed to match the C2X
+  behavior.  Previously, in enumerations where at least one enumerator
+  didn't fit into int, only those enumerators that didn't
+  fit into int had the enum type and others
+  had int type and while the enum is being
+  defined enumerators that didn't fit into int had some
+  unspecified type with the sign and precision of its value.
+  In GCC13, in such enumerations all enumerators have the
+  enum type and while the enum is being
+  defined enumerators that didn't fit into int have
+  type of their value.  If all enumerators fit into int
+  type, as before all enumerators have int type, both
+  while the enum is being defined and after it is defined.
+  See https://gcc.gnu.org/PR36113";>PR36113 for details.
+
 
 
 

Jakub



Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Jan Hubicka via Gcc-patches
> From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Thu, 16 Mar 2023 13:51:19 +0100
> Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype
> To: gcc-patches@gcc.gnu.org
> 
> The following makes sure that after clearing pure/const from
> instrumented function declarations we are adjusting call statements
> fntype as well to handle indirect calls and because gimple_call_flags
> looks at both decl and fntype.
> 
> Like the pure/const flag clearing on decls we refrain from touching
> calls to known functions that do not have a body in the current TU.
> 
>   PR tree-optimization/106912
>   * tree-profile.cc (tree_profiling): Update stmts only when
>   profiling or testing coverage.  Make sure to update calls
>   fntype, stripping 'const' there.
> 
>   * gcc.dg/profile-generate-4.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/profile-generate-4.c | 19 
>  gcc/tree-profile.cc   | 38 +--
>  2 files changed, 47 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/profile-generate-4.c
> 
> diff --git a/gcc/testsuite/gcc.dg/profile-generate-4.c 
> b/gcc/testsuite/gcc.dg/profile-generate-4.c
> new file mode 100644
> index 000..c2b999fe4cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/profile-generate-4.c
> @@ -0,0 +1,19 @@
> +/* PR106912 */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fprofile-generate -ftree-vectorize" } */
> +
> +__attribute__ ((__simd__))
> +__attribute__ ((__nothrow__ , __leaf__ , __const__, __noinline__))
> +double foo (double x);
> +
> +void bar(double *f, int n)
> +{
> +  int i;
> +  for (i = 0; i < n; i++)
> +f[i] = foo(f[i]);
> +}
> +
> +double foo(double x)
> +{
> +  return x * x / 3.0;
> +}
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index ea9d7a23443..7854cd4bc31 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -835,16 +835,34 @@ tree_profiling (void)
>  
>push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>  
> -  FOR_EACH_BB_FN (bb, cfun)
> - {
> -   gimple_stmt_iterator gsi;
> -   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> - {
> -   gimple *stmt = gsi_stmt (gsi);
> -   if (is_gimple_call (stmt))
> - update_stmt (stmt);
> - }
> - }
> +  if (profile_arc_flag || flag_test_coverage)
> + FOR_EACH_BB_FN (bb, cfun)
> +   {
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +   {
> + gcall *call = dyn_cast  (gsi_stmt (gsi));
> + if (!call)
> +   continue;
> +
> + /* We do not clear pure/const on decls without body.  */
> + tree fndecl = gimple_call_fndecl (call);
> + if (fndecl && !gimple_has_body_p (fndecl))
> +   continue;

Actually on second thought, I think I can break this either by making
the wraping function to be thunk or alias or by moving it to different
compilation unit.
Also with LTO we will get body later.

So I think we need to drop this optimization.

Honza
> +
> + /* Drop the const attribute from the call type (the pure
> +attribute is not available on types).  */
> + tree fntype = gimple_call_fntype (call);
> + if (fntype && TYPE_READONLY (fntype))
> +   gimple_call_set_fntype
> + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> +   & ~TYPE_QUAL_CONST)));
> +
> + /* Update virtual operands of calls to no longer const/pure
> +functions.  */
> + update_stmt (call);
> +   }
> +   }
>  
>/* re-merge split blocks.  */
>cleanup_tree_cfg ();
> -- 
> 2.35.3
> 


Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Richard Biener via Gcc-patches
On Fri, 24 Mar 2023, Jan Hubicka wrote:

> > From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
> > From: Richard Biener 
> > Date: Thu, 16 Mar 2023 13:51:19 +0100
> > Subject: [PATCH] tree-optimization/106912 - clear const attribute from 
> > fntype
> > To: gcc-patches@gcc.gnu.org
> > 
> > The following makes sure that after clearing pure/const from
> > instrumented function declarations we are adjusting call statements
> > fntype as well to handle indirect calls and because gimple_call_flags
> > looks at both decl and fntype.
> > 
> > Like the pure/const flag clearing on decls we refrain from touching
> > calls to known functions that do not have a body in the current TU.
> > 
> > PR tree-optimization/106912
> > * tree-profile.cc (tree_profiling): Update stmts only when
> > profiling or testing coverage.  Make sure to update calls
> > fntype, stripping 'const' there.
> > 
> > * gcc.dg/profile-generate-4.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/profile-generate-4.c | 19 
> >  gcc/tree-profile.cc   | 38 +--
> >  2 files changed, 47 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/profile-generate-4.c
> > 
> > diff --git a/gcc/testsuite/gcc.dg/profile-generate-4.c 
> > b/gcc/testsuite/gcc.dg/profile-generate-4.c
> > new file mode 100644
> > index 000..c2b999fe4cb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/profile-generate-4.c
> > @@ -0,0 +1,19 @@
> > +/* PR106912 */
> > +/* { dg-require-profiling "-fprofile-generate" } */
> > +/* { dg-options "-O2 -fprofile-generate -ftree-vectorize" } */
> > +
> > +__attribute__ ((__simd__))
> > +__attribute__ ((__nothrow__ , __leaf__ , __const__, __noinline__))
> > +double foo (double x);
> > +
> > +void bar(double *f, int n)
> > +{
> > +  int i;
> > +  for (i = 0; i < n; i++)
> > +f[i] = foo(f[i]);
> > +}
> > +
> > +double foo(double x)
> > +{
> > +  return x * x / 3.0;
> > +}
> > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > index ea9d7a23443..7854cd4bc31 100644
> > --- a/gcc/tree-profile.cc
> > +++ b/gcc/tree-profile.cc
> > @@ -835,16 +835,34 @@ tree_profiling (void)
> >  
> >push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> >  
> > -  FOR_EACH_BB_FN (bb, cfun)
> > -   {
> > - gimple_stmt_iterator gsi;
> > - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > -   {
> > - gimple *stmt = gsi_stmt (gsi);
> > - if (is_gimple_call (stmt))
> > -   update_stmt (stmt);
> > -   }
> > -   }
> > +  if (profile_arc_flag || flag_test_coverage)
> > +   FOR_EACH_BB_FN (bb, cfun)
> > + {
> > +   gimple_stmt_iterator gsi;
> > +   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > + {
> > +   gcall *call = dyn_cast  (gsi_stmt (gsi));
> > +   if (!call)
> > + continue;
> > +
> > +   /* We do not clear pure/const on decls without body.  */
> > +   tree fndecl = gimple_call_fndecl (call);
> > +   if (fndecl && !gimple_has_body_p (fndecl))
> > + continue;
> 
> Actually on second thought, I think I can break this either by making
> the wraping function to be thunk or alias or by moving it to different
> compilation unit.
> Also with LTO we will get body later.
> 
> So I think we need to drop this optimization.

It's the same condition guarding the set_{const,pure}_flag call earlier
(but yes, I agree).  So it isn't covered by the regression so we
should address this for next stage1.

Richard.


Re: [PATCH 2/2] [i386] Adjust costing of emulated vectorized gather/scatter

2023-03-24 Thread Jan Hubicka via Gcc-patches
> Emulated gather/scatter behave similar to strided elementwise
> accesses in that they need to decompose the offset vector
> and construct or decompose the data vector so handle them
> the same way, pessimizing the cases with may elements.
> 
> For pr88531-2c.c instead of
> 
> .L4:
> leaq(%r15,%rcx), %rdx
> incl%edi
> movl16(%rdx), %r13d
> movl24(%rdx), %r14d
> movl(%rdx), %r10d
> movl4(%rdx), %r9d
> movl8(%rdx), %ebx
> movl12(%rdx), %r11d
> movl20(%rdx), %r12d
> vmovss  (%rax,%r14,4), %xmm2
> movl28(%rdx), %edx
> vmovss  (%rax,%r13,4), %xmm1
> vmovss  (%rax,%r10,4), %xmm0
> vinsertps   $0x10, (%rax,%rdx,4), %xmm2, %xmm2
> vinsertps   $0x10, (%rax,%r12,4), %xmm1, %xmm1
> vinsertps   $0x10, (%rax,%r9,4), %xmm0, %xmm0
> vmovlhps%xmm2, %xmm1, %xmm1
> vmovss  (%rax,%rbx,4), %xmm2
> vinsertps   $0x10, (%rax,%r11,4), %xmm2, %xmm2
> vmovlhps%xmm2, %xmm0, %xmm0
> vinsertf128 $0x1, %xmm1, %ymm0, %ymm0
> vmulps  %ymm3, %ymm0, %ymm0
> vmovups %ymm0, (%r8,%rcx)
> addq$32, %rcx
> cmpl%esi, %edi
> jb  .L4
> 
> we now prefer
> 
> .L4:
> leaq0(%rbp,%rdx,8), %rcx
> movl(%rcx), %r10d
> movl4(%rcx), %ecx
> vmovss  (%rsi,%r10,4), %xmm0
> vinsertps   $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
> vmulps  %xmm1, %xmm0, %xmm0
> vmovlps %xmm0, (%rbx,%rdx,8)
> incq%rdx
> cmpl%edi, %edx
> jb  .L4
> 
> which vectorizes with SSE instead of AVX2 which looks like an
> improvement.
> 
> When testing this on SPEC CPU 2017 with -Ofast -flto -march=znver4
> there are quite some cases where we now prefer SSE vectorization
> over AVX512 + AVX2 epilogue and some cases where we now reject
> vectorization.  Runtime the changes are noise with the off-noise
> candidates better after the patch.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for stage1?
> 
> Thanks,
> Richard.
> 
>   * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
>   Tame down element extracts and scalar loads for gather/scatter
>   similar to elementwise strided accesses.
> 
>   * gcc.target/i386/pr89618-2.c: New testcase.
>   * gcc.target/i386/pr88531-2b.c: Adjust.
>   * gcc.target/i386/pr88531-2c.c: Likewise.
OK.

Honza


[PATCH 1/2] Add emulated scatter capability to the vectorizer

2023-03-24 Thread Richard Biener via Gcc-patches
This adds a scatter vectorization capability to the vectorizer
without target support by decomposing the offset and data vectors
and then performing scalar stores in the order of vector lanes.
This is aimed at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the scatter.

The offset load is still vectorized and costed as such, but like
with emulated gather those will be turned back to scalar loads
by forwrpop.

Ontop of the cost model patch when testing on SPEC CPU 2017 this
vectorizes an additional three loops and there's no runtime
change.  When one enables AVX512 scatters with
-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter
that on the unpatched tree enables 166 new loops + epilogues
to be vectorized instead, so the actual Zen4 costs for AVX512
scatters are cheaper than what we cost the emulated cases.
Note the handled cases are not 100% overlapping, there's one
additional loop vectorized with the emulation that's not
vectorized with the AVX512 scatter ISA vectorization.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

Richard.

* tree-vect-data-refs.cc (vect_analyze_data_refs): Always
consider scatters.
* tree-vect-stmts.cc (vect_model_store_cost): Pass in the
gather-scatter info and cost emulated scatters accordingly.
(get_load_store_type): Support emulated scatters.
(vectorizable_store): Likewise.  Emulate them by extracting
scalar offsets and data, doing scalar stores.

* gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
* gcc.dg/vect/vect-71.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr25413a.c  |   3 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-s4113.c|   2 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-s491.c |   2 +-
 .../gcc.dg/vect/tsvc/vect-tsvc-vas.c  |   2 +-
 gcc/testsuite/gcc.dg/vect/vect-71.c   |   2 +-
 gcc/tree-vect-data-refs.cc|   4 +-
 gcc/tree-vect-stmts.cc| 102 +++---
 7 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr25413a.c 
b/gcc/testsuite/gcc.dg/vect/pr25413a.c
index e444b2c3e8e..ffb517c9ce0 100644
--- a/gcc/testsuite/gcc.dg/vect/pr25413a.c
+++ b/gcc/testsuite/gcc.dg/vect/pr25413a.c
@@ -123,7 +123,6 @@ int main (void)
   return 0;
 } 
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
! vect_scatter_store } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target 
vect_scatter_store } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vector alignment may not be reachable" 1 
"vect" { target { ! vector_alignment_reachable  } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using 
versioning" 1 "vect" { target { ! vector_alignment_reachable } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
index b64682a65df..ddb7e9dc0e8 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
aarch64_sve }  } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
index 8465e137070..29e90ff0aff 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
aarch64_sve }  } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
index 5ff38851f43..b72ee21a9a3 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
aarch64_sve }  } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-71.c 
b/gcc/testsuite/gcc.dg/vect/vect-71.c
index f15521176df..581473fa4a1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-71.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-71.c
@@ -36,4 +36,4 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! 
vect_scatter_store } } } } */
+/* { dg-final { scan-tree-dump-times "vectorize

[PATCH 2/2] [i386] Adjust costing of emulated vectorized gather/scatter

2023-03-24 Thread Richard Biener via Gcc-patches
Emulated gather/scatter behave similar to strided elementwise
accesses in that they need to decompose the offset vector
and construct or decompose the data vector so handle them
the same way, pessimizing the cases with may elements.

For pr88531-2c.c instead of

.L4:
leaq(%r15,%rcx), %rdx
incl%edi
movl16(%rdx), %r13d
movl24(%rdx), %r14d
movl(%rdx), %r10d
movl4(%rdx), %r9d
movl8(%rdx), %ebx
movl12(%rdx), %r11d
movl20(%rdx), %r12d
vmovss  (%rax,%r14,4), %xmm2
movl28(%rdx), %edx
vmovss  (%rax,%r13,4), %xmm1
vmovss  (%rax,%r10,4), %xmm0
vinsertps   $0x10, (%rax,%rdx,4), %xmm2, %xmm2
vinsertps   $0x10, (%rax,%r12,4), %xmm1, %xmm1
vinsertps   $0x10, (%rax,%r9,4), %xmm0, %xmm0
vmovlhps%xmm2, %xmm1, %xmm1
vmovss  (%rax,%rbx,4), %xmm2
vinsertps   $0x10, (%rax,%r11,4), %xmm2, %xmm2
vmovlhps%xmm2, %xmm0, %xmm0
vinsertf128 $0x1, %xmm1, %ymm0, %ymm0
vmulps  %ymm3, %ymm0, %ymm0
vmovups %ymm0, (%r8,%rcx)
addq$32, %rcx
cmpl%esi, %edi
jb  .L4

we now prefer

.L4:
leaq0(%rbp,%rdx,8), %rcx
movl(%rcx), %r10d
movl4(%rcx), %ecx
vmovss  (%rsi,%r10,4), %xmm0
vinsertps   $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
vmulps  %xmm1, %xmm0, %xmm0
vmovlps %xmm0, (%rbx,%rdx,8)
incq%rdx
cmpl%edi, %edx
jb  .L4

which vectorizes with SSE instead of AVX2 which looks like an
improvement.

When testing this on SPEC CPU 2017 with -Ofast -flto -march=znver4
there are quite some cases where we now prefer SSE vectorization
over AVX512 + AVX2 epilogue and some cases where we now reject
vectorization.  Runtime the changes are noise with the off-noise
candidates better after the patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for stage1?

Thanks,
Richard.

* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Tame down element extracts and scalar loads for gather/scatter
similar to elementwise strided accesses.

* gcc.target/i386/pr89618-2.c: New testcase.
* gcc.target/i386/pr88531-2b.c: Adjust.
* gcc.target/i386/pr88531-2c.c: Likewise.
---
 gcc/config/i386/i386.cc|  6 --
 gcc/testsuite/gcc.target/i386/pr88531-2b.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr88531-2c.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr89618-2.c  | 23 ++
 4 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89618-2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 6a8734c2346..7a0b48c62c5 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23555,8 +23555,10 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
   && stmt_info
   && (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
  || STMT_VINFO_TYPE (stmt_info) == store_vec_info_type)
-  && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
-  && TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info))) != INTEGER_CST)
+  && ((STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
+  && (TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info)))
+  != INTEGER_CST))
+ || STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER))
 {
   stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
   stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
diff --git a/gcc/testsuite/gcc.target/i386/pr88531-2b.c 
b/gcc/testsuite/gcc.target/i386/pr88531-2b.c
index 011607c3d54..cdefff2ce8e 100644
--- a/gcc/testsuite/gcc.target/i386/pr88531-2b.c
+++ b/gcc/testsuite/gcc.target/i386/pr88531-2b.c
@@ -3,4 +3,4 @@
 
 #include "pr88531-2a.c"
 
-/* { dg-final { scan-assembler-times "vmulps" 2 } } */
+/* { dg-final { scan-assembler-times "vmulps" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr88531-2c.c 
b/gcc/testsuite/gcc.target/i386/pr88531-2c.c
index 0f7ec3832f8..17b24c0dacc 100644
--- a/gcc/testsuite/gcc.target/i386/pr88531-2c.c
+++ b/gcc/testsuite/gcc.target/i386/pr88531-2c.c
@@ -3,4 +3,4 @@
 
 #include "pr88531-2a.c"
 
-/* { dg-final { scan-assembler-times "vmulps" 2 } } */
+/* { dg-final { scan-assembler-times "vmulps" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89618-2.c 
b/gcc/testsuite/gcc.target/i386/pr89618-2.c
new file mode 100644
index 000..0b7dcfd8806
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89618-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -fdump-tree-vect-details" } */
+
+void foo (int n, int *off, double *a)
+{
+  const int m = 32;
+
+  for (int j = 0; j < n/m; ++j)
+{
+  int const start = j*m;
+  int const end = (j+1)*m;
+
+#pragma GCC ivdep
+  for

Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-24 Thread Jan Hubicka via Gcc-patches
> > 
> > Actually on second thought, I think I can break this either by making
> > the wraping function to be thunk or alias or by moving it to different
> > compilation unit.
> > Also with LTO we will get body later.
> > 
> > So I think we need to drop this optimization.
> 
> It's the same condition guarding the set_{const,pure}_flag call earlier
> (but yes, I agree).  So it isn't covered by the regression so we
> should address this for next stage1.

I will do that next stage1 then.  This bug is mine and it there for a
long time (it doesn't even need LTO to reproduce).  
So it may be regression relative to pre-tree-ssa gccs but fixing such bug can
wait for 13.2.

Honza


[PATCH v2] C, ObjC: Add -Wunterminated-string-initialization

2023-03-24 Thread Alejandro Colomar via Gcc-patches
Warn about the following:

char  s[3] = "foo";

Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake.  Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.

In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.

char  *log_levels[]   = { "info", "warning", "err" };
vs.
char  log_levels[][7] = { "info", "warning", "err" };

This forces the programmer to specify a size, which might change if a
new entry is later added.  Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes.  This warning catches the bug above, so that the programmer
will be able to fix it and write:

char  log_levels[][8] = { "info", "warning", "err" };

This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately.  It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.

Link: 
Link: 
Link: 

Acked-by: Doug McIlroy 
Cc: "G. Branden Robinson" 
Cc: Ralph Corderoy 
Cc: Dave Kemper 
Cc: Larry McVoy 
Cc: Andrew Pinski 
Cc: Jonathan Wakely 
Cc: Andrew Clayton 
Cc: Martin Uecker 
Signed-off-by: Alejandro Colomar 
---

Hi,

I sent v1 to the wrong list.  This time I've made sure to write to
gcc-patches@.

v2 adds some draft of a test, as suggested by Martin.  However, I don't
know yet how to write those, so the test is just a draft.  But I did
test the feature, by compiling GCC and compiling some small program with
it.

Cheers,
Alex


Range-diff against v1:
1:  61ddf1eb816 ! 1:  e40d8f54942 C, ObjC: Add 
-Wunterminated-string-initialization
@@ Commit message
 Cc: Andrew Pinski 
 Cc: Jonathan Wakely 
 Cc: Andrew Clayton 
+Cc: Martin Uecker 
 Signed-off-by: Alejandro Colomar 
 
  ## gcc/c-family/c.opt ##
@@ gcc/c/c-typeck.cc: digest_init (location_t init_loc, tree type, tree 
init, tree
  if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
{
  unsigned HOST_WIDE_INT size
+
+ ## gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c (new) ##
+@@
++/* { dg-do compile } */
++/* { dg-options "-Wunterminated-string-initialization" } */
++
++char a1[] = "a";
++char a2[1] = "a"; /* { dg-warning "unterminated char sequence" } */
++char a3[2] = "a";

 gcc/c-family/c.opt | 4 
 gcc/c/c-typeck.cc  | 6 +++---
 gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c | 6 ++
 3 files changed, 13 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..7f1fccfe02b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1382,6 +1382,10 @@ Wunsuffixed-float-constants
 C ObjC Var(warn_unsuffixed_float_constants) Warning
 Warn about unsuffixed float constants.
 
+Wunterminated-string-initialization
+C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C 
ObjC,Wextra || Wc++-compat)
+Warn about character arrays initialized as unterminated character sequences by 
a string literal.
+
 Wunused
 C ObjC C++ ObjC++ LangEnabledBy(C ObjC C++ ObjC++,Wall)
 ; documented in common.opt
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 45bacc06c47..ce2750f98bb 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -8420,11 +8420,11 @@ digest_init (location_t init_loc, tree type, tree init, 
tree origtype,
pedwarn_init (init_loc, 0,
  ("initializer-string for array of %qT "
   "is too long"), typ1);
- else if (warn_cxx_compat
+ else if (warn_unterminated_string_initialization
   && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
-   warning_at (init_loc, OPT_Wc___compat,
+   warning_at (init_loc, OPT_Wunterminated_string_initialization,
("initializer-string for array of %qT "
-"is too long for C++"), typ1);
+"is too long"), typ1);
  if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
{
  unsigned HOST_WIDE_INT size
diff --git a/gcc/testsuite/gcc.dg/Wunterminated-string-initialization.c 
b/g

[PATCH] RTL: Bugfix for wrong code with v16hi compare & mask

2023-03-24 Thread pan2.li--- via Gcc-patches
From: Pan Li 

Fix the bug of the incorrect code generation for the
below code sample.

typedef unsigned short __attribute__((__vector_size__ (32))) V;
typedef unsigned short u16;

void
foo (V m, u16 *ret)
{
  V v = 6 > ((V) { 2049, 8 } & m);
  *ret = v[0]; // + a + b + c + d;
}

Before this patch.
addisp,sp,-64
ld  a5,0(a0)
li  a4,528384
addia4,a4,-2047
and a5,a5,a4
// sllia5,a5,48 <- eliminated by mistake
// srlia5,a5,48 <- eliminated by mistake
sltiu   a5,a5,6
negwa5,a5
sh  a5,0(a1)

After this patch.
addisp,sp,-64
ld  a5,0(a0)
li  a4,528384
addia4,a4,-2047
and a5,a5,a4
sllia5,a5,48
srlia5,a5,48
sltiu   a5,a5,6
negwa5,a5
sh  a5,0(a1)

The simplify_comparation for the AND operation will
try to simplify below RTL code from:
(and:DI (subreg:DI (reg:HI 154) 0) (const_int 0x801))
to:
(subreg:DI (and (reg:HI 154) (const_int 0x801)) 0)

If reg:HI 154 is 0x801 and reg:DI 154 is 0x80801, the RTL will
be simplified continuely to:
(subreg:DI (reg:HI 154) 0)

That will loss the chance to clean the upper bits of the
reg:DI 154, which result in the slli/srli to be eliminated. This
patch will try 2 times when simplify_gen_binary for both the
reg:HI 154 and the reg:DI 154, and only perform the operation if
the returned simplified RTX equals.

PR 109040

gcc/ChangeLog:

* combine.cc (simplify_comparison):

gcc/testsuite/ChangeLog:

* gcc.dg/pr109040.c: New test.

Signed-off-by: Pan Li 
Co-authored-by: Hongtao Liu 
---
 gcc/combine.cc| 14 +++---
 gcc/testsuite/gcc.target/riscv/pr109040.c | 14 ++
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr109040.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..7a62c95ddc8 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12681,10 +12681,18 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
rtx *pop1)
  && c1 != mask
  && c1 != GET_MODE_MASK (tmode))
{
- op0 = simplify_gen_binary (AND, tmode,
-SUBREG_REG (XEXP (op0, 0)),
+ rtx op0_exp0 = XEXP (op0, 0);
+ machine_mode op0_exp0_mode = GET_MODE (op0_exp0);
+ rtx op0_subreg = simplify_gen_binary (AND, tmode,
+SUBREG_REG (op0_exp0),
 gen_int_mode (c1, tmode));
- op0 = gen_lowpart (mode, op0);
+ rtx op0_reg = simplify_gen_binary (AND, GET_MODE (op0_exp0),
+op0_exp0,
+gen_int_mode (c1, op0_exp0_mode));
+ if (!rtx_equal_p (op0_subreg, op0_reg))
+   break;
+
+ op0 = gen_lowpart (mode, op0_reg);
  continue;
}
}
diff --git a/gcc/testsuite/gcc.target/riscv/pr109040.c 
b/gcc/testsuite/gcc.target/riscv/pr109040.c
new file mode 100644
index 000..079b2156364
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr109040.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O2 -fno-schedule-insns 
-fno-schedule-insns2" } */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+typedef unsigned short u16;
+
+void
+foo (V m, u16 *ret)
+{
+  V v = 6 > ((V) { 2049, 8 } & m);
+  *ret = v[0];
+}
+
+/* { dg-final { scan-assembler-times 
{slli\s+a[0-9]+,\s*a[0-9]+,\s*48\s+srli\s+a[0-9]+,\s*a[0-9]+,\s*48} 1 } } */
-- 
2.34.1



PING: Re: [PATCH] json: preserve key-insertion order [PR109163]

2023-03-24 Thread David Malcolm via Gcc-patches
I'd like to ping the following patch for review:
  https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html

Thanks
Dave

On Fri, 2023-03-17 at 16:53 -0400, David Malcolm wrote:
> PR other/109163 notes that when we write out JSON files, we traverse
> the keys within each object via hash_map iteration, and thus the
> ordering is non-deterministic - it can arbitrarily vary from run to
> run and from different machines, making it harder for users to
> compare
> results and determine if anything has "really" changed.
> 
> I'm running into this issue with SARIF output, but there are several
> places where we're currently emitting JSON:
> 
>   * -fsave-optimization-record emits SRCFILE.opt-record.json.gz
> "This option is experimental and the format of the data
> within
> the compressed JSON file is subject to change."; see
> optinfo-emit-json.{h,cc}, dumpfile.cc, etc
>   * -fdiagnostics-format= with the various "sarif" and "json" options
>   * -fdump-analyzer-json is a developer option in the analyzer
>   * gcov has:
>  "-j, --json-format: Output JSON intermediate format into
>  .gcov.json.gz file"
> 
> This patch adds an auto_vec to class json::object to preserve
> key-insertion order, and use it when writing out objects. 
> Potentially
> this slightly slows down JSON output, but I believe that this isn't
> normally a bottleneck, and that the benefits to the user of
> deterministic output are worth it.
> 
> I had first attempted to use ordered_hash_map.h for this, but ran
> into
> impenetrable template errors, so this patch uses a simpler approach
> of
> just adding an auto_vec to json::object.
> 
> Testing showed a failure of diagnostic-format-json-5.c, which was
> using
> a convoluted set of regexps to consume the output; I believe that
> this
> was brittle, and was intermittently failing for some of the random
> orderings of output.  I rewrote these regexps to work with the
> expected
> output order.  The other such tests seem to pass with the
> now-deterministic orderings.
> 
> Lightly tested with valgrind.
> I manually verified that the SARIF output is now deterministic.
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
> PR other/109163
> * json.cc: Update comments to indicate that we now preserve
> insertion order of keys within objects.
> (object::print): Traverse keys in insertion order.
> (object::set): Preserve insertion order of keys.
> (selftest::test_writing_objects): Add an additional key to
> verify
> that we preserve insertion order.
> * json.h (object::m_keys): New field.
> 
> gcc/testsuite/ChangeLog:
> PR other/109163
> * c-c++-common/diagnostic-format-json-1.c: Update comment.
> * c-c++-common/diagnostic-format-json-2.c: Likewise.
> * c-c++-common/diagnostic-format-json-3.c: Likewise.
> * c-c++-common/diagnostic-format-json-4.c: Likewise.
> * c-c++-common/diagnostic-format-json-5.c: Rewrite regexps.
> * c-c++-common/diagnostic-format-json-stderr-1.c: Update
> comment.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/json.cc   |  40 ---
>  gcc/json.h    |  10 +-
>  .../c-c++-common/diagnostic-format-json-1.c   |   3 +-
>  .../c-c++-common/diagnostic-format-json-2.c   |   3 +-
>  .../c-c++-common/diagnostic-format-json-3.c   |   3 +-
>  .../c-c++-common/diagnostic-format-json-4.c   |   3 +-
>  .../c-c++-common/diagnostic-format-json-5.c   | 100 ++--
> --
>  .../diagnostic-format-json-stderr-1.c |   3 +-
>  8 files changed, 95 insertions(+), 70 deletions(-)
> 
> diff --git a/gcc/json.cc b/gcc/json.cc
> index 01879c9c466..741e97b20e5 100644
> --- a/gcc/json.cc
> +++ b/gcc/json.cc
> @@ -31,8 +31,11 @@ using namespace json;
>  /* class json::value.  */
>  
>  /* Dump this json::value tree to OUTF.
> -   No formatting is done.  There are no guarantees about the order
> -   in which the key/value pairs of json::objects are printed.  */
> +
> +   No formatting is done.
> +
> +   The key/value pairs of json::objects are printed in the order
> +   in which the keys were originally inserted.  */
>  
>  void
>  value::dump (FILE *outf) const
> @@ -44,7 +47,7 @@ value::dump (FILE *outf) const
>  }
>  
>  /* class json::object, a subclass of json::value, representing
> -   an unordered collection of key/value pairs.  */
> +   an ordered collection of key/value pairs.  */
>  
>  /* json:object's dtor.  */
>  
> @@ -62,14 +65,17 @@ object::~object ()
>  void
>  object::print (pretty_printer *pp) const
>  {
> -  /* Note that the order is not guaranteed.  */
>    pp_character (pp, '{');
> -  for (map_t::iterator it = m_map.begin (); it != m_map.end ();
> ++it)
> +
> +  /* Iterate in the order that the keys were inserted.  */
> +  unsigned i;
> +  const char *key;
> +  FOR_EACH_VEC_ELT (m_keys, i, key)
>  {
> -

Re: [PATCH v2] C, ObjC: Add -Wunterminated-string-initialization

2023-03-24 Thread David Malcolm via Gcc-patches
On Fri, 2023-03-24 at 14:39 +0100, Alejandro Colomar via Gcc-patches
wrote:
> Warn about the following:
> 
>     char  s[3] = "foo";
> 
> Initializing a char array with a string literal of the same length as
> the size of the array is usually a mistake.  Rarely is the case where
> one wants to create a non-terminated character sequence from a string
> literal.
> 
> In some cases, for writing faster code, one may want to use arrays
> instead of pointers, since that removes the need for storing an array
> of
> pointers apart from the strings themselves.
> 
>     char  *log_levels[]   = { "info", "warning", "err" };
> vs.
>     char  log_levels[][7] = { "info", "warning", "err" };
> 
> This forces the programmer to specify a size, which might change if a
> new entry is later added.  Having no way to enforce null termination
> is
> very dangerous, however, so it is useful to have a warning for this,
> so
> that the compiler can make sure that the programmer didn't make any
> mistakes.  This warning catches the bug above, so that the programmer
> will be able to fix it and write:
> 
>     char  log_levels[][8] = { "info", "warning", "err" };
> 
> This warning already existed as part of -Wc++-compat, but this patch
> allows enabling it separately.  It is also included in -Wextra, since
> it may not always be desired (when unterminated character sequences
> are
> wanted), but it's likely to be desired in most cases.
> 
> Link:
> 
> Link:
> 
> Link:
>  3...@gmail.com/T/>
> Acked-by: Doug McIlroy 
> Cc: "G. Branden Robinson" 
> Cc: Ralph Corderoy 
> Cc: Dave Kemper 
> Cc: Larry McVoy 
> Cc: Andrew Pinski 
> Cc: Jonathan Wakely 
> Cc: Andrew Clayton 
> Cc: Martin Uecker 
> Signed-off-by: Alejandro Colomar 
> ---
> 
> Hi,

Hi Alex, thanks for the patch.

> 
> I sent v1 to the wrong list.  This time I've made sure to write to
> gcc-patches@.

Note that right now we're deep in bug-fixing/stabilization for GCC 13
(and trunk is still tracking that effort), so your patch might be more
suitable for GCC 14.

> 
> v2 adds some draft of a test, as suggested by Martin.  However, I
> don't
> know yet how to write those, so the test is just a draft.  But I did
> test the feature, by compiling GCC and compiling some small program
> with
> it.

Unfortunately the answer to the question "how do I run just one
testcase in GCC's testsuite" is rather non-trivial; FWIW I've written
up some notes on working with the GCC testsuite here:
https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html

Hope this is helpful
Dave


[...snip...]



[PATCH] PR tree-optimization/109274 - Don't interpret contents of a value_relation record.

2023-03-24 Thread Andrew MacLeod via Gcc-patches
Before floating point relations were added, we tried to sanitize 
value-relation records to not include non-sensensical records... ie x != 
x or x < x.   Instead, we made a VREL_VARYING record with no operands.


When floating point relation support was added, some of these were no 
longer non-sensical, AND we expanded the use of value_relation records 
into GORI shortly thereafter.


As a result, this sanitization is no longer needed, nor desired. The 
Oracle does not create records with op1 == op2 already, so its only 
within GORI that these records can exist, and we shouldn't try to 
interpret them.


The bug occurs because the "sanitized" records doesn't set op1 and op2, 
and changes the relation to VARYING..  and we expected the operands it 
to be set the way they were specified.  We should not be setting a 
VREL_VARYING record if asked to set something else.  In fact, we are 
missing some opportunities because we are trying to FP range-ops that 
op1 != op1  but its getting transformed into a VREL_VARYING record and 
not communicated properly.


Currently bootstrapping on x86_64-pc-linux-gnu and assuming no 
regressions, OK for trunk?


Andrew
commit 1f02961b23976d35b10e2399708c6eb00632f9d6
Author: Andrew MacLeod 
Date:   Fri Mar 24 09:18:33 2023 -0400

Don't interpret contents of a value_relation record.

before floating point relations were added, we tried to sanitize
value-relation records to not include non-sensensical records... ie
x != x or x < x.   INstead, we made a VREL_VARYING record with no
operands.

When floating point relations were supported, some of these were no
longer non-sensical, AND we expanded the use of value_relation records
into GORI.

As a result, this sanitization is no longer needed.  The Oracle
does not create records with op1 == op2, so its only within GORI
that these records can exist, and we shouldnt try to interpret them.

The bug occurs because the "sanitized" records doesnt set op1 anmd op2,
but we have a record so expected it to be set.

PR tree-optimization/109265
PR tree-optimization/109274
gcc/
* value-relation.h (value_relation::set_relation): Always create the
record that is requested.

gcc/testsuite/
* gcc.dg/pr109274.c: New.

diff --git a/gcc/testsuite/gcc.dg/pr109274.c b/gcc/testsuite/gcc.dg/pr109274.c
new file mode 100644
index 000..5dbc0232f8e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109274.c
@@ -0,0 +1,16 @@
+/* PR tree-optimization/109274 */
+/* { dg-do compile } */
+/* { dg-options "-O2 " } */
+
+float a, b, c;
+int d;
+float bar (void);
+
+void
+foo (void)
+{
+  a = 0 * -(2.0f * c);
+  d = a != a ? 0 : bar ();
+  b = c;
+}
+
diff --git a/gcc/value-relation.h b/gcc/value-relation.h
index 36a75862cc7..3177ecb1ad0 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -445,13 +445,6 @@ value_relation::set_relation (relation_kind r, tree n1, tree n2)
 {
   gcc_checking_assert (TREE_CODE (n1) == SSA_NAME
 		   && TREE_CODE (n2) == SSA_NAME);
-  if (n1 == n2 && r != VREL_EQ)
-{
-  related = VREL_VARYING;
-  name1 = NULL_TREE;
-  name2 = NULL_TREE;
-  return;
-}
   related = r;
   name1 = n1;
   name2 = n2;


Re: [PATCH] PR tree-optimization/109274 - Don't interpret contents of a value_relation record.

2023-03-24 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 24, 2023 at 11:08:54AM -0400, Andrew MacLeod wrote:
> Before floating point relations were added, we tried to sanitize
> value-relation records to not include non-sensensical records... ie x != x
> or x < x.   Instead, we made a VREL_VARYING record with no operands.
> 
> When floating point relation support was added, some of these were no longer
> non-sensical, AND we expanded the use of value_relation records into GORI
> shortly thereafter.
> 
> As a result, this sanitization is no longer needed, nor desired. The Oracle
> does not create records with op1 == op2 already, so its only within GORI
> that these records can exist, and we shouldn't try to interpret them.
> 
> The bug occurs because the "sanitized" records doesn't set op1 and op2, and
> changes the relation to VARYING..  and we expected the operands it to be set
> the way they were specified.  We should not be setting a VREL_VARYING record
> if asked to set something else.  In fact, we are missing some opportunities
> because we are trying to FP range-ops that op1 != op1  but its getting
> transformed into a VREL_VARYING record and not communicated properly.
> 
> Currently bootstrapping on x86_64-pc-linux-gnu and assuming no regressions,
> OK for trunk?
> 
> Andrew

> commit 1f02961b23976d35b10e2399708c6eb00632f9d6
> Author: Andrew MacLeod 
> Date:   Fri Mar 24 09:18:33 2023 -0400
> 
> Don't interpret contents of a value_relation record.
> 
> before floating point relations were added, we tried to sanitize
> value-relation records to not include non-sensensical records... ie
> x != x or x < x.   INstead, we made a VREL_VARYING record with no

s/IN/In/

> operands.
> 
> When floating point relations were supported, some of these were no
> longer non-sensical, AND we expanded the use of value_relation records
> into GORI.
> 
> As a result, this sanitization is no longer needed.  The Oracle
> does not create records with op1 == op2, so its only within GORI
> that these records can exist, and we shouldnt try to interpret them.

s/shouldnt/shouldn't/
> 
> The bug occurs because the "sanitized" records doesnt set op1 anmd op2,

s/doesnt/doesn't/

> but we have a record so expected it to be set.
> 
> PR tree-optimization/109265
> PR tree-optimization/109274
> gcc/
> * value-relation.h (value_relation::set_relation): Always create 
> the
> record that is requested.
> 
> gcc/testsuite/
> * gcc.dg/pr109274.c: New.

LGTM, indeed with floating point  a != a isn't nonsensical but basically
__builtin_isnan (a) check.

I'll commit the Fortran testcase I've added in my version of the patch
incrementally when you commit.

Jakub



[PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-03-24 Thread Frederik Harwath
Hi,
this patch series implements the OpenMP 5.1 "unroll" and "tile"
constructs.  It includes changes to the C,C++, and Fortran front end
for parsing the new constructs and a new middle-end
"omp_transform_loops" pass which implements the transformations in a
source language agnostic way.  The "unroll" and "tile" directives are
internally implemented as clauses.  This fits the representation of
collapsed loop nests by a single internal gomp_for construct.  Loop
transformations can be applied to loops at the different levels of
such a loop nest and this can be represented well with the clause
representation.  The transformations can also be applied to loops
which are not going to be associated with any OpenMP directive after
the transformation. This is represented by a new gomp_for kind.  Loops
of this kind are lowered in the transformation pass since they are not
subject to any further OpenMP-specific processing.

The patches are roughly presented in the order of their development:
Each construct is implemented in the Fortran front end first including
the middle-end additions/changes, followed by a patch that adds the C
and C++ front end changes.  This initial implementation supports the
loop transformation constructs on the outermost loop of a loop nest
only.  The support for applying the transformations to inner loops is
then added in two further patches.

The patches have been bootstrapped and tested on x86_64-linux-gnu with
both nvptx-none and amdgcn-amdhsa offloading.

Best regards,
Frederik

Frederik Harwath (7):
  openmp: Add Fortran support for "omp unroll" directive
  openmp: Add C/C++ support for "omp unroll" directive
  openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE
  openmp: Add Fortran support for "omp tile"
  openmp: Add C/C++ support for "omp tile"
  openmp: Add Fortran support for loop transformations on inner loops
  openmp: Add C/C++ support for loop transformations on inner loops

 gcc/Makefile.in   |1 +
 gcc/c-family/c-gimplify.cc|1 +
 gcc/c-family/c-omp.cc |   12 +-
 gcc/c-family/c-pragma.cc  |2 +
 gcc/c-family/c-pragma.h   |7 +-
 gcc/c/c-parser.cc |  403 +++-
 gcc/c/c-typeck.cc |   10 +-
 gcc/cp/cp-gimplify.cc |3 +
 gcc/cp/parser.cc  |  453 -
 gcc/cp/pt.cc  |   15 +-
 gcc/cp/semantics.cc   |  104 +-
 gcc/fortran/dump-parse-tree.cc|   30 +
 gcc/fortran/gfortran.h|   12 +-
 gcc/fortran/match.h   |2 +
 gcc/fortran/openmp.cc |  460 -
 gcc/fortran/parse.cc  |   52 +-
 gcc/fortran/resolve.cc|6 +
 gcc/fortran/st.cc |2 +
 gcc/fortran/trans-openmp.cc   |  187 +-
 gcc/fortran/trans.cc  |2 +
 gcc/gimple-pretty-print.cc|6 +
 gcc/gimple.h  |1 +
 gcc/gimplify.cc   |   79 +-
 gcc/omp-general.cc|   22 +-
 gcc/omp-general.h |1 +
 gcc/omp-low.cc|6 +-
 gcc/omp-transform-loops.cc| 1773 +
 gcc/params.opt|9 +
 gcc/passes.def|1 +
 .../loop-transforms/imperfect-loop-nest.c |   12 +
 .../gomp/loop-transforms/tile-1.c |  164 ++
 .../gomp/loop-transforms/tile-2.c |  183 ++
 .../gomp/loop-transforms/tile-3.c |  117 ++
 .../gomp/loop-transforms/tile-4.c |  322 +++
 .../gomp/loop-transforms/tile-5.c |  150 ++
 .../gomp/loop-transforms/tile-6.c |   34 +
 .../gomp/loop-transforms/tile-7.c |   31 +
 .../gomp/loop-transforms/tile-8.c |   40 +
 .../gomp/loop-transforms/unroll-1.c   |  133 ++
 .../gomp/loop-transforms/unroll-2.c   |   95 +
 .../gomp/loop-transforms/unroll-3.c   |   18 +
 .../gomp/loop-transforms/unroll-4.c   |   19 +
 .../gomp/loop-transforms/unroll-5.c   |   19 +
 .../gomp/loop-transforms/unroll-6.c   |   20 +
 .../gomp/loop-transforms/unroll-7.c   |  144 ++
 .../gomp/loop-transforms/unroll-inner-1.c |   15 +
 .../gomp/loop-transforms/unroll-inner-2.c |   31 +
 .../gomp/loop-transforms/unroll-non-rect-1.c  |   37 +
 .../gomp/loop-transforms/unroll-non-rect-2.c  |   22 +
 .../gomp/loop-transforms/unroll-simd-1.c  |   84 +
 .../g++.dg/gomp/loop-transforms/tile-1.h  |   27 +
 .../g++.dg/gomp/loop-transforms/tile-1a.C |   27 +
 .../g++.dg/gomp/loop-transforms/tile-1b.C |   27 +
 .../g++.dg/gomp/loop-transforms/unroll-1.C  

[PATCH 2/7] openmp: Add C/C++ support for "omp unroll" directive

2023-03-24 Thread Frederik Harwath
This commit implements the C and the C++ front end changes to support
the "omp unroll" directive.  The execution of the loop transformation
relies on the pass that has been added as a part of the earlier
Fortran patch.

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_UNROLL.
* c-omp.cc: Add "unroll" to omp_directives[].
* c-pragma.cc: Add "unroll" to omp_pragmas_simd[].
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_UNROLL to
pragma_kind and adjust PRAGMA_OMP__LAST_.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_name): Handle "full" and
"partial" clauses.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(c_parser_omp_clause_unroll_full): New function for parsing
the "unroll clause".
(c_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL
and PRAGMA_OMP_CLAUSE_PARTIAL.
(c_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(c_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
(c_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
* c-typeck.cc (c_finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_UNROLL.
(cp_fold_r): Likewise.
(cp_genericize_r): Likewise.
* parser.cc (cp_parser_omp_clause_name): Handle "full" clause.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(cp_parser_omp_clause_unroll_full): New function for parsing
the "unroll clause".
(cp_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_UNROLL and
OMP_CLAUSE_FULL.
(cp_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(cp_parser_omp_for_loop): Handle "omp unroll" directives
between directive and loop.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(cp_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.

(cp_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
(cp_parser_pragma): Handle PRAGMA_OMP_UNROLL.
* pt.cc (tsubst_omp_clauses): Handle
OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_NONE.
(tsubst_expr): Handle OMP_UNROLL.
* semantics.cc (finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.

libgomp/ChangeLog:

* testsuite/libgomp.c++/loop-transforms/unroll-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/unroll-2.C: New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/unroll-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-3.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-4.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-5.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-6.c: New test.
* g++.dg/gomp/loop-transforms/unroll-1.C: New test.
* g++.dg/gomp/loop-transforms/unroll-2.C: New test.
* g++.dg/gomp/loop-transforms/unroll-3.C: New test.
---
 gcc/c-family/c-gimplify.cc|   1 +
 gcc/c-family/c-omp.cc |   6 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   5 +-
 gcc/c/c-parser.cc | 161 -
 gcc/c/c-typeck.cc |   8 +
 gcc/cp/cp-gimplify.cc |   3 +
 gcc/cp/parser.cc  | 164 +-
 gcc/cp/pt.cc  |   4 +
 gcc/cp/semantics.cc   |  56 ++
 .../gomp/loop-transforms/unroll-1.c   | 133 ++
 .../gomp/loop-transforms/unroll-2.c   |  99 +++
 .../gomp/loop-transforms/unroll-3.c   |  18 ++
 .../gomp/loop-transforms/unroll-4.c   |  19 ++
 .../gomp/loop-transforms/unroll-5.c   |  19 ++
 .../gomp/loop-transforms/unroll-6.c   |  20 +++
 .../gomp/loop-transforms/unroll-7.c   | 14

[PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE

2023-03-24 Thread Frederik Harwath
OMP_CLAUSE_TILE will be used for the OpenMP 5.1 loop transformation
construct "omp tile".

gcc/ChangeLog:

* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_LIST): Rename to ...
(OMP_CLAUSE_OACC_TILE_LIST): ... this.
(OMP_CLAUSE_TILE_ITERVAR): Rename to ...
(OMP_CLAUSE_OACC_TILE_ITERVAR): ... this.
(OMP_CLAUSE_TILE_COUNT): Rename to ...
(OMP_CLAUSE_OACC_TILE_COUNT): this.
* gimplify.cc (gimplify_scan_omp_clauses): Adjust to renamings.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_for): Likewise.
* omp-general.cc (omp_extract_for_data): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_oacc_head_mark): Likewise.
* tree-nested.cc (convert_nonlocal_omp_clauses): Likewise.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.cc: Likewise.

gcc/c-family/ChangeLog:

* c-omp.cc (c_oacc_split_loop_clauses): Adjust to renamings.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_collapse): Adjust to renamings.
(c_parser_oacc_clause_tile): Likewise.
(c_parser_omp_for_loop): Likewise.
* c-typeck.cc (c_finish_omp_clauses): Likewise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_clause_tile): Adjust to renamings.
(cp_parser_omp_clause_collapse): Likewise.
(cp_parser_omp_for_loop): Likewise.
* pt.cc (tsubst_omp_clauses): Likewise.
* semantics.cc (finish_omp_clauses): Likewise.
(finish_omp_for): Likewise.

gcc/fortran/ChangeLog:

* openmp.cc (enum omp_mask2): Adjust to renamings.
(gfc_match_omp_clauses): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
---
 gcc/c-family/c-omp.cc   |  2 +-
 gcc/c/c-parser.cc   | 12 ++--
 gcc/c/c-typeck.cc   |  2 +-
 gcc/cp/parser.cc| 12 ++--
 gcc/cp/pt.cc|  2 +-
 gcc/cp/semantics.cc |  8 
 gcc/fortran/openmp.cc   |  6 +++---
 gcc/fortran/trans-openmp.cc |  4 ++--
 gcc/gimplify.cc |  8 
 gcc/omp-general.cc  |  8 
 gcc/omp-low.cc  |  6 +++---
 gcc/tree-core.h |  2 +-
 gcc/tree-nested.cc  |  4 ++--
 gcc/tree-pretty-print.cc|  4 ++--
 gcc/tree.cc |  2 +-
 gcc/tree.h  | 12 ++--
 16 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 85ba9c528c8..fec7f337772 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -1749,7 +1749,7 @@ c_oacc_split_loop_clauses (tree clauses, tree 
*not_loop_clauses,
 {
  /* Loop clauses.  */
case OMP_CLAUSE_COLLAPSE:
-   case OMP_CLAUSE_TILE:
+   case OMP_CLAUSE_OACC_TILE:
case OMP_CLAUSE_GANG:
case OMP_CLAUSE_WORKER:
case OMP_CLAUSE_VECTOR:
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 9d875befccc..e7c9da99552 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14183,7 +14183,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tree 
list)
   location_t loc;

   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");

   loc = c_parser_peek_token (parser)->location;
   matching_parens parens;
@@ -15349,7 +15349,7 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   location_t loc;
   tree tile = NULL_TREE;

-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");

   loc = c_parser_peek_token (parser)->location;
@@ -15401,9 +15401,9 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   /* Consume the trailing ')'.  */
   c_parser_consume_token (parser);

-  c = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  c = build_omp_clause (loc, OMP_CLAUSE_OACC_TILE);
   tile = nreverse (tile);
-  OMP_CLAUSE_TILE_LIST (c) = tile;
+  OMP_CLAUSE_OACC_TILE_LIST (c) = tile;
   OMP_CLAUSE_CHAIN (c) = list;
   return c;
 }
@@ -20270,10 +20270,10 @@ c_parser_omp_for_loop (location_t loc, c_parser 
*parser, enum tree_code code,
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
-else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE)
   {
tiling = true;
-   collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+   collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl));
   }
 else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
 && OMP_CLAUSE_ORDERED_EXPR (cl))
diff --git a/gcc/

[og12] In 'libgomp/target.c:gomp_unmap_vars_internal', defer 'gomp_remove_var' (was: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0)

2023-03-24 Thread Thomas Schwinge
Hi!

On 2020-12-04T22:15:46+0800, Chung-Lin Tang  wrote:
> this is a new version of the structure element mapping patch for OpenMP 5.0 
> requirement
> changes.

>   (gomp_exit_data): [...]
>   adjust to queue splay-tree keys for removal
>   after main loop.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -2485,14 +2714,17 @@ gomp_exit_data (struct gomp_device_descr *devicep, 
> size_t mapnum,

> +  int nrmvars = 0;
> +  splay_tree_key remove_vars[mapnum];
> +
>for (i = 0; i < mapnum; i++)
>  {

> -   if (k->refcount == 0)
> - gomp_remove_var (devicep, k);
> +
> +   /* Structure elements lists are removed altogether at once, which
> +  may cause immediate deallocation of the target_mem_desc, causing
> +  errors if we still have following element siblings to copy back.
> +  While we're at it, it also seems more disciplined to simply
> +  queue all removals together for processing below.
> +
> +  Structured block unmapping (i.e. gomp_unmap_vars_internal) should
> +  not have this problem, since they maintain an additional
> +  tgt->refcount = 1 reference to the target_mem_desc to start with.
> +   */
> +   if (do_remove)
> + remove_vars[nrmvars++] = k;

>  }
>
> +  for (int i = 0; i < nrmvars; i++)
> +gomp_remove_var (devicep, remove_vars[i]);
> +
>gomp_mutex_unlock (&devicep->lock);
>  }

Upcoming work of mine actually now does require this change also for
'gomp_unmap_vars_internal', such that 'gomp_remove_var' be deferred until
after all 'gomp_copy_dev2host' calls have been handled.
I've pushed to devel/omp/gcc-12
commit 65037818987ffce7d6f466fa8bde13e9f59a3218
"In 'libgomp/target.c:gomp_unmap_vars_internal', defer 'gomp_remove_var'",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 65037818987ffce7d6f466fa8bde13e9f59a3218 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 14 Mar 2023 19:42:12 +0100
Subject: [PATCH] In 'libgomp/target.c:gomp_unmap_vars_internal', defer
 'gomp_remove_var'

An upcoming change requires that 'gomp_remove_var' be deferred until after all
'gomp_copy_dev2host' calls have been handled.

Do this likewise to how commit 275c736e732d29934e4d22e8f030d5aae8c12a52
"libgomp: Structure element mapping for OpenMP 5.0" changed 'gomp_exit_data'.

	libgomp/
	* target.c (gomp_unmap_vars_internal): Queue splay-tree keys for
	removal after main loop.
---
 libgomp/ChangeLog.omp |  3 +++
 libgomp/target.c  | 34 +++---
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 85ebab14ba8..9360db66b03 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,8 @@
 2023-03-24  Thomas Schwinge  
 
+	* target.c (gomp_unmap_vars_internal): Queue splay-tree keys for
+	removal after main loop.
+
 	PR other/76739
 	* oacc-parallel.c (GOACC_parallel_keyed): Given OpenACC 'async',
 	defer 'free' of non-contiguous array support data structures.
diff --git a/libgomp/target.c b/libgomp/target.c
index aaa597f6610..107c3567a30 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2180,6 +2180,9 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
 			 false, NULL);
 }
 
+  size_t nrmvars = 0;
+  splay_tree_key remove_vars[tgt->list_count];
+
   for (i = 0; i < tgt->list_count; i++)
 {
   splay_tree_key k = tgt->list[i].key;
@@ -2201,16 +2204,21 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
 			(void *) (k->tgt->tgt_start + k->tgt_offset
   + tgt->list[i].offset),
 			tgt->list[i].length);
+  /* Queue all removals together for processing below.
+	 See also 'gomp_exit_data'.  */
   if (do_remove)
-	{
-	  struct target_mem_desc *k_tgt = k->tgt;
-	  bool is_tgt_unmapped = gomp_remove_var (devicep, k);
-	  /* It would be bad if TGT got unmapped while we're still iterating
-	 over its LIST_COUNT, and also expect to use it in the following
-	 code.  */
-	  assert (!is_tgt_unmapped
-		  || k_tgt != tgt);
-	}
+	remove_vars[nrmvars++] = k;
+}
+
+  for (i = 0; i < nrmvars; i++)
+{
+  splay_tree_key k = remove_vars[i];
+  struct target_mem_desc *k_tgt = k->tgt;
+  bool is_tgt_unmapped = gomp_remove_var (devicep, k);
+  /* It would be bad if TGT got unmapped while we're still iterating over
+	 its LIST_COUNT, and also expect to use it in the following code.  */
+  assert (!is_tgt_unmapped
+	  || k_tgt != tgt);
 }
 
   if (aq)
@@ -4157,7 +4165,7 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
 			   false, NULL);
   }
 
-  int nrmvars = 0;
+  size_t nrmvars = 0;
   splay_tre

[og12] libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling)

2023-03-24 Thread Thomas Schwinge
Hi!

On 2023-03-21T16:53:31+0100, I wrote:
> On 2022-08-26T11:07:28+0200, Tobias Burnus  wrote:
>> This patch adds initial [OpenMP reverse offload] support for nvptx.
>
>> CUDA does lockup when trying to copy data from the currently running
>> stream; hence, a new stream is generated to do the memory copying.
>
> As part of other work, where I had to touch those special code paths, I
> found that we may reduce complexity a little bit "by using the existing
> 'goacc_asyncqueue' instead of re-coding parts of it".  OK to push
> "libgomp: Simplify OpenMP reverse offload host <-> device memory copy 
> implementation"
> (still testing), see attached?

My other work now actually does depend on this; I've pushed to
devel/omp/gcc-12 branch commit c276fa0616eb79ddc4d0245e775a841e84cbb7dd
"libgomp: Simplify OpenMP reverse offload host <-> device memory copy 
implementation",
see attached.

May this also go into master branch still at this time, or "next year"?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From c276fa0616eb79ddc4d0245e775a841e84cbb7dd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 21 Mar 2023 16:14:16 +0100
Subject: [PATCH] libgomp: Simplify OpenMP reverse offload host <-> device
 memory copy implementation

... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.

Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling",
and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8
"libgomp: Handle OpenMP's reverse offloads".

	libgomp/
	* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
	'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
	* libgomp.h (gomp_target_rev): Adjust.
	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
	* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
	* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
	(rev_off_host_to_dev_cpy): Remove.
	(GOMP_OFFLOAD_run): Adjust.
---
 libgomp/ChangeLog.omp |  10 
 libgomp/libgomp-plugin.c  |   7 +--
 libgomp/libgomp-plugin.h  |   6 +-
 libgomp/libgomp.h |   5 +-
 libgomp/plugin/plugin-gcn.c   |   2 +-
 libgomp/plugin/plugin-nvptx.c |  77 ++---
 libgomp/target.c  | 102 +++---
 7 files changed, 106 insertions(+), 103 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 9360db66b03..fb352b39a6d 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,15 @@
 2023-03-24  Thomas Schwinge  
 
+	* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
+	'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
+	* libgomp.h (gomp_target_rev): Adjust.
+	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
+	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
+	* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
+	* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
+	(rev_off_host_to_dev_cpy): Remove.
+	(GOMP_OFFLOAD_run): Adjust.
+
 	* target.c (gomp_unmap_vars_internal): Queue splay-tree keys for
 	removal after main loop.
 
diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
index 316de749f69..c76fa63da83 100644
--- a/libgomp/libgomp-plugin.c
+++ b/libgomp/libgomp-plugin.c
@@ -82,11 +82,8 @@ GOMP_PLUGIN_fatal (const char *msg, ...)
 void
 GOMP_PLUGIN_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t devaddrs_ptr,
 			uint64_t sizes_ptr, uint64_t kinds_ptr, int dev_num,
-			void (*dev_to_host_cpy) (void *, const void *, size_t,
-		 void *),
-			void (*host_to_dev_cpy) (void *, const void *, size_t,
-		 void *), void *token)
+			struct goacc_asyncqueue *aq)
 {
   gomp_target_rev (fn_ptr, mapnum, devaddrs_ptr, sizes_ptr, kinds_ptr, dev_num,
-		   dev_to_host_cpy, host_to_dev_cpy, token);
+		   aq);
 }
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 66d995f33e8..ca557a79380 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -122,11 +122,7 @@ extern void GOMP_PLUGIN_fatal (const char *, ...)
 	__attribute__ ((noreturn, format (printf, 1, 2)));
 
 extern void GOMP_PLUGIN_target_rev (uint64_t, uint64_t, uint64_t, uint64_t,
-uint64_t, int,
-void (*) (void *, const void *, size_t,
-	  void *),
-void (*) (void *, const void *, size_t,
-	  void *), void *);
+uint64_t, int, struct goacc_asyncqueue *);
 
 /* Prototypes for functions implemented by libgomp plugins.  */
 extern const char *GOMP_OFFLOAD_get_name (void);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 92f6f14960f..3b2b4aa9534 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1127,10 +1127,7 @@ extern void gomp_init_

Re: [PATCH, OpenACC, v3] Non-contiguous array support for OpenACC data clauses

2023-03-24 Thread Thomas Schwinge
Hi!

On 2023-03-15T15:47:47+0100, I wrote:
> On 2019-11-26T22:49:21+0800, Chung-Lin Tang  wrote:
>> this is a reorg of the last non-contiguous arrays patch.
>
> (Sorry, this is still not the master branch integration email...)
>
>
> I noticed the following while working on something else:
>
>> --- libgomp/oacc-parallel.c   (revision 278656)
>> +++ libgomp/oacc-parallel.c   (working copy)
>
>> @@ -311,8 +488,10 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (voi
>>
>>goacc_aq aq = get_goacc_asyncqueue (async);
>>
>> -  tgt = gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, NULL, sizes, 
>> kinds,
>> -  true, GOMP_MAP_VARS_OPENACC);
>> +  tgt = gomp_map_vars_openacc (acc_dev, aq, mapnum, hostaddrs, sizes, kinds,
>> +nca_info);
>> +  free (nca_info);
>
> Given OpenACC 'async', don't we have to defer 'free' of the
> non-contiguous array support data structure here?  But I'm not completely
> sure -- can we rule out that any asynchronous copying of any data of
> 'nca_info' is still going on after returning from 'gomp_map_vars'?
>
>> @@ -488,9 +666,19 @@ GOACC_data_start (int flags_m, size_t mapnum,
>
>> -  tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
>> -GOMP_MAP_VARS_OPENACC);
>> +  tgt = gomp_map_vars_openacc (acc_dev, NULL, mapnum, hostaddrs, sizes, 
>> kinds,
>> +nca_info);
>> +  free (nca_info);
>
> Here, it's not relevant, as there is no 'async' support (yet) for 'data'
> constructs.
>
>> --- libgomp/target.c  (revision 278656)
>> +++ libgomp/target.c  (working copy)
>
>> @@ -1044,6 +1114,98 @@ gomp_map_vars_internal (struct gomp_device_descr *
>
>> +   void *ptrblock = goacc_noncontig_array_create_ptrblock
>> + (nca, target_ptrblock);
>> +   gomp_copy_host2dev (devicep, aq, target_ptrblock, ptrblock,
>> +   nca->ptrblock_size, cbufp);
>> +   free (ptrblock);
>
> Here again, however, don't we have to defer the 'free'?
>
> Please verify the attached
> "Given OpenACC 'async', defer 'free' of non-contiguous array support data 
> structures",
> in particular the 'libgomp/oacc-parallel.c:GOACC_parallel_keyed' case.

To allow me to make progress with the "something else" (depending on
this), I've now pushed to devel/omp/gcc-12
commit a1f6758ae08fa748b291954371859e0158d4d667
"Given OpenACC 'async', defer 'free' of non-contiguous array support data 
structures [PR76739]",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From a1f6758ae08fa748b291954371859e0158d4d667 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 15 Mar 2023 13:34:02 +0100
Subject: [PATCH] Given OpenACC 'async', defer 'free' of non-contiguous array
 support data structures [PR76739]

Fix-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".

	PR other/76739
	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Given OpenACC 'async',
	defer 'free' of non-contiguous array support data structures.
	* target.c (gomp_map_vars_internal): Likewise.
---
 libgomp/ChangeLog.omp   | 7 +++
 libgomp/oacc-parallel.c | 5 -
 libgomp/target.c| 6 +-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index ace54f2f82f..85ebab14ba8 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,10 @@
+2023-03-24  Thomas Schwinge  
+
+	PR other/76739
+	* oacc-parallel.c (GOACC_parallel_keyed): Given OpenACC 'async',
+	defer 'free' of non-contiguous array support data structures.
+	* target.c (gomp_map_vars_internal): Likewise.
+
 2023-03-23  Tobias Burnus  
 
 	* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 9cd99b4a0b4..136702d6e61 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -470,7 +470,10 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
 = goacc_map_vars (acc_dev, aq, mapnum, hostaddrs, NULL, sizes, kinds,
 		  nca_info, true, GOMP_MAP_VARS_TARGET);
 
-  free (nca_info);
+  if (aq == NULL)
+free (nca_info);
+  else
+acc_dev->openacc.async.queue_callback_func (aq, free, nca_info);
 
   if (profiling_p)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index 96ece0b31fd..aaa597f6610 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1938,7 +1938,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		(nca, target_ptrblock);
 		  gomp_copy_host2dev (devicep, aq, target_ptrblock, ptrblock,
   nca->ptrblock_size, false, cbufp);
-		  free (ptrblock);
+		  if (aq)
+		/* Free once the transfer 

[PATCH 4/7] openmp: Add Fortran support for "omp tile"

2023-03-24 Thread Frederik Harwath
This commit implements the Fortran front end support for the "omp
tile" directive and the corresponding middle end transformation.

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE.
(enum gfc_exec_op): Add EXEC_OMP_TILE.
(loop_transform_p): New declaration.
(struct gfc_omp_clauses): Add "tile_sizes" field.
* dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping.
(show_omp_node): Handle EXEC_OMP_TILE.
(show_code_node): Likewise.
* match.h (gfc_match_omp_tile): New declaration.
* openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field.
(match_tile_sizes): New function.
(OMP_TILE_CLAUSES): New macro.
(gfc_match_omp_tile): New function.
(resolve_omp_do): Handle EXEC_OMP_TILE.
(resolve_omp_tile): New function.
(omp_code_to_statement): Handle EXEC_OMP_TILE.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE
and ST_OMP_TILE.
(next_statement): Handle ST_OMP_TILE.
(gfc_ascii_statement): Likewise.
(parse_omp_do): Likewise.
(parse_executable): Likewise.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field.
(loop_transform_p): New function.
(gfc_expr_list_len): New function.
(gfc_trans_omp_do): Handle EXEC_OMP_TILE.
(gfc_trans_omp_directive): Likewise.
* trans.cc (trans_code): Likewise.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_loop): Likewise.
* omp-transform-loops.cc (walk_omp_for_loops): New declaration.
(subst_var_in_op): New function.
(subst_var): New function.
(gomp_for_number_of_iterations): Adjust.
(gomp_for_iter_count_type): New function.
(gimple_assign_rhs_to_tree): New function.
(subst_defs): New function.
(gomp_for_uncollapse): Adjust.
(transformation_clause_p): Add OMP_CLAUSE_TILE.
(tile): New function.
(transform_gomp_for): Handle OMP_CLAUSE_TILE.
(optimize_transformation_clauses): Handle OMP_CLAUSE_TILE.
* omp-general.cc (omp_loop_transform_clauses_p): Add OMP_CLAUSE_TILE.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE.
* tree.cc: Add OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_SIZES): New macro.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc|  17 +-
 gcc/fortran/gfortran.h|   7 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 373 +-
 gcc/fortran/parse.cc  |  15 +
 gcc/fortran/resolve.cc|   3 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-openmp.cc   |  86 ++--
 gcc/fortran/trans.cc  |   1 +
 gcc/gimplify.cc   |   3 +
 gcc/omp-general.cc|   2 +-
 gcc/omp-transform-loops.cc| 340 +++-
 .../gomp/loop-transforms/tile-1.f90   | 163 
 .../gomp/loop-transforms/tile-1a.f90  |  10 +
 .../gomp/loop-transforms/tile-2.f90   |  80 
 .../gomp/loop-transforms/tile-3.f90   |  18 +
 .../gomp/loop-transforms/tile-4.f90   |  95 +
 .../gomp/loop-transfor

[PATCH 5/7] openmp: Add C/C++ support for "omp tile"

2023-03-24 Thread Frederik Harwath
This commit adds the C and C++ front end support for the "omp tile"
directive.

gcc/c-family/ChangeLog:

* c-omp.cc (c_omp_directives): Add PRAGMA_OMP_TILE.
* c-pragma.cc (omp_pragmas_simd): Likewise.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_TILE

gcc/c/ChangeLog:

* c-parser.cc (c_parser_nested_omp_unroll_clauses): Rename and
generalize ...
(c_parser_omp_nested_loop_transform_clauses): ... to this.
(c_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(c_parser_omp_tile_sizes): Parse single "sizes" clause.
(c_parser_omp_loop_transform_clause): New function.
(c_parser_omp_tile): New function for parsing "omp tile"
(c_parser_omp_unroll): Adjust to renaming.
(c_parser_omp_construct): Handle PRAGMA_OMP_TILE.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_unroll_partial): Adjust.
(cp_parser_nested_omp_unroll_clauses): Rename ...
(cp_parser_omp_nested_loop_transform_clauses): ... to this.
(cp_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(cp_parser_omp_tile_sizes): New function, parses single "sizes" clause
(cp_parser_omp_tile): New function for parsing "omp tile".
(cp_parser_omp_loop_transform_clause): New  function.
(cp_parser_omp_unroll): Adjust to renaming.
(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE.
(cp_parser_pragma): Likewise.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_TILE.
* semantics.cc (finish_omp_clauses): Likewise.

gcc/ChangeLog:

* gimplify.cc (omp_for_drop_tile_clauses): New function, ...
(gimplify_omp_for): ... used here.

libgomp/ChangeLog:

* testsuite/libgomp.c++/loop-transforms/tile-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-2.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-3.C: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/tile-1.c: New test.
* c-c++-common/gomp/loop-transforms/tile-2.c: New test.
* c-c++-common/gomp/loop-transforms/tile-3.c: New test.
* c-c++-common/gomp/loop-transforms/tile-4.c: New test.
* c-c++-common/gomp/loop-transforms/tile-5.c: New test.
* c-c++-common/gomp/loop-transforms/tile-6.c: New test.
* c-c++-common/gomp/loop-transforms/tile-7.c: New test.
* c-c++-common/gomp/loop-transforms/tile-8.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: Adapt.
* g++.dg/gomp/loop-transforms/tile-1.h: New test.
* g++.dg/gomp/loop-transforms/tile-1a.C: New test.
* g++.dg/gomp/loop-transforms/tile-1b.C: New test.
---
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   2 +
 gcc/c/c-parser.cc | 277 ---
 gcc/cp/parser.cc  | 289 +---
 gcc/cp/pt.cc  |   1 +
 gcc/cp/semantics.cc   |  40 +++
 gcc/gimplify.cc   |  28 ++
 .../gomp/loop-transforms/tile-1.c | 164 +
 .../gomp/loop-transforms/tile-2.c | 183 ++
 .../gomp/loop-transforms/tile-3.c | 117 +++
 .../gomp/loop-transforms/tile-4.c | 322 ++
 .../gomp/loop-transforms/tile-5.c | 150 
 .../gomp/loop-transforms/tile-6.c |  34 ++
 .../gomp/loop-transforms/tile-7.c |  31 ++
 .../gomp/loop-transforms/tile-8.c |  40 +++
 .../gomp/loop-transforms/unroll-2.c   |  12 +-
 .../g++.dg/gomp/loop-transforms/tile-1.h  |  27 ++
 .../g++.dg/gomp/loop-transforms/tile-1a.C |  27 ++
 .../g++.dg/gomp/loop-transforms/tile-1b.C |  27 ++
 .../libgomp.c++/loop-transforms/tile-1.C  |  52 +++
 .../libgomp.c++/loop-transforms/tile-2.C  |  69 
 .../libgomp.c++/loop-transforms/tile-3.C  |  28 ++
 23 files changed, 1823 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/tile-8.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1.h
 create mode 100644 gcc/testsuite/g++.dg/gomp/loop-transforms/tile-1a.C
 create mode 100644 gcc/testsuite/g

[og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] libgomp, openmp: pinned memory

2023-03-24 Thread Thomas Schwinge
Hi!

On 2022-01-04T15:32:17+, Andrew Stubbs  wrote:
> This patch implements the OpenMP pinned memory trait [...]

I figure it may be helpful to document the current og12 state of affairs;
does the attached "libgomp: Document OpenMP 'pinned' memory" look good to
you?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 35ac1fb2d37f6c33a69f85ca8bac6f6a7bd7d837 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 24 Mar 2023 15:14:57 +0100
Subject: [PATCH] libgomp: Document OpenMP 'pinned' memory

	libgomp/
	* libgomp.texi (AMD Radeon, nvptx): Document OpenMP 'pinned'
	memory.
---
 libgomp/libgomp.texi | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 288e0b3a8ea..1cfae0cb8d1 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4456,6 +4456,9 @@ The implementation remark:
 @item OpenMP code that has a requires directive with @code{unified_address} or
   @code{unified_shared_memory} will remove any GCN device from the list of
   available devices (``host fallback'').
+@item OpenMP @emph{pinned} memory (@code{omp_atk_pinned},
+  @code{ompx_pinned_mem_alloc}, for example) is allocated not via
+  the device, but via @code{mmap}, @code{mlock}.
 @end itemize
 
 
@@ -4518,6 +4521,10 @@ The implementation remark:
 @item OpenMP code that has a requires directive with @code{unified_address}
   or @code{unified_shared_memory} will remove any nvptx device from the
   list of available devices (``host fallback'').
+@item OpenMP @emph{pinned} memory (@code{omp_atk_pinned},
+  @code{ompx_pinned_mem_alloc}, for example) is allocated via the
+  device, thus helping lower-overhead host <-> device data
+  transfers.
 @end itemize
 
 
-- 
2.25.1



[PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops

2023-03-24 Thread Frederik Harwath
So far the implementation of the "omp tile" and "omp unroll"
directives restricted their use to the outermost loop of a loop-nest.
This commit changes the Fortran front end to parse and verify the
directives on inner loops.  The transformation clauses are extended to
carry the information about the level of the loop nest at which a
transformation should be applied.  The middle end transformation pass
is adjusted to apply the transformations at the correct level of a
loop nest and to take their effect on the loop nest depth into
account.

gcc/fortran/ChangeLog:

* openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
(resolve_loop_transform_generic): Remove, and ...
(resolve_omp_unroll): ... inline and adapt here. Move function.
Move functin.
(find_nested_loop_in_block): New function.
(find_nested_loop_in_chain): New function, used ...
(is_outer_iteration_variable): ... here, and ...
(expr_is_invariant): ... here.
(resolve_omp_do): Adjust code for resolving loop transformations.
(resolve_omp_tile): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_TRANSFROM_LEVEL
on new clause.
(compute_transformed_depth): New function to compute the depth
("collapse") of a transformed loop nest, used
(gfc_trans_omp_do): ... here.

gcc/ChangeLog:

* omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix type
in comment.
(gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
(partial_unroll): Add argument for the loop nest level to be 
transformed.
(tile): Likewise.
(transform_gomp_for): Pass level to transformatoin functions.
(optimize_transformation_clauses): Handle transformation clauses for all
levels recursively.
* tree-pretty-print.cc (dump_omp_clause): Print
OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access
clause operand 0.
(OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
(OMP_CLAUSE_TILE_SIZES): Likewise.

gcc/cp/ChangeLog

* parser.cc (cp_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(cp_parser_omp_clause_unroll_partial): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
(cp_parser_omp_loop_transform_clause): Likewise.
(cp_parser_omp_nested_loop_transform_clauses): Likewise.
(cp_parser_omp_unroll): Likewise.
* pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL
and OMP_CLAUSE_TILE handling to changed number of operands.

gcc/c/ChangeLog

* c-parser.cc (c_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(c_parser_omp_clause_unroll_partial): Likewise.
(c_parser_omp_tile_sizes): Likewise.
(c_parser_omp_loop_transform_clause): Likewise.
(c_parser_omp_nested_loop_transform_clauses): Likewise.
(c_parser_omp_unroll): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to
changed diagnostic messages.

libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/cp/parser.cc  |  12 +-
 gcc/cp/pt.cc  |  12 +-
 gcc/fortran/openmp.cc | 173 --
 gcc/fortran/trans-openmp.cc   |  74 ++--
 gcc/omp-transform-loops.cc| 1

[PATCH 7/7] openmp: Add C/C++ support for loop transformations on inner loops

2023-03-24 Thread Frederik Harwath
Add the parsing of loop transformations on inner loops of a loop-nest.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(c_parser_omp_tile): ... adjust use here,
(c_parser_omp_unroll): ... and here,
(c_parser_omp_for_loop): ... and here.  Stop treating loop
transformations like intervening code, parse them, and adjust
the loop-nest depth if necessary for tiling.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_is_pragma): New function.
(cp_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(cp_parser_omp_tile): ... adjust use here,
(cp_parser_omp_unroll): ... and here,
(cp_parser_omp_for_loop): ... and here.  Stop treating loop

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/unroll-inner-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-inner-2.c: New test.

libgomp/ChangeLog
* testsuite/libgomp.c++/loop-transforms/tile-1.C: Deleted, replaced by
matrix-* tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h:
New header file for new tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
New test.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
New test.
---
 gcc/c/c-parser.cc |  35 +++-
 gcc/cp/parser.cc  |  88 ++--
 .../loop-transforms/imperfect-loop-nest.c |  12 ++
 .../gomp/loop-transforms/unroll-inner-1.c |  15 ++
 .../gomp/loop-transforms/unroll-inner-2.c |  31 +++
 .../gomp/loop-transforms/unroll-non-rect-1.c  |  37 
 .../gomp/loop-transforms/unroll-non-rect-2.c  |  22 ++
 .../libgomp.c++/loop-transforms/tile-1.C  |  52 -
 .../loop-transforms/matrix-1.h|  70 +++
 .../loop-transforms/matrix-constant-iter.h|  71 +++
 .../loop-transforms/matrix-helper.h   |  19 ++
 .../loop-transforms/matrix-no-directive-1.c   |  11 +
 .../matrix-no-directive-unroll-full-1.c   |  13 ++
 .../matrix-omp-distribute-parallel-for-1.c|   6 +
 .../loop-transforms/matrix-omp-for-1.c|  13 ++
 .../matrix-omp-parallel-for-1.c   |  13 ++
 .../matrix-omp-parallel-masked-taskloop-1.c   |   6 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |   6 +
 .../matrix-omp-target-parallel-for-1.c|  13 ++
 ...p-target-teams-distribute-parallel-for-1.c |   6 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |   6 +
 ...trix-omp-teams-distribute-parallel-for-1.c |   6 +
 .../loop-transforms/matrix-simd-1.c   |   6 +
 .../matrix-transform-variants-1.h | 191 ++
 .../loop-transforms/unroll-non-rect-1.c   | 129 
 25 files changed, 801 insertions(+), 76 deletions(-)
 create mode 100644 
gcc/testsuite/c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c
 create mode 100644 
gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-1.c
 create mode 100644 
gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-inner-2.c
 create mode 100644 
gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-1.c
 create mode 100644 
gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-non-rect-2.c
 delete mode 100644 

Re: [PATCH] PR tree-optimization/109274 - Don't interpret contents of a value_relation record.

2023-03-24 Thread Andrew MacLeod via Gcc-patches

Thanks.. Ive incorporated it into my commit  too.

Andrew

On 3/24/23 11:15, Jakub Jelinek wrote:

On Fri, Mar 24, 2023 at 11:08:54AM -0400, Andrew MacLeod wrote:

Before floating point relations were added, we tried to sanitize
value-relation records to not include non-sensensical records... ie x != x
or x < x.   Instead, we made a VREL_VARYING record with no operands.

When floating point relation support was added, some of these were no longer
non-sensical, AND we expanded the use of value_relation records into GORI
shortly thereafter.

As a result, this sanitization is no longer needed, nor desired. The Oracle
does not create records with op1 == op2 already, so its only within GORI
that these records can exist, and we shouldn't try to interpret them.

The bug occurs because the "sanitized" records doesn't set op1 and op2, and
changes the relation to VARYING..  and we expected the operands it to be set
the way they were specified.  We should not be setting a VREL_VARYING record
if asked to set something else.  In fact, we are missing some opportunities
because we are trying to FP range-ops that op1 != op1  but its getting
transformed into a VREL_VARYING record and not communicated properly.

Currently bootstrapping on x86_64-pc-linux-gnu and assuming no regressions,
OK for trunk?

Andrew
commit 1f02961b23976d35b10e2399708c6eb00632f9d6
Author: Andrew MacLeod 
Date:   Fri Mar 24 09:18:33 2023 -0400

 Don't interpret contents of a value_relation record.
 
 before floating point relations were added, we tried to sanitize

 value-relation records to not include non-sensensical records... ie
 x != x or x < x.   INstead, we made a VREL_VARYING record with no

s/IN/In/


 operands.
 
 When floating point relations were supported, some of these were no

 longer non-sensical, AND we expanded the use of value_relation records
 into GORI.
 
 As a result, this sanitization is no longer needed.  The Oracle

 does not create records with op1 == op2, so its only within GORI
 that these records can exist, and we shouldnt try to interpret them.

s/shouldnt/shouldn't/
 
 The bug occurs because the "sanitized" records doesnt set op1 anmd op2,

s/doesnt/doesn't/


 but we have a record so expected it to be set.
 
 PR tree-optimization/109265

 PR tree-optimization/109274
 gcc/
 * value-relation.h (value_relation::set_relation): Always create 
the
 record that is requested.
 
 gcc/testsuite/

 * gcc.dg/pr109274.c: New.

LGTM, indeed with floating point  a != a isn't nonsensical but basically
__builtin_isnan (a) check.

I'll commit the Fortran testcase I've added in my version of the patch
incrementally when you commit.

Jakub
commit 2d8ef296f43c930a1822817e3280539ce5e7075b
Author: Andrew MacLeod 
Date:   Fri Mar 24 11:21:20 2023 -0400

Don't interpret contents of a value_relation record.

Before floating point relations were added, we tried to sanitize
value-relation records to not include non-sensensical records... ie
x != x or x < x.   Instead, we made a VREL_VARYING record with no
operands.

When floating point relations were supported, some of these were no
longer non-sensical, AND we shortly expanded the use of value_relation
records into GORI.

As a result, this sanitization is no longer needed.  The Oracle
does not create records with op1 == op2, so its only within GORI
that these records can exist, and we shouldn't try to interpret them.

The bug occurs because the "sanitized" records doesn't set op1 anmd op2,
we should simply set the record with the specified operands..

PR tree-optimization/109265
PR tree-optimization/109274
gcc/
* value-relation.h (value_relation::set_relation): Always create the
record that is requested.

gcc/testsuite/
* gcc.dg/pr109274.c: New.
* gfortran.dg/pr109265.f90: New.

diff --git a/gcc/testsuite/gcc.dg/pr109274.c b/gcc/testsuite/gcc.dg/pr109274.c
new file mode 100644
index 000..5dbc0232f8e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109274.c
@@ -0,0 +1,16 @@
+/* PR tree-optimization/109274 */
+/* { dg-do compile } */
+/* { dg-options "-O2 " } */
+
+float a, b, c;
+int d;
+float bar (void);
+
+void
+foo (void)
+{
+  a = 0 * -(2.0f * c);
+  d = a != a ? 0 : bar ();
+  b = c;
+}
+
diff --git a/gcc/testsuite/gfortran.dg/pr109265.f90 b/gcc/testsuite/gfortran.dg/pr109265.f90
new file mode 100644
index 000..0d7124c7741
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr109265.f90
@@ -0,0 +1,39 @@
+! PR tree-optimization/109265
+! { dg-do compile }
+! { dg-options "-O3 -w" }
+
+module pr109265
+  integer, parameter :: r8 = selected_real_kind (12)
+contains
+  subroutine foo (b, c, d, e, f)
+implicit none
+logical :: b
+real (kind = r8) 

Re: PING: Re: [PATCH] json: preserve key-insertion order [PR109163]

2023-03-24 Thread Richard Biener via Gcc-patches



> Am 24.03.2023 um 15:44 schrieb David Malcolm via Gcc-patches 
> :
> 
> I'd like to ping the following patch for review:
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html

Ok.

Richard 

> Thanks
> Dave
> 
>> On Fri, 2023-03-17 at 16:53 -0400, David Malcolm wrote:
>> PR other/109163 notes that when we write out JSON files, we traverse
>> the keys within each object via hash_map iteration, and thus the
>> ordering is non-deterministic - it can arbitrarily vary from run to
>> run and from different machines, making it harder for users to
>> compare
>> results and determine if anything has "really" changed.
>> 
>> I'm running into this issue with SARIF output, but there are several
>> places where we're currently emitting JSON:
>> 
>>   * -fsave-optimization-record emits SRCFILE.opt-record.json.gz
>> "This option is experimental and the format of the data
>> within
>> the compressed JSON file is subject to change."; see
>> optinfo-emit-json.{h,cc}, dumpfile.cc, etc
>>   * -fdiagnostics-format= with the various "sarif" and "json" options
>>   * -fdump-analyzer-json is a developer option in the analyzer
>>   * gcov has:
>>  "-j, --json-format: Output JSON intermediate format into
>>  .gcov.json.gz file"
>> 
>> This patch adds an auto_vec to class json::object to preserve
>> key-insertion order, and use it when writing out objects. 
>> Potentially
>> this slightly slows down JSON output, but I believe that this isn't
>> normally a bottleneck, and that the benefits to the user of
>> deterministic output are worth it.
>> 
>> I had first attempted to use ordered_hash_map.h for this, but ran
>> into
>> impenetrable template errors, so this patch uses a simpler approach
>> of
>> just adding an auto_vec to json::object.
>> 
>> Testing showed a failure of diagnostic-format-json-5.c, which was
>> using
>> a convoluted set of regexps to consume the output; I believe that
>> this
>> was brittle, and was intermittently failing for some of the random
>> orderings of output.  I rewrote these regexps to work with the
>> expected
>> output order.  The other such tests seem to pass with the
>> now-deterministic orderings.
>> 
>> Lightly tested with valgrind.
>> I manually verified that the SARIF output is now deterministic.
>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>> 
>> OK for trunk?
>> 
>> gcc/ChangeLog:
>> PR other/109163
>> * json.cc: Update comments to indicate that we now preserve
>> insertion order of keys within objects.
>> (object::print): Traverse keys in insertion order.
>> (object::set): Preserve insertion order of keys.
>> (selftest::test_writing_objects): Add an additional key to
>> verify
>> that we preserve insertion order.
>> * json.h (object::m_keys): New field.
>> 
>> gcc/testsuite/ChangeLog:
>> PR other/109163
>> * c-c++-common/diagnostic-format-json-1.c: Update comment.
>> * c-c++-common/diagnostic-format-json-2.c: Likewise.
>> * c-c++-common/diagnostic-format-json-3.c: Likewise.
>> * c-c++-common/diagnostic-format-json-4.c: Likewise.
>> * c-c++-common/diagnostic-format-json-5.c: Rewrite regexps.
>> * c-c++-common/diagnostic-format-json-stderr-1.c: Update
>> comment.
>> 
>> Signed-off-by: David Malcolm 
>> ---
>>  gcc/json.cc   |  40 ---
>>  gcc/json.h|  10 +-
>>  .../c-c++-common/diagnostic-format-json-1.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-2.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-3.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-4.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-5.c   | 100 ++--
>> --
>>  .../diagnostic-format-json-stderr-1.c |   3 +-
>>  8 files changed, 95 insertions(+), 70 deletions(-)
>> 
>> diff --git a/gcc/json.cc b/gcc/json.cc
>> index 01879c9c466..741e97b20e5 100644
>> --- a/gcc/json.cc
>> +++ b/gcc/json.cc
>> @@ -31,8 +31,11 @@ using namespace json;
>>  /* class json::value.  */
>>  
>>  /* Dump this json::value tree to OUTF.
>> -   No formatting is done.  There are no guarantees about the order
>> -   in which the key/value pairs of json::objects are printed.  */
>> +
>> +   No formatting is done.
>> +
>> +   The key/value pairs of json::objects are printed in the order
>> +   in which the keys were originally inserted.  */
>>  
>>  void
>>  value::dump (FILE *outf) const
>> @@ -44,7 +47,7 @@ value::dump (FILE *outf) const
>>  }
>>  
>>  /* class json::object, a subclass of json::value, representing
>> -   an unordered collection of key/value pairs.  */
>> +   an ordered collection of key/value pairs.  */
>>  
>>  /* json:object's dtor.  */
>>  
>> @@ -62,14 +65,17 @@ object::~object ()
>>  void
>>  object::print (pretty_printer *pp) const
>>  {
>> -  /* Note that the order is not guaranteed.  */
>>pp_character (pp, '{');
>> -  for (m

Add caveat/safeguard to OpenMP: Handle descriptors in target's firstprivate [PR104949] (was: [Patch] OpenMP: Handle descriptors in target's firstprivate [PR104949])

2023-03-24 Thread Thomas Schwinge
Hi!

On 2023-02-28T11:56:01+0100, I wrote:
> I'm currently reviewing 'gomp_copy_host2dev', 'ephemeral' in a different
> context, and a question came up here;
> commit r13-706-g49d1a2f91325fa8cc011149e27e5093a988b3a49
> "OpenMP: Handle descriptors in target's firstprivate [PR104949]":

It doesn't seem as if we're going to address this question anytime soon,
but it also isn't necessary; the worrysome condition currently cannot
arise.  I've therefore now documented that via 'assert (!aq)', and pushed
to master branch commit e8fec6998b656dac02d4bc6c69b35a0fb5611e87
"Add caveat/safeguard to OpenMP: Handle descriptors in target's firstprivate 
[PR104949]",
see attached.


Grüße
 Thomas


> On 2022-05-11T19:33:00+0200, Tobias Burnus  wrote:
>> this patch handles (for target regions)
>>firstprivate(array_descriptor)
>> by not only firstprivatizing the descriptor but also the data
>> it points to. This is done by turning it in omp-low.cc the clause
>> into
>>firstprivate(descr) firstprivate(descr.data)
>> and then attaching the latter to the former. That works by
>> adding an 'attach' after the last firstprivate (and checking
>> for it in libgomp). The attached-to device address for a
>> previous (here: the first) firstprivate is obtained by returning
>> the device address inside the hostaddrs[i] alias omp_arr array,
>> i.e. the compiler generates:
>>omp_arr.1 = &descr;  /* firstprivate */
>>omp_arr.2 = descr.data;  /* firstprivate */
>>omp_arr.3 = &omp_arr.1;  /* attach; bias: &desc.data-&desc */
>> and libgomp then knows that the device address is in the
>> pointer.
>
>> Note: The code is not active for OpenACC. The existing code uses, e.g.,
>> 'goto oacc_firstprivate' – thus, the new code would be
>> partially active. I went for making it completely inactive for OpenACC
>> by adding one '!is_gimple_omp_oacc'.
>
> ACK.
>
>> I bet that a deep copy would be
>> also useful for OpenACC, but I have neither checked what the current
>> code does nor what the OpenACC spec says about this.
>
> Instead of adding corresponding handling to the OpenACC 'firstprivate'
> special code paths later on, I suggest that we first address known issues
> with OpenACC 'firstprivate' -- which probably may largely be achieved by
> in fact removing those 'goto oacc_firstprivate's and other special code
> paths?  For example, see 
> "OpenACC 'firstprivate' clause: initial value".
>
> That means, the following code currently isn't active for OpenACC, and
> given that OpenMP 'target' doesn't do asynchronous device execution
> (meaning: not in the way/implementation of OpenACC 'async'), it thus
> doesn't care about the 'ephemeral' argument to 'gomp_copy_host2dev', but
> still, for correctness (and once that code gets used for OpenACC):
>
>> OpenMP: Handle descriptors in target's firstprivate [PR104949]
>>
>> For allocatable/pointer arrays, a firstprivate to a device
>> not only needs to privatize the descriptor but also the actual
>> data. This is implemented as:
>>   firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x)
>> where the address of x in device memory is saved in hostaddrs[i]
>> by libgomp and the middle end actually passes hostaddrs[i]' to
>> attach.
>
>> --- a/libgomp/target.c
>> +++ b/libgomp/target.c
>> @@ -1350,7 +1350,24 @@ gomp_map_vars_internal (struct gomp_device_descr 
>> *devicep,
>>  gomp_copy_host2dev (devicep, aq,
>>  (void *) (tgt->tgt_start + tgt_size),
>>  (void *) hostaddrs[i], len, false, cbufp);
>
> Here, passing 'ephemeral <- false' is correct, as 'h <- hostaddrs[i]'
> points to non-ephemeral data.
>
>> +/* Save device address in hostaddr to permit latter availablity
>> +   when doing a deep-firstprivate with pointer attach.  */
>> +hostaddrs[i] = (void *) (tgt->tgt_start + tgt_size);
>
> Here, we modify 'hostaddrs[i]' (itself -- *not* the data that the
> original 'hostaddrs[i]' points to), so the above 'gomp_copy_host2dev'
> with 'ephemeral <- false' is still correct, right?
>
>>  tgt_size += len;
>> +
>> +/* If followed by GOMP_MAP_ATTACH, pointer assign this
>> +   firstprivate to hostaddrs[i+1], which is assumed to contain a
>> +   device address.  */
>> +if (i + 1 < mapnum
>> +&& (GOMP_MAP_ATTACH
>> +== (typemask & get_kind (short_mapkind, kinds, i+1
>> +  {
>> +uintptr_t target = (uintptr_t) hostaddrs[i];
>> +void *devptr = *(void**) hostaddrs[i+1] + sizes[i+1];
>> +gomp_copy_host2dev (devicep, aq, devptr, &target,
>> +sizeof (void *), false, cbufp);
>
> However, 'h <- &target' here points to data in the local frame
> ('target'), which potentially goes out of scope before an asynchronous
> 'gomp_copy_host2dev' has completed.  Thus, don't we have

Re: [PATCH] [rs6000] adjust return_pc debug attrs

2023-03-24 Thread Tom Tromey via Gcc-patches
> "Segher" == Segher Boessenkool  writes:

>> FWIW I sent a gdb patch to work around this bug.  However, in my
>> examples, I only ever saw a nop following the call instruction -- so I
>> had gdb check for this.

Segher> GCC inserts just a nop in most cases, but the linker or dynamic linker
Segher> can replace it.

Thanks.  I've updated my gdb patch to drop the nop check.

Tom


Re: [PATCH] c++: outer 'this' leaking into local class [PR106969]

2023-03-24 Thread Jason Merrill via Gcc-patches

On 3/23/23 11:00, Patrick Palka wrote:

Here when resolving the implicit object for '&wrapped' within the
local class Foo, we expect to obtain a dummy object of type Foo& since
there's no 'this' available in this context.  And yet at this point
current_class_ref still corresponds to the outer class Context (and is
const), which confuses maybe_dummy_object into propagating the cv-quals
of current_class_ref and returning an object of type const Foo&.  Thus
decltype(&wrapped) wrongly yields const int* instead of int*.

The problem ultimately seems to be that the 'this' from the enclosing
class appears available for use when parsing the local class, but 'this'
shouldn't leak across classes like that.  This patch fixes this by
clearing current_class_ptr/ref when parsing a class definition.

After this change, for name-clash11.C in C++98 mode we would now
complain about an invalid use of 'this' for e.g.

   ASSERT (sizeof (this->A) == 16);

due to the way the ASSERT macro is defined using a local class.  This
patch redefines it using a local typedef instead.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12?


OK.


PR c++/106969

gcc/cp/ChangeLog:

* parser.cc (cp_parser_class_specifier): Clear current_class_ptr
and current_class_ref when parsing a class definition.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/name-clash11.C: New test.
* g++.dg/lookup/this2.C: New test.
---
  gcc/cp/parser.cc   | 13 +
  gcc/testsuite/g++.dg/lookup/name-clash11.C |  2 +-
  gcc/testsuite/g++.dg/lookup/this2.C| 22 ++
  3 files changed, 32 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/lookup/this2.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index a277003ea58..be9c77b415e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -26151,6 +26151,11 @@ cp_parser_class_specifier (cp_parser* parser)
saved_in_unbraced_linkage_specification_p
  = parser->in_unbraced_linkage_specification_p;
parser->in_unbraced_linkage_specification_p = false;
+  /* 'this' from an enclosing non-static member function is unvailable.  */
+  tree saved_ccp = current_class_ptr;
+  tree saved_ccr = current_class_ref;
+  current_class_ptr = NULL_TREE;
+  current_class_ref = NULL_TREE;
  
/* Start the class.  */

if (nested_name_specifier_p)
@@ -26369,8 +26374,6 @@ cp_parser_class_specifier (cp_parser* parser)
/* If there are noexcept-specifiers that have not yet been processed,
 take care of them now.  Do this before processing NSDMIs as they
 may depend on noexcept-specifiers already having been processed.  */
-  tree save_ccp = current_class_ptr;
-  tree save_ccr = current_class_ref;
FOR_EACH_VEC_SAFE_ELT (unparsed_noexcepts, ix, decl)
{
  tree ctx = DECL_CONTEXT (decl);
@@ -26496,8 +26499,8 @@ cp_parser_class_specifier (cp_parser* parser)
}
vec_safe_truncate (unparsed_contracts, 0);
  
-  current_class_ptr = save_ccp;

-  current_class_ref = save_ccr;
+  current_class_ptr = NULL_TREE;
+  current_class_ref = NULL_TREE;
if (pushed_scope)
pop_scope (pushed_scope);
  
@@ -26529,6 +26532,8 @@ cp_parser_class_specifier (cp_parser* parser)

  = saved_num_template_parameter_lists;
parser->in_unbraced_linkage_specification_p
  = saved_in_unbraced_linkage_specification_p;
+  current_class_ptr = saved_ccp;
+  current_class_ref = saved_ccr;
  
return type;

  }
diff --git a/gcc/testsuite/g++.dg/lookup/name-clash11.C 
b/gcc/testsuite/g++.dg/lookup/name-clash11.C
index bc63645e8d3..2ae9a65264d 100644
--- a/gcc/testsuite/g++.dg/lookup/name-clash11.C
+++ b/gcc/testsuite/g++.dg/lookup/name-clash11.C
@@ -7,7 +7,7 @@
  #  define ASSERT(e) static_assert (e, #e)
  #else
  #  define ASSERT(e) \
-  do { struct S { bool: !!(e); } asrt; (void)&asrt; } while (0)
+  do { typedef int asrt[bool(e) ? 1 : -1]; } while (0)
  #endif
  
  
diff --git a/gcc/testsuite/g++.dg/lookup/this2.C b/gcc/testsuite/g++.dg/lookup/this2.C

new file mode 100644
index 000..1450c563d92
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/this2.C
@@ -0,0 +1,22 @@
+// PR c++/106969
+// { dg-do compile { target c++11 } }
+
+struct Context
+{
+void
+action() const
+{
+struct Foo
+{
+int wrapped;
+decltype( &wrapped ) get() { return &wrapped; }
+} t;
+
+*t.get()= 42; // OK, get() returns int* not const int*
+
+struct Bar
+{
+using type = decltype(this); // { dg-error "invalid use of 'this'" 
}
+};
+}
+};




Re: [OG12][committed] amdgcn: OpenMP low-latency allocator

2023-03-24 Thread Thomas Schwinge
Hi!

On 2023-02-16T18:06:41+, Andrew Stubbs  wrote:
> 2. 230216-amd-low-lat.patch
>
> Allocate the memory, adjust the default address space, and hook up the
> allocator.

Like done for nvptx in og12 commit 23f52e49368d7b26a1b1a72d6bb903d31666e961
"Miscellaneous clean-up re OpenMP 'ompx_unified_shared_mem_space', 
'ompx_host_mem_space'",
I've now pushed the corresponding GCN 'ompx_host_mem_space' thing to
devel/omp/gcc-12 branch in commit b39e4bbab59f5e4b551c44dbce0ce3acf4afc22a
"Miscellaneous clean-up re OpenMP 'ompx_host_mem_space'", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From b39e4bbab59f5e4b551c44dbce0ce3acf4afc22a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 17 Feb 2023 14:13:15 +0100
Subject: [PATCH] Miscellaneous clean-up re OpenMP 'ompx_host_mem_space'

Like done for nvptx in og12 commit 23f52e49368d7b26a1b1a72d6bb903d31666e961
"Miscellaneous clean-up re OpenMP 'ompx_unified_shared_mem_space', 'ompx_host_mem_space'".

Clean-up for og12 commit c77c45a641fedc3fe770e909cc010fb1735bdbbd
"amdgcn, libgomp: low-latency allocator".  No functional change.

	libgomp/
	* config/gcn/allocator.c (gcn_memspace_free): Explicitly handle
	'memspace == ompx_host_mem_space'.
---
 libgomp/ChangeLog.omp  | 3 +++
 libgomp/config/gcn/allocator.c | 4 
 2 files changed, 7 insertions(+)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 63d1f563d5d..ef957e3d2d8 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,8 @@
 2023-03-24  Thomas Schwinge  
 
+	* config/gcn/allocator.c (gcn_memspace_free): Explicitly handle
+	'memspace == ompx_host_mem_space'.
+
 	Backported from master:
 	2023-03-24  Thomas Schwinge  
 
diff --git a/libgomp/config/gcn/allocator.c b/libgomp/config/gcn/allocator.c
index 001de89ffe0..e9980f6f98e 100644
--- a/libgomp/config/gcn/allocator.c
+++ b/libgomp/config/gcn/allocator.c
@@ -36,6 +36,7 @@
when the memspace access trait is set accordingly.  */
 
 #include "libgomp.h"
+#include 
 #include 
 
 #define BASIC_ALLOC_PREFIX __gcn_lowlat
@@ -86,6 +87,9 @@ gcn_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
 
   __gcn_lowlat_free (shared_pool, addr, size);
 }
+  else if (memspace == ompx_host_mem_space)
+/* Just verify what all allocator functions return.  */
+assert (addr == NULL);
   else
 free (addr);
 }
-- 
2.25.1



Pending OpenMP patches

2023-03-24 Thread Tobias Burnus

Just to keep track myself, the following OpenMP are pending review - either
by Jakub or by me. It does not look as if any of those will get into GCC 13,
but who knows. I think the first three could:


[Doc patch - will go into GCC 13, still comments are welcome:]
[patch V2] Docs, OpenMP: Correct internal documentation of OMP_FOR
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614392.html


libgomp: Simplify OpenMP reverse offload host <-> device memory copy 
implementation
   (was: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling)
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614569.html

[Patch,v4] Fortran/OpenMP: Fix mapping of array descriptors and deferred-length 
strings
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614472.html

[PATCH] openmp: Add support for 'present' modifier
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611299.html

[PATCH 0/3] OpenMP 5.0: Strided updates and array shape-operator support (C++)
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613785.html

[PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614564.html

[Patch] Fortran/OpenMP: Add parsing support for allocators/allocate directive
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608904.html

Complete patch set was just re-submitted by Julian, overview patch is
[PATCH v6 00/11] OpenMP: C/C++ lvalue parsing, C/C++/Fortran "declare mapper" 
support
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/thread.html#609031
- with follow up for 10/11 -
[PATCH v6 10/11] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609566.html

Plus some more pending patches, some of which require re-diffing
or checking the backlog. Others are to be expected soon...

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[pushed] c++: default template arg, partial ordering [PR105481]

2023-03-24 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The default argument code in type_unification_real was assuming that all
targs we've deduced by that point are non-dependent, but that's not the case
for partial ordering.

PR c++/105481

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Adjust for partial ordering.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/fntmpdefarg-partial1.C: New test.
---
 gcc/cp/pt.cc   | 18 ++
 .../g++.dg/cpp0x/fntmpdefarg-partial1.C|  8 
 2 files changed, 22 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/fntmpdefarg-partial1.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 40deedc9ba9..3bb98ebeac1 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23304,14 +23304,24 @@ type_unification_real (tree tparms,
  return unify_parameter_deduction_failure (explain_p, tparm);
}
 
+  /* During partial ordering, we deduce dependent template args.  */
+  bool any_dependent_targs = false;
+
   /* Now substitute into the default template arguments.  */
   for (i = 0; i < ntparms; i++)
{
  tree targ = TREE_VEC_ELT (targs, i);
  tree tparm = TREE_VEC_ELT (tparms, i);
 
- if (targ || tparm == error_mark_node)
+ if (targ)
+   {
+ if (!any_dependent_targs && dependent_template_arg_p (targ))
+   any_dependent_targs = true;
+ continue;
+   }
+ if (tparm == error_mark_node)
continue;
+
  tree parm = TREE_VALUE (tparm);
  tree arg = TREE_PURPOSE (tparm);
  reopen_deferring_access_checks (*checks);
@@ -23347,9 +23357,9 @@ type_unification_real (tree tparms,
 do this substitution without processing_template_decl.  This
 is important if the default argument contains something that
 might be instantiation-dependent like access (87480).  */
- processing_template_decl_sentinel s;
+ processing_template_decl_sentinel s (!any_dependent_targs);
  tree substed = NULL_TREE;
- if (saw_undeduced == 1)
+ if (saw_undeduced == 1 && !any_dependent_targs)
{
  /* First instatiate in template context, in case we still
 depend on undeduced template parameters.  */
@@ -23372,7 +23382,7 @@ type_unification_real (tree tparms,
 complain, i, NULL_TREE);
  else if (saw_undeduced == 1)
arg = NULL_TREE;
- else
+ else if (!any_dependent_targs)
arg = error_mark_node;
}
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg-partial1.C 
b/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg-partial1.C
new file mode 100644
index 000..2a6783e566b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/fntmpdefarg-partial1.C
@@ -0,0 +1,8 @@
+// PR c++/105481
+// { dg-do compile { target c++11 } }
+
+template struct uint;
+template uint f(const uint &);
+template> uint f(T);
+using X = uint<1>;
+X (*fp)(X const &) = f;

base-commit: c4792bd1de0621932a47fb86aca09fafafdb2972
-- 
2.31.1



[og12] Add 'libgomp.c/alloc-ompx_host_mem_alloc-1.c'

2023-03-24 Thread Thomas Schwinge
Hi!

This had fallen out of some earlier work of mine; I've now pushed to
devel/omp/gcc-12 branch commit ae2dca26602678f8b70e22da1bce8302c0751b75
"Add 'libgomp.c/alloc-ompx_host_mem_alloc-1.c'", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From ae2dca26602678f8b70e22da1bce8302c0751b75 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 15 Feb 2023 11:27:55 +0100
Subject: [PATCH] Add 'libgomp.c/alloc-ompx_host_mem_alloc-1.c'

OpenMP 'ompx_host_mem_alloc' is available for host and nvptx offloading as of
og12 commit 84914e197d91a67b3d27db0e4c69a433462983a5
"openmp, nvptx: ompx_unified_shared_mem_alloc", and for GCN offloading as of
og12 commit c77c45a641fedc3fe770e909cc010fb1735bdbbd
"amdgcn, libgomp: low-latency allocator".

	libgomp/
	* testsuite/libgomp.c/alloc-ompx_host_mem_alloc-1.c: New.
---
 libgomp/ChangeLog.omp |  2 +
 .../libgomp.c/alloc-ompx_host_mem_alloc-1.c   | 77 +++
 2 files changed, 79 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-ompx_host_mem_alloc-1.c

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index ef957e3d2d8..6b816e46cd2 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,7 @@
 2023-03-24  Thomas Schwinge  
 
+	* testsuite/libgomp.c/alloc-ompx_host_mem_alloc-1.c: New.
+
 	* config/gcn/allocator.c (gcn_memspace_free): Explicitly handle
 	'memspace == ompx_host_mem_space'.
 
diff --git a/libgomp/testsuite/libgomp.c/alloc-ompx_host_mem_alloc-1.c b/libgomp/testsuite/libgomp.c/alloc-ompx_host_mem_alloc-1.c
new file mode 100644
index 000..683b7aff3d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/alloc-ompx_host_mem_alloc-1.c
@@ -0,0 +1,77 @@
+/* Verify that on the host we can but on a device we cannot allocate 'ompx_host_mem_alloc' memory.  */
+
+/* { dg-additional-options -DOFFLOAD_DEVICE { target offload_device } } */
+
+#include 
+
+#pragma omp requires dynamic_allocators
+
+int main()
+{
+#pragma omp target
+  {
+char *c, *c_;
+
+c = omp_alloc(1, ompx_host_mem_alloc);
+#ifdef OFFLOAD_DEVICE
+if (c)
+  __builtin_abort ();
+#else
+if (!c)
+  __builtin_abort ();
+#endif
+omp_free(c, ompx_host_mem_alloc);
+
+c = omp_aligned_alloc(128, 256, ompx_host_mem_alloc);
+#ifdef OFFLOAD_DEVICE
+if (c)
+  __builtin_abort ();
+#else
+if (!c)
+  __builtin_abort ();
+#endif
+omp_free(c, omp_null_allocator);
+
+c = omp_calloc(1, 1, ompx_host_mem_alloc);
+#ifdef OFFLOAD_DEVICE
+if (c)
+  __builtin_abort ();
+#else
+if (!c)
+  __builtin_abort ();
+#endif
+c_ = omp_realloc(c, 2, ompx_host_mem_alloc, ompx_host_mem_alloc);
+#ifdef OFFLOAD_DEVICE
+if (c_)
+  __builtin_abort ();
+#else
+if (!c_)
+  __builtin_abort ();
+#endif
+c = omp_realloc(c_, 0, ompx_host_mem_alloc, ompx_host_mem_alloc);
+if (c)
+  __builtin_abort ();
+
+c = omp_aligned_calloc(64, 1, 512, ompx_host_mem_alloc);
+#ifdef OFFLOAD_DEVICE
+if (c)
+  __builtin_abort ();
+#else
+if (!c)
+  __builtin_abort ();
+#endif
+c_ = omp_realloc(c, 2, c ? omp_null_allocator : ompx_host_mem_alloc, omp_null_allocator);
+#ifdef OFFLOAD_DEVICE
+if (c_)
+  __builtin_abort ();
+#else
+if (!c_)
+  __builtin_abort ();
+#endif
+c = omp_realloc(c_, 0, omp_null_allocator, omp_null_allocator);
+if (c)
+  __builtin_abort ();
+  }
+
+  return 0;
+}
-- 
2.25.1



Re: [PATCH] PR tree-optimization/109274 - Don't interpret contents of a value_relation record.

2023-03-24 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 24, 2023 at 11:52:30AM -0400, Andrew MacLeod wrote:
> Thanks.. Ive incorporated it into my commit  too.

Note, both my earlier version of the patch and your patch regress:
FAIL: gcc.dg/tree-ssa/vrp-float-3a.c scan-tree-dump-not evrp "link_error"
FAIL: gcc.dg/tree-ssa/vrp-float-4a.c scan-tree-dump-not evrp "link_error"

Jakub



[committed] libgomp.texi: Fix wording in GCN offload specifics

2023-03-24 Thread Tobias Burnus

Stumbled over 'Reverse offload are', fixed by changing it to 'Reverse
offload regions are'.

Committed as r13-6854

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 243fa4883cf6fccaaafddcc82e6b58843c82fb30
Author: Tobias Burnus 
Date:   Fri Mar 24 17:32:59 2023 +0100

libgomp.texi: Fix wording in GCN offload specifics

libgomp/
* libgomp.texi (Offload-Target Specifics): Grammar fix.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 5bcb84a1d6f..dc6b4aca38b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4454,7 +4454,7 @@ The implementation remark:
 @item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
   using the C library @code{printf} functions and the Fortran
   @code{print}/@code{write} statements.
-@item Reverse offload (i.e. @code{target} regions with
+@item Reverse offload regions (i.e. @code{target} regions with
   @code{device(ancestor:1)}) are processed serially per @code{target} region
   such that the next reverse offload region is only executed after the previous
   one returned.


Re: [PATCH] go: Fix up go.test/test/fixedbugs/bug207.go failure [PR109258]

2023-03-24 Thread Ian Lance Taylor
Jakub Jelinek  writes:

> 2023-03-24  Jakub Jelinek  
>
>   PR middle-end/109258
>   * go-gcc.cc (Gcc_backend): Add new static data members builtin_pure
>   and builtin_nothrow.
>   (Gcc_backend::Gcc_backend): Pass builtin_pure | builtin_nothrow for
>   BUILT_IN_MEMCMP.
>   (Gcc_backend::define_builtin): Handle builtin_pure and builtin_nothrow
>   in flags.

This is OK.  Thanks.  Go ahead and commit.

Ian


Re: [PATCH v2] C, ObjC: Add -Wunterminated-string-initialization

2023-03-24 Thread Alejandro Colomar via Gcc-patches
Hi David,

On 3/24/23 15:53, David Malcolm wrote:
> On Fri, 2023-03-24 at 14:39 +0100, Alejandro Colomar via Gcc-patches
> wrote:
>> Warn about the following:
>>
>>     char  s[3] = "foo";
>>
[...]

>> ---
>>
>> Hi,
> 
> Hi Alex, thanks for the patch.

:)

> 
>>
>> I sent v1 to the wrong list.  This time I've made sure to write to
>> gcc-patches@.
> 
> Note that right now we're deep in bug-fixing/stabilization for GCC 13
> (and trunk is still tracking that effort), so your patch might be more
> suitable for GCC 14.

Sure, no problem.  Do you have a "next" branch where you pick patches
for after the release, or should I resend after the release?  Is
discussion of a patch reasonable now, or is it too much distracting
from your stabilization efforts?

> 
>>
>> v2 adds some draft of a test, as suggested by Martin.  However, I
>> don't
>> know yet how to write those, so the test is just a draft.  But I did
>> test the feature, by compiling GCC and compiling some small program
>> with
>> it.
> 
> Unfortunately the answer to the question "how do I run just one
> testcase in GCC's testsuite" is rather non-trivial; FWIW I've written
> up some notes on working with the GCC testsuite here:
> https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html

Hmm, I'll try following that; thanks!  Is there anything obvious that
I might have missed, at first glance?

> 
> Hope this is helpful

Yup.  Thanks!

> Dave

Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] [rs6000] adjust return_pc debug attrs

2023-03-24 Thread Segher Boessenkool
Hi!

On Thu, Mar 23, 2023 at 12:06:39AM -0300, Alexandre Oliva wrote:
> On Mar 13, 2023, Segher Boessenkool  wrote:
> > Yes.  On most architectures you can get multiple machine instructions of
> > course (for long calls for example), but on rs6000 (with some ABIs, in
> > some circumstances) we generate a nop insn after calls, so that the
> > linker has a spot to insert fixup code after calls (typically to restore
> > the r2 contents, but it could be anything).
> 
> Thanks.  I wasn't entirely sure that the linker would never ever relax
> the call sequence, moving the bl to the second instruction in the pair,

That would give headaches with maximal offsets, but more importantly
perhaps, many tools depend on knowing exactly where the branches are,
and some actual (assembler) code as well.  It's best not to toy with
branches :-)

> or that no 64-bit call variant existed or would come into existence.

64-bit offsets will not fit in the opcode of course, not even with
prefixes.  And a normal indirect branch *is* to a 64 bit address too.

Making prefixed branch instructions would get us in much the same hot
water (but everywhere instead of just in select cases), and our (I-form)
branches reach 32MB in either direction already (for relative branches;
and you can reach to the low or high 32MB of the address space too, but
not much code uses that at all anyway.  Terrible shame, not in the least
because the extended mnemonics for that are "ba" and "bla").

> > Subtracting 4 from what we currently return is very fragile.
> 
> Agreed.

Very glad we agree on that!

> > The actual return address is *always* the address of the branch insn
> > plus 4, can't you use that?
> 
> Yup, given this piece of knowledge I didn't have, I agree that's a far
> saner approach.

It is the same in any other arch I have seen that has an LR or RA reg.

> I'll post a new version of the patch, now broken up
> into rs6000-specific and machine-independent, momentarily.

Looking forward to it.

> > (Does the GCC code handle delay slots here, btw?
> 
> It does, in that the label is output after the insn sequence.
> 
> >> This patch introduces infrastructure for targets to add an offset to
> >> the label issued after the call_insn to set the call_return_pc
> >> attribute, and uses that on rs6000 to account for nop and l opcodes
> >> issued after actual call opcode as part of call insns output patterns.
> 
> > What is an "l opcode"?
> 
> I have a vague recollection of seeing call sequences ended by loads.
> Ah, yes, rs6000_indirect_call_template_1 outputs ld or lwz, depending on
> TARGET_64BIT, in both speculate and non-speculate cases after the branch
> in ABI_AIX and ABI_ELFv2 calls.  I understand the l in ld and lwz stands
> for 'load', and so I meant 'load' updates, but I guess in the context of
> calls the 'l' can indeed be misleading.  Anyway, that complexity is gone
> thanks to your suggestion.

Ah, "l%s", I see.  On the old POWER ("RIOS") architecture we *did* have
an actual "l" instruction.  There also is the "@l" relocation syntax.
I wasn't quite sure what you were talking about :-)

> >> +/* Return the offset to be added to the label output after CALL_INSN
> >> +   to compute the address to be placed in DW_AT_call_return_pc.  Some
> >> +   call insns output nop or l after bl, so the return address would be
> >> +   wrong without this offset.  */
> >> +
> >> +static int
> >> +rs6000_call_offset_return_label (rtx_insn *call_insn)
> >> +{
> >> +  /* We don't expect SEQUENCEs in this port.  */
> >> +  gcc_checking_assert (GET_CODE (call_insn) == CALL_INSN);
> 
> > That is not doing what the comment says.
> 
> It is.  The documented interface, in the .def file, states that it must
> be either a CALL_INSN or a SEQUENCE.

Ah, in that sense.  But that is more confusing than just not saying
anything imo.

> My rationale to put it in was to (i) confirm that the case of SEQUENCEs
> was considered and needs not be handled in this port, and (ii) should
> someone take inspiration from this implementation of the hook for a port
> that supported delay slots, it would have to be handled.

Yeah.  Blindly copying code from annother port is a recipe for disaster
always.  It is much nicer to not say anything / to not have to say
anything, just let the code speak for itself?


Segher


Re: [PATCH v2] C, ObjC: Add -Wunterminated-string-initialization

2023-03-24 Thread David Malcolm via Gcc-patches
On Fri, 2023-03-24 at 18:45 +0100, Alejandro Colomar wrote:
> Hi David,
> 
> On 3/24/23 15:53, David Malcolm wrote:
> > On Fri, 2023-03-24 at 14:39 +0100, Alejandro Colomar via Gcc-
> > patches
> > wrote:
> > > Warn about the following:
> > > 
> > >     char  s[3] = "foo";
> > > 
> [...]
> 
> > > ---
> > > 
> > > Hi,
> > 
> > Hi Alex, thanks for the patch.
> 
> :)
> 
> > 
> > > 
> > > I sent v1 to the wrong list.  This time I've made sure to write
> > > to
> > > gcc-patches@.
> > 
> > Note that right now we're deep in bug-fixing/stabilization for GCC
> > 13
> > (and trunk is still tracking that effort), so your patch might be
> > more
> > suitable for GCC 14.
> 
> Sure, no problem.  Do you have a "next" branch where you pick patches
> for after the release, or should I resend after the release?  

We don't; resending it after release is probably best.

> Is
> discussion of a patch reasonable now, or is it too much distracting
> from your stabilization efforts?

FWIW I'd prefer to postpone the discussion until after we branch for
the release.

> 
> > 
> > > 
> > > v2 adds some draft of a test, as suggested by Martin.  However, I
> > > don't
> > > know yet how to write those, so the test is just a draft.  But I
> > > did
> > > test the feature, by compiling GCC and compiling some small
> > > program
> > > with
> > > it.
> > 
> > Unfortunately the answer to the question "how do I run just one
> > testcase in GCC's testsuite" is rather non-trivial; FWIW I've
> > written
> > up some notes on working with the GCC testsuite here:
> > https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
> 
> Hmm, I'll try following that; thanks!  Is there anything obvious that
> I might have missed, at first glance?

The main thing is that there's a difference between compiling the test
case "by hand", versus doing it through the test harness - the latter
sets up the environment in a particular way, injects a particular set
of flags, etc etc.

Dave



Re: Should -ffp-contract=off the default on GCC?

2023-03-24 Thread Andrew Pinski via Gcc-patches
On Fri, Mar 24, 2023 at 1:14 AM Fangrui Song via Gcc-patches
 wrote:
>
> On Wed, Mar 22, 2023 at 8:52 AM Qing Zhao via Gcc-patches
>  wrote:
> >
> >
> >
> > > On Mar 22, 2023, at 9:57 AM, Richard Biener via Gcc-patches 
> > >  wrote:
> > >
> > > On Wed, Mar 22, 2023 at 1:26 PM Alexander Monakov  
> > > wrote:
> > >>
> > >>
> > >> On Wed, 22 Mar 2023, Richard Biener wrote:
> > >>
> > >>> I think it's even less realistic to expect users to know the details of
> > >>> floating-point math.  So I doubt any such sentence will be helpful
> > >>> besides spreading some FUD?
> > >>
> > >> I think it's closer to "fundamental notions" rather than "details". For
> > >> users who bother to read the GCC manual there's a decent chance it 
> > >> wouldn't
> > >> be for naught.
> > >>
> > >> For documentation, I was thinking
> > >>
> > >>  Together with -fexcess-precision=standard, -ffp-contract=off
> > >>  is necessary to ensure that rounding of intermediate results to 
> > >> precision
> > >>  implied by the source code and the FLT_EVAL_METHOD macro is not
> > >>  omitted by the compiler.
> > >
> > > that sounds good to me
> >
> > Shall we add such clarification to our Gcc13 doc? That should be helpful if 
> > we keep the currently default.
> >
> > Qing
> > >
> > >> Alexander
> >
>
> While updating the documentation, consider adding information that
> #pragma STDC FP_CONTRACT OFF is ignored with -ffp-contract=fast.
>
> This surprising behavior motivated Clang to add
> -Xclang=-ffp-contract=fast-honor-pragmas
> (https://discourse.llvm.org/t/fp-contract-fast-and-pragmas/58529).

`#pragma STDC FP_CONTRACT OFF` is not even implemented yet in GCC.
Rather we should document that :).
It does not matter what clang does here really since GCC does not even
implement the pragma.

Thanks,
Andrew Pinski


>
>
>
> --
> 宋方睿


[PATCH, committed] Fortran: fix FE memleak with BOZ expressions

2023-03-24 Thread Harald Anlauf via Gcc-patches
Dear all,

while looking at variations of testcases in pr107560, I discovered
a minor FE memleak that was introduced in the BOZ rework and is
fixed by the attached simple patch.

Regtested on x86_64-pc-linux-gnu on OK'ed in the PR by Steve.

Thanks,
Harald

From 833233a4aefc9981b671c1bda34676c20b76cc90 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 24 Mar 2023 22:07:37 +0100
Subject: [PATCH] Fortran: fix FE memleak with BOZ expressions.

gcc/fortran/ChangeLog:

	* expr.cc (free_expr0): Free also BOZ strings as part of an expression.
---
 gcc/fortran/expr.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index 4662328bf31..7fb33f81788 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -466,6 +466,10 @@ free_expr0 (gfc_expr *e)
 	  mpc_clear (e->value.complex);
 	  break;

+	case BT_BOZ:
+	  free (e->boz.str);
+	  break;
+
 	default:
 	  break;
 	}
--
2.35.3



Re: [PATCH] rtl-optimization: ppc backend generates unnecessary signed extension.

2023-03-24 Thread Peter Bergner via Gcc-patches
On 3/23/23 6:12 PM, Jeff Law via Gcc-patches wrote:
 Is there a reason why REE cannot see that our (reg:QI 4) is a param 
 register
 and thus due to our ABI, already correctly sign/zero extended?
>>>
>>> I don't think REE has ever considered exploiting ABI constraints. Handling
>>> that might be a notable improvement on various targets.  It'd be a great
>>> place to do some experimentation.
>>
>> Ok, so sounds like a good follow-on project after this patch is reviewed
>> and committed (stage1).  Thanks for your input!
>
> Agreed.  I suspect that risc-v will benefit from such work as well. 
> With that in mind, if y'all start poking at this, please loop in Raphael
> (on cc) who's expressed an interest in this space.

Will do.  I suspect that it'll be best to come up with some generic interface
using target hooks like "param regs are sign/zero extended" or "call return
values are sign/zero extended", etc. that targets can conditionally opt into
depending on their ABI that is in effect.

Peter



[PATCH] aarch64, builtins: Include PR registers in FUNCTION_ARG_REGNO_P etc. [PR109254]

2023-03-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The testcase in the PR (which unfortunately because of my lack of experience
with SVE I'm not able to turn into a runtime testcase that verifies it)
is miscompiled on aarch64-linux in the regname pass, because while the
function takes arguments in the p0 register, FUNCTION_ARG_REGNO_P doesn't
reflect that, so DF doesn't know the register is used in register passing.
It sees 2 chains with p1 register and wants to replace the second one and
as DF doesn't know p0 is live at the start of the function, it will happily
use p0 register even when it is used in subsequent instructions.

The following patch fixes that.  FUNCTION_ARG_REGNO_P returns non-zero
for p0-p3 (unconditionally, seems for the floating/vector registers it
doesn't conditionalize them on TARGET_FLOAT either, but if you want,
I can conditionalize p0-p3 on TARGET_SVE), similarly
targetm.calls.function_value_regno_p returns true for p0 register
if TARGET_SVE (again for consistency, that function conditionalizes
the float/vector on TARGET_FLOAT); looking at the AAPCS, seems p1-p3
could be also used to return values in case of homogenous aggregates,
but it doesn't seem GCC supports putting svbool_t as a member of a
structure.

Now, that change broke bootstrap in libobjc and some
__builtin_apply_args/__builtin_apply/__builtin_return tests.  The
aarch64_get_reg_raw_mode hook already documents that SVE scalable arg/return
passing is fundamentally incompatible with those builtins, but unlike
the floating/vector regs where it forces a fixed vector mode, I think
there is no fixed mode which could be used for p0-p3.  So, I have tweaked
the generic code so that it uses VOIDmode return from that hook to signal
that a register shouldn't be touched by
__builtin_apply_args/__builtin_apply/__builtin_return
despite being mentioned in FUNCTION_ARG_REGNO_P or
targetm.calls.function_value_regno_p.

Bootstrapped/regtested on aarch64-linux, ok for trunk?

Could somebody please turn the testcase from the PR into something that
can be included into the testsuite?

2023-03-24  Jakub Jelinek  

PR target/109254
* builtins.cc (apply_args_size): If targetm.calls.get_raw_arg_mode
returns VOIDmode, handle it like if the register isn't used for
passing arguments at all.
(apply_result_size): If targetm.calls.get_raw_result_mode returns
VOIDmode, handle it like if the register isn't used for returning
results at all.
* target.def (get_raw_result_mode, get_raw_arg_mode): Document what it
means to return VOIDmode.
* doc/tm.texi: Regenerated.
* config/aarch64/aarch64.cc (aarch64_function_value_regno_p): Return
TARGET_SVE for P0_REGNUM.
(aarch64_function_arg_regno_p): Also return true for p0-p3.
(aarch64_get_reg_raw_mode): Return VOIDmode for PR_REGNUM_P regs.

--- gcc/builtins.cc.jj  2023-03-24 10:38:40.185097837 +0100
+++ gcc/builtins.cc 2023-03-24 11:06:49.781725290 +0100
@@ -1446,18 +1446,19 @@ apply_args_size (void)
  {
fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
 
-   gcc_assert (mode != VOIDmode);
-
-   align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
-   if (size % align != 0)
- size = CEIL (size, align) * align;
-   size += GET_MODE_SIZE (mode);
-   apply_args_mode[regno] = mode;
+   if (mode != VOIDmode)
+ {
+   align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
+   if (size % align != 0)
+ size = CEIL (size, align) * align;
+   size += GET_MODE_SIZE (mode);
+   apply_args_mode[regno] = mode;
+ }
+   else
+ apply_args_mode[regno] = as_a  (VOIDmode);
  }
else
- {
-   apply_args_mode[regno] = as_a  (VOIDmode);
- }
+ apply_args_mode[regno] = as_a  (VOIDmode);
 }
   return size;
 }
@@ -1481,13 +1482,16 @@ apply_result_size (void)
  {
fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
 
-   gcc_assert (mode != VOIDmode);
-
-   align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
-   if (size % align != 0)
- size = CEIL (size, align) * align;
-   size += GET_MODE_SIZE (mode);
-   apply_result_mode[regno] = mode;
+   if (mode != VOIDmode)
+ {
+   align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
+   if (size % align != 0)
+ size = CEIL (size, align) * align;
+   size += GET_MODE_SIZE (mode);
+   apply_result_mode[regno] = mode;
+ }
+   else
+ apply_result_mode[regno] = as_a  (VOIDmode);
  }
else
  apply_result_mode[regno] = as_a  (VOIDmode);
--- gcc/target.def.jj   2023-03-23 10:00:58.722094571 +0100
+++ gcc/target.def  2023-03-24 11:12:46.978585647 +0100
@@ -5324,7 +5324,8 @@ DEFH

Re: [PATCH RFC] c-family: -Wsequence-point and COMPONENT_REF [PR107163]

2023-03-24 Thread Jason Merrill via Gcc-patches

On 3/23/23 17:03, Jakub Jelinek wrote:

On Thu, Mar 23, 2023 at 04:35:07PM -0400, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu.  Jakub, does this make sense to you?  Do we have a
way of testing for compile-hog regressions?

-- 8< --

The patch for PR91415 fixed -Wsequence-point to treat shifts and ARRAY_REF
as sequenced in C++17, and COMPONENT_REF as well.  But this is unnecessary
for COMPONENT_REF, since the RHS is just a FIELD_DECL with no actual
evaluation, and in this testcase handling COMPONENT_REF as sequenced blows
up fast in a deep inheritance tree.

PR c++/107163

gcc/c-family/ChangeLog:

* c-common.cc (verify_tree): Don't use sequenced handling
for COMPONENT_REF.


When we touch this for COMPONENT_REF, shouldn't we then handle it as
unary given that the second operand is FIELD_DECL and third/fourth
will likely be NULL and even if not, aren't user expressions that should be
inspected?
So, instead of doing this do:
 case COMPONENT_REF:
   x = TREE_OPERAND (x, 0);
   writer = 0;
   goto restart;
?


Is clearing 'writer' what we want, since an access to COMPONENT_REF is 
an access to (a subobject of) its op0?



As for compile-hog, depends on how long it will take it to compile before
fix/after fix.  If before fix can be above the normal timeout on reasonably
fast matchines and after fix can take a few seconds, great


Currently with the fix it takes <1s while gcc12 takes ~80s.


if after fix
would take longer but still not horribly long, one way to do it is
guard the test with run_expensive_tests effective target.  Or another way
is have the test smaller in complexity normally and
// { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } }
and #ifdef EXPENSIVE make it more complex.


Curiously, making the recursion much deeper doesn't work for that; I 
guess at some point the -Wsequence-point code decides the expression is 
too complex and gives up?


But repeating the assignment brings it up over the timeout.

How about this?From bb302f97929df9b854f7f929093441da60305254 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Thu, 23 Mar 2023 15:57:39 -0400
Subject: [PATCH] c-family: -Wsequence-point and COMPONENT_REF [PR107163]
To: gcc-patches@gcc.gnu.org

The patch for PR91415 fixed -Wsequence-point to treat shifts and ARRAY_REF
as sequenced in C++17, and COMPONENT_REF as well.  But this is unnecessary
for COMPONENT_REF, since the RHS is just a FIELD_DECL with no actual
evaluation, and in this testcase handling COMPONENT_REF as sequenced blows
up fast in a deep inheritance tree.  Instead, look through it.

	PR c++/107163

gcc/c-family/ChangeLog:

	* c-common.cc (verify_tree): Don't use sequenced handling
	for COMPONENT_REF.

gcc/testsuite/ChangeLog:

	* g++.dg/template/recurse5.C: New test.
---
 gcc/c-family/c-common.cc |  5 +++-
 gcc/testsuite/g++.dg/template/recurse5.C | 37 
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/recurse5.C

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index bfb950e56db..07f7beac8fd 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2154,12 +2154,15 @@ verify_tree (tree x, struct tlist **pbefore_sp, struct tlist **pno_sp,
 
 case LSHIFT_EXPR:
 case RSHIFT_EXPR:
-case COMPONENT_REF:
 case ARRAY_REF:
   if (cxx_dialect >= cxx17)
 	goto sequenced_binary;
   goto do_default;
 
+case COMPONENT_REF:
+  x = TREE_OPERAND (x, 0);
+  goto restart;
+
 default:
 do_default:
   /* For other expressions, simply recurse on their operands.
diff --git a/gcc/testsuite/g++.dg/template/recurse5.C b/gcc/testsuite/g++.dg/template/recurse5.C
new file mode 100644
index 000..0354ab09f53
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/recurse5.C
@@ -0,0 +1,37 @@
+// PR c++/107163
+// { dg-additional-options "-Wsequence-point" }
+
+struct BaseType  {
+  int i;
+};
+
+template< int Seq >
+class DerivedType : public DerivedType< Seq - 1 > { };
+
+template<>
+class DerivedType< -1 > : public BaseType { };
+
+int main() {
+  DerivedType< 400 > d;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  d.i = 42;
+  return d.i;
+}
-- 
2.31.1



Re: [PATCH RFC] c-family: -Wsequence-point and COMPONENT_REF [PR107163]

2023-03-24 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 24, 2023 at 06:11:44PM -0400, Jason Merrill wrote:
> > When we touch this for COMPONENT_REF, shouldn't we then handle it as
> > unary given that the second operand is FIELD_DECL and third/fourth
> > will likely be NULL and even if not, aren't user expressions that should be
> > inspected?
> > So, instead of doing this do:
> >  case COMPONENT_REF:
> >x = TREE_OPERAND (x, 0);
> >writer = 0;
> >goto restart;
> > ?
> 
> Is clearing 'writer' what we want, since an access to COMPONENT_REF is an
> access to (a subobject of) its op0?

I've just mindlessly copied the unary op case.
writer is set for pre/post increments and lhs of MODIFY_EXPR, and it is
true that VIEW_CONVERT_EXPR doesn't clear it, but e.g. ARRAY_REF clears it
for all operands.

> Currently with the fix it takes <1s while gcc12 takes ~80s.

Perfect.

>   PR c++/107163
> 
> gcc/c-family/ChangeLog:
> 
>   * c-common.cc (verify_tree): Don't use sequenced handling
>   for COMPONENT_REF.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/recurse5.C: New test.

LGTM, thanks.  Maybe the testcase would be better as
warn/Wsequence-point-5.C, dunno.

Jakub



[PATCH, V2] PR target/105325, Make load/cmp fusion know about prefixed load

2023-03-24 Thread Michael Meissner via Gcc-patches
I posted a version of patch on March 21st.  This patch makes some code changes
suggested in the genfusion.pl code.  The only change is in genfusion.pl.  The
fusion.md that it makes is the same.

The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is the constraints for the individual loads in the fusion did not
match the machine.  In particular, LWA is a ds format instruction when it is
unprefixed.  The code did not also set the prefixed attribute correctly.

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it considers load + compare
immediate patterns to be like the normal load insns in checking whether
operand[1] is a prefixed instruction.

I am re-running the tests right now, but they should have the same results
since fsuion.md is the same, and only code in genfusion.pl that makes fusion.md
was modified.  Assuming these runs pass can I check this into the master
branch?

I will also need to check these same patches into GCC 11 and GCC 12 after a
waiting period (the patch applied to those branches as well).

2023-03-21   Michael Meissner  

gcc/

PR target/105325
* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
of the ld and lwa instructions which use the DS encoding instead of D.
Use the YZ constraint for these loads.  Handle prefixed loads better.
Set the sign_extend attribute as appropriate.
* gcc/config/rs6000/fusion.md: Regenerate.
* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
instructions to the list of instructions that might have a prefixed load
instruction.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/fusion.md   | 17 ++
 gcc/config/rs6000/genfusion.pl| 32 +++
 gcc/config/rs6000/rs6000.md   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 24 ++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 5 files changed, 62 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
(match_operand:DI 3 "const_0_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
(match_operand:DI 3 "const_0_to_1_operand" "n")))
(set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET

Re: [PATCH] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-03-24 Thread Michael Meissner via Gcc-patches
On Thu, Mar 23, 2023 at 04:10:22PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> Thanks for fixing this, some minor comments are inlined below.
> 
> on 2023/3/22 07:53, Michael Meissner wrote:
> > The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
> > optimization generates illegal assembler code.
> > 
> > Ultimately the code was dying because the fusion load + compare -1/0/1 
> > patterns
> > did not handle the possibility that the load might be prefixed.
> > 
> > The main cause is the constraints for the individual loads in the fusion 
> > did not
> > match the machine.  In particular, LWA is a ds format instruction when it is
> > unprefixed.  The code did not also set the prefixed attribute correctly.
> > 
> > This patch rewrites the genfusion.pl script so that it will have more 
> > accurate
> > constraints for the LWA and LD instructions (which are DS instructions).  
> > The
> > updated genfusion.pl was then run to update fusion.md.  Finally, the code 
> > for
> > the "prefixed" attribute is modified so that it considers load + compare
> > immediate patterns to be like the normal load insns in checking whether
> > operand[1] is a prefixed instruction.
> > 
> > I have tested this patch on a little endian power10 system, on a little 
> > endian
> > power9 system, and a big endian power8 system (both -m32 and -m64 tested on
> > BE).  There were no regressions, can I check this into the trunk?
> > 
> > The same patch applies to the gcc-12 and gcc-11 branches.  Can I check this
> > patch into those branches also after a burn-in period?
> > 
> > 2023-03-21   Michael Meissner  
> >  Aaron Sawdey  
> > 
> > gcc/
> > 
> > PR target/105325
> > * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
> > of the ld and lwa instructions which use the DS encoding instead of D.
> > Use the YZ constraint for these loads.  Handle prefixed loads better.
> > Set the sign_extend attribute as appropriate.
> > * gcc/config/rs6000/fusion.md: Regenerate.
> > * gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
> > instructions to the list of instructions that might have a prefixed load
> > instruction.
> > 
> > gcc/testsuite/
> > 
> > PR target/105325
> > * g++.target/powerpc/pr105325.C: New test.
> > * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
> > ---
> >  gcc/config/rs6000/genfusion.pl| 26 ---
> >  gcc/config/rs6000/fusion.md   | 17 +++-
> >  gcc/config/rs6000/rs6000.md   |  2 +-
> >  gcc/testsuite/g++.target/powerpc/pr105325.C   | 24 +
> >  .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
> >  5 files changed, 59 insertions(+), 14 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C
> > 
> > diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
> > index e4db352e0ce..4f367cadc52 100755
> > --- a/gcc/config/rs6000/genfusion.pl
> > +++ b/gcc/config/rs6000/genfusion.pl
> > @@ -56,7 +56,7 @@ sub mode_to_ldst_char
> >  sub gen_ld_cmpi_p10
> >  {
> >  my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
> > -   $mempred, $ccmode, $np, $extend, $resultmode);
> > +   $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
> >LMODE: foreach $lmode ('DI','SI','HI','QI') {
> >$ldst = mode_to_ldst_char($lmode);
> >$clobbermode = $lmode;
> > @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
> >CCMODE: foreach $ccmode ('CC','CCUNS') {
> >   $np = "NON_PREFIXED_D";
> >   $mempred = "non_update_memory_operand";
> > + $constraint = "m";
> 
> The three assignments on $np $mempred $constraint can be moved
> to place (a) (see below) and add one explicit assignment for
> $constraint at place (b), since for the condition ccmode eq 'CC',
> HI/SI/DI have their own settings (btw QI is skipped), these
> assignments for default value can be moved to else arm (for CCUNS).

...

> we have broken it into two different arms for SI and DI, this
> comment can be removed?

...

> 
> ... and this comment.
> 

I have fixed these issues and reposted the patch as:

| Date: Fri, 24 Mar 2023 19:06:35 -0400
| From: Michael Meissner 
| Subject: [PATCH, V2] PR target/105325, Make load/cmp fusion know about 
prefixed load
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[pushed] docs, analyzer: improvements to "Debugging the Analyzer"

2023-03-24 Thread David Malcolm via Gcc-patches
Successfully bootstrapped on x86_64-pc-linux-gnu.
Pushed to trunk as r13-6859-gfdb06fe68253d2

gcc/ChangeLog:
* doc/analyzer.texi (Debugging the Analyzer): Add notes on useful
debugging options.
(Special Functions for Debugging the Analyzer): Convert to a
table, and rewrite in places.
(Other Debugging Techniques): Add notes on how to compare two
different exploded graphs.

Signed-off-by: David Malcolm 
---
 gcc/doc/analyzer.texi | 125 +++---
 1 file changed, 116 insertions(+), 9 deletions(-)

diff --git a/gcc/doc/analyzer.texi b/gcc/doc/analyzer.texi
index 0afd1143d4c..2692b0e3ece 100644
--- a/gcc/doc/analyzer.texi
+++ b/gcc/doc/analyzer.texi
@@ -437,13 +437,97 @@ than printing the underlying variable name.
 @cindex analyzer, debugging
 @cindex static analyzer, debugging
 
+When debugging the analyzer I normally use all of these options
+together:
+
+@smallexample
+./xgcc -B. \
+  -S \
+  -fanalyzer \
+  OTHER_GCC_ARGS \
+  -wrapper gdb,--args \
+  -fdump-analyzer-stderr \
+  -fanalyzer-fine-grained \
+  -fdump-ipa-analyzer=stderr
+@end smallexample
+
+where:
+
+@itemize @bullet
+@item @code{./xgcc -B.}
+is the usual way to invoke a self-built GCC from within the @file{BUILDDIR/gcc}
+subdirectory.
+
+@item @code{-S}
+so that the driver (@code{./xgcc}) invokes @code{cc1}, but doesn't bother
+running the assembler or linker (since the analyzer runs inside @code{cc1}).
+
+@item @code{-fanalyzer}
+enables the analyzer, obviously.
+
+@item @code{-wrapper gdb,--args}
+invokes @code{cc1} under the debugger so that I can debug @code{cc1} and
+set breakpoints and step through things.
+
+@item @code{-fdump-analyzer-stderr}
+so that the logging interface is enabled and goes to stderr, which often
+gives valuable context into what's happening when stepping through the
+analyzer
+
+@item @code{-fanalyzer-fine-grained}
+which splits the effect of every statement into its own
+exploded_node, rather than the default (which tries to combine
+successive stmts to reduce the size of the exploded_graph).  This makes
+it easier to see exactly where a particular change happens.
+
+@item @code{-fdump-ipa-analyzer=stderr}
+which dumps the GIMPLE IR seen by the analyzer pass to stderr
+
+@end itemize
+
+Other useful options:
+
+@itemize @bullet
+@item @code{-fdump-analyzer-exploded-graph}
+which dumps a @file{SRC.eg.dot} GraphViz file that I can look at (with
+python-xdot)
+
+@item @code{-fdump-analyzer-exploded-nodes-2}
+which dumps a @file{SRC.eg.txt} file containing the full @code{exploded_graph}.
+
+@end itemize
+
+Assuming that you have the
+@uref{https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html,,python
 support scripts for gdb}
+installed, you can use:
+
+@smallexample
+(gdb) break-on-saved-diagnostic
+@end smallexample
+
+to put a breakpoint at the place where a diagnostic is saved during
+@code{exploded_graph} exploration, to see where a particular diagnostic
+is being saved, and:
+
+@smallexample
+(gdb) break-on-diagnostic
+@end smallexample
+
+to put a breakpoint at the place where diagnostics are actually emitted.
+
 @subsection Special Functions for Debugging the Analyzer
 
 The analyzer recognizes various special functions by name, for use
-in debugging the analyzer.  Declarations can be seen in the testsuite
+in debugging the analyzer, and for use in DejaGnu tests.
+
+The declarations of these functions can be seen in the testsuite
 in @file{analyzer-decls.h}.  None of these functions are actually
-implemented.
+implemented in terms of code, merely as @code{known_function} subclasses
+(in @file{gcc/analyzer/kf-analyzer.cc}).
+
+@table @code
 
+@item __analyzer_break
 Add:
 @smallexample
   __analyzer_break ();
@@ -452,6 +536,7 @@ to the source being analyzed to trigger a breakpoint in the 
analyzer when
 that source is reached.  By putting a series of these in the source, it's
 much easier to effectively step through the program state as it's analyzed.
 
+@item __analyzer_describe
 The analyzer handles:
 
 @smallexample
@@ -462,6 +547,7 @@ by emitting a warning describing the 2nd argument (which 
can be of any
 type), at a verbosity level given by the 1st argument.  This is for use when
 debugging, and may be of use in DejaGnu tests.
 
+@item __analyzer_dump
 @smallexample
 __analyzer_dump ();
 @end smallexample
@@ -469,6 +555,7 @@ __analyzer_dump ();
 will dump the copious information about the analyzer's state each time it
 reaches the call in its traversal of the source.
 
+@item __analyzer_dump_capacity
 @smallexample
 extern void __analyzer_dump_capacity (const void *ptr);
 @end smallexample
@@ -476,6 +563,7 @@ extern void __analyzer_dump_capacity (const void *ptr);
 will emit a warning describing the capacity of the base region of
 the region pointed to by the 1st argument.
 
+@item __analyzer_dump_escaped
 @smallexample
 extern void __analyzer_dump_escaped (void);
 @end smallexample
@@ -484,16 +572,19 @@ will em

[pushed] diagnostics: ensure that .sarif files are UTF-8 encoded [PR109098]

2023-03-24 Thread David Malcolm via Gcc-patches
PR analyzer/109098 notes that the SARIF spec mandates that .sarif
files are UTF-8 encoded, but -fdiagnostics-format=sarif-file naively
assumes that the source files are UTF-8 encoded when quoting source
artefacts in the .sarif output, which can lead to us writing out
.sarif files with non-UTF-8 bytes in them (which break my reporting
scripts).

The root cause is that sarif_builder::maybe_make_artifact_content_object
was using maybe_read_file to load the file content as bytes, and
assuming they were UTF-8 encoded.

This patch reworks both overloads of this function (one used for the
whole file, the other for snippets of quoted lines) so that they go
through input.cc's file cache, which attempts to decode the input files
according to the input charset, and then encode as UTF-8.  They also
check that the result actually is UTF-8, for cases where the input
charset is missing, or incorrectly specified, and omit the quoted
source for such awkward cases.

Doing so fixes all of the cases I've encountered.

The patch adds a new:
  { dg-final { verify-sarif-file } }
directive to all SARIF test cases in the test suite, which verifies
that the output is UTF-8 encoded, and is valid JSON.  In particular
it verifies that when we complain about encoding problems, the .sarif
report we emit is itself correctly encoded.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Integration testing shows no regressions, and a fix for the case
seen in haproxy-2.7.1.
Pushed to trunk as r13-6861-gd495ea2b232f3e.

FWIW the invalid UTF-8 in the patch (in some of the test cases) seems
to have broken the server-side scripts:

Enumerating objects: 51, done.
Counting objects: 100% (51/51), done.
Delta compression using up to 64 threads
Compressing objects: 100% (29/29), done.
Writing objects: 100% (29/29), 7.74 KiB | 1.29 MiB/s, done.
Total 29 (delta 22), reused 0 (delta 0), pack-reused 0
remote: Traceback (most recent call last):
remote:   File "hooks/post_receive.py", line 118, in 
remote: post_receive(refs_data, args.submitter_email)
remote:   File "hooks/post_receive.py", line 65, in post_receive
remote: submitter_email)
remote:   File "hooks/post_receive.py", line 47, in post_receive_one
remote: update.send_email_notifications()
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 189, 
in send_email_notifications
remote: self.__email_new_commits()
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 
1031, in __email_new_commits
remote: commit, self.get_standard_commit_email(commit))
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 
1011, in __send_commit_email
remote: default_diff=email.diff)
remote:   File 
"/sourceware1/projects/src-home/git-hooks/hooks/updates/__init__.py", line 946, 
in __maybe_get_email_custom_contents
remote: hook_input=json.dumps(hooks_data),
remote:   File "/usr/lib64/python2.7/json/__init__.py", line 244, in dumps
remote: return _default_encoder.encode(obj)
remote:   File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
remote: chunks = self.iterencode(o, _one_shot=True)
remote:   File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
remote: return _iterencode(o, 0)
remote: UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 
13147: invalid start byte
To git+ssh://gcc.gnu.org/git/gcc.git
   13ec81eb4c3..d495ea2b232  master -> master

gcc/ChangeLog:
PR analyzer/109098
* diagnostic-format-sarif.cc (read_until_eof): Delete.
(maybe_read_file): Delete.
(sarif_builder::maybe_make_artifact_content_object): Use
get_source_file_content rather than maybe_read_file.
Reject it if it's not valid UTF-8.
* input.cc (file_cache_slot::get_full_file_content): New.
(get_source_file_content): New.
(selftest::check_cpp_valid_utf8_p): New.
(selftest::test_cpp_valid_utf8_p): New.
(selftest::input_cc_tests): Call selftest::test_cpp_valid_utf8_p.
* input.h (get_source_file_content): New prototype.

gcc/testsuite/ChangeLog:
PR analyzer/109098
* c-c++-common/diagnostic-format-sarif-file-1.c: Add
verify-sarif-file directive.
* c-c++-common/diagnostic-format-sarif-file-2.c: Likewise.
* c-c++-common/diagnostic-format-sarif-file-3.c: Likewise.
* c-c++-common/diagnostic-format-sarif-file-4.c: Likewise.
* c-c++-common/diagnostic-format-sarif-file-Wbidi-chars.c: New
test case, adapted from Wbidi-chars-1.c.
* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-1.c:
New test case.
* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-2.c:
New test case.
* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-3.c:
New test case, adapted from cpp/Winvalid-utf8-1.c.
* c-c++-common/diagnostic-format-sarif-f