Re: [PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives

2021-10-21 Thread Chung-Lin Tang



On 2021/10/21 12:15 AM, Jakub Jelinek wrote:

+program main
+  integer :: x, i, n
+
+  !$omp parallel
+  block
+    x = x + 1
+  end block

I'd prefer not to use those x = j or x = x + 1 etc.
as statements that do random work here whenever possible.
While those are dg-do compile testcases, especially if
it is without dg-errors I think it is preferable not to show
bad coding examples.
E.g. the x = x + 1 above is wrong for 2 reasons, x is uninitialized
before the parallel, and there is a data race, the threads, teams etc.
can write to x concurrently.
I think better would be to use something like
 call do_work
which doesn't have to be defined anywhere and will just stand there
as a black box for unspecified work.


+  !$omp workshare
+  block
+    x = x + 1
+  end block

There are exceptions though, e.g. workshare is such a case, because
e.g. call do_work is not valid in workshare.
So, it is ok to keep using x = x + 1 here if you initialize it
first at the start of the program.


+  !$omp workshare
+  block
+    x = 1
+    !$omp critical
+    block
+      x = 3
+    end block
+  end block

And then there are cases like the above, please
just use different variables there (all initialized) or
say an array and access different elements in the different spots.

Jakub



Thanks, attached is what I finally committed.

Chung-Lin



From 2e4659199e814b7ee0f6bd925fd2c0a7610da856 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 21 Oct 2021 14:56:20 +0800
Subject: [PATCH] openmp: Fortran strictly-structured blocks support

This implements strictly-structured blocks support for Fortran, as specified in
OpenMP 5.2. This now allows using a Fortran BLOCK construct as the body of most
OpenMP constructs, with a "!$omp end ..." ending directive optional for that
form.

gcc/fortran/ChangeLog:

* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
together with COMP_BLOCK.
* parse.c (parse_omp_structured_block): Change return type to
'gfc_statement', add handling for strictly-structured block case, adjust
recursive calls to parse_omp_structured_block.
(parse_executable): Adjust calls to parse_omp_structured_block.
* parse.h (enum gfc_compile_state): Add
COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
handling.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/cancel-1.f90: Adjust testcase.
* gfortran.dg/gomp/nesting-3.f90: Adjust testcase.
* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-2.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-3.f90: New test.

libgomp/ChangeLog:

* libgomp.texi (Support of strictly structured blocks in Fortran):
Adjust to 'Y'.
* testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
---
 gcc/fortran/decl.c|   1 +
 gcc/fortran/parse.c   |  69 +-
 gcc/fortran/parse.h   |   2 +-
 gcc/fortran/trans-openmp.c|   6 +-
 gcc/testsuite/gfortran.dg/gomp/cancel-1.f90   |   3 +
 gcc/testsuite/gfortran.dg/gomp/nesting-3.f90  |  20 +-
 .../gomp/strictly-structured-block-1.f90  | 214 ++
 .../gomp/strictly-structured-block-2.f90  | 139 
 .../gomp/strictly-structured-block-3.f90  |  52 +
 libgomp/libgomp.texi  |   2 +-
 .../libgomp.fortran/task-reduction-16.f90 |   1 +
 11 files changed, 484 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 6784b07ae9e..6043e100fbb 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8429,6 +8429,7 @@ gfc_match_end (gfc_statement *st)
   break;
 
 case COMP_BLOCK:
+case COMP_OMP_STRICTLY_STRUCTURED_BLOCK:
   *st = ST_END_BLOCK;
   target = " block";
   eos_ok = 0;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 2a454be79b0..b1e73ee6801 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5459,7 +5459,7 @@ parse_oacc_loop (gfc_statement acc_st)
 
 /* Parse the statements of an OpenMP structured block.  */
 
-static void
+static gfc_statement
 parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
 {
   gfc_statement st, omp_end_st;
@@ -5546,6 +5546,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
   gcc_unreachable ();
 }
 
+  bool block_construct = false;
+  gfc_namespace *my_ns = NULL;
+  gfc_namespace *my_parent = NULL;
+
+  st = next_statement ();
+
+  if (st == ST_BLOCK)
+{
+  /* Adjust stat

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-21 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu  wrote:
>
> On Wed, Oct 20, 2021 at 9:58 AM Richard Biener
>  wrote:
> >
> > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu"  
> > wrote:
> > >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener
> > > wrote:
> > >>
> > >> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong  wrote:
> > >> >
> > >> > Many thanks for your explanation. I got the meaning of operands.
> > >> > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" to be aligned,
> > >> > otherwise it will raise a "Real-Address Mode Exception".
> > >> > I hadn't considered the situation where "b(%rip)" has an address
> > >> > dependence on "a(%rip)" before. I think this situation could be
> > >> > resolved on the assembler side, except for dummy code like "movapd
> > >> > 0x200b37(%rip),%xmm1, ... addpd 0x200b37(%rip),%xmm0".
> > >>
> > >> Of course the compiler will only emit instructions which have the
> > >> constraint of aligned memory
> > >> when the memory is known to be aligned.  That's why I wonder why you
> > >> would need such
> > >> option.  "Real-Address Mode Exceptions" may point to the issue, but I
> > >> wonder what's different
> > >> in real mode vs. protected mode - even with segmentation the alignment
> > >> of objects should
> > >> prevail unless you play linker "tricks" that make global objects have
> > >> different alignment - but
> > >> then it's better to adjust the respective hooks to not falsely claim
> > >> such alignment.  Consider
> > >> for example
> > >>
> > >>if ((uintptr_t)&a & 0x7)
> > >>  foo();
> > >>   else
> > >>  bar();
> > >>
> > >> GCC will optimize the branch statically to always call foo if 'a'
> > >> appears to be aligned,
> > >> even if you later try to "override" this with an option.  Alignment is
> > >> not only about
> > >> moves, it's also about knowledge about low bits in addresses and about
> > >> alias analysis where alignment constrains how two objects can overlap.
> > >>
> > >> So - do not lie to the compiler!  A late "workaround" avoiding aligned
> > >> SSE moves isn't a proper fix.
> > >>
> > >
> > >The motivations are
> > >
> > >1.  AVX non-load/store ops work on unaligned memory.   Unaligned
> > >load/store on aligned memory is as fast as aligned load/store on Intel
> > >AVX machines.   The new switch makes load/store consistent with
> > >other AVX ops.
> > >2. We don't properly align the stack for AVX on Windows.  This can
> > >be used as a workaround for -mavx on Windows.
> >
> > But this, with lying that the stack is aligned, causes all of the above 
> > mentioned issues and thus needs to be fixed by either properly aligning the 
> > stack or not lying to the compiler that we do.
> >
> > >
> > >We can change TARGET_USE_UNALIGNED_VECTOR_MOVE
> > >to require AVX.
> >
> > But such a workaround does not make any sense since it does not fix the 
> > fundamental underlying problem.
> >
>
> There is a long standing desire to remove alignment checking (#AC(0)).
> For integer operations, alignment checking is disabled in hardware.
> For AVX ops, alignment checking is disabled in hardware for non-load/store
> instructions.  But we can't disable alignment checking in hardware for
> aligned load/store instructions.  -muse-unaligned-vector-move implements
> disabling alignment checking for all AVX ops.

No, it does not - it just emits unaligned moves.  The compiler still assumes
aligned memory.  So whatever reason you have for disabling alignment
checking for memory that is known to be aligned, I don't see it.

If you want to "fix" broken user code then this doesn't do it.

If you want to avoid the penalty for runtime stack alignment then you simply
have to change the ABI(?) to not require vector types to have big alignment.

Richard.

>
> --
> H.J.


Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-21 Thread Aldy Hernandez via Gcc-patches
On Wed, Oct 20, 2021 at 10:01 PM Jeff Law  wrote:
>
>
>
> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
> > On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor  wrote:
> >
> >> I appreciate the heads up.  I'm happy that the threader has
> >> improved.  I'm obviously not pleased that it has led to regressions
> >> in warnings but I understand that in some cases they might be due
> >> to limitations in the warning code.  I think the test case you have
> >> xfailed might be one such example.  The uninitialized warnings are
> >> exquisitely sensitive to these types of changes.  If/when this patch
> >> is applied please reopen PR 89230 and reference this commit.
> >>
> >> Having said that, to maintain the quality of diagnostics,
> >> the work that goes into these nice optimizer improvements needs
> >> to be balanced by an effort to either update the warning code
> >> to cope with the IL changes, or the optimizers need to take care
> >> to avoid exposing undefined code that the warnings are designed
> >> to detect.  I'm concerned not just that the quality of GCC 12
> >> diagnostics has been eroding, but also that it seems to be not
> >> just acceptable but expected.
> > You make a very good point.  It is certainly not my intention to make
> > life difficult for the warning maintainers, but I'm afraid I don't
> > have sufficient knowledge in the area to improve them.
> >
> > There may be some low hanging fruit though.  At least in the warnings
> > that use the ranger, there's no reason to run these passes so late in
> > the pipeline.  You could run the warning code as early as you want,
> > insofar as SSA is available and the CFG has been built.  Heck, you may
> > even be able to run at -O0, though we may need some sort of value
> > numbering.  I believe Richi even suggested this a while back.
> Running them later in the pipeline is to take advantage of the
> optimizers removing dead and unreachable code as much as possible. In
> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
> paths  to avoid Wuninitialized false positives has been the major driver
> of jump threading improvements for the last 15 years.

Ughh, that's unfortunate.  We're gonna have to come up with
improvements to the Wuninitialized code, or a different paradigm
altogether.  I'm afraid this will only get worse.

It is a bit ironic that jump threading helps reduce Wuninitialized
false positives, and yet too much of it causes even more false
positives.

Aldy



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-21 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 10:02 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
> > On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor  wrote:
> >
> >> I appreciate the heads up.  I'm happy that the threader has
> >> improved.  I'm obviously not pleased that it has led to regressions
> >> in warnings but I understand that in some cases they might be due
> >> to limitations in the warning code.  I think the test case you have
> >> xfailed might be one such example.  The uninitialized warnings are
> >> exquisitely sensitive to these types of changes.  If/when this patch
> >> is applied please reopen PR 89230 and reference this commit.
> >>
> >> Having said that, to maintain the quality of diagnostics,
> >> the work that goes into these nice optimizer improvements needs
> >> to be balanced by an effort to either update the warning code
> >> to cope with the IL changes, or the optimizers need to take care
> >> to avoid exposing undefined code that the warnings are designed
> >> to detect.  I'm concerned not just that the quality of GCC 12
> >> diagnostics has been eroding, but also that it seems to be not
> >> just acceptable but expected.
> > You make a very good point.  It is certainly not my intention to make
> > life difficult for the warning maintainers, but I'm afraid I don't
> > have sufficient knowledge in the area to improve them.
> >
> > There may be some low hanging fruit though.  At least in the warnings
> > that use the ranger, there's no reason to run these passes so late in
> > the pipeline.  You could run the warning code as early as you want,
> > insofar as SSA is available and the CFG has been built.  Heck, you may
> > even be able to run at -O0, though we may need some sort of value
> > numbering.  I believe Richi even suggested this a while back.
> Running them later in the pipeline is to take advantage of the
> optimizers removing dead and unreachable code as much as possible. In
> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
> paths  to avoid Wuninitialized false positives has been the major driver
> of jump threading improvements for the last 15 years.

Well, the good old idea is to simply queue the diagnostics and only emit
them if a stmt (with the location?) gets expanded.  Of course the details
of what exactly to key the warning on are tricky.

Richard.

> jeff


Re: [r12-4559 Regression] FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1 on Linux/x86_64

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 3:39 AM sunil.k.pandey via Gcc-patches
 wrote:
>
> On Linux/x86_64,
>
> 914045dff10fbd27de27b90a0ac78a0058b2c86e is the first bad commit
> commit 914045dff10fbd27de27b90a0ac78a0058b2c86e
> Author: Andre Simoes Dias Vieira 
> Date:   Wed Oct 20 13:12:09 2021 +0100
>
> [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under 
> -ffast-math on aarch64
>
> caused
>
> FAIL: gcc.dg/vect/bb-slp-57.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "transform load" 1
> FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1

That's because

_5 = .TRUNC (_3);

is

(gdb) p gimple_call_nothrow_p (call_stmt)
$2 = false

it is vectorizable though.  The conversions the call is synthesized
from are 'nothrow'.  I'll fix
that since the nothrow check is irrelevant here.

> with GCC configured with
>
> ../../gcc/configure 
> --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4559/usr
>  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-57.c --target_board='unix{-m64\ 
> -march=cascadelake}'"
>
> (Please do not reply to this email, for question about this report, contact 
> me at skpgkp2 at gmail dot com)


Re: [r12-4559 Regression] FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1 on Linux/x86_64

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 9:30 AM Richard Biener
 wrote:
>
> On Thu, Oct 21, 2021 at 3:39 AM sunil.k.pandey via Gcc-patches
>  wrote:
> >
> > On Linux/x86_64,
> >
> > 914045dff10fbd27de27b90a0ac78a0058b2c86e is the first bad commit
> > commit 914045dff10fbd27de27b90a0ac78a0058b2c86e
> > Author: Andre Simoes Dias Vieira 
> > Date:   Wed Oct 20 13:12:09 2021 +0100
> >
> > [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under 
> > -ffast-math on aarch64
> >
> > caused
> >
> > FAIL: gcc.dg/vect/bb-slp-57.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "transform load" 1
> > FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1
>
> That's because
>
> _5 = .TRUNC (_3);
>
> is
>
> (gdb) p gimple_call_nothrow_p (call_stmt)
> $2 = false
>
> it is vectorizable though.  The conversions the call is synthesized
> from are 'nothrow'.  I'll fix
> that since the nothrow check is irrelevant here.

For the specific testcase that doesn't help though since there's no
optab for V2SFmode as needed by the testcase.

>
> > with GCC configured with
> >
> > ../../gcc/configure 
> > --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4559/usr
> >  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet 
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check 
> > RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-57.c --target_board='unix{-m64\ 
> > -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report, contact 
> > me at skpgkp2 at gmail dot com)


Re: [PATCH] Improve maybe_remove_writeonly_store to do a simple DCE for defining statement

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 7:53 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> Instead of putting a full blown DCE after execute_fixup_cfg, it makes sense
> to try to remove the defining statement for the store that is being removed.
> Using simple_dce_from_worklist makes this easier, just mark the ssa_name on
> the rhs side of the store (if it was one) in a bitmap and then call
> simple_dce_from_worklist at the end.
>
> gcc.dg/pr36902.c needed to be changed such that the static array was no
> longer a static array but a global array.  This is because this new dce
> will remove the load as it is dead. I also filed PR 102864 for the warning
> on dead loads.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-cfg.c (maybe_remove_writeonly_store): Add dce_ssa_names 
> argument.
> Mark the ssa-name of the rhs as one to be removed.
> (execute_fixup_cfg): Update call to maybe_remove_writeonly_store.
> Call simple_dce_from_worklist at the end to a simple dce.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr36902.c: Move buf to be a non-static variable.
> ---
>  gcc/testsuite/gcc.dg/pr36902.c |  5 +
>  gcc/tree-cfg.c | 18 --
>  2 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/pr36902.c b/gcc/testsuite/gcc.dg/pr36902.c
> index 7dafc9ac171..365a26e26b7 100644
> --- a/gcc/testsuite/gcc.dg/pr36902.c
> +++ b/gcc/testsuite/gcc.dg/pr36902.c
> @@ -24,10 +24,9 @@ struct {
>unsigned char pcr_select[4];
>  } sel;
>
> +unsigned char buf[64];
>  int bar(void)
>  {
> -  static unsigned char buf[64];
> -
>sel.size_of_select = 3;
>foo(buf, sel.pcr_select, sel.size_of_select);
>
> @@ -52,8 +51,6 @@ foo2(unsigned char * to, const unsigned char * from, int n)
>
>  int baz(void)
>  {
> -  static unsigned char buf[64];
> -
>sel.size_of_select = 5;
>foo2(buf, sel.pcr_select, sel.size_of_select);
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index dbbf6beb6e4..b3a27bcd17c 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "value-prof.h"
>  #include "tree-inline.h"
>  #include "tree-ssa-live.h"
> +#include "tree-ssa-dce.h"
>  #include "omp-general.h"
>  #include "omp-expand.h"
>  #include "tree-cfgcleanup.h"
> @@ -9669,7 +9670,8 @@ make_pass_warn_unused_result (gcc::context *ctxt)
>  /* Maybe Remove stores to variables we marked write-only.
> Return true if a store was removed. */
>  static bool
> -maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt)
> +maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt,
> + bitmap dce_ssa_names)
>  {
>/* Keep access when store has side effect, i.e. in case when source
>   is volatile.  */
> @@ -9692,6 +9694,15 @@ maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt)
>print_gimple_stmt (dump_file, stmt, 0,
>  TDF_VOPS|TDF_MEMSYMS);
>  }
> +
> +  /* Mark ssa name defining to be checked for simple dce. */
> +  if (gimple_assign_single_p (stmt))
> +{
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  if (TREE_CODE (rhs) == SSA_NAME
> + && !SSA_NAME_IS_DEFAULT_DEF (rhs))
> +   bitmap_set_bit (dce_ssa_names, SSA_NAME_VERSION (rhs));
> +}
>unlink_stmt_vdef (stmt);
>gsi_remove (&gsi, true);
>release_defs (stmt);
> @@ -9714,6 +9725,7 @@ execute_fixup_cfg (void)
>profile_count num = node->count;
>profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>bool scale = num.initialized_p () && !(num == den);
> +  auto_bitmap dce_ssa_names;
>
>if (scale)
>  {
> @@ -9754,7 +9766,7 @@ execute_fixup_cfg (void)
>  }
>
>   /* Remove stores to variables we marked write-only. */
> - if (maybe_remove_writeonly_store (gsi, stmt))
> + if (maybe_remove_writeonly_store (gsi, stmt, dce_ssa_names))
> {
>   todo |= TODO_update_ssa | TODO_cleanup_cfg;
>   continue;
> @@ -9820,6 +9832,8 @@ execute_fixup_cfg (void)
>&& (todo & TODO_cleanup_cfg))
>  loops_state_set (LOOPS_NEED_FIXUP);
>
> +  simple_dce_from_worklist (dce_ssa_names);
> +
>return todo;
>  }
>
> --
> 2.17.1
>


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Aldy Hernandez via Gcc-patches
> Massaging the IL should only take two forms IIRC.
>
> First, if we have a simplification we can do.  That could be const/copy
> propagation, replacing an expression with an SSA_NAME or constant and
> the like.  It doesn't massage the IL just to massage the IL.
>
> Second, we do temporarily copy propagate the current known values of an
> SSA name into use points and then see if that allows us to determine if
> a statement is already in the hash tables.  But we undo that so that
> nobody should see that temporary change in state.
>
> Finally, it does create some expressions & statements on the fly to
> enter them into the tables.  For example, if it sees a store, it'll
> create a load with the source & dest interchanged and enter that into
> the expression table.  But none of this stuff ever shows up in the IL.
> It's just to create entries in the expression tables.
>
> So ITSM the only real concern would be if those temporary const/copy
> propagations were still in the IL and we called back into Ranger and it
> poked at that data somehow.

Hmmm, this is all very good insight.  Thanks.

One thing that would have to be adjusted then is to remove the
enable_ranger() call in the patch.  This sets a global ranger, and
there are users of get_range_query() that will use it to get on-demand
ranges.  One such use that I added was ssa_name_has_boolean_range in
tree-ssa-dom.c.  This means that if the IL has been temporarily
changed, this function can and will use the global ranger.  The
alternative here would be to just create a new local ranger:

-  gimple_ranger *ranger = enable_ranger (fun);
+  gimple_ranger *ranger = new gimple_ranger;

and then obviously deallocate it at the disable_ranger call site.

This will cause any users of get_range_query() in the compiler to just
use global ranges.  Hopefully, none of these downstream users of
get_range_query() from DOM need context sensitive results.
ssa_name_has_boolean_range??

I think what you'd need to do is check that there are no calls to the
ranger from cprop_into_stmt (?? this is the place where IL changes??),
until wherever the undoing happens (I couldn't find it).  I see a call
to simplify_using_ranges in optimize_stmt that looks like it could be
called with the IL in mid-flight.  Maybe this call needs to happen
before the IL is altered?

> So if we're referring to those temporary const/copy propagations
> "escaping" into Ranger, then I would fully expect that to cause
> problems.  Essentially they're path sensitive const/copy propagations
> and may not be valid on all the paths through the CFG to the statement
> where the propagation occurs

Yeah.  disabling the global ranger should help, plus making sure you
don't use the ranger in the midst of the path sensitive changes.

Aldy



[PATCH] Handle jobserver file descriptors in btest.

2021-10-21 Thread Martin Liška

Hi.

The patch is about sensible handling of file descriptors opened
by make's jobserver.

Ready for master?
Thanks,
Martin

PR testsuite/102742

libbacktrace/ChangeLog:

* btest.c (check_open_files): Ignore file descriptors provided
by jobserver.
---
 libbacktrace/btest.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c
index 9f9c03babf3..a1ea93728be 100644
--- a/libbacktrace/btest.c
+++ b/libbacktrace/btest.c
@@ -464,9 +464,25 @@ static void
 check_open_files (void)
 {
   int i;
+  int rfd = -1;
+  int wfd = -1;
+
+#define JS_NEEDLE "--jobserver-auth="
+
+  /* Ignore file descriptors provided by jobserver.  */
+  const char *makeflags = getenv ("MAKEFLAGS");
+  if (makeflags != NULL)
+{
+  const char *n = strstr (makeflags, JS_NEEDLE);
+  if (n != NULL)
+   sscanf (n + strlen (JS_NEEDLE), "%d,%d", &rfd, &wfd);
+}
 
   for (i = 3; i < 10; i++)
 {
+  if (i == rfd || i == wfd)
+   continue;
+
   if (close (i) == 0)
{
  fprintf (stderr,
--
2.33.1



[PATCH] testsuite/102861 - adjust gcc.dg/vect/bb-slp-16.c change

2021-10-21 Thread Richard Biener via Gcc-patches
This reverts the bogus previous change causing runtime failures
and instead realizes that we now have the loop condition
if-converted and the BB vectorization opportunity realized during
the loop vectorization pass.

Tested on x86_64-unknown-linux-gnu, pushed.

2021-10-21  Richard Biener  

PR testsuite/102861
* gcc.dg/vect/bb-slp-16.c: Revert previous change, scan
the vect dump instead.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 73 ---
 1 file changed, 32 insertions(+), 41 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 4fc176dde84..82fae06e324 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,4 +1,6 @@
 /* { dg-require-effective-target vect_int } */
+/* The SLP vectorization happens as part of the if-converted loop body.  */
+/* { dg-additional-options "-fdump-tree-vect-details" } */
 
 #include 
 #include "tree-vect.h"
@@ -16,52 +18,41 @@ main1 (int dummy)
   unsigned int *pin = &in[0];
   unsigned int *pout = &out[0];
   unsigned int a = 0;
-
-  i = N;
-  if (i > 0)
+  
+  for (i = 0; i < N; i++)
 {
-  do
-   {
- *pout++ = *pin++ + a;
- *pout++ = *pin++ + a;
- *pout++ = *pin++ + a;
- *pout++ = *pin++ + a;
- *pout++ = *pin++ + a;
- *pout++ = *pin++ + a;
- *pout++ = *pin++ + a;
- *pout++ = *pin++ + a;
- if (arr[i] = i)
-   a = i;
- else
-   a = 2;
-   }
-  while (i < N);
+  *pout++ = *pin++ + a;
+  *pout++ = *pin++ + a;
+  *pout++ = *pin++ + a;
+  *pout++ = *pin++ + a;
+  *pout++ = *pin++ + a;
+  *pout++ = *pin++ + a;
+  *pout++ = *pin++ + a;
+  *pout++ = *pin++ + a;
+  if (arr[i] = i)
+a = i;
+  else
+a = 2;
 }
 
   a = 0;
-  /* check results: */
-  i = N;
-  if (i > 0)
+  /* check results: */ 
+  for (i = 0; i < N; i++)
 {
-  do
-   {
- if (out[i*8] !=  in[i*8] + a
- || out[i*8 + 1] != in[i*8 + 1] + a
- || out[i*8 + 2] != in[i*8 + 2] + a
- || out[i*8 + 3] != in[i*8 + 3] + a
- || out[i*8 + 4] != in[i*8 + 4] + a
- || out[i*8 + 5] != in[i*8 + 5] + a
- || out[i*8 + 6] != in[i*8 + 6] + a
- || out[i*8 + 7] != in[i*8 + 7] + a)
-   abort ();
+  if (out[i*8] !=  in[i*8] + a
+ || out[i*8 + 1] != in[i*8 + 1] + a
+ || out[i*8 + 2] != in[i*8 + 2] + a
+ || out[i*8 + 3] != in[i*8 + 3] + a
+ || out[i*8 + 4] != in[i*8 + 4] + a
+ || out[i*8 + 5] != in[i*8 + 5] + a
+ || out[i*8 + 6] != in[i*8 + 6] + a
+ || out[i*8 + 7] != in[i*8 + 7] + a)
+   abort ();
 
- if (arr[i] = i)
-   a = i;
- else
-   a = 2;
- i++;
-   }
-  while (i < N);
+  if (arr[i] = i)
+a = i;
+  else
+a = 2;
 }
 
   return 0;
@@ -76,4 +67,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "vect" } } */
-- 
2.31.1


[COMMITTED] Avoid threading circular paths.

2021-10-21 Thread Aldy Hernandez via Gcc-patches
The backward threader keeps a hash of visited blocks to avoid crossing
the same block twice.  Interestingly, we haven't been checking it for
the final block out of the path.  This may be inherited from the old
code, as it was simple enough that it didn't matter.  With the
upcoming changes enabling the fully resolving threader, it gets
tripped often enough to cause wrong code to be generated.

Tested on x86-64 Linux.

Committed as obvious.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (back_threader::maybe_register_path):
Avoid threading circular paths.
---
 gcc/tree-ssa-threadbackward.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index d94e3b962db..38913b06717 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -140,6 +140,10 @@ back_threader::maybe_register_path ()
 
   if (taken_edge && taken_edge != UNREACHABLE_EDGE)
 {
+  // Avoid circular paths.
+  if (m_visited_bbs.contains (taken_edge->dest))
+   return UNREACHABLE_EDGE;
+
   bool irreducible = false;
   bool profitable
= m_profit.profitable_path_p (m_path, m_name, taken_edge, &irreducible);
-- 
2.31.1



[PATCH] Remove restriction of SLP vectorizing internal function calls

2021-10-21 Thread Richard Biener via Gcc-patches
We already checked for unsupported internal throwing calls;
general nothrow is not required.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-21  Richard Biener  

* tree-vect-slp.c (vect_build_slp_tree_1): Remove
superfluous gimple_call_nothrow_p check.
---
 gcc/tree-vect-slp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 709bcb63686..eff46405e87 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -975,7 +975,6 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
(gimple_call_internal_fn (call_stmt
   || gimple_call_tail_p (call_stmt)
   || gimple_call_noreturn_p (call_stmt)
-  || !gimple_call_nothrow_p (call_stmt)
   || gimple_call_chain (call_stmt))
{
  if (dump_enabled_p ())
-- 
2.31.1


[committed] openmp: For default(none) ignore variables created by ubsan_create_data [PR64888]

2021-10-21 Thread Jakub Jelinek via Gcc-patches
Hi!

We weren't ignoring the ubsan variables created by c-ubsan.c before 
gimplification
(others are added later).  One way to fix this would be to introduce further
UBSAN_ internal functions and lower it later (sanopt pass) like other ifns,
this patch instead recognizes those magic vars by name/name of type and 
DECL_ARTIFICIAL
and TYPE_ARTIFICIAL.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-21  Jakub Jelinek  

PR middle-end/64888
gcc/c-family/
* c-omp.c (c_omp_predefined_variable): Return true also for
ubsan_create_data created artificial variables.
gcc/testsuite/
* c-c++-common/ubsan/pr64888.c: New test.

--- gcc/c-family/c-omp.c.jj 2021-10-15 11:59:15.753689459 +0200
+++ gcc/c-family/c-omp.c	2021-10-20 12:25:44.557498097 +0200
@@ -2860,13 +2860,44 @@ c_omp_predefined_variable (tree decl)
 {
   if (VAR_P (decl)
   && DECL_ARTIFICIAL (decl)
-  && TREE_READONLY (decl)
   && TREE_STATIC (decl)
-  && DECL_NAME (decl)
-  && (DECL_NAME (decl) == ridpointers[RID_C99_FUNCTION_NAME]
- || DECL_NAME (decl) == ridpointers[RID_FUNCTION_NAME]
- || DECL_NAME (decl) == ridpointers[RID_PRETTY_FUNCTION_NAME]))
-return true;
+  && DECL_NAME (decl))
+{
+  if (TREE_READONLY (decl)
+ && (DECL_NAME (decl) == ridpointers[RID_C99_FUNCTION_NAME]
+ || DECL_NAME (decl) == ridpointers[RID_FUNCTION_NAME]
+ || DECL_NAME (decl) == ridpointers[RID_PRETTY_FUNCTION_NAME]))
+   return true;
+  /* For UBSan handle the same also ubsan_create_data created
+variables.  There is no magic flag for those, but user variables
+shouldn't be DECL_ARTIFICIAL or have TYPE_ARTIFICIAL type with
+such names.  */
+  if ((flag_sanitize & (SANITIZE_UNDEFINED
+   | SANITIZE_UNDEFINED_NONDEFAULT)) != 0
+ && DECL_IGNORED_P (decl)
+ && !TREE_READONLY (decl)
+ && TREE_CODE (DECL_NAME (decl)) == IDENTIFIER_NODE
+ && TREE_CODE (TREE_TYPE (decl)) == RECORD_TYPE
+ && TYPE_ARTIFICIAL (TREE_TYPE (decl))
+ && TYPE_NAME (TREE_TYPE (decl))
+ && TREE_CODE (TYPE_NAME (TREE_TYPE (decl))) == TYPE_DECL
+ && DECL_NAME (TYPE_NAME (TREE_TYPE (decl)))
+ && (TREE_CODE (DECL_NAME (TYPE_NAME (TREE_TYPE (decl))))
+ == IDENTIFIER_NODE))
+   {
+ tree id1 = DECL_NAME (decl);
+ tree id2 = DECL_NAME (TYPE_NAME (TREE_TYPE (decl)));
+ if (IDENTIFIER_LENGTH (id1) >= sizeof ("ubsan_data") - 1
+ && IDENTIFIER_LENGTH (id2) >= sizeof ("__ubsan__data")
+ && !memcmp (IDENTIFIER_POINTER (id2), "__ubsan_",
+ sizeof ("__ubsan_") - 1)
+ && !memcmp (IDENTIFIER_POINTER (id2) + IDENTIFIER_LENGTH (id2)
+ - sizeof ("_data") + 1, "_data",
+ sizeof ("_data") - 1)
+ && strstr (IDENTIFIER_POINTER (id1), "ubsan_data"))
+   return true;
+   }
+}
   return false;
 }
 
--- gcc/testsuite/c-c++-common/ubsan/pr64888.c.jj	2021-10-20 11:48:18.715240008 +0200
+++ gcc/testsuite/c-c++-common/ubsan/pr64888.c	2021-10-20 11:47:57.670539595 +0200
@@ -0,0 +1,27 @@
+/* PR middle-end/64888 */
+/* { dg-do compile { target fopenmp } } */
+/* { dg-options "-fopenmp -fsanitize=undefined" } */
+
+int a, b;
+
+void
+foo ()
+{
+  int c;
+#pragma omp parallel default (none) shared (a, b) private (c)
+  {
+c = a / b; /* { dg-bogus "not specified in enclosing" } */
+(void) c;
+  }
+#pragma omp task default (none) shared (a, b) private (c)
+  {
+c = a << b;/* { dg-bogus "not specified in enclosing" } */
+(void) c;
+  }
+#pragma omp teams default (none) shared (a, b)
+  {
+int d[a];  /* { dg-bogus "not specified in enclosing" } */
+d[0] = 0;
+(void) d[0];
+  }
+}

Jakub



[PATCH] c++: Implement DR2351 - void{} [PR102820]

2021-10-21 Thread Jakub Jelinek via Gcc-patches
Hi!

Here is an attempt to implement DR2351 - void{} - where void{} after
pack expansion is considered valid and the same thing as void().
For templates, I don't know if we have some better way to check whether a
CONSTRUCTOR might be empty after pack expansion.  Would that be the case only
if the constructor contains EXPR_PACK_EXPANSION elements and nothing else, or
something else too?  With the patch as is we wouldn't diagnose
template 
void
bar (T... t)
{
  void{1, t...};
}
at parsing time, only at instantiation time, even when it will always
expand to at least one CONSTRUCTOR elt.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2021-10-21  Jakub Jelinek  

PR c++/102820
* semantics.c (finish_compound_literal): Implement DR2351 - void{}.
If type is cv void and compound_literal has no elements, return
void_node.  If type is cv void and compound_literal is instantiation
dependent, handle it like other dependent compound literals.

* g++.dg/cpp0x/dr2351.C: New test.

--- gcc/cp/semantics.c.jj   2021-10-15 11:58:45.079131947 +0200
+++ gcc/cp/semantics.c  2021-10-20 17:00:38.586705600 +0200
@@ -3104,9 +3104,20 @@ finish_compound_literal (tree type, tree
 
   if (!TYPE_OBJ_P (type))
 {
-  if (complain & tf_error)
-   error ("compound literal of non-object type %qT", type);
-  return error_mark_node;
+  /* DR2351 */
+  if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
+   return void_node;
+  else if (VOID_TYPE_P (type)
+  && processing_template_decl
+  && instantiation_dependent_expression_p (compound_literal))
+   /* If there are packs in compound_literal, it could
+  be void{} after pack expansion.  */;
+  else
+   {
+ if (complain & tf_error)
+   error ("compound literal of non-object type %qT", type);
+ return error_mark_node;
+   }
 }
 
   if (template_placeholder_p (type))
--- gcc/testsuite/g++.dg/cpp0x/dr2351.C.jj	2021-10-20 17:06:02.399162937 +0200
+++ gcc/testsuite/g++.dg/cpp0x/dr2351.C	2021-10-20 17:05:54.294276511 +0200
@@ -0,0 +1,36 @@
+// DR2351
+// { dg-do compile { target c++11 } }
+
+void
+foo ()
+{
+  void{};
+  void();
+}
+
+template 
+void
+bar (T... t)
+{
+  void{t...};
+  void(t...);
+}
+
+void
+baz ()
+{
+  bar ();
+}
+
+template 
+void
+qux (T... t)
+{
+  void{t...};  // { dg-error "compound literal of non-object type" }
+}
+
+void
+corge ()
+{
+  qux (1, 2);
+}

Jakub



Re: [PATCH] Fix loop split incorrect count and probability

2021-10-21 Thread Xionghu Luo via Gcc-patches



On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/9/23 20:17, Richard Biener wrote:
>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
>>
>>>
>>>
>>> On 2021/8/11 17:16, Richard Biener wrote:
 On Wed, 11 Aug 2021, Xionghu Luo wrote:

>
>
> On 2021/8/10 22:47, Richard Biener wrote:
>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>
>>> Thanks,
>>>
>>> On 2021/8/6 19:46, Richard Biener wrote:
 On Tue, 3 Aug 2021, Xionghu Luo wrote:

> The loop split condition is moved between loop1 and loop2; the split
> bb's count and probability should also be duplicated instead of
> (100% vs INV).  Secondly, the original loop1 and loop2 counts need to
> be proportional to the original loop.
>
>
> diff base/loop-cond-split-1.c.151t.lsplit
> patched/loop-cond-split-1.c.151t.lsplit:
> ...
>   int prephitmp_16;
>   int prephitmp_25;
>
>[local count: 118111600]:
>   if (n_7(D) > 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 118111600]:
>   return;
>
>[local count: 105119324]:
>   pretmp_3 = ga;
>
> -   [local count: 955630225]:
> +   [local count: 315357973]:
>   # i_13 = PHI 
>   # prephitmp_12 = PHI 
>   if (prephitmp_12 != 0)
> goto ; [33.00%]
>   else
> goto ; [67.00%]
>
> -   [local count: 315357972]:
> +   [local count: 104068130]:
>   _2 = do_something ();
>   ga = _2;
>
> -   [local count: 955630225]:
> +   [local count: 315357973]:
>   # prephitmp_5 = PHI 
>   i_10 = inc (i_13);
>   if (n_7(D) > i_10)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119324]:
>   goto ; [100.00%]
>
> -   [local count: 850510901]:
> +   [local count: 280668596]:
>   if (prephitmp_12 != 0)
> -goto ; [100.00%]
> +goto ; [33.00%]
>   else
> -goto ; [INV]
> +goto ; [67.00%]
>
> -   [local count: 850510901]:
> +   [local count: 280668596]:
>   goto ; [100.00%]
>
> -   [count: 0]:
> +   [local count: 70429947]:
>   # i_23 = PHI 
>   # prephitmp_25 = PHI 
>
> -   [local count: 955630225]:
> +   [local count: 640272252]:
>   # i_15 = PHI 
>   # prephitmp_16 = PHI 
>   i_22 = inc (i_15);
>   if (n_7(D) > i_22)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
> -   [local count: 850510901]:
> +   [local count: 569842305]:
>   goto ; [100.00%]
>
> }
>
> gcc/ChangeLog:
>
>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>  (do_split_loop_on_cond): Likewise.
> ---
> gcc/tree-ssa-loop-split.c | 16 
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3a09bbc39e5..8e5a7ded0f7 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>  basic_block cond_bb;
>>>
>>> if (!initial_true)
>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>> +
>>> +   edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>> +  ? EDGE_SUCC (bbs[i], 0)
>>> +  : EDGE_SUCC (bbs[i], 1);
>>>
> 
>   class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -profile_probability::always
> (),
> -profile_probability::always
> (),
> -profile_probability::always
> (),
> -profile_probability::always
> (),
> +true_edge->probability,
> +
> true_edge->probability.invert (),
> +true_edge->probability,
> +
> true_edge->probability.invert (),
> true);

 there is no 'true_edge' variable at t

Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

2021-10-21 Thread Hongtao Liu via Gcc-patches
On Thu, Oct 21, 2021 at 1:53 PM Hongyu Wang  wrote:
>
> Yes, updated patch.
>
> gcc/ChangeLog:
> PR target/102812
> * config/i386/i386.c (ix86_get_ssemov): Adjust HFmode vector
> move to use the same logic as HImode.
>
> gcc/testsuite/ChangeLog:
> PR target/102812
> * gcc.target/i386/pr102812.c: New test.
> ---
>  gcc/config/i386/i386.c   | 15 ---
>  gcc/testsuite/gcc.target/i386/pr102812.c | 12 
>  2 files changed, 24 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102812.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 9cc903e826b..159684ce549 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5399,9 +5399,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>switch (scalar_mode)
>   {
>   case E_HFmode:
> -   opcode = (misaligned_p
> - ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> - : "vmovdqa64");
> +   if (evex_reg_p)
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> + ? "vmovdqu16"
> + : "vmovdqu64")
> +   : "vmovdqa64");
> +   else
> + opcode = (misaligned_p
> +   ? (TARGET_AVX512BW
> + ? "vmovdqu16"
> + : "%vmovdqu")
> +   : "%vmovdqa");
> break;
Assuming gmail has swallowed the tabs and there's no indent issue in the
original code.  If so, LGTM.
>   case E_SFmode:
> opcode = misaligned_p ? "%vmovups" : "%vmovaps";
> diff --git a/gcc/testsuite/gcc.target/i386/pr102812.c
> b/gcc/testsuite/gcc.target/i386/pr102812.c
> new file mode 100644
> index 000..bad4fa9394e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102812.c
> @@ -0,0 +1,12 @@
> +/* PR target/102812 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse4 -mno-avx" } */
> +/* { dg-final { scan-assembler-not "vmovdqa64\t" } } */
> +/* { dg-final { scan-assembler "movdqa\t" } } */
> +
> +typedef _Float16 v8hf __attribute__((__vector_size__ (16)));
> +
> +v8hf t (_Float16 a)
> +{
> +return (v8hf) {a, 0, 0, 0, 0, 0, 0, 0};
> +}
> --
> 2.18.1
>
> Hongtao Liu via Gcc-patches  于2021年10月21日周四 下午1:24写道:
> >
> > On Wed, Oct 20, 2021 at 1:31 PM Hongyu Wang via Gcc-patches
> >  wrote:
> > >
> > > Since the _Float16 type is enabled under the sse2 target, returning a
> > > V8HFmode vector without the AVX512F target would generate a wrong
> > > vmovdqa64 instruction.  Adjust ix86_get_ssemov to avoid this.
> > >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> > >
> > > OK for master?
> > >
> > > gcc/ChangeLog:
> > > PR target/102812
> > > * config/i386/i386.c (ix86_get_ssemov): Adjust HFmode vector
> > > move without AVX512F target.
> > >
> > > gcc/testsuite/ChangeLog:
> > > PR target/102812
> > > * gcc.target/i386/pr102812.c: New test.
> > > ---
> > >  gcc/config/i386/i386.c   |  9 ++---
> > >  gcc/testsuite/gcc.target/i386/pr102812.c | 12 
> > >  2 files changed, 18 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102812.c
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 9cc903e826b..1d79180da9a 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -5399,9 +5399,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
> > >switch (scalar_mode)
> > > {
> > > case E_HFmode:
> > > - opcode = (misaligned_p
> > > -   ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> > > -   : "vmovdqa64");
> > > + if (!TARGET_AVX512F)
> > > +   opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > + else
> > > +   opcode = (misaligned_p
> > > + ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> > > + : "vmovdqa64");
> > >   break;
> > Could we just use similar logic as HI?
> >
> > case E_HImode:
> >   if (evex_reg_p)
> > opcode = (need_unaligned_p
> >   ? (TARGET_AVX512BW
> >  ? "vmovdqu16"
> >  : "vmovdqu64")
> >   : "vmovdqa64");
> >   else
> > opcode = (need_unaligned_p
> >   ? (TARGET_AVX512BW
> >  ? "vmovdqu16"
> >  : "%vmovdqu")
> >   : "%vmovdqa");
> >   break;
> >
> > > case E_SFmode:
> > >   opcode = misaligned_p ? "%vmovups" : "%vmovaps";
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr102812.c 
> > > b/gcc/testsuite/gcc.target/i386/pr102812.c
> > > new file mode 100644
> > > index 000..bad4fa9394e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr102812.c
> > > @@ -0,0 +1,12 @@
> > > +/* PR target/102812 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -msse4 -mno-avx" } */
> > > +/* { dg-final { scan-assembler-not "vmovdqa64\t" } } */
> > > +/* { dg-final { scan-assemble

[committed] testsuite: Fix up gfortran.dg/gomp/strictly*.f90 testcases

2021-10-21 Thread Jakub Jelinek via Gcc-patches
Hi!

I meant it in the other testcases too:
While these testcases are dg-do compile only, I think it is better not to
give users bad examples and avoid unnecessary data races in testcases (unless
it is exactly what we want to test).  Perhaps one day we'll do some analysis
and warn about data races...

Tested on x86_64-linux, committed to trunk.

2021-10-21  Jakub Jelinek  

* gfortran.dg/gomp/strictly-structured-block-1.f90: Use call do_work
instead of x = x + 1 in places where the latter could be a data race.
* gfortran.dg/gomp/strictly-structured-block-2.f90: Likewise.
* gfortran.dg/gomp/strictly-structured-block-3.f90: Likewise.

--- gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90.jj	2021-10-21 10:23:27.544833287 +0200
+++ gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90	2021-10-21 11:03:41.146041455 +0200
@@ -9,23 +9,23 @@ program main
 
   !$omp parallel
   block
-x = x + 1
+call do_work
   end block
 
   !$omp parallel
   block
-x = x + 1
+call do_work
   end block
   !$omp end parallel
 
   !$omp teams
   block
-x = x + 1
+call do_work
   end block
 
   !$omp teams
   block
-x = x + 1
+call do_work
   end block
   !$omp end teams
 
@@ -42,12 +42,12 @@ program main
 
   !$omp scope
   block
-x = x + 1
+call do_work
   end block
 
   !$omp scope
   block
-x = x + 1
+call do_work
   end block
   !$omp end scope
 
@@ -75,12 +75,12 @@ program main
 
   !$omp task
   block
-x = x + 1
+call do_work
   end block
 
   !$omp task
   block
-x = x + 1
+call do_work
   end block
   !$omp end task
 
@@ -130,23 +130,23 @@ program main
 
   !$omp target parallel
   block
-x = x + 1
+call do_work
   end block
 
   !$omp target parallel
   block
-x = x + 1
+call do_work
   end block
   !$omp end target parallel
 
   !$omp target teams
   block
-x = x + 1
+call do_work
   end block
 
   !$omp target teams
   block
-x = x + 1
+call do_work
   end block
   !$omp end target teams
 
@@ -176,7 +176,7 @@ program main
   do i = 1, n
  !$omp ordered
  block
-   x = x + 1
+   call do_work
  end block
   end do
 
@@ -184,7 +184,7 @@ program main
   do i = 1, n
  !$omp ordered
  block
-   x = x + 1
+   call do_work
  end block
  !$omp end ordered
   end do
--- gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90.jj	2021-10-21 10:23:27.544833287 +0200
+++ gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90	2021-10-21 11:05:04.326876447 +0200
@@ -9,16 +9,16 @@ program main
 
   !$omp parallel
   block
-x = x + 1
+call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end parallel ! { dg-error "Unexpected !.OMP END PARALLEL statement" }
 
   !$omp teams
   block
-x = x + 1
+call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end teams ! { dg-error "Unexpected !.OMP END TEAMS statement" }
 
   !$omp masked
@@ -30,9 +30,9 @@ program main
 
   !$omp scope
   block
-x = x + 1
+call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end scope ! { dg-error "Unexpected !.OMP END SCOPE statement" }
 
   !$omp single
@@ -51,9 +51,9 @@ program main
 
   !$omp task
   block
-x = x + 1
+call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end task ! { dg-error "Unexpected !.OMP END TASK statement" }
 
   !$omp target data map(x)
@@ -86,16 +86,16 @@ program main
 
   !$omp target parallel
   block
-x = x + 1
+call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end target parallel ! { dg-error "Unexpected !.OMP END TARGET PARALLEL statement" }
 
   !$omp target teams
   block
-x = x + 1
+call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end target teams ! { dg-error "Unexpected !.OMP END TARGET TEAMS statement" }
 
   !$omp critical
@@ -116,9 +116,9 @@ program main
   do i = 1, n
  !$omp ordered
  block
-   x = x + 1
+   call do_work
  end block
- x = x + 1
+ call do_work
  !$omp end ordered ! { dg-error "Unexpected !.OMP END ORDERED statement" }
   end do
 
--- gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90.jj	2021-10-21 10:23:27.544833287 +0200
+++ gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90	2021-10-21 11:07:21.441956054 +0200
@@ -10,7 +10,7 @@ program main
   !$omp parallel
 !$omp parallel
 block
-  x = x + 1
+  call do_work
 end block
 !$omp end parallel
   !$omp end parallel
@@ -32,7 +32,7 @@ program main
 end block
 x = x + 2
 !$omp section
-x = x + 4
+call do_work
   end block
 
   !$omp sections


Jakub



[PATCH] tree-optimization/102847 - properly cost VMAT_INVARIANT loads

2021-10-21 Thread Richard Biener via Gcc-patches
The following adds proper costing of VMAT_INVARIANT loads, avoiding
to ask the target about the cost of an unsupported vector load cost
which we won't emit anyway.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-21  Richard Biener  

PR tree-optimization/102847
* tree-vect-stmts.c (vect_model_load_cost): Explicitly
handle VMAT_INVARIANT as a splat in the prologue.
---
 gcc/tree-vect-stmts.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9cbc1af4cc9..8f527452bd0 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1191,6 +1191,13 @@ vect_model_load_cost (vec_info *vinfo,
   ncopies * assumed_nunits,
   scalar_load, stmt_info, 0, vect_body);
 }
+  else if (memory_access_type == VMAT_INVARIANT)
+{
+  /* Invariant loads will ideally be hoisted and splat to a vector.  */
+  prologue_cost += record_stmt_cost (cost_vec, 1,
+scalar_to_vec, stmt_info, 0,
+vect_prologue);
+}
   else
 vect_get_load_cost (vinfo, stmt_info, ncopies,
alignment_support_scheme, misalignment, first_stmt_p,
-- 
2.31.1


Re: [PATCH] options: Fix variable tracking option processing.

2021-10-21 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 10:51 AM Martin Liška  wrote:
>
> On 10/19/21 12:53, Richard Biener wrote:
> > Meh ... :/
> >
> > Well, move the target override hook call down (try to shuffle things
> > so diagnostics happen after but
> > "inits" happen before).
>
> Not so easy. There are direct usages of the hooks
> (influences dwarf2out_as_loc_support and dwarf2out_as_locview_support)
>
>if (!OPTION_SET_P (dwarf2out_as_loc_support))
>  dwarf2out_as_loc_support = dwarf2out_default_as_loc_support ();
>if (!OPTION_SET_P (dwarf2out_as_locview_support))
>  dwarf2out_as_locview_support = dwarf2out_default_as_locview_support ();
>
>if (!OPTION_SET_P (debug_variable_location_views))
>  {
>debug_variable_location_views
> = (flag_var_tracking
>&& debug_info_level >= DINFO_LEVEL_NORMAL
>&& dwarf_debuginfo_p ()
>&& !dwarf_strict
>&& dwarf2out_as_loc_support
>&& dwarf2out_as_locview_support);
>  }
>
> and then the warnings depend on debug_variable_location_views.
>
> I have another attempt which is about moving option detection of 
> debug_nonbind_markers_p
> to finish_options. That works fine, except one needs to mark the option as 
> PerFunction.
> That's because it depends on 'optimize' and that would trigger:
> 'global_options are modified in local context' verification error.

That looks like a sensible change anyway - options that depend on options that
are per function have to be per function as well.  It also depends on
flag_selective_scheduling.

But note the checks now happen before

  if (flag_syntax_only)
{
  write_symbols = NO_DEBUG;
  profile_flag = 0;
}

  if (flag_gtoggle)
{
  if (debug_info_level == DINFO_LEVEL_NONE)
{
  debug_info_level = DINFO_LEVEL_NORMAL;

  if (write_symbols == NO_DEBUG)
write_symbols = PREFERRED_DEBUGGING_TYPE;
}
  else
debug_info_level = DINFO_LEVEL_NONE;
}

which previously affected debug_nonbind_markers_p.  I think it makes
sense to move
the above to finish_options as well.  I suppose -help doesn't correctly dump
the -g enabled state for -gtoggle at the moment?

Richard.

>
> What do you think about the patch?
> Cheers,
> Martin


[PATCH] tree-optimization/102847 - adjust VMAT_INVARIANT load costing

2021-10-21 Thread Richard Biener via Gcc-patches
This adds the missing scalar load cost in the prologue.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2021-10-21  Richard Biener  

PR tree-optimization/102847
* tree-vect-stmts.c (vect_model_load_cost): Add the scalar
load cost in the prologue for VMAT_INVARIANT.
---
 gcc/tree-vect-stmts.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 8f527452bd0..c28c9370655 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1194,6 +1194,9 @@ vect_model_load_cost (vec_info *vinfo,
   else if (memory_access_type == VMAT_INVARIANT)
 {
   /* Invariant loads will ideally be hoisted and splat to a vector.  */
+  prologue_cost += record_stmt_cost (cost_vec, 1,
+scalar_load, stmt_info, 0,
+vect_prologue);
   prologue_cost += record_stmt_cost (cost_vec, 1,
 scalar_to_vec, stmt_info, 0,
 vect_prologue);
-- 
2.31.1


[PATCH] Move the initial debug_hooks setting

2021-10-21 Thread Richard Biener via Gcc-patches
I just realized that when I moved the langhook call I failed to
move the initial debug_hooks setting whose comment mentions the
langhook as reason.

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

2021-10-21  Richard Biener  

* toplev.c (process_options): Move the initial debug_hooks
setting ...
(toplev::main): ... before the call of the post_options
langhook.
---
 gcc/toplev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index cb4f8c470f0..67fb71612d5 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1227,9 +1227,6 @@ static void
 process_options (bool no_backend)
 {
   const char *language_string = lang_hooks.name;
-  /* Just in case lang_hooks.post_options ends up calling a debug_hook.
- This can happen with incorrect pre-processed input. */
-  debug_hooks = &do_nothing_debug_hooks;
 
   maximum_field_alignment = initial_max_fld_align * BITS_PER_UNIT;
 
@@ -2338,6 +2335,9 @@ toplev::main (int argc, char **argv)
   /* Exit early if we can (e.g. -help).  */
   if (!exit_after_options)
 {
+  /* Just in case lang_hooks.post_options ends up calling a debug_hook.
+This can happen with incorrect pre-processed input. */
+  debug_hooks = &do_nothing_debug_hooks;
   /* Allow the front end to perform consistency checks and do further
 initialization based on the command line options.  This hook also
 sets the original filename if appropriate (e.g. foo.i -> foo.c)
-- 
2.31.1


Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-21 Thread Aldy Hernandez via Gcc-patches
On Wed, Oct 20, 2021 at 10:19 PM Jeff Law  wrote:
> So we want to keep some form of ssa-dom-thread-7.  That' s the canonical
> testcase for the case for the FSM optimization.
>
> What we need to verify is that we thread jumps across the backedge of
> the loop through the switch statement to a particular case (thus
> bypassing the indirect jump for the switch statement).  How to do that
> in a way that's easier to manage?  I have no clue.  I guess a gimple-fe
> based test might help.

Ah, I see.

After lots of pain, I was able to distill a tiny testcase that does
just that (tree-ssa/ssa-thread-backedge.c), and is easy to maintain.
I've added a "backedge" marker to the path dumping code for easy
matching in the test.  An alternative would be to convert it to a
gimple FE test examining the exact thread sequence through the
backedge, but I think that's overkill since the test is so small.

Phew, I think we're finally converging on a useful set of threading tests :).

OK for trunk?
Aldy
From d6a3114b160079a60d968cd21950a33116361d75 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Wed, 20 Oct 2021 07:29:25 +0200
Subject: [PATCH] Try to resolve paths in threader without looking further
 back.

Sometimes we can solve a candidate path without having to recurse
further back.  This can mostly happen in fully resolving mode, because
we can ask the ranger what the range on entry to the path is, but
there's no reason this can't always apply.  This one-liner removes
the fully-resolving restriction.

I'm tickled pink to see how many things we now get quite early
in the compilation.  I actually had to disable jump threading entirely
for a few tests because the early threader was catching things
disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
cleanups happening.

I was going to commit this as obvious, but I think the test changes
merit discussion.

We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
some time.  Every time a threading pass gets smarter, we push the
check further down the pipeline.  We've officially run out of dumb
threading passes to disable ;-).  In the last year we've gone up from a
handful of threads, to 34 threads with the current combination of
options.  I doubt this is testing anything useful anymore, so I've
removed it.

Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
jump threads, but they were disallowed because of loop rotation.  Then
we started catching more jump threads in VRP2 threading so we tested
there.  With this patch though, we triple the number of threads found
from 11 to 31.  I believe this test has outlived its usefulness, and
I've removed it.  Note that even though we have these outrageous
possibilities for this test, the block copier ultimately chops them
down (23 survive though).

The ssa-dom-thread-7.c is the canonical test for the FSM optimization,
but it has become a maintenance burden.  The number of threads has
been consistently growing over the years, and there's no clean way to
test that we're actually threading through the backedges.  I have
replaced this test with a much simpler one (ssa-thread-backedge.c)
that tests that we can thread across the backedge, through the switch
statement and into a particular case.

All in all, I believe the simpler jump threading tests, as well as the
gimple FE tests I've added, more than adequately cover us.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
	Always try to resolve path without looking back.
	* tree-ssa-threadupdate.c (dump_jump_thread): Indicate whether
	edge is a back edge.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-dsyr2k-2.c: Adjust for jump threading changes.
	* gcc.dg/graphite/scop-dsyr2k.c: Same.
	* gcc.dg/graphite/scop-dsyrk-2.c: Same.
	* gcc.dg/graphite/scop-dsyrk.c: Same.
	* gcc.dg/tree-ssa/pr20701.c: Same.
	* gcc.dg/tree-ssa/pr20702.c: Same.
	* gcc.dg/tree-ssa/pr21086.c: Same.
	* gcc.dg/tree-ssa/pr25382.c: Same.
	* gcc.dg/tree-ssa/pr58480.c: Same.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
	* gcc.dg/tree-ssa/vrp08.c: Same.
	* gcc.dg/tree-ssa/vrp55.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Removed.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Removed.
	* gcc.dg/tree-ssa/ssa-thread-11.c: Removed.
	* gcc.dg/uninit-pr89230-1.c: xfail.
---
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c   |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c  |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c|   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr20701.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr20702.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21086.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr25382.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr58480.c   |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-4.c|  60 
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c| 134 --
 gcc/testsuite/gcc.dg/t

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Richard Biener via Gcc-patches
On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:
>
>
>
> On 10/18/2021 2:17 AM, Aldy Hernandez wrote:
> >
> >
> > On 10/18/21 12:52 AM, Jeff Law wrote:
> >>
> >>
> >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
> >>> The following patch converts the strlen pass from evrp to ranger,
> >>> leaving DOM as the last remaining user.
> >> So is there any reason why we can't convert DOM as well?   DOM's use
> >> of EVRP is pretty limited.  You've mentioned FP bits before, but my
> >> recollection is those are not part of the EVRP analysis DOM uses.
> >> Hell, give me a little guidance and I'll do the work...
> >
> > Not only will I take you up on that offer, but I can provide 90% of
> > the work.  Here be dragons, though (well, for me, maybe not for you ;-)).
> >
> > DOM is actually an evrp pass at -O1 in disguise.  The reason it really
> > is a covert evrp pass is because:
> >
> > a) It calls extract_range_from_stmt on each statement.
> >
> > b) It folds conditionals with simplify_using_ranges.
> >
> > c) But most importantly, it exports discovered ranges when it's done
> > (evrp_range_analyzer(true)).
> >
> > If you look at the evrp pass, you'll notice that that's basically what
> > it does, albeit with the substitute and fold engine, which also calls
> > gimple fold plus other goodies.
> >
> > But I could argue that we've made DOM into an evrp pass without
> > noticing.  The last item (c) is particularly invasive because these
> > exported ranges show up in other passes unexpectedly.  For instance, I
> > saw an RTL pass at -O1 miss an optimization because it was dependent
> > on some global range being set.  IMO, DOM should not export global
> > ranges it discovered during its walk (do one thing and do it well),
> > but I leave it to you experts to pontificate.
> All true.  But I don't think we've got many, if any, hard dependencies
> on those behaviors.
>
> >
> > The attached patch is rather trivial.  It's mostly deleting state.  It
> > seems DOM spends a lot of time massaging the IL so that it can fold
> > conditionals or thread paths.  None of this is needed, because the
> > ranger can do all of this.  Well, except floats, but...
> Massaging the IL should only take two forms IIRC.
>
> First, if we have a simplification we can do.  That could be const/copy
> propagation, replacing an expression with an SSA_NAME or constant and
> the like.  It doesn't massage the IL just to massage the IL.
>
> Second, we do temporarily copy propagate the current known values of an
> SSA name into use points and then see if that allows us to determine if
> a statement is already in the hash tables.  But we undo that so that
> nobody should see that temporary change in state.

Are you sure we still do that?  I can't find it at least.

>
> Finally, it does create some expressions & statements on the fly to
> enter them into the tables.  For example, if it sees a store, it'll
> create a load with the source & dest interchanged and enter that into
> the expression table.  But none of this stuff ever shows up in the IL.
> It's just to create entries in the expression tables.
>
> So ITSM the only real concern would be if those temporary const/copy
> propagations were still in the IL and we called back into Ranger and it
> poked at that data somehow.
> >
> > That's the good news.  The bad news is that DOM changes the IL as it
> > goes and the patch doesn't bootstrap.  Andrew insists that we should
> > work even with DOM's changing IL, but last time we played this dance
> > with the substitute_and_fold engine, there were some tweaks needed to
> > the ranger.  Could be this, but I haven't investigated.  It could also
> > be that the failures I was seeing were just DOM things that were no
> > longer needed (shuffling the IL to simplify things for evrp).
> So if we're referring to those temporary const/copy propagations
> "escaping" into Ranger, then I would fully expect that to cause
> problems.  Essentially they're path sensitive const/copy propagations
> and may not be valid on all the paths through the CFG to the statement
> where the propagation occurs.
>
>
>
> >
> > This just needs a little shepherding from a DOM expert ;-).  If you
> > get it to bootstrap, I could take care of the tests, performance, and
> > making sure we're getting the same number of threads etc.
> >
> >>
> >>>
> >>> No additional cleanups have been done.  For example, the strlen pass
> >>> still has uses of VR_ANTI_RANGE, and the sprintf still passes around
> >>> pairs of integers instead of using a proper range.  Fixing this
> >>> could further improve these passes.
> >>>
> >>> As a further enhancement, if the relevant maintainers deem useful,
> >>> the domwalk could be removed from strlen.  That is, unless the pass
> >>> needs it for something else.
> >> The dom walk was strictly for the benefit of EVRP when it was added.
> >> So I think it can get zapped once the pass is converted.
> >
> > Jakub mentioned a while ago, that the strlen pass itse

[RFC PATCH 0/1] implement TLS register based stack canary for ARM

2021-10-21 Thread Ard Biesheuvel via Gcc-patches
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352

In the Linux kernel, user processes calling into the kernel are
essentially threads running in the same address space of a program that
never terminates. This means that using a global variable for the stack
protector canary value is problematic on SMP systems, as we can never
change it unless we reboot the system. (Processes that sleep for any
reason will do so on a call into the kernel, which means that there will
always be live kernel stack frames carrying copies of the canary taken
when the function was entered.)

AArch64 implements -mstack-protector-guard=sysreg for this purpose, as
this permits the kernel to use different memory addresses for the stack
canary for each CPU, and context switch the chosen system register with
the rest of the process, allowing each process to use its own unique
value for the stack canary.

This patch implements something similar, but for the 32-bit ARM kernel,
which will start using the user space TLS register TPIDRURO to index
per-process metadata while running in the kernel. This means we can just
add an offset to TPIDRURO to obtain the address from which to load the
canary value.
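
The addressing scheme is easy to model in plain C++: treat the TLS register value as a base pointer and load the canary from a fixed offset past it. Below is a minimal sketch; the `task_info` layout is made up for illustration (the real kernel layout differs), and an ordinary pointer stands in for TPIDRURO:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-task block; the real kernel layout differs.
struct task_info {
  uint32_t flags;         // some other per-task state
  uint32_t stack_canary;  // canary lives at a fixed offset from the base
};

// Models "load from TPIDRURO + offset": tls_base stands in for the
// value the kernel keeps in the user-space TLS register.
static uint32_t load_canary(const void *tls_base, size_t offset) {
  const char *p = static_cast<const char *>(tls_base) + offset;
  uint32_t canary;
  __builtin_memcpy(&canary, p, sizeof canary);  // plain 32-bit load
  return canary;
}
```

With `-mstack-protector-guard=tls -mstack-protector-guard-offset=N`, the compiler emits the equivalent of this load in each protected prologue and epilogue.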

The patch is a bit rough around the edges, but produces the correct
results as far as I can tell. However, I couldn't quite figure out how
to modify the patterns so that the offset will be moved into the
immediate offset field of the LDR instructions, so currently, the ADD of
the offset is always a distinct instruction.

As for the spilling issues that have been fixed in this code in the
past: I suppose a register carrying the TLS register value will never
get spilled to begin with? How about a register that carries TLS+?

Comments/suggestions welcome.

Cc: thomas.preudho...@celest.fr
Cc: adhemerval.zane...@linaro.org
Cc: Qing Zhao 
Cc: Richard Sandiford 
Cc: gcc-patches@gcc.gnu.org

Ard Biesheuvel (1):
  [ARM] Add support for TLS register based stack protector canary access

 gcc/config/arm/arm-opts.h |  6 +++
 gcc/config/arm/arm.c  | 39 +
 gcc/config/arm/arm.md | 44 ++--
 gcc/config/arm/arm.opt| 22 ++
 gcc/doc/invoke.texi   |  9 
 5 files changed, 116 insertions(+), 4 deletions(-)

-- 
2.30.2

$ cat |arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls 
-mstack-protector-guard-offset=10 -mtp=cp15 -S -o - -xc - -fstack-protector-all 
-O3
int foo(void *);
int bar(void)
{

return foo(__builtin_thread_pointer()) + 1;
}
.arch armv7-a
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file   ""
.text
.align  2
.global bar
.syntax unified
.arm
.type   bar, %function
bar:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
push {r4, lr}
mrc p15, 0, r4, c13, c0, 3  @ load_tp_hard
add r3, r4, #10
sub sp, sp, #8
mov r0, r4
add r4, r4, #10
ldr r3, [r3]
str r3, [sp, #4]
mov r3, #0
bl  foo
ldr r3, [r4]
ldr r4, [sp, #4]
eors r3, r4, r3
mov r4, #0
bne .L5
add r0, r0, #1
add sp, sp, #8
@ sp needed
pop {r4, pc}
.L5:
bl  __stack_chk_fail
.size   bar, .-bar
.ident  "GCC: (GNU) 12.0.0 20211019 (experimental)"
.section .note.GNU-stack,"",%progbits



[RFC PATCH 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-10-21 Thread Ard Biesheuvel via Gcc-patches
Add support for accessing the stack canary value via the TLS register,
so that multiple threads running in the same address space can use
distinct canary values. This is intended for the Linux kernel running in
SMP mode, where processes entering the kernel are essentially threads
running the same program concurrently: using a global variable for the
canary in that context is problematic because it can never be rotated,
and so the OS is forced to use the same value as long as it remains up.

Using the TLS register to index the stack canary helps with this, as it
allows each CPU to context switch the TLS register along with the rest
of the process, permitting each process to use its own value for the
stack canary.

2021-10-20 Ard Biesheuvel 

* config/arm/arm-opts.h (enum stack_protector_guard): New
* config/arm/arm.c (arm_option_override_internal): Handle
and put in error checks for stack protector guard options.
(arm_stack_protect_guard): New.
(TARGET_STACK_PROTECT_GUARD): Define.
* config/arm/arm.md (reg_stack_protect_address): New.
(stack_protect_set): Adjust for SSP_GLOBAL.
(stack_protect_test): Likewise.
* config/arm/arm.opt (-mstack-protector-guard): New
(-mstack-protector-guard-offset): New.

Signed-off-by: Ard Biesheuvel 
---
 gcc/config/arm/arm-opts.h |  6 +++
 gcc/config/arm/arm.c  | 39 +
 gcc/config/arm/arm.md | 44 ++--
 gcc/config/arm/arm.opt| 22 ++
 gcc/doc/invoke.texi   |  9 
 5 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
index 5c4b62f404f7..581ba3c4fbbb 100644
--- a/gcc/config/arm/arm-opts.h
+++ b/gcc/config/arm/arm-opts.h
@@ -69,4 +69,10 @@ enum arm_tls_type {
   TLS_GNU,
   TLS_GNU2
 };
+
+/* Where to get the canary for the stack protector.  */
+enum stack_protector_guard {
+  SSP_TLSREG,  /* per-thread canary in TLS register */
+  SSP_GLOBAL   /* global canary */
+};
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e51f60a1841d..deccc88e006e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
+#undef TARGET_STACK_PROTECT_GUARD
+#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard
 
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -3157,6 +3160,26 @@ arm_option_override_internal (struct gcc_options *opts,
   if (TARGET_THUMB2_P (opts->x_target_flags))
 opts->x_inline_asm_unified = true;
 
+  if (arm_stack_protector_guard == SSP_GLOBAL
+  && opts->x_arm_stack_protector_guard_offset_str)
+{
+  error ("incompatible options %'-mstack-protector-guard=global%' and"
+"%'-mstack-protector-guard-offset=%qs%'",
+arm_stack_protector_guard_offset_str);
+}
+
+  if (opts->x_arm_stack_protector_guard_offset_str)
+{
+  char *end;
+  const char *str = arm_stack_protector_guard_offset_str;
+  errno = 0;
+  long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0);
+  if (!*str || *end || errno)
+   error ("%qs is not a valid offset in %qs", str,
+  "-mstack-protector-guard-offset=");
+  arm_stack_protector_guard_offset = offs;
+}
+
 #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
   SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
 #endif
@@ -3824,6 +3847,10 @@ arm_option_reconfigure_globals (void)
   else
target_thread_pointer = TP_SOFT;
 }
+
+  if (arm_stack_protector_guard == SSP_TLSREG
+  && target_thread_pointer != TP_CP15)
+error("%'-mstack-protector-guard=tls%' needs a hardware TLS register");
 }
 
 /* Perform some validation between the desired architecture and the rest of the
@@ -34052,6 +34079,18 @@ arm_run_selftests (void)
 }
 } /* Namespace selftest.  */
 
+/* Implement TARGET_STACK_PROTECT_GUARD. In case of a
+   global variable based guard use the default else
+   return a null tree.  */
+static tree
+arm_stack_protect_guard (void)
+{
+  if (arm_stack_protector_guard == SSP_GLOBAL)
+return default_stack_protect_guard ();
+
+  return NULL_TREE;
+}
+
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
 #endif /* CHECKING_P */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4adc976b8b67..f57e1db07e6a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9183,8 +9183,21 @@
  UNSPEC_SP_SET))
   (clobber (match_scratch:SI 2 ""))
   (clobber (match_scratch:SI 3 ""))])]
+  "arm_stack_protector_guard == SSP_GLOBAL"
   ""
-  ""
+)
+
+(define_expand "stack_protect_set"
+  [(match_operand:SI 0 "memory_operand")
+   (match_operand:SI 1 "")]
+  "arm_stack_protector_guard == SSP_TLSREG"
+  "
+{
+  rtx tp_reg = gen_reg

Re: [PATCH] Fix loop split incorrect count and probability

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, 21 Oct 2021, Xionghu Luo wrote:

> 
> 
> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> > 
> > 
> > On 2021/9/23 20:17, Richard Biener wrote:
> >> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> >>
> >>>
> >>>
> >>> On 2021/8/11 17:16, Richard Biener wrote:
>  On Wed, 11 Aug 2021, Xionghu Luo wrote:
> 
> >
> >
> > On 2021/8/10 22:47, Richard Biener wrote:
> >> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> >>
> >>> Thanks,
> >>>
> >>> On 2021/8/6 19:46, Richard Biener wrote:
>  On Tue, 3 Aug 2021, Xionghu Luo wrote:
> 
> > The loop split condition is moved between loop1 and loop2; the split
> > bb's count and probability should also be duplicated instead of
> > (100% vs INV).  Secondly, the original loop1 and loop2 counts need to
> > be proportional to the original loop's.
> >
> >
> > diff base/loop-cond-split-1.c.151t.lsplit
> > patched/loop-cond-split-1.c.151t.lsplit:
> > ...
> >   int prephitmp_16;
> >   int prephitmp_25;
> >
> >[local count: 118111600]:
> >   if (n_7(D) > 0)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 118111600]:
> >   return;
> >
> >[local count: 105119324]:
> >   pretmp_3 = ga;
> >
> > -   [local count: 955630225]:
> > +   [local count: 315357973]:
> >   # i_13 = PHI 
> >   # prephitmp_12 = PHI 
> >   if (prephitmp_12 != 0)
> > goto ; [33.00%]
> >   else
> > goto ; [67.00%]
> >
> > -   [local count: 315357972]:
> > +   [local count: 104068130]:
> >   _2 = do_something ();
> >   ga = _2;
> >
> > -   [local count: 955630225]:
> > +   [local count: 315357973]:
> >   # prephitmp_5 = PHI 
> >   i_10 = inc (i_13);
> >   if (n_7(D) > i_10)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 105119324]:
> >   goto ; [100.00%]
> >
> > -   [local count: 850510901]:
> > +   [local count: 280668596]:
> >   if (prephitmp_12 != 0)
> > -goto ; [100.00%]
> > +goto ; [33.00%]
> >   else
> > -goto ; [INV]
> > +goto ; [67.00%]
> >
> > -   [local count: 850510901]:
> > +   [local count: 280668596]:
> >   goto ; [100.00%]
> >
> > -   [count: 0]:
> > +   [local count: 70429947]:
> >   # i_23 = PHI 
> >   # prephitmp_25 = PHI 
> >
> > -   [local count: 955630225]:
> > +   [local count: 640272252]:
> >   # i_15 = PHI 
> >   # prephitmp_16 = PHI 
> >   i_22 = inc (i_15);
> >   if (n_7(D) > i_22)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> > -   [local count: 850510901]:
> > +   [local count: 569842305]:
> >   goto ; [100.00%]
> >
> > }
> >
> > gcc/ChangeLog:
> >
> >  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >  (do_split_loop_on_cond): Likewise.
> > ---
> > gcc/tree-ssa-loop-split.c | 16 
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index 3a09bbc39e5..8e5a7ded0f7 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >  basic_block cond_bb;
> >>>
> >>>   if (!initial_true)
> >>> -   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>> +   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>> +
> >>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >>> +? EDGE_SUCC (bbs[i], 0)
> >>> +: EDGE_SUCC (bbs[i], 1);
> >>>
> > 
> > class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> > -  profile_probability::always
> > (),
> > -  profile_probability::always
> > (),
> > -  profile_probability::always
> > (),
> > -  profile_probability::always
> > (),
> 

[PATCH] sra: Fix corner case of total scalarization with virtual inheritance (PR 102505)

2021-10-21 Thread Martin Jambor
Hi,

PR 102505 is a situation where SRA takes its initial top-level
access size from a get_ref_base_and_extent called on a COMPONENT_REF,
and thus derived from the FIELD_DECL, which however does not include a
virtual base.  Total scalarization then goes on traversing the type,
which however has a virtual base past the non-virtual bits, tricking SRA
into creating sub-accesses outside of the supposedly encompassing
accesses, which in turn triggers the verifier within the pass.

The patch below fixes that by failing total scalarization when this
situation is detected.
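
The shape that trips the verifier is easy to reproduce: under the Itanium C++ ABI, a virtual base is laid out after the non-virtual part of the most-derived object, so an access sized from B's FIELD_DECL need not cover the D subobject. The sketch below is based on the patch's testcase; the exact layout is ABI-dependent, so only the member values are checked:

```cpp
#include <cassert>

struct D { int i; alignas(16) int pad; };
struct B : virtual D {
  int j = 84;
  int k = 84;
};
struct C : B { };

// Returns j + k of a fresh C.  With the fix, SRA must not create
// sub-accesses for D's fields outside the access derived from B.
static int sum_jk() {
  C c;
  return c.j + c.k;
}
```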

Bootstrapped and tested on x86_64-linux, OK for trunk and (after testing
there) on the branches?

Thanks,

Martin


gcc/ChangeLog:

2021-10-20  Martin Jambor  

PR tree-optimization/102505
* tree-sra.c (totally_scalarize_subtree): Check that the
encountered field fits within the access we would like to put it
in.

gcc/testsuite/ChangeLog:

2021-10-20  Martin Jambor  

PR tree-optimization/102505
* g++.dg/torture/pr102505.C: New test.
---
 gcc/testsuite/g++.dg/torture/pr102505.C | 15 +++
 gcc/tree-sra.c  |  2 ++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr102505.C

diff --git a/gcc/testsuite/g++.dg/torture/pr102505.C 
b/gcc/testsuite/g++.dg/torture/pr102505.C
new file mode 100644
index 000..a846751a0d6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr102505.C
@@ -0,0 +1,15 @@
+struct D  { int i;  int pad alignas(16); };
+struct B : virtual D
+{
+  int j =84;
+  int k =84;
+};
+
+struct C: B { };
+
+int main()
+{
+  C c;
+  if (c.j != 84 || c.k != 84)
+__builtin_abort();
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 9b786e29e4e..f561e1a2133 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3288,6 +3288,8 @@ totally_scalarize_subtree (struct access *root)
  continue;
 
HOST_WIDE_INT pos = root->offset + int_bit_position (fld);
+   if (pos + fsize > root->size)
+ return false;
enum total_sra_field_state
  state = total_should_skip_creating_access (root,
 &last_seen_sibling,
-- 
2.33.0



Re: [PATCH v4] Improve integer bit test on __atomic_fetch_[or|and]_* returns

2021-10-21 Thread Hongtao Liu via Gcc-patches

On Wed, Oct 13, 2021 at 8:34 PM Richard Biener via Gcc-patches
 wrote:
>
> On Sun, Oct 10, 2021 at 3:49 PM H.J. Lu  wrote:
> >
> > Changes in v4:
> >
> > 1. Bypass redundant check when inputs have been transformed to the
> > equivalent canonical form with valid bit operation.
> >
> > Changes in v3:
> >
> > 1.  Check invalid bit operation.
> >
> > commit adedd5c173388ae505470df152b9cb3947339566
> > Author: Jakub Jelinek 
> > Date:   Tue May 3 13:37:25 2016 +0200
> >
> > re PR target/49244 (__sync or __atomic builtins will not emit 'lock 
> > bts/btr/btc')
> >
> > optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns
> > with lock bts/btr/btc by turning
> >
> >   mask_2 = 1 << cnt_1;
> >   _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
> >   _5 = _4 & mask_2;
> >
> > into
> >
> >   _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
> >   _5 = _4;
> >
> > and
> >
> >   mask_6 = 1 << bit_5(D);
> >   _1 = ~mask_6;
> >   _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
> >   _3 = _2 & mask_6;
> >   _4 = _3 != 0;
> >
> > into
> >
> >   mask_6 = 1 << bit_5(D);
> >   _1 = ~mask_6;
> >   _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
> >   _4 = _11 != 0;
> >
> > But it failed to optimize many equivalent, but slightly different cases:
> >
> > 1.
> >   _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
> >   _4 = (_Bool) _1;
> > 2.
> >   _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
> >   _4 = (_Bool) _1;
> > 3.
> >   _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
> >   _7 = ~_1;
> >   _5 = (_Bool) _7;
> > 4.
> >   _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
> >   _7 = ~_1;
> >   _5 = (_Bool) _7;
> > 5.
> >   _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
> >   _2 = (int) _1;
> >   _7 = ~_2;
> >   _5 = (_Bool) _7;
> > 6.
> >   _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
> >   _2 = (int) _1;
> >   _7 = ~_2;
> >   _5 = (_Bool) _7;
> > 7.
> >   _1 = __atomic_fetch_or_4 (ptr_6, mask, _3);
> >   _2 = (int) _1;
> >   _5 = _2 & mask;
> > 8.
> >   _1 = __atomic_fetch_or_4 (ptr_6, 0x8000, _3);
> >   _5 = (signed int) _1;
> >   _4 = _5 < 0;
> > 9.
> >   _1 = __atomic_fetch_and_4 (ptr_6, 0x7fff, _3);
> >   _5 = (signed int) _1;
> >   _4 = _5 < 0;
> > 10.
> >   _1 = 1 << bit_4(D);
> >   mask_5 = (unsigned int) _1;
> >   _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
> >   _3 = _2 & mask_5;
> > 11.
> >   mask_7 = 1 << bit_6(D);
> >   _1 = ~mask_7;
> >   _2 = (unsigned int) _1;
> >   _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
> >   _4 = (int) _3;
> >   _5 = _4 & mask_7;
> >
> > We make
> >
> >   mask_2 = 1 << cnt_1;
> >   _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
> >   _5 = _4 & mask_2;
> >
> > and
> >
> >   mask_6 = 1 << bit_5(D);
> >   _1 = ~mask_6;
> >   _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
> >   _3 = _2 & mask_6;
> >   _4 = _3 != 0;
> >
> > the canonical forms for this optimization and transform cases 1-9 to the
> > equivalent canonical form.  For cases 10 and 11, we simply remove the cast
> > before __atomic_fetch_or_4/__atomic_fetch_and_4 with
> >
> >   _1 = 1 << bit_4(D);
> >   _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
> >   _3 = _2 & _1;
> >
> > and
> >
> >   mask_7 = 1 << bit_6(D);
> >   _1 = ~mask_7;
> >   _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
> >   _6 = _3 & mask_7;
> >   _5 = (int) _6;
> >
> > gcc/
> >
> > PR middle-end/102566
> > * tree-ssa-ccp.c (convert_atomic_bit_not): New function.
> > (optimize_atomic_bit_test_and): Transform equivalent, but slightly
> > different cases to their canonical forms.
> >
> > gcc/testsuite/
> >
> > PR middle-end/102566
> > * g++.target/i386/pr102566-1.C: New test.
> > * g++.target/i386/pr102566-2.C: Likewise.
> > * g++.target/i386/pr102566-3.C: Likewise.
> > * g++.target/i386/pr102566-4.C: Likewise.
> > * g++.target/i386/pr102566-5a.C: Likewise.
> > * g++.target/i386/pr102566-5b.C: Likewise.
> > * g++.target/i386/pr102566-6a.C: Likewise.
> > * g++.target/i386/pr102566-6b.C: Likewise.
> > * gcc.target/i386/pr102566-1a.c: Likewise.
> > * gcc.target/i386/pr102566-1b.c: Likewise.
> > * gcc.target/i386/pr102566-2.c: Likewise.
> > * gcc.target/i386/pr102566-3a.c: Likewise.
> > * gcc.target/i386/pr102566-3b.c: Likewise.
> > * gcc.target/i386/pr102566-4.c: Likewise.
> > * gcc.target/i386/pr102566-5.c: Likewise.
> > * gcc.target/i386/pr102566-6.c: Likewise.
> > * gcc.target/i386/pr102566-7.c: Likewise.
> > * gcc.target/i386/pr102566-8a.c: Likewise.
> > * gcc.target/i386/pr102566-8b.c: Likewise.
> > * gcc.target/i386/pr102566-9a.c: Likewise.
> > * gcc.target/i386/pr102566-9b.c: Likewise.
> > * gcc.target/i386/pr102566-10a.c: Likewise.
> > * gcc.target/i386/pr102566-10b.c: Likewise.
> > * gcc.target/i386/pr102566-11.c: Likewise.
> > * gcc.target/i386/pr102566-12.c: Likewise.
> > ---
> >  gcc/testsuite/g++.target/i386/pr102566-1.C   
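
All of the forms listed above boil down to the same source-level idiom, which can be written portably with C++ atomics; the canonical form is what lets GCC select `lock bts` on x86. A small illustrative helper (not GCC-internal code):

```cpp
#include <atomic>
#include <cassert>

// Atomically set bit 'bit' in v and report whether it was already set.
// This matches the mask = 1 << cnt; old = fetch_or(mask); old & mask
// idiom that the pass canonicalizes into .ATOMIC_BIT_TEST_AND_SET.
static bool test_and_set_bit(std::atomic<unsigned> &v, unsigned bit) {
  unsigned mask = 1u << bit;
  return (v.fetch_or(mask) & mask) != 0;
}
```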

Re: [PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]

2021-10-21 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote:
> or an unevaluated operand, or a subexpression of an immediate invocation.
> 
> Hmm...that suggests that in consteval23.C, bar(foo) should also be OK,

Ouch.  That opens a big can of worms, see below.

> because 'foo' is a subexpression of an immediate invocation.  We can handle
> that by...
> 
> > +
> > +bool
> > +in_immediate_context ()
> > +{
> > +  return (cp_unevaluated_operand != 0
> > + || (current_function_decl != NULL_TREE
> > + && DECL_IMMEDIATE_FUNCTION_P (current_function_decl))
> > + || (current_binding_level->kind == sk_function_parms
> > + && current_binding_level->immediate_fn_ctx_p)
> > + || in_consteval_if_p);
> > +}
> > +
> >   /* Return true if a call to FN with number of arguments NARGS
> >  is an immediate invocation.  */
> > @@ -9451,6 +9459,12 @@ build_over_call (struct z_candidate *can
> >   }
> > /* Default arguments */
> > +  bool save_in_consteval_if_p = in_consteval_if_p;
> > +  /* If the call is immediate function invocation, make sure
> > + taking address of immediate functions is allowed in default
> > + arguments.  */
> > +  if (immediate_invocation_p (STRIP_TEMPLATE (fn), nargs))
> > +in_consteval_if_p = true;
> 
> ...moving this earlier in build_over_call, e.g. shortly after
> not_really_used:
> 
> You can also use make_temp_override.

make_temp_override can't be used, because in_consteval_if_p is BOOL_BITFIELD
under the hood and binding a reference to a bit-field is not valid.

In the patch I've used a specialized in_consteval_if_p_temp_override class
instead.  And it either needs to be restored after all the argument
processing, because we call immediate_invocation_p again and because of
the in_consteval_if_p override it is then false (the patch calls icip.reset
()), or we'd need to remember the result of the first immediate_invocation_p
call in some bool temporary and reuse that later.
Updated patch below.

This fixes the static_assert (bar (foo) == 42); case newly added to
consteval23.C, but unfortunately that isn't enough.
Consider incremental patch to the testcase:
--- gcc/testsuite/g++.dg/cpp2a/consteval23.C.jj 2021-10-21 11:32:48.570586881 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/consteval23.C2021-10-21 12:47:42.281769906 
+0200
@@ -2,6 +2,7 @@
 // { dg-do compile { target c++20 } }
 
 consteval int foo () { return 42; }
+constexpr auto baz (int (*fn) ()) { return fn; }
 
 consteval int
 bar (int (*fn) () = foo)
@@ -11,3 +12,5 @@ bar (int (*fn) () = foo)
 
 static_assert (bar () == 42);
 static_assert (bar (foo) == 42);
+static_assert (bar (&foo) == 42);
+static_assert (bar (baz (&foo)) == 42);

If even the last two lines are supposed to be valid, then I'm afraid we
need to reconsider the errors performed in cp_build_addr_expr_1, because
when that is called, we often don't know if we are in a subexpression
of immediate invocation or not, whether it is because we are in template,
something non-type dependent but other arguments of the function call are
type dependent, or because it is an overloaded function and some overloads
are consteval and others are not and we are deep in a middle of some
argument parsing and other arguments haven't been parsed at all yet, etc.

I'm afraid I don't have a good idea where to move that diagnostic to though,
it would need to be done somewhere where we are certain we aren't in a
subexpression of immediate invocation.  Given statement expressions, even
diagnostics after parsing whole statements might not be good enough, e.g.
void
qux ()
{
  static_assert (bar (({ constexpr auto a = 1; foo; })) == 42);
}
But if we e.g. diagnose it somewhere after parsing the whole function (and
for templates only after instantiating it) plus where we handle
non-automatic variable initializers etc., will we handle all spots where
we should diagnose it?  It better should be done before cp_fold...
In whatever spot is right doing something similar to what
cp_walk_tree find_immediate_fndecl, does, except probably error out on each
case we see (with the right locus) instead of just finding the first match.

Note, trying clang++ on godbolt on the consteval23.C testcase with the above
amendments, it diagnoses the = foo default argument, that seems to be a
clang bug to me, but if I comment out the
static_assert (bar () == 42); test and comment out the default argument
bar (int (*fn) () /* = foo */)
then it succeeds.  It even accepts the above qux with statement expression.

2021-10-21  Jakub Jelinek  

PR c++/102753
* cp-tree.h (saved_scope): Document that consteval_if_p member
is also set while processing immediate invocation.
(in_immediate_context): Declare.
* call.c (in_immediate_context): New function.
(immediate_invocation_p): Use it.
(struct in_consteval_if_p_temp_override): New class.
(build_over_call): Temporarily set in_consteval_if_p for processing
immediate inv

Re: [PATCH] sra: Fix corner case of total scalarization with virtual inheritance (PR 102505)

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 12:57 PM Martin Jambor  wrote:
>
> Hi,
>
> PR 102505 is a situation where SRA takes its initial top-level
> access size from a get_ref_base_and_extent called on a COMPONENT_REF,
> and thus derived from the FIELD_DECL, which however does not include a
> virtual base.  Total scalarization then goes on traversing the type,
> which however has a virtual base past the non-virtual bits, tricking SRA
> into creating sub-accesses outside of the supposedly encompassing
> accesses, which in turn triggers the verifier within the pass.
>
> The patch below fixes that by failing total scalarization when this
> situation is detected.
>
> Bootstrapped and tested on x86_64-linux, OK for trunk and (after testing
> there) on the branches?

OK.

> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2021-10-20  Martin Jambor  
>
> PR tree-optimization/102505
> * tree-sra.c (totally_scalarize_subtree): Check that the
> encountered field fits within the access we would like to put it
> in.
>
> gcc/testsuite/ChangeLog:
>
> 2021-10-20  Martin Jambor  
>
> PR tree-optimization/102505
> * g++.dg/torture/pr102505.C: New test.
> ---
>  gcc/testsuite/g++.dg/torture/pr102505.C | 15 +++
>  gcc/tree-sra.c  |  2 ++
>  2 files changed, 17 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr102505.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr102505.C 
> b/gcc/testsuite/g++.dg/torture/pr102505.C
> new file mode 100644
> index 000..a846751a0d6
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr102505.C
> @@ -0,0 +1,15 @@
> +struct D  { int i;  int pad alignas(16); };
> +struct B : virtual D
> +{
> +  int j =84;
> +  int k =84;
> +};
> +
> +struct C: B { };
> +
> +int main()
> +{
> +  C c;
> +  if (c.j != 84 || c.k != 84)
> +__builtin_abort();
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 9b786e29e4e..f561e1a2133 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -3288,6 +3288,8 @@ totally_scalarize_subtree (struct access *root)
>   continue;
>
> HOST_WIDE_INT pos = root->offset + int_bit_position (fld);
> +   if (pos + fsize > root->size)
> + return false;
> enum total_sra_field_state
>   state = total_should_skip_creating_access (root,
>  &last_seen_sibling,
> --
> 2.33.0
>


[COMMITTED] Revert the avoid threading circular paths commit.

2021-10-21 Thread Aldy Hernandez via Gcc-patches
I've tested this patch on the wrong tree, and picked up the test changes
in a pending patch, without which this patch is no longer obvious.
Plus, it causes a regression in an invalid test I've recommended we remove.

I'm reverting this patch until the dependencies are reviewed.

Sorry for the noise.

gcc/ChangeLog:

* tree-ssa-threadbackward.c
(back_threader::maybe_register_path): Remove circular paths check.
---
 gcc/tree-ssa-threadbackward.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 38913b06717..d94e3b962db 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -140,10 +140,6 @@ back_threader::maybe_register_path ()
 
   if (taken_edge && taken_edge != UNREACHABLE_EDGE)
 {
-  // Avoid circular paths.
-  if (m_visited_bbs.contains (taken_edge->dest))
-   return UNREACHABLE_EDGE;
-
   bool irreducible = false;
   bool profitable
= m_profit.profitable_path_p (m_path, m_name, taken_edge, &irreducible);
-- 
2.31.1



[PATCH] aarch64: Remove redundant struct type definitions in arm_neon.h

2021-10-21 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch deletes some redundant type definitions in
arm_neon.h. These vector type definitions are an artifact from the initial
commit that added the AArch64 port.

Bootstrapped and regression tested on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-15  Jonathan Wright  

* config/aarch64/arm_neon.h (__STRUCTN): Delete function
macro and all invocations.


rb14942.patch
Description: rb14942.patch


Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

2021-10-21 Thread H.J. Lu via Gcc-patches
On Thu, Oct 21, 2021 at 12:15 AM Richard Biener
 wrote:
>
> On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu  wrote:
> >
> > On Wed, Oct 20, 2021 at 9:58 AM Richard Biener
> >  wrote:
> > >
> > > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu"  
> > > wrote:
> > > >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener
> > > > wrote:
> > > >>
> > > >> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong  
> > > >> wrote:
> > > >> >
> > > >> > Many thanks for your explanation. I got the meaning of operands.
> > > >> > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned,
> > > >> > otherwise it will raise a "Real-Address Mode Exception".
> > > >> > I hadn't considered the situation where "b(%rip)" has an address
> > > >> > dependence on "a(%rip)" before. I think this situation could be
> > > >> > resolved on the assembler side, except for dummy code like
> > > >> > "movapd 0x200b37(%rip),%xmm1, ... addpd  0x200b37(%rip),%xmm0".
> > > >>
> > > >> Of course the compiler will only emit instructions which have the
> > > >> constraint of aligned memory
> > > >> when the memory is known to be aligned.  That's why I wonder why you
> > > >> would need such
> > > >> option.  "Real-Address Mode Exceptions" may point to the issue, but I
> > > >> wonder what's different
> > > >> in real mode vs. protected mode - even with segmentation the alignment
> > > >> of objects should
> > > >> prevail unless you play linker"tricks" that make global objects have
> > > >> different alignment - but
> > > >> then it's better to adjust the respective hooks to not falsely claim
> > > >> such alignment.  Consider
> > > >> for example
> > > >>
> > > >>if ((uintptr_t)&a & 0x7)
> > > >>  foo();
> > > >>   else
> > > >>  bar();
> > > >>
> > > >> GCC will optimize the branch statically to always call foo if 'a'
> > > >> appears to be aligned,
> > > >> even if you later try to "override" this with an option.  Alignment is
> > > >> not only about
> > > >> moves, it's also about knowledge about low bits in addresses and about
> > > >> alias analysis where alignment constrains how two objects can overlap.
> > > >>
> > > >> So - do not lie to the compiler!  A late "workaround" avoiding aligned
> > > >> SSE moves isn't a proper fix.
> > > >>
> > > >
> > > >The motivations are
> > > >
> > > >1.  AVX non-load/store ops work on unaligned memory.   Unaligned
> > > >load/store on aligned memory is as fast as aligned load/store on Intel
> > > >AVX machines.   The new switch makes load/store consistent with
> > > >other AVX ops.
> > > >2. We don't properly align the stack for AVX on Windows.  This can
> > > >be used as a workaround for -mavx on Windows.
> > >
> > > But this, with lying that the stack is aligned, causes all of the above 
> > > mentioned issues and thus needs to be fixed by either properly aligning 
> > > the stack or not lying to the compiler that we do.
> > >
> > > >
> > > >We can change TARGET_USE_UNALIGNED_VECTOR_MOVE
> > > >to require AVX.
> > >
> > > But such workaround does not make any sense since it does not fix the 
> > > fundamental underlying problem.
> > >
> >
> > There is a long standing desire to remove alignment checking (#AC(0)).
> > For integer operations, alignment checking is disabled in hardware.
> > For AVX ops, alignment checking is disabled in hardware for non-load/store
> > instructions.  But we can't disable alignment checking in hardware for
> > aligned load/store instructions.  -muse-unaligned-vector-move implements
> > disabling alignment checking for all AVX ops.
>
> No, it does not - it just emits unaligned moves.  The compiler still assumes
> aligned memory.  So whatever reason you have for disabling alignment
> checking for memory that is known to be aligned, I don't see it.
>
> If you want to "fix" broken user code then this doesn't do it.
>
> If you want to avoid the penalty for runtime stack alignment then you simply
> have to change the ABI(?) to not require vector types to have big alignment.
>
>

Let's drop it.  We will find another way.

Thanks.

-- 
H.J.


Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-21 Thread Richard Biener via Gcc-patches
On Wed, 20 Oct 2021, Andre Vieira (lists) wrote:

> On 15/10/2021 09:48, Richard Biener wrote:
> > On Tue, 12 Oct 2021, Andre Vieira (lists) wrote:
> >
> >> Hi Richi,
> >>
> >> I think this is what you meant: I now hide all the unrolling cost
> >> calculations
> >> in the existing target hooks for costs. I did need to adjust 'finish_cost'
> >> to
> >> take the loop_vinfo so the target's implementations are able to set the
> >> newly
> >> renamed 'suggested_unroll_factor'.
> >>
> >> Also added the checks for the epilogue's VF.
> >>
> >> Is this more like what you had in mind?
> > Not exactly (sorry..).  For the target hook I think we don't want to
> > pass vec_info but instead another output parameter like the existing
> > ones.
> >
> > vect_estimate_min_profitable_iters should then via
> > vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll
> > suggestion to vect_analyze_loop which should then, if the suggestion
> > was > 1, instead of iterating to the next vector mode run again
> > with a fixed VF (old VF times suggested unroll factor - there's
> > min_vf in vect_analyze_loop_2 which we should adjust to
> > the old VF times two for example and maybe store the suggested
> > factor as hint) - if it succeeds the result will end up in the
> > list of considered modes (where we now may have more than one
> > entry for the same mode but a different VF), we probably want to
> > only consider more unrolling once.
> >
> > For simplicity I'd probably set min_vf = max_vf = old VF * suggested
> > factor, thus take the targets request literally.
> >
> > Richard.
> 
> Hi,
> 
> I now pass an output parameter to finish_cost and route it through the
> various calls up to vect_analyze_loop.  I tried to rework
> vect_determine_vectorization_factor and noticed that merely setting min_vf and
> max_vf is not enough: we only use these to check whether the vectorization
> factor is within range (well, actually we only use max_vf at that stage). We
> only seem to use 'min_vf' to make sure the data_references are valid.  I am
> not sure my changes are the most appropriate here; for instance, I am pretty
> sure the checks for max and min vf I added in
> vect_determine_vectorization_factor are currently superfluous as they will
> pass by design, but thought they might be good future-proofing?

Ah, ok - well, max_vf is determined by dependence analysis so we do
have to check that.  I think that

+ && known_le (unrolled_vf, min_vf))

is superfluous.  So if we use min_vf as passed to vect_analyze_loop_2
to carry the suggested unroll factor then the changes look reasonable.

What's

   if (max_vf != MAX_VECTORIZATION_FACTOR
-  && maybe_lt (max_vf, min_vf))
+  && loop_vinfo->suggested_unroll_factor == 1
+  && max_vf < estimated_poly_value (min_vf, POLY_VALUE_MAX))
 return opt_result::failure_at (vect_location, "bad data 
dependence.\n");
   LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) = max_vf;

supposed to be though?  Why can we allow the unroll case to
get past this point?  I suppose initializing max_vf to
MAX_VECTORIZATION_FACTOR doesn't play well with poly-ints?
That is, how does estimated_poly_value (min_vf, POLY_VALUE_MAX)
differ here?  I would have expected that we can drop the
max_vf != MAX_VECTORIZATION_FACTOR part but otherwise leave
the test the same?

Note you adjust the vectorization factor in 
vect_determine_vectorization_factor but I think you have to delay
that until after vect_update_vf_for_slp which means doing it
before the start_over: label.

@@ -3038,6 +3203,18 @@ vect_analyze_loop (class loop *loop, 
vec_info_shared *shared)

   if (res)
{
+ /* Only try unrolling main loops.  */
+ if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+   {
+ opt_loop_vec_info unrolled_vinfo =
+   vect_try_unrolling (loop_vinfo, &n_stmts,
+   suggested_unroll_factor);
+ if (unrolled_vinfo)
+   loop_vinfo = unrolled_vinfo;
+ /* Reset suggested_unroll_factor for next loop_vinfo.  */
+ suggested_unroll_factor = 1;
+   }

so it looks like this eventually will leak 'loop_vinfo' but it
also looks like you throw away the not unrolled analysis and costing.
I was suggesting to simply add the unrolled variant to the
alternatives considered - the flow of vect_analyze_loop is already
quite complicated and the changes following this hunk do not help :/
(and I'm not sure what vect_reanalyze_as_main_loop is supposed to do
or why we need to consider unrolling there as well).  If we
don't want to globally consider the unrolled variant maybe
we can at least decide between the unrolled and not unrolled
variant in vect_try_unrolling?  But of course only the whole
series, (unrolled) main + vectorized epilogues, determine the
final cost.

That said, the overall flow is OK now, some details about the
max_vf check and where to compute the unrolled VF needs to be
fleshed out.  And then there's

Re: [RFC] vect: Convert cost hooks to classes

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, 14 Oct 2021, Richard Sandiford wrote:

> The current vector cost interface has quite a bit of redundancy
> built in.  Each target that defines its own hooks has to replicate
> the basic unsigned[3] management.  Currently each target also
> duplicates the cost adjustment for inner loops.
> 
> This patch instead defines a vector_costs class for holding
> the scalar or vector cost and allows targets to subclass it.
> There is then only one costing hook: to create a new costs
> structure of the appropriate type.  Everything else can be
> virtual functions, with common concepts implemented in the
> base class rather than in each target's derivation.
> 
> This might seem like excess C++-ification, but it shaves
> ~100 LOC.  I've also got some follow-on changes that become
> significantly easier with this patch.  Maybe it could help
> with things like weighting blocks based on frequency too.
> 
> This will clash with Andre's unrolling patches.  His patches
> have priority so this patch should queue behind them.
> 
> The x86 and rs6000 parts fully convert to a self-contained class.
> The equivalent aarch64 changes are more complex, so this patch
> just does the bare minimum.  A later patch will rework the
> aarch64 bits.
> 
> Tested on aarch64-linux-gnu, arm-linux-gnueabihf, x86_64-linux-gnu
> and powerpc64le-linux-gnu.  WDYT?

I like it!  Thus OK.

I suggested sth similar to Martin for the backend state
'[PATCH 3/N] Come up with casm global state.', abstracting
varasm global state and allowing targets to override this
via the adjusted init_sections target hook.

Richard.

> Richard
> 
> 
> gcc/
>   * target.def (targetm.vectorize.init_cost): Replace with...
>   (targetm.vectorize.create_costs): ...this.
>   (targetm.vectorize.add_stmt_cost): Delete.
>   (targetm.vectorize.finish_cost): Likewise.
>   (targetm.vectorize.destroy_cost_data): Likewise.
>   * doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): Replace with...
>   (TARGET_VECTORIZE_CREATE_COSTS): ...this.
>   (TARGET_VECTORIZE_ADD_STMT_COST): Delete.
>   (TARGET_VECTORIZE_FINISH_COST): Likewise.
>   (TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
>   * doc/tm.texi: Regenerate.
>   * tree-vectorizer.h (vec_info::vec_info): Remove target_cost_data
>   parameter.
>   (vec_info::target_cost_data): Change from a void * to a vector_costs *.
>   (vector_costs): New class.
>   (init_cost): Take a vec_info and return a vector_costs.
>   (dump_stmt_cost): Remove data parameter.
>   (add_stmt_cost): Replace vinfo and data parameters with a vector_costs.
>   (add_stmt_costs): Likewise.
>   (finish_cost): Replace data parameter with a vector_costs.
>   (destroy_cost_data): Delete.
>   * tree-vectorizer.c (dump_stmt_cost): Remove data argument and
>   don't print it.
>   (vec_info::vec_info): Remove the target_cost_data parameter and
>   initialize the member variable to null instead.
>   (vec_info::~vec_info): Delete target_cost_data instead of calling
>   destroy_cost_data.
>   (vector_costs::add_stmt_cost): New function.
>   (vector_costs::finish_cost): Likewise.
>   (vector_costs::record_stmt_cost): Likewise.
>   (vector_costs::adjust_cost_for_freq): Likewise.
>   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
>   call to vec_info::vec_info.
>   (vect_compute_single_scalar_iteration_cost): Update after above
>   changes to costing interface.
>   (vect_analyze_loop_operations): Likewise.
>   (vect_estimate_min_profitable_iters): Likewise.
>   (vect_analyze_loop_2): Initialize LOOP_VINFO_TARGET_COST_DATA
>   at the start_over point, where it needs to be recreated after
>   trying without slp.  Update retry code accordingly.
>   * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Update call
>   to vec_info::vec_info.
>   (vect_slp_analyze_operation): Update after above changes to costing
>   interface.
>   (vect_bb_vectorization_profitable_p): Likewise.
>   * targhooks.h (default_init_cost): Replace with...
>   (default_vectorize_create_costs): ...this.
>   (default_add_stmt_cost): Delete.
>   (default_finish_cost, default_destroy_cost_data): Likewise.
>   * targhooks.c (default_init_cost): Replace with...
>   (default_vectorize_create_costs): ...this.
>   (default_add_stmt_cost): Delete, moving logic to vector_costs instead.
>   (default_finish_cost, default_destroy_cost_data): Delete.
>   * config/aarch64/aarch64.c (aarch64_vector_costs): Inherit from
>   vector_costs.  Add a constructor.
>   (aarch64_init_cost): Replace with...
>   (aarch64_vectorize_create_costs): ...this.
>   (aarch64_add_stmt_cost): Replace with...
>   (aarch64_vector_costs::add_stmt_cost): ...this.  Use record_stmt_cost
>   to adjust the cost for inner loops.
>   (aarch64_finish_cost): Replace with...
>   (aarch64_vector_costs::finish_c

Re: [PATCH] gcc: implement AIX-style constructors

2021-10-21 Thread CHIGOT, CLEMENT via Gcc-patches
Hi David,

The problem is that _cdtors is created by the linker only when the -bcdtors
flag is provided. Thus, if we add "extern void (* _cdtors[]) (void);" to
"crtcxa.c", we can't use it without using the new constructor types.
One solution would be to create another crtcxa (e.g. crtcxa_cdtors.o) which would
replace the default crtcxa.o when needed. I didn't think about that
when creating the patch, but it might be a better approach. What do you think?

About documentation, I wasn't aware of that. I'll do it.

Clément

From: David Edelsohn 
Sent: Tuesday, October 19, 2021 10:14 PM
To: CHIGOT, CLEMENT 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] gcc: implement AIX-style constructors


Clement,

+  /* Use __C_runtime_pstartup to run ctors and register dtors.
+ This whole part should normally be in libgcc but as
+ AIX cdtors format is currently not the default, managed
+ that in collect2.  */

Why are you emitting the special startup function call in collect2.c
instead of placing it in libgcc.  The comment mentions that the
special startup function should be defined in libgcc.

Yes, the AIX ld bcdtors mechanism is not the default, but what is the
harm? The symbol will be defined and exported by libgcc. If the AIX
linker -bcdtors functionality is not invoked, the symbol is not used.
And if a user does invoke the AIX linker with -bcdtors, the behavior
will be the same (either the program was compiled to use AIX cdtors or
not, which is the same as when the startup function is emitted by
collect2.c).

Also, the patch should include documentation of the option.  The
documentation should mention that this is for interoperability with
IBM XL Compiler, and the option will not operate correctly unless the
application and the GCC runtime are built with the option.

Thanks, David

On Mon, Oct 18, 2021 at 3:55 AM CHIGOT, CLEMENT  wrote:
>
> The AIX linker now supports constructor and destructor detection. For such
> functions to be detected, their names must start with __sinit or __sterm,
> and -bcdtors must be passed to linker calls. It will create a "_cdtors"
> symbol which can be used to launch the initialization.
>
> This patch creates a new RS6000 flag "-mcdtors=".
> With "-mcdtors=aix", gcc will generate these new constructors/destructors.
> With "-mcdtors=gcc", which is currently the default, gcc will continue
> to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D
> symbols).
> Ideally, it would have been better to enable the AIX format by default
> instead of using collect2. However, the compatibility between the
> previously-built binaries and the new ones is too complex to preserve.
>
> gcc/ChangeLog:
> 2021-10-04  Clément Chigot  
>
> * collect2.c (aixbcdtors_flags): New variable.
> (main): Use it to detect -bcdtors and remove -binitfini flag.
> (write_c_file_stat): Adapt to new AIX format.
> * config/rs6000/aix.h (FILE_SINIT_FORMAT): New define.
> (FILE_STERM_FORMAT): New define.
> (TARGET_FILE_FUNCTION_FORMAT): New define.
> * config/rs6000/aix64.opt: Add -mcdtors flag.
> * config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when
>   -mcdtors=aix is passed.
> * config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise.
> * config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise.
> * config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum.
> * tree.c (get_file_function_name): Add
>   TARGET_FILE_FUNCTION_FORMAT support.
>
> gcc/testsuite/ChangeLog:
> 2021-10-04  Clément Chigot  
>
> * gcc.target/powerpc/constructor-aix.c: New test.
>
>


Re: [PATCH 3/N] Come up with casm global state.

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 11:47 AM Martin Liška  wrote:
>
> On 10/5/21 13:54, Richard Biener wrote:
> > On Mon, Oct 4, 2021 at 1:13 PM Martin Liška  wrote:
> >>
> >> On 9/22/21 11:59, Richard Biener wrote:
> >>> On Thu, Sep 16, 2021 at 3:12 PM Martin Liška  wrote:
> 
>  This patch comes up with asm_out_state and a new global variable casm.
> 
>  Tested on all cross compilers.
>  Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
>  Ready to be installed?
> >>>
> >>>   * output.h (struct asm_out_file): New struct that replaces a
> >>> ^^^
> >>> asm_out_state?
> >>
> >> Yes, sure!
> >>
> >>>
> >>> You replace a lot of asm_out_file - do we still need the macro then?
> >>> (I'd have simplified the patch leaving that replacement out at first)
> >>
> >> Well, I actually replaced only a very small fraction of the usage of 
> >> asm_out_file.
> >>
> >> $ git grep asm_out_file | grep -v ChangeLog | wc -l
> >>
> >> 1601
> >>
> >>
> >> So I think we should live with the macro for some time ...
> >>
> >>>
> >>> You leave quite some global state out of the struct that is obviously
> >>> related, in the diff I see object_block_htab for example.
> >>
> >> Yes, I'm aware of it. Can we do that incrementally?
> >>
> >>> Basically
> >>> everything initialized in init_varasm_once is a candidate (which
> >>> then shows const_desc_htab and shared_constant_pool as well
> >>> as pending_assemble_externals_set).
> >>
> >> Well, these would probably need a different header file (or another 
> >> #include ... must
> >> be added before output.h :// ).
> >>
> >>> For the goal of outputting
> >>> early DWARF to another file the state CTOR could have a mode
> >>> to not initialize those parts or we could have asm-out-state-with-sections
> >>> as base of asm-out-state.
> >>
> >> Yes, right now asm_out_state ctor is minimal:
> >>
> >> asm_out_state (): out_file (NULL), in_section (NULL),
> >>   sec ({}), anchor_labelno (0), in_cold_section_p (false)
> >> {
> >>   section_htab = hash_table::create_ggc (31);
> >> }
> >>
> >>>
> >>> In the end there will be a target part of the state so I think
> >>> construction needs to be defered to the target which can
> >>> derive from asm-out-state and initialize the part it needs.
> >>> That's currently what targetm.asm_out.init_sections () does
> >>> and we'd transform that to a targetm.asm_out.create () or so.
> >>> That might already be necessary for the DWARF stuff.
> >>
> >> So what do you want to with content of init_varasm_once function?
> >
> > It goes away ;)  Or rather becomes the invocation of the
> > asm-out-state CTOR via the target hook.
> >
> >>>
> >>> That said, dealing with the target stuff piecemail is OK, but maybe
> >>> try to make sure that init_varasm_once is actually identical
> >>> to what the CTOR does?
> >>
> >> So you want asm_out_state::asm_out_state doing what we current initialize
> >> in init_varasm_once, right?
> >
> > Yes, asm_out_state should cover everything initialized in init_varasm_once
> > (and the called targetm.asm_out.init_sections).
> >
> > targetm.asm_out.init_sections would become
> >
> > asm_out_state *init_sections ();
> >
> > where the target can derive from asm_out_state (optionally) and
> > allocates the asm-out-state & invokes the CTORs.  The middle-end
> > visible asm_out_state would contain all the tables initialized in
> > init_varasm_once.
> >
> > I'm not sure if doing it piecemail is good - we don't want to touch
> > things multiple times without good need and it might be things
> > turn out not be as "simple" as the above plan sounds.
> >
> > I would suggest to try the conversion with powerpc since it
> > seems to be the only target with two different implementations
> > for the target hook.
> >
> > The
> >
> > static GTY(()) section *read_only_data_section;
> > static GTY(()) section *private_data_section;
> > static GTY(()) section *tls_data_section;
> > static GTY(()) section *tls_private_data_section;
> > static GTY(()) section *read_only_private_data_section;
> > static GTY(()) section *sdata2_section;
> >
> > section *toc_section = 0;
> >
> > could become #defines to
> > static_cast<rs6000_asm_out_state *> (casm)->section_name
> > and there'd be
> >
> > class rs6000_asm_out_state : public asm_out_state
> > {
> > ... add stuff ...
> > }
> >
> > in rs6000/xcoff.h and rs6000/sysv4.h and some generic rs6000 header as well
> > (adding no fields) and the target hook would basically do
> >
> >   return ggc_new rs6000_asm_state ();
> >
> > (OK, ggc_alloc + placement new I guess).  The default hook just
> > creates asm_out_state.
>
> Hello.
>
> All right, I made a patch candidate that does the suggested steps and I ported
> rs6000 target (both ELF and XCOFF sub-targets).
>
> So far I was able to bootstrap on ppc6le-linux-gnu.
>
> Thoughts?

+extern GTY(()) rs6000_asm_out_state *target_casm;

this seems to be unused

+#define rs6000_casm static_cast<rs6000_asm_out_state *> (casm)

maybe there's a better way?  Though I ca

Re: [PATCH 3/N] Come up with casm global state.

2021-10-21 Thread David Malcolm via Gcc-patches
On Thu, 2021-09-16 at 15:12 +0200, Martin Liška wrote:
> This patch comes up with asm_out_state and a new global variable
> casm.
> 
> Tested on all cross compilers.
> Patch can bootstrap on x86_64-linux-gnu and survives regression
> tests.
> 
> Ready to be installed?
> Thanks,
> Martin

This is possibly a silly question, but what does the "c" stand for in
"casm"?   "compiler"? "compilation"? "common"?

Sorry if it's explained somewhere, but I didn't spot this when glancing
at the patches.

Dave



Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

2021-10-21 Thread Martin Liška

On 10/21/21 07:47, Hongyu Wang via Gcc-patches wrote:

Yes, updated patch.


Note the patch caused the following test failure:

FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1

Martin


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Aldy Hernandez via Gcc-patches
On Thu, Oct 21, 2021 at 12:20 PM Richard Biener
 wrote:
>
> On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:
> >
> >
> >
> > On 10/18/2021 2:17 AM, Aldy Hernandez wrote:
> > >
> > >
> > > On 10/18/21 12:52 AM, Jeff Law wrote:
> > >>
> > >>
> > >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
> > >>> The following patch converts the strlen pass from evrp to ranger,
> > >>> leaving DOM as the last remaining user.
> > >> So is there any reason why we can't convert DOM as well?   DOM's use
> > >> of EVRP is pretty limited.  You've mentioned FP bits before, but my
> > >> recollection is those are not part of the EVRP analysis DOM uses.
> > >> Hell, give me a little guidance and I'll do the work...
> > >
> > > Not only will I take you up on that offer, but I can provide 90% of
> > > the work.  Here be dragons, though (well, for me, maybe not for you ;-)).
> > >
> > > DOM is actually an evrp pass at -O1 in disguise.  The reason it really
> > > is a covert evrp pass is because:
> > >
> > > a) It calls extract_range_from_stmt on each statement.
> > >
> > > b) It folds conditionals with simplify_using_ranges.
> > >
> > > c) But most importantly, it exports discovered ranges when it's done
> > > (evrp_range_analyzer(true)).
> > >
> > > If you look at the evrp pass, you'll notice that that's basically what
> > > it does, albeit with the substitute and fold engine, which also calls
> > > gimple fold plus other goodies.
> > >
> > > But I could argue that we've made DOM into an evrp pass without
> > > noticing.  The last item (c) is particularly invasive because these
> > > exported ranges show up in other passes unexpectedly.  For instance, I
> > > saw an RTL pass at -O1 miss an optimization because it was dependent
> > > on some global range being set.  IMO, DOM should not export global
> > > ranges it discovered during its walk (do one thing and do it well),
> > > but I leave it to you experts to pontificate.
> > All true.  But I don't think we've got many, if any, hard dependencies
> > on those behaviors.
> >
> > >
> > > The attached patch is rather trivial.  It's mostly deleting state.  It
> > > seems DOM spends a lot of time massaging the IL so that it can fold
> > > conditionals or thread paths.  None of this is needed, because the
> > > ranger can do all of this.  Well, except floats, but...
> > Massaging the IL should only take two forms IIRC.
> >
> > First, if we have a simplification we can do.  That could be const/copy
> > propagation, replacing an expression with an SSA_NAME or constant and
> > the like.  It doesn't massage the IL just to massage the IL.
> >
> > Second, we do temporarily copy propagate the current known values of an
> > SSA name into use points and then see if that allows us to determine if
> > a statement is already in the hash tables.  But we undo that so that
> > nobody should see that temporary change in state.
>
> Are you sure we still do that?  I can't find it at least.

I couldn't either, but perhaps what Jeff is referring to is the ad-hoc
copy propagation that happens in the DOM's use of the threader:

  /* Make a copy of the uses & vuses into USES_COPY, then cprop into
 the operands.  */
  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
{
  tree tmp = NULL;
  tree use = USE_FROM_PTR (use_p);

  copy[i++] = use;
  if (TREE_CODE (use) == SSA_NAME)
tmp = SSA_NAME_VALUE (use);
  if (tmp)
SET_USE (use_p, tmp);
}

  cached_lhs = simplifier->simplify (stmt, stmt, bb, this);

  /* Restore the statement's original uses/defs.  */
  i = 0;
  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
SET_USE (use_p, copy[i++]);

Aldy



[PATCH 0/2] jit: Complex types and loong constants

2021-10-21 Thread Petter Tomner via Gcc-patches
Hi,

These two patches fix support for complex types and add the possibility to make
make
long long, long double and complex long double constants.

Please see the follow-up mails.

Regards,

[PATCH 1/2] jit: Complex types and loong constants

2021-10-21 Thread Petter Tomner via Gcc-patches
This patch fixes support for complex types.

It adds the entrypoints:
gcc_jit_context_new_rvalue_from_complex_double
gcc_jit_context_set_bool_enable_complex_types

As well as the binary operator GCC_JIT_BINARY_OP_COMPLEX, to make
complex numbers from two real-number rvalues.

Note that the complex types already are in the type enum, so I thought the
cleanest solution is to enable them, rather than to have a "get_complex_type"
function.

See attachment.

From 6d5d7ce40f9bb84a7943e5113c8954ae4b941905 Mon Sep 17 00:00:00 2001
From: Petter Tomner 
Date: Fri, 15 Oct 2021 20:16:54 +0200
Subject: [PATCH 1/3] Complex

Fix support for complex types

The patch adds support for complex floating-point types to libgccjit.
A new binary operator 'COMPLEX' is added to create complex values
from real values, as well as a function to create a complex double
literal.

To notify users when a binary linking to libgccjit depends on complex
types, complex type support must be enabled with a new option function,
since the complex types already are in the types enum.

Signed-off-by:
2021-10-21	Petter Tomner	

gcc/jit/
	* jit-common.h : (INNER_BOOL_OPTION_ENABLE_COMPLEX_TYPES): New
	* jit-playback.c : Create imaginary literals, conversions
	  (complex_real_to_real) : New
	  (new_rvalue_from_const<...>) : Also build complex constants
	  (new_rvalue_from_const <_Complex double>) : New
	  (new_binary_op) : Add COMPLEX operator
	  (build_cast) : Casts to complex
	  (memento_of_new_rvalue_from_const <_Complex double>) : New
	  (binary_op::make_debug_string) : Handle complex operator
	* jit-recording.c : Reproducer, debug strings, forwarding
	* jit-recording.h :
	  (float_size_qual) : Poll size qualifier of float types
	  (is_complex) : Poll if type is complex
	* libgccjit++.h : New entrypoints, see below
	* libgccjit.c : Implementation of new entry points, see .h
	  (gcc_jit_context_get_type) : Extend range check, validation
	  (valid_binary_op_p) : Extend range check
	  (gcc_jit_context_new_binary_op) : Validation
	  (gcc_jit_context_new_unary_op) : Validation
	  (gcc_jit_context_new_comparison) : Validation
	* libgccjit.h :
	  (gcc_jit_context_new_rvalue_from_complex_double) : New
	  (GCC_JIT_BINARY_OP_COMPLEX) : New
	  (LIBGCCJIT_HAVE_COMPLEX) : New
	  (gcc_jit_context_set_bool_enable_complex_types) : New
	* libgccjit.map : Added new entrypoints
	  (LIBGCCJIT_ABI_16) : New

gcc/testsuite/
	* jit.dg/test-complex-builtins.c : New
	* jit.dg/test-complex-literals.c : New
	* jit.dg/test-complex-misc.c : New
	* jit.dg/test-complex-operators.c : New
	* jit.dg/test-complex-types.c : New
	* jit.dg/test-error-complex-noenable.c : New

gcc/jit/docs/topics/
	* contexts.rst : Update docs
	* expressions.rst
	* types.rst
---
 gcc/jit/docs/topics/contexts.rst  |  20 +
 gcc/jit/docs/topics/expressions.rst   |  89 ++-
 gcc/jit/docs/topics/types.rst |  12 +-
 gcc/jit/jit-common.h  |   1 +
 gcc/jit/jit-playback.c| 165 -
 gcc/jit/jit-recording.c   | 126 +++-
 gcc/jit/jit-recording.h   |  28 +
 gcc/jit/libgccjit++.h |  21 +
 gcc/jit/libgccjit.c   |  97 ++-
 gcc/jit/libgccjit.h   |  44 +-
 gcc/jit/libgccjit.map |   6 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h  |  30 +
 gcc/testsuite/jit.dg/test-complex-builtins.c  | 217 ++
 gcc/testsuite/jit.dg/test-complex-literals.c  | 145 
 gcc/testsuite/jit.dg/test-complex-misc.c  | 204 ++
 gcc/testsuite/jit.dg/test-complex-operators.c | 353 +
 gcc/testsuite/jit.dg/test-complex-types.c | 677 ++
 .../jit.dg/test-error-complex-noenable.c  |  31 +
 18 files changed, 2221 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-complex-builtins.c
 create mode 100644 gcc/testsuite/jit.dg/test-complex-literals.c
 create mode 100644 gcc/testsuite/jit.dg/test-complex-misc.c
 create mode 100644 gcc/testsuite/jit.dg/test-complex-operators.c
 create mode 100644 gcc/testsuite/jit.dg/test-complex-types.c
 create mode 100644 gcc/testsuite/jit.dg/test-error-complex-noenable.c

diff --git a/gcc/jit/docs/topics/contexts.rst b/gcc/jit/docs/topics/contexts.rst
index 00fb17e155d..d8689407cc6 100644
--- a/gcc/jit/docs/topics/contexts.rst
+++ b/gcc/jit/docs/topics/contexts.rst
@@ -489,6 +489,26 @@ Boolean options
 
   #ifdef LIBGCCJIT_HAVE_gcc_jit_context_set_bool_use_external_driver
 
+.. function:: void\
+  gcc_jit_context_set_bool_enable_complex_types (gcc_jit_context *ctxt,\
+ int bool_value);
+
+   This option can enable complex type support in libgccjit. By default,
+   no complex types can be used.
+
+   If you use complex types, it is recommended to use builtin functions
+   `creal`, `cimag` and `conj` etc. with the correct type suffix, since
+   those are inlined into the code, rather

Re: [PATCH 3/N] Come up with casm global state.

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 2:48 PM David Malcolm via Gcc-patches
 wrote:
>
> On Thu, 2021-09-16 at 15:12 +0200, Martin Liška wrote:
> > This patch comes up with asm_out_state and a new global variable
> > casm.
> >
> > Tested on all cross compilers.
> > Patch can bootstrap on x86_64-linux-gnu and survives regression
> > tests.
> >
> > Ready to be installed?
> > Thanks,
> > Martin
>
> This is possibly a silly question, but what does the "c" stand for in
> "casm"?   "compiler"? "compilation"? "common"?

'current', like cfun or crtl ... maybe just following "bad" examples?

> Sorry if it's explained somewhere, but I didn't spot this when glancing
> at the patches.
>
> Dave
>


[PATCH 2/2] jit: Complex types and loong constants

2021-10-21 Thread Petter Tomner via Gcc-patches
This patch adds the possibility to make the following constants:
* long long
* long double
* complex long double

The long long one is needed for 32-bit systems.

The new entrypoints are:
gcc_jit_context_new_rvalue_from_long_long
gcc_jit_context_new_rvalue_from_long_double
gcc_jit_context_new_rvalue_from_complex_long_double

The patch also fixes an issue with the reproducer's debug
C-file writer, which does not handle floating-point numbers
very well, i.e. infs, NaNs and losing precision on doubles.

make check-jit runs fine with the patch series on Debian 64x.

See attachment.

From f49cea6830b08a019958ef3f6a1ced1e416f80b3 Mon Sep 17 00:00:00 2001
From: Petter Tomner 
Date: Sat, 16 Oct 2021 21:55:20 +0200
Subject: [PATCH 2/3] long long

Add long long, long double and complex long double constants

This patch adds entrypoints to make constants (literals) for long long,
long double and complex long double types.

Also, it fixes problems with all floating-point numbers in write_reproducer,
i.e. it supports NAN and INF values, as well as proper precision for double
and long double, which is needed for testing this patch.

Signed-off-by:
2021-10-21	Petter Tomner	

gcc/jit/
	* jit-playback.c :
	(new_rvalue_from_const ): New
	(new_rvalue_from_const ): Uses long long instead
	(new_rvalue_from_const ): New
	(new_rvalue_from_const <_Complex long double>) : New
	* jit-recording.c :
	(dump_reproducer_to_file) : Type punning NANs
	(reproducer_write_arr) : arr -> "char array literal"-string
	(memento_of_new_rvalue_from_const <long long>) : New
	(memento_of_new_rvalue_from_const <long double>) : New
	(memento_of_new_rvalue_from_const <_Complex long double>) : New
	(memento_of_new_rvalue_from_const ::make_debug_string):
	%g instead of %f
	(memento_of_new_rvalue_from_const ::write_reproducer):
	%a instead of %f, handle NAN and INF
	(memento_of_new_rvalue_from_const <_Complex double>::write_reproducer):
	%a instead of %f, handle NAN and INF. Use CMPLX macro.
	* libgccjit.c :
	* libgccjit.h :
	(LIBGCCJIT_HAVE_LONGLONG_CONSTANTS) : New
	(gcc_jit_context_new_rvalue_from_long_long) : New
	(gcc_jit_context_new_rvalue_from_long_double) : New
	(gcc_jit_context_new_rvalue_from_complex_long_double) : New
	* libgccjit++.h : New entrypoints
	* libgccjit.map: New entrypoints added to ABI 16

gcc/testsuite/
	* jit.dg/all-non-failing-tests.h: Added test-long-literals.c
	* jit.dg/test-long-literals.c: New

gcc/jit/docs/topics/
	* expressions.rst : Updated docs
---
 gcc/jit/docs/topics/expressions.rst  |  43 +-
 gcc/jit/jit-playback.c   | 143 +-
 gcc/jit/jit-recording.c  | 442 ++-
 gcc/jit/libgccjit++.h|  39 ++
 gcc/jit/libgccjit.c  |  46 ++
 gcc/jit/libgccjit.h  |  33 +-
 gcc/jit/libgccjit.map|   3 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h |   7 +
 gcc/testsuite/jit.dg/test-long-literals.c| 283 
 9 files changed, 1017 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-long-literals.c

diff --git a/gcc/jit/docs/topics/expressions.rst b/gcc/jit/docs/topics/expressions.rst
index a98f35572b4..30a3b9780f9 100644
--- a/gcc/jit/docs/topics/expressions.rst
+++ b/gcc/jit/docs/topics/expressions.rst
@@ -70,6 +70,18 @@ Simple expressions
Given a numeric type (integer or floating point), build an rvalue for
the given constant :c:type:`long` value.
 
+.. function:: gcc_jit_rvalue *\
+  gcc_jit_context_new_rvalue_from_long_long (gcc_jit_context *ctxt, \
+ gcc_jit_type *numeric_type, \
+ long long value)
+
+   Given a numeric type (integer or floating point), build an rvalue for
+   the given constant :c:type:`long long` value.
+
+   This function was added in :ref:`LIBGCCJIT_ABI_16`;
+   you can test for its presence using
+   `#ifdef LIBGCCJIT_HAVE_LONGLONG_CONSTANTS`
+
 .. function::  gcc_jit_rvalue *gcc_jit_context_zero (gcc_jit_context *ctxt, \
  gcc_jit_type *numeric_type)
 
@@ -98,6 +110,18 @@ Simple expressions
Given a numeric type (integer or floating point), build an rvalue for
the given constant :c:type:`double` value.
 
+.. function:: gcc_jit_rvalue *\
+  gcc_jit_context_new_rvalue_from_long_double (gcc_jit_context *ctxt, \
+   gcc_jit_type *numeric_type, \
+   long double value)
+
+   Given a floating point type, build an rvalue for
+   the given constant :c:type:`long double`.
+
+   This function was added in :ref:`LIBGCCJIT_ABI_16`;
+   you can test for its presence using
+   `#ifdef LIBGCCJIT_HAVE_LONGLONG_CONSTANTS`
+
 .. function:: gcc_jit_rvalue *\
   gcc_jit_context_new_rvalue_from_complex_double (gcc_jit_context *ctxt, \

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-21 Thread Martin Liška

On 10/21/21 11:57, Richard Biener wrote:

which previously affected debug_nonbind_markers_p.  I think it makes
sense to move
the above to finish_options as well.  I suppose --help doesn't correctly dump
the -g enabled state for -gtoggle at the moment?


All right, works for me. The updated patch was tested on nvptx cross compiler
and can bootstrap on x86_64-linux-gnu and survives regression tests.

The -g option is shown in -Q --help=common as:

  -g

So no option value is displayed.

Is the patch fine now?
Thanks,
Martin
From da24e51c8e108373256f1106fe4478b919189c99 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 14 Oct 2021 14:57:18 +0200
Subject: [PATCH] options: Fix variable tracking option processing.

	PR debug/102585
	PR bootstrap/102766

gcc/ChangeLog:

	* opts.c (finish_options): Process flag_var_tracking* options
	here as they can be adjusted by optimize attribute.
	Process also flag_syntax_only and flag_gtoggle.
	* toplev.c (process_options): Remove it here.
	* common.opt: Make debug_nonbind_markers_p as PerFunction
	attribute as it depends on optimization level.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr102585.c: New test.
---
 gcc/common.opt  |  2 +-
 gcc/opts.c  | 45 +++
 gcc/testsuite/gcc.dg/pr102585.c |  6 +
 gcc/toplev.c| 47 +
 4 files changed, 53 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102585.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a2af7fb36e0..c4a77f65aa2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3284,7 +3284,7 @@ Common Driver JoinedOrMissing Negative(gvms)
 Generate debug information in extended STABS format.
 
 gstatement-frontiers
-Common Driver Var(debug_nonbind_markers_p)
+Common Driver Var(debug_nonbind_markers_p) PerFunction
 Emit progressive recommended breakpoint locations.
 
 gstrict-dwarf
diff --git a/gcc/opts.c b/gcc/opts.c
index 65fe192a198..4472cec1b98 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1349,6 +1349,51 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
 SET_OPTION_IF_UNSET (opts, opts_set, flag_vect_cost_model,
 			 VECT_COST_MODEL_CHEAP);
 
+  /* One could use EnabledBy, but it would lead to a circular dependency.  */
+  if (!OPTION_SET_P (flag_var_tracking_uninit))
+ flag_var_tracking_uninit = flag_var_tracking;
+
+  if (!OPTION_SET_P (flag_var_tracking_assignments))
+flag_var_tracking_assignments
+  = (flag_var_tracking
+	 && !(flag_selective_scheduling || flag_selective_scheduling2));
+
+  if (flag_var_tracking_assignments_toggle)
+flag_var_tracking_assignments = !flag_var_tracking_assignments;
+
+  if (flag_var_tracking_assignments && !flag_var_tracking)
+flag_var_tracking = flag_var_tracking_assignments = -1;
+
+  if (flag_var_tracking_assignments
+  && (flag_selective_scheduling || flag_selective_scheduling2))
+warning_at (loc, 0,
+		"var-tracking-assignments changes selective scheduling");
+
+  if (flag_syntax_only)
+{
+  write_symbols = NO_DEBUG;
+  profile_flag = 0;
+}
+
+  if (flag_gtoggle)
+{
+  if (debug_info_level == DINFO_LEVEL_NONE)
+	{
+	  debug_info_level = DINFO_LEVEL_NORMAL;
+
+	  if (write_symbols == NO_DEBUG)
+	write_symbols = PREFERRED_DEBUGGING_TYPE;
+	}
+  else
+	debug_info_level = DINFO_LEVEL_NONE;
+}
+
+  if (!OPTION_SET_P (debug_nonbind_markers_p))
+debug_nonbind_markers_p
+  = (optimize
+	 && debug_info_level >= DINFO_LEVEL_NORMAL
+	 && dwarf_debuginfo_p ()
+	 && !(flag_selective_scheduling || flag_selective_scheduling2));
 }
 
 #define LEFT_COLUMN	27
diff --git a/gcc/testsuite/gcc.dg/pr102585.c b/gcc/testsuite/gcc.dg/pr102585.c
new file mode 100644
index 000..efd066b4a4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102585.c
@@ -0,0 +1,6 @@
+/* PR debug/102585 */
+/* { dg-do compile } */
+/* { dg-options "-fvar-tracking-assignments -fno-var-tracking" } */
+
+#pragma GCC optimize 0
+void d_demangle_callback_Og() { int c = 0; }
diff --git a/gcc/toplev.c b/gcc/toplev.c
index cb4f8c470f0..486bd8804fa 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1378,25 +1378,6 @@ process_options (bool no_backend)
 	}
 }
 
-  if (flag_syntax_only)
-{
-  write_symbols = NO_DEBUG;
-  profile_flag = 0;
-}
-
-  if (flag_gtoggle)
-{
-  if (debug_info_level == DINFO_LEVEL_NONE)
-	{
-	  debug_info_level = DINFO_LEVEL_NORMAL;
-
-	  if (write_symbols == NO_DEBUG)
-	write_symbols = PREFERRED_DEBUGGING_TYPE;
-	}
-  else
-	debug_info_level = DINFO_LEVEL_NONE;
-}
-
   /* CTF is supported for only C at this time.  */
   if (!lang_GNU_C ()
   && ctf_debug_info_level > CTFINFO_LEVEL_NONE)
@@ -1499,6 +1480,7 @@ process_options (bool no_backend)
 	}
   flag_var_tracking = 0;
   flag_var_tracking_uninit = 0;
+  flag_var_tracking_assignments = 0;
 }
 
   /* The debug hooks are used to implemen

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez  wrote:
>
> On Thu, Oct 21, 2021 at 12:20 PM Richard Biener
>  wrote:
> >
> > On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:
> > >
> > >
> > >
> > > On 10/18/2021 2:17 AM, Aldy Hernandez wrote:
> > > >
> > > >
> > > > On 10/18/21 12:52 AM, Jeff Law wrote:
> > > >>
> > > >>
> > > >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
> > > >>> The following patch converts the strlen pass from evrp to ranger,
> > > >>> leaving DOM as the last remaining user.
> > > >> So is there any reason why we can't convert DOM as well?   DOM's use
> > > >> of EVRP is pretty limited.  You've mentioned FP bits before, but my
> > > >> recollection is those are not part of the EVRP analysis DOM uses.
> > > >> Hell, give me a little guidance and I'll do the work...
> > > >
> > > > Not only will I take you up on that offer, but I can provide 90% of
> > > > the work.  Here be dragons, though (well, for me, maybe not for you 
> > > > ;-)).
> > > >
> > > > DOM is actually an evrp pass at -O1 in disguise.  The reason it really
> > > > is a covert evrp pass is because:
> > > >
> > > > a) It calls extract_range_from_stmt on each statement.
> > > >
> > > > b) It folds conditionals with simplify_using_ranges.
> > > >
> > > > c) But most importantly, it exports discovered ranges when it's done
> > > > (evrp_range_analyzer(true)).
> > > >
> > > > If you look at the evrp pass, you'll notice that that's basically what
> > > > it does, albeit with the substitute and fold engine, which also calls
> > > > gimple fold plus other goodies.
> > > >
> > > > But I could argue that we've made DOM into an evrp pass without
> > > > noticing.  The last item (c) is particularly invasive because these
> > > > exported ranges show up in other passes unexpectedly.  For instance, I
> > > > saw an RTL pass at -O1 miss an optimization because it was dependent
> > > > on some global range being set.  IMO, DOM should not export global
> > > > ranges it discovered during its walk (do one thing and do it well),
> > > > but I leave it to you experts to pontificate.
> > > All true.  But I don't think we've got many, if any, hard dependencies
> > > on those behaviors.
> > >
> > > >
> > > > The attached patch is rather trivial.  It's mostly deleting state.  It
> > > > seems DOM spends a lot of time massaging the IL so that it can fold
> > > > conditionals or thread paths.  None of this is needed, because the
> > > > ranger can do all of this.  Well, except floats, but...
> > > Massaging the IL should only take two forms IIRC.
> > >
> > > First, if we have a simplification we can do.  That could be const/copy
> > > propagation, replacing an expression with an SSA_NAME or constant and
> > > the like.  It doesn't massage the IL just to massage the IL.
> > >
> > > Second, we do temporarily copy propagate the current known values of an
> > > SSA name into use points and then see if that allows us to determine if
> > > a statement is already in the hash tables.  But we undo that so that
> > > nobody should see that temporary change in state.
> >
> > Are you sure we still do that?  I can't find it at least.
>
> I couldn't either, but perhaps what Jeff is referring to is the ad-hoc
> copy propagation that happens in the DOM's use of the threader:
>
>   /* Make a copy of the uses & vuses into USES_COPY, then cprop into
>  the operands.  */
>   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
> {
>   tree tmp = NULL;
>   tree use = USE_FROM_PTR (use_p);
>
>   copy[i++] = use;
>   if (TREE_CODE (use) == SSA_NAME)
> tmp = SSA_NAME_VALUE (use);
>   if (tmp)
> SET_USE (use_p, tmp);
> }
>
>   cached_lhs = simplifier->simplify (stmt, stmt, bb, this);
>
>   /* Restore the statement's original uses/defs.  */
>   i = 0;
>   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
> SET_USE (use_p, copy[i++]);

Ah, likely.  These days we'd likely use a gimple_match_op but then
this seems heavily abstracted, no idea where simplifier->simplify
might lead to ;)
I'm also not sure why the threader would do the valueization here and
not the simplify() function (and lookup_avail_expr misses an 'exploded' operand
lookup as well).  Lots of legacy code ;)

But I think the above is not going to be an issue unless ranger runs in
circles around backedges, arriving at this very same stmt again?  A way
out might be to copy the stmt to the stack, adjust operands and use that
for the simplify entry.

Richard.

> Aldy
>


Re: [PATCH] options: Fix variable tracking option processing.

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 3:14 PM Martin Liška  wrote:
>
> On 10/21/21 11:57, Richard Biener wrote:
> > which previously affected debug_nonbind_markers_p.  I think it makes
> > sense to move
> > the above to finish_options as well.  I suppose --help doesn't correctly dump
> > the -g enabled state for -gtoggle at the moment?
>
> All right, works for me. The updated patch was tested on nvptx cross compiler
> and can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> The -g option is shown in -Q --help=common as:
>
>-g
>
> So no option value is displayed.
>
> Is the patch fine now?

OK.

Thanks,
Richard.

> Thanks,
> Martin


[PATCH] or1k: Update FPU to specify detect tininess before rounding

2021-10-21 Thread Stafford Horne via Gcc-patches
This was not defined in the spec and was not consistent across
implementations, causing inconsistent behavior.  After review we have
updated the CPU implementations and proposed that the spec be updated to
specify that FPU tininess checks detect tininess before rounding.

Architecture change draft:

https://openrisc.io/proposals/p18-fpu-tininess

libgcc/ChangeLog:

* config/or1k/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING):
Change to 0.
---
 libgcc/config/or1k/sfp-machine.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/or1k/sfp-machine.h b/libgcc/config/or1k/sfp-machine.h
index eebe5b0578e..162c6bc5326 100644
--- a/libgcc/config/or1k/sfp-machine.h
+++ b/libgcc/config/or1k/sfp-machine.h
@@ -85,7 +85,7 @@ do {  
\
 
 #define __BYTE_ORDER __BIG_ENDIAN
 
-#define _FP_TININESS_AFTER_ROUNDING 1
+#define _FP_TININESS_AFTER_ROUNDING 0
 
 /* Define ALIASNAME as a strong alias for NAME.  */
 # define strong_alias(name, aliasname) _strong_alias(name, aliasname)
-- 
2.31.1



[PATCH] libcody: Avoid double-free

2021-10-21 Thread Jonathan Wakely via Gcc-patches
If the listen call fails then 'goto fail' will jump to that label and
use freeaddrinfo again. Set the pointer to null to prevent that.

libcody/ChangeLog:

* netserver.cc (ListenInet6): Set pointer to null after
deallocation.
---
 libcody/netserver.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libcody/netserver.cc b/libcody/netserver.cc
index 30202c5106a..0499b5790b4 100644
--- a/libcody/netserver.cc
+++ b/libcody/netserver.cc
@@ -140,6 +140,7 @@ int ListenInet6 (char const **e, char const *name, int port, unsigned backlog)
 
  listen:;
   freeaddrinfo (addrs);
+  addrs = nullptr;
 
   if (listen (fd, backlog ? backlog : 17) < 0)
 {
-- 
2.31.1



[PATCH] c++tools: Fix memory leak

2021-10-21 Thread Jonathan Wakely via Gcc-patches
The allocated memory is not freed when returning early due to an error.

c++tools/ChangeLog:

* resolver.cc (module_resolver::read_tuple_file): Use unique_ptr
to ensure memory is freed before returning.
---
 c++tools/resolver.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/c++tools/resolver.cc b/c++tools/resolver.cc
index 421fdaa55fe..d1b73a47778 100644
--- a/c++tools/resolver.cc
+++ b/c++tools/resolver.cc
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "resolver.h"
 // C++
 #include 
+#include <memory>
 // C
 #include 
 // OS
@@ -114,10 +115,17 @@ module_resolver::read_tuple_file (int fd, char const *prefix, bool force)
   buffer = mmap (nullptr, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
   if (buffer == MAP_FAILED)
 return -errno;
+  struct Deleter {
+void operator()(void* p) const { munmap(p, size); }
+size_t size;
+  };
+  std::unique_ptr<void, Deleter> guard(buffer, Deleter{(size_t)stat.st_size});
 #else
   buffer = xmalloc (stat.st_size);
   if (!buffer)
 return -errno;
+  struct Deleter { void operator()(void* p) const { free(p); } };
+  std::unique_ptr<void, Deleter> guard;
   if (read (fd, buffer, stat.st_size) != stat.st_size)
 return -errno;
 #endif
@@ -179,12 +187,6 @@ module_resolver::read_tuple_file (int fd, char const *prefix, bool force)
}
 }
 
-#if MAPPED_READING
-  munmap (buffer, stat.st_size);
-#else
-  free (buffer);
-#endif
-
   return 0;
 }
 
-- 
2.31.1



Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Aldy Hernandez via Gcc-patches




On 10/21/21 3:14 PM, Richard Biener wrote:

On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez  wrote:


On Thu, Oct 21, 2021 at 12:20 PM Richard Biener
 wrote:


On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:




On 10/18/2021 2:17 AM, Aldy Hernandez wrote:



On 10/18/21 12:52 AM, Jeff Law wrote:



On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

So is there any reason why we can't convert DOM as well?   DOM's use
of EVRP is pretty limited.  You've mentioned FP bits before, but my
recollection is those are not part of the EVRP analysis DOM uses.
Hell, give me a little guidance and I'll do the work...


Not only will I take you up on that offer, but I can provide 90% of
the work.  Here be dragons, though (well, for me, maybe not for you ;-)).

DOM is actually an evrp pass at -O1 in disguise.  The reason it really
is a covert evrp pass is because:

a) It calls extract_range_from_stmt on each statement.

b) It folds conditionals with simplify_using_ranges.

c) But most importantly, it exports discovered ranges when it's done
(evrp_range_analyzer(true)).

If you look at the evrp pass, you'll notice that that's basically what
it does, albeit with the substitute and fold engine, which also calls
gimple fold plus other goodies.

But I could argue that we've made DOM into an evrp pass without
noticing.  The last item (c) is particularly invasive because these
exported ranges show up in other passes unexpectedly.  For instance, I
saw an RTL pass at -O1 miss an optimization because it was dependent
on some global range being set.  IMO, DOM should not export global
ranges it discovered during its walk (do one thing and do it well),
but I leave it to you experts to pontificate.

All true.  But I don't think we've got many, if any, hard dependencies
on those behaviors.



The attached patch is rather trivial.  It's mostly deleting state.  It
seems DOM spends a lot of time massaging the IL so that it can fold
conditionals or thread paths.  None of this is needed, because the
ranger can do all of this.  Well, except floats, but...

Massaging the IL should only take two forms IIRC.

First, if we have a simplification we can do.  That could be const/copy
propagation, replacing an expression with an SSA_NAME or constant and
the like.  It doesn't massage the IL just to massage the IL.

Second, we do temporarily copy propagate the current known values of an
SSA name into use points and then see if that allows us to determine if
a statement is already in the hash tables.  But we undo that so that
nobody should see that temporary change in state.


Are you sure we still do that?  I can't find it at least.


I couldn't either, but perhaps what Jeff is referring to is the ad-hoc
copy propagation that happens in the DOM's use of the threader:

   /* Make a copy of the uses & vuses into USES_COPY, then cprop into
  the operands.  */
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
 {
   tree tmp = NULL;
   tree use = USE_FROM_PTR (use_p);

   copy[i++] = use;
   if (TREE_CODE (use) == SSA_NAME)
 tmp = SSA_NAME_VALUE (use);
   if (tmp)
 SET_USE (use_p, tmp);
 }

   cached_lhs = simplifier->simplify (stmt, stmt, bb, this);

   /* Restore the statement's original uses/defs.  */
   i = 0;
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
 SET_USE (use_p, copy[i++]);


Ah, likely.  These days we'd likely use a gimple_match_op but then
this seems heavily abstracted, no idea where simplifier->simplify
might lead to ;)
I'm also not sure why the threader would do the valueization here and
not the simplify() function (and lookup_avail_expr misses an 'exploded' operand
lookup as well).  Lots of legacy code ;)


Yes, there's a lot of legacy code I've left mostly alone.

There are two copies of simplify() now, the DOM version 
(dom_jt_simplifier::simplify) and the VRP version 
(hybrid_jt_simplifier::simplify).  Each does slightly different things, as 
has always been the case.  It's just that the separation is now explicit.


No idea what's up with the valueization either.  The VRP version doesn't 
need any of this valueization or on-the-side structures.  You just ask 
the range of a stmt on a path and it gives you an answer.




But I think the above is not going to be an issue unless ranger runs in
circles around backedges, arriving at this very same stmt again?  A way
out might be to copy the stmt to the stack, adjust operands and use that
for the simplify entry.


If you look at the patch I sent Jeff, you'll see that I've removed most 
of what's in that function.  There's no need to propagate anything.  The 
ranger can simplify the final conditional without having to set up anything.


Aldy



Re: [PATCH] libcody: Avoid double-free

2021-10-21 Thread Jonathan Wakely via Gcc-patches
Oops, this patch and the next one were only meant for gcc-patches, not the
libstdc++ list as well - sorry!


On Thu, 21 Oct 2021 at 14:27, Jonathan Wakely via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> If the listen call fails then 'goto fail' will jump to that label and
> use freeaddrinfo again. Set the pointer to null to prevent that.
>
> libcody/ChangeLog:
>
> * netserver.cc (ListenInet6): Set pointer to null after
> deallocation.
> ---
>  libcody/netserver.cc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libcody/netserver.cc b/libcody/netserver.cc
> index 30202c5106a..0499b5790b4 100644
> --- a/libcody/netserver.cc
> +++ b/libcody/netserver.cc
> @@ -140,6 +140,7 @@ int ListenInet6 (char const **e, char const *name, int
> port, unsigned backlog)
>
>   listen:;
>freeaddrinfo (addrs);
> +  addrs = nullptr;
>
>if (listen (fd, backlog ? backlog : 17) < 0)
>  {
> --
> 2.31.1
>
>


[COMMITED 1/3] Move ranger only VRP folder to tree-vrp.

2021-10-21 Thread Andrew MacLeod via Gcc-patches
This patch moves the ranger rvrp folder class to tree-vrp.c, and 
provides an execute_ranger_vrp() routine that takes the same parameter 
sequence as the execute_vrp() routine.  The evrp pass is tweaked to 
call this routine when running in ranger-only mode.


This paves the way for a ranger version of VRP to be instantiated. There 
is no functional change.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.


From 434ebc1e08b1d83ecd3622ee2a3c7270869bda52 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 15 Oct 2021 12:26:13 -0400
Subject: [PATCH 1/3] Move ranger only VRP folder to tree-vrp.

Consolidate the RVRP folder into a single "execute_vrp" routine that mimics
the format used by the vrp1 and vrp2 passes.  Relocate into the tree-vrp file.

	* gimple-ssa-evrp.c (class rvrp_folder): Move to tree-vrp.c.
	(execute_early_vrp): For ranger only mode, invoke ranger_vrp.
	* tree-vrp.c (class rvrp_folder): Relocate here.
	(execute_ranger_vrp): New.
	* tree-vrp.h (execute_ranger_vrp): Export.
---
 gcc/gimple-ssa-evrp.c |  92 ++-
 gcc/tree-vrp.c| 122 ++
 gcc/tree-vrp.h|   2 +
 3 files changed, 128 insertions(+), 88 deletions(-)

diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
index 7f2055501a0..a0192e2b2aa 100644
--- a/gcc/gimple-ssa-evrp.c
+++ b/gcc/gimple-ssa-evrp.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-range.h"
 #include "fold-const.h"
 #include "value-pointer-equiv.h"
+#include "tree-vrp.h"
 
 // This is the classic EVRP folder which uses a dominator walk and pushes
 // ranges into the next block if it is a single predecessor block.
@@ -110,88 +111,6 @@ protected:
   simplify_using_ranges simplifier;
 };
 
-// This is a ranger based folder which continues to use the dominator
-// walk to access the substitute and fold machinery.  Ranges are calculated
-// on demand.
-
-class rvrp_folder : public substitute_and_fold_engine
-{
-public:
-
-  rvrp_folder () : substitute_and_fold_engine (), m_simplifier ()
-  {
-m_ranger = enable_ranger (cfun);
-m_simplifier.set_range_query (m_ranger, m_ranger->non_executable_edge_flag);
-m_pta = new pointer_equiv_analyzer (m_ranger);
-  }
-  
-  ~rvrp_folder ()
-  {
-if (dump_file && (dump_flags & TDF_DETAILS))
-  m_ranger->dump (dump_file);
-
-m_ranger->export_global_ranges ();
-disable_ranger (cfun);
-delete m_pta;
-  }
-
-  tree value_of_expr (tree name, gimple *s = NULL) OVERRIDE
-  {
-// Shortcircuit subst_and_fold callbacks for abnormal ssa_names.
-if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
-  return NULL;
-tree ret = m_ranger->value_of_expr (name, s);
-if (!ret && supported_pointer_equiv_p (name))
-  ret = m_pta->get_equiv (name);
-return ret;
-  }
-
-  tree value_on_edge (edge e, tree name) OVERRIDE
-  {
-// Shortcircuit subst_and_fold callbacks for abnormal ssa_names.
-if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
-  return NULL;
-tree ret = m_ranger->value_on_edge (e, name);
-if (!ret && supported_pointer_equiv_p (name))
-  ret = m_pta->get_equiv (name);
-return ret;
-  }
-
-  tree value_of_stmt (gimple *s, tree name = NULL) OVERRIDE
-  {
-// Shortcircuit subst_and_fold callbacks for abnormal ssa_names.
-if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
-  return NULL;
-return m_ranger->value_of_stmt (s, name);
-  }
-
-  void pre_fold_bb (basic_block bb) OVERRIDE
-  {
-m_pta->enter (bb);
-  }
-
-  void post_fold_bb (basic_block bb) OVERRIDE
-  {
-m_pta->leave (bb);
-  }
-
-  void pre_fold_stmt (gimple *stmt) OVERRIDE
-  {
-m_pta->visit_stmt (stmt);
-  }
-
-  bool fold_stmt (gimple_stmt_iterator *gsi) OVERRIDE
-  {
-return m_simplifier.simplify (gsi);
-  }
-
-private:
-  DISABLE_COPY_AND_ASSIGN (rvrp_folder);
-  gimple_ranger *m_ranger;
-  simplify_using_ranges m_simplifier;
-  pointer_equiv_analyzer *m_pta;
-};
-
 // In a hybrid folder, start with an EVRP folder, and add the required
 // fold_stmt bits to either try the ranger first or second.
 //
@@ -393,6 +312,9 @@ hybrid_folder::choose_value (tree evrp_val, tree ranger_val)
 static unsigned int
 execute_early_vrp ()
 {
+  if ((param_evrp_mode & EVRP_MODE_RVRP_FIRST) == EVRP_MODE_RVRP_ONLY)
+return execute_ranger_vrp (cfun, false);
+
   /* Ideally this setup code would move into the ctor for the folder
  However, this setup can change the number of blocks which
  invalidates the internal arrays that are set up by the dominator
@@ -411,12 +333,6 @@ execute_early_vrp ()
 	folder.substitute_and_fold ();
 	break;
   }
-case EVRP_MODE_RVRP_ONLY:
-  {
-	rvrp_folder folder;
-	folder.substitute_and_fold ();
-	break;
-  }
 case EVRP_MODE_EVRP_FIRST:
   {
 	hybrid_folder folder (true);
diff --git a/gcc/tree-vrp

[COMMITTED 2/3] Add --param=vrp1-mode and --param=vrp2-mode.

2021-10-21 Thread Andrew MacLeod via Gcc-patches
This patch adds 2 new --params, vrp1-mode and vrp2-mode.  They can be 
used to select which version of VRP should be used for each pass.  Valid 
options are "vrp" and "ranger", e.g.:


--param=vrp1-mode=vrp
--param=vrp2-mode=ranger

As VRP is the current default for both, again, there is no functional 
change, but it is now possible to run ranger in either of the 2 VRP 
passes.  The plan will be to make ranger the default for VRP2 once I've 
sorted out a couple of testcases.


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.


From bd400db6d3ec167142ace352db00f84d382e33a8 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 15 Oct 2021 12:06:27 -0400
Subject: [PATCH 2/3] Add --param=vrp1-mode and --param=vrp2-mode.

Add 2 new params to select between VRP and RANGER to be used for each pass.

	* doc/invoke.texi: (vrp1-mode, vrp2-mode): Document.
	* flag-types.h: (enum vrp_mode): New.
	* params.opt: (vrp1-mode, vrp2-mode): New.
	* tree-vrp.c (vrp_pass_num): New.
	(pass_vrp::pass_vrp): Set pass number.
	(pass_vrp::execute): Choose which VRP mode to execute.
---
 gcc/doc/invoke.texi |  6 ++
 gcc/flag-types.h|  7 +++
 gcc/params.opt  | 17 +
 gcc/tree-vrp.c  | 12 ++--
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6d1e328571a..b89f9b61f9c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14504,6 +14504,12 @@ Maximum number of basic blocks before EVRP uses a sparse cache.
 @item evrp-mode
 Specifies the mode Early VRP should operate in.
 
+@item vrp1-mode
+Specifies the mode VRP pass 1 should operate in.
+
+@item vrp2-mode
+Specifies the mode VRP pass 2 should operate in.
+
 @item evrp-switch-limit
 Specifies the maximum number of switch cases before EVRP ignores a switch.
 
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index ae0b216e8a3..9f104e43d40 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -463,6 +463,13 @@ enum evrp_mode
   EVRP_MODE_DEBUG = (EVRP_MODE_GORI | EVRP_MODE_CACHE)
 };
 
+/* VRP modes.  */
+enum vrp_mode
+{
+  VRP_MODE_VRP,
+  VRP_MODE_RANGER
+};
+
 /* Modes of OpenACC 'kernels' constructs handling.  */
 enum openacc_kernels
 {
diff --git a/gcc/params.opt b/gcc/params.opt
index 83b3db6fea6..27ef4b6578f 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1137,4 +1137,21 @@ Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only
 Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 1) Param Optimization
 The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
 
+-param=vrp1-mode=
+Common Joined Var(param_vrp1_mode) Enum(vrp_mode) Init(VRP_MODE_VRP) Param Optimization
+--param=vrp1-mode=[vrp|ranger] Specifies the mode VRP1 should operate in.
+
+-param=vrp2-mode=
+Common Joined Var(param_vrp2_mode) Enum(vrp_mode) Init(VRP_MODE_VRP) Param Optimization
+--param=vrp2-mode=[vrp|ranger] Specifies the mode VRP2 should operate in.
+
+Enum
+Name(vrp_mode) Type(enum vrp_mode) UnknownError(unknown vrp mode %qs)
+
+EnumValue
+Enum(vrp_mode) String(vrp) Value(VRP_MODE_VRP)
+
+EnumValue
+Enum(vrp_mode) String(ranger) Value(VRP_MODE_RANGER)
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index b0b217bbf86..ba7a4efc7c6 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4450,11 +4450,13 @@ const pass_data pass_data_vrp =
   ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */
 };
 
+static int vrp_pass_num = 0;
 class pass_vrp : public gimple_opt_pass
 {
 public:
   pass_vrp (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_vrp, ctxt), warn_array_bounds_p (false)
+: gimple_opt_pass (pass_data_vrp, ctxt), warn_array_bounds_p (false),
+  my_pass (++vrp_pass_num)
   {}
 
   /* opt_pass methods: */
@@ -4466,10 +4468,16 @@ public:
 }
   virtual bool gate (function *) { return flag_tree_vrp != 0; }
   virtual unsigned int execute (function *fun)
-{ return execute_vrp (fun, warn_array_bounds_p); }
+{
+  if ((my_pass == 1 && param_vrp1_mode == VRP_MODE_RANGER)
+	  || (my_pass == 2 && param_vrp2_mode == VRP_MODE_RANGER))
+	return execute_ranger_vrp (fun, warn_array_bounds_p);
+  return execute_vrp (fun, warn_array_bounds_p);
+}
 
  private:
   bool warn_array_bounds_p;
+  int my_pass;
 }; // class pass_vrp
 
 } // anon namespace
-- 
2.17.2



[COMMITTED 3/3] Split --param=evrp-mode into evrp-mode and ranger-debug.

2021-10-21 Thread Andrew MacLeod via Gcc-patches
Should have done this a while ago.  With ranger being utilized in other 
places, tying the debug output mode to the EVRP mode no longer makes
sense.


This patch splits the current --param=evrp-mode values into "evrp-mode" 
and "ranger-debug".   After this patch, valid values are:


--param=evrp-mode=[legacy|ranger|legacy-first|ranger-first] default is 
"ranger"

--param=ranger-debug=[none|trace|gori|cache|tracegori|all] default is "none"

Again, this makes no functional changes.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

From 9cb114fd5550eb02dfd6b8db5cb5b8fb72827d53 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 19 Oct 2021 14:09:51 -0400
Subject: [PATCH 3/3] Split --param=evrp-mode into evrp-mode and ranger-debug.

With Ranger being used in more than EVRP, the debug output should no longer
be tied to the EVRP mode flag.

	* doc/invoke.texi (ranger-debug): Document.
	* flag-types.h (enum ranger_debug): New.
	(enum evrp_mode): Remove debug values.
	* gimple-range-cache.cc (DEBUG_RANGE_CACHE): Use new debug flag.
	* gimple-range-gori.cc (gori_compute::gori_compute): Ditto.
	* gimple-range.cc (gimple_ranger::gimple_ranger): Ditto.
	* gimple-ssa-evrp.c (hybrid_folder::choose_value): Ditto.
	(execute_early_vrp): Use evrp-mode directly.
	* params.opt (enum evrp_mode): Remove debug values.
	(ranger-debug): New.
	(ranger-logical-depth): Relocate to be in alphabetical order.
---
 gcc/doc/invoke.texi   |  3 +++
 gcc/flag-types.h  | 24 +++---
 gcc/gimple-range-cache.cc |  4 +--
 gcc/gimple-range-gori.cc  |  2 +-
 gcc/gimple-range.cc   |  2 +-
 gcc/gimple-ssa-evrp.c |  6 ++---
 gcc/params.opt| 52 +++
 7 files changed, 56 insertions(+), 37 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b89f9b61f9c..c66a25fcd69 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14510,6 +14510,9 @@ Specifies the mode VRP pass 1 should operate in.
 @item vrp2-mode
 Specifies the mode VRP pass 2 should operate in.
 
+@item ranger-debug
+Specifies the type of debug output to be issued for ranges.
+
 @item evrp-switch-limit
 Specifies the maximum number of switch cases before EVRP ignores a switch.
 
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 9f104e43d40..a5a637160d7 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -449,18 +449,24 @@ enum parloops_schedule_type
   PARLOOPS_SCHEDULE_RUNTIME
 };
 
+/* Ranger debug mode.  */
+enum ranger_debug
+{
+  RANGER_DEBUG_NONE = 0,
+  RANGER_DEBUG_TRACE = 1,
+  RANGER_DEBUG_CACHE = (2 | RANGER_DEBUG_TRACE),
+  RANGER_DEBUG_GORI = 4,
+  RANGER_DEBUG_TRACE_GORI = (RANGER_DEBUG_TRACE | RANGER_DEBUG_GORI),
+  RANGER_DEBUG_ALL = (RANGER_DEBUG_GORI | RANGER_DEBUG_CACHE)
+};
+
 /* EVRP mode.  */
 enum evrp_mode
 {
-  EVRP_MODE_RVRP_ONLY = 0,
-  EVRP_MODE_EVRP_ONLY = 1,
-  EVRP_MODE_EVRP_FIRST = 2,
-  EVRP_MODE_RVRP_FIRST = 3,
-  EVRP_MODE_TRACE = 4,
-  EVRP_MODE_CACHE = (8 | EVRP_MODE_TRACE),
-  EVRP_MODE_GORI = 16,
-  EVRP_MODE_TRACE_GORI = (EVRP_MODE_TRACE | EVRP_MODE_GORI),
-  EVRP_MODE_DEBUG = (EVRP_MODE_GORI | EVRP_MODE_CACHE)
+  EVRP_MODE_RVRP_ONLY,
+  EVRP_MODE_EVRP_ONLY,
+  EVRP_MODE_EVRP_FIRST,
+  EVRP_MODE_RVRP_FIRST
 };
 
 /* VRP modes.  */
diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 9cbc63d8a40..05010cf15bc 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -30,8 +30,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-range.h"
 #include "tree-cfg.h"
 
-#define DEBUG_RANGE_CACHE (dump_file && (param_evrp_mode & EVRP_MODE_CACHE) \
-	 == EVRP_MODE_CACHE)
+#define DEBUG_RANGE_CACHE (dump_file	\
+			   && (param_ranger_debug & RANGER_DEBUG_CACHE))
 
 // During contructor, allocate the vector of ssa_names.
 
diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 6946fa65dda..4e45c593871 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -644,7 +644,7 @@ gori_compute::gori_compute (int not_executable_flag)
   // Create a boolean_type true and false range.
   m_bool_zero = int_range<2> (boolean_false_node, boolean_false_node);
   m_bool_one = int_range<2> (boolean_true_node, boolean_true_node);
-  if (dump_file && (param_evrp_mode & EVRP_MODE_GORI))
+  if (dump_file && (param_ranger_debug & RANGER_DEBUG_GORI))
 tracer.enable_trace ();
 }
 
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 93d6da66ccb..69cde911c49 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -42,7 +42,7 @@ gimple_ranger::gimple_ranger () :
 {
   // If the cache has a relation oracle, use it.
   m_oracle = m_cache.oracle ();
-  if (dump_file && (param_evrp_mode & EVRP_MODE_TRACE))
+  if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE))
 tracer.enable_trace ();
 
   // Ensure the not_executable flag is clear everywhere.
diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
index a0192e2b2aa

Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

2021-10-21 Thread Hongyu Wang via Gcc-patches
Thanks for pointing this out; I will adjust the testcase, since the output
for 128/256-bit HFmode loads has changed.

Martin Liška  于2021年10月21日周四 下午8:49写道:
>
> On 10/21/21 07:47, Hongyu Wang via Gcc-patches wrote:
> > |Yes, updated patch.|
>
> Note the patch caused the following test failure:
>
> FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ 
> \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
> FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ 
> \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
> FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ 
> \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
> FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ 
> \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
>
> Martin


Re: [PATCH 3/N] Come up with casm global state.

2021-10-21 Thread Martin Liška

On 10/21/21 14:42, Richard Biener wrote:

On Thu, Oct 21, 2021 at 11:47 AM Martin Liška  wrote:


On 10/5/21 13:54, Richard Biener wrote:

On Mon, Oct 4, 2021 at 1:13 PM Martin Liška  wrote:


On 9/22/21 11:59, Richard Biener wrote:

On Thu, Sep 16, 2021 at 3:12 PM Martin Liška  wrote:


This patch comes up with asm_out_state and a new global variable casm.

Tested on all cross compilers.
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


   * output.h (struct asm_out_file): New struct that replaces a
^^^
asm_out_state?


Yes, sure!



You replace a lot of asm_out_file - do we still need the macro then?
(I'd have simplified the patch leaving that replacement out at first)


Well, I actually replaced only a very small fraction of the usage of 
asm_out_file.

$ git grep asm_out_file | grep -v ChangeLog | wc -l

1601


So I think we should live with the macro for some time ...



You leave quite some global state out of the struct that is obviously
related, in the diff I see object_block_htab for example.


Yes, I'm aware of it. Can we do that incrementally?


Basically
everything initialized in init_varasm_once is a candidate (which
then shows const_desc_htab and shared_constant_pool as well
as pending_assemble_externals_set).


Well, these would probably need a different header file (or another #include 
... must
be added before output.h :// ).


For the goal of outputting
early DWARF to another file the state CTOR could have a mode
to not initialize those parts or we could have asm-out-state-with-sections
as base of asm-out-state.


Yes, right now asm_out_state ctor is minimal:

 asm_out_state (): out_file (NULL), in_section (NULL),
   sec ({}), anchor_labelno (0), in_cold_section_p (false)
 {
   section_htab = hash_table::create_ggc (31);
 }



In the end there will be a target part of the state so I think
construction needs to be defered to the target which can
derive from asm-out-state and initialize the part it needs.
That's currently what targetm.asm_out.init_sections () does
and we'd transform that to a targetm.asm_out.create () or so.
That might already be necessary for the DWARF stuff.


So what do you want to do with the content of the init_varasm_once function?


It goes away ;)  Or rather becomes the invocation of the
asm-out-state CTOR via the target hook.



That said, dealing with the target stuff piecemeal is OK, but maybe
try to make sure that init_varasm_once is actually identical
to what the CTOR does?


So you want asm_out_state::asm_out_state doing what we current initialize
in init_varasm_once, right?


Yes, asm_out_state should cover everything initialized in init_varasm_once
(and the called targetm.asm_out.init_sections).

targetm.asm_out.init_sections would become

asm_out_state *init_sections ();

where the target can derive from asm_out_state (optionally) and
allocates the asm-out-state & invokes the CTORs.  The middle-end
visible asm_out_state would contain all the tables initialized in
init_varasm_once.

I'm not sure if doing it piecemeal is good - we don't want to touch
things multiple times without good need and it might be things
turn out not be as "simple" as the above plan sounds.

I would suggest to try the conversion with powerpc since it
seems to be the only target with two different implementations
for the target hook.

The

static GTY(()) section *read_only_data_section;
static GTY(()) section *private_data_section;
static GTY(()) section *tls_data_section;
static GTY(()) section *tls_private_data_section;
static GTY(()) section *read_only_private_data_section;
static GTY(()) section *sdata2_section;

section *toc_section = 0;

could become #defines to static_cast
(casm)->section_name
and there'd be

class rs6000_asm_out_state : public asm_out_state
{
... add stuff ...
}

in rs6000/xcoff.h and rs6000/sysv4.h and some generic rs6000 header as well
(adding no fields) and the target hook would basically do

   return ggc_new rs6000_asm_state ();

(OK, ggc_alloc + placement new I guess).  The default hook just
creates asm_out_state.


Hello.

All right, I made a patch candidate that does the suggested steps and I ported
rs6000 target (both ELF and XCOFF sub-targets).

So far I was able to bootstrap on ppc6le-linux-gnu.

Thoughts?


+extern GTY(()) rs6000_asm_out_state *target_casm;

this seems to be unused


Sure, it can be a static variable in rs6000.c for GGC marking purposes.



+#define rs6000_casm static_cast (casm)

maybe there's a better way?  Though I can't think of one at the moment.
There are only 10 uses so eventually we can put the
static_cast into all places.  Let's ask the powerpc maintainers (CCed).


Or we can wrap it with a function call doing the casting?



Note you do

+/* Implement TARGET_ASM_INIT_SECTIONS.  */
+
+static asm_out_state *
+rs6000_elf_asm_init_sections (void)
+{
+  rs6000_asm_out_state *target_state
+= new (ggc_alloc ()) rs6000_asm_out_state ();
+  target_state->init_elf_s

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Jeff Law via Gcc-patches




On 10/21/2021 6:56 AM, Aldy Hernandez wrote:

On Thu, Oct 21, 2021 at 12:20 PM Richard Biener
 wrote:

On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:



On 10/18/2021 2:17 AM, Aldy Hernandez wrote:


On 10/18/21 12:52 AM, Jeff Law wrote:


On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

So is there any reason why we can't convert DOM as well?   DOM's use
of EVRP is pretty limited.  You've mentioned FP bits before, but my
recollection is those are not part of the EVRP analysis DOM uses.
Hell, give me a little guidance and I'll do the work...

Not only will I take you up on that offer, but I can provide 90% of
the work.  Here be dragons, though (well, for me, maybe not for you ;-)).

DOM is actually an evrp pass at -O1 in disguise.  The reason it really
is a covert evrp pass is because:

a) It calls extract_range_from_stmt on each statement.

b) It folds conditionals with simplify_using_ranges.

c) But most importantly, it exports discovered ranges when it's done
(evrp_range_analyzer(true)).

If you look at the evrp pass, you'll notice that that's basically what
it does, albeit with the substitute and fold engine, which also calls
gimple fold plus other goodies.

But I could argue that we've made DOM into an evrp pass without
noticing.  The last item (c) is particularly invasive because these
exported ranges show up in other passes unexpectedly.  For instance, I
saw an RTL pass at -O1 miss an optimization because it was dependent
on some global range being set.  IMO, DOM should not export global
ranges it discovered during its walk (do one thing and do it well),
but I leave it to you experts to pontificate.

All true.  But I don't think we've got many, if any, hard dependencies
on those behaviors.


The attached patch is rather trivial.  It's mostly deleting state.  It
seems DOM spends a lot of time massaging the IL so that it can fold
conditionals or thread paths.  None of this is needed, because the
ranger can do all of this.  Well, except floats, but...

Massaging the IL should only take two forms IIRC.

First, if we have a simplification we can do.  That could be const/copy
propagation, replacing an expression with an SSA_NAME or constant and
the like.  It doesn't massage the IL just to massage the IL.

Second, we do temporarily copy propagate the current known values of an
SSA name into use points and then see if that allows us to determine if
a statement is already in the hash tables.  But we undo that so that
nobody should see that temporary change in state.

Are you sure we still do that?  I can't find it at least.

I couldn't either, but perhaps what Jeff is referring to is the ad-hoc
copy propagation that happens in the DOM's use of the threader:

   /* Make a copy of the uses & vuses into USES_COPY, then cprop into
  the operands.  */
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
 {
   tree tmp = NULL;
   tree use = USE_FROM_PTR (use_p);

   copy[i++] = use;
   if (TREE_CODE (use) == SSA_NAME)
 tmp = SSA_NAME_VALUE (use);
   if (tmp)
 SET_USE (use_p, tmp);
 }

   cached_lhs = simplifier->simplify (stmt, stmt, bb, this);

   /* Restore the statement's original uses/defs.  */
   i = 0;
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
 SET_USE (use_p, copy[i++]);
Exactly what I was referring to.  It may be worth an experiment -- I 
can't recall when this code went in.  It might be a remnant from the 
original threader that pre-dates block copying.  In that world we had to 
look up expressions in the table as part of the step to verify that we 
could safely ignore statements at the start of a block.


Or it could be something that was added to improve threading when the 
condition we're trying to thread through is partially redundant on the 
thread path.  This would allow us to discover that partial redundancy 
and exploit it for threading.


In either case this code may have outlived its usefulness.  I wonder 
what would happen if we just took it out.


jeff


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 3:30 PM Aldy Hernandez  wrote:
>
>
>
> On 10/21/21 3:14 PM, Richard Biener wrote:
> > On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez  wrote:
> >>
> >> On Thu, Oct 21, 2021 at 12:20 PM Richard Biener
> >>  wrote:
> >>>
> >>> On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:
> 
> 
> 
>  On 10/18/2021 2:17 AM, Aldy Hernandez wrote:
> >
> >
> > On 10/18/21 12:52 AM, Jeff Law wrote:
> >>
> >>
> >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:
> >>> The following patch converts the strlen pass from evrp to ranger,
> >>> leaving DOM as the last remaining user.
> >> So is there any reason why we can't convert DOM as well?   DOM's use
> >> of EVRP is pretty limited.  You've mentioned FP bits before, but my
> >> recollection is those are not part of the EVRP analysis DOM uses.
> >> Hell, give me a little guidance and I'll do the work...
> >
> > Not only will I take you up on that offer, but I can provide 90% of
> > the work.  Here be dragons, though (well, for me, maybe not for you 
> > ;-)).
> >
> > DOM is actually an evrp pass at -O1 in disguise.  The reason it really
> > is a covert evrp pass is because:
> >
> > a) It calls extract_range_from_stmt on each statement.
> >
> > b) It folds conditionals with simplify_using_ranges.
> >
> > c) But most importantly, it exports discovered ranges when it's done
> > (evrp_range_analyzer(true)).
> >
> > If you look at the evrp pass, you'll notice that that's basically what
> > it does, albeit with the substitute and fold engine, which also calls
> > gimple fold plus other goodies.
> >
> > But I could argue that we've made DOM into an evrp pass without
> > noticing.  The last item (c) is particularly invasive because these
> > exported ranges show up in other passes unexpectedly.  For instance, I
> > saw an RTL pass at -O1 miss an optimization because it was dependent
> > on some global range being set.  IMO, DOM should not export global
> > ranges it discovered during its walk (do one thing and do it well),
> > but I leave it to you experts to pontificate.
>  All true.  But I don't think we've got many, if any, hard dependencies
>  on those behaviors.
> 
> >
> > The attached patch is rather trivial.  It's mostly deleting state.  It
> > seems DOM spends a lot of time massaging the IL so that it can fold
> > conditionals or thread paths.  None of this is needed, because the
> > ranger can do all of this.  Well, except floats, but...
>  Massaging the IL should only take two forms IIRC.
> 
>  First, if we have a simplification we can do.  That could be const/copy
>  propagation, replacing an expression with an SSA_NAME or constant and
>  the like.  It doesn't massage the IL just to massage the IL.
> 
>  Second, we do temporarily copy propagate the current known values of an
>  SSA name into use points and then see if that allows us to determine if
>  a statement is already in the hash tables.  But we undo that so that
>  nobody should see that temporary change in state.
> >>>
> >>> Are you sure we still do that?  I can't find it at least.
> >>
> >> I couldn't either, but perhaps what Jeff is referring to is the ad-hoc
> >> copy propagation that happens in the DOM's use of the threader:
> >>
> >>/* Make a copy of the uses & vuses into USES_COPY, then cprop into
> >>   the operands.  */
> >>FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
> >>  {
> >>tree tmp = NULL;
> >>tree use = USE_FROM_PTR (use_p);
> >>
> >>copy[i++] = use;
> >>if (TREE_CODE (use) == SSA_NAME)
> >>  tmp = SSA_NAME_VALUE (use);
> >>if (tmp)
> >>  SET_USE (use_p, tmp);
> >>  }
> >>
> >>cached_lhs = simplifier->simplify (stmt, stmt, bb, this);
> >>
> >>/* Restore the statement's original uses/defs.  */
> >>i = 0;
> >>FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
> >>  SET_USE (use_p, copy[i++]);
> >
> > Ah, likely.  These days we'd likely use a gimple_match_op but then
> > this seems heavily abstracted, no idea where simplifier->simplify
> > might lead to ;)
> > I'm also not sure why the threader would do the valueization here and
> > not the simplify() function (and lookup_avail_expr misses an 'exploded' 
> > operand
> > lookup as well).  Lot's of legacy code ;)
>
> Yes, there's a lot of legacy code I've left mostly alone.
>
> There are two copies of simplify() now, the DOM version
> (dom_jt_simplifier::simplify) and the VRP version
> (hybrid_jt_simplifier::simplify).  Each do slightly different things, as
> has always been the case.  It's just that the separation is now explicit.
>
> No idea what's up with the valueization either.  The VRP version doesn't

[PATCH] Adjust testcase for 128/256 bit HF vector load/store

2021-10-21 Thread Hongyu Wang via Gcc-patches
Hi,

The HF vector move have been updated to align with HI vector,
adjust according testcase for _Float16 vector load and store.

Tested on x86_64-pc-linux-gnu{-m32,}, pushed as obvious fix.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-13.c: Adjust scan-assembler for
xmm/ymm load/store.
---
 gcc/testsuite/gcc.target/i386/avx512fp16-13.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
index c3bae65da67..b73a8f44e1a 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
@@ -18,7 +18,7 @@ store256_ph (void *p, __m256h a)
   _mm256_store_ph (p, a);
 }
 
-/* { dg-final { scan-assembler-times "vmovdqa64\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
 
 void
 __attribute__ ((noinline, noclone))
@@ -27,7 +27,7 @@ store_ph (void *p, __m128h a)
   _mm_store_ph (p, a);
 }
 
-/* { dg-final { scan-assembler-times "vmovdqa64\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
 
 __m512h
 __attribute__ ((noinline, noclone))
@@ -45,7 +45,7 @@ load256_ph (void const *p)
   return _mm256_load_ph (p);
 }
 
-/* { dg-final { scan-assembler-times "vmovdqa64\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */
 
 __m128h
 __attribute__ ((noinline, noclone))
@@ -53,7 +53,7 @@ load_ph (void const *p)
 {
   return _mm_load_ph (p);
 }
-/* { dg-final { scan-assembler-times "vmovdqa64\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */
 
 __m512h
 __attribute__ ((noinline, noclone))
-- 
2.18.1



Re: [PATCH 3/N] Come up with casm global state.

2021-10-21 Thread Richard Biener via Gcc-patches
On Thu, Oct 21, 2021 at 3:43 PM Martin Liška  wrote:
>
> On 10/21/21 14:42, Richard Biener wrote:
> > On Thu, Oct 21, 2021 at 11:47 AM Martin Liška  wrote:
> >>
> >> On 10/5/21 13:54, Richard Biener wrote:
> >>> On Mon, Oct 4, 2021 at 1:13 PM Martin Liška  wrote:
> 
>  On 9/22/21 11:59, Richard Biener wrote:
> > On Thu, Sep 16, 2021 at 3:12 PM Martin Liška  wrote:
> >>
> >> This patch comes up with asm_out_state and a new global variable casm.
> >>
> >> Tested on all cross compilers.
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> >* output.h (struct asm_out_file): New struct that replaces a
> > ^^^
> > asm_out_state?
> 
>  Yes, sure!
> 
> >
> > You replace a lot of asm_out_file - do we still need the macro then?
> > (I'd have simplified the patch leaving that replacement out at first)
> 
>  Well, I actually replaced only a very small fraction of the usage of 
>  asm_out_file.
> 
>  $ git grep asm_out_file | grep -v ChangeLog | wc -l
> 
>  1601
> 
> 
>  So I think we should live with the macro for some time ...
> 
> >
> > You leave quite some global state out of the struct that is obviously
> > related, in the diff I see object_block_htab for example.
> 
>  Yes, I'm aware of it. Can we do that incrementally?
> 
> > Basically
> > everything initialized in init_varasm_once is a candidate (which
> > then shows const_desc_htab and shared_constant_pool as well
> > as pending_assemble_externals_set).
> 
>  Well, these would probably need a different header file (or another 
>  #include ... must
>  be added before output.h :// ).
> 
> > For the goal of outputting
> > early DWARF to another file the state CTOR could have a mode
> > to not initialize those parts or we could have 
> > asm-out-state-with-sections
> > as base of asm-out-state.
> 
>  Yes, right now asm_out_state ctor is minimal:
> 
>   asm_out_state (): out_file (NULL), in_section (NULL),
> sec ({}), anchor_labelno (0), in_cold_section_p (false)
>   {
> section_htab = hash_table::create_ggc (31);
>   }
> 
> >
> > In the end there will be a target part of the state so I think
> > construction needs to be defered to the target which can
> > derive from asm-out-state and initialize the part it needs.
> > That's currently what targetm.asm_out.init_sections () does
> > and we'd transform that to a targetm.asm_out.create () or so.
> > That might already be necessary for the DWARF stuff.
> 
>  So what do you want to do with the content of the init_varasm_once function?
> >>>
> >>> It goes away ;)  Or rather becomes the invocation of the
> >>> asm-out-state CTOR via the target hook.
> >>>
> >
> > That said, dealing with the target stuff piecemeal is OK, but maybe
> > try to make sure that init_varasm_once is actually identical
> > to what the CTOR does?
> 
>  So you want asm_out_state::asm_out_state doing what we current initialize
>  in init_varasm_once, right?
> >>>
> >>> Yes, asm_out_state should cover everything initialized in init_varasm_once
> >>> (and the called targetm.asm_out.init_sections).
> >>>
> >>> targetm.asm_out.init_sections would become
> >>>
> >>> asm_out_state *init_sections ();
> >>>
> >>> where the target can derive from asm_out_state (optionally) and
> >>> allocates the asm-out-state & invokes the CTORs.  The middle-end
> >>> visible asm_out_state would contain all the tables initialized in
> >>> init_varasm_once.
> >>>
> >>> I'm not sure if doing it piecemeal is good - we don't want to touch
> >>> things multiple times without good need and it might be things
> >>> turn out not be as "simple" as the above plan sounds.
> >>>
> >>> I would suggest to try the conversion with powerpc since it
> >>> seems to be the only target with two different implementations
> >>> for the target hook.
> >>>
> >>> The
> >>>
> >>> static GTY(()) section *read_only_data_section;
> >>> static GTY(()) section *private_data_section;
> >>> static GTY(()) section *tls_data_section;
> >>> static GTY(()) section *tls_private_data_section;
> >>> static GTY(()) section *read_only_private_data_section;
> >>> static GTY(()) section *sdata2_section;
> >>>
> >>> section *toc_section = 0;
> >>>
> >>> could become #defines to static_cast
> >>> (casm)->section_name
> >>> and there'd be
> >>>
> >>> class rs6000_asm_out_state : public asm_out_state
> >>> {
> >>> ... add stuff ...
> >>> }
> >>>
> >>> in rs6000/xcoff.h and rs6000/sysv4.h and some generic rs6000 header as 
> >>> well
> >>> (adding no fields) and the target hook would basically do
> >>>
> >>>return ggc_new rs6000_asm_state ();
> >>>
> >>> (OK, ggc_alloc + placement new I guess).  The default hook just
> >>> creates asm_out_state.

Re: [PATCH] gcc: implement AIX-style constructors

2021-10-21 Thread David Edelsohn via Gcc-patches
On Thu, Oct 21, 2021 at 8:39 AM CHIGOT, CLEMENT  wrote:
>
> Hi David,
>
> The problem is that cdtors is created by the linker only when the -bcdtors
> flag is provided. Thus, if we add "extern void (* _cdtors[]) (void);" to
> the "crtcxa.c", we can't use it without using the new constructor types.
> One solution would be to create another crtcxa (eg crtcxa_cdtors.o) which will
> be replacing the default crtcxa.o when needed. I didn't think about that
> when creating the patch. But it might be a better approach. What do you
> think?

Hi, Clement

Another, optional object file in libgcc seems like a cleaner solution
than collect2.c emitting all of the code.

The file should not be called crtcxa_cdtors.o because "cxa" came from
the name of the libstdc++ support functions.  Maybe crtcdtors.o or
crt_cdtors.o or aixcdtors.o or aix_cdtors.o.

Thanks, David


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Aldy Hernandez via Gcc-patches




On 10/21/21 3:46 PM, Richard Biener wrote:

On Thu, Oct 21, 2021 at 3:30 PM Aldy Hernandez  wrote:




On 10/21/21 3:14 PM, Richard Biener wrote:

On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez  wrote:


On Thu, Oct 21, 2021 at 12:20 PM Richard Biener
 wrote:


On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:




On 10/18/2021 2:17 AM, Aldy Hernandez wrote:



On 10/18/21 12:52 AM, Jeff Law wrote:



On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

So is there any reason why we can't convert DOM as well?   DOM's use
of EVRP is pretty limited.  You've mentioned FP bits before, but my
recollection is those are not part of the EVRP analysis DOM uses.
Hell, give me a little guidance and I'll do the work...


Not only will I take you up on that offer, but I can provide 90% of
the work.  Here be dragons, though (well, for me, maybe not for you ;-)).

DOM is actually an evrp pass at -O1 in disguise.  The reason it really
is a covert evrp pass is because:

a) It calls extract_range_from_stmt on each statement.

b) It folds conditionals with simplify_using_ranges.

c) But most importantly, it exports discovered ranges when it's done
(evrp_range_analyzer(true)).

If you look at the evrp pass, you'll notice that that's basically what
it does, albeit with the substitute and fold engine, which also calls
gimple fold plus other goodies.

But I could argue that we've made DOM into an evrp pass without
noticing.  The last item (c) is particularly invasive because these
exported ranges show up in other passes unexpectedly.  For instance, I
saw an RTL pass at -O1 miss an optimization because it was dependent
on some global range being set.  IMO, DOM should not export global
ranges it discovered during its walk (do one thing and do it well),
but I leave it to you experts to pontificate.

All true.  But I don't think we've got many, if any, hard dependencies
on those behaviors.



The attached patch is rather trivial.  It's mostly deleting state.  It
seems DOM spends a lot of time massaging the IL so that it can fold
conditionals or thread paths.  None of this is needed, because the
ranger can do all of this.  Well, except floats, but...

Massaging the IL should only take two forms IIRC.

First, if we have a simplification we can do.  That could be const/copy
propagation, replacing an expression with an SSA_NAME or constant and
the like.  It doesn't massage the IL just to massage the IL.

Second, we do temporarily copy propagate the current known values of an
SSA name into use points and then see if that allows us to determine if
a statement is already in the hash tables.  But we undo that so that
nobody should see that temporary change in state.


Are you sure we still do that?  I can't find it at least.


I couldn't either, but perhaps what Jeff is referring to is the ad-hoc
copy propagation that happens in the DOM's use of the threader:

/* Make a copy of the uses & vuses into USES_COPY, then cprop into
   the operands.  */
FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
  {
tree tmp = NULL;
tree use = USE_FROM_PTR (use_p);

copy[i++] = use;
if (TREE_CODE (use) == SSA_NAME)
  tmp = SSA_NAME_VALUE (use);
if (tmp)
  SET_USE (use_p, tmp);
  }

cached_lhs = simplifier->simplify (stmt, stmt, bb, this);

/* Restore the statement's original uses/defs.  */
i = 0;
FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
  SET_USE (use_p, copy[i++]);


Ah, likely.  These days we'd likely use a gimple_match_op but then
this seems heavily abstracted, no idea where simplifier->simplify
might lead to ;)
I'm also not sure why the threader would do the valueization here and
not the simplify() function (and lookup_avail_expr misses an 'exploded' operand
lookup as well).  Lot's of legacy code ;)


Yes, there's a lot of legacy code I've left mostly alone.

There are two copies of simplify() now, the DOM version
(dom_jt_simplifier::simplify) and the VRP version
(hybrid_jt_simplifier::simplify).  Each do slightly different things, as
has always been the case.  It's just that the separation is now explicit.

No idea what's up with the valueization either.  The VRP version doesn't
need any of this valueization or on-the-side structures.  You just ask
the range of a stmt on a path and it gives you an answer.


Yeah, but it doesn't "simplify", it uses ranger to do constant folding
... (ick).
For random expressions I'd have used gimple_simplify (in fact it somehow
tries to do value-numbering or sth like that to derive new equivalences).


The ranger is read-only.  It is not folding anything.  The simplify() 
callback is only returning the singleton the conditional resolves to (1 
or 0 iirc).  It is up to the caller (the forward threader in this case) 
to d

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Aldy Hernandez via Gcc-patches




On 10/21/21 3:43 PM, Jeff Law wrote:



On 10/21/2021 6:56 AM, Aldy Hernandez wrote:

On Thu, Oct 21, 2021 at 12:20 PM Richard Biener
 wrote:

On Wed, Oct 20, 2021 at 10:58 PM Jeff Law  wrote:



On 10/18/2021 2:17 AM, Aldy Hernandez wrote:


On 10/18/21 12:52 AM, Jeff Law wrote:


On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

So is there any reason why we can't convert DOM as well?   DOM's use
of EVRP is pretty limited.  You've mentioned FP bits before, but my
recollection is those are not part of the EVRP analysis DOM uses.
Hell, give me a little guidance and I'll do the work...

Not only will I take you up on that offer, but I can provide 90% of
the work.  Here be dragons, though (well, for me, maybe not for you 
;-)).


DOM is actually an evrp pass at -O1 in disguise.  The reason it really
is a covert evrp pass is because:

a) It calls extract_range_from_stmt on each statement.

b) It folds conditionals with simplify_using_ranges.

c) But most importantly, it exports discovered ranges when it's done
(evrp_range_analyzer(true)).

If you look at the evrp pass, you'll notice that that's basically what
it does, albeit with the substitute and fold engine, which also calls
gimple fold plus other goodies.

But I could argue that we've made DOM into an evrp pass without
noticing.  The last item (c) is particularly invasive because these
exported ranges show up in other passes unexpectedly.  For instance, I
saw an RTL pass at -O1 miss an optimization because it was dependent
on some global range being set.  IMO, DOM should not export global
ranges it discovered during its walk (do one thing and do it well),
but I leave it to you experts to pontificate.

All true.  But I don't think we've got many, if any, hard dependencies
on those behaviors.


The attached patch is rather trivial.  It's mostly deleting state.  It
seems DOM spends a lot of time massaging the IL so that it can fold
conditionals or thread paths.  None of this is needed, because the
ranger can do all of this.  Well, except floats, but...

Massaging the IL should only take two forms IIRC.

First, if we have a simplification we can do.  That could be const/copy
propagation, replacing an expression with an SSA_NAME or constant and
the like.  It doesn't massage the IL just to massage the IL.

Second, we do temporarily copy propagate the current known values of an
SSA name into use points and then see if that allows us to determine if
a statement is already in the hash tables.  But we undo that so that
nobody should see that temporary change in state.

Are you sure we still do that?  I can't find it at least.

I couldn't either, but perhaps what Jeff is referring to is the ad-hoc
copy propagation that happens in the DOM's use of the threader:

   /* Make a copy of the uses & vuses into USES_COPY, then cprop into
  the operands.  */
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
 {
   tree tmp = NULL;
   tree use = USE_FROM_PTR (use_p);

   copy[i++] = use;
   if (TREE_CODE (use) == SSA_NAME)
 tmp = SSA_NAME_VALUE (use);
   if (tmp)
 SET_USE (use_p, tmp);
 }

   cached_lhs = simplifier->simplify (stmt, stmt, bb, this);

   /* Restore the statement's original uses/defs.  */
   i = 0;
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES)
 SET_USE (use_p, copy[i++]);
Exactly what I was referring to.  It may be worth an experiment -- I 
can't recall when this code went in.  It might be a remnant from the 
original threader that pre-dates block copying.  In that world we had to 
look up expressions in the table as part of the step to verify that we 
could safely ignore statements at the start of a block.


Or it could be something that was added to improve threading when the 
condition we're trying to thread through is partially redundant on the 
thread path.  This would allow us to discover that partial redundancy 
and exploit it for threading.


In either case this code may have outlived its usefulness.  I wonder 
what would happen if we just took it out.


Look at the patch I sent you.  I rip most of it out, as ranger doesn't 
need it.  Perhaps DOM+evrp+threader doesn't either??


Aldy



[PATCH] coroutine use of lambda capture proxies [PR96517]

2021-10-21 Thread Iain Sandoe
Hi

(apologies to Jason and Nathan for two copies - I sent from the wrong email and 
omitted patches@)

This is a description of how the revised handling of coroutine proxy variables 
is supposed to work, in the case of methods and lambdas. Given that this came 
up again recently, perhaps I might put some version of this description as a 
block comment before the rewrite code.

There is a bug-fix patch for PR96517, the long preamble is to put context (and, 
hopefully, thus identify any more assumptions that I might have missed).



NOTE: It is the user’s responsibility to ensure that ‘this’ and any lambda 
object persist for the duration of the coroutine.

IMO this is probably not the best design decision ever made by the committee 
since it is very easy to make innocent-looking but broken coroutine code with 
lambdas.  [aside: I think a more user-friendly decision would have been to 
extend the lifetime of temporaries in statements until *all* the code before 
the semi-colon is executed (i.e. including deferred code in a coroutine)]

anyway…

consider this illustrative-only case (where ‘coroutine’ is a suitable 
implementation of the customisation points).
I have elided some of the implementation variables, they are handled the same 
way as the ones shown and this is already long enough.

 code

struct test {
   int a;

   void foo() {
   int c;
   auto lam = [this, c](int b) -> coroutine {
   co_return a + b + c;
   };
   }
};

Before the coroutine transformation process:

closure object:
  {
int __c
test* __this
typedef closure_type
 }

inline coroutine test::foo::…..::operator() (closure_type *__closure, int b)
{
 BIND_EXPR
 VARS {
  const int c;   // lambda proxy var => __closure->__c
  test* const this;   // lambda proxy var => __closure->__this
 }
 BODY {
   {
 const int c;
 test* const this;
 co_return this->a + b + c;
   }
 }
}

Now because of the design decision mentioned above, we do not copy the closure 
contents, only the pointer.

=== after transformation

our coroutine frame will look like this…

test::foo::……..Frame
  {
void (*_Coro_resume_fn)(test::foo::…..Frame*)
void (*_Coro_destroy_fn)(test::foo::…..Frame*)
coroutine::promise_type _Coro_promise
   ...
closure_type* const __closure  // parm copy
int b  // parm copy, no need for mangling.
   ...
std::__n4861::suspend_always Is_1_1  // local var frame entries are mangled 
with their scope info,
std::__n4861::suspend_never Fs_1_3
 }

our rewritten lambda looks like this:

BIND_EXPR
VARS {
 void (*_Coro_resume_fn)(test::foo::._anon_4::.Frame*); // coroutine proxy 
var for resume fn
 void (*_Coro_destroy_fn)(test::foo::._anon_4::.Frame*); // ...
 coroutine::promise_type _Coro_promise; // for promise   DVE => 
Frame->_Coro_promise
 ...
 closure_type* const __closure;  // coro proxy var for closure parm DVE => 
Frame->__closure
 int b;  // coro proxy var for b parm
 ….
}
BODY {
 {
   void (*_Coro_resume_fn)(test::foo::._anon_4::.Frame*);
   void (*_Coro_destroy_fn)(test::foo::._anon_4::.Frame*);
  coroutine::promise_type _Coro_promise;
   ...
   const test::foo::._anon_4* const __closure;
  int b;
  ...
   try
 {
   co_await initial suspend expr.
   BIND_EXPR (original lambda body)
   VARS {
 const int c; // lambda proxy var; DVE => __closure->__c
 test* const this; // lambda proxy var DVE => __closure->__this
   }
   BODY {
 {
   const int c; 
   test* const this;
   co_return this->a + b + c;
 }
   }
 }
  catch(...)
cleanup_stmt 
…
   }
 c-finally
...

 final.suspend:;
… 
}
}

We have a predicate, is_capture_proxy(), for lambdas.

So .. in this expression:

co_return this->a + b + c;

If we ask “is_capture_proxy” for ’this’ or ‘c’ we should answer “yes” - these 
vars are referred to via the closure object and, if the constraints of the 
standard are met (lifetime of the closure object and the captured this), then 
it should be just as if they were used in the body of a regular lambda.

The closure pointer itself is a coro proxy var (for the parm)
b is a coro proxy var (for the parm)
Note that the parm proxies are VAR decls not PARMs (we need to make a copy, 
because the original parms are used in the ramp, and obviously they are *not* 
parms of the helper function).

so when all the DECL_VALUE_EXPRs are expanded we would get:
Frame->__closure->__this->a + Frame->b + Frame->_closure->__c;

Whilst the initial intention of this was to make debug work (since GCC’s debug 
generation already knows about DVEs) it turns out that it greatly simplifies 
the code, since we are dealing with the original variables instead of replacing 
them with a component ref.

PR 96517 fires because we answer “NO” for is_capture_proxy in the rewritten 
code, because we do not (correctly) answer “yes” for LAMBDA_TYPE_P for 

Re: [COMMITTED] Avoid threading circular paths.

2021-10-21 Thread Jeff Law via Gcc-patches




On 10/21/2021 2:12 AM, Aldy Hernandez wrote:

The backward threader keeps a hash of visited blocks to avoid crossing
the same block twice.  Interestingly, we haven't been checking it for
the final block out of the path.  This may be inherited from the old
code, as it was simple enough that it didn't matter.  With the
upcoming changes enabling the fully resolving threader, it gets
tripped often enough to cause wrong code to be generated.
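The check being added can be pictured outside of GCC with a generic path walk (illustrative C++, not the tree-ssa-threadbackward code): every block appended to a candidate path, including the final block out of the path, must be tested against the visited set, or a circular path that re-enters an earlier block slips through.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Block = int;
using Path = std::vector<Block>;

// Enumerate acyclic paths from `start` to the exit block `exit_bb`.
// Every block appended to the path -- including the final one -- is
// checked against the visited set; omitting that last check is the
// bug described above.
std::vector<Path>
enumerate_paths(const std::map<Block, std::vector<Block>>& succs,
                Block start, Block exit_bb)
{
  std::vector<Path> paths;
  Path path;
  std::set<Block> visited;

  auto walk = [&](auto&& self, Block bb) -> void {
    if (visited.count(bb))   // circular path: reject outright
      return;
    visited.insert(bb);
    path.push_back(bb);
    if (bb == exit_bb)
      paths.push_back(path);
    else if (auto it = succs.find(bb); it != succs.end())
      for (Block s : it->second)
        self(self, s);
    path.pop_back();
    visited.erase(bb);
  };
  walk(walk, start);
  return paths;
}
```

With a cycle 1 -> 2 -> 3 -> 1 and an exit 4 off block 3, only the acyclic path {1, 2, 3, 4} is registered.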

Tested on x86-64 Linux.

Committed as obvious.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (back_threader::maybe_register_path):
Avoid threading circular paths.
I noticed you already reverted.  I won't complain about the morning test 
results :-)


Jeff




[PATCH] libstdc++: missing constexpr for __[nm]iter_base [PR102358]

2021-10-21 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look ok for trunk/11/10?

PR libstdc++/102358

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (__niter_base): Make constexpr
for C++20.
(__miter_base): Likewise.
* testsuite/25_algorithms/move/constexpr.cc: New test.
---
 libstdc++-v3/include/bits/stl_iterator.h  |  2 ++
 .../testsuite/25_algorithms/move/constexpr.cc | 19 +++
 2 files changed, 21 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 8afd6756613..fbd91956ced 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -2470,6 +2470,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// @} group iterators
 
  template<typename _Iterator>
+_GLIBCXX20_CONSTEXPR
 auto
 __niter_base(move_iterator<_Iterator> __it)
-> decltype(make_move_iterator(__niter_base(__it.base())))
@@ -2483,6 +2484,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
  template<typename _Iterator>
+_GLIBCXX20_CONSTEXPR
 auto
 __miter_base(move_iterator<_Iterator> __it)
 -> decltype(__miter_base(__it.base()))
diff --git a/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc 
b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc
new file mode 100644
index 000..773c55cfb50
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc
@@ -0,0 +1,19 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+#include <algorithm>
+#include <span>
+
+constexpr bool
+test01()
+{
+  // PR libstdc++/102358
+  int x[2] = {1,2}, y[2];
+  std::span in(x), out(y);
+  std::move(std::move_iterator(in.begin()), std::move_iterator(in.end()),
+   out.begin());
+  return std::equal(std::move_iterator(in.begin()), 
std::move_iterator(in.end()),
+   std::move_iterator(out.begin()));
+}
+
+static_assert(test01());
-- 
2.33.1.711.g9d530dc002



Re: [COMMITTED] Avoid threading circular paths.

2021-10-21 Thread Aldy Hernandez via Gcc-patches
Sorry, I tried to revert as soon as I noticed, but it's always a race
against the 20 CI bots and fuzzers that keep us honest ;-).

Aldy

On Thu, Oct 21, 2021 at 4:35 PM Jeff Law  wrote:
>
>
>
> On 10/21/2021 2:12 AM, Aldy Hernandez wrote:
> > The backward threader keeps a hash of visited blocks to avoid crossing
> > the same block twice.  Interestingly, we haven't been checking it for
> > the final block out of the path.  This may be inherited from the old
> > code, as it was simple enough that it didn't matter.  With the
> > upcoming changes enabling the fully resolving threader, it gets
> > tripped often enough to cause wrong code to be generated.
> >
> > Tested on x86-64 Linux.
> >
> > Committed as obvious.
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-threadbackward.c (back_threader::maybe_register_path):
> >   Avoid threading circular paths.
> I noticed you already reverted.  I won't complain about the morning test
> results :-)
>
> Jeff
>
>



[PATCH] libstdc++: Implement P2432R1 changes for views::istream

2021-10-21 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/11/10?

libstdc++-v3/ChangeLog:

* include/std/ranges (istream_view): Replace this function
template with an alias template as per P2432R1.
(wistream_view): Define as per P2432R1.
(views::_Istream, views::istream): Likewise.
* testsuite/std/ranges/istream_view.cc (test07): New test.
---
 libstdc++-v3/include/std/ranges   | 25 ---
 .../testsuite/std/ranges/istream_view.cc  | 13 ++
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 30ba0606869..7aa0f9dfc67 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -781,10 +781,27 @@ namespace views
   friend _Iterator;
 };
 
-  template<typename _Val, typename _CharT, typename _Traits>
-basic_istream_view<_Val, _CharT, _Traits>
-istream_view(basic_istream<_CharT, _Traits>& __s)
-{ return basic_istream_view<_Val, _CharT, _Traits>{__s}; }
+  template<typename _Val>
+using istream_view = basic_istream_view<_Val, char>;
+
+  template<typename _Val>
+using wistream_view = basic_istream_view<_Val, wchar_t>;
+
+namespace views
+{
+  template<typename _Tp>
+struct _Istream
+{
+  template<typename _CharT, typename _TraitsT>
+   [[nodiscard]]
+   constexpr auto
+   operator()(basic_istream<_CharT, _TraitsT>& __e) const
+   { return basic_istream_view<_Tp, _CharT, _TraitsT>(__e); }
+};
+
+  template<typename _Tp>
+inline constexpr _Istream<_Tp> istream;
+}
 
   // C++20 24.7 [range.adaptors] Range adaptors
 
diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc 
b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
index f5c0c2a6bb0..ea7c5a35ef7 100644
--- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
@@ -103,6 +103,18 @@ test06()
   static_assert( std::is_same_v );
 }
 
+void
+test07()
+{
+  // P2432R1, views::istream
+  auto nums = std::istringstream("0 1 2 3 4");
+  ranges::istream_view<int> v(nums);
+  int sum = 0;
+  for (int val : views::istream<int>(nums))
+sum += val;
+  VERIFY( sum == 10 );
+}
+
 int
 main()
 {
@@ -112,4 +124,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }
-- 
2.33.1.711.g9d530dc002



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-21 Thread Martin Sebor via Gcc-patches

On 10/21/21 1:17 AM, Aldy Hernandez wrote:

On Wed, Oct 20, 2021 at 10:01 PM Jeff Law  wrote:




On 10/20/2021 9:15 AM, Aldy Hernandez wrote:

On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor  wrote:


I appreciate the heads up.  I'm happy that the threader has
improved.  I'm obviously not pleased that it has led to regressions
in warnings but I understand that in some cases they might be due
to limitations in the warning code.  I think the test case you have
xfailed might be one such example.  The uninitialized warnings are
exquisitely sensitive to these types of changes.  If/when this patch
is applied please reopen PR 89230 and reference this commit.

Having said that, to maintain the quality of diagnostics,
the work that goes into these nice optimizer improvements needs
to be balanced by an effort to either update the warning code
to cope with the IL changes, or the optimizers need to take care
to avoid exposing undefined code that the warnings are designed
to detect.  I'm concerned not just that the quality of GCC 12
diagnostics has been eroding, but also that it seems to be not
just acceptable but expected.

You make a very good point.  It is certainly not my intention to make
life difficult for the warning maintainers, but I'm afraid I don't
have sufficient knowledge in the area to improve them.

There may be some low hanging fruit though.  At least in the warnings
that use the ranger, there's no reason to run these passes so late in
the pipeline.  You could run the warning code as early as you want,
insofar as SSA is available and the CFG has been built.  Heck, you may
even be able to run at -O0, though we may need some sort of value
numbering.  I believe Richi even suggested this a while back.

Running them later in the pipeline is to take advantage of the
optimizers removing dead and unreachable code as much as possible. In
fact, that's critical to -Wuninitialized.  Optimizing away unreachable
paths  to avoid Wuninitialized false positives has been the major driver
of jump threading improvements for the last 15 years.
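A canonical shape of this false-positive scenario (hypothetical function): the read of `x` is guarded by the same condition as the write, and only by threading the paths through the two conditionals can the optimizers prove the read is always initialized.

```c
#include <assert.h>

/* 'x' is written and read under the same condition.  Without jump
   threading, the compiler sees a path on which 'x' is read
   uninitialized and -Wmaybe-uninitialized fires; threading the
   cond==0 and cond!=0 paths separately proves the read is safe.  */
int
maybe_use (int cond, int val)
{
  int x;
  if (cond)
    x = val;
  /* ... unrelated work ... */
  if (cond)
    return x;
  return 0;
}
```

The function is well-defined for every input, which is what makes the warning on the unthreaded IL a false positive.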


Ughh, that's unfortunate.  We're gonna have to come up with
improvements to the Wuninitialized code, or a different paradigm
altogether.  I'm afraid this will only get worse.

It is a bit ironic that jump threading helps reduce Wuninitialized
false positives, but yet too much of it causes even more false
positives.


For best results they need to work together.  That was also my
point.  All middle end warnings necessarily rely on optimizers
for one thing or another.  When we change one such optimization
and break the assumptions the warnings make (right or wrong),
the warnings break too.  To minimize the fallout we need to
do two things: decouple them as much as possible (minimize
the assumptions) and, when we change the optimizers, we
might need to update the warnings.  If we don't, then yes,
things are going to keep getting worse.

As an aside, in GCC 12 a good number of middle end warnings
already do run at -O0: all those in gimple-ssa-warn-access.c.
I'd like to see gimple-ssa-array-bounds invoked from the access
pass too (instead of from VRP), and eventually -Wrestrict as well.
I'm not sure about the strlen/sprintf warnings; those might need
to stay where they are because they run as part of the optimizers
there.

(By the way, I don't see range info in the access pass at -O0.
Should I?)

Martin


Re: [PATCH] libstdc++: missing constexpr for __[nm]iter_base [PR102358]

2021-10-21 Thread Jonathan Wakely via Gcc-patches
On Thu, 21 Oct 2021 at 15:38, Patrick Palka via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Tested on x86_64-pc-linux-gnu, does this look ok for trunk/11/10?
>

Yes to all, thanks.



> PR libstdc++/102358
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_iterator.h (__niter_base): Make constexpr
> for C++20.
> (__miter_base): Likewise.
> * testsuite/25_algorithms/move/constexpr.cc: New test.
> ---
>  libstdc++-v3/include/bits/stl_iterator.h  |  2 ++
>  .../testsuite/25_algorithms/move/constexpr.cc | 19 +++
>  2 files changed, 21 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc
>
> diff --git a/libstdc++-v3/include/bits/stl_iterator.h
> b/libstdc++-v3/include/bits/stl_iterator.h
> index 8afd6756613..fbd91956ced 100644
> --- a/libstdc++-v3/include/bits/stl_iterator.h
> +++ b/libstdc++-v3/include/bits/stl_iterator.h
> @@ -2470,6 +2470,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>/// @} group iterators
>
>template<typename _Iterator>
> +_GLIBCXX20_CONSTEXPR
>  auto
>  __niter_base(move_iterator<_Iterator> __it)
>  -> decltype(make_move_iterator(__niter_base(__it.base())))
> @@ -2483,6 +2484,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  };
>
>template<typename _Iterator>
> +_GLIBCXX20_CONSTEXPR
>  auto
>  __miter_base(move_iterator<_Iterator> __it)
>  -> decltype(__miter_base(__it.base()))
> diff --git a/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc
> b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc
> new file mode 100644
> index 000..773c55cfb50
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc
> @@ -0,0 +1,19 @@
> +// { dg-options "-std=gnu++20" }
> +// { dg-do compile { target c++20 } }
> +
> +#include <algorithm>
> +#include <span>
> +
> +constexpr bool
> +test01()
> +{
> +  // PR libstdc++/102358
> +  int x[2] = {1,2}, y[2];
> +  std::span in(x), out(y);
> +  std::move(std::move_iterator(in.begin()), std::move_iterator(in.end()),
> +   out.begin());
> +  return std::equal(std::move_iterator(in.begin()),
> std::move_iterator(in.end()),
> +   std::move_iterator(out.begin()));
> +}
> +
> +static_assert(test01());
> --
> 2.33.1.711.g9d530dc002
>
>


Re: how does vrp2 rearrange this?

2021-10-21 Thread Andrew MacLeod via Gcc-patches

On 10/19/21 7:13 PM, Andrew Pinski wrote:

On Tue, Oct 19, 2021 at 3:32 PM Andrew MacLeod  wrote:

On 10/19/21 5:13 PM, Andrew Pinski wrote:

On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches
 wrote:

using testcase ifcvt-4.c:


typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
 word i = x;
 word j = y;
 /* Try to make taking the branch likely.  */
 __builtin_expect (x > y, 1);
 if (x > y)
   {
 i = a;
 j = i;
   }
 return i * j;



The testcase is broken anyways.
The builtin_expect should be inside the if to have any effect.  Look
at the estimated values:
if (x_3(D) > y_4(D))
  goto <bb 4>; [50.00%]<<-- has been reversed.
else
  goto <bb 3>; [50.00%]
;;succ:   4 [50.0% (guessed)]  count:536870912 (estimated
locally) (TRUE_VALUE,EXECUTABLE)
;;3 [50.0% (guessed)]  count:536870912 (estimated
locally) (FALSE_VALUE,EXECUTABLE)

See how it is 50/50?
The testcase is not even testing what it says it is testing.  Just
happened to work previously does not mean anything.  Move the
builtin_expect inside the if and try again. I am shocked it took this
long to find the testcase issue really.

Thanks,
Andrew Pinski

Moving the expect around doesn't change anything; in fact, it makes it
worse, since fre and evrp immediately eliminate it as true if it is in
the THEN block.


It looks like it is eliminated by the CDDCE pass:

cddce1 sees:

  _1 = x_5(D) > y_7(D);
  # RANGE [0, 1] NONZERO 1
  _2 = (long int) _1;
  __builtin_expect (_2, 1);
  if (x_5(D) > y_7(D))
    goto ; [INV]
  else
    goto ; [INV]

and proceeds:

Marking useful stmt: if (x_5(D) > y_7(D))
processing: if (x_5(D) > y_7(D))
processing: i_3 = PHI 

Eliminating unnecessary statements:
Deleting : __builtin_expect (_2, 1);
Deleting : _2 = (long int) _1;
Deleting : _1 = x_5(D) > y_7(D);

If we are supposed to reverse the if, it is not obvious to me who is
supposed to.  You seem to be right that it's a crapshoot that VRP2 does
it, because there isn't enough info to dictate it... unless somewhere it
detects that a THEN targeting an empty block which falls through to the
ELSE block should be swapped.  Or maybe you are right and it flukily
happens due to the ASSERTs being added and removed.


If I turn off DCE, then this all works like it is supposed to... so
maybe DCE isn't supposed to remove this?


Andrew
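For reference, the placement Andrew Pinski suggests, hinting the condition that is actually branched on, would look like this (a sketch of the amended testcase, not a committed change):

```c
#include <assert.h>

typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
  word i = x;
  word j = y;
  /* The hint now annotates the tested condition itself, instead of a
     separate statement whose result feeds nothing, so the branch
     probabilities are no longer the 50/50 guess seen in the dump.  */
  if (__builtin_expect (x > y, 1))
    {
      i = a;
      j = i;
    }
  return i * j;
}
```

Functionally it is unchanged: foo(5, 3, 7) still yields 7 * 7 and foo(3, 5, 7) yields 3 * 5; only the edge probabilities differ.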



Re: [PATCH] libstdc++: Implement P2432R1 changes for views::istream

2021-10-21 Thread Jonathan Wakely via Gcc-patches
On Thu, 21 Oct 2021 at 15:41, Patrick Palka via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk/11/10?
>

Yes for trunk.

I'm undecided whether we want to make this breaking change on the branches.



> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (istream_view): Replace this function
> template with an alias template as per P2432R1.
> (wistream_view): Define as per P2432R1.
> (views::_Istream, views::istream): Likewise.
> * testsuite/std/ranges/istream_view.cc (test07): New test.
> ---
>  libstdc++-v3/include/std/ranges   | 25 ---
>  .../testsuite/std/ranges/istream_view.cc  | 13 ++
>  2 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges
> b/libstdc++-v3/include/std/ranges
> index 30ba0606869..7aa0f9dfc67 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -781,10 +781,27 @@ namespace views
>friend _Iterator;
>  };
>
> -  template<typename _Val, typename _CharT, typename _Traits>
> -basic_istream_view<_Val, _CharT, _Traits>
> -istream_view(basic_istream<_CharT, _Traits>& __s)
> -{ return basic_istream_view<_Val, _CharT, _Traits>{__s}; }
> +  template<typename _Val>
> +using istream_view = basic_istream_view<_Val, char>;
> +
> +  template<typename _Val>
> +using wistream_view = basic_istream_view<_Val, wchar_t>;
> +
> +namespace views
> +{
> +  template<typename _Tp>
> +struct _Istream
> +{
> +  template<typename _CharT, typename _TraitsT>
> +   [[nodiscard]]
> +   constexpr auto
> +   operator()(basic_istream<_CharT, _TraitsT>& __e) const
> +   { return basic_istream_view<_Tp, _CharT, _TraitsT>(__e); }
> +};
> +
> +  template<typename _Tp>
> +inline constexpr _Istream<_Tp> istream;
> +}
>
>// C++20 24.7 [range.adaptors] Range adaptors
>
> diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc
> b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
> index f5c0c2a6bb0..ea7c5a35ef7 100644
> --- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
> @@ -103,6 +103,18 @@ test06()
>static_assert( std::is_same_v );
>  }
>
> +void
> +test07()
> +{
> +  // P2432R1, views::istream
> +  auto nums = std::istringstream("0 1 2 3 4");
> +  ranges::istream_view<int> v(nums);
> +  int sum = 0;
> +  for (int val : views::istream<int>(nums))
> +sum += val;
> +  VERIFY( sum == 10 );
> +}
> +
>  int
>  main()
>  {
> @@ -112,4 +124,5 @@ main()
>test04();
>test05();
>test06();
> +  test07();
>  }
> --
> 2.33.1.711.g9d530dc002
>
>


Re: [PATCH 2/2] libstdc++: Implement P1739R4 changes to views::take/drop/counted

2021-10-21 Thread Jonathan Wakely via Gcc-patches
On Tue, 19 Oct 2021 at 13:20, Patrick Palka via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> This implements P1739R4 along with the resolution for LWG 3407 which
> corrects the paper's wording.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>

Yes, thanks.



> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_util.h (views::_Drop): Forward declare.
> (subrange): Befriend views::_Drop.
> (subrange::_S_store_size): Declare constexpr instead of just
> const, remove obsolete comment.
> * include/std/ranges (views::__detail::__is_empty_view): Define.
> (views::__detail::__is_basic_string_view): Likewise.
> (views::__detail::__is_subrange): Likewise.
> (views::__detail::__is_iota_view): Likewise.
> (views::__detail::__can_take_view): Rename template parm _Tp to
> _Dp.
> (views::_Take): Rename template parm _Tp to _Dp, make it
> non-deducible
> and fix its type to range_difference_t<_Range>.  Implement P1739R4
> and
> LWG 3407 changes.
> (views::__detail::__can_drop_view): Rename template parm _Tp to
> _Dp.
> (views::_Drop): As with views::_Take.
> (views::_Counted): Implement P1739R4 changes.
> * include/std/span (__detail::__is_std_span): Rename to ...
> (__detail::__is_span): ... this and turn it into a variable
> template.
> (__detail::__is_std_array): Turn it into a variable template.
> (span::span): Adjust uses of __is_std_span and __is_std_array
> accordingly.
> * testsuite/std/ranges/adaptors/p1739.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_util.h   |   7 +-
>  libstdc++-v3/include/std/ranges   | 106 +++---
>  libstdc++-v3/include/std/span |  12 +-
>  .../testsuite/std/ranges/adaptors/p1739.cc|  88 +++
>  4 files changed, 192 insertions(+), 21 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/p1739.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_util.h
> b/libstdc++-v3/include/bits/ranges_util.h
> index 7e7b958d274..1afa66d298c 100644
> --- a/libstdc++-v3/include/bits/ranges_util.h
> +++ b/libstdc++-v3/include/bits/ranges_util.h
> @@ -211,6 +211,8 @@ namespace ranges
>
>} // namespace __detail
>
> +  namespace views { struct _Drop; }
> +
>enum class subrange_kind : bool { unsized, sized };
>
>/// The ranges::subrange class template
> @@ -221,8 +223,9 @@ namespace ranges
>  class subrange : public view_interface>
>  {
>  private:
> -  // XXX: gcc complains when using constexpr here
> -  static const bool _S_store_size
> +  friend struct views::_Drop; // Needs to inspect _S_store_size.
> +
> +  static constexpr bool _S_store_size
> = _Kind == subrange_kind::sized && !sized_sentinel_for<_Sent, _It>;
>
>_It _M_begin = _It();
> diff --git a/libstdc++-v3/include/std/ranges
> b/libstdc++-v3/include/std/ranges
> index 85f232d8fb9..64396027c1b 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2147,19 +2148,66 @@ namespace views::__adaptor
>{
>  namespace __detail
>  {
> -  template<typename _Range, typename _Tp>
> +  template<typename _Tp>
> +   constexpr bool __is_empty_view = false;
> +
> +  template<typename _Tp>
> +   constexpr bool __is_empty_view<empty_view<_Tp>> = true;
> +
> +  template<typename _Tp>
> +   constexpr bool __is_basic_string_view = false;
> +
> +  template<typename _CharT, typename _Traits>
> +   constexpr bool __is_basic_string_view<basic_string_view<_CharT, _Traits>> = true;
> +
> +  template<typename _Tp>
> +   constexpr bool __is_subrange = false;
> +
> +  template<typename _It, typename _Sent, subrange_kind _Kind>
> +   constexpr bool __is_subrange<subrange<_It, _Sent, _Kind>> = true;
> +
> +  template<typename _Tp>
> +   constexpr bool __is_iota_view = false;
> +
> +  template<typename _Winc, typename _Bound>
> +   constexpr bool __is_iota_view<iota_view<_Winc, _Bound>> = true;
> +
> +  template<typename _Range, typename _Dp>
> concept __can_take_view
> - = requires { take_view(std::declval<_Range>(),
> std::declval<_Tp>()); };
> + = requires { take_view(std::declval<_Range>(),
> std::declval<_Dp>()); };
>  } // namespace __detail
>
>  struct _Take : __adaptor::_RangeAdaptor<_Take>
>  {
> -  template<viewable_range _Range, typename _Tp>
> -   requires __detail::__can_take_view<_Range, _Tp>
> +  template<viewable_range _Range, typename _Dp = range_difference_t<_Range>>
> +   requires __detail::__can_take_view<_Range, _Dp>
> constexpr auto
> -   operator() [[nodiscard]] (_Range&& __r, _Tp&& __n) const
> +   operator() [[nodiscard]] (_Range&& __r, type_identity_t<_Dp> __n)
> const
> {
> - return take_view(std::forward<_Range>(__r),
> std::forward<_Tp>(__n));
> + using _Tp = remove_cvref_t<_Range>;
> + if constexpr (__detail::__is_empty_view<_Tp>)
> +   return _Tp();
> + else if constexpr (random_access_range<_Tp>
> +&& sized_range<_Tp>
> +&& (std::__d

RE: [PATCH] aarch64: Remove redundant struct type definitions in arm_neon.h

2021-10-21 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: Thursday, October 21, 2021 12:46 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> 
> Subject: [PATCH] aarch64: Remove redundant struct type definitions in
> arm_neon.h
> 
> Hi,
> 
> As subject, this patch deletes some redundant type definitions in
> arm_neon.h. These vector type definitions are an artifact from the initial
> commit that added the AArch64 port.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-10-15  Jonathan Wright  
> 
>   * config/aarch64/arm_neon.h (__STRUCTN): Delete function
>   macro and all invocations.



Re: [PATCH 3/N] Come up with casm global state.

2021-10-21 Thread Segher Boessenkool
On Thu, Oct 21, 2021 at 02:42:10PM +0200, Richard Biener wrote:
> +#define rs6000_casm static_cast<rs6000_asm_out_state *> (casm)
> 
> maybe there's a better way?  Though I can't think of one at the moment.
> There are only 10 uses so eventually we can put the
> static_cast into all places.  Let's ask the powerpc maintainers (CCed).

It's disgusting, and fragile.  The define is slightly better than having
to write it out every time.  But can this not be done properly?

If you use object-oriented stuff and need casts for that, you are doing
something wrong.

> Note you do
> 
> +/* Implement TARGET_ASM_INIT_SECTIONS.  */
> +
> +static asm_out_state *
> +rs6000_elf_asm_init_sections (void)
> +{
> +  rs6000_asm_out_state *target_state
> +    = new (ggc_alloc<rs6000_asm_out_state> ()) rs6000_asm_out_state ();
> +  target_state->init_elf_sections ();
> +  target_state->init_sections ();
> +
> +  return target_state;
> +}
> 
> If you'd have made init_sections virtual the flow would be more
> natural and we could separate section init from casm construction
> (and rs6000 would override init_sections but call the base function
> from the override).

Yeah.


Segher


Re: Fix PR middle-end/102764

2021-10-21 Thread Eric Botcazou via Gcc-patches
> 2021-10-20  Eric Botcazou  
> 
>   PR middle-end/102764
>   * cfgexpand.c (expand_gimple_basic_block): Disregard a final debug
>   statement to reset the current location for the outgoing edges.

This apparently breaks -fcompare-debug with -m32 so I have made it more robust 
by means of the attached fixlet.

Bootstrapped/regtested on x86-64/Linux, applied on the mainline as obvious.


PR middle-end/102764
* cfgexpand.c (expand_gimple_basic_block): Robustify latest change.

-- 
Eric Botcazou

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 8b067f9d848..01d0cdc548a 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5756,6 +5756,7 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
   rtx_insn *last;
   edge e;
   edge_iterator ei;
+  bool nondebug_stmt_seen = false;
 
   if (dump_file)
 fprintf (dump_file, "\n;; Generating RTL for gimple basic block %d\n",
@@ -5836,6 +5837,8 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
   basic_block new_bb;
 
   stmt = gsi_stmt (gsi);
+  if (!is_gimple_debug (stmt))
+	nondebug_stmt_seen = true;
 
   /* If this statement is a non-debug one, and we generate debug
 	 insns, then this one might be the last real use of a TERed
@@ -6090,7 +6093,7 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
   /* Expand implicit goto and convert goto_locus.  */
   FOR_EACH_EDGE (e, ei, bb->succs)
 {
-  if (e->goto_locus != UNKNOWN_LOCATION || !stmt || is_gimple_debug (stmt))
+  if (e->goto_locus != UNKNOWN_LOCATION || !nondebug_stmt_seen)
 	set_curr_insn_location (e->goto_locus);
   if ((e->flags & EDGE_FALLTHRU) && e->dest != bb->next_bb)
 	{


[wwwdocs, committed] GCC 12: Add release note for Fortran TS29113 improvements

2021-10-21 Thread Sandra Loosemore
I've checked in the attached patch to announce the cleanup project that 
Tobias and I have been working on over the last several months in the 
GCC 12 release notes.  I also updated the page for TS29113 on the GCC 
wiki to reflect that anything that still doesn't work ought to be 
considered a bug, not just incomplete work-in-progress.


I know that the conformance sections of the Fortran manual are badly in 
need of updating too (not just for TS29113, but the various versions of 
the standard).  That's in my queue to fix up, but I'm likely going to 
need assistance from the gfortran maintainers to get the details right.  :-S


-Sandra
commit f5971f451ae8834e928738bbfe465670aa481cea
Author: Sandra Loosemore 
Date:   Thu Oct 21 09:00:16 2021 -0700

GCC 12: Add release note for Fortran TS29113 improvements.

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 5be1570..5974b9b 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -147,7 +147,17 @@ a work-in-progress.
 
 
 
-
+Fortran
+
+  WG5/N1942, "TS 29113 Further Interoperability of Fortran with C",
+is now fully supported.  In addition to implementing previously
+missing functionality, such as support for character arguments of
+length greater than one in functions marked bind(c)
+and gaps in the handling for assumed-rank arrays, numerous other bugs
+have been fixed, and an extensive set of new conformance test cases
+has been added.
+  
+
 
 
 


Re: [RFC PATCH 0/1] implement TLS register based stack canary for ARM

2021-10-21 Thread Ard Biesheuvel via Gcc-patches
On Thu, 21 Oct 2021 at 12:23, Ard Biesheuvel  wrote:
>
> Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352
>
> In the Linux kernel, user processes calling into the kernel are
> essentially threads running in the same address space, of a program that
> never terminates. This means that using a global variable for the stack
> protector canary value is problematic on SMP systems, as we can never
> change it unless we reboot the system. (Processes that sleep for any
> reason will do so on a call into the kernel, which means that there will
> always be live kernel stack frames carrying copies of the canary taken
> when the function was entered)
>
> AArch64 implements -mstack-protector-guard=sysreg for this purpose, as
> this permits the kernel to use different memory addresses for the stack
> canary for each CPU, and context switch the chosen system register with
> the rest of the process, allowing each process to use its own unique
> value for the stack canary.
>
> This patch implements something similar, but for the 32-bit ARM kernel,
> which will start using the user space TLS register TPIDRURO to index
> per-process metadata while running in the kernel. This means we can just
> add an offset to TPIDRURO to obtain the address from which to load the
> canary value.
>
> The patch is a bit rough around the edges, but produces the correct
> results as far as I can tell.

This is a lie

> However, I couldn't quite figure out how
> to modify the patterns so that the offset will be moved into the
> immediate offset field of the LDR instructions, so currently, the ADD of
> the offset is always a distinct instruction.
>

... and this is no longer true now that I fixed the correctness
problem. I will be sending out a v2 shortly, so please disregard this
one for now.


> As for the spilling issues that have been fixed in this code in the
> past: I suppose a register carrying the TLS register value will never
> get spilled to begin with? How about a register that carries TLS+?
>
> Comments/suggestions welcome.
>
> Cc: thomas.preudho...@celest.fr
> Cc: adhemerval.zane...@linaro.org
> Cc: Qing Zhao 
> Cc: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org
>
> Ard Biesheuvel (1):
>   [ARM] Add support for TLS register based stack protector canary access
>
>  gcc/config/arm/arm-opts.h |  6 +++
>  gcc/config/arm/arm.c  | 39 +
>  gcc/config/arm/arm.md | 44 ++--
>  gcc/config/arm/arm.opt| 22 ++
>  gcc/doc/invoke.texi   |  9 
>  5 files changed, 116 insertions(+), 4 deletions(-)
>
> --
> 2.30.2
>
> $ cat |arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls 
> -mstack-protector-guard-offset=10 -mtp=cp15 -S -o - -xc - 
> -fstack-protector-all -O3
> int foo(void *);
> int bar(void)
> {
>
> return foo(__builtin_thread_pointer()) + 1;
> }
> .arch armv7-a
> .fpu softvfp
> .eabi_attribute 20, 1
> .eabi_attribute 21, 1
> .eabi_attribute 23, 3
> .eabi_attribute 24, 1
> .eabi_attribute 25, 1
> .eabi_attribute 26, 2
> .eabi_attribute 30, 2
> .eabi_attribute 34, 1
> .eabi_attribute 18, 4
> .file   ""
> .text
> .align  2
> .global bar
> .syntax unified
> .arm
> .type   bar, %function
> bar:
> @ args = 0, pretend = 0, frame = 8
> @ frame_needed = 0, uses_anonymous_args = 0
> push    {r4, lr}
> mrc p15, 0, r4, c13, c0, 3  @ load_tp_hard
> add r3, r4, #10
> sub sp, sp, #8
> mov r0, r4
> add r4, r4, #10
> ldr r3, [r3]
> str r3, [sp, #4]
> mov r3, #0
> bl  foo
> ldr r3, [r4]
> ldr r4, [sp, #4]
> eors    r3, r4, r3
> mov r4, #0
> bne .L5
> add r0, r0, #1
> add sp, sp, #8
> @ sp needed
> pop {r4, pc}
> .L5:
> bl  __stack_chk_fail
> .size   bar, .-bar
> .ident  "GCC: (GNU) 12.0.0 20211019 (experimental)"
> .section        .note.GNU-stack,"",%progbits
>


Re: [RFC PATCH 0/1] implement TLS register based stack canary for ARM

2021-10-21 Thread Kees Cook via Gcc-patches
On Thu, Oct 21, 2021 at 06:34:04PM +0200, Ard Biesheuvel wrote:
> On Thu, 21 Oct 2021 at 12:23, Ard Biesheuvel  wrote:
> >
> > Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352
> >
> > In the Linux kernel, user processes calling into the kernel are
> > essentially threads running in the same address space, of a program that
> > never terminates. This means that using a global variable for the stack
> > protector canary value is problematic on SMP systems, as we can never
> > change it unless we reboot the system. (Processes that sleep for any
> > reason will do so on a call into the kernel, which means that there will
> > always be live kernel stack frames carrying copies of the canary taken
> > when the function was entered)
> >
> > AArch64 implements -mstack-protector-guard=sysreg for this purpose, as
> > this permits the kernel to use different memory addresses for the stack
> > canary for each CPU, and context switch the chosen system register with
> > the rest of the process, allowing each process to use its own unique
> > value for the stack canary.
> >
> > This patch implements something similar, but for the 32-bit ARM kernel,
> > which will start using the user space TLS register TPIDRURO to index
> > per-process metadata while running in the kernel. This means we can just
> > add an offset to TPIDRURO to obtain the address from which to load the
> > canary value.
> >
> > The patch is a bit rough around the edges, but produces the correct
> > results as far as I can tell.
> 
> This is a lie

LOL.

> 
> > However, I couldn't quite figure out how
> > to modify the patterns so that the offset will be moved into the
> > immediate offset field of the LDR instructions, so currently, the ADD of
> > the offset is always a distinct instruction.
> >
> 
> ... and this is no longer true now that I fixed the correctness
> problem. I will be sending out a v2 shortly, so please disregard this
> one for now.

Heh, I hadn't even had a chance to test it, so I'll hold off. :)

Thanks!

-Kees

-- 
Kees Cook


Re: [wwwdocs, committed] GCC 12: Add release note for Fortran TS29113 improvements

2021-10-21 Thread Thomas Koenig via Gcc-patches

Hi Sandra,

> I've checked in the attached patch to announce the cleanup project that
> Tobias and I have been working on over the last several months in the
> GCC 12 release notes.  I also updated the page for TS29113 on the GCC
> wiki to reflect that anything that still doesn't work ought to be
> considered a bug, not just incomplete work-in-progress.


Thanks for the work that the both of you put into this, and also thanks
for putting this into the release notes.  I was about to suggest you do
so, so you beat me to it :-)

> I know that the conformance sections of the Fortran manual are badly in
> need of updating too (not just for TS29113, but the various versions of
> the standard).  That's in my queue to fix up, but I'm likely going to
> need assistance from the gfortran maintainers to get the details right.  :-S


Just ask :-)

Regards

Thomas



[PATCH] x86: Document -fcf-protection requires i686 or newer

2021-10-21 Thread H.J. Lu via Gcc-patches
PR target/98667
* doc/invoke.texi: Document -fcf-protection requires i686 or
newer.
---
 gcc/doc/invoke.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c66a25fcd69..71992b8c597 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15542,7 +15542,8 @@ which functions and calls should be skipped from 
instrumentation
 (@pxref{Function Attributes}).
 
 Currently the x86 GNU/Linux target provides an implementation based
-on Intel Control-flow Enforcement Technology (CET).
+on Intel Control-flow Enforcement Technology (CET) which works for
+i686 processor or newer.
 
 @item -fstack-protector
 @opindex fstack-protector
-- 
2.32.0



Re: [PATCH] x86: Adjust gcc.target/i386/pr22076.c

2021-10-21 Thread H.J. Lu via Gcc-patches
On Tue, Oct 19, 2021 at 11:42 PM Uros Bizjak  wrote:
>
> On Tue, Oct 19, 2021 at 8:23 PM H.J. Lu  wrote:
> >
> > commit 247c407c83f0015f4b92d5f71e45b63192f6757e
> > Author: Roger Sayle 
> > Date:   Mon Oct 18 12:15:40 2021 +0100
> >
> > Try placing RTL folded constants in the constant pool.
> >
> > My recent attempts to come up with a testcase for my patch to evaluate
> > ss_plus in simplify-rtx.c, identified a missed optimization opportunity
> > (that's potentially a long-time regression): The RTL optimizers no 
> > longer
> > place constants in the constant pool.
> >
> > changed -m32 codegen from
> >
> > movq    .LC1, %mm0
> > paddb   .LC0, %mm0
> > movq    %mm0, x
> > ret
> >
> > to
> >
> > movl    $807671820, %eax
> > movl    $1616136252, %edx
> > movl    %eax, x
> > movl    %edx, x+4
> > ret
> >
> > and -m64 codegen from
> >
> > movq    .LC1(%rip), %mm0
> > paddb   .LC0(%rip), %mm0
> > movq    %xmm0, x(%rip)
> > ret
> >
> > to
> >
> > movq    .LC2(%rip), %rax
> > movq    %rax, x(%rip)
> > ret
> >
> > Adjust pr22076.c to check that MMX register isn't used since avoiding
> > MMX register isn't a bad thing.
> >
> > PR testsuite/102840
> > * gcc.target/i386/pr22076.c: Updated to check that MMX register
> > isn't used.
>
> The compiler is now able to evaluate the result at the compile time
> and it optimizes the test accordingly. Let's provide some MMX
> instruction that is implemented with UNSPEC, so the compiler won't be
> able to outsmart us.
>
> Something like the attached patch.
>
> Uros.

Works for me.

Thanks.

-- 
H.J.


[RFC PATCH v2 0/1] implement TLS register based stack canary for ARM

2021-10-21 Thread Ard Biesheuvel via Gcc-patches
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352

In the Linux kernel, user processes calling into the kernel are
essentially threads running in the same address space, of a program that
never terminates. This means that using a global variable for the stack
protector canary value is problematic on SMP systems, as we can never
change it unless we reboot the system. (Processes that sleep for any
reason will do so on a call into the kernel, which means that there will
always be live kernel stack frames carrying copies of the canary taken
when the function was entered)

AArch64 implements -mstack-protector-guard=sysreg for this purpose, as
this permits the kernel to use different memory addresses for the stack
canary for each CPU, and context switch the chosen system register with
the rest of the process, allowing each process to use its own unique
value for the stack canary.

This patch implements something similar, but for the 32-bit ARM kernel,
which will start using the user space TLS register TPIDRURO to index
per-process metadata while running in the kernel. This means we can just
add an offset to TPIDRURO to obtain the address from which to load the
canary value.

As for the spilling issues that have been fixed in this code in the
past: I suppose a register carrying the TLS register value will never
get spilled to begin with?

Comments/suggestions welcome.

Cc: Keith Packard 
Cc: thomas.preudho...@celest.fr
Cc: adhemerval.zane...@linaro.org
Cc: Qing Zhao 
Cc: Richard Sandiford 
Cc: gcc-patches@gcc.gnu.org

Ard Biesheuvel (1):
  [ARM] Add support for TLS register based stack protector canary access

 gcc/config/arm/arm-opts.h   |  6 ++
 gcc/config/arm/arm-protos.h |  2 +
 gcc/config/arm/arm.c| 52 
 gcc/config/arm/arm.md   | 62 +++-
 gcc/config/arm/arm.opt  | 22 +++
 gcc/doc/invoke.texi |  9 +++
 6 files changed, 151 insertions(+), 2 deletions(-)

-- 
2.30.2

$ cat|arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls 
-mstack-protector-guard-offset=1296 -mtp=cp15 -S -o - -xc - 
-fstack-protector-all -O3

int foo(void *);
int bar(void)
{

return foo(__builtin_thread_pointer()) + 1;
}

.arch armv7-a
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file   ""
.text
.align  2
.global bar
.syntax unified
.arm
.type   bar, %function
bar:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
push    {r4, lr}
mrc p15, 0, r4, c13, c0, 3  @ load_tp_hard
mov r0, r4
sub sp, sp, #8
ldr r3, [r4, #1296]
str r3, [sp, #4]
mov r3, #0
bl  foo
ldr r2, [sp, #4]
ldr r3, [r4, #1296]
eors    r3, r2, r3
mov r2, #0
bne .L5
add r0, r0, #1
add sp, sp, #8
@ sp needed
pop {r4, pc}
.L5:
bl  __stack_chk_fail
.size   bar, .-bar
.ident  "GCC: (GNU) 12.0.0 20211021 (experimental)"
.section        .note.GNU-stack,"",%progbits



[RFC PATCH v2 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-10-21 Thread Ard Biesheuvel via Gcc-patches
Add support for accessing the stack canary value via the TLS register,
so that multiple threads running in the same address space can use
distinct canary values. This is intended for the Linux kernel running in
SMP mode, where processes entering the kernel are essentially threads
running the same program concurrently: using a global variable for the
canary in that context is problematic because it can never be rotated,
and so the OS is forced to use the same value as long as it remains up.

Using the TLS register to index the stack canary helps with this, as it
allows each CPU to context switch the TLS register along with the rest
of the process, permitting each process to use its own value for the
stack canary.

2021-10-21 Ard Biesheuvel 

* config/arm/arm-opts.h (enum stack_protector_guard): New
* config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
New
* config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
(arm_option_override_internal): Handle and put in error checks
for stack protector guard options.
(arm_option_reconfigure_globals): Likewise
(arm_stack_protect_tls_canary_mem): New
(arm_stack_protect_guard): New
* config/arm/arm.md (stack_protect_set): New
(stack_protect_set_tls): Likewise
(stack_protect_test): Likewise
(stack_protect_test_tls): Likewise
* config/arm/arm.opt (-mstack-protector-guard): New
(-mstack-protector-guard-offset): New.

Signed-off-by: Ard Biesheuvel 
---
 gcc/config/arm/arm-opts.h   |  6 ++
 gcc/config/arm/arm-protos.h |  2 +
 gcc/config/arm/arm.c| 52 
 gcc/config/arm/arm.md   | 62 +++-
 gcc/config/arm/arm.opt  | 22 +++
 gcc/doc/invoke.texi |  9 +++
 6 files changed, 151 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
index 5c4b62f404f7..581ba3c4fbbb 100644
--- a/gcc/config/arm/arm-opts.h
+++ b/gcc/config/arm/arm-opts.h
@@ -69,4 +69,10 @@ enum arm_tls_type {
   TLS_GNU,
   TLS_GNU2
 };
+
+/* Where to get the canary for the stack protector.  */
+enum stack_protector_guard {
+  SSP_TLSREG,  /* per-thread canary in TLS register */
+  SSP_GLOBAL   /* global canary */
+};
 #endif
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9b1f61394ad7..37e80256a78d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, 
rtx, rtx, rtx, rtx);
 extern rtx arm_load_tp (rtx);
 extern bool arm_coproc_builtin_available (enum unspecv);
 extern bool arm_coproc_ldc_stc_legitimate_address (rtx);
+extern rtx arm_stack_protect_tls_canary_mem (void);
+
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c4ff06b087eb..0bf06e764dbb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
+#undef TARGET_STACK_PROTECT_GUARD
+#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard
 
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts,
   if (TARGET_THUMB2_P (opts->x_target_flags))
 opts->x_inline_asm_unified = true;
 
+  if (arm_stack_protector_guard == SSP_GLOBAL
+  && opts->x_arm_stack_protector_guard_offset_str)
+{
+  error ("incompatible options %'-mstack-protector-guard=global%' and"
+"%'-mstack-protector-guard-offset=%qs%'",
+arm_stack_protector_guard_offset_str);
+}
+
+  if (opts->x_arm_stack_protector_guard_offset_str)
+{
+  char *end;
+  const char *str = arm_stack_protector_guard_offset_str;
+  errno = 0;
+  long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0);
+  if (!*str || *end || errno)
+   error ("%qs is not a valid offset in %qs", str,
+  "-mstack-protector-guard-offset=");
+  arm_stack_protector_guard_offset = offs;
+}
+
 #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
   SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
 #endif
@@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void)
   else
target_thread_pointer = TP_SOFT;
 }
+
+  if (arm_stack_protector_guard == SSP_TLSREG
+  && target_thread_pointer != TP_CP15)
+error("%'-mstack-protector-guard=tls%' needs a hardware TLS register");
 }
 
 /* Perform some validation between the desired architecture and the rest of the
@@ -8087,6 +8114,19 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx 
reg, rtx pic_reg,
 }
 
 
+rtx
+arm_stack_protect_tls_canary_mem (void)
+{
+  rtx tp = gen_reg_rtx (SImode);
+  emit_insn (gen_load_tp_hard (tp));
+
+  rtx r

Re: [PATH][_GLIBCXX_DEBUG] Fix unordered container merge

2021-10-21 Thread François Dumont via Gcc-patches

I eventually would like to propose a different approach.

I am adding a hook in normal implementation to let the _GLIBCXX_DEBUG 
code know when a node is being extracted. This way invalidation is only 
done by comparing nodes, no need to compute hash code for this operation.


The only drawback is that for each extraction we have a linear search
over the iterators to invalidate the correct one. I will next implement
an optimization for when hasher/equal_to are noexcept.


This patch also removes the invalid noexcept qualification on the
_Hashtable merge methods and makes use of const_iterator, as that is
what extract expects.


Tested under Linux x86_64.

Ok to commit ?

François


On 16/10/21 4:52 pm, Jonathan Wakely wrote:



On Sat, 16 Oct 2021, 14:49 François Dumont via Libstdc++ wrote:


Hi

 Here is the new proposal. My only concern is that we are also
using
hash or equal_to functors in the guard destructor.



Can we catch any exception there, invalidate all iterators, and not 
rethrow the exception?



 I am going to enhance merge normal implementation to make use of
the cached hash code when hash functors are the same between the
source
and destination of nodes. Maybe I'll be able to make use of it in
Debug
implementation too.

François


On 14/10/21 10:23 am, Jonathan Wakely wrote:
> On Wed, 13 Oct 2021 at 18:10, François Dumont via Libstdc++ wrote:
>> Hi
>>
>>       libstdc++: [_GLIBCXX_DEBUG] Implement unordered container
merge
>>
>>       The _GLIBCXX_DEBUG unordered containers need a dedicated
merge
>> implementation
>>       so that any existing iterator on the transfered nodes is
properly
>> invalidated.
>>
>>       Add typedef/using declaration for everything used as-is
from normal
>> implementation.
>>
>>       libstdc++-v3/ChangeLog:
>>
>>               * include/debug/safe_container.h
(_Safe_container<>): Make
>> all methods
>>               protected.
>>               * include/debug/safe_unordered_container.h
>>  (_Safe_unordered_container<>::_M_invalidate_all): Make public.
>>  (_Safe_unordered_container<>::_M_invalidate_if): Likewise.
>> (_Safe_unordered_container<>::_M_invalidate_local_if): Likewise.
>>               * include/debug/unordered_map
>>  (unordered_map<>::mapped_type, pointer, const_pointer): New
>> typedef.
>>               (unordered_map<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_map<>::get_allocator, empty, size, max_size):
>> Add usings.
>>  (unordered_map<>::bucket_count, max_bucket_count, bucket):
>> Add usings.
>>  (unordered_map<>::hash_function, key_equal, count,
>> contains): Add usings.
>>               (unordered_map<>::operator[], at, rehash,
reserve): Add usings.
>>               (unordered_map<>::merge): New.
>>  (unordered_multimap<>::mapped_type, pointer,
>> const_pointer): New typedef.
>>  (unordered_multimap<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_multimap<>::get_allocator, empty, size,
>> max_size): Add usings.
>>  (unordered_multimap<>::bucket_count, max_bucket_count,
>> bucket): Add usings.
>>  (unordered_multimap<>::hash_function, key_equal, count,
>> contains): Add usings.
>>  (unordered_multimap<>::rehash, reserve): Add usings.
>>  (unordered_multimap<>::merge): New.
>>               * include/debug/unordered_set
>>  (unordered_set<>::mapped_type, pointer, const_pointer): New
>> typedef.
>>               (unordered_set<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_set<>::get_allocator, empty, size, max_size):
>> Add usings.
>>  (unordered_set<>::bucket_count, max_bucket_count, bucket):
>> Add usings.
>>  (unordered_set<>::hash_function, key_equal, count,
>> contains): Add usings.
>>               (unordered_set<>::rehash, reserve): Add usings.
>>               (unordered_set<>::merge): New.
>>  (unordered_multiset<>::mapped_type, pointer,
>> const_pointer): New typedef.
>>  (unordered_multiset<>::reference, const_reference,
>> difference_type): New typedef.
>>  (unordered_multiset<>::get_allocator, empty, size,
>> max_size): Add usings.
>>  (unordered_multiset<>::bucket_count, max_bucket_count,
>> bucket): Add usings.
>>  (unordered_multiset<>::hash_function, key_equal, count,
>> contains): Add usings.
>>  (unordered_multiset<>::rehash, reserve): Add usings.
>>  (unordered_multiset<>::merge): New.
>>               *
>> testsuite/23_containers/unordered_map/debug/merge1_neg.cc: New
test.
>>               *
>> testsuite/23_containers/unordered_map/debug/merge2_neg.cc: New
test.
 

Re: [PATH][_GLIBCXX_DEBUG] Fix unordered container merge

2021-10-21 Thread Jonathan Wakely via Gcc-patches
On Thu, 21 Oct 2021 at 17:52, François Dumont  wrote:

> I eventually would like to propose a different approach.
>
> I am adding a hook in normal implementation to let the _GLIBCXX_DEBUG code
> know when a node is being extracted. This way invalidation is only done by
> comparing nodes, no need to compute hash code for this operation.
>

Ugh, this is horrible, I don't like the normal mode depending on the debug
mode (even if it's just having to add hooks like this).

The previous patch seemed fine to me. Already an improvement on what is on
trunk now.


[PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2021-10-21 Thread Paul A. Clarke via Gcc-patches
The Power10 ISA added `vextract*` instructions, which are realized in the
`vec_extractm` intrinsic.

Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
`_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.

2021-10-21  Paul A. Clarke  

gcc
* config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
when _ARCH_PWR10.
* config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
(_mm_movemask_epi8): Likewise.
---
Tested on Power10 powerpc64le-linux (compiled with and without
`-mcpu=power10`).

OK for trunk?

 gcc/config/rs6000/emmintrin.h | 8 
 gcc/config/rs6000/xmmintrin.h | 4 
 2 files changed, 12 insertions(+)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index 32ad72b4cc35..ab16c13c379e 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -1233,6 +1233,9 @@ _mm_loadl_pd (__m128d __A, double const *__B)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_movemask_pd (__m128d  __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((__v2du) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
 {
@@ -1252,6 +1255,7 @@ _mm_movemask_pd (__m128d  __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 #endif /* _ARCH_PWR8 */
 
@@ -2030,6 +2034,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_movemask_epi8 (__m128i __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((__v16qu) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned char perm_mask =
 {
@@ -2046,6 +2053,7 @@ _mm_movemask_epi8 (__m128i __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 #endif /* _ARCH_PWR8 */
 
diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
index ae1a33e8d95b..4c093fd1d5ae 100644
--- a/gcc/config/rs6000/xmmintrin.h
+++ b/gcc/config/rs6000/xmmintrin.h
@@ -1352,6 +1352,9 @@ _mm_storel_pi (__m64 *__P, __m128 __A)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_movemask_ps (__m128  __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((vector unsigned int) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
 {
@@ -1371,6 +1374,7 @@ _mm_movemask_ps (__m128  __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 #endif /* _ARCH_PWR8 */
 
-- 
2.27.0



Re: [PATCH] Try to resolve paths in threader without looking further back.

2021-10-21 Thread Jeff Law via Gcc-patches




On 10/21/2021 1:17 AM, Aldy Hernandez wrote:

On Wed, Oct 20, 2021 at 10:01 PM Jeff Law  wrote:



On 10/20/2021 9:15 AM, Aldy Hernandez wrote:

On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor  wrote:


I appreciate the heads up.  I'm happy that the threader has
improved.  I'm obviously not pleased that it has led to regressions
in warnings but I understand that in some cases they might be due
to limitations in the warning code.  I think the test case you have
xfailed might be one such example.  The uninitialized warnings are
exquisitely sensitive to these types of changes.  If/when this patch
is applied please reopen PR 89230 and reference this commit.

Having said that, to maintain the quality of diagnostics,
the work that goes into these nice optimizer improvements needs
to be balanced by an effort to either update the warning code
to cope with the IL changes, or the optimizers need to take care
to avoid exposing undefined code that the warnings are designed
to detect.  I'm concerned not just that the quality of GCC 12
diagnostics has been eroding, but also that it seems to be not
just acceptable but expected.

You make a very good point.  It is certainly not my intention to make
life difficult for the warning maintainers, but I'm afraid I don't
have sufficient knowledge in the area to improve them.

There may be some low hanging fruit though.  At least in the warnings
that use the ranger, there's no reason to run these passes so late in
the pipeline.  You could run the warning code as early as you want,
insofar as SSA is available and the CFG has been built.  Heck, you may
even be able to run at -O0, though we may need some sort of value
numbering.  I believe Richi even suggested this a while back.

Running them later in the pipeline is to take advantage of the
optimizers removing dead and unreachable code as much as possible. In
fact, that's critical to -Wuninitialized.  Optimizing away unreachable
paths  to avoid Wuninitialized false positives has been the major driver
of jump threading improvements for the last 15 years.

Ughh, that's unfortunate.  We're gonna have to come up with
improvements to the Wuninitialized code, or a different paradigm
altogether.  I'm afraid this will only get worse.
Well, good luck with a different paradigm :-)  It's a tough little nut.  
As long as we want to reduce false positives, then we're going to be 
dependent on optimization and related analysis.


And as I've noted before, we're generally better off fixing the 
optimizers when we stumble over a false positive from Wuninitialized.  
When that's not possible, we should look to fix the predicate analysis 
code.




It is a bit ironic that jump threading helps reduce Wuninitialized
false positives, but yet too much of it causes even more false
positives.
I would expect the latter to be relatively rare for Wuninitialized. I 
think some of the other middle end warnings may be in a different boat 
though.




jeff


[r12-4601 Regression] FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1 on Linux/x86_64

2021-10-21 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

c8a889fc0e115d40a2d02f32842655f3eadc8fa1 is the first bad commit
commit c8a889fc0e115d40a2d02f32842655f3eadc8fa1
Author: Hongyu Wang 
Date:   Wed Oct 20 13:13:39 2021 +0800

i386: Fix wrong codegen for V8HF move without TARGET_AVX512F

caused

FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4601/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH] Handle jobserver file descriptors in btest.

2021-10-21 Thread Ian Lance Taylor via Gcc-patches
On Thu, Oct 21, 2021 at 12:48 AM Martin Liška  wrote:
>
> The patch is about sensitive handling of file descriptors opened
> by make's jobserver.

Thanks.  I think a better approach would be, at the start of main,
fstat the descriptors up to 10 and record the ones for which fstat
succeeds.  Then at the end of main only check the descriptors for
which fstat failed earlier.
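A minimal sketch of the scheme described here (all names hypothetical — btest's actual code will differ):

```cpp
#include <stdio.h>
#include <sys/stat.h>

// Hypothetical sketch: at the start of main, fstat descriptors 0..9 and
// record which ones are already open (e.g. pipes inherited from make's
// jobserver); at the end of main, report only descriptors that were NOT
// open at startup but are open now.
static bool open_at_start[10];

void record_initial_fds (void)
{
  for (int fd = 0; fd < 10; fd++)
    {
      struct stat st;
      open_at_start[fd] = (fstat (fd, &st) == 0);
    }
}

int check_leaked_fds (void)
{
  int leaked = 0;
  for (int fd = 3; fd < 10; fd++)
    {
      struct stat st;
      if (!open_at_start[fd] && fstat (fd, &st) == 0)
	{
	  fprintf (stderr, "descriptor %d leaked\n", fd);
	  leaked++;
	}
    }
  return leaked;
}
```

This way descriptors held open by the jobserver become part of the baseline rather than being reported as leaks.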

I can work on that at some point if you don't want to tackle it.

Ian


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-21 Thread Jeff Law via Gcc-patches




On 10/21/2021 1:42 AM, Aldy Hernandez wrote:

Massaging the IL should only take two forms IIRC.

First, if we have a simplification we can do.  That could be const/copy
propagation, replacing an expression with an SSA_NAME or constant and
the like.  It doesn't massage the IL just to massage the IL.

Second, we do temporarily copy propagate the current known values of an
SSA name into use points and then see if that allows us to determine if
a statement is already in the hash tables.  But we undo that so that
nobody should see that temporary change in state.

Finally, it does create some expressions & statements on the fly to
enter them into the tables.  For example, if it sees a store, it'll
create a load with the source & dest interchanged and enter that into
the expression table.  But none of this stuff ever shows up in the IL.
It's just to create entries in the expression tables.

So ITSM the only real concern would be if those temporary const/copy
propagations were still in the IL and we called back into Ranger and it
poked at that data somehow.

Hmmm, this is all very good insight.  Thanks.

One thing that would have to be adjusted then is remove the
enable_ranger() call in the patch.  This sets a global ranger, and
there are users of get_range_query() that will use it to get on-demand
ranges.  One such use that I added was ssa_name_has_boolean_range in
tree-ssa-dom.c.  This means that if the IL has been temporarily
changed, this function can and will use the global ranger.  The
alternative here would be to just create a new local ranger:

-  gimple_ranger *ranger = enable_ranger (fun);
+  gimple_ranger *ranger = new gimple_ranger;

and then obviously deallocate it at the disable_ranger call site.

This will cause any users of get_range_query() in the compiler to just
use global ranges.  Hopefully, none of these downstream users of
get_range_query() from DOM need context sensitive results.
ssa_name_has_boolean_range??

I think what you'd need to do is check that there are no calls to the
ranger from cprop_into_stmt (?? this is the place where IL changes??),
until wherever the undoing happens (I couldn't find it).  I see a call
to simplify_using_ranges in optimize_stmt that looks like it could be
called with the IL in mid-flight.  Maybe this call needs to happen
before the IL is altered?


So if we're referring to those temporary const/copy propagations
"escaping" into Ranger, then I would fully expect that to cause
problems.  Essentially they're path sensitive const/copy propagations
and may not be valid on all the paths through the CFG to the statement
where the propagation occurs

Yeah.  disabling the global ranger should help, plus making sure you
don't use the ranger in the midst of the path sensitive changes.
I think we should first try to remove those temporary const/copy 
propagations.  As I noted in a different follow-up, I can't remember if 
they were done as part of the original non-copying threader or if they 
enabled further optimizations in the copying threader.  If it's the 
former, then they can go away and that would be my preference. I'll try 
to poke at that over the weekend.


jeff


Re: [PATCH] x86: Adjust gcc.target/i386/pr22076.c

2021-10-21 Thread Uros Bizjak via Gcc-patches
On Thu, Oct 21, 2021 at 6:50 PM H.J. Lu  wrote:
>
> On Tue, Oct 19, 2021 at 11:42 PM Uros Bizjak  wrote:
> >
> > On Tue, Oct 19, 2021 at 8:23 PM H.J. Lu  wrote:
> > >
> > > commit 247c407c83f0015f4b92d5f71e45b63192f6757e
> > > Author: Roger Sayle 
> > > Date:   Mon Oct 18 12:15:40 2021 +0100
> > >
> > > Try placing RTL folded constants in the constant pool.
> > >
> > > My recent attempts to come up with a testcase for my patch to evaluate
> > > ss_plus in simplify-rtx.c, identified a missed optimization 
> > > opportunity
> > > (that's potentially a long-time regression): The RTL optimizers no 
> > > longer
> > > place constants in the constant pool.
> > >
> > > changed -m32 codegen from
> > >
> > > movq    .LC1, %mm0
> > > paddb   .LC0, %mm0
> > > movq    %mm0, x
> > > ret
> > >
> > > to
> > >
> > > movl    $807671820, %eax
> > > movl    $1616136252, %edx
> > > movl    %eax, x
> > > movl    %edx, x+4
> > > ret
> > >
> > > and -m64 codegen from
> > >
> > > movq    .LC1(%rip), %mm0
> > > paddb   .LC0(%rip), %mm0
> > > movq    %xmm0, x(%rip)
> > > ret
> > >
> > > to
> > >
> > > movq    .LC2(%rip), %rax
> > > movq    %rax, x(%rip)
> > > ret
> > >
> > > Adjust pr22076.c to check that MMX register isn't used since avoiding
> > > MMX register isn't a bad thing.
> > >
> > > PR testsuite/102840
> > > * gcc.target/i386/pr22076.c: Updated to check that MMX register
> > > isn't used.
> >
> > The compiler is now able to evaluate the result at the compile time
> > and it optimizes the test accordingly. Let's provide some MMX
> > instruction that is implemented with UNSPEC, so the compiler won't be
> > able to outsmart us.
> >
> > Something like the attached patch.
> >
> > Uros.
>
> Works for me.

Committed with the following ChangeLog:

testsuite: Adjust pr22076.c to avoid compile-time optimization [PR102840]

2021-10-21  Uroš Bizjak  

PR testsuite/102840

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr22076.c: Adjust to avoid compile time optimization.

Uros.


[PATCH] Possible use before def in fortran/trans-decl.c.

2021-10-21 Thread Andrew MacLeod via Gcc-patches
As I'm working on installing ranger as the VRP2 pass, I am getting a 
stage 2 bootstrap failure now:


In file included from 
/opt/notnfs/amacleod/master/gcc/gcc/fortran/trans-decl.c:28:
/opt/notnfs/amacleod/master/gcc/gcc/tree.h: In function ‘void 
gfc_conv_cfi_to_gfc(stmtblock_t*, stmtblock_t*, tree, tree, gfc_symbol*)’:
/opt/notnfs/amacleod/master/gcc/gcc/tree.h:244:56: error: ‘rank’ may be 
used uninitialized in this function [-Werror=maybe-uninitialized]

  244 | #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
  | ^~~~
/opt/notnfs/amacleod/master/gcc/gcc/fortran/trans-decl.c:6671:8: note: 
‘rank’ was declared here

 6671 |   tree rank, idx, etype, tmp, tmp2, size_var = NULL_TREE;
  |    ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1136: fortran/trans-decl.o] Error 1


looking at that function, in the middle I see:

  if (sym->as->rank < 0)
    {
      /* Set gfc->dtype.rank, if assumed-rank.  */
      rank = gfc_get_cfi_desc_rank (cfi);
      gfc_add_modify (&block, gfc_conv_descriptor_rank (gfc_desc), rank);
    }
  else if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (gfc_desc)))
    /* In that case, the CFI rank and the declared rank can differ.  */
    rank = gfc_get_cfi_desc_rank (cfi);
  else
    rank = build_int_cst (signed_char_type_node, sym->as->rank);


so rank is set on all paths here.   However, stepping back a bit, 
earlier in the function I see:


  if (!sym->attr.dimension || !GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (gfc_desc)))
    {
      tmp = gfc_get_cfi_desc_base_addr (cfi);
      gfc_add_modify (&block, gfc_desc,
		      fold_convert (TREE_TYPE (gfc_desc), tmp));
      if (!sym->attr.dimension)
	goto done;
    }

The done: label occurs *after* that block of initialization code, and a 
bit further down I see this:


      gfc_add_modify (&loop_body, tmpidx, idx);
      stmtblock_t inner_loop;
      gfc_init_block (&inner_loop);
      tree dim = gfc_create_var (TREE_TYPE (rank), "dim");

I cannot convince myself by looking at the intervening code that this 
cannot be executed along this path.  Perhaps someone more familiar with 
the code can check it out.  However, it seems worthwhile to at least 
initialize rank to NULL_TREE, so that we are more likely to see a trap 
if that path ever gets followed.


And it makes the warning go away :-)

OK?

Andrew

PS As a side note, it would be handy to have the def point *and* the use 
point that might be undefined.  It's a big function and it took me a 
while just to see where a possible use might be.





commit ed571a93c54e3967fbf445624e47817be5e333ed
Author: Andrew MacLeod 
Date:   Thu Oct 21 14:48:20 2021 -0400

Initialize variable.

gcc/fortran/
* trans-decl.c (gfc_conv_cfi_to_gfc): Initialize rank to NULL_TREE.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index de624c82fcf..fe5511b5285 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -6668,7 +6668,7 @@ gfc_conv_cfi_to_gfc (stmtblock_t *init, stmtblock_t *finally,
   stmtblock_t block;
   gfc_init_block (&block);
   tree cfi = build_fold_indirect_ref_loc (input_location, cfi_desc);
-  tree rank, idx, etype, tmp, tmp2, size_var = NULL_TREE;
+  tree idx, etype, tmp, tmp2, size_var = NULL_TREE, rank = NULL_TREE;
   bool do_copy_inout = false;
 
   /* When allocatable + intent out, free the cfi descriptor.  */


Re: [PATCH] x86: Document -fcf-protection requires i686 or newer

2021-10-21 Thread Uros Bizjak via Gcc-patches
On Thu, Oct 21, 2021 at 6:47 PM H.J. Lu  wrote:
>
> PR target/98667
> * doc/invoke.texi: Document -fcf-protection requires i686 or
> new.

Obvious patch?

Uros.

> ---
>  gcc/doc/invoke.texi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c66a25fcd69..71992b8c597 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15542,7 +15542,8 @@ which functions and calls should be skipped from 
> instrumentation
>  (@pxref{Function Attributes}).
>
>  Currently the x86 GNU/Linux target provides an implementation based
> -on Intel Control-flow Enforcement Technology (CET).
> +on Intel Control-flow Enforcement Technology (CET) which works for
> +i686 processor or newer.
>
>  @item -fstack-protector
>  @opindex fstack-protector
> --
> 2.32.0
>


Re: [PATCH] x86: Document -fcf-protection requires i686 or newer

2021-10-21 Thread H.J. Lu via Gcc-patches
On Thu, Oct 21, 2021 at 12:04 PM Uros Bizjak  wrote:
>
> On Thu, Oct 21, 2021 at 6:47 PM H.J. Lu  wrote:
> >
> > PR target/98667
> > * doc/invoke.texi: Document -fcf-protection requires i686 or
> > new.
>
> Obvious patch?

I am checking it in and backporting it to release branches.

Thanks.

> Uros.
>
> > ---
> >  gcc/doc/invoke.texi | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index c66a25fcd69..71992b8c597 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -15542,7 +15542,8 @@ which functions and calls should be skipped from 
> > instrumentation
> >  (@pxref{Function Attributes}).
> >
> >  Currently the x86 GNU/Linux target provides an implementation based
> > -on Intel Control-flow Enforcement Technology (CET).
> > +on Intel Control-flow Enforcement Technology (CET) which works for
> > +i686 processor or newer.
> >
> >  @item -fstack-protector
> >  @opindex fstack-protector
> > --
> > 2.32.0
> >



-- 
H.J.


Re: [PATCH] libcody: Avoid double-free

2021-10-21 Thread Jason Merrill via Gcc-patches

On 10/21/21 09:27, Jonathan Wakely wrote:

If the listen call fails then 'goto fail' will jump to that label and
use freeaddrinfo again. Set the pointer to null to prevent that.


OK.


libcody/ChangeLog:

* netserver.cc (ListenInet6): Set pointer to null after
deallocation.
---
  libcody/netserver.cc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/libcody/netserver.cc b/libcody/netserver.cc
index 30202c5106a..0499b5790b4 100644
--- a/libcody/netserver.cc
+++ b/libcody/netserver.cc
@@ -140,6 +140,7 @@ int ListenInet6 (char const **e, char const *name, int port, unsigned backlog)
  
   listen:;

freeaddrinfo (addrs);
+  addrs = nullptr;
  
if (listen (fd, backlog ? backlog : 17) < 0)

  {





Re: [PATCH] c++tools: Fix memory leak

2021-10-21 Thread Jason Merrill via Gcc-patches

On 10/21/21 09:28, Jonathan Wakely wrote:

The allocated memory is not freed when returning early due to an error.

c++tools/ChangeLog:

* resolver.cc (module_resolver::read_tuple_file): Use unique_ptr
to ensure memory is freed before returning.
---
  c++tools/resolver.cc | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/c++tools/resolver.cc b/c++tools/resolver.cc
index 421fdaa55fe..d1b73a47778 100644
--- a/c++tools/resolver.cc
+++ b/c++tools/resolver.cc
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "resolver.h"
  // C++
  #include 
+#include <memory>
  // C
  #include 
  // OS
@@ -114,10 +115,17 @@ module_resolver::read_tuple_file (int fd, char const *prefix, bool force)
buffer = mmap (nullptr, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (buffer == MAP_FAILED)
  return -errno;
+  struct Deleter {
+void operator()(void* p) const { munmap(p, size); }
+size_t size;
+  };
+  std::unique_ptr<void, Deleter> guard(buffer, Deleter{(size_t)stat.st_size});
  #else
buffer = xmalloc (stat.st_size);
if (!buffer)
  return -errno;
+  struct Deleter { void operator()(void* p) const { free(p); } };
+  std::unique_ptr<void, Deleter> guard;


Don't you need to initialize guard from buffer?


if (read (fd, buffer, stat.st_size) != stat.st_size)
  return -errno;
  #endif
@@ -179,12 +187,6 @@ module_resolver::read_tuple_file (int fd, char const 
*prefix, bool force)
}
  }
  
-#if MAPPED_READING

-  munmap (buffer, stat.st_size);
-#else
-  free (buffer);
-#endif
-
return 0;
  }
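The guard idiom used in this patch can be sketched in isolation (hypothetical names; a plain malloc'd buffer stands in for the mmap case). Note that the guard must be constructed from the owned pointer, which is the point of the question above about initializing `guard` from `buffer`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Illustrative sketch of the RAII guard pattern: a unique_ptr with a
// custom deleter releases the buffer on every return path, including
// the early error returns that previously leaked.
struct Deleter
{
  void operator() (void *p) const { std::free (p); }
};

int read_tuples (size_t size)
{
  void *buffer = std::malloc (size);
  if (!buffer)
    return -1;
  // The guard must be initialized from buffer, or nothing is freed.
  std::unique_ptr<void, Deleter> guard (buffer);

  std::memset (buffer, 0, size);
  if (size < 16)
    return -2;   // early return: guard still frees buffer

  return 0;      // normal return: guard frees buffer here too
}
```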
  





PING [PATCH v4 0/2] Implement indirect external access

2021-10-21 Thread H.J. Lu via Gcc-patches
On Wed, Sep 22, 2021 at 7:02 PM H.J. Lu  wrote:
>
> Changes in the v4 patch.
>
> 1. Add nodirect_extern_access attribute.
>
> Changes in the v3 patch.
>
> 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to
> GNU binutils 2.38.  But the -z indirect-extern-access linker option is
> only available for Linux/x86.  However, the --max-cache-size=SIZE linker
> option was also addded within a day.  --max-cache-size=SIZE is used to
> check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support.
>
> Changes in the v2 patch.
>
> 1. Rename the option to -fdirect-extern-access.
>
> ---
> On systems with copy relocation:
> * A copy in executable is created for the definition in a shared library
> at run-time by ld.so.
> * The copy is referenced by executable and shared libraries.
> * Executable can access the copy directly.
>
> Issues are:
> * Overhead of a copy, time and space, may be visible at run-time.
> * Read-only data in the shared library becomes read-write copy in
> executable at run-time.
> * Local access to data with the STV_PROTECTED visibility in the shared
> library must use GOT.
>
> On systems without function descriptor, function pointers vary depending
> on where and how the functions are defined.
> * If the function is defined in executable, it can be the address of
> function body.
> * If the function, including the function with STV_PROTECTED visibility,
> is defined in the shared library, it can be the address of the PLT entry
> in executable or shared library.
>
> Issues are:
> * The address of function body may not be used as its function pointer.
> * ld.so needs to search loaded shared libraries for the function pointer
> of the function with STV_PROTECTED visibility.
>
> Here is a proposal to remove copy relocation and use canonical function
> pointer:
>
> 1. Accesses, including in PIE and non-PIE, to undefined symbols must
> use GOT.
>   a. Linker may optimize out GOT access if the data is defined in PIE or
>   non-PIE.
> 2. Read-only data in the shared library remain read-only at run-time
> 3. Address of global data with the STV_PROTECTED visibility in the shared
> library is the address of data body.
>   a. Can use IP-relative access.
>   b. May need GOT without IP-relative access.
> 4. For systems without function descriptor,
>   a. All global function pointers of undefined functions in PIE and
>   non-PIE must use GOT.  Linker may optimize out GOT access if the
>   function is defined in PIE or non-PIE.
>   b. Function pointer of functions with the STV_PROTECTED visibility in
>   executable and shared library is the address of function body.
>i. Can use IP-relative access.
>ii. May need GOT without IP-relative access.
>iii. Branches to undefined functions may use PLT.
> 5. Single global definition marker:
>
> Add GNU_PROPERTY_1_NEEDED:
>
> #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO
>
> to indicate the needed properties by the object file.
>
> Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:
>
> #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)
>
> to indicate that the object file requires canonical function pointers and
> cannot be used with copy relocation.  This bit should be cleared in
> executable when there are non-GOT or non-PLT relocations in relocatable
> input files without this bit set.
>
>   a. Protected symbol access within the shared library can be treated as
>   local.
>   b. Copy relocation should be disallowed at link-time and run-time.
>   c. GOT function pointer reference is required at link-time and run-time.
>
> The indirect external access marker can be used in the following ways:
>
> 1. Linker can decide the best way to resolve a relocation against a
> protected symbol before seeing all relocations against the symbol.
> 2. Dynamic linker can decide if it is an error to have a copy relocation
> in executable against the protected symbol in a shared library by checking
> if the shared library is built with -fno-direct-extern-access.
>
> Add a compiler option, -fdirect-extern-access. -fdirect-extern-access is
> the default.  With -fno-direct-extern-access:
>
> 1. Always to use GOT to access undefined symbols, including in PIE and
> non-PIE.  This is safe to do and does not break the ABI.
> 2. In executable and shared library, for symbols with the STV_PROTECTED
> visibility:
>   a. The address of data symbol is the address of data body.
>   b. For systems without function descriptor, the function pointer is
>   the address of function body.
> These break the ABI and resulting shared libraries may not be compatible
> with executables which are not compiled with -fno-direct-extern-access.
> 3. Generate an indirect external access marker in relocatable objects if
> supported by linker.
>
> H.J. Lu (2):
>   Add -f[no-]direct-extern-access
>   Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE
>

Hi,

This has been implemented in binutils 2.38 and glibc 2.35.
What do I need to do to get it into GCC 12?

Thanks.

-- 
H.J.
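For readers unfamiliar with the symbol class the proposal keeps referring to, a protected-visibility definition looks like this (illustrative snippet, not part of the patch):

```cpp
// Illustrative only: protected visibility means the symbol is exported
// from a shared library, but references from within the defining library
// always bind to this definition and cannot be preempted.  How *other*
// modules may reference such symbols (direct access vs. GOT) is exactly
// what -f[no-]direct-extern-access controls.
__attribute__ ((visibility ("protected")))
int shared_counter = 0;

__attribute__ ((visibility ("protected")))
int bump (void)
{
  return ++shared_counter;
}
```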


Re: how does vrp2 rearrange this?

2021-10-21 Thread Andrew Pinski via Gcc-patches
On Thu, Oct 21, 2021 at 8:04 AM Andrew MacLeod  wrote:
>
> On 10/19/21 7:13 PM, Andrew Pinski wrote:
> > On Tue, Oct 19, 2021 at 3:32 PM Andrew MacLeod  wrote:
> >> On 10/19/21 5:13 PM, Andrew Pinski wrote:
> >>> On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches
> >>>  wrote:
>  using testcase ifcvt-4.c:
> 
> 
>  typedef int word __attribute__((mode(word)));
> 
>  word
>  foo (word x, word y, word a)
>  {
>   word i = x;
>   word j = y;
>   /* Try to make taking the branch likely.  */
>   __builtin_expect (x > y, 1);
>   if (x > y)
> {
>   i = a;
>   j = i;
> }
>   return i * j;
> 
> >
> > The testcase is broken anyways.
> > The builtin_expect should be inside the if to have any effect.  Look
> > at the estimated values:
> > if (x_3(D) > y_4(D))
> >   goto ; [50.00%]<<-- has been reversed.
> > else
> >   goto ; [50.00%]
> > ;;succ:   4 [50.0% (guessed)]  count:536870912 (estimated
> > locally) (TRUE_VALUE,EXECUTABLE)
> > ;;3 [50.0% (guessed)]  count:536870912 (estimated
> > locally) (FALSE_VALUE,EXECUTABLE)
> >
> > See how it is 50/50?
> > The testcase is not even testing what it says it is testing.  Just
> > happened to work previously does not mean anything.  Move the
> > builtin_expect inside the if and try again. I am shocked it took this
> > long to find the testcase issue really.
> >
> > Thanks,
> > Andrew Pinski
> >
Moving the expect around doesn't change anything; in fact, it makes it
worse, since fre and evrp immediately eliminate it as true if it is in
the THEN block.

I think you misunderstood the change I was saying to do.
Try this:
typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
  word i = x;
  word j = y;
  /* Try to make taking the branch likely.  */
  if (__builtin_expect (x > y, 1))
{
  i = a;
  j = i;
}
  return i * j;
}
/* { dg-final { scan-rtl-dump "2 true changes made" "ce1" } } */

This should fix the "estimated values" to be more correct.

Thanks,
Andrew Pinski

>
> It looks like it is eliminated by the CDDCE pass:
>
> cddce1 sees:
>
>_1 = x_5(D) > y_7(D);
># RANGE [0, 1] NONZERO 1
>_2 = (long int) _1;
>__builtin_expect (_2, 1);
>if (x_5(D) > y_7(D))
>  goto ; [INV]
>else
>  goto ; [INV]
>
> and proceeds:
>
> Marking useful stmt: if (x_5(D) > y_7(D))
> processing: if (x_5(D) > y_7(D))
> processing: i_3 = PHI 
>
> Eliminating unnecessary statements:
> Deleting : __builtin_expect (_2, 1);
> Deleting : _2 = (long int) _1;
> Deleting : _1 = x_5(D) > y_7(D);
>
> If we are supposed to reverse the if, it is not obvious to me who is
> supposed to...  You seem to be right that it's a crapshoot that VRP2 does
> it, because there isn't enough info to dictate it... unless somewhere it
> detects that a THEN targeting an empty block which falls through to the
> ELSE block should be swapped.  Or maybe you are right and that it flukily
> happens due to the ASSERTs being added and removed.
>
> If I turn off DCE, then this all works like it is supposed to... so maybe
> DCE isn't supposed to remove this?
>
> Andrew
>

