Re: [PATCH, v2, OpenMP 5.2, Fortran] Strictly-structured block support for OpenMP directives
On 2021/10/21 12:15 AM, Jakub Jelinek wrote:
> +program main
> +  integer :: x, i, n
> +
> +  !$omp parallel
> +  block
> +    x = x + 1
> +  end block
>
> I'd prefer not to use those x = j or x = x + 1 etc. as statements that
> do random work here whenever possible.  While those are dg-do compile
> testcases, especially if it is without dg-errors I think it is
> preferable not to show bad coding examples.  E.g. the x = x + 1 above
> is wrong for 2 reasons: x is uninitialized before the parallel, and
> there is a data race, the threads, teams etc. can write to x
> concurrently.  I think better would be to use something like
> call do_work which doesn't have to be defined anywhere and will just
> stand there as a black box for unspecified work.
>
> +  !$omp workshare
> +  block
> +    x = x + 1
> +  end block
>
> There are exceptions though, e.g. workshare is such a case, because
> e.g. call do_work is not valid in workshare.  So, it is ok to keep
> using x = x + 1 here if you initialize it first at the start of the
> program.
>
> +  !$omp workshare
> +  block
> +    x = 1
> +    !$omp critical
> +    block
> +      x = 3
> +    end block
> +  end block
>
> And then there are cases like the above, please just use different
> variables there (all initialized) or say an array and access different
> elements in the different spots.
>
> 	Jakub

Thanks, attached is what I finally committed.

Chung-Lin

From 2e4659199e814b7ee0f6bd925fd2c0a7610da856 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang
Date: Thu, 21 Oct 2021 14:56:20 +0800
Subject: [PATCH] openmp: Fortran strictly-structured blocks support

This implements strictly-structured blocks support for Fortran, as
specified in OpenMP 5.2.  This now allows using a Fortran BLOCK
construct as the body of most OpenMP constructs, with a "!$omp end ..."
ending directive optional for that form.

gcc/fortran/ChangeLog:

	* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK
	case together with COMP_BLOCK.
	* parse.c (parse_omp_structured_block): Change return type to
	'gfc_statement', add handling for strictly-structured block case,
	adjust recursive calls to parse_omp_structured_block.
	(parse_executable): Adjust calls to parse_omp_structured_block.
	* parse.h (enum gfc_compile_state): Add
	COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
	* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
	handling.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/cancel-1.f90: Adjust testcase.
	* gfortran.dg/gomp/nesting-3.f90: Adjust testcase.
	* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-2.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-3.f90: New test.

libgomp/ChangeLog:

	* libgomp.texi (Support of strictly structured blocks in Fortran):
	Adjust to 'Y'.
	* testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
---
 gcc/fortran/decl.c                                |   1 +
 gcc/fortran/parse.c                               |  69 +-
 gcc/fortran/parse.h                               |   2 +-
 gcc/fortran/trans-openmp.c                        |   6 +-
 gcc/testsuite/gfortran.dg/gomp/cancel-1.f90       |   3 +
 gcc/testsuite/gfortran.dg/gomp/nesting-3.f90      |  20 +-
 .../gomp/strictly-structured-block-1.f90          | 214 ++
 .../gomp/strictly-structured-block-2.f90          | 139
 .../gomp/strictly-structured-block-3.f90          |  52 +
 libgomp/libgomp.texi                              |   2 +-
 .../libgomp.fortran/task-reduction-16.f90         |   1 +
 11 files changed, 484 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 6784b07ae9e..6043e100fbb 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8429,6 +8429,7 @@ gfc_match_end (gfc_statement *st)
       break;

     case COMP_BLOCK:
+    case COMP_OMP_STRICTLY_STRUCTURED_BLOCK:
       *st = ST_END_BLOCK;
       target = " block";
       eos_ok = 0;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 2a454be79b0..b1e73ee6801 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -5459,7 +5459,7 @@ parse_oacc_loop (gfc_statement acc_st)

 /* Parse the statements of an OpenMP structured block.  */

-static void
+static gfc_statement
 parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
 {
   gfc_statement st, omp_end_st;
@@ -5546,6 +5546,32 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
       gcc_unreachable ();
     }

+  bool block_construct = false;
+  gfc_namespace *my_ns = NULL;
+  gfc_namespace *my_parent = NULL;
+
+  st = next_statement ();
+
+  if (st == ST_BLOCK)
+    {
+      /* Adjust stat
Re: [PATCH] X86: Add an option -muse-unaligned-vector-move
On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu wrote:
>
> On Wed, Oct 20, 2021 at 9:58 AM Richard Biener wrote:
> >
> > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" wrote:
> > > On Wed, Oct 20, 2021 at 4:18 AM Richard Biener wrote:
> > >>
> > >> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong wrote:
> > >> >
> > >> > Many thanks for your explanation.  I got the meaning of operands.
> > >> > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned,
> > >> > otherwise it will raise a "Real-Address Mode Exception".
> > >> > I haven't considered this situation "b(%rip)" has an address
> > >> > dependence of "a(%rip)" before.  I think this situation could be
> > >> > resolved on the assembler side except for this dummy code like
> > >> > "movapd 0x200b37(%rip),%xmm1, ... addpd 0x200b37(%rip),%xmm0".
> > >>
> > >> Of course the compiler will only emit instructions which have the
> > >> constraint of aligned memory when the memory is known to be
> > >> aligned.  That's why I wonder why you would need such an option.
> > >> "Real-Address Mode Exceptions" may point to the issue, but I
> > >> wonder what's different in real mode vs. protected mode - even
> > >> with segmentation the alignment of objects should prevail unless
> > >> you play linker "tricks" that make global objects have different
> > >> alignment - but then it's better to adjust the respective hooks to
> > >> not falsely claim such alignment.  Consider for example
> > >>
> > >>   if ((uintptr_t)&a & 0x7)
> > >>     foo ();
> > >>   else
> > >>     bar ();
> > >>
> > >> GCC will optimize the branch statically to always call bar if 'a'
> > >> appears to be aligned, even if you later try to "override" this
> > >> with an option.  Alignment is not only about moves, it's also
> > >> about knowledge about low bits in addresses and about alias
> > >> analysis where alignment constrains how two objects can overlap.
> > >>
> > >> So - do not lie to the compiler!  A late "workaround" avoiding
> > >> aligned SSE moves isn't a proper fix.
> > >
> > > The motivations are
> > >
> > > 1. AVX non-load/store ops work on unaligned memory.  Unaligned
> > > load/store on aligned memory is as fast as aligned load/store on
> > > Intel AVX machines.  The new switch makes load/store consistent
> > > with other AVX ops.
> > > 2. We don't properly align the stack for AVX on Windows.  This can
> > > be used as a workaround for -mavx on Windows.
> >
> > But this, with lying that the stack is aligned, causes all of the
> > above mentioned issues and thus needs to be fixed by either properly
> > aligning the stack or not lying to the compiler that we do.
> >
> > > We can change TARGET_USE_UNALIGNED_VECTOR_MOVE
> > > to require AVX.
> >
> > But such workaround does not make any sense since it does not fix
> > the fundamental underlying problem.
>
> There is a long standing desire to remove alignment checking (#AC(0)).
> For integer operations, alignment checking is disabled in hardware.
> For AVX ops, alignment checking is disabled in hardware for
> non-load/store instructions.  But we can't disable alignment checking
> in hardware for aligned load/store instructions.
> -muse-unaligned-vector-move implements disabling alignment checking
> for all AVX ops.

No, it does not - it just emits unaligned moves.  The compiler still
assumes aligned memory.  So whatever reason you have for disabling
alignment checking for memory that is known to be aligned, I don't see
it.  If you want to "fix" broken user code then this doesn't do it.

If you want to avoid the penalty for runtime stack alignment then you
simply have to change the ABI(?) to not require vector types to have
big alignment.

Richard.

>
> --
> H.J.
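A self-contained sketch of the hazard Richard describes (the variable
and the helper functions are only for illustration): once GCC knows the
alignment of 'a', it folds the low-bits test at compile time, so a late
option that merely changes which move instructions are emitted cannot
bring the test back.

  #include <stdint.h>

  /* 16-byte alignment is known to the compiler.  */
  double a __attribute__ ((aligned (16)));

  extern void foo (void);
  extern void bar (void);

  void
  check (void)
  {
    if ((uintptr_t) &a & 0x7)
      foo ();	/* Statically proven unreachable...  */
    else
      bar ();	/* ...so only this call survives to the assembly.  */
  }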
Re: [PATCH] Try to resolve paths in threader without looking further back.
On Wed, Oct 20, 2021 at 10:01 PM Jeff Law wrote:
>
> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
> > On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor wrote:
> >
> >> I appreciate the heads up.  I'm happy that the threader has
> >> improved.  I'm obviously not pleased that it has led to regressions
> >> in warnings but I understand that in some cases they might be due
> >> to limitations in the warning code.  I think the test case you have
> >> xfailed might be one such example.  The uninitialized warnings are
> >> exquisitely sensitive to these types of changes.  If/when this patch
> >> is applied please reopen PR 89230 and reference this commit.
> >>
> >> Having said that, to maintain the quality of diagnostics,
> >> the work that goes into these nice optimizer improvements needs
> >> to be balanced by an effort to either update the warning code
> >> to cope with the IL changes, or the optimizers need to take care
> >> to avoid exposing undefined code that the warnings are designed
> >> to detect.  I'm concerned not just that the quality of GCC 12
> >> diagnostics has been eroding, but also that it seems to be not
> >> just acceptable but expected.
> > You make a very good point.  It is certainly not my intention to make
> > life difficult for the warning maintainers, but I'm afraid I don't
> > have sufficient knowledge in the area to improve them.
> >
> > There may be some low hanging fruit though.  At least in the warnings
> > that use the ranger, there's no reason to run these passes so late in
> > the pipeline.  You could run the warning code as early as you want,
> > insofar as SSA is available and the CFG has been built.  Heck, you
> > may even be able to run at -O0, though we may need some sort of value
> > numbering.  I believe Richi even suggested this a while back.
> Running them later in the pipeline is to take advantage of the
> optimizers removing dead and unreachable code as much as possible.  In
> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
> paths to avoid Wuninitialized false positives has been the major driver
> of jump threading improvements for the last 15 years.

Ughh, that's unfortunate.  We're gonna have to come up with improvements
to the Wuninitialized code, or a different paradigm altogether.  I'm
afraid this will only get worse.

It is a bit ironic that jump threading helps reduce Wuninitialized
false positives, but yet too much of it causes even more false
positives.

Aldy
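A minimal illustration of Jeff's point (function names made up): the use
of 'x' is guarded by the same condition as its initialization, so only
dead-path removal - jump threading in particular - lets -Wuninitialized
see that the only path reaching the use also initializes 'x'.

  extern void use (int);
  extern int get (void);

  void
  f (int cond)
  {
    int x;
    if (cond)
      x = get ();
    /* ...unrelated code that leaves 'cond' unchanged...  */
    if (cond)
      use (x);	/* Reached only on paths where 'x' was set.  */
  }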
Re: [PATCH] Try to resolve paths in threader without looking further back.
On Wed, Oct 20, 2021 at 10:02 PM Jeff Law via Gcc-patches wrote:
>
> On 10/20/2021 9:15 AM, Aldy Hernandez wrote:
> > On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor wrote:
> >
> >> I appreciate the heads up.  I'm happy that the threader has
> >> improved.  I'm obviously not pleased that it has led to regressions
> >> in warnings but I understand that in some cases they might be due
> >> to limitations in the warning code.  I think the test case you have
> >> xfailed might be one such example.  The uninitialized warnings are
> >> exquisitely sensitive to these types of changes.  If/when this patch
> >> is applied please reopen PR 89230 and reference this commit.
> >>
> >> Having said that, to maintain the quality of diagnostics,
> >> the work that goes into these nice optimizer improvements needs
> >> to be balanced by an effort to either update the warning code
> >> to cope with the IL changes, or the optimizers need to take care
> >> to avoid exposing undefined code that the warnings are designed
> >> to detect.  I'm concerned not just that the quality of GCC 12
> >> diagnostics has been eroding, but also that it seems to be not
> >> just acceptable but expected.
> > You make a very good point.  It is certainly not my intention to make
> > life difficult for the warning maintainers, but I'm afraid I don't
> > have sufficient knowledge in the area to improve them.
> >
> > There may be some low hanging fruit though.  At least in the warnings
> > that use the ranger, there's no reason to run these passes so late in
> > the pipeline.  You could run the warning code as early as you want,
> > insofar as SSA is available and the CFG has been built.  Heck, you
> > may even be able to run at -O0, though we may need some sort of value
> > numbering.  I believe Richi even suggested this a while back.
> Running them later in the pipeline is to take advantage of the
> optimizers removing dead and unreachable code as much as possible.  In
> fact, that's critical to -Wuninitialized.  Optimizing away unreachable
> paths to avoid Wuninitialized false positives has been the major driver
> of jump threading improvements for the last 15 years.

Well, the good old idea is to simply queue the diagnostics and only emit
them if a stmt (with the location?) gets expanded.  Of course the
details on what exactly to key the warning on is tricky.

Richard.

> jeff
Re: [r12-4559 Regression] FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1 on Linux/x86_64
On Thu, Oct 21, 2021 at 3:39 AM sunil.k.pandey via Gcc-patches wrote:
>
> On Linux/x86_64,
>
> 914045dff10fbd27de27b90a0ac78a0058b2c86e is the first bad commit
> commit 914045dff10fbd27de27b90a0ac78a0058b2c86e
> Author: Andre Simoes Dias Vieira
> Date:   Wed Oct 20 13:12:09 2021 +0100
>
>     [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under
>     -ffast-math on aarch64
>
> caused
>
> FAIL: gcc.dg/vect/bb-slp-57.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "transform load" 1
> FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1

That's because

  _5 = .TRUNC (_3);

is

  (gdb) p gimple_call_nothrow_p (call_stmt)
  $2 = false

it is vectorizable though.  The conversions the call is synthesized from
are 'nothrow'.  I'll fix that since the nothrow check is irrelevant
here.

> with GCC configured with
>
> ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4559/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-57.c --target_board='unix{-m64\ -march=cascadelake}'"
>
> (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
Re: [r12-4559 Regression] FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1 on Linux/x86_64
On Thu, Oct 21, 2021 at 9:30 AM Richard Biener wrote:
>
> On Thu, Oct 21, 2021 at 3:39 AM sunil.k.pandey via Gcc-patches wrote:
> >
> > On Linux/x86_64,
> >
> > 914045dff10fbd27de27b90a0ac78a0058b2c86e is the first bad commit
> > commit 914045dff10fbd27de27b90a0ac78a0058b2c86e
> > Author: Andre Simoes Dias Vieira
> > Date:   Wed Oct 20 13:12:09 2021 +0100
> >
> >     [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int)
> >     under -ffast-math on aarch64
> >
> > caused
> >
> > FAIL: gcc.dg/vect/bb-slp-57.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "transform load" 1
> > FAIL: gcc.dg/vect/bb-slp-57.c scan-tree-dump-times slp1 "transform load" 1
>
> That's because
>
>   _5 = .TRUNC (_3);
>
> is
>
>   (gdb) p gimple_call_nothrow_p (call_stmt)
>   $2 = false
>
> it is vectorizable though.  The conversions the call is synthesized
> from are 'nothrow'.  I'll fix that since the nothrow check is
> irrelevant here.

For the specific testcase that doesn't help though since there's no
optab for V2SFmode as needed by the testcase.

> > with GCC configured with
> >
> > ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4559/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-57.c --target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
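For reference, the user-level pattern the bisected commit handles is a
float-to-int-to-float truncation; under -ffast-math it is now
synthesized into the .TRUNC internal call quoted above (a single FRINTZ
instruction on aarch64) instead of a pair of conversions:

  double
  trunc_via_int (double x)
  {
    return (double) (int) x;
  }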
Re: [PATCH] Improve maybe_remove_writeonly_store to do a simple DCE for defining statement
On Thu, Oct 21, 2021 at 7:53 AM apinski--- via Gcc-patches wrote:
>
> From: Andrew Pinski
>
> Instead of putting a full blown DCE after execute_fixup_cfg, it makes
> sense to try to remove the defining statement for the store that is
> being removed.  Using simple_dce_from_worklist makes this easier: just
> mark the ssa_name on the rhs side of the store (if it was one) in a
> bitmap and then call simple_dce_from_worklist at the end.
>
> gcc.dg/pr36902.c needed to be changed such that the static array was
> no longer a static array but a global array.  This is because this new
> dce will remove the load as it is dead.  I also filed PR 102864 for
> the warning on dead loads.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
>	* tree-cfg.c (maybe_remove_writeonly_store): Add dce_ssa_names
>	argument.  Mark the ssa-name of the rhs as one to be removed.
>	(execute_fixup_cfg): Update call to maybe_remove_writeonly_store.
>	Call simple_dce_from_worklist at the end to do a simple dce.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.dg/pr36902.c: Move buf to be a non-static variable.
> ---
>  gcc/testsuite/gcc.dg/pr36902.c |  5 ++---
>  gcc/tree-cfg.c                 | 18 ++++++++++++++++--
>  2 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/pr36902.c b/gcc/testsuite/gcc.dg/pr36902.c
> index 7dafc9ac171..365a26e26b7 100644
> --- a/gcc/testsuite/gcc.dg/pr36902.c
> +++ b/gcc/testsuite/gcc.dg/pr36902.c
> @@ -24,10 +24,9 @@ struct {
>    unsigned char pcr_select[4];
>  } sel;
>
> +unsigned char buf[64];
>  int bar(void)
>  {
> -  static unsigned char buf[64];
> -
>    sel.size_of_select = 3;
>    foo(buf, sel.pcr_select, sel.size_of_select);
>
> @@ -52,8 +51,6 @@ foo2(unsigned char * to, const unsigned char * from, int n)
>
>  int baz(void)
>  {
> -  static unsigned char buf[64];
> -
>    sel.size_of_select = 5;
>    foo2(buf, sel.pcr_select, sel.size_of_select);
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index dbbf6beb6e4..b3a27bcd17c 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "value-prof.h"
>  #include "tree-inline.h"
>  #include "tree-ssa-live.h"
> +#include "tree-ssa-dce.h"
>  #include "omp-general.h"
>  #include "omp-expand.h"
>  #include "tree-cfgcleanup.h"
> @@ -9669,7 +9670,8 @@ make_pass_warn_unused_result (gcc::context *ctxt)
>  /* Maybe Remove stores to variables we marked write-only.
>     Return true if a store was removed. */
>  static bool
> -maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt)
> +maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt,
> +			      bitmap dce_ssa_names)
>  {
>    /* Keep access when store has side effect, i.e. in case when source
>       is volatile.  */
> @@ -9692,6 +9694,15 @@ maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt)
>        print_gimple_stmt (dump_file, stmt, 0,
>			   TDF_VOPS|TDF_MEMSYMS);
>      }
> +
> +  /* Mark ssa name defining to be checked for simple dce. */
> +  if (gimple_assign_single_p (stmt))
> +    {
> +      tree rhs = gimple_assign_rhs1 (stmt);
> +      if (TREE_CODE (rhs) == SSA_NAME
> +	  && !SSA_NAME_IS_DEFAULT_DEF (rhs))
> +	bitmap_set_bit (dce_ssa_names, SSA_NAME_VERSION (rhs));
> +    }
>    unlink_stmt_vdef (stmt);
>    gsi_remove (&gsi, true);
>    release_defs (stmt);
> @@ -9714,6 +9725,7 @@ execute_fixup_cfg (void)
>    profile_count num = node->count;
>    profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>    bool scale = num.initialized_p () && !(num == den);
> +  auto_bitmap dce_ssa_names;
>
>    if (scale)
>      {
> @@ -9754,7 +9766,7 @@ execute_fixup_cfg (void)
>	    }
>
>	  /* Remove stores to variables we marked write-only. */
> -	  if (maybe_remove_writeonly_store (gsi, stmt))
> +	  if (maybe_remove_writeonly_store (gsi, stmt, dce_ssa_names))
>	    {
>	      todo |= TODO_update_ssa | TODO_cleanup_cfg;
>	      continue;
> @@ -9820,6 +9832,8 @@ execute_fixup_cfg (void)
>        && (todo & TODO_cleanup_cfg))
>      loops_state_set (LOOPS_NEED_FIXUP);
>
> +  simple_dce_from_worklist (dce_ssa_names);
> +
>    return todo;
>  }
>
> --
> 2.17.1
>
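A standalone example of what the combination does (names invented for
illustration): IPA marks 'cache' write-only, execute_fixup_cfg deletes
the store, and the new simple_dce_from_worklist call then also deletes
the load feeding it instead of leaving it for a later full DCE run.

  static int cache;		/* Written below, never read.  */
  extern int source[16];

  void
  record (int i)
  {
    cache = source[i];	/* Store removed; the load of source[i] is then
			   dead and gets cleaned up too.  */
  }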
Re: [PATCH] Convert strlen pass from evrp to ranger.
> Massaging the IL should only take two forms IIRC.
>
> First, if we have a simplification we can do.  That could be const/copy
> propagation, replacing an expression with an SSA_NAME or constant and
> the like.  It doesn't massage the IL just to massage the IL.
>
> Second, we do temporarily copy propagate the current known values of an
> SSA name into use points and then see if that allows us to determine if
> a statement is already in the hash tables.  But we undo that so that
> nobody should see that temporary change in state.
>
> Finally, it does create some expressions & statements on the fly to
> enter them into the tables.  For example, if it sees a store, it'll
> create a load with the source & dest interchanged and enter that into
> the expression table.  But none of this stuff ever shows up in the IL.
> It's just to create entries in the expression tables.
>
> So ITSM the only real concern would be if those temporary const/copy
> propagations were still in the IL and we called back into Ranger and it
> poked at that data somehow.

Hmmm, this is all very good insight.  Thanks.

One thing that would have to be adjusted then is to remove the
enable_ranger() call in the patch.  This sets a global ranger, and there
are users of get_range_query() that will use it to get on-demand ranges.
One such use that I added was ssa_name_has_boolean_range in
tree-ssa-dom.c.  This means that if the IL has been temporarily changed,
this function can and will use the global ranger.

The alternative here would be to just create a new local ranger:

-  gimple_ranger *ranger = enable_ranger (fun);
+  gimple_ranger *ranger = new gimple_ranger;

and then obviously deallocate it at the disable_ranger call site.  This
will cause any users of get_range_query() in the compiler to just use
global ranges.

Hopefully, none of these downstream users of get_range_query() from DOM
need context sensitive results.  ssa_name_has_boolean_range??

I think what you'd need to do is check that there are no calls to the
ranger from cprop_into_stmt (?? this is the place where IL changes??),
until wherever the undoing happens (I couldn't find it).

I see a call to simplify_using_ranges in optimize_stmt that looks like
it could be called with the IL in mid-flight.  Maybe this call needs to
happen before the IL is altered?

> So if we're referring to those temporary const/copy propagations
> "escaping" into Ranger, then I would fully expect that to cause
> problems.  Essentially they're path sensitive const/copy propagations
> and may not be valid on all the paths through the CFG to the statement
> where the propagation occurs

Yeah.  Disabling the global ranger should help, plus making sure you
don't use the ranger in the midst of the path sensitive changes.

Aldy
[PATCH] Handle jobserver file descriptors in btest.
Hi.

The patch is about sensible handling of file descriptors opened by
make's jobserver.

Ready for master?
Thanks,
Martin

	PR testsuite/102742

libbacktrace/ChangeLog:

	* btest.c (check_open_files): Ignore file descriptors provided by
	jobserver.
---
 libbacktrace/btest.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c
index 9f9c03babf3..a1ea93728be 100644
--- a/libbacktrace/btest.c
+++ b/libbacktrace/btest.c
@@ -464,9 +464,25 @@ static void
 check_open_files (void)
 {
   int i;
+  int rfd = -1;
+  int wfd = -1;
+
+#define JS_NEEDLE "--jobserver-auth="
+
+  /* Ignore file descriptors provided by jobserver.  */
+  const char *makeflags = getenv ("MAKEFLAGS");
+  if (makeflags != NULL)
+    {
+      const char *n = strstr (makeflags, JS_NEEDLE);
+      if (n != NULL)
+	sscanf (n + strlen (JS_NEEDLE), "%d,%d", &rfd, &wfd);
+    }

   for (i = 3; i < 10; i++)
     {
+      if (i == rfd || i == wfd)
+	continue;
+
       if (close (i) == 0)
	{
	  fprintf (stderr,
--
2.33.1
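For context, GNU make (4.2 and later) advertises the jobserver pipe in
the environment roughly as MAKEFLAGS="-j8 --jobserver-auth=3,4", where 3
and 4 are the read and write descriptors the test must leave open.  A
standalone version of the parsing done in the patch above:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int
  main (void)
  {
    int rfd = -1, wfd = -1;
    const char *makeflags = getenv ("MAKEFLAGS");
    if (makeflags != NULL)
      {
	const char *n = strstr (makeflags, "--jobserver-auth=");
	if (n != NULL)
	  sscanf (n + strlen ("--jobserver-auth="), "%d,%d", &rfd, &wfd);
      }
    printf ("jobserver fds: %d %d\n", rfd, wfd);
    return 0;
  }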
[PATCH] testsuite/102861 - adjust gcc.dg/vect/bb-slp-16.c change
This reverts the bogus previous change causing runtime failures and
instead realizes that we now have the loop condition if-converted and
the BB vectorization opportunity realized during the loop vectorization
pass.

Tested on x86_64-unknown-linux-gnu, pushed.

2021-10-21  Richard Biener

	PR testsuite/102861
	* gcc.dg/vect/bb-slp-16.c: Revert previous change, scan the
	vect dump instead.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 73 ---
 1 file changed, 32 insertions(+), 41 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 4fc176dde84..82fae06e324 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -1,4 +1,6 @@
 /* { dg-require-effective-target vect_int } */
+/* The SLP vectorization happens as part of the if-converted loop body.  */
+/* { dg-additional-options "-fdump-tree-vect-details" } */

 #include <stdarg.h>
 #include "tree-vect.h"
@@ -16,52 +18,41 @@ main1 (int dummy)
   unsigned int *pin = &in[0];
   unsigned int *pout = &out[0];
   unsigned int a = 0;
-
-  i = N;
-  if (i > 0)
+
+  for (i = 0; i < N; i++)
     {
-      do
-	{
-	  *pout++ = *pin++ + a;
-	  *pout++ = *pin++ + a;
-	  *pout++ = *pin++ + a;
-	  *pout++ = *pin++ + a;
-	  *pout++ = *pin++ + a;
-	  *pout++ = *pin++ + a;
-	  *pout++ = *pin++ + a;
-	  *pout++ = *pin++ + a;
-	  if (arr[i] = i)
-	    a = i;
-	  else
-	    a = 2;
-	}
-      while (i < N);
+      *pout++ = *pin++ + a;
+      *pout++ = *pin++ + a;
+      *pout++ = *pin++ + a;
+      *pout++ = *pin++ + a;
+      *pout++ = *pin++ + a;
+      *pout++ = *pin++ + a;
+      *pout++ = *pin++ + a;
+      *pout++ = *pin++ + a;
+      if (arr[i] = i)
+	a = i;
+      else
+	a = 2;
     }
   a = 0;

-  /* check results:  */
-  i = N;
-  if (i > 0)
+  /* check results:  */
+  for (i = 0; i < N; i++)
     {
-      do
-	{
-	  if (out[i*8] != in[i*8] + a
-	      || out[i*8 + 1] != in[i*8 + 1] + a
-	      || out[i*8 + 2] != in[i*8 + 2] + a
-	      || out[i*8 + 3] != in[i*8 + 3] + a
-	      || out[i*8 + 4] != in[i*8 + 4] + a
-	      || out[i*8 + 5] != in[i*8 + 5] + a
-	      || out[i*8 + 6] != in[i*8 + 6] + a
-	      || out[i*8 + 7] != in[i*8 + 7] + a)
-	    abort ();
+      if (out[i*8] != in[i*8] + a
+	  || out[i*8 + 1] != in[i*8 + 1] + a
+	  || out[i*8 + 2] != in[i*8 + 2] + a
+	  || out[i*8 + 3] != in[i*8 + 3] + a
+	  || out[i*8 + 4] != in[i*8 + 4] + a
+	  || out[i*8 + 5] != in[i*8 + 5] + a
+	  || out[i*8 + 6] != in[i*8 + 6] + a
+	  || out[i*8 + 7] != in[i*8 + 7] + a)
+	abort ();

-	  if (arr[i] = i)
-	    a = i;
-	  else
-	    a = 2;
-	  i++;
-	}
-      while (i < N);
+      if (arr[i] = i)
+	a = i;
+      else
+	a = 2;
     }

   return 0;
@@ -76,4 +67,4 @@ int main (void)
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "vect" } } */
--
2.31.1
[COMMITTED] Avoid threading circular paths.
The backward threader keeps a hash of visited blocks to avoid crossing
the same block twice.  Interestingly, we haven't been checking it for
the final block out of the path.

This may be inherited from the old code, as it was simple enough that it
didn't matter.  With the upcoming changes enabling the fully resolving
threader, it gets tripped often enough to cause wrong code to be
generated.

Tested on x86-64 Linux.  Committed as obvious.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (back_threader::maybe_register_path):
	Avoid threading circular paths.
---
 gcc/tree-ssa-threadbackward.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index d94e3b962db..38913b06717 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -140,6 +140,10 @@ back_threader::maybe_register_path ()

   if (taken_edge && taken_edge != UNREACHABLE_EDGE)
     {
+      // Avoid circular paths.
+      if (m_visited_bbs.contains (taken_edge->dest))
+	return UNREACHABLE_EDGE;
+
       bool irreducible = false;
       bool profitable
	= m_profit.profitable_path_p (m_path, m_name, taken_edge,
				      &irreducible);
--
2.31.1
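A hedged sketch of the kind of CFG where this matters (not the actual
failing case): following the 'state' PHI backwards from the switch
crosses the loop backedge, and without the new m_visited_bbs check on
the taken edge a registered path could re-enter a block it already
contains.

  extern void work (int);

  void
  fsm (int n)
  {
    int state = 0;
    for (int i = 0; i < n; i++)
      {
	switch (state)
	  {
	  case 0: state = 1; break;
	  case 1: state = 2; break;
	  default: state = 0; break;
	  }
	work (state);
      }
  }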
[PATCH] Remove restriction of SLP vectorizing internal function calls
We already checked for unsupported internal throwing calls; general
nothrow is not required.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-21  Richard Biener

	* tree-vect-slp.c (vect_build_slp_tree_1): Remove superfluous
	gimple_call_nothrow_p check.
---
 gcc/tree-vect-slp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 709bcb63686..eff46405e87 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -975,7 +975,6 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
		   (gimple_call_internal_fn (call_stmt))
	      || gimple_call_tail_p (call_stmt)
	      || gimple_call_noreturn_p (call_stmt)
-	      || !gimple_call_nothrow_p (call_stmt)
	      || gimple_call_chain (call_stmt))
	    {
	      if (dump_enabled_p ())
--
2.31.1
[committed] openmp: For default(none) ignore variables created by ubsan_create_data [PR64888]
Hi!

We weren't ignoring the ubsan variables created by c-ubsan.c before
gimplification (others are added later).  One way to fix this would be
to introduce further UBSAN_ internal functions and lower it later
(sanopt pass) like other ifns; this patch instead recognizes those magic
vars by name/name of type and DECL_ARTIFICIAL and TYPE_ARTIFICIAL.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to
trunk.

2021-10-21  Jakub Jelinek

	PR middle-end/64888
gcc/c-family/
	* c-omp.c (c_omp_predefined_variable): Return true also for
	ubsan_create_data created artificial variables.
gcc/testsuite/
	* c-c++-common/ubsan/pr64888.c: New test.

--- gcc/c-family/c-omp.c.jj	2021-10-15 11:59:15.753689459 +0200
+++ gcc/c-family/c-omp.c	2021-10-20 12:25:44.557498097 +0200
@@ -2860,13 +2860,44 @@ c_omp_predefined_variable (tree decl)
 {
   if (VAR_P (decl)
       && DECL_ARTIFICIAL (decl)
-      && TREE_READONLY (decl)
       && TREE_STATIC (decl)
-      && DECL_NAME (decl)
-      && (DECL_NAME (decl) == ridpointers[RID_C99_FUNCTION_NAME]
-	  || DECL_NAME (decl) == ridpointers[RID_FUNCTION_NAME]
-	  || DECL_NAME (decl) == ridpointers[RID_PRETTY_FUNCTION_NAME]))
-    return true;
+      && DECL_NAME (decl))
+    {
+      if (TREE_READONLY (decl)
+	  && (DECL_NAME (decl) == ridpointers[RID_C99_FUNCTION_NAME]
+	      || DECL_NAME (decl) == ridpointers[RID_FUNCTION_NAME]
+	      || DECL_NAME (decl) == ridpointers[RID_PRETTY_FUNCTION_NAME]))
+	return true;
+      /* For UBSan handle the same also ubsan_create_data created
+	 variables.  There is no magic flag for those, but user variables
+	 shouldn't be DECL_ARTIFICIAL or have TYPE_ARTIFICIAL type with
+	 such names.  */
+      if ((flag_sanitize & (SANITIZE_UNDEFINED
+			    | SANITIZE_UNDEFINED_NONDEFAULT)) != 0
+	  && DECL_IGNORED_P (decl)
+	  && !TREE_READONLY (decl)
+	  && TREE_CODE (DECL_NAME (decl)) == IDENTIFIER_NODE
+	  && TREE_CODE (TREE_TYPE (decl)) == RECORD_TYPE
+	  && TYPE_ARTIFICIAL (TREE_TYPE (decl))
+	  && TYPE_NAME (TREE_TYPE (decl))
+	  && TREE_CODE (TYPE_NAME (TREE_TYPE (decl))) == TYPE_DECL
+	  && DECL_NAME (TYPE_NAME (TREE_TYPE (decl)))
+	  && (TREE_CODE (DECL_NAME (TYPE_NAME (TREE_TYPE (decl))))
+	      == IDENTIFIER_NODE))
+	{
+	  tree id1 = DECL_NAME (decl);
+	  tree id2 = DECL_NAME (TYPE_NAME (TREE_TYPE (decl)));
+	  if (IDENTIFIER_LENGTH (id1) >= sizeof ("ubsan_data") - 1
+	      && IDENTIFIER_LENGTH (id2) >= sizeof ("__ubsan__data")
+	      && !memcmp (IDENTIFIER_POINTER (id2), "__ubsan_",
+			  sizeof ("__ubsan_") - 1)
+	      && !memcmp (IDENTIFIER_POINTER (id2) + IDENTIFIER_LENGTH (id2)
+			  - sizeof ("_data") + 1, "_data",
+			  sizeof ("_data") - 1)
+	      && strstr (IDENTIFIER_POINTER (id1), "ubsan_data"))
+	    return true;
+	}
+    }
   return false;
 }
--- gcc/testsuite/c-c++-common/ubsan/pr64888.c.jj	2021-10-20 11:48:18.715240008 +0200
+++ gcc/testsuite/c-c++-common/ubsan/pr64888.c	2021-10-20 11:47:57.670539595 +0200
@@ -0,0 +1,27 @@
+/* PR middle-end/64888 */
+/* { dg-do compile { target fopenmp } } */
+/* { dg-options "-fopenmp -fsanitize=undefined" } */
+
+int a, b;
+
+void
+foo ()
+{
+  int c;
+#pragma omp parallel default (none) shared (a, b) private (c)
+  {
+    c = a / b;		/* { dg-bogus "not specified in enclosing" } */
+    (void) c;
+  }
+#pragma omp task default (none) shared (a, b) private (c)
+  {
+    c = a << b;		/* { dg-bogus "not specified in enclosing" } */
+    (void) c;
+  }
+#pragma omp teams default (none) shared (a, b)
+  {
+    int d[a];		/* { dg-bogus "not specified in enclosing" } */
+    d[0] = 0;
+    (void) d[0];
+  }
+}

	Jakub
[PATCH] c++: Implement DR2351 - void{} [PR102820]
Hi!

Here is an attempt to implement DR2351 - void{} - where void{} after
pack expansion is considered valid and the same thing as void().

For templates, dunno if we have some better way to check if a
CONSTRUCTOR might be empty after pack expansion.  Would that be only if
the constructor only contains EXPR_PACK_EXPANSION elements and nothing
else, or something else too?  With the patch as is we wouldn't diagnose

  template <typename... T> void bar (T... t) { void{1, t...}; }

at parsing time, only at instantiation time, even when it will always
expand to at least one CONSTRUCTOR elt.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2021-10-21  Jakub Jelinek

	PR c++/102820
	* semantics.c (finish_compound_literal): Implement DR2351 -
	void{}.  If type is cv void and compound_literal has no elements,
	return void_node.  If type is cv void and compound_literal is
	instantiation dependent, handle it like other dependent compound
	literals.

	* g++.dg/cpp0x/dr2351.C: New test.

--- gcc/cp/semantics.c.jj	2021-10-15 11:58:45.079131947 +0200
+++ gcc/cp/semantics.c	2021-10-20 17:00:38.586705600 +0200
@@ -3104,9 +3104,20 @@ finish_compound_literal (tree type, tree

   if (!TYPE_OBJ_P (type))
     {
-      if (complain & tf_error)
-	error ("compound literal of non-object type %qT", type);
-      return error_mark_node;
+      /* DR2351 */
+      if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
+	return void_node;
+      else if (VOID_TYPE_P (type)
+	       && processing_template_decl
+	       && instantiation_dependent_expression_p (compound_literal))
+	/* If there are packs in compound_literal, it could
+	   be void{} after pack expansion.  */;
+      else
+	{
+	  if (complain & tf_error)
+	    error ("compound literal of non-object type %qT", type);
+	  return error_mark_node;
+	}
     }

   if (template_placeholder_p (type))
--- gcc/testsuite/g++.dg/cpp0x/dr2351.C.jj	2021-10-20 17:06:02.399162937 +0200
+++ gcc/testsuite/g++.dg/cpp0x/dr2351.C	2021-10-20 17:05:54.294276511 +0200
@@ -0,0 +1,36 @@
+// DR2351
+// { dg-do compile { target c++11 } }
+
+void
+foo ()
+{
+  void{};
+  void();
+}
+
+template <typename... T>
+void
+bar (T... t)
+{
+  void{t...};
+  void(t...);
+}
+
+void
+baz ()
+{
+  bar ();
+}
+
+template <typename... T>
+void
+qux (T... t)
+{
+  void{t...};	// { dg-error "compound literal of non-object type" }
+}
+
+void
+corge ()
+{
+  qux (1, 2);
+}

	Jakub
Re: [PATCH] Fix loop split incorrect count and probability
On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
>
> On 2021/9/23 20:17, Richard Biener wrote:
>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
>>>
>>> On 2021/8/11 17:16, Richard Biener wrote:
On Wed, 11 Aug 2021, Xionghu Luo wrote:
>
> On 2021/8/10 22:47, Richard Biener wrote:
>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>
>>> Thanks,
>>>
>>> On 2021/8/6 19:46, Richard Biener wrote:
On Tue, 3 Aug 2021, Xionghu Luo wrote:
> loop split condition is moved between loop1 and loop2, the split bb's
> count and probability should also be duplicated instead of
> (100% vs INV), secondly, the original loop1 and loop2 count need be
> proportional from the original loop.
>
> diff base/loop-cond-split-1.c.151t.lsplit
> patched/loop-cond-split-1.c.151t.lsplit:
> ...
>   int prephitmp_16;
>   int prephitmp_25;
>
>   [local count: 118111600]:
>   if (n_7(D) > 0)
>     goto ; [89.00%]
>   else
>     goto ; [11.00%]
>
>   [local count: 118111600]:
>   return;
>
>   [local count: 105119324]:
>   pretmp_3 = ga;
>
> - [local count: 955630225]:
> + [local count: 315357973]:
>   # i_13 = PHI
>   # prephitmp_12 = PHI
>   if (prephitmp_12 != 0)
>     goto ; [33.00%]
>   else
>     goto ; [67.00%]
>
> - [local count: 315357972]:
> + [local count: 104068130]:
>   _2 = do_something ();
>   ga = _2;
>
> - [local count: 955630225]:
> + [local count: 315357973]:
>   # prephitmp_5 = PHI
>   i_10 = inc (i_13);
>   if (n_7(D) > i_10)
>     goto ; [89.00%]
>   else
>     goto ; [11.00%]
>
>   [local count: 105119324]:
>   goto ; [100.00%]
>
> - [local count: 850510901]:
> + [local count: 280668596]:
>   if (prephitmp_12 != 0)
> -    goto ; [100.00%]
> +    goto ; [33.00%]
>   else
> -    goto ; [INV]
> +    goto ; [67.00%]
>
> - [local count: 850510901]:
> + [local count: 280668596]:
>   goto ; [100.00%]
>
> - [count: 0]:
> + [local count: 70429947]:
>   # i_23 = PHI
>   # prephitmp_25 = PHI
>
> - [local count: 955630225]:
> + [local count: 640272252]:
>   # i_15 = PHI
>   # prephitmp_16 = PHI
>   i_22 = inc (i_15);
>   if (n_7(D) > i_22)
>     goto ; [89.00%]
>   else
>     goto ; [11.00%]
>
> - [local count: 850510901]:
> + [local count: 569842305]:
>   goto ; [100.00%]
>
> }
>
> gcc/ChangeLog:
>
>	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>	(do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 16
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3a09bbc39e5..8e5a7ded0f7 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>    basic_block cond_bb;
>>>
>>>	  if (!initial_true)
>>> -	    cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>> +	    cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>> +
>>> +	  edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>> +			     ? EDGE_SUCC (bbs[i], 0)
>>> +			     : EDGE_SUCC (bbs[i], 1);
>>>
>
>	  class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -					    profile_probability::always (),
> -					    profile_probability::always (),
> -					    profile_probability::always (),
> -					    profile_probability::always (),
> +					    true_edge->probability,
> +					    true_edge->probability.invert (),
> +					    true_edge->probability,
> +					    true_edge->probability.invert (),
>					    true);

there is no 'true_edge' variable at t
Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F
On Thu, Oct 21, 2021 at 1:53 PM Hongyu Wang wrote:
>
> Yes, updated patch.
>
> gcc/ChangeLog:
>
>	PR target/102812
>	* config/i386/i386.c (ix86_get_ssemov): Adjust HFmode vector
>	move to use the same logic as HImode.
>
> gcc/testsuite/ChangeLog:
>
>	PR target/102812
>	* gcc.target/i386/pr102812.c: New test.
> ---
>  gcc/config/i386/i386.c                   | 15
>  gcc/testsuite/gcc.target/i386/pr102812.c | 12
>  2 files changed, 24 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102812.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 9cc903e826b..159684ce549 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5399,9 +5399,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>    switch (scalar_mode)
>      {
>      case E_HFmode:
> -      opcode = (misaligned_p
> -		? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> -		: "vmovdqa64");
> +      if (evex_reg_p)
> +	opcode = (misaligned_p
> +		  ? (TARGET_AVX512BW
> +		     ? "vmovdqu16"
> +		     : "vmovdqu64")
> +		  : "vmovdqa64");
> +      else
> +	opcode = (misaligned_p
> +		  ? (TARGET_AVX512BW
> +		     ? "vmovdqu16"
> +		     : "%vmovdqu")
> +		  : "%vmovdqa");
>        break;

Assume gmail has swallowed the tabs and there's no indent issue in the
original code.  If that's the case, LGTM.

>      case E_SFmode:
>        opcode = misaligned_p ? "%vmovups" : "%vmovaps";
> diff --git a/gcc/testsuite/gcc.target/i386/pr102812.c b/gcc/testsuite/gcc.target/i386/pr102812.c
> new file mode 100644
> index 000..bad4fa9394e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr102812.c
> @@ -0,0 +1,12 @@
> +/* PR target/102812 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse4 -mno-avx" } */
> +/* { dg-final { scan-assembler-not "vmovdqa64\t" } } */
> +/* { dg-final { scan-assembler "movdqa\t" } } */
> +
> +typedef _Float16 v8hf __attribute__((__vector_size__ (16)));
> +
> +v8hf t (_Float16 a)
> +{
> +    return (v8hf) {a, 0, 0, 0, 0, 0, 0, 0};
> +}
> --
> 2.18.1
>
> Hongtao Liu via Gcc-patches wrote on Thursday, 21 October 2021 at 1:24 PM:
> >
> > On Wed, Oct 20, 2021 at 1:31 PM Hongyu Wang via Gcc-patches wrote:
> > >
> > > Since _Float16 type is enabled under sse2 target, returning
> > > V8HFmode vector without AVX512F target would generate wrong
> > > vmovdqa64 instruction.  Adjust ix86_get_ssemov to avoid this.
> > >
> > > Bootstraped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> > >
> > > OK for master?
> > >
> > > gcc/ChangeLog:
> > >
> > >	PR target/102812
> > >	* config/i386/i386.c (ix86_get_ssemov): Adjust HFmode vector
> > >	move without AVX512F target.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >	PR target/102812
> > >	* gcc.target/i386/pr102812.c: New test.
> > > ---
> > >  gcc/config/i386/i386.c                   |  9 ++++---
> > >  gcc/testsuite/gcc.target/i386/pr102812.c | 12 ++++++++++
> > >  2 files changed, 18 insertions(+), 3 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102812.c
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 9cc903e826b..1d79180da9a 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -5399,9 +5399,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
> > >    switch (scalar_mode)
> > >      {
> > >      case E_HFmode:
> > > -      opcode = (misaligned_p
> > > -		? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> > > -		: "vmovdqa64");
> > > +      if (!TARGET_AVX512F)
> > > +	opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +      else
> > > +	opcode = (misaligned_p
> > > +		  ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
> > > +		  : "vmovdqa64");
> > >        break;
> >
> > Could we just use similar logic as HI?
> >
> >      case E_HImode:
> >        if (evex_reg_p)
> >	  opcode = (need_unaligned_p
> >		    ? (TARGET_AVX512BW
> >		       ? "vmovdqu16"
> >		       : "vmovdqu64")
> >		    : "vmovdqa64");
> >        else
> >	  opcode = (need_unaligned_p
> >		    ? (TARGET_AVX512BW
> >		       ? "vmovdqu16"
> >		       : "%vmovdqu")
> >		    : "%vmovdqa");
> >        break;
> >
> > >      case E_SFmode:
> > >        opcode = misaligned_p ? "%vmovups" : "%vmovaps";
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr102812.c b/gcc/testsuite/gcc.target/i386/pr102812.c
> > > new file mode 100644
> > > index 000..bad4fa9394e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr102812.c
> > > @@ -0,0 +1,12 @@
> > > +/* PR target/102812 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -msse4 -mno-avx" } */
> > > +/* { dg-final { scan-assembler-not "vmovdqa64\t" } } */
> > > +/* { dg-final { scan-assemble
[committed] testsuite: Fix up gfortran.dg/gomp/strictly*.f90 testcases
Hi!

I meant it in the other testcases too: while these testcases are dg-do
compile only, I think it is better not to give users bad examples and
avoid unnecessary data races in testcases (unless it is exactly what we
want to test).  Perhaps one day we'll do some analysis and warn about
data races...

Tested on x86_64-linux, committed to trunk.

2021-10-21  Jakub Jelinek

	* gfortran.dg/gomp/strictly-structured-block-1.f90: Use
	call do_work instead of x = x + 1 in places where the latter could
	be a data race.
	* gfortran.dg/gomp/strictly-structured-block-2.f90: Likewise.
	* gfortran.dg/gomp/strictly-structured-block-3.f90: Likewise.

--- gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90.jj	2021-10-21 10:23:27.544833287 +0200
+++ gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-1.f90	2021-10-21 11:03:41.146041455 +0200
@@ -9,23 +9,23 @@ program main

   !$omp parallel
   block
-    x = x + 1
+    call do_work
   end block

   !$omp parallel
   block
-    x = x + 1
+    call do_work
   end block
   !$omp end parallel

   !$omp teams
   block
-    x = x + 1
+    call do_work
   end block

   !$omp teams
   block
-    x = x + 1
+    call do_work
   end block
   !$omp end teams

@@ -42,12 +42,12 @@ program main

   !$omp scope
   block
-    x = x + 1
+    call do_work
   end block

   !$omp scope
   block
-    x = x + 1
+    call do_work
   end block
   !$omp end scope

@@ -75,12 +75,12 @@ program main

   !$omp task
   block
-    x = x + 1
+    call do_work
   end block

   !$omp task
   block
-    x = x + 1
+    call do_work
   end block
   !$omp end task

@@ -130,23 +130,23 @@ program main

   !$omp target parallel
   block
-    x = x + 1
+    call do_work
   end block

   !$omp target parallel
   block
-    x = x + 1
+    call do_work
   end block
   !$omp end target parallel

   !$omp target teams
   block
-    x = x + 1
+    call do_work
   end block

   !$omp target teams
   block
-    x = x + 1
+    call do_work
   end block
   !$omp end target teams

@@ -176,7 +176,7 @@ program main
   do i = 1, n
     !$omp ordered
     block
-      x = x + 1
+      call do_work
     end block
   end do

@@ -184,7 +184,7 @@ program main
   do i = 1, n
     !$omp ordered
     block
-      x = x + 1
+      call do_work
     end block
     !$omp end ordered
   end do
--- gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90.jj	2021-10-21 10:23:27.544833287 +0200
+++ gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-2.f90	2021-10-21 11:05:04.326876447 +0200
@@ -9,16 +9,16 @@ program main

   !$omp parallel
   block
-    x = x + 1
+    call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end parallel ! { dg-error "Unexpected !.OMP END PARALLEL statement" }

   !$omp teams
   block
-    x = x + 1
+    call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end teams ! { dg-error "Unexpected !.OMP END TEAMS statement" }

   !$omp masked
@@ -30,9 +30,9 @@ program main

   !$omp scope
   block
-    x = x + 1
+    call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end scope ! { dg-error "Unexpected !.OMP END SCOPE statement" }

   !$omp single
@@ -51,9 +51,9 @@ program main

   !$omp task
   block
-    x = x + 1
+    call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end task ! { dg-error "Unexpected !.OMP END TASK statement" }

   !$omp target data map(x)
@@ -86,16 +86,16 @@ program main

   !$omp target parallel
   block
-    x = x + 1
+    call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end target parallel ! { dg-error "Unexpected !.OMP END TARGET PARALLEL statement" }

   !$omp target teams
   block
-    x = x + 1
+    call do_work
   end block
-  x = x + 1
+  call do_work
   !$omp end target teams ! { dg-error "Unexpected !.OMP END TARGET TEAMS statement" }

   !$omp critical
@@ -116,9 +116,9 @@ program main
   do i = 1, n
     !$omp ordered
     block
-      x = x + 1
+      call do_work
     end block
-    x = x + 1
+    call do_work
     !$omp end ordered ! { dg-error "Unexpected !.OMP END ORDERED statement" }
   end do
--- gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90.jj	2021-10-21 10:23:27.544833287 +0200
+++ gcc/testsuite/gfortran.dg/gomp/strictly-structured-block-3.f90	2021-10-21 11:07:21.441956054 +0200
@@ -10,7 +10,7 @@ program main
   !$omp parallel
   !$omp parallel
   block
-    x = x + 1
+    call do_work
   end block
   !$omp end parallel
   !$omp end parallel

@@ -32,7 +32,7 @@ program main
     end block
     x = x + 2
     !$omp section
-    x = x + 4
+    call do_work
   end block
   !$omp sections

	Jakub
[PATCH] tree-optimization/102847 - properly cost VMAT_INVARIANT loads
The following adds proper costing of VMAT_INVARIANT loads, avoiding
asking the target about the cost of an unsupported vector load which we
won't emit anyway.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-21  Richard Biener

	PR tree-optimization/102847
	* tree-vect-stmts.c (vect_model_load_cost): Explicitly handle
	VMAT_INVARIANT as a splat in the prologue.
---
 gcc/tree-vect-stmts.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9cbc1af4cc9..8f527452bd0 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1191,6 +1191,13 @@ vect_model_load_cost (vec_info *vinfo,
					ncopies * assumed_nunits,
					scalar_load, stmt_info, 0, vect_body);
     }
+  else if (memory_access_type == VMAT_INVARIANT)
+    {
+      /* Invariant loads will ideally be hoisted and splat to a vector.  */
+      prologue_cost += record_stmt_cost (cost_vec, 1,
+					 scalar_to_vec, stmt_info, 0,
+					 vect_prologue);
+    }
   else
     vect_get_load_cost (vinfo, stmt_info, ncopies,
			alignment_support_scheme, misalignment, first_stmt_p,
--
2.31.1
Re: [PATCH] options: Fix variable tracking option processing.
On Wed, Oct 20, 2021 at 10:51 AM Martin Liška wrote:
>
> On 10/19/21 12:53, Richard Biener wrote:
> > Meh ... :/
> >
> > Well, move the target override hook call down (try to shuffle things
> > so diagnostics happen after but "inits" happen before).
>
> Not so easy.  There are direct usages of the hooks (influences
> dwarf2out_as_loc_support and dwarf2out_as_locview_support):
>
>    if (!OPTION_SET_P (dwarf2out_as_loc_support))
>      dwarf2out_as_loc_support = dwarf2out_default_as_loc_support ();
>    if (!OPTION_SET_P (dwarf2out_as_locview_support))
>      dwarf2out_as_locview_support = dwarf2out_default_as_locview_support ();
>
>    if (!OPTION_SET_P (debug_variable_location_views))
>      {
>        debug_variable_location_views
>          = (flag_var_tracking
>             && debug_info_level >= DINFO_LEVEL_NORMAL
>             && dwarf_debuginfo_p ()
>             && !dwarf_strict
>             && dwarf2out_as_loc_support
>             && dwarf2out_as_locview_support);
>      }
>
> and then the warnings depend on debug_variable_location_views.
>
> I have another attempt which is about moving option detection of
> debug_nonbind_markers_p to finish_options.  That works fine, except one
> needs to mark the option as PerFunction.  That's because it depends on
> 'optimize' and that would trigger: 'global_options are modified in
> local context' verification error.

That looks like a sensible change anyway - options that depend on
options that are per function have to be per function as well.  It also
depends on flag_selective_scheduling.

But note the checks now happen before

  if (flag_syntax_only)
    {
      write_symbols = NO_DEBUG;
      profile_flag = 0;
    }

  if (flag_gtoggle)
    {
      if (debug_info_level == DINFO_LEVEL_NONE)
	{
	  debug_info_level = DINFO_LEVEL_NORMAL;
	  if (write_symbols == NO_DEBUG)
	    write_symbols = PREFERRED_DEBUGGING_TYPE;
	}
      else
	debug_info_level = DINFO_LEVEL_NONE;
    }

which previously affected debug_nonbind_markers_p.  I think it makes
sense to move the above to finish_options as well.  I suppose -help
doesn't correctly dump the -g enabled state for -gtoggle at the moment?

Richard.

> What do you think about the patch?
> Cheers,
> Martin
[PATCH] tree-optimization/102847 - adjust VMAT_INVARIANT load costing
This adds the missing scalar load cost in the prologue.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2021-10-21  Richard Biener

	PR tree-optimization/102847
	* tree-vect-stmts.c (vect_model_load_cost): Add the scalar
	load cost in the prologue for VMAT_INVARIANT.
---
 gcc/tree-vect-stmts.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 8f527452bd0..c28c9370655 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1194,6 +1194,9 @@ vect_model_load_cost (vec_info *vinfo,
   else if (memory_access_type == VMAT_INVARIANT)
     {
       /* Invariant loads will ideally be hoisted and splat to a vector.  */
+      prologue_cost += record_stmt_cost (cost_vec, 1,
+					 scalar_load, stmt_info, 0,
+					 vect_prologue);
       prologue_cost += record_stmt_cost (cost_vec, 1,
					 scalar_to_vec, stmt_info, 0,
					 vect_prologue);
--
2.31.1
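The shape being costed by the two VMAT_INVARIANT hunks, for reference:
'g' is loaded once (the scalar_load now counted in the prologue),
broadcast to a vector (scalar_to_vec), and reused unchanged across all
vector iterations.

  extern int g;

  void
  fill (int *__restrict out, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      out[i] = g;
  }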
[PATCH] Move the initial debug_hooks setting
I just realized that when I moved the langhook call I failed to move the
initial debug_hooks setting whose comment mentions the langhook as
reason.

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

2021-10-21  Richard Biener

	* toplev.c (process_options): Move the initial debug_hooks
	setting ...
	(toplev::main): ... before the call of the post_options
	langhook.
---
 gcc/toplev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index cb4f8c470f0..67fb71612d5 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1227,9 +1227,6 @@ static void
 process_options (bool no_backend)
 {
   const char *language_string = lang_hooks.name;
-  /* Just in case lang_hooks.post_options ends up calling a debug_hook.
-     This can happen with incorrect pre-processed input.  */
-  debug_hooks = &do_nothing_debug_hooks;

   maximum_field_alignment = initial_max_fld_align * BITS_PER_UNIT;

@@ -2338,6 +2335,9 @@ toplev::main (int argc, char **argv)
   /* Exit early if we can (e.g. -help).  */
   if (!exit_after_options)
     {
+      /* Just in case lang_hooks.post_options ends up calling a debug_hook.
+	 This can happen with incorrect pre-processed input.  */
+      debug_hooks = &do_nothing_debug_hooks;
       /* Allow the front end to perform consistency checks and do further
	 initialization based on the command line options.  This hook also
	 sets the original filename if appropriate (e.g. foo.i -> foo.c)
--
2.31.1
Re: [PATCH] Try to resolve paths in threader without looking further back.
On Wed, Oct 20, 2021 at 10:19 PM Jeff Law wrote:
> So we want to keep some form of ssa-dom-thread-7.  That's the canonical
> testcase for the case for the FSM optimization.
>
> What we need to verify is that we thread jumps across the backedge of
> the loop through the switch statement to a particular case (thus
> bypassing the indirect jump for the switch statement).  How to do that
> in a way that's easier to manage?  I have no clue.  I guess a gimple-fe
> based test might help.

Ah, I see.

After lots of pain, I was able to distill a tiny testcase that does just
that (tree-ssa/ssa-thread-backedge.c), and is easy to maintain.  I've
added a "backedge" marker to the path dumping code for easy matching in
the test.

An alternative would be to convert it to a gimple FE test examining the
exact thread sequence through the backedge, but I think that's overkill
since the test is so small.

Phew, I think we're finally converging on a useful set of threading
tests :).

OK for trunk?
Aldy

From d6a3114b160079a60d968cd21950a33116361d75 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Wed, 20 Oct 2021 07:29:25 +0200
Subject: [PATCH] Try to resolve paths in threader without looking further
 back.

Sometimes we can solve a candidate path without having to recurse
further back.  This can mostly happen in fully resolving mode, because
we can ask the ranger what the range on entry to the path is, but
there's no reason this can't always apply.  This one-liner removes the
fully-resolving restriction.

I'm tickled pink to see how many things we now get quite early in the
compilation.  I actually had to disable jump threading entirely for a
few tests because the early threader was catching things disturbingly
early.  Also, as Richi predicted, I saw a lot of pre-VRP cleanups
happening.

I was going to commit this as obvious, but I think the test changes
merit discussion.

We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
some time.  Every time a threading pass gets smarter, we push the check
further down the pipeline.  We've officially run out of dumb threading
passes to disable ;-).  In the last year we've gone up from a handful of
threads, to 34 threads with the current combination of options.  I doubt
this is testing anything useful anymore, so I've removed it.

Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
jump threads, but they were disallowed because of loop rotation.  Then
we started catching more jump threads in VRP2 threading so we tested
there.  With this patch though, we triple the number of threads found
from 11 to 31.  I believe this test has outlived its usefulness, and
I've removed it.  Note that even though we have these outrageous
possibilities for this test, the block copier ultimately chops them
down (23 survive though).

The ssa-dom-thread-7.c is the canonical test for the FSM optimization,
but it has become a maintenance burden.  The number of threads has been
consistently growing over the years, and there's no clean way to test
that we're actually threading through the backedges.  I have replaced
this test with a much simpler one (ssa-thread-backedge.c) that tests
that we can thread across the backedge, through the switch statement
and into a particular case.

All in all, I believe the simpler jump threading tests, as well as the
gimple FE tests I've added, more than adequately cover us.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (back_threader::find_paths_to_names):
	Always try to resolve path without looking back.
	* tree-ssa-threadupdate.c (dump_jump_thread): Indicate whether
	edge is a back edge.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-dsyr2k-2.c: Adjust for jump threading changes.
	* gcc.dg/graphite/scop-dsyr2k.c: Same.
	* gcc.dg/graphite/scop-dsyrk-2.c: Same.
	* gcc.dg/graphite/scop-dsyrk.c: Same.
	* gcc.dg/tree-ssa/pr20701.c: Same.
	* gcc.dg/tree-ssa/pr20702.c: Same.
	* gcc.dg/tree-ssa/pr21086.c: Same.
	* gcc.dg/tree-ssa/pr25382.c: Same.
	* gcc.dg/tree-ssa/pr58480.c: Same.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
	* gcc.dg/tree-ssa/vrp08.c: Same.
	* gcc.dg/tree-ssa/vrp55.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Removed.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Removed.
	* gcc.dg/tree-ssa/ssa-thread-11.c: Removed.
	* gcc.dg/uninit-pr89230-1.c: xfail.
---
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c   |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk-2.c  |   1 +
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c    |   1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr20701.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr20702.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21086.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr25382.c       |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr58480.c       |   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-4.c        |  60 --------
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        | 134 ------------------
 gcc/testsuite/gcc.dg/t
Re: [PATCH] Convert strlen pass from evrp to ranger.
On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: > > > > On 10/18/2021 2:17 AM, Aldy Hernandez wrote: > > > > > > On 10/18/21 12:52 AM, Jeff Law wrote: > >> > >> > >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: > >>> The following patch converts the strlen pass from evrp to ranger, > >>> leaving DOM as the last remaining user. > >> So is there any reason why we can't convert DOM as well? DOM's use > >> of EVRP is pretty limited. You've mentioned FP bits before, but my > >> recollection is those are not part of the EVRP analysis DOM uses. > >> Hell, give me a little guidance and I'll do the work... > > > > Not only will I take you up on that offer, but I can provide 90% of > > the work. Here be dragons, though (well, for me, maybe not for you ;-)). > > > > DOM is actually an evrp pass at -O1 in disguise. The reason it really > > is a covert evrp pass is because: > > > > a) It calls extract_range_from_stmt on each statement. > > > > b) It folds conditionals with simplify_using_ranges. > > > > c) But most importantly, it exports discovered ranges when it's done > > (evrp_range_analyzer(true)). > > > > If you look at the evrp pass, you'll notice that that's basically what > > it does, albeit with the substitute and fold engine, which also calls > > gimple fold plus other goodies. > > > > But I could argue that we've made DOM into an evrp pass without > > noticing. The last item (c) is particularly invasive because these > > exported ranges show up in other passes unexpectedly. For instance, I > > saw an RTL pass at -O1 miss an optimization because it was dependent > > on some global range being set. IMO, DOM should not export global > > ranges it discovered during its walk (do one thing and do it well), > > but I leave it to you experts to pontificate. > All true. But I don't think we've got many, if any, hard dependencies > on those behaviors. > > > > > The attached patch is rather trivial. It's mostly deleting state. It > > seems DOM spends a lot of time massaging the IL so that it can fold > > conditionals or thread paths. None of this is needed, because the > > ranger can do all of this. Well, except floats, but... > Massaging the IL should only take two forms IIRC. > > First, if we have a simplification we can do. That could be const/copy > propagation, replacing an expression with an SSA_NAME or constant and > the like. It doesn't massage the IL just to massage the IL. > > Second, we do temporarily copy propagate the current known values of an > SSA name into use points and then see if that allows us to determine if > a statement is already in the hash tables. But we undo that so that > nobody should see that temporary change in state. Are you sure we still do that? I can't find it at least. > > Finally, it does create some expressions & statements on the fly to > enter them into the tables. For example, if it sees a store, it'll > create a load with the source & dest interchanged and enter that into > the expression table. But none of this stuff ever shows up in the IL. > It's just to create entries in the expression tables. > > So ITSM the only real concern would be if those temporary const/copy > propagations were still in the IL and we called back into Ranger and it > poked at that data somehow. > > > > That's the good news. The bad news is that DOM changes the IL as it > > goes and the patch doesn't bootstrap. 
Andrew insists that we should > > work even with DOM's changing IL, but last time we played this dance > > with the substitute_and_fold engine, there were some tweaks needed to > > the ranger. Could be this, but I haven't investigated. It could also > > be that the failures I was seeing were just DOM things that were no > > longer needed (shuffling the IL to simplify things for evrp). > So if we're referring to those temporary const/copy propagations > "escaping" into Ranger, then I would fully expect that to cause > problems. Essentially they're path sensitive const/copy propagations > and may not be valid on all the paths through the CFG to the statement > where the propagation occurs > > > > > > > This just needs a little shepherding from a DOM expert ;-). If you > > get it to bootstrap, I could take care of the tests, performance, and > > making sure we're getting the same number of threads etc. > > > >> > >>> > >>> No additional cleanups have been done. For example, the strlen pass > >>> still has uses of VR_ANTI_RANGE, and the sprintf still passes around > >>> pairs of integers instead of using a proper range. Fixing this > >>> could further improve these passes. > >>> > >>> As a further enhancement, if the relevant maintainers deem useful, > >>> the domwalk could be removed from strlen. That is, unless the pass > >>> needs it for something else. > >> The dom walk was strictly for the benefit of EVRP when it was added. > >> So I think it can get zapped once the pass is converted. > > > > Jakub mentioned a while ago, that the strlen pass itse
[RFC PATCH 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. The patch is a bit rough around the edges, but produces the correct results as far as I can tell. However, I couldn't quite figure out how to modify the patterns so that the offset will be moved into the immediate offset field of the LDR instructions, so currently, the ADD of the offset is always a distinct instruction. As for the spilling issues that have been fixed in this code in the past: I suppose a register carrying the TLS register value will never get spilled to begin with? How about a register that carries TLS+? Comments/suggestions welcome. Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h | 6 +++ gcc/config/arm/arm.c | 39 + gcc/config/arm/arm.md | 44 ++-- gcc/config/arm/arm.opt| 22 ++ gcc/doc/invoke.texi | 9 5 files changed, 116 insertions(+), 4 deletions(-) -- 2.30.2 $ cat |arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls -mstack-protector-guard-offset=10 -mtp=cp15 -S -o - -xc - -fstack-protector-all -O3 int foo(void *); int bar(void) { return foo(__builtin_thread_pointer()) + 1; } .arch armv7-a .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .file "" .text .align 2 .global bar .syntax unified .arm .type bar, %function bar: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 0, uses_anonymous_args = 0 push{r4, lr} mrc p15, 0, r4, c13, c0, 3 @ load_tp_hard add r3, r4, #10 sub sp, sp, #8 mov r0, r4 add r4, r4, #10 ldr r3, [r3] str r3, [sp, #4] mov r3, #0 bl foo ldr r3, [r4] ldr r4, [sp, #4] eorsr3, r4, r3 mov r4, #0 bne .L5 add r0, r0, #1 add sp, sp, #8 @ sp needed pop {r4, pc} .L5: bl __stack_chk_fail .size bar, .-bar .ident "GCC: (GNU) 12.0.0 20211019 (experimental)" .section.note.GNU-stack,"",%progbits
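For readers outside the kernel, a rough sketch of the arrangement the cover letter describes. The struct below is invented purely for illustration (it is not a real Linux kernel type); the canary is placed at offset 10 only to line up with the -mstack-protector-guard-offset=10 compile example above:

/* TPIDRURO is made to point at a per-process record like this while
   running in the kernel; the canary is then a fixed offset away,
   which is what -mstack-protector-guard-offset= tells the compiler.  */
struct __attribute__ ((packed)) task_metadata
{
  char other_state[10];
  unsigned long stack_canary;	/* at TPIDRURO + 10 */
};

/* On a context switch the kernel writes the incoming task's record
   address to TPIDRURO, so each process's canary loads and checks see
   that task's own value, instead of one global value that can never
   be rotated while the system is up.  */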
[RFC PATCH 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-10-20 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm.c (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_stack_protect_guard): New. (TARGET_STACK_PROTECT_GUARD): Define. * config/arm/arm.md (reg_stack_protect_address): New. (stack_protect_set): Adjust for SSP_GLOBAL. (stack_protect_test): Likewise. * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h | 6 +++ gcc/config/arm/arm.c | 39 + gcc/config/arm/arm.md | 44 ++-- gcc/config/arm/arm.opt| 22 ++ gcc/doc/invoke.texi | 9 5 files changed, 116 insertions(+), 4 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e51f60a1841d..deccc88e006e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. 
*/ static struct obstack minipool_obstack; @@ -3157,6 +3160,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3824,6 +3847,10 @@ arm_option_reconfigure_globals (void) else target_thread_pointer = TP_SOFT; } + + if (arm_stack_protector_guard == SSP_TLSREG + && target_thread_pointer != TP_CP15) +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the @@ -34052,6 +34079,18 @@ arm_run_selftests (void) } } /* Namespace selftest. */ +/* Implement TARGET_STACK_PROTECT_GUARD. In case of a + global variable based guard use the default else + return a null tree. */ +static tree +arm_stack_protect_guard (void) +{ + if (arm_stack_protector_guard == SSP_GLOBAL) +return default_stack_protect_guard (); + + return NULL_TREE; +} + #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests #endif /* CHECKING_P */ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 4adc976b8b67..f57e1db07e6a 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -9183,8 +9183,21 @@ UNSPEC_SP_SET)) (clobber (match_scratch:SI 2 "")) (clobber (match_scratch:SI 3 ""))])] + "arm_stack_protector_guard == SSP_GLOBAL" "" - "" +) + +(define_expand "stack_protect_set" + [(match_operand:SI 0 "memory_operand") + (match_operand:SI 1 "")] + "arm_stack_protector_guard == SSP_TLSREG" + " +{ + rtx tp_reg = gen_reg
Re: [PATCH] Fix loop split incorrect count and probability
On Thu, 21 Oct 2021, Xionghu Luo wrote: > > > On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote: > > > > > > On 2021/9/23 20:17, Richard Biener wrote: > >> On Wed, 22 Sep 2021, Xionghu Luo wrote: > >> > >>> > >>> > >>> On 2021/8/11 17:16, Richard Biener wrote: > On Wed, 11 Aug 2021, Xionghu Luo wrote: > > > > > > > On 2021/8/10 22:47, Richard Biener wrote: > >> On Mon, 9 Aug 2021, Xionghu Luo wrote: > >> > >>> Thanks, > >>> > >>> On 2021/8/6 19:46, Richard Biener wrote: > On Tue, 3 Aug 2021, Xionghu Luo wrote: > > > loop split condition is moved between loop1 and loop2, the split > > bb's > > count and probability should also be duplicated instead of (100% vs > > INV), > > secondly, the original loop1 and loop2 count need be propotional > > from > > the > > original loop. > > > > > > diff base/loop-cond-split-1.c.151t.lsplit > > patched/loop-cond-split-1.c.151t.lsplit: > > ... > > int prephitmp_16; > > int prephitmp_25; > > > >[local count: 118111600]: > > if (n_7(D) > 0) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > >[local count: 118111600]: > > return; > > > >[local count: 105119324]: > > pretmp_3 = ga; > > > > - [local count: 955630225]: > > + [local count: 315357973]: > > # i_13 = PHI > > # prephitmp_12 = PHI > > if (prephitmp_12 != 0) > > goto ; [33.00%] > > else > > goto ; [67.00%] > > > > - [local count: 315357972]: > > + [local count: 104068130]: > > _2 = do_something (); > > ga = _2; > > > > - [local count: 955630225]: > > + [local count: 315357973]: > > # prephitmp_5 = PHI > > i_10 = inc (i_13); > > if (n_7(D) > i_10) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > >[local count: 105119324]: > > goto ; [100.00%] > > > > - [local count: 850510901]: > > + [local count: 280668596]: > > if (prephitmp_12 != 0) > > -goto ; [100.00%] > > +goto ; [33.00%] > > else > > -goto ; [INV] > > +goto ; [67.00%] > > > > - [local count: 850510901]: > > + [local count: 280668596]: > > goto ; [100.00%] > > > > - [count: 0]: > > + [local count: 70429947]: > > # i_23 = PHI > > # prephitmp_25 = PHI > > > > - [local count: 955630225]: > > + [local count: 640272252]: > > # i_15 = PHI > > # prephitmp_16 = PHI > > i_22 = inc (i_15); > > if (n_7(D) > i_22) > > goto ; [89.00%] > > else > > goto ; [11.00%] > > > > - [local count: 850510901]: > > + [local count: 569842305]: > > goto ; [100.00%] > > > > } > > > > gcc/ChangeLog: > > > > * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. > > (do_split_loop_on_cond): Likewise. > > --- > > gcc/tree-ssa-loop-split.c | 16 > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > > index 3a09bbc39e5..8e5a7ded0f7 100644 > > --- a/gcc/tree-ssa-loop-split.c > > +++ b/gcc/tree-ssa-loop-split.c > > @@ -583,10 +583,10 @@ split_loop (class loop *loop1) > > basic_block cond_bb; > >>> > >>> if (!initial_true) > >>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >>> + > >>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE > >>> +? EDGE_SUCC (bbs[i], 0) > >>> +: EDGE_SUCC (bbs[i], 1); > >>> > > > > class loop *loop2 = loop_version (loop1, cond, &cond_bb, > > - profile_probability::always > > (), > > - profile_probability::always > > (), > > - profile_probability::always > > (), > > - profile_probability::always > > (), >
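For context, the shape of the loop being split here, reconstructed from the dump above. Names like ga, do_something and inc match the quoted IL, but this is a sketch, not the actual loop-cond-split-1.c source:

int ga;
extern int do_something (void);
extern int inc (int);

void
f (int n)
{
  /* Once "ga" becomes zero it stays zero, so the loop can be split
     into a first copy where the condition may still hold and a second
     copy where it is known false.  The patch is about keeping the two
     copies' profile counts proportional to the original loop instead
     of the bogus 100%/INV split shown in the dump.  */
  for (int i = 0; n > i; i = inc (i))
    if (ga)
      ga = do_something ();
}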
[PATCH] sra: Fix corner case of total scalarization with virtual inheritance (PR 102505)
Hi, PR 102505 is a situation where SRA takes its initial top-level access size from a get_ref_base_and_extent called on a COMPONENT_REF, and thus derived from the FIELD_DECL, which however does not include a virtual base. Total scalarization then goes on traversing the type, which however has a virtual base past the non-virtual bits, tricking SRA into creating sub-accesses outside of the supposedly encompassing accesses, which in turn triggers the verifier within the pass. The patch below fixes that by failing total scalarization when this situation is detected. Bootstrapped and tested on x86_64-linux, OK for trunk and (after testing there) on the branches? Thanks, Martin gcc/ChangeLog: 2021-10-20 Martin Jambor PR tree-optimization/102505 * tree-sra.c (totally_scalarize_subtree): Check that the encountered field fits within the access we would like to put it in. gcc/testsuite/ChangeLog: 2021-10-20 Martin Jambor PR tree-optimization/102505 * g++.dg/torture/pr102505.C: New test. --- gcc/testsuite/g++.dg/torture/pr102505.C | 15 +++ gcc/tree-sra.c | 2 ++ 2 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/torture/pr102505.C diff --git a/gcc/testsuite/g++.dg/torture/pr102505.C b/gcc/testsuite/g++.dg/torture/pr102505.C new file mode 100644 index 000..a846751a0d6 --- /dev/null +++ b/gcc/testsuite/g++.dg/torture/pr102505.C @@ -0,0 +1,15 @@ +struct D { int i; int pad alignas(16); }; +struct B : virtual D +{ + int j =84; + int k =84; +}; + +struct C: B { }; + +int main() +{ + C c; + if (c.j != 84 || c.k != 84) +__builtin_abort(); +} diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 9b786e29e4e..f561e1a2133 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -3288,6 +3288,8 @@ totally_scalarize_subtree (struct access *root) continue; HOST_WIDE_INT pos = root->offset + int_bit_position (fld); + if (pos + fsize > root->size) + return false; enum total_sra_field_state state = total_should_skip_creating_access (root, &last_seen_sibling, -- 2.33.0
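To make the layout mismatch concrete, a small compilable illustration of the testcase's types. The printed sizes and offsets are one plausible Itanium C++ ABI layout, not verified compiler output: with a typical layout the virtual D subobject sits past B's non-virtual part, so walking TYPE_FIELDS of B reaches field positions beyond the extent of the B FIELD_DECL, which is exactly the "pos + fsize > root->size" case the patch now rejects.

#include <cstdio>

struct D { int i; int pad alignas(16); };
struct B : virtual D { int j = 84; int k = 84; };
struct C : B { };

int
main ()
{
  C c;
  std::printf ("sizeof(B)=%zu sizeof(C)=%zu\n", sizeof (B), sizeof (C));
  /* Offset of the virtual base's field within the complete object.  */
  std::printf ("D::i lives at offset %td within C\n",
	       (char *) &static_cast<D &> (c).i - (char *) &c);
  return 0;
}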
Re: [PATCH v4] Improve integer bit test on __atomic_fetch_[or|and]_* returns
On Wed, Oct 13, 2021 at 8:34 PM Richard Biener via Gcc-patches wrote: > > On Sun, Oct 10, 2021 at 3:49 PM H.J. Lu wrote: > > > > Changes in v4: > > > > 1. Bypass redundant check when inputs have been transformed to the > > equivalent canonical form with valid bit operation. > > > > Changes in v3: > > > > 1. Check invalid bit operation. > > > > commit adedd5c173388ae505470df152b9cb3947339566 > > Author: Jakub Jelinek > > Date: Tue May 3 13:37:25 2016 +0200 > > > > re PR target/49244 (__sync or __atomic builtins will not emit 'lock > > bts/btr/btc') > > > > optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns > > with lock bts/btr/btc by turning > > > > mask_2 = 1 << cnt_1; > > _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3); > > _5 = _4 & mask_2; > > > > into > > > > _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3); > > _5 = _4; > > > > and > > > > mask_6 = 1 << bit_5(D); > > _1 = ~mask_6; > > _2 = __atomic_fetch_and_4 (v_8(D), _1, 0); > > _3 = _2 & mask_6; > > _4 = _3 != 0; > > > > into > > > > mask_6 = 1 << bit_5(D); > > _1 = ~mask_6; > > _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0); > > _4 = _11 != 0; > > > > But it failed to optimize many equivalent, but slightly different cases: > > > > 1. > > _1 = __atomic_fetch_or_4 (ptr_6, 1, _3); > > _4 = (_Bool) _1; > > 2. > > _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3); > > _4 = (_Bool) _1; > > 3. > > _1 = __atomic_fetch_or_4 (ptr_6, 1, _3); > > _7 = ~_1; > > _5 = (_Bool) _7; > > 4. > > _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3); > > _7 = ~_1; > > _5 = (_Bool) _7; > > 5. > > _1 = __atomic_fetch_or_4 (ptr_6, 1, _3); > > _2 = (int) _1; > > _7 = ~_2; > > _5 = (_Bool) _7; > > 6. > > _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3); > > _2 = (int) _1; > > _7 = ~_2; > > _5 = (_Bool) _7; > > 7. > > _1 = _atomic_fetch_or_4 (ptr_6, mask, _3); > > _2 = (int) _1; > > _5 = _2 & mask; > > 8. > > _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3); > > _5 = (signed int) _1; > > _4 = _5 < 0; > > 9. > > _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3); > > _5 = (signed int) _1; > > _4 = _5 < 0; > > 10. > > _1 = 1 << bit_4(D); > > mask_5 = (unsigned int) _1; > > _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0); > > _3 = _2 & mask_5; > > 11. > > mask_7 = 1 << bit_6(D); > > _1 = ~mask_7; > > _2 = (unsigned int) _1; > > _3 = __atomic_fetch_and_4 (v_9(D), _2, 0); > > _4 = (int) _3; > > _5 = _4 & mask_7; > > > > We make > > > > mask_2 = 1 << cnt_1; > > _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3); > > _5 = _4 & mask_2; > > > > and > > > > mask_6 = 1 << bit_5(D); > > _1 = ~mask_6; > > _2 = __atomic_fetch_and_4 (v_8(D), _1, 0); > > _3 = _2 & mask_6; > > _4 = _3 != 0; > > > > the canonical forms for this optimization and transform cases 1-9 to the > > equivalent canonical form. For cases 10 and 11, we simply remove the cast > > before __atomic_fetch_or_4/__atomic_fetch_and_4 with > > > > _1 = 1 << bit_4(D); > > _2 = __atomic_fetch_or_4 (v_7(D), _1, 0); > > _3 = _2 & _1; > > > > and > > > > mask_7 = 1 << bit_6(D); > > _1 = ~mask_7; > > _3 = __atomic_fetch_and_4 (v_9(D), _1, 0); > > _6 = _3 & mask_7; > > _5 = (int) _6; > > > > gcc/ > > > > PR middle-end/102566 > > * tree-ssa-ccp.c (convert_atomic_bit_not): New function. > > (optimize_atomic_bit_test_and): Transform equivalent, but slightly > > different cases to their canonical forms. > > > > gcc/testsuite/ > > > > PR middle-end/102566 > > * g++.target/i386/pr102566-1.C: New test. > > * g++.target/i386/pr102566-2.C: Likewise. > > * g++.target/i386/pr102566-3.C: Likewise.
> > * g++.target/i386/pr102566-4.C: Likewise. > > * g++.target/i386/pr102566-5a.C: Likewise. > > * g++.target/i386/pr102566-5b.C: Likewise. > > * g++.target/i386/pr102566-6a.C: Likewise. > > * g++.target/i386/pr102566-6b.C: Likewise. > > * gcc.target/i386/pr102566-1a.c: Likewise. > > * gcc.target/i386/pr102566-1b.c: Likewise. > > * gcc.target/i386/pr102566-2.c: Likewise. > > * gcc.target/i386/pr102566-3a.c: Likewise. > > * gcc.target/i386/pr102566-3b.c: Likewise. > > * gcc.target/i386/pr102566-4.c: Likewise. > > * gcc.target/i386/pr102566-5.c: Likewise. > > * gcc.target/i386/pr102566-6.c: Likewise. > > * gcc.target/i386/pr102566-7.c: Likewise. > > * gcc.target/i386/pr102566-8a.c: Likewise. > > * gcc.target/i386/pr102566-8b.c: Likewise. > > * gcc.target/i386/pr102566-9a.c: Likewise. > > * gcc.target/i386/pr102566-9b.c: Likewise. > > * gcc.target/i386/pr102566-10a.c: Likewise. > > * gcc.target/i386/pr102566-10b.c: Likewise. > > * gcc.target/i386/pr102566-11.c: Likewise. > > * gcc.target/i386/pr102566-12.c: Likewise. > > --- > > gcc/testsuite/g++.target/i386/pr102566-1.C
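At the source level, the canonical form and one of the newly handled variants correspond to patterns like the following. This is a sketch written against the documented __atomic builtins, not a reproduction of the pr102566 tests themselves:

#include <stdbool.h>

/* The canonical form: only the tested bit of the old value is
   observed, so x86 can emit "lock bts" and read the carry flag
   instead of doing a full fetch_or into a register.  */
bool
test_and_set_bit (unsigned *p, unsigned bit)
{
  unsigned mask = 1u << bit;
  return (__atomic_fetch_or (p, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* Roughly case 8 above: the "< 0" test observes only bit 31 of the
   old value, so it is equivalent to a bit-test-and-set of bit 31.  */
bool
test_and_set_sign_bit (unsigned *p)
{
  return (int) __atomic_fetch_or (p, 0x80000000u, __ATOMIC_SEQ_CST) < 0;
}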
Re: [PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]
On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote: > or an unevaluated operand, or a subexpression of an immediate invocation. > > Hmm...that suggests that in consteval23.C, bar(foo) should also be OK, Ouch. That opens a big can of worms, see below. > because 'foo' is a subexpression of an immediate invocation. We can handle > that by... > > > + > > +bool > > +in_immediate_context () > > +{ > > + return (cp_unevaluated_operand != 0 > > + || (current_function_decl != NULL_TREE > > + && DECL_IMMEDIATE_FUNCTION_P (current_function_decl)) > > + || (current_binding_level->kind == sk_function_parms > > + && current_binding_level->immediate_fn_ctx_p) > > + || in_consteval_if_p); > > +} > > + > > /* Return true if a call to FN with number of arguments NARGS > > is an immediate invocation. */ > > @@ -9451,6 +9459,12 @@ build_over_call (struct z_candidate *can > > } > > /* Default arguments */ > > + bool save_in_consteval_if_p = in_consteval_if_p; > > + /* If the call is immediate function invocation, make sure > > + taking address of immediate functions is allowed in default > > + arguments. */ > > + if (immediate_invocation_p (STRIP_TEMPLATE (fn), nargs)) > > +in_consteval_if_p = true; > > ...moving this earlier in build_over_call, e.g. shortly after > not_really_used: > > You can also use make_temp_override. make_temp_override can't be used, because in_consteval_if_p is BOOL_BITFIELD under the hood and binding a reference to a bit-field is not valid. In the patch I've used a specialized in_consteval_if_p_temp_override class instead. And it either needs to be restored after all the argument processing, because we call immediate_invocation_p again and because of the in_consteval_if_p override it is then false (the patch calls icip.reset ()), or we'd need to remember the result of the first immediate_invocation_p call in some bool temporary and reuse that later. Updated patch below. This fixes the static_assert (bar (foo) == 42); case newly added to consteval23.C, but unfortunately that isn't enough. Consider incremental patch to the testcase: --- gcc/testsuite/g++.dg/cpp2a/consteval23.C.jj 2021-10-21 11:32:48.570586881 +0200 +++ gcc/testsuite/g++.dg/cpp2a/consteval23.C2021-10-21 12:47:42.281769906 +0200 @@ -2,6 +2,7 @@ // { dg-do compile { target c++20 } } consteval int foo () { return 42; } +constexpr auto baz (int (*fn) ()) { return fn; } consteval int bar (int (*fn) () = foo) @@ -11,3 +12,5 @@ bar (int (*fn) () = foo) static_assert (bar () == 42); static_assert (bar (foo) == 42); +static_assert (bar (&foo) == 42); +static_assert (bar (baz (&foo)) == 42); If even the last two lines are supposed to be valid, then I'm afraid we need to reconsider the errors performed in cp_build_addr_expr_1, because when that is called, we often don't know if we are in a subexpression of immediate invocation or not, whether it is because we are in template, something non-type dependent but other argumeents of the function call are type dependent, or because it is an overloaded function and some overloads are consteval and others are not and we are deep in a middle of some argument parsing and other arguments haven't been parsed at all yet, etc. I'm afraid I don't have a good idea where to move that diagnostic to though, it would need to be done somewhere where we are certain we aren't in a subexpression of immediate invocation. Given statement expressions, even diagnostics after parsing whole statements might not be good enough, e.g. 
void qux () { static_assert (bar (({ constexpr auto a = 1; foo; })) == 42); } But if we e.g. diagnose it somewhere after parsing the whole function (and for templates only after instantiating it) plus where we handle non-automatic variable initializers etc., will we handle all spots where we should diagnose it? It better should be done before cp_fold... In whatever spot is right doing something similar to what cp_walk_tree find_immediate_fndecl, does, except probably error out on each case we see (with the right locus) instead of just finding the first match. Note, trying clang++ on godbolt on the consteval23.C testcase with the above ammendments, it diagnoses the = foo default argument, that seems to be a clang bug to me, but if I comment out the static_assert (bar () == 42); test and comment out the default argument bar (int (*fn) () /* = foo */) then it succeeds. It even accepts the above qux with statement expression. 2021-10-21 Jakub Jelinek PR c++/102753 * cp-tree.h (saved_scope): Document that consteval_if_p member is also set while processing immediate invocation. (in_immediate_context): Declare. * call.c (in_immediate_context): New function. (immediate_invocation_p): Use it. (struct in_consteval_if_p_temp_override): New class. (build_over_call): Temporarily set in_consteval_if_p for processing immediate inv
Re: [PATCH] sra: Fix corner case of total scalarization with virtual inheritance (PR 102505)
On Thu, Oct 21, 2021 at 12:57 PM Martin Jambor wrote: > > Hi, > > PR 102505 is a situation where of SRA takes its initial top-level > access size from a get_ref_base_and_extent called on a COMPONENT_REF, > and thus derived frm the FIELD_DECL, which however does not include a > virtual base. Total scalarization then goes on traversing the type, > which however has virtual base past the non-virtual bits, tricking SRA > to create sub-accesses outside of the supposedly encompassing > accesses, which in turn triggers the verifier within the pass. > > The patch below fixes that by failing total scalarization when this > situation is detected. > > Bootstrapped and tested on x86_64-linux, OK for trunk and (after testing > there) on the branches? OK. > Thanks, > > Martin > > > gcc/ChangeLog: > > 2021-10-20 Martin Jambor > > PR tree-optimization/102505 > * tree-sra.c (totally_scalarize_subtree): Check that the > encountered field fits within the acces we would like to put it > in. > > gcc/testsuite/ChangeLog: > > 2021-10-20 Martin Jambor > > PR tree-optimization/102505 > * g++.dg/torture/pr102505.C: New test. > --- > gcc/testsuite/g++.dg/torture/pr102505.C | 15 +++ > gcc/tree-sra.c | 2 ++ > 2 files changed, 17 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/torture/pr102505.C > > diff --git a/gcc/testsuite/g++.dg/torture/pr102505.C > b/gcc/testsuite/g++.dg/torture/pr102505.C > new file mode 100644 > index 000..a846751a0d6 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/torture/pr102505.C > @@ -0,0 +1,15 @@ > +struct D { int i; int pad alignas(16); }; > +struct B : virtual D > +{ > + int j =84; > + int k =84; > +}; > + > +struct C: B { }; > + > +int main() > +{ > + C c; > + if (c.j != 84 || c.k != 84) > +__builtin_abort(); > +} > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index 9b786e29e4e..f561e1a2133 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -3288,6 +3288,8 @@ totally_scalarize_subtree (struct access *root) > continue; > > HOST_WIDE_INT pos = root->offset + int_bit_position (fld); > + if (pos + fsize > root->size) > + return false; > enum total_sra_field_state > state = total_should_skip_creating_access (root, > &last_seen_sibling, > -- > 2.33.0 >
[COMMITTED] Revert the avoid threading circular paths commit.
I've tested this patch on the wrong tree, and picked up the test changes in a pending patch, without which this patch is no longer obvious. Plus, it causes a regression in an invalid test I've recommended we remove. I'm reverting this patch until the dependencies are reviewed. Sorry for the noise. gcc/ChangeLog: * tree-ssa-threadbackward.c (back_threader::maybe_register_path): Remove circular paths check. --- gcc/tree-ssa-threadbackward.c | 4 1 file changed, 4 deletions(-) diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c index 38913b06717..d94e3b962db 100644 --- a/gcc/tree-ssa-threadbackward.c +++ b/gcc/tree-ssa-threadbackward.c @@ -140,10 +140,6 @@ back_threader::maybe_register_path () if (taken_edge && taken_edge != UNREACHABLE_EDGE) { - // Avoid circular paths. - if (m_visited_bbs.contains (taken_edge->dest)) - return UNREACHABLE_EDGE; - bool irreducible = false; bool profitable = m_profit.profitable_path_p (m_path, m_name, taken_edge, &irreducible); -- 2.31.1
[PATCH] aarch64: Remove redundant struct type definitions in arm_neon.h
Hi, As subject, this patch deletes some redundant type definitions in arm_neon.h. These vector type definitions are an artifact from the initial commit that added the AArch64 port. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-10-15 Jonathan Wright * config/aarch64/arm_neon.h (__STRUCTN): Delete function macro and all invocations. [Attachment: rb14942.patch]
Re: [PATCH] X86: Add an option -muse-unaligned-vector-move
On Thu, Oct 21, 2021 at 12:15 AM Richard Biener wrote: > > On Wed, Oct 20, 2021 at 8:34 PM H.J. Lu wrote: > > > > On Wed, Oct 20, 2021 at 9:58 AM Richard Biener > > wrote: > > > > > > On October 20, 2021 3:19:28 PM GMT+02:00, "H.J. Lu" > > > wrote: > > > >On Wed, Oct 20, 2021 at 4:18 AM Richard Biener > > > > wrote: > > > >> > > > >> On Wed, Oct 20, 2021 at 12:40 PM Xu Dianhong > > > >> wrote: > > > >> > > > > >> > Many thanks for your explanation. I got the meaning of operands. > > > >> > The "addpd b(%rip), %xmm0" instruction needs "b(%rip)" aligned > > > >> > otherwise it will rise a "Real-Address Mode Exceptions". > > > >> > I haven't considered this situation "b(%rip)" has an address > > > >> > dependence of "a(%rip)" before. I think this situation could be > > > >> > resolved on the assembler side except for this dummy code like > > > >> > "movapd 0x200b37(%rip),%xmm1, ... addpd 0x200b37(%rip),%xmm0 ". > > > >> > > > >> Of course the compiler will only emit instructions which have the > > > >> constraint of aligned memory > > > >> when the memory is known to be aligned. That's why I wonder why you > > > >> would need such > > > >> option. "Real-Address Mode Exceptions" may point to the issue, but I > > > >> wonder what's different > > > >> in real mode vs. protected mode - even with segmentation the alignment > > > >> of objects should > > > >> prevail unless you play linker"tricks" that make global objects have > > > >> different alignment - but > > > >> then it's better to adjust the respective hooks to not falsely claim > > > >> such alignment. Consider > > > >> for example > > > >> > > > >>if ((uintptr_t)&a & 0x7) > > > >> foo(); > > > >> else > > > >> bar(); > > > >> > > > >> GCC will optimize the branch statically to always call foo if 'a' > > > >> appears to be aligned, > > > >> even if you later try to "override" this with an option. Alignment is > > > >> not only about > > > >> moves, it's also about knowledge about low bits in addresses and about > > > >> alias analysis where alignment constrains how two objects can overlap. > > > >> > > > >> So - do not lie to the compiler! A late "workaround" avoiding aligned > > > >> SSE moves isn't a proper fix. > > > >> > > > > > > > >The motivations are > > > > > > > >1. AVX non-load/store ops work on unaligned memory. Unaligned > > > >load/store on aligned memory is as fast as aligned load/store on Intel > > > >AVX machines. The new switch makes load/store consistent with > > > >other AVX ops. > > > >2. We don't properly align the stack for AVX on Windows. This can > > > >be used as a workaround for -mavx on Windows. > > > > > > But this, with lying that the stack is aligned, causes all of the above > > > mentioned issues and thus needs to be fixed by either properly aligning > > > the stack or not lying to the compiler that we do. > > > > > > > > > > >We can change TARGET_USE_UNALIGNED_VECTOR_MOVE > > > >to require AVX. > > > > > > But such workaround does not make any sense since it does not fix the > > > fundamental underlying problem. > > > > > > > There is a long standing desire to remove alignment checking (#AC(0)). > > For integer operations, alignment checking is disabled in hardware. > > For AVX ops, alignment checking is disabled in hardware for non-load/store > > instructions. But we can't disable alignment checking in hardware for > > aligned load/store instructions. -muse-unaligned-vector-move implements > > disabling alignment checking for all AVX ops. > > No, it does not - it just emits unaligned moves. 
The compiler still assumes > aligned memory. So whatever reason you have for disabling alignment > checking for memory that is known to be aligned, I don't see it. > > If you want to "fix" broken user code then this doesn't do it. > > If you want to avoid the penalty for runtime stack alignment then you simply > have to change the ABI(?) to not require vector types to have big alignment. > > Let's drop it. We will find another way. Thanks. -- H.J.
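To restate Richard's point with a self-contained example (a sketch, not a test from the thread): alignment knowledge feeds folding and alias analysis long before any move instruction is chosen, so a late unaligned-move switch cannot retract it.

double a[4] __attribute__ ((aligned (16)));

extern void foo (void);
extern void bar (void);

void
check (void)
{
  /* With the declared 16-byte alignment, GCC folds this test at
     compile time and emits an unconditional call to foo (); emitting
     unaligned vector moves later changes nothing about that.  */
  if (((__UINTPTR_TYPE__) &a & 0x7) == 0)
    foo ();
  else
    bar ();
}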
Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
On Wed, 20 Oct 2021, Andre Vieira (lists) wrote: > On 15/10/2021 09:48, Richard Biener wrote: > > On Tue, 12 Oct 2021, Andre Vieira (lists) wrote: > > > >> Hi Richi, > >> > >> I think this is what you meant, I now hide all the unrolling cost > >> calculations > >> in the existing target hooks for costs. I did need to adjust 'finish_cost' > >> to > >> take the loop_vinfo so the target's implementations are able to set the > >> newly > >> renamed 'suggested_unroll_factor'. > >> > >> Also added the checks for the epilogue's VF. > >> > >> Is this more like what you had in mind? > > Not exactly (sorry..). For the target hook I think we don't want to > > pass vec_info but instead another output parameter like the existing > > ones. > > > > vect_estimate_min_profitable_iters should then via > > vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll > > suggestion to vect_analyze_loop which should then, if the suggestion > > was > 1, instead of iterating to the next vector mode run again > > with a fixed VF (old VF times suggested unroll factor - there's > > min_vf in vect_analyze_loop_2 which we should adjust to > > the old VF times two for example and maybe store the suggested > > factor as hint) - if it succeeds the result will end up in the > > list of considered modes (where we now may have more than one > > entry for the same mode but a different VF), we probably want to > > only consider more unrolling once. > > > > For simplicity I'd probably set min_vf = max_vf = old VF * suggested > > factor, thus take the targets request literally. > > > > Richard. > > Hi, > > I now pass an output parameter to finish_costs and route it through the > various calls up to vect_analyze_loop. I tried to rework > vect_determine_vectorization_factor and noticed that merely setting min_vf and > max_vf is not enough, we only use these to check whether the vectorization > factor is within range, well actually we only use max_vf at that stage. We > only seem to use 'min_vf' to make sure the data_references are valid. I am > not sure my changes are the most appropriate here, for instance I am pretty > sure the checks for max and min vf I added in > vect_determine_vectorization_factor are currently superfluous as they will > pass by design, but thought they might be good future proofing? Ah, ok - well, max_vf is determined by dependence analysis so we do have to check that. I think that + && known_le (unrolled_vf, min_vf)) is superfluous. So if we use min_vf as passed to vect_analyze_loop_2 to carry the suggested unroll factor then the changes look reasonable. What's if (max_vf != MAX_VECTORIZATION_FACTOR - && maybe_lt (max_vf, min_vf)) + && loop_vinfo->suggested_unroll_factor == 1 + && max_vf < estimated_poly_value (min_vf, POLY_VALUE_MAX)) return opt_result::failure_at (vect_location, "bad data dependence.\n"); LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) = max_vf; supposed to be though? Why can we allow the unroll case to get past this point? I suppose initializing max_vf to MAX_VECTORIZATION_FACTOR doesn't play well with poly-ints? That is, how does estimated_poly_value (min_vf, POLY_VALUE_MAX) differ here? I would have expected that we can drop the max_vf != MAX_VECTORIZATION_FACTOR part but otherwise leave the test the same? Note you adjust the vectorization factor in vect_determine_vectorization_factor but I think you have to delay that until after vect_update_vf_for_slp which means doing it before the start_over: label. 
@@ -3038,6 +3203,18 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) if (res) { + /* Only try unrolling main loops. */ + if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)) + { + opt_loop_vec_info unrolled_vinfo = + vect_try_unrolling (loop_vinfo, &n_stmts, + suggested_unroll_factor); + if (unrolled_vinfo) + loop_vinfo = unrolled_vinfo; + /* Reset suggested_unroll_factor for next loop_vinfo. */ + suggested_unroll_factor = 1; + } so it looks like this eventually will leak 'loop_vinfo' but it also looks like you throw away the not unrolled analysis and costing. I was suggesting to simply add the unrolled variant to the alternatives considered - the flow of vect_analyze_loop is already quite complicated and the changes following this hunk do not help :/ (and I'm not sure what vect_reanalyze_as_main_loop is supposed to do or why we need to consider unrolling there as well). If we don't want to globally consider the unrolled variant maybe we can at least decide between the unrolled and not unrolled variant in vect_try_unrolling? But of course only the whole series, (unrolled) main + vectorized epilogues, determine the final cost. That said, the overall flow is OK now, some details about the max_vf check and where to compute the unrolled VF needs to be fleshed out. And then there's
Re: [RFC] vect: Convert cost hooks to classes
On Thu, 14 Oct 2021, Richard Sandiford wrote: > The current vector cost interface has a quite a bit of redundancy > built in. Each target that defines its own hooks has to replicate > the basic unsigned[3] management. Currently each target also > duplicates the cost adjustment for inner loops. > > This patch instead defines a vector_costs class for holding > the scalar or vector cost and allows targets to subclass it. > There is then only one costing hook: to create a new costs > structure of the appropriate type. Everything else can be > virtual functions, with common concepts implemented in the > base class rather than in each target's derivation. > > This might seem like excess C++-ification, but it shaves > ~100 LOC. I've also got some follow-on changes that become > significantly easier with this patch. Maybe it could help > with things like weighting blocks based on frequency too. > > This will clash with Andre's unrolling patches. His patches > have priority so this patch should queue behind them. > > The x86 and rs6000 parts fully convert to a self-contained class. > The equivalent aarch64 changes are more complex, so this patch > just does the bare minimum. A later patch will rework the > aarch64 bits. > > Tested on aarch64-linux-gnu, arm-linux-gnueabihf, x86_64-linux-gnu > and powerpc64le-linux-gnu. WDYT? I like it! Thus OK. I suggested sth similar to Martin for the backend state '[PATCH 3/N] Come up with casm global state.', abstracting varasm global state and allowing targets to override this via the adjusted init_sections target hook. Richard. > Richard > > > gcc/ > * target.def (targetm.vectorize.init_cost): Replace with... > (targetm.vectorize.create_costs): ...this. > (targetm.vectorize.add_stmt_cost): Delete. > (targetm.vectorize.finish_cost): Likewise. > (targetm.vectorize.destroy_cost_data): Likewise. > * doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): Replace with... > (TARGET_VECTORIZE_CREATE_COSTS): ...this. > (TARGET_VECTORIZE_ADD_STMT_COST): Delete. > (TARGET_VECTORIZE_FINISH_COST): Likewise. > (TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise. > * doc/tm.texi: Regenerate. > * tree-vectorizer.h (vec_info::vec_info): Remove target_cost_data > parameter. > (vec_info::target_cost_data): Change from a void * to a vector_costs *. > (vector_costs): New class. > (init_cost): Take a vec_info and return a vector_costs. > (dump_stmt_cost): Remove data parameter. > (add_stmt_cost): Replace vinfo and data parameters with a vector_costs. > (add_stmt_costs): Likewise. > (finish_cost): Replace data parameter with a vector_costs. > (destroy_cost_data): Delete. > * tree-vectorizer.c (dump_stmt_cost): Remove data argument and > don't print it. > (vec_info::vec_info): Remove the target_cost_data parameter and > initialize the member variable to null instead. > (vec_info::~vec_info): Delete target_cost_data instead of calling > destroy_cost_data. > (vector_costs::add_stmt_cost): New function. > (vector_costs::finish_cost): Likewise. > (vector_costs::record_stmt_cost): Likewise. > (vector_costs::adjust_cost_for_freq): Likewise. > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update > call to vec_info::vec_info. > (vect_compute_single_scalar_iteration_cost): Update after above > changes to costing interface. > (vect_analyze_loop_operations): Likewise. > (vect_estimate_min_profitable_iters): Likewise. > (vect_analyze_loop_2): Initialize LOOP_VINFO_TARGET_COST_DATA > at the start_over point, where it needs to be recreated after > trying without slp. Update retry code accordingly. 
> * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Update call > to vec_info::vec_info. > (vect_slp_analyze_operation): Update after above changes to costing > interface. > (vect_bb_vectorization_profitable_p): Likewise. > * targhooks.h (default_init_cost): Replace with... > (default_vectorize_create_costs): ...this. > (default_add_stmt_cost): Delete. > (default_finish_cost, default_destroy_cost_data): Likewise. > * targhooks.c (default_init_cost): Replace with... > (default_vectorize_create_costs): ...this. > (default_add_stmt_cost): Delete, moving logic to vector_costs instead. > (default_finish_cost, default_destroy_cost_data): Delete. > * config/aarch64/aarch64.c (aarch64_vector_costs): Inherit from > vector_costs. Add a constructor. > (aarch64_init_cost): Replace with... > (aarch64_vectorize_create_costs): ...this. > (aarch64_add_stmt_cost): Replace with... > (aarch64_vector_costs::add_stmt_cost): ...this. Use record_stmt_cost > to adjust the cost for inner loops. > (aarch64_finish_cost): Replace with... > (aarch64_vector_costs::finish_c
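For readers not following the patch itself, a self-contained miniature of the pattern being introduced. The class and hook names echo the ChangeLog above; the members, signatures, and behavior here are simplified stand-ins, not GCC's actual declarations:

#include <memory>

enum cost_phase { vect_prologue, vect_body, vect_epilogue };

class vector_costs
{
public:
  virtual ~vector_costs () = default;

  /* The unsigned[3] bookkeeping every target used to duplicate now
     lives once, in the base class.  */
  virtual unsigned add_stmt_cost (int count, cost_phase where, unsigned base)
  {
    unsigned retval = count * base;
    m_costs[where] += retval;
    return retval;
  }
  virtual void finish_cost () {}
  unsigned cost (cost_phase where) const { return m_costs[where]; }

protected:
  unsigned m_costs[3] {};
};

/* A target overrides only the virtuals it cares about...  */
class aarch64_vector_costs : public vector_costs
{
public:
  unsigned add_stmt_cost (int count, cost_phase where, unsigned base) override
  {
    /* e.g. apply a target-specific weighting before recording.  */
    return vector_costs::add_stmt_cost (count, where, base * 2);
  }
};

/* ...and a single creation hook replaces the old init_cost /
   add_stmt_cost / finish_cost / destroy_cost_data quartet.  */
std::unique_ptr<vector_costs>
create_costs ()
{
  return std::make_unique<aarch64_vector_costs> ();
}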
Re: [PATCH] gcc: implement AIX-style constructors
Hi David, The problem is that cdtors is created by the linker only when -bcdtors flag is provided. Thus, if we add "extern void (* _cdtors[]) (void);" to the "crtcxa.c", we can't use it without using the new constructor types. One solution would be to create another crtcxa (e.g. crtcxa_cdtors.o) which will be replacing the default crtcxa.o when needed. I didn't think about that when creating the patch. But it might be a better approach. What do you think? About documentation, I wasn't aware of that. I'll do it. Clément From: David Edelsohn Sent: Tuesday, October 19, 2021 10:14 PM To: CHIGOT, CLEMENT Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] gcc: implement AIX-style constructors Clement, + /* Use __C_runtime_pstartup to run ctors and register dtors. + This whole part should normally be in libgcc but as + AIX cdtors format is currently not the default, managed + that in collect2. */ Why are you emitting the special startup function call in collect2.c instead of placing it in libgcc? The comment mentions that the special startup function should be defined in libgcc. Yes, the AIX ld bcdtors mechanism is not the default, but what is the harm? The symbol will be defined and exported by libgcc. If the AIX linker -bcdtors functionality is not invoked, the symbol is not used. And if a user does invoke the AIX linker with -bcdtors, the behavior will be the same (either the program was compiled to use AIX cdtors or not, which is the same if the startup function is emitted by collect2.c). Also, the patch should include documentation of the option. The documentation should mention that this is for interoperability with IBM XL Compiler, and the option will not operate correctly unless the application and the GCC runtime are built with the option. Thanks, David On Mon, Oct 18, 2021 at 3:55 AM CHIGOT, CLEMENT wrote: > > AIX linker now supports constructors and destructors detection. For such > functions to be detected, their name must start with __sinit or __sterm, and -bcdtors must be passed to linker calls. It will create "_cdtors" > symbol which can be used to launch the initialization. > > This patch creates a new RS6000 flag "-mcdtors=". > With "-mcdtors=aix", gcc will generate these new constructors/destructors. > With "-mcdtors=gcc", which is currently the default, gcc will continue > to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D > symbols). > Ideally, it would have been better to enable the AIX format by default > instead of using collect2. However, the compatibility between the > previously-built binaries and the new ones is too complex to be done. > > gcc/ChangeLog: > 2021-10-04 Clément Chigot > > * collect2.c (aixbcdtors_flags): New variable. > (main): Use it to detect -bcdtors and remove -binitfini flag. > (write_c_file_stat): Adapt to new AIX format. > * config/rs6000/aix.h (FILE_SINIT_FORMAT): New define. > (FILE_STERM_FORMAT): New define. > (TARGET_FILE_FUNCTION_FORMAT): New define. > * config/rs6000/aix64.opt: Add -mcdtors flag. > * config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when > -mcdtors=aix is passed. > * config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise. > * config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise. > * config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum. > * tree.c (get_file_function_name): Add > TARGET_FILE_FUNCTION_FORMAT support.
> > gcc/testsuite/ChangeLog: > 2021-10-04 Clément Chigot > > * gcc.target/powerpc/constructor-aix.c: New test. > >
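As a concrete, hedged illustration of the scheme under discussion: the __sinit/__sterm prefixes and the _cdtors symbol come from the cover text above, but the exact generated name suffixes are not shown in the thread, so none are guessed at here.

/* With -mcdtors=aix, the file-scope initialization function GCC emits
   for a translation unit containing the constructor below would be
   given a name starting with "__sinit" (and "__sterm" for the
   destructor), so that linking with -bcdtors lets the AIX linker
   collect them and define the "_cdtors" array that the startup code
   walks.  */
__attribute__ ((constructor)) static void
init_state (void)
{
  /* one-time setup */
}

__attribute__ ((destructor)) static void
release_state (void)
{
  /* teardown */
}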
Re: [PATCH 3/N] Come up with casm global state.
On Thu, Oct 21, 2021 at 11:47 AM Martin Liška wrote: > > On 10/5/21 13:54, Richard Biener wrote: > > On Mon, Oct 4, 2021 at 1:13 PM Martin Liška wrote: > >> > >> On 9/22/21 11:59, Richard Biener wrote: > >>> On Thu, Sep 16, 2021 at 3:12 PM Martin Liška wrote: > > This patch comes up with asm_out_state and a new global variable casm. > > Tested on all cross compilers. > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? > >>> > >>> * output.h (struct asm_out_file): New struct that replaces a > >>> ^^^ > >>> asm_out_state? > >> > >> Yes, sure! > >> > >>> > >>> You replace a lot of asm_out_file - do we still need the macro then? > >>> (I'd have simplified the patch leaving that replacement out at first) > >> > >> Well, I actually replaced only a very small fraction of the usage of > >> asm_out_file. > >> > >> $ git grep asm_out_file | grep -v ChangeLog | wc -l > >> > >> 1601 > >> > >> > >> So I think we should live with the macro for some time ... > >> > >>> > >>> You leave quite some global state out of the struct that is obviously > >>> related, in the diff I see object_block_htab for example. > >> > >> Yes, I'm aware of it. Can we do that incrementally? > >> > >>> Basically > >>> everything initialized in init_varasm_once is a candidate (which > >>> then shows const_desc_htab and shared_constant_pool as well > >>> as pending_assemble_externals_set). > >> > >> Well, these would probably need a different header file (or another > >> #include ... must > >> be added before output.h :// ). > >> > >>> For the goal of outputting > >>> early DWARF to another file the state CTOR could have a mode > >>> to not initialize those parts or we could have asm-out-state-with-sections > >>> as base of asm-out-state. > >> > >> Yes, right now asm_out_state ctor is minimal: > >> > >> asm_out_state (): out_file (NULL), in_section (NULL), > >> sec ({}), anchor_labelno (0), in_cold_section_p (false) > >> { > >> section_htab = hash_table::create_ggc (31); > >> } > >> > >>> > >>> In the end there will be a target part of the state so I think > >>> construction needs to be defered to the target which can > >>> derive from asm-out-state and initialize the part it needs. > >>> That's currently what targetm.asm_out.init_sections () does > >>> and we'd transform that to a targetm.asm_out.create () or so. > >>> That might already be necessary for the DWARF stuff. > >> > >> So what do you want to with content of init_varasm_once function? > > > > It goes away ;) Or rather becomes the invocation of the > > asm-out-state CTOR via the target hook. > > > >>> > >>> That said, dealing with the target stuff piecemail is OK, but maybe > >>> try to make sure that init_varasm_once is actually identical > >>> to what the CTOR does? > >> > >> So you want asm_out_state::asm_out_state doing what we current initialize > >> in init_varasm_once, right? > > > > Yes, asm_out_state should cover everything initialized in init_varasm_once > > (and the called targetm.asm_out.init_sections). > > > > targetm.asm_out.init_sections would become > > > > asm_out_state *init_sections (); > > > > where the target can derive from asm_out_state (optionally) and > > allocates the asm-out-state & invokes the CTORs. The middle-end > > visible asm_out_state would contain all the tables initialized in > > init_varasm_once. 
> > > > I'm not sure if doing it piecemail is good - we don't want to touch > > things multiple times without good need and it might be things > > turn out not be as "simple" as the above plan sounds. > > > > I would suggest to try the conversion with powerpc since it > > seems to be the only target with two different implementations > > for the target hook. > > > > The > > > > static GTY(()) section *read_only_data_section; > > static GTY(()) section *private_data_section; > > static GTY(()) section *tls_data_section; > > static GTY(()) section *tls_private_data_section; > > static GTY(()) section *read_only_private_data_section; > > static GTY(()) section *sdata2_section; > > > > section *toc_section = 0; > > > > could become #defines to static_cast > > (casm)->section_name > > and there'd be > > > > class rs6000_asm_out_state : public asm_out_state > > { > > ... add stuff ... > > } > > > > in rs6000/xcoff.h and rs6000/sysv4.h and some generic rs6000 header as well > > (adding no fields) and the target hook would basically do > > > > return ggc_new rs6000_asm_state (); > > > > (OK, ggc_alloc + placement new I guess). The default hook just > > creates asm_out_state. > > Hello. > > All right, I made a patch candidate that does the suggested steps and I ported > rs6000 target (both ELF and XCOFF sub-targets). > > So far I was able to bootstrap on ppc6le-linux-gnu. > > Thoughts? +extern GTY(()) rs6000_asm_out_state *target_casm; this seems to be unused +#define rs6000_casm static_cast (casm) maybe there's a better way? Though I ca
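To summarize the shape being discussed, a compact, compilable sketch. The types and members are illustrative stand-ins; only the names asm_out_state, rs6000_asm_out_state, casm, and the init_sections hook come from the thread:

#include <cstdio>

struct section { const char *name; };

class asm_out_state
{
public:
  virtual ~asm_out_state () = default;
  FILE *out_file = nullptr;	/* what the asm_out_file macro forwards to */
  section *in_section = nullptr;
  /* ...plus the tables init_varasm_once used to set up, e.g.
     section_htab, would live here.  */
};

/* A target derives to replace its file-scope section globals...  */
class rs6000_asm_out_state : public asm_out_state
{
public:
  section *toc_section = nullptr;
  section *sdata2_section = nullptr;
};

asm_out_state *casm;	/* the new global state pointer */

/* ...and the init_sections hook constructs the right dynamic type, so
   target code can reach its members through a cast of the global, much
   like the quoted "#define rs6000_casm static_cast<...> (casm)".  */
asm_out_state *
rs6000_init_sections ()
{
  return new rs6000_asm_out_state ();
}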
Re: [PATCH 3/N] Come up with casm global state.
On Thu, 2021-09-16 at 15:12 +0200, Martin Liška wrote: > This patch comes up with asm_out_state and a new global variable > casm. > > Tested on all cross compilers. > Patch can bootstrap on x86_64-linux-gnu and survives regression > tests. > > Ready to be installed? > Thanks, > Martin This is possibly a silly question, but what does the "c" stand for in "casm"? "compiler"? "compilation"? "common"? Sorry if it's explained somewhere, but I didn't spot this when glancing at the patches. Dave
Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F
On 10/21/21 07:47, Hongyu Wang via Gcc-patches wrote: > Yes, updated patch. Note the patch caused the following test failure: FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1 FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1 FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1 FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1 Martin
Re: [PATCH] Convert strlen pass from evrp to ranger.
On Thu, Oct 21, 2021 at 12:20 PM Richard Biener wrote: > > On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: > > > > > > > > On 10/18/2021 2:17 AM, Aldy Hernandez wrote: > > > > > > > > > On 10/18/21 12:52 AM, Jeff Law wrote: > > >> > > >> > > >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: > > >>> The following patch converts the strlen pass from evrp to ranger, > > >>> leaving DOM as the last remaining user. > > >> So is there any reason why we can't convert DOM as well? DOM's use > > >> of EVRP is pretty limited. You've mentioned FP bits before, but my > > >> recollection is those are not part of the EVRP analysis DOM uses. > > >> Hell, give me a little guidance and I'll do the work... > > > > > > Not only will I take you up on that offer, but I can provide 90% of > > > the work. Here be dragons, though (well, for me, maybe not for you ;-)). > > > > > > DOM is actually an evrp pass at -O1 in disguise. The reason it really > > > is a covert evrp pass is because: > > > > > > a) It calls extract_range_from_stmt on each statement. > > > > > > b) It folds conditionals with simplify_using_ranges. > > > > > > c) But most importantly, it exports discovered ranges when it's done > > > (evrp_range_analyzer(true)). > > > > > > If you look at the evrp pass, you'll notice that that's basically what > > > it does, albeit with the substitute and fold engine, which also calls > > > gimple fold plus other goodies. > > > > > > But I could argue that we've made DOM into an evrp pass without > > > noticing. The last item (c) is particularly invasive because these > > > exported ranges show up in other passes unexpectedly. For instance, I > > > saw an RTL pass at -O1 miss an optimization because it was dependent > > > on some global range being set. IMO, DOM should not export global > > > ranges it discovered during its walk (do one thing and do it well), > > > but I leave it to you experts to pontificate. > > All true. But I don't think we've got many, if any, hard dependencies > > on those behaviors. > > > > > > > > The attached patch is rather trivial. It's mostly deleting state. It > > > seems DOM spends a lot of time massaging the IL so that it can fold > > > conditionals or thread paths. None of this is needed, because the > > > ranger can do all of this. Well, except floats, but... > > Massaging the IL should only take two forms IIRC. > > > > First, if we have a simplification we can do. That could be const/copy > > propagation, replacing an expression with an SSA_NAME or constant and > > the like. It doesn't massage the IL just to massage the IL. > > > > Second, we do temporarily copy propagate the current known values of an > > SSA name into use points and then see if that allows us to determine if > > a statement is already in the hash tables. But we undo that so that > > nobody should see that temporary change in state. > > Are you sure we still do that? I can't find it at least. I couldn't either, but perhaps what Jeff is referring to is the ad-hoc copy propagation that happens in the DOM's use of the threader: /* Make a copy of the uses & vuses into USES_COPY, then cprop into the operands. */ FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) { tree tmp = NULL; tree use = USE_FROM_PTR (use_p); copy[i++] = use; if (TREE_CODE (use) == SSA_NAME) tmp = SSA_NAME_VALUE (use); if (tmp) SET_USE (use_p, tmp); } cached_lhs = simplifier->simplify (stmt, stmt, bb, this); /* Restore the statement's original uses/defs. 
*/ i = 0; FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) SET_USE (use_p, copy[i++]); Aldy
[PATCH 0/2] jit: Complex types and loong constants
Hi, These two patches fix support for complex types and add the possibility to make long long, long double and complex long double constants. Please see the follow-up mails. Regards,
[PATCH 1/2] jit: Complex types and loong constants
This patch fixes support for complex types. It adds the entrypoints: gcc_jit_context_new_rvalue_from_complex_double gcc_jit_context_set_bool_enable_complex_types As well as the binary operator GCC_JIT_BINARY_OP_COMPLEX, to make complex numbers from two real number rvalues. Note that the complex types already are in the type enum, so I thought the cleanest solution is to enable them, rather than to have a "get_complex_type" function. See attachment. From 6d5d7ce40f9bb84a7943e5113c8954ae4b941905 Mon Sep 17 00:00:00 2001 From: Petter Tomner Date: Fri, 15 Oct 2021 20:16:54 +0200 Subject: [PATCH 1/3] Complex Fix support for complex types The patch adds support of complex floating point types to libgccjit. A new binary operator 'COMPLEX' is added to create complex values from real values. As well as a function to create a complex double literal. To notify users if a binary linking to libgccjit depends on complex types, complex type support must be enabled with a new option function, since they already are in the types enum. Signed-off-by: 2021-10-21 Petter Tomner gcc/jit/ * jit-common.h : (INNER_BOOL_OPTION_ENABLE_COMPLEX_TYPES): New * jit-playback.c : Create imaginary literals, conversions (complex_real_to_real) : New (new_rvalue_from_const<...>) : Also build complex constants (new_rvalue_from_const <_Complex double>) : New (new_binary_op) : Add COMPLEX operator (build_cast) : Casts to complex (memento_of_new_rvalue_from_const <_Complex double>) : New (binary_op::make_debug_string) : Handle complex operator * jit-recording.c : Reproducer, debug strings, forwarding * jit-recording.h : (float_size_qual) : Poll size qualifier of float types (is_complex) : Poll if type is complex * libgccjit++.h : New entrypoints, see below * libgccjit.c : Implementation of new entry points, see .h (gcc_jit_context_get_type) : Extend range check, validation (valid_binary_op_p) : Extend range check (gcc_jit_context_new_binary_op) : Validation (gcc_jit_context_new_unary_op) : Validation (gcc_jit_context_new_comparison) : Validation * libgccjit.h : (gcc_jit_context_new_rvalue_from_complex_double) : New (GCC_JIT_BINARY_OP_COMPLEX) : New (LIBGCCJIT_HAVE_COMPLEX) : New (gcc_jit_context_set_bool_enable_complex_types) : New * libgccjit.map : Added new entrypoints (LIBGCCJIT_ABI_16) : New gcc/testsuite/ * jit.dg/test-complex-builtins.c : New * jit.dg/test-complex-literals.c : New * jit.dg/test-complex-misc.c : New * jit.dg/test-complex-operators.c : New * jit.dg/test-complex-types.c : New * jit.dg/test-error-complex-noenable.c : New gcc/jit/docs/topics/ * contexts.rst : Update docs * expressions.rst * types.rst --- gcc/jit/docs/topics/contexts.rst | 20 + gcc/jit/docs/topics/expressions.rst | 89 ++- gcc/jit/docs/topics/types.rst | 12 +- gcc/jit/jit-common.h | 1 + gcc/jit/jit-playback.c| 165 - gcc/jit/jit-recording.c | 126 +++- gcc/jit/jit-recording.h | 28 + gcc/jit/libgccjit++.h | 21 + gcc/jit/libgccjit.c | 97 ++- gcc/jit/libgccjit.h | 44 +- gcc/jit/libgccjit.map | 6 + gcc/testsuite/jit.dg/all-non-failing-tests.h | 30 + gcc/testsuite/jit.dg/test-complex-builtins.c | 217 ++ gcc/testsuite/jit.dg/test-complex-literals.c | 145 gcc/testsuite/jit.dg/test-complex-misc.c | 204 ++ gcc/testsuite/jit.dg/test-complex-operators.c | 353 + gcc/testsuite/jit.dg/test-complex-types.c | 677 ++ .../jit.dg/test-error-complex-noenable.c | 31 + 18 files changed, 2221 insertions(+), 45 deletions(-) create mode 100644 gcc/testsuite/jit.dg/test-complex-builtins.c create mode 100644 gcc/testsuite/jit.dg/test-complex-literals.c create mode 100644
gcc/testsuite/jit.dg/test-complex-misc.c create mode 100644 gcc/testsuite/jit.dg/test-complex-operators.c create mode 100644 gcc/testsuite/jit.dg/test-complex-types.c create mode 100644 gcc/testsuite/jit.dg/test-error-complex-noenable.c diff --git a/gcc/jit/docs/topics/contexts.rst b/gcc/jit/docs/topics/contexts.rst index 00fb17e155d..d8689407cc6 100644 --- a/gcc/jit/docs/topics/contexts.rst +++ b/gcc/jit/docs/topics/contexts.rst @@ -489,6 +489,26 @@ Boolean options #ifdef LIBGCCJIT_HAVE_gcc_jit_context_set_bool_use_external_driver +.. function:: void\ + gcc_jit_context_set_bool_enable_complex_types (gcc_jit_context *ctxt,\ + int bool_value); + + This option can enable complex type support in libgccjit. By default, + no complex types can be used. + + If you use complex types, it is recommended to use builtin functions + `creal`, `cimag` and `conj` etc. with the correct type suffix, since + those are inlined into the code, rather
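To make the proposed API concrete, here is a minimal hypothetical client. Only gcc_jit_context_set_bool_enable_complex_types, GCC_JIT_BINARY_OP_COMPLEX and (in the comment) gcc_jit_context_new_rvalue_from_complex_double come from this patch, which is not in upstream libgccjit; the driver around them is my own sketch:

#include <libgccjit.h>

int
main (void)
{
  gcc_jit_context *ctxt = gcc_jit_context_acquire ();
  /* Complex support must be switched on first, per the patch.  */
  gcc_jit_context_set_bool_enable_complex_types (ctxt, 1);

  gcc_jit_type *cdbl
    = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_COMPLEX_DOUBLE);
  gcc_jit_type *dbl
    = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_DOUBLE);

  /* Build 3.0 + 4.0i from two real rvalues with the new operator;
     gcc_jit_context_new_rvalue_from_complex_double would build the
     same value directly from a _Complex double literal.  */
  gcc_jit_rvalue *re = gcc_jit_context_new_rvalue_from_double (ctxt, dbl, 3.0);
  gcc_jit_rvalue *im = gcc_jit_context_new_rvalue_from_double (ctxt, dbl, 4.0);
  gcc_jit_rvalue *z
    = gcc_jit_context_new_binary_op (ctxt, NULL, GCC_JIT_BINARY_OP_COMPLEX,
                                     cdbl, re, im);
  (void) z;
  gcc_jit_context_release (ctxt);
  return 0;
}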
Re: [PATCH 3/N] Come up with casm global state.
On Thu, Oct 21, 2021 at 2:48 PM David Malcolm via Gcc-patches wrote: > > On Thu, 2021-09-16 at 15:12 +0200, Martin Liška wrote: > > This patch comes up with asm_out_state and a new global variable > > casm. > > > > Tested on all cross compilers. > > Patch can bootstrap on x86_64-linux-gnu and survives regression > > tests. > > > > Ready to be installed? > > Thanks, > > Martin > > This is possibly a silly question, but what does the "c" stand for in > "casm"? "compiler"? "compilation"? "common"? 'current' like cfun or crtl .. maybe just following "bad" examples? > Sorry if it's explained somewhere, but I didn't spot this when glancing > at the patches. > > Dave >
[PATCH 2/2] jit: Complex types and loong constants
This patch adds the possibility to make the following constants: * long long * long double * complex long double The long long one is needed for 32-bit systems. The new entrypoints are: gcc_jit_context_new_rvalue_from_long_long gcc_jit_context_new_rvalue_from_long_double gcc_jit_context_new_rvalue_from_complex_long_double The patch also fixes an issue with the reproducer's debug c-file writer, which does not handle floating point numbers very well, i.e. infs, NaN and losing precision on doubles. make check-jit runs fine with the patch series on Debian 64x. See attachment. From f49cea6830b08a019958ef3f6a1ced1e416f80b3 Mon Sep 17 00:00:00 2001 From: Petter Tomner Date: Sat, 16 Oct 2021 21:55:20 +0200 Subject: [PATCH 2/3] long long Add long long, long double and complex long double constants This patch adds entrypoints to make constants (literals) for long long, long double and complex long double types. Also, it fixes problems with all floating point numbers in write_reproducer, i.e. support NAN and INF values, as well as proper precision for double and long double, which is needed for testing this patch. Signed-off-by: 2021-10-21 Petter Tomner gcc/jit/ * jit-playback.c : (new_rvalue_from_const ): New (new_rvalue_from_const ): Uses long long instead (new_rvalue_from_const ): New (new_rvalue_from_const <_Complex long double>) : New * jit-recording.c : (dump_reproducer_to_file) : Type punning NANs (reproducer_write_arr) : arr -> "char array literal"-string (memento_of_new_rvalue_from_const ) : New (memento_of_new_rvalue_from_const ) : New (memento_of_new_rvalue_from_const <_Complex long double>) : New (memento_of_new_rvalue_from_const ::make_debug_string): %g instead of %f (memento_of_new_rvalue_from_const ::write_reproducer): %a instead of %f, handle NAN and INF (memento_of_new_rvalue_from_const <_Complex double>::write_reproducer): %a instead of %f, handle NAN and INF. Use CMPLX macro. * libgccjit.c : * libgccjit.h : (LIBGCCJIT_HAVE_LONGLONG_CONSTANTS) : New (gcc_jit_context_new_rvalue_from_long_long) : New (gcc_jit_context_new_rvalue_from_long_double) : New (gcc_jit_context_new_rvalue_from_complex_long_double) : New * libgccjit++.h : New entrypoints * libgccjit.map: New entrypoints added to ABI 16 gcc/testsuite/ * jit.dg/all-non-failing-tests.h: Added test-long-literals.c * jit.dg/test-long-literals.c: New gcc/jit/docs/topics/ * expressions.rst : Updated docs --- gcc/jit/docs/topics/expressions.rst | 43 +- gcc/jit/jit-playback.c | 143 +- gcc/jit/jit-recording.c | 442 ++- gcc/jit/libgccjit++.h| 39 ++ gcc/jit/libgccjit.c | 46 ++ gcc/jit/libgccjit.h | 33 +- gcc/jit/libgccjit.map| 3 + gcc/testsuite/jit.dg/all-non-failing-tests.h | 7 + gcc/testsuite/jit.dg/test-long-literals.c| 283 9 files changed, 1017 insertions(+), 22 deletions(-) create mode 100644 gcc/testsuite/jit.dg/test-long-literals.c diff --git a/gcc/jit/docs/topics/expressions.rst b/gcc/jit/docs/topics/expressions.rst index a98f35572b4..30a3b9780f9 100644 --- a/gcc/jit/docs/topics/expressions.rst +++ b/gcc/jit/docs/topics/expressions.rst @@ -70,6 +70,18 @@ Simple expressions Given a numeric type (integer or floating point), build an rvalue for the given constant :c:type:`long` value. +.. function:: gcc_jit_rvalue *\ + gcc_jit_context_new_rvalue_from_long_long (gcc_jit_context *ctxt, \ + gcc_jit_type *numeric_type, \ + long long value) + + Given a numeric type (integer or floating point), build an rvalue for + the given constant :c:type:`long long` value.
+ + This function was added in :ref:`LIBGCCJIT_ABI_16`; + you can test for its presence using + `#ifdef LIBGCCJIT_HAVE_LONGLONG_CONSTANTS` + .. function:: gcc_jit_rvalue *gcc_jit_context_zero (gcc_jit_context *ctxt, \ gcc_jit_type *numeric_type) @@ -98,6 +110,18 @@ Simple expressions Given a numeric type (integer or floating point), build an rvalue for the given constant :c:type:`double` value. +.. function:: gcc_jit_rvalue *\ + gcc_jit_context_new_rvalue_from_long_double (gcc_jit_context *ctxt, \ + gcc_jit_type *numeric_type, \ + long double value) + + Given a floating point type, build an rvalue for + the given constant :c:type:`long double`. + + This function was added in :ref:`LIBGCCJIT_ABI_16`; + you can test for its presence using + `#ifdef LIBGCCJIT_HAVE_LONGLONG_CONSTANTS` + .. function:: gcc_jit_rvalue *\ gcc_jit_context_new_rvalue_from_complex_double (gcc_jit_context *ctxt, \
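A hypothetical client illustrating the 32-bit motivation; only the *_from_long_long and *_from_long_double entrypoints and their type enums come from this patch series, the rest is assumed boilerplate:

#include <libgccjit.h>

int
main (void)
{
  gcc_jit_context *ctxt = gcc_jit_context_acquire ();

  gcc_jit_type *ll
    = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_LONG_LONG);
  gcc_jit_type *ld
    = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_LONG_DOUBLE);

  /* On a 32-bit host, 'long' is 32 bits, so
     gcc_jit_context_new_rvalue_from_long could not express this.  */
  gcc_jit_rvalue *big
    = gcc_jit_context_new_rvalue_from_long_long (ctxt, ll,
                                                 0x123456789abcdefLL);
  gcc_jit_rvalue *pi
    = gcc_jit_context_new_rvalue_from_long_double (ctxt, ld,
                                                   3.141592653589793238L);
  (void) big; (void) pi;
  gcc_jit_context_release (ctxt);
  return 0;
}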
Re: [PATCH] options: Fix variable tracking option processing.
On 10/21/21 11:57, Richard Biener wrote: which previously affected debug_nonbind_markers_p. I think it makes sense to move the above to finish_options as well. I suppose -help doesn't correctly dump the -g enabled state for -gtoggle at the moment? All right, works for me. The updated patch was tested on an nvptx cross compiler and can bootstrap on x86_64-linux-gnu and survives regression tests. The -g option is shown in -Q --help=common as: -g So no option value is displayed. Is the patch fine now? Thanks, Martin From da24e51c8e108373256f1106fe4478b919189c99 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 14 Oct 2021 14:57:18 +0200 Subject: [PATCH] options: Fix variable tracking option processing. PR debug/102585 PR bootstrap/102766 gcc/ChangeLog: * opts.c (finish_options): Process flag_var_tracking* options here as they can be adjusted by the optimize attribute. Also process flag_syntax_only and flag_gtoggle. * toplev.c (process_options): Remove it here. * common.opt: Mark debug_nonbind_markers_p as a PerFunction attribute as it depends on the optimization level. gcc/testsuite/ChangeLog: * gcc.dg/pr102585.c: New test. --- gcc/common.opt | 2 +- gcc/opts.c | 45 +++ gcc/testsuite/gcc.dg/pr102585.c | 6 + gcc/toplev.c| 47 + 4 files changed, 53 insertions(+), 47 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr102585.c diff --git a/gcc/common.opt b/gcc/common.opt index a2af7fb36e0..c4a77f65aa2 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -3284,7 +3284,7 @@ Common Driver JoinedOrMissing Negative(gvms) Generate debug information in extended STABS format. gstatement-frontiers -Common Driver Var(debug_nonbind_markers_p) +Common Driver Var(debug_nonbind_markers_p) PerFunction Emit progressive recommended breakpoint locations. gstrict-dwarf diff --git a/gcc/opts.c b/gcc/opts.c index 65fe192a198..4472cec1b98 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -1349,6 +1349,51 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set, SET_OPTION_IF_UNSET (opts, opts_set, flag_vect_cost_model, VECT_COST_MODEL_CHEAP); + /* One could use EnabledBy, but it would lead to a circular dependency.
*/ + if (!OPTION_SET_P (flag_var_tracking_uninit)) + flag_var_tracking_uninit = flag_var_tracking; + + if (!OPTION_SET_P (flag_var_tracking_assignments)) +flag_var_tracking_assignments + = (flag_var_tracking + && !(flag_selective_scheduling || flag_selective_scheduling2)); + + if (flag_var_tracking_assignments_toggle) +flag_var_tracking_assignments = !flag_var_tracking_assignments; + + if (flag_var_tracking_assignments && !flag_var_tracking) +flag_var_tracking = flag_var_tracking_assignments = -1; + + if (flag_var_tracking_assignments + && (flag_selective_scheduling || flag_selective_scheduling2)) +warning_at (loc, 0, + "var-tracking-assignments changes selective scheduling"); + + if (flag_syntax_only) +{ + write_symbols = NO_DEBUG; + profile_flag = 0; +} + + if (flag_gtoggle) +{ + if (debug_info_level == DINFO_LEVEL_NONE) + { + debug_info_level = DINFO_LEVEL_NORMAL; + + if (write_symbols == NO_DEBUG) + write_symbols = PREFERRED_DEBUGGING_TYPE; + } + else + debug_info_level = DINFO_LEVEL_NONE; +} + + if (!OPTION_SET_P (debug_nonbind_markers_p)) +debug_nonbind_markers_p + = (optimize + && debug_info_level >= DINFO_LEVEL_NORMAL + && dwarf_debuginfo_p () + && !(flag_selective_scheduling || flag_selective_scheduling2)); } #define LEFT_COLUMN 27 diff --git a/gcc/testsuite/gcc.dg/pr102585.c b/gcc/testsuite/gcc.dg/pr102585.c new file mode 100644 index 000..efd066b4a4e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr102585.c @@ -0,0 +1,6 @@ +/* PR debug/102585 */ +/* { dg-do compile } */ +/* { dg-options "-fvar-tracking-assignments -fno-var-tracking" } */ + +#pragma GCC optimize 0 +void d_demangle_callback_Og() { int c = 0; } diff --git a/gcc/toplev.c b/gcc/toplev.c index cb4f8c470f0..486bd8804fa 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1378,25 +1378,6 @@ process_options (bool no_backend) } } - if (flag_syntax_only) -{ - write_symbols = NO_DEBUG; - profile_flag = 0; -} - - if (flag_gtoggle) -{ - if (debug_info_level == DINFO_LEVEL_NONE) - { - debug_info_level = DINFO_LEVEL_NORMAL; - - if (write_symbols == NO_DEBUG) - write_symbols = PREFERRED_DEBUGGING_TYPE; - } - else - debug_info_level = DINFO_LEVEL_NONE; -} - /* CTF is supported for only C at this time. */ if (!lang_GNU_C () && ctf_debug_info_level > CTFINFO_LEVEL_NONE) @@ -1499,6 +1480,7 @@ process_options (bool no_backend) } flag_var_tracking = 0; flag_var_tracking_uninit = 0; + flag_var_tracking_assignments = 0; } /* The debug hooks are used to implemen
Re: [PATCH] Convert strlen pass from evrp to ranger.
On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez wrote: > > On Thu, Oct 21, 2021 at 12:20 PM Richard Biener > wrote: > > > > On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: > > > > > > > > > > > > On 10/18/2021 2:17 AM, Aldy Hernandez wrote: > > > > > > > > > > > > On 10/18/21 12:52 AM, Jeff Law wrote: > > > >> > > > >> > > > >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: > > > >>> The following patch converts the strlen pass from evrp to ranger, > > > >>> leaving DOM as the last remaining user. > > > >> So is there any reason why we can't convert DOM as well? DOM's use > > > >> of EVRP is pretty limited. You've mentioned FP bits before, but my > > > >> recollection is those are not part of the EVRP analysis DOM uses. > > > >> Hell, give me a little guidance and I'll do the work... > > > > > > > > Not only will I take you up on that offer, but I can provide 90% of > > > > the work. Here be dragons, though (well, for me, maybe not for you > > > > ;-)). > > > > > > > > DOM is actually an evrp pass at -O1 in disguise. The reason it really > > > > is a covert evrp pass is because: > > > > > > > > a) It calls extract_range_from_stmt on each statement. > > > > > > > > b) It folds conditionals with simplify_using_ranges. > > > > > > > > c) But most importantly, it exports discovered ranges when it's done > > > > (evrp_range_analyzer(true)). > > > > > > > > If you look at the evrp pass, you'll notice that that's basically what > > > > it does, albeit with the substitute and fold engine, which also calls > > > > gimple fold plus other goodies. > > > > > > > > But I could argue that we've made DOM into an evrp pass without > > > > noticing. The last item (c) is particularly invasive because these > > > > exported ranges show up in other passes unexpectedly. For instance, I > > > > saw an RTL pass at -O1 miss an optimization because it was dependent > > > > on some global range being set. IMO, DOM should not export global > > > > ranges it discovered during its walk (do one thing and do it well), > > > > but I leave it to you experts to pontificate. > > > All true. But I don't think we've got many, if any, hard dependencies > > > on those behaviors. > > > > > > > > > > > The attached patch is rather trivial. It's mostly deleting state. It > > > > seems DOM spends a lot of time massaging the IL so that it can fold > > > > conditionals or thread paths. None of this is needed, because the > > > > ranger can do all of this. Well, except floats, but... > > > Massaging the IL should only take two forms IIRC. > > > > > > First, if we have a simplification we can do. That could be const/copy > > > propagation, replacing an expression with an SSA_NAME or constant and > > > the like. It doesn't massage the IL just to massage the IL. > > > > > > Second, we do temporarily copy propagate the current known values of an > > > SSA name into use points and then see if that allows us to determine if > > > a statement is already in the hash tables. But we undo that so that > > > nobody should see that temporary change in state. > > > > Are you sure we still do that? I can't find it at least. > > I couldn't either, but perhaps what Jeff is referring to is the ad-hoc > copy propagation that happens in the DOM's use of the threader: > > /* Make a copy of the uses & vuses into USES_COPY, then cprop into > the operands. 
*/ > FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) > { > tree tmp = NULL; > tree use = USE_FROM_PTR (use_p); > > copy[i++] = use; > if (TREE_CODE (use) == SSA_NAME) > tmp = SSA_NAME_VALUE (use); > if (tmp) > SET_USE (use_p, tmp); > } > > cached_lhs = simplifier->simplify (stmt, stmt, bb, this); > > /* Restore the statement's original uses/defs. */ > i = 0; > FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) > SET_USE (use_p, copy[i++]); Ah, likely. These days we'd likely use a gimple_match_op but then this seems heavily abstracted, no idea where simplifier->simplify might lead to ;) I'm also not sure why the threader would do the valueization here and not the simplify() function (and lookup_avail_expr misses an 'exploded' operand lookup as well). Lot's of legacy code ;) But I think the above is not going to be an issue unless ranger runs in circles around backedges, arriving at this very same stmt again? A way out might be to copy the stmt to the stack, adjust operands and use that for the simplify entry. Richard. > Aldy >
Re: [PATCH] options: Fix variable tracking option processing.
On Thu, Oct 21, 2021 at 3:14 PM Martin Liška wrote: > > On 10/21/21 11:57, Richard Biener wrote: > > which previously affected debug_nonbind_markers_p. I think it makes > > sense to move > > the above to finish_options as well. I suppose -help doesn't correctly dump > > the -g enabled state for -gtoggle at the moment? > > All right, works for me. The updated patch was tested on an nvptx cross compiler > and can bootstrap on x86_64-linux-gnu and survives regression tests. > > The -g option is shown in -Q --help=common as: > >-g > > So no option value is displayed. > > Is the patch fine now? OK. Thanks, Richard. > Thanks, > Martin
[PATCH] or1k: Update FPU to specify detect tininess before rounding
This was not defined in the spec and not consistent in the implementation, causing inconsistent behavior. After review we have updated the CPU implementations and proposed that the spec be updated to specify that FPU tininess checks detect tininess before rounding. Architecture change draft: https://openrisc.io/proposals/p18-fpu-tininess libgcc/ChangeLog: * config/or1k/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING): Change to 0. --- libgcc/config/or1k/sfp-machine.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgcc/config/or1k/sfp-machine.h b/libgcc/config/or1k/sfp-machine.h index eebe5b0578e..162c6bc5326 100644 --- a/libgcc/config/or1k/sfp-machine.h +++ b/libgcc/config/or1k/sfp-machine.h @@ -85,7 +85,7 @@ do { \ #define __BYTE_ORDER __BIG_ENDIAN -#define _FP_TININESS_AFTER_ROUNDING 1 +#define _FP_TININESS_AFTER_ROUNDING 0 /* Define ALIASNAME as a strong alias for NAME. */ # define strong_alias(name, aliasname) _strong_alias(name, aliasname) -- 2.31.1
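To see why the detection mode is observable, consider this standalone illustration (my own example, not part of the patch): the float product below is tiny before rounding, because its infinitely precise value lies just under FLT_MIN, yet round-to-nearest-even brings it back up to FLT_MIN, so it is not tiny after rounding. Only a before-rounding implementation, as or1k now specifies, flags underflow for it:

#include <cfenv>
#include <cfloat>
#include <cstdio>

int
main ()
{
  std::feclearexcept (FE_ALL_EXCEPT);
  volatile float a = 0x1.fffffep-1f; /* 1 - 2^-24, just below 1.0 */
  volatile float b = FLT_MIN;        /* 2^-126, smallest normal */
  volatile float r = a * b;          /* exact value: 2^-126 - 2^-150 */
  /* Round-to-nearest-even takes the result back up to FLT_MIN, so it
     is tiny before rounding but not after; the underflow flag thus
     differs between the two detection modes.  (Strict flag inspection
     also needs FENV_ACCESS / -frounding-math.)  */
  std::printf ("r = %a, underflow = %d\n", (double) r,
               std::fetestexcept (FE_UNDERFLOW) != 0);
  return 0;
}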
[PATCH] libcody: Avoid double-free
If the listen call fails then 'goto fail' will jump to that label and use freeaddrinfo again. Set the pointer to null to prevent that. libcody/ChangeLog: * netserver.cc (ListenInet6): Set pointer to null after deallocation. --- libcody/netserver.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/libcody/netserver.cc b/libcody/netserver.cc index 30202c5106a..0499b5790b4 100644 --- a/libcody/netserver.cc +++ b/libcody/netserver.cc @@ -140,6 +140,7 @@ int ListenInet6 (char const **e, char const *name, int port, unsigned backlog) listen:; freeaddrinfo (addrs); + addrs = nullptr; if (listen (fd, backlog ? backlog : 17) < 0) { -- 2.31.1
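The underlying bug class is a shared cleanup label that frees the same pointer twice; a distilled standalone sketch of the pattern and the fix (my illustration, not libcody's actual code):

#include <cstdlib>

int
setup (bool later_step_fails)
{
  char *buf = static_cast<char *> (std::malloc (64));
  if (!buf)
    return -1;
  /* ... buf fully consumed by an earlier step ... */
  std::free (buf);
  buf = nullptr;   /* the fix: otherwise the path below frees buf twice */
  if (later_step_fails)
    goto fail;
  return 0;
 fail:
  std::free (buf); /* free (nullptr) is a no-op, so this is now safe */
  return -1;
}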
[PATCH] c++tools: Fix memory leak
The allocated memory is not freed when returning early due to an error. c++tools/ChangeLog: * resolver.cc (module_resolver::read_tuple_file): Use unique_ptr to ensure memory is freed before returning. --- c++tools/resolver.cc | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/c++tools/resolver.cc b/c++tools/resolver.cc index 421fdaa55fe..d1b73a47778 100644 --- a/c++tools/resolver.cc +++ b/c++tools/resolver.cc @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see #include "resolver.h" // C++ #include +#include <memory> // C #include // OS @@ -114,10 +115,17 @@ module_resolver::read_tuple_file (int fd, char const *prefix, bool force) buffer = mmap (nullptr, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0); if (buffer == MAP_FAILED) return -errno; + struct Deleter { +void operator()(void* p) const { munmap(p, size); } +size_t size; + }; + std::unique_ptr<void, Deleter> guard(buffer, Deleter{(size_t)stat.st_size}); #else buffer = xmalloc (stat.st_size); if (!buffer) return -errno; + struct Deleter { void operator()(void* p) const { free(p); } }; + std::unique_ptr<void, Deleter> guard(buffer); if (read (fd, buffer, stat.st_size) != stat.st_size) return -errno; #endif @@ -179,12 +187,6 @@ module_resolver::read_tuple_file (int fd, char const *prefix, bool force) } } -#if MAPPED_READING - munmap (buffer, stat.st_size); -#else - free (buffer); -#endif - return 0; } -- 2.31.1
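The guard idiom above generalizes nicely; a minimal standalone version of the mmap branch (assuming the Deleter shape used in the patch) shows how a stateful deleter lets every early return unmap automatically:

#include <cstddef>
#include <memory>
#include <sys/mman.h>

struct unmapper
{
  std::size_t size;
  void operator() (void *p) const { munmap (p, size); }
};

int
parse_mapped (int fd, std::size_t len)
{
  void *buf = mmap (nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
  if (buf == MAP_FAILED)
    return -1;
  std::unique_ptr<void, unmapper> guard (buf, unmapper{len});
  /* ... any 'return' from here on runs munmap (buf, len) ... */
  return 0;
}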
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/21/21 3:14 PM, Richard Biener wrote: On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez wrote: On Thu, Oct 21, 2021 at 12:20 PM Richard Biener wrote: On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: On 10/18/2021 2:17 AM, Aldy Hernandez wrote: On 10/18/21 12:52 AM, Jeff Law wrote: On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: The following patch converts the strlen pass from evrp to ranger, leaving DOM as the last remaining user. So is there any reason why we can't convert DOM as well? DOM's use of EVRP is pretty limited. You've mentioned FP bits before, but my recollection is those are not part of the EVRP analysis DOM uses. Hell, give me a little guidance and I'll do the work... Not only will I take you up on that offer, but I can provide 90% of the work. Here be dragons, though (well, for me, maybe not for you ;-)). DOM is actually an evrp pass at -O1 in disguise. The reason it really is a covert evrp pass is because: a) It calls extract_range_from_stmt on each statement. b) It folds conditionals with simplify_using_ranges. c) But most importantly, it exports discovered ranges when it's done (evrp_range_analyzer(true)). If you look at the evrp pass, you'll notice that that's basically what it does, albeit with the substitute and fold engine, which also calls gimple fold plus other goodies. But I could argue that we've made DOM into an evrp pass without noticing. The last item (c) is particularly invasive because these exported ranges show up in other passes unexpectedly. For instance, I saw an RTL pass at -O1 miss an optimization because it was dependent on some global range being set. IMO, DOM should not export global ranges it discovered during its walk (do one thing and do it well), but I leave it to you experts to pontificate. All true. But I don't think we've got many, if any, hard dependencies on those behaviors. The attached patch is rather trivial. It's mostly deleting state. It seems DOM spends a lot of time massaging the IL so that it can fold conditionals or thread paths. None of this is needed, because the ranger can do all of this. Well, except floats, but... Massaging the IL should only take two forms IIRC. First, if we have a simplification we can do. That could be const/copy propagation, replacing an expression with an SSA_NAME or constant and the like. It doesn't massage the IL just to massage the IL. Second, we do temporarily copy propagate the current known values of an SSA name into use points and then see if that allows us to determine if a statement is already in the hash tables. But we undo that so that nobody should see that temporary change in state. Are you sure we still do that? I can't find it at least. I couldn't either, but perhaps what Jeff is referring to is the ad-hoc copy propagation that happens in the DOM's use of the threader: /* Make a copy of the uses & vuses into USES_COPY, then cprop into the operands. */ FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) { tree tmp = NULL; tree use = USE_FROM_PTR (use_p); copy[i++] = use; if (TREE_CODE (use) == SSA_NAME) tmp = SSA_NAME_VALUE (use); if (tmp) SET_USE (use_p, tmp); } cached_lhs = simplifier->simplify (stmt, stmt, bb, this); /* Restore the statement's original uses/defs. */ i = 0; FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) SET_USE (use_p, copy[i++]); Ah, likely. 
These days we'd likely use a gimple_match_op but then this seems heavily abstracted, no idea where simplifier->simplify might lead to ;) I'm also not sure why the threader would do the valueization here and not the simplify() function (and lookup_avail_expr misses an 'exploded' operand lookup as well). Lots of legacy code ;) Yes, there's a lot of legacy code I've left mostly alone. There are two copies of simplify() now, the DOM version (dom_jt_simplifier::simplify) and the VRP version (hybrid_jt_simplifier::simplify). Each does slightly different things, as has always been the case. It's just that the separation is now explicit. No idea what's up with the valueization either. The VRP version doesn't need any of this valueization or on-the-side structures. You just ask the range of a stmt on a path and it gives you an answer. But I think the above is not going to be an issue unless ranger runs in circles around backedges, arriving at this very same stmt again? A way out might be to copy the stmt to the stack, adjust operands and use that for the simplify entry. If you look at the patch I sent Jeff, you'll see that I've removed most of what's in that function. There's no need to propagate anything. The ranger can simplify the final conditional without having to set up anything. Aldy
Re: [PATCH] libcody: Avoid double-free
Oops, this patch and the next one were only meant for gcc-patches, not the libstdc++ list as well - sorry! On Thu, 21 Oct 2021 at 14:27, Jonathan Wakely via Libstdc++ < libstd...@gcc.gnu.org> wrote: > If the listen call fails then 'goto fail' will jump to that label and > use freeaddrinfo again. Set the pointer to null to prevent that. > > libcody/ChangeLog: > > * netserver.cc (ListenInet6): Set pointer to null after > deallocation. > --- > libcody/netserver.cc | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/libcody/netserver.cc b/libcody/netserver.cc > index 30202c5106a..0499b5790b4 100644 > --- a/libcody/netserver.cc > +++ b/libcody/netserver.cc > @@ -140,6 +140,7 @@ int ListenInet6 (char const **e, char const *name, int > port, unsigned backlog) > > listen:; >freeaddrinfo (addrs); > + addrs = nullptr; > >if (listen (fd, backlog ? backlog : 17) < 0) > { > -- > 2.31.1 > >
[COMMITED 1/3] Move ranger only VRP folder to tree-vrp.
This patch moves the ranger rvrp folder class to tree-vrp.c, and provides an execute_ranger_vrp() routine that matches the same parameter sequence as the execute_vrp() routine has. The evrp pass is tweaked to call this routine when running in ranger-only mode. This paves the way for a ranger version of VRP to be instantiated. There is no functional change. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. >From 434ebc1e08b1d83ecd3622ee2a3c7270869bda52 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Fri, 15 Oct 2021 12:26:13 -0400 Subject: [PATCH 1/3] Move ranger only VRP folder to tree-vrp. Consolidate the RVRP folder into a single "execute_vrp" routine that mimics the format used by the vrp1 and vrp2 passes. Relocate into the tree-vrp file. * gimple-ssa-evrp.c (class rvrp_folder): Move to tree-vrp.c. (execute_early_vrp): For ranger only mode, invoke ranger_vrp. * tree-vrp.c (class rvrp_folder): Relocate here. (execute_ranger_vrp): New. * tree-vrp.h (execute_ranger_vrp): Export. --- gcc/gimple-ssa-evrp.c | 92 ++- gcc/tree-vrp.c| 122 ++ gcc/tree-vrp.h| 2 + 3 files changed, 128 insertions(+), 88 deletions(-) diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c index 7f2055501a0..a0192e2b2aa 100644 --- a/gcc/gimple-ssa-evrp.c +++ b/gcc/gimple-ssa-evrp.c @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see #include "gimple-range.h" #include "fold-const.h" #include "value-pointer-equiv.h" +#include "tree-vrp.h" // This is the classic EVRP folder which uses a dominator walk and pushes // ranges into the next block if it is a single predecessor block. @@ -110,88 +111,6 @@ protected: simplify_using_ranges simplifier; }; -// This is a ranger based folder which continues to use the dominator -// walk to access the substitute and fold machinery. Ranges are calculated -// on demand. - -class rvrp_folder : public substitute_and_fold_engine -{ -public: - - rvrp_folder () : substitute_and_fold_engine (), m_simplifier () - { -m_ranger = enable_ranger (cfun); -m_simplifier.set_range_query (m_ranger, m_ranger->non_executable_edge_flag); -m_pta = new pointer_equiv_analyzer (m_ranger); - } - - ~rvrp_folder () - { -if (dump_file && (dump_flags & TDF_DETAILS)) - m_ranger->dump (dump_file); - -m_ranger->export_global_ranges (); -disable_ranger (cfun); -delete m_pta; - } - - tree value_of_expr (tree name, gimple *s = NULL) OVERRIDE - { -// Shortcircuit subst_and_fold callbacks for abnormal ssa_names. -if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name)) - return NULL; -tree ret = m_ranger->value_of_expr (name, s); -if (!ret && supported_pointer_equiv_p (name)) - ret = m_pta->get_equiv (name); -return ret; - } - - tree value_on_edge (edge e, tree name) OVERRIDE - { -// Shortcircuit subst_and_fold callbacks for abnormal ssa_names. -if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name)) - return NULL; -tree ret = m_ranger->value_on_edge (e, name); -if (!ret && supported_pointer_equiv_p (name)) - ret = m_pta->get_equiv (name); -return ret; - } - - tree value_of_stmt (gimple *s, tree name = NULL) OVERRIDE - { -// Shortcircuit subst_and_fold callbacks for abnormal ssa_names. 
-if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name)) - return NULL; -return m_ranger->value_of_stmt (s, name); - } - - void pre_fold_bb (basic_block bb) OVERRIDE - { -m_pta->enter (bb); - } - - void post_fold_bb (basic_block bb) OVERRIDE - { -m_pta->leave (bb); - } - - void pre_fold_stmt (gimple *stmt) OVERRIDE - { -m_pta->visit_stmt (stmt); - } - - bool fold_stmt (gimple_stmt_iterator *gsi) OVERRIDE - { -return m_simplifier.simplify (gsi); - } - -private: - DISABLE_COPY_AND_ASSIGN (rvrp_folder); - gimple_ranger *m_ranger; - simplify_using_ranges m_simplifier; - pointer_equiv_analyzer *m_pta; -}; - // In a hybrid folder, start with an EVRP folder, and add the required // fold_stmt bits to either try the ranger first or second. // @@ -393,6 +312,9 @@ hybrid_folder::choose_value (tree evrp_val, tree ranger_val) static unsigned int execute_early_vrp () { + if ((param_evrp_mode & EVRP_MODE_RVRP_FIRST) == EVRP_MODE_RVRP_ONLY) +return execute_ranger_vrp (cfun, false); + /* Ideally this setup code would move into the ctor for the folder However, this setup can change the number of blocks which invalidates the internal arrays that are set up by the dominator @@ -411,12 +333,6 @@ execute_early_vrp () folder.substitute_and_fold (); break; } -case EVRP_MODE_RVRP_ONLY: - { - rvrp_folder folder; - folder.substitute_and_fold (); - break; - } case EVRP_MODE_EVRP_FIRST: { hybrid_folder folder (true); diff --git a/gcc/tree-vrp
[COMMITTED 2/3] Add --param=vrp1-mode and --param=vrp2-mode.
This patch adds 2 new --params, vrp1-mode and vrp2-mode. They can be used to select which version of VRP should be used for each pass. Valid options are "vrp" and "ranger": ie --param=vrp1-mode=vrp --param=vrp2-mode=ranger As VRP is the current default for both, again, there is no functional change, but it is now possible to run ranger in either of the 2 VRP passes. The plan will be to make ranger the default for VRP2 once I've sorted out a couple of testcases. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. >From bd400db6d3ec167142ace352db00f84d382e33a8 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Fri, 15 Oct 2021 12:06:27 -0400 Subject: [PATCH 2/3] Add --param=vrp1-mode and --param=vrp2-mode. Add 2 new params to select between VRP and RANGER to be used for each pass. * doc/invoke.texi: (vrp1-mode, vrp2-mode): Document. * flag-types.h: (enum vrp_mode): New. * params.opt: (vrp1-mode, vrp2-mode): New. * tree-vrp.c (vrp_pass_num): New. (pass_vrp::pass_vrp): Set pass number. (pass_vrp::execute): Choose which VRP mode to execute. --- gcc/doc/invoke.texi | 6 ++ gcc/flag-types.h| 7 +++ gcc/params.opt | 17 + gcc/tree-vrp.c | 12 ++-- 4 files changed, 40 insertions(+), 2 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6d1e328571a..b89f9b61f9c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14504,6 +14504,12 @@ Maximum number of basic blocks before EVRP uses a sparse cache. @item evrp-mode Specifies the mode Early VRP should operate in. +@item vrp1-mode +Specifies the mode VRP pass 1 should operate in. + +@item vrp2-mode +Specifies the mode VRP pass 2 should operate in. + @item evrp-switch-limit Specifies the maximum number of switch cases before EVRP ignores a switch. diff --git a/gcc/flag-types.h b/gcc/flag-types.h index ae0b216e8a3..9f104e43d40 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -463,6 +463,13 @@ enum evrp_mode EVRP_MODE_DEBUG = (EVRP_MODE_GORI | EVRP_MODE_CACHE) }; +/* VRP modes. */ +enum vrp_mode +{ + VRP_MODE_VRP, + VRP_MODE_RANGER +}; + /* Modes of OpenACC 'kernels' constructs handling. */ enum openacc_kernels { diff --git a/gcc/params.opt b/gcc/params.opt index 83b3db6fea6..27ef4b6578f 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -1137,4 +1137,21 @@ Controls how loop vectorizer uses partial vectors. 0 means never, 1 means only Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 1) Param Optimization The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized. +-param=vrp1-mode= +Common Joined Var(param_vrp1_mode) Enum(vrp_mode) Init(VRP_MODE_VRP) Param Optimization +--param=vrp1-mode=[vrp|ranger] Specifies the mode VRP1 should operate in. + +-param=vrp2-mode= +Common Joined Var(param_vrp2_mode) Enum(vrp_mode) Init(VRP_MODE_VRP) Param Optimization +--param=vrp2-mode=[vrp|ranger] Specifies the mode VRP2 should operate in. + +Enum +Name(vrp_mode) Type(enum vrp_mode) UnknownError(unknown vrp mode %qs) + +EnumValue +Enum(vrp_mode) String(vrp) Value(VRP_MODE_VRP) + +EnumValue +Enum(vrp_mode) String(ranger) Value(VRP_MODE_RANGER) + ; This comment is to ensure we retain the blank line above. 
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index b0b217bbf86..ba7a4efc7c6 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4450,11 +4450,13 @@ const pass_data pass_data_vrp = ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */ }; +static int vrp_pass_num = 0; class pass_vrp : public gimple_opt_pass { public: pass_vrp (gcc::context *ctxt) -: gimple_opt_pass (pass_data_vrp, ctxt), warn_array_bounds_p (false) +: gimple_opt_pass (pass_data_vrp, ctxt), warn_array_bounds_p (false), + my_pass (++vrp_pass_num) {} /* opt_pass methods: */ @@ -4466,10 +4468,16 @@ public: } virtual bool gate (function *) { return flag_tree_vrp != 0; } virtual unsigned int execute (function *fun) -{ return execute_vrp (fun, warn_array_bounds_p); } +{ + if ((my_pass == 1 && param_vrp1_mode == VRP_MODE_RANGER) + || (my_pass == 2 && param_vrp2_mode == VRP_MODE_RANGER)) + return execute_ranger_vrp (fun, warn_array_bounds_p); + return execute_vrp (fun, warn_array_bounds_p); +} private: bool warn_array_bounds_p; + int my_pass; }; // class pass_vrp } // anon namespace -- 2.17.2
[COMMITTED 3/3] Split --param=evrp-mode into evrp-mode and ranger-debug.
Should have done this a while ago. With ranger being utilized in other places, tying the debug output mode with the EVRP mode no longer makes sense. This patch splits the current --param=evrp-mode values into "evrp-mode" and "ranger-debug". After this patch, valid values are: --param=evrp-mode=[legacy|ranger|legacy-first|ranger-first] default is "ranger" --param=ranger-debug=[none|trace|gori|cache|tracegori|all] default is "none" Again, this provides no real functional changes. Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed. >From 9cb114fd5550eb02dfd6b8db5cb5b8fb72827d53 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Tue, 19 Oct 2021 14:09:51 -0400 Subject: [PATCH 3/3] Split --param=evrp-mode into evrp-mode and ranger-debug. With Ranger being used in more than EVRP, the debug output should no longer be tied up with the EVRP mode flag. * doc/invoke.texi (ranger-debug): Document. * flag-types.h (enum ranger_debug): New. (enum evrp_mode): Remove debug values. * gimple-range-cache.cc (DEBUG_RANGE_CACHE): Use new debug flag. * gimple-range-gori.cc (gori_compute::gori_compute): Ditto. * gimple-range.cc (gimple_ranger::gimple_ranger): Ditto. * gimple-ssa-evrp.c (hybrid_folder::choose_value): Ditto. (execute_early_vrp): Use evrp-mode directly. * params.opt (enum evrp_mode): Remove debug values. (ranger-debug): New. (ranger-logical-depth): Relocate to be in alphabetical order. --- gcc/doc/invoke.texi | 3 +++ gcc/flag-types.h | 24 +++--- gcc/gimple-range-cache.cc | 4 +-- gcc/gimple-range-gori.cc | 2 +- gcc/gimple-range.cc | 2 +- gcc/gimple-ssa-evrp.c | 6 ++--- gcc/params.opt| 52 +++ 7 files changed, 56 insertions(+), 37 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index b89f9b61f9c..c66a25fcd69 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -14510,6 +14510,9 @@ Specifies the mode VRP pass 1 should operate in. @item vrp2-mode Specifies the mode VRP pass 2 should operate in. +@item ranger-debug +Specifies the type of debug output to be issued for ranges. + @item evrp-switch-limit Specifies the maximum number of switch cases before EVRP ignores a switch. diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 9f104e43d40..a5a637160d7 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -449,18 +449,24 @@ enum parloops_schedule_type PARLOOPS_SCHEDULE_RUNTIME }; +/* Ranger debug mode. */ +enum ranger_debug +{ + RANGER_DEBUG_NONE = 0, + RANGER_DEBUG_TRACE = 1, + RANGER_DEBUG_CACHE = (2 | RANGER_DEBUG_TRACE), + RANGER_DEBUG_GORI = 4, + RANGER_DEBUG_TRACE_GORI = (RANGER_DEBUG_TRACE | RANGER_DEBUG_GORI), + RANGER_DEBUG_ALL = (RANGER_DEBUG_GORI | RANGER_DEBUG_CACHE) +}; + /* EVRP mode. */ enum evrp_mode { - EVRP_MODE_RVRP_ONLY = 0, - EVRP_MODE_EVRP_ONLY = 1, - EVRP_MODE_EVRP_FIRST = 2, - EVRP_MODE_RVRP_FIRST = 3, - EVRP_MODE_TRACE = 4, - EVRP_MODE_CACHE = (8 | EVRP_MODE_TRACE), - EVRP_MODE_GORI = 16, - EVRP_MODE_TRACE_GORI = (EVRP_MODE_TRACE | EVRP_MODE_GORI), - EVRP_MODE_DEBUG = (EVRP_MODE_GORI | EVRP_MODE_CACHE) + EVRP_MODE_RVRP_ONLY, + EVRP_MODE_EVRP_ONLY, + EVRP_MODE_EVRP_FIRST, + EVRP_MODE_RVRP_FIRST }; /* VRP modes. */ diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc index 9cbc63d8a40..05010cf15bc 100644 --- a/gcc/gimple-range-cache.cc +++ b/gcc/gimple-range-cache.cc @@ -30,8 +30,8 @@ along with GCC; see the file COPYING3. 
If not see #include "gimple-range.h" #include "tree-cfg.h" -#define DEBUG_RANGE_CACHE (dump_file && (param_evrp_mode & EVRP_MODE_CACHE) \ - == EVRP_MODE_CACHE) +#define DEBUG_RANGE_CACHE (dump_file \ + && (param_ranger_debug & RANGER_DEBUG_CACHE)) // During contructor, allocate the vector of ssa_names. diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc index 6946fa65dda..4e45c593871 100644 --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -644,7 +644,7 @@ gori_compute::gori_compute (int not_executable_flag) // Create a boolean_type true and false range. m_bool_zero = int_range<2> (boolean_false_node, boolean_false_node); m_bool_one = int_range<2> (boolean_true_node, boolean_true_node); - if (dump_file && (param_evrp_mode & EVRP_MODE_GORI)) + if (dump_file && (param_ranger_debug & RANGER_DEBUG_GORI)) tracer.enable_trace (); } diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc index 93d6da66ccb..69cde911c49 100644 --- a/gcc/gimple-range.cc +++ b/gcc/gimple-range.cc @@ -42,7 +42,7 @@ gimple_ranger::gimple_ranger () : { // If the cache has a relation oracle, use it. m_oracle = m_cache.oracle (); - if (dump_file && (param_evrp_mode & EVRP_MODE_TRACE)) + if (dump_file && (param_ranger_debug & RANGER_DEBUG_TRACE)) tracer.enable_trace (); // Ensure the not_executable flag is clear everywhere. diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c index a0192e2b2aa
Re: [PATCH] i386: Fix wrong codegen for V8HF move without TARGET_AVX512F
Thanks for the reminder; I will adjust the testcase since the output for 128/256-bit HFmode loads has changed. Martin Liška wrote on Thu, Oct 21, 2021 at 8:49 PM: > > On 10/21/21 07:47, Hongyu Wang via Gcc-patches wrote: > > Yes, updated patch. > > Note the patch caused the following test failure: > > FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ > \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1 > FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ > \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1 > FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ > \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1 > FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ > \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1 > > Martin
Re: [PATCH 3/N] Come up with casm global state.
On 10/21/21 14:42, Richard Biener wrote: On Thu, Oct 21, 2021 at 11:47 AM Martin Liška wrote: On 10/5/21 13:54, Richard Biener wrote: On Mon, Oct 4, 2021 at 1:13 PM Martin Liška wrote: On 9/22/21 11:59, Richard Biener wrote: On Thu, Sep 16, 2021 at 3:12 PM Martin Liška wrote: This patch comes up with asm_out_state and a new global variable casm. Tested on all cross compilers. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? * output.h (struct asm_out_file): New struct that replaces a ^^^ asm_out_state? Yes, sure! You replace a lot of asm_out_file - do we still need the macro then? (I'd have simplified the patch leaving that replacement out at first) Well, I actually replaced only a very small fraction of the usage of asm_out_file. $ git grep asm_out_file | grep -v ChangeLog | wc -l 1601 So I think we should live with the macro for some time ... You leave quite some global state out of the struct that is obviously related, in the diff I see object_block_htab for example. Yes, I'm aware of it. Can we do that incrementally? Basically everything initialized in init_varasm_once is a candidate (which then shows const_desc_htab and shared_constant_pool as well as pending_assemble_externals_set). Well, these would probably need a different header file (or another #include ... must be added before output.h :// ). For the goal of outputting early DWARF to another file the state CTOR could have a mode to not initialize those parts or we could have asm-out-state-with-sections as base of asm-out-state. Yes, right now asm_out_state ctor is minimal: asm_out_state (): out_file (NULL), in_section (NULL), sec ({}), anchor_labelno (0), in_cold_section_p (false) { section_htab = hash_table<section_hasher>::create_ggc (31); } In the end there will be a target part of the state so I think construction needs to be deferred to the target which can derive from asm-out-state and initialize the part it needs. That's currently what targetm.asm_out.init_sections () does and we'd transform that to a targetm.asm_out.create () or so. That might already be necessary for the DWARF stuff. So what do you want to do with the content of the init_varasm_once function? It goes away ;) Or rather becomes the invocation of the asm-out-state CTOR via the target hook. That said, dealing with the target stuff piecemeal is OK, but maybe try to make sure that init_varasm_once is actually identical to what the CTOR does? So you want asm_out_state::asm_out_state doing what we currently initialize in init_varasm_once, right? Yes, asm_out_state should cover everything initialized in init_varasm_once (and the called targetm.asm_out.init_sections). targetm.asm_out.init_sections would become asm_out_state *init_sections (); where the target can derive from asm_out_state (optionally) and allocates the asm-out-state & invokes the CTORs. The middle-end visible asm_out_state would contain all the tables initialized in init_varasm_once. I'm not sure if doing it piecemeal is good - we don't want to touch things multiple times without good need and it might be that things turn out not to be as "simple" as the above plan sounds. I would suggest to try the conversion with powerpc since it seems to be the only target with two different implementations for the target hook.
The static GTY(()) section *read_only_data_section; static GTY(()) section *private_data_section; static GTY(()) section *tls_data_section; static GTY(()) section *tls_private_data_section; static GTY(()) section *read_only_private_data_section; static GTY(()) section *sdata2_section; section *toc_section = 0; could become #defines to static_cast <rs6000_asm_out_state *> (casm)->section_name and there'd be class rs6000_asm_out_state : public asm_out_state { ... add stuff ... } in rs6000/xcoff.h and rs6000/sysv4.h and some generic rs6000 header as well (adding no fields) and the target hook would basically do return ggc_new rs6000_asm_state (); (OK, ggc_alloc + placement new I guess). The default hook just creates asm_out_state. Hello. All right, I made a patch candidate that does the suggested steps and I ported the rs6000 target (both ELF and XCOFF sub-targets). So far I was able to bootstrap on ppc64le-linux-gnu. Thoughts? +extern GTY(()) rs6000_asm_out_state *target_casm; this seems to be unused Sure, it can be a static variable in rs6000.c for GGC marking purposes. +#define rs6000_casm static_cast <rs6000_asm_out_state *> (casm) maybe there's a better way? Though I can't think of one at the moment. There are only 10 uses so eventually we can put the static_cast into all places. Let's ask the powerpc maintainers (CCed). Or we can wrap it with a function call doing the casting? Note you do +/* Implement TARGET_ASM_INIT_SECTIONS. */ + +static asm_out_state * +rs6000_elf_asm_init_sections (void) +{ + rs6000_asm_out_state *target_state += new (ggc_alloc <rs6000_asm_out_state> ()) rs6000_asm_out_state (); + target_state->init_elf_s
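Distilled into a standalone sketch, the shape being discussed looks roughly like this (simplified, assumed names; the real patch uses GTY markers and GGC placement allocation rather than plain new):

struct section;

/* Middle-end-visible state, replacing file-scope globals.  */
struct asm_out_state
{
  section *in_section = nullptr;
  bool in_cold_section_p = false;
  virtual ~asm_out_state () {}
};

/* A target derives from it to hold its own sections.  */
struct rs6000_asm_out_state : asm_out_state
{
  section *toc_section = nullptr;  /* formerly a file-scope global */
};

/* The init-sections hook becomes a factory returning the base type.  */
asm_out_state *
default_asm_init_sections ()
{
  return new asm_out_state ();
}

asm_out_state *
rs6000_asm_init_sections ()
{
  return new rs6000_asm_out_state ();
}

/* The new global, plus a downcasting accessor for target code.  */
asm_out_state *casm;
#define rs6000_casm static_cast<rs6000_asm_out_state *> (casm)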
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/21/2021 6:56 AM, Aldy Hernandez wrote: On Thu, Oct 21, 2021 at 12:20 PM Richard Biener wrote: On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: On 10/18/2021 2:17 AM, Aldy Hernandez wrote: On 10/18/21 12:52 AM, Jeff Law wrote: On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: The following patch converts the strlen pass from evrp to ranger, leaving DOM as the last remaining user. So is there any reason why we can't convert DOM as well? DOM's use of EVRP is pretty limited. You've mentioned FP bits before, but my recollection is those are not part of the EVRP analysis DOM uses. Hell, give me a little guidance and I'll do the work... Not only will I take you up on that offer, but I can provide 90% of the work. Here be dragons, though (well, for me, maybe not for you ;-)). DOM is actually an evrp pass at -O1 in disguise. The reason it really is a covert evrp pass is because: a) It calls extract_range_from_stmt on each statement. b) It folds conditionals with simplify_using_ranges. c) But most importantly, it exports discovered ranges when it's done (evrp_range_analyzer(true)). If you look at the evrp pass, you'll notice that that's basically what it does, albeit with the substitute and fold engine, which also calls gimple fold plus other goodies. But I could argue that we've made DOM into an evrp pass without noticing. The last item (c) is particularly invasive because these exported ranges show up in other passes unexpectedly. For instance, I saw an RTL pass at -O1 miss an optimization because it was dependent on some global range being set. IMO, DOM should not export global ranges it discovered during its walk (do one thing and do it well), but I leave it to you experts to pontificate. All true. But I don't think we've got many, if any, hard dependencies on those behaviors. The attached patch is rather trivial. It's mostly deleting state. It seems DOM spends a lot of time massaging the IL so that it can fold conditionals or thread paths. None of this is needed, because the ranger can do all of this. Well, except floats, but... Massaging the IL should only take two forms IIRC. First, if we have a simplification we can do. That could be const/copy propagation, replacing an expression with an SSA_NAME or constant and the like. It doesn't massage the IL just to massage the IL. Second, we do temporarily copy propagate the current known values of an SSA name into use points and then see if that allows us to determine if a statement is already in the hash tables. But we undo that so that nobody should see that temporary change in state. Are you sure we still do that? I can't find it at least. I couldn't either, but perhaps what Jeff is referring to is the ad-hoc copy propagation that happens in the DOM's use of the threader: /* Make a copy of the uses & vuses into USES_COPY, then cprop into the operands. */ FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) { tree tmp = NULL; tree use = USE_FROM_PTR (use_p); copy[i++] = use; if (TREE_CODE (use) == SSA_NAME) tmp = SSA_NAME_VALUE (use); if (tmp) SET_USE (use_p, tmp); } cached_lhs = simplifier->simplify (stmt, stmt, bb, this); /* Restore the statement's original uses/defs. */ i = 0; FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) SET_USE (use_p, copy[i++]); Exactly what I was referring to. It may be worth an experiment -- I can't recall when this code went in. It might be a remnant from the original threader that pre-dates block copying. 
In that world we had to look up expressions in the table as part of the step to verify that we could safely ignore statements at the start of a block. Or it could be something that was added to improve threading when the condition we're trying to thread through is partially redundant on the thread path. This would allow us to discover that partial redundancy and exploit it for threading. In either case this code may have outlived its usefulness. I wonder what would happen if we just took it out. jeff
Re: [PATCH] Convert strlen pass from evrp to ranger.
On Thu, Oct 21, 2021 at 3:30 PM Aldy Hernandez wrote: > > > > On 10/21/21 3:14 PM, Richard Biener wrote: > > On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez wrote: > >> > >> On Thu, Oct 21, 2021 at 12:20 PM Richard Biener > >> wrote: > >>> > >>> On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: > > > > On 10/18/2021 2:17 AM, Aldy Hernandez wrote: > > > > > > On 10/18/21 12:52 AM, Jeff Law wrote: > >> > >> > >> On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: > >>> The following patch converts the strlen pass from evrp to ranger, > >>> leaving DOM as the last remaining user. > >> So is there any reason why we can't convert DOM as well? DOM's use > >> of EVRP is pretty limited. You've mentioned FP bits before, but my > >> recollection is those are not part of the EVRP analysis DOM uses. > >> Hell, give me a little guidance and I'll do the work... > > > > Not only will I take you up on that offer, but I can provide 90% of > > the work. Here be dragons, though (well, for me, maybe not for you > > ;-)). > > > > DOM is actually an evrp pass at -O1 in disguise. The reason it really > > is a covert evrp pass is because: > > > > a) It calls extract_range_from_stmt on each statement. > > > > b) It folds conditionals with simplify_using_ranges. > > > > c) But most importantly, it exports discovered ranges when it's done > > (evrp_range_analyzer(true)). > > > > If you look at the evrp pass, you'll notice that that's basically what > > it does, albeit with the substitute and fold engine, which also calls > > gimple fold plus other goodies. > > > > But I could argue that we've made DOM into an evrp pass without > > noticing. The last item (c) is particularly invasive because these > > exported ranges show up in other passes unexpectedly. For instance, I > > saw an RTL pass at -O1 miss an optimization because it was dependent > > on some global range being set. IMO, DOM should not export global > > ranges it discovered during its walk (do one thing and do it well), > > but I leave it to you experts to pontificate. > All true. But I don't think we've got many, if any, hard dependencies > on those behaviors. > > > > > The attached patch is rather trivial. It's mostly deleting state. It > > seems DOM spends a lot of time massaging the IL so that it can fold > > conditionals or thread paths. None of this is needed, because the > > ranger can do all of this. Well, except floats, but... > Massaging the IL should only take two forms IIRC. > > First, if we have a simplification we can do. That could be const/copy > propagation, replacing an expression with an SSA_NAME or constant and > the like. It doesn't massage the IL just to massage the IL. > > Second, we do temporarily copy propagate the current known values of an > SSA name into use points and then see if that allows us to determine if > a statement is already in the hash tables. But we undo that so that > nobody should see that temporary change in state. > >>> > >>> Are you sure we still do that? I can't find it at least. > >> > >> I couldn't either, but perhaps what Jeff is referring to is the ad-hoc > >> copy propagation that happens in the DOM's use of the threader: > >> > >>/* Make a copy of the uses & vuses into USES_COPY, then cprop into > >> the operands. 
*/ > >>FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) > >> { > >>tree tmp = NULL; > >>tree use = USE_FROM_PTR (use_p); > >> > >>copy[i++] = use; > >>if (TREE_CODE (use) == SSA_NAME) > >> tmp = SSA_NAME_VALUE (use); > >>if (tmp) > >> SET_USE (use_p, tmp); > >> } > >> > >>cached_lhs = simplifier->simplify (stmt, stmt, bb, this); > >> > >>/* Restore the statement's original uses/defs. */ > >>i = 0; > >>FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) > >> SET_USE (use_p, copy[i++]); > > > > Ah, likely. These days we'd likely use a gimple_match_op but then > > this seems heavily abstracted, no idea where simplifier->simplify > > might lead to ;) > > I'm also not sure why the threader would do the valueization here and > > not the simplify() function (and lookup_avail_expr misses an 'exploded' > > operand > > lookup as well). Lot's of legacy code ;) > > Yes, there's a lot of legacy code I've left mostly alone. > > There are two copies of simplify() now, the DOM version > (dom_jt_simplifier::simplify) and the VRP version > (hybrid_jt_simplifier::simplify). Each do slightly different things, as > has always been the case. It's just that the separation is now explicit. > > No idea what's up with the valueization either. The VRP version doesn't
[PATCH] Adjust testcase for 128/256 bit HF vector load/store
Hi, The HF vector moves have been updated to align with the HI vector moves; adjust the corresponding testcase for _Float16 vector loads and stores. Tested on x86_64-pc-linux-gnu{-m32,}, pushed as obvious fix. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-13.c: Adjust scan-assembler for xmm/ymm load/store. --- gcc/testsuite/gcc.target/i386/avx512fp16-13.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c index c3bae65da67..b73a8f44e1a 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c @@ -18,7 +18,7 @@ store256_ph (void *p, __m256h a) _mm256_store_ph (p, a); } -/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */ void __attribute__ ((noinline, noclone)) @@ -27,7 +27,7 @@ store_ph (void *p, __m128h a) { _mm_store_ph (p, a); } -/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */ __m512h __attribute__ ((noinline, noclone)) @@ -45,7 +45,7 @@ load256_ph (void const *p) { return _mm256_load_ph (p); } -/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)" 1 } } */ __m128h __attribute__ ((noinline, noclone)) @@ -53,7 +53,7 @@ load_ph (void const *p) { return _mm_load_ph (p); } -/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)" 1 } } */ __m512h __attribute__ ((noinline, noclone)) -- 2.18.1
Re: [PATCH 3/N] Come up with casm global state.
On Thu, Oct 21, 2021 at 3:43 PM Martin Liška wrote: > > On 10/21/21 14:42, Richard Biener wrote: > > On Thu, Oct 21, 2021 at 11:47 AM Martin Liška wrote: > >> > >> On 10/5/21 13:54, Richard Biener wrote: > >>> On Mon, Oct 4, 2021 at 1:13 PM Martin Liška wrote: > > On 9/22/21 11:59, Richard Biener wrote: > > On Thu, Sep 16, 2021 at 3:12 PM Martin Liška wrote: > >> > >> This patch comes up with asm_out_state and a new global variable casm. > >> > >> Tested on all cross compilers. > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed? > > > >* output.h (struct asm_out_file): New struct that replaces a > > ^^^ > > asm_out_state? > > Yes, sure! > > > > > You replace a lot of asm_out_file - do we still need the macro then? > > (I'd have simplified the patch leaving that replacement out at first) > > Well, I actually replaced only a very small fraction of the usage of > asm_out_file. > > $ git grep asm_out_file | grep -v ChangeLog | wc -l > > 1601 > > > So I think we should live with the macro for some time ... > > > > > You leave quite some global state out of the struct that is obviously > > related, in the diff I see object_block_htab for example. > > Yes, I'm aware of it. Can we do that incrementally? > > > Basically > > everything initialized in init_varasm_once is a candidate (which > > then shows const_desc_htab and shared_constant_pool as well > > as pending_assemble_externals_set). > > Well, these would probably need a different header file (or another > #include ... must > be added before output.h :// ). > > > For the goal of outputting > > early DWARF to another file the state CTOR could have a mode > > to not initialize those parts or we could have > > asm-out-state-with-sections > > as base of asm-out-state. > > Yes, right now asm_out_state ctor is minimal: > > asm_out_state (): out_file (NULL), in_section (NULL), > sec ({}), anchor_labelno (0), in_cold_section_p (false) > { > section_htab = hash_table::create_ggc (31); > } > > > > > In the end there will be a target part of the state so I think > > construction needs to be defered to the target which can > > derive from asm-out-state and initialize the part it needs. > > That's currently what targetm.asm_out.init_sections () does > > and we'd transform that to a targetm.asm_out.create () or so. > > That might already be necessary for the DWARF stuff. > > So what do you want to with content of init_varasm_once function? > >>> > >>> It goes away ;) Or rather becomes the invocation of the > >>> asm-out-state CTOR via the target hook. > >>> > > > > That said, dealing with the target stuff piecemail is OK, but maybe > > try to make sure that init_varasm_once is actually identical > > to what the CTOR does? > > So you want asm_out_state::asm_out_state doing what we current initialize > in init_varasm_once, right? > >>> > >>> Yes, asm_out_state should cover everything initialized in init_varasm_once > >>> (and the called targetm.asm_out.init_sections). > >>> > >>> targetm.asm_out.init_sections would become > >>> > >>> asm_out_state *init_sections (); > >>> > >>> where the target can derive from asm_out_state (optionally) and > >>> allocates the asm-out-state & invokes the CTORs. The middle-end > >>> visible asm_out_state would contain all the tables initialized in > >>> init_varasm_once. 
> >>> > >>> I'm not sure if doing it piecemail is good - we don't want to touch > >>> things multiple times without good need and it might be things > >>> turn out not be as "simple" as the above plan sounds. > >>> > >>> I would suggest to try the conversion with powerpc since it > >>> seems to be the only target with two different implementations > >>> for the target hook. > >>> > >>> The > >>> > >>> static GTY(()) section *read_only_data_section; > >>> static GTY(()) section *private_data_section; > >>> static GTY(()) section *tls_data_section; > >>> static GTY(()) section *tls_private_data_section; > >>> static GTY(()) section *read_only_private_data_section; > >>> static GTY(()) section *sdata2_section; > >>> > >>> section *toc_section = 0; > >>> > >>> could become #defines to static_cast > >>> (casm)->section_name > >>> and there'd be > >>> > >>> class rs6000_asm_out_state : public asm_out_state > >>> { > >>> ... add stuff ... > >>> } > >>> > >>> in rs6000/xcoff.h and rs6000/sysv4.h and some generic rs6000 header as > >>> well > >>> (adding no fields) and the target hook would basically do > >>> > >>>return ggc_new rs6000_asm_state (); > >>> > >>> (OK, ggc_alloc + placement new I guess). The default hook just > >>> creates a
Re: [PATCH] gcc: implement AIX-style constructors
On Thu, Oct 21, 2021 at 8:39 AM CHIGOT, CLEMENT wrote: > > Hi David, > > The problem is that cdtors is created by the linker only when the -bcdtors > flag is provided. Thus, if we add "extern void (* _cdtors[]) (void);" to > the "crtcxa.c", we can't use it without using the new constructor types. > One solution would be to create another crtcxa (eg crtcxa_cdtors.o) which will > be replacing the default crtcxa.o when needed. I didn't think about that > when creating the patch. But it might be a better approach. What do you think > ? Hi, Clement Another optional object file in libgcc seems like a cleaner solution than collect2.c emitting all of the code. The file should not be called crtcxa_cdtors.o because "cxa" came from the name of the libstdc++ support functions. Maybe crtcdtors.o or crt_cdtors.o or aixcdtors.o or aix_cdtors.o. Thanks, David
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/21/21 3:46 PM, Richard Biener wrote: On Thu, Oct 21, 2021 at 3:30 PM Aldy Hernandez wrote: On 10/21/21 3:14 PM, Richard Biener wrote: On Thu, Oct 21, 2021 at 2:56 PM Aldy Hernandez wrote: On Thu, Oct 21, 2021 at 12:20 PM Richard Biener wrote: On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: On 10/18/2021 2:17 AM, Aldy Hernandez wrote: On 10/18/21 12:52 AM, Jeff Law wrote: On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: The following patch converts the strlen pass from evrp to ranger, leaving DOM as the last remaining user. So is there any reason why we can't convert DOM as well? DOM's use of EVRP is pretty limited. You've mentioned FP bits before, but my recollection is those are not part of the EVRP analysis DOM uses. Hell, give me a little guidance and I'll do the work... Not only will I take you up on that offer, but I can provide 90% of the work. Here be dragons, though (well, for me, maybe not for you ;-)). DOM is actually an evrp pass at -O1 in disguise. The reason it really is a covert evrp pass is because: a) It calls extract_range_from_stmt on each statement. b) It folds conditionals with simplify_using_ranges. c) But most importantly, it exports discovered ranges when it's done (evrp_range_analyzer(true)). If you look at the evrp pass, you'll notice that that's basically what it does, albeit with the substitute and fold engine, which also calls gimple fold plus other goodies. But I could argue that we've made DOM into an evrp pass without noticing. The last item (c) is particularly invasive because these exported ranges show up in other passes unexpectedly. For instance, I saw an RTL pass at -O1 miss an optimization because it was dependent on some global range being set. IMO, DOM should not export global ranges it discovered during its walk (do one thing and do it well), but I leave it to you experts to pontificate. All true. But I don't think we've got many, if any, hard dependencies on those behaviors. The attached patch is rather trivial. It's mostly deleting state. It seems DOM spends a lot of time massaging the IL so that it can fold conditionals or thread paths. None of this is needed, because the ranger can do all of this. Well, except floats, but... Massaging the IL should only take two forms IIRC. First, if we have a simplification we can do. That could be const/copy propagation, replacing an expression with an SSA_NAME or constant and the like. It doesn't massage the IL just to massage the IL. Second, we do temporarily copy propagate the current known values of an SSA name into use points and then see if that allows us to determine if a statement is already in the hash tables. But we undo that so that nobody should see that temporary change in state. Are you sure we still do that? I can't find it at least. I couldn't either, but perhaps what Jeff is referring to is the ad-hoc copy propagation that happens in the DOM's use of the threader: /* Make a copy of the uses & vuses into USES_COPY, then cprop into the operands. */ FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) { tree tmp = NULL; tree use = USE_FROM_PTR (use_p); copy[i++] = use; if (TREE_CODE (use) == SSA_NAME) tmp = SSA_NAME_VALUE (use); if (tmp) SET_USE (use_p, tmp); } cached_lhs = simplifier->simplify (stmt, stmt, bb, this); /* Restore the statement's original uses/defs. */ i = 0; FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) SET_USE (use_p, copy[i++]); Ah, likely. 
These days we'd likely use a gimple_match_op but then this seems heavily abstracted, no idea where simplifier->simplify might lead to ;) I'm also not sure why the threader would do the valueization here and not the simplify() function (and lookup_avail_expr misses an 'exploded' operand lookup as well). Lot's of legacy code ;) Yes, there's a lot of legacy code I've left mostly alone. There are two copies of simplify() now, the DOM version (dom_jt_simplifier::simplify) and the VRP version (hybrid_jt_simplifier::simplify). Each do slightly different things, as has always been the case. It's just that the separation is now explicit. No idea what's up with the valueization either. The VRP version doesn't need any of this valuezation or on the side structures. You just ask the range of a stmt on a path and it gives you an answer. Yeah, but it doesn't "simplify", it uses ranger to do constant folding ... (ick). For random expressions I'd have used gimple_simplify (in fact it somehow tries to do value-numbering or sth like that to derive new equivalences). The ranger is read-only. It is not folding anything. The simplify() callback is only returning the singleton the conditional resolves to (1 or 0 iirc). It is up to the caller (the forward threader in this case) to d
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/21/21 3:43 PM, Jeff Law wrote: On 10/21/2021 6:56 AM, Aldy Hernandez wrote: On Thu, Oct 21, 2021 at 12:20 PM Richard Biener wrote: On Wed, Oct 20, 2021 at 10:58 PM Jeff Law wrote: On 10/18/2021 2:17 AM, Aldy Hernandez wrote: On 10/18/21 12:52 AM, Jeff Law wrote: On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote: The following patch converts the strlen pass from evrp to ranger, leaving DOM as the last remaining user. So is there any reason why we can't convert DOM as well? DOM's use of EVRP is pretty limited. You've mentioned FP bits before, but my recollection is those are not part of the EVRP analysis DOM uses. Hell, give me a little guidance and I'll do the work... Not only will I take you up on that offer, but I can provide 90% of the work. Here be dragons, though (well, for me, maybe not for you ;-)). DOM is actually an evrp pass at -O1 in disguise. The reason it really is a covert evrp pass is because: a) It calls extract_range_from_stmt on each statement. b) It folds conditionals with simplify_using_ranges. c) But most importantly, it exports discovered ranges when it's done (evrp_range_analyzer(true)). If you look at the evrp pass, you'll notice that that's basically what it does, albeit with the substitute and fold engine, which also calls gimple fold plus other goodies. But I could argue that we've made DOM into an evrp pass without noticing. The last item (c) is particularly invasive because these exported ranges show up in other passes unexpectedly. For instance, I saw an RTL pass at -O1 miss an optimization because it was dependent on some global range being set. IMO, DOM should not export global ranges it discovered during its walk (do one thing and do it well), but I leave it to you experts to pontificate. All true. But I don't think we've got many, if any, hard dependencies on those behaviors. The attached patch is rather trivial. It's mostly deleting state. It seems DOM spends a lot of time massaging the IL so that it can fold conditionals or thread paths. None of this is needed, because the ranger can do all of this. Well, except floats, but... Massaging the IL should only take two forms IIRC. First, if we have a simplification we can do. That could be const/copy propagation, replacing an expression with an SSA_NAME or constant and the like. It doesn't massage the IL just to massage the IL. Second, we do temporarily copy propagate the current known values of an SSA name into use points and then see if that allows us to determine if a statement is already in the hash tables. But we undo that so that nobody should see that temporary change in state. Are you sure we still do that? I can't find it at least. I couldn't either, but perhaps what Jeff is referring to is the ad-hoc copy propagation that happens in the DOM's use of the threader: /* Make a copy of the uses & vuses into USES_COPY, then cprop into the operands. */ FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) { tree tmp = NULL; tree use = USE_FROM_PTR (use_p); copy[i++] = use; if (TREE_CODE (use) == SSA_NAME) tmp = SSA_NAME_VALUE (use); if (tmp) SET_USE (use_p, tmp); } cached_lhs = simplifier->simplify (stmt, stmt, bb, this); /* Restore the statement's original uses/defs. */ i = 0; FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_ALL_USES) SET_USE (use_p, copy[i++]); Exactly what I was referring to. It may be worth an experiment -- I can't recall when this code went in. It might be a remnant from the original threader that pre-dates block copying. 
In that world we had to look up expressions in the table as part of the step to verify that we could safely ignore statements at the start of a block. Or it could be something that was added to improve threading when the condition we're trying to thread through is partially redundant on the thread path. This would allow us to discover that partial redundancy and exploit it for threading. In either case this code may have outlived its usefulness. I wonder what would happen if we just took it out. Look at the patch I sent you. I rip most of it out, as ranger doesn't need it. Perhaps DOM+evrp+threader doesn't either?? Aldy
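For reference, a minimal example (illustrative only, not from the testsuite) of the kind of redundant-on-a-path condition this machinery is about: on the path where the first test was taken, the second test is already known, so the threader can wire that path straight to h() without re-testing.

extern void g (void);
extern void h (void);

void
f (int x)
{
  if (x > 10)
    g ();
  if (x > 10)   /* redundant along the path through g() */
    h ();
}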
[PATCH] coroutine use of lambda capture proxies [PR96517]
Hi (apologies to Jason and Nathan for two copies - I sent from the wrong email and omitted patches@) This is a description of how the revised handling of coroutine proxy variables is supposed to work, in the case of methods and lambdas. Given that this came up again recently, perhaps I might put some version of this description as a block comment before the rewrite code. There is a bug-fix patch for PR96517, the long preamble is to put context (and, hopefully, thus identify any more assumptions that I might have missed). NOTE: It is the user’s responsibility to ensure that ‘this’ and any lambda object persist for the duration of the coroutine. IMO this is probably not the best design decision ever made by the committee since it is very easy to make innocent-looking but broken coroutine code with lambdas. [aside: I think a more user-friendly decision would have been to extend the lifetime of temporaries in statements until *all* the code before the semi-colon is executed (i.e. including deferred code in a coroutine)] anyway… consider this illustrative-only case (where ‘coroutine’ is a suitable implementation of the customisation points). I have elided some of the implementation variables, they are handled the same way as the ones shown and this is already long enough. code struct test { int a; void foo() { int c; auto lam = [this, c](int b) -> coroutine { co_return a + b + c; }; } }; Before the coroutine transformation process: closure object: { int __c test* __this typedef closure_type } inline coroutine test::foo::…..::operator() (closure_type *__closure, int b) { BIND_EXPR VARS { const int c; // lambda proxy var => __closure->__c test* const this; // lambda proxy var => __clousre->__this } BODY { { const int c; test* const this; co_return this->a + b + c; } } } Now because of the design decision mentioned above, we do not copy the closure contents, only the pointer. === after transformation our coroutine frame will look like this… test::foo::……..Frame { void (*)test::foo::Frame*)_Coro_resume_fn)(test::foo::…..Frame*) void (*)test::foo::...Frame*)_Coro_destroy_fn)(est::foo::…...Frame*) coroutine::promise_type _Coro_promise ... closure_type* const __closure // parm copy int b // parm copy, no need for mangling. ... std::__n4861::suspend_always Is_1_1 // local var frame entries are mangled with their scope info, std::__n4861::suspend_never Fs_1_3 } our rewritten lambda looks like this: BIND_EXPR VARS { void (*_Coro_resume_fn)(test::foo::._anon_4::.Frame*); // coroutine proxy var for resume fn void (*_Coro_destroy_fn)(test::foo::._anon_4::.Frame*); // ... coroutine::promise_type _Coro_promise; // for promise DVE => Frame->_Coro_promise ... closure_type* const __closure; // coro proxy var for closure parm DVE => Frame->__closure int b; // coro proxy var for b parm …. } BODY { { void (*_Coro_resume_fn)(test::foo::._anon_4::.Frame*); void (*_Coro_destroy_fn)(test::foo::._anon_4::.Frame*); coroutine::promise_type _Coro_promise; ... const test::foo::._anon_4* const __closure; int b; ... try { co_await initial suspend expr. BIND_EXPR (original lambda body) VARS { const int c; // lambda proxy var; DVE => __closure->__c test* const this; // lambda proxy var DVE => __closure->__this } BODY { { const int c; test* const this; co_return this->a + b + c; } } } catch(...) cleanup_stmt … } c-finally ... final.suspend:; … } } We have a predicate for is_capture_proxy() for lambdas So .. 
in this expression: co_return this->a + b + c; If we ask “is_capture_proxy” for ’this’ or ‘c’ we should answer “yes” - these vars are referred to via the closure object and, if the constraints of the standard are met (lifetime of the closure object and the captured this), then it should be just as if they were used in the body of a regular lambda. The closure pointer itself is a coro proxy var (for the parm) b is a coro proxy var (for the parm) Note that the parm proxies are VAR decls not PARMs (we need to make a copy, because the original parms are used in the ramp, and obviously they are *not* parms of the helper function). so when all the DECL_VALUE_EXPRs are expanded we would get: Frame->__closure->__this->a + Frame->b + Frame->_closure->__c; Whilst the initial intention of this was to make debug work (since GCC’s debug generation already knows about DVEs) it turns out that it greatly simplifies the code, since we are dealing with the original variables instead of replacing them with a component ref. PR 96517 fires because we answer “NO” for is_capture_proxy in the rewritten code, because we do not (correctly) answer “yes” for LAMBDA_TYPE_P for
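To make the NOTE about lifetimes concrete, here is a minimal sketch (with a deliberately trivial 'task' type standing in for a real coroutine promise; names are illustrative, not from the patch) of the innocent-looking but broken lambda pattern: the closure object is a local of the ramp function, but the coroutine frame only stores a pointer to it, so the capture proxies dangle once the ramp returns.

#include <coroutine>

struct task
{
  struct promise_type
  {
    task get_return_object () { return {}; }
    std::suspend_never initial_suspend () { return {}; }
    std::suspend_never final_suspend () noexcept { return {}; }
    void return_void () {}
    void unhandled_exception () {}
  };
};

task
make_coro (int c)
{
  auto lam = [c] () -> task {
    co_await std::suspend_always{};  // suspend; make_coro() then returns
    (void) c;  // capture proxy now reads through a dangling closure pointer
    co_return;
  };
  return lam ();  // 'lam' is destroyed here, but the frame outlives it
}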
Re: [COMMITTED] Avoid threading circular paths.
On 10/21/2021 2:12 AM, Aldy Hernandez wrote: The backward threader keeps a hash of visited blocks to avoid crossing the same block twice. Interestingly, we haven't been checking it for the final block out of the path. This may be inherited from the old code, as it was simple enough that it didn't matter. With the upcoming changes enabling the fully resolving threader, it gets tripped often enough to cause wrong code to be generated. Tested on x86-64 Linux. Committed as obvious. gcc/ChangeLog: * tree-ssa-threadbackward.c (back_threader::maybe_register_path): Avoid threading circular paths. I noticed you already reverted. I won't complain about the morning test results :-) Jeff
[PATCH] libstdc++: missing constexpr for __[nm]iter_base [PR102358]
Tested on x86_64-pc-linux-gnu, does this look ok for trunk/11/10? PR libstdc++/102358 libstdc++-v3/ChangeLog: * include/bits/stl_iterator.h (__niter_base): Make constexpr for C++20. (__miter_base): Likewise. * testsuite/25_algorithms/move/constexpr.cc: New test. --- libstdc++-v3/include/bits/stl_iterator.h | 2 ++ .../testsuite/25_algorithms/move/constexpr.cc | 19 +++ 2 files changed, 21 insertions(+) create mode 100644 libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 8afd6756613..fbd91956ced 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -2470,6 +2470,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// @} group iterators template +_GLIBCXX20_CONSTEXPR auto __niter_base(move_iterator<_Iterator> __it) -> decltype(make_move_iterator(__niter_base(__it.base( @@ -2483,6 +2484,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION }; template +_GLIBCXX20_CONSTEXPR auto __miter_base(move_iterator<_Iterator> __it) -> decltype(__miter_base(__it.base())) diff --git a/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc new file mode 100644 index 000..773c55cfb50 --- /dev/null +++ b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc @@ -0,0 +1,19 @@ +// { dg-options "-std=gnu++20" } +// { dg-do compile { target c++20 } } + +#include +#include + +constexpr bool +test01() +{ + // PR libstdc++/102358 + int x[2] = {1,2}, y[2]; + std::span in(x), out(y); + std::move(std::move_iterator(in.begin()), std::move_iterator(in.end()), + out.begin()); + return std::equal(std::move_iterator(in.begin()), std::move_iterator(in.end()), + std::move_iterator(out.begin())); +} + +static_assert(test01()); -- 2.33.1.711.g9d530dc002
Re: [COMMITTED] Avoid threading circular paths.
Sorry, I tried to revert as soon as I noticed, but it's always a race against the 20 CI bots and fuzzers that keep us honest ;-). Aldy On Thu, Oct 21, 2021 at 4:35 PM Jeff Law wrote: > > > > On 10/21/2021 2:12 AM, Aldy Hernandez wrote: > > The backward threader keeps a hash of visited blocks to avoid crossing > > the same block twice. Interestingly, we haven't been checking it for > > the final block out of the path. This may be inherited from the old > > code, as it was simple enough that it didn't matter. With the > > upcoming changes enabling the fully resolving threader, it gets > > tripped often enough to cause wrong code to be generated. > > > > Tested on x86-64 Linux. > > > > Committed as obvious. > > > > gcc/ChangeLog: > > > > * tree-ssa-threadbackward.c (back_threader::maybe_register_path): > > Avoid threading circular paths. > I noticed you already reverted. I won't complain about the morning test > results :-) > > Jeff > >
[PATCH] libstdc++: Implement P2432R1 changes for views::istream
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/11/10? libstdc++-v3/ChangeLog: * include/std/ranges (istream_view): Replace this function template with an alias template as per P2432R1. (wistream_view): Define as per P2432R1. (views::_Istream, views::istream): Likewise. * testsuite/std/ranges/istream_view.cc (test07): New test. --- libstdc++-v3/include/std/ranges | 25 --- .../testsuite/std/ranges/istream_view.cc | 13 ++ 2 files changed, 34 insertions(+), 4 deletions(-) diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index 30ba0606869..7aa0f9dfc67 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -781,10 +781,27 @@ namespace views friend _Iterator; }; - template -basic_istream_view<_Val, _CharT, _Traits> -istream_view(basic_istream<_CharT, _Traits>& __s) -{ return basic_istream_view<_Val, _CharT, _Traits>{__s}; } + template +using istream_view = basic_istream_view<_Val, char>; + + template +using wistream_view = basic_istream_view<_Val, wchar_t>; + +namespace views +{ + template +struct _Istream +{ + template + [[nodiscard]] + constexpr auto + operator()(basic_istream<_CharT, _TraitsT>& __e) const + { return basic_istream_view<_Tp, _CharT, _TraitsT>(__e); } +}; + + template +inline constexpr _Istream<_Tp> istream; +} // C++20 24.7 [range.adaptors] Range adaptors diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc b/libstdc++-v3/testsuite/std/ranges/istream_view.cc index f5c0c2a6bb0..ea7c5a35ef7 100644 --- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc +++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc @@ -103,6 +103,18 @@ test06() static_assert( std::is_same_v ); } +void +test07() +{ + // P2432R1, views::istream + auto nums = std::istringstream("0 1 2 3 4"); + ranges::istream_view v(nums); + int sum = 0; + for (int val : views::istream(nums)) +sum += val; + VERIFY( sum == 10 ); +} + int main() { @@ -112,4 +124,5 @@ main() test04(); test05(); test06(); + test07(); } -- 2.33.1.711.g9d530dc002
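For reference, a usage sketch of the renamed facilities (assuming this patch is applied; the element type now goes on the views::istream range factory, where pre-P2432R1 code called the ranges::istream_view function template directly):

#include <ranges>
#include <sstream>
#include <iostream>

int
main ()
{
  auto nums = std::istringstream ("0 1 2 3 4");
  for (int v : std::views::istream<int> (nums))  // was: std::ranges::istream_view<int> (nums)
    std::cout << v << ' ';
}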
Re: [PATCH] Try to resolve paths in threader without looking further back.
On 10/21/21 1:17 AM, Aldy Hernandez wrote: On Wed, Oct 20, 2021 at 10:01 PM Jeff Law wrote: On 10/20/2021 9:15 AM, Aldy Hernandez wrote: On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor wrote: I appreciate the heads up. I'm happy that the threader has improved. I'm obviously not pleased that it has led to regressions in warnings but I understand that in some cases they might be due to limitations in the warning code. I think the test case you have xfailed might be one such example. The uninitialized warnings are exquisitely sensitive to these types of changes. If/when this patch is applied please reopen PR 89230 and reference this commit. Having said that, to maintain the quality of diagnostics, the work that goes into these nice optimizer improvements needs to be balanced by an effort to either update the warning code to cope with the IL changes, or the optimizers need to take care to avoid exposing undefined code that the warnings are designed to detect. I'm concerned not just that the quality of GCC 12 diagnostics has been eroding, but also that it seems to be not just acceptable but expected. You make a very good point. It is certainly not my intention to make life difficult for the warning maintainers, but I'm afraid I don't have sufficient knowledge in the area to improve them. There may be some low hanging fruit though. At least in the warnings that use the ranger, there's no reason to run these passes so late in the pipeline. You could run the warning code as early as you want, insofar as SSA is available and the CFG has been built. Heck, you may even be able to run at -O0, though we may need some sort of value numbering. I believe Richi even suggested this a while back. Running them later in the pipeline is to take advantage of the optimizers removing dead and unreachable code as much as possible. In fact, that's critical to -Wuninitialized. Optimizing away unreachable paths to avoid Wuninitialized false positives has been the major driver of jump threading improvements for the last 15 years. Ughh, that's unfortunate. We're gonna have to come up with improvements to the Wuninitialized code, or a different paradigm altogether. I'm afraid this will only get worse. It is a bit ironic that jump threading helps reduce Wuninitialized false positives, but yet too much of it causes even more false positives. For best results they need to work together. That was also my point. All middle end warnings necessarily rely on optimizers for one thing or another. When we change one such optimization and break the assumptions the warnings make (right or wrong), the warnings break too. To minimize the fallout we need to do two things: decouple them as much as possible (minimize the assumptions) and, when we change the optimizers, we might need to update the warnings. If we don't, then yes, things are going to keep getting worse. As an aside, in GCC 12 a good number of middle end warnings already do run at -O0: all those in gimple-ssa-warn-access.c. I'd like to see gimple-ssa-array-bounds invoked from the access pass too (instead of from VRP), and eventually -Wrestrict as well. I'm not sure about the strlen/sprintf warnings; those might need to stay where they are because they run as part of the optimizers there. (By the way, I don't see range info in the access pass at -O0. Should I?) Martin
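To make the dependency concrete, this is the classic shape (illustrative only) in which -Wuninitialized relies on the optimizers pruning unreachable paths: x is assigned and read only under the same condition, and unless threading or similar path pruning proves the cond==0 path never reaches the read, the warning sees a bogus path on which x is used uninitialized.

extern void do_other_work (void);

int
f (int cond)
{
  int x;
  if (cond)
    x = 42;
  do_other_work ();
  if (cond)
    return x;  /* only reachable on paths where x was assigned */
  return 0;
}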
Re: [PATCH] libstdc++: missing constexpr for __[nm]iter_base [PR102358]
On Thu, 21 Oct 2021 at 15:38, Patrick Palka via Libstdc++ < libstd...@gcc.gnu.org> wrote: > Tested on x86_64-pc-linux-gnu, does this look ok for trunk/11/10? > Yes to all, thanks. > PR libstdc++/102358 > > libstdc++-v3/ChangeLog: > > * include/bits/stl_iterator.h (__niter_base): Make constexpr > for C++20. > (__miter_base): Likewise. > * testsuite/25_algorithms/move/constexpr.cc: New test. > --- > libstdc++-v3/include/bits/stl_iterator.h | 2 ++ > .../testsuite/25_algorithms/move/constexpr.cc | 19 +++ > 2 files changed, 21 insertions(+) > create mode 100644 libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc > > diff --git a/libstdc++-v3/include/bits/stl_iterator.h > b/libstdc++-v3/include/bits/stl_iterator.h > index 8afd6756613..fbd91956ced 100644 > --- a/libstdc++-v3/include/bits/stl_iterator.h > +++ b/libstdc++-v3/include/bits/stl_iterator.h > @@ -2470,6 +2470,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >/// @} group iterators > >template > +_GLIBCXX20_CONSTEXPR > auto > __niter_base(move_iterator<_Iterator> __it) > -> decltype(make_move_iterator(__niter_base(__it.base( > @@ -2483,6 +2484,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > }; > >template > +_GLIBCXX20_CONSTEXPR > auto > __miter_base(move_iterator<_Iterator> __it) > -> decltype(__miter_base(__it.base())) > diff --git a/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc > b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc > new file mode 100644 > index 000..773c55cfb50 > --- /dev/null > +++ b/libstdc++-v3/testsuite/25_algorithms/move/constexpr.cc > @@ -0,0 +1,19 @@ > +// { dg-options "-std=gnu++20" } > +// { dg-do compile { target c++20 } } > + > +#include > +#include > + > +constexpr bool > +test01() > +{ > + // PR libstdc++/102358 > + int x[2] = {1,2}, y[2]; > + std::span in(x), out(y); > + std::move(std::move_iterator(in.begin()), std::move_iterator(in.end()), > + out.begin()); > + return std::equal(std::move_iterator(in.begin()), > std::move_iterator(in.end()), > + std::move_iterator(out.begin())); > +} > + > +static_assert(test01()); > -- > 2.33.1.711.g9d530dc002 > >
Re: how does vrp2 rearrange this?
On 10/19/21 7:13 PM, Andrew Pinski wrote: On Tue, Oct 19, 2021 at 3:32 PM Andrew MacLeod wrote: On 10/19/21 5:13 PM, Andrew Pinski wrote: On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches wrote: using testcase ifcvt-4.c: typedef int word __attribute__((mode(word))); word foo (word x, word y, word a) { word i = x; word j = y; /* Try to make taking the branch likely. */ __builtin_expect (x > y, 1); if (x > y) { i = a; j = i; } return i * j; The testcase is broken anyways. The builtin_expect should be inside the if to have any effect. Look at the estimated values: if (x_3(D) > y_4(D)) goto ; [50.00%]<<-- has been reversed. else goto ; [50.00%] ;;succ: 4 [50.0% (guessed)] count:536870912 (estimated locally) (TRUE_VALUE,EXECUTABLE) ;;3 [50.0% (guessed)] count:536870912 (estimated locally) (FALSE_VALUE,EXECUTABLE) See how it is 50/50? The testcase is not even testing what it says it is testing. Just happened to work previously does not mean anything. Move the builtin_expect inside the if and try again. I am shocked it took this long to find the testcase issue really. Thanks, Andrew Pinski Moving the expect around doesn't change anything; in fact, it makes it worse, since fre and evrp immediately eliminate it as true if it is in the THEN block. It looks like it is eliminated by the CDDCE pass: cddce1 sees: _1 = x_5(D) > y_7(D); # RANGE [0, 1] NONZERO 1 _2 = (long int) _1; __builtin_expect (_2, 1); if (x_5(D) > y_7(D)) goto ; [INV] else goto ; [INV] and proceeds: Marking useful stmt: if (x_5(D) > y_7(D)) processing: if (x_5(D) > y_7(D)) processing: i_3 = PHI Eliminating unnecessary statements: Deleting : __builtin_expect (_2, 1); Deleting : _2 = (long int) _1; Deleting : _1 = x_5(D) > y_7(D); If we are supposed to reverse the if, it is not obvious to me who is supposed to do it. You seem to be right that it's a crapshoot that VRP2 does it, because there isn't enough info to dictate it, unless somewhere it detects that a THEN targeting an empty block which falls through to the ELSE block should be swapped. Or maybe you are right and it flukily happens due to the ASSERTs being added and removed. If I turn off DCE, then this all works like it is supposed to, so maybe DCE isn't supposed to remove this? Andrew
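For reference, the shape Andrew Pinski is suggesting (illustrative only): the expectation has to be part of the branch condition itself for the hint to influence the edge probabilities.

typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
  word i = x;
  word j = y;
  /* The hint now applies to this branch.  */
  if (__builtin_expect (x > y, 1))
    {
      i = a;
      j = i;
    }
  return i * j;
}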
Re: [PATCH] libstdc++: Implement P2432R1 changes for views::istream
On Thu, 21 Oct 2021 at 15:41, Patrick Palka via Libstdc++ < libstd...@gcc.gnu.org> wrote: > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/11/10? > Yes for trunk. I'm undecided whether we want to make this breaking change on the branches. > libstdc++-v3/ChangeLog: > > * include/std/ranges (istream_view): Replace this function > template with an alias template as per P2432R1. > (wistream_view): Define as per P2432R1. > (views::_Istream, views::istream): Likewise. > * testsuite/std/ranges/istream_view.cc (test07): New test. > --- > libstdc++-v3/include/std/ranges | 25 --- > .../testsuite/std/ranges/istream_view.cc | 13 ++ > 2 files changed, 34 insertions(+), 4 deletions(-) > > diff --git a/libstdc++-v3/include/std/ranges > b/libstdc++-v3/include/std/ranges > index 30ba0606869..7aa0f9dfc67 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -781,10 +781,27 @@ namespace views >friend _Iterator; > }; > > - template > -basic_istream_view<_Val, _CharT, _Traits> > -istream_view(basic_istream<_CharT, _Traits>& __s) > -{ return basic_istream_view<_Val, _CharT, _Traits>{__s}; } > + template > +using istream_view = basic_istream_view<_Val, char>; > + > + template > +using wistream_view = basic_istream_view<_Val, wchar_t>; > + > +namespace views > +{ > + template > +struct _Istream > +{ > + template > + [[nodiscard]] > + constexpr auto > + operator()(basic_istream<_CharT, _TraitsT>& __e) const > + { return basic_istream_view<_Tp, _CharT, _TraitsT>(__e); } > +}; > + > + template > +inline constexpr _Istream<_Tp> istream; > +} > >// C++20 24.7 [range.adaptors] Range adaptors > > diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc > b/libstdc++-v3/testsuite/std/ranges/istream_view.cc > index f5c0c2a6bb0..ea7c5a35ef7 100644 > --- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc > +++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc > @@ -103,6 +103,18 @@ test06() >static_assert( std::is_same_v ); > } > > +void > +test07() > +{ > + // P2432R1, views::istream > + auto nums = std::istringstream("0 1 2 3 4"); > + ranges::istream_view v(nums); > + int sum = 0; > + for (int val : views::istream(nums)) > +sum += val; > + VERIFY( sum == 10 ); > +} > + > int > main() > { > @@ -112,4 +124,5 @@ main() >test04(); >test05(); >test06(); > + test07(); > } > -- > 2.33.1.711.g9d530dc002 > >
Re: [PATCH 2/2] libstdc++: Implement P1739R4 changes to views::take/drop/counted
On Tue, 19 Oct 2021 at 13:20, Patrick Palka via Libstdc++ < libstd...@gcc.gnu.org> wrote: > This implements P1739R4 along with the resolution for LWG 3407 which > corrects the paper's wording. > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk? > Yes, thanks. > libstdc++-v3/ChangeLog: > > * include/bits/ranges_util.h (views::_Drop): Forward declare. > (subrange): Befriend views::_Drop. > (subrange::_S_store_size): Declare constexpr instead of just > const, remove obsolete comment. > * include/std/ranges (views::__detail::__is_empty_view): Define. > (views::__detail::__is_basic_string_view): Likewise. > (views::__detail::__is_subrange): Likewise. > (views::__detail::__is_iota_view): Likewise. > (views::__detail::__can_take_view): Rename template parm _Tp to > _Dp. > (views::_Take): Rename template parm _Tp to _Dp, make it > non-deducible > and fix its type to range_difference_t<_Range>. Implement P1739R4 > and > LWG 3407 changes. > (views::__detail::__can_drop_view): Rename template parm _Tp to > _Dp. > (views::_Drop): As with views::_Take. > (views::_Counted): Implement P1739R4 changes. > * include/std/span (__detail::__is_std_span): Rename to ... > (__detail::__is_span): ... this and turn it into a variable > template. > (__detail::__is_std_array): Turn it into a variable template. > (span::span): Adjust uses of __is_std_span and __is_std_array > accordingly. > * testsuite/std/ranges/adaptors/p1739.cc: New test. > --- > libstdc++-v3/include/bits/ranges_util.h | 7 +- > libstdc++-v3/include/std/ranges | 106 +++--- > libstdc++-v3/include/std/span | 12 +- > .../testsuite/std/ranges/adaptors/p1739.cc| 88 +++ > 4 files changed, 192 insertions(+), 21 deletions(-) > create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/p1739.cc > > diff --git a/libstdc++-v3/include/bits/ranges_util.h > b/libstdc++-v3/include/bits/ranges_util.h > index 7e7b958d274..1afa66d298c 100644 > --- a/libstdc++-v3/include/bits/ranges_util.h > +++ b/libstdc++-v3/include/bits/ranges_util.h > @@ -211,6 +211,8 @@ namespace ranges > >} // namespace __detail > > + namespace views { struct _Drop; } > + >enum class subrange_kind : bool { unsized, sized }; > >/// The ranges::subrange class template > @@ -221,8 +223,9 @@ namespace ranges > class subrange : public view_interface> > { > private: > - // XXX: gcc complains when using constexpr here > - static const bool _S_store_size > + friend struct views::_Drop; // Needs to inspect _S_store_size. 
> + > + static constexpr bool _S_store_size > = _Kind == subrange_kind::sized && !sized_sentinel_for<_Sent, _It>; > >_It _M_begin = _It(); > diff --git a/libstdc++-v3/include/std/ranges > b/libstdc++-v3/include/std/ranges > index 85f232d8fb9..64396027c1b 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -42,6 +42,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -2147,19 +2148,66 @@ namespace views::__adaptor >{ > namespace __detail > { > - template > + template > + constexpr bool __is_empty_view = false; > + > + template > + constexpr bool __is_empty_view> = true; > + > + template > + constexpr bool __is_basic_string_view = false; > + > + template > + constexpr bool __is_basic_string_view _Traits>> = true; > + > + template > + constexpr bool __is_subrange = false; > + > + template > + constexpr bool __is_subrange> = true; > + > + template > + constexpr bool __is_iota_view = false; > + > + template > + constexpr bool __is_iota_view> = true; > + > + template > concept __can_take_view > - = requires { take_view(std::declval<_Range>(), > std::declval<_Tp>()); }; > + = requires { take_view(std::declval<_Range>(), > std::declval<_Dp>()); }; > } // namespace __detail > > struct _Take : __adaptor::_RangeAdaptor<_Take> > { > - template > - requires __detail::__can_take_view<_Range, _Tp> > + template range_difference_t<_Range>> > + requires __detail::__can_take_view<_Range, _Dp> > constexpr auto > - operator() [[nodiscard]] (_Range&& __r, _Tp&& __n) const > + operator() [[nodiscard]] (_Range&& __r, type_identity_t<_Dp> __n) > const > { > - return take_view(std::forward<_Range>(__r), > std::forward<_Tp>(__n)); > + using _Tp = remove_cvref_t<_Range>; > + if constexpr (__detail::__is_empty_view<_Tp>) > + return _Tp(); > + else if constexpr (random_access_range<_Tp> > +&& sized_range<_Tp> > +&& (std::__d
RE: [PATCH] aarch64: Remove redundant struct type definitions in arm_neon.h
> -Original Message- > From: Jonathan Wright > Sent: Thursday, October 21, 2021 12:46 PM > To: gcc-patches@gcc.gnu.org > Cc: Kyrylo Tkachov ; Richard Sandiford > > Subject: [PATCH] aarch64: Remove redundant struct type definitions in > arm_neon.h > > Hi, > > As subject, this patch deletes some redundant type definitions in > arm_neon.h. These vector type definitions are an artifact from the initial > commit that added the AArch64 port. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? Ok. Thanks, Kyrill > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-10-15 Jonathan Wright > > * config/aarch64/arm_neon.h (__STRUCTN): Delete function > macro and all invocations.
Re: [PATCH 3/N] Come up with casm global state.
On Thu, Oct 21, 2021 at 02:42:10PM +0200, Richard Biener wrote: > +#define rs6000_casm static_cast (casm) > > maybe there's a better way? Though I can't think of one at the moment. > There are only 10 uses so eventually we can put the > static_cast into all places. Let's ask the powerpc maintainers (CCed). It's disgusting, and fragile. The define is slightly better than having to write it out every time. But can this not be done properly? If you use object-oriented stuff and need casts for that, you are doing something wrong. > Note you do > > +/* Implement TARGET_ASM_INIT_SECTIONS. */ > + > +static asm_out_state * > +rs6000_elf_asm_init_sections (void) > +{ > + rs6000_asm_out_state *target_state > += new (ggc_alloc ()) rs6000_asm_out_state (); > + target_state->init_elf_sections (); > + target_state->init_sections (); > + > + return target_state; > +} > > If you'd have made init_sections virtual the flow would be more > natural and we could separate section init from casm construction > (and rs6000 would override init_sections but call the base function > from the override). Yeah. Segher
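For reference, a minimal sketch (names taken from the thread, not actual GCC code) of the virtual-hook shape being suggested: the target derives from asm_out_state and overrides init_sections, so the target-specific state sits behind a virtual call instead of a static_cast of casm at every use site.

struct section;  /* stand-in for GCC's section type */

class asm_out_state
{
public:
  virtual ~asm_out_state () {}
  virtual void init_sections ()
  {
    /* initialize the middle-end visible sections and tables */
  }
};

class rs6000_asm_out_state : public asm_out_state
{
public:
  section *toc_section = nullptr;  /* target state, reachable without a cast */

  void init_sections () override
  {
    asm_out_state::init_sections ();  /* base init first */
    init_elf_sections ();             /* then the target sections */
  }

private:
  void init_elf_sections () { /* ELF/XCOFF specific setup */ }
};

/* The target hook then just constructs the derived state (the real code
   would use ggc_alloc plus placement new rather than plain new).  */
asm_out_state *
rs6000_elf_asm_init_sections ()
{
  asm_out_state *s = new rs6000_asm_out_state ();
  s->init_sections ();
  return s;
}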
Re: Fix PR middle-end/102764
> 2021-10-20 Eric Botcazou > > PR middle-end/102764 > * cfgexpand.c (expand_gimple_basic_block): Disregard a final debug > statement to reset the current location for the outgoing edges. This apparently breaks -fcompare-debug with -m32 so I have made it more robust by means of the attached fixlet. Bootstrapped/regtested on x86-64/Linux, applied on the mainline as obvious. PR middle-end/102764 * cfgexpand.c (expand_gimple_basic_block): Robustify latest change. -- Eric Botcazoudiff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 8b067f9d848..01d0cdc548a 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -5756,6 +5756,7 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls) rtx_insn *last; edge e; edge_iterator ei; + bool nondebug_stmt_seen = false; if (dump_file) fprintf (dump_file, "\n;; Generating RTL for gimple basic block %d\n", @@ -5836,6 +5837,8 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls) basic_block new_bb; stmt = gsi_stmt (gsi); + if (!is_gimple_debug (stmt)) + nondebug_stmt_seen = true; /* If this statement is a non-debug one, and we generate debug insns, then this one might be the last real use of a TERed @@ -6090,7 +6093,7 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls) /* Expand implicit goto and convert goto_locus. */ FOR_EACH_EDGE (e, ei, bb->succs) { - if (e->goto_locus != UNKNOWN_LOCATION || !stmt || is_gimple_debug (stmt)) + if (e->goto_locus != UNKNOWN_LOCATION || !nondebug_stmt_seen) set_curr_insn_location (e->goto_locus); if ((e->flags & EDGE_FALLTHRU) && e->dest != bb->next_bb) {
[wwwdocs, committed] GCC 12: Add release note for Fortran TS29113 improvements
I've checked in the attached patch to announce the cleanup project that Tobias and I have been working on over the last several months in the GCC 12 release notes. I also updated the page for TS29113 on the GCC wiki to reflect that anything that still doesn't work ought to be considered a bug, not just incomplete work-in-progress. I know that the conformance sections of the Fortran manual are badly in need of updating too (not just for TS29113, but the various versions of the standard). That's in my queue to fix up, but I'm likely going to need assistance from the gfortran maintainers to get the details right. :-S -Sandra commit f5971f451ae8834e928738bbfe465670aa481cea Author: Sandra Loosemore Date: Thu Oct 21 09:00:16 2021 -0700 GCC 12: Add release note for Fortran TS29113 improvements. diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index 5be1570..5974b9b 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -147,7 +147,17 @@ a work-in-progress. - +Fortran + + WG5/N1942, "TS 29113 Further Interoperability of Fortran with C", +is now fully supported. In addition to implementing previously +missing functionality, such as support for character arguments of +length greater than one in functions marked bind(c) +and gaps in the handling for assumed-rank arrays, numerous other bugs +have been fixed, and an extensive set of new conformance test cases +has been added. + +
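For reference, a small C-side example (illustrative only; the name is made up) of the Fortran/C interoperability this work covers: a TS 29113 assumed-rank dummy arrives in C as a CFI_cdesc_t descriptor from ISO_Fortran_binding.h, which the C code can query for its rank.

#include <ISO_Fortran_binding.h>
#include <stdio.h>

/* Matches a Fortran interface such as:
     subroutine show_rank (a) bind(c)
       type(*), dimension(..) :: a
     end subroutine  */
void
show_rank (CFI_cdesc_t *a)
{
  printf ("rank = %d\n", (int) a->rank);
}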
Re: [RFC PATCH 0/1] implement TLS register based stack canary for ARM
On Thu, 21 Oct 2021 at 12:23, Ard Biesheuvel wrote: > > Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 > > In the Linux kernel, user processes calling into the kernel are > essentially threads running in the same address space, of a program that > never terminates. This means that using a global variable for the stack > protector canary value is problematic on SMP systems, as we can never > change it unless we reboot the system. (Processes that sleep for any > reason will do so on a call into the kernel, which means that there will > always be live kernel stack frames carrying copies of the canary taken > when the function was entered) > > AArch64 implements -mstack-protector-guard=sysreg for this purpose, as > this permits the kernel to use different memory addresses for the stack > canary for each CPU, and context switch the chosen system register with > the rest of the process, allowing each process to use its own unique > value for the stack canary. > > This patch implements something similar, but for the 32-bit ARM kernel, > which will start using the user space TLS register TPIDRURO to index > per-process metadata while running in the kernel. This means we can just > add an offset to TPIDRURO to obtain the address from which to load the > canary value. > > The patch is a bit rough around the edges, but produces the correct > results as far as I can tell. This is a lie > However, I couldn't quite figure out how > to modify the patterns so that the offset will be moved into the > immediate offset field of the LDR instructions, so currently, the ADD of > the offset is always a distinct instruction. > ... and this is no longer true now that I fixed the correctness problem. I will be sending out a v2 shortly, so please disregard this one for now. > As for the spilling issues that have been fixed in this code in the > past: I suppose a register carrying the TLS register value will never > get spilled to begin with? How about a register that carries TLS+? > > Comments/suggestions welcome. 
> > Cc: thomas.preudho...@celest.fr > Cc: adhemerval.zane...@linaro.org > Cc: Qing Zhao > Cc: Richard Sandiford > Cc: gcc-patches@gcc.gnu.org > > Ard Biesheuvel (1): > [ARM] Add support for TLS register based stack protector canary access > > gcc/config/arm/arm-opts.h | 6 +++ > gcc/config/arm/arm.c | 39 + > gcc/config/arm/arm.md | 44 ++-- > gcc/config/arm/arm.opt| 22 ++ > gcc/doc/invoke.texi | 9 > 5 files changed, 116 insertions(+), 4 deletions(-) > > -- > 2.30.2 > > $ cat |arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls > -mstack-protector-guard-offset=10 -mtp=cp15 -S -o - -xc - > -fstack-protector-all -O3 > int foo(void *); > int bar(void) > { > > return foo(__builtin_thread_pointer()) + 1; > } > .arch armv7-a > .fpu softvfp > .eabi_attribute 20, 1 > .eabi_attribute 21, 1 > .eabi_attribute 23, 3 > .eabi_attribute 24, 1 > .eabi_attribute 25, 1 > .eabi_attribute 26, 2 > .eabi_attribute 30, 2 > .eabi_attribute 34, 1 > .eabi_attribute 18, 4 > .file "" > .text > .align 2 > .global bar > .syntax unified > .arm > .type bar, %function > bar: > @ args = 0, pretend = 0, frame = 8 > @ frame_needed = 0, uses_anonymous_args = 0 > push{r4, lr} > mrc p15, 0, r4, c13, c0, 3 @ load_tp_hard > add r3, r4, #10 > sub sp, sp, #8 > mov r0, r4 > add r4, r4, #10 > ldr r3, [r3] > str r3, [sp, #4] > mov r3, #0 > bl foo > ldr r3, [r4] > ldr r4, [sp, #4] > eorsr3, r4, r3 > mov r4, #0 > bne .L5 > add r0, r0, #1 > add sp, sp, #8 > @ sp needed > pop {r4, pc} > .L5: > bl __stack_chk_fail > .size bar, .-bar > .ident "GCC: (GNU) 12.0.0 20211019 (experimental)" > .section.note.GNU-stack,"",%progbits >
Re: [RFC PATCH 0/1] implement TLS register based stack canary for ARM
On Thu, Oct 21, 2021 at 06:34:04PM +0200, Ard Biesheuvel wrote: > On Thu, 21 Oct 2021 at 12:23, Ard Biesheuvel wrote: > > > > Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 > > > > In the Linux kernel, user processes calling into the kernel are > > essentially threads running in the same address space, of a program that > > never terminates. This means that using a global variable for the stack > > protector canary value is problematic on SMP systems, as we can never > > change it unless we reboot the system. (Processes that sleep for any > > reason will do so on a call into the kernel, which means that there will > > always be live kernel stack frames carrying copies of the canary taken > > when the function was entered) > > > > AArch64 implements -mstack-protector-guard=sysreg for this purpose, as > > this permits the kernel to use different memory addresses for the stack > > canary for each CPU, and context switch the chosen system register with > > the rest of the process, allowing each process to use its own unique > > value for the stack canary. > > > > This patch implements something similar, but for the 32-bit ARM kernel, > > which will start using the user space TLS register TPIDRURO to index > > per-process metadata while running in the kernel. This means we can just > > add an offset to TPIDRURO to obtain the address from which to load the > > canary value. > > > > The patch is a bit rough around the edges, but produces the correct > > results as far as I can tell. > > This is a lie LOL. > > > However, I couldn't quite figure out how > > to modify the patterns so that the offset will be moved into the > > immediate offset field of the LDR instructions, so currently, the ADD of > > the offset is always a distinct instruction. > > > > ... and this is no longer true now that I fixed the correctness > problem. I will be sending out a v2 shortly, so please disregard this > one for now. Heh, I hadn't even had a chance to test it, so I'll hold off. :) Thanks! -Kees -- Kees Cook
Re: [wwwdocs, committed] GCC 12: Add release note for Fortran TS29113 improvements
Hi Sandra, I've checked in the attached patch to announce the cleanup project that Tobias and I have been working on over the last several months in the GCC 12 release notes. I also updated the page for TS29113 on the GCC wiki to reflect that anything that still doesn't work ought to be considered a bug, not just incomplete work-in-progress. Thanks for the work that both of you put into this, and also thanks for putting this into the release notes. I was about to suggest you do so, so you beat me to it :-) I know that the conformance sections of the Fortran manual are badly in need of updating too (not just for TS29113, but the various versions of the standard). That's in my queue to fix up, but I'm likely going to need assistance from the gfortran maintainers to get the details right. :-S Just ask :-) Regards Thomas
[PATCH] x86: Document -fcf-protection requires i686 or newer
PR target/98667 * doc/invoke.texi: Document -fcf-protection requires i686 or newer. --- gcc/doc/invoke.texi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c66a25fcd69..71992b8c597 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15542,7 +15542,8 @@ which functions and calls should be skipped from instrumentation (@pxref{Function Attributes}). Currently the x86 GNU/Linux target provides an implementation based -on Intel Control-flow Enforcement Technology (CET). +on Intel Control-flow Enforcement Technology (CET) which works for +i686 processor or newer. @item -fstack-protector @opindex fstack-protector -- 2.32.0
Re: [PATCH] x86: Adjust gcc.target/i386/pr22076.c
On Tue, Oct 19, 2021 at 11:42 PM Uros Bizjak wrote: > > On Tue, Oct 19, 2021 at 8:23 PM H.J. Lu wrote: > > > > commit 247c407c83f0015f4b92d5f71e45b63192f6757e > > Author: Roger Sayle > > Date: Mon Oct 18 12:15:40 2021 +0100 > > > > Try placing RTL folded constants in the constant pool. > > > > My recent attempts to come up with a testcase for my patch to evaluate > > ss_plus in simplify-rtx.c, identified a missed optimization opportunity > > (that's potentially a long-time regression): The RTL optimizers no > > longer > > place constants in the constant pool. > > > > changed -m32 codegen from > > > > movq.LC1, %mm0 > > paddb .LC0, %mm0 > > movq%mm0, x > > ret > > > > to > > > > movl$807671820, %eax > > movl$1616136252, %edx > > movl%eax, x > > movl%edx, x+4 > > ret > > > > and -m64 codegen from > > > > movq.LC1(%rip), %mm0 > > paddb .LC0(%rip), %mm0 > > movq%xmm0, x(%rip) > > ret > > > > to > > > > movq.LC2(%rip), %rax > > movq%rax, x(%rip) > > ret > > > > Adjust pr22076.c to check that MMX register isn't used since avoiding > > MMX register isn't a bad thing. > > > > PR testsuite/102840 > > * gcc.target/i386/pr22076.c: Updated to check that MMX register > > isn't used. > > The compiler is now able to evaluate the result at the compile time > and it optimizes the test accordingly. Let's provide some MMX > instruction that is implemented with UNSPEC, so the compiler won't be > able to outsmart us. > > Something like the attached patch. > > Uros. Works for me. Thanks. -- H.J.
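For reference, a sketch of that idea (not necessarily the attached patch): use an MMX intrinsic that, as far as I can tell, expands to an UNSPEC in the machine description, e.g. psadbw, so the RTL optimizers cannot constant-fold the whole computation away. Compiled with -msse for the MMX extensions.

#include <xmmintrin.h>

__m64 x;

void
test (void)
{
  __m64 mm0 = _mm_set_pi8 (1, 2, 3, 4, 5, 6, 7, 8);
  __m64 mm1 = _mm_set_pi8 (11, 22, 33, 44, 55, 66, 77, 88);
  x = _mm_sad_pu8 (mm0, mm1);  /* psadbw: opaque to the folders */
}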
[RFC PATCH v2 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. As for the spilling issues that have been fixed in this code in the past: I suppose a register carrying the TLS register value will never get spilled to begin with? Comments/suggestions welcome. Cc: Keith Packard Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) -- 2.30.2 $ cat|arm-linux-gnueabihf-gcc -march=armv7-a -mstack-protector-guard=tls -mstack-protector-guard-offset=1296 -mtp=cp15 -S -o - -xc - -fstack-protector-all -O3 int foo(void *); int bar(void) { return foo(__builtin_thread_pointer()) + 1; } .arch armv7-a .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .file "" .text .align 2 .global bar .syntax unified .arm .type bar, %function bar: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 0, uses_anonymous_args = 0 push{r4, lr} mrc p15, 0, r4, c13, c0, 3 @ load_tp_hard mov r0, r4 sub sp, sp, #8 ldr r3, [r4, #1296] str r3, [sp, #4] mov r3, #0 bl foo ldr r2, [sp, #4] ldr r3, [r4, #1296] eorsr3, r2, r3 mov r2, #0 bne .L5 add r0, r0, #1 add sp, sp, #8 @ sp needed pop {r4, pc} .L5: bl __stack_chk_fail .size bar, .-bar .ident "GCC: (GNU) 12.0.0 20211021 (experimental)" .section.note.GNU-stack,"",%progbits
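For reference, the usage model in C terms (illustrative only; the structure and names are made up, not kernel code): the OS keeps per-task metadata reachable from TPIDRURO and context-switches that register, so the canary access becomes a fixed-offset load from the TLS register.

struct per_task_data
{
  /* ... other per-process metadata indexed via TPIDRURO ... */
  unsigned long stack_canary;  /* at the -mstack-protector-guard-offset */
};

/* Conceptually, the prologue/epilogue canary access becomes:
     canary = *(unsigned long *) (TPIDRURO + offset);  */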
[RFC PATCH v2 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-10-21 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): New * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_option_reconfigure_globals): Likewise (arm_stack_protect_tls_canary_mem): New (arm_stack_protect_guard): New * config/arm/arm.md (stack_protect_set): New (stack_protect_set_tls): Likewise (stack_protect_test): Likewise (stack_protect_test_tls): Likewise * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9b1f61394ad7..37e80256a78d 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); extern rtx arm_load_tp (rtx); extern bool arm_coproc_builtin_available (enum unspecv); extern bool arm_coproc_ldc_stc_legitimate_address (rtx); +extern rtx arm_stack_protect_tls_canary_mem (void); + #if defined TREE_CODE extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c4ff06b087eb..0bf06e764dbb 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. 
*/ static struct obstack minipool_obstack; @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void) else target_thread_pointer = TP_SOFT; } + + if (arm_stack_protector_guard == SSP_TLSREG + && target_thread_pointer != TP_CP15) +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the @@ -8087,6 +8114,19 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg, } +rtx +arm_stack_protect_tls_canary_mem (void) +{ + rtx tp = gen_reg_rtx (SImode); + emit_insn (gen_load_tp_hard (tp)); + + rtx r
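The offset validation in arm_option_override_internal above is the standard strtol error-checking idiom; for readers unfamiliar with it, a self-contained sketch (hypothetical names, not GCC code):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 0 and store the offset on success; -1 on empty input,
   trailing junk, or out-of-range values (errno == ERANGE).  */
static int
parse_guard_offset (const char *str, long *out)
{
  char *end;
  errno = 0;
  long offs = strtol (str, &end, 0);
  if (!*str || *end || errno)
    return -1;
  *out = offs;
  return 0;
}

int
main (void)
{
  long off;
  if (parse_guard_offset ("1296", &off) == 0)
    printf ("offset = %ld\n", off);
  return 0;
}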
Re: [PATCH][_GLIBCXX_DEBUG] Fix unordered container merge
I eventually would like to propose a different approach. I am adding a hook in the normal implementation to let the _GLIBCXX_DEBUG code know when a node is being extracted. This way invalidation is done only by comparing nodes; there is no need to compute a hash code for this operation. The only drawback is that for each extraction we have a linear search over the iterators to invalidate the correct one. Next I will implement an optimization for when hasher/equal_to are noexcept. This patch also removes the invalid noexcept qualification on the _Hashtable merge methods and makes use of const_iterator, as that is what extract expects. Tested under Linux x86_64. OK to commit? François On 16/10/21 4:52 pm, Jonathan Wakely wrote: On Sat, 16 Oct 2021, 14:49 François Dumont via Libstdc++ wrote: Hi Here is the new proposal. My only concern is that we are also using hash or equal_to functors in the guard destructor. Can we catch any exception there, invalidate all iterators, and not rethrow the exception? I am going to enhance the normal merge implementation to make use of the cached hash code when the hash functors are the same between the source and destination of nodes. Maybe I'll be able to make use of it in the Debug implementation too. François On 14/10/21 10:23 am, Jonathan Wakely wrote: > On Wed, 13 Oct 2021 at 18:10, François Dumont via Libstdc++ wrote: >> Hi >> >> libstdc++: [_GLIBCXX_DEBUG] Implement unordered container merge >> >> The _GLIBCXX_DEBUG unordered containers need a dedicated merge >> implementation >> so that any existing iterator on the transferred nodes is properly >> invalidated. >> >> Add typedef/using declaration for everything used as-is from normal >> implementation. >> >> libstdc++-v3/ChangeLog: >> >> * include/debug/safe_container.h (_Safe_container<>): Make >> all methods >> protected. >> * include/debug/safe_unordered_container.h >> (_Safe_unordered_container<>::_M_invalide_all): Make public. >> (_Safe_unordered_container<>::_M_invalide_if): Likewise. >> (_Safe_unordered_container<>::_M_invalide_local_if): Likewise. >> * include/debug/unordered_map >> (unordered_map<>::mapped_type, pointer, const_pointer): New >> typedef. >> (unordered_map<>::reference, const_reference, >> difference_type): New typedef. >> (unordered_map<>::get_allocator, empty, size, max_size): >> Add usings. >> (unordered_map<>::bucket_count, max_bucket_count, bucket): >> Add usings. >> (unordered_map<>::hash_function, key_equal, count, >> contains): Add usings. >> (unordered_map<>::operator[], at, rehash, reserve): Add usings. >> (unordered_map<>::merge): New. >> (unordered_multimap<>::mapped_type, pointer, >> const_pointer): New typedef. >> (unordered_multimap<>::reference, const_reference, >> difference_type): New typedef. >> (unordered_multimap<>::get_allocator, empty, size, >> max_size): Add usings. >> (unordered_multimap<>::bucket_count, max_bucket_count, >> bucket): Add usings. >> (unordered_multimap<>::hash_function, key_equal, count, >> contains): Add usings. >> (unordered_multimap<>::rehash, reserve): Add usings. >> (unordered_multimap<>::merge): New. >> * include/debug/unordered_set >> (unordered_set<>::mapped_type, pointer, const_pointer): New >> typedef. >> (unordered_set<>::reference, const_reference, >> difference_type): New typedef. >> (unordered_set<>::get_allocator, empty, size, max_size): >> Add usings. >> (unordered_set<>::bucket_count, max_bucket_count, bucket): >> Add usings.
>> (unordered_set<>::hash_function, key_equal, count, >> contains): Add usings. >> (unordered_set<>::rehash, reserve): Add usings. >> (unordered_set<>::merge): New. >> (unordered_multiset<>::mapped_type, pointer, >> const_pointer): New typedef. >> (unordered_multiset<>::reference, const_reference, >> difference_type): New typedef. >> (unordered_multiset<>::get_allocator, empty, size, >> max_size): Add usings. >> (unordered_multiset<>::bucket_count, max_bucket_count, >> bucket): Add usings. >> (unordered_multiset<>::hash_function, key_equal, count, >> contains): Add usings. >> (unordered_multiset<>::rehash, reserve): Add usings. >> (unordered_multiset<>::merge): New. >> * >> testsuite/23_containers/unordered_map/debug/merge1_neg.cc: New test. >> * >> testsuite/23_containers/unordered_map/debug/merge2_neg.cc: New test.
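For context on why the dedicated debug-mode merge matters: merge transfers nodes without copying, so an iterator into the source ends up referring to an element that now lives in the destination. A small illustration (mine, not a testcase from the patch) of what -D_GLIBCXX_DEBUG should now diagnose:

#include <unordered_set>

int main()
{
  std::unordered_set<int> src{1, 2, 3};
  std::unordered_set<int> dst{4};
  auto it = src.find(2);   // safe iterator attached to src
  dst.merge(src);          // node 2 is transferred to dst; 'it' is invalidated
  // Any use of 'it' here (e.g. *it or src.erase(it)) is undefined; the
  // debug-mode merge turns that into a diagnostic instead of silent misuse.
  return 0;
}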
Re: [PATCH][_GLIBCXX_DEBUG] Fix unordered container merge
On Thu, 21 Oct 2021 at 17:52, François Dumont wrote: > I eventually would like to propose a different approach. > > I am adding a hook in normal implementation to let the _GLIBCXX_DEBUG code > know when a node is being extracted. This way invalidation is only done by > comparing nodes, no need to compute hash code for this operation. > Ugh, this is horrible, I don't like the normal mode depending on the debug mode (even if it's just having to add hooks like this). The previous patch seemed fine to me. Already an improvement on what is on trunk now.
[PATCH] rs6000: Add Power10 optimization for most _mm_movemask*
Power10 ISA added `vextract*` instructions which are realized in the `vec_extractm` intrinsic. Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`. 2021-10-21 Paul A. Clarke gcc * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm when _ARCH_PWR10. * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise. (_mm_movemask_epi8): Likewise. --- Tested on Power10 powerpc64le-linux (compiled with and without `-mcpu=power10`). OK for trunk? gcc/config/rs6000/emmintrin.h | 8 gcc/config/rs6000/xmmintrin.h | 4 2 files changed, 12 insertions(+) diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h index 32ad72b4cc35..ab16c13c379e 100644 --- a/gcc/config/rs6000/emmintrin.h +++ b/gcc/config/rs6000/emmintrin.h @@ -1233,6 +1233,9 @@ _mm_loadl_pd (__m128d __A, double const *__B) extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_movemask_pd (__m128d __A) { +#ifdef _ARCH_PWR10 + return vec_extractm ((__v2du) __A); +#else __vector unsigned long long result; static const __vector unsigned int perm_mask = { @@ -1252,6 +1255,7 @@ _mm_movemask_pd (__m128d __A) #else return result[0]; #endif +#endif /* !_ARCH_PWR10 */ } #endif /* _ARCH_PWR8 */ @@ -2030,6 +2034,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B) extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_movemask_epi8 (__m128i __A) { +#ifdef _ARCH_PWR10 + return vec_extractm ((__v16qu) __A); +#else __vector unsigned long long result; static const __vector unsigned char perm_mask = { @@ -2046,6 +2053,7 @@ _mm_movemask_epi8 (__m128i __A) #else return result[0]; #endif +#endif /* !_ARCH_PWR10 */ } #endif /* _ARCH_PWR8 */ diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h index ae1a33e8d95b..4c093fd1d5ae 100644 --- a/gcc/config/rs6000/xmmintrin.h +++ b/gcc/config/rs6000/xmmintrin.h @@ -1352,6 +1352,9 @@ _mm_storel_pi (__m64 *__P, __m128 __A) extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_movemask_ps (__m128 __A) { +#ifdef _ARCH_PWR10 + return vec_extractm ((vector unsigned int) __A); +#else __vector unsigned long long result; static const __vector unsigned int perm_mask = { @@ -1371,6 +1374,7 @@ _mm_movemask_ps (__m128 __A) #else return result[0]; #endif +#endif /* !_ARCH_PWR10 */ } #endif /* _ARCH_PWR8 */ -- 2.27.0
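As a reference point for what these intrinsics compute, here is a scalar model (mine, not part of the patch) of _mm_movemask_epi8: collect the most-significant bit of each byte lane into an integer. The _ps/_pd variants are the same idea with 4- and 8-byte lanes; vec_extractm provides this in a single Power10 instruction, while the sequences kept under the #else paths emulate it with a permute and sum:

#include <stdint.h>

/* Bit i of the result is the MSB of byte lane i.  */
static int
movemask_epi8_model (const uint8_t v[16])
{
  int mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (v[i] >> 7) << i;
  return mask;
}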
Re: [PATCH] Try to resolve paths in threader without looking further back.
On 10/21/2021 1:17 AM, Aldy Hernandez wrote: On Wed, Oct 20, 2021 at 10:01 PM Jeff Law wrote: On 10/20/2021 9:15 AM, Aldy Hernandez wrote: On Wed, Oct 20, 2021 at 4:35 PM Martin Sebor wrote: I appreciate the heads up. I'm happy that the threader has improved. I'm obviously not pleased that it has led to regressions in warnings but I understand that in some cases they might be due to limitations in the warning code. I think the test case you have xfailed might be one such example. The uninitialized warnings are exquisitely sensitive to these types of changes. If/when this patch is applied please reopen PR 89230 and reference this commit. Having said that, to maintain the quality of diagnostics, the work that goes into these nice optimizer improvements needs to be balanced by an effort to either update the warning code to cope with the IL changes, or the optimizers need to take care to avoid exposing undefined code that the warnings are designed to detect. I'm concerned not just that the quality of GCC 12 diagnostics has been eroding, but also that it seems to be not just acceptable but expected. You make a very good point. It is certainly not my intention to make life difficult for the warning maintainers, but I'm afraid I don't have sufficient knowledge in the area to improve them. There may be some low hanging fruit though. At least in the warnings that use the ranger, there's no reason to run these passes so late in the pipeline. You could run the warning code as early as you want, insofar as SSA is available and the CFG has been built. Heck, you may even be able to run at -O0, though we may need some sort of value numbering. I believe Richi even suggested this a while back. Running them later in the pipeline is to take advantage of the optimizers removing dead and unreachable code as much as possible. In fact, that's critical to -Wuninitialized. Optimizing away unreachable paths to avoid Wuninitialized false positives has been the major driver of jump threading improvements for the last 15 years. Ughh, that's unfortunate. We're gonna have to come up with improvements to the Wuninitialized code, or a different paradigm altogether. I'm afraid this will only get worse. Well, good luck with a different paradigm :-) It's a tough little nut. As long as we want to reduce false positives, then we're going to be dependent on optimization and related analysis. And as I've noted before, we're generally better off fixing the optimizers when we stumble over a false positive from Wuninitialized. When that's not possible, we should look to fix the predicate analysis code. It is a bit ironic that jump threading helps reduce Wuninitialized false positives, but yet too much of it causes even more false positives. I would expect the latter to be relatively rare for Wuninitialized. I think some of the other middle end warnings may be in a different boat though. jeff
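To make the tension concrete, here is the classic shape (my example, not from the thread) where -Wuninitialized stays quiet only because jump threading separates the paths:

int g (void);

int
f (int c)
{
  int x;        /* deliberately not assigned on every path */
  if (c)
    x = g ();
  /* ... other work ... */
  if (c)
    return x;   /* only reachable when the store above executed; threading
                   the c != 0 path makes that explicit and avoids a false
                   positive, while threading elsewhere can also expose new
                   paths on which a use really is uninitialized */
  return 0;
}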
[r12-4601 Regression] FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1 on Linux/x86_64
On Linux/x86_64, c8a889fc0e115d40a2d02f32842655f3eadc8fa1 is the first bad commit commit c8a889fc0e115d40a2d02f32842655f3eadc8fa1 Author: Hongyu Wang Date: Wed Oct 20 13:13:39 2021 +0800 i386: Fix wrong codegen for V8HF move without TARGET_AVX512F caused FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1 FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1 with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4601/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-13.c --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
Re: [PATCH] Handle jobserver file descriptors in btest.
On Thu, Oct 21, 2021 at 12:48 AM Martin Liška wrote: > > The patch is about sensible handling of file descriptors opened > by make's jobserver. Thanks. I think a better approach would be, at the start of main, to fstat the descriptors up to 10 and record the ones for which fstat succeeds. Then at the end of main only check the descriptors for which fstat failed earlier. I can work on that at some point if you don't want to tackle it. Ian
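A sketch of Ian's suggestion (my code, not a committed patch): snapshot which low descriptors are already open at startup, and have the leak check at the end of main skip exactly those:

#include <sys/stat.h>

static char fd_open_at_start[10];

static void
record_initial_fds (void)
{
  for (int fd = 0; fd < 10; fd++)
    {
      struct stat st;
      /* Open already at startup, e.g. a jobserver pipe end.  */
      fd_open_at_start[fd] = fstat (fd, &st) == 0;
    }
}

/* The final check then reports fd as leaked only if it is open now
   and fd_open_at_start[fd] is not set.  */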
Re: [PATCH] Convert strlen pass from evrp to ranger.
On 10/21/2021 1:42 AM, Aldy Hernandez wrote: Massaging the IL should only take two forms IIRC. First, if we have a simplification we can do. That could be const/copy propagation, replacing an expression with an SSA_NAME or constant and the like. It doesn't massage the IL just to massage the IL. Second, we do temporarily copy propagate the current known values of an SSA name into use points and then see if that allows us to determine if a statement is already in the hash tables. But we undo that so that nobody should see that temporary change in state. Finally, it does create some expressions & statements on the fly to enter them into the tables. For example, if it sees a store, it'll create a load with the source & dest interchanged and enter that into the expression table. But none of this stuff ever shows up in the IL. It's just to create entries in the expression tables. So ISTM the only real concern would be if those temporary const/copy propagations were still in the IL and we called back into Ranger and it poked at that data somehow. Hmmm, this is all very good insight. Thanks. One thing that would have to be adjusted then is to remove the enable_ranger() call in the patch. This sets a global ranger, and there are users of get_range_query() that will use it to get on-demand ranges. One such use that I added was ssa_name_has_boolean_range in tree-ssa-dom.c. This means that if the IL has been temporarily changed, this function can and will use the global ranger. The alternative here would be to just create a new local ranger: - gimple_ranger *ranger = enable_ranger (fun); + gimple_ranger *ranger = new gimple_ranger; and then obviously deallocate it at the disable_ranger call site. This will cause any users of get_range_query() in the compiler to just use global ranges. Hopefully, none of these downstream users of get_range_query() from DOM need context-sensitive results. ssa_name_has_boolean_range?? I think what you'd need to do is check that there are no calls to the ranger from cprop_into_stmt (?? this is the place where IL changes??), until wherever the undoing happens (I couldn't find it). I see a call to simplify_using_ranges in optimize_stmt that looks like it could be called with the IL in mid-flight. Maybe this call needs to happen before the IL is altered? So if we're referring to those temporary const/copy propagations "escaping" into Ranger, then I would fully expect that to cause problems. Essentially they're path-sensitive const/copy propagations and may not be valid on all the paths through the CFG to the statement where the propagation occurs. Yeah. Disabling the global ranger should help, plus making sure you don't use the ranger in the midst of the path-sensitive changes. I think we should first try to remove those temporary const/copy propagations. As I noted in a different follow-up, I can't remember if they were done as part of the original non-copying threader or if they enabled further optimizations in the copying threader. If it's the former, then they can go away and that would be my preference. I'll try to poke at that over the weekend. jeff
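A tiny example (mine) of the path-sensitivity being discussed, i.e. why a temporary propagation must never be observable from outside the path it is valid on:

int
f (int a, int b)
{
  if (a == 0)
    b = b + a;   /* on this path a use of 'a' may temporarily read as 0 */
  return b + a;  /* here a == 0 need not hold; a global ranger that saw
                    the temporary IL could wrongly fold this use too */
}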
Re: [PATCH] x86: Adjust gcc.target/i386/pr22076.c
On Thu, Oct 21, 2021 at 6:50 PM H.J. Lu wrote: > > On Tue, Oct 19, 2021 at 11:42 PM Uros Bizjak wrote: > > > > On Tue, Oct 19, 2021 at 8:23 PM H.J. Lu wrote: > > > > > > commit 247c407c83f0015f4b92d5f71e45b63192f6757e > > > Author: Roger Sayle > > > Date: Mon Oct 18 12:15:40 2021 +0100 > > > > > > Try placing RTL folded constants in the constant pool. > > > > > > My recent attempts to come up with a testcase for my patch to evaluate > > > ss_plus in simplify-rtx.c, identified a missed optimization > > > opportunity > > > (that's potentially a long-time regression): The RTL optimizers no > > > longer > > > place constants in the constant pool. > > > > > > changed -m32 codegen from > > > > > > movq .LC1, %mm0 > > > paddb .LC0, %mm0 > > > movq %mm0, x > > > ret > > > > > > to > > > > > > movl $807671820, %eax > > > movl $1616136252, %edx > > > movl %eax, x > > > movl %edx, x+4 > > > ret > > > > > > and -m64 codegen from > > > > > > movq .LC1(%rip), %mm0 > > > paddb .LC0(%rip), %mm0 > > > movq %xmm0, x(%rip) > > > ret > > > > > > to > > > > > > movq .LC2(%rip), %rax > > > movq %rax, x(%rip) > > > ret > > > > > > Adjust pr22076.c to check that MMX register isn't used since avoiding > > > MMX register isn't a bad thing. > > > > > > PR testsuite/102840 > > > * gcc.target/i386/pr22076.c: Updated to check that MMX register > > > isn't used. > > > > The compiler is now able to evaluate the result at the compile time > > and it optimizes the test accordingly. Let's provide some MMX > > instruction that is implemented with UNSPEC, so the compiler won't be > > able to outsmart us. > > > > Something like the attached patch. > > > > Uros. > > Works for me. Committed with the following ChangeLog: testsuite: Adjust pr22076.c to avoid compile-time optimization [PR102840] 2021-10-21 Uroš Bizjak PR testsuite/102840 gcc/testsuite/ChangeLog: * gcc.target/i386/pr22076.c: Adjust to avoid compile time optimization. Uros.
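The shape of the adjustment, as a hedged sketch (the committed test may differ in detail): use an MMX operation whose RTL pattern is an UNSPEC, e.g. psadbw via _mm_sad_pu8, so simplify-rtx cannot evaluate the result at compile time and the constant-pool loads survive:

/* { dg-do compile } */
/* { dg-options "-O2 -msse" } */
#include <xmmintrin.h>

__m64 x;

void
test (void)
{
  __m64 mm0 = _mm_set_pi8 (1, 2, 3, 4, 5, 6, 7, 8);
  __m64 mm1 = _mm_set_pi8 (11, 22, 33, 44, 55, 66, 77, 88);
  x = _mm_sad_pu8 (mm0, mm1);   /* UNSPEC-backed, not constant-folded */
}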
[PATCH] Possible use before def in fortran/trans-decl.c.
As I'm working on installing ranger as the VRP2 pass, I am getting a stage 2 bootstrap failure now: In file included from /opt/notnfs/amacleod/master/gcc/gcc/fortran/trans-decl.c:28: /opt/notnfs/amacleod/master/gcc/gcc/tree.h: In function ‘void gfc_conv_cfi_to_gfc(stmtblock_t*, stmtblock_t*, tree, tree, gfc_symbol*)’: /opt/notnfs/amacleod/master/gcc/gcc/tree.h:244:56: error: ‘rank’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 244 | #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code) | ^~~~ /opt/notnfs/amacleod/master/gcc/gcc/fortran/trans-decl.c:6671:8: note: ‘rank’ was declared here 6671 | tree rank, idx, etype, tmp, tmp2, size_var = NULL_TREE; | ^~~~ cc1plus: all warnings being treated as errors make[3]: *** [Makefile:1136: fortran/trans-decl.o] Error 1 Looking at that function, in the middle I see: if (sym->as->rank < 0) { /* Set gfc->dtype.rank, if assumed-rank. */ rank = gfc_get_cfi_desc_rank (cfi); gfc_add_modify (&block, gfc_conv_descriptor_rank (gfc_desc), rank); } else if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (gfc_desc))) /* In that case, the CFI rank and the declared rank can differ. */ rank = gfc_get_cfi_desc_rank (cfi); else rank = build_int_cst (signed_char_type_node, sym->as->rank); so rank is set on all paths here. However, stepping back a bit, earlier in the function I see: if (!sym->attr.dimension || !GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (gfc_desc))) { tmp = gfc_get_cfi_desc_base_addr (cfi); gfc_add_modify (&block, gfc_desc, fold_convert (TREE_TYPE (gfc_desc), tmp)); if (!sym->attr.dimension) goto done; } The done: label occurs *after* that block of initialization code, and a bit further down I see this: gfc_add_modify (&loop_body, tmpidx, idx); stmtblock_t inner_loop; gfc_init_block (&inner_loop); tree dim = gfc_create_var (TREE_TYPE (rank), "dim"); I cannot convince myself by looking at the intervening code that this cannot be executed along this path. Perhaps someone more familiar with the code can check it out. However, it seems worthwhile to at least initialize rank to NULL_TREE, so that we are more likely to see a trap if that path ever gets followed. And it makes the warning go away :-) OK? Andrew PS as a side note, it would be handy to have the def point *and* the use point that might be undefined. It's a big function and it took me a while just to see where a possible use might be. commit ed571a93c54e3967fbf445624e47817be5e333ed Author: Andrew MacLeod Date: Thu Oct 21 14:48:20 2021 -0400 Initialize variable. gcc/fortran/ * trans-decl.c (gfc_conv_cfi_to_gfc): Initialize rank to NULL_TREE. diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c index de624c82fcf..fe5511b5285 100644 --- a/gcc/fortran/trans-decl.c +++ b/gcc/fortran/trans-decl.c @@ -6668,7 +6668,7 @@ gfc_conv_cfi_to_gfc (stmtblock_t *init, stmtblock_t *finally, stmtblock_t block; gfc_init_block (&block); tree cfi = build_fold_indirect_ref_loc (input_location, cfi_desc); - tree rank, idx, etype, tmp, tmp2, size_var = NULL_TREE; + tree idx, etype, tmp, tmp2, size_var = NULL_TREE, rank = NULL_TREE; bool do_copy_inout = false; /* When allocatable + intent out, free the cfi descriptor. */
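A minimal C model (mine) of the control flow in question, showing how the early goto can bypass every assignment in the later if/else chain:

void use (long);

void
f (int dimension, int assumed_rank)
{
  long rank;            /* analogue of the 'rank' tree */
  if (!dimension)
    goto done;          /* analogue of the early 'goto done' above */
  if (assumed_rank)
    rank = 0;
  else
    rank = 1;
done:
  use (rank);           /* reachable via the goto with rank never set;
                           a NULL_TREE-style sentinel makes such a path
                           trap visibly instead of reading garbage */
}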
Re: [PATCH] x86: Document -fcf-protection requires i686 or newer
On Thu, Oct 21, 2021 at 6:47 PM H.J. Lu wrote: > > PR target/98667 > * doc/invoke.texi: Document -fcf-protection requires i686 or > newer. Obvious patch? Uros. > --- > gcc/doc/invoke.texi | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index c66a25fcd69..71992b8c597 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -15542,7 +15542,8 @@ which functions and calls should be skipped from > instrumentation > (@pxref{Function Attributes}). > > Currently the x86 GNU/Linux target provides an implementation based > -on Intel Control-flow Enforcement Technology (CET). > +on Intel Control-flow Enforcement Technology (CET) which works for > +i686 processor or newer. > > @item -fstack-protector > @opindex fstack-protector > -- > 2.32.0 >
Re: [PATCH] x86: Document -fcf-protection requires i686 or newer
On Thu, Oct 21, 2021 at 12:04 PM Uros Bizjak wrote: > > On Thu, Oct 21, 2021 at 6:47 PM H.J. Lu wrote: > > > > PR target/98667 > > * doc/invoke.texi: Document -fcf-protection requires i686 or > > newer. > > Obvious patch? I am checking it in and backporting it to release branches. Thanks. > Uros. > > > --- > > gcc/doc/invoke.texi | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index c66a25fcd69..71992b8c597 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -15542,7 +15542,8 @@ which functions and calls should be skipped from > > instrumentation > > (@pxref{Function Attributes}). > > > > Currently the x86 GNU/Linux target provides an implementation based > > -on Intel Control-flow Enforcement Technology (CET). > > +on Intel Control-flow Enforcement Technology (CET) which works for > > +i686 processor or newer. > > > > @item -fstack-protector > > @opindex fstack-protector > > -- > > 2.32.0 > > -- H.J.
Re: [PATCH] libcody: Avoid double-free
On 10/21/21 09:27, Jonathan Wakely wrote: If the listen call fails then 'goto fail' will jump to that label and use freeaddrinfo again. Set the pointer to null to prevent that. OK. libcody/ChangeLog: * netserver.cc (ListenInet6): Set pointer to null after deallocation. --- libcody/netserver.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/libcody/netserver.cc b/libcody/netserver.cc index 30202c5106a..0499b5790b4 100644 --- a/libcody/netserver.cc +++ b/libcody/netserver.cc @@ -140,6 +140,7 @@ int ListenInet6 (char const **e, char const *name, int port, unsigned backlog) listen:; freeaddrinfo (addrs); + addrs = nullptr; if (listen (fd, backlog ? backlog : 17) < 0) {
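A reduced model (mine, not the libcody code) of the bug shape being fixed: one success-path free plus a shared error label equals a double free unless the pointer is disarmed:

#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

int
open_and_listen (char const *name, char const *port)
{
  addrinfo *addrs = nullptr;
  int fd = -1;
  if (getaddrinfo (name, port, nullptr, &addrs) != 0)
    goto fail;
  fd = socket (addrs->ai_family, addrs->ai_socktype, addrs->ai_protocol);
  if (fd < 0 || bind (fd, addrs->ai_addr, addrs->ai_addrlen) < 0)
    goto fail;
  freeaddrinfo (addrs);
  addrs = nullptr;   // the fix: without this, a listen failure below
                     // reaches 'fail' and frees addrs a second time
  if (listen (fd, 17) < 0)
    goto fail;
  return fd;
 fail:
  if (addrs)
    freeaddrinfo (addrs);
  if (fd >= 0)
    close (fd);
  return -1;
}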
Re: [PATCH] c++tools: Fix memory leak
On 10/21/21 09:28, Jonathan Wakely wrote: The allocated memory is not freed when returning early due to an error. c++tools/ChangeLog: * resolver.cc (module_resolver::read_tuple_file): Use unique_ptr to ensure memory is freed before returning. --- c++tools/resolver.cc | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/c++tools/resolver.cc b/c++tools/resolver.cc index 421fdaa55fe..d1b73a47778 100644 --- a/c++tools/resolver.cc +++ b/c++tools/resolver.cc @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see #include "resolver.h" // C++ #include +#include <memory> // C #include // OS @@ -114,10 +115,17 @@ module_resolver::read_tuple_file (int fd, char const *prefix, bool force) buffer = mmap (nullptr, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0); if (buffer == MAP_FAILED) return -errno; + struct Deleter { +void operator()(void* p) const { munmap(p, size); } +size_t size; + }; + std::unique_ptr<void, Deleter> guard(buffer, Deleter{(size_t)stat.st_size}); #else buffer = xmalloc (stat.st_size); if (!buffer) return -errno; + struct Deleter { void operator()(void* p) const { free(p); } }; + std::unique_ptr<void, Deleter> guard; Don't you need to initialize guard from buffer? if (read (fd, buffer, stat.st_size) != stat.st_size) return -errno; #endif @@ -179,12 +187,6 @@ } } -#if MAPPED_READING - munmap (buffer, stat.st_size); -#else - free (buffer); -#endif - return 0; }
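For reference, the pattern Nathan is pointing at, sketched minimally (my code; 'buffer' and 'map_size' are stand-ins): the guard must take ownership of buffer in both branches, and a stateful deleter covers munmap's extra size argument:

#include <cstdlib>
#include <memory>
#include <sys/mman.h>

struct Deleter
{
  void operator() (void *p) const
  {
    if (size)
      munmap (p, size);   // mapped buffer
    else
      free (p);           // malloc'd buffer
  }
  std::size_t size;
};

// mapped reading:
//   std::unique_ptr<void, Deleter> guard (buffer, Deleter{map_size});
// malloc'd reading (note: initialized from buffer, not default-constructed):
//   std::unique_ptr<void, Deleter> guard (buffer, Deleter{0});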
PING [PATCH v4 0/2] Implement indirect external access
On Wed, Sep 22, 2021 at 7:02 PM H.J. Lu wrote: > > Changes in the v4 patch. > > 1. Add nodirect_extern_access attribute. > > Changes in the v3 patch. > > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to > GNU binutils 2.38. But the -z indirect-extern-access linker option is > only available for Linux/x86. However, the --max-cache-size=SIZE linker > option was also added within a day. --max-cache-size=SIZE is used to > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support. > > Changes in the v2 patch. > > 1. Rename the option to -fdirect-extern-access. > > --- > On systems with copy relocation: > * A copy in executable is created for the definition in a shared library > at run-time by ld.so. > * The copy is referenced by executable and shared libraries. > * Executable can access the copy directly. > > Issues are: > * Overhead of a copy, time and space, may be visible at run-time. > * Read-only data in the shared library becomes read-write copy in > executable at run-time. > * Local access to data with the STV_PROTECTED visibility in the shared > library must use GOT. > > On systems without function descriptor, function pointers vary depending > on where and how the functions are defined. > * If the function is defined in executable, it can be the address of > function body. > * If the function, including the function with STV_PROTECTED visibility, > is defined in the shared library, it can be the address of the PLT entry > in executable or shared library. > > Issues are: > * The address of function body may not be used as its function pointer. > * ld.so needs to search loaded shared libraries for the function pointer > of the function with STV_PROTECTED visibility. > > Here is a proposal to remove copy relocation and use canonical function > pointers: > > 1. Accesses, including in PIE and non-PIE, to undefined symbols must > use GOT. > a. Linker may optimize out GOT access if the data is defined in PIE or > non-PIE. > 2. Read-only data in the shared library remain read-only at run-time > 3. Address of global data with the STV_PROTECTED visibility in the shared > library is the address of data body. > a. Can use IP-relative access. > b. May need GOT without IP-relative access. > 4. For systems without function descriptor, > a. All global function pointers of undefined functions in PIE and > non-PIE must use GOT. Linker may optimize out GOT access if the > function is defined in PIE or non-PIE. > b. Function pointer of functions with the STV_PROTECTED visibility in > executable and shared library is the address of function body. >i. Can use IP-relative access. >ii. May need GOT without IP-relative access. >iii. Branches to undefined functions may use PLT. > 5. Single global definition marker: > > Add GNU_PROPERTY_1_NEEDED: > > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO > > to indicate the needed properties by the object file. > > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: > > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) > > to indicate that the object file requires canonical function pointers and > cannot be used with copy relocation. This bit should be cleared in > executable when there are non-GOT or non-PLT relocations in relocatable > input files without this bit set. > > a. Protected symbol access within the shared library can be treated as > local. > b. Copy relocation should be disallowed at link-time and run-time. > c. GOT function pointer reference is required at link-time and run-time.
> > The indirect external access marker can be used in the following ways: > > 1. Linker can decide the best way to resolve a relocation against a > protected symbol before seeing all relocations against the symbol. > 2. Dynamic linker can decide if it is an error to have a copy relocation > in executable against the protected symbol in a shared library by checking > if the shared library is built with -fno-direct-extern-access. > > Add a compiler option, -fdirect-extern-access. -fdirect-extern-access is > the default. With -fno-direct-extern-access: > > 1. Always to use GOT to access undefined symbols, including in PIE and > non-PIE. This is safe to do and does not break the ABI. > 2. In executable and shared library, for symbols with the STV_PROTECTED > visibility: > a. The address of data symbol is the address of data body. > b. For systems without function descriptor, the function pointer is > the address of function body. > These break the ABI and resulting shared libraries may not be compatible > with executables which are not compiled with -fno-direct-extern-access. > 3. Generate an indirect external access marker in relocatable objects if > supported by linker. > > H.J. Lu (2): > Add -f[no-]direct-extern-access > Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE > Hi, This has been implemented in binutils 2.38 and glibc 2.35. What do I need to do to get it into GCC 12? Thanks. -- H.J.
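To ground the GOT discussion in codegen (my example; the exact asm depends on target and PIC mode): for an undefined data symbol on x86-64, direct access lets the linker resolve the reference with a copy relocation in the executable, while GOT indirection reads the library's own definition:

extern int glob;   /* defined in some shared library */

int
get (void)
{
  return glob;
}

/* Roughly, in a PIE:
   direct (default):                  indirect (-fno-direct-extern-access):
     movl glob(%rip), %eax              movq glob@GOTPCREL(%rip), %rax
     ret                                movl (%rax), %eax
                                        ret
   The first form needs glob copied into the executable at run time and
   forces the copy to be writable; the second avoids the copy entirely.  */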
Re: how does vrp2 rearrange this?
On Thu, Oct 21, 2021 at 8:04 AM Andrew MacLeod wrote: > > On 10/19/21 7:13 PM, Andrew Pinski wrote: > > On Tue, Oct 19, 2021 at 3:32 PM Andrew MacLeod wrote: > >> On 10/19/21 5:13 PM, Andrew Pinski wrote: > >>> On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches > >>> wrote: > using testcase ifcvt-4.c: > > > typedef int word __attribute__((mode(word))); > > word > foo (word x, word y, word a) > { > word i = x; > word j = y; > /* Try to make taking the branch likely. */ > __builtin_expect (x > y, 1); > if (x > y) > { > i = a; > j = i; > } > return i * j; > > > > > The testcase is broken anyways. > > The builtin_expect should be inside the if to have any effect. Look > > at the estimated values: > > if (x_3(D) > y_4(D)) > > goto ; [50.00%]<<-- has been reversed. > > else > > goto ; [50.00%] > > ;;succ: 4 [50.0% (guessed)] count:536870912 (estimated > > locally) (TRUE_VALUE,EXECUTABLE) > > ;;3 [50.0% (guessed)] count:536870912 (estimated > > locally) (FALSE_VALUE,EXECUTABLE) > > > > See how it is 50/50? > > The testcase is not even testing what it says it is testing. Just > > happened to work previously does not mean anything. Move the > > builtin_expect inside the if and try again. I am shocked it took this > > long to find the testcase issue really. > > > > Thanks, > > Andrew Pinski > > > Moving the expect around doesn't change anything, in fact, it makes it > worse since fre and evrp immediately eliminate it as true if it is in > the THEN block. I think you misunderstood the change I was saying to do. Try this: typedef int word __attribute__((mode(word))); word foo (word x, word y, word a) { word i = x; word j = y; /* Try to make taking the branch likely. */ if (__builtin_expect (x > y, 1)) { i = a; j = i; } return i * j; } /* { dg-final { scan-rtl-dump "2 true changes made" "ce1" } } */ This should fix the "estimated values" to be more correct. Thanks, Andrew Pinski > > It looks like it is eliminated by the CDDCE pass: > > cddce1 sees: > >_1 = x_5(D) > y_7(D); ># RANGE [0, 1] NONZERO 1 >_2 = (long int) _1; >__builtin_expect (_2, 1); >if (x_5(D) > y_7(D)) > goto ; [INV] >else > goto ; [INV] > > and proceeds: > > Marking useful stmt: if (x_5(D) > y_7(D)) > processing: if (x_5(D) > y_7(D)) > processing: i_3 = PHI > > Eliminating unnecessary statements: > Deleting : __builtin_expect (_2, 1); > Deleting : _2 = (long int) _1; > Deleting : _1 = x_5(D) > y_7(D); > > If we are supposed to reverse the if, it is not obvious to me who is > supposed to. You seem to be right that it's a crapshoot that VRP2 does > it, because there isn't enough info to dictate it, unless somewhere it > detects that a THEN arm targeting an empty block which falls through to the ELSE > block should be swapped. Or maybe you are right and it flukily > happens due to the ASSERTs being added and removed. > > If I turn off DCE, then this all works like it is supposed to, so maybe > DCE isn't supposed to remove this? > > Andrew >