Re: [PATCH] slp: fix sharing of SLP only patterns. (PR99149)
On Fri, 19 Feb 2021, Tamar Christina wrote: > Hi Richi, > > The attached testcase ICEs due to a couple of issues. > In the testcase you have two SLP instances that share the majority of their > definition with each other. One tree defines a COMPLEX_MUL sequence and the > other tree a COMPLEX_FMA. > > The ice happens because: > > 1. the refcounts are wrong, in particular the FMA case doesn't correctly count > the references for the COMPLEX_MUL that it consumes. > > 2. when the FMA is created it incorrectly assumes it can just tear apart the > MUL > node that it's consuming. This is wrong and should only be done when there is > no more uses of the node, in which case the vector only pattern is no longer > relevant. > > To fix the last part the SLP only pattern reset code was moved into > vect_free_slp_tree which results in cleaner code. I also think it does belong > there since that function knows when there are no more uses of the node and so > the pattern should be unmarked, so when the the vectorizer is inspecting the > BB > it doesn't find the now invalid vector only patterns. Note that if you rely on the pattern being reset when it's no longer used then this won't work in case we end up in vect_slp_convert_to_external because that does /* Don't remove and free the child nodes here, since they could be referenced by other structures. The analysis and scheduling phases (need to) ignore child nodes of anything that isn't vect_internal_def. */ unsigned int group_size = SLP_TREE_LANES (node); SLP_TREE_DEF_TYPE (node) = vect_external_def; SLP_TREE_SCALAR_OPS (node).safe_grow (group_size, true); SLP_TREE_LOAD_PERMUTATION (node).release (); FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info) { tree lhs = gimple_get_lhs (vect_orig_stmt (stmt_info)->stmt); where I don't remember the reason for not freeing the children (it's Richards code, so you'd have to ask him). OTOH the above only triggers for non-loop vectorization where the pattern reset isn't needed? > This has the obvious problem in that, eventually after analysis is done, the > entire SLP tree is dissolved before codegen. Where we get into trouble as we > have now dissolved the patterns too... Huh? Codegen needs the SLP nodes so they most definitely survive until after vectorization has finished. So did you run into actual problems with this approach? > My initial thought was to add a parameter to vect_free_slp_tree, but I know > you > wouldn't like that. So I am sending this patch up as an RFC. > > PS. This testcase actually shows that the codegen we get in these cases is not > optimal. Currently this won't vectorize as the compiler thinks the vector > version is too expensive. > > My guess here is because the patterns now unshare the tree and it's likely > costing the setup for the vector code twice? I can see it figures there are LIVE lanes out of the SLP tree, likely because the way vect_bb_slp_mark_live_stmts works out the stmt to vectorize doesn't work for SLP patterns. It sees t.ii:19:10: note: node 0x3b87dd0 (max_nunits=2, refcnt=1) t.ii:19:10: note: op template: MEM [(struct a *)_10] = _24; t.ii:19:10: note: stmt 0 MEM [(struct a *)_10] = _24; t.ii:19:10: note: stmt 1 MEM [(struct a *)_10 + 4B] = _25; t.ii:19:10: note: children 0x3b87e58 t.ii:19:10: note: node 0x3b87e58 (max_nunits=2, refcnt=1) t.ii:19:10: note: op template: slp_patt_34 = .COMPLEX_FMA (_24, _24, _24); t.ii:19:10: note: stmt 0 _24 = l$b_11 + _20; t.ii:19:10: note: stmt 1 _25 = l$c_12 + _23; t.ii:19:10: note: children 0x3b87ee0 0x3b88100 0x3b88100 t.ii:19:10: note: node 0x3b87ee0 (max_nunits=2, refcnt=1) t.ii:19:10: note: op template: l$b_11 = MEM[(const struct a &)_10].b; t.ii:19:10: note: stmt 0 l$b_11 = MEM[(const struct a &)_10].b; t.ii:19:10: note: stmt 1 l$c_12 = MEM[(const struct a &)_10].c; t.ii:19:10: note: node 0x3b88100 (max_nunits=2, refcnt=4) t.ii:19:10: note: op template: k$b_4 = MEM[(const struct a &)_3].b; t.ii:19:10: note: stmt 0 k$b_4 = MEM[(const struct a &)_3].b; t.ii:19:10: note: stmt 1 k$c_5 = MEM[(const struct a &)_3].c; but the scalar stmts for the .COMPLEX_FMA pattern have uses that are not marked as PURE_SLP (_20 and _23). There's the other issue that we have t.ii:19:10: note: op template: slp_patt_33 = .COMPLEX_MUL (_20, _20); t.ii:19:10: note: stmt 0 _20 = _18 - _19; t.ii:19:10: note: stmt 1 _23 = _21 + _22; t.ii:19:10: note: children 0x3b88180 0x3b88538 t.ii:19:10: note: node 0x3b88538 (max_nunits=1, refcnt=1) t.ii:19:10: note: op: VEC_PERM_EXPR t.ii:19:10: note: { } t.ii:19:10: note: lane permutation { 0[0] 1[1] } t.ii:19:10: note: children 0x3b880f8 0x3b88290 t.ii:19:10: note: node 0x3b880f8 (max_nunits=2, refcnt=1) t.ii:19:10: note: op template: k$b_4 = MEM[(const struct a &)_3].b; t.ii:19:10: note: stmt 0 k$b_4 = MEM[(const struct a &)_3].b; t.ii:19:10: note: stmt
[PATCH] tree-optimization/99165 - fix ICE in store-merging w/ non-call EH
This adds a missing accumulation to ret. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-02-22 Richard Biener PR tree-optimization/99165 * gimple-ssa-store-merging.c (pass_store_merging::process_store): Accumulate changed to ret. * g++.dg/pr99165.C: New testcase. --- gcc/gimple-ssa-store-merging.c | 2 +- gcc/testsuite/g++.dg/pr99165.C | 7 +++ 2 files changed, 8 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/pr99165.C diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c index b4c5e8eb9a8..213c1551d39 100644 --- a/gcc/gimple-ssa-store-merging.c +++ b/gcc/gimple-ssa-store-merging.c @@ -5230,7 +5230,7 @@ pass_store_merging::process_store (gimple *stmt) if (idx >= (unsigned)param_max_store_chains_to_track || (n_stores + (*e)->m_store_info.length () > (unsigned)param_max_stores_to_track)) - terminate_and_process_chain (*e); + ret |= terminate_and_process_chain (*e); else { n_stores += (*e)->m_store_info.length (); diff --git a/gcc/testsuite/g++.dg/pr99165.C b/gcc/testsuite/g++.dg/pr99165.C new file mode 100644 index 000..70ffd0345cf --- /dev/null +++ b/gcc/testsuite/g++.dg/pr99165.C @@ -0,0 +1,7 @@ +// { dg-do compile } +// { dg-options "-O2 -fnon-call-exceptions --param=max-stores-to-track=2" } + +struct A { + A() : i() {} + int i; +} *ap2 = new A[3]; -- 2.26.2
Re: [PATCH] Fix ICE in tree_inlinable_function_p.
On Sat, Feb 20, 2021 at 11:49 AM Martin Liška wrote: > > After g:1a2a7096e5e20d736c6138179470b21aa5a74864 we forbid inlining > for a VLA types. What we miss is setting inline_forbidden_reason > variable. > > Fixes: > > ./xgcc -B. -O3 -c > /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c -Winline > > during GIMPLE pass: local-fnsummary > /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c: In function > ‘foo’: > /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/pr99122-2.c:21:1: internal > compiler error: Segmentation fault > 21 | } >| ^ > 0xe8b2ca crash_signal > /home/marxin/Programming/gcc/gcc/toplev.c:327 > 0x1a92733 pp_format(pretty_printer*, text_info*) > /home/marxin/Programming/gcc/gcc/pretty-print.c:1096 > 0x1a76b90 diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*) > /home/marxin/Programming/gcc/gcc/diagnostic.c:1244 > 0x1a79994 diagnostic_impl > /home/marxin/Programming/gcc/gcc/diagnostic.c:1406 > 0x1a79994 warning(int, char const*, ...) > /home/marxin/Programming/gcc/gcc/diagnostic.c:1527 > 0xf1bb16 tree_inlinable_function_p(tree_node*) > /home/marxin/Programming/gcc/gcc/tree-inline.c:4123 > 0xc3f1c5 compute_fn_summary(cgraph_node*, bool) > /home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:3110 > 0xc3f937 compute_fn_summary_for_current > /home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:3160 > 0xc3f937 execute > /home/marxin/Programming/gcc/gcc/ipa-fnsummary.c:4768 > > Ready to be installed? OK. > Thanks, > Martin > > gcc/ChangeLog: > > * tree-inline.c (inline_forbidden_p): Set > inline_forbidden_reason. > --- > gcc/tree-inline.c | 14 -- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c > index c993b1fee8a..1dcb31c0267 100644 > --- a/gcc/tree-inline.c > +++ b/gcc/tree-inline.c > @@ -4027,10 +4027,20 @@ inline_forbidden_p (tree fndecl) >the caller. */ > if (COMPLETE_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))) > && !poly_int_tree_p (TYPE_SIZE (TREE_TYPE (TREE_TYPE (fndecl) > -return true; > +{ > + inline_forbidden_reason > + = G_("function %q+F can never be inlined because " > +"it has a VLA return argument"); > + return true; > +} > for (tree parm = DECL_ARGUMENTS (fndecl); parm; parm = DECL_CHAIN (parm)) > if (!poly_int_tree_p (DECL_SIZE (parm))) > - return true; > + { > + inline_forbidden_reason > + = G_("function %q+F can never be inlined because " > + "it has a VLA argument"); > + return true; > + } > > FOR_EACH_BB_FN (bb, fun) > { > -- > 2.30.1 >
[PATCH] dump SLP subgraph before costing
This adds another dump of the SLP subgraph we're throwing at costing. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-02-22 Richard Biener * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Dump costed subgraph. --- gcc/tree-vect-slp.c | 9 + 1 file changed, 9 insertions(+) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index ea8a97b01c6..c521c34a83b 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -4366,6 +4366,15 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo, unsigned int vec_inside_cost = 0, vec_outside_cost = 0, scalar_cost = 0; unsigned int vec_prologue_cost = 0, vec_epilogue_cost = 0; + if (dump_enabled_p ()) +{ + dump_printf_loc (MSG_NOTE, vect_location, "Costing subgraph: \n"); + hash_set visited; + FOR_EACH_VEC_ELT (slp_instances, i, instance) + vect_print_slp_graph (MSG_NOTE, vect_location, + SLP_INSTANCE_TREE (instance), visited); +} + /* Calculate scalar cost and sum the cost for the vector stmts previously collected. */ stmt_vector_for_cost scalar_costs = vNULL; -- 2.26.2
Re: [PR94092] Re: [RFC] test builtin ratio for loop distribution
On Fri, Feb 19, 2021 at 9:08 AM Alexandre Oliva wrote: > > Here's an improved version of the patch. Regstrapped on > x86_64-linux-gnu, with and without a patchlet that moved multi-pieces > ahead of setmem, and also tested with riscv32-elf. > > Is it ok to install? Or should it wait for stage1? It generally looks OK but I'd wait for stage1. I'd also like to see comments from others. Note that I think we need to guard the loop emitting on optimize_*_for_speed/!size. It's also not entirely clear to me how the code avoids expanding all > 1 byte block size memsets to a loop (thus bypassing more optimal libc implementations for larger sizes). I also remember that we sometimes do a dynamic dispatch between inlined and non-inlined code paths, though that might be target specific handling in the x86 backend. Thanks, Richard. > [PR94092] introduce try store by multiple pieces > > From: Alexandre Oliva > > The ldist pass turns even very short loops into memset calls. E.g., > the TFmode emulation calls end with a loop of up to 3 iterations, to > zero out trailing words, and the loop distribution pass turns them > into calls of the memset builtin. > > Though short constant-length clearing memsets are usually dealt with > efficiently, for non-constant-length ones, the options are setmemM, or > a function calls. > > RISC-V doesn't have any setmemM pattern, so the loops above end up > "optimized" into memset calls, incurring not only the overhead of an > explicit call, but also discarding the information the compiler has > about the alignment of the destination, and that the length is a > multiple of the word alignment. > > This patch handles variable lengths with multiple conditional > power-of-2-constant-sized stores-by-pieces, so as to reduce the > overhead of length compares. > > It also changes the last copy-prop pass into ccp, so that pointer > alignment and length's nonzero bits are detected and made available > for the expander, even for ldist-introduced SSA_NAMEs. > > > for gcc/ChangeLog > > PR tree-optimization/94092 > * builtins.c (try_store_by_multiple_pieces): New. > (expand_builtin_memset_args): Use it. If target_char_cast > fails, proceed as for non-constant val. Pass len's ctz to... > * expr.c (clear_storage_hints): ... this. Try store by > multiple pieces after setmem. > (clear_storage): Adjust. > * expr.h (clear_storage_hints): Likewise. > (try_store_by_multiple_pieces): Declare. > * passes.def: Replace the last copy_prop to ccp. > --- > gcc/builtins.c | 182 > ++-- > gcc/expr.c |9 ++- > gcc/expr.h | 13 > gcc/passes.def |5 +- > 4 files changed, 197 insertions(+), 12 deletions(-) > > diff --git a/gcc/builtins.c b/gcc/builtins.c > index 0aed008687ccc..05b14f2a3f6d3 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -6600,6 +6600,166 @@ expand_builtin_memset (tree exp, rtx target, > machine_mode mode) >return expand_builtin_memset_args (dest, val, len, target, mode, exp); > } > > +/* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO. > + Return TRUE if successful, FALSE otherwise. TO is assumed to be > + aligned at an ALIGN-bits boundary. LEN must be a multiple of > + 1< + > + The strategy is to issue one store_by_pieces for each power of two, > + from most to least significant, guarded by a test on whether there > + are at least that many bytes left to copy in LEN. > + > + ??? Should we skip some powers of two in favor of loops? Maybe start > + at the max of TO/LEN/word alignment, at least when optimizing for > + size, instead of ensuring O(log len) dynamic compares? */ > + > +bool > +try_store_by_multiple_pieces (rtx to, rtx len, unsigned int ctz_len, > + unsigned HOST_WIDE_INT min_len, > + unsigned HOST_WIDE_INT max_len, > + rtx val, char valc, unsigned int align) > +{ > + int max_bits = floor_log2 (max_len); > + int min_bits = floor_log2 (min_len); > + int sctz_len = ctz_len; > + > + gcc_checking_assert (sctz_len >= 0); > + > + if (val) > +valc = 1; > + > + /* Bits more significant than TST_BITS are part of the shared prefix > + in the binary representation of both min_len and max_len. Since > + they're identical, we don't need to test them in the loop. */ > + int tst_bits = (max_bits != min_bits ? max_bits > + : floor_log2 (max_len ^ min_len)); > + > + /* Check whether it's profitable to start by storing a fixed BLKSIZE > + bytes, to lower max_bits. In the unlikely case of a constant LEN > + (implied by identical MAX_LEN and MIN_LEN), we want to issue a > + single store_by_pieces, but otherwise, select the minimum multiple > + of the ALIGN (in bytes) and of the MCD of the possible LENs, that > + brings MAX_LEN below TST_BITS, if that's
Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered
Hi Andre, Thanks for fixing this. On 19/02/2021 10:53, Andre Vieira (lists) via Gcc-patches wrote: > Hi, > > This patch makes sure that allocno copies are not created for unordered > modes. The testcases in the PR highlighted a case where an allocno copy was > being created for: > (insn 121 120 123 11 (parallel [ > (set (reg:VNx2QI 217) > (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 0))) > (clobber (scratch:VNx16BI)) > ]) 4750 {*vec_duplicatevnx2qi_reg} > (expr_list:REG_DEAD (reg:SI 93 [ _2 ]) > (nil))) > > As the compiler detected that the vec_duplicate_reg pattern allowed > the input and output operand to be of the same register class, it tried to > create an allocno copy for these two operands, stripping subregs in the > process. However, this meant that the copy was between VNx2QI and SI, which > have unordered mode precisions. > > So at compile time we do not know which of the two modes is smaller which is > a requirement when updating allocno copy costs. > > Regression tested on aarch64-linux-gnu. > > Is this OK for trunk (and after a week backport to gcc-10) ? > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c > new file mode 100644 > index > ..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c > @@ -0,0 +1,12 @@ > +/* PR rtl-optimization/98791 */ > +/* { dg-do compile } */ > +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } > */ > +extern char a[], b[]; > +short c, d; > +long *e; > +void f() { > + for (int g; g < c; g += 1) { > +a[g] = d; > +b[g] = e[g]; > + } > +} For the testcase, you might want to use the one I posted most recently: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3 which avoids the dependency on the aarch64-autovec-preference param (which is in GCC 11 only) as this will simplify backporting. But if it's preferable to have a testcase without SVE intrinsics for GCC 11 then we should stick with that. Cheers, Alex
Re: [PATCH] gcov: use mmap pools for KVP.
PING^3 On 2/9/21 9:37 AM, Martin Liška wrote: PING^2 @Honza: ? On 1/29/21 2:33 PM, Martin Liška wrote: PING^1 On 1/25/21 1:51 PM, Martin Liška wrote: On 1/22/21 3:33 PM, Jan Hubicka wrote: It is definitly doable (gcov machinery is quite flexible WRT having more types of counters). Yes, that would introduce back the dropped TOPN counters which I intentionally dropped. We could bring back topn counters or the easier dominating vlaue ones and add command line option. However perhaps better in long term would be keep adding mmap ifdefs for targets where it is important... Kernel guys seems to be getting on profile feedback with clang, so we should keep in mind posibility of supporting that as well. Sure, that should not be so difficult. They should handle it in their kernel run-time. We could however also ask users to either resort to -fno-profile-values or implement mmap equivalent ifdefs to libgcov if they really care about malloc profiling. Seems reasonable to me. Well, the current approach makes some assumptions on mmap (and malloc), but they seem reasonable to me. I don't expect anybody building Mingw Firefox PGO build, it's likely unsupported configuration. It is possible to build Firefox with mingw on windows and I would expected feedback to work. https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Compiling_Mozilla_With_Mingw However this is not only about Firefox - we notice problems with Firefox since it is only real word app where we look at PGO more systematically. But we definitly should aim for PGO to be useful for apps of similar complexity. It would be nice to incrase this testing coverage. I see. Another observation about the tcmalloc 'malloc' implementation. It hangs in a PGO build, but libgcov would be happy if NULL would be returned. Yep, I would expect other folks to try to PGO optimize malloc implementations an we could run into variety of issues... Ok, can we go with the suggested mmap patch for now? Thanks, Martin Honza Martin So personally I do not see this as a must feature (and I think Martin was really looking forward to drop the former topn profile implementation :) Another intersting case would be, of course, profiling of kernel... Honza
[Patch] Fortran/OpenMP: Fix optional dummy procedures [PR99171]
Normal dummy arguments get some additional redirection if they are OPTIONAL; however, that's not the case for dummy procedures. That was shown by a simple 'procedure(), optional :: proc' example in the PR. – The fix is as simple. However, I thought it still makes sense to test all combinations of procedure pointer (incl. c_funptr) with optional and pointer... OK for mainline and GCC 10 (it is a 10/11 regression)? Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf Fortran/OpenMP: Fix optional dummy procedures [PR99171] gcc/fortran/ChangeLog: PR fortran/99171 * trans-openmp.c (gfc_omp_is_optional_argument): Regard optional dummy procs as nonoptional as no special treatment is needed. libgomp/ChangeLog: PR fortran/99171 * testsuite/libgomp.fortran/dummy-procs-1.f90: New test. gcc/fortran/trans-openmp.c | 5 +- .../testsuite/libgomp.fortran/dummy-procs-1.f90| 393 + 2 files changed, 397 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 67e370f8b57..349df1cc346 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -64,7 +64,9 @@ gfc_omp_is_allocatable_or_ptr (const_tree decl) /* True if the argument is an optional argument; except that false is also returned for arguments with the value attribute (nonpointers) and for assumed-shape variables (decl is a local variable containing arg->data). - Note that pvoid_type_node is for 'type(c_ptr), value. */ + Note that for 'procedure(), optional' the value false is used as that's + always a pointer and no additional indirection is used. + Note that pvoid_type_node is for 'type(c_ptr), value' (and c_funloc). */ static bool gfc_omp_is_optional_argument (const_tree decl) @@ -73,6 +75,7 @@ gfc_omp_is_optional_argument (const_tree decl) && DECL_LANG_SPECIFIC (decl) && TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE && !VOID_TYPE_P (TREE_TYPE (TREE_TYPE (decl))) + && TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) != FUNCTION_TYPE && GFC_DECL_OPTIONAL_ARGUMENT (decl)); } diff --git a/libgomp/testsuite/libgomp.fortran/dummy-procs-1.f90 b/libgomp/testsuite/libgomp.fortran/dummy-procs-1.f90 new file mode 100644 index 000..fcb17ce69a9 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/dummy-procs-1.f90 @@ -0,0 +1,393 @@ +! { dg-do run } +! +! PR fortran/99171 +! +! Check dummy procedure arguments, especially optional ones +! +module m + use iso_c_binding + implicit none (type, external) + integer :: cnt + integer :: cnt2 +contains + subroutine proc() +cnt = cnt + 1 + end subroutine + + subroutine proc2() +cnt2 = cnt2 + 1 + end subroutine + + subroutine check(my_proc) +procedure(proc) :: my_proc +cnt = 42 +call my_proc() +if (cnt /= 43) stop 1 + +!$omp parallel + call my_proc() +!$omp end parallel +if (cnt <= 43) stop 2 + end + + subroutine check_opt(my_proc) +procedure(proc), optional :: my_proc +logical :: is_present +is_present = present(my_proc) +cnt = 55 +if (present (my_proc)) then + call my_proc() + if (cnt /= 56) stop 3 +endif + +!$omp parallel + if (is_present .neqv. present (my_proc)) stop 4 + if (present (my_proc)) then +call my_proc() +if (cnt <= 56) stop 5 + end if +!$omp end parallel +if (is_present) then + if (cnt <= 56) stop 6 +else if (cnt /= 55) then + stop 7 +end if + end + + subroutine check_ptr(my_proc) +procedure(proc), pointer :: my_proc +logical :: is_assoc +integer :: mycnt +is_assoc = associated (my_proc) + +cnt = 10 +cnt2 = 20 +if (associated (my_proc)) then + call my_proc() + if (cnt /= 11 .or. cnt2 /= 20) stop 8 +endif + +!$omp parallel + if (is_assoc .neqv. associated (my_proc)) stop 9 + if (associated (my_proc)) then +if (.not. associated (my_proc, proc)) stop 10 +call my_proc() +if (cnt <= 11 .or. cnt2 /= 20) stop 11 + else if (cnt /= 10 .or. cnt2 /= 20) then +stop 12 + end if +!$omp end parallel +if (is_assoc .neqv. associated (my_proc)) stop 13 +if (associated (my_proc)) then + if (cnt <= 11 .or. cnt2 /= 20) stop 14 +else if (is_assoc .and. (cnt /= 11 .or. cnt2 /= 20)) then + stop 15 +end if + +cnt = 30 +cnt2 = 40 +mycnt = 0 +!$omp parallel shared(mycnt) + !$omp critical + my_proc => proc2 + if (.not.associated (my_proc, proc2)) stop 17 + mycnt = mycnt + 1 + call my_proc() + if (cnt2 /= 40 + mycnt .or. cnt /= 30) stop 18 + !$omp end critical +!$omp end parallel +if (.not.associated (my_proc, proc2)) stop 19 +if (cnt2 /= 40 + mycnt .or. cnt /= 30) stop 20 + end + + subroutine check_ptr_o
Re: [Patch] Fortran/OpenMP: Fix optional dummy procedures [PR99171]
On Mon, Feb 22, 2021 at 01:06:56PM +0100, Tobias Burnus wrote: > Normal dummy arguments get some additional redirection if they are > OPTIONAL; however, that's not the case for dummy procedures. > > That was shown by a simple 'procedure(), optional :: proc' example > in the PR. – The fix is as simple. > > However, I thought it still makes sense to test all combinations of > procedure pointer (incl. c_funptr) with optional and pointer... > > OK for mainline and GCC 10 (it is a 10/11 regression)? Ok, thanks. > gcc/fortran/ChangeLog: > > PR fortran/99171 > * trans-openmp.c (gfc_omp_is_optional_argument): Regard optional > dummy procs as nonoptional as no special treatment is needed. > > libgomp/ChangeLog: > > PR fortran/99171 > * testsuite/libgomp.fortran/dummy-procs-1.f90: New test. > > gcc/fortran/trans-openmp.c | 5 +- > .../testsuite/libgomp.fortran/dummy-procs-1.f90| 393 > + > 2 files changed, 397 insertions(+), 1 deletion(-) Jakub
Re: [PATCH v2 2/2] ada: add 128bit operation to MIPS N32 and N64
Arnaud Charlet 于2021年2月18日周四 下午3:38写道: > > > For MIPS N64 and N32: > > add GNATRTL_128BIT_PAIRS to LIBGNAT_TARGET_PAIRS > > add GNATRTL_128BIT_OBJS to EXTRA_GNATRTL_NONTASKING_OBJS > > > > gcc/ada/ChangeLog: > > PR ada/98996 > > * Makefile.rtl (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS) > > : add 128Bit operation file to MIPS N64 and N32. > > Note that the ChangeLog is generated automatically these days. > > The change is OK, thanks. can you help me to commit this patchset? > > > --- > > gcc/ada/ChangeLog| 6 ++ > > gcc/ada/Makefile.rtl | 12 > > 2 files changed, 18 insertions(+) > > > > diff --git a/gcc/ada/ChangeLog b/gcc/ada/ChangeLog > > index 52faefaa2ae..3565a32c5ac 100644 > > --- a/gcc/ada/ChangeLog > > +++ b/gcc/ada/ChangeLog > > @@ -1,3 +1,9 @@ > > +2021-02-18 YunQiang Su > > + > > + PR ada/98996 > > + * Makefile.rtl (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS) > > + : add 128Bit operation file to MIPS N64 and N32. > > + > > 2021-02-12 Arnaud Charlet > > > > * repinfo.ads, repinfo.adb (*SO_Ref*): Restore. > > diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl > > index 35faf13ea46..d86eb8acbf3 100644 > > --- a/gcc/ada/Makefile.rtl > > +++ b/gcc/ada/Makefile.rtl > > @@ -2311,6 +2311,18 @@ ifeq ($(strip $(filter-out mips% > > linux%,$(target_cpu) $(target_os))),) > >s-tpopsp.adb >system.ads > > > + ifeq ($(strip $(filter-out mips64% mipsisa64%,$(target_cpu))),) > > +ifneq ($(strip $(MULTISUBDIR)),/32) > > + LIBGNAT_TARGET_PAIRS += $(GNATRTL_128BIT_PAIRS) > > + EXTRA_GNATRTL_NONTASKING_OBJS += $(GNATRTL_128BIT_OBJS) > > +endif > > + else > > +ifeq ($(strip $(filter-out /64 /n32,$(MULTISUBDIR))),) > > + LIBGNAT_TARGET_PAIRS += $(GNATRTL_128BIT_PAIRS) > > + EXTRA_GNATRTL_NONTASKING_OBJS += $(GNATRTL_128BIT_OBJS) > > +endif > > + endif > > + > >TOOLS_TARGET_PAIRS = indepsw.adb > > >EXTRA_GNATRTL_TASKING_OBJS=s-linux.o > > -- > > 2.20.1
Re: [PATCH][PR98791]: IRA: Make sure allocno copy mode's are ordered
Hi Alex, On 22/02/2021 10:20, Alex Coplan wrote: For the testcase, you might want to use the one I posted most recently: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3 which avoids the dependency on the aarch64-autovec-preference param (which is in GCC 11 only) as this will simplify backporting. But if it's preferable to have a testcase without SVE intrinsics for GCC 11 then we should stick with that. I don't see any problem with having SVE intrinsics in the testcase, committed with your other test as I agree it makes the backport easier eventually. Thanks for pointing that out. diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c index 2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0 100644 --- a/gcc/ira-conflicts.c +++ b/gcc/ira-conflicts.c @@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)]; ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)]; - if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2) + if (!allocnos_conflict_for_copy_p (a1, a2) + && offset1 == offset2 + && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)), + GET_MODE_PRECISION (ALLOCNO_MODE (a2 { cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn, ira_curr_loop_tree_node); diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c new file mode 100644 index ..cc1f1831afb68ba70016cbe26f8f9273cfceafa8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c @@ -0,0 +1,12 @@ +/* PR rtl-optimization/98791 */ +/* { dg-do compile } */ +/* { dg-options "-O -ftree-vectorize" } */ +#include +extern char a[11]; +extern long b[]; +void f() { + for (int d; d < 10; d++) { +a[d] = svaddv(svptrue_b8(), svdup_u8(0)); +b[d] = 0; + } +}
Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]
On Fri, Feb 19, 2021 at 07:12:42PM +, Kwok Cheung Yeung wrote: > I have opted for a union of completion_sem (for tasks that are undeferred) > and a struct gomp_team *detach_team (for deferred tasks) that holds the team > if the completion event has not yet fulfilled, or NULL if is it. I don't see > the point of having an indirection to the union here since the union is just > the size of a pointer, so it might as well be inlined. I see three issues with the union of completion_sem and detach_team done that way. 1) while linux --enable-futex and accel gomp_sem_t is small (int), rtems and especially posix gomp_sem_t is large; so while it might be a good idea to inline gomp_sem_t on config/{linux,accel} into the union, for the rest it might be better to use indirection; if it is only for the undeferred tasks, it could be just using an automatic variable and put into the struct address of that; could be done either always, or define some macro in config/{linux,accel}/sem.h that gomp_sem_t is small and decide on the indirection based on that macro 2) kind == GOMP_TASK_UNDEFERRED is true also for the deferred tasks while running the cpyfn callback; guess this could be dealt with making sure the detach handling is done only after thr->task = task; if (cpyfn) { cpyfn (arg, data); task->copy_ctors_done = true; } else memcpy (arg, data, arg_size); thr->task = parent; task->kind = GOMP_TASK_WAITING; task->fn = fn; task->fn_data = arg; task->final_task = (flags & GOMP_TASK_FLAG_FINAL) >> 1; I see you've instead removed the GOMP_TASK_UNDEFERRED but the rationale for that is that the copy constructors are being run synchronously 3) kind is not constant, for the deferred tasks it can change over the lifetime of the task, as you've said in the comments, it is kind == GOMP_TASK_UNDEFERRED vs. other values; while the changes of task->kind are done while holding the task lock, omp_fulfill_event reads it before locking that lock, so I think it needs to be done using if (__atomic_load_n (&task->kind, MEMMODEL_RELAXED) == GOMP_TASK_UNDEFERRED) Pedantically the stores to task->kind also need to be done with __atomic_store_n MEMMODEL_RELAXED. Now, similarly for 3) on task->kind, task->detach_team is similar case, again, some other omp_fulfill_event can clear it (under lock, but still read outside of the lock), so it probably should be read with struct gomp_team *team = __atomic_load_n (&task->detach_team, MEMMODEL_RELAXED); And again, pedantically the detach_team stores should be atomic relaxed stores too. > > Do you agree with this, or see some reason why this can't work? > > The main problem I see is this code in gomp_barrier_handle_tasks: > > if (--team->task_count == 0 > && gomp_team_barrier_waiting_for_tasks (&team->barrier)) > { > gomp_team_barrier_done (&team->barrier, state); > > We do not have access to state from within omp_fulfill_event, so how should > this be handled? Sure, omp_fulfill_event shouldn't do any waiting, it needs to awake anything that could have been waiting. > @@ -688,8 +697,7 @@ struct gomp_team >int work_share_cancelled; >int team_cancelled; > > - /* Tasks waiting for their completion event to be fulfilled. */ > - struct priority_queue task_detach_queue; > + /* Number of tasks waiting for their completion event to be fulfilled. */ >unsigned int task_detach_count; Do we need task_detach_count? Currently it is only initialized and incremented/decremented, but never tested for anything. Though see below. > + gomp_debug (0, "omp_fulfill_event: %p event fulfilled for finished task\n", > + task); > + size_t new_tasks = gomp_task_run_post_handle_depend (task, team); > + gomp_task_run_post_remove_parent (task); > + gomp_clear_parent (&task->children_queue); > + gomp_task_run_post_remove_taskgroup (task); > + team->task_count--; > + team->task_detach_count--; > + > + /* Wake up any threads that may be waiting for the detached task > + to complete. */ > + struct gomp_task *parent = task->parent; > + > + if (parent && parent->taskwait) > +{ > + if (parent->taskwait->in_taskwait) > + { > + parent->taskwait->in_taskwait = false; > + gomp_sem_post (&parent->taskwait->taskwait_sem); > + } > + else if (parent->taskwait->in_depend_wait) > + { > + parent->taskwait->in_depend_wait = false; > + gomp_sem_post (&parent->taskwait->taskwait_sem); > + } > +} Looking at gomp_task_run_post_remove_parent, doesn't that function already handle the in_taskwait and in_depend_wait gomp_sem_posts? > + if (task->taskgroup && task->taskgroup->in_taskgroup_wait) > +{ > + task->taskgroup->in_taskgroup_wait = false; > + gomp_sem_post (&task->taskgroup->taskgroup_sem); > +} And into gomp_task_run_post_remove_t
Re: [PATCH] doc: c: c++: Document the C/C++ extended asm empty input constraints
On 15/02/21 23:22 +, Neven Sajko wrote: There is a long-standing, but undocumented GCC inline assembly feature that's part of the extended asm GCC extension to C and C++: extended asm empty input constraints. Although I don't really use extended asm much, and I never contributed to GCC before; I tried to document the feature as far as I understand it. I ran make html to check that the changed Texinfo is well formed. FTR, empty input constraints have been mentioned on the GCC mailing lists, e.g.: https://gcc.gnu.org/pipermail/gcc-help/2015-June/124410.html I release this contribution into the public domain. Neven Sajko gcc/ChangeLog: * doc/md.texi: Document extended asm empty input constraints diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index e3686dbfe..deccfd38a 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -1131,7 +1131,102 @@ the addressing register. @subsection Simple Constraints @cindex simple constraints -The simplest kind of constraint is a string full of letters, each of +An input constraint is allowed to be an empty string, in which case it is +called an empty input constraint. (When an empty input constraint is used, +the assembler template will most probably also be empty. I.e., the @code{asm} +declaration need not contain actual assembly code.) An empty input This parenthesis seems to be important to the next sentence, so should it really be in parentheses? Also, it's an asm statement, not asm declaration, isn't it? And we don't seem to document that an empty assembler tempalte is allowed, except in passing here. +constraint can be used to create an artificial dependency on a C or C++ Is it redundant to say "C or C++ variable"? What other kind of variable could it be talking about? +variable (the variable that appears in the expression associated with the +constraint) without incurring unnecessary costs to performance. I think this whole paragraph (in fact all the new text) is quite verbose and could be much more succint. +An input constraint is allowed to be an empty string, in which case it is +called an empty input constraint. (When an empty input constraint is used, +the assembler template will most probably also be empty. I.e., the @code{asm} +declaration need not contain actual assembly code.) An empty input +constraint can be used to create an artificial dependency on a C or C++ +variable (the variable that appears in the expression associated with the +constraint) without incurring unnecessary costs to performance. Also, does the discussion of empty input constraints really belong first? Isn't it a special case, and so should be documented after the more common case? And do we really want to spend as long talking about this special case as we do about all the other types of input constraint combined? +An example of where such behavior may be useful is for preventing compiler "This can be useful to prevent compiler optimizations like dead store elimination and hoisting code outside loops. Specific applications may include ..." +optimizations like dead store elimination or hoisting code outside a loop for +certain pieces of C or C++ code. Specific applications may include direct +interaction with hardware features; or things like testing, fuzzing and +benchmarking. + +Here's a simple C++20 program that is not useful in practice but demonstrates +relevant behavior; store it as a file called asm.cc: + +@verbatim +#include + +int +main() { +// Greater than or equal to zero. +constexpr int asmV = ASM_V; + +// The exact content of v is irrelevant for +// this example. +std::vector v{7, 6, 9, 3, 2, 0}; This is ill-formed because int -> char is a narrowing conversion. Please use valid C++ for the example. + +for (int i{0}; i < (1 << 28); i++) { +for (int j{0}; j < 6; j++) { I don't think the brace initialization helps the example. Consider that many readers of the GCC manual are C programmers. Why use a feature specific to C++ when you could just say "int i = 0;" so everybody understands it? +// The exact operation on the contents +// of v is not relevant for this +// example. +v[j]++; + +if constexpr (1 <= asmV) { I don't think the Yoda Conditions help the example. +asm volatile ("" :: ""(v.size())); +for (auto x: v) { +asm volatile ("" :: ""(x)); +} +} +if constexpr (2 <= asmV) { +asm volatile ("" :: ""(v.size())); +for (auto x: v) { +asm volatile ("" :: ""(x)); +} +} +if constexpr (3 <= asmV) { +asm volatile ("" :: ""(v.size())); +for (auto x: v) { +asm volatile ("" :: ""(x)); +} +} +} +} + +return 0; A return statement is not needed for main, so removing it k
c++: cross-header-unit template definitions [PR 99153]
A member function can be defined in a different header-file than the one defining the class. In such situations we must unmark the decl as imported. When the entity is a template we failed to unmark the template_decl. Perhaps the duplication of these flags on the template_decl from the underlying decl is an error. I set on the fence about it for a long time during development, but I don't think now is the time to change that (barring catastrophic bugs). PR c++/99153 gcc/cp/ * decl.c (duplicate_decls): Move DECL_MODULE_IMPORT_P propagation to common-path. * module.cc (set_defining_module): Add assert. gcc/testsuite/ * g++.dg/modules/pr99153_a.H: New. * g++.dg/modules/pr99153_b.H: New. -- Nathan Sidwell diff --git c/gcc/cp/decl.c w/gcc/cp/decl.c index 6f3414f058e..7fa8f52d667 100644 --- c/gcc/cp/decl.c +++ w/gcc/cp/decl.c @@ -2879,19 +2879,6 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden) (char *) newdecl + sizeof (struct tree_common), sizeof (struct tree_decl_common) - sizeof (struct tree_common)); - if (DECL_LANG_SPECIFIC (olddecl) && DECL_TEMPLATE_INFO (olddecl)) - { - /* Repropagate the module information to the template. */ - tree tmpl = DECL_TI_TEMPLATE (olddecl); - - if (DECL_TEMPLATE_RESULT (tmpl) == olddecl) - { - DECL_MODULE_PURVIEW_P (tmpl) = DECL_MODULE_PURVIEW_P (olddecl); - gcc_checking_assert (!DECL_MODULE_IMPORT_P (olddecl)); - DECL_MODULE_IMPORT_P (tmpl) = false; - } - } - switch (TREE_CODE (newdecl)) { case LABEL_DECL: @@ -2925,6 +2912,19 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden) } } + if (DECL_LANG_SPECIFIC (olddecl) && DECL_TEMPLATE_INFO (olddecl)) +{ + /* Repropagate the module information to the template. */ + tree tmpl = DECL_TI_TEMPLATE (olddecl); + + if (DECL_TEMPLATE_RESULT (tmpl) == olddecl) + { + DECL_MODULE_PURVIEW_P (tmpl) = DECL_MODULE_PURVIEW_P (olddecl); + gcc_checking_assert (!DECL_MODULE_IMPORT_P (olddecl)); + DECL_MODULE_IMPORT_P (tmpl) = false; + } +} + if (VAR_OR_FUNCTION_DECL_P (newdecl)) { if (DECL_EXTERNAL (olddecl) diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc index 3d17b8ddcdb..7a40be3db35 100644 --- c/gcc/cp/module.cc +++ w/gcc/cp/module.cc @@ -18516,6 +18516,7 @@ set_defining_module (tree decl) gcc_checking_assert (!use_tpl); /* Get to the TEMPLATE_DECL. */ decl = TI_TEMPLATE (ti); + gcc_checking_assert (!DECL_MODULE_IMPORT_P (decl)); } /* Record it on the class_members list. */ diff --git c/gcc/testsuite/g++.dg/modules/pr99153_a.H w/gcc/testsuite/g++.dg/modules/pr99153_a.H new file mode 100644 index 000..3eaa76bdc32 --- /dev/null +++ w/gcc/testsuite/g++.dg/modules/pr99153_a.H @@ -0,0 +1,11 @@ +// { dg-additional-options -fmodule-header } +// { dg-module-cmi {} } + +template +struct pair +{ + inline void Frob (); +}; + +template +inline void Widget (); diff --git c/gcc/testsuite/g++.dg/modules/pr99153_b.H w/gcc/testsuite/g++.dg/modules/pr99153_b.H new file mode 100644 index 000..5699378d317 --- /dev/null +++ w/gcc/testsuite/g++.dg/modules/pr99153_b.H @@ -0,0 +1,15 @@ +// PR 99153 Mismatched flags on template and result +// { dg-additional-options -fmodule-header } +// { dg-module-cmi {} } + +import "pr99153_a.H"; + +template +inline void pair<_T1>::Frob() +{ } + + +template +inline void Widget () +{ +}
c++: Recursing header imports and other duplicates [PR 99174]
The fix for 98741 introduced two issues. (a) recursive header units iced because we tried to read the preprocessor state after having failed to read the config. (b) we could have duplicate imports of named modules in our pending queue, and that would lead to duplicate requests for pathnames, which coupled with the use of a null-pathname to indicate we'd asked could lead to desynchronization with the module mapper. Fixed by adding a 'visited' flag to module state, so we ask exactly once. PR c++/99174 gcc/cp * module.cc (struct module_state): Add visited_p flag. (name_pending_imports): Use it to avoid duplicate requests. (preprocess_module): Don't read preprocessor state if we failed to load a module's config. gcc/testsuite/ * g++.dg/modules/pr99174-1_a.C: New. * g++.dg/modules/pr99174-1_b.C: New. * g++.dg/modules/pr99174-1_c.C: New. * g++.dg/modules/pr99174.H: New. -- Nathan Sidwell diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc index 3d17b8ddcdb..09a9ca8778b 100644 --- c/gcc/cp/module.cc +++ w/gcc/cp/module.cc @@ -3551,9 +3551,10 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state { bool call_init_p : 1; /* This module's global initializer needs calling. */ bool inform_read_p : 1; /* Inform of a read. */ + bool visited_p : 1;/* A walk-once flag. */ /* Record extensions emitted or permitted. */ unsigned extensions : SE_BITS; - /* 13 bits used, 3 bits remain */ + /* 14 bits used, 2 bits remain */ public: module_state (tree name, module_state *, bool); @@ -3787,6 +3788,7 @@ module_state::module_state (tree name, module_state *parent, bool partition) partition_p = partition; inform_read_p = false; + visited_p = false; extensions = 0; if (name && TREE_CODE (name) == STRING_CST) @@ -19304,16 +19306,16 @@ name_pending_imports (cpp_reader *reader, bool at_end) { module_state *module = (*pending_imports)[ix]; gcc_checking_assert (module->is_direct ()); - if (!module->filename) + if (!module->filename + && !module->visited_p + && (module->is_header () || !only_headers)) { - Cody::Flags flags - = (flag_preprocess_only ? Cody::Flags::None - : Cody::Flags::NameOnly); + module->visited_p = true; + Cody::Flags flags = (flag_preprocess_only + ? Cody::Flags::None : Cody::Flags::NameOnly); - if (only_headers && !module->is_header ()) - ; - else if (module->module_p - && (module->is_partition () || module->exported_p)) + if (module->module_p + && (module->is_partition () || module->exported_p)) mapper->ModuleExport (module->get_flatname (), flags); else mapper->ModuleImport (module->get_flatname (), flags); @@ -19325,15 +19327,13 @@ name_pending_imports (cpp_reader *reader, bool at_end) for (unsigned ix = 0; ix != pending_imports->length (); ix++) { module_state *module = (*pending_imports)[ix]; - gcc_checking_assert (module->is_direct ()); - if (only_headers && !module->is_header ()) - ; - else if (!module->filename) + if (module->visited_p) { - Cody::Packet const &p = *r_iter; - ++r_iter; + module->visited_p = false; + gcc_checking_assert (!module->filename); - module->set_filename (p); + module->set_filename (*r_iter); + ++r_iter; } } @@ -19443,7 +19443,8 @@ preprocess_module (module_state *module, location_t from_loc, /* Now read the preprocessor state of this particular import. */ unsigned n = dump.push (module); - if (module->read_preprocessor (true)) + if (module->loadedness == ML_CONFIG + && module->read_preprocessor (true)) module->import_macros (); dump.pop (n); diff --git c/gcc/testsuite/g++.dg/modules/pr99174-1_a.C w/gcc/testsuite/g++.dg/modules/pr99174-1_a.C new file mode 100644 index 000..c22b45bff45 --- /dev/null +++ w/gcc/testsuite/g++.dg/modules/pr99174-1_a.C @@ -0,0 +1,8 @@ +// PR 99174 what if we import the same module twice (with no +// intervening header import)? +// { dg-additional-options -fmodules-ts } + +export module Foo; +// { dg-module-cmi Foo } + +export void Foo (); diff --git c/gcc/testsuite/g++.dg/modules/pr99174-1_b.C w/gcc/testsuite/g++.dg/modules/pr99174-1_b.C new file mode 100644 index 000..aaa0a9492ad --- /dev/null +++ w/gcc/testsuite/g++.dg/modules/pr99174-1_b.C @@ -0,0 +1,6 @@ +// { dg-additional-options -fmodules-ts } + +export module Bar; +// { dg-module-cmi Bar } + +export void Bar (); diff --git c/gcc/testsuite/g++.dg/modules/pr99174-1_c.C w/gcc/testsuite/g++.dg/modules/pr99174-1_c.C new file mode 100644 index 000..58936f292f8 --- /dev/null +++ w/gcc/testsuite/g++.dg/modules/pr99174-1_c.C @@ -0,0 +1,11 @@ +// { dg-additional-options -fmodules-ts } + +import Foo; +import Foo; +import Bar; + +int main () +{ + Foo (); + Bar (); +} diff --git c/gcc/testsuite/g++.dg/modules/pr99174.H w/gcc/testsuite/g+
[committed] hppa: Add mi_thunk support for vcalls
The attached change adds mi_thunk support for vcalls on hppa. Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11. Committed to trunk and gcc-10. Dave -- John David Anglin dave.ang...@bell.net Add mi_thunk support for vcalls on hppa. gcc/ChangeLog: PR target/85074 * config/pa/pa.c (TARGET_ASM_CAN_OUTPUT_MI_THUNK): Define as hook_bool_const_tree_hwi_hwi_const_tree_true. (pa_asm_output_mi_thunk): Add support for nonzero vcall_offset. diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c index 3921b5c98de..d7fcd11e504 100644 --- a/gcc/config/pa/pa.c +++ b/gcc/config/pa/pa.c @@ -293,7 +293,7 @@ static size_t n_deferred_plabels = 0; #undef TARGET_ASM_OUTPUT_MI_THUNK #define TARGET_ASM_OUTPUT_MI_THUNK pa_asm_output_mi_thunk #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK -#define TARGET_ASM_CAN_OUTPUT_MI_THUNK default_can_output_mi_thunk_no_vcall +#define TARGET_ASM_CAN_OUTPUT_MI_THUNK hook_bool_const_tree_hwi_hwi_const_tree_true #undef TARGET_ASM_FILE_END #define TARGET_ASM_FILE_END pa_file_end @@ -8461,12 +8461,15 @@ pa_is_function_label_plus_const (rtx op) && GET_CODE (XEXP (op, 1)) == CONST_INT); } -/* Output assembly code for a thunk to FUNCTION. */ +/* Output the assembler code for a thunk function. THUNK_DECL is the + declaration for the thunk function itself, FUNCTION is the decl for + the target function. DELTA is an immediate constant offset to be + added to THIS. If VCALL_OFFSET is nonzero, the word at + *(*this + vcall_offset) should be added to THIS. */ static void pa_asm_output_mi_thunk (FILE *file, tree thunk_fndecl, HOST_WIDE_INT delta, - HOST_WIDE_INT vcall_offset ATTRIBUTE_UNUSED, - tree function) + HOST_WIDE_INT vcall_offset, tree function) { const char *fnname = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (thunk_fndecl)); static unsigned int current_thunk_number; @@ -8482,201 +8485,386 @@ pa_asm_output_mi_thunk (FILE *file, tree thunk_fndecl, HOST_WIDE_INT delta, assemble_start_function (thunk_fndecl, fnname); final_start_function (emit_barrier (), file, 1); - /* Output the thunk. We know that the function is in the same - translation unit (i.e., the same space) as the thunk, and that - thunks are output after their method. Thus, we don't need an - external branch to reach the function. With SOM and GAS, - functions and thunks are effectively in different sections. - Thus, we can always use a IA-relative branch and the linker - will add a long branch stub if necessary. - - However, we have to be careful when generating PIC code on the - SOM port to ensure that the sequence does not transfer to an - import stub for the target function as this could clobber the - return value saved at SP-24. This would also apply to the - 32-bit linux port if the multi-space model is implemented. */ - if ((!TARGET_LONG_CALLS && TARGET_SOM && !TARGET_PORTABLE_RUNTIME - && !(flag_pic && TREE_PUBLIC (function)) - && (TARGET_GAS || last_address < 262132)) - || (!TARGET_LONG_CALLS && !TARGET_SOM && !TARGET_PORTABLE_RUNTIME - && ((targetm_common.have_named_sections - && DECL_SECTION_NAME (thunk_fndecl) != NULL - /* The GNU 64-bit linker has rather poor stub management. - So, we use a long branch from thunks that aren't in - the same section as the target function. */ - && ((!TARGET_64BIT - && (DECL_SECTION_NAME (thunk_fndecl) - != DECL_SECTION_NAME (function))) - || ((DECL_SECTION_NAME (thunk_fndecl) - == DECL_SECTION_NAME (function)) - && last_address < 262132))) - /* In this case, we need to be able to reach the start of -the stub table even though the function is likely closer -and can be jumped to directly. */ - || (targetm_common.have_named_sections - && DECL_SECTION_NAME (thunk_fndecl) == NULL - && DECL_SECTION_NAME (function) == NULL - && total_code_bytes < MAX_PCREL17F_OFFSET) - /* Likewise. */ - || (!targetm_common.have_named_sections - && total_code_bytes < MAX_PCREL17F_OFFSET -{ - if (!val_14) - output_asm_insn ("addil L'%2,%%r26", xoperands); - - output_asm_insn ("b %0", xoperands); - - if (val_14) - { - output_asm_insn ("ldo %2(%%r26),%%r26", xoperands); - nbytes += 8; + if (!vcall_offset) +{ + /* Output the thunk. We know that the function is in the same +translation unit (i.e., the same space) as the thunk, and that +thunks are output after their method. Thus, we don't need an +external branch to reach the function. With SOM and GAS, +functions and thunk
[gcc-12 PATCH] ira: Correct HONOR_REG_ALLOC_ORDER usage
The intention of HONOR_REG_ALLOC_ORDER is to ensure that IRA allocates registers in the order given by REG_ALLOC_ORDER. However in ira_better_spill_reload_regno_p, there is still a place where the calculation depends on the presence of REG_ALLOC_ORDER, ignoring HONOR_REG_ALLOC_ORDER macro altogether. The patch uses the correct macro at this place. On the other hand, assign_hard_reg function respects HONOR_REG_ALLOC_ORDER, but expects this macro to return 1 to avoid internal cost calculations. As the macro is defined to 0 by default, it is expected that targets redefine HONOR_REG_ALLOC_ORDER to return nonzero value, even if REG_ALLOC_ORDER is defined. This approach is prone to errors, so the patch defines HONOR_REG_ALLOC_ORDER to 1 by default if REG_ALLOC_ORDER is defined. 2021-02-22 Uroš Bizjak gcc/ * defaults.h (HONOR_REG_ALLOC_ORDER): If not defined, define to 1 if REG_ALLOC_ORDER is defined. * doc/tm.texi.in (HONOR_REG_ALLOC_ORDER): Describe new default definition. * doc/tm.texi: Regenerate. * ira-color.c (ira_better_spill_reload_regno_p): Use HONOR_REG_ALLOC_ORDER instead of REG_ALLOC_ORDER to determine better spill reload regno. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. OK for gcc-12 when it opens? Uros. diff --git a/gcc/defaults.h b/gcc/defaults.h index 91216593e75..2af4add0c05 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -1047,7 +1047,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #endif #ifndef HONOR_REG_ALLOC_ORDER -#define HONOR_REG_ALLOC_ORDER 0 +# if defined REG_ALLOC_ORDER +# define HONOR_REG_ALLOC_ORDER 1 +# else +# define HONOR_REG_ALLOC_ORDER 0 +# endif #endif /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function, diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 062785af1e2..9a346555ec8 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -2050,7 +2050,10 @@ prologue and restoring it in the epilogue. This discourages it from using call-saved registers. If a machine wants to ensure that IRA allocates registers in the order given by REG_ALLOC_ORDER even if some call-saved registers appear earlier than call-used ones, then define this -macro as a C expression to nonzero. Default is 0. +macro as a C expression to nonzero. + +The default definition is @code{1} if @code{REG_ALLOC_ORDER} is defined; +otherwise, it is @code{0}. @end defmac @defmac IRA_HARD_REGNO_ADD_COST_MULTIPLIER (@var{regno}) diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 3b19e6f4281..1cb7fad7a46 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -1797,7 +1797,10 @@ prologue and restoring it in the epilogue. This discourages it from using call-saved registers. If a machine wants to ensure that IRA allocates registers in the order given by REG_ALLOC_ORDER even if some call-saved registers appear earlier than call-used ones, then define this -macro as a C expression to nonzero. Default is 0. +macro as a C expression to nonzero. + +The default definition is @code{1} if @code{REG_ALLOC_ORDER} is defined; +otherwise, it is @code{0}. @end defmac @defmac IRA_HARD_REGNO_ADD_COST_MULTIPLIER (@var{regno}) diff --git a/gcc/ira-color.c b/gcc/ira-color.c index 3d01c60800c..4586c9a1e08 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -4835,14 +4835,17 @@ ira_better_spill_reload_regno_p (int *regnos, int *other_regnos, return cost < other_cost; if (length != other_length) return length > other_length; -#ifdef REG_ALLOC_ORDER - if (hard_regno >= 0 && other_hard_regno >= 0) -return (inv_reg_alloc_order[hard_regno] - < inv_reg_alloc_order[other_hard_regno]); -#else - if (call_used_count != other_call_used_count) -return call_used_count > other_call_used_count; -#endif + if (HONOR_REG_ALLOC_ORDER) +{ + if (hard_regno >= 0 && other_hard_regno >= 0) + return (inv_reg_alloc_order[hard_regno] + < inv_reg_alloc_order[other_hard_regno]); +} + else +{ + if (call_used_count != other_call_used_count) + return call_used_count > other_call_used_count; +} return false; }
Re: [PATCH] doc: c: c++: Document the C/C++ extended asm empty input constraints
Hi! First off, thanbk you for the patch! On Mon, Feb 15, 2021 at 11:22:52PM +, Neven Sajko via Gcc-patches wrote: > There is a long-standing, but undocumented GCC inline assembly feature > that's part of the extended asm GCC extension to C and C++: extended > asm empty input constraints. There is no such thing. *All* empty constraints have the same semantics: anything whatsoever will do. Any register, any constant, any memory. > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -1131,7 +1131,102 @@ the addressing register. > @subsection Simple Constraints > @cindex simple constraints > > -The simplest kind of constraint is a string full of letters, each of > +An input constraint is allowed to be an empty string, in which case it is > +called an empty input constraint. That is just shorthand for "empty constraint that is used for an input operand". It is not special, and it *is* documented: https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints The simplest kind of constraint is a string full of letters, each of which describes one kind of operand that is permitted. A length zero string is allowed as well. This could be made more explicit sure; OTOH, it isn't very often useful. So your example (using it for making a dependency) is certainly useful to have. But it is not a special case at all. > (When an empty input constraint is used, > +the assembler template will most probably also be empty. I.e., the @code{asm} > +declaration need not contain actual assembly code.) Don't use parentheses like this in documentation please. > An empty input > +constraint can be used to create an artificial dependency on a C or C++ > +variable (the variable that appears in the expression associated with the > +constraint) without incurring unnecessary costs to performance. It still needs a register (or memory) reserved there (or sometimes a constant can be used, but you have no dependency in that case!) > +An example of where such behavior may be useful is for preventing compiler > +optimizations like dead store elimination or hoisting code outside a loop for > +certain pieces of C or C++ code. You should not think about preventing the compiler from doing something. Instead, you can give the compiler extra information that makes it *do* something: it has to, because it has to implement the semantics your source program has. > Specific applications may include direct > +interaction with hardware features; or things like testing, fuzzing and > +benchmarking. What does this mean? Here is a simple example showing why this isn't as simple to use as you imply here: === void f(int x) { asm volatile("" :: ""(x)); } void g(void) { return f(42); } === Both function compile to (taking aarch64 as example) just "ret". But, if you look at what the compiler does, you see in the "dfinish" pass it has for f: (insn:TI 6 3 20 (asm_operands/v ("") ("") 0 [ (reg:SI 0 x0 [93]) ] [ (asm_input:SI ("") zlc.c:3) ] [] zlc.c:3) "zlc.c":3:2 -1 (expr_list:REG_DEAD (reg:SI 0 x0 [93]) (nil))) (so it has register x0 as input), while function g has (insn:TI 5 2 16 (asm_operands/v ("") ("") 0 [ (const_int 42 [0x2a]) ] [ (asm_input:SI ("") zlc.c:3) ] [] zlc.c:3) "zlc.c":3:2 -1 (nil)) which has no dependency, gets fed the constant 42 instead, because *anything at all* is allowed by an empty constraint. You can also make this clear by using asm volatile("# %0" :: ""(x)); which gives # x0 resp. # 42 or, with -fverbose-asm: # x0// tmp93 and # 42// which is clear as mud, but it means in f there was a variable as input to the asm, and in g there wasn't. Segher
cris: Fix addi insn mult vs. shift canonicalization
Ever since the canonicalization clean-up of (mult X (1 << N)) into (ashift X N) outside addresses, the CRIS addi patterns have been unmatched. No big deal. Unfortunately, nobody thought of adjusting reloaded addresses, so transforming mult into a shift has to be a kludged for when reload decides that it has to move an address like (plus (mult reg0 4) reg1) into a register, as happens building libgfortran. (No, simplify_rtx et al don't automatically DTRT.) Something less kludgy would make sense if it wasn't for the current late development stage and reload being deprecated. I don't know whether this issue is absent for LRA, though. I added a testsuite for the reload issue, despite being exposed by a libgfortran build, so the issue would be covered by C/C++ builds, but to the CRIS test-suite, not as a generic test, to avoid bad feelings from anyone preferring short test-times to redundant coverage. Committed. gcc: * config/cris/cris.c (cris_print_operand) <'T'>: Change valid operand from is now an addi mult-value to shift-value. * config/cris/cris.md (*addi): Change expression of scaled operand from mult to ashift. * config/cris/cris.md (*addi_reload): New insn_and_split. gcc/testsuite: * gcc.target/cris/torture/sync-reload-mul-1.c: New test. --- gcc/config/cris/cris.c | 23 +-- gcc/config/cris/cris.md| 33 +++--- .../gcc.target/cris/torture/sync-reload-mul-1.c| 13 + 3 files changed, 57 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/cris/torture/sync-reload-mul-1.c diff --git a/gcc/config/cris/cris.c b/gcc/config/cris/cris.c index 48ea8552e593..8a42aa16da13 100644 --- a/gcc/config/cris/cris.c +++ b/gcc/config/cris/cris.c @@ -880,9 +880,6 @@ cris_print_operand (FILE *file, rtx x, int code) { rtx operand = x; - /* Size-strings corresponding to MULT expressions. */ - static const char *const mults[] = { "BAD:0", ".b", ".w", "BAD:3", ".d" }; - /* New code entries should just be added to the switch below. If handling is finished, just return. If handling was just a modification of the operand, the modified operand should be put in @@ -1212,11 +1209,21 @@ cris_print_operand (FILE *file, rtx x, int code) return; case 'T': - /* Print the size letter for an operand to a MULT, which must be a -const_int with a suitable value. */ - if (!CONST_INT_P (operand) || INTVAL (operand) > 4) - LOSE_AND_RETURN ("invalid operand for 'T' modifier", x); - fprintf (file, "%s", mults[INTVAL (operand)]); + { + /* Print the size letter for an operand to a ASHIFT, which must be a + const_int with a suitable value. */ + int shiftval; + + if (!CONST_INT_P (operand)) + LOSE_AND_RETURN ("invalid operand for 'T' modifier", x); + + shiftval = INTVAL (operand); + + if (!(shiftval == 1 || shiftval == 2)) + LOSE_AND_RETURN ("invalid operand for 'T' modifier", x); + + fprintf (file, "%s", shiftval == 1 ? ".w" : ".d"); + } return; case 0: diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md index 0fd29f99ba99..069f7e00b6db 100644 --- a/gcc/config/cris/cris.md +++ b/gcc/config/cris/cris.md @@ -1278,18 +1278,43 @@ (define_insn "addi_mul" (define_insn "*addi" [(set (match_operand:SI 0 "register_operand" "=r") (plus:SI -(mult:SI (match_operand:SI 2 "register_operand" "r") - (match_operand:SI 3 "const_int_operand" "n")) +(ashift:SI (match_operand:SI 2 "register_operand" "r") + (match_operand:SI 3 "const_int_operand" "n")) (match_operand:SI 1 "register_operand" "0")))] "operands[0] != frame_pointer_rtx && operands[1] != frame_pointer_rtx && CONST_INT_P (operands[3]) - && (INTVAL (operands[3]) == 1 - || INTVAL (operands[3]) == 2 || INTVAL (operands[3]) == 4)" + && (INTVAL (operands[3]) == 1 || INTVAL (operands[3]) == 2)" "addi %2%T3,%0" [(set_attr "slottable" "yes") (set_attr "cc" "none")]) +;; The mult-vs-ashift canonicalization-cleanup plagues us: nothing in +;; reload transforms a "scaled multiplication" into an ashift in a +;; reloaded address; it's passed as-is and expected to be recognized, +;; or else we get a tell-tale "unrecognizable insn". +;; On top of that, we *should* match the bare insn, as a *matching +;; pattern* (as opposed to e.g. a reload_load_address expander +;; changing the mul into an ashift), so can_reload_into will re-use +;; registers in the reloaded expression instead of allocating a new +;; register. +(define_insn_and_split "*addi_reload" + [(set (match_operand:SI 0 "register_operand" "=r") + (plus:SI +(mult:SI (match_operand:SI 2 "register_operand" "r") + (match_operand:SI 3 "const_int_operand" "n")) +(match_operand:SI 1 "register_operand" "
cris: testsuite/gcc.target/cris/biap-mul.c: New test.
Needed coverage for that *addi_mul pattern. Committed. gcc/testsuite: * gcc.target/cris/biap-mul.c: New test. --- gcc/testsuite/gcc.target/cris/biap-mul.c | 15 +++ 1 file changed, 15 insertions(+) create mode 100644 gcc/testsuite/gcc.target/cris/biap-mul.c diff --git a/gcc/testsuite/gcc.target/cris/biap-mul.c b/gcc/testsuite/gcc.target/cris/biap-mul.c new file mode 100644 index ..e0054632b239 --- /dev/null +++ b/gcc/testsuite/gcc.target/cris/biap-mul.c @@ -0,0 +1,15 @@ +/* Make sure ADDI is used for trivial multiplications too. */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-times "\taddi" 2 } } */ +/* { dg-final { scan-assembler-not "\tlsl|\tmul|\tmove|\tadd\[^i\]" } } */ + +int xyzzy (int r10) +{ + return r10 * 5; +} + +int plugh (int r10) +{ + return r10 * 3; +} -- 2.11.0
cris: testsuite/gcc.target/cris/biap.c: Add a Y+=X*2 to the Y+=X*4.
Also, tweak the scan-assembler regexps to include a tab, lest they may spuriously match file-paths in the emitted assembly code, should some be added at some point. And, add "mul", "move" and (non-addi-)"add" to insns that shouldn't appear. Committed. gcc/testsuite: * gcc.target/cris/biap.c: Add a Y+=X*2 to the Y+=X*4. --- gcc/testsuite/gcc.target/cris/biap.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/cris/biap.c b/gcc/testsuite/gcc.target/cris/biap.c index 1f3b4368a36a..f31e61c20c5f 100644 --- a/gcc/testsuite/gcc.target/cris/biap.c +++ b/gcc/testsuite/gcc.target/cris/biap.c @@ -2,10 +2,15 @@ See also PR37939. */ /* { dg-do compile } */ /* { dg-options "-O2" } */ -/* { dg-final { scan-assembler "addi" } } */ -/* { dg-final { scan-assembler-not "lsl" } } */ +/* { dg-final { scan-assembler-times "\taddi" 2 } } */ +/* { dg-final { scan-assembler-not "\tlsl|\tmul|\tmove|\tadd\[^i\]" } } */ int xyzzy (int r10, int r11) { return r11 * 4 + r10; } + +int plugh (int r10, int r11) +{ + return r11 * 2 + r10; +} -- 2.11.0
Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]
On Mon, Feb 22, 2021 at 02:49:44PM +0100, Jakub Jelinek wrote: > So, I think for the team != gomp_thread ()->ts.team > && !do_wake > && gomp_team_barrier_waiting_for_tasks (&team->barrier) > && team->task_detach_count == 0 > case, we need to wake up 1 thread anyway and arrange for it to do: > gomp_team_barrier_done (&team->barrier, state); > gomp_mutex_unlock (&team->task_lock); > gomp_team_barrier_wake (&team->barrier, 0); > Possibly in > if (!priority_queue_empty_p (&team->task_queue, MEMMODEL_RELAXED)) > add > else if (team->task_count == 0 > && gomp_team_barrier_waiting_for_tasks (&team->barrier)) > { > gomp_team_barrier_done (&team->barrier, state); > gomp_mutex_unlock (&team->task_lock); > gomp_team_barrier_wake (&team->barrier, 0); > if (to_free) > { > gomp_finish_task (to_free); > free (to_free); > } > return; > } > but the: > if (--team->task_count == 0 > && gomp_team_barrier_waiting_for_tasks (&team->barrier)) > { > gomp_team_barrier_done (&team->barrier, state); > gomp_mutex_unlock (&team->task_lock); > gomp_team_barrier_wake (&team->barrier, 0); > gomp_mutex_lock (&team->task_lock); > } > in that case would then be incorrect, we don't want to do that twice. > So, either that second if would need to do the to_free handling > and return instead of taking the lock again and looping, or > perhaps we can just do > --team->task_count; > there instead and let the above added else if handle that? FYI, I've just tested that part of change alone whether it doesn't break anything else and it worked fine: diff --git a/libgomp/task.c b/libgomp/task.c index b242e7c8d20..9c27c3b5148 100644 --- a/libgomp/task.c +++ b/libgomp/task.c @@ -1405,6 +1405,19 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state) team->task_running_count++; child_task->in_tied_task = true; } + else if (team->task_count == 0 + && gomp_team_barrier_waiting_for_tasks (&team->barrier)) + { + gomp_team_barrier_done (&team->barrier, state); + gomp_mutex_unlock (&team->task_lock); + gomp_team_barrier_wake (&team->barrier, 0); + if (to_free) + { + gomp_finish_task (to_free); + free (to_free); + } + return; + } gomp_mutex_unlock (&team->task_lock); if (do_wake) { @@ -1479,14 +1492,7 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state) if (do_wake > new_tasks) do_wake = new_tasks; } - if (--team->task_count == 0 - && gomp_team_barrier_waiting_for_tasks (&team->barrier)) - { - gomp_team_barrier_done (&team->barrier, state); - gomp_mutex_unlock (&team->task_lock); - gomp_team_barrier_wake (&team->barrier, 0); - gomp_mutex_lock (&team->task_lock); - } + --team->task_count; } } } Jakub
Re: [PATCH] libstdc++: Fix up parallel_backend_serial.h [PR97549]
- Original Message - > Hi! > > In GCC 10, parallel_backend.h just included parallel_backend_{serial,tbb}.h > and > did nothing beyond that, and parallel_backend_tbb.h provided directly > namespace __pstl { namespace __par_backend { ... } } > and defined everything in there, while parallel_backend_serial.h did: > namespace __pstl { namespace __serial { ... } } and had this > namespace __pstl { namespace __par_backend { using namespace > __pstl::__serial; } } > at the end. > In GCC 11, parallel_backend.h does: > namespace __pstl { namespace __par_backend = __serial_backend; } > after including parallel_backend_serial.h or > namespace __pstl { namespace __par_backend = __tbb_backend; } > after including parallel_backend_tbb.h. The latter then has: > namespace __pstl { namespace __tbb_backend { ... } } > and no using etc. at the end, while parallel_backend_serial.h changed to: > namespace __pstl { namespace __serial_backend { ... } } > but has this leftover block from the GCC 10 times. Even changing that > using namespace __pstl::__serial; > to > using namespace __pstl::__serial_backend; > doesn't work, as it clashes with > namespace __pstl { namespace __par_backend = __serial_backend; } > in parallel_backend.h. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux > both without and with tbb-devel installed. > > Ok for trunk? Yes, thanks. > > 2021-02-20 Jakub Jelinek > > PR libstdc++-v3/97549 > * include/pstl/parallel_backend_serial.h: Remove __pstl::__par_backend. > > --- libstdc++-v3/include/pstl/parallel_backend_serial.h.jj2020-10-21 > 19:33:24.059872401 +0200 > +++ libstdc++-v3/include/pstl/parallel_backend_serial.h 2021-02-19 > 11:59:56.414645219 +0100 > @@ -127,12 +127,4 @@ __parallel_invoke(_ExecutionPolicy&&, _F > } // namespace __serial_backend > } // namespace __pstl > > -namespace __pstl > -{ > -namespace __par_backend > -{ > -using namespace __pstl::__serial; > -} > -} // namespace __pstl > - > #endif /* _PSTL_PARALLEL_BACKEND_SERIAL_H */ > > Jakub >
Committed: g++.dg/warn/Warray-bounds-10..13: Fix for 32-bit newlib targets
See gcc/config/newlib-stdint.h, where targets that have LONG_TYPE_SIZE == 32, get __INT32_TYPE__ defined to "long int". All these tests have "typedef __INT32_TYPE__ int32_t;". Thus the tests fail for 32-bit newlib targets due to related warning messages being matched to "aka int" where the emitted message for these targets have "aka long int". Tested cris-elf and x86_64-linux, committed as obvious. gcc/testsuite: * g++.dg/warn/Warray-bounds-10.C, g++.dg/warn/Warray-bounds-11.C, g++.dg/warn/Warray-bounds-12.C, g++.dg/warn/Warray-bounds-13.C: Handle __INT32_TYPE__ being "long int". --- gcc/testsuite/g++.dg/warn/Warray-bounds-10.C | 24 gcc/testsuite/g++.dg/warn/Warray-bounds-11.C | 24 gcc/testsuite/g++.dg/warn/Warray-bounds-12.C | 24 gcc/testsuite/g++.dg/warn/Warray-bounds-13.C | 24 4 files changed, 48 insertions(+), 48 deletions(-) diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C index 22466977b68a..4deea31fae03 100644 --- a/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C +++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C @@ -19,9 +19,9 @@ void warn_op_new () { T (int32_t, 0, 0); // { dg-warning "array subscript 0 is outside array bounds of 'int32_t \\\[0]'" } // { dg-message "referencing an object of size \\d allocated by 'void\\\* operator new\\\(\(long \)?unsigned int\\\)'" "note" { target *-*-* } .-1 } - T (int32_t, 1, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" } - T (int32_t, 2, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" } - T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" } + T (int32_t, 1, 0); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" } + T (int32_t, 2, 0); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" } + T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" } T (int32_t, 4, 0); @@ -30,9 +30,9 @@ void warn_op_new () T (int32_t, 2, 1); // { dg-warning "array subscript 1 is outside array bounds " } T (int32_t, 3, 1); // { dg-warning "array subscript 1 is outside array bounds " } T (int32_t, 4, 1); // { dg-warning "array subscript 1 is outside array bounds " } - T (int32_t, 5, 1); // { dg-warning "array subscript 'int32_t {aka int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[5]" } - T (int32_t, 6, 1); // { dg-warning "array subscript 'int32_t {aka int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[6]" } - T (int32_t, 7, 1); // { dg-warning "array subscript 'int32_t {aka int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[7]" } + T (int32_t, 5, 1); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[5]" } + T (int32_t, 6, 1); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[6]" } + T (int32_t, 7, 1); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[1]' is partly outside array bounds of 'unsigned char \\\[7]" } T (int32_t, 8, 1); } @@ -45,9 +45,9 @@ void warn_op_array_new () T (int32_t, 0, 0); // { dg-warning "array subscript 0 is outside array bounds of 'int32_t \\\[0]'" } // { dg-message "referencing an object of size \\d allocated by 'void\\\* operator new \\\[]\\\(\(long \)?unsigned int\\\)'" "note" { target *-*-* } .-1 } - T (int32_t, 1, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" } - T (int32_t, 2, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" } - T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" } + T (int32_t, 1, 0); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" } + T (int32_t, 2, 0); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" } + T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka (long )?int}\\\
Re: [PATCH, V2] Add conversions between _Float128 and Decimal.
On Sat, Feb 20, 2021 at 06:33:12PM -0600, Segher Boessenkool wrote: > Hi! > > On Tue, Feb 09, 2021 at 02:35:05AM -0500, Michael Meissner wrote: > > This patch implements conversions between _Float128 and the 3 Decimal > > floating > > types. It does this by extendending the dfp-bit conversions to add a new > > binary floating point type (KF), and doing the conversions in the same > > manner > > as the other binary/decimal conversions. > > > For conversions from _Float128 to Decimal, this patch uses a function > > (__sprintfkf) instead of the sprintf function to convert long double values > > to > > strings. The __sprintfkf function determines if GLIBC 2.32 or newer is used > > and calls the IEEE 128-bit version of sprintf (__sprintfieee128). If the > > GLIBC > > is earlier than 2.32, the code will convert _Float128 to __ibm128 and then > > use > > the normal sprintf to convert this value. > > So if built with a glibc version before 2.32 (less than a year old) it > will give the wrong answer. This needs improving, or it will be another > eight or so years until this is generally usable. But until the long double format default is changed to be IEEE 128-bit floating point, only the people who explicitly convert between __float128/_Float128 and the decimal types will see the issue. In order to switch to the IEEE 128-bit floating point long double, you need at least GLIBC 2.32, and you will get full accuracy. > > The compilers built fine providing I recompiled gmp, mpc, and mpfr with the > > appropriate long double options. There were a few differences in the test > > suite runs that will be addressed in later patches, but over all it works > > well. > > What kind of differences? I assume you checked all, and all differences > are an improvement, or the differences are inconsequential and the test > is not very good? I have submitted patches for these before, and I will shortly ping or resubmit them again. But I felt this was more import to get in before worrying about changes to the testsuite. The changes are: * C test c-c++-common/dfp/convert-bfp-11.c fails * C test gcc.target/powerpc/pr70117.c fails * C test gcc.dg/torture/float128-nan.c fails * C test gcc.target/powerpc/nan128-1.c fails * C++ test c-c++-common/dfp/convert-bfp-11.c fails * C++ testsall modules tests fails * Fortran test gfortran.dg/default_format_2.f90 now passes * Fortran test gfortran.dg/default_format_denormal_2.f90 now passes * Fortran test gfortran.dg/ieee/large_2.f90 now passes The following two tests test facets of the IBM 128-bit long double implementation. Since they are hard wired to use IBM 128-bit long double, I've added options in the patches to run these tests with -mabi=ibmlongdouble if the default is -mabi=ieeelongdouble. * c-c++-common/dfp/convert-bfp-11.c * gcc.target/powerpc/pr70117.c The following two tests fail because they are testing the old libquadmath 'q' built-ins, and there is a subtle difference between using the _Float128 built-in function for the nans function and the long double built-in function for nans. In particular, the signalling NaN is silently converted to a quiet NaN. * gcc.dg/torture/float128-nan.c * gcc.target/powerpc/nan128-1.c The modules failures are PR c++/98645, and are not a back end feature. The 3 fortran tests now pass if long double uses the IEEE 128-bit representation: * gfortran.dg/default_format_2.f90 * gfortran.dg/default_format_denormal_2.f90 * gfortran.dg/ieee/large_2.f90 > > > --- /dev/null > > +++ b/libgcc/config/rs6000/_sprintfkf.c > > @@ -0,0 +1,57 @@ > > + If we are linked against an earlier library, we will have fake it by > > + converting the value to long double, and using sprinf to do the > > conversion. > > (typo, sprintf) Fixed. > > + This isn't ideal, as IEEE 128-bit has more exponent range than IBM > > + 128-bit. */ > > And that results in some numbers being printed as Inf that are not. But > also, the significand has more bits, so there is a loss of precision as > well. Yes. > > +int __sprintfkf (char *restrict string, > > +const char *restrict format, > > +_Float128 number) > > +{ > > + if (__sprintfieee128) > > +return __sprintfieee128 (string, format, number); > > + > > + return sprintf (string, format, (long double)number); > > Space after cast. Fixed. > > +_Float128 > > +__strtokf (const char *string, char **endptr) > > +{ > > + long double num; > > + > > + if (__strtoieee128) > > +return __strtoieee128 (string, endptr); > > + > > + num = strtold (string, endptr); > > + return (_Float128) num; > > Do not cast return values please. All casts you do should be *needed*, > have a purpose. Well it is changing type (num is long double, i.e. IBM 128-bit long double) and the return is _Flo
[PATCH, constexpr, coroutines ] Generic lambda coroutines cannot be constexpr [PR96251].
Hi, PR96251 (and dups) is a rejects-valid bug against coroutines (but I’m not sure that the problem is really there). * coroutines cannot be constexpr. * Part of the way that the coroutine implementation works is a consequence of the "lazy discovery of coroutine-ness”; whenever we first encounter a coroutine keyword... .. we mark the function as a coroutine, and then we can deal with diagnostics etc. that change under these circumstances. This marking of the function is independent of whether keyword expressions are type-dependent or not. * when we encounter a coroutine keyword in a constexpr context we error. * the constexpr machinery also has been taught that coroutine expressions should cause constexpr-ness to be rejected when checking for "potentially constexpr". So why is there a problem? * lambdas are implicitly constexpr from C++17. * the constexpr machinery tends to bail early *with a conservative assumption that the expression is potentially constexpr* when it finds a type-dependent expression - without evaluating sub- expressions to see if they are valid (thus missing coroutine exprs. in such positions). The combination of these ^^ two things, means that for many generic lambdas with non-trivial bodies - we then enter instantiation with a constexpr type-dependent coroutine (which is a Thing that Should not Be). As soon as we try to tsub any coroutine keyword expression encountered, we error out. * I was not able to see any way in which the instantiation process could be made to bail in this case and re-try for non-constexpr. * Nor able to see somewhere else where that decision could be made. ^^ these two points reflect that I'm not familiar with the constexpr machinery. * Proposed solution. Since coroutines cannot be constexpr (not even sure what that would mean in any practicable implementation). The fix proposed is to reject constexpr-ness for coroutine functions as early as possible. * a) We can prevent a generic lambda coroutine from being converted to constexpr in finish_function (). Likewise, via the tests below, we can avoid it for regular lambdas. b) We can also reject coroutine function decls early in both is_valid_constexpr_fn () and potential_constant_expression_1 (). So - is there some alternate solution that would be better? (in some ways it seems pointless to delay rejection of a coroutine function to some later point). OTOH, I suppose that one could have some weird code where coroutine coroutine keywords only appeared in a non-constexpr paths of the code - but our current lowering is not prepared for such a circumstance. AFAIU the standard, there's no dispensation from the "cannot be constexpr" for such a code construction .. but ICBW. In any event, the cases reported (and fixed by this patch) are not trying anything so fiendishly clever. Tested on x86_64-darwin. Suggestions? OK for master (after wider testing)? thanks Iain = coroutines cannot be constexpr, but generic lambdas (from C++17) are implicitly assumed to be, and the processing of type- dependent lambda bodies often terminates before any coroutine expressions are encountered. For example, the PR notes cases where the coro expressions are hidden by type-dependent for or switch expressions. The solution proposed is to deny coroutine generic lambdas. We also tighten up the checks for is_valid_constexpr_fn() and do an early test for coroutine functions in checking for potential constant expressions. gcc/cp/ChangeLog: PR c++/96251 * constexpr.c (is_valid_constexpr_fn): Test for a coroutine before the chekc for lambda. (potential_constant_expression_1): Disallow coroutine function decls. * decl.c (finish_function): Do not mark generic coroutine lambda templates as constexpr. gcc/testsuite/ChangeLog: PR c++/96251 * g++.dg/coroutines/pr96251.C: New test. --- gcc/cp/constexpr.c| 7 ++- gcc/cp/decl.c | 2 +- gcc/testsuite/g++.dg/coroutines/pr96251.C | 22 ++ 3 files changed, 29 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr96251.C diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 377fe322ee8..036bf04efa9 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -210,7 +210,9 @@ is_valid_constexpr_fn (tree fun, bool complain) } } - if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17) + if (DECL_COROUTINE_P (fun)) +ret = false; + else if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17) { ret = false; if (complain) @@ -7827,6 +7829,9 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now, switch (TREE_CODE (t)) { case FUNCTION_DECL: + if (DECL_COROUTINE_P (t)) + return false; + /* FALLTHROUGH. */ case BASELINK: case TEMPLATE_DE
[PATCH] PR fortran/99206 - [11 Regression] ICE in add_init_expr_to_sym, at fortran/decl.c:1980
Dear all, when simplifying the RESHAPE intrinsic we lost the string length when the resulting array had size zero. The attached patch sets the resulting length from the source. Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald PR fortran/99206 - ICE in add_init_expr_to_sym, at fortran/decl.c:1980 Make sure that the string length is properly set during simplification even when the resulting array is zero-sized. gcc/fortran/ChangeLog: PR fortran/99206 * simplify.c (gfc_simplify_reshape): Set string length for character arguments. gcc/testsuite/ChangeLog: PR fortran/99206 * gfortran.dg/reshape_zerosize_4.f90: New test. diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c index 35f267db588..388aca7c38c 100644 --- a/gcc/fortran/simplify.c +++ b/gcc/fortran/simplify.c @@ -6887,6 +6887,8 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp, &source->where); if (source->ts.type == BT_DERIVED) result->ts.u.derived = source->ts.u.derived; + if (source->ts.type == BT_CHARACTER && result->ts.u.cl == NULL) +result->ts = source->ts; result->rank = rank; result->shape = gfc_get_shape (rank); for (i = 0; i < rank; i++) diff --git a/gcc/testsuite/gfortran.dg/reshape_zerosize_4.f90 b/gcc/testsuite/gfortran.dg/reshape_zerosize_4.f90 new file mode 100644 index 000..d9a0d217271 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/reshape_zerosize_4.f90 @@ -0,0 +1,14 @@ +! { dg-do run } +! PR fortran/99206 - ICE in add_init_expr_to_sym, at fortran/decl.c:1980 +! Check simplifier of RESHAPE for character arrays. + +program p + character(*),parameter :: a(0) = reshape([ 'ab'], [0]) + character(*,kind=4), parameter :: c(0) = reshape([4_'cd'], [0]) + if (len (a) /= 2) stop 1 + if (len (reshape ( ['ab'], [0])) /= 2) stop 2 + if (kind(reshape ( ['ab'], [0])) /= 1) stop 3 + if (len (c) /= 2) stop 4 + if (len (reshape ([4_'cd'], [0])) /= 2) stop 5 + if (kind(reshape ([4_'cd'], [0])) /= 4) stop 6 +end
Re: [PATCH, V2] Add conversions between _Float128 and Decimal.
On Mon, Feb 22, 2021 at 03:52:52PM -0500, Michael Meissner wrote: > On Sat, Feb 20, 2021 at 06:33:12PM -0600, Segher Boessenkool wrote: > > So if built with a glibc version before 2.32 (less than a year old) it > > will give the wrong answer. This needs improving, or it will be another > > eight or so years until this is generally usable. > > But until the long double format default is changed to be IEEE 128-bit > floating > point, only the people who explicitly convert between __float128/_Float128 > and > the decimal types will see the issue. > > In order to switch to the IEEE 128-bit floating point long double, you need at > least GLIBC 2.32, and you will get full accuracy. But see my point. It only works if you can break all links with the past. Which cannot work if you have to work together with other communities, don't just do your own thing. ("Full accuracy" is a strange thing to say... doubele-double has (much) higher precision for some numbers... but if you want IEEE float semantics, double-double is simply *wrong*). > > > The compilers built fine providing I recompiled gmp, mpc, and mpfr with > > > the > > > appropriate long double options. There were a few differences in the test > > > suite runs that will be addressed in later patches, but over all it works > > > well. > > > > What kind of differences? I assume you checked all, and all differences > > are an improvement, or the differences are inconsequential and the test > > is not very good? > > I have submitted patches for these before, and I will shortly ping or resubmit > them again. But I felt this was more import to get in before worrying about > changes to the testsuite. That isn't what I asked? I don't want to investigate all these bugs, I just asked if you did and found nothing that should hold up this patch. Just "yes" would work :-) > The following two tests fail because they are testing the old libquadmath 'q' > built-ins, and there is a subtle difference between using the _Float128 > built-in function for the nans function and the long double built-in function > for nans. In particular, the signalling NaN is silently converted to a quiet > NaN. > > * gcc.dg/torture/float128-nan.c > * gcc.target/powerpc/nan128-1.c Both tests needs to use -fsignaling-nans. Do they work better with that fixed? > > > +_Float128 > > > +__strtokf (const char *string, char **endptr) > > > +{ > > > + long double num; > > > + > > > + if (__strtoieee128) > > > +return __strtoieee128 (string, endptr); > > > + > > > + num = strtold (string, endptr); > > > + return (_Float128) num; > > > > Do not cast return values please. All casts you do should be *needed*, > > have a purpose. > > Well it is changing type (num is long double, i.e. IBM 128-bit long double) > and > the return is _Float128. And this does not make a difference since function prototypes exist. So please don't. *All* casts there are should be there for a reason; and the few casts you end up using all look like a "danger!" sign to a careful reader (they hide compiler warnings, and that is the *least* trouble they cause!) Thanks, Segher
RE: [PATCH] aarch64: Add internal tune flag to minimise VL-based scalar ops
> -Original Message- > From: Gcc-patches On Behalf Of > Kyrylo Tkachov via Gcc-patches > Sent: 16 February 2021 15:20 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] aarch64: Add internal tune flag to minimise VL-based scalar > ops > > Hi all, > > This patch introduces an internal tune flag to break up VL-based scalar ops > into a GP-reg scalar op with the VL read kept separate. This can be preferable > on some CPUs. > > I went for a tune param rather than extending the rtx costs as our RTX costs > tables aren't set up to track > this intricacy. > > I've confirmed that on the simple loop: > void vadd (int *dst, int *op1, int *op2, int count) > { > for (int i = 0; i < count; ++i) > dst[i] = op1[i] + op2[i]; > } > > we now split the incw into a cntw outside the loop and the add inside. > > + cntwx5 > ... > loop: > - incwx4 > + add x4, x4, x5 > > Bootstrapped and tested on aarch64-none-linux-gnu. > This is a minimally invasive fix to help the performance of just - > mcpu=neoverse-v1 for > GCC 11 so I'd like to have it in stage4 if possible, > but I'd appreciate some feedback on the risk assessment of it. After some offline discussion and evaluation with Richard Sandiford we are happy with the patch to go in now as it's a minimal codegen change isolated to a specific non-default tuning, so I've pushed the patch. Thanks, Kyrill > > Thanks, > Kyrill > > gcc/ChangeLog: > > * config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants): > Define. > * config/aarch64/aarch64.md (add3): Force > CONST_POLY_INT immediates > into a register when the above is enabled. > * config/aarch64/aarch64.c (neoversev1_tunings): > AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS. > (aarch64_rtx_costs): Use > AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS. > > gcc/testsuite/ > > * gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.
[PATCH] [libstdc++] Refactor/cleanup of atomic wait implementation
From: Thomas Rodgers This is a substantial rewrite of the atomic wait/notify (and timed wait counterparts) implementation. The previous __platform_wait looped on EINTR however this behavior is not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro now controls whether wait/notify are implemented using a platform specific primitive or with a platform agnostic mutex/condvar. This patch only supplies a definition for linux futexes. A future update could add support __ulock_wait/wake on Darwin, for instance. The members of __waiters were lifted to a new base class. The members are now arranged such that overall sizeof(__waiters_base) fits in two cache lines (on platforms with at least 64 byte cache lines). The definition will also use destructive_interference_size for this if it is available. The __waiters type is now specific to untimed waits. Timed waits have a corresponding __timed_waiters type. Much of the code has been moved from the previous __atomic_wait() free function to the __waiter_base template and a __waiter derived type is provided to implement the un-timed wait operations. A similar change has been made to the timed wait implementation. The __atomic_spin code has been extended to take a spin policy which is invoked after the initial busy wait loop. The default policy is to return from the spin. The timed wait code adds a timed backoff spinning policy. The code from which implements this_thread::sleep_for, sleep_until has been moved to a new header which allows the thread sleep code to be consumed without pulling in the whole of . The entry points into the wait/notify code have been restructured to support either - * Testing the current value of the atomic stored at the given address and waiting on a notification. * Applying a predicate to determine if the wait was satisfied. The entry points were renamed to make it clear that the wait and wake operations operate on addresses. The first variant takes the expected value and a function which returns the current value that should be used in comparison operations, these operations are named with a _v suffix (e.g. 'value'). All atomic<_Tp> wait/notify operations use the first variant. Barriers, latches and semaphores use the predicate variant. This change also centralizes what it means to compare values for the purposes of atomic::wait rather than scattering through individual predicates. This change also centralizes the repetitive code which adjusts for different user supplied clocks (this should be moved elsewhere and all such adjustments should use a common implementation). libstdc++-v3/ChangeLog: * include/Makefile.am: Add new header. * include/Makefile.in: Regenerate. * include/bits/atomic_base.h: Adjust all calls to __atomic_wait/__atomic_notify for new call signatures. * include/bits/atomic_wait.h: Extensive rewrite. * include/bits/atomic_timed_wait.h: Likewise. * include/bits/semaphore_base.h: Adjust all calls to __atomic_wait/__atomic_notify for new call signatures. * include/bits/std_thread_sleep.h: New file. * include/std/atomic: Likewise. * include/std/barrier: Likewise. * include/std/latch: Likewise. * testsuite/29_atomics/atomic/wait_notify/bool.cc: Simplify test. * testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise. * testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise. * testsuite/29_atomics/atomic_flag/wait_notify.cc: Likewise. * testsuite/29_atomics/atomic_float/wait_notify.cc: Likewise. * testsuite/29_atomics/atomic_integral/wait_notify.cc: Likewise. * testsuite/29_atomics/atomic_ref/wait_notify.cc: Likewise. --- libstdc++-v3/include/Makefile.am | 1 + libstdc++-v3/include/Makefile.in | 1 + libstdc++-v3/include/bits/atomic_base.h | 36 +- libstdc++-v3/include/bits/atomic_timed_wait.h | 398 +++-- libstdc++-v3/include/bits/atomic_wait.h | 400 +++--- libstdc++-v3/include/bits/semaphore_base.h| 73 +--- libstdc++-v3/include/bits/std_thread_sleep.h | 119 ++ libstdc++-v3/include/std/atomic | 15 +- libstdc++-v3/include/std/barrier | 4 +- libstdc++-v3/include/std/latch| 4 +- libstdc++-v3/include/std/thread | 68 +-- .../29_atomics/atomic/wait_notify/bool.cc | 37 +- .../29_atomics/atomic/wait_notify/generic.cc | 19 +- .../29_atomics/atomic/wait_notify/pointers.cc | 36 +- .../29_atomics/atomic_flag/wait_notify/1.cc | 37 +- .../29_atomics/atomic_float/wait_notify.cc| 26 +- .../29_atomics/atomic_integral/wait_notify.cc | 73 ++-- .../29_atomics/atomic_ref/wait_notify.cc | 74 +--- 18 files changed, 804 insertions(+), 617 deletions(-) create mode 100644 libstdc++-v3/include/bits/std_thread_sleep.h diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v
[PATCH 2/2] libstdc++: Fix endianness issue with IBM long double [PR98384]
The logic in std::to_chars for extracting the high- and low-order parts of a IBM long double value does the right thing on powerpc64le, but not on powerpc64be. This patch makes the extraction endian-agnostic, which fixes the execution FAIL of to_chars/long_double.cc on powerpc64be. Tested on powerpc64le and powerpc64be. Does this look OK for trunk? libstdc++-v3/ChangeLog: PR libstdc++/98384 * src/c++17/floating_to_chars.cc (get_ieee_repr): Extract the high- and low-order parts from an IBM long double value in an endian-agnostic way. --- libstdc++-v3/src/c++17/floating_to_chars.cc | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc b/libstdc++-v3/src/c++17/floating_to_chars.cc index a50548acae7..4b2f85c1c1a 100644 --- a/libstdc++-v3/src/c++17/floating_to_chars.cc +++ b/libstdc++-v3/src/c++17/floating_to_chars.cc @@ -395,11 +395,11 @@ namespace // of the high part, and we merge the mantissa of the high part with the // mantissa (and the implicit leading bit) of the low part. using uint_t = unsigned __int128; - uint_t value_bits = 0; - memcpy(&value_bits, &value, sizeof(value_bits)); + uint64_t value_bits[2] = {}; + memcpy(value_bits, &value, sizeof(value_bits)); - const uint64_t value_hi = value_bits; - const uint64_t value_lo = value_bits >> 64; + const uint64_t value_hi = value_bits[0]; + const uint64_t value_lo = value_bits[1]; uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1); unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1); -- 2.30.1.559.g2283e0e9af
[PATCH 1/2] libstdc++: Robustify long double std::to_chars testcase [PR98384]
This makes the hexadecimal section of the long double std::to_chars testcase more robust by avoiding false-negative FAILs due to printf using a different leading hex digit than us, and by additionally verifying the correctness of the hexadecimal form via round-tripping through std::from_chars. Tested on x86, x86_64, powerpc64be, powerpc64le and aarch64. Does this look OK for trunk? libstdc++-v3/ChangeLog: PR libstdc++/98384 * testsuite/20_util/to_chars/long_double.cc: Include . (test01): Simplify verifying the nearby values by using a 2-iteration loop and a dedicated output buffer to check that the nearby values are different. Factor out the printf-based verification into a local function, and check that the leading hex digits agree before comparing with the output of printf. Also verify the output by round-tripping it through from_chars. --- .../testsuite/20_util/to_chars/long_double.cc | 73 --- 1 file changed, 47 insertions(+), 26 deletions(-) diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc index 4f72cb65400..34cc03034cc 100644 --- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc +++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -50,6 +51,38 @@ namespace detail void test01() { + // Verifies correctness of the hexadecimal form [BEGIN,END) for VALUE by + // round-tripping it through from_chars (if available). + auto verify_via_from_chars = [] (char *begin, char *end, long double value) { +#if __cpp_lib_to_chars >= 201611L || _GLIBCXX_HAVE_USELOCALE +long double roundtrip; +auto result = from_chars(begin, end, roundtrip, chars_format::hex); +VERIFY( result.ec == errc{} ); +VERIFY( result.ptr == end ); +VERIFY( roundtrip == value ); +#endif + }; + + // Verifies correctness of the null-terminated hexadecimal form at BEGIN + // for VALUE and PRECISION by comparing it with the output of printf's %La + // conversion specifier. + auto verify_via_printf = [] (char *begin, long double value, + optional precision = nullopt) { +char printf_buffer[1024] = {}; +if (precision.has_value()) + sprintf(printf_buffer, "%.*La", precision.value(), value); +else + sprintf(printf_buffer, "%La", value); + +// Only compare with the output of printf if the leading hex digits agree. +// If the leading hex digit of our form doesn't agree with that of printf, +// then the two forms may still be equivalent (e.g. 1.1p+0 vs 8.8p-3), so we +// don't want a FAIL in this case. But if the leading hex digits do agree, +// then we do expect the two forms to be the same. +if (printf_buffer[strlen("0x")] == begin[0]) + VERIFY( !strcmp(begin, printf_buffer+strlen("0x")) ); + }; + const long double hex_testcases[] = { detail::nextdownl(numeric_limits::max()), detail::nextupl(numeric_limits::min()), @@ -92,38 +125,27 @@ test01() if (testcase == 0.0L || isinf(testcase)) continue; - char to_chars_buffer[1024], printf_buffer[1024]; - memset(to_chars_buffer, '\0', sizeof(to_chars_buffer)); - memset(printf_buffer, '\0', sizeof(printf_buffer)); - + char to_chars_buffer[1024] = {}; auto result = to_chars(begin(to_chars_buffer), end(to_chars_buffer), testcase, chars_format::hex); VERIFY( result.ec == errc{} ); *result.ptr = '\0'; - sprintf(printf_buffer, "%La", testcase); - VERIFY( !strcmp(to_chars_buffer, printf_buffer+strlen("0x")) ); + verify_via_from_chars(begin(to_chars_buffer), result.ptr, testcase); + verify_via_printf(to_chars_buffer, testcase); + // Verify the nearby values, and also check they have a different + // shortest form. + for (long double nearby +: { detail::nextdownl(testcase), detail::nextupl(testcase) }) { - // Verify that the nearby values have a different shortest form. - testcase = detail::nextdownl(testcase); - result = to_chars(begin(to_chars_buffer), end(to_chars_buffer), - testcase, chars_format::hex); - VERIFY( result.ec == errc{} ); - *result.ptr = '\0'; - VERIFY( strcmp(to_chars_buffer, printf_buffer+strlen("0x")) != 0); - sprintf(printf_buffer, "%La", testcase); - VERIFY( !strcmp(to_chars_buffer, printf_buffer+strlen("0x")) ); - - testcase = detail::nextupl(detail::nextupl(testcase)); - result = to_chars(begin(to_chars_buffer), end(to_chars_buffer), - testcase, chars_format::hex); + char nearby_buffer[1024] = {}; + result = to_chars(begin(nearby_buffer), end(nearby_buffer), +
Re: [PATCH 2/2] libstdc++: Fix endianness issue with IBM long double [PR98384]
On Mon, 22 Feb 2021, 22:13 Patrick Palka via Libstdc++, < libstd...@gcc.gnu.org> wrote: > The logic in std::to_chars for extracting the high- and low-order parts > of a IBM long double value does the right thing on powerpc64le, but not > on powerpc64be. This patch makes the extraction endian-agnostic, which > fixes the execution FAIL of to_chars/long_double.cc on powerpc64be. > > Tested on powerpc64le and powerpc64be. Does this look OK for trunk? > I was going to suggest __builtin_unpack_longdouble but actually since you want the bits this is better. You'd need another step to get the bits from each double. OK for trunk.
Re: [PATCH] C++: target attribute - local decl
On 2/18/21 7:15 AM, Martin Liška wrote: We crash when target attribute get_function_versions_dispatcher is called for a function that is not registered in call graph. This fixes that by emitting a new error. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/cp/ChangeLog: PR c++/99108 * call.c (get_function_version_dispatcher): Do not parse target attribute for a function with a missing declaration. gcc/testsuite/ChangeLog: PR c++/99108 * g++.target/i386/pr99108.C: New test. --- gcc/cp/call.c | 8 +++- gcc/testsuite/g++.target/i386/pr99108.C | 18 ++ 2 files changed, 25 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.target/i386/pr99108.C diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 186feef6fe3..844853e504e 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8386,8 +8386,14 @@ get_function_version_dispatcher (tree fn) && DECL_FUNCTION_VERSIONED (fn)); gcc_assert (targetm.get_function_versions_dispatcher); - dispatcher_decl = targetm.get_function_versions_dispatcher (fn); + if (cgraph_node::get (fn) == NULL) + { + error_at (DECL_SOURCE_LOCATION (fn), "missing declaration " + "for a multiversioned function"); This seems like the wrong message. The declaration isn't missing, you're using its source location for the error. Do you mean "definition"? But adding definitions earlier in the file doesn't help. And having file-scope declarations without definitions is fine. The problem seems to be with the handling of local decls. If DECL_LOCAL_DECL_P, you need to look at DECL_LOCAL_DECL_ALIAS to find the namespace-scope decl. But then if there is no preceding namespace-scope declaration, the new decl created by push_local_extern_decl_alias doesn't have a cgraph node, either. I guess maybe_function_versions also needs to look through DECL_LOCAL_DECL_ALIAS. + return NULL; + } + dispatcher_decl = targetm.get_function_versions_dispatcher (fn); if (dispatcher_decl == NULL) { error_at (input_location, "use of multiversioned function " diff --git a/gcc/testsuite/g++.target/i386/pr99108.C b/gcc/testsuite/g++.target/i386/pr99108.C new file mode 100644 index 000..b0c4ffa2688 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr99108.C @@ -0,0 +1,18 @@ +/* PR c++/99108 */ +/* { dg-do compile { target c++20 } } */ +/* { dg-require-ifunc "" } */ + +struct A { + void foo(auto); +}; +void A::foo(auto) +{ + int f(void) __attribute__((target("default"))); + int f(void) __attribute__((target("arch=atom"))); /* { dg-error "missing declaration for a multiversioned function" } */ + int b = f(); +} +void bar(void) +{ + A c; + c.foo(7); +}
Re: [PATCH v7] Practical improvement to libgcc complex divide
I decided to use float_h_prefix instead of modename or mode prefix as Joseph has better insight as to naming expectations when others are working in gcc/c-family/c-cppbuiltin.c. All requested changes made and tested. Patch coming shortly. - patrick On 2/19/2021 5:48 PM, Joseph Myers wrote: On Fri, 19 Feb 2021, Patrick McGehearty via Gcc-patches wrote: Here you're properly computing the mapping from mode to float.h macro prefix (though I think "modename" is a confusing name for the variable used for float.h macro prefixes; to me, "modename" suggests the names such as SF or DF, which are actually "name" here, and something like "float_h_prefix" would be better than "modename"). float_h_prefix puts me in mind of half-precision floating point rather than float.h. How would modeprefix do instead of modename? I'm not sure modeprefix is better; you could say SF is the prefix of SFmode, for example. Hence my suggestion of a name that makes clear it's the prefix *in float.h*. typeprefix might be a possibility, as it's really a prefix associated with a corresponding type rather than directly with the mode.
[PATCH] Require SHF_GNU_RETAIN support for retain test
Since retain attribute requires SHF_GNU_RETAIN, run retain tests only if SHF_GNU_RETAIN is supported. PR testsuite/99173 * c-c++-common/attr-retain-5.c: Require R_flag_in_section. * c-c++-common/attr-retain-6.c: Likewise. * c-c++-common/attr-retain-7.c: Likewise. * c-c++-common/attr-retain-8.c: Likewise. * c-c++-common/attr-retain-9.c: Likewise. --- gcc/testsuite/c-c++-common/attr-retain-5.c | 2 +- gcc/testsuite/c-c++-common/attr-retain-6.c | 2 +- gcc/testsuite/c-c++-common/attr-retain-7.c | 2 +- gcc/testsuite/c-c++-common/attr-retain-8.c | 2 +- gcc/testsuite/c-c++-common/attr-retain-9.c | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/testsuite/c-c++-common/attr-retain-5.c b/gcc/testsuite/c-c++-common/attr-retain-5.c index ee6e2c4e054..1f3f8bfb5c5 100644 --- a/gcc/testsuite/c-c++-common/attr-retain-5.c +++ b/gcc/testsuite/c-c++-common/attr-retain-5.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target R_flag_in_section } } */ /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */ /* { dg-options "-Wall -O2" } */ diff --git a/gcc/testsuite/c-c++-common/attr-retain-6.c b/gcc/testsuite/c-c++-common/attr-retain-6.c index 9aead148ec8..ed6cfae2948 100644 --- a/gcc/testsuite/c-c++-common/attr-retain-6.c +++ b/gcc/testsuite/c-c++-common/attr-retain-6.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target R_flag_in_section } } */ /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */ /* { dg-options "-Wall -O2" } */ diff --git a/gcc/testsuite/c-c++-common/attr-retain-7.c b/gcc/testsuite/c-c++-common/attr-retain-7.c index 4c1673583db..c56d18a88e0 100644 --- a/gcc/testsuite/c-c++-common/attr-retain-7.c +++ b/gcc/testsuite/c-c++-common/attr-retain-7.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target R_flag_in_section } } */ /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */ /* { dg-options "-Wall -O2" } */ diff --git a/gcc/testsuite/c-c++-common/attr-retain-8.c b/gcc/testsuite/c-c++-common/attr-retain-8.c index 7e067f56274..c1a9b52972e 100644 --- a/gcc/testsuite/c-c++-common/attr-retain-8.c +++ b/gcc/testsuite/c-c++-common/attr-retain-8.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target R_flag_in_section } } */ /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */ /* { dg-options "-Wall -O2" } */ diff --git a/gcc/testsuite/c-c++-common/attr-retain-9.c b/gcc/testsuite/c-c++-common/attr-retain-9.c index 81accc0ee91..e6d7c0b43da 100644 --- a/gcc/testsuite/c-c++-common/attr-retain-9.c +++ b/gcc/testsuite/c-c++-common/attr-retain-9.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target R_flag_in_section } } */ /* { dg-skip-if "non-ELF target" { *-*-darwin* powerpc*-*-aix* } } */ /* { dg-options "-Wall -O2" } */ -- 2.29.2
Re: [PATCH] handle bad __dynamic_cast more gracefully (PR 99074)
On 2/13/21 7:31 PM, Martin Sebor wrote: The test case in PR 99074 invokes dynamic_cast with the this pointer in a non-static member function called on a null pointer. The call is, of course, undefined and other different circumstances would be diagnosed by -Wnonnull. Unfortunately, in the test case, the null pointer is the result of inlining and constant propagation and so detected neither by the front end -Wnonnull nor by the middle end. The program ends up passing it to __dynamic_cast() which then crashes at runtime (again, not surprising for undefined behavior. However, the reporter says the code behaved gracefully (didn't crash) when compiled with GCC 4.8, and in my tests it also doesn't crash when compiled with Clang or ICC. I looked to see if it's possible to do better and it seems it is. The attached patch improves things by changing __dynamic_cast to fail by returning null when the first argument is null, and also This hunk is OK. by declaring __dynamic_cast with attribute nonnull so that invalid calls to it with a constant null pointer can be detected at compile time. This is not; dynamic_cast is specified to return null for a null operand. "If v is a null pointer value, the result is a null pointer value." The undefined behavior is the call to _to_object, not the dynamic_cast. Since the test case is undefined it seems borderline whether this can strictly be considered a regression, even if some previous releases handled it more gracefully. Indeed. But handling the null case in __dynamic_cast as well as in the compiler seems harmless enough. Jason
[PATCH v8] Practical improvement to libgcc complex divide
Changes in this version from Version 7: gcc/c-family/c-cppbuiltin.c Changed modename to float_h_prefix Removed (mode == TYPE_MODE...) code left over from earlier approach. libgcc/libgcc2.c Removed conditional use of XF constants in TF case. Also left over from earlier approach. Tested _Float64x case on x86. Works as expected. Correctness and performance test programs used during development of this project may be found in the attachment to: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html Summary of Purpose This patch to libgcc/libgcc2.c __divdc3 provides an opportunity to gain important improvements to the quality of answers for the default complex divide routine (half, float, double, extended, long double precisions) when dealing with very large or very small exponents. The current code correctly implements Smith's method (1962) [2] further modified by c99's requirements for dealing with NaN (not a number) results. When working with input values where the exponents are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are substantially different from the answers provided by quad precision more than 1% of the time. This error rate may be unacceptable for many applications that cannot a priori restrict their computations to the safe range. The proposed method reduces the frequency of "substantially different" answers by more than 99% for double precision at a modest cost of performance. Differences between current gcc methods and the new method will be described. Then accuracy and performance differences will be discussed. Background This project started with an investigation related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714. Study of Beebe[1] provided an overview of past and recent practice for computing complex divide. The current glibc implementation is based on Robert Smith's algorithm [2] from 1962. A google search found the paper by Baudin and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's proposed patch [4] is based on that paper. I developed two sets of test data by randomly distributing values over a restricted range and the full range of input values. The current complex divide handled the restricted range well enough, but failed on the full range more than 1% of the time. Baudin and Smith's primary test for "ratio" equals zero reduced the cases with 16 or more error bits by a factor of 5, but still left too many flawed answers. Adding debug print out to cases with substantial errors allowed me to see the intermediate calculations for test values that failed. I noted that for many of the failures, "ratio" was a subnormal. Changing the "ratio" test from check for zero to check for subnormal reduced the 16 bit error rate by another factor of 12. This single modified test provides the greatest benefit for the least cost, but the percentage of cases with greater than 16 bit errors (double precision data) is still greater than 0.027% (2.7 in 10,000). Continued examination of remaining errors and their intermediate computations led to the various tests of input value tests and scaling to avoid under/overflow. The current patch does not handle some of the rare and most extreme combinations of input values, but the random test data is only showing 1 case in 10 million that has an error of greater than 12 bits. That case has 18 bits of error and is due to subtraction cancellation. These results are significantly better than the results reported by Baudin and Smith. Support for half, float, double, extended, and long double precision is included as all are handled with suitable preprocessor symbols in a single source routine. Since half precision is computed with float precision as per current libgcc practice, the enhanced algorithm provides no benefit for half precision and would cost performance. Further investigation showed changing the half precision algorithm to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no loss of precision and modest improvement in performance. The existing constants for each precision: float: FLT_MAX, FLT_MIN; double: DBL_MAX, DBL_MIN; extended and/or long double: LDBL_MAX, LDBL_MIN are used for avoiding the more common overflow/underflow cases. This use is made generic by defining appropriate __LIBGCC2_* macros in c-cppbuiltin.c. Tests are added for when both parts of the denominator have exponents small enough to allow shifting any subnormal values to normal values all input values could be scaled up without risking overflow. That gained a clear improvement in accuracy. Similarly, when either numerator was subnormal and the other numerator and both denominator values were not too large, scaling could be used to reduce risk of computing with subnormals. The test and scaling values used all fit within the allowed exponent range for each precision required by the C standard. Float precision has more difficulty with getting correct answers than double pre
Re: [PATCH] Require SHF_GNU_RETAIN support for retain test
On Mon, Feb 22, 2021 at 03:06:58PM -0800, H.J. Lu via Gcc-patches wrote: > Since retain attribute requires SHF_GNU_RETAIN, run retain tests only > if SHF_GNU_RETAIN is supported. > > PR testsuite/99173 > * c-c++-common/attr-retain-5.c: Require R_flag_in_section. > * c-c++-common/attr-retain-6.c: Likewise. > * c-c++-common/attr-retain-7.c: Likewise. > * c-c++-common/attr-retain-8.c: Likewise. > * c-c++-common/attr-retain-9.c: Likewise. Ok, thanks. Jakub
[committed] analyzer: handle error/error_at_line [PR99196]
PR analyzer/99196 describes a false positive from -fanalyzer due to the analyzer not "knowing" that calls to GNU libc's error(3) with a nonzero status terminate the process and thus don't return. This patch fixes the false positive by special-casing "error" and "error_at_line". Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r11-7333-g5ee4ba031dd9fc60bf2494ca30f46c0acaa34805. gcc/analyzer/ChangeLog: PR analyzer/99196 * engine.cc (exploded_node::on_stmt): Provide terminate_path flag as a way for on_call_pre to terminate the current analysis path. * region-model-impl-calls.cc (call_details::num_args): New. (region_model::impl_call_error): New. * region-model.cc (region_model::on_call_pre): Add param "out_terminate_path". Handle "error" and "error_at_line". * region-model.h (call_details::num_args): New decl. (region_model::on_call_pre): Add param "out_terminate_path". (region_model::impl_call_error): New decl. gcc/testsuite/ChangeLog: PR analyzer/99196 * gcc.dg/analyzer/error-1.c: New test. * gcc.dg/analyzer/error-2.c: New test. * gcc.dg/analyzer/error-3.c: New test. --- gcc/analyzer/engine.cc | 6 ++- gcc/analyzer/region-model-impl-calls.cc | 38 ++ gcc/analyzer/region-model.cc| 22 - gcc/analyzer/region-model.h | 7 ++- gcc/testsuite/gcc.dg/analyzer/error-1.c | 66 + gcc/testsuite/gcc.dg/analyzer/error-2.c | 48 ++ gcc/testsuite/gcc.dg/analyzer/error-3.c | 11 + 7 files changed, 194 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/analyzer/error-1.c create mode 100644 gcc/testsuite/gcc.dg/analyzer/error-2.c create mode 100644 gcc/testsuite/gcc.dg/analyzer/error-3.c diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 89b3be22ecb..0edeb490a53 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -1132,6 +1132,7 @@ exploded_node::on_stmt (exploded_graph &eg, stmt); bool unknown_side_effects = false; + bool terminate_path = false; switch (gimple_code (stmt)) { @@ -1203,7 +1204,7 @@ exploded_node::on_stmt (exploded_graph &eg, } else unknown_side_effects - = state->m_region_model->on_call_pre (call, &ctxt); + = state->m_region_model->on_call_pre (call, &ctxt, &terminate_path); } break; @@ -1215,6 +1216,9 @@ exploded_node::on_stmt (exploded_graph &eg, break; } + if (terminate_path) +return on_stmt_flags::terminate_path (); + bool any_sm_changes = false; int sm_idx; sm_state_map *smap; diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc index d3b66ba7375..72404a5bc61 100644 --- a/gcc/analyzer/region-model-impl-calls.cc +++ b/gcc/analyzer/region-model-impl-calls.cc @@ -96,6 +96,14 @@ call_details::maybe_set_lhs (const svalue *result) const return false; } +/* Return the number of arguments used by the call statement. */ + +unsigned +call_details::num_args () const +{ + return gimple_call_num_args (m_call); +} + /* Get argument IDX at the callsite as a tree. */ tree @@ -240,6 +248,36 @@ region_model::impl_call_calloc (const call_details &cd) return true; } +/* Handle the on_call_pre part of "error" and "error_at_line" from + GNU's non-standard . + MIN_ARGS identifies the minimum number of expected arguments + to be consistent with such a call (3 and 5 respectively). + Return true if handling it as one of these functions. + Write true to *OUT_TERMINATE_PATH if this execution path should be + terminated (e.g. the function call terminates the process). */ + +bool +region_model::impl_call_error (const call_details &cd, unsigned min_args, + bool *out_terminate_path) +{ + /* Bail if not enough args. */ + if (cd.num_args () < min_args) +return false; + + /* Initial argument ought to be of type "int". */ + if (cd.get_arg_type (0) != integer_type_node) +return false; + + /* The process exits if status != 0, so it only continues + for the case where status == 0. + Add that constraint, or terminate this analysis path. */ + tree status = cd.get_arg_tree (0); + if (!add_constraint (status, EQ_EXPR, integer_zero_node, cd.get_ctxt ())) +*out_terminate_path = true; + + return true; +} + /* Handle the on_call_post part of "free", after sm-handling. If the ptr points to an underlying heap region, delete the region, diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index 159b0f18654..2053f8f79bb 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -741,10 +741,14 @@ region_model::on_assignment (const gassign *assign, region_model_context *ctxt) Return true if the function call has u
Re: [PATCH] doc: c: c++: Document the C/C++ extended asm empty input constraints
On Mon, 22 Feb 2021 at 16:30, Segher Boessenkool wrote: > > Hi! > > First off, thanbk you for the patch! You're welcome! > On Mon, Feb 15, 2021 at 11:22:52PM +, Neven Sajko via Gcc-patches wrote: > > There is a long-standing, but undocumented GCC inline assembly feature > > that's part of the extended asm GCC extension to C and C++: extended > > asm empty input constraints. > > There is no such thing. *All* empty constraints have the same > semantics: anything whatsoever will do. Any register, any constant, any > memory. What I was trying to express is that input operand constraints are unlike output operand constraints in that they can be empty. I now realize I ended up being slightly confusing, though. > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -1131,7 +1131,102 @@ the addressing register. > > @subsection Simple Constraints > > @cindex simple constraints > > > > -The simplest kind of constraint is a string full of letters, each of > > +An input constraint is allowed to be an empty string, in which case it is > > +called an empty input constraint. > > That is just shorthand for "empty constraint that is used for an input > operand". It is not special, and it *is* documented: > https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints > The simplest kind of constraint is a string full of letters, each of > which describes one kind of operand that is permitted. > > A length zero string is allowed as well. This could be made more > explicit sure; OTOH, it isn't very often useful. So your example > (using it for making a dependency) is certainly useful to have. But > it is not a special case at all. Syntactically, it's not a special case; but I definitely think the semantics could be better documented. Proof: * There's a relevant Stack Overflow question. If I didn't know better I'd conclude from the discussion there that empty input constraints are undocumented and unsupported, and there would surely be an answer if the documentation on the GCC side was a bit better: https://stackoverflow.com/questions/63305223/gcc-asm-with-empty-input-operand-constraint * Clang erroneously doesn't support empty constraints for many years now (even though their internal documentation still says empty input constraints are supported, and external documentation says they support all the same constraints as GCC does). I suppose they may have been mislead by the lack of explicit mention of the feature in GCC's documentation. > > (When an empty input constraint is used, > > +the assembler template will most probably also be empty. I.e., the > > @code{asm} > > +declaration need not contain actual assembly code.) > > Don't use parentheses like this in documentation please. OK. > > An empty input > > +constraint can be used to create an artificial dependency on a C or C++ > > +variable (the variable that appears in the expression associated with the > > +constraint) without incurring unnecessary costs to performance. > > It still needs a register (or memory) reserved there (or sometimes a > constant can be used, but you have no dependency in that case!) Yeah, this is a bit more complicated than I perhaps implied. An asm volatile can tell the compiler "I need this value calculated at this point", but the compiler may still choose to eliminate the calculation from the generated code if it can perform it itself at compilation time. Thus currently the programmer must be able to predict if GCC will be able compute the value of some variable or expression; the good thing is that this is usually easy to predict. > > +An example of where such behavior may be useful is for preventing compiler > > +optimizations like dead store elimination or hoisting code outside a loop > > for > > +certain pieces of C or C++ code. > > You should not think about preventing the compiler from doing something. > Instead, you can give the compiler extra information that makes it *do* > something: it has to, because it has to implement the semantics your > source program has. > > > Specific applications may include direct > > +interaction with hardware features; or things like testing, fuzzing and > > +benchmarking. > > What does this mean? The manual already has examples for "direct interaction with hardware features". Benchmarking is another relatively well known example of an activity during which we may be inconvenienced by the compiler doing dead store elimination and loop hoisting at certain specific places in the code. E.g., Google's Benchmark has DoNotOptimize and Facebook's Folly has doNotOptimizeAway: https://github.com/google/benchmark/blob/master/include/benchmark/benchmark.h#L308 https://github.com/facebook/folly/blob/master/folly/BenchmarkUtil.h#L73 Unit testing and fuzzing are other such examples, specifically when trying to test for undefined behavior with sanitizers. > Here is a simple example showing why this isn't as simple to use as > you imply here: > > === > void f(int x)
PING 4 [PATCH] correct fix to avoid false positives for vectorized stores (PR 96963, 94655)
Ping 4: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html On 2/14/21 5:43 PM, Martin Sebor wrote: Ping 3: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html This is a fix for two P2 bugs (false positives). On 2/6/21 10:13 AM, Martin Sebor wrote: Ping 2: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html On 1/29/21 10:20 AM, Martin Sebor wrote: Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564059.html On 1/21/21 4:38 PM, Martin Sebor wrote: The hack I put in compute_objsize() last January for pr93200 isn't quite correct. It happened to suppress the false positive there but, due to what looks like a thinko on my part, not in some other test cases involving vectorized stores. The attached change adjusts the hack to have compute_objsize() give up on all MEM_REFs with a vector type. This effectively disables -Wstringop-{overflow,overread} for vector accesses (either written by the user or synthesized by GCC from ordinary accesses). It doesn't affect -Warray-bounds because this warning doesn't use compute_objsize() yet. When it does (it should considerably simplify the code) some additional changes will be needed to preserve -Warray-bounds for out of bounds vector accesses. The test this patch adds should serve as a reminder to make it if we forget. Tested on x86_64-linux. Since PR 94655 was reported against GCC 10 I'd like to apply this fix to both the trunk and the 10 branch. Martin
Committed: g++.dg/warn/Wplacement-new-size-1.C, -2, -6: Fix for default_packed targets
Looking at commit de05c19d5fd6, that adjustment to these tests apparently assumed that the testsuite is run (only) on targets where structure memory layout has padding as per "natural alignment". For cris-elf, a target with no padding in structure memory layout, these tests have been failing since that commit. Tested cris-elf and x86_64-linux, committed as obvious. gcc/testsuite: * g++.dg/warn/Wplacement-new-size-1.C, g++.dg/warn/Wplacement-new-size-2.C, g++.dg/warn/Wplacement-new-size-6.C: Adjust for default_packed targets. --- gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C | 12 ++-- gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C | 14 +++--- gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C | 22 +++--- 3 files changed, 24 insertions(+), 24 deletions(-) diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C index cec83163dbe7..1ad5c2da75af 100644 --- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C +++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C @@ -67,8 +67,8 @@ void fBx (BAx *pbx, BAx &rbx) { BAx bax; // The uninitialized flexible array takes up the bytes of padding. - new (bax.ax.a) char; - new (bax.ax.a) Int16; + new (bax.ax.a) char; // { dg-warning "placement" "" { target default_packed } } + new (bax.ax.a) Int16;// { dg-warning "placement" "" { target default_packed } } new (bax.ax.a) Int32;// { dg-warning "placement" } new (pbx->ax.a) char; @@ -86,10 +86,10 @@ void fBx1 () static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } }; // The empty flexible array takes up the bytes of padding. - new (bax1.ax.a) char; - new (bax1.ax.a) char[2]; - new (bax1.ax.a) Int16; - new (bax1.ax.a) char[3]; + new (bax1.ax.a) char; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[2]; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) Int16;// { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[3]; // { dg-warning "placement" "" { target default_packed } } new (bax1.ax.a) char[4]; // { dg-warning "placement" } new (bax1.ax.a) Int32;// { dg-warning "placement" } } diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C index e5fdfe1f603e..a4de2b04629c 100644 --- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C +++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C @@ -125,9 +125,9 @@ void fBx (BAx *pbx, BAx &rbx) { BAx bax; // The uninitialized flexible array takes up the bytes of padding. - new (bax.ax.a) char; - new (bax.ax.a) Int16; - new (bax.ax.a) char[3]; + new (bax.ax.a) char;// { dg-warning "placement" "" { target default_packed } } + new (bax.ax.a) Int16; // { dg-warning "placement" "" { target default_packed } } + new (bax.ax.a) char[3]; // { dg-warning "placement" "" { target default_packed } } new (bax.ax.a) Int32; // { dg-warning "placement" } new (bax.ax.a) char[4]; // { dg-warning "placement" } new (bax.ax.a) char[5]; // { dg-warning "placement" } @@ -147,10 +147,10 @@ void fBx1 () static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } }; // The empty flexible array takes up the bytes of padding. - new (bax1.ax.a) char; - new (bax1.ax.a) char[2]; - new (bax1.ax.a) Int16; - new (bax1.ax.a) char[3]; + new (bax1.ax.a) char; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[2];// { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) Int16; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[3];// { dg-warning "placement" "" { target default_packed } } new (bax1.ax.a) Int32; // { dg-warning "placement" } new (bax1.ax.a) char[4];// { dg-warning "placement" } new (bax1.ax.a) char[5];// { dg-warning "placement" } diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C index 5eb63d23b477..86a18fa08e63 100644 --- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C +++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C @@ -18,9 +18,9 @@ void fBx1 () static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ { 3 } } }; // { dg-error "initialization of flexible array member in a nested context" } // The first three bytes of the flexible array member live in the padding. - new (bax1.ax.a) char; - new (bax1.ax.a) char[2]; - new (bax1.ax.a) Int16; + new (bax1.ax.a) char; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[2]; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) Int16;// { dg-warning "placement" "" { target default_packed } } new (bax1.ax.a) Int32;// { dg-warning "placement" } } @@ -29,10 +29,10 @@
Re: [PATCH] handle bad __dynamic_cast more gracefully (PR 99074)
On 2/22/21 4:08 PM, Jason Merrill wrote: On 2/13/21 7:31 PM, Martin Sebor wrote: The test case in PR 99074 invokes dynamic_cast with the this pointer in a non-static member function called on a null pointer. The call is, of course, undefined and other different circumstances would be diagnosed by -Wnonnull. Unfortunately, in the test case, the null pointer is the result of inlining and constant propagation and so detected neither by the front end -Wnonnull nor by the middle end. The program ends up passing it to __dynamic_cast() which then crashes at runtime (again, not surprising for undefined behavior. However, the reporter says the code behaved gracefully (didn't crash) when compiled with GCC 4.8, and in my tests it also doesn't crash when compiled with Clang or ICC. I looked to see if it's possible to do better and it seems it is. The attached patch improves things by changing __dynamic_cast to fail by returning null when the first argument is null, and also This hunk is OK. by declaring __dynamic_cast with attribute nonnull so that invalid calls to it with a constant null pointer can be detected at compile time. This is not; dynamic_cast is specified to return null for a null operand. "If v is a null pointer value, the result is a null pointer value." The undefined behavior is the call to _to_object, not the dynamic_cast. Yes, of course. Just to be clear, in case it's not from the patch, it adds nonnull to the __dynamic_cast() function in libsupc++ which (if I read the comment right) documents nonnull-ness as its precondition. The function should never be called with a null pointer except in buggy code like in the PR. I don't think this is the most elegant way to diagnose the user bug but I also couldn't think of anything better. Do you have any suggestions? (I ask in part because for GCC 12 I'd like to see about issuing the warning requested on PR 12277.) WRT to the documentation of __dynamic_cast(), I didn't remove the comment in the patch that mentions the precondition because of the new warning. If we want to consider null pointers valid input to the function it seems we should update the comment. Do you agree? The comment is below. I assume SUB refers to SRC_PTR. /* sub: source address to be adjusted; nonnull, and since the * source object is polymorphic, *(void**)sub is a virtual pointer. * src: static type of the source object. * dst: destination type (the "T" in "dynamic_cast(v)"). * src2dst_offset: a static hint about the location of the *source subobject with respect to the complete object; *special negative values are: * -1: no hint * -2: src is not a public base of dst * -3: src is a multiple public base type but never a * virtual base type *otherwise, the src type is a unique public nonvirtual *base type of dst at offset src2dst_offset from the *origin of dst. */ extern "C" void * __dynamic_cast (const void *src_ptr,// object started from const __class_type_info *src_type, // type of the starting object const __class_type_info *dst_type, // desired target type ptrdiff_t src2dst) // how src and dst are related Since the test case is undefined it seems borderline whether this can strictly be considered a regression, even if some previous releases handled it more gracefully. Indeed. But handling the null case in __dynamic_cast as well as in the compiler seems harmless enough. Okay. Martin Jason
Re: Committed: g++.dg/warn/Wplacement-new-size-1.C, -2, -6: Fix for default_packed targets
On 2/22/21 5:48 PM, Hans-Peter Nilsson wrote: Looking at commit de05c19d5fd6, that adjustment to these tests apparently assumed that the testsuite is run (only) on targets where structure memory layout has padding as per "natural alignment". For cris-elf, a target with no padding in structure memory layout, these tests have been failing since that commit. Tested cris-elf and x86_64-linux, committed as obvious. Thanks for the heads up. Would declaring the structs with attribute packed work? If so, that would seem more robust to me since it would exercise the behavior on all targets. Martin gcc/testsuite: * g++.dg/warn/Wplacement-new-size-1.C, g++.dg/warn/Wplacement-new-size-2.C, g++.dg/warn/Wplacement-new-size-6.C: Adjust for default_packed targets. --- gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C | 12 ++-- gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C | 14 +++--- gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C | 22 +++--- 3 files changed, 24 insertions(+), 24 deletions(-) diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C index cec83163dbe7..1ad5c2da75af 100644 --- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C +++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-1.C @@ -67,8 +67,8 @@ void fBx (BAx *pbx, BAx &rbx) { BAx bax; // The uninitialized flexible array takes up the bytes of padding. - new (bax.ax.a) char; - new (bax.ax.a) Int16; + new (bax.ax.a) char; // { dg-warning "placement" "" { target default_packed } } + new (bax.ax.a) Int16;// { dg-warning "placement" "" { target default_packed } } new (bax.ax.a) Int32;// { dg-warning "placement" } new (pbx->ax.a) char; @@ -86,10 +86,10 @@ void fBx1 () static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } }; // The empty flexible array takes up the bytes of padding. - new (bax1.ax.a) char; - new (bax1.ax.a) char[2]; - new (bax1.ax.a) Int16; - new (bax1.ax.a) char[3]; + new (bax1.ax.a) char; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[2]; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) Int16;// { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[3]; // { dg-warning "placement" "" { target default_packed } } new (bax1.ax.a) char[4]; // { dg-warning "placement" } new (bax1.ax.a) Int32;// { dg-warning "placement" } } diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C index e5fdfe1f603e..a4de2b04629c 100644 --- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C +++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-2.C @@ -125,9 +125,9 @@ void fBx (BAx *pbx, BAx &rbx) { BAx bax; // The uninitialized flexible array takes up the bytes of padding. - new (bax.ax.a) char; - new (bax.ax.a) Int16; - new (bax.ax.a) char[3]; + new (bax.ax.a) char;// { dg-warning "placement" "" { target default_packed } } + new (bax.ax.a) Int16; // { dg-warning "placement" "" { target default_packed } } + new (bax.ax.a) char[3]; // { dg-warning "placement" "" { target default_packed } } new (bax.ax.a) Int32; // { dg-warning "placement" } new (bax.ax.a) char[4]; // { dg-warning "placement" } new (bax.ax.a) char[5]; // { dg-warning "placement" } @@ -147,10 +147,10 @@ void fBx1 () static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ {} } }; // The empty flexible array takes up the bytes of padding. - new (bax1.ax.a) char; - new (bax1.ax.a) char[2]; - new (bax1.ax.a) Int16; - new (bax1.ax.a) char[3]; + new (bax1.ax.a) char; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[2];// { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) Int16; // { dg-warning "placement" "" { target default_packed } } + new (bax1.ax.a) char[3];// { dg-warning "placement" "" { target default_packed } } new (bax1.ax.a) Int32; // { dg-warning "placement" } new (bax1.ax.a) char[4];// { dg-warning "placement" } new (bax1.ax.a) char[5];// { dg-warning "placement" } diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C index 5eb63d23b477..86a18fa08e63 100644 --- a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C +++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-6.C @@ -18,9 +18,9 @@ void fBx1 () static BAx bax1 = { 1, /* Ax = */ { 2, /* a[] = */ { 3 } } }; // { dg-error "initialization of flexible array member in a nested context" } // The first three bytes of the flexible array member live in the padding. - new (bax1.ax.a) char; - new (bax1.ax.a) char[2]; - new (bax1.ax.a) Int16; + new (bax1.ax.a) char; // { dg-warning "placement" "" { target default_p
Re: Committed: g++.dg/warn/Wplacement-new-size-1.C, -2, -6: Fix for default_packed targets
> From: Martin Sebor > Date: Tue, 23 Feb 2021 02:07:59 +0100 > On 2/22/21 5:48 PM, Hans-Peter Nilsson wrote: > > Looking at commit de05c19d5fd6, that adjustment to these > > tests apparently assumed that the testsuite is run (only) on > > targets where structure memory layout has padding as per > > "natural alignment". For cris-elf, a target with no padding > > in structure memory layout, these tests have been failing > > since that commit. > > > > Tested cris-elf and x86_64-linux, committed as obvious. > > Thanks for the heads up. Would declaring the structs with attribute > packed work? That seems likely, yes. brgds, H-P
Re: [PATCH] PR fortran/99206 - [11 Regression] ICE in add_init_expr_to_sym, at fortran/decl.c:1980
Hi Harald, Looks Good to Me. OK Thanks On 2/22/21 1:14 PM, Harald Anlauf via Fortran wrote: Dear all, when simplifying the RESHAPE intrinsic we lost the string length when the resulting array had size zero. The attached patch sets the resulting length from the source. Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald PR fortran/99206 - ICE in add_init_expr_to_sym, at fortran/decl.c:1980 Make sure that the string length is properly set during simplification even when the resulting array is zero-sized. gcc/fortran/ChangeLog: PR fortran/99206 * simplify.c (gfc_simplify_reshape): Set string length for character arguments. gcc/testsuite/ChangeLog: PR fortran/99206 * gfortran.dg/reshape_zerosize_4.f90: New test.
Re: [PATCH] handle bad __dynamic_cast more gracefully (PR 99074)
On 2/22/21 8:00 PM, Martin Sebor wrote: On 2/22/21 4:08 PM, Jason Merrill wrote: On 2/13/21 7:31 PM, Martin Sebor wrote: The test case in PR 99074 invokes dynamic_cast with the this pointer in a non-static member function called on a null pointer. The call is, of course, undefined and other different circumstances would be diagnosed by -Wnonnull. Unfortunately, in the test case, the null pointer is the result of inlining and constant propagation and so detected neither by the front end -Wnonnull nor by the middle end. The program ends up passing it to __dynamic_cast() which then crashes at runtime (again, not surprising for undefined behavior. However, the reporter says the code behaved gracefully (didn't crash) when compiled with GCC 4.8, and in my tests it also doesn't crash when compiled with Clang or ICC. I looked to see if it's possible to do better and it seems it is. The attached patch improves things by changing __dynamic_cast to fail by returning null when the first argument is null, and also This hunk is OK. by declaring __dynamic_cast with attribute nonnull so that invalid calls to it with a constant null pointer can be detected at compile time. This is not; dynamic_cast is specified to return null for a null operand. "If v is a null pointer value, the result is a null pointer value." The undefined behavior is the call to _to_object, not the dynamic_cast. Yes, of course. Just to be clear, in case it's not from the patch, it adds nonnull to the __dynamic_cast() function in libsupc++ which (if I read the comment right) documents nonnull-ness as its precondition. The function should never be called with a null pointer except in buggy code like in the PR. True. So the attribute is technically correct, but the resulting warning is misleading, since the user shouldn't need to know anything about the internal implementation of dynamic_cast, and the user-visible feature doesn't have that constraint. I don't think this is the most elegant way to diagnose the user bug but I also couldn't think of anything better. Do you have any suggestions? (I ask in part because for GCC 12 I'd like to see about issuing the warning requested on PR 12277.) If we can manage to warn about a null argument to __dynamic_cast, I'd think we should also be able to warn about a null 'this' argument to _to_object. WRT to the documentation of __dynamic_cast(), I didn't remove the comment in the patch that mentions the precondition because of the new warning. If we want to consider null pointers valid input to the function it seems we should update the comment. Do you agree? The comment is from the ABI (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#dynamic_cast-algorithm), we shouldn't change it. The comment is below. I assume SUB refers to SRC_PTR. /* sub: source address to be adjusted; nonnull, and since the * source object is polymorphic, *(void**)sub is a virtual pointer. * src: static type of the source object. * dst: destination type (the "T" in "dynamic_cast(v)"). * src2dst_offset: a static hint about the location of the * source subobject with respect to the complete object; * special negative values are: * -1: no hint * -2: src is not a public base of dst * -3: src is a multiple public base type but never a * virtual base type * otherwise, the src type is a unique public nonvirtual * base type of dst at offset src2dst_offset from the * origin of dst. */ extern "C" void * __dynamic_cast (const void *src_ptr, // object started from const __class_type_info *src_type, // type of the starting object const __class_type_info *dst_type, // desired target type ptrdiff_t src2dst) // how src and dst are related Since the test case is undefined it seems borderline whether this can strictly be considered a regression, even if some previous releases handled it more gracefully. Indeed. But handling the null case in __dynamic_cast as well as in the compiler seems harmless enough. Okay. I guess it's a question of which behavior is preferable in (this) case of undefined behavior. Doing something reasonable is in general a valid choice. OTOH, the crash revealed the undefined behavior in the testcase. Jason
Re: [PATCH, constexpr, coroutines ] Generic lambda coroutines cannot be constexpr [PR96251].
On 2/22/21 3:59 PM, Iain Sandoe wrote: Hi, PR96251 (and dups) is a rejects-valid bug against coroutines (but I’m not sure that the problem is really there). * coroutines cannot be constexpr. * Part of the way that the coroutine implementation works is a consequence of the "lazy discovery of coroutine-ness”; whenever we first encounter a coroutine keyword... .. we mark the function as a coroutine, and then we can deal with diagnostics etc. that change under these circumstances. This marking of the function is independent of whether keyword expressions are type-dependent or not. * when we encounter a coroutine keyword in a constexpr context we error. * the constexpr machinery also has been taught that coroutine expressions should cause constexpr-ness to be rejected when checking for "potentially constexpr". So why is there a problem? * lambdas are implicitly constexpr from C++17. * the constexpr machinery tends to bail early *with a conservative assumption that the expression is potentially constexpr* when it finds a type-dependent expression - without evaluating sub- expressions to see if they are valid (thus missing coroutine exprs. in such positions). The combination of these ^^ two things, means that for many generic lambdas with non-trivial bodies - we then enter instantiation with a constexpr type-dependent coroutine (which is a Thing that Should not Be). As soon as we try to tsub any coroutine keyword expression encountered, we error out. * I was not able to see any way in which the instantiation process could be made to bail in this case and re-try for non-constexpr. Many of the other places that set cp_function_chain->invalid_constexpr condition their errors on !is_instantiation_of_constexpr, which should also fix this testcase. * Nor able to see somewhere else where that decision could be made. ^^ these two points reflect that I'm not familiar with the constexpr machinery. * Proposed solution. Since coroutines cannot be constexpr (not even sure what that would mean in any practicable implementation). The fix proposed is to reject constexpr-ness for coroutine functions as early as possible. * a) We can prevent a generic lambda coroutine from being converted to constexpr in finish_function (). Likewise, via the tests below, we can avoid it for regular lambdas. b) We can also reject coroutine function decls early in both is_valid_constexpr_fn () and potential_constant_expression_1 (). So - is there some alternate solution that would be better? (in some ways it seems pointless to delay rejection of a coroutine function to some later point). OTOH, I suppose that one could have some weird code where coroutine coroutine keywords only appeared in a non-constexpr paths of the code - but our current lowering is not prepared for such a circumstance. AFAIU the standard, there's no dispensation from the "cannot be constexpr" for such a code construction .. but ICBW. In any event, the cases reported (and fixed by this patch) are not trying anything so fiendishly clever. Tested on x86_64-darwin. Suggestions? OK for master (after wider testing)? thanks Iain = coroutines cannot be constexpr, but generic lambdas (from C++17) are implicitly assumed to be, and the processing of type- dependent lambda bodies often terminates before any coroutine expressions are encountered. For example, the PR notes cases where the coro expressions are hidden by type-dependent for or switch expressions. The solution proposed is to deny coroutine generic lambdas. We also tighten up the checks for is_valid_constexpr_fn() and do an early test for coroutine functions in checking for potential constant expressions. gcc/cp/ChangeLog: PR c++/96251 * constexpr.c (is_valid_constexpr_fn): Test for a coroutine before the chekc for lambda. (potential_constant_expression_1): Disallow coroutine function decls. * decl.c (finish_function): Do not mark generic coroutine lambda templates as constexpr. gcc/testsuite/ChangeLog: PR c++/96251 * g++.dg/coroutines/pr96251.C: New test. --- gcc/cp/constexpr.c | 7 ++- gcc/cp/decl.c | 2 +- gcc/testsuite/g++.dg/coroutines/pr96251.C | 22 ++ 3 files changed, 29 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr96251.C diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 377fe322ee8..036bf04efa9 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -210,7 +210,9 @@ is_valid_constexpr_fn (tree fun, bool complain) } } - if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17) + if (DECL_COROUTINE_P (fun)) + ret = false; + else if (LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)) && cxx_dialect < cxx17) { ret = false; if (complain) @@ -7827,6 +7829,9 @@ potential_constant_expression_1 (tree t, bool want_rval
Re: [PATCH] c++: Micro-optimize instantiation_dependent_expression_p
On 2/12/21 6:36 PM, Patrick Palka wrote: This makes instantiation_dependent_expression_p avoid checking potential_constant_expression when processing_template_decl isn't set (and hence when value_dependent_expression_p is definitely false). OK. gcc/cp/ChangeLog: * pt.c (instantiation_dependent_expression_p): Check processing_template_decl before checking potential_constant_expression. --- gcc/cp/pt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index dd89dd6f896..5102bf02d0f 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -27519,7 +27519,8 @@ bool instantiation_dependent_expression_p (tree expression) { return (instantiation_dependent_uneval_expression_p (expression) - || (potential_constant_expression (expression) + || (processing_template_decl + && potential_constant_expression (expression) && value_dependent_expression_p (expression))); }
Re: [PATCH v2 0/5] RISC-V big endian support
Hi Marcus: Thanks for the quick update, I am testing your V2 patch now, the result seems really great now, some of fail case seems like not cause by big-endian patch, I am reviewing and comparing the fail case with the little-endian build. > Should I make a PR against riscv-newlib on GitHub, or would you prefer > some other process for dealing with newlib fixes related to these > patches? Could you send to newlib mailing list directly, ideally riscv-newlib just a buffer we don't want to hold any patch there as possible. https://sourceware.org/mailman/listinfo/newlib/ On Sun, Feb 21, 2021 at 8:17 AM Marcus Comstedt wrote: > > This is an update to the patch series for big endian RISC-V support. > > Changes since last version: > > * Added documentation of -mbig-endian and -mlittle-endian > > * New patch: Fix soft-fp endianness setting > > * New patch: Fix trampoline generation on big endian > > * New patch: Update the shift-shift-5.c testcase to work correctly > on big endian > > With these changes, and two fixes to newlib (setting correct floating > point byteorder, and an update to the handcoded assembler for strcmp), > I'm now down to > >= Summary of gcc testsuite = > | # of unexpected case / # of unique unexpected > case > | gcc | g++ | gfortran | > rv64gc/ lp64/ medlow | 14 / 8 | 39 /10 | - | > > and of these only two failures do not also occur for little endian: > > FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi > FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi > > I'm quite puzzled why these two testcases give different results with > -mbig-endian compared to with -mlittle-endian though, since they only > deal with register-to-register operations so the memory model should be > completely irrelevant... > > > // Marcus > > >
[PATCH v3 1/2] MIPS: unaligned load: use SImode for SUBREG if OK (PR98996)
It is found by ada s-pack96.adb ftbfs, due to 96bit load: 96 = 64 + 32. While the 32bit pair of l r is mark as SUBREG, so they are not in SImode, make it fail to find suitable insn. gcc/ChangeLog: * config/mips/mips.c (mips_expand_ext_as_unaligned_load): If TARGET_64BIT and dest is SUBREG, we check the width, if it equal to SImode, we use SImode operation, just like what we are doing for REG one. --- gcc/ChangeLog | 8 gcc/config/mips/mips.c | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3bd877243b7..e86d7817d9d 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2021-02-23 YunQiang Su + + PR target/98996 + * config/mips/mips.c (mips_expand_ext_as_unaligned_load): + If TARGET_64BIT and dest is SUBREG, we check the width, if it + equal to SImode, we use SImode operation, just like what we are + doing for REG one. + 2021-02-22 Kyrylo Tkachov * config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants): diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 8bd2d29552e..e901d860c3d 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -8400,7 +8400,7 @@ mips_expand_ext_as_unaligned_load (rtx dest, rtx src, HOST_WIDE_INT width, /* If TARGET_64BIT, the destination of a 32-bit "extz" or "extzv" will be a DImode, create a new temp and emit a zero extend at the end. */ if (GET_MODE (dest) == DImode - && REG_P (dest) + && (REG_P (dest) || (SUBREG_P(dest) && !MEM_P(SUBREG_REG(dest && GET_MODE_BITSIZE (SImode) == width) { dest1 = dest; -- 2.20.1
[PATCH v3 2/2] ada: add 128bit operation to MIPS N32 and N64
For MIPS N64 and N32: add GNATRTL_128BIT_PAIRS to LIBGNAT_TARGET_PAIRS add GNATRTL_128BIT_OBJS to EXTRA_GNATRTL_NONTASKING_OBJS gcc/ada/ChangeLog: PR ada/98996 * Makefile.rtl: add 128Bit operation file to MIPS N64 and N32 (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS). --- gcc/ada/ChangeLog| 6 ++ gcc/ada/Makefile.rtl | 12 2 files changed, 18 insertions(+) diff --git a/gcc/ada/ChangeLog b/gcc/ada/ChangeLog index 52faefaa2ae..fb09986bde0 100644 --- a/gcc/ada/ChangeLog +++ b/gcc/ada/ChangeLog @@ -1,3 +1,9 @@ +2021-02-23 YunQiang Su + + PR ada/98996 + * Makefile.rtl: add 128Bit operation file to MIPS + N64 and N32 (LIBGNAT_TARGET_PAIRS, EXTRA_GNATRTL_NONTASKING_OBJS). + 2021-02-12 Arnaud Charlet * repinfo.ads, repinfo.adb (*SO_Ref*): Restore. diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl index 35faf13ea46..987eff0abba 100644 --- a/gcc/ada/Makefile.rtl +++ b/gcc/ada/Makefile.rtl @@ -2311,6 +2311,18 @@ ifeq ($(strip $(filter-out mips% linux%,$(target_cpu) $(target_os))),) s-tpopsp.adb
Re: [PATCH] c++: Private parent access check for using decls [PR19377]
On 2/15/21 7:45 AM, Anthony Sharp wrote: PARENT_BINFO, has private access to DECL. Examine certain sepcial cases "special" I did try the name lookup as Jason suggested but, as I say, in the case of overloaded functions it returns multiple values, so it didn't help in determining what DECL a USING_DECL inherited. I was thinking you could walk through the overload set to see if it contains DECL. Jason
Re: [PATCH] c++: Fix folding of non-dependent BASELINKs [PR95468]
On 2/13/21 10:21 PM, Patrick Palka wrote: On Fri, 12 Feb 2021, Patrick Palka wrote: Here, the problem ultimately seems to be that tsubst_copy_and_build, when called with empty args as we do during non-dependent expression folding, doesn't touch BASELINKs at all: it delegates to tsubst_copy which then immediately exits early due to the empty args. This means that the CAST_EXPR int(1) in the BASELINK A::condition never gets folded (as part of folding of the overall CALL_EXPR), which later causes us to crash when performing overload resolution of the rebuilt CALL_EXPR (which is in terms of this still-templated BASELINK). This doesn't happen when condition() is a namespace-scope function because then condition is represented as a TEMPLATE_ID_EXPR rather than a BASELINK, which does get handled directly from tsubst_copy_and_build. This patch fixes this issue by having tsubst_copy_and_build handle BASELINK directly rather than delegating to tsubst_copy, so that it processes BASELINKS even when args is empty. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? gcc/cp/ChangeLog: PR c++/95468 * pt.c (tsubst_copy_and_build) : New case, copied over from tsubst_copy. gcc/testsuite/ChangeLog: PR c++/95468 * g++.dg/template/non-dependent15.C: New test. --- gcc/cp/pt.c | 5 + gcc/testsuite/g++.dg/template/non-dependent15.C | 12 2 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/template/non-dependent15.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 5102bf02d0f..5b2f43dc5c1 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -19856,6 +19856,11 @@ tsubst_copy_and_build (tree t, case SCOPE_REF: RETURN (tsubst_qualified_id (t, args, complain, in_decl, /*done=*/true, /*address_p=*/false)); + +case BASELINK: + return tsubst_baselink (t, current_nonlambda_class_type (), + args, complain, in_decl); Oops, this should use the RETURN macro instead of a bare 'return' so that input_location gets properly restored on function exit. Indeed. And it looks like there's an existing instance of this bug in the LAMBDA_EXPR case of tsubst_copy_and_build. Perhaps it's a good time to replace these RETURN macros with the equivalent use of iloc_sentinel, like so? Bootstrapped and tested on x86_64-pc-linux-gnu. Both patches are OK. -- >8 -- Subject: [PATCH] c++: Replace RETURN macros with iloc_sentinel This replaces the C-era RETURN macro idiom used by some of the tsubsting functions with an iloc_sentinel declared at the start of each function. gcc/cp/ChangeLog: * pt.c (tsubst_decl): Delete the RETURN macro, and replace its uses with a plain 'return'. Set up an iloc_sentinel at the start of the function to set and restore input_location. (tsubst_expr): Likewise. Remove redundant break statements that immediately follow a return. (tsubst_copy_and_build): Likewise. Remove 'retval' local variable. Add gcc_unreachable to the end of the function. --- gcc/cp/pt.c | 316 1 file changed, 145 insertions(+), 171 deletions(-) diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 5b2f43dc5c1..32d1759258c 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -14414,15 +14414,12 @@ enclosing_instantiation_of (tree otctx) static tree tsubst_decl (tree t, tree args, tsubst_flags_t complain) { -#define RETURN(EXP) do { r = (EXP); goto out; } while(0) - location_t saved_loc; tree r = NULL_TREE; tree in_decl = t; hashval_t hash = 0; - /* Set the filename and linenumber to improve error-reporting. */ - saved_loc = input_location; - input_location = DECL_SOURCE_LOCATION (t); + /* Set the source position to improve error-reporting. */ + iloc_sentinel ils (DECL_SOURCE_LOCATION (t)); switch (TREE_CODE (t)) { @@ -14453,7 +14450,7 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain) if (spec && TREE_CODE (spec) == PARM_DECL && TREE_CODE (TREE_TYPE (spec)) != TYPE_PACK_EXPANSION) - RETURN (spec); + return spec; /* Expand the TYPE_PACK_EXPANSION that provides the types for the parameters in this function parameter pack. */ @@ -14466,8 +14463,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain) /* Zero-length parameter packs are boring. Just substitute into the chain. */ if (len == 0 && !cp_unevaluated_operand) - RETURN (tsubst (TREE_CHAIN (t), args, complain, - TREE_CHAIN (t))); + return tsubst (TREE_CHAIN (t), args, complain, +TREE_CHAIN (t)); } else { @@ -14
[PATCH,rs6000] [v2] Optimize pcrel access of globals
From: Aaron Sawdey This patch implements a RTL pass that looks for pc-relative loads of the address of an external variable using the PCREL_GOT relocation and a single load or store that uses that external address. Produced by a cast of thousands: * Michael Meissner * Peter Bergner * Bill Schmidt * Alan Modra * Segher Boessenkool * Aaron Sawdey This incorporates the changes requested in Segher's review. A few things I did not change were the insn-at-a-time scan that could be done with DF, and I did not change to using statistics.[ch] for the counters struct. I did try to improve the naming, and rewrote a number of comments to make them consistent with the code, and generally tried to make things more readable. OK for trunk if bootstrap/regtest passes? Thanks! Aaron gcc/ChangeLog: * config.gcc: Add pcrel-opt.o. * config/rs6000/pcrel-opt.c: New file. * config/rs6000/pcrel-opt.md: New file. * config/rs6000/predicates.md: Add d_form_memory predicate. * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT. * config/rs6000/rs6000-passes.def: Add pass_pcrel_opt. * config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(), pcrel_opt_valid_mem_p(), output_pcrel_opt_reloc(), and make_pass_pcrel_opt(). * config/rs6000/rs6000.c (reg_to_non_prefixed): Make global. (rs6000_option_override_internal): Add pcrel-opt. (rs6000_delegitimize_address): Support pcrel-opt. (rs6000_opt_masks): Add pcrel-opt. (pcrel_opt_valid_mem_p): New function. (reg_to_non_prefixed): Make global. (rs6000_asm_output_opcode): Reset next_insn_prefixed_p. (output_pcrel_opt_reloc): New function. * config/rs6000/rs6000.md (loads_extern_addr): New attr. (pcrel_extern_addr): Set loads_extern_addr. Add include for pcrel-opt.md. * config/rs6000/rs6000.opt: Add -mpcrel-opt. * config/rs6000/t-rs6000: Add rules for pcrel-opt.c and pcrel-opt.md. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pcrel-opt-inc-di.c: New test. * gcc.target/powerpc/pcrel-opt-ld-df.c: New test. * gcc.target/powerpc/pcrel-opt-ld-di.c: New test. * gcc.target/powerpc/pcrel-opt-ld-hi.c: New test. * gcc.target/powerpc/pcrel-opt-ld-qi.c: New test. * gcc.target/powerpc/pcrel-opt-ld-sf.c: New test. * gcc.target/powerpc/pcrel-opt-ld-si.c: New test. * gcc.target/powerpc/pcrel-opt-ld-vector.c: New test. * gcc.target/powerpc/pcrel-opt-st-df.c: New test. * gcc.target/powerpc/pcrel-opt-st-di.c: New test. * gcc.target/powerpc/pcrel-opt-st-hi.c: New test. * gcc.target/powerpc/pcrel-opt-st-qi.c: New test. * gcc.target/powerpc/pcrel-opt-st-sf.c: New test. * gcc.target/powerpc/pcrel-opt-st-si.c: New test. * gcc.target/powerpc/pcrel-opt-st-vector.c: New test. --- gcc/config.gcc| 8 +- gcc/config/rs6000/pcrel-opt.md| 399 gcc/config/rs6000/predicates.md | 21 + gcc/config/rs6000/rs6000-cpus.def | 2 + gcc/config/rs6000/rs6000-passes.def | 8 + gcc/config/rs6000/rs6000-pcrel-opt.c | 924 ++ gcc/config/rs6000/rs6000-protos.h | 4 + gcc/config/rs6000/rs6000.c| 111 ++- gcc/config/rs6000/rs6000.md | 8 +- gcc/config/rs6000/rs6000.opt | 4 + gcc/config/rs6000/t-rs6000| 7 +- .../gcc.target/powerpc/pcrel-opt-inc-di.c | 17 + .../gcc.target/powerpc/pcrel-opt-ld-df.c | 36 + .../gcc.target/powerpc/pcrel-opt-ld-di.c | 42 + .../gcc.target/powerpc/pcrel-opt-ld-hi.c | 42 + .../gcc.target/powerpc/pcrel-opt-ld-qi.c | 42 + .../gcc.target/powerpc/pcrel-opt-ld-sf.c | 42 + .../gcc.target/powerpc/pcrel-opt-ld-si.c | 41 + .../gcc.target/powerpc/pcrel-opt-ld-vector.c | 36 + .../gcc.target/powerpc/pcrel-opt-st-df.c | 36 + .../gcc.target/powerpc/pcrel-opt-st-di.c | 36 + .../gcc.target/powerpc/pcrel-opt-st-hi.c | 42 + .../gcc.target/powerpc/pcrel-opt-st-qi.c | 42 + .../gcc.target/powerpc/pcrel-opt-st-sf.c | 36 + .../gcc.target/powerpc/pcrel-opt-st-si.c | 41 + .../gcc.target/powerpc/pcrel-opt-st-vector.c | 36 + 26 files changed, 2054 insertions(+), 9 deletions(-) create mode 100644 gcc/config/rs6000/pcrel-opt.md create mode 100644 gcc/config/rs6000/rs6000-pcrel-opt.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c create mode
[COMMITED][BACKPORT GCC10] aarch64: Add cpu cost tables for A64FX
This is a backport of adding cost tables for A64FX. Bootstrapped and tested on aarch64-none-linux-gnu. 2021-02-23 Qian Jianhua gcc/ChangeLog: * config/aarch64/aarch64-cost-tables.h (a64fx_extra_costs): New. * config/aarch64/aarch64.c (a64fx_addrcost_table): New. (a64fx_regmove_cost, a64fx_vector_cost): New. (a64fx_tunings): Use the new added cost tables. --- gcc/config/aarch64/aarch64-cost-tables.h | 103 +++ gcc/config/aarch64/aarch64.c | 53 +++- 2 files changed, 152 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h index 8a98bf4278c..c6805717f6e 100644 --- a/gcc/config/aarch64/aarch64-cost-tables.h +++ b/gcc/config/aarch64/aarch64-cost-tables.h @@ -541,4 +541,107 @@ const struct cpu_cost_table tsv110_extra_costs = } }; +const struct cpu_cost_table a64fx_extra_costs = +{ + /* ALU */ + { +0, /* arith. */ +0, /* logical. */ +0, /* shift. */ +0, /* shift_reg. */ +COSTS_N_INSNS (1), /* arith_shift. */ +COSTS_N_INSNS (1), /* arith_shift_reg. */ +COSTS_N_INSNS (1), /* log_shift. */ +COSTS_N_INSNS (1), /* log_shift_reg. */ +0, /* extend. */ +COSTS_N_INSNS (1), /* extend_arith. */ +0, /* bfi. */ +0, /* bfx. */ +0, /* clz. */ +0, /* rev. */ +0, /* non_exec. */ +true /* non_exec_costs_exec. */ + }, + { +/* MULT SImode */ +{ + COSTS_N_INSNS (4), /* simple. */ + COSTS_N_INSNS (4), /* flag_setting. */ + COSTS_N_INSNS (4), /* extend. */ + COSTS_N_INSNS (5), /* add. */ + COSTS_N_INSNS (5), /* extend_add. */ + COSTS_N_INSNS (18) /* idiv. */ +}, +/* MULT DImode */ +{ + COSTS_N_INSNS (4), /* simple. */ + 0, /* flag_setting (N/A). */ + COSTS_N_INSNS (4), /* extend. */ + COSTS_N_INSNS (5), /* add. */ + COSTS_N_INSNS (5), /* extend_add. */ + COSTS_N_INSNS (26) /* idiv. */ +} + }, + /* LD/ST */ + { +COSTS_N_INSNS (4), /* load. */ +COSTS_N_INSNS (4), /* load_sign_extend. */ +COSTS_N_INSNS (5), /* ldrd. */ +COSTS_N_INSNS (4), /* ldm_1st. */ +1, /* ldm_regs_per_insn_1st. */ +2, /* ldm_regs_per_insn_subsequent. */ +COSTS_N_INSNS (4), /* loadf. */ +COSTS_N_INSNS (4), /* loadd. */ +COSTS_N_INSNS (5), /* load_unaligned. */ +0, /* store. */ +0, /* strd. */ +0, /* stm_1st. */ +1, /* stm_regs_per_insn_1st. */ +2, /* stm_regs_per_insn_subsequent. */ +0, /* storef. */ +0, /* stored. */ +0, /* store_unaligned. */ +COSTS_N_INSNS (1), /* loadv. */ +COSTS_N_INSNS (1) /* storev. */ + }, + { +/* FP SFmode */ +{ + COSTS_N_INSNS (6), /* div. */ + COSTS_N_INSNS (1), /* mult. */ + COSTS_N_INSNS (1), /* mult_addsub. */ + COSTS_N_INSNS (2), /* fma. */ + COSTS_N_INSNS (1), /* addsub. */ + COSTS_N_INSNS (1), /* fpconst. */ + COSTS_N_INSNS (1), /* neg. */ + COSTS_N_INSNS (1), /* compare. */ + COSTS_N_INSNS (2), /* widen. */ + COSTS_N_INSNS (2), /* narrow. */ + COSTS_N_INSNS (2), /* toint. */ + COSTS_N_INSNS (2), /* fromint. */ + COSTS_N_INSNS (2)/* roundint. */ +}, +/* FP DFmode */ +{ + COSTS_N_INSNS (11), /* div. */ + COSTS_N_INSNS (1), /* mult. */ + COSTS_N_INSNS (1), /* mult_addsub. */ + COSTS_N_INSNS (2), /* fma. */ + COSTS_N_INSNS (1), /* addsub. */ + COSTS_N_INSNS (1), /* fpconst. */ + COSTS_N_INSNS (1), /* neg. */ + COSTS_N_INSNS (1), /* compare. */ + COSTS_N_INSNS (2), /* widen. */ + COSTS_N_INSNS (2), /* narrow. */ + COSTS_N_INSNS (2), /* toint. */ + COSTS_N_INSNS (2), /* fromint. */ + COSTS_N_INSNS (2)/* roundint. */ +} + }, + /* Vector */ + { +COSTS_N_INSNS (1) /* alu. */ + } +}; + #endif diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9bfc99ab090..46fe6835506 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -464,6 +464,22 @@ static const struct cpu_addrcost_table qdf24xx_addrcost_table = 2, /* imm_offset */ }; +static const struct
Re: [PATCH v2 0/5] RISC-V big endian support
Seems like only 3 fail are related to big-endian, you don't need to worry about other fails. FAIL: gcc.c-torture/execute/string-opt-5.c FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi On Tue, Feb 23, 2021 at 10:38 AM Kito Cheng wrote: > > Hi Marcus: > > Thanks for the quick update, I am testing your V2 patch now, the result seems > really great now, some of fail case seems like not cause by > big-endian patch, I am reviewing and comparing the fail case with the > little-endian build. > > > Should I make a PR against riscv-newlib on GitHub, or would you prefer > > some other process for dealing with newlib fixes related to these > > patches? > > Could you send to newlib mailing list directly, ideally riscv-newlib > just a buffer > we don't want to hold any patch there as possible. > https://sourceware.org/mailman/listinfo/newlib/ > > > > > On Sun, Feb 21, 2021 at 8:17 AM Marcus Comstedt wrote: > > > > This is an update to the patch series for big endian RISC-V support. > > > > Changes since last version: > > > > * Added documentation of -mbig-endian and -mlittle-endian > > > > * New patch: Fix soft-fp endianness setting > > > > * New patch: Fix trampoline generation on big endian > > > > * New patch: Update the shift-shift-5.c testcase to work correctly > > on big endian > > > > With these changes, and two fixes to newlib (setting correct floating > > point byteorder, and an update to the handcoded assembler for strcmp), > > I'm now down to > > > >= Summary of gcc testsuite = > > | # of unexpected case / # of unique unexpected > > case > > | gcc | g++ | gfortran | > > rv64gc/ lp64/ medlow | 14 / 8 | 39 /10 | - | > > > > and of these only two failures do not also occur for little endian: > > > > FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi > > FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi > > > > I'm quite puzzled why these two testcases give different results with > > -mbig-endian compared to with -mlittle-endian though, since they only > > deal with register-to-register operations so the memory model should be > > completely irrelevant... > > > > > > // Marcus > > > > > >
Re: [PATCH v2 0/5] RISC-V big endian support
Hi Kito, Kito Cheng writes: > FAIL: gcc.c-torture/execute/string-opt-5.c > FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi > FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi string-opt-5.c is one of the newlib issues I mentioned (handcoded assembler for strcmp which assumed LE (it was intended to #error out on BE, but used "BYTE_ORDER" instead of "__BYTE_ORDER__", so the check never worked)). I'll send the fixes later today. The shift-and tests don't generate incorrect code or anything, but it's still puzzling why the generated code is different from with -mlittle-endian. // Marcus
Re: [gcc-12 PATCH] ira: Correct HONOR_REG_ALLOC_ORDER usage
On Mon, 22 Feb 2021, Uros Bizjak wrote: > The intention of HONOR_REG_ALLOC_ORDER is to ensure that IRA allocates > registers in the order given by REG_ALLOC_ORDER. However in > ira_better_spill_reload_regno_p, there is still a place where the > calculation depends on the presence of REG_ALLOC_ORDER, ignoring > HONOR_REG_ALLOC_ORDER macro altogether. The patch uses the correct macro > at this place. > > On the other hand, assign_hard_reg function respects HONOR_REG_ALLOC_ORDER, > but expects this macro to return 1 to avoid internal cost calculations. > As the macro is defined to 0 by default, it is expected that targets redefine > HONOR_REG_ALLOC_ORDER to return nonzero value, even if REG_ALLOC_ORDER > is defined. This approach is prone to errors, so the patch defines > HONOR_REG_ALLOC_ORDER to 1 by default if REG_ALLOC_ORDER is defined. > > 2021-02-22 Uroš Bizjak > > gcc/ > * defaults.h (HONOR_REG_ALLOC_ORDER): If not defined, > define to 1 if REG_ALLOC_ORDER is defined. > * doc/tm.texi.in (HONOR_REG_ALLOC_ORDER): > Describe new default definition. > * doc/tm.texi: Regenerate. > * ira-color.c (ira_better_spill_reload_regno_p): > Use HONOR_REG_ALLOC_ORDER instead of REG_ALLOC_ORDER > to determine better spill reload regno. > > Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > OK for gcc-12 when it opens? OK in case Vlad doesn't have a better suggestion or further comments. Do you have an idea for how many targets the changed default is an actual change? Thanks, Richard. > Uros. > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)