[PATCH, wwwdocs] Update GCC 8 release notes for NDS32 port
Hi Gerald,

In a previous patch we added new options for the NDS32 port:
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01397.html

So I think I need to update the GCC 8 release notes as well.

Index: htdocs/gcc-8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.30
diff -u -r1.30 changes.html
--- htdocs/gcc-8/changes.html	21 Jan 2018 20:01:31 -	1.30
+++ htdocs/gcc-8/changes.html	22 Jan 2018 06:28:56 -
@@ -348,7 +348,13 @@
-
+NDS32
+
+
+New command-line options -mext-perf -mext-perf2 -mext-string
+have been added for performance extension instructions.
+
+
 Nios II

Is this patch OK?

Best regards,
jasonwucj
[PATCH] Clean-up IPA profile dump output.
Hi.

I'm aware of which development stage we are in. However, the patch is small
and makes dump files readable. I hope such a patch can be accepted even now.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Martin

gcc/ChangeLog:

2018-01-22  Martin Liska

	* tree-profile.c (tree_profiling): Print function header to
	make the reader aware of which function we are working on.
	* value-prof.c (gimple_find_values_to_profile): Do not print
	uninteresting value histograms.
---
 gcc/tree-profile.c | 4 ++++
 gcc/value-prof.c   | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 9d919062db1..f96bd4b9704 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "tree-pretty-print.h"

 static GTY(()) tree gcov_type_node;
 static GTY(()) tree tree_interval_profiler_fn;
@@ -671,6 +672,9 @@ tree_profiling (void)

       push_cfun (DECL_STRUCT_FUNCTION (node->decl));

+      if (dump_file)
+	dump_function_header (dump_file, cfun->decl, dump_flags);
+
       /* Local pure-const may imply need to fixup the cfg.  */
       if (execute_fixup_cfg () & TODO_cleanup_cfg)
 	cleanup_tree_cfg ();
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index b503320f188..16cdbd64f46 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -2053,7 +2053,7 @@ gimple_find_values_to_profile (histogram_values *values)
 	default:
 	  gcc_unreachable ();
 	}
-      if (dump_file)
+      if (dump_file && hist->hvalue.stmt != NULL)
 	{
 	  fprintf (dump_file, "Stmt ");
 	  print_gimple_stmt (dump_file, hist->hvalue.stmt, 0, TDF_SLIM);
[PATCH] PR 83975 Set character length to 0 if unknown
When associating a variable of type character, if the length of the
target isn't known, set it to zero rather than leaving it unset. This
is not a complete fix for making associate of characters work
properly, but papers over an ICE in the middle-end. See PR 83344 for
more details.

Regtested on x86_64-pc-linux-gnu, OK for trunk?

gcc/fortran/ChangeLog:

2018-01-23  Janne Blomqvist

	PR 83975
	PR 83344
	* resolve.c (resolve_assoc_var): Set character length to zero
	if target length unknown.

gcc/testsuite/ChangeLog:

2018-01-23  Janne Blomqvist

	PR 83975
	PR 83344
	* gfortran.dg/associate_31.f90: New test.
---
 gcc/fortran/resolve.c                      | 13 ++++++++-----
 gcc/testsuite/gfortran.dg/associate_31.f90 |  8 ++++++++
 2 files changed, 16 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/associate_31.f90

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index a2b892a..f3fc3ca 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -8634,11 +8634,14 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
       if (!sym->ts.u.cl)
 	sym->ts.u.cl = target->ts.u.cl;

-      if (!sym->ts.u.cl->length && !sym->ts.deferred
-	  && target->expr_type == EXPR_CONSTANT)
-	sym->ts.u.cl->length
-	  = gfc_get_int_expr (gfc_charlen_int_kind,
-			      NULL, target->value.character.length);
+      if (!sym->ts.u.cl->length && !sym->ts.deferred)
+	/* TODO: Setting the length to zero if the expression isn't
+	   constant is probably not correct, but at this point we
+	   don't have any better idea what it should be either.  */
+	sym->ts.u.cl->length =
+	  gfc_get_int_expr (gfc_charlen_int_kind, NULL,
+			    target->expr_type == EXPR_CONSTANT ?
+			    target->value.character.length : 0);
     }

   /* If the target is a good class object, so is the associate variable.  */
diff --git a/gcc/testsuite/gfortran.dg/associate_31.f90 b/gcc/testsuite/gfortran.dg/associate_31.f90
new file mode 100644
index 000..75774c8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_31.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR 83975, see also PR 83344
+! Testcase contributed by Gerhard Steinmetz
+subroutine s(x)
+  character(*) :: x
+  associate (y => x)
+  end associate
+end subroutine s
-- 
2.7.4
Re: [PATCH 1/2] Introduce prefetch-minimum stride option
Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:

This patch adds a new option to control the minimum stride, for a memory
reference, beyond which the loop prefetch pass may issue software prefetch
hints. There are two motivations:

* Make the pass less aggressive, issuing prefetch hints only for bigger
strides that are more likely to benefit from prefetching. I've noticed a
case in cpu2017 where we were issuing thousands of hints, for example.

I've noticed a large amount of prefetch hints being issued as well, but had
not analysed it further.

* For processors that have a hardware prefetcher, like Falkor, it allows
the loop prefetch pass to defer prefetching of smaller (less than the
threshold) strides to the hardware prefetcher instead. This prevents
conflicts between the software prefetcher and the hardware prefetcher.

I've noticed a considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms
of prefetch behavior for Falkor.

Do you, by any chance, have a link to the LLVM review that implemented
that behavior? It's okay if you don't, but I think it would be useful
context.

The default settings should guarantee no changes for existing targets.
Those are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?

Are there any benchmark numbers you can share?

I think this approach is sensible. Since your patch touches generic code
as well as AArch64 code you'll need an approval from a midend maintainer
as well as an AArch64 maintainer. Also, GCC development is now in the
regression-fixing stage, so unless this fixes a regression it may have to
wait until GCC 9 development is opened.

Thanks,
Kyrill

2018-01-22  Luis Machado

Introduce option to limit software prefetching to known constant strides
above a specific threshold, with the goal of preventing conflicts with a
hardware prefetcher.
gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	<minimum_stride>: New const int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to
	include minimum_stride field.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_MINIMUM_STRIDE.
	* doc/invoke.texi (prefetch-minimum-stride): Document new option.
	* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
	* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false
	if stride is constant and is below the minimum stride threshold.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 13 ++++++++++++-
 gcc/doc/invoke.texi                 | 15 +++++++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 16 ++++++++++++++++
 6 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ef1b0bc..8736bd9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* The minimum constant stride beyond which we should use prefetch
+     hints for.  */
+  const int minimum_stride;
   const int default_opt_level;
 };
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 174310c..0ed9f14 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride  */
   -1			/* default_opt_level  */
 };

@@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride  */
   -1			/* default_opt_level  */
 };

@@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  2048,			/* minimum_stride  */
+  3			/* default_opt_level  */
 };

 static const cpu_prefetch_tune thunderxt88_prefetch
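To make the gating logic concrete, here is a stand-alone sketch of the check the ChangeLog describes being added to should_issue_prefetch_p. This is not the actual GCC code; the function name and parameters are hypothetical, and the real pass works on tree-level memory references rather than plain integers.

```cpp
#include <cstdlib>

// Hypothetical model of the new gate: with prefetch-minimum-stride set,
// a known constant stride below the threshold gets no software prefetch
// hint and is left to the hardware prefetcher instead.
bool
should_issue_prefetch_for_stride (bool known_constant_stride, long stride,
                                  long minimum_stride)
{
  // -1 (the default) disables the threshold, preserving the existing
  // behavior on targets that do not opt in.
  if (minimum_stride <= 0)
    return true;
  // Non-constant strides are the subject of the companion 2/2 patch.
  if (!known_constant_stride)
    return true;
  return std::labs (stride) >= minimum_stride;
}
```

With the Falkor-like setting of 2048, a constant stride of 64 bytes would be filtered out, while a 4096-byte stride would still receive a hint.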
Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:

The following patch adds an option to control software prefetching of
memory references with non-constant/unknown strides.

Currently we prefetch these references if the pass thinks there is a
benefit to doing so. But, since this is all based on heuristics, it's not
always the case that we end up with better performance. For Falkor there
is also the problem of conflicts with the hardware prefetcher, so we need
to be more conservative in terms of what we issue software prefetch hints
for. This also aligns GCC with what LLVM does for Falkor.

Similarly to the previous patch, the defaults guarantee no change in
behavior for other targets and architectures. I've regression-tested and
bootstrapped it on aarch64-linux. No problems found.

Ok?

This also looks like a sensible approach to me, with a caveat inline.
The same general comments as for patch [1/2] apply.

Thanks,
Kyrill

2018-01-22  Luis Machado

Introduce option to control whether the software prefetch pass should
issue prefetch hints for non-constant strides.

gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	<prefetch_dynamic_strides>: New const unsigned int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to
	include prefetch_dynamic_strides.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides
	to 0.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_DYNAMIC_STRIDES.
	* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
	* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
	* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
	prefetch-dynamic-strides setting.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 11 +++++++++++
 gcc/doc/invoke.texi                 | 10 ++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 10 ++++++++++
 6 files changed, 45 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8736bd9..22bd9ae 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* Whether software prefetch hints should be issued for non-constant
+     strides.  */
+  const unsigned int prefetch_dynamic_strides;

I understand that the midend PARAMs are defined as integers, but I think
the backend tuning option here is better represented as a boolean, as it
really is just a yes/no decision.

   /* The minimum constant stride beyond which we should use prefetch
      hints for.  */
   const int minimum_stride;

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0ed9f14..713b230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides  */
   -1,			/* minimum_stride  */
   -1			/* default_opt_level  */
 };

@@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides  */
   -1,			/* minimum_stride  */
   -1			/* default_opt_level  */
 };

@@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
+  0,			/* prefetch_dynamic_strides  */
   2048,			/* minimum_stride  */
   3			/* default_opt_level  */
 };

@@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides  */
   -1,			/* minimum_stride  */
   3			/* default_opt_level  */
 };

@@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_siz
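Taking the two patches together, the decision the pass ends up making per memory reference can be sketched as follows. This is a hypothetical model, not the actual tree-ssa-loop-prefetch.c code; the function and parameter names are made up for illustration.

```cpp
// Combined gating logic after patches 1/2 and 2/2:
// prefetch_dynamic_strides decides whether unknown strides get hints at
// all; minimum_stride filters out small known constant strides.
bool
issue_prefetch_hint_p (bool stride_known, long stride,
                       bool prefetch_dynamic_strides, long minimum_stride)
{
  if (!stride_known)
    return prefetch_dynamic_strides;
  if (minimum_stride > 0 && stride < minimum_stride)
    return false;   // Leave small constant strides to the HW prefetcher.
  return true;
}
```

With the Falkor-like tuning (dynamic strides off, minimum stride 2048), both unknown strides and small constant strides are left to the hardware prefetcher; with the generic tuning (dynamic strides on, no minimum), behavior is unchanged.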
Re: [PATCH] Clean-up IPA profile dump output.
> Hi.
>
> I'm aware in which development stage we are. However the patch is small and makes
> dump files readable. Hope such patch can be accepted even now?
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Martin
>
> gcc/ChangeLog:
>
> 2018-01-22  Martin Liska
>
> 	* tree-profile.c (tree_profiling): Print function header to
> 	aware reader which function we are working on.
> 	* value-prof.c (gimple_find_values_to_profile): Do not print
> 	not interesting value histograms.

OK. How do those non-interesting value histograms arise?

Honza

> ---
>  gcc/tree-profile.c | 4 ++++
>  gcc/value-prof.c   | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
>
> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
> index 9d919062db1..f96bd4b9704 100644
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> +#include "tree-pretty-print.h"
>
>  static GTY(()) tree gcov_type_node;
>  static GTY(()) tree tree_interval_profiler_fn;
> @@ -671,6 +672,9 @@ tree_profiling (void)
>
>        push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>
> +      if (dump_file)
> +	dump_function_header (dump_file, cfun->decl, dump_flags);
> +
>        /* Local pure-const may imply need to fixup the cfg.  */
>        if (execute_fixup_cfg () & TODO_cleanup_cfg)
>  	cleanup_tree_cfg ();
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index b503320f188..16cdbd64f46 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -2053,7 +2053,7 @@ gimple_find_values_to_profile (histogram_values *values)
>  	default:
>  	  gcc_unreachable ();
>  	}
> -      if (dump_file)
> +      if (dump_file && hist->hvalue.stmt != NULL)
>  	{
>  	  fprintf (dump_file, "Stmt ");
>  	  print_gimple_stmt (dump_file, hist->hvalue.stmt, 0, TDF_SLIM);
>
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
.. testing completed OK.

Just in case it wasn't obvious from my previous messages: when we are
calling check_for_uninitialized_const_var from
potential_constant_expression_1, we can't use CP_TYPE_CONST_P as a gate
for emitting the diagnostic - as just discovered - nor can we use
var_in_constexpr_fn, which would cause many regressions for variables in
constexpr functions; I remember cpp1y/pr63996.C, for example.
Conservatively, !DECL_NONTRIVIALLY_INITIALIZED_P supplemented by
default_init_uninitialized_part appears to work fine.

Thanks, Paolo.
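For context, here is a minimal sketch of the kind of code PR 83921 is about, reconstructed from the PR title rather than taken from the actual testsuite: a constexpr function with a variable of an empty aggregate type, which GCC 7/8 wrongly diagnosed as uninitialized.

```cpp
// Hypothetical reduction of the regression: the empty aggregate has no
// data members, so there is nothing left uninitialized, and the function
// is a valid constant expression.
struct empty { };

constexpr int
use_empty ()
{
  empty e;   // wrongly rejected by the unfixed compiler
  return 42;
}

static_assert (use_empty () == 42, "empty aggregate in a constexpr fn");
```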
Re: [PATCH 4/5] Remove predictors that are unreliable.
> On 01/19/2018 12:57 PM, Martin Liška wrote:
> > Yes, there's a huge difference in between CPU 2006 and 2017. Former has
> > 63% w/ dominant edges, and later one only 11%. It's caused by these 2
> > benchmarks with a high coverage:
>
> Hi.
>
> I'm sending details about the 2 edges that influence the statistics
> significantly:
>
> 500.perlbench_r: regexec.c.065i.profile:
> negative return heuristics of edge 1368->1370: 2.0% exec 2477714850 hit
> 2429863555 (98.1%)
>
> 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp)
>          -: 3485:{
> 2477714850: 3486:    CHECKPOINT lastcp;
> 2477714850: 3487:    REGEXP *const rx = reginfo->prog;
> 2477714850: 3488:    regexp *const prog = ReANY(rx);
> 2477714850: 3489:    SSize_t result;
> [snip]
> 2477714850: 8046:    assert(!result || locinput - reginfo->strbeg >= 0);
> 2477714850: 8047:    return result ? locinput - reginfo->strbeg : -1;
>          -: 8048:}
>
> As seen, it returns -1 if a regex is not found, which in the case of
> perlbench is a very likely branch.
>
> and 523.xalancbmk_r:
> build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile:
> negative return heuristics of edge 3->4: 2.0% exec 1221735072 hit
> 1221522453 (100.0%)
>
> 1221735072:   74:int NameDatatypeValidator::compare(const XMLCh* const lValue
>          -:   75:    , const XMLCh* const rValue
>          -:   76:    , MemoryManager* const)
>          -:   77:{
> 1221735072:   78:    return ( XMLString::equals(lValue, rValue)? 0 : -1);
>          -:   79:}
>          -:   80:
>
> IP profile dump file:
>
> xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator *
> const this, const XMLCh * const lValue, const XMLCh * const rValue,
> struct MemoryManager * const D.17157)
> {
>   bool _1;
>   int iftmp.0_2;
>
>   <bb 2> [count: 1221735072]:
>   # DEBUG BEGIN_STMT
>   _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D));
>   if (_1 != 0)
>     goto <bb 4>; [0.02%]
>   else
>     goto <bb 3>; [99.98%]
>
>   <bb 3> [count: 1221522453]:
>
>   <bb 4> [count: 1221735072]:
>   # iftmp.0_2 = PHI <0(2), -1(3)>
>   return iftmp.0_2;
>
> }
>
> Likewise, XML values are more commonly non-equal.
>
> Ok, so may I also mark negative return with PROB_EVEN to track it?

Well, if we start disabling predictors just because we can find a benchmark
where they do not perform as expected, we will need to disable them all. It
is in the nature of heuristics to make mistakes; they just should do
something useful for common code. It is not the goal to make heuristics
that make no mistakes. A heuristic is useful either if it has high
precision even if it covers few branches (say, noreturn), or if it has
large coverage and still some hitrate (say, the opcode, call or return
based heuristics).

To make things a bit more esoteric, in practice this is also weighted by
how much damage a bad prediction does. For example, the call heuristic,
which predicts the non-call path to be taken, is likely going to do little
damage even if the call path is common: that path is heavy and hard to
optimize by the presence of a call, and thus we don't lose that much in
common scenarios.

In the second case:

00073 // ---
00074 // Compare methods
00075 // ---
00076 int NameDatatypeValidator::compare(const XMLCh* const lValue
00077     , const XMLCh* const rValue
00078     , MemoryManager* const)
00079 {
00080     return ( XMLString::equals(lValue, rValue)? 0 : -1);
00081 }

I would say it is an odd coding style. I do not see why it is not declared
bool and does not return true/false.

The first case seems a bit non-standard too, as it looks like a usual
cache lookup where the cache frequently fails. I would thus be in favor of
keeping the prediction, with its non-50% hitrate, unless we run into some
performance problems. If there are only two benchmarks out of the, say, 20
we tried (in spec2000 and 2k17), it seems fine to me.

There is an issue with the way we collect our data. Using an arithmetic
mean across SPEC is optimizing for the overall runtime of all SPEC
benchmarks combined together. We rather want to look for safer values that
do well for average benchmarks, independently of how many branches they
execute. We could consider making the statistics script also collect
geometric averages across the different benchmarks. If we see a big
difference between the arithmetic mean and the geomean, we will know there
is a small subset of benchmarks which dominates, and we can decide what to
do with the probability.

Honza

> Thanks,
> Martin
>
> > Ideas what to do with the predictor for GCC 8 release?
> > Martin
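The arithmetic-versus-geometric comparison Honza suggests for the statistics script can be illustrated with a small stand-alone sketch (not GCC code): the geometric mean of per-benchmark hitrates treats every benchmark equally, so a large gap between the two means signals that a small subset of branch-heavy benchmarks dominates the arithmetic figure.

```cpp
#include <cmath>
#include <vector>

// Plain average of per-benchmark hitrates.
double
arith_mean (const std::vector<double> &xs)
{
  double sum = 0;
  for (double x : xs)
    sum += x;
  return sum / xs.size ();
}

// Geometric mean via logarithms; one very poor benchmark pulls this
// down much harder than it pulls down the arithmetic mean.
double
geo_mean (const std::vector<double> &xs)
{
  double log_sum = 0;
  for (double x : xs)
    log_sum += std::log (x);
  return std::exp (log_sum / xs.size ());
}
```

For hitrates {0.98, 0.98, 0.02} the arithmetic mean is about 0.66 while the geometric mean is about 0.27, flagging the outlier benchmark.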
Fix quality tracking for named probabilities
Hi,

this is the first of two patches which aim to fix function partitioning
issues where we put basic blocks into cold sections while we should not.

This patch fixes issues where we think we have precise profile information
but we don't, because we scaled it by one of the named probability values.
It is in the nature of these that we use them in contexts where we don't
know the precise outcome, so I changed them all to guessed. This prevents
us from making wrong function splitting decisions.

Bootstrapped/regtested x86_64-linux, committed.

Honza

	* profile-count.h (profile_probability::very_unlikely,
	profile_probability::unlikely, profile_probability::even): Set
	precision to guessed.

Index: profile-count.h
===
--- profile-count.h	(revision 256973)
+++ profile-count.h	(working copy)
@@ -164,7 +164,7 @@ public:
     {
       /* Be consistent with PROB_VERY_UNLIKELY in predict.h.  */
       profile_probability r
-	= profile_probability::always ().apply_scale (1, 2000);
+	= profile_probability::guessed_always ().apply_scale (1, 2000);
       r.m_val--;
       return r;
     }
@@ -172,13 +172,13 @@ public:
     {
       /* Be consistent with PROB_VERY_LIKELY in predict.h.  */
       profile_probability r
-	= profile_probability::always ().apply_scale (1, 5);
+	= profile_probability::guessed_always ().apply_scale (1, 5);
       r.m_val--;
       return r;
     }
   static profile_probability even ()
     {
-      return profile_probability::always ().apply_scale (1, 2);
+      return profile_probability::guessed_always ().apply_scale (1, 2);
     }
   static profile_probability very_likely ()
     {
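The idea behind the change can be shown with a toy model; this is not the real profile-count.h API, just a sketch of how a probability value carrying a quality tag propagates that tag through scaling, so that a value derived from a named (guessed) probability can never masquerade as precise.

```cpp
// Toy model: a probability is a scaled value plus a quality marker.
enum quality { GUESSED, PRECISE };

struct prob
{
  unsigned int val;   // scaled so that max_val means "always"
  quality q;

  static constexpr unsigned int max_val = 1000000;

  static prob always () { return { max_val, PRECISE }; }
  static prob guessed_always () { return { max_val, GUESSED }; }

  // Scaling preserves the quality of the value being scaled, so anything
  // built from guessed_always () stays guessed.
  prob apply_scale (unsigned int num, unsigned int den) const
  {
    return { (unsigned int) ((unsigned long long) val * num / den), q };
  }
};
```

A consumer that only trusts precise data (like the partitioning fix in the follow-up patch) can then ignore values whose quality is GUESSED.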
Make unlikely bb discovery more robust
Hi,

this is the second part of the fix for wrong partitioning decisions. It
makes probably_never_executed ignore scaled profiles, because they may be
low for invalid reasons. This does not seem to affect the size of the cold
section significantly (as it is predominantly occupied by blocks with a 0
count, which remains precise).

Bootstrapped/regtested x86_64-linux, committed.

Honza

	* predict.c (probably_never_executed): Only use precise profile
	info.
	(compute_function_frequency): Skip after-inlining hack since we now
	have quality checking.

Index: predict.c
===
--- predict.c	(revision 256973)
+++ predict.c	(working copy)
@@ -212,7 +212,12 @@ probably_never_executed (struct function
   gcc_checking_assert (fun);
   if (count == profile_count::zero ())
     return true;
-  if (count.initialized_p () && profile_status_for_fn (fun) == PROFILE_READ)
+  /* Do not trust adjusted counts.  This will make us drop into the cold
+     section code with a low execution count as a result of inlining.
+     These low counts are not safe even with a read profile and may lead
+     us to dropping code which actually gets executed into the cold
+     section of the binary, which is not desirable.  */
+  if (count.precise_p () && profile_status_for_fn (fun) == PROFILE_READ)
     {
       int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
       if (count.apply_scale (unlikely_count_fraction, 1) >= profile_info->runs)
@@ -3759,15 +3764,10 @@ compute_function_frequency (void)
       return;
     }

-  /* Only first time try to drop function into unlikely executed.
-     After inlining the roundoff errors may confuse us.
-     Ipa-profile pass will drop functions only called from unlikely
-     functions to unlikely and that is most of what we care about.  */
-  if (!cfun->after_inlining)
-    {
-      node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
-      warn_function_cold (current_function_decl);
-    }
+  node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
+  warn_function_cold (current_function_decl);
+  if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa() == profile_count::zero ())
+    return;
   FOR_EACH_BB_FN (bb, cfun)
     {
       if (maybe_hot_bb_p (cfun, bb))
Re: [PACH][RFC] Pass -m -jN to gcc_build from gcc_release
On Mon, 22 Jan 2018, Jeff Law wrote:

> On 01/22/2018 01:57 AM, Jakub Jelinek wrote:
> > On Mon, Jan 22, 2018 at 09:37:05AM +0100, Richard Biener wrote:
> >>
> >> Currently gcc_release builds GCC (for generating in-tree generated
> >> files) serially - that's prohibitively expensive and takes ages (>4h
> >> for me). I'm using (when I remember to edit gcc_release ...) -j6
> >> without a problem for some years, thus the following proposal.
> >>
> >> Any recommendation for the default N? 4 might work just fine as well
> >> and no release manager should do without at least 4 cores...
> >
> > Perhaps
> >
> > num_cpus=1
> > if type -p getconf 2>/dev/null; then
> >   num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null`
> >   case "$num_cpus" in
> >   '' | 0* | *[!0-9]*) num_cpus=1;;
> >   esac
> > fi
> >
> > and "-j$num_cpus" ?
>
> Seems reasonable to me -- I believe we do something similar in our
> various internal scripts ({redhat,fedora}-rpm-config).
>
> jeff

Ok, I'll commit the following then if it all goes well when doing the
final GCC 7.3 release.

Richard.

Index: maintainer-scripts/gcc_release
===
--- maintainer-scripts/gcc_release	(revision 256973)
+++ maintainer-scripts/gcc_release	(working copy)
@@ -209,8 +209,16 @@
 EOF
     # on at least one platform.
     inform "Building compiler"
     OBJECT_DIRECTORY=../objdir
+    num_cpus=1
+    if type -p getconf 2>/dev/null; then
+      num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null`
+      case "$num_cpus" in
+      '' | 0* | *[!0-9]*) num_cpus=1;;
+      esac
+    fi
     contrib/gcc_build -d ${SOURCE_DIRECTORY} -o ${OBJECT_DIRECTORY} \
-		      -c "--enable-generated-files-in-srcdir --disable-multilib" build || \
+		      -c "--enable-generated-files-in-srcdir --disable-multilib" \
+		      -m "-j$num_cpus" build || \
       error "Could not rebuild GCC"
  fi
Re: Disable some patterns for fold-left reductions (PR 83965)
On Mon, Jan 22, 2018 at 6:53 PM, Richard Sandiford wrote:
> In this PR we recognised a PLUS_EXPR as a fold-left reduction,
> then applied pattern matching to convert it to a WIDEN_SUM_EXPR.
> We need to keep the original code in this case since we implement
> the reduction using scalar rather than vector operations.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-01-22  Richard Sandiford
>
> gcc/
> 	PR tree-optimization/83965
> 	* tree-vect-patterns.c (vect_reassociating_reduction_p): New
> 	function.
> 	(vect_recog_dot_prod_pattern, vect_recog_sad_pattern): Use it
> 	instead of checking only for a reduction.
> 	(vect_recog_widen_sum_pattern): Likewise.
>
> gcc/testsuite/
> 	PR tree-optimization/83965
> 	* gcc.dg/vect/pr83965.c: New test.
>
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c	2018-01-13 18:01:51.240735098 +
> +++ gcc/tree-vect-patterns.c	2018-01-22 17:51:06.751462689 +
> @@ -217,6 +217,16 @@ vect_recog_temp_ssa_var (tree type, gimp
>    return make_temp_ssa_name (type, stmt, "patt");
>  }
>
> +/* Return true if STMT_VINFO describes a reduction for which reassociation
> +   is allowed.  */
> +
> +static bool
> +vect_reassociating_reduction_p (stmt_vec_info stmt_vinfo)
> +{
> +  return (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
> +	  && STMT_VINFO_REDUC_TYPE (stmt_vinfo) != FOLD_LEFT_REDUCTION);
> +}
> +
>  /* Function vect_recog_dot_prod_pattern
>
>     Try to find the following pattern:
> @@ -336,7 +346,7 @@ vect_recog_dot_prod_pattern (vec
>  {
>    gimple *def_stmt;
>
> -  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def
> +  if (!vect_reassociating_reduction_p (stmt_vinfo)
>        && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo))
>      return NULL;
>    oprnd0 = gimple_assign_rhs1 (last_stmt);
> @@ -557,7 +567,7 @@ vect_recog_sad_pattern (vec *s
>  {
>    gimple *def_stmt;
>
> -  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def
> +  if (!vect_reassociating_reduction_p (stmt_vinfo)
>        && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo))
>      return NULL;
>    plus_oprnd0 = gimple_assign_rhs1 (last_stmt);
> @@ -1181,7 +1191,7 @@ vect_recog_widen_sum_pattern (vec
>    if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
>      return NULL;
>
> -  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def
> +  if (!vect_reassociating_reduction_p (stmt_vinfo)
>        && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo))
>      return NULL;
>
> Index: gcc/testsuite/gcc.dg/vect/pr83965.c
> ===
> --- /dev/null	2018-01-22 13:31:44.608695204 +
> +++ gcc/testsuite/gcc.dg/vect/pr83965.c	2018-01-22 17:51:06.751462689 +
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-ftrapv" } */
> +
> +int
> +mac (const short *a, const short *b, int sqr, int *sum)
> +{
> +  int i;
> +  int dotp = *sum;
> +
> +  for (i = 0; i < 150; i++)
> +    {
> +      dotp += b[i] * a[i];
> +      sqr += b[i] * b[i];
> +    }
> +
> +  *sum = dotp;
> +  return sqr;
> +}
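The reason fold-left reductions must be kept out of the reassociating patterns is that floating-point addition is not associative, and a fold-left (in-order) reduction promises to preserve the source evaluation order. Here is a small self-contained demonstration (not GCC code); the volatile stores force each intermediate result to be rounded to float.

```cpp
// Returns true because the two association orders of the same three
// float addends produce different results: (1e8 + -1e8) + 1 == 1,
// while 1e8 + (-1e8 + 1) == 0, since -1e8f + 1.0f rounds back to
// -1e8f (the spacing between floats near 1e8 is 8).
bool
association_order_matters ()
{
  volatile float big = 1e8f, neg = -1e8f, one = 1.0f;
  volatile float t1 = big + neg;    // exactly 0.0f
  volatile float left = t1 + one;   // 1.0f
  volatile float t2 = neg + one;    // the 1.0f is lost in rounding
  volatile float right = big + t2;  // 0.0f
  return left != right;
}
```

This is exactly why converting an in-order PLUS_EXPR reduction into a WIDEN_SUM_EXPR (which is free to reassociate) could silently change program results.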
Re: Fix vect_float markup for a couple of tests (PR 83888)
On Mon, Jan 22, 2018 at 6:55 PM, Richard Sandiford wrote:
> vect_float is true for arm*-*-* targets, but the support is only
> available when -funsafe-math-optimizations is on. This caused
> failures in two tests that disable fast-math.
>
> The easiest fix seemed to be to add a new target selector for
> "vect_float without special options".
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok - so arm float vectors are not IEEE compliant?

Richard.

> Richard
>
>
> 2018-01-22  Richard Sandiford
>
> gcc/
> 	PR testsuite/83888
> 	* doc/sourcebuild.texi (vect_float): Say that the selector
> 	only describes the situation when -funsafe-math-optimizations
> 	is on.
> 	(vect_float_strict): Document.
>
> gcc/testsuite/
> 	PR testsuite/83888
> 	* lib/target-supports.exp (check_effective_target_vect_float): Say
> 	that the result only holds when -funsafe-math-optimizations is on.
> 	(check_effective_target_vect_float_strict): New procedure.
> 	* gcc.dg/vect/no-fast-math-vect16.c: Use vect_float_strict instead
> 	of vect_float.
> 	* gcc.dg/vect/vect-reduc-6.c: Likewise.
>
> Index: gcc/doc/sourcebuild.texi
> ===
> --- gcc/doc/sourcebuild.texi	2018-01-22 17:51:03.860579049 +
> +++ gcc/doc/sourcebuild.texi	2018-01-22 17:54:02.172848564 +
> @@ -1403,7 +1403,13 @@ The target's preferred vector alignment
>  alignment.
>
>  @item vect_float
> -Target supports hardware vectors of @code{float}.
> +Target supports hardware vectors of @code{float} when
> +@option{-funsafe-math-optimizations} is in effect.
> +
> +@item vect_float_strict
> +Target supports hardware vectors of @code{float} when
> +@option{-funsafe-math-optimizations} is not in effect.
> +This implies @code{vect_float}.
>
>  @item vect_int
>  Target supports hardware vectors of @code{int}.
> Index: gcc/testsuite/lib/target-supports.exp
> ===
> --- gcc/testsuite/lib/target-supports.exp	2018-01-22 17:51:03.817580787 +
> +++ gcc/testsuite/lib/target-supports.exp	2018-01-22 17:54:02.173848531 +
> @@ -5492,7 +5492,8 @@ proc check_effective_target_vect_long {
>      return $answer
>  }
>
> -# Return 1 if the target supports hardware vectors of float, 0 otherwise.
> +# Return 1 if the target supports hardware vectors of float when
> +# -funsafe-math-optimizations is enabled, 0 otherwise.
>  #
>  # This won't change for different subtargets so cache the result.
>
> @@ -5525,6 +5526,14 @@ proc check_effective_target_vect_float {
>      return $et_vect_float_saved($et_index)
>  }
>
> +# Return 1 if the target supports hardware vectors of float without
> +# -funsafe-math-optimizations being enabled, 0 otherwise.
> +
> +proc check_effective_target_vect_float_strict { } {
> +    return [expr { [check_effective_target_vect_float]
> +		   && ![istarget arm*-*-*] }]
> +}
> +
>  # Return 1 if the target supports hardware vectors of double, 0 otherwise.
>  #
>  # This won't change for different subtargets so cache the result.
> Index: gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c
> ===
> --- gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c	2018-01-13 18:01:15.293116922 +
> +++ gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c	2018-01-22 17:54:02.172848564 +
> @@ -1,4 +1,4 @@
> -/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_float_strict } */
>
>  #include
>  #include "tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/vect-reduc-6.c
> ===
> --- gcc/testsuite/gcc.dg/vect/vect-reduc-6.c	2018-01-13 18:01:15.294116882 +
> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-6.c	2018-01-22 17:54:02.172848564 +
> @@ -1,4 +1,4 @@
> -/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_float_strict } */
>  /* { dg-additional-options "-fno-fast-math" } */
>
>  #include
Re: [PATCH] Fix memory leaks in sbitmap.c selftests
On Mon, Jan 22, 2018 at 8:51 PM, David Malcolm wrote: > "make selftest-valgrind" shows a few leaks in sbitmap.c's selftests; > this patch fixes them. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > OK for trunk? Ok. > gcc/ChangeLog: > * sbitmap.c (selftest::test_set_range): Fix memory leaks. > (selftest::test_bit_in_range): Likewise. > --- > gcc/sbitmap.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c > index cf46cb2..967868a 100644 > --- a/gcc/sbitmap.c > +++ b/gcc/sbitmap.c > @@ -897,6 +897,7 @@ test_set_range () >bitmap_set_range (s, 15, 1); >ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 1, 14)); >ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 15, 15)); > + sbitmap_free (s); > >s = sbitmap_alloc (1024); >bitmap_clear (s); > @@ -914,6 +915,7 @@ test_set_range () >ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 512 + 64, 1023)); >ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512, 512)); >ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512 + 63, 512 + 63)); > + sbitmap_free (s); > } > > /* Verify bitmap_bit_in_range_p functions for sbitmap. */ > @@ -935,6 +937,8 @@ test_bit_in_range () >ASSERT_TRUE (bitmap_bit_in_range_p (s, 100, 100)); >ASSERT_TRUE (bitmap_bit_p (s, 100)); > > + sbitmap_free (s); > + >s = sbitmap_alloc (64); >bitmap_clear (s); >bitmap_set_bit (s, 63); > @@ -942,6 +946,7 @@ test_bit_in_range () >ASSERT_TRUE (bitmap_bit_in_range_p (s, 1, 63)); >ASSERT_TRUE (bitmap_bit_in_range_p (s, 63, 63)); >ASSERT_TRUE (bitmap_bit_p (s, 63)); > + sbitmap_free (s); > >s = sbitmap_alloc (1024); >bitmap_clear (s); > @@ -985,6 +990,7 @@ test_bit_in_range () >ASSERT_FALSE (bitmap_bit_in_range_p (s, 17, 31)); >ASSERT_FALSE (bitmap_bit_in_range_p (s, 49, 63)); >ASSERT_FALSE (bitmap_bit_in_range_p (s, 65, 1023)); > + sbitmap_free (s); > } > > /* Run all of the selftests within this file. */ > -- > 1.8.5.3 >
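The leak pattern being fixed is the classic one: the variable `s` is reassigned to a fresh allocation without freeing the previous one, so "make selftest-valgrind" reports the old buffer as lost. A reduced sketch of the same shape, with plain calloc/free standing in for GCC's sbitmap_alloc/sbitmap_free:

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced shape of the selftest leak: before the patch, `s` was
   reassigned without freeing the previous allocation.  The fix pairs
   every allocation with a free, as sketched here.  */
static int
run_checks (void)
{
  unsigned char *s = calloc (16, 1);
  if (s == NULL)
    return 1;
  /* ... assertions on the 16-element bitmap would go here ... */
  free (s);                  /* the patch adds exactly this kind of call */

  s = calloc (1024, 1);      /* previously leaked the old buffer */
  if (s == NULL)
    return 1;
  /* ... assertions on the 1024-element bitmap ... */
  free (s);
  return 0;
}
```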
Re: [PATCH] Add more test coverage to selftest::test_location_wrappers
On Mon, Jan 22, 2018 at 8:52 PM, David Malcolm wrote: > This patch adds a few extra assertions to selftest::test_location_wrappers. > > Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. > OK for trunk? Ok. > gcc/ChangeLog: > * tree.c (selftest::test_location_wrappers): Add more test > coverage. > --- > gcc/tree.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/gcc/tree.c b/gcc/tree.c > index b3e93b8..c5baf08 100644 > --- a/gcc/tree.c > +++ b/gcc/tree.c > @@ -14490,6 +14490,8 @@ test_location_wrappers () > { >location_t loc = BUILTINS_LOCATION; > > + ASSERT_EQ (NULL_TREE, maybe_wrap_with_location (NULL_TREE, loc)); > + >/* Wrapping a constant. */ >tree int_cst = build_int_cst (integer_type_node, 42); >ASSERT_FALSE (CAN_HAVE_LOCATION_P (int_cst)); > @@ -14500,6 +14502,14 @@ test_location_wrappers () >ASSERT_EQ (loc, EXPR_LOCATION (wrapped_int_cst)); >ASSERT_EQ (int_cst, tree_strip_any_location_wrapper (wrapped_int_cst)); > > + /* We shouldn't add wrapper nodes for UNKNOWN_LOCATION. */ > + ASSERT_EQ (int_cst, maybe_wrap_with_location (int_cst, UNKNOWN_LOCATION)); > + > + /* We shouldn't add wrapper nodes for nodes that CAN_HAVE_LOCATION_P. */ > + tree cast = build1 (NOP_EXPR, char_type_node, int_cst); > + ASSERT_TRUE (CAN_HAVE_LOCATION_P (cast)); > + ASSERT_EQ (cast, maybe_wrap_with_location (cast, loc)); > + >/* Wrapping a STRING_CST. */ >tree string_cst = build_string (4, "foo"); >ASSERT_FALSE (CAN_HAVE_LOCATION_P (string_cst)); > -- > 1.8.5.3 >
Re: [PATCH] v2 -Warray-bounds: Fix false positive in some "switch" stmts (PR tree-optimization/83510)
On Mon, Jan 22, 2018 at 9:05 PM, David Malcolm wrote: > On Fri, 2018-01-19 at 09:45 +0100, Richard Biener wrote: >> On Fri, Jan 19, 2018 at 12:36 AM, David Malcolm >> wrote: >> > PR tree-optimization/83510 reports that r255649 (for >> > PR tree-optimization/83312) introduced a false positive for >> > -Warray-bounds for array accesses within certain switch statements: >> > those for which value-ranges allow more than one case to be >> > reachable, >> > but for which one or more of the VR-unreachable cases contain >> > out-of-range array accesses. >> > >> > In the reproducer, after the switch in f is inlined into g, we have >> > 3 cases >> > for the switch (case 9, case 10-19, and default), within a loop >> > that >> > ranges from 0..9. >> > >> > With both the old and new code, >> > vr_values::simplify_switch_using_ranges clears >> > the EDGE_EXECUTABLE flag on the edge to the "case 10-19" >> > block. This >> > happens during the dom walk within the substitute_and_fold_engine. >> > >> > With the old code, the clearing of that EDGE_EXECUTABLE flag led to >> > the >> > /* Skip blocks that were found to be unreachable. */ >> > code in the old implementation of vrp_prop::check_all_array_refs >> > skipping >> > the "case 10-19" block. >> > >> > With the new code, we have a second dom walk, and that dom_walker's >> > ctor >> > sets all edges to be EDGE_EXECUTABLE, losing that information. >> > >> > Then, dom_walker::before_dom_children (here, the subclass' >> > check_array_bounds_dom_walker::before_dom_children) can return one >> > edge, if >> > there's a unique successor edge, and dom_walker::walk filters the >> > dom walk >> > to just that edge. >> > >> > Here we have two VR-valid edges (case 9 and default), and an VR- >> > invalid >> > successor edge (case 10-19). There's no *unique* valid successor >> > edge, >> > and hence taken_edge is NULL, and the filtering in dom_walker::walk >> > doesn't fire. 
>> > >> > Hence we've lost the filtering of the "case 10-19" BB, hence the >> > false >> > positive. >> > >> > The issue is that we have two dom walks: first within vr_values' >> > substitute_and_fold_dom_walker (which has skip_unreachable_blocks >> > == false), >> > then another within vrp_prop::check_all_array_refs (with >> > skip_unreachable_blocks == true). >> > >> > Each has different "knowledge" about ruling out edges due to value- >> > ranges, >> > but we aren't combining that information. The former "knows" about >> > out-edges at a particular control construct (e.g. at a switch), the >> > latter >> > "knows" about dominance, but only about unique successors (hence >> > the >> > problem when two out of three switch cases are valid). >> > >> > This patch combines the information by preserving the >> > EDGE_EXECUTABLE >> > flags from the first dom walk, and using it in the second dom walk, >> > potentially rejecting additional edges. >> > >> > Doing so fixes the false positive. >> > >> > I attempted an alternative fix, merging the two dom walks into one, >> > but >> > that led to crashes in identify_jump_threads, so I went with this, >> > as >> > a less invasive fix. >> > >> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. >> > OK for trunk? >> >> Ok, but I think you need to update the domwalk construction in >> graphite-scop-detection.c as well - did you test w/o graphite? > > Thanks. I did test with graphite enabled; the use of default args meant I > didn't have to touch that code. > > That said, I've become unhappy with the two bool params (both defaulting > to false) for dom_walker's ctor, especially given that they're > really a tristate. > > So here's an alternative version of the patch, which, rather than adding > a new bool, instead introduces a 3-valued enum to explicitly capture the valid > possibilities. Also, having it as an enum rather than booleans makes the > meaning clearer, and makes the places that override the default more obvious. 
Ah, much nicer indeed. > (This version of the patch *did* require touching that graphite file). > >> grep might be your friend... > > (and indeed, with an enum, it's even more grep-friendly) > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu (with graphite). > OK for trunk? > (or did you prefer the earlier patch?) Ok. Thanks, Richard. >> Thanks, >> Richard. > > [...snip...] > > > gcc/ChangeLog: > PR tree-optimization/83510 > * domwalk.c (set_all_edges_as_executable): New function. > (dom_walker::dom_walker): Convert bool param > "skip_unreachable_blocks" to enum reachability. Move setup of > edge flags to set_all_edges_as_executable and only do it when > reachability is REACHABLE_BLOCKS. > * domwalk.h (enum dom_walker::reachability): New enum. > (dom_walker::dom_walker): Convert bool param > "skip_unreachable_blocks" to enum reachability. > (set_all_edges_as_executable): New decl. > * graphite-scop-detection.c (gather_bbs::gather_bbs): Conv
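The two-bools-to-tristate refactoring discussed above can be illustrated standalone. REACHABLE_BLOCKS is the enumerator named in the ChangeLog; the other two enumerator names and the config struct here are assumptions for the sketch, not GCC's actual declarations:

```c
#include <assert.h>

/* Illustrative tristate replacing the ctor's two bool params: the two
   booleans only ever encode three valid states, and an enum makes each
   caller's intent explicit and grep-able.  */
enum reachability
{
  ALL_BLOCKS,                        /* walk every block */
  REACHABLE_BLOCKS,                  /* set EDGE_EXECUTABLE, then prune */
  REACHABLE_BLOCKS_PRESERVING_FLAGS  /* trust flags left by an earlier walk */
};

/* Hypothetical walker configuration derived from the enum, mirroring
   what the two old bools expressed.  */
struct walker_config
{
  int skip_unreachable_blocks;
  int reset_edge_flags;
};

static struct walker_config
configure (enum reachability r)
{
  struct walker_config c;
  c.skip_unreachable_blocks = (r != ALL_BLOCKS);
  c.reset_edge_flags = (r == REACHABLE_BLOCKS);
  return c;
}
```

The third state is the one the fix needs: skip unreachable blocks, but keep the EDGE_EXECUTABLE flags the earlier VRP dom walk already cleared instead of resetting them.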
[PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.
Hi All, This patch makes dg-cmp-results.sh reject the use of log files in the comparison. Often when given a log file dg-cmp-results will give incomplete/wrong output and using log instead of sum is one autocomplete tab away. Instead have the tool guard against such mistakes. Ok for trunk? Thanks, Tamar contrib/ 2018-01-23 Tamar Christina * dg-cmp-results.sh: Reject log files from comparison. -- diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh index 5f2fed5ec3ff0c66d22bc07c84571568730fbcac..93461efbeff7c7d3b6051ab978cf97122dd5079d 100755 --- a/contrib/dg-cmp-results.sh +++ b/contrib/dg-cmp-results.sh @@ -61,8 +61,20 @@ esac VARIANT="$1" OFILE="$2" OBASE=`basename "$2"` +OEXT="${OBASE##*.}" NFILE="$3" NBASE=`basename "$3"` +NEXT="${NBASE##*.}" + +if test "$OEXT" = "log"; then +echo " must be a sum file instead of log file." +exit 1 +fi + +if test "$NEXT" = "log"; then +echo " must be a sum file instead of log file." +exit 1 +fi echo "dg-cmp-results.sh: Verbosity is ${verbose}, Variant is \"${VARIANT}\"" echo @@ -82,11 +94,11 @@ fi unset temp # Copy out the old file's section 0. -echo "Older log file: $OFILE" +echo "Older summary file: $OFILE" sed $E -e '/^[[:space:]]+===/,$d' $OFILE # Copy out the new file's section 0. -echo "Newer log file: $NFILE" +echo "Newer summary file: $NFILE" sed $E -e '/^[[:space:]]+===/,$d' $NFILE # Create a temporary file from the old file's interesting section.
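The guard the patch adds with `${OBASE##*.}` is a final-extension check. Mirrored in C purely for illustration (the real check stays in the shell script):

```c
#include <assert.h>
#include <string.h>

/* Return 1 if FILENAME's final extension is "log" -- the condition
   dg-cmp-results.sh now rejects, since .log files produce
   incomplete/wrong comparisons and only .sum files are valid input.  */
static int
has_log_extension (const char *filename)
{
  const char *dot = strrchr (filename, '.');
  return dot != NULL && strcmp (dot, ".log") == 0;
}
```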
Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure
On Tue, Jan 23, 2018 at 8:32 AM, Uros Bizjak wrote: > On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor wrote: >> On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak wrote: >>> >>> The default "go build" compile options over-optimize the auxiliary >>> executable, built in TestCrashDumpsAllThreads testcase >>> (libgo/go/runtime/crash_unix_test.go). This over-optimization results >>> in removal of the trivial summing loop and in the inlining of the >>> main.loop function into main.$thunk0. >>> >>> The testcase expects backtrace that shows several instances of >>> main.loop in the backtrace dump. When main.loop gets inlined into >>> thunk, its name is not dumped anymore. >>> >>> The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining. >>> >>> Patch was bootstrapped and regression tested on x86_64-linux-gnu and >>> alphev68-linux-gnu, where for the later target the patch fixed the >>> mentioned failure. >> >> That sounds like a bug somewhere. Even when one function gets inlined >> into another, its name should still be dumped. This is implemented by >> report_inlined_functions in libbacktrace/dwarf.c. While something >> like your patch may be needed, I'd like to more clearly understand why >> libbacktrace isn't working. > > If I manually compile the testcase from crash_unix_test.go wth gccgo > -O2 (the asm is attached), the main.loop function is not even present > in the dump. 
Digging deeper into the source, here is the problem: When inlined, the thunk reads: 0260 : 260: 00 00 bb 27 ldahgp,0(t12) 260: GPDISP .text+0x4 264: 00 00 bd 23 lda gp,0(gp) 268: 08 00 10 a6 ldq a0,8(a0) 26c: 00 00 7d a7 ldq t12,0(gp) 26c: ELF_LITERALruntime.closechan 270: f0 ff de 23 lda sp,-16(sp) 274: 00 00 5e b7 stq ra,0(sp) 278: 00 40 5b 6b jsr ra,(t12),27c 278: LITUSE .text+0x3 278: HINT runtime.closechan 27c: 00 80 5f 24 ldaht1,-32768 280: 01 05 e2 47 not t1,t0 284: 00 00 fe 2f unop 288: 1f 04 ff 47 nop 28c: 00 00 fe 2f unop 290: ff ff 21 20 lda t0,-1(t0) 294: fe ff 3f f4 bne t0,290 298: 01 05 e2 47 not t1,t0 29c: fc ff ff c3 br 290 and the (very tight) loop consists only of insns at 0x290 and 0x294. However, the assembly dump of the loop reads: $L23: lda $1,-1($1) $LBB53: $LBB51: $LBB49: $LBB47: .loc 1 13 31 bne $1,$L23 $LBE47: $LBE49: $LBE51: $LBE53: with $Ldebug_ranges0: .8byte $LBB45-$Ltext0 .8byte $LBE45-$Ltext0 .8byte $LBB50-$Ltext0 .8byte $LBE50-$Ltext0 .8byte $LBB51-$Ltext0 .8byte $LBE51-$Ltext0 .8byte 0 .8byte 0 So, those $LBB labels are at the wrong place. 
During runtime, we get (oh, why can't we display PC in hex?!): SIGQUIT: quit PC=4831845072 m=0 sigcode=0 goroutine 7 [running]: runtime.sighandler /space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/signal_sighandler.go:104 runtime.sigtrampgo /space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/signal_unix.go:312 runtime.sigtramp /space/homedirs/uros/gcc-svn/trunk/libgo/runtime/go-signal.c:54 runtime.kickoff /space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/proc.go:1161 0x120001ad0 000120001aa0 : 120001aa0: 02 00 bb 27 ldahgp,2(t12) 120001aa4: f0 a5 bd 23 lda gp,-23056(gp) 120001aa8: 08 00 10 a6 ldq a0,8(a0) 120001aac: c8 80 7d a7 ldq t12,-32568(gp) 120001ab0: f0 ff de 23 lda sp,-16(sp) 120001ab4: 00 00 5e b7 stq ra,0(sp) 120001ab8: 00 40 5b 6b jsr ra,(t12),120001abc 120001abc: 00 80 5f 24 ldaht1,-32768 120001ac0: 01 05 e2 47 not t1,t0 120001ac4: 00 00 fe 2f unop 120001ac8: 1f 04 ff 47 nop 120001acc: 00 00 fe 2f unop > 120001ad0: ff ff 21 20 lda t0,-1(t0) 120001ad4: fe ff 3f f4 bne t0,120001ad0 120001ad8: 01 05 e2 47 not t1,t0 120001adc: fc ff ff c3 br 120001ad0 which is obviously out of corresponding debug range. Contents of the .debug_ranges section: Offset BeginEnd 000120001a7c 000120001a8c 000120001a90 000120001a9c 000120001a7c 000120001a8c 000120001a90 000120001a9c 0030 000120001aa8 000120001ab0 0030 000120001ab8 000120001abc 0030 000120001ad4 000120001ad8 <<< this one! 0030 Uros.
Re: Fix vect_float markup for a couple of tests (PR 83888)
Richard Biener writes: > On Mon, Jan 22, 2018 at 6:55 PM, Richard Sandiford > wrote: >> vect_float is true for arm*-*-* targets, but the support is only >> available when -funsafe-math-optimizations is on. This caused >> failures in two tests that disable fast-math. >> >> The easiest fix seemed to be to add a new target selector for >> "vect_float without special options". >> >> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. >> OK to install? > > Ok - so arm float vectors are not IEEE compliant? Not for subnormals apparently: ; Note that NEON operations don't support the full IEEE 754 standard: in ; particular, denormal values are flushed to zero. This means that GCC cannot ; use those instructions for autovectorization, etc. unless ; -funsafe-math-optimizations is in effect (in which case flush-to-zero ; behavior is permissible). Thanks, Richard
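The flush-to-zero behaviour the comment describes can be made concrete with a small sketch: FLT_MIN / 2 is representable only as a subnormal, so on hardware with full IEEE-754 semantics it stays nonzero, while a flush-to-zero unit (such as NEON in its default mode) squashes it to 0.0f. That loss is exactly why vect_float for arm targets requires -funsafe-math-optimizations:

```c
#include <assert.h>
#include <float.h>

/* On an IEEE-754-compliant float unit the subnormal FLT_MIN / 2
   compares greater than zero; a flush-to-zero unit treats it as 0.0f.
   GCC therefore only auto-vectorizes float code for NEON when
   -funsafe-math-optimizations says flush-to-zero is acceptable.  */
static int
subnormals_preserved (void)
{
  volatile float denorm = FLT_MIN / 2.0f;
  return denorm > 0.0f;
}
```

Run on a scalar IEEE-compliant host with default flags, the function returns 1; built with flush-to-zero enabled it would return 0.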
Re: [PATCH PR82604]Fix regression in ftree-parallelize-loops
On Sat, Jan 20, 2018 at 4:10 PM, Bin.Cheng wrote: > On Fri, Jan 19, 2018 at 5:42 PM, Bin Cheng wrote: >> Hi, >> This patch is supposed to fix regression caused by loop distribution when >> ftree-parallelize-loops. The reason is distributed memset call can't be >> understood/analyzed in data reference analysis, as a result, parloop can >> only parallelize the innermost 2-level loop nest. Before distribution >> change, parloop can parallelize the innermost 3-level loop nest, i.e, >> more parallelization. >> As commented in the PR, ideally, loop distribution should be able to >> distribute memset call for 3-level loop nest. Unfortunately this requires >> sophisticated work proving equality between tree expressions which gcc >> is not good at now. >> Another fix is to improve data reference analysis so that memset call >> can be supported. We don't know how big this change is and it's definitely >> not GCC 8 task. >> >> So this patch fixes the regression in a bit hacking way. It first enables >> 3-level loop nest distribution when flag_tree_parloops > 1. Secondly, it >> supports 3-level loop nest distribution for ZERO-ing stmt which can only >> be distributed as a loop (nest) of memset, but can't be distributed as a >> single memset. The overall effect is ZERO-ing stmt will be distributed >> to one loop deeper than now, so parloop can parallelize as before. >> >> Bootstrap and test on x86_64 and AArch64 ongoing. Is it OK if no errors? Ok. Thanks, Richard. > Test finished without error. Also I checked > -ftree-parallelize-loops=6 on AArch64 and can confirm the regression > is resolved. > > Thanks, > bin >> >> Thanks, >> bin >> 2018-01-19 Bin Cheng >> >> PR tree-optimization/82604 >> * tree-loop-distribution.c (enum partition_kind): New enum item >> PKIND_PARTIAL_MEMSET. >> (partition_builtin_p): Support above new enum item. >> (generate_code_for_partition): Ditto. 
>> (compute_access_range): Differentiate cases that equality can be >> proven at all loops, the innermost loops or no loops. >> (classify_builtin_st, classify_builtin_ldst): Adjust call to above >> function. Set PKIND_PARTIAL_MEMSET for partition appropriately. >> (finalize_partitions, distribute_loop): Don't fuse partition of >> PKIND_PARTIAL_MEMSET kind when distributing 3-level loop nest. >> (prepare_perfect_loop_nest): Distribute 3-level loop nest only if >> parloop is enabled.
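The shape of code the patch targets is a ZERO-ing store inside a 3-level loop nest, as in this made-up reduction: distribution can rewrite the innermost loop as memset (the new PKIND_PARTIAL_MEMSET partition kind), leaving the outer two loops intact for parloop to parallelize, instead of collapsing the whole nest into one memset whose legality GCC cannot currently prove:

```c
#include <assert.h>

#define N 8
static double a[N][N][N];

/* 3-level zeroing loop nest.  Under the patch, loop distribution may
   turn only the innermost loop into memset, so the i and j loops stay
   available for -ftree-parallelize-loops.  */
static void
zero_nest (void)
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        a[i][j][k] = 0.0;
}
```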
Fix use of boolean_true/false_node (PR 83979)
r255913 changed some constant_boolean_node calls to boolean_true_node and boolean_false_node, which meant that the returned tree didn't always have the right type. Tested on aarch64-linux-gnu. Probably bordering on obvious, but just in case: OK to install? Richard 2018-01-23 Richard Sandiford gcc/ PR tree-optimization/83979 * fold-const.c (fold_comparison): Use constant_boolean_node instead of boolean_{true,false}_node. gcc/testsuite/ PR tree-optimization/83979 * g++.dg/pr83979.c: New test. Index: gcc/fold-const.c === --- gcc/fold-const.c2018-01-16 15:13:19.643832679 + +++ gcc/fold-const.c2018-01-23 11:23:59.982555852 + @@ -8572,39 +8572,39 @@ fold_comparison (location_t loc, enum tr { case EQ_EXPR: if (known_eq (bitpos0, bitpos1)) - return boolean_true_node; + return constant_boolean_node (true, type); if (known_ne (bitpos0, bitpos1)) - return boolean_false_node; + return constant_boolean_node (false, type); break; case NE_EXPR: if (known_ne (bitpos0, bitpos1)) - return boolean_true_node; + return constant_boolean_node (true, type); if (known_eq (bitpos0, bitpos1)) - return boolean_false_node; + return constant_boolean_node (false, type); break; case LT_EXPR: if (known_lt (bitpos0, bitpos1)) - return boolean_true_node; + return constant_boolean_node (true, type); if (known_ge (bitpos0, bitpos1)) - return boolean_false_node; + return constant_boolean_node (false, type); break; case LE_EXPR: if (known_le (bitpos0, bitpos1)) - return boolean_true_node; + return constant_boolean_node (true, type); if (known_gt (bitpos0, bitpos1)) - return boolean_false_node; + return constant_boolean_node (false, type); break; case GE_EXPR: if (known_ge (bitpos0, bitpos1)) - return boolean_true_node; + return constant_boolean_node (true, type); if (known_lt (bitpos0, bitpos1)) - return boolean_false_node; + return constant_boolean_node (false, type); break; case GT_EXPR: if (known_gt (bitpos0, bitpos1)) - return boolean_true_node; + return constant_boolean_node (true, type); if 
(known_le (bitpos0, bitpos1)) - return boolean_false_node; + return constant_boolean_node (false, type); break; default:; } Index: gcc/testsuite/g++.dg/pr83979.c === --- /dev/null 2018-01-22 18:46:35.983712806 + +++ gcc/testsuite/g++.dg/pr83979.c 2018-01-23 11:23:59.982555852 + @@ -0,0 +1,7 @@ +/* { dg-do compile } */ + +int +foo (char* p) +{ + return p + 1000 < p; +}
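The folding this patch touches can be pictured with same-object pointer comparisons: when both operands decompose to the same base with compile-time-known bit positions, fold_comparison reduces the comparison to a constant; the bug was only that the constant node carried type `bool` rather than the comparison's own result type. An illustrative, non-overflowing example:

```c
#include <assert.h>

static char buf[16];

/* Both pointers share the base `buf`, so their bit positions are known
   at compile time (known_lt/known_gt in fold_comparison's terms) and
   each comparison can fold to a constant of the comparison's result
   type.  Unlike the PR's `p + 1000 < p` testcase, these comparisons
   stay within the object, so the folding is also valid C.  */
static int
cmp_known_offsets (void)
{
  char *p = &buf[1];
  char *q = &buf[8];
  return (p < q) && !(p > q) && (q >= p);
}
```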
Re: [PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.
On 23 January 2018 11:31:27 CET, Tamar Christina wrote: >Hi All, > >This patch makes dg-cmp-results.sh reject the use of log files in the >comparison. >Often when given a log file dg-cmp-results will give incomplete/wrong >output and >using log instead of sum is one autocomplete tab away. > >Instead have the tool guard against such mistakes. +if test "$OEXT" = "log"; then +echo " must be a sum file instead of log file." +exit 1 +fi + +if test "$NEXT" = "log"; then +echo " must be a sum file instead of log file." typo: new-file +exit 1 +fi > >Ok for trunk? > >Thanks, >Tamar > >contrib/ >2018-01-23 Tamar Christina > > * dg-cmp-results.sh: Reject log files from comparison.
Remove explicit dg-do runs from gcc.dg/vect (PR 83889)
The failures in this PR were from forcing { dg-do run } even when vect.exp chooses options that are incompatible with the runtime. The default vect.exp behaviour is to execute when possible, so there's no need for a dg-do at all. The patch removes other unconditional { dg-do run }s too. Many of them were already failing in the same way. Also, the dg-do run condition in vect-reduc-or* seems unnecessary: the test should run correctly whatever happens, and the scan tests are already guarded properly. Tested on aarch64-linux-gnu, arm-none-eabi, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2018-01-23 Richard Sandiford gcc/testsuite/ PR testsuite/83889 * gcc.dg/vect/pr79920.c: Remove explicit dg-do run. * gcc.dg/vect/pr80631-1.c: Likewise. * gcc.dg/vect/pr80631-2.c: Likewise. * gcc.dg/vect/pr81410.c: Likewise. * gcc.dg/vect/pr81633.c: Likewise. * gcc.dg/vect/pr81815.c: Likewise. * gcc.dg/vect/pr82108.c: Likewise. * gcc.dg/vect/pr83857.c: Likewise. * gcc.dg/vect/vect-alias-check-8.c: Likewise. * gcc.dg/vect/vect-alias-check-9.c: Likewise. * gcc.dg/vect/vect-alias-check-10.c: Likewise. * gcc.dg/vect/vect-alias-check-11.c: Likewise. * gcc.dg/vect/vect-alias-check-12.c: Likewise. * gcc.dg/vect/vect-reduc-11.c: Likewise. * gcc.dg/vect/vect-tail-nomask-1.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-1.c: Remove dg-do run and use dg-xfail-run-if instead. * gcc.dg/vect/vect-reduc-in-order-2.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-3.c: Likewise. * gcc.dg/vect/vect-reduc-in-order-4.c: Likewise. * gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run. * gcc.dg/vect/vect-reduc-or_2.c: Likewise. 
Index: gcc/testsuite/gcc.dg/vect/pr79920.c === --- gcc/testsuite/gcc.dg/vect/pr79920.c 2018-01-15 12:38:45.039094423 + +++ gcc/testsuite/gcc.dg/vect/pr79920.c 2018-01-23 11:29:38.977575495 + @@ -1,4 +1,3 @@ -/* { dg-do run } */ /* { dg-additional-options "-O3 -fno-fast-math" } */ #include "tree-vect.h" Index: gcc/testsuite/gcc.dg/vect/pr80631-1.c === --- gcc/testsuite/gcc.dg/vect/pr80631-1.c 2018-01-13 17:59:52.122334084 + +++ gcc/testsuite/gcc.dg/vect/pr80631-1.c 2018-01-23 11:29:38.977575495 + @@ -1,5 +1,4 @@ /* PR tree-optimization/80631 */ -/* { dg-do run } */ #include "tree-vect.h" Index: gcc/testsuite/gcc.dg/vect/pr80631-2.c === --- gcc/testsuite/gcc.dg/vect/pr80631-2.c 2017-12-14 00:04:52.323446529 + +++ gcc/testsuite/gcc.dg/vect/pr80631-2.c 2018-01-23 11:29:38.977575495 + @@ -1,5 +1,4 @@ /* PR tree-optimization/80631 */ -/* { dg-do run } */ #include "tree-vect.h" Index: gcc/testsuite/gcc.dg/vect/pr81410.c === --- gcc/testsuite/gcc.dg/vect/pr81410.c 2017-07-27 10:37:55.334036950 +0100 +++ gcc/testsuite/gcc.dg/vect/pr81410.c 2018-01-23 11:29:38.977575495 + @@ -1,4 +1,3 @@ -/* { dg-do run } */ /* { dg-require-effective-target vect_long_long } */ #include "tree-vect.h" Index: gcc/testsuite/gcc.dg/vect/pr81633.c === --- gcc/testsuite/gcc.dg/vect/pr81633.c 2017-08-03 10:40:54.014105333 +0100 +++ gcc/testsuite/gcc.dg/vect/pr81633.c 2018-01-23 11:29:38.977575495 + @@ -1,5 +1,3 @@ -/* { dg-do run } */ - static double identity[4][4] = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, Index: gcc/testsuite/gcc.dg/vect/pr81815.c === --- gcc/testsuite/gcc.dg/vect/pr81815.c 2017-08-16 08:50:54.197549943 +0100 +++ gcc/testsuite/gcc.dg/vect/pr81815.c 2018-01-23 11:29:38.978575453 + @@ -1,5 +1,3 @@ -/* { dg-do run } */ - int __attribute__ ((noinline, noclone)) f (int *x, int n) { Index: gcc/testsuite/gcc.dg/vect/pr82108.c === --- gcc/testsuite/gcc.dg/vect/pr82108.c 2017-09-06 20:47:38.380589062 +0100 +++ gcc/testsuite/gcc.dg/vect/pr82108.c 2018-01-23 11:29:38.978575453 + @@ -1,4 
+1,3 @@ -/* { dg-do run } */ /* { dg-require-effective-target vect_float } */ #include "tree-vect.h" Index: gcc/testsuite/gcc.dg/vect/pr83857.c === --- gcc/testsuite/gcc.dg/vect/pr83857.c 2018-01-16 15:13:24.937622282 + +++ gcc/testsuite/gcc.dg/vect/pr83857.c 2018-01-23 11:29:38.978575453 + @@ -1,4 +1,3 @@ -/* { dg-do run } */ /* { dg-additional-options "-ffast-math" } */ #define N 100 Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c =
Re: [PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.
The 01/23/2018 11:30, Bernhard Reutner-Fischer wrote: > On 23 January 2018 11:31:27 CET, Tamar Christina > wrote: > >Hi All, > > > >This patch makes dg-cmp-results.sh reject the use of log files in the > >comparison. > >Often when given a log file dg-cmp-results will give incomplete/wrong > >output and > >using log instead of sum is one autocomplete tab away. > > > >Instead have the tool guard against such mistakes. > > +if test "$OEXT" = "log"; then > +echo " must be a sum file instead of log file." > +exit 1 > +fi > + > +if test "$NEXT" = "log"; then > +echo " must be a sum file instead of log file." > > typo: new-file > Thanks, Typo fixed. > +exit 1 > +fi > > > > >Ok for trunk? > > > >Thanks, > >Tamar > > > >contrib/ > >2018-01-23 Tamar Christina > > > > * dg-cmp-results.sh: Reject log files from comparison. > -- diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh index 5f2fed5ec3ff0c66d22bc07c84571568730fbcac..d1a7332660c35ca008d8e0d06eba8842758032c5 100755 --- a/contrib/dg-cmp-results.sh +++ b/contrib/dg-cmp-results.sh @@ -61,8 +61,20 @@ esac VARIANT="$1" OFILE="$2" OBASE=`basename "$2"` +OEXT="${OBASE##*.}" NFILE="$3" NBASE=`basename "$3"` +NEXT="${NBASE##*.}" + +if test "$OEXT" = "log"; then +echo " must be a sum file instead of log file." +exit 1 +fi + +if test "$NEXT" = "log"; then +echo " must be a sum file instead of log file." +exit 1 +fi echo "dg-cmp-results.sh: Verbosity is ${verbose}, Variant is \"${VARIANT}\"" echo @@ -82,11 +94,11 @@ fi unset temp # Copy out the old file's section 0. -echo "Older log file: $OFILE" +echo "Older summary file: $OFILE" sed $E -e '/^[[:space:]]+===/,$d' $OFILE # Copy out the new file's section 0. -echo "Newer log file: $NFILE" +echo "Newer summary file: $NFILE" sed $E -e '/^[[:space:]]+===/,$d' $NFILE # Create a temporary file from the old file's interesting section.
Re: [PATCH 4/5] Remove predictors that are unreliable.
On 01/23/2018 11:02 AM, Jan Hubicka wrote: >> On 01/19/2018 12:57 PM, Martin Liška wrote: >>> Yes, there's a huge difference in between CPU 2006 and 2017. Former has 63% >>> w/ dominant edges, >>> and later one only 11%. It's caused by these 2 benchmarks with a high >>> coverage: >>> >> >> Hi. >> >> I'm sending details about the 2 edges that influence the statistics >> significantly: >> >>> 500.perlbench_r: regexec.c.065i.profile: >>> negative return heuristics of edge 1368->1370: 2.0% exec 2477714850 hit >>> 2429863555 (98.1%) >> >> 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp) >> -: 3485:{ >> 2477714850: 3486:CHECKPOINT lastcp; >> 2477714850: 3487:REGEXP *const rx = reginfo->prog; >> 2477714850: 3488:regexp *const prog = ReANY(rx); >> 2477714850: 3489:SSize_t result; >> [snip] >> 2477714850: 8046:assert(!result || locinput - reginfo->strbeg >= 0); >> 2477714850: 8047:return result ? locinput - reginfo->strbeg : -1; >> -: 8048:} >> >> As seen it return -1 if a regex is not found, which is in case of perlbench >> very likely branch. >> >>> >>> and 523.xalancbmk_r: >>> build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile: >>> negative return heuristics of edge 3->4: 2.0% exec 1221735072 hit >>> 1221522453 (100.0%) >> >> 1221735072: 74:int NameDatatypeValidator::compare(const XMLCh* const lValue >> -: 75: , const XMLCh* const >> rValue >> -: 76: , MemoryManager* >> const) >> -: 77:{ >> 1221735072: 78:return ( XMLString::equals(lValue, rValue)? 
0 : -1); >> -: 79:} >> -: 80: >> >> IP profile dump file: >> >> xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator * >> const this, const XMLCh * const lValue, const XMLCh * const rValue, struct >> MemoryManager * const D.17157) >> { >> bool _1; >> int iftmp.0_2; >> >>[count: 1221735072]: >> # DEBUG BEGIN_STMT >> _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D)); >> if (_1 != 0) >> goto ; [0.02%] >> else >> goto ; [99.98%] >> >>[count: 1221522453]: >> >>[count: 1221735072]: >> # iftmp.0_2 = PHI <0(2), -1(3)> >> return iftmp.0_2; >> >> } >> >> Likewise, XML values are more commonly non-equal. >> Ok, so may I mark also negative return with PROB_EVEN to track it? > > Well, if we start disabling predictors just because we can find benchmark > where they do not > perform as expected, we will need to disable them all. It is nature of > heuristics to make > mistakes, just they should do something useful for common code. > It is not goal to make heuristics that makes no mistake. > Heuristics is useful either if it have high precision even if it cover few > branches (say noreturn), or if it have large coverage and still some hitrate > (say opcode, > call or return based heuristics). > > To make things bit more esoteric, in practice this is also bit weighted by > fact > how much damage bad prediction does. For example, call heuristics which > predict > non-call path to be taken is likely going to do little damage. Even if call > path is common it is heavy and hard to optimize by presence of call and thus > we > don't loose that much in common scenarios. > > In the second case: > 00073 // > --- > 00074 // Compare methods > 00075 // > --- > 00076 int NameDatatypeValidator::compare(const XMLCh* const lValue > 00077, const XMLCh* const rValue > 00078, MemoryManager* const) > 00079 { > 00080 return ( XMLString::equals(lValue, rValue)? 0 : -1); > 00081 } > > I would say it is odd coding style. 
I do not see why it is not declared > bool and not returning true/false. > > First case seems bit non-standard too as it looks like usual cache lookup just > the cache frequently fails. > > I would be in favor of keeping the prediction with non-50% hitrate thus unless > we run into some performance problems. > If there are only two benchmarks out of say 20 we tried (in spec2000 and 2k17) > it seems fine to me. > > There is issue with the way we collect our data. Using arithmetic mean across > spec is optimizing for overall runtime of all spec benchmarks combined > together. We more want to look for safer values that do well for average > benchmarks independently how many branches they do. We could consider making > the statistics script also collect geometric averages across different > benchmarks. If we will see big difference between arithmetic mean and geomean > we will know there is small subset of benchmarks which dominates and we could > decide what to do with the probability. Hello. I fully agree that proper way to do statistics is to do a we
[C++ PATCH] Deprecate ARM-era for scopes
As discussed (https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01778.html) this patch deprecates the ARM-era for scope. a) in c++98 mode with -fpermissive, there's now a deprecation note when we fix up something like for (int i = ...) {} ... i ... // out of scope use of i b) -fno-for-scope gives a deprecated warning I noticed that the flag showed signs of being tri-valued at some point, but it is no longer (I suspect at some point we only attempted #a if neither sense of -ffor-scope was given). Cleaned that up. Applying to trunk. nathan -- Nathan Sidwell 2018-01-23 Nathan Sidwell gcc/cp/ Deprecate ARM-era for scope handling * decl.c (poplevel): Flag_new_for_scope is a boolean-like. (cxx_init_decl_processing): Deprecate flag_new_for_scope being cleared. * name-lookup.c (check_for_out_of_scope_variable): Deprecate and cleanup handling. * semantics.c (begin_for_scope): Flag_new_for_scope is boolean-like. (finish_for_stmt, begin_range_for_stmt): Likewise. gcc/ * doc/invoke.texi (ffor-scope): Deprecate. gcc/cp/ * g++.dg/cpp0x/range-for10.C: Adjust. * g++.dg/ext/forscope1.C: Adjust. * g++.dg/ext/forscope2.C: Adjust. * g++.dg/template/for1.C: Adjust. Index: cp/decl.c === --- cp/decl.c (revision 256951) +++ cp/decl.c (working copy) @@ -644,7 +644,7 @@ poplevel (int keep, int reverse, int fun in a init statement were in scope after the for-statement ended. We only use the new rules if flag_new_for_scope is nonzero. */ leaving_for_scope -= current_binding_level->kind == sk_for && flag_new_for_scope == 1; += current_binding_level->kind == sk_for && flag_new_for_scope; /* Before we remove the declarations first check for unused variables. 
*/ if ((warn_unused_variable || warn_unused_but_set_variable) @@ -4094,6 +4094,8 @@ cxx_init_decl_processing (void) pop_namespace (); flag_noexcept_type = (cxx_dialect >= cxx17); + if (!flag_new_for_scope) +warning (OPT_Wdeprecated, "%<-fno-for-scope%> is deprecated"); c_common_nodes_and_builtins (); Index: cp/name-lookup.c === --- cp/name-lookup.c (revision 256951) +++ cp/name-lookup.c (working copy) @@ -3231,7 +3231,9 @@ push_local_binding (tree id, tree decl, standard. If so, issue an error message. If name lookup would work in both cases, but return a different result, this function returns the result of ANSI/ISO lookup. Otherwise, it returns - DECL. */ + DECL. + + FIXME: Scheduled for removal after GCC-8 is done. */ tree check_for_out_of_scope_variable (tree decl) @@ -3252,16 +3254,16 @@ check_for_out_of_scope_variable (tree de shadowed = find_namespace_value (current_namespace, DECL_NAME (decl)); if (shadowed) { - if (!DECL_ERROR_REPORTED (decl)) + if (!DECL_ERROR_REPORTED (decl) + && flag_permissive + && warning (0, "name lookup of %qD changed", DECL_NAME (decl))) { - warning (0, "name lookup of %qD changed", DECL_NAME (decl)); - warning_at (DECL_SOURCE_LOCATION (shadowed), 0, - " matches this %qD under ISO standard rules", - shadowed); - warning_at (DECL_SOURCE_LOCATION (decl), 0, - " matches this %qD under old rules", decl); - DECL_ERROR_REPORTED (decl) = 1; + inform (DECL_SOURCE_LOCATION (shadowed), + "matches this %qD under ISO standard rules", shadowed); + inform (DECL_SOURCE_LOCATION (decl), + " matches this %qD under old rules", decl); } + DECL_ERROR_REPORTED (decl) = 1; return shadowed; } @@ -3279,26 +3281,25 @@ check_for_out_of_scope_variable (tree de { error ("name lookup of %qD changed for ISO % scoping", DECL_NAME (decl)); - error (" cannot use obsolete binding at %q+D because " - "it has a destructor", decl); + inform (DECL_SOURCE_LOCATION (decl), + "cannot use obsolete binding %qD because it has a destructor", + decl); return error_mark_node; } 
else { - permerror (input_location, "name lookup of %qD changed for ISO % scoping", + permerror (input_location, + "name lookup of %qD changed for ISO % scoping", DECL_NAME (decl)); if (flag_permissive) -permerror (DECL_SOURCE_LOCATION (decl), - " using obsolete binding at %qD", decl); - else - { - static bool hint; - if (!hint) - { - inform (input_location, "(if you use %<-fpermissive%> G++ will accept your code)"); - hint = true; - } - } +inform (DECL_SOURCE_LOCATION (decl), + "using obsolete binding %qD", decl); + static bool hint; + if (!hint) + inform (input_location, flag_permissive + ? "this flexibility is deprecated and will be removed" + : "if you use %<-fpermissive%> G++ will accept your code"); + hint = true; } return decl; Index: cp/semantics.c ==
Re: [PATCH] Clean-up IPA profile dump output.
On 01/23/2018 10:43 AM, Jan Hubicka wrote: >> Hi. >> >> I'm aware in which development stage we are. However the patch is small and >> makes >> dump files readable. Hope such patch can be accepted even now? >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Martin >> >> gcc/ChangeLog: >> >> 2018-01-22 Martin Liska >> >> * tree-profile.c (tree_profiling): Print function header to >> aware reader which function we are working on. >> * value-prof.c (gimple_find_values_to_profile): Do not print >> not interesting value histograms. > > OK. How do those non-interesting value histograms arise? That can happen if you have a missing profile; then no value histograms are loaded. Thanks for review, let me install it. Martin > > Honza >> --- >> gcc/tree-profile.c | 4 >> gcc/value-prof.c | 2 +- >> 2 files changed, 5 insertions(+), 1 deletion(-) >> >> > >> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c >> index 9d919062db1..f96bd4b9704 100644 >> --- a/gcc/tree-profile.c >> +++ b/gcc/tree-profile.c >> @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see >> #include "params.h" >> #include "stringpool.h" >> #include "attribs.h" >> +#include "tree-pretty-print.h" >> >> static GTY(()) tree gcov_type_node; >> static GTY(()) tree tree_interval_profiler_fn; >> @@ -671,6 +672,9 @@ tree_profiling (void) >> >>push_cfun (DECL_STRUCT_FUNCTION (node->decl)); >> >> + if (dump_file) >> +dump_function_header (dump_file, cfun->decl, dump_flags); >> + >>/* Local pure-const may imply need to fixup the cfg. 
*/ >>if (execute_fixup_cfg () & TODO_cleanup_cfg) >> cleanup_tree_cfg (); >> diff --git a/gcc/value-prof.c b/gcc/value-prof.c >> index b503320f188..16cdbd64f46 100644 >> --- a/gcc/value-prof.c >> +++ b/gcc/value-prof.c >> @@ -2053,7 +2053,7 @@ gimple_find_values_to_profile (histogram_values >> *values) >> default: >>gcc_unreachable (); >> } >> - if (dump_file) >> + if (dump_file && hist->hvalue.stmt != NULL) >> { >>fprintf (dump_file, "Stmt "); >>print_gimple_stmt (dump_file, hist->hvalue.stmt, 0, TDF_SLIM); >> >
[PATCH] Handle trailing arrays in ODR warning (PR lto/81440).
Hi. This removes false positive warning when having trailing array at the end of a struct. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin gcc/lto/ChangeLog: 2018-01-23 Martin Liska PR lto/81440 * lto-symtab.c (lto_symtab_merge): Handle and do not warn about trailing arrays at the end of a struct. gcc/testsuite/ChangeLog: 2018-01-23 Martin Liska PR lto/81440 * gcc.dg/lto/pr81440.h: New test. * gcc.dg/lto/pr81440_0.c: New test. * gcc.dg/lto/pr81440_1.c: New test. --- gcc/lto/lto-symtab.c | 25 +++-- gcc/testsuite/gcc.dg/lto/pr81440.h | 4 gcc/testsuite/gcc.dg/lto/pr81440_0.c | 9 + gcc/testsuite/gcc.dg/lto/pr81440_1.c | 6 ++ 4 files changed, 38 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/lto/pr81440.h create mode 100644 gcc/testsuite/gcc.dg/lto/pr81440_0.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr81440_1.c diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c index ee02a534714..0f0b958e98d 100644 --- a/gcc/lto/lto-symtab.c +++ b/gcc/lto/lto-symtab.c @@ -352,18 +352,31 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry) return false; if (DECL_SIZE (decl) && DECL_SIZE (prevailing_decl) - && !tree_int_cst_equal (DECL_SIZE (decl), DECL_SIZE (prevailing_decl)) + && !tree_int_cst_equal (DECL_SIZE (decl), DECL_SIZE (prevailing_decl))) + { + if (!DECL_COMMON (decl) && !DECL_EXTERNAL (decl)) + return false; + + tree type = TREE_TYPE (decl); + + /* For record type, check for array at the end of the structure. */ + if (TREE_CODE (type) == RECORD_TYPE) + { + tree field = TYPE_FIELDS (type); + while (DECL_CHAIN (field) != NULL_TREE) + field = DECL_CHAIN (field); + + return TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE; + } /* As a special case do not warn about merging int a[]; and int a[]={1,2,3}; here the first declaration is COMMON and sizeof(a) == sizeof (int). 
*/ - && ((!DECL_COMMON (decl) && !DECL_EXTERNAL (decl)) - || TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE - || TYPE_SIZE (TREE_TYPE (decl)) - != TYPE_SIZE (TREE_TYPE (TREE_TYPE (decl) -return false; + else if (TREE_CODE (type) == ARRAY_TYPE) + return (TYPE_SIZE (decl) == TYPE_SIZE (TREE_TYPE (type))); + } return true; } diff --git a/gcc/testsuite/gcc.dg/lto/pr81440.h b/gcc/testsuite/gcc.dg/lto/pr81440.h new file mode 100644 index 000..d9e6c3da645 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr81440.h @@ -0,0 +1,4 @@ +typedef struct { + int i; + int ints[]; +} struct_t; diff --git a/gcc/testsuite/gcc.dg/lto/pr81440_0.c b/gcc/testsuite/gcc.dg/lto/pr81440_0.c new file mode 100644 index 000..07f2a87da21 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr81440_0.c @@ -0,0 +1,9 @@ +/* { dg-lto-do link } */ + +#include "pr81440.h" + +extern struct_t my_struct; + +int main() { + return my_struct.ints[0]; +} diff --git a/gcc/testsuite/gcc.dg/lto/pr81440_1.c b/gcc/testsuite/gcc.dg/lto/pr81440_1.c new file mode 100644 index 000..d03533029c1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr81440_1.c @@ -0,0 +1,6 @@ +#include "pr81440.h" + +struct_t my_struct = { + 20, + { 1, 2 } +};
Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure
On Tue, Jan 23, 2018 at 11:38 AM, Uros Bizjak wrote: > On Tue, Jan 23, 2018 at 8:32 AM, Uros Bizjak wrote: >> On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor wrote: >>> On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak wrote: The default "go build" compile options over-optimize the auxiliary executable, built in the TestCrashDumpsAllThreads testcase (libgo/go/runtime/crash_unix_test.go). This over-optimization results in removal of the trivial summing loop and in the inlining of the main.loop function into main.$thunk0. The testcase expects a backtrace that shows several instances of main.loop in the backtrace dump. When main.loop gets inlined into the thunk, its name is not dumped anymore. The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining. Patch was bootstrapped and regression tested on x86_64-linux-gnu and alphaev68-linux-gnu, where for the latter target the patch fixed the mentioned failure. >>> >>> That sounds like a bug somewhere. Even when one function gets inlined >>> into another, its name should still be dumped. This is implemented by >>> report_inlined_functions in libbacktrace/dwarf.c. While something >>> like your patch may be needed, I'd like to more clearly understand why >>> libbacktrace isn't working. >> >> If I manually compile the testcase from crash_unix_test.go with gccgo >> -O2 (the asm is attached), the main.loop function is not even present >> in the dump. > > Digging deeper into the source, here is the problem: [...] > So, those $LBB labels are at the wrong place. During runtime, we get > (oh, why can't we display PC in hex?!): The following testcase: --cut here-- package main func loop(i int, c chan bool) { close(c) for { for j := 0; j < 0x7fff; j++ { } } } --cut here-- can be compiled with a cross-compiler to alpha-linux-gnu: ~/gcc-build-alpha/gcc/go1 -O2 -fkeep-static-functions l.go to generate assembly, where $LBB2 is placed after the "lda" insn. main.loop: ... $L2: lda $1,-1($1) $LBB2: $LM6: bne $1,$L2 br $31,$L3 $LBE2: ... 
and in ._final, we can see that (note 52) is in wrong place. It should be placed in front of (insn 13). (code_label 14 6 12 2 (nil) [1 uses]) (note 12 14 13 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn:TI 13 12 52 (set (reg:DI 1 $1 [orig:70 ivtmp_1 ] [70]) (plus:DI (reg:DI 1 $1 [orig:70 ivtmp_1 ] [70]) (const_int -1 [0x]))) 7 {*adddi_internal} (nil)) (note 52 13 15 0x7f6b5f2791e0 NOTE_INSN_BLOCK_BEG) (jump_insn:TI 15 52 45 (set (pc) (if_then_else (ne (reg:DI 1 $1 [orig:70 ivtmp_1 ] [70]) (const_int 0 [0])) (label_ref:DI 14) (pc))) "l.go":6 169 {*bcc_normal} (int_list:REG_BR_PROB 1062895956 (nil)) -> 14) (note 45 15 46 [bb 5] NOTE_INSN_BASIC_BLOCK) (jump_insn:TI 46 45 47 (set (pc) (label_ref 16)) 200 {jump} (nil) -> 16) (barrier 47 46 32) (note 32 47 54 NOTE_INSN_DELETED) (note 54 32 0 0x7f6b5f2791e0 NOTE_INSN_BLOCK_END) I will open a PR: Uros.
Re: [PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust
Hi Kyrill, On 22 January 2018 at 11:48, Kyrill Tkachov wrote: > Hi all, > > This test has needlessly restrictive requirements. It tries to force a > soft-float target and tries to run. > This makes it unsupportable for any non-soft-float variant. > In fact, the test can be a run-time test for any target, and only the > scan-assembler tests are specific to > -mfloat-abi=soft. So this patch makes the test always runnable and makes the > scan-assembler checks predicable > on the new arm_softfloat effective target check. > Unfortunately, the test now fails on armeb-linux-gnueabihf (it used to be unsupported). My logs only show: qemu: uncaught target signal 11 (Segmentation fault) - core dumped On arm-linux-gnueabihf, it's OK (the test was unsupported, but now passes). Christophe > Committing to trunk. > Thanks, > Kyrill > > 2018-01-22 Kyrylo Tkachov > > * doc/sourcebuild.texi (arm_softfloat): Document. > > 2018-01-22 Kyrylo Tkachov > > * lib/target-supports.exp (check_effective_target_arm_softfloat): > New procedure. > * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere. > Adjust scan-assembler checks for soft-float.
Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure
On Tue, Jan 23, 2018 at 1:54 PM, Uros Bizjak wrote: > On Tue, Jan 23, 2018 at 11:38 AM, Uros Bizjak wrote: >> On Tue, Jan 23, 2018 at 8:32 AM, Uros Bizjak wrote: >>> On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor wrote: On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak wrote: > > The default "go build" compile options over-optimize the auxiliary > executable, built in the TestCrashDumpsAllThreads testcase > (libgo/go/runtime/crash_unix_test.go). This over-optimization results > in removal of the trivial summing loop and in the inlining of the > main.loop function into main.$thunk0. > > The testcase expects a backtrace that shows several instances of > main.loop in the backtrace dump. When main.loop gets inlined into > the thunk, its name is not dumped anymore. > > The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining. > > Patch was bootstrapped and regression tested on x86_64-linux-gnu and > alphaev68-linux-gnu, where for the latter target the patch fixed the > mentioned failure. That sounds like a bug somewhere. Even when one function gets inlined into another, its name should still be dumped. This is implemented by report_inlined_functions in libbacktrace/dwarf.c. While something like your patch may be needed, I'd like to more clearly understand why libbacktrace isn't working. >>> >>> If I manually compile the testcase from crash_unix_test.go with gccgo >>> -O2 (the asm is attached), the main.loop function is not even present >>> in the dump. >> >> Digging deeper into the source, here is the problem: > > [...] > >> So, those $LBB labels are at the wrong place. 
During runtime, we get >> (oh, why can't we display PC in hex?!): > > Following testcase: > > --cut here-- > package main > > func loop(i int, c chan bool) { > close(c) > for { > for j := 0; j < 0x7fff; j++ { > } > } > } > --cut here-- > > can be compiled with a crosscompiler to alpha-linux-gnu: > > ~/gcc-build-alpha/gcc/go1 -O2 -fkeep-static-functions l.go > > to generate assembly, where $LBB2 is placed after "lda" insn. [...] > I will open a PR: PR 83992 [1]. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83992 Uros.
Re: [PATCH 1/2] Introduce prefetch-minimum stride option
Hi Kyrill, On 01/23/2018 07:32 AM, Kyrill Tkachov wrote: Hi Luis, On 22/01/18 13:46, Luis Machado wrote: This patch adds a new option to control the minimum stride, for a memory reference, after which the loop prefetch pass may issue software prefetch hints for. There are two motivations: * Make the pass less aggressive, only issuing prefetch hints for bigger strides that are more likely to benefit from prefetching. I've noticed a case in cpu2017 where we were issuing thousands of hints, for example. I've noticed a large amount of prefetch hints being issued as well, but had not analysed it further. I've gathered some numbers for this. Some of the most extreme cases before both patches: CPU2017 xalancbmk_s: 3755 hints wrf_s: 10950 hints parest_r: 8521 hints CPU2006 gamess: 11377 hints wrf: 3238 hints After both patches: CPU2017 xalancbmk_s: 1 hint wrf_s: 20 hints parest_r: 0 hints CPU2006 gamess: 44 hints wrf: 16 hints * For processors that have a hardware prefetcher, like Falkor, it allows the loop prefetch pass to defer prefetching of smaller (less than the threshold) strides to the hardware prefetcher instead. This prevents conflicts between the software prefetcher and the hardware prefetcher. I've noticed considerable reduction in the number of prefetch hints and slightly positive performance numbers. This aligns GCC and LLVM in terms of prefetch behavior for Falkor. Do you, by any chance, have a link to the LLVM review that implemented that behavior? It's okay if you don't, but I think it would be useful context. I've dug it up. The base change was implemented here: review: https://reviews.llvm.org/D17945 RFC: http://lists.llvm.org/pipermail/llvm-dev/2015-December/093514.html And then target-specific changes were introduced later for specific processors. One small difference in LLVM is the fact that the second parameter, prefetching of non-constant strides, is implicitly switched off if one sets the minimum stride length. 
My approach here makes that second parameter adjustable. I've seen big gains due to prefetching of non-constant strides, but it tends to be tricky to control and usually comes together with significant regressions as well. The fact that we potentially unroll loops along with issuing prefetch hints also makes things a bit erratic. The default settings should guarantee no changes for existing targets. Those are free to tweak the settings as necessary. No regressions in the testsuite and bootstrapped OK on aarch64-linux. Ok? Are there any benchmark numbers you can share? I think this approach is sensible. Comparing the previous, more aggressive, pass behavior with the new one, I've seen a slight improvement for CPU2006, 0.15% for both INT and FP. For CPU2017 the previous behavior was actually a bit harmful, regressing performance by about 1.2% in intspeed. The new behavior kept intspeed stable and slightly improved fpspeed by 0.15%. The motivation for the future is to have better control of software prefetching so we can fine-tune the pass, either through generic loop prefetch code or by using the target-specific parameters. Since your patch touches generic code as well as AArch64 code you'll need an approval from a midend maintainer as well as an AArch64 maintainer. Also, GCC development is now in the regression fixing stage, so unless this fixes a regression it may have to wait until GCC 9 development is opened. That is my understanding. I thought I'd put this up for review anyway so people can chime in and provide their thoughts. Thanks for the review. Luis
Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
Hi, On 01/23/2018 07:40 AM, Kyrill Tkachov wrote: Hi Luis, On 22/01/18 13:46, Luis Machado wrote: The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides. Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance. For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor. Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures. I've regression-tested and bootstrapped it on aarch64-linux. No problems found. Ok? This also looks like a sensible approach to me with a caveat inline. The same general comments as for patch [1/2] apply. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8736bd9..22bd9ae 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* Whether software prefetch hints should be issued for non-constant + strides. */ + const unsigned int prefetch_dynamic_strides; I understand that the midend PARAMs are defined as integers, but I think the backend tuning option here is better represented as a boolean as it really is just a yes/no decision. I started off with a boolean to be honest. Then I noticed the midend only used integers, which I restricted to the range of 0..1. I'll change this locally to use booleans again. Thanks! Luis
[PATCH] Fix PR82819
This fixes a GRAPHITE code-generation issue by eliding ISL AST plus for large power-of-two values that don't affect the result. I intentionally didn't extend this to other values with the same property as I'd like to see testcases. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2018-01-23 Richard Biener PR tree-optimization/82819 * graphite-isl-ast-to-gimple.c (binary_op_to_tree): Avoid code generating pluses that are no-ops in the target precision. * gcc.dg/graphite/pr82819.c: New testcase. Index: gcc/graphite-isl-ast-to-gimple.c === --- gcc/graphite-isl-ast-to-gimple.c(revision 256977) +++ gcc/graphite-isl-ast-to-gimple.c(working copy) @@ -326,7 +326,8 @@ binary_op_to_tree (tree type, __isl_take /* From our constraint generation we may get modulo operations that we cannot represent explicitely but that are no-ops for TYPE. Elide those. */ - if (expr_type == isl_ast_op_pdiv_r + if ((expr_type == isl_ast_op_pdiv_r + || expr_type == isl_ast_op_add) && isl_ast_expr_get_type (arg_expr) == isl_ast_expr_int && (wi::exact_log2 (widest_int_from_isl_expr_int (arg_expr)) >= TYPE_PRECISION (type))) Index: gcc/testsuite/gcc.dg/graphite/pr82819.c === --- gcc/testsuite/gcc.dg/graphite/pr82819.c (nonexistent) +++ gcc/testsuite/gcc.dg/graphite/pr82819.c (working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -floop-nest-optimize" } */ + +short int *ts; + +void +c2 (unsigned long long int s4, int ns) +{ + short int *b2 = (short int *)&ns; + + while (ns != 0) +{ + int xn; + + for (xn = 0; xn < 3; ++xn) + for (*b2 = 0; *b2 < 2; ++*b2) + s4 += xn; + if (s4 != 0) + b2 = ts; + ++ns; +} +}
[PATCH] PR 78534 Reinstate better string copy algorithm
As part of the change to larger character lengths, the string copy algorithm was temporarily pessimized to get around some spurious -Wstringop-overflow warnings. Having tried a number of variations of this algorithm I have managed to get it down to one spurious warning, only with -O1 optimization, in the testsuite. This patch reinstates the optimized variant and modifies this one testcase to ignore the warning. Regtested on x86_64-pc-linux-gnu, Ok for trunk? gcc/fortran/ChangeLog: 2018-01-23 Janne Blomqvist PR 78534 * trans-expr.c (fill_with_spaces): Use memset instead of generating loop. (gfc_trans_string_copy): Improve opportunity to use builtins with constant lengths. gcc/testsuite/ChangeLog: 2018-01-23 Janne Blomqvist PR 78534 * gfortran.dg/allocate_deferred_char_scalar_1.f03: Prune -Wstringop-overflow warnings due to spurious warning with -O1. * gfortran.dg/char_cast_1.f90: Update dump scan pattern. * gfortran.dg/transfer_intrinsic_1.f90: Likewise. --- gcc/fortran/trans-expr.c | 52 -- .../allocate_deferred_char_scalar_1.f03| 2 + gcc/testsuite/gfortran.dg/char_cast_1.f90 | 6 +-- gcc/testsuite/gfortran.dg/transfer_intrinsic_1.f90 | 2 +- 4 files changed, 35 insertions(+), 27 deletions(-) diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index e90036f..4da6cee 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -6407,8 +6407,6 @@ fill_with_spaces (tree start, tree type, tree size) tree i, el, exit_label, cond, tmp; /* For a simple char type, we can call memset(). */ - /* TODO: This code does work and is potentially more efficient, but - causes spurious -Wstringop-overflow warnings. 
if (compare_tree_int (TYPE_SIZE_UNIT (type), 1) == 0) return build_call_expr_loc (input_location, builtin_decl_explicit (BUILT_IN_MEMSET), @@ -6416,7 +6414,6 @@ fill_with_spaces (tree start, tree type, tree size) build_int_cst (gfc_get_int_type (gfc_c_int_kind), lang_hooks.to_target_charset (' ')), fold_convert (size_type_node, size)); - */ /* Otherwise, we use a loop: for (el = start, i = size; i > 0; el--, i+= TYPE_SIZE_UNIT (type)) @@ -6522,11 +6519,20 @@ gfc_trans_string_copy (stmtblock_t * block, tree dlength, tree dest, /* The string copy algorithm below generates code like - if (dlen > 0) { - memmove (dest, src, min(dlen, slen)); - if (slen < dlen) - memset(&dest[slen], ' ', dlen - slen); - } + if (destlen > 0) + { + if (srclen < destlen) + { + memmove (dest, src, srclen); + // Pad with spaces. + memset (&dest[srclen], ' ', destlen - srclen); + } + else + { + // Truncate if too long. + memmove (dest, src, destlen); + } + } */ /* Do nothing if the destination length is zero. */ @@ -6555,21 +6561,16 @@ gfc_trans_string_copy (stmtblock_t * block, tree dlength, tree dest, else src = gfc_build_addr_expr (pvoid_type_node, src); - /* First do the memmove. */ - tmp2 = fold_build2_loc (input_location, MIN_EXPR, TREE_TYPE (dlen), dlen, - slen); - tmp2 = build_call_expr_loc (input_location, - builtin_decl_explicit (BUILT_IN_MEMMOVE), - 3, dest, src, - fold_convert (size_type_node, tmp2)); - stmtblock_t tmpblock2; - gfc_init_block (&tmpblock2); - gfc_add_expr_to_block (&tmpblock2, tmp2); - - /* If the destination is longer, fill the end with spaces. */ + /* Truncate string if source is too long. */ cond2 = fold_build2_loc (input_location, LT_EXPR, logical_type_node, slen, dlen); + /* Copy and pad with spaces. 
*/ + tmp3 = build_call_expr_loc (input_location, + builtin_decl_explicit (BUILT_IN_MEMMOVE), + 3, dest, src, + fold_convert (size_type_node, slen)); + /* Wstringop-overflow appears at -O3 even though this warning is not explicitly available in fortran nor can it be switched off. If the source length is a constant, its negative appears as a very large @@ -6584,14 +6585,19 @@ gfc_trans_string_copy (stmtblock_t * block, tree dlength, tree dest, tmp4 = fill_with_spaces (tmp4, chartype, tmp); gfc_init_block (&tempblock); + gfc_add_expr_to_block (&tempblock, tmp3); gfc_add_expr_to_block (&tempblock, tmp4); tmp3 = gfc_finish_block (&tempblock); + /* The truncated memmove if the slen >= dlen. */ + tmp2 = build_call_expr_loc (input_location, + builtin_decl_explicit (BUILT_IN_MEMMOVE)
Re: [C++ PATCH] Fix a structured binding ICE (PR c++/83958)
On 01/22/2018 06:06 PM, Jakub Jelinek wrote: Hi! I've recently added the complete_type call in case the structured binding is a reference to a type that needs to be instantiated. This testcase shows a problem where it can't be instantiated and we ICE because we don't expect an incomplete type. If decl isn't a reference, cp_finish_decl for it should handle everything. I've considered using complete_type_or_else, but we aren't checking that decl's type is complete, but rather that the type it refers to is complete, and furthermore not sure if it is very nice to print in the diagnostics. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-01-22 Jakub Jelinek PR c++/83958 * decl.c (cp_finish_decomp): Diagnose if reference structure binding refers to incomplete type. * g++.dg/cpp1z/decomp35.C: New test. ok, thanks -- Nathan Sidwell
Re: [PATCH 4/5] Remove predictors that are unrealiable.
> On 01/23/2018 11:02 AM, Jan Hubicka wrote: > >> On 01/19/2018 12:57 PM, Martin Liška wrote: > >>> Yes, there's a huge difference in between CPU 2006 and 2017. Former has > >>> 63% w/ dominant edges, > >>> and later one only 11%. It's caused by these 2 benchmarks with a high > >>> coverage: > >>> > >> > >> Hi. > >> > >> I'm sending details about the 2 edges that influence the statistics > >> significantly: > >> > >>> 500.perlbench_r: regexec.c.065i.profile: > >>> negative return heuristics of edge 1368->1370: 2.0% exec 2477714850 > >>> hit 2429863555 (98.1%) > >> > >> 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp) > >> -: 3485:{ > >> 2477714850: 3486:CHECKPOINT lastcp; > >> 2477714850: 3487:REGEXP *const rx = reginfo->prog; > >> 2477714850: 3488:regexp *const prog = ReANY(rx); > >> 2477714850: 3489:SSize_t result; > >> [snip] > >> 2477714850: 8046:assert(!result || locinput - reginfo->strbeg >= 0); > >> 2477714850: 8047:return result ? locinput - reginfo->strbeg : -1; > >> -: 8048:} > >> > >> As seen it return -1 if a regex is not found, which is in case of > >> perlbench very likely branch. > >> > >>> > >>> and 523.xalancbmk_r: > >>> build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile: > >>> negative return heuristics of edge 3->4: 2.0% exec 1221735072 hit > >>> 1221522453 (100.0%) > >> > >> 1221735072: 74:int NameDatatypeValidator::compare(const XMLCh* const > >> lValue > >> -: 75: , const XMLCh* const > >> rValue > >> -: 76: , MemoryManager* > >>const) > >> -: 77:{ > >> 1221735072: 78:return ( XMLString::equals(lValue, rValue)? 
0 : -1); > >> -: 79:} > >> -: 80: > >> > >> IP profile dump file: > >> > >> xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator > >> * const this, const XMLCh * const lValue, const XMLCh * const rValue, > >> struct MemoryManager * const D.17157) > >> { > >> bool _1; > >> int iftmp.0_2; > >> > >>[count: 1221735072]: > >> # DEBUG BEGIN_STMT > >> _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D)); > >> if (_1 != 0) > >> goto ; [0.02%] > >> else > >> goto ; [99.98%] > >> > >>[count: 1221522453]: > >> > >>[count: 1221735072]: > >> # iftmp.0_2 = PHI <0(2), -1(3)> > >> return iftmp.0_2; > >> > >> } > >> > >> Likewise, XML values are more commonly non-equal. > >> Ok, so may I mark also negative return with PROB_EVEN to track it? > > > > Well, if we start disabling predictors just because we can find benchmark > > where they do not > > perform as expected, we will need to disable them all. It is nature of > > heuristics to make > > mistakes, just they should do something useful for common code. > > It is not goal to make heuristics that makes no mistake. > > Heuristics is useful either if it have high precision even if it cover few > > branches (say noreturn), or if it have large coverage and still some > > hitrate (say opcode, > > call or return based heuristics). > > > > To make things bit more esoteric, in practice this is also bit weighted by > > fact > > how much damage bad prediction does. For example, call heuristics which > > predict > > non-call path to be taken is likely going to do little damage. Even if call > > path is common it is heavy and hard to optimize by presence of call and > > thus we > > don't loose that much in common scenarios. 
> > > > In the second case: > > 00073 // > > --- > > 00074 // Compare methods > > 00075 // > > --- > > 00076 int NameDatatypeValidator::compare(const XMLCh* const lValue > > 00077, const XMLCh* const rValue > > 00078, MemoryManager* const) > > 00079 { > > 00080 return ( XMLString::equals(lValue, rValue)? 0 : -1); > > 00081 } > > > > I would say it is odd coding style. I do not see why it is not declared > > bool and not returning true/false. > > > > First case seems bit non-standard too as it looks like usual cache lookup > > just > > the cache frequently fails. > > > > I would be in favor of keeping the prediction with non-50% hitrate thus > > unless > > we run into some performance problems. > > If there are only two benchmarks out of say 20 we tried (in spec2000 and > > 2k17) > > it seems fine to me. > > > > There is issue with the way we collect our data. Using arithmetic mean > > across > > spec is optimizing for overall runtime of all spec benchmarks combined > > together. We more want to look for safer values that do well for average > > benchmarks independently how many branches they do. We could consider > > making > > the statistics script also collect geometric averages across di
Re: [PATCH] Handle trailing arrays in ODR warning (PR lto/81440).
> Hi. > > This removes false positive warning when having trailing array at the end of > a struct. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? > Martin > > gcc/lto/ChangeLog: > > 2018-01-23 Martin Liska > > PR lto/81440 > * lto-symtab.c (lto_symtab_merge): Handle and do not warn about > trailing arrays at the end of a struct. OK, thanks! Honza
[PR c++/839888] Baselink tsubst ICE
I added an assert when recently fixing baselink substitution, but the assert is incorrect as this testcase shows. Fixing thusly. nathan -- Nathan Sidwell 2018-01-23 Nathan Sidwell PR c++/83988 * pt.c (tsubst_baselink): Remove optype assert. * ptree.c (cxx_print_xnode): Print BASELINK_OPTYPE. PR c++/83988 * g++.dg/template/pr83988.C: New. Index: cp/pt.c === --- cp/pt.c (revision 256981) +++ cp/pt.c (working copy) @@ -14447,11 +14447,8 @@ tsubst_baselink (tree baselink, tree obj fns = BASELINK_FUNCTIONS (baselink); } else -{ - gcc_assert (optype == BASELINK_OPTYPE (baselink)); - /* We're going to overwrite pieces below, make a duplicate. */ - baselink = copy_node (baselink); -} +/* We're going to overwrite pieces below, make a duplicate. */ +baselink = copy_node (baselink); /* If lookup found a single function, mark it as used at this point. (If lookup found multiple functions the one selected later by Index: cp/ptree.c === --- cp/ptree.c (revision 256981) +++ cp/ptree.c (working copy) @@ -215,6 +215,7 @@ cxx_print_xnode (FILE *file, tree node, print_node (file, "binfo", BASELINK_BINFO (node), indent + 4); print_node (file, "access_binfo", BASELINK_ACCESS_BINFO (node), indent + 4); + print_node (file, "optype", BASELINK_OPTYPE (node), indent + 4); break; case OVERLOAD: print_node (file, "function", OVL_FUNCTION (node), indent+4); Index: testsuite/g++.dg/template/pr83988.C === --- testsuite/g++.dg/template/pr83988.C (revision 0) +++ testsuite/g++.dg/template/pr83988.C (working copy) @@ -0,0 +1,16 @@ +// PR 83988 ICE + +template struct optional {}; +struct get_from_json { + template + operator optional() const {return optional ();} + template + optional maybe() const + { +return this->operator optional(); + } +}; +void test() +{ + get_from_json().maybe(); +}
[committed] Add testcases for PR c++/82882 and PR c++/83978
Hi! These 2 PRs were fixed with r256964 and stay fixed even with r256965. So I've just committed the testcase to the trunk and am going to close them as fixed. 2018-01-23 Jakub Jelinek PR c++/82882 PR c++/83978 * g++.dg/cpp0x/pr82882.C: New test. * g++.dg/cpp0x/pr83978.C: New test. --- gcc/testsuite/g++.dg/cpp0x/pr82882.C.jj 2018-01-23 15:01:19.781297360 +0100 +++ gcc/testsuite/g++.dg/cpp0x/pr82882.C2018-01-23 15:00:41.739297731 +0100 @@ -0,0 +1,15 @@ +// PR c++/82882 +// { dg-do compile { target c++11 } } + +template +void +foo () +{ + auto v = [] { enum { E, F }; }; +} + +void +bar () +{ + foo (); +} --- gcc/testsuite/g++.dg/cpp0x/pr83978.C.jj 2018-01-23 15:01:30.086297256 +0100 +++ gcc/testsuite/g++.dg/cpp0x/pr83978.C2018-01-23 15:00:54.721297609 +0100 @@ -0,0 +1,6 @@ +// PR c++/83978 +// { dg-do compile { target c++11 } } + +template +struct A { A () { auto a = [] { enum E { F }; }; } }; +A<0> *p = new A<0> (); Jakub
[PATCH] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)
PR c++/83974 reports an ICE within fold_for_warn when calling cxx_eval_constant_expression on a CAST_EXPR. This comes from a pointer-to-member-function. The result of build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is a CONSTRUCTOR containing, amongst other things, a CAST_EXPR of a TREE_LIST containing the zero INTEGER_CST. After r256804, fold_for_warn within a template calls fold_non_dependent_expr. For this tree, is_nondependent_constant_expression returns true. potential_constant_expression_1 has these cases: case TREE_LIST: { gcc_assert (TREE_PURPOSE (t) == NULL_TREE || DECL_P (TREE_PURPOSE (t))); if (!RECUR (TREE_VALUE (t), want_rval)) return false; if (TREE_CHAIN (t) == NULL_TREE) return true; return RECUR (TREE_CHAIN (t), want_rval); } and: case CAST_EXPR: case CONST_CAST_EXPR: case STATIC_CAST_EXPR: case REINTERPRET_CAST_EXPR: case IMPLICIT_CONV_EXPR: if (cxx_dialect < cxx11 && !dependent_type_p (TREE_TYPE (t)) && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t))) /* In C++98, a conversion to non-integral type can't be part of a constant expression. */ { // reject it } // accept it and thus returns true for the CAST_EXPR and TREE_LIST, and hence for the CONSTRUCTOR as a whole. However, cxx_eval_constant_expression does not support these tree codes, ICEing in the default case with: internal_error ("unexpected expression %qE of kind %s", t, It seems that, given potential_constant_expression_1 can return true for these cases, then cxx_eval_constant_expression ought to handle them, which this patch implements, fixing the ICE. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. OK for trunk? gcc/cp/ChangeLog: PR c++/83974 * constexpr.c (cxx_eval_constant_expression): Handle case TREE_LIST. Handle CAST_EXPR, CONST_CAST_EXPR, STATIC_CAST_EXPR, REINTERPRET_CAST_EXPR and IMPLICIT_CONV_EXPR via cxx_eval_unary_expression. gcc/testsuite/ChangeLog: PR c++/83974 * g++.dg/warn/pr83974.C: New test case.
--- gcc/cp/constexpr.c | 27 +++ gcc/testsuite/g++.dg/warn/pr83974.C | 11 +++ 2 files changed, 38 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/pr83974.C diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index ca7f369..a593132 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -4348,6 +4348,24 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, } break; +case TREE_LIST: + { + tree value, chain; + gcc_assert (TREE_PURPOSE (t) == NULL_TREE + || DECL_P (TREE_PURPOSE (t))); + value = cxx_eval_constant_expression (ctx, TREE_VALUE (t), lval, + non_constant_p, overflow_p, + jump_target); + if (TREE_CHAIN (t)) + chain = cxx_eval_constant_expression (ctx, TREE_CHAIN (t), lval, + non_constant_p, overflow_p, + jump_target); + else + chain = NULL_TREE; + r = tree_cons (TREE_PURPOSE (t), value, chain); + } + break; + case POINTER_PLUS_EXPR: case POINTER_DIFF_EXPR: case PLUS_EXPR: @@ -4594,6 +4612,15 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, return cxx_eval_statement_list (&new_ctx, t, non_constant_p, overflow_p, jump_target); +case CAST_EXPR: +case CONST_CAST_EXPR: +case STATIC_CAST_EXPR: +case REINTERPRET_CAST_EXPR: +case IMPLICIT_CONV_EXPR: + r = cxx_eval_unary_expression (ctx, t, lval, +non_constant_p, overflow_p); + break; + case BIND_EXPR: return cxx_eval_constant_expression (ctx, BIND_EXPR_BODY (t), lval, diff --git a/gcc/testsuite/g++.dg/warn/pr83974.C b/gcc/testsuite/g++.dg/warn/pr83974.C new file mode 100644 index 000..af12c2d --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/pr83974.C @@ -0,0 +1,11 @@ +// { dg-options "-Wtautological-compare" } + +struct A { + typedef void (A::*B) (); + operator B (); +}; +template +struct C { + void foo () { d == 0; } + A d; +}; -- 1.8.5.3
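For context, the kind of user code that builds such a null pointer-to-member-function CONSTRUCTOR is quite ordinary; the following is a minimal illustration (not the PR's testcase, which is the pr83974.C file in the patch above):

```cpp
#include <cassert>

struct A {
  void f() {}
};

// A pointer-to-member-function type; converting a null pointer constant
// to it goes through build_ptrmemfunc in the C++ front end, producing
// the CONSTRUCTOR-with-CAST_EXPR tree described above.
typedef void (A::*B) ();

// Comparing such a pointer against a constant is the situation in which
// -Wtautological-compare ends up calling fold_for_warn.
inline bool is_null(B p) { return p == nullptr; }
```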
Re: [PATCH, wwwdocs] Update GCC 8 release notes for NDS32 port
On Tue, 23 Jan 2018, Chung-Ju Wu wrote: > In previous patch we add new options for NDS32 port: > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01397.html > > So I think I need to update GCC 8 release notes as well. Yes, please. :-) > +New command-line options -mext-perf -mext-perf2 -mext-string Can you write this as "...-mext-perf, -mext-perf2, and -mext-string..." please? Approved with that change. Gerald
[PATCH][AArch64] Fix gcc.target/aarch64/subs_compare_[12].c
Hi all, This patch fixes the testsuite failures gcc.target/aarch64/subs_compare_1.c and subs_compare_2.c. The tests check that we combine a sequence like: sub w2, w0, w1 cmp w0, w1 into subs w2, w0, w1 This is done by a couple of peepholes in aarch64.md. Unfortunately due to scheduling and other optimisations the SUB and CMP can come in a different order: cmp w0, w1 sub w0, w0, w1 And the existing peepholes cannot catch that and we fail to combine the two. This patch adds a peephole that matches the CMP as the first insn and the SUB as the second and outputs a SUBS. This is almost equivalent to the existing peephole that matches SUB first and CMP second except that it doesn't have the restriction that the output register of the SUB has to not be one of the input registers. Remember "sub w0, w0, w1 ; cmp w0, w1" is *not* equivalent to: "subs w0, w0, w1" but "cmp w0, w1 ; sub w0, w0, w1" is. So this is what this patch does. It adds a peephole for the case above and one for the SUB-immediate variant (because the SUB-immediate is represented as PLUS-of-negated-immediate and thus has different RTL structure). Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill 2018-01-23 Kyrylo Tkachov * config/aarch64/aarch64.md: Add peepholes for CMP + SUB -> SUBS and CMP + SUB-immediate -> SUBS. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a6ecb391309494087416913c11f339cf10357977..49095f8f3d995907903c11b68cae25a919204a76 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2430,6 +2430,26 @@ (define_peephole2 } ) +;; Same as the above peephole but with the compare and minus in +;; swapped order. The restriction on overlap between operand 0 +;; and operands 1 and 2 doesn't apply here.
+(define_peephole2 + [(set (reg:CC CC_REGNUM) + (compare:CC + (match_operand:GPI 1 "aarch64_reg_or_zero") + (match_operand:GPI 2 "aarch64_reg_or_zero"))) + (set (match_operand:GPI 0 "register_operand") + (minus:GPI (match_dup 1) + (match_dup 2)))] + "" + [(const_int 0)] + { +emit_insn (gen_sub3_compare1 (operands[0], operands[1], + operands[2])); +DONE; + } +) + (define_peephole2 [(set (match_operand:GPI 0 "register_operand") (plus:GPI (match_operand:GPI 1 "register_operand") @@ -2448,6 +2468,26 @@ (define_peephole2 } ) +;; Same as the above peephole but with the compare and minus in +;; swapped order. The restriction on overlap between operand 0 +;; and operands 1 doesn't apply here. +(define_peephole2 + [(set (reg:CC CC_REGNUM) + (compare:CC + (match_operand:GPI 1 "register_operand") + (match_operand:GPI 3 "const_int_operand"))) + (set (match_operand:GPI 0 "register_operand") + (plus:GPI (match_dup 1) + (match_operand:GPI 2 "aarch64_sub_immediate")))] + "INTVAL (operands[3]) == -INTVAL (operands[2])" + [(const_int 0)] + { +emit_insn (gen_sub3_compare1_imm (operands[0], operands[1], + operands[2], operands[3])); +DONE; + } +) + (define_insn "*sub__" [(set (match_operand:GPI 0 "register_operand" "=r") (minus:GPI (match_operand:GPI 3 "register_operand" "r")
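The equivalence argument above can be checked with a small flag simulation (a hand-rolled Z-flag model, not real AArch64 semantics): when the SUB's destination overlaps an input, a CMP placed after the SUB sees the clobbered register, while a CMP placed before it sees the original operands, which is exactly what SUBS would report.

```cpp
#include <cassert>
#include <cstdint>

// Z flag as "cmp x, y" would compute it: x - y == 0.
inline bool z_flag(std::uint32_t x, std::uint32_t y) { return x - y == 0; }

// sub w0, w0, w1 ; cmp w0, w1 -- the cmp sees the clobbered w0.
inline bool z_sub_then_cmp(std::uint32_t w0, std::uint32_t w1) {
  w0 = w0 - w1;
  return z_flag(w0, w1);
}

// cmp w0, w1 ; sub w0, w0, w1 -- the flags match what subs w0, w0, w1
// would produce, since the cmp still sees the original operands.
inline bool z_cmp_then_sub(std::uint32_t w0, std::uint32_t w1) {
  bool z = z_flag(w0, w1);
  w0 = w0 - w1;
  return z;
}
```

For w0 = w1 = 5, SUBS would set Z; only the cmp-first ordering reproduces that, which is why the swapped-order peephole may drop the overlap restriction.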
Re: [PATCH] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)
On Tue, Jan 23, 2018 at 9:27 AM, David Malcolm wrote: > PR c++/83974 reports an ICE within fold_for_warn when calling > cxx_eval_constant_expression on a CAST_EXPR. > > This comes from a pointer-to-member-function. The result of > build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is > a CONSTRUCTOR containing, amongst other things a CAST_EXPR of a > TREE_LIST containing the zero INTEGER_CST. > > After r256804, fold_for_warn within a template calls > fold_non_dependent_expr. > > For this tree, is_nondependent_constant_expression returns true. > > potential_constant_expression_1 has these cases: > > case TREE_LIST: > { > gcc_assert (TREE_PURPOSE (t) == NULL_TREE > || DECL_P (TREE_PURPOSE (t))); > if (!RECUR (TREE_VALUE (t), want_rval)) > return false; > if (TREE_CHAIN (t) == NULL_TREE) > return true; > return RECUR (TREE_CHAIN (t), want_rval); > } > > and: > > case CAST_EXPR: > case CONST_CAST_EXPR: > case STATIC_CAST_EXPR: > case REINTERPRET_CAST_EXPR: > case IMPLICIT_CONV_EXPR: > if (cxx_dialect < cxx11 > && !dependent_type_p (TREE_TYPE (t)) > && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t))) > /* In C++98, a conversion to non-integral type can't be part of a >constant expression. */ > { > // reject it > } > // accept it > > and thus returns true for the CAST_EXPR and TREE_LIST, and hence for the > CONSTRUCTOR as a whole. > > However, cxx_eval_constant_expression does not support these tree codes, Because they are template-only codes, that cxx_eval_constant_expression should never see. They shouldn't survive the call to instantiate_non_dependent_expr_internal from fold_non_dependent_expr. Jason
Re: [PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust
Hi Christophe, On 23/01/18 13:09, Christophe Lyon wrote: Hi Kyrill, On 22 January 2018 at 11:48, Kyrill Tkachov wrote: Hi all, This test has needlessly restrictive requirements. It tries to force a soft-float target and tries to run. This makes it unsupportable for any non-soft-float variant. In fact, the test can be a run-time test for any target, and only the scan-assembler tests are specific to -mfloat-abi=soft. So this patch makes the test always runnable and makes the scan-assembler checks predicated on the new arm_softfloat effective target check. Unfortunately, the test now fails on armeb-linux-gnueabihf (it used to be unsupported). my logs only show: qemu: uncaught target signal 11 (Segmentation fault) - core dumped :( it runs fine for me on an armeb-none-eabi --with-float=hard target with a Foundation Model. Unfortunately I don't have access to an armeb-linux-gnueabihf setup. Could you provide more detailed configuration options so that I can try to reproduce the exact -march -m[arm,thumb], -mfpu setup? Thanks, Kyrill on arm-linux-gnueabihf, it's OK (the test was unsupported, but now passes) Christophe Committing to trunk. Thanks, Kyrill 2018-01-22 Kyrylo Tkachov * doc/sourcebuild.texi (arm_softfloat): Document. 2018-01-22 Kyrylo Tkachov * lib/target-supports.exp (check_effective_target_arm_softfloat): New procedure. * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere. Adjust scan-assembler checks for soft-float.
Re: [PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust
On 23 January 2018 at 15:57, Kyrill Tkachov wrote: > Hi Christophe, > > On 23/01/18 13:09, Christophe Lyon wrote: >> >> Hi Kyrill, >> >> On 22 January 2018 at 11:48, Kyrill Tkachov >> wrote: >>> >>> Hi all, >>> >>> This test has needlessly restrictive requirements. It tries to force a >>> soft-float target and tries to run. >>> This makes it unsupportable for any non-soft-float variant. >>> In fact, the test can be a run-time test for any target, and only the >>> scan-assembler tests are specific to >>> -mfloat-abi=soft. So this patch makes the test always runnable and makes >>> the >>> scan-assembler checks predicable >>> on the the new arm_sotftfloat effective target check. >>> >> Unfortunately, the test now fails on armeb-linux-gnueabihf (it used to >> be unsupported). >> my logs only show: >> qemu: uncaught target signal 11 (Segmentation fault) - core dumped > > > :( it runs fine for me on a armeb-none-eabi --with-float=hard target with a > Foundation Model. > Unfortunately I don't have access to an armeb-non-linux-gnueabihf setup. > Could you provide more detailed configuration options so that I can try to > reproduce the > exact -march -m[arm,thumb], -mfpu setup? > I'm using qemu in user-mode, and I see this error on the 3 configurations I test: --with-cpu=cortex-a9 --with-mode=arm or thumb --with-fpu=neon-fp16 or vfpv3-d16-fp16 HTH > Thanks, > Kyrill > > >> on arm-linux-gnueabihf, it's OK (the test was unsupported, but now passes) >> >> Christophe >> >>> Committing to trunk. >>> Thanks, >>> Kyrill >>> >>> 2018-01-22 Kyrylo Tkachov >>> >>> * doc/sourcebuild.texi (arm_softfloat): Document. >>> >>> 2018-01-22 Kyrylo Tkachov >>> >>> * lib/target-supports.exp (check_effective_target_arm_softfloat): >>> New procedure. >>> * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere. >>> Adjust scan-assembler checks for soft-float. > >
[aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor
Hi, Here's v2 of the patch to disable register offset addressing mode for stores of 128-bit values on Falkor because they're very costly. Differences from the last version: - Incorporated changes Jim made to his patch earlier that I missed, i.e. adding an extra tuning parameter called SLOW_REGOFFSET_QUADWORD_STORE instead of making it conditional on TUNE_FALKOR. - Added a new query type ADDR_QUERY_STR to indicate the queried address is used for a store. This way I can use it for other scenarios where stores are significantly more expensive than loads, such as pre/post modify addressing modes. - Incorporated the constraint functionality into aarch64_legitimate_address_p and aarch64_classify_address. I evaluated the suggestion of using costs to do this but it's not possible with the current costs as they do not differentiate between loads and stores. If modifying all passes that use these costs to identify loads vs stores is considered OK (ivopts seems to be the biggest user) then I can volunteer to do that work for gcc9 and eventually replace this. On Falkor, because of an idiosyncrasy of how the pipelines are designed, a quad-word store using a reg+reg addressing mode is almost twice as slow as an add followed by a quad-word store with a single reg addressing mode. So we get better performance if we disallow addressing modes using register offsets with quad-word stores. This patch improves fpspeed by 0.3% and intspeed by 0.22% in CPU2017, with omnetpp_s (4.3%) and pop2_s (2.6%) being the biggest winners. 2018-xx-xx Jim Wilson Kugan Vivenakandarajah Siddhesh Poyarekar gcc/ * gcc/config/aarch64/aarch64-protos.h (aarch64_addr_query_type): New member ADDR_QUERY_STR. * gcc/config/aarch64/aarch64-tuning-flags.def (SLOW_REGOFFSET_QUADWORD_STORE): New. * gcc/config/aarch64/aarch64.c (qdf24xx_tunings): Add SLOW_REGOFFSET_QUADWORD_STORE to tuning flags. (aarch64_classify_address): Avoid register indexing for quad mode stores when SLOW_REGOFFSET_QUADWORD_STORE is set.
* gcc/config/aarch64/constraints.md (Uts): New constraint. * gcc/config/aarch64/aarch64.md (movti_aarch64, movtf_aarch64): Use it. * gcc/config/aarch64/aarch64-simd.md (aarch64_simd_mov): Likewise. gcc/testsuite/ * gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case. --- gcc/config/aarch64/aarch64-protos.h | 4 gcc/config/aarch64/aarch64-simd.md | 4 ++-- gcc/config/aarch64/aarch64-tuning-flags.def | 4 gcc/config/aarch64/aarch64.c| 12 +++- gcc/config/aarch64/aarch64.md | 6 +++--- gcc/config/aarch64/constraints.md | 7 +++ gcc/testsuite/gcc.target/aarch64/pr82533.c | 11 +++ 7 files changed, 42 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index ef1b0bc8e28..5fedc85f283 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -120,6 +120,9 @@ enum aarch64_symbol_type ADDR_QUERY_LDP_STP Query what is valid for a load/store pair. + ADDR_QUERY_STR + Query what is valid for a store. + ADDR_QUERY_ANY Query what is valid for at least one memory constraint, which may allow things that "m" doesn't. 
For example, the SVE LDR and STR @@ -128,6 +131,7 @@ enum aarch64_symbol_type enum aarch64_addr_query_type { ADDR_QUERY_M, ADDR_QUERY_LDP_STP, + ADDR_QUERY_STR, ADDR_QUERY_ANY }; diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 3d1f6a01cb7..48d92702723 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -131,9 +131,9 @@ (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, Umq, m, w, ?r, ?w, ?r, w") + "=w, Umq, Uts, w, ?r, ?w, ?r, w") (match_operand:VQ 1 "general_operand" - "m, Dz, w, w, w, r, r, Dn"))] + "m, Dz,w, w, w, r, r, Dn"))] "TARGET_SIMD && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index ea9ead234cb..04baf5b6de6 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW) are not considered cheap. */ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND) +/* Don't use a register offset in a memory address for a quad-word store. */ +AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store", +
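A sketch of the kind of source affected by this tuning (an assumed example; the actual instruction selection depends on flags and tuning): a 16-byte store at a variable offset is a natural candidate for a quad-word store with a register-offset address like "str q0, [x0, x1]", which the flag replaces with an add followed by a single-register address.

```cpp
#include <cassert>
#include <cstring>

// Copy 16 bytes to base + idx.  On AArch64 the store side can use a
// reg+reg addressing mode for the 128-bit store; the tuning flag above
// steers Falkor toward forming the address with a separate add instead.
inline void store_16(unsigned char *base, long idx,
                     const unsigned char *src) {
  std::memcpy(base + idx, src, 16);
}
```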
[PATCH] fix type typo in avx512fintrin.h
I just hit a compile error on AVX512 code. The fix is trivial enough that I didn't bother writing a PR and just fixed it. Acceptable? I hope this doesn't require the paperwork, though my employer is willing to do it anyway. :-) Cheers, Matthias 2018-01-23 Matthias Kretz * config/i386/avx512fintrin.h: Fix signatures of _mm512_abs_pd and _mm512_mask_abs_pd to use __m512d instead of __m512. diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h index 71e36a5..de68675 100644 --- a/gcc/config/i386/avx512fintrin.h +++ b/gcc/config/i386/avx512fintrin.h @@ -7612,7 +7612,7 @@ _mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 __A) extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_abs_pd (__m512 __A) +_mm512_abs_pd (__m512d __A) { return (__m512d) _mm512_and_epi64 ((__m512i) __A, _mm512_set1_epi64 (0x7fffLL)); @@ -7620,7 +7620,7 @@ _mm512_abs_pd (__m512 __A) extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512 __A) +_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A) { return (__m512d) _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A, -- ── Dr. Matthias Kretz https://kretzfamily.de GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de SIMD easy and portable https://github.com/VcDevel/Vc ──
Re: [PATCH] fix type typo in avx512fintrin.h
On Tue, Jan 23, 2018 at 04:46:02PM +0100, Matthias Kretz wrote: > I just hit a compile error on AVX512 code. The fix is trivial enough that I > didn't bother writing a PR and just fixed it. Acceptable? > > I hope this doesn't require the paperwork, though my employer is willing to > do > it anyway. :-) CCing maintainers. > 2018-01-23 Matthias Kretz > > * config/i386/avx512fintrin.h: Fix signatures of _mm512_abs_pd and > _mm512_mask_abs_pd to use __m512d instead of __m512. This should have been: * config/i386/avx512fintrin.h (_mm512_abs_pd, _mm512_mask_abs_pd): Change type of arguments with __m512 type to __m512d. or so. > diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h > index 71e36a5..de68675 100644 > --- a/gcc/config/i386/avx512fintrin.h > +++ b/gcc/config/i386/avx512fintrin.h > @@ -7612,7 +7612,7 @@ _mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 > __A) > > extern __inline __m512d > __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > -_mm512_abs_pd (__m512 __A) > +_mm512_abs_pd (__m512d __A) > { >return (__m512d) _mm512_and_epi64 ((__m512i) __A, > _mm512_set1_epi64 (0x7fffLL)); > @@ -7620,7 +7620,7 @@ _mm512_abs_pd (__m512 __A) > > extern __inline __m512d > __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > -_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512 __A) > +_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A) > { >return (__m512d) > _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A, > > -- > ── > Dr. Matthias Kretz https://kretzfamily.de > GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de > SIMD easy and portable https://github.com/VcDevel/Vc > ── Jakub
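Per lane, _mm512_abs_pd clears the sign bit by ANDing with a mask; the following is a scalar sketch of the same operation (the mask constant in the quoted hunk is line-wrapped by the archive, this uses the full binary64 value):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Absolute value of a double by masking off the sign bit -- the scalar
// equivalent of one lane of the _mm512_and_epi64 in the intrinsic above.
inline double abs_via_mask(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0x7fffffffffffffffULL;  // clear bit 63, the IEEE-754 sign bit
  double r;
  std::memcpy(&r, &bits, sizeof r);
  return r;
}
```

Because this is pure bit manipulation, it also maps -0.0 to +0.0 and leaves NaN payloads intact, matching the vector instruction's behavior.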
Re: [PATCH] PR 83975 Set character length to 0 if unknown
Hi Janne, When associating a variable of type character, if the length of the target isn't known, set it to zero rather than leaving it unset. This is not a complete fix for making associate of characters work properly, but papers over an ICE in the middle-end. See PR 83344 for more details. Regtested on x86_64-pc-linux-gnu, Ok for trunk? Does the code which is accepted then execute correctly? Personally, I would prefer an ICE over a silent wrong-code. Regards Thomas
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
Could somebody please review the patch? Thanks Paul On 23 January 2018 at 06:13, Dominique d'Humières wrote: > Dear Paul, > > The test suite passed without new regression with both -m32 and -m64. > > Thanks for the work, > > Dominique -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
On 01/22/2018 05:13 PM, Paolo Carlini wrote: Hi again, On 22/01/2018 22:50, Paolo Carlini wrote: Ok. The below passes the C++ testsuite and I'm finishing testing it. Therefore, as you already hinted to, we can now say that what was *really* missing from potential_constant_expression_1 was the use of default_init_uninitialized_part, which does all the non-trivial work besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. check_for_uninitialized_const_var also provides the informs, which were completely missing. Grrr. Testing the library revealed immediately the failure of 18_support/byte/ops.cc, because in constexpr_context_p == true, thus from potential_constant_expression_1, the case CP_TYPE_CONST_P triggers. I guess we really want to keep the existing constexpr_context_p == false cases separate. I'm therefore restarting testing with the below. + && ((!constexpr_context_p + && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))) + || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))) && !DECL_INITIAL (decl)) I think I'd replace the DECL_INITIAL check with DECL_NONTRIVIALLY_INITIALIZED_P, which seems more precise. So we ought to be ok with the simpler && (constexpr_context_p || CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) && !DECL_NONTRIVIALLY_INITIALIZED_P (decl)) Jason
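For reference, the kind of code the PR title describes (a reconstruction from the title, not the patch's own testcase): an empty aggregate has no uninitialized subobjects, so defining one without an initializer in a constexpr context must be accepted.

```cpp
#include <cassert>

struct Empty {};  // no data members: nothing can be left uninitialized

constexpr int use_empty() {
  Empty e;   // default_init_uninitialized_part finds nothing missing here
  (void)e;
  return 42;
}

static_assert(use_empty() == 42, "empty aggregate in a constexpr function");
```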
[PATCH] libgcc: xtensa: fix NaN return from add/sub/mul/div helpers
libgcc/ 2018-01-22 Max Filippov * config/xtensa/ieee754-df.S (__adddf3, __subdf3, __muldf3) (__divdf3): Make NaN return value quiet. * config/xtensa/ieee754-sf.S (__addsf3, __subsf3, __mulsf3) (__divsf3): Make NaN return value quiet. --- libgcc/config/xtensa/ieee754-df.S | 54 --- libgcc/config/xtensa/ieee754-sf.S | 51 2 files changed, 74 insertions(+), 31 deletions(-) diff --git a/libgcc/config/xtensa/ieee754-df.S b/libgcc/config/xtensa/ieee754-df.S index 9aa55d1f74a4..2662a6600751 100644 --- a/libgcc/config/xtensa/ieee754-df.S +++ b/libgcc/config/xtensa/ieee754-df.S @@ -64,17 +64,26 @@ __adddf3_aux: .Ladd_xnan_or_inf: /* If y is neither Infinity nor NaN, return x. */ - bnall yh, a6, 1f + bnall yh, a6, .Ladd_return_nan_or_inf /* If x is a NaN, return it. Otherwise, return y. */ sllia7, xh, 12 or a7, a7, xl - beqza7, .Ladd_ynan_or_inf -1: leaf_return + bneza7, .Ladd_return_nan .Ladd_ynan_or_inf: /* Return y. */ mov xh, yh mov xl, yl + +.Ladd_return_nan_or_inf: + sllia7, xh, 12 + or a7, a7, xl + bneza7, .Ladd_return_nan + leaf_return + +.Ladd_return_nan: + movia4, 0x8 /* make it a quiet NaN */ + or xh, xh, a4 leaf_return .Ladd_opposite_signs: @@ -319,17 +328,24 @@ __subdf3_aux: .Lsub_xnan_or_inf: /* If y is neither Infinity nor NaN, return x. */ - bnall yh, a6, 1f + bnall yh, a6, .Lsub_return_nan_or_inf + +.Lsub_return_nan: /* Both x and y are either NaN or Inf, so the result is NaN. */ movia4, 0x8 /* make it a quiet NaN */ or xh, xh, a4 -1: leaf_return + leaf_return .Lsub_ynan_or_inf: /* Negate y and return it. */ sllia7, a6, 11 xor xh, yh, a7 mov xl, yl + +.Lsub_return_nan_or_inf: + sllia7, xh, 12 + or a7, a7, xl + bneza7, .Lsub_return_nan leaf_return .Lsub_opposite_signs: @@ -692,10 +708,7 @@ __muldf3_aux: /* If y is zero, return NaN. */ bnezyl, 1f sllia8, yh, 1 - bneza8, 1f - movia4, 0x8 /* make it a quiet NaN */ - or xh, xh, a4 - j .Lmul_done + beqza8, .Lmul_return_nan 1: /* If y is NaN, return y.
*/ bnall yh, a6, .Lmul_returnx @@ -708,6 +721,9 @@ __muldf3_aux: mov xl, yl .Lmul_returnx: + sllia8, xh, 12 + or a8, a8, xl + bneza8, .Lmul_return_nan /* Set the sign bit and return. */ extui a7, a7, 31, 1 sllixh, xh, 1 @@ -720,8 +736,11 @@ __muldf3_aux: bnezxl, .Lmul_returny sllia8, xh, 1 bneza8, .Lmul_returny - movia7, 0x8 /* make it a quiet NaN */ - or xh, yh, a7 + mov xh, yh + +.Lmul_return_nan: + movia4, 0x8 /* make it a quiet NaN */ + or xh, xh, a4 j .Lmul_done .align 4 @@ -1370,10 +1389,11 @@ __divdf3_aux: sllia7, a7, 31 xor xh, xh, a7 /* If y is NaN or Inf, return NaN. */ - bnall yh, a6, 1f - movia4, 0x8 /* make it a quiet NaN */ - or xh, xh, a4 -1: leaf_return + ballyh, a6, .Ldiv_return_nan + sllia8, xh, 12 + or a8, a8, xl + bneza8, .Ldiv_return_nan + leaf_return .Ldiv_ynan_or_inf: /* If y is Infinity, return zero. */ @@ -1383,6 +1403,10 @@ __divdf3_aux: /* y is NaN; return it. */ mov xh, yh mov xl, yl + +.Ldiv_return_nan: + movia4, 0x8 /* make it a quiet NaN */ + or xh, xh, a4 leaf_return .Ldiv_highequal1: diff --git a/libgcc/config/xtensa/ieee754-sf.S b/libgcc/config/xtensa/ieee754-sf.S index 659f183f6ba8..d48b230a7588 100644 --- a/libgcc/config/xtensa/ieee754-sf.S +++ b/libgcc/config/xtensa/ieee754-sf.S @@ -64,15 +64,23 @@ __addsf3_aux: .Ladd_xnan_or_inf: /* If y is neither Infinity nor NaN, return x. */ - bnall a3, a6, 1f + bnall a3, a6, .Ladd_return_nan_or_inf /* If x is a NaN, return it. Otherwise, return y. */ sllia7, a2, 9 - beqza7, .Ladd_ynan_or_inf -1: leaf_return + bneza7, .Ladd_return_nan .Ladd_ynan_or_inf: /* Return y. */ mov a2, a3 + +.Ladd_return_nan_or_inf: + sllia7, a2, 9 + bneza7, .Ladd_return_nan + leaf_return + +.Ladd_return_nan: + movia6, 0x40/* make it a quiet NaN */ + or a2, a2, a6 leaf_return .Ladd_opposite_signs: @@ -265,16 +273,22 @@ __subsf3_aux: .Lsub_xnan_or_inf: /* If y is neither Infinity nor NaN, return x.
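The quieting step the patch adds, ORing the most-significant mantissa bit into the NaN, looks like this in scalar terms for the binary32 case (the immediates and operand spacing in the quoted hunks are collapsed by the archive; 0x400000 is the standard quiet bit for single precision, assumed here):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Turn any single-precision NaN into a quiet NaN by setting mantissa
// bit 22, mirroring the "make it a quiet NaN" OR in the patch above.
inline float quiet_nan(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits |= 0x00400000u;  // quiet bit for IEEE-754 binary32
  float r;
  std::memcpy(&r, &bits, sizeof r);
  return r;
}
```

The bit twiddling keeps the exponent field all-ones and the mantissa nonzero, so the value stays a NaN but no longer signals.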
[PATCH v2] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)
On Tue, 2018-01-23 at 09:54 -0500, Jason Merrill wrote: > On Tue, Jan 23, 2018 at 9:27 AM, David Malcolm > wrote: > > PR c++/83974 reports an ICE within fold_for_warn when calling > > cxx_eval_constant_expression on a CAST_EXPR. > > > > This comes from a pointer-to-member-function. The result of > > build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is > > a CONSTRUCTOR containing, amongst other things a CAST_EXPR of a > > TREE_LIST containing the zero INTEGER_CST. > > > > After r256804, fold_for_warn within a template calls > > fold_non_dependent_expr. > > > > For this tree, is_nondependent_constant_expression returns true. > > > > potential_constant_expression_1 has these cases: > > > > case TREE_LIST: > > { > > gcc_assert (TREE_PURPOSE (t) == NULL_TREE > > || DECL_P (TREE_PURPOSE (t))); > > if (!RECUR (TREE_VALUE (t), want_rval)) > > return false; > > if (TREE_CHAIN (t) == NULL_TREE) > > return true; > > return RECUR (TREE_CHAIN (t), want_rval); > > } > > > > and: > > > > case CAST_EXPR: > > case CONST_CAST_EXPR: > > case STATIC_CAST_EXPR: > > case REINTERPRET_CAST_EXPR: > > case IMPLICIT_CONV_EXPR: > > if (cxx_dialect < cxx11 > > && !dependent_type_p (TREE_TYPE (t)) > > && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t))) > > /* In C++98, a conversion to non-integral type can't be > > part of a > >constant expression. */ > > { > > // reject it > > } > > // accept it > > > > and thus returns true for the CAST_EXPR and TREE_LIST, and hence > > for the > > CONSTRUCTOR as a whole. > > > > However, cxx_eval_constant_expression does not support these tree > > codes, > > Because they are template-only codes, that > cxx_eval_constant_expression should never see. They shouldn't > survive > the call to instantiate_non_dependent_expr_internal from > fold_non_dependent_expr. > > Jason Aha - thanks. 
instantiate_non_dependent_expr_internal calls tsubst_copy_and_build, and tsubst_copy_and_build is bailing out here: 18103 /* digest_init will do the wrong thing if we let it. */ 18104 if (type && TYPE_PTRMEMFUNC_P (type)) 18105 RETURN (t); leaving the CONSTRUCTOR uncopied, and thus containing the CAST_EXPR and TREE_LIST, leading to the ICE within cxx_eval_constant_expression. The above code dates back to 33643032d70f56c4e00028da8185bcac4023e646 (r61409) from 2003, which added tsubst_copy_and_build. The call to digest_init in tsubst_copy_and_build's CONSTRUCTOR case was removed in 1d8baa0efe4be51729c604adf7be9c36e786edff (r117832, 2006-10-17, fixing PR c++/27270), which amongst other things made this change, removing the use of digest_init: ce->value = RECUR (ce->value); } - r = build_constructor (NULL_TREE, n); - TREE_HAS_CONSTRUCTOR (r) = TREE_HAS_CONSTRUCTOR (t); + if (TREE_HAS_CONSTRUCTOR (t)) + return finish_compound_literal (type, n); - if (type) - { - r = reshape_init (type, r); - return digest_init (type, r); - } - return r; + return build_constructor (NULL_TREE, n); } Given that, is the early bailout for TYPE_PTRMEMFUNC_P still needed? On the assumption that it's outdated, this patch removes it. With that change, tsubst_copy_and_build successfully copies the CONSTRUCTOR, with the CAST_EXPR and TREE_LIST eliminated, fixing the ICE in fold_for_warn. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. OK for trunk? gcc/cp/ChangeLog: PR c++/83974 * pt.c (tsubst_copy_and_build) : Remove early bailout for pointer to member function types. gcc/testsuite/ChangeLog: PR c++/83974 * g++.dg/warn/pr83974.C: New test case.
--- gcc/cp/pt.c | 4 gcc/testsuite/g++.dg/warn/pr83974.C | 11 +++ 2 files changed, 11 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/warn/pr83974.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 0296845..0e48a13 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -18100,10 +18100,6 @@ tsubst_copy_and_build (tree t, if (type == error_mark_node) RETURN (error_mark_node); - /* digest_init will do the wrong thing if we let it. */ - if (type && TYPE_PTRMEMFUNC_P (type)) - RETURN (t); - /* We do not want to process the index of aggregate initializers as they are identifier nodes which will be looked up by digest_init. */ diff --git a/gcc/testsuite/g++.dg/warn/pr83974.C b/gcc/testsuite/g++.dg/warn/pr83974.C new file mode 100644 index 000..af12c2d --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/pr83974.C @@ -0,0 +1,11 @@ +// { dg-options "-Wtautological-compare" } + +struct A { + typedef void (A::*B) (); + operator B (); +}; +template +struct
Re: [PATCH][GCC][Testsuite] Have dg-cmp-results reject log files.
On Jan 23, 2018, at 2:31 AM, Tamar Christina wrote: > > This patch makes dg-cmp-results.sh reject the use of log files in the > comparison. No please. We like to run that script on log files from time to time for various reasons. I'd rather fix anything that prevents that from working as expected. > Often when given a log file dg-cmp-results will give incomplete/wrong output > and > using log instead of sum is one autocomplete tab away. Add a make target if you can't type the right command line?
Re: [C++ PATCH] Deprecate ARM-era for scopes
On Jan 23, 2018, at 4:18 AM, Nathan Sidwell wrote: > > As discussed (https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01778.html) this > patch deprecates the ARM-era for scope. The code gives: if you use %<-fpermissive%> G++ will accept your code I think we should no longer recommend a deprecated feature without at least stating it is deprecated in the hint? Not too important, as if they actually use it, the compiler will spit out: this flexibility is deprecated and will be removed if someone tried to use it, but thought I might mention it.
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
Hi Jerry, Do you mean adding something like this: diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h index 4c643b7e17b..c86e0b45e1d 100644 --- a/libgfortran/libgfortran.h +++ b/libgfortran/libgfortran.h @@ -600,6 +600,7 @@ typedef struct st_parameter_common GFC_INTEGER_4 line; CHARACTER2 (iomsg); GFC_INTEGER_4 *iostat; + void *reserved; } st_parameter_common; Yes, this is what I had in mind. Regards Thomas
Re: [C++ PATCH] Deprecate ARM-era for scopes
On 01/23/2018 01:27 PM, Mike Stump wrote: > I think we should no longer recommend a deprecated feature without at least > stating it is deprecated in the hint? Yeah, I hemmed and hawed about that, but given how bitrotted those diagnostics had become, punted. nathan -- Nathan Sidwell
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
Hi Paul, Could somebody please review the patch? I'd say the patch is OK for trunk. I have also tested this on a big-endian system, and things worked. I will look at the resulting fallout of the GFC_DTYPE_TYPE_SIZE stuff. Thanks a lot for your work! Regards Thomas
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas wrote: > This patch has been triggered by Thomas's recent message to the list. > Not only did I start work late relative to stage 3 but debugging took > somewhat longer than anticipated. Therefore, to get this committed > asap, we will have to beg the indulgence of the release managers and > prompt review and/or testing by fortran maintainers. (Dominique has > already undertaken to test -m32.) > > The patch delivers rank up to 15 for F2008, the descriptor information > needed to enact the F2018 C descriptor macros and an attribute field > to store such information as pointer/allocatable, contiguous etc.. > Only the first has been enabled so far but it was necessary to submit > the array descriptor changes now to avoid any further ABI breakage in > 9.0.0. > > I took the design choice to replace the dtype with a structure: > typedef struct dtype_type > { > size_t elem_len; > int version; > int rank; > int type; > int attribute; > } > dtype_type; > > This choice was intended to reduce the changes to a minimum, since in > most references to the dtype, one dtype is assigned to another. > > The F2018 interop defines the 'type' and 'attribute' fields to be signed > char types. I used this initially but found that using int was the best > way to silence the warnings about padding since it also allows for > more attribute information to be carried. > > Some parts of the patch (eg. in get_scalar_to_descriptor_type) look as > if latent bugs were uncovered by the change to the descriptor. If so, > the time spent debugging was well worthwhile. > > It should be noted that some of the intrinsics, which use switch/case > for the type/kind selection, limit the effective element size that > they handle to the maximum value of size_t, less 7 bits. A bit of > straightforward work there would fix this limitation and would allow > the GFC_DTYPE shifts and masks to be eliminated. 
> > Bootstraps and regtests on FC23/x86_64 - OK for trunk? In trans-types.c: structure dtype_type dtype; Should be "struct dtype_type dtype". (This is in a comment, so doesn't affect the code, but still). I have to say, the patch is much smaller and less invasive than I had feared for such a fundamental change. I guess you were right about making dtype its own type. As for the DTYPE shifting and masking thing, now that I have read the patch, if I understood it correctly, that's an internal issue in libgfortran and has no effect on the descriptor. That being said, the F2018 C descriptor has both the type and kind encoded into the type field, see table 18.4 in F2018 N2146. (In a way that's redundant, since the type and the elem_len fields suffice to figure out the type&kind combination). Anyway, is there a case for following suit, and if so, is it a too invasive change at this point? In any case, since we seem to agree that we have no big lingering issues that would require us to break the ABI again for GCC 9, IMHO this is Ok for trunk. You might want to get an explicit Ok from the release manager, though. -- Janne Blomqvist
[Patch, fortran] PR83898 - [6/7/8 Regression] ICE in gfc_conv_expr_descriptor, at fortran/trans-array.c:7181
Committed as 'obvious' in revision 256994. I will attend to 6- and 7-branches in a little while. Paul 2018-01-23 Paul Thomas PR fortran/83898 * trans-stmt.c (trans_associate_var): Do not set cst_array_ctor for characters. 2018-01-23 Paul Thomas PR fortran/83898 * gfortran.dg/associate_33.f03 : New test. Index: gcc/fortran/trans-stmt.c === *** gcc/fortran/trans-stmt.c(revision 256606) --- gcc/fortran/trans-stmt.c(working copy) *** trans_associate_var (gfc_symbol *sym, gf *** 1579,1585 desc = sym->backend_decl; cst_array_ctor = e->expr_type == EXPR_ARRAY ! && gfc_constant_array_constructor_p (e->value.constructor); /* If association is to an expression, evaluate it and create temporary. Otherwise, get descriptor of target for pointer assignment. */ --- 1579,1586 desc = sym->backend_decl; cst_array_ctor = e->expr_type == EXPR_ARRAY ! && gfc_constant_array_constructor_p (e->value.constructor) ! && e->ts.type != BT_CHARACTER; /* If association is to an expression, evaluate it and create temporary. Otherwise, get descriptor of target for pointer assignment. */ Index: gcc/testsuite/gfortran.dg/associate_33.f03 === *** gcc/testsuite/gfortran.dg/associate_33.f03 (nonexistent) --- gcc/testsuite/gfortran.dg/associate_33.f03 (working copy) *** *** 0 --- 1,11 + ! { dg-do run } + ! + ! Test the fix for PR83898.f90 + ! + ! Contributed by G Steinmetz + ! + program p +associate (x => ['1','2']) + if (any (x .ne. ['1','2'])) call abort +end associate + end
[Patch, fortran] PR83866 - [8 Regression] ICE in gfc_release_symbol, at fortran/symbol.c:3087
Committed as 'obvious' in revision 256995. Closing PR. Paul 2018-01-23 Paul Thomas PR fortran/83866 * decl.c (gfc_match_derived_decl): If eos not matched, recover and emit error about garbage after declaration. 2018-01-23 Paul Thomas PR fortran/83866 * gfortran.dg/pdt_29.f03 : New test. Index: gcc/fortran/decl.c === *** gcc/fortran/decl.c (revision 256606) --- gcc/fortran/decl.c (working copy) *** gfc_match_derived_decl (void) *** 9856,9862 gfc_error_recovery (); m = gfc_match_eos (); if (m != MATCH_YES) ! return m; sym->attr.pdt_template = 1; } --- 9856,9865 gfc_error_recovery (); m = gfc_match_eos (); if (m != MATCH_YES) ! { ! gfc_error_recovery (); ! gfc_error_now ("Garbage after PARAMETERIZED TYPE declaration at %C"); ! } sym->attr.pdt_template = 1; } Index: gcc/testsuite/gfortran.dg/pdt_29.f03 === *** gcc/testsuite/gfortran.dg/pdt_29.f03(nonexistent) --- gcc/testsuite/gfortran.dg/pdt_29.f03(working copy) *** *** 0 --- 1,15 + ! { dg-do compile } + ! + ! Test the fix for PR83866.f90 + ! + ! Contributed by G Steinmetz + ! + program p + type private + end type +type t + class(t), pointer :: a +end type +type extends(t) :: t2 ! { dg-error "Garbage after | does not have a component" } +end type + end
Re: [PATCH] PR 83975 Set character length to 0 if unknown
On Tue, Jan 23, 2018 at 6:33 PM, Thomas Koenig wrote: > Hi Janne, > >> When associating a variable of type character, if the length of the >> target isn't known, set it to zero rather than leaving it unset. This >> is not a complete fix for making associate of characters work >> properly, but papers over an ICE in the middle-end. See PR 83344 for >> more details. >> >> Regtested on x86_64-pc-linux-gnu, Ok for trunk? > > > Does the code which is accepted then execute correctly? Hmm, prompted by your question I did some investigating, and I present to you evidence of GFortran being a quantum mechanical compiler (spooky action at a distance!): Consider the slightly fleshed out testcase from PR 83975: program foo call s("foo") contains subroutine s(x) character(*) :: x print *, "X:", x, ":ENDX" print *, "len(x): ", len(x) ! associate (y => x) !print *, "Y:", y, ":ENDY" !print *, "len(y): ", len(y) ! end associate end subroutine s end program foo With the associate stuff commented out, it's fairly bog-standard stuff, and the output is the expected: X:foo:ENDX len(x):3 Now, incorporating the commented out associate block, and the result is: X::ENDX len(x):0 Y::ENDY len(y):0 So the associate construct manages to somehow break the variable X in the outer scope! -- Janne Blomqvist
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
Janne, Thanks. Jakub, is this OK with you? Paul On 23 January 2018 at 19:09, Janne Blomqvist wrote: > In any case, since we seem to agree that we have no big lingering > issues that would require us to break the ABI again for GCC 9, IMHO > this is Ok for trunk. You might want to get an explicit Ok from the > release manager, though. -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
Hi, On 23/01/2018 18:38, Jason Merrill wrote: On 01/22/2018 05:13 PM, Paolo Carlini wrote: Hi again, On 22/01/2018 22:50, Paolo Carlini wrote: Ok. The below passes the C++ testsuite and I'm finishing testing it. Therefore, as you already hinted to, we can now say that what was *really* missing from potential_constant_expression_1 was the use of default_init_uninitialized_part, which does all the non-trivial work besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. check_for_uninitialized_const_var also provides the informs, which were completely missing. Grrr. Testing the library revealed immediately the failure of 18_support/byte/ops.cc, because in constexpr_context_p == true, thus from potential_constant_expression_1, the case CP_TYPE_CONST_P triggers. I guess we really want to keep the existing constexpr_context_p == false cases separate. I'm therefore restarting testing with the below. + && ((!constexpr_context_p + && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))) + || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))) && !DECL_INITIAL (decl)) I think I'd replace the DECL_INITIAL check with DECL_NONTRIVIALLY_INITIALIZED_P, which seems more precise. So we ought to be ok with the simpler && (constexpr_context_p || CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) && !DECL_NONTRIVIALLY_INITIALIZED_P (decl)) Oh nice, thanks Jason. I'm therefore finishing regtesting the below, it's currently half way through the library testsuite. Ok if it passes? Thanks again, Paolo. 
Index: cp/constexpr.c === --- cp/constexpr.c (revision 256991) +++ cp/constexpr.c (working copy) @@ -5707,13 +5707,9 @@ potential_constant_expression_1 (tree t, bool want "% in % context", tmp); return false; } - else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp)) - { - if (flags & tf_error) - error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized " - "variable %qD in % context", tmp); - return false; - } + else if (!check_for_uninitialized_const_var + (tmp, /*constexpr_context_p=*/true, flags)) + return false; } return RECUR (tmp, want_rval); Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 256991) +++ cp/cp-tree.h(working copy) @@ -6221,6 +6221,7 @@ extern tree finish_case_label (location_t, tree, extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t); extern bool check_array_designated_initializer (constructor_elt *, unsigned HOST_WIDE_INT); +extern bool check_for_uninitialized_const_var (tree, bool, tsubst_flags_t); /* in decl2.c */ extern void record_mangling(tree, bool); Index: cp/decl.c === --- cp/decl.c (revision 256991) +++ cp/decl.c (working copy) @@ -72,7 +72,6 @@ static int check_static_variable_definition (tree, static void record_unknown_type (tree, const char *); static tree builtin_function_1 (tree, tree, bool); static int member_function_or_else (tree, tree, enum overload_flags); -static void check_for_uninitialized_const_var (tree); static tree local_variable_p_walkfn (tree *, int *, void *); static const char *tag_name (enum tag_types); static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool); @@ -5545,10 +5544,14 @@ maybe_commonize_var (tree decl) } } -/* Issue an error message if DECL is an uninitialized const variable. */ +/* Issue an error message if DECL is an uninitialized const variable. + CONSTEXPR_CONTEXT_P is true when the function is called in a constexpr + context from potential_constant_expression. Returns true if all is well, + false otherwise. 
*/ -static void -check_for_uninitialized_const_var (tree decl) +bool +check_for_uninitialized_const_var (tree decl, bool constexpr_context_p, + tsubst_flags_t complain) { tree type = strip_array_types (TREE_TYPE (decl)); @@ -5557,26 +5560,38 @@ maybe_commonize_var (tree decl) 7.1.6 */ if (VAR_P (decl) && TREE_CODE (type) != REFERENCE_TYPE - && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) - && !DECL_INITIAL (decl)) + && (constexpr_context_p + || CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) + && !DECL_NONTRIVIALLY_INITIALIZED_P (decl)) { tree field = default_init_uninitialized_part (type); if (!field) - return; + return true; - if (CP_TYPE_CONST_P (type)) - permerror (DECL_SOURCE_LOCATION (decl), - "uninitialized const %qD", decl); - e
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On Tue, Jan 23, 2018 at 07:30:49PM +, Paul Richard Thomas wrote: > Janne, Thanks. > > Jakub, is this OK with you? It is indeed quite late for such large ABI changes, some distributions are about to start using the compiler by now. How much was it tested (on which targets)? Has the debug info side of things been adjusted too (the get_array_descr_info langhook)? > >> I took the design choice choice to replace the dtype with a structure: > >> typedef struct dtype_type > >> { > >> size_t elem_len; > >> int version; > >> int rank; > >> int type; > >> int attribute; > >> } > >> dtype_type; Isn't this too wasteful? rank will be just 0-15, right? What values can version have? What type? Do you need negative values for any of those? I think using unsigned char or unsigned short for some of those fields should be sufficient, yeah, they don't necessarily need to be bitfields. Jakub
[committed] Fix LTO ICE on OpenMP privatized member DECL_VALUE_EXPRs (PR sanitizer/83987)
Hi! DECL_OMP_PRIVATIZED_MEMBER vars are artificial, DECL_IGNORED_P and useful only during OpenMP lowering, their DECL_VALUE_EXPR isn't really useful afterwards (just shouldn't be cleared because then we could try to expand those vars if we don't optimize them as unused). The sanitizer can add calls in them to UBSAN_VPTR etc. though, which LTO streaming doesn't like. The following patch just makes sure it doesn't see them. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-01-23 Jakub Jelinek PR sanitizer/83987 * tree.c (cp_free_lang_data): Change DECL_VALUE_EXPR of DECL_OMP_PRIVATIZED_MEMBER vars to error_mark_node. * g++.dg/ubsan/pr83987.C: New test. --- gcc/cp/tree.c.jj2018-01-17 22:00:06.878228591 +0100 +++ gcc/cp/tree.c 2018-01-23 10:17:03.810795156 +0100 @@ -5274,6 +5274,16 @@ cp_free_lang_data (tree t) /* We do not need the leftover chaining of namespaces from the binding level. */ DECL_CHAIN (t) = NULL_TREE; + /* Set DECL_VALUE_EXPRs of OpenMP privatized member artificial + decls to error_mark_node. These are DECL_IGNORED_P and after + OpenMP lowering they aren't useful anymore. Clearing DECL_VALUE_EXPR + doesn't work, as expansion could then consider them as something + to be expanded. */ + if (VAR_P (t) + && DECL_LANG_SPECIFIC (t) + && DECL_OMP_PRIVATIZED_MEMBER (t) + && DECL_IGNORED_P (t)) +SET_DECL_VALUE_EXPR (t, error_mark_node); } /* Stub for c-common. Please keep in sync with c-decl.c. --- gcc/testsuite/g++.dg/ubsan/pr83987.C.jj 2018-01-23 10:35:37.158192734 +0100 +++ gcc/testsuite/g++.dg/ubsan/pr83987.C2018-01-23 10:37:04.819118976 +0100 @@ -0,0 +1,15 @@ +// PR sanitizer/83987 +// { dg-do compile { target fopenmp } } +// { dg-options "-fopenmp -fsanitize=vptr -O0" } + +struct A { int i; }; +struct B : virtual A { void foo (); }; + +void +B::foo () +{ +#pragma omp sections lastprivate (i) + { +i = 0; + } +} Jakub
Fix PR rtl-optimization/81443
This is a combinatorial explosion in the combiner for the num_sign_bit_copies mechanism present on the 7 branch and mainline during the bootstrap on MIPS n32, i.e. a 64-bit WORD_REGISTER_OPERATIONS SIGN_EXTEND target. While I'm still exploring various approaches to fixing it on the mainline, Richard B. agreed to have it fixed on the 7 branch by reverting the problematic bits from the PR rtl-optimization/59461 patch that introduced it. Manually tested on mips64-unknown-linux-gnu, applied on the 7 branch. 2018-01-23 Eric Botcazou PR rtl-optimization/81443 * rtlanal.c (num_sign_bit_copies1) : Do not propagate results from inner REGs to paradoxical SUBREGs. -- Eric BotcazouIndex: rtlanal.c === --- rtlanal.c (revision 256841) +++ rtlanal.c (working copy) @@ -4976,7 +4976,7 @@ num_sign_bit_copies1 (const_rtx x, machi if (WORD_REGISTER_OPERATIONS && load_extend_op (inner_mode) == SIGN_EXTEND && paradoxical_subreg_p (x) - && (MEM_P (SUBREG_REG (x)) || REG_P (SUBREG_REG (x + && MEM_P (SUBREG_REG (x))) return cached_num_sign_bit_copies (SUBREG_REG (x), mode, known_x, known_mode, known_ret); break;
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
OK, thanks. On Tue, Jan 23, 2018 at 2:37 PM, Paolo Carlini wrote: > I'm therefore finishing regtesting the below, it's currently half way > through the library testsuite. Ok if it passes? > > Thanks again, > Paolo.
Re: [PATCH v2] C++: Fix ICE in fold_for_warn on CAST_EXPR (PR c++/83974)
On Tue, Jan 23, 2018 at 1:06 PM, David Malcolm wrote: > On Tue, 2018-01-23 at 09:54 -0500, Jason Merrill wrote: >> On Tue, Jan 23, 2018 at 9:27 AM, David Malcolm >> wrote: >> > PR c++/83974 reports an ICE within fold_for_warn when calling >> > cxx_eval_constant_expression on a CAST_EXPR. >> > >> > This comes from a pointer-to-member-function. The result of >> > build_ptrmemfunc (within cp_convert_to_pointer for a null ptr) is >> > a CONSTRUCTOR containing, amongst other things a CAST_EXPR of a >> > TREE_LIST containing the zero INTEGER_CST. >> > >> > After r256804, fold_for_warn within a template calls >> > fold_non_dependent_expr. >> > >> > For this tree, is_nondependent_constant_expression returns true. >> > >> > potential_constant_expression_1 has these cases: >> > >> > case TREE_LIST: >> > { >> > gcc_assert (TREE_PURPOSE (t) == NULL_TREE >> > || DECL_P (TREE_PURPOSE (t))); >> > if (!RECUR (TREE_VALUE (t), want_rval)) >> > return false; >> > if (TREE_CHAIN (t) == NULL_TREE) >> > return true; >> > return RECUR (TREE_CHAIN (t), want_rval); >> > } >> > >> > and: >> > >> > case CAST_EXPR: >> > case CONST_CAST_EXPR: >> > case STATIC_CAST_EXPR: >> > case REINTERPRET_CAST_EXPR: >> > case IMPLICIT_CONV_EXPR: >> > if (cxx_dialect < cxx11 >> > && !dependent_type_p (TREE_TYPE (t)) >> > && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t))) >> > /* In C++98, a conversion to non-integral type can't be >> > part of a >> >constant expression. */ >> > { >> > // reject it >> > } >> > // accept it >> > >> > and thus returns true for the CAST_EXPR and TREE_LIST, and hence >> > for the >> > CONSTRUCTOR as a whole. >> > >> > However, cxx_eval_constant_expression does not support these tree >> > codes, >> >> Because they are template-only codes, that >> cxx_eval_constant_expression should never see. They shouldn't >> survive >> the call to instantiate_non_dependent_expr_internal from >> fold_non_dependent_expr. >> >> Jason > > Aha - thanks. 
> > instantiate_non_dependent_expr_internal calls tsubst_copy_and_build, and > tsubst_copy_and_build is bailing out here: > > 18103 /* digest_init will do the wrong thing if we let it. */ > 18104 if (type && TYPE_PTRMEMFUNC_P (type)) > 18105 RETURN (t); > > leaving the CONSTRUCTOR uncopied, and thus containing the CAST_EXPR and > TREE_LIST, leading to the ICE within cxx_eval_constant_expression. Wow, how has that not broken things before now? The patch is OK. Jason
Re: [PATCH] libgcc: xtensa: fix NaN return from add/sub/mul/div helpers
On Tue, Jan 23, 2018 at 9:55 AM, Max Filippov wrote: > libgcc/ > 2018-01-22 Max Filippov > > * config/xtensa/ieee754-df.S (__addsf3, __subsf3, __mulsf3) > (__divsf3): Make NaN return value quiet. > * config/xtensa/ieee754-sf.S (__adddf3, __subdf3, __muldf3) > (__divdf3): Make NaN return value quiet. This is fine. Please apply.
C++ PATCH for c++/83947, ICE with auto deduction
In this testcase, we were deducing a type for g from the function f which has not yet been deduced itself. Fixed by recognizing the case of an undeduced initializer. The change to undeduced_auto_decl is necessary because auto parameters are a different story; they act more like normal template parameters. Tested x86_64-pc-linux-gnu, applying to trunk. commit 28ec0bede14adc98ac9cec6693f8a4c0ccf39d14 Author: Jason Merrill Date: Tue Jan 23 13:47:13 2018 -0500 PR c++/83947 - ICE with auto declarations. * pt.c (do_auto_deduction): Don't deduce from an auto decl. * decl.c (undeduced_auto_decl): Limit to vars and fns. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index f6fab422d17..e6d9289abd2 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -16189,15 +16189,17 @@ fndecl_declared_return_type (tree fn) return TREE_TYPE (TREE_TYPE (fn)); } -/* Returns true iff DECL was declared with an auto type and it has - not yet been deduced to a real type. */ +/* Returns true iff DECL is a variable or function declared with an auto type + that has not yet been deduced to a real type. */ bool undeduced_auto_decl (tree decl) { if (cxx_dialect < cxx11) return false; - return type_uses_auto (TREE_TYPE (decl)); + return ((VAR_OR_FUNCTION_DECL_P (decl) + || TREE_CODE (decl) == TEMPLATE_DECL) + && type_uses_auto (TREE_TYPE (decl))); } /* Complain if DECL has an undeduced return type. */ diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 0296845a31b..d5d263c1c74 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -25891,6 +25891,10 @@ do_auto_deduction (tree type, tree init, tree auto_node, from ahead of time isn't worth the trouble. */ return type; + /* Similarly, we can't deduce from another undeduced decl. */ + if (init && undeduced_auto_decl (init)) +return type; + if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node)) /* C++17 class template argument deduction. 
*/ return do_class_deduction (type, tmpl, init, flags, complain); diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn46.C b/gcc/testsuite/g++.dg/cpp1y/auto-fn46.C new file mode 100644 index 000..120a4dd9e7c --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn46.C @@ -0,0 +1,6 @@ +// PR c++/83947 +// { dg-do compile { target c++14 } } + +auto f (); +template < int > auto g (f); // { dg-error "before deduction" } +auto h = g < 0 > ();
Re: [PATCH 4/6] [ARC] Rework delegitimate_address hook
* Claudiu Zissulescu [2017-11-02 13:30:33 +0100]: > From: claziss > > Delegitimize address is used to undo the obfuscating effect of PIC > addresses, returning the address in a way which is understood by the > compiler. > > gcc/ > 2017-04-25 Claudiu Zissulescu > > * config/arc/arc.c (arc_delegitimize_address_0): Refactored to > recognize new pic like addresses. > (arc_delegitimize_address): Clean up. > > testsuite/ > 2017-08-31 Claudiu Zissulescu > > * testsuite/gcc.target/arc/tdelegitimize_addr.c: New test. Assuming this has passed all of the tests, then this change is fine with me. The commit message you propose above describes what delegitimize does in general, but it doesn't really explain why _this_ change is needed. You remove a lot of code, it would be nice if the commit message explained why we're able to drop all of this complexity. Thanks, Andrew > --- > gcc/config/arc/arc.c | 91 > ++- > gcc/testsuite/gcc.target/arc/tdelegitimize_addr.c | 23 ++ > 2 files changed, 62 insertions(+), 52 deletions(-) > create mode 100755 gcc/testsuite/gcc.target/arc/tdelegitimize_addr.c > > diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c > index e7194a2..07dd072 100644 > --- a/gcc/config/arc/arc.c > +++ b/gcc/config/arc/arc.c > @@ -9506,68 +9506,55 @@ arc_legitimize_address (rtx orig_x, rtx oldx, > machine_mode mode) > } > > static rtx > -arc_delegitimize_address_0 (rtx x) > +arc_delegitimize_address_0 (rtx op) > { > - rtx u, gp, p; > - > - if (GET_CODE (x) == CONST && GET_CODE (u = XEXP (x, 0)) == UNSPEC) > + switch (GET_CODE (op)) > { > - if (XINT (u, 1) == ARC_UNSPEC_GOT > - || XINT (u, 1) == ARC_UNSPEC_GOTOFFPC) > - return XVECEXP (u, 0, 0); > +case CONST: > + return arc_delegitimize_address_0 (XEXP (op, 0)); > + > +case UNSPEC: > + switch (XINT (op, 1)) > + { > + case ARC_UNSPEC_GOT: > + case ARC_UNSPEC_GOTOFFPC: > + return XVECEXP (op, 0, 0); > + default: > + break; > + } > + break; > + > +case PLUS: > + { > + rtx t1 = arc_delegitimize_address_0 (XEXP (op, 0)); 
> + rtx t2 = XEXP (op, 1); > + > + if (t1 && t2) > + return gen_rtx_PLUS (GET_MODE (op), t1, t2); > + break; > + } > + > +default: > + break; > } > - else if (GET_CODE (x) == CONST && GET_CODE (p = XEXP (x, 0)) == PLUS > -&& GET_CODE (u = XEXP (p, 0)) == UNSPEC > -&& (XINT (u, 1) == ARC_UNSPEC_GOT > -|| XINT (u, 1) == ARC_UNSPEC_GOTOFFPC)) > -return gen_rtx_CONST > - (GET_MODE (x), > - gen_rtx_PLUS (GET_MODE (p), XVECEXP (u, 0, 0), XEXP (p, 1))); > - else if (GET_CODE (x) == PLUS > -&& ((REG_P (gp = XEXP (x, 0)) > - && REGNO (gp) == PIC_OFFSET_TABLE_REGNUM) > -|| (GET_CODE (gp) == CONST > -&& GET_CODE (u = XEXP (gp, 0)) == UNSPEC > -&& XINT (u, 1) == ARC_UNSPEC_GOT > -&& GET_CODE (XVECEXP (u, 0, 0)) == SYMBOL_REF > -&& !strcmp (XSTR (XVECEXP (u, 0, 0), 0), "_DYNAMIC"))) > -&& GET_CODE (XEXP (x, 1)) == CONST > -&& GET_CODE (u = XEXP (XEXP (x, 1), 0)) == UNSPEC > -&& XINT (u, 1) == ARC_UNSPEC_GOTOFF) > -return XVECEXP (u, 0, 0); > - else if (GET_CODE (x) == PLUS && GET_CODE (XEXP (x, 0)) == PLUS > -&& ((REG_P (gp = XEXP (XEXP (x, 0), 1)) > - && REGNO (gp) == PIC_OFFSET_TABLE_REGNUM) > -|| (GET_CODE (gp) == CONST > -&& GET_CODE (u = XEXP (gp, 0)) == UNSPEC > -&& XINT (u, 1) == ARC_UNSPEC_GOT > -&& GET_CODE (XVECEXP (u, 0, 0)) == SYMBOL_REF > -&& !strcmp (XSTR (XVECEXP (u, 0, 0), 0), "_DYNAMIC"))) > -&& GET_CODE (XEXP (x, 1)) == CONST > -&& GET_CODE (u = XEXP (XEXP (x, 1), 0)) == UNSPEC > -&& XINT (u, 1) == ARC_UNSPEC_GOTOFF) > -return gen_rtx_PLUS (GET_MODE (x), XEXP (XEXP (x, 0), 0), > - XVECEXP (u, 0, 0)); > - else if (GET_CODE (x) == PLUS > -&& (u = arc_delegitimize_address_0 (XEXP (x, 1 > -return gen_rtx_PLUS (GET_MODE (x), XEXP (x, 0), u); >return NULL_RTX; > } > > static rtx > -arc_delegitimize_address (rtx x) > +arc_delegitimize_address (rtx orig_x) > { > - rtx orig_x = x = delegitimize_mem_from_attrs (x); > - if (GET_CODE (x) == MEM) > + rtx x = orig_x; > + > + if (MEM_P (x)) > x = XEXP (x, 0); > + >x = arc_delegitimize_address_0 (x); > - if (x) > -{ > - if 
(MEM_P (orig_x)) > - x = replace_equiv_address_nv (orig_x, x); > - return x; > -} > - return orig_x; > + if (!x) > +return orig_x; > + > + if (MEM_P (orig_x)) > +x = replace_equiv_address_nv (orig_x, x); > + return x; > } > > /* Return a REG rtx for acc1. N.B. the gcc-internal representation may > diff --git a/gcc/testsuite/gcc.target/arc/tdelegitimize_addr.c
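The refactored arc_delegitimize_address_0 above is essentially a recursive walk that strips PIC wrappers from an address expression. A toy C model of that shape follows — the node kinds and tree layout here are invented for illustration; the real code walks GCC RTL and, in the PLUS case, rebuilds the sum with gen_rtx_PLUS rather than discarding the second operand:

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression tree: CONST_WRAP and UNSPEC_GOT stand in for the RTL
   CONST and UNSPEC wrappers that PIC address generation adds.  */
enum kind { REG_NODE, CONST_WRAP, UNSPEC_GOT, PLUS_NODE };

struct expr
{
  enum kind k;
  struct expr *op0, *op1;	/* children; NULL when unused */
};

/* Strip PIC wrappers, mirroring the recursive switch in the patch:
   recurse through CONST, return the wrapped operand of a GOT unspec,
   and look inside the first arm of a PLUS.  Returns NULL when the
   expression is not a recognized PIC address (the real code returns
   NULL_RTX).  */
static struct expr *
delegitimize (struct expr *e)
{
  switch (e->k)
    {
    case CONST_WRAP:
      return delegitimize (e->op0);
    case UNSPEC_GOT:
      return e->op0;		/* the wrapped symbol/register */
    case PLUS_NODE:
      return delegitimize (e->op0);	/* real code re-adds op1 here */
    default:
      return NULL;
    }
}
```

The point of the refactor is visible even in the toy: one switch with recursion replaces the long chain of hand-matched shapes in the removed code.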
Re: [PATCH] libgcc: xtensa: fix NaN return from add/sub/mul/div helpers
On Tue, Jan 23, 2018 at 1:07 PM, augustine.sterl...@gmail.com wrote:
> On Tue, Jan 23, 2018 at 9:55 AM, Max Filippov wrote:
>> libgcc/
>> 2018-01-22 Max Filippov
>>
>> * config/xtensa/ieee754-df.S (__adddf3, __subdf3, __muldf3)
>> (__divdf3): Make NaN return value quiet.
>> * config/xtensa/ieee754-sf.S (__addsf3, __subsf3, __mulsf3)
>> (__divsf3): Make NaN return value quiet.
>
> This is fine. Please apply.

Thanks, applied to trunk and backported to gcc-6 and gcc-7 branches.

-- Max
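For readers unfamiliar with the distinction the patch addresses: IEEE 754-2008 encodes a quiet NaN by setting the most significant bit of the fraction field, so "making the NaN return value quiet" means setting that bit before returning. A minimal C sketch of the single-precision case — an illustration of the bit manipulation, not the xtensa assembly itself:

```c
#include <assert.h>
#include <stdint.h>

/* Set the quiet bit (the MSB of the 23-bit fraction field) in a
   single-precision NaN bit pattern; any non-NaN pattern, including
   infinities, passes through unchanged.  */
static uint32_t
quiet_nan_sf (uint32_t bits)
{
  uint32_t exp_mask  = 0x7f800000u;	/* 8 exponent bits */
  uint32_t frac_mask = 0x007fffffu;	/* 23 fraction bits */
  if ((bits & exp_mask) == exp_mask && (bits & frac_mask) != 0)
    bits |= 0x00400000u;		/* quiet bit: fraction MSB */
  return bits;
}
```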
[PATCH] RISC-V: Add -mpreferred-stack-boundary option.
This adds a -mpreferred-stack-boundary option patterned after the one supported by the x86 port. This allows one to reduce the size of stack frames to reduce memory usage. This was tested with an rv32i/ilp32 target using -mpreferred-stack-boundary=3 to get 8-byte alignment instead of the default 16-byte alignment. There were no regressions other than gcc.dg/stack-usage-1.c, which is expected to fail as it checks for the default stack frame size. Committed.

Jim

2018-01-23 Andrew Waterman

gcc/
* config/riscv/riscv.c (riscv_stack_boundary): New.
(riscv_option_override): Set riscv_stack_boundary. Handle
riscv_preferred_stack_boundary_arg.
* config/riscv/riscv.h (MIN_STACK_BOUNDARY, ABI_STACK_BOUNDARY): New.
(BIGGEST_ALIGNMENT): Set to STACK_BOUNDARY.
(STACK_BOUNDARY): Set to riscv_stack_boundary.
(RISCV_STACK_ALIGN): Use STACK_BOUNDARY.
* config/riscv/riscv.opt (mpreferred-stack-boundary): New.
* doc/invoke.texi (RISC-V Options): Add -mpreferred-stack-boundary.

--- gcc/config/riscv/riscv.c | 17 + gcc/config/riscv/riscv.h | 17 - gcc/config/riscv/riscv.opt | 4 gcc/doc/invoke.texi| 11 +++ 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 20660a4061a..4ef7a1774c4 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -222,6 +222,9 @@ struct riscv_cpu_info { /* Whether unaligned accesses execute very slowly. */ bool riscv_slow_unaligned_access_p; +/* Stack alignment to assume/maintain. */ +unsigned riscv_stack_boundary; + /* Which tuning parameters to use. */ static const struct riscv_tune_info *tune_info; @@ -4111,6 +4114,20 @@ riscv_option_override (void) /* We do not yet support ILP32 on RV64. */ if (BITS_PER_WORD != POINTER_SIZE) error ("ABI requires -march=rv%d", POINTER_SIZE); + + /* Validate -mpreferred-stack-boundary= value.
*/ + riscv_stack_boundary = ABI_STACK_BOUNDARY; + if (riscv_preferred_stack_boundary_arg) +{ + int min = ctz_hwi (MIN_STACK_BOUNDARY / 8); + int max = 8; + + if (!IN_RANGE (riscv_preferred_stack_boundary_arg, min, max)) + error ("-mpreferred-stack-boundary=%d must be between %d and %d", + riscv_preferred_stack_boundary_arg, min, max); + + riscv_stack_boundary = 8 << riscv_preferred_stack_boundary_arg; +} } /* Implement TARGET_CONDITIONAL_REGISTER_USAGE. */ diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index cd37761b629..a002bff4480 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -123,8 +123,14 @@ along with GCC; see the file COPYING3. If not see /* Allocation boundary (in *bits*) for the code of a function. */ #define FUNCTION_BOUNDARY (TARGET_RVC ? 16 : 32) +/* The smallest supported stack boundary the calling convention supports. */ +#define MIN_STACK_BOUNDARY (2 * BITS_PER_WORD) + +/* The ABI stack alignment. */ +#define ABI_STACK_BOUNDARY 128 + /* There is no point aligning anything to a rounder boundary than this. */ -#define BIGGEST_ALIGNMENT 128 +#define BIGGEST_ALIGNMENT STACK_BOUNDARY /* The user-level ISA permits unaligned accesses, but they are not required of the privileged architecture. */ @@ -472,8 +478,8 @@ enum reg_class `crtl->outgoing_args_size'. */ #define OUTGOING_REG_PARM_STACK_SPACE(FNTYPE) 1 -#define STACK_BOUNDARY 128 - +#define STACK_BOUNDARY riscv_stack_boundary + /* Symbolic macros for the registers used to return integer and floating point values. */ @@ -528,8 +534,9 @@ typedef struct { #define EPILOGUE_USES(REGNO) ((REGNO) == RETURN_ADDR_REGNUM) -/* ABI requires 16-byte alignment, even on RV32. */ -#define RISCV_STACK_ALIGN(LOC) (((LOC) + 15) & -16) +/* Align based on stack boundary, which might have been set by the user. 
*/ +#define RISCV_STACK_ALIGN(LOC) \ + (((LOC) + ((STACK_BOUNDARY/8)-1)) & -(STACK_BOUNDARY/8)) /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function, the stack pointer does not matter. The value is tested only in diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt index 50df851afdf..581a26bb5c1 100644 --- a/gcc/config/riscv/riscv.opt +++ b/gcc/config/riscv/riscv.opt @@ -33,6 +33,10 @@ mabi= Target Report RejectNegative Joined Enum(abi_type) Var(riscv_abi) Init(ABI_ILP32) Specify integer and floating-point calling convention. +mpreferred-stack-boundary= +Target RejectNegative Joined UInteger Var(riscv_preferred_stack_boundary_arg) +Attempt to keep stack aligned to this power of 2. + Enum Name(abi_type) Type(enum riscv_abi_type) Supported ABIs (for use with the -mabi= option): diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 891de733160..f066349c2ff 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -992,6 +992,7 @@ See RS/6000 and PowerPC Options. -mdiv -mno-
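The align-up arithmetic used by the patched RISCV_STACK_ALIGN is the standard power-of-two rounding trick. A plain-C sketch, with the boundary passed in bytes (the macro uses STACK_BOUNDARY/8):

```c
#include <assert.h>

/* Round LOC up to the next multiple of BOUNDARY_BYTES, which must be a
   power of two -- the same arithmetic as the patched RISCV_STACK_ALIGN:
   add boundary-1, then mask off the low bits with the boundary's
   two's-complement negation.  */
static unsigned long
stack_align (unsigned long loc, unsigned long boundary_bytes)
{
  return (loc + (boundary_bytes - 1)) & -boundary_bytes;
}
```

With -mpreferred-stack-boundary=3 the boundary becomes 8 bytes, so a 17-byte frame rounds to 24 instead of the 32 the default 16-byte boundary would give.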
[PATCH][PR target/83994] Fix stack-clash-protection code generation on x86
pr83994 is a code generation bug in the stack-clash support that affects openssl (we've turned on stack-clash-protection by default for the F28 builds).

The core problem is that stack-clash (like stack-check) will emit a probing loop if the prologue allocates enough stack space. When emitting a loop, both implementations need a scratch register. They use get_scratch_register_at_entry to find a suitable scratch register. This routine assumes that callee register saves are complete at the point where the scratch register is going to be used.

In this particular testcase we select %ebx because ax,cx,dx are used for parameter passing. That's fine. The problem is %ebx hasn't been saved yet!

-fstack-check has a bit of code in the frame setup/layout code which forces the prologue to use pushes rather than reg->mem moves for saving registers. There's a gcc_assert in the prologue expander to catch any case where the registers aren't saved. -fstack-clash-protection doesn't have that same bit of magic in the frame setup/layout code, and it bypasses the assertion because of a change I made back in Nov 2017, when I was not aware of this particular issue.

This patch reverts the assertion bypass I added back in Nov 2017 and adds clarifying comments. The patch also forces use of push to save integer registers for a stack-clash-protected prologue if probes are going to be needed.

Bootstrapped and regression tested on x86_64. While the bug is not marked as a regression, ISTM this needs to be fixed for gcc-8.

OK for the trunk?

Jeff

* i386.c (get_probe_interval): Move to earlier point.
(ix86_compute_frame_layout): If -fstack-clash-protection and
the frame is larger than the probe interval, then use pushes
to save registers rather than reg->mem moves.
(ix86_expand_prologue): Remove conditional for int_registers_saved
assertion.

* gcc.target/i386/pr83994.c: New test.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 72d25ae..4cb55a8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11497,6 +11497,18 @@ static void warn_once_call_ms2sysv_xlogues (const char *feature) } } +/* Return the probing interval for -fstack-clash-protection. */ + +static HOST_WIDE_INT +get_probe_interval (void) +{ + if (flag_stack_clash_protection) +return (HOST_WIDE_INT_1U + << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL)); + else +return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP); +} + /* When using -fsplit-stack, the allocation routines set a field in the TCB to the bottom of the stack plus this much space, measured in bytes. */ @@ -11773,7 +11785,14 @@ ix86_compute_frame_layout (void) to_allocate = offset - frame->sse_reg_save_offset; if ((!to_allocate && frame->nregs <= 1) - || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))) + || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000)) + /* If stack clash probing needs a loop, then it needs a +scratch register. But the returned register is only guaranteed +to be safe to use after register saves are complete. So if +stack clash protections are enabled and the allocated frame is +larger than the probe interval, then use pushes to save +callee saved registers. */ + || (flag_stack_clash_protection && to_allocate > get_probe_interval ())) frame->save_regs_using_mov = false; if (ix86_using_red_zone () @@ -12567,18 +12586,6 @@ release_scratch_register_on_entry (struct scratch_reg *sr) } } -/* Return the probing interval for -fstack-clash-protection. */ - -static HOST_WIDE_INT -get_probe_interval (void) -{ - if (flag_stack_clash_protection) -return (HOST_WIDE_INT_1U - << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL)); - else -return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP); -} - /* Emit code to adjust the stack pointer by SIZE bytes while probing it. 
This differs from the next routine in that it tries hard to prevent @@ -13727,12 +13734,11 @@ ix86_expand_prologue (void) && (flag_stack_check == STATIC_BUILTIN_STACK_CHECK || flag_stack_clash_protection)) { - /* This assert wants to verify that integer registers were saved -prior to probing. This is necessary when probing may be implemented -as a function call (Windows). It is not necessary for stack clash -protection probing. */ - if (!flag_stack_clash_protection) - gcc_assert (int_registers_saved); + /* We expect the GP registers to be saved when probes are used +as the probing sequences might need a scratch register and +the routine to allocate one assumes the integer registers +have already been saved. */ + gcc_assert (int_registers_saved); if (flag_stack_clash_protection) { diff --git a/gcc/testsuite/gcc.ta
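The decision the patch adds to ix86_compute_frame_layout can be sketched in plain C. The probing-interval exponent comes from a --param in GCC; 12 (a 4 KiB interval) is assumed here purely for illustration:

```c
#include <assert.h>

/* Probing interval is 2^exp bytes, as in the relocated
   get_probe_interval; EXP stands in for the --param value (or the
   STACK_CHECK_PROBE_INTERVAL_EXP default).  */
static long long
probe_interval (int exp)
{
  return 1LL << exp;
}

/* Sketch of the added frame-layout rule: when stack-clash protection
   is on and the allocation exceeds one probing interval, a probing
   loop (and hence a scratch register) will be needed, so register
   saves must use pushes -- the scratch register is only safe once all
   integer registers are saved.  Returns nonzero when reg->mem moves
   remain allowed.  */
static int
save_regs_using_mov (long long to_allocate, int stack_clash, int exp)
{
  if (stack_clash && to_allocate > probe_interval (exp))
    return 0;			/* force pushes */
  return 1;
}
```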
Re: libgo patch committed: Update to Go1.10beta2 release
On 17.01.2018 15:20, Ian Lance Taylor wrote: > This patch updates libgo to the Go1.10beta2 release. The complete > patch is too large to include in this e-mail message, mainly due to > some test changes. Bootstrapped and ran Go testsuite on > x86_64-pc-linux-gnu. Committed to mainline. gotools now installs three more binaries into the internal gcclibdir. Are all of these intended to be installed, and is the installation location correct? $ ls -l /usr/lib/gcc/x86_64-linux-gnu/8/ total 2132 -rwxr-xr-x 1 doko doko 412768 Jan 24 00:24 buildid -rwxr-xr-x 1 doko doko 391480 Jan 24 00:24 test2json -rwxr-xr-x 1 doko doko 1357656 Jan 24 00:24 vet Matthias
Re: libgo patch committed: Update to Go1.10beta2 release
On Tue, Jan 23, 2018 at 5:24 PM, Matthias Klose wrote: > On 17.01.2018 15:20, Ian Lance Taylor wrote: >> This patch updates libgo to the Go1.10beta2 release. The complete >> patch is too large to include in this e-mail message, mainly due to >> some test changes. Bootstrapped and ran Go testsuite on >> x86_64-pc-linux-gnu. Committed to mainline. > > gotools now installs three more binaries into the internal gcclibdir. Are all > of > these intended to be installed, and is the installation location correct? > > $ ls -l /usr/lib/gcc/x86_64-linux-gnu/8/ > total 2132 > -rwxr-xr-x 1 doko doko 412768 Jan 24 00:24 buildid > -rwxr-xr-x 1 doko doko 391480 Jan 24 00:24 test2json > -rwxr-xr-x 1 doko doko 1357656 Jan 24 00:24 vet Yes, and yes. They are separate helper executables invoked by the go tool installed in $(bindir). Ian
Re: [RFC][PATCH] PR preprocessor/83173: Additional check before decrementing highest_location
On 01/14/2018 12:16 PM, Mike Gulick wrote:
> On 01/12/2018 06:16 PM, David Malcolm wrote:
[snip]
>> I was going to suggest renaming your new test to e.g.
>> location-overflow-test-pr83173.c
>> so that it matches the glob in those comments, but given that you refer
>> to the testname in dg-final directives, please update the line-map.h
>> comments to refer to e.g. "all of testcases in gcc.dg/plugin that use
>> location_overflow_plugin.c.", or somesuch wording.
>
> If I'm going to modify location_overflow_plugin.c and reuse it for this PR, then
> it would make sense to rename the test and its accompanying helper files to your
> suggested naming as well. The dg-final regexes will likely continue to work.

I've attached a new version of the test patch that updates location_overflow_plugin.c to use PLUGIN_PRAGMAS, and updates the test filenames to be more consistent with the existing location-overflow-test-* tests.

[snip]
>> If I'm reading your description in the PR right, this test case covers the
>> loc > LINE_MAP_MAX_LOCATION_WITH_COLS
>> case, but not the:
>> loc <= LINE_MAP_MAX_LOCATION_WITH_COLS
>> case.
>>
>> Can the latter be done by re-using the testcase, but with a different
>> (or no) plugin argument?
>
> I haven't really poked too hard to try to find if there is any corruption of the
> line map occurring when loc <= LINE_MAP_MAX_LOCATION_WITH_COLS. It is just a
> suspicion given the fact that this code is still decrementing
> line_table->highest_location in that case. I would imagine this may result in
> corruption of the column number or range rather than the file name and line
> number.

I haven't been able to find any clear bugs in the gcc output when loc <= LINE_MAP_MAX_LOCATION_WITH_COLS, but I'm not quite sure how this behavior, if indeed incorrect, would manifest itself. The original bug is only visible (AFAIK) when running gcc with -E, as it results in incorrect file names and line numbers in the preprocessed output.
If the file name and line number are correct in this output (as they would be when loc <= LINE_MAP_MAX_LOCATION_WITH_COLS), then everything will be fine when the .i file is read back in. I'm not sure if/how this bug can trigger any incorrect behavior when not using -E (or -no-integrated-cpp). Still, it does seem to me that even when loc <= LINE_MAP_MAX_LOCATION_WITH_COLS, _cpp_stack_include() is doing the wrong thing by decrementing pfile->line_table->highest_location when CPP_INCREMENT_LINE was not called from _cpp_lex_direct(). I will think about this a little more, and would value any insight you can offer. There is some more information about the details of what goes wrong in these functions in the original bug report. Thanks, Mike From a0320fc6e4292a194a8680b26f4951a320fceaf2 Mon Sep 17 00:00:00 2001 From: Mike Gulick Date: Thu, 30 Nov 2017 18:35:48 -0500 Subject: [PATCH v2] PR preprocessor/83173: New test 2017-12-01 Mike Gulick PR preprocessor/83173 * gcc.dg/plugin/location-overflow-test-pr83173.c: New test. * gcc.dg/plugin/location-overflow-test-pr83173.h: Header for pr83173.c. * gcc.dg/plugin/location-overflow-test-pr83173-1.h: Header for pr83173.c. * gcc.dg/plugin/location-overflow-test-pr83173-2.h: Header for pr83173.c. * gcc.dg/plugin/location_overflow_plugin.c: Use PLUGIN_PRAGMAS instead of PLUGIN_START_UNIT. * gcc.dg/plugin/plugin.exp: Enable new test. 
--- .../plugin/location-overflow-test-pr83173-1.h | 2 ++ .../plugin/location-overflow-test-pr83173-2.h | 2 ++ .../gcc.dg/plugin/location-overflow-test-pr83173.c | 21 + .../gcc.dg/plugin/location-overflow-test-pr83173.h | 2 ++ .../gcc.dg/plugin/location_overflow_plugin.c| 13 - gcc/testsuite/gcc.dg/plugin/plugin.exp | 3 ++- 6 files changed, 37 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173.c create mode 100644 gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173.h diff --git a/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h new file mode 100644 index 000..f47dc3643c2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-1.h @@ -0,0 +1,2 @@ +#pragma once +#define LOCATION_OVERFLOW_TEST_PR83173_1_H diff --git a/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h new file mode 100644 index 000..bc23ed2a188 --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173-2.h @@ -0,0 +1,2 @@ +#pragma once +#define LOCATION_OVERFLOW_TEST_PR83173_2_H diff --git a/gcc/testsuite/gcc.dg/plugin/location-overflow-test-pr83173.c b/gcc/testsuite/gcc.d
[PR81611] improve auto-inc
These two patches fix PR81611.

The first one improves forwprop so that we avoid introducing SSA conflicts by forwpropping the iv increment, which may cause both the incremented and the original value to be live, even when the iv is copied between the PHI node and the increment. We already handled the case in which there aren't any such copies.

Alas, this is not enough to address the problem on avr, even though it fixes it on e.g. ppc. The reason is that avr rejects var+offset addresses, and this prevents the memory access in a post-inc code sequence from being adjusted to an address that auto-inc-dec can recognize.

The second patch adjusts auto-inc-dec to recognize a code sequence in which the original, unincremented pseudo is used in an address after it's incremented into another pseudo, and to turn that into a post-inc address, leaving the copying for subsequent passes to eliminate.

Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and aarch64-linux-gnu. Ok to install?

I'd appreciate suggestions on how to turn the submitted testcase into a regression test; I suppose an avr-specific test that requires the auto-inc transformation is a possibility, but that feels a bit too limited, doesn't it? Thoughts?

Thanks in advance,

[PR81611] accept copies in simple_iv_increment_p

If there are copies between the GIMPLE_PHI at the loop body and the increment that reaches it (presumably through a back edge), still regard it as a simple_iv_increment, so that we won't consider the value in the back edge eligible for forwprop. Doing so would risk making the phi node and the incremented value conflict by both being live within the loop, and would cause the phi node to be preserved for propagated uses after the loop.

for gcc/ChangeLog

PR tree-optimization/81611
* tree-ssa-dom.c (simple_iv_increment_p): Skip intervening copies.
--- gcc/tree-ssa-dom.c | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 2b371667253a..3c0ff9458342 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies *const_and_copies) /* Returns true when STMT is a simple iv increment. It detects the following situation: - i_1 = phi (..., i_2) - i_2 = i_1 +/- ... */ + i_1 = phi (..., i_k) + [...] + i_j = i_{j-1} for each j : 2 <= j <= k-1 + [...] + i_k = i_{k-1} +/- ... */ bool simple_iv_increment_p (gimple *stmt) @@ -1305,8 +1308,18 @@ simple_iv_increment_p (gimple *stmt) return false; phi = SSA_NAME_DEF_STMT (preinc); - if (gimple_code (phi) != GIMPLE_PHI) -return false; + while (gimple_code (phi) != GIMPLE_PHI) +{ + /* Follow trivial copies, but not the DEF used in a back edge, +so that we don't prevent coalescing. */ + if (gimple_code (phi) != GIMPLE_ASSIGN + || gimple_assign_lhs (phi) != preinc + || !gimple_assign_ssa_name_copy_p (phi)) + return false; + + preinc = gimple_assign_rhs1 (phi); + phi = SSA_NAME_DEF_STMT (preinc); +} for (i = 0; i < gimple_phi_num_args (phi); i++) if (gimple_phi_arg_def (phi, i) == lhs) [PR81611] turn inc-and-use-of-dead-orig into auto-inc When the addressing modes available on the machine don't allow offsets in addresses, odds are that post-increments will be represented in trees and RTL as: y <= x + 1 ... *(x) ... x <= y so deal with this form so as to create auto-inc addresses that we'd otherwise miss. for gcc/ChangeLog PR rtl-optimization/81611 * auto-inc-dec.c (attempt_change): Move dead note from mem_insn if it's the next use of regno (find_address): Take address use of reg holding non-incremented value. (merge_in_block): Attempt to use a mem insn that is the next use of the original regno. 
--- gcc/auto-inc-dec.c | 46 -- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c index d02fa9d081c7..4ffbcf56a456 100644 --- a/gcc/auto-inc-dec.c +++ b/gcc/auto-inc-dec.c @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg) before the memory reference. */ gcc_assert (mov_insn); emit_insn_before (mov_insn, inc_insn.insn); - move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); + regno = REGNO (inc_insn.reg0); + if (reg_next_use[regno] == mem_insn.insn) + move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0); + else + move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0); regno = REGNO (inc_insn.reg_res); reg_next_def[regno] = mov_insn; @@ -842,7 +846,7 @@ find_address (rtx *address_of_x) if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res)) { - /* Match with *reg0. */ + /* Match with *reg_res. */ mem_insn.mem_loc = address_of_x;
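A hypothetical source loop of the following shape is where the described pattern arises — this is only an illustration, not the PR 81611 testcase:

```c
#include <assert.h>

/* Each pointer bump here can be represented as
       y <= x + 1  ...  *(x)  ...  x <= y
   -- the increment is computed into a fresh pseudo while the original,
   unincremented pointer is still the one used in the address.  On
   targets such as AVR that reject reg+offset addresses, the second
   patch lets auto-inc-dec recognize this and emit post-increment
   addressing, with later passes cleaning up the copy.  */
void
copy_bytes (char *dst, const char *src, unsigned n)
{
  while (n--)
    *dst++ = *src++;
}
```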
Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers
On Dec 21, 2017, Jeff Law wrote:
> On 12/11/2017 07:54 PM, Alexandre Oliva wrote:
>> + ASM_GENERATE_INTERNAL_LABEL (label, "LVU", ied->view);
>> + /* FIXME: this will resolve to a small number. Could we
>> + possibly emit smaller data? Ideally we'd emit a
>> + uleb128, but that would make the size of DIEs
>> + impossible for the compiler to compute, since it's
>> + the assembler that computes the value of the view
>> + label in this case. Ideally, we'd have a single form
>> + encompassing both the address and the view, and
>> + indirecting them through a table might make things
>> + easier, but even that would be more wasteful,
>> + space-wise, than what we have now. */
>> + add_AT_lbl_id (die, DW_AT_GNU_entry_view, label);
> Do you have a sense of whether or not this really matters in practice?
> How much bigger do we get when we enable LVU?

It's more of a matter of general design principle. DWARF strives to be compact, and outputting a very-likely small constant with most bits known to be zero is kind of against the rules. LVU proper doesn't run into this because the location view numbers are emitted as uleb128 constants in location list sections, never in the main debug section.

How much it really grows depends on the representation: DWARF>5 loclists only get additional entries when at least one view number in a range pair is nonzero, and I have a sense that most view numbers in loclists are zero; for DWARF[2,5], we emit a list of pairs corresponding to the ranges in each entry in the actual loclist, so we spend at least two bytes for both views, plus the pointer to the locviewlist in an additional attribute.
Considering each range amounts to a pair of pointers plus at least two bytes for the location expression, and that each range gets a corresponding pair of views, and assuming all views will fit in a single uleb128 byte (128+ views at the same PC are very unlikely), the loclist section would grow by at most 20% with 32-bit pointers, and about half that much with 64-bit pointers. In reality, it ought to be a lot less than that, since location expressions taking up a single byte (plus another byte representing the size) are common but hardly a majority of the cases.

>> + /* Sanity check the block tree. This would catch a case in which
>> + BLOCK got removed from the tree reachable from the outermost
>> + lexical block, but got retained in markers. It would still link
>> + back to its parents, but some ancestor would be missing a link
>> + down the path to the sub BLOCK. If the block got removed, its
>> + BLOCK_NUMBER will not be a usable value. */
>> + gcc_checking_assert (block_within_block_p (block,
>> + DECL_INITIAL
>> + (current_function_decl),
>> + true));
>> +
>> + gcc_assert (inlined_function_outer_scope_p (block));
>> + gcc_assert (!BLOCK_DIE (block));
>> +
>> + /* If we can't represent it, don't bother. */
>> + if (!(dwarf_version >= 3 || !dwarf_strict))
>> +    return;
> Consider moving this check earlier. I don't think it's a hard
> requirement, so if you put it after the asserts for a reason, then leave
> it as is.

The reason I put the check after the asserts was that I wanted to catch messed-up block trees even if we wouldn't emit the entry point after all. Now, considering the block-tree mess-up was figured out, I guess moving the test earlier and saving the cycles if we're not emitting the annotation makes sense. Will do.

> Generally I think this is fine (it's much simpler than the dwarf2 bits
> of the LVU patch).

Yeah. I wonder if there's anything I can do, now that I'm back from vacations, to help get the LVU patch reviewed.
I suppose it might still be eligible for GCC 8, considering it was posted even before stage3, let alone stage4, and considering it only deals with debug info (mainly location lists), but then... unlike the SFN stuff, the additional information it outputs will only make a difference once debug info consumers learn to use it. So maybe we could afford to leave it out, or to bring it in but disabled. Thoughts? Does it make sense at this point for me to pester ^W entice some review to have a look at it? Thanks, -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
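The size argument in the discussion above rests on ULEB128, DWARF's variable-length unsigned encoding: any value below 128 occupies exactly one byte, which is why a view number "fitting in a single uleb128 byte" is the common case. A minimal encoder sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Encode VALUE as ULEB128 into BUF and return the number of bytes
   written: seven payload bits per byte, low bits first, with the high
   bit of each byte set when more bytes follow.  */
static unsigned
uleb128_encode (uint64_t value, uint8_t *buf)
{
  unsigned n = 0;
  do
    {
      uint8_t byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
	byte |= 0x80;		/* continuation bit */
      buf[n++] = byte;
    }
  while (value != 0);
  return n;
}
```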
[PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
The loop_align.c test has been broken for some time, since I put in patches to eliminate some debug hooks (-mno-upper-regs-{df,di,sf}) that we deemed to no longer be needed.

As Segher and I were discussing over private IRC, the root cause of this bug is that the compiler no longer generates the BDNZ instruction for a count-down loop; instead, it decrements the index in a GPR and does a branch/comparison on it. In doing so, it now unrolls the loop twice, and the resulting loop is too big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP. This means the loop isn't aligned to a 32-byte boundary.

While it is important to ultimately fix the code generation bug so that BDNZ is once again generated, that fix may be more involved. So, I decided to rewrite the test to be simpler, and the resulting code fits within the 4-8 instruction window the target hook is looking for.

I have tested this on a little endian power8 system and a big endian power8 system, doing both bootstrap builds and regression tests. The only change in the regression test results is that loop_align.c now passes on little endian 64-bit and big endian 64-bit (big endian 32-bit did not fail with the new changes).

Can I install this on the trunk? Backports to GCC 7/6 are not needed, since the original code works on those systems.

[gcc/testsuite]
2018-01-24 Michael Meissner

PR target/81550
* gcc.target/powerpc/loop_align.c: Rewrite test so that the loop
remains small enough that it tests the alignment of loops
specified by the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.
-- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/testsuite/gcc.target/powerpc/loop_align.c === --- gcc/testsuite/gcc.target/powerpc/loop_align.c (revision 256992) +++ gcc/testsuite/gcc.target/powerpc/loop_align.c (working copy) @@ -1,11 +1,16 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ -/* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */ +/* { dg-options "-O2 -mcpu=power7 -falign-functions=16 -fno-reorder-blocks" } */ /* { dg-final { scan-assembler ".p2align 5,,31" } } */ -void f(double *a, double *b, double *c, int n) { - int i; +/* This test must be crafted so that the loop is less than 8 insns (and more + than 4) in order for the TARGET_ASM_LOOP_ALIGN_MAX_SKIP target hook to fire + and align the loop properly. Originally the loop used double, and various + optimizations caused the loop to double in size, and fail the test. */ + +void f(long *a, long *b, long *c, long n) { + long i; for (i=0; i < n; i++) a[i] = b[i] + c[i]; }
Re: [PATCH][PR target/83994] Fix stack-clash-protection code generation on x86
On Wed, Jan 24, 2018 at 12:15 AM, Jeff Law wrote: > > pr83994 is a code generation bug in the stack-clash support that affects > openssl (we've turned on stack-clash-protection by default for the F28 > builds). > > The core problem is stack-clash (like stack-check) will emit a probing > loop if the prologue allocates enough stack space. When emitting a loop > both implementations will need a scratch register. > > They use get_scratch_register_at_entry to find a suitable scratch > register. This routine assumes that callee registers saves are > completed at the point where the scratch register is going to be used. > > In this particular testcase we select %ebx because ax,cx,dx are used for > parameter passing. That's fine. The problem is %ebx hasn't been saved yet! > > -fstack-check has a bit of code in the frame setup/layout code which > forces the prologue to use pushes rather than reg->mem moves for saving > registers. There's a gcc_assert in the prologue expander to catch any > case where the registers aren't saved. > > -fstack-clash-protection doesn't have that same bit of magic in the > frame setup/layout code and it bypasses the assertion due to a change I > made back in Nov 2017 due to not being aware of this particular issue. > > This patch reverts the assertion bypass I added back in Nov 2017 and > adds clarifying comments. The patch also forces use of push to save > integer registers for a stack-clash protected prologue if probes are > going to be needed. > > Bootstrapped and regression tested on x86_64. > > While the bug is not marked as a regression, ISTM this needs to be fixed > for gcc-8. > > OK for the trunk? > > Jeff > > * i386.c (get_probe_interval): Move to earlier point. > (ix86_compute_frame_layout): If -fstack-clash-protection and > the frame is larger than the probe interval, then use pushes > to save registers rather than reg->mem moves. > (ix86_expand_prologue): Remove conditional for int_registers_saved > assertion. 
> > * gcc.target/i386/pr83994.c: New test. OK with the fixed testcase (see below). Thanks, Uros. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 72d25ae..4cb55a8 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -11497,6 +11497,18 @@ static void warn_once_call_ms2sysv_xlogues (const > char *feature) > } > } > > +/* Return the probing interval for -fstack-clash-protection. */ > + > +static HOST_WIDE_INT > +get_probe_interval (void) > +{ > + if (flag_stack_clash_protection) > +return (HOST_WIDE_INT_1U > + << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL)); > + else > +return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP); > +} > + > /* When using -fsplit-stack, the allocation routines set a field in > the TCB to the bottom of the stack plus this much space, measured > in bytes. */ > @@ -11773,7 +11785,14 @@ ix86_compute_frame_layout (void) >to_allocate = offset - frame->sse_reg_save_offset; > >if ((!to_allocate && frame->nregs <= 1) > - || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000))) > + || (TARGET_64BIT && to_allocate >= HOST_WIDE_INT_C (0x8000)) > + /* If stack clash probing needs a loop, then it needs a > +scratch register. But the returned register is only guaranteed > +to be safe to use after register saves are complete. So if > +stack clash protections are enabled and the allocated frame is > +larger than the probe interval, then use pushes to save > +callee saved registers. */ > + || (flag_stack_clash_protection && to_allocate > get_probe_interval > ())) > frame->save_regs_using_mov = false; > >if (ix86_using_red_zone () > @@ -12567,18 +12586,6 @@ release_scratch_register_on_entry (struct > scratch_reg *sr) > } > } > > -/* Return the probing interval for -fstack-clash-protection. 
*/ > - > -static HOST_WIDE_INT > -get_probe_interval (void) > -{ > - if (flag_stack_clash_protection) > -return (HOST_WIDE_INT_1U > - << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL)); > - else > -return (HOST_WIDE_INT_1U << STACK_CHECK_PROBE_INTERVAL_EXP); > -} > - > /* Emit code to adjust the stack pointer by SIZE bytes while probing it. > > This differs from the next routine in that it tries hard to prevent > @@ -13727,12 +13734,11 @@ ix86_expand_prologue (void) >&& (flag_stack_check == STATIC_BUILTIN_STACK_CHECK > || flag_stack_clash_protection)) > { > - /* This assert wants to verify that integer registers were saved > -prior to probing. This is necessary when probing may be implemented > -as a function call (Windows). It is not necessary for stack clash > -protection probing. */ > - if (!flag_stack_clash_protection) > - gcc_assert (int_registers_saved); > + /* We expect the GP registers to be saved whe