Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
On Tue, 11 Feb 2020, Segher Boessenkool wrote: > On Tue, Feb 11, 2020 at 02:58:47PM +0100, Richard Biener wrote: > > On Tue, 11 Feb 2020, Roman Zhuykov wrote: > > > 11.02.2020 11:01, Richard Biener wrote: > > > Sound good, but IMHO modulo scheduler is not the best choice to be the > > > first step implementing such a concept. > > > > True ;) But since the context of this thread is unrolling ... > > Not sure how you'd figure the unroll factor to apply if you want > > to do unrolling within a classical scheduling framework? Maybe > > unroll as much as you can fill slots until the last instruction > > of the first iteration retires? > > That will be terrible on register-rich architectures: it *already* is > problematic how often some things are unrolled, blindly unrolling more > would make things worse. We need to unroll more where it helps, but > less where it does not. For that we need a good cost/benefit estimate. True. For x86 we tried but did not come up with a sensible estimate (probably the x86 uarchs are way too complicated to understand). Richard.
Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
On Tue, 11 Feb 2020, Segher Boessenkool wrote: > Hi! > > On Tue, Feb 11, 2020 at 03:46:05PM +0300, Roman Zhuykov wrote: > > Hmm, even when trying to move it just few passes earlier many years ago, > > got another opinion: > > https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html > > Although without such a move we still have annoying issues which RTL > > folks can't solve, see e.q. > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264#c2 > > Basic block partitioning has wildly disproportionate fallout in all > later passes, both in terms of what those *do* (or don't, if partitioning > is enabled), and of impact on the code (not to mention developer time). > > Maybe the implementation can be improved, but probably we should do this > in a different way altogether. The current situation is not good. I think the expectation that you can go back to CFG layout mode and then work with CFG layout tools after we've lowered to CFG RTL is simply bogus. Yeah, you can probably do analysis things but I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle can wreck things. Undoubtedly doing CFG manipulations is not going to work since CFG layout does not respect CFG RTL restrictions. Partitioning simply uncovered latent bugs, there's nothing wrong with it IMHO. Richard.
Re: [RFC] [c-family] PR92867 - Add returns_arg attribute
On Wed, Feb 12, 2020 at 7:52 AM Prathamesh Kulkarni wrote: > > On Tue, 4 Feb 2020 at 14:54, Prathamesh Kulkarni > wrote: > > > > On Mon, 3 Feb 2020 at 14:56, Prathamesh Kulkarni > > wrote: > > > > > > On Mon, 3 Feb 2020 at 14:41, Prathamesh Kulkarni > > > wrote: > > > > > > > > On Thu, 30 Jan 2020 at 19:17, Richard Biener > > > > wrote: > > > > > > > > > > On Thu, Jan 30, 2020 at 11:49 AM Prathamesh Kulkarni > > > > > wrote: > > > > > > > > > > > > On Wed, 29 Jan 2020 at 14:38, Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek > > > > > > > wrote: > > > > > > > > > > > > > > > > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni > > > > > > > > wrote: > > > > > > > > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh > > > > > > > > > > Kulkarni wrote: > > > > > > > > > > > Thanks for the suggestions. In the attached patch I > > > > > > > > > > > bumped up value of > > > > > > > > > > > ERF_RETURNS_ARG_MASK > > > > > > > > > > > to UINT_MAX >> 2, and use highest two bits for > > > > > > > > > > > ERF_NOALIAS and ERF_RETURNS_ARG. > > > > > > > > > > > And use fn spec "Z" to store the argument number > > > > > > > > > > > in fn-spec format. > > > > > > > > > > > Does that look OK ? > > > > > > > > > > > > > > > > > > > > No. > > > > > > > > > > > > > > > > > > > > +#define ERF_RETURN_ARG_MASK(UINT_MAX >> 2) > > > > > > > > > > > > > > > > > > > > /* Nonzero if the return value is equal to the argument > > > > > > > > > > number > > > > > > > > > > flags & ERF_RETURN_ARG_MASK. */ > > > > > > > > > > -#define ERF_RETURNS_ARG(1 << 2) > > > > > > > > > > +#define ERF_RETURNS_ARG(1 << > > > > > > > > > > (BITS_PER_WORD - 2)) > > > > > > > > > > > > > > > > > > > > How is size of host int related to BITS_PER_WORD? Not to > > > > > > > > > > mention that > > > > > > > > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - > > > > > > > > > > 2) is UB. > > > > > > > > > Oops sorry. I should have used HOST_BITS_PER_INT. > > > > > > > > > I assume that'd be correct ? > > > > > > > > > > > > > > > > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative > > > > > > > > number, you'd > > > > > > > > need either 1U and verify all ERF_* flags uses, or avoid using > > > > > > > > the sign bit. > > > > > > > > The patch has other issues, you don't verify that the argnum > > > > > > > > fits into > > > > > > > > the bits (doesn't overflow into the other ERF_* bits), in > > > > > > > > + char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD); > > > > > > > > + s[0] = 'Z'; > > > > > > > > + sprintf (s + 1, "%lu", argnum); > > > > > > > > 1) sizeof (char) is 1 by definition > > > > > > > > 2) it is pointless to allocate it and then deallocate (just use > > > > > > > > automatic > > > > > > > > array) > > > > > > > > 3) it is unclear how is BITS_PER_WORD related to the length of > > > > > > > > decimal > > > > > > > > encoded string + Z char + terminating '\0'. The usual way is > > > > > > > > for unsigned > > > > > > > > numbers to estimate number of digits by counting 3 digits per > > > > > > > > each 8 bits, > > > > > > > > in your case of course + 2. > > > > > > > > > > > > > > > > More importantly, the "fn spec" attribute isn't used just in > > > > > > > > gimple_call_return_flags, but also in e.g. > > > > > > > > gimple_call_arg_flags which > > > > > > > > assumes that the return stuff in there is a single char and the > > > > > > > > reaming > > > > > > > > chars are for argument descriptions, or in decl_return_flags > > > > > > > > which you > > > > > > > > haven't modified. > > > > > > > > > > > > > > > > I must say I fail to see the point in trying to glue this > > > > > > > > together into the > > > > > > > > "fn spec" argument so incompatibly, why can't we handle the fn > > > > > > > > spec with its > > > > > > > > 1-4 returns_arg and returns_arg attribute with arbitrary > > > > > > > > position > > > > > > > > side-by-side? > > > > > > > > > > > > > > Yeah, I wouldn't have added "fn spec" for "returns_arg" but kept > > > > > > > the > > > > > > > query interface thus access it via gimple_call_return_flags and > > > > > > > use > > > > > > > ERF_*. For the flags adjustment just up the maximum argument > > > > > > > to 1<<15 then the argument number is also nicely aligned, no need > > > > > > > to do fancy limiting that depends on the host. For too large > > > > > > > argument numbers just warn the attribute is ignored. I'd say even > > > > > > > a max of 255 is sane just the existing limit is a bit too low. > > > > > > Hi, > > > > > > Thanks for the suggestions. In the attached patch, I use TREE_VALUE > > > > > > (attr) to store/retrieve > > > > > > arbitrary argument position, and have bumped ERF_RETURNS_ARG_MASK > > > > > > to 0x3ff
[PATCH] testsuite/93697 fix inconsistent warning in testcase
The warning was emitted inconsistently on targets, so disable it since the testcase was for an ICE. 2020-02-12 Richard Biener PR testsuite/93697 * gcc.dg/pr93661.c: Pass -w, remove dg-warning. --- gcc/testsuite/gcc.dg/pr93661.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr93661.c b/gcc/testsuite/gcc.dg/pr93661.c index e311ba545c4..bc77fcab69e 100644 --- a/gcc/testsuite/gcc.dg/pr93661.c +++ b/gcc/testsuite/gcc.dg/pr93661.c @@ -1,9 +1,9 @@ /* { dg-do compile } */ -/* { dg-options "-O2" } */ +/* { dg-options "-O2 -w" } */ int f () { unsigned x = 0x; - __builtin_memset (1+(char *) &x, 0, -1); /* { dg-warning "maximum object size" } */ + __builtin_memset (1+(char *) &x, 0, -1); return (x != 0xf000); } -- 2.16.4
Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops
Hello! 11.02.2020 16:40, Andrea Corallo wrote: > Hi Richard, > > "Richard Earnshaw (lists)" writes: > >>> gcc/ChangeLog: >>> 2020-??-?? Andrea Corallo >>> 2020-??-?? Mihail-Calin Ionescu >>> 2020-??-?? Iain Apreotesei >>> * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP): >>> (arm_invalid_within_doloop): Implement invalid_within_doloop hook. >>> * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro. >>> * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn): >>> Add new patterns. >>> * config/arm/unspecs.md: Add new unspec. >>> >> A date should only appear before the first author in a multi-author >> patch, other authors should then be indented to align with the name of >> that first author. > Ack This patch is stage1 material, right? > >> +(define_insn "*doloop_end" >> + [(parallel [(set (pc) >> + (if_then_else >> + (ne (reg:SI LR_REGNUM) (const_int 1)) >> + (label_ref (match_operand 0 "" "")) >> + (pc))) >> + (set (reg:SI LR_REGNUM) >> + (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])] >> + "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched" >> + "le\tlr, %l0") I'm not an expert in .md files, but having that "!flag_modulo_sched" condition seems wrong to me. What was the issue on SMS side to add that? Currently, there are fake doloop_end pattern on ARM. It is generated only when flag_modulo_sched is set and actually expands to more than one instruction. This old approach have its pros and cons. When we HAVE_LOB, target allows us to use a real doloop_end instruction, fake one is not needed at all. In this case compiler should use real instruction regardless whether SMS in on or off. I hope in stage1 after upgrading modulo scheduler, we will restart old discussion about removing fake doloop_end pattern for ARM: https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01812.html https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00195.html Aarch64 also have such a fake pattern since 2014, probably its removal also will be considered. Roman >> Is it deliberate that this pattern name has a '*' prefix? doloop_end >> is a named expansion pattern according to md.texi. > Yes, this should be expanded already by the define_expand we have in > thumb2.md. Perhaps I'll call it 'doloop_end_internal' and add a > comment. > >> Also, hard-coded register names should be prefixed with '%|' (so >> "%|lr", not just "lr"), just in case the assembler dialect requires >> something (ELF doesn't but others have). Also for dls_insn. > Ack > >> For the tests, your 'require-effective-taret' tests look insufficient >> to prevent problems when testing a multilib environment, you'll need >> (at least) checks that a) passing -marm has not happened and b) that >> the architecture, or a specific CPU isn't being passed on the command >> line. > Ack > > Thanks for reviewing I'll update the patch. > > Andrea
[PATCH] i386: Fix up vec_extract_lo* patterns [PR93670]
Hi! The VEXTRACT* insns have way too many different CPUID feature flags (ATT syntax) vextractf128 $imm, %ymm, %xmm/mem AVX vextracti128 $imm, %ymm, %xmm/mem AVX2 vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512F vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z} AVX512F vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512DQ vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z} AVX512DQ vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z} AVX512DQ vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z} AVX512F As the testcase shows and the patch too, we didn't get it right in all cases. The first hunk is about avx512vl_vextractf128v8s[if] incorrectly requiring TARGET_AVX512DQ. The corresponding insn is the first vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it correct (TARGET_AVX512VL implies TARGET_AVX512F): BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI) We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if]. The second hunk is about vec_extract_lo_v16s[if]{,_mask}. These are using the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that, but instead incorrectly && 1 for non-masked and && (64 == 64 && TARGET_AVX512VL) for masked insns. This is extraction from ZMM, so it doesn't need VL for anything. The hunk actually only requires TARGET_AVX512DQ when the insn is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F and does the same thing, extracts the low 256 bits from 512 bits vector (often we split it into just nothing, but there are some special cases like when using xmm16+ when we can't without AVX512VL). The last hunk is about vec_extract_lo_v8s[if]{,_mask}. The non-_mask suffixed ones are ok already and just split into nothing (lowpart subreg). The masked ones were incorrectly requiring TARGET_AVX512VL and TARGET_AVX512DQ, when we only need TARGET_AVX512VL. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2020-02-12 Jakub Jelinek PR target/93670 * config/i386/sse.md (VI48F_256_DQ): New mode iterator. (avx512vl_vextractf128): Use it instead of VI48F_256. Remove TARGET_AVX512DQ from condition. (vec_extract_lo_): Use instead of in condition. If TARGET_AVX512DQ is false, emit vextract*64x4 instead of vextract*32x8. (vec_extract_lo_): Drop from condition. * gcc.target/i386/avx512vl-pr93670.c: New test. --- gcc/config/i386/sse.md.jj 2020-02-11 14:54:38.017593464 +0100 +++ gcc/config/i386/sse.md 2020-02-11 15:50:59.629130828 +0100 @@ -8719,13 +8719,16 @@ (define_insn "vec_extract_hi_")]) +(define_mode_iterator VI48F_256_DQ + [V8SI V8SF (V4DI "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ")]) + (define_expand "avx512vl_vextractf128" [(match_operand: 0 "nonimmediate_operand") - (match_operand:VI48F_256 1 "register_operand") + (match_operand:VI48F_256_DQ 1 "register_operand") (match_operand:SI 2 "const_0_to_1_operand") (match_operand: 3 "nonimm_or_0_operand") (match_operand:QI 4 "register_operand")] - "TARGET_AVX512DQ && TARGET_AVX512VL" + "TARGET_AVX512VL" { rtx (*insn)(rtx, rtx, rtx, rtx); rtx dest = operands[0]; @@ -8793,14 +8796,19 @@ (define_insn "vec_extract_lo_ + && && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if ( || (!TARGET_AVX512VL && !REG_P (operands[0]) && EXT_REX_SSE_REG_P (operands[1]))) -return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; +{ + if (TARGET_AVX512DQ) + return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; + else + return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; +} else return "#"; } @@ -8910,7 +8918,7 @@ (define_insn "vec_extract_lo_ && + && && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if () --- gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c.jj 2020-02-11 16:00:14.874930873 +0100 +++ gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c2020-02-11 15:59:01.252019025 +0100 @@ -0,0 +1,77 @@ +/* PR target/93670 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512vl -mno-avx512dq" } */ + +#include + +__m128i +f1 (__m256i x) +{ + return _mm256_extracti32x4_epi32 (x, 0); +} + +__m128i +f2 (__m256i x, __m128i w, __mmask8 m) +{ + return _mm256_mask_extracti32x4_epi32 (w, m, x, 0); +} + +__m128i +f3 (__m256i x, __mmask8 m) +{ + return _mm256_maskz_extracti32x4_epi32 (m, x, 0); +} + +__m128 +f4 (__m256 x) +{ + return _mm256_extractf32x4_ps (x, 0); +} + +__m128 +f5 (__m256 x, __m128 w, __mmask8 m) +{
Re: [cris-decc0 0/14] A set of compare-elimination-fixes.
> I just rebased and updated the vendors/axis branch > axis/cris-decc0 with the following commits, which should bring > back compare-elimination results to that of cc0 on master. Nice work! An example of transition done properly... > With the exception of the bit-test patterns (btst / btstq which > is more of a "combine" matter), everything is centered around > working together with the "cmpelim" pass with the help of > define_subst attributes. Feel free to further tweak the cmpelim pass if need be; I did it for the Visium so there is a precedent. :-) > No performance tests yet though, but I expect axis/cris-decc0 to > be a win over master, since as I've mentioned before, I see > improvements in register-allocation already in libgcc, which > should get back what's lost in all the special patterns I > deleted. I haven't looked into the cause, but it shouldn't > surprise anyone that there's some noticeable goodies inside > something to the effect of #ifndef HAVE_cc0, even with IRA. > (Conversion to LRA is way down on the TODO list.) For the Visium, the transition was a win overall, with sporadic regressions in delay slot filling. The transition to LRA looks more problematic. -- Eric Botcazou
Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote: > On Tue, 11 Feb 2020, Segher Boessenkool wrote: > > Basic block partitioning has wildly disproportionate fallout in all > > later passes, both in terms of what those *do* (or don't, if partitioning > > is enabled), and of impact on the code (not to mention developer time). > > > > Maybe the implementation can be improved, but probably we should do this > > in a different way altogether. The current situation is not good. > > I think the expectation that you can go back to CFG layout mode > and then work with CFG layout tools after we've lowered to CFG RTL > is simply bogus. Partitioning is also quite problematic if you do not use cfglayout mode. For example, in shrink-wrapping. It prevents a lot there. > Yeah, you can probably do analysis things but > I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle > can wreck things. Undoubtedly doing CFG manipulations is not going > to work since CFG layout does not respect CFG RTL restrictions. Doing CFG manipulations on CFG RTL mode directly is almost impossible to do correctly. For example, bb-reorder. Which is a much more important optimisation than partitioning, btw. > Partitioning simply uncovered latent bugs, there's nothing wrong > with it IMHO. I don't agree. The whole way EDGE_CROSSING works hinders all other optimisations a lot. Segher
Re: [PATCH] i386: Fix up vec_extract_lo* patterns [PR93670]
On Wed, Feb 12, 2020 at 10:27 AM Jakub Jelinek wrote: > > Hi! > > The VEXTRACT* insns have way too many different CPUID feature flags (ATT > syntax) > vextractf128 $imm, %ymm, %xmm/mem AVX > vextracti128 $imm, %ymm, %xmm/mem AVX2 > vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512F > vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z} AVX512F > vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512DQ > vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z} AVX512DQ > vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z} AVX512DQ > vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z} AVX512F > > As the testcase shows and the patch too, we didn't get it right in all > cases. > > The first hunk is about avx512vl_vextractf128v8s[if] incorrectly > requiring TARGET_AVX512DQ. The corresponding insn is the first > vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it > correct (TARGET_AVX512VL implies TARGET_AVX512F): > BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, > "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, > UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI) > BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, > "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, > UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI) > We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if]. > > The second hunk is about vec_extract_lo_v16s[if]{,_mask}. These are using > the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that, > but instead incorrectly && 1 for non-masked and && (64 == 64 && > TARGET_AVX512VL) > for masked insns. This is extraction from ZMM, so it doesn't need VL for > anything. The hunk actually only requires TARGET_AVX512DQ when the insn > is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can > use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F > and does the same thing, extracts the low 256 bits from 512 bits vector > (often we split it into just nothing, but there are some special cases like > when using xmm16+ when we can't without AVX512VL). > > The last hunk is about vec_extract_lo_v8s[if]{,_mask}. The non-_mask > suffixed ones are ok already and just split into nothing (lowpart subreg). > The masked ones were incorrectly requiring TARGET_AVX512VL and > TARGET_AVX512DQ, when we only need TARGET_AVX512VL. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2020-02-12 Jakub Jelinek > > PR target/93670 > * config/i386/sse.md (VI48F_256_DQ): New mode iterator. > (avx512vl_vextractf128): Use it instead of VI48F_256. Remove > TARGET_AVX512DQ from condition. > (vec_extract_lo_): Use > instead of in condition. If > TARGET_AVX512DQ is false, emit vextract*64x4 instead of > vextract*32x8. > (vec_extract_lo_): Drop > from condition. > > * gcc.target/i386/avx512vl-pr93670.c: New test. OK. Thanks, Uros. > --- gcc/config/i386/sse.md.jj 2020-02-11 14:54:38.017593464 +0100 > +++ gcc/config/i386/sse.md 2020-02-11 15:50:59.629130828 +0100 > @@ -8719,13 +8719,16 @@ (define_insn "vec_extract_hi_ (set_attr "prefix" "evex") > (set_attr "mode" "")]) > > +(define_mode_iterator VI48F_256_DQ > + [V8SI V8SF (V4DI "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ")]) > + > (define_expand "avx512vl_vextractf128" >[(match_operand: 0 "nonimmediate_operand") > - (match_operand:VI48F_256 1 "register_operand") > + (match_operand:VI48F_256_DQ 1 "register_operand") > (match_operand:SI 2 "const_0_to_1_operand") > (match_operand: 3 "nonimm_or_0_operand") > (match_operand:QI 4 "register_operand")] > - "TARGET_AVX512DQ && TARGET_AVX512VL" > + "TARGET_AVX512VL" > { >rtx (*insn)(rtx, rtx, rtx, rtx); >rtx dest = operands[0]; > @@ -8793,14 +8796,19 @@ (define_insn "vec_extract_lo_ (const_int 4) (const_int 5) > (const_int 6) (const_int 7)])))] >"TARGET_AVX512F > - && > + && > && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" > { >if ( >|| (!TARGET_AVX512VL > && !REG_P (operands[0]) > && EXT_REX_SSE_REG_P (operands[1]))) > -return "vextract32x8\t{$0x0, %1, > %0|%0, %1, 0x0}"; > +{ > + if (TARGET_AVX512DQ) > + return "vextract32x8\t{$0x0, %1, > %0|%0, %1, 0x0}"; > + else > + return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; > +} >else > return "#"; > } > @@ -8910,7 +8918,7 @@ (define_insn "vec_extract_lo_ (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3)])))] >"TARGET_AVX > - && && > + && > && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" > { >if () > --- gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c.jj 2020-02-11 > 16:00:14.874930873 +0100 > +++ gcc/testsuite/gcc.
Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
On Wed, 12 Feb 2020, Segher Boessenkool wrote: > On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote: > > On Tue, 11 Feb 2020, Segher Boessenkool wrote: > > > Basic block partitioning has wildly disproportionate fallout in all > > > later passes, both in terms of what those *do* (or don't, if partitioning > > > is enabled), and of impact on the code (not to mention developer time). > > > > > > Maybe the implementation can be improved, but probably we should do this > > > in a different way altogether. The current situation is not good. > > > > I think the expectation that you can go back to CFG layout mode > > and then work with CFG layout tools after we've lowered to CFG RTL > > is simply bogus. > > Partitioning is also quite problematic if you do not use cfglayout > mode. For example, in shrink-wrapping. It prevents a lot there. > > > Yeah, you can probably do analysis things but > > I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle > > can wreck things. Undoubtedly doing CFG manipulations is not going > > to work since CFG layout does not respect CFG RTL restrictions. > > Doing CFG manipulations on CFG RTL mode directly is almost impossible > to do correctly. > > For example, bb-reorder. Which is a much more important optimisation > than partitioning, btw. BB reorder switches back and forth as well ... :/ I think both are closely enough related that we probably should do partitioning from within the same framework? OTOH BB reorder happens _much_ later. > > Partitioning simply uncovered latent bugs, there's nothing wrong > > with it IMHO. > > I don't agree. The whole way EDGE_CROSSING works hinders all other > optimisations a lot. I'm not sure if it's there for correctness reasons or just to help checking that nothing "undoes" the partitioning decision. Richard. > > Segher
[PATCH] c++: Emit DFP typeinfos even when DFP is disabled [PR92906]
Hi! Before Joseph's changes when compiling libstdc++-v3/libsupc++/fundamental_type_info.cc we were emitting _ZTIPDd, _ZTIPDe, _ZTIPDf, _ZTIPKDd, _ZTIPKDe, _ZTIPKDf, _ZTIDd, _ZTIDe, _ZTIDf symbols even when DFP wasn't usable, but now we don't and thus those 9 symbols @@CXXABI_1.3.4 are gone from libstdc++. While nothing could probably use it (except perhaps dlsym etc.), various tools don't really like symbols disappearing from symbol versioned shared libraries with stable ABI. Adding those in assembly would be possible, but would be a portability nightmare (the PR has something Red Hat uses in libstdc++_nonshared.a, but that can handle only a handful of linux ELF targets we care about). So, instead this patch hacks up the FE, so that it emits those, but in a way that won't make the DFP types available again on targets that don't support them. Bootstrapped/regtested on x86_64-linux and i686-linux (where it doesn't change anything, because DFP is supported), aarch64-linux (where it fixed FAIL: libstdc++-abi/abi_check ) and armv7hl-linux (where abi_check didn't fail because we don't have baseline file, but comparing abilist of unpatched and patched libstdc++.so.6 shows: +_ZTIDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8 +_ZTIDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8 +_ZTIDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8 +_ZTIPDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPKDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPKDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPKDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 ). Ok for trunk? 2020-02-12 Jakub Jelinek PR libstdc++/92906 * cp-tree.h (enum cp_tree_index): Add CPTI_FALLBACK_DFLOAT32_TYPE, CPTI_FALLBACK_DFLOAT64_TYPE and CPTI_FALLBACK_DFLOAT128_TYPE. (fallback_dfloat32_type, fallback_dfloat64_type, fallback_dfloat128_type): Define. * mangle.c (write_builtin_type): Handle fallback_dfloat*_type like dfloat*_type_node. * rtti.c (emit_support_tinfos): Emit DFP typeinfos even when dfp is disabled for compatibility. --- gcc/cp/cp-tree.h.jj 2020-02-10 15:02:05.031846484 +0100 +++ gcc/cp/cp-tree.h2020-02-11 23:00:25.561410042 +0100 @@ -206,6 +206,10 @@ enum cp_tree_index CPTI_SOURCE_LOCATION_IMPL, +CPTI_FALLBACK_DFLOAT32_TYPE, +CPTI_FALLBACK_DFLOAT64_TYPE, +CPTI_FALLBACK_DFLOAT128_TYPE, + CPTI_MAX }; @@ -366,6 +370,12 @@ extern GTY(()) tree cp_global_trees[CPTI #define access_default_nodenull_node +/* Variant of dfloat{32,64,128}_type_node only used for fundamental + rtti purposes if DFP is disabled. */ +#define fallback_dfloat32_type cp_global_trees[CPTI_FALLBACK_DFLOAT32_TYPE] +#define fallback_dfloat64_type cp_global_trees[CPTI_FALLBACK_DFLOAT64_TYPE] +#define fallback_dfloat128_type cp_global_trees[CPTI_FALLBACK_DFLOAT128_TYPE] + #include "name-lookup.h" --- gcc/cp/mangle.c.jj 2020-01-21 09:13:43.339634944 +0100 +++ gcc/cp/mangle.c 2020-02-11 22:59:28.466265009 +0100 @@ -2569,11 +2569,11 @@ write_builtin_type (tree type) write_char ('d'); else if (type == long_double_type_node) write_char ('e'); - else if (type == dfloat32_type_node) + else if (type == dfloat32_type_node || type == fallback_dfloat32_type) write_string ("Df"); - else if (type == dfloat64_type_node) + else if (type == dfloat64_type_node || type == fallback_dfloat64_type) write_string ("Dd"); - else if (type == dfloat128_type_node) + else if (type == dfloat128_type_node || type == fallback_dfloat128_type) write_string ("De"); else gcc_unreachable (); --- gcc/cp/rtti.c.jj2020-01-21 09:13:43.359634642 +0100 +++ gcc/cp/rtti.c 2020-02-11 22:59:28.467264994 +0100 @@ -1588,6 +1588,20 @@ emit_support_tinfos (void) } for (tree t = registered_builtin_types; t; t = TREE_CHAIN (t)) emit_support_tinfo_1 (TREE_VALUE (t)); + /* For compatibility, emit DFP typeinfos even when DFP isn't enabled, + because we've emitted that in the past. */ + if (!targetm.decimal_float_supported_p ()) +{ + gcc_assert (dfloat32_type_node == NULL_TREE + && dfloat64_type_node == NULL_TREE + && dfloat128_type_node == NULL_TREE); + fallback_dfloat32_type = make_node (REAL_TYPE); + fallback_dfloat64_type = make_node (REAL_TYPE); + fallback_dfloat128_type = make_node (REAL_TYPE); + emit_support_tinfo_1 (fallback_dfloat32_type); + emit_support_tinfo_1 (fallback_dfloat64_type); + emit_support_tinfo_1 (fallback_dfloat128_type); +} input_location = saved_loc; } Jakub
Re: [PATCH] i386: Skip ENDBR32 at nested function entry
On Mon, Feb 10, 2020 at 12:01 PM Uros Bizjak wrote: > > On Mon, Feb 10, 2020 at 8:53 PM H.J. Lu wrote: > > > > On Mon, Feb 10, 2020 at 11:40 AM Uros Bizjak wrote: > > > > > > On Mon, Feb 10, 2020 at 8:22 PM H.J. Lu wrote: > > > > > > > > Since nested function isn't only called directly, there is ENDBR32 at > > > > function entry and we need to skip it for direct jump in trampoline. > > > > > > Hm, I'm afraid I don't understand this comment. Can you perhaps rephrase > > > it? > > > > > > > ix86_trampoline_init has > > > > /* Compute offset from the end of the jmp to the target function. > > In the case in which the trampoline stores the static chain on > > the stack, we need to skip the first insn which pushes the > > (call-saved) register static chain; this push is 1 byte. */ > > offset += 5; > > disp = expand_binop (SImode, sub_optab, fnaddr, > >plus_constant (Pmode, XEXP (m_tramp, 0), > > offset - (MEM_P (chain) ? 1 : 0)), > >NULL_RTX, 1, OPTAB_DIRECT); > > emit_move_insn (mem, disp); > > > > Without CET, we got > > > > 011 : > > 11: 56push %esi > > 12: 55push %ebp << trampoline jumps here. > > 13: 89 e5mov%esp,%ebp > > 15: 83 ec 08 sub$0x8,%esp > > > > With CET, if bar isn't only called directly, we got > > > > 0015 : > > 15: f3 0f 1e fb endbr32 > > 19: 56push %esi > > 1a: 55push %ebp trampoline jumps here. > > 1b: 89 e5mov%esp,%ebp > > 1d: 83 ec 08 sub$0x8,%esp > > > > We need to add 4 bytes for trampoline to skip endbr32. > > > > Here is the updated patch to check if nested function isn't only > > called directly, > > Please figure out the final patch. I don't want to waste my time > reviewing different version every half hour. Ping me in a couple of > days. This is the final version: https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00586.html You can try the testcase in the patch on any machine with CET binutils since ENDBR32 is nop on none-CET machines. Without this patch, the test will fail. Thanks. -- H.J.
Re: [PATCH] configure: Re-disable building cross-gdbserver
On 2/11/20 9:01 PM, Maciej W. Rozycki wrote: > On Tue, 11 Feb 2020, Tom Tromey wrote: > >> Maciej> Correct fallout from commit 919adfe84092 ("Move gdbserver to top >> level") >> Maciej> and revert to not building `gdbserver' in a cross-configuration, >> that is >> Maciej> where host != target, matching the documented behaviour. We have no >> way >> Maciej> to support non-native `gdbserver', and native `gdbserver' is usually >> of >> Maciej> no use with cross-GDB of the chosen host. >> >> Pedro had a different way to do this, that keeps the decision under >> gdbserver's control: >> >> https://sourceware.org/ml/gdb-patches/2020-02/msg00383.html > > That's actually quite similar to what I considered first, before I > changed my mind. Whatever. Doing it in gdbserver/ has the advantage that it stays under gdbserver's control, so it doesn't need syncing code with the gcc tree. I know of at least one off-tree port that uses gdbserver in a host != target scenario, so I imagine that this condition will evolve over time. Also, this way, changes in this area don't require running autoconf to regenerate configure. I'm not seeing any downside. > > However I would expect `exit' not to be what we want in a sourced script > (I did this differently; see below). Good point, somehow did not think of that. It worked in my patch because we source the script in a sub-shell. But it's clearer/better to not rely on that. > > case "${host}" in > + ${target}) > + gdbserver_host=${host} > + ;; > + *) > + gdbserver_host=NONE > + ;; if/else reads more to-the-point to me, so I tweaked it that way, and merged it in (to binutils-gdb), like below. I'm sorry for not noticing your earlier patch. >From f20e3e823d56e54ffe56792ea6a2fe947c2dec0d Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Wed, 12 Feb 2020 13:50:30 + Subject: [PATCH] Disable gdbserver on host != target configurations Correct fallout from commit 919adfe84092 ("Move gdbserver to top level") and revert to not building `gdbserver' in a cross-configuration, that is where host != target, matching the documented behaviour. We have no way to support non-native `gdbserver', and native `gdbserver' is usually of no use with cross-GDB of the chosen host. gdbserver/ChangeLog: 2020-02-12 Maciej W. Rozycki Pedro Alves Skip building gdbserver in a cross-configuration. * configure.srv: Set $gdbserver_host depending on whether $target is $host. Use $gdbserver_host instead of $host. --- gdbserver/ChangeLog | 7 +++ gdbserver/configure.srv | 11 +-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/gdbserver/ChangeLog b/gdbserver/ChangeLog index 09707067730..709ef23674c 100644 --- a/gdbserver/ChangeLog +++ b/gdbserver/ChangeLog @@ -1,3 +1,10 @@ +2020-02-12 Maciej W. Rozycki + Pedro Alves + + Skip building gdbserver in a cross-configuration. + * configure.srv: Set $gdbserver_host depending on whether $target + is $host. Use $gdbserver_host instead of $host. + 2020-02-11 Simon Marchi * configure: Re-generate. diff --git a/gdbserver/configure.srv b/gdbserver/configure.srv index 2e83cbdc07f..375ac0aeb2a 100644 --- a/gdbserver/configure.srv +++ b/gdbserver/configure.srv @@ -33,9 +33,16 @@ ipa_ppc_linux_regobj="powerpc-32l-ipa.o powerpc-altivec32l-ipa.o powerpc-vsx32l- # these files over and over again. srv_linux_obj="linux-low.o nat/linux-osdata.o nat/linux-procfs.o nat/linux-ptrace.o nat/linux-waitpid.o nat/linux-personality.o nat/linux-namespaces.o fork-child.o nat/fork-inferior.o" -# Input is taken from the "${host}" variable. +# Input is taken from the "${host}" and "${target}" variables. -case "${host}" in +# GDBserver can only debug native programs. +if test "${target}" = "${host}"; then +gdbserver_host=${host} +else +gdbserver_host= +fi + +case "${gdbserver_host}" in aarch64*-*-linux*) srv_tgtobj="linux-aarch64-low.o" srv_tgtobj="$srv_tgtobj nat/aarch64-linux-hw-point.o" srv_tgtobj="$srv_tgtobj linux-aarch32-low.o" base-commit: 38de8abe21fe17c31888094bd860a84f88cb5749 -- 2.14.5
Re: [PATCH] c++: Emit DFP typeinfos even when DFP is disabled [PR92906]
On 12/02/20 12:31 +0100, Jakub Jelinek wrote: Hi! Before Joseph's changes when compiling libstdc++-v3/libsupc++/fundamental_type_info.cc we were emitting _ZTIPDd, _ZTIPDe, _ZTIPDf, _ZTIPKDd, _ZTIPKDe, _ZTIPKDf, _ZTIDd, _ZTIDe, _ZTIDf symbols even when DFP wasn't usable, but now we don't and thus those 9 symbols @@CXXABI_1.3.4 are gone from libstdc++. While nothing could probably use it (except perhaps dlsym etc.), various tools don't really like symbols disappearing from symbol versioned shared libraries with stable ABI. Adding those in assembly would be possible, but would be a portability nightmare (the PR has something Red Hat uses in libstdc++_nonshared.a, but that can handle only a handful of linux ELF targets we care about). So, instead this patch hacks up the FE, so that it emits those, but in a way that won't make the DFP types available again on targets that don't support them. Bootstrapped/regtested on x86_64-linux and i686-linux (where it doesn't change anything, because DFP is supported), aarch64-linux (where it fixed FAIL: libstdc++-abi/abi_check ) and armv7hl-linux (where abi_check didn't fail because we don't have baseline file, but comparing abilist of unpatched and patched libstdc++.so.6 shows: +_ZTIDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8 +_ZTIDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8 +_ZTIDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8 +_ZTIPDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPKDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPKDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 +_ZTIPKDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16 ). Ok for trunk? It seems less hacky than the aliases that would be needed in libstdc++.so to solve it purely in the library.
Re: [PATCH][AArch64] Improve clz patterns
Hi Richard, Right, so this is an alternative approach using costs - Combine won't try to duplicate instructions if it increases costs, so increasing the ctz cost to 2 instructions (which is the correct cost for ctz anyway) ensures we still get efficient code for this example: [AArch64] Set ctz rtx_cost (PR93565) Although GCC should understand the limited range of clz/ctz/cls results, Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary sign extension. Avoid this by setting the cost for ctz to be higher than that of a simple ALU instruction. Deepsjeng performance improves by ~0.6%. Bootstrap OK. ChangeLog: 2020-02-12 Wilco Dijkstra PR rtl-optimization/93565 * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs. * gcc.target/aarch64/pr93565.c: New test. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED, return false; +case CTZ: + *cost = COSTS_N_INSNS (2); + + if (speed) + *cost += extra_cost->alu.clz + extra_cost->alu.rev; + return false; + case COMPARE: op0 = XEXP (x, 0); op1 = XEXP (x, 1); diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c b/gcc/testsuite/gcc.target/aarch64/pr93565.c new file mode 100644 index ..7200f80d1bb161f6a058cc6591f61b6b75cf1749 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +static const unsigned long long magic = 0x03f08c5392f756cdULL; + +static const char table[64] = { + 0, 1, 12, 2, 13, 22, 17, 3, +14, 33, 23, 36, 18, 58, 28, 4, +62, 15, 34, 26, 24, 48, 50, 37, +19, 55, 59, 52, 29, 44, 39, 5, +63, 11, 21, 16, 32, 35, 57, 27, +61, 25, 47, 49, 54, 51, 43, 38, +10, 20, 31, 56, 60, 46, 53, 42, + 9, 30, 45, 41, 8, 40, 7, 6, +}; + +static inline int ctz1 (unsigned long b) +{ + unsigned long lsb = b & -b; + return table[(lsb * magic) >> 58]; +} + +void f (unsigned long x, int *p) +{ + if (x != 0) +{ + int a = ctz1 (x); + *p = a | p[a]; +} +} + +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */ +/* { dg-final { scan-assembler-times "clz\t" 1 } } */ +
[PATCH] c++: Fix hashing and testing for equality of ATOMIC_CONST_EXPRs
Two equal atomic constraint expressions do not necessarily share the same tree, so we can't assume that two ATOMIC_CONST_EXPRs are equal if and only if they point to the same tree. The main consequence of this invalid assumption is that the constraint subsumption checker may reject a valid partial specialization for a constrained template because the checker decides that the constraints on the partial specialization do not subsume the constraints on the primary template, as in the test case below. Another consequence is that it probably makes the constraint caches less effective. This patch changes atomic_constraints_identical_p to test for equality of ATOMIC_CONST_EXPRs using cp_tree_equal, and changes hash_atomic_constraint to hash the ATOMIC_CONST_EXPR with iterative_hash_template_arg. Originally I used template_args_equal but cp_tree_equal seems to suffice. Does this look like an OK solution to this issue? Are the testcase adjustments valid? gcc/cp/ChangeLog: * constraint.cc (atomic_constraints_identical_p): Use cp_tree_equal to test for equality between ATOMIC_CONST_EXPRs. (hash_atomic_constraint): Use iterative_hash_template to hash the ATOMIC_CONST_EXPR. gcc/testsuite/ChangeLog: * g++.dg/concepts/variadic4.C: Adjust test to no longere expect an error. * g++.dg/cpp2a/concepts-pr84551.C: Likewise. * g++.dg/cpp2a/concepts-partial-spec7.C: New test. --- gcc/cp/constraint.cc | 4 +- gcc/testsuite/g++.dg/concepts/variadic4.C | 4 +- .../g++.dg/cpp2a/concepts-partial-spec7.C | 40 +++ gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C | 2 +- 4 files changed, 44 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 58044cd0f9d..935c3556272 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -933,7 +933,7 @@ atomic_constraints_identical_p (tree t1, tree t2) gcc_assert (TREE_CODE (t1) == ATOMIC_CONSTR); gcc_assert (TREE_CODE (t2) == ATOMIC_CONSTR); - if (ATOMIC_CONSTR_EXPR (t1) != ATOMIC_CONSTR_EXPR (t2)) + if (!cp_tree_equal (ATOMIC_CONSTR_EXPR (t1), ATOMIC_CONSTR_EXPR (t2))) return false; if (!parameter_mapping_equivalent_p (t1, t2)) @@ -981,7 +981,7 @@ hash_atomic_constraint (tree t) gcc_assert (TREE_CODE (t) == ATOMIC_CONSTR); /* Hash the identity of the expression. */ - hashval_t val = htab_hash_pointer (ATOMIC_CONSTR_EXPR (t)); + hashval_t val = iterative_hash_template_arg (ATOMIC_CONSTR_EXPR (t), 0); /* Hash the targets of the parameter map. */ tree p = ATOMIC_CONSTR_MAP (t); diff --git a/gcc/testsuite/g++.dg/concepts/variadic4.C b/gcc/testsuite/g++.dg/concepts/variadic4.C index d6eea49b958..b073bcce5e9 100644 --- a/gcc/testsuite/g++.dg/concepts/variadic4.C +++ b/gcc/testsuite/g++.dg/concepts/variadic4.C @@ -12,9 +12,7 @@ struct zip; template requires requires { typename list; } // && (Sequence && ...) -struct zip {}; // { dg-error "does not specialize" } -// The constraints of the specialization and the sequence are not -// comparable; the specializations are unordered. +struct zip {}; int main() { diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C new file mode 100644 index 000..ed77ed4d362 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C @@ -0,0 +1,40 @@ +// { dg-do compile { target c++2a } } + +template + inline constexpr bool foo = true; + +template + inline constexpr bool bar = true; + +// The following two partial specializations are both valid, but the constraint +// subsumption checker used to compare ATOMIC_CONSTR_EXPRs by comparing their +// pointers, leading to it accepting the partial specialization of test1 and +// rejecting the partial specialization of test2: in the test1 case, the +// TEMPLATE_ID_EXPR of foo in the constraints was shared between the primary +// template and partial specialization, and in the test2 case the +// TEMPLATE_ID_EXPRs of foo in the constraints were different (but equal) +// trees. + +template + concept baz = foo; + +template + requires baz + struct test1 + { }; + +template + requires baz && bar + struct test1 + { }; + +template + requires foo + struct test2 + { }; + +template + requires foo && bar + struct test2 + { }; + diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C b/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C index e40796f4975..e7498437178 100644 --- a/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C @@ -8,4 +8,4 @@ template requires C class TT> struct A {}; template requires true struct B {}; -A a;// { dg-error "" } +A a; -- 2.25.0.191.gde93cc14ab
[committed] c++: Add new test [PR88819]
Fixed by r10-1975-g59febe0ece37bedab7f42ae51b9f2b7a372d2950. 2020-02-12 Marek Polacek PR c++/88819 * g++.dg/cpp2a/nontype-class32.C: New test. --- gcc/testsuite/ChangeLog | 5 + gcc/testsuite/g++.dg/cpp2a/nontype-class32.C | 10 ++ 2 files changed, 15 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class32.C diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 968b903411d..e2f590b48f4 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2020-02-12 Marek Polacek + + PR c++/88819 + * g++.dg/cpp2a/nontype-class32.C: New test. + 2020-02-12 Marek Polacek PR c++/93684 - ICE-on-invalid with broken attribute. diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class32.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class32.C new file mode 100644 index 000..6a4dac5c90e --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class32.C @@ -0,0 +1,10 @@ +// PR c++/88819 +// { dg-do compile { target c++2a } } + +template class TT, class R = TT <0>> struct A +{ + template struct B {}; +}; +template struct C {}; + +A> a; base-commit: e428a9cf85a8bdde9d031a215e10bd96eb3b789a -- Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
Contents of PO file 'cpplib-10.1-b20200209.fr.po'
cpplib-10.1-b20200209.fr.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator.
New French PO file for 'cpplib' (version 10.1-b20200209)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the French team of translators. The file is available at: https://translationproject.org/latest/cpplib/fr.po (This file, 'cpplib-10.1-b20200209.fr.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
[committed] Clean up various dead patterns, expanders, splitters and peepholes on the H8.
I needed some mindless work yesterday, so I took the opportunity to do some light cleanups on the H8 port in the hopes that removing unnecessary cruft will make cc0 conversion easier. In this round I'm removing all the expanders, patterns and splitters that have been disabled (in some cases for ~15 years) as well as removing the peepholes that widen QI/HI mode pushes to SImode. I verified none of those peepholes trigger during the course of building libgcc, newlib or across the testsuite (I won't list all the multilibs, there's many :-) I didn't feel such rigor was needed for the previously disabled patterns, expanders and splitters. While we're in stage4, this is isolated to the H8 target which is currently scheduled for deprecation. So it's seems safe enough :-) There'll be more of these cleanups coming. Removing the unused combiner patterns in particular is useful to simplify the cc0 conversion. Installing on the trunk momentarily. Jeff commit e5cc04a73a3e212114ca9725911eaaa66d32303c Author: Jeff Law Date: Wed Feb 12 10:35:12 2020 -0700 Clean up dead patterns, splitters, expanders and peepholes on the H8 port. * config/h8300/h8300.md (cpymemsi, movmd): Remove dead patterns, expanders, splits, etc. (movmd_internal_, movmd splitter, movstr, movsd): Likewise. (stpcpy_internal_, stpcpy splitter): Likewise. (peepholes to convert QI/HI mode pushes to SI mode pushes): Likewise. * config/h8300/h8300.c (h8300_swap_into_er6): Remove unused function. (h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise * config/h8300/h8300-protos.h (h8300_swap_into_er6): Remove unused function prototype. (h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6fa4768d816..9ff0a943e4f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,16 @@ +2020-02-12 Jeff Law + + * config/h8300/h8300.md (cpymemsi, movmd): Remove dead patterns, + expanders, splits, etc. + (movmd_internal_, movmd splitter, movstr, movsd): Likewise. + (stpcpy_internal_, stpcpy splitter): Likewise. + (peepholes to convert QI/HI mode pushes to SI mode pushes): Likewise. + * config/h8300/h8300.c (h8300_swap_into_er6): Remove unused function. + (h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise + * config/h8300/h8300-protos.h (h8300_swap_into_er6): Remove unused + function prototype. + (h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise. + 2020-02-12 Jakub Jelinek PR target/93670 diff --git a/gcc/config/h8300/h8300-protos.h b/gcc/config/h8300/h8300-protos.h index 0cbb091390d..2416741e76a 100644 --- a/gcc/config/h8300/h8300-protos.h +++ b/gcc/config/h8300/h8300-protos.h @@ -109,9 +109,6 @@ extern unsigned inth8300_insn_length_from_table (rtx_insn *, rtx *); extern const char *output_h8sx_shift (rtx *, int, int); extern boolh8300_operands_match_p (rtx *); extern boolh8sx_mergeable_memrefs_p (rtx, rtx); -extern boolh8sx_emit_movmd (rtx, rtx, rtx, HOST_WIDE_INT); -extern voidh8300_swap_into_er6 (rtx); -extern voidh8300_swap_out_of_er6 (rtx); extern poly_int64 h8300_push_rounding (poly_int64); #endif /* ! GCC_H8300_PROTOS_H */ diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c index def8be344af..d2fea04d8e4 100644 --- a/gcc/config/h8300/h8300.c +++ b/gcc/config/h8300/h8300.c @@ -2647,145 +2647,6 @@ h8300_operands_match_p (rtx *operands) return false; } -/* Try using movmd to move LENGTH bytes from memory region SRC to memory - region DEST. The two regions do not overlap and have the common - alignment given by ALIGNMENT. Return true on success. - - Using movmd for variable-length moves seems to involve some - complex trade-offs. For instance: - - - Preparing for a movmd instruction is similar to preparing - for a memcpy. The main difference is that the arguments - are moved into er4, er5 and er6 rather than er0, er1 and er2. - - - Since movmd clobbers the frame pointer, we need to save - and restore it somehow when frame_pointer_needed. This can - sometimes make movmd sequences longer than calls to memcpy(). - - - The counter register is 16 bits, so the instruction is only - suitable for variable-length moves when sizeof (size_t) == 2. - That's only true in normal mode. - - - We will often lack static alignment information. Falling back - on movmd.b would likely be slower than calling memcpy(), at least - for big moves. - - This function therefore only uses movmd when the length is a - known constant, and only then if -fomit-frame-pointer is in - effect or if we're not optimizing for size. - - At the moment the function uses movmd for all in-range constants, - but it might be better to fall back on memcpy() for large m
Re: [PATCH][AArch64] Improve clz patterns
Wilco Dijkstra writes: > Hi Richard, > > Right, so this is an alternative approach using costs - Combine won't try to > duplicate instructions if it increases costs, so increasing the ctz cost to 2 > instructions (which is the correct cost for ctz anyway) ...agreed... > ensures we still get efficient code for this example: > > [AArch64] Set ctz rtx_cost (PR93565) > > Although GCC should understand the limited range of clz/ctz/cls results, > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary > sign extension. Avoid this by setting the cost for ctz to be higher than > that of a simple ALU instruction. Deepsjeng performance improves by ~0.6%. > > Bootstrap OK. > > ChangeLog: > 2020-02-12 Wilco Dijkstra > > PR rtl-optimization/93565 > * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs. > > * gcc.target/aarch64/pr93565.c: New test. OK, thanks. Could you remove the bit about combine behaving oddly when you commit though? I think this was simply a target bug and combine was being given duff information. Richard > > -- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int > outer ATTRIBUTE_UNUSED, > >return false; > > +case CTZ: > + *cost = COSTS_N_INSNS (2); > + > + if (speed) > +*cost += extra_cost->alu.clz + extra_cost->alu.rev; > + return false; > + > case COMPARE: >op0 = XEXP (x, 0); >op1 = XEXP (x, 1); > diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c > b/gcc/testsuite/gcc.target/aarch64/pr93565.c > new file mode 100644 > index > ..7200f80d1bb161f6a058cc6591f61b6b75cf1749 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c > @@ -0,0 +1,34 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +static const unsigned long long magic = 0x03f08c5392f756cdULL; > + > +static const char table[64] = { > + 0, 1, 12, 2, 13, 22, 17, 3, > +14, 33, 23, 36, 18, 58, 28, 4, > +62, 15, 34, 26, 24, 48, 50, 37, > +19, 55, 59, 52, 29, 44, 39, 5, > +63, 11, 21, 16, 32, 35, 57, 27, > +61, 25, 47, 49, 54, 51, 43, 38, > +10, 20, 31, 56, 60, 46, 53, 42, > + 9, 30, 45, 41, 8, 40, 7, 6, > +}; > + > +static inline int ctz1 (unsigned long b) > +{ > + unsigned long lsb = b & -b; > + return table[(lsb * magic) >> 58]; > +} > + > +void f (unsigned long x, int *p) > +{ > + if (x != 0) > +{ > + int a = ctz1 (x); > + *p = a | p[a]; > +} > +} > + > +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */ > +/* { dg-final { scan-assembler-times "clz\t" 1 } } */ > +
Re: [PATCH][AArch64] Improve popcount expansion
Wilco Dijkstra writes: > The popcount expansion uses umov to extend the result and move it back > to the integer register file. If we model ADDV as a zero-extending > operation, fmov can be used to move back to the integer side. This > results in a ~0.5% speedup on deepsjeng on Cortex-A57. > > A typical __builtin_popcount expansion is now: > > fmovs0, w0 > cntv0.8b, v0.8b > addvb0, v0.8b > fmovw0, s0 > > Bootstrap OK, passes regress. > > ChangeLog > 2020-02-02 Wilco Dijkstra > > gcc/ > * config/aarch64/aarch64.md (popcount2): Improve expansion. > * config/aarch64/aarch64-simd.md > (aarch64_zero_extend_reduc_plus_): New pattern. I think reordering these and using: * config/aarch64/aarch64.md (popcount2): Use it instead of generating separate ADDV and zero_extend patterns. would be clearer. OK with that change, thanks. Richard > * config/aarch64/iterators.md (VDQV_E): New iterator. > testsuite/ > * gcc.target/aarch64/popcnt2.c: New test. > > -- > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 97f46f96968a6bc2f93bbc812931537b819b3b19..34765ff43c1a090a31e2aed64ce95510317ab8c3 > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2460,6 +2460,17 @@ (define_insn "aarch64_reduc_plus_internal" >[(set_attr "type" "neon_reduc_add")] > ) > > +;; ADDV with result zero-extended to SI/DImode (for popcount). > +(define_insn "aarch64_zero_extend_reduc_plus_" > + [(set (match_operand:GPI 0 "register_operand" "=w") > + (zero_extend:GPI > +(unspec: [(match_operand:VDQV_E 1 "register_operand" "w")] > + UNSPEC_ADDV)))] > + "TARGET_SIMD" > + "add\\t%0, %1." > + [(set_attr "type" "neon_reduc_add")] > +) > + > (define_insn "aarch64_reduc_plus_internalv2si" > [(set (match_operand:V2SI 0 "register_operand" "=w") > (unspec:V2SI [(match_operand:V2SI 1 "register_operand" "w")] > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index > 86c2cdfc7973f4b964ba233cfbbe369b24e0ac10..5edc76ee14b55b2b4323530e10bd22b3ffca483e > 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -4829,7 +4829,6 @@ (define_expand "popcount2" > { >rtx v = gen_reg_rtx (V8QImode); >rtx v1 = gen_reg_rtx (V8QImode); > - rtx r = gen_reg_rtx (QImode); >rtx in = operands[1]; >rtx out = operands[0]; >if(mode == SImode) > @@ -4843,8 +4842,7 @@ (define_expand "popcount2" > } >emit_move_insn (v, gen_lowpart (V8QImode, in)); >emit_insn (gen_popcountv8qi2 (v1, v)); > - emit_insn (gen_reduc_plus_scal_v8qi (r, v1)); > - emit_insn (gen_zero_extendqi2 (out, r)); > + emit_insn (gen_aarch64_zero_extend_reduc_plus_v8qi (out, v1)); >DONE; > }) > > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index > fc973086cb91ae0dc54eeeb0b832d522539d7982..926779bf2442fa60d184ef17308f91996d6e8d1b > 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -208,6 +208,9 @@ (define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI > V2DI]) > ;; Advanced SIMD modes (except V2DI) for Integer reduction across lanes. > (define_mode_iterator VDQV_S [V8QI V16QI V4HI V8HI V4SI]) > > +;; Advanced SIMD modes for Integer reduction across lanes (zero/sign > extended). > +(define_mode_iterator VDQV_E [V8QI V16QI V4HI V8HI]) > + > ;; All double integer narrow-able modes. > (define_mode_iterator VDN [V4HI V2SI DI]) > > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt2.c > b/gcc/testsuite/gcc.target/aarch64/popcnt2.c > new file mode 100644 > index > ..e321858afa4d6ecb6fc7348f39f6e5c6c0c46147 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt2.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +unsigned > +foo (int x) > +{ > + return __builtin_popcount (x); > +} > + > +unsigned long > +foo1 (int x) > +{ > + return __builtin_popcount (x); > +} > + > +/* { dg-final { scan-assembler-not {popcount} } } */ > +/* { dg-final { scan-assembler-times {cnt\t} 2 } } */ > +/* { dg-final { scan-assembler-times {fmov} 4 } } */ > +/* { dg-final { scan-assembler-not {umov} } } */ > +/* { dg-final { scan-assembler-not {uxtw} } } */ > +/* { dg-final { scan-assembler-not {sxtw} } } */
Re: [PATCH] regalloc/debug: fix buggy print_hard_reg_set
On Tue, 2020-02-11 at 15:54 +0100, Hans-Peter Nilsson wrote: > I was using ira-conflicts.c:print_hard_reg_set with a local > patch to gdbinit.in in a debug-session, and noticed the > erroneous output. I see there's an almost identical function in > ira-color.c and on top of that, there's another function by the > same name and with similar semantics in sel-sched-dump.c, but > the last one doesn't try to print ranges. > > I'm guessing the dump functions have been used with targets that > have "impossible" registers at FIRST_PSEUDO_REGISTER - 1. CRIS > happens to have the condition-code register at > FIRST_PSEUDO_REGISTER - 1; not allocatable but appears a lot in > (clobbered) register sets after decc0ration. > > Before, for a target with FIRST_PSEUDO_REGISTER 20, you'd get > "19-18" for (1<<19). For (1<<18)|(1<<19), you'd get "18". > > The diff is unfortunately hairy, but I hope you agree the code > is a bit clearer. I'm deliberately not consolidating the dump > functions as IMHO that's too much a matter of taste. > > * ira-conflicts.c (print_hard_reg_set): Correct output for sets > including FIRST_PSEUDO_REGISTER - 1. > * ira-color.c (print_hard_reg_set): Ditto. > > Ok to commit? Not strictly a regression fix, but it's just dumping routines. So OK for the trunk. jeff >
Re: [PATCH] issue -Wstringop-overflow for potential overflow, not -truncation (PR 93646)
On Mon, 2020-02-10 at 15:47 -0700, Martin Sebor wrote: > The reporter of RHBZ #1798636 was mislead and confused by GCC > issuing -Wstringop-truncation for a possible overflow in strncat. > It took a few iterations to appreciate this subtlety and realize > the warning was of the wrong kind. > > The attached patch adjusts the logic of the function responsible > for the warning not to issue -Wstringop-truncation only for strncpy > and -Wstringop-overflow for strncat. > > I'm not sure if the bug is strictly speaking a regression: GCC 8 > didn't issue any warning so it could be viewed as one, but then > again, issuing a warning in this instance is intended, but not > a misleading one. > > Tested on x86_64-linux. > > Martin > PR middle-end/93646 - confusing -Wstringop-truncation on strncat where > -Wstringop-overflow is expected > > gcc/ChangeLog: > > PR middle-end/93646 > * tree-ssa-strlen.c (handle_builtin_stxncpy): Rename... > (handle_builtin_stxncpy_strncat): ...to this. Change first argument. > Issue only -Wstringop-overflow strncat, never -Wstringop-truncation. > (strlen_check_and_optimize_call): Adjust callee name. > > gcc/testsuite/ChangeLog: > > PR middle-end/93646 > * gcc.dg/Wstringop-overflow-31.c: New test. Not strictly a regression bugfix, but I think this is OK for the trunk, even in stage4. jeff
Re: [PATCH][AArch64] Improve clz patterns
On Wed, Feb 12, 2020 at 9:56 AM Richard Sandiford wrote: > > Wilco Dijkstra writes: > > Hi Richard, > > > > Right, so this is an alternative approach using costs - Combine won't try to > > duplicate instructions if it increases costs, so increasing the ctz cost to > > 2 > > instructions (which is the correct cost for ctz anyway) > > ...agreed... Yes I agree a better cost model for CTZ/CLZ is the right solution but I disagree with 2 ALU instruction as the cost. It should either be the same cost as a multiply or have its own cost entry. For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4 cycles, the same as the cost as a multiple; on OcteonTX2 it is 5 cycles (again the same cost as a multiple). Thanks, Andrew Pinski > > > ensures we still get efficient code for this example: > > > > [AArch64] Set ctz rtx_cost (PR93565) > > > > Although GCC should understand the limited range of clz/ctz/cls results, > > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary > > sign extension. Avoid this by setting the cost for ctz to be higher than > > that of a simple ALU instruction. Deepsjeng performance improves by ~0.6%. > > > > Bootstrap OK. > > > > ChangeLog: > > 2020-02-12 Wilco Dijkstra > > > > PR rtl-optimization/93565 > > * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs. > > > > * gcc.target/aarch64/pr93565.c: New test. > > OK, thanks. Could you remove the bit about combine behaving oddly when > you commit though? I think this was simply a target bug and combine > was being given duff information. > > Richard > > > > > -- > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > index > > e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b > > 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int > > outer ATTRIBUTE_UNUSED, > > > >return false; > > > > +case CTZ: > > + *cost = COSTS_N_INSNS (2); > > + > > + if (speed) > > +*cost += extra_cost->alu.clz + extra_cost->alu.rev; > > + return false; > > + > > case COMPARE: > >op0 = XEXP (x, 0); > >op1 = XEXP (x, 1); > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c > > b/gcc/testsuite/gcc.target/aarch64/pr93565.c > > new file mode 100644 > > index > > ..7200f80d1bb161f6a058cc6591f61b6b75cf1749 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c > > @@ -0,0 +1,34 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2" } */ > > + > > +static const unsigned long long magic = 0x03f08c5392f756cdULL; > > + > > +static const char table[64] = { > > + 0, 1, 12, 2, 13, 22, 17, 3, > > +14, 33, 23, 36, 18, 58, 28, 4, > > +62, 15, 34, 26, 24, 48, 50, 37, > > +19, 55, 59, 52, 29, 44, 39, 5, > > +63, 11, 21, 16, 32, 35, 57, 27, > > +61, 25, 47, 49, 54, 51, 43, 38, > > +10, 20, 31, 56, 60, 46, 53, 42, > > + 9, 30, 45, 41, 8, 40, 7, 6, > > +}; > > + > > +static inline int ctz1 (unsigned long b) > > +{ > > + unsigned long lsb = b & -b; > > + return table[(lsb * magic) >> 58]; > > +} > > + > > +void f (unsigned long x, int *p) > > +{ > > + if (x != 0) > > +{ > > + int a = ctz1 (x); > > + *p = a | p[a]; > > +} > > +} > > + > > +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */ > > +/* { dg-final { scan-assembler-times "clz\t" 1 } } */ > > +
Re: [PATCH][AArch64] Improve clz patterns
Hi Richard, See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565#c8 - the problem is more generic like I suspected and it's easy to create similar examples. So while this turned out to be an easy worksaround for ctz, there general case is harder to avoid since you still want to allow beneficial multi-use cases (such as merging a shift into 2 ALU instructions). Cheers, Wilco
Re: [PATCH][AArch64] Improve clz patterns
Hi Andrew, > Yes I agree a better cost model for CTZ/CLZ is the right solution but > I disagree with 2 ALU instruction as the cost. It should either be > the same cost as a multiply or have its own cost entry. > For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4 > cycles, the same as the cost as a multiple; on OcteonTX2 it is 5 > cycles (again the same cost as a multiple). + if (speed) + *cost += extra_cost->alu.clz + extra_cost->alu.rev; + return false; So if the cost of clz and ctz are similar enough, this will use the defined per-cpu costs. Cheers, Wilco
[committed] More H8 cleanups
This is another small H8 cleanup. This time we're killing a peephole2 that doesn't seem terribly useful. The peephole in question narrows a SImode comparison to QImode when it's fed by an (and (xor)) and the result is the same in either mode. I couldn't get this to trigger within libgcc, newlib or in the testsuite. Note that we do some of this kind of narrowing in match.pd these days, so it may be the case that the pattern was of marginal value before and none after the match.pd improvements. The more straightforward SI->QI and HI->QI operand narrowing when fed by a simple (and) operation were consolidated into a single peephole2. These are still triggering which may point to a failing of the match.pd patterns, but addressing in match.pd seems terribly out of scope at this point. Installing on the trunk. Jeff commit 37462a131c528d0980915d98567361aa9396b030 Author: Jeff Law Date: Wed Feb 12 12:12:22 2020 -0700 Drop unused comparison shortening pattern and consolidate remaining comparison shortening patterns. * config/h8300/h8300.md (comparison shortening peepholes): Drop (and (xor)) variant. Combine other two into single peephole. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 735cc47a3dd..1927290f568 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2020-02-12 Jeff Law + + * config/h8300/h8300.md (comparison shortening peepholes): Drop + (and (xor)) variant. Combine other two into single peephole. + 2020-02-12 Wilco Dijkstra PR rtl-optimization/93565 diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md index 0548368215b..dcc9c3682d6 100644 --- a/gcc/config/h8300/h8300.md +++ b/gcc/config/h8300/h8300.md @@ -5405,7 +5405,9 @@ [(cc0) (const_int 0)]) (label_ref (match_operand 2 "" "")) (pc)))] - "peep2_reg_dead_p (2, operands[0])" + "((const_int_qi_operand (operands[1], QImode) + || const_int_hi_operand (operands[1], HImode)) +&& peep2_reg_dead_p (2, operands[0]))" [(set (match_dup 4) (and:QI (match_dup 4) (match_dup 5))) @@ -5416,68 +5418,11 @@ (label_ref (match_dup 2)) (pc)))] { -operands[4] = gen_rtx_REG (QImode, REGNO (operands[0])); -operands[5] = gen_int_mode (INTVAL (operands[1]), QImode); - }) +enum machine_mode mode; -(define_peephole2 - [(set (match_operand:SI 0 "register_operand" "") - (and:SI (match_dup 0) - (match_operand:SI 1 "const_int_hi_operand" ""))) - (set (cc0) (compare (match_dup 0) - (const_int 0))) - (set (pc) - (if_then_else (match_operator 3 "eqne_operator" - [(cc0) (const_int 0)]) - (label_ref (match_operand 2 "" "")) - (pc)))] - "peep2_reg_dead_p (2, operands[0])" - [(set (match_dup 4) - (and:HI (match_dup 4) - (match_dup 5))) - (set (cc0) (compare (match_dup 4) - (const_int 0))) - (set (pc) - (if_then_else (match_op_dup 3 [(cc0) (const_int 0)]) - (label_ref (match_dup 2)) - (pc)))] - { -operands[4] = gen_rtx_REG (HImode, REGNO (operands[0])); -operands[5] = gen_int_mode (INTVAL (operands[1]), HImode); - }) - -(define_peephole2 - [(set (match_operand:SI 0 "register_operand" "") - (and:SI (match_dup 0) - (match_operand:SI 1 "const_int_qi_operand" ""))) - (set (match_dup 0) - (xor:SI (match_dup 0) - (match_operand:SI 2 "const_int_qi_operand" ""))) - (set (cc0) (compare (match_dup 0) - (const_int 0))) - (set (pc) - (if_then_else (match_operator 4 "eqne_operator" - [(cc0) (const_int 0)]) - (label_ref (match_operand 3 "" "")) - (pc)))] - "peep2_reg_dead_p (3, operands[0]) - && (~INTVAL (operands[1]) & INTVAL (operands[2])) == 0" - [(set (match_dup 5) - (and:QI (match_dup 5) - (match_dup 6))) - (set (match_dup 5) - (xor:QI (match_dup 5) - (match_dup 7))) - (set (cc0) (compare (match_dup 5) - (const_int 0))) - (set (pc) - (if_then_else (match_op_dup 4 [(cc0) (const_int 0)]) - (label_ref (match_dup 3)) - (pc)))] - { -operands[5] = gen_rtx_REG (QImode, REGNO (operands[0])); -operands[6] = gen_int_mode (INTVAL (operands[1]), QImode); -operands[7] = gen_int_mode (INTVAL (operands[2]), QImode); +mode = const_int_qi_operand (operands[1], QImode) ? QImode : HImode; +operands[4] = gen_rtx_REG (mode, REGNO (operands[0])); +operands[5] = gen_int_mode (INTVAL (operands[1]), mode); }) ;; These triggers right at the end of allocation of locals in the
[PATCH] c++: Fix poor diagnostic for array initializer [PR93710]
A small improvement for an error in build_user_type_conversion_1: instead of array-init1.C:11:1: error: conversion from ‘long int’ to ‘A’ is ambiguous 11 | }; | ^ we will print array-init1.C:8:3: error: conversion from ‘long int’ to ‘A’ is ambiguous 8 | 0L, | ^~ Bootstrapped/regtested on x86_64-linux, ok for trunk? 2020-02-12 Marek Polacek PR c++/93710 - poor diagnostic for array initializer. * call.c (build_user_type_conversion_1): Use cp_expr_loc_or_input_loc for an error call. * g++.dg/diagnostic/array-init1.C: New test. --- gcc/cp/call.c | 5 +++-- gcc/testsuite/g++.dg/diagnostic/array-init1.C | 11 +++ 2 files changed, 14 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/diagnostic/array-init1.C diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 51621b7dd87..f47f96bf1c2 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -4172,8 +4172,9 @@ build_user_type_conversion_1 (tree totype, tree expr, int flags, if (complain & tf_error) { auto_diagnostic_group d; - error ("conversion from %qH to %qI is ambiguous", -fromtype, totype); + error_at (cp_expr_loc_or_input_loc (expr), + "conversion from %qH to %qI is ambiguous", + fromtype, totype); print_z_candidates (location_of (expr), candidates); } diff --git a/gcc/testsuite/g++.dg/diagnostic/array-init1.C b/gcc/testsuite/g++.dg/diagnostic/array-init1.C new file mode 100644 index 000..78580ad6b83 --- /dev/null +++ b/gcc/testsuite/g++.dg/diagnostic/array-init1.C @@ -0,0 +1,11 @@ +// PR c++/93710 - poor diagnostic for array initializer. + +struct A { A (int); A (char*); int i; }; + +int x; + +A a1[] = { + 0L, // { dg-error "3:conversion from .long int. to .A. is ambiguous" } + &x, // { dg-error "3:invalid conversion from .int\\*. to .int." } + __builtin_offsetof (A, i) // { dg-error "23:conversion from .long unsigned int. to .A. is ambiguous" } +}; base-commit: 5bfc8303ffe2d86e938d45f13cd99a39469dac4f -- Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
Re: [PATCH] avoid user-constructible types in reshape_init_array (PR 90938)
On 2/11/20 5:28 PM, Jason Merrill wrote: On 2/11/20 9:00 PM, Martin Sebor wrote: r270155, committed in GCC 9, introduced a transformation that strips redundant trailing zero initializers from array initializer lists in order to support string literals as template arguments. The transformation neglected to consider the case of array elements of trivial class types with user-defined conversion ctors and either defaulted or deleted default ctors. (It didn't occur to me that those qualify as trivial types despite the user-defined ctors.) As a result, some valid initialization expressions are rejected when the explicit zero-initializers are dropped in favor of the (deleted) default ctor, Hmm, a type with only a deleted default constructor is not trivial, that should have been OK already. For Marek's test case: struct A { A () == delete; A (int) = delete; }; trivial_type_p() returns true (as does __is_trivial (A) in both GCC and Clang). [class.prop] says that A trivial class is a class that is trivially copyable and has one or more default constructors (10.3.4.1), all of which are either trivial or deleted and at least one of which is not deleted. That sounds like A above is not trivial because it doesn't have at least one default ctor that's not deleted, but both GCC and Clang say it is. What am I missing? Is there some other default constructor hiding in there that I don't know about? and others are eliminated in favor of the defaulted ctor instead of invoking a user-defined conversion ctor, leading to wrong code. This seems like a bug in type_initializer_zero_p; it shouldn't treat 0 as a zero initializer for any class. That does fix it, and it seems like the right solution to me as well. Thanks for the suggestion. I'm a little unsure about the condition I put in place though. Attached is an updated patch rested on x86_64-linux. Martin PR c++/90938 - Initializing array with {1} works but not {0} gcc/cp/ChangeLog: PR c++/90938 * decl.c (reshape_init_array_1): Avoid types with non-trivial user-defined ctors. * tree.c (type_initializer_zero_p): Fail for structs initialized with non-structs. gcc/testsuite/ChangeLog: PR c++/90938 * g++.dg/init/array55.C: New test. * g++.dg/init/array56.C: New test. * g++.dg/cpp2a/nontype-class33.C: New test. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 31a556a0a1f..60731cb3f9d 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -6051,11 +6051,14 @@ reshape_init_array_1 (tree elt_type, tree max_index, reshape_iter *d, break; } - if (sized_array_p && trivial_type_p (elt_type)) + if (sized_array_p + && trivial_type_p (elt_type) + && !TYPE_NEEDS_CONSTRUCTING (elt_type)) { /* Strip trailing zero-initializers from an array of a trivial - type of known size. They are redundant and get in the way - of telling them apart from those with implicit zero value. */ + type with no user-defined ctor of known size. They are + redundant and get in the way of telling them apart from those + with implicit zero value. */ unsigned HOST_WIDE_INT nelts = CONSTRUCTOR_NELTS (new_init); if (last_nonzero > nelts) nelts = 0; diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index eb540f851ee..d79d7b1ddd1 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -5708,7 +5708,15 @@ type_initializer_zero_p (tree type, tree init) return TREE_CODE (init) != STRING_CST && initializer_zerop (init); if (TREE_CODE (init) != CONSTRUCTOR) -return initializer_zerop (init); +{ + /* A class can only be initialized by a non-class type if it has + a ctor that converts from that type. Such classes are excluded + since their semantics are unknown. */ + if (RECORD_OR_UNION_TYPE_P (type) + && !RECORD_OR_UNION_TYPE_P (TREE_TYPE (init))) + return false; + return initializer_zerop (init); +} if (TREE_CODE (type) == ARRAY_TYPE) { diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class33.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class33.C new file mode 100644 index 000..8407f4b4c8a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class33.C @@ -0,0 +1,35 @@ +// PR c++/90938 +// { dg-do compile } + +struct A { int i; }; +struct B { A a[2]; }; + +static const constexpr A a0 = { 0 }; +static const constexpr A a_ = { }; + +template struct X { }; + +typedef X XB; +typedef XXB; +typedef X XB; +typedef X XB; +typedef X XB; +typedef XXB; +typedef X XB; +typedef X XB; +typedef X XB; + + +struct C { constexpr C () = default; }; +struct D { C c[2]; }; + +static const constexpr C c_ = { }; + +template struct Y { }; + +typedef Y YD; +typedef Y YD; +typedef Y YD; +typedef YYD; +typedef Y YD; +typedef Y YD; diff --git a/gcc/testsuite/g++.dg/init/array55.C b/gcc/testsuite/g++.dg/init/array55.C new file mode 100644 index 000..70fb183b897 --- /dev/null +++ b/gcc/testsuite/g++.dg/init/array5
[PATCH 1/2] libstdc++: Move some ranges algos to a new header
This roughly mirrors the existing split between and . The ranges [specialized.algorithms] will use this new header to avoid including all of of . libstdc++-v3/ChangeLog: * include/Makefile.am: Add bits/ranges_algobase.h * include/Makefile.in: Regenerate. * bits/ranges_algo.h: Include and refactor existing #includes. (__detail::__is_normal_iterator, __detail::is_reverse_iterator, __detail::__is_move_iterator, copy_result, move_result, __equal, equal, copy_result, move_result, move_backward_result, copy_backward_result, __copy_or_move_backward, __copy_or_move, copy, move, copy_backward, move_backward, copy_n_result, copy_n, fill_n, fill): Split out into ... * bits/range_algobase.h: ... this new header. --- libstdc++-v3/include/Makefile.am| 1 + libstdc++-v3/include/Makefile.in| 1 + libstdc++-v3/include/bits/ranges_algo.h | 508 +- libstdc++-v3/include/bits/ranges_algobase.h | 556 4 files changed, 559 insertions(+), 507 deletions(-) create mode 100644 libstdc++-v3/include/bits/ranges_algobase.h diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index 1d342cecbcc..614222db400 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -157,6 +157,7 @@ bits_headers = \ ${bits_srcdir}/random.tcc \ ${bits_srcdir}/range_access.h \ ${bits_srcdir}/range_cmp.h \ + ${bits_srcdir}/ranges_algobase.h \ ${bits_srcdir}/ranges_algo.h \ ${bits_srcdir}/refwrap.h \ ${bits_srcdir}/regex.h \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index c735d67a5d3..7ee6a1e3f61 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -502,6 +502,7 @@ bits_headers = \ ${bits_srcdir}/random.tcc \ ${bits_srcdir}/range_access.h \ ${bits_srcdir}/range_cmp.h \ + ${bits_srcdir}/ranges_algobase.h \ ${bits_srcdir}/ranges_algo.h \ ${bits_srcdir}/refwrap.h \ ${bits_srcdir}/regex.h \ diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h index e065ff2a974..84a02cabb80 100644 --- a/libstdc++-v3/include/bits/ranges_algo.h +++ b/libstdc++-v3/include/bits/ranges_algo.h @@ -32,13 +32,7 @@ #if __cplusplus > 201703L -#include -#include -#include -// #include -#include -#include -#include // __is_byte +#include #include // concept uniform_random_bit_generator #if __cpp_lib_concepts @@ -49,28 +43,6 @@ namespace ranges { namespace __detail { -template - constexpr inline bool __is_normal_iterator = false; - -template - constexpr inline bool - __is_normal_iterator<__gnu_cxx::__normal_iterator<_Iterator, - _Container>> = true; - -template - constexpr inline bool __is_reverse_iterator = false; - -template - constexpr inline bool - __is_reverse_iterator> = true; - -template - constexpr inline bool __is_move_iterator = false; - -template - constexpr inline bool - __is_move_iterator> = true; - template constexpr auto __make_comp_proj(_Comp& __comp, _Proj& __proj) @@ -741,420 +713,6 @@ namespace ranges std::move(__proj1), std::move(__proj2)); } - template _Sent1, - input_iterator _Iter2, sentinel_for<_Iter2> _Sent2, - typename _Pred, typename _Proj1, typename _Proj2> -requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2> -constexpr bool -__equal(_Iter1 __first1, _Sent1 __last1, _Iter2 __first2, _Sent2 __last2, - _Pred __pred, _Proj1 __proj1, _Proj2 __proj2) -{ - // TODO: implement more specializations to at least have parity with - // std::equal. - constexpr bool __sized_iters - = (sized_sentinel_for<_Sent1, _Iter1> - && sized_sentinel_for<_Sent2, _Iter2>); - if constexpr (__sized_iters) - { - auto __d1 = ranges::distance(__first1, __last1); - auto __d2 = ranges::distance(__first2, __last2); - if (__d1 != __d2) - return false; - - using _ValueType1 = iter_value_t<_Iter1>; - using _ValueType2 = iter_value_t<_Iter2>; - constexpr bool __use_memcmp - = ((is_integral_v<_ValueType1> || is_pointer_v<_ValueType1>) - && is_same_v<_ValueType1, _ValueType2> - && is_pointer_v<_Iter1> - && is_pointer_v<_Iter2> - && is_same_v<_Pred, ranges::equal_to> - && is_same_v<_Proj1, identity> - && is_same_v<_Proj2, identity>); - if constexpr (__use_memcmp) - { - if (const size_t __len = (__last1 - __first1)) - return !std::__memcmp(__first1, __first2, __len); -
[PATCH 2/2] libstdc++: Implement ranges [specialized.algorithms]
This implements all the ranges members defined in [specialized.algorithms]: ranges::uninitialized_default_construct ranges::uninitialized_value_construct ranges::uninitialized_copy ranges::uninitialized_copy_n ranges::uninitialized_move ranges::uninitialized_move_n ranges::uninitialized_fill ranges::uninitialized_fill_n ranges::construct_at ranges::destroy_at ranges::destroy It also implements (hopefully correctly) the "obvious" optimizations for these algos, namely that if the output range has a trivial value type and if the appropriate operation won't throw then we can dispatch to the standard ranges version of the algorithm which will then potentially enable further optimizations. libstdc++-v3/ChangeLog: * include/Makefile.am: Add . * include/Makefile.in: Regenerate. * include/bits/ranges_uninitialized.h: New header. * include/std/memory: Include it. * testsuite/20_util/specialized_algorithms/destroy/constrained.cc: New test. * .../uninitialized_copy/constrained.cc: New test. * .../uninitialized_default_construct/constrained.cc: New test. * .../uninitialized_fill/constrained.cc: New test. * .../uninitialized_move/constrained.cc: New test. * .../uninitialized_value_construct/constrained.cc: New test. --- libstdc++-v3/include/Makefile.am | 1 + libstdc++-v3/include/Makefile.in | 1 + .../include/bits/ranges_uninitialized.h | 491 ++ libstdc++-v3/include/std/memory | 1 + .../destroy/constrained.cc| 76 +++ .../uninitialized_copy/constrained.cc | 166 ++ .../constrained.cc| 147 ++ .../uninitialized_fill/constrained.cc | 137 + .../uninitialized_move/constrained.cc | 176 +++ .../constrained.cc| 140 + 10 files changed, 1336 insertions(+) create mode 100644 libstdc++-v3/include/bits/ranges_uninitialized.h create mode 100644 libstdc++-v3/testsuite/20_util/specialized_algorithms/destroy/constrained.cc create mode 100644 libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc create mode 100644 libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_default_construct/constrained.cc create mode 100644 libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/constrained.cc create mode 100644 libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc create mode 100644 libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct/constrained.cc diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index 614222db400..e131ce04f8c 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -159,6 +159,7 @@ bits_headers = \ ${bits_srcdir}/range_cmp.h \ ${bits_srcdir}/ranges_algobase.h \ ${bits_srcdir}/ranges_algo.h \ + ${bits_srcdir}/ranges_uninitialized.h \ ${bits_srcdir}/refwrap.h \ ${bits_srcdir}/regex.h \ ${bits_srcdir}/regex.tcc \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index 7ee6a1e3f61..ae20f6b1d21 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -504,6 +504,7 @@ bits_headers = \ ${bits_srcdir}/range_cmp.h \ ${bits_srcdir}/ranges_algobase.h \ ${bits_srcdir}/ranges_algo.h \ + ${bits_srcdir}/ranges_uninitialized.h \ ${bits_srcdir}/refwrap.h \ ${bits_srcdir}/regex.h \ ${bits_srcdir}/regex.tcc \ diff --git a/libstdc++-v3/include/bits/ranges_uninitialized.h b/libstdc++-v3/include/bits/ranges_uninitialized.h new file mode 100644 index 000..252295faed7 --- /dev/null +++ b/libstdc++-v3/include/bits/ranges_uninitialized.h @@ -0,0 +1,491 @@ +// Raw memory manipulators -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNT
Re: [PATCH] real: Fix roundeven on inf/nan [PR93663]
On Wed, 12 Feb 2020, Jakub Jelinek wrote: > As can be seen in the testcase, roundeven with inf or nan arguments > ICE because of those asserts where nothing prevents from is_halfway_below > being called with those arguments. > > The following patch fixes that by just returning false for rvc_inf/rvc_nan > like it returns for rvc_zero, so that we handle roundeven with all those > values as round. Inf/NaN are not halfway in between two integers... > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Use a non-empty test program to test ability to link
On 2/10/20 3:58 PM, Joseph Myers wrote: On Sat, 8 Feb 2020, Sandra Loosemore wrote: BTW, I did run autoconf in every subdirectory that contains a configure.ac, but it appears only libstc++-v3 actually uses this test; all the other regenerated configure scripts were unchanged. There's some problem with that regeneration, then; every directory where configure.ac mentions GCC_NO_EXECUTABLES should have a resulting change (I tested regenerating in libgcc/ with this patch applied and got such a change, for example). Hmmm. I just tried that again and got nothing. Do I have to do something other than just run "autoconf" in that directory? I'm always confused by anything involving configuration changes. :-( Just running autoconf in that directory should suffice to get configure regenerated with the change. Hmmm, I tried again and saw that autoconf didn't even touch the timestamp on the existing configure file, but I was able to force it to regenerate the files by removing the old ones first. Is this version of the patch OK to check in? -Sandra >From c258184d7945071450856e8563128417993758a9 Mon Sep 17 00:00:00 2001 From: Sandra Loosemore Date: Wed, 12 Feb 2020 12:20:15 -0800 Subject: [PATCH v2] Use a non-empty test program to test ability to link. On bare-metal targets, I/O support is typically provided by a BSP and requires a linker script and/or hosting library to be specified on the linker command line. Linking an empty program with the default linker script may succeed, however, which confuses libstdc++ configuration when programs that probe for the presence of various I/O features fail with link errors. 2020-02-12 Sandra Loosemore PR libstdc++/79193 PR libstdc++/88999 config/ * no-executables.m4: Use a non-empty program to test for linker support. libgcc/ * configure: Regenerated. libgfortran/ * configure: Regenerated. libiberty/ * configure: Regenerated. libitm/ * configure: Regenerated. libobjc/ * configure: Regenerated. libquadmath/ * configure: Regenerated. libssp/ * configure: Regenerated. libstdc++v-3/ * configure: Regenerated. --- config/ChangeLog | 8 config/no-executables.m4 | 4 +++- libgcc/ChangeLog | 7 +++ libgcc/configure | 4 ++-- libgfortran/ChangeLog| 7 +++ libgfortran/configure| 4 ++-- libiberty/ChangeLog | 7 +++ libiberty/configure | 4 ++-- libitm/ChangeLog | 7 +++ libitm/configure | 0 libobjc/ChangeLog| 7 +++ libobjc/configure| 4 ++-- libquadmath/ChangeLog| 7 +++ libquadmath/configure| 4 ++-- libssp/ChangeLog | 7 +++ libssp/configure | 4 ++-- libstdc++-v3/ChangeLog | 7 +++ libstdc++-v3/configure | 4 ++-- 18 files changed, 81 insertions(+), 15 deletions(-) mode change 100644 => 100755 libitm/configure diff --git a/config/ChangeLog b/config/ChangeLog index f1fec81..01428dd 100644 --- a/config/ChangeLog +++ b/config/ChangeLog @@ -1,3 +1,11 @@ +2020-02-12 Sandra Loosemore + + PR libstdc++/79193 + PR libstdc++/88999 + + * no-executables.m4: Use a non-empty program to test for linker + support. + 2020-02-01 Andrew Burgess * lib-link.m4 (AC_LIB_LINKFLAGS_BODY): Update shell syntax. diff --git a/config/no-executables.m4 b/config/no-executables.m4 index 9061624..6842f84 100644 --- a/config/no-executables.m4 +++ b/config/no-executables.m4 @@ -25,7 +25,9 @@ AC_BEFORE([$0], [_AC_COMPILER_EXEEXT]) AC_BEFORE([$0], [AC_LINK_IFELSE]) m4_define([_AC_COMPILER_EXEEXT], -[AC_LANG_CONFTEST([AC_LANG_PROGRAM()]) +[AC_LANG_CONFTEST([AC_LANG_PROGRAM( + [#include ], + [printf ("hello world\n");])]) # FIXME: Cleanup? AS_IF([AC_TRY_EVAL(ac_link)], [gcc_no_link=no], [gcc_no_link=yes]) if test x$gcc_no_link = xyes; then diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog index 917d0e7..7b46ccb 100644 --- a/libgcc/ChangeLog +++ b/libgcc/ChangeLog @@ -1,3 +1,10 @@ +2020-02-12 Sandra Loosemore + + PR libstdc++/79193 + PR libstdc++/88999 + + * configure: Regenerated. + 2020-02-10 Jeff Law * config/frv/frvbegin.c: Use right flags for .ctors and .dtors diff --git a/libgcc/configure b/libgcc/configure index ab8d471..093036a 100755 --- a/libgcc/configure +++ b/libgcc/configure @@ -3553,11 +3553,11 @@ done cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ - +#include int main () { - +printf ("hello world\n"); ; return 0; } diff --git a/libgfortran/ChangeLog b/libgfortran/ChangeLog index fe7e480..5632fbc 100644 --- a/libgfortran/ChangeLog +++ b/libgfortran/ChangeLog @@ -1,3 +1,10 @@ +2020-02-12 Sandra Loosemore + + PR libstdc++/79193 + PR libstdc++/88999 + + * configure: Regenerated. + 2020-01-24 Maciej W. Rozycki * configure.ac: Handle `--with-toolexeclibdir='. diff --git a/libgfortran/configure b/libgfortran/configure index 8ba6831..d01654e 100755 --- a/libgfortran/configure +++ b/libgfortran/configure @@ -4042,11 +4042,11 @@ done
[PING PATCH] document that alias and target must have the same type
Ping: https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00290.html On 2/5/20 1:13 PM, Martin Sebor wrote: On 2/4/20 6:05 PM, Martin Sebor wrote: GCC diagnoses declarations of function aliases whose type doesn't match that of the target (ditto for attribute weakref). It doesn't yet diagnose such incompatbilities for variable aliases but that's just an oversight that I will try to remember to correct in GCC 11. The attached patch updates the manual to make it clear that aliases must have the same type as their targets, or the behavior is undefined (and may be diagnosed). On further review I noticed a few problems with the documentation of attribute weakref. The manual says that Without a target, given as an argument to weakref or to alias, weakref is equivalent to weak. and At present, a declaration to which weakref is attached can only be static. However, GCC accepts the following declaration: extern int x (void) __attribute__ ((weakref)); so the second sentence isn't correct without qualification (unlike weakref, a weak declaration must be external). Another documentation problem is with the example in the manual that says that this declaration: static int x() __attribute__ ((weakref ("y"))); /* is equivalent to... */ static int x() __attribute__ ((weak, weakref, alias ("y"))); but GCC rejects the latter with error: weak declaration of ‘x’ must be public Changing the latter from static to extern changes the error to error: ‘weakref’ symbol ‘x’ must have static linkage So clearly two declarations with the two sets of attributes are not equivalent, either extern or static. I've fixed up these documentation problems in the attached revision of the original patch. I also mention that besides their types having to match, the alias must have the same size and alignment as the target. Martin PS I also noted that when the function is then also used GCC issues: warning: ‘weakref’ attribute should be accompanied with an ‘alias’ attribute This matches what the manual says in "Without arguments, it should be accompanied by an alias attribute naming the target symbol." But when the weakref function is not used there is no warning. That seems like an unfortunate side-effect of the choice of issuing the warning a little too late (waning from the front-end it would make it consistent regardless of whether the function was used, and it would highlight the omission even in translation units that define the weakref without using it).
Re: [PATCH] Use a non-empty test program to test ability to link
On Wed, 12 Feb 2020, Sandra Loosemore wrote: > Hmmm, I tried again and saw that autoconf didn't even touch the timestamp on > the existing configure file, but I was able to force it to regenerate the > files by removing the old ones first. Is this version of the patch OK to > check in? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Don't override various Makefile variables for gnulib et al
Ping On Wed, Jan 29, 2020 at 8:07 AM Christian Biesinger wrote: > > Ping > > On Sat, Nov 23, 2019 at 12:30 AM Christian Biesinger > wrote: > > > > Normally the toplevel Makefile will pass various CC=foo and other > > flags down to subdir Makefiles. However, for Gnulib this is a problem > > because Gnulib's configure specifically sets CC to something that > > includes a -std=gnu11 flag on some systems, and this override would > > set it back to CC=gcc, leading to compile errors in a GDB build > > with an updated Gnulib. > > > > I don't believe this is needed outside of GCC, so this patch changes > > Gnulib and other non-GCC modules to just not override any flags -- > > the values set during configure time should be fine. If a user > > overrides them manually when invoking make, those will still work. > > > > Under the same condition, I also removed the host_exports. I don't > > understand why this is ever necessary (this is only after configure > > has run). > > > > The other option is to clear MAKEOVERRIDES in gnulib/Makefile.am, but > > that means the user can't override any variables for this subdirectory. > > > > ChangeLog: > > > > 2019-11-22 Christian Biesinger > > > > * Makefile.def: Pass no_exports_and_flags to various non-GCC > > modules. > > * Makefile.in: Allow passing a no_exports_and_flags argument to > > "all" to suppress emitting exports and make flags. Useful when > > invoked via host_modules from Makefile.def. > > * Makefile.tpl: Regenerate. > > > > Change-Id: I7d80328cf81c133ba6157eec7d10c422b6790723 > > --- > > Makefile.def | 12 ++-- > > Makefile.in | 30 -- > > Makefile.tpl | 9 ++--- > > 3 files changed, 24 insertions(+), 27 deletions(-) > > > > diff --git a/Makefile.def b/Makefile.def > > index 311feb9de3..e1ff065202 100644 > > --- a/Makefile.def > > +++ b/Makefile.def > > @@ -33,7 +33,7 @@ build_modules= { module= fixincludes; }; > > build_modules= { module= libcpp; > > extra_configure_flags='--disable-nls > > am_cv_func_iconv=no';}; > > > > -host_modules= { module= bfd; bootstrap=true; }; > > +host_modules= { module= bfd; bootstrap=true; no_exports_and_flags=true; }; > > host_modules= { module= opcodes; bootstrap=true; }; > > host_modules= { module= binutils; bootstrap=true; }; > > host_modules= { module= bison; no_check_cross= true; }; > > @@ -105,15 +105,15 @@ host_modules= { module= libiconv; > > missing= install-html; > > missing= install-info; }; > > host_modules= { module= m4; }; > > -host_modules= { module= readline; }; > > +host_modules= { module= readline; no_exports_and_flags=true; }; > > host_modules= { module= sid; }; > > -host_modules= { module= sim; }; > > +host_modules= { module= sim; no_exports_and_flags=true; }; > > host_modules= { module= texinfo; no_install= true; }; > > host_modules= { module= zlib; no_install=true; no_check=true; > > bootstrap=true; > > extra_configure_flags='@extra_host_zlib_configure_flags@';}; > > -host_modules= { module= gnulib; }; > > -host_modules= { module= gdb; }; > > +host_modules= { module= gnulib; no_exports_and_flags=true; }; > > +host_modules= { module= gdb; no_exports_and_flags=true; }; > > host_modules= { module= expect; }; > > host_modules= { module= guile; }; > > host_modules= { module= tk; }; > > @@ -129,7 +129,7 @@ host_modules= { module= lto-plugin; bootstrap=true; > > extra_make_flags='@extra_linker_plugin_flags@'; }; > > host_modules= { module= libcc1; extra_configure_flags=--enable-shared; }; > > host_modules= { module= gotools; }; > > -host_modules= { module= libctf; no_check=true; > > +host_modules= { module= libctf; no_check=true; no_exports_and_flags=true; > > bootstrap=true; }; > > > > target_modules = { module= libstdc++-v3; > > diff --git a/Makefile.in b/Makefile.in > > index 1aabf6ede4..bd41753543 100644 > > --- a/Makefile.in > > +++ b/Makefile.in > > @@ -3414,10 +3414,9 @@ maybe-all-bfd: all-bfd > > all-bfd: configure-bfd > > @r=`${PWD_COMMAND}`; export r; \ > > s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \ > > - $(HOST_EXPORTS) \ > > +\ > > (cd $(HOST_SUBDIR)/bfd && \ > > - $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) > > $(STAGE1_FLAGS_TO_PASS) \ > > - $(TARGET-bfd)) > > + $(MAKE) $(TARGET-bfd)) > > @endif bfd > > > > > > @@ -25530,10 +25529,9 @@ all-readline: configure-readline > > @: $(MAKE); $(unstage) > > @r=`${PWD_COMMAND}`; export r; \ > > s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \ > > - $(HOST_EXPORTS) \ > > +\ > > (cd $(HOST_SUBDIR)/readline && \ > > - $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) > > $(STAGE1_FLAGS_TO_PASS) \ > > - $(TARGET-readline)) > > + $(MAKE) $(TARGET-readline)) > > @endif readline > > > > > > @@ -
Re: Patch ping
On Mon, 2020-02-10 at 10:24 +0100, Jakub Jelinek wrote: > Hi! > > I'd like to ping a couple of patches: > > PR target/91913 - arm movsi + cmpsi -> movsi_compare0 peephole2 ICE fix >https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00010.html Letting the ARM guys deal with this. > > PR preprocessor/92319 - partially implement P1042R1: __VA_OPT__ wording > clarifications >https://gcc.gnu.org/ml/gcc-patches/2020-01/msg02104.html Jason for this one. > > PR target/93069 - avx512* rejects-valid fix (rejected by assembler) >http://gcc.gnu.org/ml/gcc-patches/2019-12/msg01606.html This is in my queue :-) > > PR tree-optimization/92868 - compute_objsize/gimple_call_alloc_size > /maybe_warn_overflow fixes >http://gcc.gnu.org/ml/gcc-patches/2019-12/msg01164.html Martin's patch should have addressed all the issues and should include your tests (tweaked, but supposed to be equivalent). jeff
Re: [PATCH] [MIPS] Remove unnecessary moves around DSP multiply-accumulate instructions
On Thu, 2020-02-06 at 14:05 +0100, Mihailo Stojanovic wrote: > Unnecessary moves around dpadd and dpsub are caused by different pseudos > being assigned to the input-output operands which correspond to the same > register. > > Just like for the MSA multiply-accumulate instructions, this forces the > same pseudo to the input-output operands, > which removes unnecesary moves. > > Tested on mips-mti-linux-gnu. > > gcc/ChangeLog: > > * gcc/config/mips/mips.c (mips_expand_builtin_insn): Operands of > DSP multiply-accumulate instructions which correspond to the > same input-output register now have the same pseudo asigned to > them. > > gcc/testsuite/ChangeLog: > > * gcc/testsuite/gcc.target/mips/mac-zero-reload.c: New test. Affects just MIPS, so OK even though we're in stage4. jeff >
Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
On Wed, Feb 12, 2020 at 09:07:27AM +0100, Richard Biener wrote: > On Tue, 11 Feb 2020, Segher Boessenkool wrote: > > > On Tue, Feb 11, 2020 at 02:58:47PM +0100, Richard Biener wrote: > > > On Tue, 11 Feb 2020, Roman Zhuykov wrote: > > > > 11.02.2020 11:01, Richard Biener wrote: > > > > Sound good, but IMHO modulo scheduler is not the best choice to be the > > > > first step implementing such a concept. > > > > > > True ;) But since the context of this thread is unrolling ... > > > Not sure how you'd figure the unroll factor to apply if you want > > > to do unrolling within a classical scheduling framework? Maybe > > > unroll as much as you can fill slots until the last instruction > > > of the first iteration retires? > > > > That will be terrible on register-rich architectures: it *already* is > > problematic how often some things are unrolled, blindly unrolling more > > would make things worse. We need to unroll more where it helps, but > > less where it does not. For that we need a good cost/benefit estimate. > > True. For x86 we tried but did not come up with a sensible estimate > (probably the x86 uarchs are way too complicated to understand). There are three main factors at work: 1) The cost for iterating the loop. This is related to what ivopts does, and also on most uarchs every loop iteration has some minimum cost (on many uarchs it needs a fetch redirect, or you can only do one branch per cycle, etc.; this is mainly important for "small" loops, where what is "small" differs per uarch). Unrolling brings this cost down. 2) Cost reduction by better scheduling. 3) The cost for using more registers, when unrolling. This is worse if you also use SMS or variable expansion, or simply because of the better scheduling. This can increase the cost a *lot*. 1) isn't all that hard to estimate. 2) and esp. 3) are though :-/ Segher
Re: [PATCH] avoid user-constructible types in reshape_init_array (PR 90938)
On Wed, Feb 12, 2020 at 01:21:58PM -0700, Martin Sebor wrote: > On 2/11/20 5:28 PM, Jason Merrill wrote: > > On 2/11/20 9:00 PM, Martin Sebor wrote: > > > r270155, committed in GCC 9, introduced a transformation that strips > > > redundant trailing zero initializers from array initializer lists in > > > order to support string literals as template arguments. > > > > > > The transformation neglected to consider the case of array elements > > > of trivial class types with user-defined conversion ctors and either > > > defaulted or deleted default ctors. (It didn't occur to me that > > > those qualify as trivial types despite the user-defined ctors.) As > > > a result, some valid initialization expressions are rejected when > > > the explicit zero-initializers are dropped in favor of the (deleted) > > > default ctor, > > > > Hmm, a type with only a deleted default constructor is not trivial, that > > should have been OK already. > > For Marek's test case: > struct A { A () == delete; A (int) = delete; }; > > trivial_type_p() returns true (as does __is_trivial (A) in both GCC > and Clang). > > [class.prop] says that > > A trivial class is a class that is trivially copyable and has one > or more default constructors (10.3.4.1), all of which are either > trivial or deleted and at least one of which is not deleted. > > That sounds like A above is not trivial because it doesn't have > at least one default ctor that's not deleted, but both GCC and > Clang say it is. What am I missing? Is there some other default > constructor hiding in there that I don't know about? Note that [class.prop]/2 now says something other than that: "A trivial class is a class that is trivially copyable and has one or more eligible default constructors ([class.default.ctor]), all of which are trivial." I think this changed in P0848R3 (Conditionally Trivial Special Member Functions). Here A has a default constructor, but it's not eligible: [special]/6 says "An eligible special member function is a special member function for which: -- the function is not deleted, [...]" So it seems that A should not be trivial. But the wording you quoted also means that: it was changed in CWG1496 Triviality with deleted and missing default constructors but we don't implement that (https://gcc.gnu.org/PR85723). So going further down memory lane, [class.prop] used to say "A trivial class is a class that has a default constructor (12.1), has no non-trivial default constructors, and is trivially copyable." which A is (because it has an implicit copy ctor). So I think it all comes down to the fact that neither g++ not clang++ implement CWG 1496. And that's probably GCC 11 work. Marek
[committed] Consolidate two H8 peepholes into one peephole using a mode iterator
Another minor cleanup for the H8 port. There was another pattern that handles shortening of comparison operators. It handles shortening from HI->QI (the other handled SI->HI and SI->QI). This patch uses a mode iterator to handle them all with one pattern. It also installs the correct version of the previous patch (which would generate RTL with incorrect modes). Regression tested on h8300-elf with a slew of multilibs :-) Installing on the trunk. jeff Combine the two H8 mode shortening peepholes into a single peephole * config/h8300/h8300.md (comparison shortening peepholes): Use a mode iterator to merge the HImode and SImode peepholes. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index b566ca4b591..fe42cee9e22 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2020-02-12 Jeff Law + + * config/h8300/h8300.md (comparison shortening peepholes): Use + a mode iterator to merge the HImode and SImode peepholes. + 2020-02-12 Jakub Jelinek PR middle-end/93663 diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md index dcc9c3682d6..f12eb46306f 100644 --- a/gcc/config/h8300/h8300.md +++ b/gcc/config/h8300/h8300.md @@ -5366,38 +5366,13 @@ { operands[5] = GEN_INT (- INTVAL (operands[1])); }) -;; Narrow the mode of testing if possible. -(define_peephole2 - [(set (match_operand:HI 0 "register_operand" "") - (and:HI (match_dup 0) - (match_operand:HI 1 "const_int_qi_operand" ""))) - (set (cc0) (compare (match_dup 0) - (const_int 0))) - (set (pc) - (if_then_else (match_operator 3 "eqne_operator" - [(cc0) (const_int 0)]) - (label_ref (match_operand 2 "" "")) - (pc)))] - "peep2_reg_dead_p (2, operands[0])" - [(set (match_dup 4) - (and:QI (match_dup 4) - (match_dup 5))) - (set (cc0) (compare (match_dup 4) - (const_int 0))) - (set (pc) - (if_then_else (match_op_dup 3 [(cc0) (const_int 0)]) - (label_ref (match_dup 2)) - (pc)))] - { -operands[4] = gen_rtx_REG (QImode, REGNO (operands[0])); -operands[5] = gen_int_mode (INTVAL (operands[1]), QImode); - }) +;; Narrow the mode of testing if possible. (define_peephole2 - [(set (match_operand:SI 0 "register_operand" "") - (and:SI (match_dup 0) - (match_operand:SI 1 "const_int_operand" ""))) + [(set (match_operand:HSI 0 "register_operand" "") + (and:HSI (match_dup 0) +(match_operand:HSI 1 "const_int_operand" ""))) (set (cc0) (compare (match_dup 0) (const_int 0))) (set (pc) @@ -5406,11 +5381,10 @@ (label_ref (match_operand 2 "" "")) (pc)))] "((const_int_qi_operand (operands[1], QImode) - || const_int_hi_operand (operands[1], HImode)) + || (GET_MODE (operands[0]) == SImode +&& const_int_hi_operand (operands[1], HImode))) && peep2_reg_dead_p (2, operands[0]))" - [(set (match_dup 4) - (and:QI (match_dup 4) - (match_dup 5))) + [(set (match_dup 4) (match_dup 6)) (set (cc0) (compare (match_dup 4) (const_int 0))) (set (pc) @@ -5423,6 +5397,7 @@ mode = const_int_qi_operand (operands[1], QImode) ? QImode : HImode; operands[4] = gen_rtx_REG (mode, REGNO (operands[0])); operands[5] = gen_int_mode (INTVAL (operands[1]), mode); +operands[6] = gen_rtx_AND (mode, operands[4], operands[5]); }) ;; These triggers right at the end of allocation of locals in the
Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
On Wed, Feb 12, 2020 at 11:53:22AM +0100, Richard Biener wrote: > On Wed, 12 Feb 2020, Segher Boessenkool wrote: > > On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote: > > > On Tue, 11 Feb 2020, Segher Boessenkool wrote: > > > > Basic block partitioning has wildly disproportionate fallout in all > > > > later passes, both in terms of what those *do* (or don't, if > > > > partitioning > > > > is enabled), and of impact on the code (not to mention developer time). > > > > > > > > Maybe the implementation can be improved, but probably we should do this > > > > in a different way altogether. The current situation is not good. > > > > > > I think the expectation that you can go back to CFG layout mode > > > and then work with CFG layout tools after we've lowered to CFG RTL > > > is simply bogus. > > > > Partitioning is also quite problematic if you do not use cfglayout > > mode. For example, in shrink-wrapping. It prevents a lot there. > > > > > Yeah, you can probably do analysis things but > > > I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle > > > can wreck things. Undoubtedly doing CFG manipulations is not going > > > to work since CFG layout does not respect CFG RTL restrictions. > > > > Doing CFG manipulations on CFG RTL mode directly is almost impossible > > to do correctly. > > > > For example, bb-reorder. Which is a much more important optimisation > > than partitioning, btw. > > BB reorder switches back and forth as well ... :/ Yes. It is extremely hard to change any jumps in cfgrtl mode. The goal is to use cfglayout mode more, ideally *always*, not to use it less! > I think both are closely enough related that we probably should do > partitioning from within the same framework? OTOH BB reorder > happens _much_ later. This may be doable. What we need to do first though is to find a better thing to use than EDGE_CROSSING. Maybe we can determine which blocks should be hot and cold early, but actually make that happen only very late (maybe adjusting the decision if we have to to make things work)? That way, intervening passes do not have to care (much). > > > Partitioning simply uncovered latent bugs, there's nothing wrong > > > with it IMHO. > > > > I don't agree. The whole way EDGE_CROSSING works hinders all other > > optimisations a lot. > > I'm not sure if it's there for correctness reasons or just to > help checking that nothing "undoes" the partitioning decision. The latter. (It may also make it easier for the former, of course, but that can be solved anyway). And that makes live very hard for all later passes, while it is doubtful that it even give the best decisions: for example it prevents a lot of shrink-wrapping, which you dearly *want* to do on cold code! Segher
[committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test
On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote: > This patch renames the PowerPC internal switch -mprefixed-addr to be > -mprefixed. --- gcc/config/rs6000/rs6000.opt +++ gcc/config/rs6000/rs6000.opt @@ -570,8 +570,8 @@ mfuture Target Report Mask(FUTURE) Var(rs6000_isa_flags) Use instructions for a future architecture. -mprefixed-addr -Target Undocumented Mask(PREFIXED_ADDR) Var(rs6000_isa_flags) +mprefixed +Target Report Mask(PREFIXED) Var(rs6000_isa_flags) Generate (do not generate) prefixed memory instructions. mpcrel This change broke the gcc.target/powerpc/pr93122.c test, so it now FAIL: gcc.target/powerpc/pr93122.c (test for excess errors) Excess errors: xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean '-mprefixed'? Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to trunk as obvious. 2020-02-12 Jakub Jelinek * gcc.target/powerpc/pr93122.c: Use -mprefixed instead of -mprefixed-addr in dg-options. --- gcc/testsuite/gcc.target/powerpc/pr93122.c +++ gcc/testsuite/gcc.target/powerpc/pr93122.c @@ -1,6 +1,6 @@ /* PR target/93122 */ /* { dg-do compile { target lp64 } } */ -/* { dg-options "-fstack-clash-protection -mprefixed-addr -mfuture" } */ +/* { dg-options "-fstack-clash-protection -mprefixed -mfuture" } */ void bar (char *); Jakub
Re: [committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test
On Wed, Feb 12, 2020 at 11:27:01PM +0100, Jakub Jelinek wrote: > On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote: > > This patch renames the PowerPC internal switch -mprefixed-addr to be > > -mprefixed. > This change broke the gcc.target/powerpc/pr93122.c test, so it now > FAIL: gcc.target/powerpc/pr93122.c (test for excess errors) > Excess errors: > xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean > '-mprefixed'? > > Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to > trunk as obvious. Thanks! Segher
Re: [committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test
On Wed, Feb 12, 2020 at 11:27:01PM +0100, Jakub Jelinek wrote: > On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote: > > This patch renames the PowerPC internal switch -mprefixed-addr to be > > -mprefixed. > > --- gcc/config/rs6000/rs6000.opt > +++ gcc/config/rs6000/rs6000.opt > @@ -570,8 +570,8 @@ mfuture > Target Report Mask(FUTURE) Var(rs6000_isa_flags) > Use instructions for a future architecture. > > -mprefixed-addr > -Target Undocumented Mask(PREFIXED_ADDR) Var(rs6000_isa_flags) > +mprefixed > +Target Report Mask(PREFIXED) Var(rs6000_isa_flags) > Generate (do not generate) prefixed memory instructions. > > mpcrel > > This change broke the gcc.target/powerpc/pr93122.c test, so it now > FAIL: gcc.target/powerpc/pr93122.c (test for excess errors) > Excess errors: > xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean > '-mprefixed'? > > Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to > trunk as obvious. Thanks. I don't think that test was in the trunk when I did the the bootstrap for the -mprefixed-addr to -mprefixed option. I was about to send a similar patch. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[committed] c++: Fix constexpr if and braced functional cast.
While partially instantiating a generic lambda, we can encounter pack expansions or constexpr if where we can't actually do the substitution immediately, and instead remember a partial instantiation context in *_EXTRA_ARGS. This includes any local_specializations used in the pattern or condition. In this testcase our tree walk wasn't finding the use of i because we weren't walking into the type of a CONSTRUCTOR. Fixed by moving the code for doing that from find_parameter_packs_r into cp_walk_subtrees. Tested x86_64-pc-linux-gnu, applying to trunk. 2020-02-11 Jason Merrill PR c++/92583 PR c++/92654 * tree.c (cp_walk_subtrees): Walk CONSTRUCTOR types here. * pt.c (find_parameter_packs_r): Not here. --- gcc/cp/pt.c | 3 --- gcc/cp/tree.c | 5 gcc/testsuite/g++.dg/cpp0x/nondeduced7.C | 2 +- .../g++.dg/cpp1z/constexpr-if-lambda3.C | 24 +++ 4 files changed, 30 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index c2d3a98b1c5..6e7f4555da8 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -3924,9 +3924,6 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data) case TEMPLATE_DECL: if (!DECL_TEMPLATE_TEMPLATE_PARM_P (t)) return NULL_TREE; - gcc_fallthrough(); - -case CONSTRUCTOR: cp_walk_tree (&TREE_TYPE (t), &find_parameter_packs_r, ppd, ppd->visited); return NULL_TREE; diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index eb540f851ee..736ef6fe667 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -5024,6 +5024,11 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, walk_tree_fn func, *walk_subtrees_p = 0; break; +case CONSTRUCTOR: + if (COMPOUND_LITERAL_P (*tp)) + WALK_SUBTREE (TREE_TYPE (*tp)); + break; + case TRAIT_EXPR: WALK_SUBTREE (TRAIT_EXPR_TYPE1 (*tp)); WALK_SUBTREE (TRAIT_EXPR_TYPE2 (*tp)); diff --git a/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C b/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C index a8aa073c57e..eb5d5faf493 100644 --- a/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C +++ b/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C @@ -2,5 +2,5 @@ // { dg-do compile { target c++11 } } template struct A; -template struct A {}; // { dg-error "partial specialization" } +template struct A {}; // { dg-error "partial specialization|involves template parameter" } A a; diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C new file mode 100644 index 000..34615f71ee2 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C @@ -0,0 +1,24 @@ +// PR c++/92583 +// { dg-do compile { target c++17 } } + +template struct a { + constexpr operator int() { return 42; } +}; +template using b = int; +template struct e {}; +template using h = e; +template void apply(j f, e) { + (f(a{}), ...); +} +template void m(j f) { + using k = b; + using n = h; + apply(f, n{}); +} +template void o() { + auto p = [](auto i) { +if constexpr (a{}) ; + }; + m(p); +} +auto q() { o<0, 1>; } base-commit: 1cd9bef89ef25b3fd506cedbc82e8bc00ff56fa8 -- 2.18.1
[PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]
Hi As mentioned in PR93724, several intrinsic macros lack a closing parenthesis. These macros are only used with -O0 option, and currently unit tests use -O2, so not covered. Bootstrap ok, regression tests on i386/x86_64 is ok. Ok for trunk? Changelog gcc/ * config/i386/avx512vbmi2intrin.h (_mm512_[,mask_,maskz_]shrdi_epi16, _mm512_[,mask_,maskz_]shrdi_epi32, _m512_[,mask_,maskz_]shrdi_epi64, _mm512_[,mask_,maskz_]shldi_epi16, _mm512_[,mask_,maskz_]shldi_epi32, _m512_[,mask_,maskz_]shldi_epi64): Fix typo of lacking a closing parenthesis. * config/i386/avx512vbmi2vlintrin.h (_mm256_[,mask_,maskz_]shrdi_epi16, _mm256_[,mask_,maskz_]shrdi_epi32, _m256_[,mask_,maskz_]shrdi_epi64, _mm_[,mask_,maskz_]shrdi_epi16, _mm_[,mask_,maskz_]shrdi_epi32, _mm_[,mask_,maskz_]shrdi_epi64, _mm256_[,mask_,maskz_]shldi_epi16, _mm256_[,mask_,maskz_]shldi_epi32, _m256_[,mask_,maskz_]shldi_epi64, _mm_[,mask_,maskz_]shldi_epi16, _mm_[,mask_,maskz_]shldi_epi32, _mm_[,mask_,maskz_]shldi_epi64): Ditto. gcc/testsuite/ * gcc.target/i386/avx512vbmi2-vpshld-1.c: New test. * gcc.target/i386/avx512vbmi2-vpshld-O0-1.c: Ditto. * gcc.target/i386/avx512vbmi2-vpshrd-1.c: Ditto. * gcc.target/i386/avx512vbmi2-vpshrd-O0-1.c: Ditto. * gcc.target/i386/avx512vl-vpshld-O0-1.c: Ditto. * gcc.target/i386/avx512vl-vpshrd-O0-1.c: Ditto. -- BR, Hongtao 0001-Intrinsic-macro-of-vpshr-and-vpshl-lack-a-closing-pa.patch Description: Binary data
[fixincludes] skip fixinc on vxworks7*, amend mkheaders
vxworks7 headers haven't required fixes, and we've long avoided running fixinc on them. The problem with that is that, with a dummy fixinc, mkheaders wipes out include-fixed but then multi_dir subdirs are not created again, so we end up with a limits.h named after each multi_dir, when there are non-default multilibs. Oops. This patch arranges for a dummy fixinc to be created for *-*-vxworks7* targets, and fixes mkheaders so as to create multi_dir subdirs in include-fixed after wiping them out, and to copy limits.h so that it won't take the name that should be of a subdir (unless the multi_dir is limits.h, but that's hopefully never the case ;-) This was tested on x86_64-linux-gnu (no changes to include-fixed there, as expected), and with various of AdaCore's vx6 and vx7 targets, including ones with and without multilibs. Ok to install? for fixincludes/ChangeLog * mkheaders.in: Re-create subdirs, copy limits.h into subdir. * mkfixinc.sh: Create dummy fixinc for *-*-vxworks7*. --- fixincludes/mkfixinc.sh |1 + fixincludes/mkheaders.in |3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/fixincludes/mkfixinc.sh b/fixincludes/mkfixinc.sh index 0f96486..df90720 100755 --- a/fixincludes/mkfixinc.sh +++ b/fixincludes/mkfixinc.sh @@ -19,6 +19,7 @@ case $machine in powerpc-*-rtems* | \ powerpcle-*-eabisim* | \ powerpcle-*-eabi* | \ +*-*-vxworks7* | \ *-musl* ) # IF there is no include fixing, # THEN create a no-op fixer and exit diff --git a/fixincludes/mkheaders.in b/fixincludes/mkheaders.in index 9109b05..a293a57 100644 --- a/fixincludes/mkheaders.in +++ b/fixincludes/mkheaders.in @@ -86,6 +86,7 @@ for ml in `cat ${itoolsdatadir}/fixinc_list`; do sysroot_headers_suffix=`echo ${ml} | sed -e 's/;.*$//'` multi_dir=`echo ${ml} | sed -e 's/^[^;]*;//'` subincdir=${incdir}${multi_dir} + ${mkinstalldirs} ${subincdir} . ${itoolsdatadir}/mkheaders.conf if [ x${STMP_FIXINC} != x ] ; then TARGET_MACHINE="${target}" target_canonical="${target}" \ @@ -100,5 +101,5 @@ for ml in `cat ${itoolsdatadir}/fixinc_list`; do fi fi - cp ${itoolsdatadir}/include${multi_dir}/limits.h ${subincdir} + cp ${itoolsdatadir}/include${multi_dir}/limits.h ${subincdir}/limits.h done -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
[PATCH] libiberty.h: punt duplicate strverscmp prototype
SVN r216772 accidentally copied & pasted this prototype when adding other ones nearby. 2020-02-13 Mike Frysinger * libiberty.h (strverscmp): Delete duplicate prototype. --- include/ChangeLog | 4 include/libiberty.h | 5 - 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/include/ChangeLog b/include/ChangeLog index 3f9382d9ad4c..5094f6ce742f 100644 --- a/include/ChangeLog +++ b/include/ChangeLog @@ -1,3 +1,7 @@ +2020-02-13 Mike Frysinger + + * libiberty.h (strverscmp): Delete duplicate prototype. + 2020-02-05 Andrew Burgess * hashtab.h (htab_remove_elt): Make a parameter const. diff --git a/include/libiberty.h b/include/libiberty.h index 141cb886a850..08ede9889213 100644 --- a/include/libiberty.h +++ b/include/libiberty.h @@ -706,11 +706,6 @@ extern unsigned long long int strtoull (const char *nptr, char **endptr, int base); #endif -#if defined(HAVE_DECL_STRVERSCMP) && !HAVE_DECL_STRVERSCMP -/* Compare version strings. */ -extern int strverscmp (const char *, const char *); -#endif - /* Set the title of a process */ extern void setproctitle (const char *name, ...); -- 2.23.0
[PATCH] config: import pkg.m4 from pkg-config
We use this in the sim tree currently. Rather than require people to have pkg-config installed, include it in the config/ dir. 2012-12-23 Mike Frysinger * pkg.m4: New file from pkg-config-0.29.2. --- config/pkg.m4 | 275 ++ 1 file changed, 275 insertions(+) create mode 100644 config/pkg.m4 diff --git a/config/pkg.m4 b/config/pkg.m4 new file mode 100644 index ..13a889017866 --- /dev/null +++ b/config/pkg.m4 @@ -0,0 +1,275 @@ +# pkg.m4 - Macros to locate and utilise pkg-config. -*- Autoconf -*- +# serial 12 (pkg-config-0.29.2) + +dnl Copyright ?? 2004 Scott James Remnant . +dnl Copyright ?? 2012-2015 Dan Nicholson +dnl +dnl This program is free software; you can redistribute it and/or modify +dnl it under the terms of the GNU General Public License as published by +dnl the Free Software Foundation; either version 2 of the License, or +dnl (at your option) any later version. +dnl +dnl This program is distributed in the hope that it will be useful, but +dnl WITHOUT ANY WARRANTY; without even the implied warranty of +dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +dnl General Public License for more details. +dnl +dnl You should have received a copy of the GNU General Public License +dnl along with this program; if not, write to the Free Software +dnl Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA +dnl 02111-1307, USA. +dnl +dnl As a special exception to the GNU General Public License, if you +dnl distribute this file as part of a program that contains a +dnl configuration script generated by Autoconf, you may include it under +dnl the same distribution terms that you use for the rest of that +dnl program. + +dnl PKG_PREREQ(MIN-VERSION) +dnl --- +dnl Since: 0.29 +dnl +dnl Verify that the version of the pkg-config macros are at least +dnl MIN-VERSION. Unlike PKG_PROG_PKG_CONFIG, which checks the user's +dnl installed version of pkg-config, this checks the developer's version +dnl of pkg.m4 when generating configure. +dnl +dnl To ensure that this macro is defined, also add: +dnl m4_ifndef([PKG_PREREQ], +dnl [m4_fatal([must install pkg-config 0.29 or later before running autoconf/autogen])]) +dnl +dnl See the "Since" comment for each macro you use to see what version +dnl of the macros you require. +m4_defun([PKG_PREREQ], +[m4_define([PKG_MACROS_VERSION], [0.29.2]) +m4_if(m4_version_compare(PKG_MACROS_VERSION, [$1]), -1, +[m4_fatal([pkg.m4 version $1 or higher is required but ]PKG_MACROS_VERSION[ found])]) +])dnl PKG_PREREQ + +dnl PKG_PROG_PKG_CONFIG([MIN-VERSION]) +dnl -- +dnl Since: 0.16 +dnl +dnl Search for the pkg-config tool and set the PKG_CONFIG variable to +dnl first found in the path. Checks that the version of pkg-config found +dnl is at least MIN-VERSION. If MIN-VERSION is not specified, 0.9.0 is +dnl used since that's the first version where most current features of +dnl pkg-config existed. +AC_DEFUN([PKG_PROG_PKG_CONFIG], +[m4_pattern_forbid([^_?PKG_[A-Z_]+$]) +m4_pattern_allow([^PKG_CONFIG(_(PATH|LIBDIR|SYSROOT_DIR|ALLOW_SYSTEM_(CFLAGS|LIBS)))?$]) +m4_pattern_allow([^PKG_CONFIG_(DISABLE_UNINSTALLED|TOP_BUILD_DIR|DEBUG_SPEW)$]) +AC_ARG_VAR([PKG_CONFIG], [path to pkg-config utility]) +AC_ARG_VAR([PKG_CONFIG_PATH], [directories to add to pkg-config's search path]) +AC_ARG_VAR([PKG_CONFIG_LIBDIR], [path overriding pkg-config's built-in search path]) + +if test "x$ac_cv_env_PKG_CONFIG_set" != "xset"; then + AC_PATH_TOOL([PKG_CONFIG], [pkg-config]) +fi +if test -n "$PKG_CONFIG"; then + _pkg_min_version=m4_default([$1], [0.9.0]) + AC_MSG_CHECKING([pkg-config is at least version $_pkg_min_version]) + if $PKG_CONFIG --atleast-pkgconfig-version $_pkg_min_version; then + AC_MSG_RESULT([yes]) + else + AC_MSG_RESULT([no]) + PKG_CONFIG="" + fi +fi[]dnl +])dnl PKG_PROG_PKG_CONFIG + +dnl PKG_CHECK_EXISTS(MODULES, [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND]) +dnl --- +dnl Since: 0.18 +dnl +dnl Check to see whether a particular set of modules exists. Similar to +dnl PKG_CHECK_MODULES(), but does not set variables or print errors. +dnl +dnl Please remember that m4 expands AC_REQUIRE([PKG_PROG_PKG_CONFIG]) +dnl only at the first occurence in configure.ac, so if the first place +dnl it's called might be skipped (such as if it is within an "if", you +dnl have to call PKG_CHECK_EXISTS manually +AC_DEFUN([PKG_CHECK_EXISTS], +[AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl +if test -n "$PKG_CONFIG" && \ +AC_RUN_LOG([$PKG_CONFIG --exists --print-errors "$1"]); then + m4_default([$2], [:]) +m4_ifvaln([$3], [else + $3])dnl +fi]) + +dnl _PKG_CONFIG([VARIABLE], [COMMAND], [MODULES]) +dnl - +dnl Internal wrapper calling pkg-config via PKG_CONFIG and setting +dnl pkg_failed based on
[PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)
I've submitted a bug tracker, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93707. The root cause is that for a self-recursive function, a for-all-contexts clone could generate an edge whose callee is not the function. Therefore, to check whether an edge stands for a recursive call during cloning, we should not use ordinary way of comparing caller and callee of the edge. Bootstrapped/regtested on x86_64-linux and aarch64-linux, and also tested Spec 2017 with option "--param ipa-cp-eval-threshold=1". Feng --- 2020-02-13 Feng Xue PR ipa/93707 * ipa-cp.c (self_recursive_pass_through_p): Add a new parameter "node", and check recursiveness by comparing "node" and cs->caller. (self_recursive_agg_pass_through_p): Likewise. (find_more_scalar_values_for_callers_subset): Add "node" argument to self_recursive_pass_through_p call. (intersect_aggregates_with_edge): Add a new parameter "node", and add "node" argument to self_cursive_agg_pass_through_p call. (find_aggregate_values_for_callers_subset): Add "node" argument to self_recursive_pass_through_p and intersect_aggregates_with_edge calls. (cgraph_edge_brings_all_agg_vals_for_node): Add "node" argument to intersect_aggregates_with_edge call. > > From: gcc-patches-ow...@gcc.gnu.org > on behalf of Tamar Christina > Sent: Tuesday, February 11, 2020 6:05 PM > To: Feng Xue OS; Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org > Cc: nd > Subject: RE: [PATCH V2] Generalized value pass-through for self-recursive > function (ipa/pr93203) > > Hi Feng, > > This patch (commit a0f6a8cb414b687f22c9011a894d5e8e398c4be0) is causing > ICEs in the GCC and perlbench benchmark in Spec2017. > > during IPA pass: cp > lto1: internal compiler error: in find_more_scalar_values_for_callers_subset, > at ipa-cp.c:4709 > 0x1698187 find_more_scalar_values_for_callers_subset > ../.././gcc/ipa-cp.c:4709 > 0x169f7d3 decide_about_value > ../.././gcc/ipa-cp.c:5490 > 0x169fdc3 decide_whether_version_node > ../.././gcc/ipa-cp.c:5537 > 0x169fdc3 ipcp_decision_stage > ../.././gcc/ipa-cp.c:5718 > 0x169fdc3 ipcp_driver > ../.././gcc/ipa-cp.c:5901 > Please submit a full bug report, > > Thanks, > Tamar > 0001-Fix-bug-in-recursiveness-check-for-function-in-clone.patch Description: 0001-Fix-bug-in-recursiveness-check-for-function-in-clone.patch
testsuite: Fix g++.dg/analyzer/pr93212.C with check-c++-all
On Thu, Feb 06, 2020 at 03:27:29PM -0500, David Malcolm wrote: > gcc/testsuite/ChangeLog: > PR analyzer/93212 > * g++.dg/analyzer/analyzer.exp: New subdirectory and .exp suite. > * g++.dg/analyzer/malloc.C: New test. > * g++.dg/analyzer/pr93212.C: New test. The test FAILs with c++11: .../gcc/testsuite/g++.dg/analyzer/pr93212.C:4:1: error: 'lol' function uses 'auto' type specifier without trailing return type .../gcc/testsuite/g++.dg/analyzer/pr93212.C:4:1: note: deduced return type only available with '-std=c++14' or '-std=gnu++14' Fixed thusly, regtested on x86_64-linux, committed to trunk as obvious. 2020-02-13 Jakub Jelinek * g++.dg/analyzer/pr93212.C: Require c++14 rather than c++11. --- gcc/testsuite/g++.dg/analyzer/pr93212.C +++ gcc/testsuite/g++.dg/analyzer/pr93212.C @@ -1,4 +1,4 @@ -// { dg-do compile { target c++11 } } +// { dg-do compile { target c++14 } } #include auto lol() Jakub
Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
On Wed, 12 Feb 2020, Segher Boessenkool wrote: > On Wed, Feb 12, 2020 at 11:53:22AM +0100, Richard Biener wrote: > > On Wed, 12 Feb 2020, Segher Boessenkool wrote: > > > On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote: > > > > On Tue, 11 Feb 2020, Segher Boessenkool wrote: > > > > > Basic block partitioning has wildly disproportionate fallout in all > > > > > later passes, both in terms of what those *do* (or don't, if > > > > > partitioning > > > > > is enabled), and of impact on the code (not to mention developer > > > > > time). > > > > > > > > > > Maybe the implementation can be improved, but probably we should do > > > > > this > > > > > in a different way altogether. The current situation is not good. > > > > > > > > I think the expectation that you can go back to CFG layout mode > > > > and then work with CFG layout tools after we've lowered to CFG RTL > > > > is simply bogus. > > > > > > Partitioning is also quite problematic if you do not use cfglayout > > > mode. For example, in shrink-wrapping. It prevents a lot there. > > > > > > > Yeah, you can probably do analysis things but > > > > I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle > > > > can wreck things. Undoubtedly doing CFG manipulations is not going > > > > to work since CFG layout does not respect CFG RTL restrictions. > > > > > > Doing CFG manipulations on CFG RTL mode directly is almost impossible > > > to do correctly. > > > > > > For example, bb-reorder. Which is a much more important optimisation > > > than partitioning, btw. > > > > BB reorder switches back and forth as well ... :/ > > Yes. It is extremely hard to change any jumps in cfgrtl mode. > > The goal is to use cfglayout mode more, ideally *always*, not to use > it less! Sure! It must be that split requires CFG RTL (it doesn't say so, but we don't have a PROP_rtllayout or a "cannot work with" set of properties). Otherwise I'm not sure why we go out of CFG layout mode so early. Passes not messing with the CFG at all should be ignorant of what mode we are in. > > I think both are closely enough related that we probably should do > > partitioning from within the same framework? OTOH BB reorder > > happens _much_ later. > > This may be doable. What we need to do first though is to find a better > thing to use than EDGE_CROSSING. > > Maybe we can determine which blocks should be hot and cold early, but > actually make that happen only very late (maybe adjusting the decision > if we have to to make things work)? That way, intervening passes do not > have to care (much). The question is whether we can even preserve things like BB counts and edge frequencies once we've gone out of CFG layout mode for once. > > > > Partitioning simply uncovered latent bugs, there's nothing wrong > > > > with it IMHO. > > > > > > I don't agree. The whole way EDGE_CROSSING works hinders all other > > > optimisations a lot. > > > > I'm not sure if it's there for correctness reasons or just to > > help checking that nothing "undoes" the partitioning decision. > > The latter. (It may also make it easier for the former, of course, but > that can be solved anyway). And that makes live very hard for all later > passes, while it is doubtful that it even give the best decisions: for > example it prevents a lot of shrink-wrapping, which you dearly *want* to > do on cold code! Sure. So do that earlier ;) Richard.
[PATCH] sccvn: Handle bitfields in vn_reference_lookup_3 [PR93582]
Hi! The following patch is first step towards fixing PR93582. vn_reference_lookup_3 right now punts on anything that isn't byte aligned, so to be able to lookup a constant bitfield store, one needs to use the exact same COMPONENT_REF, otherwise it isn't found. This patch lifts up that that restriction if the bits to be loaded are covered by a single store of a constant (keeps the restriction so far for the multiple store case, can tweak that incrementally, but I think for bisection etc. it is worth to do it one step at a time). Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux and powerpc64-linux (the last one regtested with \{-m32,-m64\}). Additionally, I've bootstrapped/regtested it on those with statistics gathering on when val was non-NULL (i.e. we managed to see something through) when any of offseti, offset2i, maxsizei or size2i wasn't multiple of BITS_PER_UNIT (i.e. when this optimization triggered). Below is a list of the unique cases where it triggered, the first column says on which of the 5 ABIs it triggered, qmath stands for those that enable libquadmath, i.e. ia32,x86_64,ppc64le. Ok for trunk? all ../../gcc/gimple-ssa-backprop.c {anonymous}::backprop::process_var all gcc/gcc/testsuite/gcc.c-torture/compile/20191015-1.c f all gcc/gcc/testsuite/gcc.c-torture/compile/20191015-2.c f all gcc/gcc/testsuite/gcc.c-torture/compile/20200105-1.c g all gcc/gcc/testsuite/gcc.c-torture/compile/20200105-2.c g all gcc/gcc/testsuite/gcc.c-torture/compile/20200105-3.c g all gcc/gcc/testsuite/gcc.c-torture/execute/20181120-1.c main all gcc/gcc/testsuite/gcc.c-torture/execute/921204-1.c main all gcc/gcc/testsuite/gcc.c-torture/execute/pr58726.c main all gcc/gcc/testsuite/gcc.c-torture/execute/pr86492.c foo all gcc/gcc/testsuite/gcc.dg/store_merging_24.c foo all gcc/gcc/testsuite/gcc.dg/store_merging_25.c foo all gcc/gcc/testsuite/gcc.dg/tree-ssa/pr93582-1.c foo all gcc/gcc/testsuite/gcc.dg/tree-ssa/pr93582-2.c foo all gcc/gcc/testsuite/gcc.dg/tree-ssa/pr93582-3.c foo all gcc/gcc/testsuite/g++.dg/other/pr89692.C foo all ../../../libgcc/unwind-dw2-fde-dip.c init_object all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info_bases all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info_table all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info_table_bases all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_table all ../../../libgcc/unwind-dw2-fde-dip.c search_object all ../../../libgcc/unwind-dw2-fde-dip.c _Unwind_IteratePhdrCallback ia32 /tmp/921204-1.exe.NdZMiZ.ltrans0.o main ia32 /tmp/ccK7HiiN.o main ia32 /tmp/ccWtcuID.o foo ia32 /tmp/pr86492.exe.IKs2SA.ltrans0.o foo lp64 gcc/gcc/testsuite/gnat.dg/pack22.adb pack22 ppc32 /tmp/921204-1.exe.1GoDpE.ltrans0.o main ppc32 /tmp/ccivx4sg.o foo ppc32 /tmp/ccLFNqjQ.o main ppc32 /tmp/pr86492.exe.VkJDwY.ltrans0.o foo ppc32 /tmp/pr88739.exe.y33n0X.ltrans0.o main ppc64le cd2a32a.adb cd2a32a ppc64le /tmp/921204-1.exe.la923O.ltrans0.o main ppc64le /tmp/ccH3NcwE.o main ppc64le /tmp/ccmHysWx.o foo ppc64le /tmp/pr86492.exe.EYd24l.ltrans0.o foo ppc64le /tmp/pr88739.exe.uAT6pA.ltrans0.o main ppc64 /tmp/921204-1.exe.vtoTSo.ltrans0.o main ppc64 /tmp/ccm4RGtK.o main ppc64 /tmp/cczZpRCD.o foo ppc64 /tmp/pr86492.exe.UdbN8I.ltrans0.o foo ppc64 /tmp/pr88739.exe.DtQe1H.ltrans0.o main qmath ../../../libquadmath/math/expq.c expq qmath ../../../libquadmath/math/fmaq.c fmaq qmath ../../../libquadmath/math/nanq.c nanq qmath ../../../libquadmath/strtod/strtoflt128.c strtoflt128_internal x86_64 cd2a32a.adb cd2a32a x86_64 /tmp/921204-1.exe.0059fB.ltrans0.o main x86_64 /tmp/cci9lHhD.o main x86_64 /tmp/ccTlWSbV.o foo x86_64 /tmp/pr86492.exe.NE0yez.ltrans0.o foo x86_64 /tmp/pr88739.exe.PKCG2M.ltrans0.o main x86 gcc/gcc/testsuite/gcc.dg/torture/pr45017.c main x86 gcc/gcc/testsuite/gcc.target/i386/pr37870.c main 2020-02-13 Jakub Jelinek PR tree-optimization/93582 * fold-const.h (shift_bytes_in_array_left, shift_bytes_in_array_right): Declare. * fold-const.c (shift_bytes_in_array_left, shift_bytes_in_array_right): New function, moved from gimple-ssa-store-merging.c, no longer static. * gimple-ssa-store-merging.c (shift_bytes_in_array): Move to gimple-ssa-store-merging.c and rename to shift_bytes_in_array_left. (shift_bytes_in_array_right): Move to gimple-ssa-store-merging.c. (encode_tree_to_bitpos): Use shift_bytes_in_array_left instead of shift_bytes_in_array. (verify_shift_bytes_in_array): Rename to ... (verify_shift_bytes_in_array_left): ... this. Use shift_bytes_in_array_left instead of shift_bytes_in_array. (store_merging_c_tests): Call verify_shift_bytes_in_array_left instead of verify_shift_bytes_in_array. * tree-ssa-sccvn.c (vn_reference_lookup_3): For native_encode_expr / native_interpret_expr where the store covers all needed b