[PATCH v2] Performance/size improvement to single_use when matching GIMPLE.
Here's version 2 of my patch, incorporating Richard Biener's suggestions.
I've also done an analysis of the stage3 sizes of gimple-match.o on
x86_64-pc-linux-gnu, which I believe is dominated by debug information:
the .o file is 30MB in stage3, but only 4.8MB in stage2.  Before my
proposed patch, gimple-match.o is 31385160 bytes.  The patch as proposed
yesterday (using a single loop in single_use) reduces that to 31105040
bytes, saving 280120 bytes.  The suggestion to remove the "inline"
keyword saves only 56 more bytes, but annotating ATTRIBUTE_PURE on a
function prototype was curiously effective, saving 1888 bytes.

before:  31385160
after:   31105040  saved 280120
-inline: 31104984  saved 56
+pure:   31103096  saved 1888

I'll post two more patches later today that save an additional 53K and
58K respectively (by tweaking genmatch), but the issue is with the size
of debugging info (or something else?).

This revised patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check with no new failures.
Ok for mainline?

2022-03-16  Roger Sayle
	    Richard Biener

gcc/ChangeLog
	* gimple-match-head.cc (single_use): Implement inline using a
	single loop.

Thanks in advance,
Roger
--

> -----Original Message-----
> From: Richard Biener
> Sent: 15 March 2022 09:18
> To: Roger Sayle
> Cc: 'GCC Patches'; 'Marc Glisse'
> Subject: Re: [PATCH] Performance/size improvement to single_use when
> matching GIMPLE.
>
> On Tue, 15 Mar 2022, Roger Sayle wrote:
>
> > This patch improves the implementation of single_use as used in code
> > generated from match.pd for patterns using :s.  The current
> > implementation contains the logic "has_zero_uses (t) || has_single_use (t)"
> > which performs a loop over the uses to first check if there are zero
> > non-debug uses [which is rare], then another loop over these uses to
> > check if there is exactly one non-debug use.  This can be better
> > implemented using a single loop.
> >
> > This function is currently inlined over 800 times in gimple-match.cc,
> > whose .o on x86_64-pc-linux-gnu is now up to 30 Mbytes, so speeding up
> > and shrinking this function should help offset the growth in match.pd
> > for GCC 12.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  Ok for mainline?
>
> Note the intent of has_zero_uses () is even simpler - it's the case for
> when there's no SSA operand info on the stmt (no update_stmt called yet).
> More precisely it wants to catch the case where the definition of the SSA
> name is not in the IL.
>
> I'm not sure if we want to twist the effective semantics at this point
> (I guess we do not want that), so the patch looks like an improvement.
> But may I ask to move the function out of line for even more savings?
> Just put it in gimple-match-head.cc and have it not declared inline.
> I think we may want to go as far and declare the function 'pure' using
> ATTRIBUTE_PURE.
>
> > 2022-03-15  Roger Sayle
> >
> > gcc/ChangeLog
> >	* gimple-match-head.cc (single_use): Implement inline using a
> >	single loop.
> >
> > Thanks in advance,
> > Roger
> > --
>
> --
> Richard Biener
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 74d5818..1c74d38 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -1160,10 +1160,28 @@ types_match (tree t1, tree t2)
    non-SSA_NAME (i.e. constants) and zero uses to cope with uses
    that aren't linked up yet.  */

-static inline bool
-single_use (tree t)
+static bool
+single_use (const_tree) ATTRIBUTE_PURE;
+
+static bool
+single_use (const_tree t)
 {
-  return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t);
+  if (TREE_CODE (t) != SSA_NAME)
+    return true;
+
+  /* Inline return has_zero_uses (t) || has_single_use (t);  */
+  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
+  const ssa_use_operand_t *ptr;
+  bool single = false;
+
+  for (ptr = head->next; ptr != head; ptr = ptr->next)
+    if (USE_STMT (ptr) && !is_gimple_debug (USE_STMT (ptr)))
+      {
+	if (single)
+	  return false;
+	single = true;
+      }
+  return true;
 }

 /* Return true if math operations should be canonicalized,
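Outside of GCC, the single-loop idea can be illustrated with a minimal
standalone sketch (the list type and field names below are made up for
illustration; they are not GCC's ssa_use_operand_t):

#include <stdbool.h>

struct use { struct use *next; bool is_debug; };

/* True if the circular use list rooted at the sentinel HEAD has zero
   or one non-debug entries, giving up as soon as a second is seen.  */
static bool
zero_or_single_use (const struct use *head)
{
  bool seen = false;
  for (const struct use *p = head->next; p != head; p = p->next)
    if (!p->is_debug)
      {
        if (seen)
          return false;  /* A second non-debug use: fail early.  */
        seen = true;
      }
  return true;
}

This fuses the separate has_zero_uses/has_single_use scans into one
early-exiting pass, which is the essence of the patch above.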
Re: RFA: crc builtin functions & optimizations
On Tue, Mar 15, 2022 at 4:15 PM Joern Rennecke wrote:
>
> On 15/03/2022, Richard Biener wrote:
>
> > Why's this a new pass?  Every walk over all insns costs time.
>
> It should typically scan considerably less than all the insns.
>
> > The pass lacks any comments as to what CFG / stmt structure is matched.
>
> I've put a file in:
> config/riscv/tree-crc-doc.txt
>
> would this text be suitable to put in a comment block in tree-crc.cc ?

Yes, that would be a better place I think.

> > From a quick look it seems like it first(?) statically matches a stmt
> > sequence without considering intermediate stmts, so matching should be
> > quite fragile.
>
> It might be fragile inasmuch as it won't match when things change, but
> the matching has remained effective for seven years and across two
> architecture families with varying word sizes.
> And with regards to matching only what it's supposed to match, I believe
> I have checked all the data dependencies and phis so that it's definitely
> calculating a CRC.
>
> > Why not match (sub-)expressions with the help of match.pd?
>
> Can you match a loop with match.pd ?

No, this is why I said (sub-)expression.  I'm mainly talking about

  tmp = (tmp >> 1) | (tmp << (sizeof (tmp) * (8 /*CHAR_BIT*/) - 1));
  if ((long)tmp < 0)

for example - one key bit of the CRC seems to be a comparison, so you'd
match that with sth like

  (match (crc_compare @0)
   (lt (bit_ior (rshift @0 integer_onep)
                (lshift @0 INTEGER_CST@1))
       integer_zerop)
   (if (compare_tree_int (@1, TYPE_SIZE_UNIT (TREE_TYPE (@0)) * 8 - 1

where you can add alternative expression forms.  You can then use the
generated match function from the pass.  See for example the
tree-ssa-forwprop.cc use of gimple_ctz_table_index.

> > Any reason why you match CRC before early inlining and thus even when
> > not optimizing?  Matching at least after early FRE/DCE/DSE would help
> > to get rid of abstraction and/or memory temporary uses.
>
> I haven't originally placed it there, but I believe benefits include:
> - Getting rid of the loop without having to actively delete it in the
>   crc pass (this also might be safer, as we just have to make sure we
>   are computing the CRC, and DCE will determine if there is any
>   ancillary result that is left, and only delete the loop if it's
>   really dead).
> - The optimized function is available for inlining.

The canonical place to transform loops into builtins would be loop
distribution.  Another place would be final value replacement since you
basically replace the reduction result with a call to the builtin, but I
think loop-distribution is the better overall place.  See how we match
strlen() there.

Richard.
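For readers unfamiliar with the shape being discussed, here is a minimal
C sketch of the kind of bitwise CRC loop such a pass targets (the
polynomial and operand widths are illustrative, not taken from the
patch):

/* Reflected CRC-32 update for one byte; the per-bit
   conditional-xor-then-shift is the idiom a CRC matcher looks for.  */
unsigned int
crc32_byte (unsigned int crc, unsigned char data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    {
      if (crc & 1)
        crc = (crc >> 1) ^ 0xedb88320u;
      else
        crc >>= 1;
    }
  return crc;
}

The rotate-and-sign-test form quoted above is essentially another
spelling of the same per-bit step (the bit shifted out ends up in the
sign position), which is why the matcher wants alternative expression
forms.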
[PATCH] [i386] Don't fold __builtin_ia32_blendvpd w/o sse4.2.
__builtin_ia32_blendvpd is defined under sse4.1 and gimple folded to
((v2di) c) < 0 ? b : a, where vec_cmpv2di is under sse4.2, without which
it's vec-lowered to scalar operations and not combined back in RTL.

Bootstrap and regtest on x86_64-pc-linux-gnu{-m32,}.
Ready to push to main trunk.

gcc/ChangeLog:

	PR target/104946
	* config/i386/i386-builtin.def (BDESC): Add
	CODE_FOR_sse4_1_blendvpd for IX86_BUILTIN_BLENDVPD.
	* config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
	__builtin_ia32_blendvpd w/o sse4.2.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/sse4_1-blendvpd-1.c: New test.
---
 gcc/config/i386/i386-builtin.def                  |  2 +-
 gcc/config/i386/i386.cc                           |  8 +++-
 gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c | 11 +++++++
 3 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index ea327555639..b410102614d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -906,7 +906,7 @@ BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, CODE_FOR_ssse3_palignrdi,
 /* SSE4.1 */
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendpd, "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendps, "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
-BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d77ad83e437..5a561966eb4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18368,10 +18368,16 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
	}
       break;

+    case IX86_BUILTIN_BLENDVPD:
+      /* blendvpd is under sse4.1 but pcmpgtq is under sse4.2,
+	 w/o sse4.2, it's veclowered to scalar operations and
+	 not combined back.  */
+      if (!TARGET_SSE4_2)
+	break;
+      /* FALLTHRU.  */
     case IX86_BUILTIN_PBLENDVB128:
     case IX86_BUILTIN_PBLENDVB256:
     case IX86_BUILTIN_BLENDVPS:
-    case IX86_BUILTIN_BLENDVPD:
     case IX86_BUILTIN_BLENDVPS256:
     case IX86_BUILTIN_BLENDVPD256:
       gcc_assert (n_args == 3);
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c b/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c
new file mode 100644
index 000..c25d3fbcbd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-msse4.1 -O2 -mno-sse4.2" } */
+/* { dg-final { scan-assembler-times {(?n)blendvpd[ \t]+%xmm[0-9]+} 1 } } */
+
+#include <immintrin.h>
+
+__m128d
+foo (__m128d a, __m128d b, __m128d c)
+{
+  return _mm_blendv_pd (a, b, c);
+}
--
2.18.1
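As a scalar illustration of what the gimple fold means (a hedged sketch,
not code from the patch): blendvpd selects each lane of b or a by the
sign bit of the corresponding lane of c, which is exactly the
((v2di) c) < 0 ? b : a form:

#include <stdint.h>
#include <string.h>

/* One lane of blendvpd: the sign bit of C picks B, otherwise A.  */
static double
blendv_lane (double a, double b, double c)
{
  int64_t cbits;
  memcpy (&cbits, &c, sizeof cbits);
  return cbits < 0 ? b : a;
}

Without sse4.2 there is no V2DI compare (pcmpgtq) to implement the
folded form as a vector, which is why the fold is now suppressed there.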
Re: [PATCH v2] Performance/size improvement to single_use when matching GIMPLE.
On Wed, 16 Mar 2022, Roger Sayle wrote:

> Here's version 2 of my patch, incorporating Richard Biener's suggestions.
> I've also done an analysis of the stage3 sizes of gimple-match.o on
> x86_64-pc-linux-gnu, which I believe is dominated by debug information:
> the .o file is 30MB in stage3, but only 4.8MB in stage2.  Before my
> proposed patch, gimple-match.o is 31385160 bytes.  The patch as proposed
> yesterday (using a single loop in single_use) reduces that to 31105040
> bytes, saving 280120 bytes.  The suggestion to remove the "inline"
> keyword saves only 56 more bytes, but annotating ATTRIBUTE_PURE on a
> function prototype was curiously effective, saving 1888 bytes.
>
> before:  31385160
> after:   31105040  saved 280120
> -inline: 31104984  saved 56
> +pure:   31103096  saved 1888
>
> I'll post two more patches later today that save an additional 53K
> and 58K respectively (by tweaking genmatch), but the issue is with the
> size of debugging info (or something else?).

You can always use the 'size' tool and report the size of the text
section only ... especially interesting for folks are also compile-times
of stage2 gimple-match.cc since that's built by the -O0 compiled
checking-enabled stage1 compiler during bootstrap.

> This revised patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check with no new failures.
> Ok for mainline?

OK.

Thanks,
Richard.

> 2022-03-16  Roger Sayle
>	    Richard Biener
>
> gcc/ChangeLog
>	* gimple-match-head.cc (single_use): Implement inline using a
>	single loop.
>
> [...]

--
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
Re: [PATCH] tree-optimization/104942: Retain sizetype conversions till the end
On Wed, Mar 16, 2022 at 01:56:50PM +0530, Siddhesh Poyarekar wrote:
> Retain the sizetype alloc_object_size to guarantee the assertion in
> size_for_offset and to avoid adding a conversion there.  nop conversions
> are eliminated at the end anyway in dynamic object size computation.
>
> gcc/ChangeLog:
>
>	tree-optimization/104942
>	* tree-object-size.cc (alloc_object_size): Remove STRIP_NOPS.
>
> gcc/testsuite/ChangeLog:
>
>	tree-optimization/104942
>	* gcc.dg/builtin-dynamic-object-size-0.c (alloc_func_long,
>	test_builtin_malloc_long): New functions.
>	(main): Use it.
>
> Signed-off-by: Siddhesh Poyarekar

Ok, thanks.

	Jakub
[PATCH] tree-optimization/104942: Retain sizetype conversions till the end
Retain the sizetype alloc_object_size to guarantee the assertion in
size_for_offset and to avoid adding a conversion there.  nop conversions
are eliminated at the end anyway in dynamic object size computation.

gcc/ChangeLog:

	tree-optimization/104942
	* tree-object-size.cc (alloc_object_size): Remove STRIP_NOPS.

gcc/testsuite/ChangeLog:

	tree-optimization/104942
	* gcc.dg/builtin-dynamic-object-size-0.c (alloc_func_long,
	test_builtin_malloc_long): New functions.
	(main): Use it.

Signed-off-by: Siddhesh Poyarekar
---
Testing:
- i686 build and check
- x86_64 bootstrap build and check
- --with-build-config=bootstrap-ubsan

 .../gcc.dg/builtin-dynamic-object-size-0.c | 22 +++++++++++++++++++---
 gcc/tree-object-size.cc                    |  5 +----
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index dd8dc99a580..2fca0a9c5b4 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -4,6 +4,15 @@
 typedef __SIZE_TYPE__ size_t;
 #define abort __builtin_abort

+void *
+__attribute__ ((alloc_size (1)))
+__attribute__ ((__nothrow__ , __leaf__))
+__attribute__ ((noinline))
+alloc_func_long (long sz)
+{
+  return __builtin_malloc (sz);
+}
+
 void *
 __attribute__ ((alloc_size (1)))
 __attribute__ ((__nothrow__ , __leaf__))
@@ -145,6 +154,16 @@ test_builtin_malloc_condphi5 (size_t sz, int cond, char *c)
   return ret;
 }

+long
+__attribute__ ((noinline))
+test_builtin_malloc_long (long sz, long off)
+{
+  char *a = alloc_func_long (sz);
+  char *dest = a + off;
+  long ret = __builtin_dynamic_object_size (dest, 0);
+  return ret;
+}
+
 /* Calloc-like allocator.  */

 size_t
@@ -419,6 +438,9 @@ main (int argc, char **argv)
     FAIL ();
   if (test_builtin_malloc_condphi5 (128, 0, argv[0]) != -1)
     FAIL ();
+  long x = 42;
+  if (test_builtin_malloc_long (x, 0) != x)
+    FAIL ();
   if (test_calloc (2048, 4) != 2048 * 4)
     FAIL ();
   if (test_builtin_calloc (2048, 8) != 2048 * 8)
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 8be0df6ba40..9728f79da75 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -784,10 +784,7 @@ alloc_object_size (const gcall *call, int object_size_type)
   else if (arg1 >= 0)
     bytes = fold_convert (sizetype, gimple_call_arg (call, arg1));

-  if (bytes)
-    return STRIP_NOPS (bytes);
-
-  return size_unknown (object_size_type);
+  return bytes ? bytes : size_unknown (object_size_type);
 }
--
2.35.1
Re: [PATCH] [i386] Don't fold __builtin_ia32_blendvpd w/o sse4.2.
On Wed, Mar 16, 2022 at 9:18 AM liuhongt via Gcc-patches wrote:
>
> __builtin_ia32_blendvpd is defined under sse4.1 and gimple folded
> to ((v2di) c) < 0 ? b : a, where vec_cmpv2di is under sse4.2, without
> which it's vec-lowered to scalar operations and not combined back in RTL.
>
> Bootstrap and regtest on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to main trunk.

OK.

Richard.

> gcc/ChangeLog:
>
>	PR target/104946
>	* config/i386/i386-builtin.def (BDESC): Add
>	CODE_FOR_sse4_1_blendvpd for IX86_BUILTIN_BLENDVPD.
>	* config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
>	__builtin_ia32_blendvpd w/o sse4.2.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.target/i386/sse4_1-blendvpd-1.c: New test.
>
> [...]
Re: [PATCH] aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]
Jakub Jelinek writes:
> Hi!
>
> We unshare all RTL created during expansion, but when
> aarch64_load_symref_appropriately is called after expansion like in the
> following testcases, we use imm in both HIGH and LO_SUM operands.
> If imm is some RTL that shouldn't be shared like a non-sharable CONST,
> we get at least with --enable-checking=rtl a checking ICE, otherwise might
> just get silently wrong code.
>
> The following patch fixes that by copying it if it can't be shared.
>
> Bootstrapped/regtested on aarch64-linux, ok for trunk?
>
> 2022-03-15  Jakub Jelinek
>
>	PR target/104910
>	* config/aarch64/aarch64.cc (aarch64_load_symref_appropriately): Copy
>	imm rtx.
>
>	* gcc.dg/pr104910.c: New test.

OK, thanks.  I guess the REG_EQUAL notes might need the same treatment,
but that's probably a separate patch.

Richard

> --- gcc/config/aarch64/aarch64.cc.jj	2022-02-22 10:38:02.404689359 +0100
> +++ gcc/config/aarch64/aarch64.cc	2022-03-14 12:42:00.218975192 +0100
> @@ -3971,7 +3971,7 @@ aarch64_load_symref_appropriately (rtx d
>	if (can_create_pseudo_p ())
>	  tmp_reg = gen_reg_rtx (mode);
>
> -	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
> +	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
>	emit_insn (gen_add_losym (dest, tmp_reg, imm));
>	return;
>       }
> --- gcc/testsuite/gcc.dg/pr104910.c.jj	2022-03-14 12:46:20.983327114 +0100
> +++ gcc/testsuite/gcc.dg/pr104910.c	2022-03-14 12:46:06.064535794 +0100
> @@ -0,0 +1,14 @@
> +/* PR target/104910 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fno-forward-propagate" } */
> +/* { dg-additional-options "-fstack-protector-all" { target fstack_protector } } */
> +
> +void
> +bar (void);
> +
> +void
> +foo (int x)
> +{
> +  if (x)
> +    bar ();
> +}
>
>	Jakub
OpenACC privatization diagnostics vs. 'assert' [PR102841]
Hi!

On 2021-05-21T21:29:19+0200, I wrote:
> I've pushed "[OpenACC privatization] Largely extend diagnostics and
> corresponding testsuite coverage [PR90115]" to master branch in commit
> 11b8286a83289f5b54e813f14ff56d730c3f3185

Pushed to master branch commit ab46fc7c3bf01337ea4554f08f4f6b0be8173557
"OpenACC privatization diagnostics vs. 'assert' [PR102841]", see attached.

Grüße
 Thomas

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
> @@ -1,6 +1,11 @@
> -/* { dg-do run } */
> -
>  /* Test if, if_present clauses on host_data construct.  */
> +
> +/* { dg-additional-options "-fopt-info-all-omp" }
> +   { dg-additional-options "--param=openacc-privatization=noisy" }
> +   { dg-additional-options "-foffload=-fopt-info-all-omp" }
> +   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
> +   for testing/documenting aspects of that functionality.  */
> +
>  /* C/C++ variant of 'libgomp.oacc-fortran/host_data-5.F90' */
>
>  #include
> @@ -14,15 +19,19 @@ foo (float *p, intptr_t host_p, int cond)
>  #pragma acc data copyin(host_p)
>    {
>  #pragma acc host_data use_device(p) if_present
> +    /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 } */
>      /* p not mapped yet, so it will be equal to the host pointer.  */
>      assert (p == (float *) host_p);
>
>  #pragma acc data copy(p[0:100])
> +    /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 } */
> +    /* { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-2 } */
>      {
>        /* Not inside a host_data construct, so p is still the host pointer.  */
>        assert (p == (float *) host_p);
>
>  #pragma acc host_data use_device(p)
> +      /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 } */
>        {
>  #if ACC_MEM_SHARED
>        assert (p == (float *) host_p);
> @@ -33,6 +42,7 @@ foo (float *p, intptr_t host_p, int cond)
>        }
>
>  #pragma acc host_data use_device(p) if_present
> +      /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 } */
>        {
>  #if ACC_MEM_SHARED
>        assert (p == (float *) host_p);
> @@ -43,6 +53,8 @@ foo (float *p, intptr_t host_p, int cond)
>        }
>
>  #pragma acc host_data use_device(p) if(cond)
> +      /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 } */
> +      /* { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target { ! openacc_host_selected } } .-2 } */
>        {
>  #if ACC_MEM_SHARED
>        assert (p == (float *) host_p);

From ab46fc7c3bf01337ea4554f08f4f6b0be8173557 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Wed, 16 Mar 2022 08:02:39 +0100
Subject: [PATCH] OpenACC privatization diagnostics vs. 'assert' [PR102841]

It's an orthogonal concern why these diagnostics do appear at all for
non-offloaded OpenACC constructs (where they're not relevant at all);
PR90115.

Depending on how 'assert' is implemented, it may cause temporaries to be
created, and/or may lower into 'COND_EXPR's, and
'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'.

Fix-up for commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding
testsuite coverage [PR90115]".

	PR testsuite/102841
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
index 66501e614fb..50b4fc264d0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
@@ -4,7 +4,9 @@
    { dg-additional-options "--param=openacc-privatization=noisy" }
    { dg-additional-options "-foffload=-fopt-info-all-omp" }
    { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
-   for testing/document
Re: [PATCH] aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]
On Wed, Mar 16, 2022 at 09:10:35AM +0000, Richard Sandiford wrote:
> Jakub Jelinek writes:
> > We unshare all RTL created during expansion, but when
> > aarch64_load_symref_appropriately is called after expansion like in the
> > following testcases, we use imm in both HIGH and LO_SUM operands.
> > If imm is some RTL that shouldn't be shared like a non-sharable CONST,
> > we get at least with --enable-checking=rtl a checking ICE, otherwise might
> > just get silently wrong code.
> >
> > The following patch fixes that by copying it if it can't be shared.
> >
> > Bootstrapped/regtested on aarch64-linux, ok for trunk?
> >
> > 2022-03-15  Jakub Jelinek
> >
> >	PR target/104910
> >	* config/aarch64/aarch64.cc (aarch64_load_symref_appropriately): Copy
> >	imm rtx.
> >
> >	* gcc.dg/pr104910.c: New test.
>
> OK, thanks.  I guess the REG_EQUAL notes might need the same treatment,
> but that's probably a separate patch.

If it can be invoked after expansion, yes.  Though, I think e.g. having
the TLS related loads/stores done after expansion is unlikely.
Anyway, --enable-checking=rtl ought to catch this.

	Jakub
Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.
Richard Biener via Gcc-patches writes:
> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle wrote:
>> I've been wondering about the possible performance/missed-optimization
>> impact of my patch for PR middle-end/98420 and similar IEEE correctness
>> fixes that disable constant folding optimizations when worrying about -0.0.
>> In the common situation where the floating point result is used by a
>> FP comparison, there's no distinction between +0.0 and -0.0, so some
>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>>
>> Consider the following interesting example:
>>
>> int foo(int x, double y) {
>>   return (x * 0.0) < y;
>> }
>>
>> Although we know that x (when converted to double) can't be NaN or Inf,
>> we still worry that for negative values of x that (x * 0.0) may be -0.0
>> and so perform the multiplication at run-time.  But in this case, the
>> result of the comparison (-0.0 < y) will be exactly the same as (+0.0 < y)
>> for any y, hence the above may be safely constant folded to "0.0 < y"
>> avoiding the multiplication at run-time.
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check with no new failures, and allows GCC to continue to
>> optimize cases that we optimized in GCC 11 (without regard to correctness).
>> Ok for mainline?
>
> Isn't that something that gimple-ssa-backprop.c is designed to handle?
> I wonder if you can see whether the signed zero speciality can be
> retrofitted there?  It currently tracks "sign does not matter", so
> possibly another state, "sign of zero does not matter", could be
> introduced there.

I agree that would be a nice thing to have FWIW.  gimple-ssa-backprop.c
was added to avoid regressions in one specific fold-const -> match.pd
move, but it was supposed to be suitable for other similar things in
future.

Thanks,
Richard
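The underlying IEEE fact is easy to check standalone (a small
self-contained demonstration, not part of the patch): comparisons cannot
distinguish the two zeros, so the sign of a zero operand never changes a
relational result.

#include <assert.h>

int
main (void)
{
  double y = 1.0;
  /* IEEE 754 ordered comparisons treat -0.0 and +0.0 identically.  */
  assert ((-0.0 < y) == (0.0 < y));
  assert ((-0.0 >= y) == (0.0 >= y));
  assert (-0.0 == 0.0);
  return 0;
}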
Re: [PATCH] [i386] Add extra cost for unsigned_load which may have stall forward issue.
On Wed, Mar 16, 2022 at 3:19 AM liuhongt wrote:
>
> This patch only handles pure-slp for by-value passed parameters, which
> has nothing to do with IPA but the psABI.  For by-reference passed
> parameters IPA is required.
>
> The patch is aggressive in determining STLF failure: any
> unaligned_load of a parm_decl passed on the stack is assumed to have an
> STLF stall issue.  It could lose some perf where there's no such issue
> (1 vector_load vs. n scalar_load + CTOR).
>
> According to the microbenchmark in the PR, the cost of an STLF failure
> is generally between 8 scalar_loads and 16 scalar_loads on most latest
> Intel/AMD processors.
>
> gcc/ChangeLog:
>
>	PR target/101908
>	* config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
>	(ix86_vector_costs::add_stmt_cost): Add extra cost for
>	unsigned_load which may have store forwarding stall issue.
>	* config/i386/i386.h (processor_costs): Add new member
>	stfs.
>	* config/i386/x86-tune-costs.h (i386_size_cost): Initialize
>	stfs.
>	(i386_cost, i486_cost, pentium_cost, lakemont_cost,
>	pentiumpro_cost, geode_cost, k6_cost, athlon_cost, k8_cost,
>	amdfam10_cost, bdver_cost, znver1_cost, znver2_cost,
>	znver3_cost, skylake_cost, icelake_cost, alderlake_cost,
>	btver1_cost, btver2_cost, pentium4_cost, nocano_cost,
>	atom_cost, slm_cost, tremont_cost, intel_cost, generic_cost,
>	core_cost): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.target/i386/pr101908-1.c: New test.
>	* gcc.target/i386/pr101908-2.c: New test.
>	* gcc.target/i386/pr101908-3.c: New test.
>	* gcc.target/i386/pr101908-v16hi.c: New test.
>	* gcc.target/i386/pr101908-v16qi.c: New test.
>	* gcc.target/i386/pr101908-v16sf.c: New test.
>	* gcc.target/i386/pr101908-v16si.c: New test.
>	* gcc.target/i386/pr101908-v2df.c: New test.
>	* gcc.target/i386/pr101908-v2di.c: New test.
>	* gcc.target/i386/pr101908-v2hi.c: New test.
>	* gcc.target/i386/pr101908-v2qi.c: New test.
>	* gcc.target/i386/pr101908-v2sf.c: New test.
>	* gcc.target/i386/pr101908-v2si.c: New test.
>	* gcc.target/i386/pr101908-v4df.c: New test.
>	* gcc.target/i386/pr101908-v4di.c: New test.
>	* gcc.target/i386/pr101908-v4hi.c: New test.
>	* gcc.target/i386/pr101908-v4qi.c: New test.
>	* gcc.target/i386/pr101908-v4sf.c: New test.
>	* gcc.target/i386/pr101908-v4si.c: New test.
>	* gcc.target/i386/pr101908-v8df-adl.c: New test.
>	* gcc.target/i386/pr101908-v8df.c: New test.
>	* gcc.target/i386/pr101908-v8di-adl.c: New test.
>	* gcc.target/i386/pr101908-v8di.c: New test.
>	* gcc.target/i386/pr101908-v8hi-adl.c: New test.
>	* gcc.target/i386/pr101908-v8hi.c: New test.
>	* gcc.target/i386/pr101908-v8qi-adl.c: New test.
>	* gcc.target/i386/pr101908-v8qi.c: New test.
>	* gcc.target/i386/pr101908-v8sf-adl.c: New test.
>	* gcc.target/i386/pr101908-v8sf.c: New test.
>	* gcc.target/i386/pr101908-v8si-adl.c: New test.
>	* gcc.target/i386/pr101908-v8si.c: New test.
> ---
>  gcc/config/i386/i386.cc                       | 51 +++++++++++
>  gcc/config/i386/i386.h                        |  1 +
>  gcc/config/i386/x86-tune-costs.h              | 28 ++++++
>  gcc/testsuite/gcc.target/i386/pr101908-1.c    | 12 +++
>  gcc/testsuite/gcc.target/i386/pr101908-2.c    | 12 +++
>  gcc/testsuite/gcc.target/i386/pr101908-3.c    | 90 +++++++++++++++++++
>  .../gcc.target/i386/pr101908-v16hi.c          |  6 ++
>  .../gcc.target/i386/pr101908-v16qi.c          | 30 +++++++
>  .../gcc.target/i386/pr101908-v16sf.c          |  6 ++
>  .../gcc.target/i386/pr101908-v16si.c          |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2df.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2di.c |  7 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2hi.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2qi.c | 16 ++++
>  gcc/testsuite/gcc.target/i386/pr101908-v2sf.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2si.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4df.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4di.c |  7 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4hi.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4qi.c | 18 ++++
>  gcc/testsuite/gcc.target/i386/pr101908-v4sf.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4si.c |  6 ++
>  .../gcc.target/i386/pr101908-v8df-adl.c       |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v8df.c |  6 ++
>  .../gcc.target/i386/pr101908-v8di-adl.c       |  7 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v8di.c |  7 ++
>  .../gcc.target/i386/pr101908-v8hi-adl.c       |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v8hi.c |  6 ++
>  .../gcc.target/i386/pr101908-v8qi-adl.c       | 22 +++++
>  gcc/testsuite/gcc.target/i386/pr101908-v8qi.c | 22 +++++
>  .../gcc.target/i386/pr101908-
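A hedged sketch of the pattern being penalized (a hypothetical example
in the spirit of the PR, not one of the testcases listed above): a
16-byte struct passed by value arrives in two 8-byte integer registers,
gets spilled to the stack, and a full-width vector load of that slot
cannot forward from the narrower stores.

#include <immintrin.h>

struct v4 { int a[4]; };

/* Under the x86-64 SysV ABI `v` is spilled as two 8-byte scalar
   stores; the 16-byte load below spans both, so store-to-load
   forwarding fails and the load stalls until the stores retire.  */
__m128i
load_by_value (struct v4 v)
{
  return _mm_loadu_si128 ((const __m128i *) v.a);
}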
[PATCH] c-family: Fix ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128 [PR101515]
On Tue, Mar 15, 2022 at 01:32:46PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Another one is whether we shouldn't punt for FIELD_DECLs that don't have
> a nicely printable name for their containing scope, something like:
>   if (tree scope = get_containing_scope (field))
>     if (TYPE_P (scope) && TYPE_NAME (scope) == NULL_TREE)
>       break;
>   return cop;
> or so.

This seems to work fine, ok if it passes bootstrap/regtest?
No need to do it for C because it doesn't print the contexts (has no
syntax for that even).

2022-03-16  Jakub Jelinek

	PR c++/101515
	* c-pretty-print.cc (c_fold_indirect_ref_for_warn): For C++ don't
	return COMPONENT_REFs with FIELD_DECLs whose containing scope can't
	be printed.

	* g++.dg/warn/pr101515.C: New test.

--- gcc/c-family/c-pretty-print.cc.jj	2022-02-14 13:14:39.037128630 +0100
+++ gcc/c-family/c-pretty-print.cc	2022-03-16 11:16:54.957658064 +0100
@@ -1889,6 +1889,12 @@ c_fold_indirect_ref_for_warn (location_t
	      = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (field)));
	    if (upos <= off && off < upos + el_sz)
	      {
+		/* The C++ pretty printers print scope of the FIELD_DECLs,
+		   so punt if it is something that can't be printed.  */
+		if (c_dialect_cxx ())
+		  if (tree scope = get_containing_scope (field))
+		    if (TYPE_P (scope) && TYPE_NAME (scope) == NULL_TREE)
+		      break;
		tree cop = build3_loc (loc, COMPONENT_REF, TREE_TYPE (field),
				       op, field, NULL_TREE);
		off = off - upos;
--- gcc/testsuite/g++.dg/warn/pr101515.C.jj	2022-03-16 11:19:09.004806778 +0100
+++ gcc/testsuite/g++.dg/warn/pr101515.C	2022-03-16 11:24:35.136302673 +0100
@@ -0,0 +1,18 @@
+// PR c++/101515
+// { dg-do compile }
+// { dg-options "-O1 -Wuninitialized" }
+
+struct S { int j; };
+struct T : public S { virtual void h () {} };
+struct U { void (*ptr) (); };
+typedef void (S::*sp) ();
+
+int
+main ()
+{
+  T t;
+  sp x;
+  U *xp = (U *) &x;
+  if (xp->ptr != ((void (*) ()) (sizeof (void *))))	// { dg-warning "is used uninitialized" }
+    return 1;
+}

	Jakub
[PATCH] Generate an if instead of a switch with one case in genmatch.
This patch is the first of two changes to genmatch that don't affect
the executable code, but reduce the amount of debugging information
generated in stage3 of a build, while adhering more closely to GNU style
guidelines.

This patch avoids generating a switch with a single case statement,
instead preferring to use an "if (TREE_CODE (...) == SSA_NAME)" idiom.
These should compile to the same instructions, but the switch requires
more lines, especially since a debugger may set a breakpoint on the
switch, the case, or the (obligatory) final "default:;".  This reduces
the size of gimple-match.o by 53K on x86_64-pc-linux-gnu.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2022-03-16  Roger Sayle

gcc/ChangeLog
	* genmatch.cc (dt_node::gen_kids_1): Introduce use_switch
	logic that prefers to generate an if statement rather than a
	switch containing a single case (and a default).

Thanks in advance,
Roger
--

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 2eda730..c4e22ef 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -2999,6 +2999,7 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
   unsigned gexprs_len = generic_exprs.length ();
   unsigned fns_len = fns.length ();
   unsigned gfns_len = generic_fns.length ();
+  bool use_switch = true;

   if (exprs_len || fns_len || gexprs_len || gfns_len)
     {
@@ -3011,7 +3012,30 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
       else
	generic_exprs[0]->get_name (kid_opname);

-      fprintf_indent (f, indent, "switch (TREE_CODE (%s))\n", kid_opname);
+      if (gexprs_len == 0 && gfns_len == 0)
+	{
+	  fprintf_indent (f, indent, "if (TREE_CODE (%s) == SSA_NAME)\n",
+			  kid_opname);
+	  use_switch = false;
+	}
+      else if (exprs_len == 0
+	       && fns_len == 0
+	       && gexprs_len == 1
+	       && gfns_len == 0)
+	{
+	  expr *e = as_a <expr *> (generic_exprs[0]->op);
+	  id_base *op = e->operation;
+	  /* id_base doesn't have a != operator.  */
+	  if (!(*op == CONVERT_EXPR || *op == NOP_EXPR))
+	    {
+	      fprintf_indent (f, indent, "if (TREE_CODE (%s) == %s)\n",
+			      kid_opname, op->id);
+	      use_switch = false;
+	    }
+	}
+
+      if (use_switch)
+	fprintf_indent (f, indent, "switch (TREE_CODE (%s))\n", kid_opname);
       fprintf_indent (f, indent, "  {\n");
       indent += 2;
     }
@@ -3019,8 +3043,8 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
   if (exprs_len || fns_len)
     {
       depth++;
-      fprintf_indent (f, indent,
-		      "case SSA_NAME:\n");
+      if (use_switch)
+	fprintf_indent (f, indent, "case SSA_NAME:\n");
       fprintf_indent (f, indent,
		      "  if (gimple *_d%d = get_def (valueize, %s))\n",
		      depth, kid_opname);
@@ -3103,7 +3127,8 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
	    }
	}

-      fprintf_indent (f, indent, "  break;\n");
+      if (use_switch)
+	fprintf_indent (f, indent, "  break;\n");
     }

   for (unsigned i = 0; i < generic_exprs.length (); ++i)
@@ -3115,12 +3140,18 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
       else if (*op == SSA_NAME && (exprs_len || fns_len))
	/* Already handled above.  */
	continue;
-      else
+      else if (use_switch)
	fprintf_indent (f, indent, "case %s:\n", op->id);
-      fprintf_indent (f, indent, "  {\n");
-      generic_exprs[i]->gen (f, indent + 4, gimple, depth);
-      fprintf_indent (f, indent, "break;\n");
-      fprintf_indent (f, indent, "  }\n");
+
+      if (use_switch)
+	{
+	  fprintf_indent (f, indent, "  {\n");
+	  generic_exprs[i]->gen (f, indent + 4, gimple, depth);
+	  fprintf_indent (f, indent, "break;\n");
+	  fprintf_indent (f, indent, "  }\n");
+	}
+      else
+	generic_exprs[i]->gen (f, indent + 2, gimple, depth);
     }

   if (gfns_len)
@@ -3157,7 +3188,8 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, int depth,
   if (exprs_len || fns_len || gexprs_len || gfns_len)
     {
       indent -= 4;
-      fprintf_indent (f, indent, "default:;\n");
+      if (use_switch)
+	fprintf_indent (f, indent, "default:;\n");
       fprintf_indent (f, indent, "}\n");
     }
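For reference, a standalone illustration of the shape change in the
generated code (the enum and functions are made up, not actual
gimple-match.cc output; both forms compile to the same branch, but the
if form has fewer statements for line-number debug info to anchor on):

enum code { SSA_NAME, INTEGER_CST, PLUS_EXPR };

int
with_switch (enum code c)	/* Before: a single-case switch.  */
{
  switch (c)
    {
    case SSA_NAME:
      return 1;
    default:;
    }
  return 0;
}

int
with_if (enum code c)		/* After: the equivalent if.  */
{
  if (c == SSA_NAME)
    return 1;
  return 0;
}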
[PATCH] Avoid generating unused labels in genmatch.
This patch is the second of two changes to genmatch that don't affect
the executable code, but reduce the amount of debugging information
generated in stage3 of a build, while adhering more closely to GNU style
guidelines.

This patch avoids generating "next_after_fail1:;" label statements
in genmatch, if this label is unused/never referenced as the target
of a goto.  Because jumps to these labels are always forwards, we
know at the point the label is emitted whether it is used or not.
Because a debugger may set a breakpoint on these labels, they increase
the size of g{imple,eneric}-match.o in a stage3 build.  To save a
little extra space, I also shorten the label to "Lfail1:;" and only
increment the counter when necessary.  This reduces the size of
gimple-match.o by 58K on x86_64-pc-linux-gnu.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2022-03-16  Roger Sayle

gcc/ChangeLog
	* genmatch.cc (fail_label_used): New global variable.
	(expr::gen_transform): Set fail_label_used whenever a goto
	FAIL_LABEL is generated.
	(dt_simplify::gen_1): Clear fail_label_used when generating
	a new (provisional) fail_label.  Set fail_label_used whenever
	a goto fail_label is generated.  Avoid emitting fail_label:
	if fail_label_used is false, instead decrement fail_label_cnt.

Thanks in advance,
Roger
--

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 2eda730..4a61f4e 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -2356,6 +2356,8 @@ capture_info::walk_c_expr (c_expr *e)
 /* The current label failing the current matched pattern during
    code generation.  */
 static char *fail_label;
+/* Record that a reference/goto to the above label has been generated.  */
+static bool fail_label_used;

 /* Code generation off the decision tree and the refered AST nodes.  */

@@ -2533,6 +2535,7 @@ expr::gen_transform (FILE *f, int indent, const char *dest, bool gimple,
	  fprintf_indent (f, indent, "if (!_r%d) goto %s;\n",
			  depth, fail_label);
+	  fail_label_used = true;
	  if (*opr == CONVERT_EXPR)
	    {
	      indent -= 4;
@@ -2560,13 +2563,15 @@ expr::gen_transform (FILE *f, int indent, const char *dest, bool gimple,
       fprintf (f, ");\n");
       if (opr->kind != id_base::CODE)
	{
-	  fprintf_indent (f, indent, "if (!_r%d)\n", depth);
-	  fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+	  fprintf_indent (f, indent, "if (!_r%d) goto %s;\n",
+			  depth, fail_label);
+	  fail_label_used = true;
	}
       if (force_leaf)
	{
	  fprintf_indent (f, indent, "if (EXPR_P (_r%d))\n", depth);
	  fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+	  fail_label_used = true;
	}
       if (*opr == CONVERT_EXPR)
	{
@@ -3285,8 +3290,9 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)

   static unsigned fail_label_cnt;
   char local_fail_label[256];
-  snprintf (local_fail_label, 256, "next_after_fail%u", ++fail_label_cnt);
+  snprintf (local_fail_label, 256, "Lfail%u", ++fail_label_cnt);
   fail_label = local_fail_label;
+  fail_label_used = false;

   /* Analyze captures and perform early-outs on the incoming arguments
      that cover cases we cannot handle.  */
@@ -3301,6 +3307,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
		  fprintf_indent (f, indent,
				  "if (TREE_SIDE_EFFECTS (_p%d)) goto %s;\n",
				  i, fail_label);
+		  fail_label_used = true;
		  if (verbose >= 1)
		    warning_at (as_a <expr *> (s->match)->ops[i]->location,
				"forcing toplevel operand to have no "
@@ -3316,6 +3323,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
		  fprintf_indent (f, indent,
				  "if (TREE_SIDE_EFFECTS (captures[%d])) "
				  "goto %s;\n", i, fail_label);
+		  fail_label_used = true;
		  if (verbose >= 1)
		    warning_at (cinfo.info[i].c->location,
				"forcing captured operand to have no "
@@ -3358,7 +3366,12 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
     }

   if (s->kind == simplify::SIMPLIFY)
-    fprintf_indent (f, indent, "if (__builtin_expect (!dbg_cnt (match), 0)) goto %s;\n", fail_label);
+    {
+      fprintf_indent (f, indent,
		      "if (__builtin_expect (!dbg_cnt (match), 0)) goto %s;\n",
		      fail_label);
+      fail_label_used = true;
+    }

   fprintf_indent (f, indent,
		  "if (__builtin_expect (dump_file && (dump_flags & TDF_FOLDING), 0)) "
		  "fprintf (dump_file, \"%s ",
@@ -3430,9 +3443,12 @@ dt_simplify::gen_
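A standalone sketch of why the label is sometimes dead (illustrative
functions, not actual genmatch output): when no transform step in a
pattern can fail, nothing jumps to the fail label, so emitting it anyway
only provides an extra breakpoint target and -Wunused-label noise.

/* A failing check references the label, so it must be emitted.  */
int
match_guarded (int op)
{
  if (op < 0)
    goto Lfail1;
  return 1;
Lfail1:;
  return 0;
}

/* No failure path exists, so no "Lfail2:;" needs to be printed.  */
int
match_trivial (int op)
{
  return op == 0;
}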
Re: [PATCH v8 00/12] Add LoongArch support.
On 2022/3/15 at 10:35 PM, Xi Ruoyao wrote:
> On Fri, 2022-03-04 at 15:17 +0800, xucheng...@loongson.cn wrote:
>
>> v7 -> v8
>> 1. Add new addressing type ADDRESS_REG_REG support.
>> 2. Modify documentation.
>> 3. Eliminate compile-time warnings.
>
> Hi,
>
> The v8 series does not build the LoongArch Linux kernel tree
> (https://github.com/loongson/linux, loongarch-next branch) successfully.
> This is a regression: the v7 series built the kernel fine.
>
> A testcase reduced from the __get_data_asm macro in uaccess.h:
>
> $ cat t1.c
> char *ptr;
> int offset;
>
> struct m { char a[2]; };
>
> char x (void)
> {
>   char t;
>   asm volatile("ld.b %0, %1" : "=r"(t) : "o"(*(struct m *)(ptr + offset)));
>   return t;
> }
>
> $ ./gcc/cc1 t1.c -nostdinc -O
> t1.c: In function ‘x’:
> t1.c:12:3: error: impossible constraint in ‘asm’
>    12 |   asm volatile("ld.b %0, %1" : "=r"(t) : "o"(*(struct m *)(ptr + offset)));
>       |   ^~~
>
> It seems changing the constraint "o" to "m" can work around this issue.
> I'm not sure if this is a compiler bug or a kernel bug.

Hi,

LoongArch supports memory access modes as follows:

  mode                                          memory constraint
  1. base + index                               "k"
  2. base + imm12                               "m"
  3. base + imm16 (immediate 4-byte aligned)    "ZC"
  4. base + 0                                   "ZB"

So this constraint should be "m".
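Applied to the reduced testcase, the suggested change looks as follows
(a sketch of the kernel-side fix, on the assumption that every ld.b
operand the macro generates fits the base + imm12 form):

char *ptr;
int offset;

struct m { char a[2]; };

char
x (void)
{
  char t;
  /* "m" (base + imm12) matches what ld.b can encode; the "o"
     constraint was rejected as impossible here.  */
  asm volatile ("ld.b %0, %1"
		: "=r"(t)
		: "m"(*(struct m *)(ptr + offset)));
  return t;
}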
Re: [PATCH] rs6000: Allow using -mlong-double-64 after -mabi={ibm, ieee}longdouble [PR104208, PR87496]
On Tue, Mar 15, 2022 at 02:49:39PM -0500, Peter Bergner wrote:
> On 3/4/22 8:14 PM, Peter Bergner wrote:
> > On 3/4/22 11:33 AM, Peter Bergner wrote:
> >>> Ok pushed to trunk.  I haven't determined yet whether we need this on
> >>> GCC 11.  I'll check on that and report back.  Thanks!
> >>
> >> I've confirmed that GCC 11 fails the same way and that the backported
> >> patch fixes the issue there too.  Ok for GCC 11 assuming my full
> >> regression testing is clean?
> >>
> >> GCC 10 has the same checking code, so it looks to need the backport as
> >> well.  I'll go ahead and backport and regression test it there too.
> >
> > The backports to GCC 11 and GCC 10 bootstrapped and regtested with no
> > regressions.  Ok for the GCC 11 and GCC 10 release branches after a day
> > or two of baking on trunk?
>
> Ping.
>
> The trunk patch has been confirmed to fix the glibc build errors and no
> issues with the patch have surfaced, so ok for the GCC 11 and GCC 10
> release branches?

If you can confirm it fixes the glibc build error on 11 and 10 as well,
then yes please.  Thanks!

Segher
[PATCH] tree-optimization/102008 - restore if-conversion of adjacent loads
The following re-orders the newly added code sinking pass before
the last phiopt pass, which performs hoisting of adjacent loads,
with the intent to enable if-conversion on those.

I've added the aarch64 specific testcase from the PR.

Bootstrapped and tested on x86_64-unknown-linux-gnu, verified
the testcase fails/passes with an aarch64 cross.

OK?

Thanks,
Richard.

2022-03-16  Richard Biener

	PR tree-optimization/102008
	* passes.def: Move the added code sinking pass before the
	preceding phiopt pass.

	* gcc.target/aarch64/pr102008.c: New testcase.
---
 gcc/passes.def                              |  2 +-
 gcc/testsuite/gcc.target/aarch64/pr102008.c | 13 +++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102008.c

diff --git a/gcc/passes.def b/gcc/passes.def
index f7718181038..c8903b4ec16 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -349,10 +349,10 @@ along with GCC; see the file COPYING3.  If not see
       /* After late CD DCE we rewrite no longer addressed locals into SSA
	 form if possible.  */
       NEXT_PASS (pass_forwprop);
+      NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_phiopt, false /* early_p */);
       NEXT_PASS (pass_fold_builtins);
       NEXT_PASS (pass_optimize_widening_mul);
-      NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_store_merging);
       NEXT_PASS (pass_tail_calls);
       /* If DCE is not run before checking for uninitialized uses,
diff --git a/gcc/testsuite/gcc.target/aarch64/pr102008.c b/gcc/testsuite/gcc.target/aarch64/pr102008.c
new file mode 100644
index 000..d54436c0ffd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr102008.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct Foo { int a; int b; };
+
+int test(int side, const struct Foo *foo) {
+  if (side == 1) return foo->a;
+  return foo->b;
+}
+
+/* We want to if-convert the load, not the address.  */
+/* { dg-final { scan-assembler-not "add" } } */
+/* { dg-final { scan-assembler-times "csel" 1 } } */
--
2.34.1
Re: [PATCH] tree-optimization/102008 - restore if-conversion of adjacent loads
On Wed, Mar 16, 2022 at 01:42:26PM +0100, Richard Biener wrote:
> The following re-orders the newly added code sinking pass before
> the last phiopt pass, which performs hoisting of adjacent loads,
> with the intent to enable if-conversion on those.
>
> I've added the aarch64 specific testcase from the PR.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, verified
> the testcase fails/passes with an aarch64 cross.
>
> OK?
>
> 2022-03-16  Richard Biener
>
>	PR tree-optimization/102008
>	* passes.def: Move the added code sinking pass before the
>	preceding phiopt pass.
>
>	* gcc.target/aarch64/pr102008.c: New testcase.

Ok.

	Jakub
Re: [PATCH] Generate an if instead of a switch with one case in genmatch.
On Wed, Mar 16, 2022 at 1:28 PM Roger Sayle wrote:
>
> This patch is the first of two changes to genmatch that don't affect
> the executable code, but reduce the amount of debugging information
> generated in stage3 of a build, while adhering more closely to GNU
> style guidelines.
>
> This patch avoids generating a switch with a single case statement,
> instead preferring to use an "if (TREE_CODE (...) == SSA_NAME)" idiom.
> These should compile to the same instructions, but the switch requires
> more lines, especially since a debugger may set a breakpoint on the
> switch, the case, or the (obligatory) final "default:;".  This reduces
> the size of gimple-match.o by 53K on x86_64-pc-linux-gnu.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

Hmm, this makes the complicated code emission in gen_kids_1 even more
complicated for a questionable gain which I'd rather not do, debuginfo
savings or not ...

Richard.

> 2022-03-16  Roger Sayle
>
> gcc/ChangeLog
>	* genmatch.cc (dt_node::gen_kids_1): Introduce use_switch
>	logic that prefers to generate an if statement rather than a
>	switch containing a single case (and a default).
Re: [PATCH] Avoid generating unused labels in genmatch.
On Wed, Mar 16, 2022 at 1:31 PM Roger Sayle wrote:
>
> This patch is the second of two changes to genmatch that don't affect
> the executable code, but reduce the amount of debugging information
> generated in stage3 of a build, while adhering more closely to GNU
> style guidelines.
>
> This patch avoids generating "next_after_fail1:;" label statements
> in genmatch, if this label is unused/never referenced as the target
> of a goto.  Because jumps to these labels are always forwards, we
> know at the point the label is emitted whether it is used or not.
> Because a debugger may set a breakpoint on these labels, they increase
> the size of g{imple,eneric}-match.o in a stage3 build.  To save a
> little extra space, I also shorten the label to "Lfail1:;" and only
> increment the counter when necessary.  This reduces the size of
> gimple-match.o by 58K on x86_64-pc-linux-gnu.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

I wonder if we could make this nicer by abstracting sth like

class fail_label
{
   fail_label (unsigned num);
   emit_goto () { fprintf (..., "goto L%u", num); used = true; }
   emit_label () { fprintf ("L%u:;\n", num); }
   unsigned num;
   bool used;
};

or some better abstraction capturing all of the global state.

> 2022-03-16  Roger Sayle
>
> gcc/ChangeLog
>	* genmatch.cc (fail_label_used): New global variable.
>	(expr::gen_transform): Set fail_label_used whenever a goto
>	FAIL_LABEL is generated.
>	(dt_simplify::gen_1): Clear fail_label_used when generating
>	a new (provisional) fail_label.  Set fail_label_used whenever
>	a goto fail_label is generated.  Avoid emitting fail_label:
>	if fail_label_used is false, instead decrement fail_label_cnt.
Re: [PATCH] Avoid generating unused labels in genmatch.
On Wed, Mar 16, 2022 at 1:59 PM Richard Biener wrote:
>
> On Wed, Mar 16, 2022 at 1:31 PM Roger Sayle wrote:
> >
> > This patch avoids generating "next_after_fail1:;" label statements
> > in genmatch, if this label is unused/never referenced as the target
> > of a goto.  [...]
>
> I wonder if we could make this nicer by abstracting sth like
>
> class fail_label
> {
>    fail_label (unsigned num);
>    emit_goto () { fprintf (..., "goto L%u", num); used = true; }
>    emit_label () { fprintf ("L%u:;\n", num); }
>    unsigned num;
>    bool used;
> };
>
> or some better abstraction capturing all of the global state.

Btw, the unused labels were one reason for having

gimple-match.o-warn = -Wno-unused
generic-match.o-warn = -Wno-unused

maybe that's then no longer needed.
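A fleshed-out version of the suggested abstraction might look like this
(a hedged sketch: the constructor body, the FILE * plumbing and the
conditional emit are assumptions, not actual genmatch code):

#include <cstdio>

class fail_label
{
public:
  fail_label (unsigned num) : m_num (num), m_used (false) {}

  /* Every emitted goto records that the label is live.  */
  void emit_goto (FILE *f)
  { fprintf (f, "goto L%u;\n", m_num); m_used = true; }

  /* Only print the label if some goto actually referenced it.  */
  void emit_label (FILE *f)
  { if (m_used) fprintf (f, "L%u:;\n", m_num); }

private:
  unsigned m_num;
  bool m_used;
};

This would replace the fail_label/fail_label_used globals with one
object per generated pattern.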
Re: [PATCH] OpenMP, Fortran: Bugfix for omp_set_num_teams.
Hi Jakub, ! { dg-do run } ! { dg-additional-options "-fdefault-integer-8" } program set_num_teams_8 use omp_lib call omp_set_num_teams (42) if (omp_get_num_teams () .ne. 42) stop 1 end program I modified your suggested test case a bit: program set_num_teams_8 use omp_lib use, intrinsic :: iso_fortran_env integer(int64) :: x x = 42 call omp_set_num_teams (x) if (omp_get_max_teams () .ne. 42) stop 1 end program I tested it with/without the fix and the test passed/failed as expected. Hope that's ok? Marcel OpenMP, Fortran: Bugfix for omp_set_num_teams. This patch fixes a small bug in the omp_set_num_teams implementation. libgomp/ChangeLog: * fortran.c (omp_set_num_teams_8_): Fix bug. * testsuite/libgomp.fortran/icv-8.f90: New test. diff --git a/libgomp/fortran.c b/libgomp/fortran.c index 8c1cfd1..d984ce5 100644 --- a/libgomp/fortran.c +++ b/libgomp/fortran.c @@ -491,7 +491,7 @@ omp_set_num_teams_ (const int32_t *num_teams) void omp_set_num_teams_8_ (const int64_t *num_teams) { - omp_set_max_active_levels (TO_INT (*num_teams)); + omp_set_num_teams (TO_INT (*num_teams)); } int32_t diff --git a/libgomp/testsuite/libgomp.fortran/icv-8.f90 b/libgomp/testsuite/libgomp.fortran/icv-8.f90 new file mode 100644 index 000..9478c15 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/icv-8.f90 @@ -0,0 +1,10 @@ +! This tests 'set_num_teams_8' function. + +program set_num_teams_8 + use omp_lib + use, intrinsic :: iso_fortran_env + integer(int64) :: x + x = 42 + call omp_set_num_teams (x) + if (omp_get_max_teams () .ne. 42) stop 1 +end program
Re: [PATCH] RISC-V: Handle combine extension in canonical ordering.
Hi Shi-Hua: Thanks, generally LGTM, just a few coding style issues. I've fixed those and committed this time, but just to let you know where you should update your coding style. You could also use git clang-format with /contrib/clang-format as the format file to save time on indentation. https://stackoverflow.com/questions/43974371/run-git-clang-format-on-series-of-git-commits On Tue, Mar 8, 2022 at 11:31 AM wrote: > > From: LiaoShihua > > The crypto extension has several shorthand extensions that don't consist of > any extra instructions. > Take zk for example: the extension implies zkn, zkr and zkt. > The 3 extensions should also combine back into zk to maintain the canonical > order in isa strings. > This patch addresses the above. > If other extensions have the same situation, they can be added to > riscv_combine_info[] > > > > gcc/ChangeLog: > > * common/config/riscv/riscv-common.cc > (riscv_subset_list::handle_combine_ext): Combine back into zk to maintain the > canonical order in isa strings. > (riscv_subset_list::parse): Ditto. You need to mention riscv_combine_info in the change log too :) > * config/riscv/riscv-subset.h: Declare handle_combine_ext(); > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/predef-17.c: New test. > > --- > gcc/common/config/riscv/riscv-common.cc| 56 +++ > gcc/config/riscv/riscv-subset.h| 1 + > gcc/testsuite/gcc.target/riscv/predef-17.c | 63 ++ > 3 files changed, 120 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/riscv/predef-17.c > > diff --git a/gcc/common/config/riscv/riscv-common.cc > b/gcc/common/config/riscv/riscv-common.cc > index a904893b9ed..1c06f83cc1c 100644 > --- a/gcc/common/config/riscv/riscv-common.cc > +++ b/gcc/common/config/riscv/riscv-common.cc > @@ -189,6 +189,16 @@ static const struct riscv_ext_version > riscv_ext_version_table[] = >{NULL, ISA_SPEC_CLASS_NONE, 0, 0} > }; > > +/* Combine extensions defined in this table */ > +static const struct riscv_ext_version riscv_combine_info[] = > +{ > + {"zk", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zkn", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zks", ISA_SPEC_CLASS_NONE, 1, 0}, > + /* Terminate the list. */ > + {NULL, ISA_SPEC_CLASS_NONE, 0, 0} > +}; > + > static const riscv_cpu_info riscv_cpu_tables[] = > { > #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \ > @@ -813,6 +823,50 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t > *ext) > } > } > > +/* Check any combine extensions for EXT. */ > +void > +riscv_subset_list::handle_combine_ext (riscv_subset_list *subset_list) You don't need to pass subset_list again, you can just use `this` > +{ > + const riscv_ext_version *combine_info; > + const riscv_implied_info_t *implied_info; > + bool IsCombined = false; is_combined > + > + for (combine_info = &riscv_combine_info[0]; combine_info->name; > ++combine_info) > + { for (...) { // two space > + > +/* Skip if combine extensions are present */ > +if (subset_list->lookup(combine_info->name)) > + continue; > + > +/* Find all extensions of the combine extension */ > +for (implied_info = &riscv_implied_info[0]; implied_info->ext; > ++implied_info) > +{ > + /* Skip if implied extension don't match combine extension */ > + if (strcmp(combine_info->name, implied_info->ext) != 0) > +continue; > + > + if (subset_list->lookup(implied_info->implied_ext)) > + { > +IsCombined = true; > + } No curly brackets needed if only one line. 
> + else > + { > +IsCombined = false; > +break; > + } > +} > + > +/* Add combine extensions */ > +if (IsCombined) > +{ > + if (subset_list->lookup(combine_info->name) == NULL) > + { > +subset_list->add (combine_info->name, combine_info->major_version, > combine_info->minor_version, false, true); > + } > +} > + } > +} > + > /* Parsing function for multi-letter extensions. > > Return Value: > @@ -992,6 +1046,8 @@ riscv_subset_list::parse (const char *arch, location_t > loc) >subset_list->handle_implied_ext (itr); > } > > + subset_list->handle_combine_ext (subset_list); > + >return subset_list; > > fail: > diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h > index 4f3556a8d9b..da2e22d34f2 100644 > --- a/gcc/config/riscv/riscv-subset.h > +++ b/gcc/config/riscv/riscv-subset.h > @@ -68,6 +68,7 @@ private: > const char *); > >void handle_implied_ext (riscv_subset_t *); > + void handle_combine_ext (riscv_subset_list *); > > public: >~riscv_subset_list (); > diff --git a/gcc/testsuite/gcc.target/riscv/predef-17.c > b/gcc/testsuite/gcc.target/riscv/predef-17.c > new file mode 100644 > index 000..68f5f95a66c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/predef-17.
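Putting Kito's review comments together, the loop might end up looking something like this (a sketch of the requested changes, not the committed revision; it assumes the riscv_subset_list interface shown in the diff, and the file's usual includes for strcmp):

/* Check any combine extensions.  Sketch incorporating the review:
   drop the redundant parameter and use `this`, snake_case flag name,
   GNU brace/indent style, no braces around single statements, and
   reset the flag per candidate so a stale value can't leak between
   outer iterations.  Lives in riscv-common.cc.  */
void
riscv_subset_list::handle_combine_ext ()
{
  const riscv_ext_version *combine_info;
  const riscv_implied_info_t *implied_info;

  for (combine_info = &riscv_combine_info[0]; combine_info->name;
       ++combine_info)
    {
      /* Skip if the combined extension is already present.  */
      if (lookup (combine_info->name))
	continue;

      /* Every extension implied by the combined one must be present.  */
      bool is_combined = false;
      for (implied_info = &riscv_implied_info[0]; implied_info->ext;
	   ++implied_info)
	{
	  if (strcmp (combine_info->name, implied_info->ext) != 0)
	    continue;

	  if (lookup (implied_info->implied_ext))
	    is_combined = true;
	  else
	    {
	      is_combined = false;
	      break;
	    }
	}

      /* Add the combined extension back into the subset list.  */
      if (is_combined)
	add (combine_info->name, combine_info->major_version,
	     combine_info->minor_version, false, true);
    }
}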
[PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]
This patch documents the Solaris-specific D bootstrap requirements. Tested by building and inspecting gccinstall.{pdf,info}. Ok for trunk? I've omitted the Darwin-specific stuff so far documented in PRs d/103577 and d/103578: * needs --enable-libphobos * top of gcc-11 branch only * backport of -static-libphobos patch * Darwin/i386 doesn't work at all Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2022-03-16 Rainer Orth gcc: PR d/103528 * doc/install.texi (Tools/packages necessary for building GCC) (GDC): Document libphobos requirement. (Host/target specific installation notes for GCC, *-*-solaris2*): Document libphobos and GDC specifics. # HG changeset patch # Parent 33de4c9d886299fd8cc97e20c0f761c2f28a3eef doc: Document Solaris D bootstrap requirements [PR 103528] diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -287,7 +287,8 @@ section. @item @anchor{GDC-prerequisite}GDC In order to build GDC, the D compiler, you need a working GDC -compiler (GCC version 9.1 or later), as the D front end is written in D. +compiler (GCC version 9.1 or later) and D runtime library, +@samp{libphobos}, as the D front end is written in D. Versions of GDC prior to 12 can be built with an ISO C++11 compiler, which can then be installed and used to bootstrap newer versions of the D front end. @@ -303,6 +304,10 @@ front end does not make use of any GDC-s of the D language, if too old a GDC version is installed and @option{--enable-languages=d} is used, the build will fail. +On some targets, @samp{libphobos} isn't enabled by default, but compiles +and works if @option{--enable-libphobos} is used. Specifics are +documented for affected targets. + @item A ``working'' POSIX compatible shell, or GNU bash Necessary when running @command{configure} because some @@ -4851,6 +4856,12 @@ GNU binutils. @samp{libstdc++} symbol v appropriate version is found. Solaris @command{c++filt} from the Solaris Studio compilers does @emph{not} work. +In order to build the GNU D compiler, GDC, a working @samp{libphobos} is +needed. That library wasn't built by default in GCC 9--11 on SPARC, or +on x86 when the Solaris assembler is used, but can be enabled by +configuring with @option{--enable-libphobos}. Also, GDC 9.4.0 is +required on x86, while GDC 9.3.0 is known to work on SPARC. + The versions of the GNU Multiple Precision Library (GMP), the MPFR library and the MPC library bundled with Solaris 11.3 and later are usually recent enough to match GCC's requirements. There are two @@ -4864,6 +4875,7 @@ need to configure with @option{--with-gm @item The version of the MPFR libary included in Solaris 11.3 is too old; you need to provide a more recent one. + @end itemize @html
Re: [PATCH v2] x86: Also check _SOFT_FLOAT in <x86gprintrin.h>
On Tue, Mar 15, 2022 at 6:57 PM Hongtao Liu wrote: > > On Tue, Mar 15, 2022 at 10:40 PM H.J. Lu wrote: > > > > On Mon, Mar 14, 2022 at 7:31 AM H.J. Lu wrote: > > > > > > Push target("general-regs-only") in <x86gprintrin.h> if x87 is enabled. > > > > > > gcc/ > > > > > > PR target/104890 > > > * config/i386/x86gprintrin.h: Also check _SOFT_FLOAT before > > > pushing target("general-regs-only"). > > > > > > gcc/testsuite/ > > > > > > PR target/104890 > > > * gcc.target/i386/pr104890.c: New test. > > > --- > > > gcc/config/i386/x86gprintrin.h | 2 +- > > > gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++ > > > 2 files changed, 12 insertions(+), 1 deletion(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c > > > > > > diff --git a/gcc/config/i386/x86gprintrin.h > > > b/gcc/config/i386/x86gprintrin.h > > > index 017ec299793..e0be01d5e78 100644 > > > --- a/gcc/config/i386/x86gprintrin.h > > > +++ b/gcc/config/i386/x86gprintrin.h > > > @@ -24,7 +24,7 @@ > > > #ifndef _X86GPRINTRIN_H_INCLUDED > > > #define _X86GPRINTRIN_H_INCLUDED > > > > > > -#if defined __MMX__ || defined __SSE__ > > > +#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__ > The patch LGTM. I am checking it in. > > > #pragma GCC push_options > > > #pragma GCC target("general-regs-only") > > > #define __DISABLE_GENERAL_REGS_ONLY__ > > > diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c > > > b/gcc/testsuite/gcc.target/i386/pr104890.c > > > new file mode 100644 > > > index 000..cb430eef688 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/i386/pr104890.c > > > @@ -0,0 +1,11 @@ > > > +/* { dg-do compile { target ia32 } } */ > > > +/* { dg-options "-O2 -mshstk -march=i686" } */ > > > + > > > +#include <x86gprintrin.h> > > > + > > > +__attribute__((target ("general-regs-only"))) > > > +int > > > +foo () > > > +{ > > > + return _get_ssp (); > > > +} > > > -- > > > 2.35.1 > > > > > > > It also fixed: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99744#c18 > > > > Any comments on this patch? > > > > -- > > H.J. > > > > -- > BR, > Hongtao -- H.J.
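Condensed, the guard pattern being amended looks roughly like this (the pop side is reconstructed from the usual intrin-header idiom rather than quoted from this hunk): force GPR-only code generation whenever some FP/vector register file might be live, i.e. unless compiling soft-float without MMX and SSE.

/* Condensed shape of x86gprintrin.h after the fix.  */
#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__
#pragma GCC push_options
#pragma GCC target("general-regs-only")
#define __DISABLE_GENERAL_REGS_ONLY__
#endif

/* ... GPR intrinsics such as _get_ssp () ... */

#ifdef __DISABLE_GENERAL_REGS_ONLY__
#undef __DISABLE_GENERAL_REGS_ONLY__
#pragma GCC pop_options
#endif

With the old condition, an -march=i686 (x87-only) compile never pushed the option, so a general-regs-only function could not use the intrinsics; that is the failure mode the new _SOFT_FLOAT test closes.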
Re: [PATCH] OpenMP, Fortran: Bugfix for omp_set_num_teams.
On Wed, Mar 16, 2022 at 02:06:16PM +0100, Marcel Vollweiler wrote: > libgomp/ChangeLog: > > * fortran.c (omp_set_num_teams_8_): Fix bug. > * testsuite/libgomp.fortran/icv-8.f90: New test. Ok, with a minor nit. Please use "Call omp_set_num_teams instead of omp_set_max_active_levels." instead of "Fix bug." in the ChangeLog. > diff --git a/libgomp/fortran.c b/libgomp/fortran.c > index 8c1cfd1..d984ce5 100644 > --- a/libgomp/fortran.c > +++ b/libgomp/fortran.c > @@ -491,7 +491,7 @@ omp_set_num_teams_ (const int32_t *num_teams) > void > omp_set_num_teams_8_ (const int64_t *num_teams) > { > - omp_set_max_active_levels (TO_INT (*num_teams)); > + omp_set_num_teams (TO_INT (*num_teams)); > } > > int32_t > diff --git a/libgomp/testsuite/libgomp.fortran/icv-8.f90 > b/libgomp/testsuite/libgomp.fortran/icv-8.f90 > new file mode 100644 > index 000..9478c15 > --- /dev/null > +++ b/libgomp/testsuite/libgomp.fortran/icv-8.f90 > @@ -0,0 +1,10 @@ > +! This tests 'set_num_teams_8' function. > + > +program set_num_teams_8 > + use omp_lib > + use, intrinsic :: iso_fortran_env > + integer(int64) :: x > + x = 42 > + call omp_set_num_teams (x) > + if (omp_get_max_teams () .ne. 42) stop 1 > +end program Jakub
[PATCH][GCC11] tree-optimization/104931 - mitigate niter analysis issue
The following backports a pointer associating pattern from trunk that mitigates an issue with number_of_iterations_lt_to_ne in which context we fail to fold a comparison but succeed in folding a related subtraction. In the failure mode this results in a loop wrongly assumed to loop with a bogus number of iterations, resulting in the premake application crashing on start. With the backported simplification we are able to fold the comparison and correctly compute the loop as not iterating. I have failed to create a standalone testcase. I believe part of the issue is still latent, but I have been unable to nail it down exactly. Still I believe backporting the mitigation patch is the most sensible and safe thing at this point. Bootstrapped and tested on x86_64-unknown-linux-gnu on the branch, OK? Thanks, Richard. 2022-03-16 Richard Biener PR tree-optimization/104931 * match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): New GENERIC simplification. --- gcc/match.pd | 5 + 1 file changed, 5 insertions(+) diff --git a/gcc/match.pd b/gcc/match.pd index 05a08d0f96a..a005dcd42bd 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1997,6 +1997,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (simplify (pointer_plus (pointer_plus:s @0 @1) @3) (pointer_plus @0 (plus @1 @3))) +#if GENERIC +(simplify + (pointer_plus (convert:s (pointer_plus:s @0 @1)) @3) + (convert:type (pointer_plus @0 (plus @1 @3)))) +#endif /* Pattern match tem1 = (long) ptr1; -- 2.34.1
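A hypothetical C-level illustration of what the new GENERIC rule associates (my own example, not the premake reproducer):

/* With the pattern, GENERIC folding of
     (char *) (p + 1) + 4              [p is 'int *']
   i.e.  pointer_plus (convert (pointer_plus (p, 4)), 4)
   becomes  convert (pointer_plus (p, 8)),
   so fold sees one combined byte offset instead of two offsets
   separated by a conversion when it compares pointers.  */
char *
assoc (int *p)
{
  return (char *) (p + 1) + 4;
}

That combined form is exactly what lets fold_comparison succeed in the number_of_iterations_lt_to_ne context described above.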
[aarch64] Add Neoverse N2 tuning structs
Hi, This patch adds tuning structures for Neoverse N2. 2022-03-16 Tamar Christina Andre Vieira * config/aarch64/aarch64.cc (neoversen2_addrcost_table, neoversen2_regmove_cost, neoversen2_advsimd_vector_cost, neoversen2_sve_vector_cost, neoversen2_scalar_issue_info, neoversen2_advsimd_issue_info, neoversen2_sve_issue_info, neoversen2_vec_issue_info, neoversen2_tunings): New structs. (neoversen2_tunings): Use new structs and update tuning flags. (aarch64_vec_op_count::rename_cycles_per_iter): Enable for neoversen2 tuning. diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index d504fe7607b66a9c9ed9b183a2d3c03d34fb0f80..e0bb447beb9eae74551d863505eb265737d36334 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -519,6 +519,24 @@ static const struct cpu_addrcost_table neoversev1_addrcost_table = 0 /* imm_offset */ }; +static const struct cpu_addrcost_table neoversen2_addrcost_table = +{ +{ + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ +}, + 0, /* pre_modify */ + 0, /* post_modify */ + 2, /* post_modify_ld3_st3 */ + 2, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + static const struct cpu_regmove_cost generic_regmove_cost = { 1, /* GP2GP */ @@ -624,6 +642,16 @@ static const struct cpu_regmove_cost a64fx_regmove_cost = 2 /* FP2FP */ }; +static const struct cpu_regmove_cost neoversen2_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmv_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + /* Generic costs for Advanced SIMD vector operations. */ static const advsimd_vec_cost generic_advsimd_vector_cost = { @@ -2174,12 +2202,166 @@ static const struct tune_params neoverse512tvb_tunings = &generic_prefetch_tune }; +static const advsimd_vec_cost neoversen2_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 2, /* ld2_st2_permute_cost */ + 2, /* ld3_st3_permute_cost */ + 3, /* ld4_st4_permute_cost */ + 3, /* permute_cost */ + 4, /* reduc_i8_cost */ + 4, /* reduc_i16_cost */ + 2, /* reduc_i32_cost */ + 2, /* reduc_i64_cost */ + 6, /* reduc_f16_cost */ + 4, /* reduc_f32_cost */ + 2, /* reduc_f64_cost */ + 2, /* store_elt_extra_cost */ + /* This value is just inherited from the Cortex-A57 table. */ + 8, /* vec_to_scalar_cost */ + /* This depends very much on what the scalar value is and + where it comes from. E.g. some constants take two dependent + instructions or a load, while others might be moved from a GPR. + 4 seems to be a reasonable compromise in practice. */ + 4, /* scalar_to_vec_cost */ + 4, /* align_load_cost */ + 4, /* unalign_load_cost */ + /* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ + 1, /* unalign_store_cost */ + 1 /* store_cost */ +}; + +static const sve_vec_cost neoversen2_sve_vector_cost = +{ + { +2, /* int_stmt_cost */ +2, /* fp_stmt_cost */ +3, /* ld2_st2_permute_cost */ +4, /* ld3_st3_permute_cost */ +4, /* ld4_st4_permute_cost */ +3, /* permute_cost */ +/* Theoretically, a reduction involving 15 scalar ADDs could + complete in ~5 cycles and would have a cost of 15. [SU]ADDV + completes in 11 cycles, so give it a cost of 15 + 6. */ +21, /* reduc_i8_cost */ +/* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ +13, /* reduc_i16_cost */ +/* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. 
*/ +9, /* reduc_i32_cost */ +/* Likewise for 1 scalar ADDs (~1 cycles) vs. 2: 1 + 1. */ +2, /* reduc_i64_cost */ +/* Theoretically, a reduction involving 7 scalar FADDs could + complete in ~8 cycles and would have a cost of 14. FADDV + completes in 6 cycles, so give it a cost of 14 - 2. */ +12, /* reduc_f16_cost */ +/* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0. */ +6, /* reduc_f32_cost */ +/* Likewise for 1 scalar FADDs (~2 cycles) vs. 2: 2 - 0. */ +2, /* reduc_f64_cost */ +2, /* store_elt_extra_cost */ +/* This value is just inherited from the Cortex-A57 table. */ +8, /* vec_to_scalar_cost */ +/* See the comment above the Advanced SIMD versions. */ +4, /* scalar_to_vec_cost */ +4, /* align_load_cost */ +4, /* unalign_load_cost */ +/* Although stores have a latency of 2 and compete for the + vector pipes, in practice it's better not to model that. */ +1, /* unalign_store_cost */ +1 /* store_cost */ + }, + 3, /* clast_cost */ + 10, /* fadda_f16_cost */ + 6, /* fadda_f32_cost */ + 4, /* fadda_f64_cost */ + /* A strided Advanced SIMD x64 load would ta
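A note on how the SVE reduction numbers above are derived (my summary; the patch only states it case by case in the comments): each reduction cost is the equivalent scalar cost plus the cycle difference between the vector reduction and the scalar sequence, i.e.

  reduc_cost = scalar_cost + (vector_latency - scalar_latency)

For example, reduc_i8: 15 scalar ADDs at cost 1 each (~5 cycles) vs. [SU]ADDV in 11 cycles gives 15 + (11 - 5) = 21, while reduc_f16: 7 scalar FADDs at cost 2 each (~8 cycles) vs. FADDV in 6 cycles gives 14 + (6 - 8) = 12. The same recipe reproduces every reduc_* entry in the table.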
[aarch64] Add Demeter tuning structs
Hi, This patch adds tuning structs for -mcpu/-mtune=demeter. 2022-03-16 Tamar Christina Andre Vieira * config/aarch64/aarch64.cc (demeter_addrcost_table, demeter_regmove_cost, demeter_advsimd_vector_cost, demeter_sve_vector_cost, demeter_scalar_issue_info, demeter_advsimd_issue_info, demeter_sve_issue_info, demeter_vec_issue_info, demeter_vector_cost, demeter_tunings): New tuning structs. (aarch64_vec_op_count::rename_cycles_per_iter): Enable for demeter tuning. * config/aarch64/aarch64-cores.def: Add entry for demeter. * config/aarch64/aarch64-tune.md (tune): Add demeter to list. diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index eda3a4eb6028d97d50295773c87119d9fa0c41bf..9e6ca84bd4b277ccf2c1809c419703a23075f315 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -172,4 +172,6 @@ AARCH64_CORE("cortex-a710", cortexa710, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | AARCH64_CORE("cortex-x2", cortexx2, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_MEMTAG | AARCH64_FL_I8MM | AARCH64_FL_BF16, neoversen2, 0x41, 0xd48, -1) +AARCH64_CORE("demeter", demeter, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, demeter, 0x41, 0xd4f, -1) + #undef AARCH64_CORE diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index 3eed700c063c2abf36cd648b22f713e887255f96..dda5dbdda75c1383004f5dac3f249909f7f23589 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2" + "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2,demeter" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index e0bb447beb9eae74551d863505eb265737d36334..9b6f67dc592d8a447d6b28390c90abe5dcfa5f08 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -537,6 +537,24 @@ static const struct
cpu_addrcost_table neoversen2_addrcost_table = 0 /* imm_offset */ }; +static const struct cpu_addrcost_table demeter_addrcost_table = +{ +{ + 1, /* hi */ + 0, /* si */ + 0, /* di */ + 1, /* ti */ +}, + 0, /* pre_modify */ + 0, /* post_modify */ + 2, /* post_modify_ld3_st3 */ + 2, /* post_modify_ld4_st4 */ + 0, /* register_offset */ + 0, /* register_sextend */ + 0, /* register_zextend */ + 0 /* imm_offset */ +}; + static const struct cpu_regmove_cost generic_regmove_cost = { 1, /* GP2GP */ @@ -652,6 +670,16 @@ static const struct cpu_regmove_cost neoversen2_regmove_cost = 2 /* FP2FP */ }; +static const struct cpu_regmove_cost demeter_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmv_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + /* Generic costs for Advanced SIMD vector operations. */ static const advsimd_vec_cost generic_advsimd_vector_cost = { @@ -2391,6 +2419,195 @@ static const struct tune_params neoversen2_tunings = &generic_prefetch_tune }; +static const advsimd_vec_cost demeter_advsimd_vector_cost = +{ + 2, /* int_stmt_cost */ + 2, /* fp_stmt_cost */ + 2, /* ld2_st2_permute_cost */ + 2, /* l
[aarch64] Update regmove costs for neoverse-v1 and neoverse-512tvb tunings
Hi, This patch updates the register move tunings for -mcpu/-mtune={neoverse-v1,neoverse-512tvb}. 2022-03-16 Tamar Christina Andre Vieira * config/aarch64/aarch64.cc (neoversev1_regmove_cost): New tuning struct. (neoversev1_tunings): Use neoversev1_regmove_cost and update store_int cost. (neoverse512tvb_tunings): Likewise. diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 9b6f67dc592d8a447d6b28390c90abe5dcfa5f08..f0485574528c47221e17a3aa5aee70a56508f61e 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -670,6 +670,16 @@ static const struct cpu_regmove_cost neoversen2_regmove_cost = 2 /* FP2FP */ }; +static const struct cpu_regmove_cost neoversev1_regmove_cost = +{ + 1, /* GP2GP */ + /* Spilling to int<->fp instead of memory is recommended so set + realistic costs compared to memmv_cost. */ + 3, /* GP2FP */ + 2, /* FP2GP */ + 2 /* FP2FP */ +}; + static const struct cpu_regmove_cost demeter_regmove_cost = { 1, /* GP2GP */ @@ -2063,13 +2073,13 @@ static const struct tune_params neoversev1_tunings = { &cortexa76_extra_costs, &neoversev1_addrcost_table, - &generic_regmove_cost, + &neoversev1_regmove_cost, &neoversev1_vector_cost, &generic_branch_cost, &generic_approx_modes, SVE_256, /* sve_width */ { 4, /* load_int. */ -1, /* store_int. */ +2, /* store_int. */ 6, /* load_fp. */ 2, /* store_fp. */ 6, /* load_pred. */ @@ -2200,13 +2210,13 @@ static const struct tune_params neoverse512tvb_tunings = { &cortexa76_extra_costs, &neoversev1_addrcost_table, - &generic_regmove_cost, + &neoversev1_regmove_cost, &neoverse512tvb_vector_cost, &generic_branch_cost, &generic_approx_modes, SVE_128 | SVE_256, /* sve_width */ { 4, /* load_int. */ -1, /* store_int. */ +2, /* store_int. */ 6, /* load_fp. */ 2, /* store_fp. */ 6, /* load_pred. */
[PATCH] tree-optimization/104941: Actually assign the conversion result
Assign the result of fold_convert to offset. gcc/ChangeLog: PR tree-optimization/104941 * tree-object-size.cc (size_for_offset): Assign result of fold_convert to OFFSET. gcc/testsuite/ChangeLog: PR tree-optimization/104941 * gcc.dg/builtin-dynamic-object-size-0.c (S1, S2): New structs. (test_alloc_nested_structs, g): New functions. (main): Call test_alloc_nested_structs. Signed-off-by: Siddhesh Poyarekar --- Testing: - x86_64 bootstrap build and check - i686 build and check .../gcc.dg/builtin-dynamic-object-size-0.c| 34 +++ gcc/tree-object-size.cc | 2 +- 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c index 2fca0a9c5b4..e5dc23a908d 100644 --- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c @@ -323,6 +323,34 @@ test_substring (size_t sz, size_t off) return __builtin_dynamic_object_size (&str[off], 0); } +struct S2 +{ + char arr[7]; +}; + +struct S1 +{ + int pad; + struct S2 s2; +}; + +static long +g (struct S1 *s1) +{ + struct S2 *s2 = &s1->s2; + return __builtin_dynamic_object_size (s2->arr, 0); +} + +long +__attribute__ ((noinline)) +test_alloc_nested_structs (int x) +{ + struct S1 *s1 = __builtin_malloc (x); + return g (s1); +} + +/* POINTER_PLUS expressions. */ + size_t __attribute__ ((noinline)) test_substring_ptrplus (size_t sz, size_t off) @@ -342,6 +370,8 @@ test_substring_ptrplus2 (size_t sz, size_t off, size_t off2) return __builtin_dynamic_object_size (ptr + off2, 0); } +/* Function parameters. */ + size_t __attribute__ ((access (__read_write__, 1, 2))) __attribute__ ((noinline)) @@ -382,6 +412,8 @@ test_parmsz_unknown (void *obj, void *unknown, size_t sz, int cond) return __builtin_dynamic_object_size (cond ? obj : unknown, 0); } +/* Loops. */ + size_t __attribute__ ((noinline)) __attribute__ ((access (__read_write__, 1, 2))) @@ -491,6 +523,8 @@ main (int argc, char **argv) FAIL (); if (test_dynarray_cond (1) != 8) FAIL (); + if (test_alloc_nested_structs (42) != 42 - sizeof (int)) +FAIL (); if (test_deploop (128, 4) != 128) FAIL (); if (test_deploop (128, 129) != 32) diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc index 9728f79da75..e23e80cb726 100644 --- a/gcc/tree-object-size.cc +++ b/gcc/tree-object-size.cc @@ -372,7 +372,7 @@ size_for_offset (tree sz, tree offset, tree wholesize = NULL_TREE) /* Safe to convert now, since a valid net offset should be non-negative. */ if (!types_compatible_p (TREE_TYPE (offset), sizetype)) -fold_convert (sizetype, offset); +offset = fold_convert (sizetype, offset); if (TREE_CODE (offset) == INTEGER_CST) { -- 2.35.1
[aarch64] Update reg-costs to differentiate between memmove costs
This patch introduces a struct to differentiate between different memmove costs to enable better modelling of memory operations. These have been modelled for -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb; for all other tunings, all entries are equal to the old single memmove cost so that the behaviour remains the same. 2022-03-16 Tamar Christina Andre Vieira gcc/ChangeLog: * config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New struct. (struct tune_params): Change type of memmov_cost to use cpu_memmov_cost. * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update all tunings to use new cpu_memmov_cost struct. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index f2fde35c6eb4989af8736db8fad004171c160282..5190eb8b96ea9af809a28470905b8b85ee720b09 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -508,6 +508,18 @@ struct cpu_prefetch_tune const int default_opt_level; }; +/* Model the costs for loads/stores for reload so that it can do more + accurate spill heuristics. */ +struct cpu_memmov_cost +{ + int load_int; + int store_int; + int load_fp; + int store_fp; + int load_pred; + int store_pred; +}; + struct tune_params { const struct cpu_cost_table *insn_extra_cost; @@ -520,7 +532,8 @@ struct tune_params or SVE_NOT_IMPLEMENTED if not applicable. Only used for tuning decisions, does not disable VLA vectorization. */ unsigned int sve_width; - int memmov_cost; + /* Structure used by reload to cost spills. */ + struct cpu_memmov_cost memmov_cost; int issue_rate; unsigned int fusible_ops; const char *function_align; diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 9a94f3a30b0f1acc3c9b8a0e3d703e60780d0cbc..3fc5e0bd3d3f39f99b0c8ffb9357603bc0998515 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -1291,7 +1291,13 @@ static const struct tune_params generic_tunings = &generic_branch_cost, &generic_approx_modes, SVE_NOT_IMPLEMENTED, /* sve_width */ - 4, /* memmov_cost */ + { 4, /* load_int. */ +4, /* store_int. */ +4, /* load_fp. */ +4, /* store_fp. */ +4, /* load_pred. */ +4 /* store_pred. */ + }, /* memmov_cost. */ 2, /* issue_rate */ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ "16:12", /* function_align. */ @@ -1320,7 +1326,13 @@ static const struct tune_params cortexa35_tunings = &generic_branch_cost, &generic_approx_modes, SVE_NOT_IMPLEMENTED, /* sve_width */ - 4, /* memmov_cost */ + { 4, /* load_int. */ +4, /* store_int. */ +4, /* load_fp. */ +4, /* store_fp. */ +4, /* load_pred. */ +4 /* store_pred. */ + }, /* memmov_cost. */ 1, /* issue_rate */ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ @@ -1347,7 +1359,13 @@ static const struct tune_params cortexa53_tunings = &generic_branch_cost, &generic_approx_modes, SVE_NOT_IMPLEMENTED, /* sve_width */ - 4, /* memmov_cost */ + { 4, /* load_int. */ +4, /* store_int. */ +4, /* load_fp. */ +4, /* store_fp. */ +4, /* load_pred. */ +4 /* store_pred. */ + }, /* memmov_cost. */ 2, /* issue_rate */ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ @@ -1374,7 +1392,13 @@ static const struct tune_params cortexa57_tunings = &generic_branch_cost, &generic_approx_modes, SVE_NOT_IMPLEMENTED, /* sve_width */ - 4, /* memmov_cost */ + { 4, /* load_int. */ +4, /* store_int. 
*/ +4, /* load_fp. */ +4, /* store_fp. */ +4, /* load_pred. */ +4 /* store_pred. */ + }, /* memmov_cost. */ 3, /* issue_rate */ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ @@ -1401,7 +1425,13 @@ static const struct tune_params cortexa72_tunings = &generic_branch_cost, &generic_approx_modes, SVE_NOT_IMPLEMENTED, /* sve_width */ - 4, /* memmov_cost */ + { 4, /* load_int. */ +4, /* store_int. */ +4, /* load_fp. */ +4, /* store_fp. */ +4, /* load_pred. */ +4 /* store_pred. */ + }, /* memmov_cost. */ 3, /* issue_rate */ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ @@ -1428,7 +1458,13 @@ static const struct tune_params cortexa73_tunings = &generic_branch_cost, &generic_approx_modes, SVE_NOT_IMPLEMENTED, /* sve_width */ - 4, /* memmov_cost. */ + { 4, /* load_int. */ +4, /* store_int. */ +4, /* load_fp. */ +4, /* store_fp. */ +4, /* load_pred. */ +4 /* store_pred. */ +
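To show how the new fields might be consumed (a sketch under assumptions: the real aarch64_memory_move_cost has more cases, and the register-class tests here are simplified to direct comparisons):

/* Sketch: pick a per-direction, per-register-file cost from the new
   struct.  'in' is true for loads (memory -> register), false for
   stores.  aarch64_tune_params and the register classes are the
   existing aarch64 backend globals.  */
static int
memmove_cost_sketch (machine_mode mode ATTRIBUTE_UNUSED,
		     reg_class_t rclass, bool in)
{
  const cpu_memmov_cost &costs = aarch64_tune_params.memmov_cost;

  if (rclass == PR_REGS)			/* SVE predicate regs.  */
    return in ? costs.load_pred : costs.store_pred;
  if (rclass == FP_REGS)			/* FP/SIMD regs.  */
    return in ? costs.load_fp : costs.store_fp;
  return in ? costs.load_int : costs.store_int;	/* GPRs.  */
}

The point of the split is visible in the neoverse entries that follow: a predicate spill/reload (load_pred/store_pred) can be costed very differently from a GPR one, which the old single memmov_cost could not express.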
[aarch64] Implement determine_suggested_unroll_factor
Hi, This patch implements the costing function determine_suggested_unroll_factor for aarch64. It determines the unrolling factor by dividing the number of X operations we can do per cycle by the number of X operations in the loop body, taking this information from the vec_ops analysis during vector costing and the available issue_info information. We multiply the dividend by a potential reduction_latency, to improve our pipeline utilization if we are stalled waiting on a particular reduction operation. Right now we also have a work around for vectorization choices where the main loop uses a NEON mode and predication is available, such that if the main loop makes use of a NEON pattern that is not directly supported by SVE we do not unroll, as that might cause performance regressions in cases where we would enter the original main loop's VF. As an example if you have a loop where you could use AVG_CEIL with a V8HI mode, you would originally get 8x NEON using AVG_CEIL followed by a 8x SVE predicated epilogue, using other instructions. Whereas with the unrolling you would end up with 16x AVG_CEIL NEON + 8x SVE predicated loop, thus skipping the original 8x NEON. In the future, we could handle this differently, by either using a different costing model for epilogues, or potentially vectorizing more than one single epilogue. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_vector_costs): Define determine_suggested_unroll_factor. (determine_suggested_unroll_factor): New function. (aarch64_vector_costs::finish_costs): Use determine_suggested_unroll_factor. diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index b5687aab59f630920e51b742b80a540c3a56c6c8..9d3a607d378d6a2792efa7c6dece2a65c24e4521 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -15680,6 +15680,7 @@ private: unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *, unsigned int); bool prefer_unrolled_loop () const; + unsigned int determine_suggested_unroll_factor (); /* True if we have performed one-time initialization based on the vec_info. */ @@ -16768,6 +16769,105 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops, return sve_cycles_per_iter; } +unsigned int +aarch64_vector_costs::determine_suggested_unroll_factor () +{ + auto *issue_info = aarch64_tune_params.vec_costs->issue_info; + if (!issue_info) +return 1; + bool sve = false; + if (aarch64_sve_mode_p (m_vinfo->vector_mode)) +{ + if (!issue_info->sve) + return 1; + sve = true; +} + else +{ + if (!issue_info->advsimd) + return 1; + /* If we are trying to unroll a NEON main loop that contains patterns +that we do not support with SVE and we might use a predicated +epilogue, we need to be conservative and block unrolling as this might +lead to a less optimal loop for the first and only epilogue using the +original loop's vectorization factor. +TODO: Remove this constraint when we add support for multiple epilogue +vectorization. 
*/ + if (partial_vectors_supported_p () + && param_vect_partial_vector_usage != 0 + && !TARGET_SVE2) + { + unsigned int i; + stmt_vec_info stmt_vinfo; + FOR_EACH_VEC_ELT (m_vinfo->stmt_vec_infos, i, stmt_vinfo) + { + if (is_pattern_stmt_p (stmt_vinfo)) + { + gimple *stmt = stmt_vinfo->stmt; + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt)) + { + enum internal_fn ifn + = gimple_call_internal_fn (stmt); + switch (ifn) + { + case IFN_AVG_FLOOR: + case IFN_AVG_CEIL: + return 1; + default: + break; + } + } + } + } + } +} + + unsigned int max_unroll_factor = 1; + aarch64_simd_vec_issue_info const *vec_issue += sve ? issue_info->sve : issue_info->advsimd; + for (auto vec_ops : m_ops) +{ + /* Limit unroll factor to 4 for now. */ + unsigned int unroll_factor = 4; + unsigned int factor + = vec_ops.reduction_latency > 1 ? vec_ops.reduction_latency : 1; + unsigned int temp; + + /* Sanity check, this should never happen. */ + if ((vec_ops.stores + vec_ops.loads + vec_ops.general_ops) == 0) + return 1; + + /* Check stores. */ + if (vec_ops.stores > 0) + { + temp = CEIL (factor * vec_issue->stores_per_cycle, + vec_ops.stores); + unroll_factor = MIN (unroll_factor, temp); + } + + /* Check loads. */ + if (vec_ops.loads > 0) + { +
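To make the arithmetic concrete, take some made-up numbers (not from any real tuning, and assuming the truncated loads check mirrors the stores one): the issue info advertises 2 stores and 3 loads per cycle, the loop body contains 1 store and 4 loads, and the reduction latency is 2. The computation above then runs:

  factor = reduction_latency                        = 2
  stores: temp = CEIL (2 * 2, 1) = 4, unroll = MIN (4, 4) = 4
  loads:  temp = CEIL (2 * 3, 4) = 2, unroll = MIN (4, 2) = 2

so the suggested unroll factor is 2: unrolling further would oversubscribe the load pipes, while not unrolling would leave the pipeline idle waiting on the two-cycle reduction chain.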
[aarch64] Update Neoverse N2 core definition
Hi, As requested, I updated the Neoverse N2 entry to use the AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated the ARCH_IDENT to 9A and moved it under the Armv9 cores. gcc/ChangeLog: * config/aarch64/aarch64-cores.def: Update Neoverse N2 core entry. diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 9e6ca84bd4b277ccf2c1809c419703a23075f315..41d95354b6a483926b82a64ee6788eaf41814108 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -145,9 +145,6 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, 8_4A, AARCH64_FL_FOR /* Qualcomm ('Q') cores. */ AARCH64_CORE("saphira", saphira,saphira,8_4A, AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira, 0x51, 0xC01, -1) -/* Armv8.5-A Architecture Processors. */ -AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, 8_5A, AARCH64_FL_FOR_ARCH8_5 | AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_F16 | AARCH64_FL_SVE | AARCH64_FL_SVE2 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | AARCH64_FL_MEMTAG, neoversen2, 0x41, 0xd49, -1) - /* ARMv8-A big.LITTLE implementations. */ AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1) @@ -172,6 +169,8 @@ AARCH64_CORE("cortex-a710", cortexa710, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | AARCH64_CORE("cortex-x2", cortexx2, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_MEMTAG | AARCH64_FL_I8MM | AARCH64_FL_BF16, neoversen2, 0x41, 0xd48, -1) +AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, neoversen2, 0x41, 0xd49, -1) + AARCH64_CORE("demeter", demeter, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, demeter, 0x41, 0xd4f, -1) #undef AARCH64_CORE
Re: [PATCH] tree-optimization/104941: Actually assign the conversion result
On Wed, Mar 16, 2022 at 08:24:24PM +0530, Siddhesh Poyarekar wrote: > Assign the result of fold_convert to offset. > > gcc/ChangeLog: > > PR tree-optimization/104941 > * tree-object-size.cc (size_for_offset): Assign result of > fold_convert to OFFSET. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/104941 > * gcc.dg/builtin-dynamic-object-size-0.c (S1, S2): New structs. > (test_alloc_nested_structs, g): New functions. > (main): Call test_alloc_nested_structs. > > Signed-off-by: Siddhesh Poyarekar > --- a/gcc/tree-object-size.cc > +++ b/gcc/tree-object-size.cc > @@ -372,7 +372,7 @@ size_for_offset (tree sz, tree offset, tree wholesize = > NULL_TREE) > >/* Safe to convert now, since a valid net offset should be non-negative. > */ >if (!types_compatible_p (TREE_TYPE (offset), sizetype)) > -fold_convert (sizetype, offset); > +offset = fold_convert (sizetype, offset); > >if (TREE_CODE (offset) == INTEGER_CST) > { I think the types_compatible_p call is too heavy, you want to check if emitting the fold_convert to sizetype is unnecessary, which is if (!useless_type_conversion_p (sizetype, TREE_TYPE (offset))) offset = fold_convert (sizetype, offset); types_compatible_p checks that the conversion is useless in both directions. Ok for trunk with that change. Jakub
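With that change applied, the hunk would presumably read as follows (a sketch of the final form, combining the original fix with Jakub's substitution; the surrounding function is unchanged):

  /* Safe to convert now, since a valid net offset should be
     non-negative.  fold_convert builds and returns a new tree, so the
     result must be assigned back rather than discarded.  */
  if (!useless_type_conversion_p (sizetype, TREE_TYPE (offset)))
    offset = fold_convert (sizetype, offset);

The original bug was exactly the discarded return value: fold_convert does not mutate its argument, so the unconverted OFFSET flowed into the INTEGER_CST check below and the nested-struct case in the new test computed the wrong dynamic object size.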
Re: [PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]
> On 16 Mar 2022, at 13:28, Rainer Orth wrote: > > This patch documents the Solaris-specific D bootstrap requirements. > > Tested by building and inspecting gccinstall.{pdf,info}. > > Ok for trunk? > > I've omitted the Darwin-specific stuff so far documented in PRs d/103577 > and d/103578: > > * needs --enable-libphobos > * top of gcc-11 branch only > * backport of -static-libphobos patch > * Darwin/i386 doesn't work at all — hopefully we will not need to add that — The actual problem is known (FWIW there is also some wrong code generated for x86_64, we just happen to link it to something harmless instead of failing to link). - I remain hopeful that a solution for 11.x can be found before gcc-12 branches, since without m32 support it is hard for me to fit D into regular Darwin testing (at present D is effectively untested on Darwin since the change to require a D bootstrap) .. at one stage, D did work on m32 Darwin. Iain > > Rainer > > -- > - > Rainer Orth, Center for Biotechnology, Bielefeld University > > > 2022-03-16 Rainer Orth > > gcc: > PR d/103528 > * doc/install.texi (Tools/packages necessary for building GCC) > (GDC): Document libphobos requirement. > (Host/target specific installation notes for GCC, *-*-solaris2*): > Document libphobos and GDC specifics. > > # HG changeset patch > # Parent 33de4c9d886299fd8cc97e20c0f761c2f28a3eef > doc: Document Solaris D bootstrap requirements [PR 103528] > > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi > --- a/gcc/doc/install.texi > +++ b/gcc/doc/install.texi > @@ -287,7 +287,8 @@ section. > @item @anchor{GDC-prerequisite}GDC > > In order to build GDC, the D compiler, you need a working GDC > -compiler (GCC version 9.1 or later), as the D front end is written in D. > +compiler (GCC version 9.1 or later) and D runtime library, > +@samp{libphobos}, as the D front end is written in D. > > Versions of GDC prior to 12 can be built with an ISO C++11 compiler, which can > then be installed and used to bootstrap newer versions of the D front end. > @@ -303,6 +304,10 @@ front end does not make use of any GDC-s > of the D language, if too old a GDC version is installed and > @option{--enable-languages=d} is used, the build will fail. > > +On some targets, @samp{libphobos} isn't enabled by default, but compiles > +and works if @option{--enable-libphobos} is used. Specifics are > +documented for affected targets. > + > @item A ``working'' POSIX compatible shell, or GNU bash > > Necessary when running @command{configure} because some > @@ -4851,6 +4856,12 @@ GNU binutils. @samp{libstdc++} symbol v > appropriate version is found. Solaris @command{c++filt} from the Solaris > Studio compilers does @emph{not} work. > > +In order to build the GNU D compiler, GDC, a working @samp{libphobos} is > +needed. That library wasn't built by default in GCC 9--11 on SPARC, or > +on x86 when the Solaris assembler is used, but can be enabled by > +configuring with @option{--enable-libphobos}. Also, GDC 9.4.0 is > +required on x86, while GDC 9.3.0 is known to work on SPARC. > + > The versions of the GNU Multiple Precision Library (GMP), the MPFR > library and the MPC library bundled with Solaris 11.3 and later are > usually recent enough to match GCC's requirements. There are two > @@ -4864,6 +4875,7 @@ need to configure with @option{--with-gm > @item > The version of the MPFR libary included in Solaris 11.3 is too old; you > need to provide a more recent one. > + > @end itemize > > @html
Re: [PATCH] handle "invisible" reference in -Wdangling-pointer (PR104436)
On 3/9/22 06:17, Richard Biener wrote: On Fri, Feb 11, 2022 at 12:05 AM Martin Sebor via Gcc-patches wrote: On 2/8/22 15:37, Jason Merrill wrote: On 2/8/22 16:59, Martin Sebor wrote: Transforming by-value arguments to by-reference as GCC does for some class types can trigger -Wdangling-pointer when the argument is used to store the address of a local variable. Since the stored value is not accessible in the caller the warning is a false positive. The attached patch handles this case by excluding PARM_DECLs with the DECL_BY_REFERENCE bit set from consideration. While testing the patch I noticed some instances of the warning are unintentionally duplicated as the pass runs more than once. To avoid that, I also introduce warning suppression into the handler for this instance of the warning. (There might still be others.) The second test should verify that we do warn about returning 't' from a function; we don't want to ignore the DECL_BY_REFERENCE RESULT_DECL. The indirect aggregate case isn't handled and needs more work but since you brought it up I thought I should look into finishing it. The attached patch #2 adds support for it. It also incorporates Richard's suggestion to test PARM_DECL. Patch #2 applies on top of patch #1 which is unchanged from the first revision. patch #1 would be OK if you'd do the PARM_DECL check there - I'd rather defer #2 to stage1. Richard. I have retested it on x86_64-linux and by building Glibc and Binutils + GDB. If now is too late for the aggregate enhancement I'm okay with deferring it until stage 1. I committed the adjusted patch in r12-7650. I'll try to remember to ping the second patch in stage 1. Martin + tree var = SSA_NAME_VAR (lhs_ref.ref); + if (DECL_BY_REFERENCE (var)) +/* Avoid by-value arguments transformed into by-reference. */ +continue; I wonder if we can express this property of invisiref parms somewhere more general? I imagine optimizations would find it useful as well. Could pointer_query somehow treat the reference as pointing to a function-local object? I don't quite see where in the pointer_query class this would be useful (the class also isn't used for optimization). I could add a helper to the access_ref class to query this property in warning code but as this is the only potential client I'm also not sure that's quite what you had in mind. I'd need some guidance as to what you're thinking of here. Martin I previously tried to express this by marking the reference as 'restrict', but that was wrong (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97474). Jason
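For reference, a minimal shape of the suppressed false positive (my own construction, not the PR's testcase): a class with a non-trivial copy constructor is passed by invisible reference under the psABI, so the callee's PARM_DECL is DECL_BY_REFERENCE even though the source says by-value.

// Assumed example.  Because T has a non-trivial copy constructor, the
// ABI passes it by invisible reference; the callee's store into 'a'
// writes a caller-invisible temporary, so nothing dangles for the
// caller and the warning would be a false positive.
struct T
{
  int *p;
  T () : p (nullptr) {}
  T (const T &other) : p (other.p) {}   // non-trivial copy ctor
};

void
f (T a)          // lowered to a DECL_BY_REFERENCE parameter
{
  int local = 0;
  a.p = &local;  // previously triggered -Wdangling-pointer
}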
Re: [PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]
Hi Iain, >> I've omitted the Darwin-specific stuff so far documented in PRs d/103577 >> and d/103578: >> >> * needs --enable-libphobos >> * top of gcc-11 branch only >> * backport of -static-libphobos patch >> * Darwin/i386 doesn't work at all > > — hopefully we will not need to add that — > > The actual problem is known (FWIW there is also some wrong code generated for > x86_64, we just happen to link it to something harmless instead of failing > to link). I see. One issue I see during x86_64-darwin testing (10.7 only?) is that the forkgc2 test hangs and has to be killed manually. > - I remain hopeful that a solution for 11.x can be found before gcc-12 > branches, > since without m32 support it is hard for me to fit D into regular Darwin > testing (at > present D is effectively untested on Darwin since the change to require a D > bootstrap) > .. at one stage, D did work on m32 Darwin. I suspected that the 32-bit issue might be due to several stdint types being wrong, which was just fixed on master. At least the issue seemed similar to PR d/104738. I'm building a 64-bit trunk gdc as we speak in order to try that as a bootstrap compiler for a 32-bit build. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]
Hi Rainer, > On 16 Mar 2022, at 15:55, Rainer Orth wrote: > >>> I've omitted the Darwin-specific stuff so far documented in PRs d/103577 >>> and d/103578: >>> >>> * needs --enable-libphobos >>> * top of gcc-11 branch only >>> * backport of -static-libphobos patch >>> * Darwin/i386 doesn't work at all >> >> — hopefully we will not need to add that — >> >> The actual problem is known (FWIW there is also some wrong code generated for >> x86_64, we just happen to link it to something harmless instead of failing >> to link). > > I see. One issue I see during x86_64-darwin testing (10.7 only?) is > that the forkgc2 test hangs and has to be killed manually. Yes, there are known issues with GC and threading - but those do not break bootstrap since the compiler is single-threaded. >> - I remain hopeful that a solution for 11.x can be found before gcc-12 >> branches, >> since without m32 support it is hard for me to fit D into regular Darwin >> testing (at >> present D is effectively untested on Darwin since the change to require a D >> bootstrap) >> .. at one stage, D did work on m32 Darwin. > > I suspected that the 32-bit issue might be due to several stdint types > being wrong, which was just fixed on master. At least the issue seemed > similar to PR d/104738. I'm building a 64-bit trunk gdc as we speak in > order to try that as a bootstrap compiler for a 32-bit build. The issue on 11.x is related to code that tries to synthesize section-start symbols; it causes a link fail on m32 - which is a bootstrap breaker. .. if that has been fixed for m32 codegen on master then, yes - presumably we could build an x86_64 compiler and use that “-m32” to make an i686 bootstrap. Iain.
Re: [aarch64] Add Neoverse N2 tuning structs
"Andre Vieira (lists)" writes: > Hi, > > This patch adds tuning structures for Neoverse N2. > > 2022-03-16 Tamar Christina > Andre Vieira > > * config/aarch64/aarch64.cc (neoversen2_addrcost_table, > neoversen2_regmove_cost, > neoversen2_advsimd_vector_cost, neoversen2_sve_vector_cost, > neoversen2_scalar_issue_info, > neoversen2_advsimd_issue_info, neoversen2_sve_issue_info, > neoversen2_vec_issue_info, > neoversen2_tunings): New structs. > (neoversen2_tunings): Use new structs and update tuning flags. > (aarch64_vec_op_count::rename_cycles_per_iter): Enable for > neoversen2 tuning. > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > d504fe7607b66a9c9ed9b183a2d3c03d34fb0f80..e0bb447beb9eae74551d863505eb265737d36334 > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -519,6 +519,24 @@ static const struct cpu_addrcost_table > neoversev1_addrcost_table = >0 /* imm_offset */ > }; > > +static const struct cpu_addrcost_table neoversen2_addrcost_table = > +{ > +{ > + 1, /* hi */ > + 0, /* si */ > + 0, /* di */ > + 1, /* ti */ > +}, > + 0, /* pre_modify */ > + 0, /* post_modify */ > + 2, /* post_modify_ld3_st3 */ > + 2, /* post_modify_ld4_st4 */ > + 0, /* register_offset */ > + 0, /* register_sextend */ > + 0, /* register_zextend */ > + 0 /* imm_offset */ > +}; > + > static const struct cpu_regmove_cost generic_regmove_cost = > { >1, /* GP2GP */ > @@ -624,6 +642,16 @@ static const struct cpu_regmove_cost a64fx_regmove_cost = >2 /* FP2FP */ > }; > > +static const struct cpu_regmove_cost neoversen2_regmove_cost = > +{ > + 1, /* GP2GP */ > + /* Spilling to int<->fp instead of memory is recommended so set > + realistic costs compared to memmv_cost. */ > + 3, /* GP2FP */ > + 2, /* FP2GP */ > + 2 /* FP2FP */ > +}; > + > /* Generic costs for Advanced SIMD vector operations. */ > static const advsimd_vec_cost generic_advsimd_vector_cost = > { > @@ -2174,12 +2202,166 @@ static const struct tune_params > neoverse512tvb_tunings = >&generic_prefetch_tune > }; > > +static const advsimd_vec_cost neoversen2_advsimd_vector_cost = > +{ > + 2, /* int_stmt_cost */ > + 2, /* fp_stmt_cost */ > + 2, /* ld2_st2_permute_cost */ > + 2, /* ld3_st3_permute_cost */ > + 3, /* ld4_st4_permute_cost */ > + 3, /* permute_cost */ > + 4, /* reduc_i8_cost */ > + 4, /* reduc_i16_cost */ > + 2, /* reduc_i32_cost */ > + 2, /* reduc_i64_cost */ > + 6, /* reduc_f16_cost */ > + 4, /* reduc_f32_cost */ > + 2, /* reduc_f64_cost */ > + 2, /* store_elt_extra_cost */ > + /* This value is just inherited from the Cortex-A57 table. */ > + 8, /* vec_to_scalar_cost */ > + /* This depends very much on what the scalar value is and > + where it comes from. E.g. some constants take two dependent > + instructions or a load, while others might be moved from a GPR. > + 4 seems to be a reasonable compromise in practice. */ > + 4, /* scalar_to_vec_cost */ > + 4, /* align_load_cost */ > + 4, /* unalign_load_cost */ > + /* Although stores have a latency of 2 and compete for the > + vector pipes, in practice it's better not to model that. */ > + 1, /* unalign_store_cost */ > + 1 /* store_cost */ > +}; > + > +static const sve_vec_cost neoversen2_sve_vector_cost = > +{ > + { > +2, /* int_stmt_cost */ > +2, /* fp_stmt_cost */ > +3, /* ld2_st2_permute_cost */ > +4, /* ld3_st3_permute_cost */ > +4, /* ld4_st4_permute_cost */ > +3, /* permute_cost */ > +/* Theoretically, a reduction involving 15 scalar ADDs could > + complete in ~5 cycles and would have a cost of 15. 
[SU]ADDV > + completes in 11 cycles, so give it a cost of 15 + 6. */ > +21, /* reduc_i8_cost */ > +/* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6. */ > +13, /* reduc_i16_cost */ > +/* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6. */ > +9, /* reduc_i32_cost */ > +/* Likewise for 1 scalar ADDs (~1 cycles) vs. 2: 1 + 1. */ typo: 1 scalar ADD. > +2, /* reduc_i64_cost */ > +/* Theoretically, a reduction involving 7 scalar FADDs could > + complete in ~8 cycles and would have a cost of 14. FADDV > + completes in 6 cycles, so give it a cost of 14 - 2. */ > +12, /* reduc_f16_cost */ > +/* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0. */ > +6, /* reduc_f32_cost */ > +/* Likewise for 1 scalar FADDs (~2 cycles) vs. 2: 2 - 0. */ Similarly here. OK with those changes, thanks. Richard > +2, /* reduc_f64_cost */ > +2, /* store_elt_extra_cost */ > +/* This value is just inherited from the Cortex-A57 table. */ > +8, /* vec_to_scalar_cost */ > +/* See the comment above the Advanced SIMD versions. */ > +4, /* scalar_to_vec_cost */ > +4, /* align_load_cost */ >
[C PATCH] PR c/98198: ICE-on-invalid-code error recovery.
This is Christophe Lyon's fix to PR c/98198, an ICE-on-invalid-code regression affecting mainline, together with a suitable testcase. Tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-03-16 Christophe Lyon Roger Sayle gcc/c-family/ChangeLog PR c/98198 * c-attribs.cc (decl_or_type_attrs): Add error_mark_node check. gcc/testsuite/ChangeLog PR c/98198 * gcc.dg/pr98198.c: New test case. Thanks in advance, Roger -- diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index c885d88..111a33f 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -806,6 +806,8 @@ decl_or_type_attrs (tree node) return attrs; tree type = TREE_TYPE (node); + if (type == error_mark_node) + return NULL_TREE; return TYPE_ATTRIBUTES (type); } diff --git a/gcc/testsuite/gcc.dg/pr98198.c b/gcc/testsuite/gcc.dg/pr98198.c new file mode 100644 index 000..489afae --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr98198.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +static inline void sub_1 ( ) { + struct struct_1 var_9 , var_10 +} + +static int var_9[1] __attribute__ ( ( section ( ".data" ) ) ) ; +/* { dg-excess-errors "" } */
Re: [PATCH] libstdc++: Ensure that std::from_chars is declared when supported
On Tue, 15 Mar 2022 at 14:12, Patrick Palka wrote: > > On Mon, 14 Mar 2022, Jonathan Wakely wrote: > > > On Mon, 14 Mar 2022 at 14:17, Patrick Palka via Libstdc++ > > wrote: > > > > > > On Fri, 11 Mar 2022, Jonathan Wakely wrote: > > > > > > > Patrick, I think this is right, but please take a look to double check. > > > > > > > > I think we should fix the feature-test macro conditions for gcc-11 too, > > > > although it's a bit more complicated there. It should depend on IEEE > > > > float and double *and* uselocale. We don't need the other changes on the > > > > branch. > > > > > > Don't we still depend on uselocale in GCC 12 for long double from_chars, > > > at least on targets where long double != binary64? > > > > Not after this patch: > > > > from_chars(const char* first, const char* last, long double& value, > > chars_format fmt) noexcept > > { > > -#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \ > > - && ! USE_STRTOD_FOR_FROM_CHARS > > +#if ! USE_STRTOD_FOR_FROM_CHARS > > + // Either long double is the same as double, or we can't use strtold. > > + // In the latter case, this might give an incorrect result (e.g. values > > + // out of range of double give an error, even if they fit in long > > double). > > > > If uselocale isn't available, this defines the long double overload in > > terms of the double one, even if that doesn't always give the right > > answers. That greatly simplifies the preprocessor conditions for when > > it's supported. If the float and double forms are present, so is the > > long double one. > > Ah sorry, I had overlooked that part of the patch. Makes sense and LGTM! Pushed to trunk now.
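The fallback being described, in rough outline (a sketch of the libstdc++-internal definition, not the exact source; the lost-range behaviour for values that fit long double but overflow double is exactly the caveat the quoted comment documents):

#include <charconv>

namespace std
{
  // Sketch: when strtold is unusable (no uselocale), define the
  // long double overload in terms of the double one.
  from_chars_result
  from_chars (const char* first, const char* last, long double& value,
	      chars_format fmt) noexcept
  {
    double dbl_value;
    from_chars_result result = from_chars (first, last, dbl_value, fmt);
    if (result.ec == errc{})
      value = dbl_value;
    return result;
  }
}

Because the float and double overloads no longer depend on uselocale alone, this delegation means the long double overload is declared whenever the other two are, which is what simplifies the feature-test-macro conditions.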
Re: [aarch64] Add Demeter tuning structs
"Andre Vieira (lists)" writes: > Hi, > > This patch adds tuning structs for -mcpu/-mtune=demeter. > > > 2022-03-16 Tamar Christina > Andre Vieira > > * config/aarch64/aarch64.cc (demeter_addrcost_table, > demeter_regmove_cost, > demeter_advsimd_vector_cost, demeter_sve_vector_cost, > demeter_scalar_issue_info, > demeter_advsimd_issue_info, demeter_sve_issue_info, > demeter_vec_issue_info, > demeter_vector_cost, demeter_tunings): New tuning structs. > (aarch64_ve_op_count::rename_cycles_per_iter): Enable for > demeter tuning. > * config/aarch64/aarch64-cores.def: Add entry for demeter. > * config/aarch64/aarch64-tune.md (tune): Add demeter to list. > > diff --git a/gcc/config/aarch64/aarch64-cores.def > b/gcc/config/aarch64/aarch64-cores.def > index > eda3a4eb6028d97d50295773c87119d9fa0c41bf..9e6ca84bd4b277ccf2c1809c419703a23075f315 > 100644 > --- a/gcc/config/aarch64/aarch64-cores.def > +++ b/gcc/config/aarch64/aarch64-cores.def > @@ -172,4 +172,6 @@ AARCH64_CORE("cortex-a710", cortexa710, cortexa57, 9A, > AARCH64_FL_FOR_ARCH9 | > > AARCH64_CORE("cortex-x2", cortexx2, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | > AARCH64_FL_SVE2_BITPERM | AARCH64_FL_MEMTAG | AARCH64_FL_I8MM | > AARCH64_FL_BF16, neoversen2, 0x41, 0xd48, -1) > > +AARCH64_CORE("demeter", demeter, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | > AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG > | AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, demeter, 0x41, 0xd4f, -1) > + > #undef AARCH64_CORE > diff --git a/gcc/config/aarch64/aarch64-tune.md > b/gcc/config/aarch64/aarch64-tune.md > index > 3eed700c063c2abf36cd648b22f713e887255f96..dda5dbdda75c1383004f5dac3f249909f7f23589 > 100644 > --- a/gcc/config/aarch64/aarch64-tune.md > +++ b/gcc/config/aarch64/aarch64-tune.md > @@ -1,5 +1,5 @@ > ;; -*- buffer-read-only: t -*- > ;; Generated automatically by gentune.sh from aarch64-cores.def > (define_attr "tune" > - > "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2" > + > "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2,demeter" > (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > 
e0bb447beb9eae74551d863505eb265737d36334..9b6f67dc592d8a447d6b28390c90abe5dcfa5f08 > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -537,6 +537,24 @@ static const struct cpu_addrcost_table > neoversen2_addrcost_table = >0 /* imm_offset */ > }; > > +static const struct cpu_addrcost_table demeter_addrcost_table = > +{ > +{ > + 1, /* hi */ > + 0, /* si */ > + 0, /* di */ > + 1, /* ti */ > +}, > + 0, /* pre_modify */ > + 0, /* post_modify */ > + 2, /* post_modify_ld3_st3 */ > + 2, /* post_modify_ld4_st4 */ > + 0, /* register_offset */ > + 0, /* register_sextend */ > + 0, /* register_zextend */ > + 0 /* imm_offset */ > +}; > + > static const struct cpu_regmove_cost generic_regmove_cost = > { >1, /* GP2GP */ > @@ -652,6 +670,16 @@ static const struct cpu_regmove_cost > neoversen2_regmove_cost = >2 /* FP2FP */ > }; > > +static const struct cpu_regmove_cost demeter_regmove_cost = > +{ > + 1, /* GP2GP */ > + /* Spilling to int<->fp instead of memory is recommended so set > + realistic costs compared to memmv_cost. */ > + 3, /* GP2FP */ > + 2, /* FP2GP */ > + 2 /* FP2FP */ > +}; > + Given that these values are the same as for Neoverse V1, I wonder whether we should reuse the existing structures, like we do for cortexa76. I can see there are
Re: [PATCH] libstdc++: Ensure that std::from_chars is declared when supported
On Mon, 14 Mar 2022 at 14:15, Patrick Palka wrote:
> I think __floating_from_chars_hex should work fine on 16 bit targets,
> so I suppose we could use it in the !USE_LIB_FAST_FLOAT branch as well.

Good point, and even for SIZE_WIDTH >= 32, it might be faster to use your
__floating_from_chars_hex function than to change locale twice and allocate
memory and call strto{f,d,ld}.  It certainly avoids the non-standard
errc::out_of_memory result.

So maybe something like the attached semi-tested patch.

commit 57b9e2227c7b14f04f2593ec41e177631b6f3fc2
Author: Jonathan Wakely
Date:   Wed Mar 16 16:46:07 2022

    libstdc++: Prefer our own __floating_from_chars_hex if supported

    This adjusts the floating-point overloads of std::from_chars so that
    they prefer to use __floating_from_chars_hex if possible, instead of
    using strtof/strtod/strtold.  This avoids changing the per-thread C
    locale and allocating memory for a null-terminated string when we can
    use the hex-specific function instead.

    libstdc++-v3/ChangeLog:

	    * src/c++17/floating_from_chars.cc (from_chars): Always use
	    __floating_from_chars_hex for hex format if possible.

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc b/libstdc++-v3/src/c++17/floating_from_chars.cc
index 4aa2483bc28..ce30752ee95 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -821,13 +821,12 @@ from_chars_result
 from_chars(const char* first, const char* last, float& value,
	    chars_format fmt) noexcept
 {
-#if USE_LIB_FAST_FLOAT
+#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64
   if (fmt == chars_format::hex)
     return __floating_from_chars_hex(first, last, value);
-  else
-    {
-      return fast_float::from_chars(first, last, value, fmt);
-    }
+#endif
+#if USE_LIB_FAST_FLOAT
+  return fast_float::from_chars(first, last, value, fmt);
 #else
   return from_chars_strtod(first, last, value, fmt);
 #endif
@@ -837,13 +836,12 @@ from_chars_result
 from_chars(const char* first, const char* last, double& value,
	    chars_format fmt) noexcept
 {
-#if USE_LIB_FAST_FLOAT
+#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64
   if (fmt == chars_format::hex)
     return __floating_from_chars_hex(first, last, value);
-  else
-    {
-      return fast_float::from_chars(first, last, value, fmt);
-    }
+#endif
+#if USE_LIB_FAST_FLOAT
+  return fast_float::from_chars(first, last, value, fmt);
 #else
   return from_chars_strtod(first, last, value, fmt);
 #endif
@@ -853,18 +851,12 @@ from_chars_result
 from_chars(const char* first, const char* last, long double& value,
	    chars_format fmt) noexcept
 {
-#if ! USE_STRTOD_FOR_FROM_CHARS
+#if ! USE_STRTOD_FOR_FROM_CHARS || __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
   // Either long double is the same as double, or we can't use strtold.
   // In the latter case, this might give an incorrect result (e.g. values
   // out of range of double give an error, even if they fit in long double).
   double dbl_value;
-  from_chars_result result;
-  if (fmt == chars_format::hex)
-    result = __floating_from_chars_hex(first, last, dbl_value);
-  else
-    {
-      result = fast_float::from_chars(first, last, dbl_value, fmt);
-    }
+  from_chars_result result = std::from_chars(first, last, dbl_value, fmt);
   if (result.ec == errc{})
     value = dbl_value;
   return result;
@@ -888,7 +880,8 @@ from_chars_result
 from_chars(const char* first, const char* last, __ieee128& value,
	    chars_format fmt) noexcept
 {
-  // fast_float doesn't support IEEE binary128 format, but we can use strtold.
+  // fast_float doesn't support IEEE binary128 format, and neither does our
+  // __floating_from_chars_hex, but we can use __strtoieee128 on this target.
   return from_chars_strtod(first, last, value, fmt);
 }
 #endif
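To see the affected path from the user's side, here is a small sketch (mine, not part of the patch); with this change the hex branch below is served by __floating_from_chars_hex on the listed targets instead of strtod plus a locale switch:

  #include <charconv>
  #include <cassert>

  int main()
  {
    // Hexadecimal mantissa with binary exponent, no "0x" prefix:
    // 1.8p+3 == (1 + 8/16) * 2^3 == 12.0
    const char str[] = "1.8p+3";
    double value = 0.0;
    auto res = std::from_chars(str, str + sizeof str - 1, value,
                               std::chars_format::hex);
    assert(res.ec == std::errc{} && value == 12.0);
  }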
[PATCH] c++, v2: Fix up constexpr evaluation of new with zero sized types [PR104568]
On Tue, Mar 15, 2022 at 04:19:05PM -0400, Jason Merrill wrote:
> > But if you strongly prefer it that way, I can do that.
> > Note, probably not 3 new args but 4, depends on whether we could turn
> > all those cases where the tree arg0 = CALL_EXPR_ARG (oldop, 0);
> > is done but var_size_adjusted is false into assertion failures.
> > I'm worried that with the zero size of element we could end up with
> > a variable number of elements which when multiplied by 0 gives constant 0,
> > though hopefully that would be rejected earlier during constant evaluation.
>
> Or we could move all the adjustment into a separate function and only ever
> pass the number of elements to build_new_constexpr_heap_type?

So like this?

2022-03-16  Jakub Jelinek

	PR c++/104568
	* init.cc (build_new_constexpr_heap_type): Remove FULL_SIZE argument
	and its handling, instead add ITYPE2 argument.  Only support
	COOKIE_SIZE != NULL.
	(build_new_1): If size is 0, change it to 0 * outer_nelts if
	outer_nelts is non-NULL.  Pass type rather than elt_type to
	maybe_wrap_new_for_constexpr.
	* constexpr.cc (build_new_constexpr_heap_type): New function.
	(cxx_eval_constant_expression): If elt_size is zero sized type, try
	to recover outer_nelts from the size argument to operator new/new[]
	and pass that as arg_size to build_new_constexpr_heap_type.  Pass
	ctx, non_constant_p and overflow_p to that call too.
	* g++.dg/cpp2a/constexpr-new22.C: New test.

--- gcc/cp/constexpr.cc.jj	2022-03-16 15:25:26.294551244 +0100
+++ gcc/cp/constexpr.cc	2022-03-16 17:27:08.202184961 +0100
@@ -6422,6 +6422,84 @@ maybe_warn_about_constant_value (locatio
     }
 }
 
+/* For element type ELT_TYPE, return the appropriate type of the heap object
+   containing such element(s).  COOKIE_SIZE is NULL or the size of cookie
+   in bytes.  If COOKIE_SIZE is NULL, return array type
+   ELT_TYPE[FULL_SIZE / sizeof(ELT_TYPE)], otherwise return
+   struct { size_t[COOKIE_SIZE/sizeof(size_t)]; ELT_TYPE[N]; }
+   where N is computed such that the size of the struct fits into FULL_SIZE.
+   If ARG_SIZE is non-NULL, it is the first argument to the new operator.
+   It should be passed if ELT_TYPE is zero sized type in which case FULL_SIZE
+   will be also 0 and so it is not possible to determine the actual array
+   size.  CTX, NON_CONSTANT_P and OVERFLOW_P are used during constant
+   expression evaluation of subexpressions of ARG_SIZE.  */
+
+tree
+build_new_constexpr_heap_type (const constexpr_ctx *ctx, tree elt_type,
+			       tree cookie_size, tree full_size, tree arg_size,
+			       bool *non_constant_p, bool *overflow_p)
+{
+  gcc_assert (cookie_size == NULL_TREE || tree_fits_uhwi_p (cookie_size));
+  gcc_assert (tree_fits_uhwi_p (full_size));
+  unsigned HOST_WIDE_INT csz = cookie_size ? tree_to_uhwi (cookie_size) : 0;
+  if (arg_size)
+    {
+      STRIP_NOPS (arg_size);
+      if (cookie_size)
+	{
+	  if (TREE_CODE (arg_size) != PLUS_EXPR)
+	    arg_size = NULL_TREE;
+	  else if (TREE_CODE (TREE_OPERAND (arg_size, 0)) == INTEGER_CST
+		   && tree_int_cst_equal (cookie_size,
+					  TREE_OPERAND (arg_size, 0)))
+	    {
+	      arg_size = TREE_OPERAND (arg_size, 1);
+	      STRIP_NOPS (arg_size);
+	    }
+	  else if (TREE_CODE (TREE_OPERAND (arg_size, 1)) == INTEGER_CST
+		   && tree_int_cst_equal (cookie_size,
+					  TREE_OPERAND (arg_size, 1)))
+	    {
+	      arg_size = TREE_OPERAND (arg_size, 0);
+	      STRIP_NOPS (arg_size);
+	    }
+	  else
+	    arg_size = NULL_TREE;
+	}
+      if (arg_size && TREE_CODE (arg_size) == MULT_EXPR)
+	{
+	  tree op0 = TREE_OPERAND (arg_size, 0);
+	  tree op1 = TREE_OPERAND (arg_size, 1);
+	  if (integer_zerop (op0))
+	    arg_size
+	      = cxx_eval_constant_expression (ctx, op1, false, non_constant_p,
+					      overflow_p);
+	  else if (integer_zerop (op1))
+	    arg_size
+	      = cxx_eval_constant_expression (ctx, op0, false, non_constant_p,
+					      overflow_p);
+	  else
+	    arg_size = NULL_TREE;
+	}
+      else
+	arg_size = NULL_TREE;
+    }
+
+  unsigned HOST_WIDE_INT fsz = tree_to_uhwi (arg_size ? arg_size : full_size);
+  unsigned HOST_WIDE_INT esz = int_size_in_bytes (elt_type);
+  if (!arg_size)
+    {
+      gcc_assert (fsz >= csz);
+      fsz -= csz;
+      if (esz)
+	fsz /= esz;
+    }
+  tree itype2 = build_index_type (size_int (fsz - 1));
+  if (!cookie_size)
+    return build_cplus_array_type (elt_type, itype2);
+  return build_new_constexpr_heap_type (elt_type, cookie_size, itype2);
+}
+
 /* Attempt to reduce the expressio
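To make the failure mode concrete, here is a guess at the shape of code involved (the actual constexpr-new22.C is not reproduced here): with a GNU zero-sized type, the byte count passed to operator new[] is 0 * N, so the array bound cannot be recovered by dividing the allocation size by the element size.

  // Sketch only; needs -std=c++20 for constexpr new.  E has size zero
  // thanks to the GNU zero-length-array extension.
  struct E { int a[0]; };

  constexpr bool f (int n)
  {
    E *p = new E[n];   // operator new[] is asked for 0 * n == 0 bytes
    delete[] p;
    return true;
  }
  static_assert (f (3));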
Re: RFA: crc builtin functions & optimizations
On Wed, 16 Mar 2022 at 08:15, Richard Biener wrote:
> The canonical place to transform loops into builtins would be loop
> distribution.  Another place would be final value replacement since you
> basically replace the reduction result with a call to the builtin, but I
> think loop-distribution is the better overall place.  See how we match
> strlen() there.
>
> Richard.

So I suppose that would be something along the line of the below patch?
Except it'd need a lot more checks to make sure it's actually processing a
CRC computation, and there needs to be code to actually expand the builtin
using optabs.  And something needs to be done to make match.pd work on the
output.

I'm seeing crcu16 being unrolled in the *.cunroll dump and the remains of
the loops deleted in the *.dse4 dump, but the match.pd clause to simplify
the two __builtin_crc8s calls never matches, so we end up with this in
*.optimized:

ee_u16 crcu16 (ee_u16 newval, ee_u16 crc)
{
  unsigned char _1;
  short unsigned int _2;
  unsigned char _3;
  short unsigned int _24;
  short unsigned int _26;

   [local count: 1073741824]:
  _1 = (unsigned char) newval_4(D);
  _24 = __builtin_crc8s (crc_6(D), _1, 40961);
  _2 = newval_4(D) >> 8;
  _3 = (unsigned char) _2;
  _26 = __builtin_crc8s (_24, _3, 40961); [tail call]
  return _26;
}

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index db6e9096a86..b74e8569b94 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -659,6 +659,9 @@ class loop_distribution
      replace them accordingly.  */
   bool transform_reduction_loop (loop_p loop);
 
+  /* Transform some loops which calculate a CRC.  */
+  bool transform_reduction_loop (loop_p loop, tree niters);
+
   /* Compute topological order for basic blocks.  Topological order is
      needed because data dependence is computed for data references in
      lexicographical order.  */
@@ -3432,6 +3435,26 @@ generate_strlen_builtin_using_rawmemchr (loop_p loop, tree reduction_var,
				          start_len);
 }
 
+static void
+generate_crc_builtin (loop_p loop, tree reduction_var,
+		      tree crc_in, tree data_in, tree xor_val,
+		      location_t loc)
+{
+  gimple_seq seq = NULL;
+  tree reduction_var_new = make_ssa_name (TREE_TYPE (reduction_var));
+
+  crc_in = force_gimple_operand (crc_in, &seq, true, NULL_TREE);
+  data_in = force_gimple_operand (data_in, &seq, true, NULL_TREE);
+  tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_CRC8S));
+  gimple *fn_call = gimple_build_call (fn, 3, crc_in, data_in, xor_val);
+  gimple_call_set_lhs (fn_call, reduction_var_new);
+  gimple_set_location (fn_call, loc);
+  gimple_seq_add_stmt (&seq, fn_call);
+
+  generate_reduction_builtin_1 (loop, seq, reduction_var, reduction_var_new,
+				"generated crc%s", E_QImode);
+}
+
 /* Return true if we can count at least as many characters by taking pointer
    difference as we can count via reduction_var without an overflow.  Thus
    compute 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var_type),
@@ -3713,6 +3736,128 @@ loop_distribution::transform_reduction_loop (loop_p loop)
   return false;
 }
 
+/* Match loops like:
+
+    [local count: 954449105]:
+   # data_16 = PHI 
+   # crc_22 = PHI 
+   # i_3 = PHI 
+   # ivtmp_7 = PHI 
+   _8 = (unsigned char) crc_22;
+   _1 = _8 ^ data_16;
+   x16_12 = _1 & 1;
+   data_13 = data_16 >> 1;
+   _19 = crc_22 >> 1;
+   if (x16_12 != 0)
+     goto ; [50.00%]
+   else
+     goto ; [50.00%]
+
+    [local count: 477224553]:
+   goto ; [100.00%]
+
+    [local count: 477224552]:
+   crc_17 = _19 ^ 40961;
+
+    [local count: 954449105]:
+   # crc_4 = PHI 
+   i_18 = i_3 + 1;
+   ivtmp_6 = ivtmp_7 - 1;
+   if (ivtmp_6 != 0)
+     goto ; [88.89%]
+   else
+     goto ; [11.11%]
+
+    [local count: 848409806]:
+   goto ; [100.00%]  */
+
+bool
+loop_distribution::transform_reduction_loop (loop_p loop, tree niters)
+{
+  gimple *reduction_stmt;
+
+  if (!wi::eq_p (wi::to_widest (niters), 7))
+    return false;
+
+  if (loop->num_nodes != 5)
+    return false;
+
+  reduction_stmt = determine_reduction_stmt (loop);
+  if (reduction_stmt == NULL)
+    return false;
+
+  /* Reduction variables are guaranteed to be SSA names.  */
+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+    {
+    case GIMPLE_ASSIGN:
+    case GIMPLE_PHI:
+      reduction_var = gimple_get_lhs (reduction_stmt);
+      break;
+    default:
+      /* Bail out e.g. for GIMPLE_CALL.  */
+      return false;
+    }
+
+  if (EDGE_COUNT (loop->header->preds) != 2)
+    return false;
+
+  edge e, entry_edge = NULL, backedge = NULL;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    if (e->src->loop_father != loop)
+      entry_edge = e;
+    else
+      backedge = e;
+
+  if (!entry_edge || !backedge)
+    return false;
+
+  tree crc_in = NULL_TREE, data_in = NULL_TREE;
+
+  for (gphi_iterator gsi = gsi_start_phis (loop
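For reference, the scalar source loop that the GIMPLE in the comment above corresponds to looks roughly like this (reconstructed from the dump; names are illustrative).  An 8-iteration loop executes its latch 7 times, which is what the niters check against 7 tests for:

  /* Classic reflected CRC-16 bit-loop, polynomial 0xA001 (40961),
     consuming one input byte in eight shift/xor steps.  */
  unsigned short
  crc16_update (unsigned short crc, unsigned char data)
  {
    for (int i = 0; i < 8; i++)
      {
        unsigned char x16 = ((unsigned char) crc ^ data) & 1;
        data >>= 1;
        crc >>= 1;
        if (x16)
          crc ^= 0xA001;
      }
    return crc;
  }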
Re: [aarch64] Update reg-costs to differentiate between memmove costs
"Andre Vieira (lists)" writes: > This patch introduces a struct to differentiate between different > memmove costs to enable a better modeling of memory operations. These > have been modelled for > -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb, for > all other tunings all entries are equal to the old single memmove cost > to ensure the behaviour remains the same. Thanks for doing this. Having the same cost for loads and stores has been a long-standing wart. > 2022-03-16 Tamar Christina > Andre Vieira > > gcc/ChangeLog: > > * config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New > struct. > (struct tune_params): Change type of memmov_cost to use > cpu_memmov_cost. > * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update > all tunings > to use new cpu_memmov_cost struct. > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index > f2fde35c6eb4989af8736db8fad004171c160282..5190eb8b96ea9af809a28470905b8b85ee720b09 > 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -508,6 +508,18 @@ struct cpu_prefetch_tune >const int default_opt_level; > }; > > +/* Model the costs for loads/stores for reload so that it can do more I'd say s/reload/the register allocators/ here, since the costs affect decisions made by IRA too. > + accurate spill heuristics. */ > +struct cpu_memmov_cost > +{ > + int load_int; > + int store_int; > + int load_fp; > + int store_fp; > + int load_pred; > + int store_pred; > +}; > + > struct tune_params > { >const struct cpu_cost_table *insn_extra_cost; > […] > @@ -14501,12 +14633,41 @@ aarch64_register_move_cost (machine_mode mode, >return regmove_cost->FP2FP; > } > > +/* Implements TARGET_MEMORY_MOVE_COST. */ > static int > -aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED, > - reg_class_t rclass ATTRIBUTE_UNUSED, > - bool in ATTRIBUTE_UNUSED) > +aarch64_memory_move_cost (machine_mode mode, reg_class_t rclass_i, bool in) > { > - return aarch64_tune_params.memmov_cost; > + enum reg_class rclass = (enum reg_class) rclass_i; > + switch (rclass) > +{ > +case PR_LO_REGS: > +case PR_HI_REGS: > +case PR_REGS: > + return in ? aarch64_tune_params.memmov_cost.load_pred > + : aarch64_tune_params.memmov_cost.store_pred; > +case POINTER_AND_FP_REGS: > +case ALL_REGS: > + { > + if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL) > + return in ? aarch64_tune_params.memmov_cost.load_pred > + : aarch64_tune_params.memmov_cost.store_pred; > + > + if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode)) > + return in ? aarch64_tune_params.memmov_cost.load_fp > + : aarch64_tune_params.memmov_cost.store_fp; > + > + return in ? aarch64_tune_params.memmov_cost.load_int > + : aarch64_tune_params.memmov_cost.store_int; > + } > +case FP_LO8_REGS: > +case FP_LO_REGS: > +case FP_REGS: > + return in ? aarch64_tune_params.memmov_cost.load_fp > + : aarch64_tune_params.memmov_cost.store_fp; > +default: > + return in ? aarch64_tune_params.memmov_cost.load_int > + : aarch64_tune_params.memmov_cost.store_int; > +} > } It would be good to avoid listing individual subclasses if possible, since it's easy for the list to get out of date if more subclasses are added. An alternative would be: if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL ? reg_classes_intersect_p (rclass, PR_REGS) : reg_class_subset_p (rclass, PR_REGS)) return (in ? aarch64_tune_params.memmov_cost.load_pred : aarch64_tune_params.memmov_cost.store_pred); if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode) ? 
reg_classes_intersect_p (rclass, FP_REGS) : reg_class_subset_p (rclass, FP_REGS)) return (in ? aarch64_tune_params.memmov_cost.load_fp : aarch64_tune_params.memmov_cost.store_fp); return (in ? aarch64_tune_params.memmov_cost.load_int : aarch64_tune_params.memmov_cost.store_int); OK with that change, if it works. Thanks, Richard
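To make the new interface concrete, a complete tuning entry for the struct might look like this (values here are illustrative only, not taken from any real CPU table, following the field order defined above):

  static const struct cpu_memmov_cost example_memmov_cost =
  {
    4, /* load_int.  */
    1, /* store_int.  */
    6, /* load_fp.  */
    2, /* store_fp.  */
    6, /* load_pred.  */
    1  /* store_pred.  */
  };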
Re: [aarch64] Update regmove costs for neoverse-v1 and neoverse-512tvb tunings
"Andre Vieira (lists)" writes: > Hi, > > This patch updates the register move tunings for > -mcpu/-mtune={neoverse-v1,neoverse-512tvb}. > > 2022-03-16 Tamar Christina > Andre Vieira > > * config/aarch64/aarch64.cc (neoversev1_regmove_cost): New > tuning struct. > (neoversev1_tunings): Use neoversev1_regmove_cost and update > store_int cost. > (neoverse512tvb_tunings): Likewise. > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > 9b6f67dc592d8a447d6b28390c90abe5dcfa5f08..f0485574528c47221e17a3aa5aee70a56508f61e > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -670,6 +670,16 @@ static const struct cpu_regmove_cost > neoversen2_regmove_cost = >2 /* FP2FP */ > }; > > +static const struct cpu_regmove_cost neoversev1_regmove_cost = > +{ > + 1, /* GP2GP */ > + /* Spilling to int<->fp instead of memory is recommended so set > + realistic costs compared to memmv_cost. */ s/memmv_cost/memmov_cost/ Same for the earlier patches, I just didn't see it till now :-) OK with that change on the basis that it forms a pair with the follow-on memmov patch. Thanks, Richard > + 3, /* GP2FP */ > + 2, /* FP2GP */ > + 2 /* FP2FP */ > +}; > + > static const struct cpu_regmove_cost demeter_regmove_cost = > { >1, /* GP2GP */ > @@ -2063,13 +2073,13 @@ static const struct tune_params neoversev1_tunings = > { >&cortexa76_extra_costs, >&neoversev1_addrcost_table, > - &generic_regmove_cost, > + &neoversev1_regmove_cost, >&neoversev1_vector_cost, >&generic_branch_cost, >&generic_approx_modes, >SVE_256, /* sve_width */ >{ 4, /* load_int. */ > -1, /* store_int. */ > +2, /* store_int. */ > 6, /* load_fp. */ > 2, /* store_fp. */ > 6, /* load_pred. */ > @@ -2200,13 +2210,13 @@ static const struct tune_params > neoverse512tvb_tunings = > { >&cortexa76_extra_costs, >&neoversev1_addrcost_table, > - &generic_regmove_cost, > + &neoversev1_regmove_cost, >&neoverse512tvb_vector_cost, >&generic_branch_cost, >&generic_approx_modes, >SVE_128 | SVE_256, /* sve_width */ >{ 4, /* load_int. */ > -1, /* store_int. */ > +2, /* store_int. */ > 6, /* load_fp. */ > 2, /* store_fp. */ > 6, /* load_pred. */
rs6000: RFC/Update support for addg6s instruction. PR100693
Hi,

RFC/Update support for addg6s instruction.  PR100693

For PR100693, we currently provide an addg6s builtin using unsigned int
arguments, but we are missing an unsigned long long argument equivalent.
This patch adds an overload to provide the long long version of the builtin.

unsigned long long __builtin_addg6s (unsigned long long, unsigned long long);

RFC/concerns:  This patch works, but looking briefly at intermediate stages
is not behaving quite as I expected.  Looking at the intermediate dumps, I
see in pr100693.original that calls I expect to be routed to the internal
__builtin_addg6s_si() that uses (unsigned int) arguments are instead being
handled by __builtin_addg6s_di() with casts that convert the arguments to
(unsigned long long), i.e.

  return (unsigned int) __builtin_addg6s_di ((long long unsigned int) a,
					     (long long unsigned int) b);

As a test, I see if I swap the order of the builtins in rs6000-overload.def
I end up with code casting the ULL values to UI, which provides truncated
results, and is similar to what occurs today without this patch.

All that said, this patch seems to work.  OK for next stage 1?
Tested on power8BE as well as LE power8, power9, power10.

2022-03-15  Will Schmidt

gcc/
	PR target/100693
	* config/rs6000/rs6000-builtins.def: Remove entry for
	__builtin_addg6s() and add entries for __builtin_addg6s_di()
	and __builtin_addg6s_si().
	* config/rs6000/rs6000-overload.def: Add overloaded entries
	allowing __builtin_addg6s() to map to either of the
	__builtin_addg6s_{di,si} builtins.
	* config/rs6000/rs6000.md: Add UNSPEC_ADDG6S_SI and
	UNSPEC_ADDG6S_DI unspecs.  Add define_insn entries for
	addg6s_si and addg6s_di based on those unspecs.
	* doc/extend.texi: Add entry for ULL __builtin_addg6s (ULL, ULL);

testsuite/
	PR target/100693
	* gcc.target/powerpc/pr100693.c: New test.

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index ae2760c33389..4c23cac26932 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1993,12 +1993,16 @@
     XXSPLTD_V2DI vsx_xxspltd_v2di {}
 
 ; Power7 builtins (ISA 2.06).
 [power7]
-  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
-    ADDG6S addg6s {}
+  const unsigned long long __builtin_addg6s_di (unsigned long long, \
+						unsigned long long);
+    ADDG6S_DI addg6s_di {}
+
+  const unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+    ADDG6S_SI addg6s_si {}
 
   const signed long __builtin_bpermd (signed long, signed long);
     BPERMD bpermd_di {32bit}
 
   const unsigned int __builtin_cbcdtd (unsigned int);

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 44e2945aaa0e..931f85b738c5 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -76,10 +76,15 @@
 ; Blank lines may be used as desired in this file between the lines as
 ; defined above; that is, you can introduce as many extra newlines as you
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[ADDG6S, __builtin_i_addg6s, __builtin_addg6s]
+  unsigned long long __builtin_addg6s_di (signed long long, unsigned long long);
+    ADDG6S_DI
+  unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+    ADDG6S_SI
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
     BCDADD_V1TI
   vuc __builtin_vec_bcdadd (vuc, vuc, const int);

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fdfbc6566a5c..d040f127eb55 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -122,11 +122,12 @@ (define_c_enum "unspec"
   UNSPEC_P8V_MTVSRWZ
   UNSPEC_P8V_RELOAD_FROM_GPR
   UNSPEC_P8V_MTVSRD
   UNSPEC_P8V_XXPERMDI
   UNSPEC_P8V_RELOAD_FROM_VSX
-  UNSPEC_ADDG6S
+  UNSPEC_ADDG6S_SI
+  UNSPEC_ADDG6S_DI
   UNSPEC_CDTBCD
   UNSPEC_CBCDTD
   UNSPEC_DIVE
   UNSPEC_DIVEU
   UNSPEC_UNPACK_128BIT
@@ -14495,15 +14496,24 @@ (define_peephole2
   operands[5] = change_address (mem, mode, new_addr);
 })
 
 ;; Miscellaneous ISA 2.06 (power7) instructions
-(define_insn "addg6s"
+(define_insn "addg6s_si"
   [(set (match_operand:SI 0 "register_operand" "=r")
	(unspec:SI [(match_operand:SI 1 "register_operand" "r")
		    (match_operand:SI 2 "register_operand" "r")]
-		   UNSPEC_ADDG6S))]
+		   UNSPEC_ADDG6S_SI))]
   "TARGET_POPCNTD"
   "addg6s %0,%1,%2"
   [(set_attr "type" "integer")])
+
+(define_insn "addg6s_di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(unspec:DI [(match_operand:DI 1 "register_operand" "r")
+		    (match_operand:DI 2 "register_operand" "r")]
+		   UNSPEC_
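For background, the textbook use of addg6s (sketched here from the ISA's decimal-assist description; not part of the patch) is branch-free packed-BCD addition, which is where a 64-bit overload pays off:

  /* Add two 16-digit packed-BCD numbers.  The added 6s force binary
     carries to propagate across decimal digit boundaries; addg6s then
     produces a 6 in every digit position that did NOT carry, and the
     final subtraction removes that bias.  */
  unsigned long long
  bcd_add (unsigned long long a, unsigned long long b)
  {
    unsigned long long t = a + 0x6666666666666666ULL;
    return (t + b) - __builtin_addg6s (t, b);
  }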
[PATCH] gimplify: Emit clobbers for TARGET_EXPR_SLOT vars later [PR103984]
Hi! As mentioned in the PR, we emit a bogus uninitialized warning but easily could emit wrong-code for it or similar testcases too. The bug is that we emit clobber for a TARGET_EXPR_SLOT too early: D.2499.e = B::qux (&h); [return slot optimization] D.2516 = 1; try { B::B (&D.2498, &h); try { _2 = baz (&D.2498); D.2499.f = _2; D.2516 = 0; try { try { bar (&D.2499); } finally { C::~C (&D.2499); } } finally { D.2499 = {CLOBBER(eol)}; } } finally { D.2498 = {CLOBBER(eol)}; } } catch { if (D.2516 != 0) goto ; else goto ; : A::~A (&D.2499.e); goto ; : : } The CLOBBER for D.2499 is essentially only emitted on the non-exceptional path, if B::B or baz throws, then there is no CLOBBER for it but there is a conditional destructor A::~A (&D.2499.e). Now, ehcleanup1 sink_clobbers optimization assumes that clobbers in the EH cases are emitted after last use and so sinks the D.2499 = {CLOBBER(eol)}; later, so we then have # _3 = PHI <1(3), 0(9)> : D.2499 ={v} {CLOBBER(eol)}; D.2498 ={v} {CLOBBER(eol)}; if (_3 != 0) goto ; [INV] else goto ; [INV] : _35 = D.2499.a; if (&D.2499.b != _35) where that _35 = D.2499.a comes from inline expansion of the A::~A dtor, and that is a load from a clobbered memory. Now, what the gimplifier sees in this case is a CLEANUP_POINT_EXPR with somewhere inside of it a TARGET_EXPR for D.2499 (with the C::~C (&D.2499) cleanup) which in its TARGET_EXPR_INITIAL has another TARGET_EXPR for D.2516 bool flag which has CLEANUP_EH_ONLY which performs that conditional A::~A (&D.2499.e) call. The following patch ensures that CLOBBERs (and asan poisoning) are emitted after even those gimple_push_cleanup pushed cleanups from within the TARGET_EXPR_INITIAL gimplification (i.e. the last point where the slot could be in theory used). In my first version of the patch I've done it by just moving the /* Add a clobber for the temporary going out of scope, like gimplify_bind_expr. */ if (gimplify_ctxp->in_cleanup_point_expr && needs_to_live_in_memory (temp)) { ... } block earlier in gimplify_target_expr, but that regressed a couple of tests where temp is marked TREE_ADDRESSABLE only during (well, very early during that) the gimplification of TARGET_EXPR_INITIAL, so we didn't emit e.g. on pr80032.C or stack2.C tests any clobbers for the slots and thus stack slot reuse wasn't performed. So that we don't regress those tests, this patch gimplifies TARGET_EXPR_INITIAL as before, but doesn't emit it directly into pre_p, emits it into a temporary sequence. Then emits the CLOBBER cleanup into pre_p, then asan poisoning if needed, then appends the TARGET_EXPR_INITIAL temporary sequence and finally adds TARGET_EXPR_CLEANUP gimple_push_cleanup. The earlier a GIMPLE_WCE appears in the sequence, the outer try/finally or try/catch it is. So, with this patch the part of the testcase in gimple dump cited above looks instead like: try { D.2499.e = B::qux (&h); [return slot optimization] D.2516 = 1; try { try { B::B (&D.2498, &h); _2 = baz (&D.2498); D.2499.f = _2; D.2516 = 0; try { bar (&D.2499); } finally { C::~C (&D.2499); } } finally { D.2498 = {CLOBBER(eol)}; } } catch { if (D.2516 != 0) goto ; else goto ; : A::~A (&D.2499.e); goto ; : : } } finally { D.2499 = {CLOBBER(eol)}; } Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-03-16 Jakub Jelinek PR middle-end/103984 * gimplify.cc (gimplify_target_expr): Gimplify type sizes and TARGET_EXPR_INITIAL into a temporary sequence, then push clobbers and asan unpois
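In outline, the new emission order is (paraphrased pseudocode, not the literal gimplify.cc change; the emit_* helper names are made up for illustration):

  gimple_seq init_seq = NULL;
  /* 1. Gimplify TARGET_EXPR_INITIAL into a temporary sequence.  */
  gimplify_expr (&init, &init_seq, ...);
  /* 2. Emit the CLOBBER cleanup into pre_p first: the earlier a
	GIMPLE_WCE appears, the more outer its try/finally, so this now
	wraps every cleanup the initializer pushed, including the
	CLEANUP_EH_ONLY one.  */
  emit_clobber_cleanup (pre_p, temp);
  /* 3. Then asan poisoning, if needed.  */
  emit_asan_poison (pre_p, temp);
  /* 4. Then append the gimplified initializer.  */
  gimple_seq_add_seq (pre_p, init_seq);
  /* 5. Finally push TARGET_EXPR_CLEANUP via gimple_push_cleanup.  */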
Patch ping (Re: [PATCH] libatomic: Improve 16-byte atomics on Intel AVX [PR104688])
Hi! I'd like to ping this patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590960.html Thanks. On Mon, Feb 28, 2022 at 07:06:30AM +0100, Jakub Jelinek wrote: > As mentioned in the PR, the latest Intel SDM has added: > "Processors that enumerate support for Intel® AVX (by setting the feature > flag CPUID.01H:ECX.AVX[bit 28]) > guarantee that the 16-byte memory operations performed by the following > instructions will always be > carried out atomically: > • MOVAPD, MOVAPS, and MOVDQA. > • VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128. > • VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with EVEX.128 and > k0 (masking disabled). > (Note that these instructions require the linear addresses of their memory > operands to be 16-byte > aligned.)" > > The following patch deals with it just on the libatomic library side so far, > currently (since ~ 2017) we emit all the __atomic_* 16-byte builtins as > library calls since and this is something that we can hopefully backport. > > The patch simply introduces yet another ifunc variant that takes priority > over the pure CMPXCHG16B one, one that checks AVX and CMPXCHG16B bits and > on non-Intel clears the AVX bit during detection for now (if AMD comes > with the same guarantee, we could revert the config/x86/init.c hunk), > which implements 16-byte atomic load as vmovdqa and 16-byte atomic store > as vmovdqa followed by mfence. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk so far? > > 2022-02-28 Jakub Jelinek > > PR target/104688 > * Makefile.am (IFUNC_OPTIONS): Change on x86_64 to -mcx16 -mcx16. > (libatomic_la_LIBADD): Add $(addsuffix _16_2_.lo,$(SIZEOBJS)) for > x86_64. > * Makefile.in: Regenerated. > * config/x86/host-config.h (IFUNC_COND_1): For x86_64 define to > both AVX and CMPXCHG16B bits. > (IFUNC_COND_2): Define. > (IFUNC_NCOND): For x86_64 define to 2 * (N == 16). > (MAYBE_HAVE_ATOMIC_CAS_16, MAYBE_HAVE_ATOMIC_EXCHANGE_16, > MAYBE_HAVE_ATOMIC_LDST_16): Define to IFUNC_COND_2 rather than > IFUNC_COND_1. > (HAVE_ATOMIC_CAS_16): Redefine to 1 whenever IFUNC_ALT != 0. > (HAVE_ATOMIC_LDST_16): Redefine to 1 whenever IFUNC_ALT == 1. > (atomic_compare_exchange_n): Define whenever IFUNC_ALT != 0 > on x86_64 for N == 16. > (__atomic_load_n, __atomic_store_n): Redefine whenever IFUNC_ALT == 1 > on x86_64 for N == 16. > (atomic_load_n, atomic_store_n): New functions. > * config/x86/init.c (__libat_feat1_init): On x86_64 clear bit_AVX > if CPU vendor is not Intel. Jakub
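From the user's side, the operations affected are the ordinary 16-byte atomic loads and stores (a sketch of mine, not from the patch; link with -latomic).  On Intel CPUs advertising AVX, the new ifunc variant services them with vmovdqa, and vmovdqa plus mfence, rather than a cmpxchg16b loop:

  #include <stdatomic.h>

  _Atomic __int128 shared;

  __int128 read_shared (void) { return atomic_load (&shared); }
  void write_shared (__int128 v) { atomic_store (&shared, v); }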
Re: [C PATCH] PR c/98198: ICE-on-invalid-code error recovery.
On Wed, 16 Mar 2022, Roger Sayle wrote: > This is Christophe Lyon's fix to PR c/98198, an ICE-on-invalid-code > regression affecting mainline, and a suitable testcase. > Tested on x86_64-pc-linux-gnu with make bootstrap and make -k check > with no new failures. Ok for mainline? > > > 2022-03-16 Christophe Lyon > Roger Sayle > > gcc/c-family/ChangeLog > PR c/98198 > * c-attribs.cc (decl_or_type_attrs): Add error_mark_node check. > > gcc/testsuite/ChangeLog > PR c/98198 > * gcc.dg/pr98198.c: New test case. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [aarch64] Implement determine_suggested_unroll_factor
"Andre Vieira (lists)" writes: > Hi, > > This patch implements the costing function > determine_suggested_unroll_factor for aarch64. > It determines the unrolling factor by dividing the number of X > operations we can do per cycle by the number of X operations in the loop > body, taking this information from the vec_ops analysis during vector > costing and the available issue_info information. > We multiply the dividend by a potential reduction_latency, to improve > our pipeline utilization if we are stalled waiting on a particular > reduction operation. > > Right now we also have a work around for vectorization choices where the > main loop uses a NEON mode and predication is available, such that if > the main loop makes use of a NEON pattern that is not directly supported > by SVE we do not unroll, as that might cause performance regressions in > cases where we would enter the original main loop's VF. As an example if > you have a loop where you could use AVG_CEIL with a V8HI mode, you would > originally get 8x NEON using AVG_CEIL followed by a 8x SVE predicated > epilogue, using other instructions. Whereas with the unrolling you would > end up with 16x AVG_CEIL NEON + 8x SVE predicated loop, thus skipping > the original 8x NEON. In the future, we could handle this differently, > by either using a different costing model for epilogues, or potentially > vectorizing more than one single epilogue. > > gcc/ChangeLog: > > * config/aarch64/aarch64.cc (aarch64_vector_costs): Define > determine_suggested_unroll_factor. > (determine_suggested_unroll_factor): New function. > (aarch64_vector_costs::finish_costs): Use > determine_suggested_unroll_factor. > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > b5687aab59f630920e51b742b80a540c3a56c6c8..9d3a607d378d6a2792efa7c6dece2a65c24e4521 > 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -15680,6 +15680,7 @@ private: >unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *, >unsigned int); >bool prefer_unrolled_loop () const; > + unsigned int determine_suggested_unroll_factor (); > >/* True if we have performed one-time initialization based on the > vec_info. */ > @@ -16768,6 +16769,105 @@ adjust_body_cost_sve (const aarch64_vec_op_count > *ops, >return sve_cycles_per_iter; > } > > +unsigned int > +aarch64_vector_costs::determine_suggested_unroll_factor () > +{ > + auto *issue_info = aarch64_tune_params.vec_costs->issue_info; > + if (!issue_info) > +return 1; > + bool sve = false; > + if (aarch64_sve_mode_p (m_vinfo->vector_mode)) Other code uses m_vec_flags & VEC_ANY_SVE for this. > +{ > + if (!issue_info->sve) > + return 1; > + sve = true; > +} > + else > +{ > + if (!issue_info->advsimd) > + return 1; The issue info should instead be taken from vec_ops.simd_issue_info () in the loop below. It can vary for each entry. > + /* If we are trying to unroll a NEON main loop that contains patterns s/a NEON/an Advanced SIMD/ > + that we do not support with SVE and we might use a predicated > + epilogue, we need to be conservative and block unrolling as this might > + lead to a less optimal loop for the first and only epilogue using the > + original loop's vectorization factor. > + TODO: Remove this constraint when we add support for multiple epilogue > + vectorization. 
*/ > + if (partial_vectors_supported_p () > + && param_vect_partial_vector_usage != 0 > + && !TARGET_SVE2) > + { > + unsigned int i; > + stmt_vec_info stmt_vinfo; > + FOR_EACH_VEC_ELT (m_vinfo->stmt_vec_infos, i, stmt_vinfo) > + { > + if (is_pattern_stmt_p (stmt_vinfo)) > + { > + gimple *stmt = stmt_vinfo->stmt; > + if (is_gimple_call (stmt) > + && gimple_call_internal_p (stmt)) > + { > + enum internal_fn ifn > + = gimple_call_internal_fn (stmt); > + switch (ifn) > + { > + case IFN_AVG_FLOOR: > + case IFN_AVG_CEIL: > + return 1; > + default: > + break; > + } > + } > + } > + } > + } I think we should instead record this during add_stmt_cost, like we do for the other data that this function uses. We need to drop the partial_vectors_supported_p () condition though: enabling SVE must not make Advanced SIMD code worse. So for now, Advanced SIMD-only code will have to suffer the missed unrolling opportunity too. > +} > + > + unsigned int max_unroll_factor = 1; > + aarch64_simd_vec_issue_info const *vec_issue > += sve ? issue_info->sve : issue_info->a
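As a worked example of the heuristic (numbers purely illustrative): if the issue info says the core can execute 4 vector ops per cycle, the loop body contains 6 vector ops, and the loop's reduction has a latency of 3 cycles, then the suggested factor is

  (4 ops/cycle * 3 cycles) / 6 ops = 2

i.e. two copies of the body are needed to keep the pipeline busy while waiting on the reduction result.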
[committed] analyzer: early rejection of disabled warnings [PR104955]
Avoid generating execution paths for warnings that are ultimately rejected due to -Wno-analyzer-* flags. This improves the test case from taking at least several minutes (before I killed it) to taking under a second. This doesn't fix the slowdown seen in PR analyzer/104955 with large numbers of warnings when the warnings are still enabled. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r12-7677-g7fd6e36ea9aa85. gcc/analyzer/ChangeLog: PR analyzer/104955 * diagnostic-manager.cc (get_emission_location): New. (diagnostic_manager::diagnostic_manager): Initialize m_num_disabled_diagnostics. (diagnostic_manager::add_diagnostic): Reject diagnostics that will eventually be rejected due to being disabled. (diagnostic_manager::emit_saved_diagnostics): Log the number of disabled diagnostics. (diagnostic_manager::emit_saved_diagnostic): Split out logic for determining emission location to get_emission_location. * diagnostic-manager.h (diagnostic_manager::m_num_disabled_diagnostics): New field. * engine.cc (stale_jmp_buf::get_controlling_option): New. (stale_jmp_buf::emit): Use it. * pending-diagnostic.h (pending_diagnostic::get_controlling_option): New vfunc. * region-model.cc (poisoned_value_diagnostic::get_controlling_option): New. (poisoned_value_diagnostic::emit): Use it. (shift_count_negative_diagnostic::get_controlling_option): New. (shift_count_negative_diagnostic::emit): Use it. (shift_count_overflow_diagnostic::get_controlling_option): New. (shift_count_overflow_diagnostic::emit): Use it. (dump_path_diagnostic::get_controlling_option): New. (dump_path_diagnostic::emit): Use it. (write_to_const_diagnostic::get_controlling_option): New. (write_to_const_diagnostic::emit): Use it. (write_to_string_literal_diagnostic::get_controlling_option): New. (write_to_string_literal_diagnostic::emit): Use it. * sm-file.cc (double_fclose::get_controlling_option): New. (double_fclose::emit): Use it. (file_leak::get_controlling_option): New. (file_leak::emit): Use it. * sm-malloc.cc (mismatching_deallocation::get_controlling_option): New. (mismatching_deallocation::emit): Use it. (double_free::get_controlling_option): New. (double_free::emit): Use it. (possible_null_deref::get_controlling_option): New. (possible_null_deref::emit): Use it. (possible_null_arg::get_controlling_option): New. (possible_null_arg::emit): Use it. (null_deref::get_controlling_option): New. (null_deref::emit): Use it. (null_arg::get_controlling_option): New. (null_arg::emit): Use it. (use_after_free::get_controlling_option): New. (use_after_free::emit): Use it. (malloc_leak::get_controlling_option): New. (malloc_leak::emit): Use it. (free_of_non_heap::get_controlling_option): New. (free_of_non_heap::emit): Use it. * sm-pattern-test.cc (pattern_match::get_controlling_option): New. (pattern_match::emit): Use it. * sm-sensitive.cc (exposure_through_output_file::get_controlling_option): New. (exposure_through_output_file::emit): Use it. * sm-signal.cc (signal_unsafe_call::get_controlling_option): New. (signal_unsafe_call::emit): Use it. * sm-taint.cc (tainted_array_index::get_controlling_option): New. (tainted_array_index::emit): Use it. (tainted_offset::get_controlling_option): New. (tainted_offset::emit): Use it. (tainted_size::get_controlling_option): New. (tainted_size::emit): Use it. (tainted_divisor::get_controlling_option): New. (tainted_divisor::emit): Use it. (tainted_allocation_size::get_controlling_option): New. (tainted_allocation_size::emit): Use it. 
gcc/testsuite/ChangeLog: * gcc.dg/analyzer/many-disabled-diagnostics.c: New test. * gcc.dg/plugin/analyzer_gil_plugin.c (gil_diagnostic::get_controlling_option): New. (double_save_thread::emit): Use it. (fncall_without_gil::emit): Likewise. (pyobject_usage_without_gil::emit): Likewise. Signed-off-by: David Malcolm --- gcc/analyzer/diagnostic-manager.cc| 44 ++-- gcc/analyzer/diagnostic-manager.h | 1 + gcc/analyzer/engine.cc| 7 +- gcc/analyzer/pending-diagnostic.h | 6 ++ gcc/analyzer/region-model.cc | 61 gcc/analyzer/sm-file.cc | 16 - gcc/analyzer/sm-malloc.cc | 72 +++ gcc/analyzer/sm-pattern-test.cc | 8 ++- gcc/analyzer/sm-sensitive.cc | 8 ++- gcc/analyzer/sm-signa
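For plugin authors following along, the new vfunc pattern looks roughly like this (paraphrased from the ChangeLog and the GIL-plugin changes, not literal source):

  /* Each pending_diagnostic now reports which -Wanalyzer-* option
     controls it, letting the manager reject disabled diagnostics before
     computing an execution path.  */
  class double_fclose_like : public pending_diagnostic
  {
  public:
    int get_controlling_option () const final override
    {
      return OPT_Wanalyzer_double_fclose;
    }
    bool emit (rich_location *rich_loc) final override
    {
      return warning_at (rich_loc, get_controlling_option (),
			 "double %<fclose%> of FILE %qE", m_arg);
    }
  private:
    tree m_arg;
  };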
Re: rs6000: RFC/Update support for addg6s instruction. PR100693
Hi! On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote: > For PR100693, we currently provide an addg6s builtin using unsigned > int arguments, but we are missing an unsigned long long argument > equivalent. This patch adds an overload to provide the long long > version of the builtin. > > unsigned long long __builtin_addg6s (unsigned long long, unsigned long long); > > RFC/concerns: This patch works, but looking briefly at intermediate stages > is not behaving quite as I expected. Looking at the intermediate dumps, I > see in pr100693.original that calls I expect to be routed to the internal > __builtin_addg6s_si() that uses (unsigned int) arguments are instead being > handled by __builtin_addg6s_di() with casts that convert the arguments to > (unsigned long long). Did you test with actual 32-bit variables, instead of just function arguments? Function arguments are always passed in (sign-extended) registers. Like, unsigned int f(unsigned int *a, unsigned int *b) { return __builtin_addg6s(*a, *b); } > As a test, I see if I swap the order of the builtins in rs6000-overload.def > I end up with code casting the ULL values to UI, which provides truncated > results, and is similar to what occurs today without this patch. > > All that said, this patch seems to work. OK for next stage 1? > Tested on power8BE as well as LE power8,power9,power10. Please ask again when stage 1 has started? > gcc/ > PR target/100693 > * config/rs6000/rs600-builtins.def: Remove entry for __builtin_addgs() > and add entries for __builtin_addg6s_di() and __builtin_addg6s_si(). Indent of second and further lines should be at the "*", not two spaces after that. > - UNSPEC_ADDG6S > + UNSPEC_ADDG6S_SI > + UNSPEC_ADDG6S_DI You do not need multiple unspec numbers. You can differentiate them based on the modes of the arguments, already :-) > ;; Miscellaneous ISA 2.06 (power7) instructions > -(define_insn "addg6s" > +(define_insn "addg6s_si" >[(set (match_operand:SI 0 "register_operand" "=r") > (unspec:SI [(match_operand:SI 1 "register_operand" "r") > (match_operand:SI 2 "register_operand" "r")] > -UNSPEC_ADDG6S))] > +UNSPEC_ADDG6S_SI))] > + "TARGET_POPCNTD" > + "addg6s %0,%1,%2" > + [(set_attr "type" "integer")]) > + > +(define_insn "addg6s_di" > + [(set (match_operand:DI 0 "register_operand" "=r") > + (unspec:DI [(match_operand:DI 1 "register_operand" "r") > + (match_operand:DI 2 "register_operand" "r")] > +UNSPEC_ADDG6S_DI))] >"TARGET_POPCNTD" >"addg6s %0,%1,%2" >[(set_attr "type" "integer")]) (define_insn "addg6s" [(set (match_operand:GPR 0 "register_operand" "=r") (unspec:GPR [(match_operand:GPR 1 "register_operand" "r") (match_operand:GPR 2 "register_operand" "r")] UNSPEC_ADDG6S))] "TARGET_POPCNTD" "addg6s %0,%1,%2" [(set_attr "type" "integer")]) We do not want DI (here, and in most places) for -m32! > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile { target { powerpc*-*-linux* } } } */ Why only on Linux? > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ Why not on Darwin? And why skip it anyway, given the previous line :-) > +/* { dg-require-effective-target powerpc_vsx_ok } */ That is the wrong requirement. You want to test for Power7, not for VSX. I realise you probably copied this from elsewhere :-( (If from another addg6s testcase, just keep it). Segher
[r12-7670 Regression] FAIL: g++.dg/tree-ssa/pr86544.C -std=gnu++98 scan-tree-dump-times phiopt4 "if" 0 on Linux/x86_64
On Linux/x86_64, f6fb661ea8ac7e17c6924719de6219f002c4efef is the first bad commit commit f6fb661ea8ac7e17c6924719de6219f002c4efef Author: Richard Biener Date: Wed Mar 16 13:39:31 2022 +0100 tree-optimization/102008 - restore if-conversion of adjacent loads caused FAIL: gcc.dg/tree-ssa/phi-opt-21.c scan-tree-dump phiopt4 "converted to straightline code" FAIL: gcc.dg/tree-ssa/popcount3.c scan-tree-dump-times phiopt4 "if" 0 FAIL: gcc.dg/tree-ssa/pr18134.c scan-tree-dump-times optimized "= a_..D. != 0" 1 FAIL: gcc.dg/tree-ssa/pr96480.c scan-tree-dump optimized " = _[0-9]* <= 3;" FAIL: gcc.dg/tree-ssa/sccp-2.c scan-tree-dump-times optimized "bb" 1 FAIL: gcc.dg/tree-ssa/ssa-hoist-4.c scan-tree-dump-times optimized "MAX_EXPR" 1 FAIL: gcc.target/i386/pr95852-1.c scan-assembler-times \tjn?o\t 16 FAIL: gcc.target/i386/pr95852-1.c scan-assembler-times \tsetno\t 8 FAIL: gcc.target/i386/pr95852-1.c scan-assembler-times \tseto\t 8 FAIL: gcc.target/i386/pr95852-2.c scan-assembler-times \tjn?o\t 16 FAIL: gcc.target/i386/pr95852-2.c scan-assembler-times \tseto\t 8 FAIL: gcc.target/i386/pr95852-3.c scan-assembler-times \tjn?o\t 16 FAIL: gcc.target/i386/pr95852-3.c scan-assembler-times \tsetno\t 8 FAIL: gcc.target/i386/pr95852-3.c scan-assembler-times \tseto\t 8 FAIL: gcc.target/i386/pr95852-4.c scan-assembler-times \tjn?o\t 16 FAIL: gcc.target/i386/pr95852-4.c scan-assembler-times \tseto\t 8 FAIL: g++.dg/tree-ssa/pr86544.C -std=gnu++14 scan-tree-dump-times phiopt4 "if" 0 FAIL: g++.dg/tree-ssa/pr86544.C -std=gnu++17 scan-tree-dump-times phiopt4 "if" 0 FAIL: g++.dg/tree-ssa/pr86544.C -std=gnu++20 scan-tree-dump-times phiopt4 "if" 0 FAIL: g++.dg/tree-ssa/pr86544.C -std=gnu++98 scan-tree-dump-times phiopt4 "if" 0 with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-7670/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/popcount3.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/popcount3.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make chec
Re: rs6000: RFC/Update support for addg6s instruction. PR100693
On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote: > Hi! > > On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote: > > For PR100693, we currently provide an addg6s builtin using unsigned > > int arguments, but we are missing an unsigned long long argument > > equivalent. This patch adds an overload to provide the long long > > version of the builtin. > > > > unsigned long long __builtin_addg6s (unsigned long long, unsigned > > long long); > > > > RFC/concerns: This patch works, but looking briefly at intermediate > > stages > > is not behaving quite as I expected. Looking at the intermediate > > dumps, I > > see in pr100693.original that calls I expect to be routed to the > > internal > > __builtin_addg6s_si() that uses (unsigned int) arguments are > > instead being > > handled by __builtin_addg6s_di() with casts that convert the > > arguments to > > (unsigned long long). > > Did you test with actual 32-bit variables, instead of just function > arguments? Function arguments are always passed in (sign-extended) > registers. > > Like, > > unsigned int f(unsigned int *a, unsigned int *b) > { > return __builtin_addg6s(*a, *b); > } I perhaps missed that subtlety. I'll investigate that further. > > > As a test, I see if I swap the order of the builtins in rs6000- > > overload.def > > I end up with code casting the ULL values to UI, which provides > > truncated > > results, and is similar to what occurs today without this patch. > > > > All that said, this patch seems to work. OK for next stage 1? > > Tested on power8BE as well as LE power8,power9,power10. > > Please ask again when stage 1 has started? > > > gcc/ > > PR target/100693 > > * config/rs6000/rs600-builtins.def: Remove entry for > > __builtin_addgs() > > and add entries for __builtin_addg6s_di() and > > __builtin_addg6s_si(). > > Indent of second and further lines should be at the "*", not two > spaces > after that. > > > - UNSPEC_ADDG6S > > + UNSPEC_ADDG6S_SI > > + UNSPEC_ADDG6S_DI > > You do not need multiple unspec numbers. You can differentiate them > based on the modes of the arguments, already :-) > > > ;; Miscellaneous ISA 2.06 (power7) instructions > > -(define_insn "addg6s" > > +(define_insn "addg6s_si" > >[(set (match_operand:SI 0 "register_operand" "=r") > > (unspec:SI [(match_operand:SI 1 "register_operand" "r") > > (match_operand:SI 2 "register_operand" "r")] > > - UNSPEC_ADDG6S))] > > + UNSPEC_ADDG6S_SI))] > > + "TARGET_POPCNTD" > > + "addg6s %0,%1,%2" > > + [(set_attr "type" "integer")]) > > + > > +(define_insn "addg6s_di" > > + [(set (match_operand:DI 0 "register_operand" "=r") > > + (unspec:DI [(match_operand:DI 1 "register_operand" "r") > > + (match_operand:DI 2 "register_operand" "r")] > > + UNSPEC_ADDG6S_DI))] > >"TARGET_POPCNTD" > >"addg6s %0,%1,%2" > >[(set_attr "type" "integer")]) > > (define_insn "addg6s" > [(set (match_operand:GPR 0 "register_operand" "=r") > (unspec:GPR [(match_operand:GPR 1 "register_operand" "r") >(match_operand:GPR 2 "register_operand" "r")] > UNSPEC_ADDG6S))] > "TARGET_POPCNTD" > "addg6s %0,%1,%2" > [(set_attr "type" "integer")]) > You do not need multiple unspec numbers. You can differentiate them > based on the modes of the arguments, already :-) Yeah, Thats what I thought, which is a big part of why I posted this with RFC. :-)When I attempted this there was an issue with multiple s (behind the GPR predicate) versus the singular "addg6s" define_insn. It's possible I had something else wrong there, but I'll go back to that attempt and work in that direction. 
> > We do not want DI (here, and in most places) for -m32! > > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c > > @@ -0,0 +1,68 @@ > > +/* { dg-do compile { target { powerpc*-*-linux* } } } */ > > Why only on Linux? > > > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > Why not on Darwin? And why skip it anyway, given the previous line > :-) > > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > That is the wrong requirement. You want to test for Power7, not for > VSX. I realise you probably copied this from elsewhere :-( (If from > another addg6s testcase, just keep it). Because reasons. :-) The stanzas are copied from the nearby bcd-1.c testcase that has a simpler test for addg6s.Given the input I'll try to correct the stanzas here and limit how much error I carry along. Thanks for the feedback and review. I'll investigate further, and resubmit at stage1. Thanks, -Will > > > Segher
Re: [PATCH][Middle-end][Backport to GCC11][PR100775]Updating the reg use in exit block for -fzero-call-used-regs
Thanks. Just committed the patch to gcc11 branch. Qing > On Mar 14, 2022, at 2:46 AM, Richard Biener > wrote: > > On Fri, Mar 11, 2022 at 5:31 PM Qing Zhao via Gcc-patches > wrote: >> >> >> Hi, >> >> I plan to backport the patch to fix PR100775: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100775 >> >> To gcc11 since this is a general bug to -fzero-call-used-regs. And should be >> fixed in gcc11 as well. >> >> I have tested the patch with gcc11 release branch on both x86 and aarch64, >> no regression. >> >> Okay for commit? > > OK > >> thanks. >> >> Qing. >> >> == >> >> >> From 737a0b0824111f46da44c5578b9769070c52 Mon Sep 17 00:00:00 2001 >> From: Qing Zhao >> Date: Thu, 10 Mar 2022 23:22:29 + >> Subject: [PATCH] middle-end: updating the reg use in exit block for >> -fzero-call-used-regs [PR100775] GCC11 backport. >> >> In the pass_zero_call_used_regs, when updating dataflow info after adding >> the register zeroing sequence in the epilogue of the function, we should >> call "df_update_exit_block_uses" to update the register use information in >> the exit block to include all the registers that have been zeroed. >> >> gcc/ChangeLog: >> >>PR middle-end/100775 >>* function.c (gen_call_used_regs_seq): Call >>df_update_exit_block_uses when updating df. >> >> gcc/testsuite/ChangeLog: >> >>PR middle-end/100775 >>* gcc.target/arm/pr100775.c: New test. >> --- >> gcc/function.c | 2 +- >> gcc/testsuite/gcc.target/arm/pr100775.c | 9 + >> 2 files changed, 10 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/arm/pr100775.c >> >> diff --git a/gcc/function.c b/gcc/function.c >> index fc7b147b5f1..0495e9f1b81 100644 >> --- a/gcc/function.c >> +++ b/gcc/function.c >> @@ -5922,7 +5922,7 @@ gen_call_used_regs_seq (rtx_insn *ret, unsigned int >> zero_regs_type) >> >> /* Update the data flow information. */ >> crtl->must_be_zero_on_return |= zeroed_hardregs; >> - df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun)); >> + df_update_exit_block_uses (); >> } >> } >> >> diff --git a/gcc/testsuite/gcc.target/arm/pr100775.c >> b/gcc/testsuite/gcc.target/arm/pr100775.c >> new file mode 100644 >> index 000..c648cd5e8f7 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/arm/pr100775.c >> @@ -0,0 +1,9 @@ >> +/* { dg-do compile } */ >> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */ >> +/* { dg-options "-mthumb -fzero-call-used-regs=used" } */ >> + >> +int >> +foo (int x) >> +{ >> + return x; >> +} >> -- >> 2.27.0 >> >>
Re: RFA: crc builtin functions & optimizations
> and there needs to be code to actually expand the builtin using optabs. > And something needs to be done to make match.pd work on the output. Never mind that bit, that was just due to a function argument type mismatch on the last argument of the crc built-in functions.
Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]
> On Mar 14, 2022, at 11:04 AM, Richard Sandiford > wrote: > > Sorry for the slow response, was out for a few days. > > Xi Ruoyao writes: >> On Sat, 2022-03-12 at 18:48 +0800, Xi Ruoyao via Gcc-patches wrote: >>> On Fri, 2022-03-11 at 21:26 +, Qing Zhao wrote: Hi, Ruoyao, (I might not be able to reply to this thread till next Wed due to a short vacation). First, some comments on opening bugs against Gcc: I took a look at the bug reports PR104817 and PR104820: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104820 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104817 I didn’t see a testing case and a script to repeat the error, so I cannot repeat the error at my side. >>> >>> I've put the test case, but maybe you didn't see it because it is too >>> simple: >>> >>> echo 'int t() {}' | /home/xry111/git-repos/gcc-test-mips/gcc/cc1 - >>> nostdinc -fzero-call-used-regs=all >>> >>> An empty function is enough to break -fzero-call-used-regs=all. And >>> if >>> you append -mips64r2 to the cc1 command line you'll get 104820. I >>> enabled 4 existing tests for MIPS (reported "not work" on MIPS) in the >>> patch so I think it's unnecessary to add new test cases. >>> >>> Richard: can we use MIPS_EPILOGUE_TEMP as a scratch register in the >>> sequence for zeroing the call-used registers, and then zero itself >>> (despite it's not in need_zeroed_hardregs)? >> >> No, it leads to an ICE at stage 3 bootstrapping :(. >> >> Now I think the only rational ways are: >> >> (1) allow zeroing more registers than need_zeroed_hardregs. > > I think this is the way to go. I agree it's a bit hacky, but it seems > like the least worst option. > > A less hacky alternative would be to pass an extra argument to the hook > that contains the set of registers that the hook is *allowed* to clobber. > For -fzero-call-used-regs=X, this new argument would be the set that > would have been chosen for -fzero-call-used-regs=all, regardless of > what X actually is. We could then assert that the extra registers we > want to clobber are in that set (which will be true for all values of X). If we have to go this way, I think it’s better to make the change you suggested above, and then also update the documentation, both internal documentation on how to define the hook and the user level documentation on what the user might expect when using this option (i.e, it’s possible that the compiler might clear more registers than the user requests on some targets due to the implementation limitation). I can make this change if we decide to do this. Thanks. Qing > >> Or >> >> (2) allow zeroing less registers than need_zeroed_hardregs (then I'll >> skip ST_REGS, after all they are just 8 bits in total). > > Yeah, this is explicitly OK, provided that the target maintainers > feel that the contents of the registers in question are not a significant > security concern. I don't feel I can make that call though. It's really > a question for the userbase. > > Thanks, > Richard > >> If all these are unacceptable, then >> >> (3) I'll just call sorry in MIPS target hook to tell not to use this >> feature on MIPS.
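(For reference, a sketch in GCC-internals terms of the hook change suggested in the thread; the second parameter and its name are hypothetical, and only the first matches the current TARGET_ZERO_CALL_USED_REGS interface:

  /* Today: zero the registers in NEED_ZEROED_HARDREGS, return the set
     actually zeroed.  */
  HARD_REG_SET mips_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs);

  /* Suggested variant: ALLOWED_HARDREGS would be what
     -fzero-call-used-regs=all selects, so the hook may clobber a scratch
     such as MIPS_EPILOGUE_TEMP even when it is not in
     NEED_ZEROED_HARDREGS.  */
  HARD_REG_SET mips_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs,
                                         HARD_REG_SET allowed_hardregs);
)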
libgo: update to 1.18 release
This patch updates libgo to the final 1.18 release. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian patch.txt.gz Description: application/gzip
Re: rs6000: RFC/Update support for addg6s instruction. PR100693
On Wed, Mar 16, 2022 at 03:06:42PM -0500, will schmidt wrote: > On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote: > > (define_insn "addg6s" > > [(set (match_operand:GPR 0 "register_operand" "=r") > > (unspec:GPR [(match_operand:GPR 1 "register_operand" "r") > > (match_operand:GPR 2 "register_operand" "r")] > > UNSPEC_ADDG6S))] > > "TARGET_POPCNTD" > > "addg6s %0,%1,%2" > > [(set_attr "type" "integer")]) > > You do not need multiple unspec numbers. You can differentiate > them > > based on the modes of the arguments, already :-) > > Yeah, that's what I thought, which is a big part of why I posted this > with RFC. :-) When I attempted this there was an issue with multiple > <mode>s (behind the GPR mode iterator) versus the singular "addg6s" > define_insn. > It's possible I had something else wrong there, but I'll > go back to that attempt and work in that direction. No, that is still there. One way out is to make this an unnamed pattern (like "*addg6s"). Another is to put the mode in the name, and then you probably want to make it a parameterized name as well, which then hides all that again: (define_insn "@addg6s<mode>" ... > > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > > > That is the wrong requirement. You want to test for Power7, not for > > VSX. I realise you probably copied this from elsewhere :-( (If from > > another addg6s testcase, just keep it). > > Because reasons. :-) The stanzas are copied from the nearby bcd-1.c > testcase that has a simpler test for addg6s. Given the input I'll > try to correct the stanzas here and limit how much error I carry along. If you do not improve the existing ones it may be best to just keep this the same in the new testcases as well. Consistency is a good thing. Consistent wrong is not so great of course :-) Segher
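(For reference: with a parameterized name like (define_insn "@addg6s<mode>" ...), genemit provides a single generator that takes the mode as a run-time argument, which is what "hides all that again". A caller sketch, with hypothetical operand names:

  /* One gen function covers both the SImode and DImode variants.  */
  emit_insn (gen_addg6s (SImode, operands[0], operands[1], operands[2]));
  emit_insn (gen_addg6s (DImode, operands[0], operands[1], operands[2]));
)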
[PATCH] c++: alias template and empty parameter packs [PR104008]
Zero-length pack expansions are treated as if no list were provided at all, that is, with template<typename...> struct S { }; template<typename T, typename... Us> void g() { S<Us...>; } g<int> will result in S<>. In the following test we have something similar: template<typename T, typename... Us> using IsOneOf = disjunction<is_same<T, Us>...>; and then we have "IsOneOf<OtherHolders>..." where OtherHolders is an empty pack. Since r11-7931, we strip_typedefs in TYPE_PACK_EXPANSION. In this test that results in "IsOneOf<OtherHolders>" being turned into "disjunction<>". So the whole expansion is now "disjunction<>...". But then we error in make_pack_expansion because find_parameter_packs_r won't find the pack OtherHolders. We strip the alias template because dependent_alias_template_spec_p says it's not dependent. It is not dependent because this alias is not TEMPLATE_DECL_COMPLEX_ALIAS_P. My understanding is that currently we consider an alias complex if it 1) expands a pack from the enclosing class, as in template<template<typename> typename... TT> struct S { template<typename P> using X = disjunction<TT<P>...>; }; where the alias expands TT; or 2) the expansion does *not* name all the template parameters, as in template<typename...> struct R; template<typename T, typename... Ts> using U = R<Ts...>; where T is not named in the expansion. But IsOneOf is neither. And it can't know how it's going to be used. Therefore I think we cannot make it complex (and in turn dependent) to fix this bug. After much gnashing of teeth, I think we simply want to avoid stripping the alias if the new pattern doesn't have any parameter packs to expand. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11? PR c++/104008 gcc/cp/ChangeLog: * tree.cc (strip_typedefs): Don't strip an alias template when doing so would result in losing a parameter pack. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/variadic-alias3.C: New test. * g++.dg/cpp0x/variadic-alias4.C: New test. --- gcc/cp/tree.cc | 13 +- gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C | 45 ++ gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C | 48 3 files changed, 105 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc index 6e9be713c51..eb59e56610b 100644 --- a/gcc/cp/tree.cc +++ b/gcc/cp/tree.cc @@ -1778,7 +1778,18 @@ strip_typedefs (tree t, bool *remove_attributes, unsigned int flags) if (TYPE_P (pat)) { type = strip_typedefs (pat, remove_attributes, flags); - if (type != pat) + /* Empty packs can thwart our efforts here. Consider + + template<typename T, typename... Ts> + using IsOneOf = disjunction<is_same<T, Ts>...>; + + where IsOneOf seemingly uses all of its template parameters in + its expansion (and does not expand a pack from the enclosing + class), so the alias is not marked as complex. However, it may + be used as in "IsOneOf<T, Ts...>", where Ts is an empty parameter pack, + and stripping it down into "disjunction<>" here would exclude the + Ts pack, resulting in an error. 
*/ + if (type != pat && uses_parameter_packs (type)) { result = copy_node (t); PACK_EXPANSION_PATTERN (result) = type; diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C b/gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C new file mode 100644 index 000..6b6dd9f4c85 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C @@ -0,0 +1,45 @@ +// PR c++/104008 +// { dg-do compile { target c++11 } } + +template struct conjunction; +template struct disjunction; +template struct is_same; +template struct enable_if; +template using enable_if_t = typename enable_if<_Cond>::type; +struct B; +struct __uniq_ptr_impl { + struct _Ptr { +using type = B *; + }; + using pointer = _Ptr::type; +}; +struct unique_ptr { + using pointer = __uniq_ptr_impl::pointer; + unique_ptr(pointer); +}; +template +using IsOneOf = disjunction...>; + +template struct any_badge; + +struct badge { + badge(any_badge<>); + badge(); +}; + +template struct any_badge { + template ...>::value>> + any_badge(); +}; + +template unique_ptr make_unique(_Args... __args); + +struct B { + B(badge); + unique_ptr b_ = make_unique(badge{}); +}; + +template unique_ptr make_unique(_Args... __args) { + return new B(__args...); +} diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C b/gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C new file mode 100644 index 000..896a4725627 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C @@ -0,0 +1,48 @@ +// PR c++/104008 +// { dg-do compile { target c++11 } } +// Differs from variadic-alias3.C only in the pattern of a pack expansion +// in line 34. But it's important to check that we also deal with more +// complex patterns. + +template struct conjunctio
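(Independent of the testcases above, a self-contained illustration of the zero-length-expansion rule the patch description starts from; the names are invented:

  template <typename...> struct S { };
  template <typename... Ts> using Alias = S<Ts...>;

  Alias<> a;    // the empty pack contributes no arguments: Alias<> is S<>
  S<> *p = &a;  // OK, same type
)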
Re: [PATCH] fortran: Separate associate character lengths earlier [PR104570]
Hi Mikael, Am 14.03.22 um 19:28 schrieb Mikael Morin: Hello, this works around the regression I introduced with the fix for pr104228. The problem comes from the evaluation of character length for the associate target (y) in the testcase. The expression is non-scalar, which is not supported at that point, as no scalarization setup is made there. My initial direction to fix this was trying to modify the evaluation of expressions so that the rank was ignored in the context of character length evaluation. That’s the patch I attached to the PR. The patch I’m proposing here tries instead to avoid the need to evaluate character length through sharing of gfc_charlen between symbols and expressions. I slightly prefer the latter direction, though I’m not sure it’s more robust. > At least it passes regression testing, which proves some basic level of goodness. OK for master? I tried a few minor variations of the testcase, and they also seemed to work. The patch looks simple enough, so OK from my side. Even if PR104570 is a 12 regression only, the PR104228 fix is targeted at all the open branches, so this one as well. OK for 11/10/9? Yes, after giving it a short time on master to "burn in". Thanks for the patch! Harald
[committed] libstdc++: Fix symbol versioning for Solaris 11.3 [PR103407]
Tested x86_64-linux and sparc-sun-solaris2.11 (but 11.3 only). Pushed to trunk. Rainer, this should allow you to continue omitting the _ZSt10from_charsPKcS0_R[def]St12chars_format symbols from the baseline, without the current FAIL. Please check on your other Solaris targets. -- >8 -- The new std::from_chars implementation means that those symbols are now defined on Solaris 11.3, which lacks uselocale. They were not present in gcc-11, but the linker script gives them the GLIBCXX_3.4.29 symbol version because that is the version where they appeared for systems with uselocale. This makes the version for those symbols depend on whether uselocale is available or not, so that they get version GLIBCXX_3.4.30 on targets where they weren't defined in gcc-11. In order to avoid needing separate ABI baseline files for Solaris 11.3 and 11.4, the ABI checker program now treats the floating-point std::from_chars overloads as undesignated if they are not found in the baseline symbols file. This means they can be left out of the Solaris baseline without causing the check-abi target to fail. libstdc++-v3/ChangeLog: PR libstdc++/103407 * config/abi/pre/gnu.ver: Make version for std::from_chars depend on HAVE_USELOCALE macro. * testsuite/util/testsuite_abi.cc (compare_symbols): Treat std::from_chars for floating-point types as undesignated if not found in the baseline symbols file. --- libstdc++-v3/config/abi/pre/gnu.ver | 7 +++ libstdc++-v3/testsuite/util/testsuite_abi.cc | 13 + 2 files changed, 20 insertions(+) diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index dedee8c1b0a..9b80a31768b 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -2337,8 +2337,10 @@ GLIBCXX_3.4.28 { GLIBCXX_3.4.29 { +#ifdef HAVE_USELOCALE # std::from_chars _ZSt10from_charsPKcS0_R[def]St12chars_format; +#endif # std::__istream_extract(istream&, char*, streamsize) _ZSt17__istream_extractRSiPc[ilx]; @@ -2416,6 +2418,11 @@ GLIBCXX_3.4.29 { GLIBCXX_3.4.30 { +#ifndef HAVE_USELOCALE +# std::from_chars +_ZSt10from_charsPKcS0_R[def]St12chars_format; +#endif + _ZSt21__glibcxx_assert_fail*; #ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc b/libstdc++-v3/testsuite/util/testsuite_abi.cc index 3dee737b2e5..5c8383554e5 100644 --- a/libstdc++-v3/testsuite/util/testsuite_abi.cc +++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc @@ -499,6 +499,19 @@ compare_symbols(const char* baseline_file, const char* test_file, else undesignated.push_back(stest); } + // See PR libstdc++/103407 - abi_check FAILs on Solaris + else if (stest.type == symbol::function +&& stest.name.compare(0, 23, "_ZSt10from_charsPKcS0_R") == 0 +&& stest.name.find_first_of("def", 23) == 23 +&& (stest.version_name == "GLIBCXX_3.4.29" + || stest.version_name == "GLIBCXX_3.4.30")) + { + stest.status = symbol::undesignated; + if (!check_version(stest, false)) + incompatible.push_back(symbol_pair(stest, stest)); + else + undesignated.push_back(stest); + } else { stest.status = symbol::added; -- 2.34.1
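(For reference, the mangled names _ZSt10from_charsPKcS0_R[def]St12chars_format cover the double ('d'), long double ('e') and float ('f') overloads of std::from_chars, i.e. calls like the following sketch:

  #include <charconv>

  double parse (const char *first, const char *last)
  {
    double d = 0.0;
    std::from_chars (first, last, d, std::chars_format::general);
    return d;
  }
)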
Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings
On 3/15/22 19:24, Siddhesh Poyarekar wrote: On 16/03/2022 02:06, Martin Sebor wrote: The intended use of the strncmp bound is to limit the comparison to at most the size of the arrays or (in a subset of cases) the length of an initial substring. Providing an arbitrary bound that's not related to the sizes as you describe sounds very much like a misuse. Nothing in the standard says that the bound is related to the sizes of input buffers. I don't think deducing that intent makes sense either, nor concluding that any other use case is misuse. As a historical note, strncmp was first introduced in UNIX v7 where its purpose, alongside strncpy, was to manipulate (potentially) unterminated character arrays like file names stored in fixed size arrays (typically 14 bytes). Strncpy would fill the buffers with ASCII data up to their size and pad the rest with nuls only if there was room. Strncmp was then used to compare these potentially unterminated character arrays (e.g., archive headers in ld and ranlib). The bound was the size of the fixed size array. Its other use case was to compare leading portions of strings (e.g., when looking for an environment variable or when stripping "./" from path names). Thanks for sharing the historical perspective. Since the early UNIX days, both strncpy and to a lesser extent strncmp have been widely misused and, along with many other functions in <string.h>, a frequent source of bugs due to common misunderstanding of their intended purpose. The aim of these warnings is to detect the common (and sometimes less common) misuses and bugs. They're all valid uses however since they do not violate the standard. If we find at compile time that the strings don't terminate at the bounds, emitting the warning is OK but the more pessimistic check seems like overkill. I haven't seen these so I can't very well comment on them. But I can assure you that warning for the code above is intentional. Whether or not the arrays are nul-terminated, the expected way to call the function is with a bound no greater than their size (some coding guidelines are explicit about this; see for example the CERT C Secure Coding standard rule ARR38-C). (Granted, the manual makes it sound like -Wstringop-overread only detects provable past-the-end reads. That's a mistake in the documentation that should be fixed. The warning was never quite so limited, nor was it intended to be.) The contention is not that it's not provable, it's more that it doesn't even pass the "based on available information this is definitely buggy" assertion, making it more a strong suggestion than a warning that something is definitely amiss. Which is why IMO it is more suitable as an analyzer check than a warning. As the GCC manual prominently states (and as I already pointed out) warnings are: constructions that are not inherently erroneous but that are risky or suggest there may have been an error. None of them implies a definite bug, and only a minority are about violations of the standard. Many point out benign code that's just suspicious. We can disagree (and many of us do) about the value of this or that warning but that alone is not sufficient reason to change it. Least of all in stage 4, and with no details about the issues you say you found. Martin
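(The disputed pattern, reduced to a sketch: the bound exceeds the array sizes even though the strings are nul-terminated, which is the kind of call being argued over:

  #include <string.h>

  char a[4] = "abc";
  char b[4] = "xyz";

  int cmp (void)
  {
    return strncmp (a, b, 8);  /* bound 8 > sizeof a and sizeof b */
  }
)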
Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]
Thanks for the detailed review. I have uploaded patch v2 based on the review. v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html Changes since v1: 1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix test regressions caused by those patterns. 2. Support signed integers except for the possible INT_MIN / -1 case. 3. Support cases where X and Y have different precisions. On Tue, 22 Feb 2022 at 15:54, Richard Biener wrote: > +/* (type)X / (type)Y -> (type)(X / Y) > + when the resulting type is at least precise as the original types > + and when all the types are unsigned integral types. */ > +(simplify > + (trunc_div (convert @0) (convert @1)) > + (if (INTEGRAL_TYPE_P (type) > + && INTEGRAL_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > + && TYPE_UNSIGNED (type) > + && TYPE_UNSIGNED (TREE_TYPE (@0)) > + && TYPE_UNSIGNED (TREE_TYPE (@1)) > + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)) > + && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0))) > + (convert (trunc_div @0 @1)))) > > since you are requiring the types of @0 and @1 to match it's easier to write > > && types_match (TREE_TYPE (@0), TREE_TYPE (@1)) > Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now have different precisions, so I believe that I can't use types_match() anymore. > that allows you to elide checks on either @0 or @1. I suppose the transform > does not work for signed types because of the -INT_MIN / -1 overflow case? > It might be possible to use expr_not_equal_to (@0, -INT_MIN) || > expr_not_equal_to (@1, -1) > (correctly written, lookup the existing examples in match.pd for the X > % -Y transform) That's right. I have made changes to support signed types except for the INT_MIN / -1 case. > I'll note that as written the transform will not catch CST / (T)x or > (T)x / CST since > you'll not see conversions around constants. I'm not sure whether > using (convert[12]? ...) > or writing special patterns with INTEGER_CST operands is more convenient. > There is int_fits_type_p to check whether a constant will fit in a > type without truncation > or sign change. I see. I couldn't find an easy way to use (convert[12]? ...) in this case so I added 2 other patterns for the CST / (T)x and (T)x / CST cases. > When @0 and @1 do not have the same type there might still be a common type > that can be used and is smaller than 'type', it might be as simple as using > build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/). Thanks for the tip. I've made a similar change, but using either TREE_TYPE (@0) or TREE_TYPE (@1) instead of build_nonstandard_integer_type(). > > In the past there have been attempts to more globally narrow operations using > a new pass rather than using individual patterns. So for more complicated > cases > that might be the way to go. There's now also the ISEL pass which does > pre-RTL expansion transforms that need some global context and for example > can look at SSA uses. I had wanted to do something similar to handle all integral trunc_divs, but I'm not sure where/how to start. It seems out of my league at this moment. I'll gladly explore it after this change though! From f5f768b55f6cadcd9eba459561abfc1d7e28f94e Mon Sep 17 00:00:00 2001 From: Zhao Wei Liew Date: Sat, 19 Feb 2022 16:28:38 +0800 Subject: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y This pattern converts (trunc_div (convert X) (convert Y)) to (convert (trunc_div X Y)) when: 1. 
type, X, and Y are all INTEGRAL_TYPES with the same signedness. 2. type has TYPE_PRECISION at least as large as those of X and Y. 3. the result of the division is guaranteed to fit in either the type of X or the type of Y. This patch also adds corresponding patterns for CST / (type)Y and (type)X / CST. These patterns are useful as wider divisions are typically more expensive. Bootstrapped and regression tested on x86_64-pc-linux-gnu. Signed-off-by: Zhao Wei Liew PR tree-optimization/103855 gcc/ChangeLog: * match.pd: Add patterns for (type)X / (type)Y, CST / (type)Y, and (type)X / CST. gcc/testsuite/ChangeLog: * gcc.dg/ipa/pr91088.c: Fix a test regression. * gcc.dg/tree-ssa/divide-8.c: New test. --- gcc/match.pd | 57 gcc/testsuite/gcc.dg/ipa/pr91088.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/divide-8.c | 45 +++ 3 files changed, 104 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c diff --git a/gcc/match.pd b/gcc/match.pd index 8b44b5cc92c..b0bfb94f506 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -687,6 +687,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) (convert (trunc_mod @0 @1 +/* (type)X / (type)Y -> (type)(X / Y) + when all types are integral and have th
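(The user-visible effect of the new pattern, reduced to a sketch:

  unsigned long long
  f (unsigned int x, unsigned int y)
  {
    /* Both operands are zero-extended to the wider type; the pattern
       lets this fold to (unsigned long long) (x / y), so the divide
       happens in the cheaper 32-bit width.  */
    return (unsigned long long) x / y;
  }
)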
Re: [PATCH] c++/96765: warn when casting "this" to Derived* in Base ctor/dtor
On 3/14/22 02:36, Zhao Wei Liew via Gcc-patches wrote: This patch adds a warning when casting "this" to Derived* in the Base class constructor and destructor. I've added the warning under the -Wextra flag as I can't find a more appropriate flag for it. It's generally best to add a new warning flag if an existing one isn't a good match. Maybe -Wderived-cast-undefined? It seems like it could be part of -Wall. + if (current_function_decl + && (DECL_CONSTRUCTOR_P (current_function_decl) + || DECL_DESTRUCTOR_P (current_function_decl)) + && TREE_CODE (expr) == NOP_EXPR + && is_this_parameter (TREE_OPERAND (expr, 0))) I think the resolves_to_fixed_type_p function would be useful here; you're testing for a subset of the cases it handles. +warning_at(loc, OPT_Wextra, Missing space before '('. Jason
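(The undefined behaviour under discussion, reduced to a sketch:

  struct Derived;

  struct Base {
    Base ();
  };

  struct Derived : Base { };

  Base::Base ()
  {
    /* The Derived object does not exist yet while Base's constructor
       runs, so this cast is undefined; this is what the warning targets.  */
    static_cast<Derived *> (this);
  }
)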
Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings
On 17/03/2022 05:11, Martin Sebor wrote: As the GCC manual prominently states (and as I already pointed out) warnings are: We are indeed going around in circles. Hopefully someone else will pitch in and break the deadlock. Siddhesh
[BACKPORT] Backport PR fortran/96983 fix to GCC 11
Backport PR fortran/96983 patch to GCC 11. I applied a patch on the trunk on April 22nd, 2021 that fixes an issue (PR fortran/96983) where we could fail for 128-bit floating point types because we don't have a built-in function that is equivalent to llround for 128-bit integer types. Instead, the patch does a round operation followed by conversion to the appropriate integer size if we don't have the specialized round to specific integer type built-in functions. When I checked in the change, I was told to wait until after GCC 11.1 shipped before doing the backport. I forgot about the change until recently. Before checking it in, I did bootstraps and ran regression tests on: 1) little endian power9 using 128-bit IBM long double 2) little endian power9 using 128-bit IEEE long double 3) big endian power8 (both 64-bit and 32-bit) (and) 4) x86_64 native build There were no new regressions. The test gfortran.dg/pr96711.f90 now passes on PowerPC. In the 10 months or so since the change was made on the trunk, the code in build_round_expr did not change. 2022-03-16 Michael Meissner gcc/fortran/ PR fortran/96983 * trans-intrinsic.c (build_round_expr): If int type is larger than long long, do the round and convert to the integer type. Do not try to find a floating point type the exact size of the integer type. Backport patch from 2021-04-22 on the trunk. --- gcc/fortran/trans-intrinsic.c | 26 -- 1 file changed, 8 insertions(+), 18 deletions(-) diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index bd6e6039429..313440d7022 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -386,30 +386,20 @@ build_round_expr (tree arg, tree restype) argprec = TYPE_PRECISION (argtype); resprec = TYPE_PRECISION (restype); - /* Depending on the type of the result, choose the int intrinsic - (iround, available only as a builtin, therefore cannot use it for - __float128), long int intrinsic (lround family) or long long - intrinsic (llround). We might also need to convert the result - afterwards. */ + /* Depending on the type of the result, choose the int intrinsic (iround, + available only as a builtin, therefore cannot use it for _Float128), long + int intrinsic (lround family) or long long intrinsic (llround). If we + don't have an appropriate function that converts directly to the integer + type (such as kind == 16), just use ROUND, and then convert the result to + an integer. We might also need to convert the result afterwards. */ if (resprec <= INT_TYPE_SIZE && argprec <= LONG_DOUBLE_TYPE_SIZE) fn = builtin_decl_for_precision (BUILT_IN_IROUND, argprec); else if (resprec <= LONG_TYPE_SIZE) fn = builtin_decl_for_precision (BUILT_IN_LROUND, argprec); else if (resprec <= LONG_LONG_TYPE_SIZE) fn = builtin_decl_for_precision (BUILT_IN_LLROUND, argprec); - else if (resprec >= argprec && resprec == 128) -{ - /* Search for a real kind suitable as temporary for conversion. */ - int kind = -1; - for (int i = 0; kind < 0 && gfc_real_kinds[i].kind != 0; i++) - if (gfc_real_kinds[i].mode_precision >= resprec) - kind = gfc_real_kinds[i].kind; - if (kind < 0) - gfc_internal_error ("Could not find real kind with at least %d bits", - resprec); - arg = fold_convert (gfc_get_real_type (kind), arg); - fn = gfc_builtin_decl_for_float_kind (BUILT_IN_ROUND, kind); -} + else if (resprec >= argprec) +fn = builtin_decl_for_precision (BUILT_IN_ROUND, argprec); else gcc_unreachable (); -- 2.35.1 -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
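(In C terms, the fallback the patch implements amounts to the following sketch, with __int128 standing in for a 16-byte integer kind:

  __int128
  round_to_int128 (double x)
  {
    /* No llround-style builtin returns a 128-bit integer, so round in
       the floating-point domain first, then convert the result.  */
    return (__int128) __builtin_round (x);
  }
)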
[PATCH] [i386] Add extra cost for unsigned_load which may have a store forwarding stall issue.
This patch only handles pure SLP for by-value passed parameters, which has nothing to do with IPA, only with the psABI. For by-reference passed parameters, IPA is required. The patch is aggressive in determining STLF failure: any unaligned_load for a parm_decl passed on the stack is assumed to have an STLF stall issue. It could lose some perf where there's no such issue (1 vector_load vs. n scalar_loads + CTOR). According to the microbenchmark in the PR, the cost of an STLF failure is generally between 8 and 16 scalar loads on most recent Intel/AMD processors. gcc/ChangeLog: PR target/101908 * config/i386/i386.cc (ix86_load_maybe_stfs_p): New. (ix86_vector_costs::add_stmt_cost): Add extra cost for unsigned_load which may have store forwarding stall issue. * config/i386/i386.h (processor_costs): Add new member stfs. * config/i386/x86-tune-costs.h (i386_size_cost): Initialize stfs. (i386_cost, i486_cost, pentium_cost, lakemont_cost, pentiumpro_cost, geode_cost, k6_cost, athlon_cost, k8_cost, amdfam10_cost, bdver_cost, znver1_cost, znver2_cost, znver3_cost, skylake_cost, icelake_cost, alderlake_cost, btver1_cost, btver2_cost, pentium4_cost, nocano_cost, atom_cost, slm_cost, tremont_cost, intel_cost, generic_cost, core_cost): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr101908-1.c: New test. * gcc.target/i386/pr101908-2.c: New test. * gcc.target/i386/pr101908-3.c: New test. * gcc.target/i386/pr101908-v16hi.c: New test. * gcc.target/i386/pr101908-v16qi.c: New test. * gcc.target/i386/pr101908-v16sf.c: New test. * gcc.target/i386/pr101908-v16si.c: New test. * gcc.target/i386/pr101908-v2df.c: New test. * gcc.target/i386/pr101908-v2di.c: New test. * gcc.target/i386/pr101908-v2hi.c: New test. * gcc.target/i386/pr101908-v2qi.c: New test. * gcc.target/i386/pr101908-v2sf.c: New test. * gcc.target/i386/pr101908-v2si.c: New test. * gcc.target/i386/pr101908-v4df.c: New test. * gcc.target/i386/pr101908-v4di.c: New test. * gcc.target/i386/pr101908-v4hi.c: New test. * gcc.target/i386/pr101908-v4qi.c: New test. * gcc.target/i386/pr101908-v4sf.c: New test. * gcc.target/i386/pr101908-v4si.c: New test. * gcc.target/i386/pr101908-v8df-adl.c: New test. * gcc.target/i386/pr101908-v8df.c: New test. * gcc.target/i386/pr101908-v8di-adl.c: New test. * gcc.target/i386/pr101908-v8di.c: New test. * gcc.target/i386/pr101908-v8hi-adl.c: New test. * gcc.target/i386/pr101908-v8hi.c: New test. * gcc.target/i386/pr101908-v8qi-adl.c: New test. * gcc.target/i386/pr101908-v8qi.c: New test. * gcc.target/i386/pr101908-v8sf-adl.c: New test. * gcc.target/i386/pr101908-v8sf.c: New test. * gcc.target/i386/pr101908-v8si-adl.c: New test. * gcc.target/i386/pr101908-v8si.c: New test. 
--- gcc/config/i386/i386.cc | 51 +++ gcc/config/i386/i386.h| 1 + gcc/config/i386/x86-tune-costs.h | 28 ++ gcc/testsuite/gcc.target/i386/pr101908-1.c| 12 +++ gcc/testsuite/gcc.target/i386/pr101908-2.c| 12 +++ gcc/testsuite/gcc.target/i386/pr101908-3.c| 90 +++ .../gcc.target/i386/pr101908-v16hi.c | 6 ++ .../gcc.target/i386/pr101908-v16qi.c | 30 +++ .../gcc.target/i386/pr101908-v16sf.c | 6 ++ .../gcc.target/i386/pr101908-v16si.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v2df.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v2di.c | 7 ++ gcc/testsuite/gcc.target/i386/pr101908-v2hi.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v2qi.c | 16 gcc/testsuite/gcc.target/i386/pr101908-v2sf.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v2si.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v4df.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v4di.c | 7 ++ gcc/testsuite/gcc.target/i386/pr101908-v4hi.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v4qi.c | 18 gcc/testsuite/gcc.target/i386/pr101908-v4sf.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v4si.c | 6 ++ .../gcc.target/i386/pr101908-v8df-adl.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v8df.c | 6 ++ .../gcc.target/i386/pr101908-v8di-adl.c | 7 ++ gcc/testsuite/gcc.target/i386/pr101908-v8di.c | 7 ++ .../gcc.target/i386/pr101908-v8hi-adl.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v8hi.c | 6 ++ .../gcc.target/i386/pr101908-v8qi-adl.c | 22 + gcc/testsuite/gcc.target/i386/pr101908-v8qi.c | 22 + .../gcc.target/i386/pr101908-v8sf-adl.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v8sf.c | 6 ++ .../gcc.target/i386/pr101908-v8si-adl.c | 6 ++ gcc/testsuite/gcc.target/i386/pr101908-v8si.c | 6 ++ 34 files changed, 444 insertions(+) create mo
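(The situation being costed, reduced to a sketch:

  struct parms { int a, b, c, d; };

  int
  sum (struct parms p)  /* p passed by value, i.e. on the stack */
  {
    /* The caller typically writes p with narrow scalar stores; if pure
       SLP replaces the four member loads with one 16-byte vector load,
       that load overlaps several recent smaller stores and can miss
       store-to-load forwarding, stalling until the stores drain.  */
    return p.a + p.b + p.c + p.d;
  }
)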
[PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]
Hi, This patch adds V1TI mode into a new mode iterator used in vector comparison expands. With the patch, both built-ins and direct comparisons can generate the new P10 V1TI comparison instructions. Bootstrapped and tested on ppc64 Linux BE and LE with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. ChangeLog 2022-03-16 Haochen Gui gcc/ PR target/103316 * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET, RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT, RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI. * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10 V1TI instructions. (vec_cmp<mode><mode>): Set mode iterator to VEC_IC. (vec_cmpu<mode><mode>): Likewise. gcc/testsuite/ PR target/103316 * gcc.target/powerpc/pr103316.c: New. * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector __int128. patch.diff diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index 5d34c1bcfc9..fac7f43f438 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case RS6000_BIF_VCMPEQUH: case RS6000_BIF_VCMPEQUW: case RS6000_BIF_VCMPEQUD: -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple - folding produces worse code for 128-bit compares. */ +case RS6000_BIF_VCMPEQUT: fold_compare_helper (gsi, EQ_EXPR, stmt); return true; case RS6000_BIF_VCMPNEB: case RS6000_BIF_VCMPNEH: case RS6000_BIF_VCMPNEW: -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple - folding produces worse code for 128-bit compares. */ +case RS6000_BIF_VCMPNET: fold_compare_helper (gsi, NE_EXPR, stmt); return true; @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case RS6000_BIF_CMPGE_U4SI: case RS6000_BIF_CMPGE_2DI: case RS6000_BIF_CMPGE_U2DI: -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI - for now, because gimple folding produces worse code for 128-bit - compares. */ +case RS6000_BIF_CMPGE_1TI: +case RS6000_BIF_CMPGE_U1TI: fold_compare_helper (gsi, GE_EXPR, stmt); return true; @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case RS6000_BIF_VCMPGTUW: case RS6000_BIF_VCMPGTUD: case RS6000_BIF_VCMPGTSD: -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST - for now, because gimple folding produces worse code for 128-bit - compares. */ +case RS6000_BIF_VCMPGTUT: +case RS6000_BIF_VCMPGTST: fold_compare_helper (gsi, GT_EXPR, stmt); return true; @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) case RS6000_BIF_CMPLE_U4SI: case RS6000_BIF_CMPLE_2DI: case RS6000_BIF_CMPLE_U2DI: -/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI - for now, because gimple folding produces worse code for 128-bit - compares. 
*/ +case RS6000_BIF_CMPLE_1TI: +case RS6000_BIF_CMPLE_U1TI: fold_compare_helper (gsi, LE_EXPR, stmt); return true; diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index b87a742cca8..d88869cc8d0 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -26,6 +26,9 @@ ;; Vector int modes (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI]) +;; Vector int modes for comparison +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")]) + ;; 128-bit int modes (define_mode_iterator VEC_TI [V1TI TI]) @@ -533,10 +536,10 @@ (define_expand "vcond_mask_<mode><mode>" ;; For signed integer vectors comparison. (define_expand "vec_cmp<mode><mode>" - [(set (match_operand:VEC_I 0 "vint_operand") + [(set (match_operand:VEC_IC 0 "vint_operand") (match_operator 1 "signed_or_equality_comparison_operator" - [(match_operand:VEC_I 2 "vint_operand") - (match_operand:VEC_I 3 "vint_operand")]))] + [(match_operand:VEC_IC 2 "vint_operand") - (match_operand:VEC_IC 3 "vint_operand")]))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" { enum rtx_code code = GET_CODE (operands[1]); @@ -573,10 +576,10 @@ (define_expand "vec_cmp<mode><mode>" ;; For unsigned integer vectors comparison. (define_expand "vec_cmpu<mode><mode>" - [(set (match_operand:VEC_I 0 "vint_operand") + [(set (match_operand:VEC_IC 0 "vint_operand") (match_operator 1 "unsigned_or_equality_comparison_operator" - [(match_operand:VEC_I 2 "vint_operand") - (match_operand:VEC_I 3 "vint_operand")]))] + [(match_operand:VEC_IC 2 "vint_operand") + (match_operand:VEC_IC 3 "vint_operand")]))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
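(What the change enables at the source level; a sketch that assumes -mcpu=power10:

  #include <altivec.h>

  /* With V1TI in the comparison expanders, vec_cmpeq on 128-bit vector
     elements (and direct comparisons) can now select the Power10
     quadword compare instructions instead of a multi-insn fallback.  */
  vector bool __int128
  eq128 (vector signed __int128 a, vector signed __int128 b)
  {
    return vec_cmpeq (a, b);
  }
)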
Re: [PATCH v2] c++/96765: warn when casting Base* to Derived* in Base ctor/dtor
Thanks for the review. I've tested and uploaded a new patch v2 with the requested changes. On Thu, 17 Mar 2022 at 09:20, Jason Merrill wrote: > > On 3/14/22 02:36, Zhao Wei Liew via Gcc-patches wrote: > > > This patch adds a warning when casting "this" to Derived* in the Base > > class constructor and destructor. I've added the warning under the > > -Wextra flag as I can't find a more appropriate flag for it. > > It's generally best to add a new warning flag if an existing one isn't a > good match. Maybe -Wderived-cast-undefined? It seems like it could be > part of -Wall. Hmm, I agree. I've added a new -Wderived-cast-undefined flag for this. > > + if (current_function_decl > > + && (DECL_CONSTRUCTOR_P (current_function_decl) > > + || DECL_DESTRUCTOR_P (current_function_decl)) > > + && TREE_CODE (expr) == NOP_EXPR > > + && is_this_parameter (TREE_OPERAND (expr, 0))) > > I think the resolves_to_fixed_type_p function would be useful here; > you're testing for a subset of the cases it handles. Does the new patch look like how you intended it to? The new patch passes all regression tests and newly added tests. However, I don't fully understand the code for resolves_to_fixed_type_p() and fixed_type_or_null(), except that I see that they contain logic to return -1 when within a ctor/dtor. Thanks! From a8bbd4158f3311372b67d32816ce40d5613ae0d8 Mon Sep 17 00:00:00 2001 From: Zhao Wei Liew Date: Tue, 22 Feb 2022 16:03:17 +0800 Subject: [PATCH] c++: warn when casting Base* to Derived* in Base ctor/dtor [PR96765] Casting "this" in a base class constructor to a derived class type is undefined behaviour, but there is no warning when doing so. Add a warning for this. Signed-off-by: Zhao Wei Liew PR c++/96765 gcc/c-family/ChangeLog: * c.opt (-Wderived-cast-undefined): New option. gcc/cp/ChangeLog: * typeck.cc (build_static_cast_1): Add a warning when casting Base* to Derived* in Base constructor and destructor. gcc/testsuite/ChangeLog: * g++.dg/warn/Wderived-cast-undefined.C: New test. --- gcc/c-family/c.opt| 4 +++ gcc/cp/typeck.cc | 6 .../g++.dg/warn/Wderived-cast-undefined.C | 33 +++ 3 files changed, 43 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 9cfd2a6bc4e..1681395c529 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -598,6 +598,10 @@ C++ ObjC++ Var(warn_deprecated_enum_float_conv) Warning Warn about deprecated arithmetic conversions on operands where one is of enumeration type and the other is of a floating-point type. +Wderived-cast-undefined +C++ Var(warn_derived_cast_undefined) Warning LangEnabledBy(C++, Wall) +Warn about undefined casts from Base* to Derived* in the Base constructor or destructor. + Wdesignated-init C ObjC Var(warn_designated_init) Init(1) Warning Warn about positional initialization of structs requiring designated initializers. 
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index 516fa574ef6..0bbbc378806 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -8079,6 +8079,12 @@ build_static_cast_1 (location_t loc, tree type, tree expr, bool c_cast_p, { tree base; + if (TREE_CODE (expr) == NOP_EXPR + && resolves_to_fixed_type_p (TREE_OPERAND (expr, 0)) == -1) +warning_at (loc, OPT_Wderived_cast_undefined, + "invalid %<static_cast%> from type %qT to type %qT before the latter is constructed", + intype, type); + if (processing_template_decl) return expr; diff --git a/gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C b/gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C new file mode 100644 index 000..ea7388514dc --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C @@ -0,0 +1,33 @@ +// PR c++/96765 +// { dg-options "-Wderived-cast-undefined" } + +struct Derived; +struct Base { + Derived *x; + Derived *y; + Base(); + ~Base(); +}; + +struct Derived : Base {}; + +Base::Base() +: x(static_cast<Derived *>(this)), // { dg-warning "invalid 'static_cast'" } + y((Derived *)this) // { dg-warning "invalid 'static_cast'" } +{ + static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" } + (Derived *)this; // { dg-warning "invalid 'static_cast'" } +} + +Base::~Base() { + static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" } + (Derived *)this; // { dg-warning "invalid 'static_cast'" } +} + +struct Other { + Other() { +Base b; +static_cast<Derived *>(&b); +(Derived *)(&b); + } +}; -- 2.25.1