[PATCH v2] Performance/size improvement to single_use when matching GIMPLE.

2022-03-16 Thread Roger Sayle

Here's version 2 of my patch, incorporating Richard Biener's suggestions.
I've also done an analysis of the stage3 size of gimple-match.o on
x86_64-pc-linux-gnu, which I believe is dominated by debug information:
the .o file is 30MB in stage3, but only 4.8MB in stage2.  Before my
proposed patch gimple-match.o is 31385160 bytes.  The patch as proposed
yesterday (using a single loop in single_use) reduces that to 31105040
bytes, saving 280120 bytes.  The suggestion to remove the "inline"
keyword saves only 56 more bytes, but annotating ATTRIBUTE_PURE on the
function prototype was curiously effective, saving a further 1888 bytes.

before:   31385160
after:    31105040  saved 280120
-inline:  31104984  saved     56
+pure:    31103096  saved   1888
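The outsized effect of the prototype annotation is plausible: 'pure' promises that the result depends only on the arguments and on global memory, so the compiler may merge repeated calls and drop unused ones.  A minimal standalone sketch of the annotation style used in the patch (illustrative names, not the patch itself):

```cpp
#include <cassert>

/* Sketch of annotating a predicate's prototype as pure, with the
   definition following unannotated, mirroring the style of the patch
   below.  The macro guard keeps this portable off GNU compilers.  */
#if defined(__GNUC__)
#define MY_ATTRIBUTE_PURE __attribute__ ((pure))
#else
#define MY_ATTRIBUTE_PURE
#endif

/* Attribute on the prototype...  */
static bool is_small (int) MY_ATTRIBUTE_PURE;

/* ...definition separate.  'is_small' is an invented example.  */
static bool
is_small (int x)
{
  return x >= -8 && x < 8;
}
```

Because the compiler may assume two calls with the same argument return the same value, guards like `if (is_small (x) && ... && is_small (x))` can compile to a single call.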

I'll post two more patches later today that save an additional 53K
and 58K respectively (by tweaking genmatch), but the main issue seems
to be the size of the debugging info (or something else?).

This revised patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check with no new failures.
Ok for mainline?


2022-03-16  Roger Sayle  
Richard Biener  

gcc/ChangeLog
* gimple-match-head.cc (single_use): Implement inline using a
single loop.


Thanks in advance,
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 15 March 2022 09:18
> To: Roger Sayle 
> Cc: 'GCC Patches' ; 'Marc Glisse'
> 
> Subject: Re: [PATCH] Performance/size improvement to single_use when
> matching GIMPLE.
> 
> On Tue, 15 Mar 2022, Roger Sayle wrote:
> 
> >
> > This patch improves the implementation of single_use as used in code
> > generated from match.pd for patterns using :s.  The current
> > implementation contains the logic "has_zero_uses (t) || has_single_use (t)"
> > which performs a loop over the uses to first check if there are zero
> > non-debug uses [which is rare], then another loop over these uses to
> > check if there is exactly one non-debug use.  This can be better
> > implemented using a single loop.
> >
> > This function is currently inlined over 800 times in gimple-match.cc,
> > whose .o on x86_64-pc-linux-gnu is now up to 30 Mbytes, so speeding up
> > and shrinking this function should help offset the growth in match.pd
> > for GCC 12.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  Ok for mainline?
> 
> Note the intent of has_zero_uses () is even simpler - it's the case for
> when there's no SSA operand info on the stmt (no update_stmt called
> yet).  More precisely it wants to catch the case where the definition
> of the SSA name is not in the IL.
>
> I'm not sure if we want to twist the effective semantics at this point
> (I guess we do not want that), so the patch looks like an improvement.
> But may I ask to move the function out of line for even more savings?
> Just put it in gimple-match-head.cc and have it not declared inline.
> I think we may want to go as far and declare the function 'pure' using
> ATTRIBUTE_PURE.
> 
> >
> > 2022-03-15  Roger Sayle  
> >
> > gcc/ChangeLog
> > 	* gimple-match-head.cc (single_use): Implement inline using a
> > 	single loop.
> >
> > Thanks in advance,
> > Roger
> > --
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 74d5818..1c74d38 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -1160,10 +1160,28 @@ types_match (tree t1, tree t2)
    non-SSA_NAME (ie constants) and zero uses to cope with uses
    that aren't linked up yet.  */
 
-static inline bool
-single_use (tree t)
+static bool
+single_use (const_tree) ATTRIBUTE_PURE;
+
+static bool
+single_use (const_tree t)
 {
-  return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t);
+  if (TREE_CODE (t) != SSA_NAME)
+    return true;
+
+  /* Inline return has_zero_uses (t) || has_single_use (t);  */
+  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
+  const ssa_use_operand_t *ptr;
+  bool single = false;
+
+  for (ptr = head->next; ptr != head; ptr = ptr->next)
+    if (USE_STMT (ptr) && !is_gimple_debug (USE_STMT (ptr)))
+      {
+	if (single)
+	  return false;
+	single = true;
+      }
+  return true;
 }
 
 /* Return true if math operations should be canonicalized,
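The control flow of the new single_use can be modeled as a standalone function (names invented here, the real code walks the SSA immediate-use list): a single pass that succeeds for zero or one non-debug uses and bails out as soon as a second one is seen.

```cpp
#include <cassert>
#include <initializer_list>

/* Model of the single-loop single_use above: each element says whether
   that entry in the use list is a real (non-debug) use.  Returns true
   for zero or exactly one non-debug use, false otherwise.  */
static bool
single_use_model (std::initializer_list<bool> non_debug_uses)
{
  bool single = false;
  for (bool is_use : non_debug_uses)
    if (is_use)
      {
        if (single)
          return false;  /* second non-debug use: fail early */
        single = true;
      }
  return true;           /* zero or exactly one non-debug use */
}
```

The old `has_zero_uses (t) || has_single_use (t)` could walk the list twice in the common case; this shape visits each use at most once.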


Re: RFA: crc builtin functions & optimizations

2022-03-16 Thread Richard Biener via Gcc-patches
On Tue, Mar 15, 2022 at 4:15 PM Joern Rennecke
 wrote:
>
> On 15/03/2022, Richard Biener  wrote:
>
> > Why's this a new pass?  Every walk over all insns costs time.
>
> It should typically scan considerably less than all the insns.
>
> >  The pass
> > lacks any comments as to what CFG / stmt structure is matched.
>
> I've put a file in:
> config/riscv/tree-crc-doc.txt
>
> would this text be suitable to put in a comment block in tree-crc.cc ?

Yes, that would be a better place I think.

> >  From
> > a quick look it seems like it first(?) statically matches a stmt sequence
> > without considering intermediate stmts, so matching should be quite
> > fragile.
>
> It might be fragile inasmuch as it won't match when things change, but
> the matching has remained effective for seven years and across two
> architecture families with varying word sizes.
> And with regards to matching only what it's supposed to match, I believe
> I have checked all the data dependencies and phis so that it's definitely
> calculating a CRC.
>
> >  Why not match (sub-)expressions with the help of match.pd?
>
> Can you match a loop with match.pd ?

No, this is why I said (sub-)expression.  I'm mainly talking about

  tmp = (tmp >> 1) | (tmp << (sizeof (tmp) * (8 /*CHAR_BIT*/) - 1));
  if ((long)tmp < 0)

for example - one key bit of the CRC seems to be a comparison, so
you'd match that with sth like

(match (crc_compare @0)
 (lt (bit_ior (rshift @0 integer_onep) (lshift @0 INTEGER_CST@1)) integer_zerop)
 (if (compare_tree_int (@1, TYPE_SIZE_UNIT (TREE_TYPE (@0)) * 8 - 1

where you can add alternative expression forms.  You can then use
the generated match function from the pass.  See for example the
tree-ssa-forwprop.cc use of gimple_ctz_table_index.
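For concreteness, the C-level idiom the (match ...) sketch above is aimed at looks like the following standalone model (invented names, not code from either patch):

```cpp
#include <cassert>
#include <cstdint>

/* The rotate-right-by-one expressed as shifts plus OR, as it appears
   in bit-reversed CRC loops.  */
static uint32_t
rotate_right_1 (uint32_t tmp)
{
  return (tmp >> 1) | (tmp << (sizeof (tmp) * 8 - 1));
}

/* The sign test that the pattern's (lt ... integer_zerop) captures:
   "is the top bit set after the rotate?"  */
static bool
top_bit_set_after_rotate (uint32_t tmp)
{
  return (int32_t) rotate_right_1 (tmp) < 0;
}
```

A generated match function for such a pattern lets the pass ask "is this statement that comparison?" without hand-walking the operand trees.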

> > Any reason why you match CRC before early inlinig and thus even when
> > not optimizing?  Matching at least after early FRE/DCE/DSE would help
> > to get rid of abstraction and/or memory temporary uses.
>
> I hadn't originally placed it there, but I believe benefits include:
> - Getting rid of the loop without having to actively delete it in the
>   crc pass (this also might be safer, as we just have to make sure we
>   are computing the CRC, and DCE will determine if there is any
>   ancillary result left, and only delete the loop if it's really
>   dead).
> - The optimized function is available for inlining.

The canonical place to transform loops into builtins would be loop distribution.
Another place would be final value replacement since you basically replace
the reduction result with a call to the builtin, but I think
loop-distribution is
the better overall place.  See how we match strlen() there.
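As a point of reference, the loop shape such recognition targets is the classic bit-at-a-time CRC reduction.  This is the generic textbook formulation (reflected CRC-32, polynomial 0xEDB88320), shown for illustration only, not code from the proposed pass:

```cpp
#include <cassert>
#include <cstdint>

/* One byte of a bit-reversed CRC-32: fold the byte in, then do eight
   conditional shift/xor steps.  A loop of this shape is what a
   CRC-recognition pass would replace with a builtin or table lookup.  */
static uint32_t
crc32_byte (uint32_t crc, uint8_t byte)
{
  crc ^= byte;
  for (int i = 0; i < 8; ++i)
    crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
  return crc;
}
```

With the usual 0xFFFFFFFF initial value and final inversion this matches the zlib-style CRC-32.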

Richard.


[PATCH] [i386] Don't fold __builtin_ia32_blendvpd w/o sse4.2.

2022-03-16 Thread liuhongt via Gcc-patches
__builtin_ia32_blendvpd is defined under sse4.1 and gimple folded
to ((v2di) c) < 0 ? b : a where vec_cmpv2di is under sse4.2 w/o which
it's veclowered to scalar operations and not combined back in rtl.
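Per lane, the fold described above computes a sign-based select.  A scalar model of the semantics (illustrative only, not GCC or intrinsic code):

```cpp
#include <cassert>
#include <cmath>

/* Scalar model of one lane of blendvpd: select b when the sign bit of
   the mask element c is set, else a.  The gimple fold rewrites this as
   ((v2di) c) < 0 ? b : a, and that 64-bit vector compare (pcmpgtq) is
   only available with SSE4.2, hence the guard in the patch.  */
static double
blendv_lane (double a, double b, double c)
{
  return std::signbit (c) ? b : a;
}
```

Note that only the sign bit of the mask matters, so even -0.0 selects b.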

Bootstrap and regtest on x86_64-pc-linux-gnu{-m32,}.
Ready to push to main trunk.

gcc/ChangeLog:

PR target/104946
* config/i386/i386-builtin.def (BDESC): Add
CODE_FOR_sse4_1_blendvpd for IX86_BUILTIN_BLENDVPD.
* config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
__builtin_ia32_blendvpd w/o sse4.2

gcc/testsuite/ChangeLog:

* gcc.target/i386/sse4_1-blendvpd-1.c: New test.
---
 gcc/config/i386/i386-builtin.def  |  2 +-
 gcc/config/i386/i386.cc   |  8 +++-
 gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c | 11 +++
 3 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index ea327555639..b410102614d 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -906,7 +906,7 @@ BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
CODE_FOR_ssse3_palignrdi,
 /* SSE4.1 */
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendpd, 
"__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, (int) 
V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendps, 
"__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, (int) 
V4SF_FTYPE_V4SF_V4SF_INT)
-BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_blendvpd", 
IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendvpd, 
"__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) 
V2DF_FTYPE_V2DF_V2DF_V2DF)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_blendvps", 
IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", 
IX86_BUILTIN_DPPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", 
IX86_BUILTIN_DPPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index d77ad83e437..5a561966eb4 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -18368,10 +18368,16 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
}
   break;
 
+case IX86_BUILTIN_BLENDVPD:
+  /* blendvpd is under sse4.1 but pcmpgtq is under sse4.2,
+w/o sse4.2, it's veclowered to scalar operations and
+not combined back.  */
+  if (!TARGET_SSE4_2)
+   break;
+  /* FALLTHRU.  */
 case IX86_BUILTIN_PBLENDVB128:
 case IX86_BUILTIN_PBLENDVB256:
 case IX86_BUILTIN_BLENDVPS:
-case IX86_BUILTIN_BLENDVPD:
 case IX86_BUILTIN_BLENDVPS256:
 case IX86_BUILTIN_BLENDVPD256:
   gcc_assert (n_args == 3);
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c 
b/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c
new file mode 100644
index 000..c25d3fbcbd4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-msse4.1 -O2 -mno-sse4.2" } */
+/* { dg-final { scan-assembler-times {(?n)blendvpd[ \t]+%xmm[0-9]+} 1 } } */
+
+#include 
+
+__m128d
+foo (__m128d a, __m128d b, __m128d c)
+{
+  return _mm_blendv_pd (a, b, c);
+}
-- 
2.18.1



Re: [PATCH v2] Performance/size improvement to single_use when matching GIMPLE.

2022-03-16 Thread Richard Biener via Gcc-patches
On Wed, 16 Mar 2022, Roger Sayle wrote:

> 
> Here's version 2 of my patch, incorporating Richard Biener's suggestions.
> I've also done an analysis of the stage3 size of gimple-match.o on
> x86_64-pc-linux-gnu, which I believe is dominated by debug information:
> the .o file is 30MB in stage3, but only 4.8MB in stage2.  Before my
> proposed patch gimple-match.o is 31385160 bytes.  The patch as proposed
> yesterday (using a single loop in single_use) reduces that to 31105040
> bytes, saving 280120 bytes.  The suggestion to remove the "inline"
> keyword saves only 56 more bytes, but annotating ATTRIBUTE_PURE on the
> function prototype was curiously effective, saving a further 1888 bytes.
> 
> before:   31385160
> after:    31105040  saved 280120
> -inline:  31104984  saved     56
> +pure:    31103096  saved   1888
> 
> I'll post two more patches later today that save an additional 53K
> and 58K respectively (by tweaking genmatch), but the main issue seems
> to be the size of the debugging info (or something else?).

You can always use the 'size' tool and report the size of the text
section only ... especially interesting for folks are also
compile-times of stage2 gimple-match.cc since that's built by
the -O0 compiled checking-enabled stage1 compiler during bootstrap.

> This revised patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check with no new failures.
> Ok for mainline?

OK.

Thanks,
Richard.

> 
> 2022-03-16  Roger Sayle  
>   Richard Biener  
> 
> gcc/ChangeLog
>   * gimple-match-head.cc (single_use): Implement inline using a
>   single loop.
> 
> 
> Thanks in advance,
> Roger
> --
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: 15 March 2022 09:18
> > To: Roger Sayle 
> > Cc: 'GCC Patches' ; 'Marc Glisse'
> > 
> > Subject: Re: [PATCH] Performance/size improvement to single_use when
> > matching GIMPLE.
> > 
> > On Tue, 15 Mar 2022, Roger Sayle wrote:
> > 
> > >
> > > This patch improves the implementation of single_use as used in
> > > code generated from match.pd for patterns using :s.  The current
> > > implementation contains the logic "has_zero_uses (t) ||
> > > has_single_use (t)" which performs a loop over the uses to first
> > > check if there are zero non-debug uses [which is rare], then
> > > another loop over these uses to check if there is exactly one
> > > non-debug use.  This can be better implemented using a single loop.
> > >
> > > This function is currently inlined over 800 times in gimple-match.cc,
> > > whose .o on x86_64-pc-linux-gnu is now up to 30 Mbytes, so speeding up
> > > and shrinking this function should help offset the growth in match.pd
> > > for GCC 12.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check with no new failures.  Ok for mainline?
> > 
> > Note the intent of has_zero_uses () is even simpler - it's the case
> > for when there's no SSA operand info on the stmt (no update_stmt
> > called yet).  More precisely it wants to catch the case where the
> > definition of the SSA name is not in the IL.
> >
> > I'm not sure if we want to twist the effective semantics at this
> > point (I guess we do not want that), so the patch looks like an
> > improvement.  But may I ask to move the function out of line for
> > even more savings?  Just put it in gimple-match-head.cc and have it
> > not declared inline.  I think we may want to go as far and declare
> > the function 'pure' using ATTRIBUTE_PURE.
> > 
> > >
> > > 2022-03-15  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > 	* gimple-match-head.cc (single_use): Implement inline using a
> > > 	single loop.
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-optimization/104942: Retain sizetype conversions till the end

2022-03-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 16, 2022 at 01:56:50PM +0530, Siddhesh Poyarekar wrote:
> Retain the sizetype alloc_object_size to guarantee the assertion in
> size_for_offset and to avoid adding a conversion there.  nop conversions
> are eliminated at the end anyway in dynamic object size computation.
> 
> gcc/ChangeLog:
> 
>   tree-optimization/104942
>   * tree-object-size.cc (alloc_object_size): Remove STRIP_NOPS.
> 
> gcc/testsuite/ChangeLog:
> 
>   tree-optimization/104942
>   * gcc.dg/builtin-dynamic-object-size-0.c (alloc_func_long,
>   test_builtin_malloc_long): New functions.
>   (main): Use it.
> 
> Signed-off-by: Siddhesh Poyarekar 

Ok, thanks.

Jakub



[PATCH] tree-optimization/104942: Retain sizetype conversions till the end

2022-03-16 Thread Siddhesh Poyarekar
Retain the sizetype alloc_object_size to guarantee the assertion in
size_for_offset and to avoid adding a conversion there.  nop conversions
are eliminated at the end anyway in dynamic object size computation.
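A simplified model of the size_for_offset computation whose assertion is being guarded (an invented helper, not GCC's actual implementation): the object size seen through ptr + off is the allocated size minus the offset, clamped at zero.

```cpp
#include <cassert>
#include <cstddef>

/* Model of computing the remaining object size after applying a
   byte offset to a pointer into an allocation of alloc_size bytes.
   Keeping both operands in the same unsigned size type is the kind
   of invariant the retained sizetype conversion guarantees.  */
static size_t
size_for_offset_model (size_t alloc_size, size_t off)
{
  return off <= alloc_size ? alloc_size - off : 0;
}
```

This matches the new testcase's expectation that `__builtin_dynamic_object_size (a + 0, 0)` for a 42-byte allocation is 42.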

gcc/ChangeLog:

tree-optimization/104942
* tree-object-size.cc (alloc_object_size): Remove STRIP_NOPS.

gcc/testsuite/ChangeLog:

tree-optimization/104942
* gcc.dg/builtin-dynamic-object-size-0.c (alloc_func_long,
test_builtin_malloc_long): New functions.
(main): Use it.

Signed-off-by: Siddhesh Poyarekar 
---

Testing:

- i686 build and check
- x86_64 bootstrap build and check
- --with-build-config=bootstrap-ubsan

 .../gcc.dg/builtin-dynamic-object-size-0.c| 22 +++
 gcc/tree-object-size.cc   |  5 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index dd8dc99a580..2fca0a9c5b4 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -4,6 +4,15 @@
 typedef __SIZE_TYPE__ size_t;
 #define abort __builtin_abort
 
+void *
+__attribute__ ((alloc_size (1)))
+__attribute__ ((__nothrow__ , __leaf__))
+__attribute__ ((noinline))
+alloc_func_long (long sz)
+{
+  return __builtin_malloc (sz);
+}
+
 void *
 __attribute__ ((alloc_size (1)))
 __attribute__ ((__nothrow__ , __leaf__))
@@ -145,6 +154,16 @@ test_builtin_malloc_condphi5 (size_t sz, int cond, char *c)
   return ret;
 }
 
+long
+__attribute__ ((noinline))
+test_builtin_malloc_long (long sz, long off)
+{
+  char *a = alloc_func_long (sz);
+  char *dest = a + off;
+  long ret = __builtin_dynamic_object_size (dest, 0);
+  return ret;
+}
+
 /* Calloc-like allocator.  */
 
 size_t
@@ -419,6 +438,9 @@ main (int argc, char **argv)
 FAIL ();
   if (test_builtin_malloc_condphi5 (128, 0, argv[0]) != -1)
 FAIL ();
+  long x = 42;
+  if (test_builtin_malloc_long (x, 0) != x)
+FAIL ();
   if (test_calloc (2048, 4) != 2048 * 4)
 FAIL ();
   if (test_builtin_calloc (2048, 8) != 2048 * 8)
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 8be0df6ba40..9728f79da75 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -784,10 +784,7 @@ alloc_object_size (const gcall *call, int object_size_type)
   else if (arg1 >= 0)
 bytes = fold_convert (sizetype, gimple_call_arg (call, arg1));
 
-  if (bytes)
-return STRIP_NOPS (bytes);
-
-  return size_unknown (object_size_type);
+  return bytes ? bytes : size_unknown (object_size_type);
 }
 
 
-- 
2.35.1



Re: [PATCH] [i386] Don't fold __builtin_ia32_blendvpd w/o sse4.2.

2022-03-16 Thread Richard Biener via Gcc-patches
On Wed, Mar 16, 2022 at 9:18 AM liuhongt via Gcc-patches
 wrote:
>
> __builtin_ia32_blendvpd is defined under sse4.1 and gimple folded
> to ((v2di) c) < 0 ? b : a where vec_cmpv2di is under sse4.2 w/o which
> it's veclowered to scalar operations and not combined back in rtl.
>
> Bootstrap and regtest on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to main trunk.

OK.

Richard.

> gcc/ChangeLog:
>
> PR target/104946
> * config/i386/i386-builtin.def (BDESC): Add
> CODE_FOR_sse4_1_blendvpd for IX86_BUILTIN_BLENDVPD.
> * config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
> __builtin_ia32_blendvpd w/o sse4.2
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/sse4_1-blendvpd-1.c: New test.
> ---
>  gcc/config/i386/i386-builtin.def  |  2 +-
>  gcc/config/i386/i386.cc   |  8 +++-
>  gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c | 11 +++
>  3 files changed, 19 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c
>
> diff --git a/gcc/config/i386/i386-builtin.def 
> b/gcc/config/i386/i386-builtin.def
> index ea327555639..b410102614d 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -906,7 +906,7 @@ BDESC (OPTION_MASK_ISA_SSSE3 | OPTION_MASK_ISA_MMX, 0, 
> CODE_FOR_ssse3_palignrdi,
>  /* SSE4.1 */
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendpd, 
> "__builtin_ia32_blendpd", IX86_BUILTIN_BLENDPD, UNKNOWN, (int) 
> V2DF_FTYPE_V2DF_V2DF_INT)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendps, 
> "__builtin_ia32_blendps", IX86_BUILTIN_BLENDPS, UNKNOWN, (int) 
> V4SF_FTYPE_V4SF_V4SF_INT)
> -BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, 
> "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) 
> V2DF_FTYPE_V2DF_V2DF_V2DF)
> +BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendvpd, 
> "__builtin_ia32_blendvpd", IX86_BUILTIN_BLENDVPD, UNKNOWN, (int) 
> V2DF_FTYPE_V2DF_V2DF_V2DF)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, 
> "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) 
> V4SF_FTYPE_V4SF_V4SF_V4SF)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, 
> "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, (int) 
> V2DF_FTYPE_V2DF_V2DF_INT)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, 
> "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) 
> V4SF_FTYPE_V4SF_V4SF_INT)
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d77ad83e437..5a561966eb4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -18368,10 +18368,16 @@ ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> }
>break;
>
> +case IX86_BUILTIN_BLENDVPD:
> +  /* blendvpd is under sse4.1 but pcmpgtq is under sse4.2,
> +w/o sse4.2, it's veclowered to scalar operations and
> +not combined back.  */
> +  if (!TARGET_SSE4_2)
> +   break;
> +  /* FALLTHRU.  */
>  case IX86_BUILTIN_PBLENDVB128:
>  case IX86_BUILTIN_PBLENDVB256:
>  case IX86_BUILTIN_BLENDVPS:
> -case IX86_BUILTIN_BLENDVPD:
>  case IX86_BUILTIN_BLENDVPS256:
>  case IX86_BUILTIN_BLENDVPD256:
>gcc_assert (n_args == 3);
> diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c 
> b/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c
> new file mode 100644
> index 000..c25d3fbcbd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse4_1-blendvpd-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse4.1 -O2 -mno-sse4.2" } */
> +/* { dg-final { scan-assembler-times {(?n)blendvpd[ \t]+%xmm[0-9]+} 1 } } */
> +
> +#include 
> +
> +__m128d
> +foo (__m128d a, __m128d b, __m128d c)
> +{
> +  return _mm_blendv_pd (a, b, c);
> +}
> --
> 2.18.1
>


Re: [PATCH] aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]

2022-03-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> Hi!
>
> We unshare all RTL created during expansion, but when
> aarch64_load_symref_appropriately is called after expansion like in the
> following testcases, we use imm in both HIGH and LO_SUM operands.
> If imm is some RTL that shouldn't be shared like a non-sharable CONST,
> we get at least with --enable-checking=rtl a checking ICE, otherwise might
> just get silently wrong code.
>
> The following patch fixes that by copying it if it can't be shared.
>
> Bootstrapped/regtested on aarch64-linux, ok for trunk?
>
> 2022-03-15  Jakub Jelinek  
>
>   PR target/104910
>   * config/aarch64/aarch64.cc (aarch64_load_symref_appropriately): Copy
>   imm rtx.
>
>   * gcc.dg/pr104910.c: New test.

OK, thanks.  I guess the REG_EQUAL notes might need the same treatment,
but that's probably a separate patch.

Richard
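The hazard being fixed can be modeled outside GCC: if the HIGH and LO_SUM operands alias one mutable node, an in-place edit through one parent is silently visible through the other; copying before the second use (cf. copy_rtx) avoids that.  A toy sketch with invented types:

```cpp
#include <cassert>

/* Stand-in for an RTL expression node.  */
struct node { int value; };

/* Stand-in for the HIGH/LO_SUM pair built from one immediate.  */
struct pair_of_exprs { node *high_op; node *lo_sum_op; };

/* Unsafe: both operands alias the same node, like using imm twice.  */
static pair_of_exprs
make_shared (node *imm)
{
  return { imm, imm };
}

/* Safe: one reference gets its own copy first, like copy_rtx (imm).  */
static pair_of_exprs
make_unshared (node *imm, node *storage)
{
  *storage = *imm;
  return { storage, imm };
}
```

With sharing, "fixing up" one operand corrupts the other; with the copy, each operand can be rewritten independently.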

> --- gcc/config/aarch64/aarch64.cc.jj	2022-02-22 10:38:02.404689359 +0100
> +++ gcc/config/aarch64/aarch64.cc 2022-03-14 12:42:00.218975192 +0100
> @@ -3971,7 +3971,7 @@ aarch64_load_symref_appropriately (rtx d
>   if (can_create_pseudo_p ())
> tmp_reg = gen_reg_rtx (mode);
>  
> - emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
> + emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
>   emit_insn (gen_add_losym (dest, tmp_reg, imm));
>   return;
>}
> --- gcc/testsuite/gcc.dg/pr104910.c.jj	2022-03-14 12:46:20.983327114 +0100
> +++ gcc/testsuite/gcc.dg/pr104910.c   2022-03-14 12:46:06.064535794 +0100
> @@ -0,0 +1,14 @@
> +/* PR target/104910 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fno-forward-propagate" } */
> +/* { dg-additional-options "-fstack-protector-all" { target fstack_protector } } */
> +
> +void
> +bar (void);
> +
> +void
> +foo (int x)
> +{
> +  if (x)
> +bar ();
> +}
>
>   Jakub


OpenACC privatization diagnostics vs. 'assert' [PR102841]

2022-03-16 Thread Thomas Schwinge
Hi!

On 2021-05-21T21:29:19+0200, I wrote:
> I've pushed "[OpenACC privatization] Largely extend diagnostics and
> corresponding testsuite coverage [PR90115]" to master branch in commit
> 11b8286a83289f5b54e813f14ff56d730c3f3185

Pushed to master branch commit ab46fc7c3bf01337ea4554f08f4f6b0be8173557
"OpenACC privatization diagnostics vs. 'assert' [PR102841]", see
attached.


Grüße
 Thomas


> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
> @@ -1,6 +1,11 @@
> -/* { dg-do run } */
> -
>  /* Test if, if_present clauses on host_data construct.  */
> +
> +/* { dg-additional-options "-fopt-info-all-omp" }
> +   { dg-additional-options "--param=openacc-privatization=noisy" }
> +   { dg-additional-options "-foffload=-fopt-info-all-omp" }
> +   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
> +   for testing/documenting aspects of that functionality.  */
> +
>  /* C/C++ variant of 'libgomp.oacc-fortran/host_data-5.F90' */
>
>  #include 
> @@ -14,15 +19,19 @@ foo (float *p, intptr_t host_p, int cond)
>  #pragma acc data copyin(host_p)
>{
>  #pragma acc host_data use_device(p) if_present
> +/* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't 
> candidate for adjusting OpenACC privatization level: not addressable} "" { 
> target *-*-* } .-1 } */
>  /* p not mapped yet, so it will be equal to the host pointer.  */
>  assert (p == (float *) host_p);
>
>  #pragma acc data copy(p[0:100])
> +/* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't 
> candidate for adjusting OpenACC privatization level: not addressable} "" { 
> target *-*-* } .-1 } */
> +/* { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for 
> adjusting OpenACC privatization level: not addressable} "" { target *-*-* } 
> .-2 } */
>  {
>/* Not inside a host_data construct, so p is still the host pointer.  
> */
>assert (p == (float *) host_p);
>
>  #pragma acc host_data use_device(p)
> +  /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't 
> candidate for adjusting OpenACC privatization level: not addressable} "" { 
> target *-*-* } .-1 } */
>{
>  #if ACC_MEM_SHARED
>   assert (p == (float *) host_p);
> @@ -33,6 +42,7 @@ foo (float *p, intptr_t host_p, int cond)
>}
>
>  #pragma acc host_data use_device(p) if_present
> +  /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't 
> candidate for adjusting OpenACC privatization level: not addressable} "" { 
> target *-*-* } .-1 } */
>{
>  #if ACC_MEM_SHARED
>   assert (p == (float *) host_p);
> @@ -43,6 +53,8 @@ foo (float *p, intptr_t host_p, int cond)
>}
>
>  #pragma acc host_data use_device(p) if(cond)
> +  /* { dg-note {variable 'host_p\.[0-9]+' declared in block isn't 
> candidate for adjusting OpenACC privatization level: not addressable} "" { 
> target *-*-* } .-1 } */
> +  /* { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate 
> for adjusting OpenACC privatization level: not addressable} "" { target { ! 
> openacc_host_selected } } .-2 } */
>{
>  #if ACC_MEM_SHARED
>   assert (p == (float *) host_p);


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From ab46fc7c3bf01337ea4554f08f4f6b0be8173557 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 16 Mar 2022 08:02:39 +0100
Subject: [PATCH] OpenACC privatization diagnostics vs. 'assert' [PR102841]

It's an orthogonal concern why these diagnostics do appear at all for
non-offloaded OpenACC constructs (where they're not relevant at all); PR90115.

Depending on how 'assert' is implemented, it may cause temporaries to be
created, and/or may lower into 'COND_EXPR's, and
'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'.

Fix-up for commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and
corresponding testsuite coverage [PR90115]".

	PR testsuite/102841
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
index 66501e614fb..50b4fc264d0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-7.c
@@ -4,7 +4,9 @@
{ dg-additional-options "--param=openacc-privatization=noisy" }
{ dg-additional-options "-foffload=-fopt-info-all-omp" }
{ dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
-   for testing/document

Re: [PATCH] aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]

2022-03-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 16, 2022 at 09:10:35AM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > We unshare all RTL created during expansion, but when
> > aarch64_load_symref_appropriately is called after expansion like in the
> > following testcases, we use imm in both HIGH and LO_SUM operands.
> > If imm is some RTL that shouldn't be shared like a non-sharable CONST,
> > we get at least with --enable-checking=rtl a checking ICE, otherwise might
> > just get silently wrong code.
> >
> > The following patch fixes that by copying it if it can't be shared.
> >
> > Bootstrapped/regtested on aarch64-linux, ok for trunk?
> >
> > 2022-03-15  Jakub Jelinek  
> >
> > PR target/104910
> > * config/aarch64/aarch64.cc (aarch64_load_symref_appropriately): Copy
> > imm rtx.
> >
> > * gcc.dg/pr104910.c: New test.
> 
> OK, thanks.  I guess the REG_EQUAL notes might need the same treatment,
> but that's probably a separate patch.

If it can be invoked after expansion, yes.  Though, I think e.g. having the
TLS related loads/stores done after expansion is unlikely.
Anyway, --enable-checking=rtl ought to catch this.

Jakub



Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-16 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle  
> wrote:
>> I've been wondering about the possible performance/missed-optimization
>> impact of my patch for PR middle-end/98420 and similar IEEE correctness
>> fixes that disable constant folding optimizations when worrying about -0.0.
>> In the common situation where the floating point result is used by a
>> FP comparison, there's no distinction between +0.0 and -0.0, so some
>> HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.
>>
>> Consider the following interesting example:
>>
>> int foo(int x, double y) {
>> return (x * 0.0) < y;
>> }
>>
>> Although we know that x (when converted to double) can't be NaN or Inf,
>> we still worry that for negative values of x, (x * 0.0) may be -0.0
>> and so perform the multiplication at run-time.  But in this case, the
>> result of the comparison (-0.0 < y) will be exactly the same as (+0.0 < y)
>> for any y, hence the above may be safely constant folded to "0.0 < y"
>> avoiding the multiplication at run-time.
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check with no new failures, and allows GCC to continue to
>> optimize cases that we optimized in GCC 11 (without regard to correctness).
>> Ok for mainline?
>
> Isn't that something that gimple-ssa-backprop.c is designed to handle?
> I wonder if you can see whether the signed zero speciality can be
> retrofitted there?  It currently tracks "sign does not matter", so
> possibly another state, "sign of zero does not matter" could be
> introduced there.

I agree that would be a nice thing to have FWIW.  gimple-ssa-backprop.c
was added to avoid regressions in one specific fold-const -> match.pd
move, but it was supposed to be suitable for other similar things
in future.

Thanks,
Richard


Re: [PATCH] [i386] Add extra cost for unsigned_load which may have stall forward issue.

2022-03-16 Thread Richard Biener via Gcc-patches
On Wed, Mar 16, 2022 at 3:19 AM liuhongt  wrote:
>
> This patch only handle pure-slp for by-value passed parameter which
> has nothing to do with IPA but psABI. For by-reference passed
> parameter IPA is required.
>
> The patch is aggressive in determining STLF failure, any
> unaligned_load for parm_decl passed by stack is thought to have STLF
> stall issue. It could lose some perf where there's no such issue(1
> vector_load vs n scalar_load + CTOR).
>
> According to microbenchmark in PR, cost of STLF failure is generally
> between 8 scalar_loads and 16 scalar loads on most latest Intel/AMD
> processors.
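For readers unfamiliar with the hazard, a minimal sketch of the code shape being costed (names invented here for illustration; the real testcases are in the PR):

```cpp
// Hypothetical illustration of a store-to-load-forwarding (STLF)
// hazard: a struct passed by value is written to the stack in scalar
// pieces by the caller.  If the vectorizer then reads the whole object
// with one wide unaligned vector load, that load overlaps several
// smaller pending stores and cannot be forwarded, stalling until the
// stores retire.
struct dbl_pair { double a, b; };

double
sum_pair (struct dbl_pair p)	// p arrives via two 8-byte stack stores
{
  // A vectorized p.a + p.b would read both doubles with one 16-byte
  // load straddling the two scalar stores above it.
  return p.a + p.b;
}
```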
>
> gcc/ChangeLog:
>
> PR target/101908
> * config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
> (ix86_vector_costs::add_stmt_cost): Add extra cost for
> unsigned_load which may have store forwarding stall issue.
> * config/i386/i386.h (processor_costs): Add new member
> stfs.
> * config/i386/x86-tune-costs.h (i386_size_cost): Initialize
> stfs.
> (i386_cost, i486_cost, pentium_cost, lakemont_cost,
> pentiumpro_cost, geode_cost, k6_cost, athlon_cost, k8_cost,
> amdfam10_cost, bdver_cost, znver1_cost, znver2_cost,
> znver3_cost, skylake_cost, icelake_cost, alderlake_cost,
> btver1_cost, btver2_cost, pentium4_cost, nocano_cost,
> atom_cost, slm_cost, tremont_cost, intel_cost, generic_cost,
> core_cost): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr101908-1.c: New test.
> * gcc.target/i386/pr101908-2.c: New test.
> * gcc.target/i386/pr101908-3.c: New test.
> * gcc.target/i386/pr101908-v16hi.c: New test.
> * gcc.target/i386/pr101908-v16qi.c: New test.
> * gcc.target/i386/pr101908-v16sf.c: New test.
> * gcc.target/i386/pr101908-v16si.c: New test.
> * gcc.target/i386/pr101908-v2df.c: New test.
> * gcc.target/i386/pr101908-v2di.c: New test.
> * gcc.target/i386/pr101908-v2hi.c: New test.
> * gcc.target/i386/pr101908-v2qi.c: New test.
> * gcc.target/i386/pr101908-v2sf.c: New test.
> * gcc.target/i386/pr101908-v2si.c: New test.
> * gcc.target/i386/pr101908-v4df.c: New test.
> * gcc.target/i386/pr101908-v4di.c: New test.
> * gcc.target/i386/pr101908-v4hi.c: New test.
> * gcc.target/i386/pr101908-v4qi.c: New test.
> * gcc.target/i386/pr101908-v4sf.c: New test.
> * gcc.target/i386/pr101908-v4si.c: New test.
> * gcc.target/i386/pr101908-v8df-adl.c: New test.
> * gcc.target/i386/pr101908-v8df.c: New test.
> * gcc.target/i386/pr101908-v8di-adl.c: New test.
> * gcc.target/i386/pr101908-v8di.c: New test.
> * gcc.target/i386/pr101908-v8hi-adl.c: New test.
> * gcc.target/i386/pr101908-v8hi.c: New test.
> * gcc.target/i386/pr101908-v8qi-adl.c: New test.
> * gcc.target/i386/pr101908-v8qi.c: New test.
> * gcc.target/i386/pr101908-v8sf-adl.c: New test.
> * gcc.target/i386/pr101908-v8sf.c: New test.
> * gcc.target/i386/pr101908-v8si-adl.c: New test.
> * gcc.target/i386/pr101908-v8si.c: New test.
> ---
>  gcc/config/i386/i386.cc   | 51 +++
>  gcc/config/i386/i386.h|  1 +
>  gcc/config/i386/x86-tune-costs.h  | 28 ++
>  gcc/testsuite/gcc.target/i386/pr101908-1.c| 12 +++
>  gcc/testsuite/gcc.target/i386/pr101908-2.c| 12 +++
>  gcc/testsuite/gcc.target/i386/pr101908-3.c| 90 +++
>  .../gcc.target/i386/pr101908-v16hi.c  |  6 ++
>  .../gcc.target/i386/pr101908-v16qi.c  | 30 +++
>  .../gcc.target/i386/pr101908-v16sf.c  |  6 ++
>  .../gcc.target/i386/pr101908-v16si.c  |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2df.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2di.c |  7 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2hi.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2qi.c | 16 
>  gcc/testsuite/gcc.target/i386/pr101908-v2sf.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v2si.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4df.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4di.c |  7 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4hi.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4qi.c | 18 
>  gcc/testsuite/gcc.target/i386/pr101908-v4sf.c |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v4si.c |  6 ++
>  .../gcc.target/i386/pr101908-v8df-adl.c   |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v8df.c |  6 ++
>  .../gcc.target/i386/pr101908-v8di-adl.c   |  7 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v8di.c |  7 ++
>  .../gcc.target/i386/pr101908-v8hi-adl.c   |  6 ++
>  gcc/testsuite/gcc.target/i386/pr101908-v8hi.c |  6 ++
>  .../gcc.target/i386/pr101908-v8qi-adl.c   | 22 +
>  gcc/testsuite/gcc.target/i386/pr101908-v8qi.c | 22 +
>  .../gcc.target/i386/pr101908-

[PATCH] c-family: Fix ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128 [PR101515]

2022-03-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 15, 2022 at 01:32:46PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Another one is whether we shouldn't punt for FIELD_DECLs that don't have
> nicely printable name of its containing scope, something like:
>   if (tree scope = get_containing_scope (field))
> if (TYPE_P (scope) && TYPE_NAME (scope) == NULL_TREE)
>   break;
>   return cop;
> or so.

This seems to work fine, ok if it passes bootstrap/regtest?
No need to do it for C because it doesn't print the contexts (has no syntax
for that even).

2022-03-16  Jakub Jelinek  

PR c++/101515
* c-pretty-print.cc (c_fold_indirect_ref_for_warn): For C++ don't
return COMPONENT_REFs with FIELD_DECLs whose containing scope can't
be printed.

* g++.dg/warn/pr101515.C: New test.

--- gcc/c-family/c-pretty-print.cc.jj   2022-02-14 13:14:39.037128630 +0100
+++ gcc/c-family/c-pretty-print.cc  2022-03-16 11:16:54.957658064 +0100
@@ -1889,6 +1889,12 @@ c_fold_indirect_ref_for_warn (location_t
  = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (field)));
if (upos <= off && off < upos + el_sz)
  {
+   /* The C++ pretty printers print scope of the FIELD_DECLs,
+  so punt if it is something that can't be printed.  */
+   if (c_dialect_cxx ())
+ if (tree scope = get_containing_scope (field))
+   if (TYPE_P (scope) && TYPE_NAME (scope) == NULL_TREE)
+ break;
tree cop = build3_loc (loc, COMPONENT_REF, TREE_TYPE (field),
   op, field, NULL_TREE);
off = off - upos;
--- gcc/testsuite/g++.dg/warn/pr101515.C.jj	2022-03-16 11:19:09.004806778 +0100
+++ gcc/testsuite/g++.dg/warn/pr101515.C	2022-03-16 11:24:35.136302673 +0100
@@ -0,0 +1,18 @@
+// PR c++/101515
+// { dg-do compile }
+// { dg-options "-O1 -Wuninitialized" }
+
+struct S { int j; };
+struct T : public S { virtual void h () {} };
+struct U { void (*ptr) (); };
+typedef void (S::*sp) ();
+
+int
+main ()
+{
+  T t;
+  sp x;
+  U *xp = (U *) &x;
+  if (xp->ptr != ((void (*) ()) (sizeof (void *))))	// { dg-warning "is used uninitialized" }
+return 1;
+}


Jakub



[PATCH] Generate an if instead of a switch with one case in genmatch.

2022-03-16 Thread Roger Sayle

This patch is the first of two changes to genmatch that don't affect
the executable code, but reduce the amount of debugging information
generated in stage3 of a build by adhering more closely to GNU style
guidelines.

This patch avoids generating a switch with a single case statement,
preferring an "if (TREE_CODE (...) == SSA_NAME)" idiom instead.
These should compile to the same instructions, but the switch requires
more debug line entries, as a debugger may set a break point on the
switch, the case, or the (obligatory) final "default:;".  This reduces
the size of gimple-match.o by 53K on x86_64-pc-linux-gnu.
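The two shapes of generated code can be sketched as follows (simplified stand-ins, not actual gimple-match.cc output):

```cpp
// Simplified stand-in for tree codes; the real generated code tests
// TREE_CODE (op) against SSA_NAME.
enum sketch_code { SKETCH_SSA_NAME, SKETCH_OTHER };

// Before: a switch with a single case and an obligatory default --
// three extra breakpoint-able lines of debug information.
static int
match_with_switch (sketch_code code)
{
  switch (code)
    {
    case SKETCH_SSA_NAME:
      return 1;
    default:;
    }
  return 0;
}

// After: a single if -- same instructions, fewer debug line entries.
static int
match_with_if (sketch_code code)
{
  if (code == SKETCH_SSA_NAME)
    return 1;
  return 0;
}
```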

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-03-16  Roger Sayle  

gcc/ChangeLog
* gcc/genmatch.cc (dt_node::gen_kids_1): Introduce use_switch
logic that prefers to generate an if statement rather than a
switch containing a single case (and a default).


Thanks in advance,
Roger
--

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 2eda730..c4e22ef 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -2999,6 +2999,7 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
   unsigned gexprs_len = generic_exprs.length ();
   unsigned fns_len = fns.length ();
   unsigned gfns_len = generic_fns.length ();
+  bool use_switch = true;
 
   if (exprs_len || fns_len || gexprs_len || gfns_len)
 {
@@ -3011,7 +3012,30 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
   else
generic_exprs[0]->get_name (kid_opname);
 
-  fprintf_indent (f, indent, "switch (TREE_CODE (%s))\n", kid_opname);
+  if (gexprs_len == 0 && gfns_len == 0)
+   {
+ fprintf_indent (f, indent, "if (TREE_CODE (%s) == SSA_NAME)\n",
+ kid_opname);
+ use_switch = false;
+   }
+  else if (exprs_len == 0
+  && fns_len == 0
+  && gexprs_len == 1
+  && gfns_len == 0)
+   {
+ expr *e = as_a <expr *> (generic_exprs[0]->op);
+ id_base *op = e->operation;
+ /* id_base doesn't have a != operator.  */
+ if (!(*op == CONVERT_EXPR || *op == NOP_EXPR))
+   {
+ fprintf_indent (f, indent, "if (TREE_CODE (%s) == %s)\n",
+ kid_opname, op->id);
+ use_switch = false;
+   }
+   }
+
+  if (use_switch)
+   fprintf_indent (f, indent, "switch (TREE_CODE (%s))\n", kid_opname);
   fprintf_indent (f, indent, "  {\n");
   indent += 2;
 }
@@ -3019,8 +3043,8 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
   if (exprs_len || fns_len)
 {
   depth++;
-  fprintf_indent (f, indent,
- "case SSA_NAME:\n");
+  if (use_switch)
+   fprintf_indent (f, indent, "case SSA_NAME:\n");
   fprintf_indent (f, indent,
  "  if (gimple *_d%d = get_def (valueize, %s))\n",
  depth, kid_opname);
@@ -3103,7 +3127,8 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
}
}
 
-  fprintf_indent (f, indent, "  break;\n");
+  if (use_switch)
+   fprintf_indent (f, indent, "  break;\n");
 }
 
   for (unsigned i = 0; i < generic_exprs.length (); ++i)
@@ -3115,12 +3140,18 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
   else if (*op == SSA_NAME && (exprs_len || fns_len))
/* Already handled above.  */
continue;
-  else
+  else if (use_switch)
fprintf_indent (f, indent, "case %s:\n", op->id);
-  fprintf_indent (f, indent, "  {\n");
-  generic_exprs[i]->gen (f, indent + 4, gimple, depth);
-  fprintf_indent (f, indent, "break;\n");
-  fprintf_indent (f, indent, "  }\n");
+
+  if (use_switch)
+   {
+ fprintf_indent (f, indent, "  {\n");
+ generic_exprs[i]->gen (f, indent + 4, gimple, depth);
+ fprintf_indent (f, indent, "break;\n");
+ fprintf_indent (f, indent, "  }\n");
+   }
+  else
+   generic_exprs[i]->gen (f, indent + 2, gimple, depth);
 }
 
   if (gfns_len)
@@ -3157,7 +3188,8 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
   if (exprs_len || fns_len || gexprs_len || gfns_len)
 {
   indent -= 4;
-  fprintf_indent (f, indent, "default:;\n");
+  if (use_switch)
+   fprintf_indent (f, indent, "default:;\n");
   fprintf_indent (f, indent, "}\n");
 }
 


[PATCH] Avoid generating unused labels in genmatch.

2022-03-16 Thread Roger Sayle

This patch is the second of two changes to genmatch that don't affect
the executable code, but reduce the amount of debugging information
generated in stage3 of a build by adhering more closely to GNU style
guidelines.

This patch avoids generating "next_after_fail1:;" label statements
in genmatch when the label is unused/never referenced as the target
of a goto.  Because jumps to these labels are always forwards, we
know at the point the label is emitted whether it is used or not.
Because a debugger may set a breakpoint on these labels, they increase
the size of g{imple,eneric}-match.o in a stage3 build.  To save a
little extra space, I also shorten the label to "Lfail1:;" and only
increment the counter when necessary.  This reduces the size of
gimple-match.o by 58K on x86_64-pc-linux-gnu.
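A hedged sketch of the generated-code shapes involved (function bodies invented for illustration, not real genmatch output):

```cpp
// Before the patch: the fail label was always emitted, even when no
// early-out ever jumps to it, leaving an unused, breakpoint-able label
// in the object file's debug information.
static int
match_before (int op)
{
  int res = op + 1;
next_after_fail1:;	// never referenced, yet still emitted
  return res;
}

// After the patch: the label is only emitted when some "goto Lfail1;"
// was generated first, so patterns without early-outs carry no label.
static int
match_after (int op)
{
  return op + 1;
}
```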

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-03-16  Roger Sayle  

gcc/ChangeLog
* gcc/genmatch.cc (fail_label_used): New global variable.
(expr::gen_transform): Set fail_label_used whenever a goto
FAIL_LABEL is generated.
(dt_simplify::gen_1): Clear fail_label_used when generating
a new (provisional) fail_label.  Set fail_label used whenever
a goto fail_label is generated.  Avoid emitting fail_label:
if fail_label_used is false, instead decrement fail_label_cnt.


Thanks in advance,
Roger
--

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 2eda730..4a61f4e 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -2356,6 +2356,8 @@ capture_info::walk_c_expr (c_expr *e)
 /* The current label failing the current matched pattern during
code generation.  */
 static char *fail_label;
+/* Record that a reference/goto to the above label has been generated.  */
+static bool fail_label_used;
 
 /* Code generation off the decision tree and the refered AST nodes.  */
 
@@ -2533,6 +2535,7 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
   fprintf_indent (f, indent,
  "if (!_r%d) goto %s;\n",
  depth, fail_label);
+  fail_label_used = true;
   if (*opr == CONVERT_EXPR)
{
  indent -= 4;
@@ -2560,13 +2563,15 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
   fprintf (f, ");\n");
   if (opr->kind != id_base::CODE)
{
- fprintf_indent (f, indent, "if (!_r%d)\n", depth);
- fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+ fprintf_indent (f, indent, "if (!_r%d) goto %s;\n",
+ depth, fail_label);
+ fail_label_used = true;
}
   if (force_leaf)
{
  fprintf_indent (f, indent, "if (EXPR_P (_r%d))\n", depth);
  fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+ fail_label_used = true;
}
   if (*opr == CONVERT_EXPR)
{
@@ -3285,8 +3290,9 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
 
   static unsigned fail_label_cnt;
   char local_fail_label[256];
-  snprintf (local_fail_label, 256, "next_after_fail%u", ++fail_label_cnt);
+  snprintf (local_fail_label, 256, "Lfail%u", ++fail_label_cnt);
   fail_label = local_fail_label;
+  fail_label_used = false;
 
   /* Analyze captures and perform early-outs on the incoming arguments
  that cover cases we cannot handle.  */
@@ -3301,6 +3307,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
fprintf_indent (f, indent,
"if (TREE_SIDE_EFFECTS (_p%d)) goto %s;\n",
i, fail_label);
+   fail_label_used = true;
if (verbose >= 1)
  warning_at (as_a <expr *> (s->match)->ops[i]->location,
  "forcing toplevel operand to have no "
@@ -3316,6 +3323,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
fprintf_indent (f, indent,
"if (TREE_SIDE_EFFECTS (captures[%d])) "
"goto %s;\n", i, fail_label);
+   fail_label_used = true;
if (verbose >= 1)
  warning_at (cinfo.info[i].c->location,
  "forcing captured operand to have no "
@@ -3358,7 +3366,12 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
 }
 
   if (s->kind == simplify::SIMPLIFY)
-fprintf_indent (f, indent, "if (__builtin_expect (!dbg_cnt (match), 0)) 
goto %s;\n", fail_label);
+{
+  fprintf_indent (f, indent,
+ "if (__builtin_expect (!dbg_cnt (match), 0)) goto %s;\n",
+ fail_label);
+  fail_label_used = true;
+}
 
   fprintf_indent (f, indent, "if (__builtin_expect (dump_file && (dump_flags & 
TDF_FOLDING), 0)) "
   "fprintf (dump_file, \"%s ",
@@ -3430,9 +3443,12 @@ dt_simplify::gen_

Re: [PATCH v8 00/12] Add LoongArch support.

2022-03-16 Thread 程璐璐



On 2022/3/15 at 10:35 PM, Xi Ruoyao wrote:

On Fri, 2022-03-04 at 15:17 +0800, xucheng...@loongson.cn wrote:


v7 -> v8
1. Add new addressing type ADDRESS_REG_REG support.
2. Modify documentation.
3. Eliminate compile-time warnings.

Hi,

The v8 series does not build LoongArch Linux kernel tree
(https://github.com/loongson/linux, loongarch-next branch) successfully.
This is a regression: the v7 series built the kernel fine.

A testcase reduced from the __get_data_asm macro in uaccess.h:

$ cat t1.c
char *ptr;
int offset;
struct m
{
   char a[2];
};

char
x (void)
{
   char t;
   asm volatile("ld.b %0, %1" : "=r"(t) : "o"(*(struct m *)(ptr + offset)));
   return t;
}

$ ./gcc/cc1 t1.c -nostdinc -O
t1.c: In function ‘x’:
t1.c:12:3: error: impossible constraint in ‘asm’
12 |   asm volatile("ld.b %0, %1" : "=r"(t) : "o"(*(struct m *)(ptr + 
offset)));
   |   ^~~

It seems changing the constraint "o" to "m" can work around this issue.
I'm not sure if this is a compiler bug or a kernel bug.


Hi,

LoongArch supports memory modes as follows:

   mode                                          memory constraint
1. base + index                                  "k"
2. base + imm12                                  "m"
3. base + imm16 (immediate 4-byte alignment)     "ZC"
4. base + 0                                      "ZB"

so this constraint should be "m".





Re: [PATCH] rs6000: Allow using -mlong-double-64 after -mabi={ibm, ieee}longdouble [PR104208, PR87496]

2022-03-16 Thread Segher Boessenkool
On Tue, Mar 15, 2022 at 02:49:39PM -0500, Peter Bergner wrote:
> On 3/4/22 8:14 PM, Peter Bergner wrote:
> > On 3/4/22 11:33 AM, Peter Bergner wrote:
> >>> Ok pushed to trunk.  I haven't determined yet whether we need this on GCC 
> >>> 11 yet.
> >>> I'll check on that and report back.  Thanks!
> >>
> >> I've confirmed that GCC 11 fails the same way and that the backported patch
> >> fixes the issue there too.  Ok for GCC 11 assuming my full regression 
> >> testing
> >> is clean?
> >>
> >> GCC 10 has the same checking code, so it looks to need the backport as 
> >> well.
> >> I'll go ahead and backport and regression test it there too.
> > 
> > The backports to GCC 11 and GCC 10 bootstrapped and regtested with no 
> > regressions.
> > Ok for the GCC 11 and GCC 10 release branches after a day or two of baking 
> > on
> > trunk?
> 
> Ping.
> 
> The trunk patch has been confirmed to fix the glibc build errors and no issues
> with the patch has surfaced, so ok for the GCC11 and GCC10 release branches?

If you can confirm it fixes the glibc build error on 11 and 10 as well,
then yes please.  Thanks!


Segher


[PATCH] tree-optimization/102008 - restore if-conversion of adjacent loads

2022-03-16 Thread Richard Biener via Gcc-patches
The following re-orders the newly added code sinking pass before
the last phiopt pass which performs hoisting of adjacent loads
with the intent to enable if-conversion on those.

I've added the aarch64 specific testcase from the PR.

Bootstrapped and tested on x86_64-unknown-linux-gnu, verified
the testcase fails/passes with a aarch64 cross.

OK?

Thanks,
Richard.

2022-03-16  Richard Biener  

PR tree-optimization/102008
* passes.def: Move the added code sinking pass before the
preceeding phiopt pass.

* gcc.target/aarch64/pr102008.c: New testcase.
---
 gcc/passes.def  |  2 +-
 gcc/testsuite/gcc.target/aarch64/pr102008.c | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102008.c

diff --git a/gcc/passes.def b/gcc/passes.def
index f7718181038..c8903b4ec16 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -349,10 +349,10 @@ along with GCC; see the file COPYING3.  If not see
   /* After late CD DCE we rewrite no longer addressed locals into SSA
 form if possible.  */
   NEXT_PASS (pass_forwprop);
+  NEXT_PASS (pass_sink_code);
   NEXT_PASS (pass_phiopt, false /* early_p */);
   NEXT_PASS (pass_fold_builtins);
   NEXT_PASS (pass_optimize_widening_mul);
-  NEXT_PASS (pass_sink_code);
   NEXT_PASS (pass_store_merging);
   NEXT_PASS (pass_tail_calls);
   /* If DCE is not run before checking for uninitialized uses,
diff --git a/gcc/testsuite/gcc.target/aarch64/pr102008.c 
b/gcc/testsuite/gcc.target/aarch64/pr102008.c
new file mode 100644
index 000..d54436c0ffd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr102008.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct Foo {  int a;  int b; };
+
+int test(int side, const struct Foo *foo) {
+  if (side == 1) return foo->a;
+  return foo->b;
+}
+
+/* We want to if-convert the load, not the address.  */
+/* { dg-final { scan-assembler-not "add" } } */
+/* { dg-final { scan-assembler-times "csel" 1 } } */
-- 
2.34.1


Re: [PATCH] tree-optimization/102008 - restore if-conversion of adjacent loads

2022-03-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 16, 2022 at 01:42:26PM +0100, Richard Biener wrote:
> The following re-orders the newly added code sinking pass before
> the last phiopt pass which performs hoisting of adjacent loads
> with the intent to enable if-conversion on those.
> 
> I've added the aarch64 specific testcase from the PR.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, verified
> the testcase fails/passes with a aarch64 cross.
> 
> OK?
> 
> Thanks,
> Richard.
> 
> 2022-03-16  Richard Biener  
> 
>   PR tree-optimization/102008
>   * passes.def: Move the added code sinking pass before the
>   preceeding phiopt pass.
> 
>   * gcc.target/aarch64/pr102008.c: New testcase.

Ok.

Jakub



Re: [PATCH] Generate an if instead of a switch with one case in genmatch.

2022-03-16 Thread Richard Biener via Gcc-patches
On Wed, Mar 16, 2022 at 1:28 PM Roger Sayle  wrote:
>
>
> This patch is the first of two changes to genmatch that don't affect
> the executable code, but reduce the amount of debugging information
> generated in stage3 of a build by adhering more closely to GNU style
> guidelines.
>
> This patch avoids generating a switch with a single case statement,
> instead preferring to use an "if (TREE_CODE (...) == SSA_NAME)" idiom.
> These should compile to the same instructions, but the switch requires
> more lines, especially when a debugger may set a break point on the
> switch, the case, or the (obligatory) final "default:;".  This reduces
> the size of gimple-match.o by 53K on x86_64-pc-linux-gnu.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

Hmm, this makes the complicated code emission in gen_kids_1 even more
complicated for a questionable gain which I'd rather not do, debuginfo
savings or not ...

Richard.

>
> 2022-03-16  Roger Sayle  
>
> gcc/ChangeLog
> * gcc/genmatch.cc (dt_node::gen_kids_1): Introduce use_switch
> logic that prefers to generate an if statement rather than a
> switch containing a single case (and a default).
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Avoid generating unused labels in genmatch.

2022-03-16 Thread Richard Biener via Gcc-patches
On Wed, Mar 16, 2022 at 1:31 PM Roger Sayle  wrote:
>
>
> This patch is the second of two changes to genmatch that don't affect
> the executable code, but reduce the amount of debugging information
> generated in stage3 of a build by adhering more closely to GNU style
> guidelines.
>
> This patch avoids generating "next_after_fail1:;" label statements
> in genmatch, if this label is unused/never referenced as the target
> of a goto.  Because jumps to these labels are always forwards, we
> know at the point the label is emitted whether it is used or not.
> Because a debugger may set a breakpoint on these labels, they increase
> the size of g{imple,eneric}-match.o in a stage3 build.  To save a
> little extra space, I also shorten the label to "Lfail1:;" and only
> increment the counter when necessary.  This reduces the size of
> gimple-match.o by 58K on x86_64-pc-linux-gnu.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

I wonder if we could make this nicer by abstracting sth like

class fail_label
{
   fail_label (unsigned num);
   emit_goto () { fprintf (..., "goto L%u", num); used = true; }
   emit_label () { fprintf ("L%u:;\n", num); }
  unsigned num;
  bool used;
};

or some better abstraction capturing all of the global state.
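For concreteness, that sketch could be fleshed out roughly as follows (an assumption-laden elaboration of the idea, not a drop-in for genmatch):

```cpp
#include <cstdio>

// Hedged elaboration of the suggested abstraction: the label object
// knows its own number and records whether any goto targeting it was
// ever emitted, so emit_label () can silently skip unreferenced labels.
class fail_label
{
public:
  explicit fail_label (unsigned num) : m_num (num), m_used (false) {}

  void
  emit_goto (FILE *f, int indent)
  {
    fprintf (f, "%*sgoto L%u;\n", indent, "", m_num);
    m_used = true;
  }

  void
  emit_label (FILE *f)
  {
    if (m_used)		// skip labels nothing jumps to
      fprintf (f, "L%u:;\n", m_num);
  }

  bool used () const { return m_used; }

private:
  unsigned m_num;
  bool m_used;
};
```

Because all jumps are forward, `emit_label` always runs after every possible `emit_goto`, so the `m_used` flag is reliable at emission time.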

>
>
> 2022-03-16  Roger Sayle  
>
> gcc/ChangeLog
> * gcc/genmatch.cc (fail_label_used): New global variable.
> (expr::gen_transform): Set fail_label_used whenever a goto
> FAIL_LABEL is generated.
> (dt_simplify::gen_1): Clear fail_label_used when generating
> a new (provisional) fail_label.  Set fail_label used whenever
> a goto fail_label is generated.  Avoid emitting fail_label:
> if fail_label_used is false, instead decrement fail_label_cnt.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Avoid generating unused labels in genmatch.

2022-03-16 Thread Richard Biener via Gcc-patches
On Wed, Mar 16, 2022 at 1:59 PM Richard Biener
 wrote:
>
> On Wed, Mar 16, 2022 at 1:31 PM Roger Sayle  
> wrote:
> >
> >
> > This patch is the second of two changes to genmatch that don't affect
> > the executable code, but reduce the amount of debugging information
> > generated in stage3 of a build by adhering more closely to GNU style
> > guidelines.
> >
> > This patch avoids generating "next_after_fail1:;" label statements
> > in genmatch, if this label is unused/never referenced as the target
> > of a goto.  Because jumps to these labels are always forwards, we
> > know at the point the label is emitted whether it is used or not.
> > Because a debugger may set a breakpoint on these labels, they increase
> > the size of g{imple,eneric}-match.o in a stage3 build.  To save a
> > little extra space, I also shorten the label to "Lfail1:;" and only
> > increment the counter when necessary.  This reduces the size of
> > gimple-match.o by 58K on x86_64-pc-linux-gnu.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  Ok for mainline?
>
> I wonder if we could make this nicer by abstracting sth like
>
> class fail_label
> {
>fail_label (unsigned num);
>emit_goto () { fprintf (..., "goto L%u", num); used = true; }
>emit_label () { fprintf ("L%u:;\n", num); }
>   unsigned num;
>   bool used;
> };
>
> or some better abstraction capturing all of the global state.

Btw, the unused labels were one reason for having

gimple-match.o-warn = -Wno-unused
generic-match.o-warn = -Wno-unused

maybe that's then no longer needed.

> >
> >
> > 2022-03-16  Roger Sayle  
> >
> > gcc/ChangeLog
> > * gcc/genmatch.cc (fail_label_used): New global variable.
> > (expr::gen_transform): Set fail_label_used whenever a goto
> > FAIL_LABEL is generated.
> > (dt_simplify::gen_1): Clear fail_label_used when generating
> > a new (provisional) fail_label.  Set fail_label used whenever
> > a goto fail_label is generated.  Avoid emitting fail_label:
> > if fail_label_used is false, instead decrement fail_label_cnt.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >


Re: [PATCH] OpenMP, Fortran: Bugfix for omp_set_num_teams.

2022-03-16 Thread Marcel Vollweiler

Hi Jakub,


! { dg-do run }
! { dg-additional-options "-fdefault-integer-8" }

program set_num_teams_8
   use omp_lib
   omp_set_num_teams (42)
   if (omp_get_num_teams () .ne. 42) stop 1
end program


I modified your suggested test case a bit:

program set_num_teams_8
  use omp_lib
  use, intrinsic :: iso_fortran_env
  integer(int64) :: x
  x = 42
  call omp_set_num_teams (x)
  if (omp_get_max_teams () .ne. 42) stop 1
end program

I tested it with/without the fix and the test passed/failed as expected.

Hope, that's ok?

Marcel
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
Munich; limited liability company; Managing Directors: Thomas Heurung,
Frank Thürauf; Registered office: Munich; Commercial register: Munich,
HRB 106955
OpenMP, Fortran: Bugfix for omp_set_num_teams.

This patch fixes a small bug in the omp_set_num_teams implementation.

libgomp/ChangeLog:

* fortran.c (omp_set_num_teams_8_): Fix bug.
* testsuite/libgomp.fortran/icv-8.f90: New test.

diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 8c1cfd1..d984ce5 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -491,7 +491,7 @@ omp_set_num_teams_ (const int32_t *num_teams)
 void
 omp_set_num_teams_8_ (const int64_t *num_teams)
 {
-  omp_set_max_active_levels (TO_INT (*num_teams));
+  omp_set_num_teams (TO_INT (*num_teams));
 }
 
 int32_t
diff --git a/libgomp/testsuite/libgomp.fortran/icv-8.f90 
b/libgomp/testsuite/libgomp.fortran/icv-8.f90
new file mode 100644
index 000..9478c15
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-8.f90
@@ -0,0 +1,10 @@
+! This tests 'set_num_teams_8' function.
+
+program set_num_teams_8
+  use omp_lib
+  use, intrinsic :: iso_fortran_env
+  integer(int64) :: x
+  x = 42
+  call omp_set_num_teams (x)
+  if (omp_get_max_teams () .ne. 42) stop 1
+end program


Re: [PATCH] RISC-V: Handle combine extension in canonical ordering.

2022-03-16 Thread Kito Cheng via Gcc-patches
Hi Shi-Hua:

Thanks, generally it LGTM, just a few coding style issues.  I've fixed
those and committed this time, but I'll note below where you should
update your coding style.

And you can use git clang-format with contrib/clang-format as the
format file to save time on indentation:

https://stackoverflow.com/questions/43974371/run-git-clang-format-on-series-of-git-commits

On Tue, Mar 8, 2022 at 11:31 AM  wrote:
>
> From: LiaoShihua 
>
> The crypto extension has several shorthand extensions that don't add
> any extra instructions.  Take zk for example: the extension implies
> zkn, zkr and zkt.  Those 3 extensions should also combine back into zk
> to maintain the canonical order in ISA strings.  This patch addresses
> the above.  If other extensions have the same situation, you can add
> them to riscv_combine_info[].
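The combining rule described above can be sketched independently of the GCC sources (the container types and the implication table here are invented for illustration, not the actual riscv-common.cc structures):

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hedged sketch: if every extension a shorthand implies is already in
// the subset list, add the shorthand back so the canonical ISA string
// is restored.
static void
combine_back (std::set<std::string> &subsets)
{
  // Assumed implication table: zk implies zkn, zkr and zkt.
  static const std::vector<std::pair<std::string,
				     std::vector<std::string>>> combine_info
    = { { "zk", { "zkn", "zkr", "zkt" } } };

  for (const auto &ci : combine_info)
    {
      if (subsets.count (ci.first))
	continue;		// shorthand already present
      bool all_present = true;
      for (const auto &imp : ci.second)
	if (!subsets.count (imp))
	  {
	    all_present = false;
	    break;
	  }
      if (all_present)
	subsets.insert (ci.first);
    }
}
```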
>
>
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc 
> (riscv_subset_list::handle_combine_ext):Combine back into zk to maintain the 
> canonical order in isa strings.
> (riscv_subset_list::parse):Ditto.

You need to mention riscv_combine_info in the change log too :)

> * config/riscv/riscv-subset.h:Declare handle_combine_ext();
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/predef-17.c: New test.
>
> ---
>  gcc/common/config/riscv/riscv-common.cc| 56 +++
>  gcc/config/riscv/riscv-subset.h|  1 +
>  gcc/testsuite/gcc.target/riscv/predef-17.c | 63 ++
>  3 files changed, 120 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-17.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index a904893b9ed..1c06f83cc1c 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -189,6 +189,16 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
>  };
>
> +/* Combine extensions defined in this table  */
> +static const struct riscv_ext_version riscv_combine_info[] =
> +{
> +  {"zk",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zkn",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zks",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  /* Terminate the list.  */
> +  {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
> +};
> +
>  static const riscv_cpu_info riscv_cpu_tables[] =
>  {
>  #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \
> @@ -813,6 +823,50 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t 
> *ext)
>  }
>  }
>
> +/* Check any combine extensions for EXT.  */
> +void
> +riscv_subset_list::handle_combine_ext (riscv_subset_list *subset_list)

You don't need to pass subset_list again, you can just use `this`

> +{
> +  const riscv_ext_version *combine_info;
> +  const riscv_implied_info_t *implied_info;
> +  bool IsCombined = false;

is_combined

> +
> +  for (combine_info = &riscv_combine_info[0]; combine_info->name; 
> ++combine_info)
> +  {

for (...)
  { // two space

> +
> +/* Skip if combine extensions are present */
> +if (subset_list->lookup(combine_info->name))
> +  continue;
> +
> +/* Find all extensions of the combine extension   */
> +for (implied_info = &riscv_implied_info[0]; implied_info->ext; 
> ++implied_info)
> +{
> +  /* Skip if implied extension don't match combine extension */
> +  if (strcmp(combine_info->name, implied_info->ext) != 0)
> +continue;
> +
> +  if (subset_list->lookup(implied_info->implied_ext))
> +  {
> +IsCombined = true;
> +  }

No curly brackets needed if only one line.

> +  else
> +  {
> +IsCombined = false;
> +break;
> +  }
> +}
> +
> +/* Add combine extensions */
> +if (IsCombined)
> +{
> +  if (subset_list->lookup(combine_info->name) == NULL)
> +  {
> +subset_list->add (combine_info->name, combine_info->major_version, 
> combine_info->minor_version, false, true);
> +  }
> +}
> +  }
> +}
> +
>  /* Parsing function for multi-letter extensions.
>
> Return Value:
> @@ -992,6 +1046,8 @@ riscv_subset_list::parse (const char *arch, location_t 
> loc)
>subset_list->handle_implied_ext (itr);
>  }
>
> +  subset_list->handle_combine_ext (subset_list);
> +
>return subset_list;
>
>  fail:
> diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
> index 4f3556a8d9b..da2e22d34f2 100644
> --- a/gcc/config/riscv/riscv-subset.h
> +++ b/gcc/config/riscv/riscv-subset.h
> @@ -68,6 +68,7 @@ private:
>  const char *);
>
>void handle_implied_ext (riscv_subset_t *);
> +  void handle_combine_ext (riscv_subset_list *);
>
>  public:
>~riscv_subset_list ();
> diff --git a/gcc/testsuite/gcc.target/riscv/predef-17.c 
> b/gcc/testsuite/gcc.target/riscv/predef-17.c
> new file mode 100644
> index 000..68f5f95a66c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/predef-17.
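For reference, the combine-extension detection the review comments above are aimed at can be sketched in standalone form, incorporating the suggested cleanups (a snake_case flag, no redundant subset_list argument). The container types and the extension table below are invented stand-ins for illustration, not GCC's riscv_subset_list API:

```cpp
// Simplified, self-contained model of handle_combine_ext: add a combine
// extension whenever all of its implied sub-extensions are present.
// Plain STL containers stand in for riscv_subset_list.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: each combine extension and the sub-extensions it
// groups (loosely modelled on riscv_combine_info + riscv_implied_info).
static const std::map<std::string, std::vector<std::string>> combine_table = {
  {"zkn", {"zbkb", "zbkc", "zbkx", "zkne", "zknd", "zknh"}},
  {"zks", {"zbkb", "zbkc", "zbkx", "zksed", "zksh"}},
};

static void
handle_combine_ext (std::set<std::string> &subsets)
{
  for (const auto &entry : combine_table)
    {
      /* Skip if the combine extension is already present.  */
      if (subsets.count (entry.first))
	continue;

      /* Check that every implied sub-extension is present.  */
      bool is_combined = true;
      for (const auto &implied : entry.second)
	if (!subsets.count (implied))
	  {
	    is_combined = false;
	    break;
	  }

      if (is_combined)
	subsets.insert (entry.first);
    }
}
```

Note that initializing `is_combined` per combine extension (rather than once before the outer loop, as in the patch as posted) avoids a stale value leaking from one table entry into the next.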

[PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]

2022-03-16 Thread Rainer Orth
This patch documents the Solaris-specific D bootstrap requirements.

Tested by building and inspecting gccinstall.{pdf,info}.

Ok for trunk?

I've omitted the Darwin-specific stuff so far documented in PRs d/103577
and d/103578:

* needs --enable-libphobos
* top of gcc-11 branch only
* backport of -static-libphobos patch
* Darwin/i386 doesn't work at all

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2022-03-16  Rainer Orth  

gcc:
PR d/103528
* doc/install.texi (Tools/packages necessary for building GCC)
(GDC): Document libphobos requirement.
(Host/target specific installation notes for GCC, *-*-solaris2*):
Document libphobos and GDC specifics.

# HG changeset patch
# Parent  33de4c9d886299fd8cc97e20c0f761c2f28a3eef
doc: Document Solaris D bootstrap requirements [PR 103528]

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -287,7 +287,8 @@ section.
 @item @anchor{GDC-prerequisite}GDC
 
 In order to build GDC, the D compiler, you need a working GDC
-compiler (GCC version 9.1 or later), as the D front end is written in D.
+compiler (GCC version 9.1 or later) and D runtime library,
+@samp{libphobos}, as the D front end is written in D.
 
 Versions of GDC prior to 12 can be built with an ISO C++11 compiler, which can
 then be installed and used to bootstrap newer versions of the D front end.
@@ -303,6 +304,10 @@ front end does not make use of any GDC-s
 of the D language, if too old a GDC version is installed and
 @option{--enable-languages=d} is used, the build will fail.
 
+On some targets, @samp{libphobos} isn't enabled by default, but compiles
+and works if @option{--enable-libphobos} is used.  Specifics are
+documented for affected targets.
+
 @item A ``working'' POSIX compatible shell, or GNU bash
 
 Necessary when running @command{configure} because some
@@ -4851,6 +4856,12 @@ GNU binutils.  @samp{libstdc++} symbol v
 appropriate version is found.  Solaris @command{c++filt} from the Solaris
 Studio compilers does @emph{not} work.
 
+In order to build the GNU D compiler, GDC, a working @samp{libphobos} is
+needed.  That library wasn't built by default in GCC 9--11 on SPARC, or
+on x86 when the Solaris assembler is used, but can be enabled by
+configuring with @option{--enable-libphobos}.  Also, GDC 9.4.0 is
+required on x86, while GDC 9.3.0 is known to work on SPARC.
+
 The versions of the GNU Multiple Precision Library (GMP), the MPFR
 library and the MPC library bundled with Solaris 11.3 and later are
 usually recent enough to match GCC's requirements.  There are two
@@ -4864,6 +4875,7 @@ need to configure with @option{--with-gm
 @item
The version of the MPFR library included in Solaris 11.3 is too old; you
 need to provide a more recent one.
+
 @end itemize
 
 @html


Re: [PATCH v2] x86: Also check _SOFT_FLOAT in

2022-03-16 Thread H.J. Lu via Gcc-patches
On Tue, Mar 15, 2022 at 6:57 PM Hongtao Liu  wrote:
>
> On Tue, Mar 15, 2022 at 10:40 PM H.J. Lu  wrote:
> >
> > On Mon, Mar 14, 2022 at 7:31 AM H.J. Lu  wrote:
> > >
> > > Push target("general-regs-only") in  if x87 is enabled.
> > >
> > > gcc/
> > >
> > > PR target/104890
> > > * config/i386/x86gprintrin.h: Also check _SOFT_FLOAT before
> > > pushing target("general-regs-only").
> > >
> > > gcc/testsuite/
> > >
> > > PR target/104890
> > > * gcc.target/i386/pr104890.c: New test.
> > > ---
> > >  gcc/config/i386/x86gprintrin.h   |  2 +-
> > >  gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
> > >  2 files changed, 12 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c
> > >
> > > diff --git a/gcc/config/i386/x86gprintrin.h 
> > > b/gcc/config/i386/x86gprintrin.h
> > > index 017ec299793..e0be01d5e78 100644
> > > --- a/gcc/config/i386/x86gprintrin.h
> > > +++ b/gcc/config/i386/x86gprintrin.h
> > > @@ -24,7 +24,7 @@
> > >  #ifndef _X86GPRINTRIN_H_INCLUDED
> > >  #define _X86GPRINTRIN_H_INCLUDED
> > >
> > > -#if defined __MMX__ || defined __SSE__
> > > +#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__
> The patch LGTM.

I am checking it in.

> > >  #pragma GCC push_options
> > >  #pragma GCC target("general-regs-only")
> > >  #define __DISABLE_GENERAL_REGS_ONLY__
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c 
> > > b/gcc/testsuite/gcc.target/i386/pr104890.c
> > > new file mode 100644
> > > index 000..cb430eef688
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr104890.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do compile { target ia32 } } */
> > > +/* { dg-options "-O2 -mshstk -march=i686" } */
> > > +
> > > +#include 
> > > +
> > > +__attribute__((target ("general-regs-only")))
> > > +int
> > > +foo ()
> > > +{
> > > +  return _get_ssp ();
> > > +}
> > > --
> > > 2.35.1
> > >
> >
> > It also fixed:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99744#c18
> >
> > Any comments on this patch?
> >
> > --
> > H.J.
>
>
>
> --
> BR,
> Hongtao



-- 
H.J.


Re: [PATCH] OpenMP, Fortran: Bugfix for omp_set_num_teams.

2022-03-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 16, 2022 at 02:06:16PM +0100, Marcel Vollweiler wrote:
> libgomp/ChangeLog:
> 
>   * fortran.c (omp_set_num_teams_8_): Fix bug.
>   * testsuite/libgomp.fortran/icv-8.f90: New test.

Ok, with a minor nit.  Please use
Call omp_set_num_teams instead of omp_set_max_active_levels.
instead of
Fix bug.
in the ChangeLog.

> diff --git a/libgomp/fortran.c b/libgomp/fortran.c
> index 8c1cfd1..d984ce5 100644
> --- a/libgomp/fortran.c
> +++ b/libgomp/fortran.c
> @@ -491,7 +491,7 @@ omp_set_num_teams_ (const int32_t *num_teams)
>  void
>  omp_set_num_teams_8_ (const int64_t *num_teams)
>  {
> -  omp_set_max_active_levels (TO_INT (*num_teams));
> +  omp_set_num_teams (TO_INT (*num_teams));
>  }
>  
>  int32_t
> diff --git a/libgomp/testsuite/libgomp.fortran/icv-8.f90 
> b/libgomp/testsuite/libgomp.fortran/icv-8.f90
> new file mode 100644
> index 000..9478c15
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/icv-8.f90
> @@ -0,0 +1,10 @@
> +! This tests 'set_num_teams_8' function.
> +
> +program set_num_teams_8
> +  use omp_lib
> +  use, intrinsic :: iso_fortran_env
> +  integer(int64) :: x
> +  x = 42
> +  call omp_set_num_teams (x)
> +  if (omp_get_max_teams () .ne. 42) stop 1
> +end program


Jakub



[PATCH][GCC11] tree-optimization/104931 - mitigate niter analysis issue

2022-03-16 Thread Richard Biener via Gcc-patches
The following backports a pointer-association pattern from trunk
that mitigates an issue with number_of_iterations_lt_to_ne, in
whose context we fail to fold a comparison but succeed in folding
a related subtraction.  In the failure mode this results in a loop
wrongly assumed to iterate, with a bogus number of iterations,
which makes the premake application crash on start.

With the backported simplification we are able to fold the
comparison and correctly compute the loop as not iterating.

I have failed to create a standalone testcase.  I believe part
of the issue is still latent, but I have not been able to nail
it down exactly.  Still, I believe backporting the mitigation
patch is the most sensible and safe choice at this point.

Bootstrapped and tested on x86_64-unknown-linux-gnu on the branch,
OK?

Thanks,
Richard.

2022-03-16  Richard Biener  

PR tree-optimization/104931
* match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): New GENERIC
simplification.
---
 gcc/match.pd | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 05a08d0f96a..a005dcd42bd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1997,6 +1997,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
   (pointer_plus (pointer_plus:s @0 @1) @3)
   (pointer_plus @0 (plus @1 @3)))
+#if GENERIC
+(simplify
+  (pointer_plus (convert:s (pointer_plus:s @0 @1)) @3)
+  (convert:type (pointer_plus @0 (plus @1 @3))))
+#endif
 
 /* Pattern match
  tem1 = (long) ptr1;
-- 
2.34.1
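To make the tree shape concrete, here is a hand-written C++ example of the kind of source expression that produces the pattern the new GENERIC simplification targets — a pointer_plus on top of a converted pointer_plus. This is only an illustration of the shape, not a reproducer for PR 104931:

```cpp
#include <cassert>
#include <cstddef>

// (char *) (base p+ y) p+ z: the inner addition is a pointer_plus, the
// cast is a (nop) conversion between pointer types, and the outer
// addition is another pointer_plus.  The pattern rewrites this to
// (char *) (base p+ (y + z)), which then folds further in comparisons.
static char *
adjust (int *base, std::size_t y, std::size_t z)
{
  return (char *) (base + y) + z;
}
```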


[aarch64] Add Neoverse N2 tuning structs

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch adds tuning structures for Neoverse N2.

2022-03-16  Tamar Christina  
               Andre Vieira 

    * config/aarch64/aarch64.cc (neoversen2_addrcost_table, 
neoversen2_regmove_cost,
    neoversen2_advsimd_vector_cost, neoversen2_sve_vector_cost, 
neoversen2_scalar_issue_info,
    neoversen2_advsimd_issue_info, neoversen2_sve_issue_info, 
neoversen2_vec_issue_info,

    neoversen2_tunings): New structs.
    (neoversen2_tunings): Use new structs and update tuning flags.
    (aarch64_vec_op_count::rename_cycles_per_iter): Enable for 
neoversen2 tuning.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
d504fe7607b66a9c9ed9b183a2d3c03d34fb0f80..e0bb447beb9eae74551d863505eb265737d36334
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -519,6 +519,24 @@ static const struct cpu_addrcost_table 
neoversev1_addrcost_table =
   0 /* imm_offset  */
 };
 
+static const struct cpu_addrcost_table neoversen2_addrcost_table =
+{
+{
+  1, /* hi  */
+  0, /* si  */
+  0, /* di  */
+  1, /* ti  */
+},
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   1, /* GP2GP  */
@@ -624,6 +642,16 @@ static const struct cpu_regmove_cost a64fx_regmove_cost =
   2 /* FP2FP  */
 };
 
+static const struct cpu_regmove_cost neoversen2_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+ realistic costs compared to memmv_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
 /* Generic costs for Advanced SIMD vector operations.   */
 static const advsimd_vec_cost generic_advsimd_vector_cost =
 {
@@ -2174,12 +2202,166 @@ static const struct tune_params neoverse512tvb_tunings 
=
   &generic_prefetch_tune
 };
 
+static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* ld3_st3_permute_cost  */
+  3, /* ld4_st4_permute_cost  */
+  3, /* permute_cost  */
+  4, /* reduc_i8_cost  */
+  4, /* reduc_i16_cost  */
+  2, /* reduc_i32_cost  */
+  2, /* reduc_i64_cost  */
+  6, /* reduc_f16_cost  */
+  4, /* reduc_f32_cost  */
+  2, /* reduc_f64_cost  */
+  2, /* store_elt_extra_cost  */
+  /* This value is just inherited from the Cortex-A57 table.  */
+  8, /* vec_to_scalar_cost  */
+  /* This depends very much on what the scalar value is and
+ where it comes from.  E.g. some constants take two dependent
+ instructions or a load, while others might be moved from a GPR.
+ 4 seems to be a reasonable compromise in practice.  */
+  4, /* scalar_to_vec_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  /* Although stores have a latency of 2 and compete for the
+ vector pipes, in practice it's better not to model that.  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
+};
+
+static const sve_vec_cost neoversen2_sve_vector_cost =
+{
+  {
+2, /* int_stmt_cost  */
+2, /* fp_stmt_cost  */
+3, /* ld2_st2_permute_cost  */
+4, /* ld3_st3_permute_cost  */
+4, /* ld4_st4_permute_cost  */
+3, /* permute_cost  */
+/* Theoretically, a reduction involving 15 scalar ADDs could
+   complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
+   completes in 11 cycles, so give it a cost of 15 + 6.  */
+21, /* reduc_i8_cost  */
+/* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
+13, /* reduc_i16_cost  */
+/* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
+9, /* reduc_i32_cost  */
+/* Likewise for 1 scalar ADDs (~1 cycles) vs. 2: 1 + 1.  */
+2, /* reduc_i64_cost  */
+/* Theoretically, a reduction involving 7 scalar FADDs could
+   complete in ~8 cycles and would have a cost of 14.  FADDV
+   completes in 6 cycles, so give it a cost of 14 - 2.  */
+12, /* reduc_f16_cost  */
+/* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
+6, /* reduc_f32_cost  */
+/* Likewise for 1 scalar FADDs (~2 cycles) vs. 2: 2 - 0.  */
+2, /* reduc_f64_cost  */
+2, /* store_elt_extra_cost  */
+/* This value is just inherited from the Cortex-A57 table.  */
+8, /* vec_to_scalar_cost  */
+/* See the comment above the Advanced SIMD versions.  */
+4, /* scalar_to_vec_cost  */
+4, /* align_load_cost  */
+4, /* unalign_load_cost  */
+/* Although stores have a latency of 2 and compete for the
+   vector pipes, in practice it's better not to model that.  */
+1, /* unalign_store_cost  */
+1  /* store_cost  */
+  },
+  3, /* clast_cost  */
+  10, /* fadda_f16_cost  */
+  6, /* fadda_f32_cost  */
+  4, /* fadda_f64_cost  */
+  /* A strided Advanced SIMD x64 load would ta

[aarch64] Add Demeter tuning structs

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch adds tuning structs for -mcpu/-mtune=demeter.


2022-03-16  Tamar Christina  
   Andre Vieira 

    * config/aarch64/aarch64.cc (demeter_addrcost_table, 
demeter_regmove_cost,
    demeter_advsimd_vector_cost, demeter_sve_vector_cost, 
demeter_scalar_issue_info,
    demeter_advsimd_issue_info, demeter_sve_issue_info, 
demeter_vec_issue_info,

    demeter_vector_cost, demeter_tunings): New tuning structs.
    (aarch64_vec_op_count::rename_cycles_per_iter): Enable for 
demeter tuning.

    * config/aarch64/aarch64-cores.def: Add entry for demeter.
    * config/aarch64/aarch64-tune.md (tune): Add demeter to list.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
eda3a4eb6028d97d50295773c87119d9fa0c41bf..9e6ca84bd4b277ccf2c1809c419703a23075f315
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -172,4 +172,6 @@ AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, 9A,  
AARCH64_FL_FOR_ARCH9 |
 
 AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, 9A,  AARCH64_FL_FOR_ARCH9 | 
AARCH64_FL_SVE2_BITPERM | AARCH64_FL_MEMTAG | AARCH64_FL_I8MM | 
AARCH64_FL_BF16, neoversen2, 0x41, 0xd48, -1)
 
+AARCH64_CORE("demeter", demeter, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | 
AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | 
AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, demeter, 0x41, 0xd4f, -1)
+
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
3eed700c063c2abf36cd648b22f713e887255f96..dda5dbdda75c1383004f5dac3f249909f7f23589
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2,demeter"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
e0bb447beb9eae74551d863505eb265737d36334..9b6f67dc592d8a447d6b28390c90abe5dcfa5f08
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -537,6 +537,24 @@ static const struct cpu_addrcost_table 
neoversen2_addrcost_table =
   0 /* imm_offset  */
 };
 
+static const struct cpu_addrcost_table demeter_addrcost_table =
+{
+{
+  1, /* hi  */
+  0, /* si  */
+  0, /* di  */
+  1, /* ti  */
+},
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  2, /* post_modify_ld3_st3  */
+  2, /* post_modify_ld4_st4  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   1, /* GP2GP  */
@@ -652,6 +670,16 @@ static const struct cpu_regmove_cost 
neoversen2_regmove_cost =
   2 /* FP2FP  */
 };
 
+static const struct cpu_regmove_cost demeter_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+ realistic costs compared to memmv_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
 /* Generic costs for Advanced SIMD vector operations.   */
 static const advsimd_vec_cost generic_advsimd_vector_cost =
 {
@@ -2391,6 +2419,195 @@ static const struct tune_params neoversen2_tunings =
   &generic_prefetch_tune
 };
 
+static const advsimd_vec_cost demeter_advsimd_vector_cost =
+{
+  2, /* int_stmt_cost  */
+  2, /* fp_stmt_cost  */
+  2, /* ld2_st2_permute_cost */
+  2, /* l

[aarch64] Update regmove costs for neoverse-v1 and neoverse-512tvb tunings

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch updates the register move tunings for 
-mcpu/-mtune={neoverse-v1,neoverse-512tvb}.


2022-03-16  Tamar Christina  
   Andre Vieira 

    * config/aarch64/aarch64.cc (neoversev1_regmove_cost): New 
tuning struct.
    (neoversev1_tunings): Use neoversev1_regmove_cost and update 
store_int cost.

    (neoverse512tvb_tunings): Likewise.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
9b6f67dc592d8a447d6b28390c90abe5dcfa5f08..f0485574528c47221e17a3aa5aee70a56508f61e
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -670,6 +670,16 @@ static const struct cpu_regmove_cost 
neoversen2_regmove_cost =
   2 /* FP2FP  */
 };
 
+static const struct cpu_regmove_cost neoversev1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+ realistic costs compared to memmv_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2 /* FP2FP  */
+};
+
 static const struct cpu_regmove_cost demeter_regmove_cost =
 {
   1, /* GP2GP  */
@@ -2063,13 +2073,13 @@ static const struct tune_params neoversev1_tunings =
 {
   &cortexa76_extra_costs,
   &neoversev1_addrcost_table,
-  &generic_regmove_cost,
+  &neoversev1_regmove_cost,
   &neoversev1_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_256, /* sve_width  */
   { 4, /* load_int.  */
-1, /* store_int.  */
+2, /* store_int.  */
 6, /* load_fp.  */
 2, /* store_fp.  */
 6, /* load_pred.  */
@@ -2200,13 +2210,13 @@ static const struct tune_params neoverse512tvb_tunings =
 {
   &cortexa76_extra_costs,
   &neoversev1_addrcost_table,
-  &generic_regmove_cost,
+  &neoversev1_regmove_cost,
   &neoverse512tvb_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_128 | SVE_256, /* sve_width  */
   { 4, /* load_int.  */
-1, /* store_int.  */
+2, /* store_int.  */
 6, /* load_fp.  */
 2, /* store_fp.  */
 6, /* load_pred.  */


[PATCH] tree-optimization/104941: Actually assign the conversion result

2022-03-16 Thread Siddhesh Poyarekar
Assign the result of fold_convert to offset.

gcc/ChangeLog:

PR tree-optimization/104941
* tree-object-size.cc (size_for_offset): Assign result of
fold_convert to OFFSET.

gcc/testsuite/ChangeLog:

PR tree-optimization/104941
* gcc.dg/builtin-dynamic-object-size-0.c (S1, S2): New structs.
(test_alloc_nested_structs, g): New functions.
(main): Call test_alloc_nested_structs.

Signed-off-by: Siddhesh Poyarekar 
---

Testing:
- x86_64 bootstrap build and check
- i686 build and check

 .../gcc.dg/builtin-dynamic-object-size-0.c| 34 +++
 gcc/tree-object-size.cc   |  2 +-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 2fca0a9c5b4..e5dc23a908d 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -323,6 +323,34 @@ test_substring (size_t sz, size_t off)
   return __builtin_dynamic_object_size (&str[off], 0);
 }
 
+struct S2
+{
+  char arr[7];
+};
+
+struct S1
+{
+  int pad;
+  struct S2 s2;
+};
+
+static long
+g (struct S1 *s1)
+{
+  struct S2 *s2 = &s1->s2;
+  return __builtin_dynamic_object_size (s2->arr, 0);
+}
+
+long
+__attribute__ ((noinline))
+test_alloc_nested_structs (int x)
+{
+  struct S1 *s1 = __builtin_malloc (x);
+  return g (s1);
+}
+
+/* POINTER_PLUS expressions.  */
+
 size_t
 __attribute__ ((noinline))
 test_substring_ptrplus (size_t sz, size_t off)
@@ -342,6 +370,8 @@ test_substring_ptrplus2 (size_t sz, size_t off, size_t off2)
   return __builtin_dynamic_object_size (ptr + off2, 0);
 }
 
+/* Function parameters.  */
+
 size_t
 __attribute__ ((access (__read_write__, 1, 2)))
 __attribute__ ((noinline))
@@ -382,6 +412,8 @@ test_parmsz_unknown (void *obj, void *unknown, size_t sz, 
int cond)
   return __builtin_dynamic_object_size (cond ? obj : unknown, 0);
 }
 
+/* Loops.  */
+
 size_t
 __attribute__ ((noinline))
 __attribute__ ((access (__read_write__, 1, 2)))
@@ -491,6 +523,8 @@ main (int argc, char **argv)
 FAIL ();
   if (test_dynarray_cond (1) != 8)
 FAIL ();
+  if (test_alloc_nested_structs (42) != 42 - sizeof (int))
+FAIL ();
   if (test_deploop (128, 4) != 128)
 FAIL ();
   if (test_deploop (128, 129) != 32)
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 9728f79da75..e23e80cb726 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -372,7 +372,7 @@ size_for_offset (tree sz, tree offset, tree wholesize = 
NULL_TREE)
 
   /* Safe to convert now, since a valid net offset should be non-negative.  */
   if (!types_compatible_p (TREE_TYPE (offset), sizetype))
-fold_convert (sizetype, offset);
+offset = fold_convert (sizetype, offset);
 
   if (TREE_CODE (offset) == INTEGER_CST)
 {
-- 
2.35.1



[aarch64] Update reg-costs to differentiate between memmove costs

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
This patch introduces a struct to differentiate between different 
memmove costs, enabling better modeling of memory operations.  These 
have been modelled for 
-mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb; for 
all other tunings, all entries are equal to the old single memmove cost 
so that the behaviour remains the same.
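A minimal sketch of how such per-kind costs could be consulted from spill heuristics, as opposed to one scalar cost. The enum and selection function are invented for illustration; they are not GCC's target-hook API, only the six fields mirror the patch's cpu_memmov_cost:

```cpp
#include <cassert>

// The six per-kind costs introduced by the patch.
struct cpu_memmov_cost
{
  int load_int, store_int, load_fp, store_fp, load_pred, store_pred;
};

// Hypothetical register-class tag for the sketch.
enum reg_kind { GP_REGS, FP_REGS, PRED_REGS };

// Pick the load or store cost for the register class being spilled,
// instead of returning one undifferentiated memmov_cost.
static int
memory_move_cost (const cpu_memmov_cost &c, reg_kind kind, bool is_load)
{
  switch (kind)
    {
    case GP_REGS:   return is_load ? c.load_int  : c.store_int;
    case FP_REGS:   return is_load ? c.load_fp   : c.store_fp;
    case PRED_REGS: return is_load ? c.load_pred : c.store_pred;
    }
  return 0;
}
```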



2022-03-16  Tamar Christina  
               Andre Vieira 

gcc/ChangeLog:

    * config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New 
struct.
    (struct tune_params): Change type of memmov_cost to use 
cpu_memmov_cost.
    * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update 
all tunings

    to use new cpu_memmov_cost struct.
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
f2fde35c6eb4989af8736db8fad004171c160282..5190eb8b96ea9af809a28470905b8b85ee720b09
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -508,6 +508,18 @@ struct cpu_prefetch_tune
   const int default_opt_level;
 };
 
+/* Model the costs for loads/stores for reload so that it can do more
+   accurate spill heuristics.  */
+struct cpu_memmov_cost
+{
+  int load_int;
+  int store_int;
+  int load_fp;
+  int store_fp;
+  int load_pred;
+  int store_pred;
+};
+
 struct tune_params
 {
   const struct cpu_cost_table *insn_extra_cost;
@@ -520,7 +532,8 @@ struct tune_params
  or SVE_NOT_IMPLEMENTED if not applicable.  Only used for tuning
  decisions, does not disable VLA vectorization.  */
   unsigned int sve_width;
-  int memmov_cost;
+  /* Structure used by reload to cost spills.  */
+  struct cpu_memmov_cost memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
   const char *function_align;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
9a94f3a30b0f1acc3c9b8a0e3d703e60780d0cbc..3fc5e0bd3d3f39f99b0c8ffb9357603bc0998515
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1291,7 +1291,13 @@ static const struct tune_params generic_tunings =
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_NOT_IMPLEMENTED, /* sve_width  */
-  4, /* memmov_cost  */
+  { 4, /* load_int.  */
+4, /* store_int.  */
+4, /* load_fp.  */
+4, /* store_fp.  */
+4, /* load_pred.  */
+4 /* store_pred.  */
+  }, /* memmov_cost.  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
   "16:12", /* function_align.  */
@@ -1320,7 +1326,13 @@ static const struct tune_params cortexa35_tunings =
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_NOT_IMPLEMENTED, /* sve_width  */
-  4, /* memmov_cost  */
+  { 4, /* load_int.  */
+4, /* store_int.  */
+4, /* load_fp.  */
+4, /* store_fp.  */
+4, /* load_pred.  */
+4 /* store_pred.  */
+  }, /* memmov_cost.  */
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
@@ -1347,7 +1359,13 @@ static const struct tune_params cortexa53_tunings =
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_NOT_IMPLEMENTED, /* sve_width  */
-  4, /* memmov_cost  */
+  { 4, /* load_int.  */
+4, /* store_int.  */
+4, /* load_fp.  */
+4, /* store_fp.  */
+4, /* load_pred.  */
+4 /* store_pred.  */
+  }, /* memmov_cost.  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
@@ -1374,7 +1392,13 @@ static const struct tune_params cortexa57_tunings =
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_NOT_IMPLEMENTED, /* sve_width  */
-  4, /* memmov_cost  */
+  { 4, /* load_int.  */
+4, /* store_int.  */
+4, /* load_fp.  */
+4, /* store_fp.  */
+4, /* load_pred.  */
+4 /* store_pred.  */
+  }, /* memmov_cost.  */
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
@@ -1401,7 +1425,13 @@ static const struct tune_params cortexa72_tunings =
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_NOT_IMPLEMENTED, /* sve_width  */
-  4, /* memmov_cost  */
+  { 4, /* load_int.  */
+4, /* store_int.  */
+4, /* load_fp.  */
+4, /* store_fp.  */
+4, /* load_pred.  */
+4 /* store_pred.  */
+  }, /* memmov_cost.  */
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
@@ -1428,7 +1458,13 @@ static const struct tune_params cortexa73_tunings =
   &generic_branch_cost,
   &generic_approx_modes,
   SVE_NOT_IMPLEMENTED, /* sve_width  */
-  4, /* memmov_cost.  */
+  { 4, /* load_int.  */
+4, /* store_int.  */
+4, /* load_fp.  */
+4, /* store_fp.  */
+4, /* load_pred.  */
+4 /* store_pred.  */
+  

[aarch64] Implement determine_suggested_unroll_factor

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch implements the costing function 
determine_suggested_unroll_factor for aarch64.
It determines the unrolling factor by dividing the number of X 
operations we can do per cycle by the number of X operations in the 
loop body, taking this information from the vec_ops analysis during 
vector costing and from the available issue_info.
We multiply the dividend by the reduction latency, when there is one, 
to improve pipeline utilization if we would otherwise stall waiting on 
a particular reduction operation.


Right now we also have a workaround for vectorization choices where the 
main loop uses a NEON mode and predication is available: if the main 
loop makes use of a NEON pattern that is not directly supported by SVE, 
we do not unroll, as that might cause performance regressions in cases 
where we would otherwise enter the original main loop's VF.  As an 
example, if you have a loop where you could use AVG_CEIL with a V8HI 
mode, you would originally get an 8x NEON loop using AVG_CEIL followed 
by an 8x SVE predicated epilogue using other instructions.  With the 
unrolling you would instead end up with a 16x AVG_CEIL NEON loop plus 
an 8x SVE predicated loop, thus skipping the original 8x NEON loop.  
In the future we could handle this differently, by either using a 
different costing model for epilogues or potentially vectorizing more 
than one epilogue.
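The per-category division described above can be sketched as follows. The operation counts, issue rates, and the cap of 4 mirror the shape of the patch, but the structs here are invented stand-ins; the real function reads m_ops and aarch64's issue_info tables and handles further categories:

```cpp
#include <algorithm>
#include <cassert>

// Invented stand-ins for the vec_ops analysis and issue_info tables.
struct vec_op_count
{
  unsigned stores, loads, general_ops, reduction_latency;
};
struct issue_rates
{
  unsigned stores_per_cycle, loads_per_cycle, general_ops_per_cycle;
};

// For each operation category: divide per-cycle throughput by the
// per-iteration count, scaled by the reduction latency, and take the
// minimum across categories, capped at 4 as in the patch.
static unsigned
suggested_unroll_factor (const vec_op_count &ops, const issue_rates &rates)
{
  if (ops.stores + ops.loads + ops.general_ops == 0)
    return 1;  /* Sanity check, as in the patch.  */
  unsigned factor = std::max (ops.reduction_latency, 1u);
  unsigned unroll = 4;
  auto ceil_div = [] (unsigned a, unsigned b) { return (a + b - 1) / b; };
  if (ops.stores > 0)
    unroll = std::min (unroll,
		       ceil_div (factor * rates.stores_per_cycle, ops.stores));
  if (ops.loads > 0)
    unroll = std::min (unroll,
		       ceil_div (factor * rates.loads_per_cycle, ops.loads));
  if (ops.general_ops > 0)
    unroll = std::min (unroll, ceil_div (factor * rates.general_ops_per_cycle,
					 ops.general_ops));
  return std::max (unroll, 1u);
}
```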


gcc/ChangeLog:

    * config/aarch64/aarch64.cc (aarch64_vector_costs): Define 
determine_suggested_unroll_factor.

    (determine_suggested_unroll_factor): New function.
    (aarch64_vector_costs::finish_costs): Use 
determine_suggested_unroll_factor.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
b5687aab59f630920e51b742b80a540c3a56c6c8..9d3a607d378d6a2792efa7c6dece2a65c24e4521
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -15680,6 +15680,7 @@ private:
   unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
 unsigned int);
   bool prefer_unrolled_loop () const;
+  unsigned int determine_suggested_unroll_factor ();
 
   /* True if we have performed one-time initialization based on the
  vec_info.  */
@@ -16768,6 +16769,105 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops,
   return sve_cycles_per_iter;
 }
 
+unsigned int
+aarch64_vector_costs::determine_suggested_unroll_factor ()
+{
+  auto *issue_info = aarch64_tune_params.vec_costs->issue_info;
+  if (!issue_info)
+return 1;
+  bool sve = false;
+  if (aarch64_sve_mode_p (m_vinfo->vector_mode))
+{
+  if (!issue_info->sve)
+   return 1;
+  sve = true;
+}
+  else
+{
+  if (!issue_info->advsimd)
+   return 1;
+  /* If we are trying to unroll a NEON main loop that contains patterns
+that we do not support with SVE and we might use a predicated
+epilogue, we need to be conservative and block unrolling as this might
+lead to a less optimal loop for the first and only epilogue using the
+original loop's vectorization factor.
+TODO: Remove this constraint when we add support for multiple epilogue
+vectorization.  */
+  if (partial_vectors_supported_p ()
+ && param_vect_partial_vector_usage != 0
+ && !TARGET_SVE2)
+   {
+ unsigned int i;
+ stmt_vec_info stmt_vinfo;
+ FOR_EACH_VEC_ELT (m_vinfo->stmt_vec_infos, i, stmt_vinfo)
+   {
+ if (is_pattern_stmt_p (stmt_vinfo))
+   {
+ gimple *stmt = stmt_vinfo->stmt;
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt))
+   {
+ enum internal_fn ifn
+   = gimple_call_internal_fn (stmt);
+ switch (ifn)
+   {
+   case IFN_AVG_FLOOR:
+   case IFN_AVG_CEIL:
+ return 1;
+   default:
+ break;
+   }
+   }
+   }
+   }
+   }
+}
+
+  unsigned int max_unroll_factor = 1;
+  aarch64_simd_vec_issue_info const *vec_issue
+= sve ? issue_info->sve : issue_info->advsimd;
+  for (auto vec_ops : m_ops)
+{
+  /* Limit unroll factor to 4 for now.  */
+  unsigned int unroll_factor = 4;
+  unsigned int factor
+   = vec_ops.reduction_latency > 1 ? vec_ops.reduction_latency : 1;
+  unsigned int temp;
+
+  /* Sanity check, this should never happen.  */
+  if ((vec_ops.stores + vec_ops.loads + vec_ops.general_ops) == 0)
+   return 1;
+
+  /* Check stores.  */
+  if (vec_ops.stores > 0)
+   {
+ temp = CEIL (factor * vec_issue->stores_per_cycle,
+  vec_ops.stores);
+ unroll_factor = MIN (unroll_factor, temp);
+   }
+
+  /* Check loads.  */
+  if (vec_ops.loads > 0)
+   {
+
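The message is truncated at this point in the archive. For illustration, the per-operation-count heuristic visible above can be sketched as follows; the struct layout, field names, and numbers are assumptions, since the full patch is not shown here:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the unroll-factor heuristic: the factor is
// capped at 4 and reduced so that the loop's stores and loads per
// unrolled iteration do not exceed what the core can issue while
// covering the reduction latency.  Not the real tuning structures.
struct issue_info_sketch { unsigned stores_per_cycle; unsigned loads_per_cycle; };
struct op_count_sketch { unsigned stores, loads, general_ops, reduction_latency; };

static unsigned ceil_div (unsigned a, unsigned b) { return (a + b - 1) / b; }

unsigned suggested_unroll_factor (const op_count_sketch &ops,
                                  const issue_info_sketch &issue)
{
  // Sanity check, mirrors the patch: no ops at all means no unrolling.
  if (ops.stores + ops.loads + ops.general_ops == 0)
    return 1;
  unsigned factor = std::max (ops.reduction_latency, 1u);
  unsigned unroll = 4;  // cap, as in the patch
  if (ops.stores > 0)
    unroll = std::min (unroll,
                       ceil_div (factor * issue.stores_per_cycle, ops.stores));
  if (ops.loads > 0)
    unroll = std::min (unroll,
                       ceil_div (factor * issue.loads_per_cycle, ops.loads));
  return unroll;
}
```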

[aarch64] Update Neoverse N2 core definition

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches

Hi,

As requested, I updated the Neoverse N2 entry to use the 
AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated the 
ARCH_IDENT to 9A and moved it under the Armv9 cores.


gcc/ChangeLog:

    * config/aarch64/aarch64-cores.def: Update Neoverse N2 core entry.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
9e6ca84bd4b277ccf2c1809c419703a23075f315..41d95354b6a483926b82a64ee6788eaf41814108
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -145,9 +145,6 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, 
8_4A,  AARCH64_FL_FOR
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 
0xC01, -1)
 
-/* Armv8.5-A Architecture Processors.  */
-AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, 8_5A, 
AARCH64_FL_FOR_ARCH8_5 | AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_F16 | 
AARCH64_FL_SVE | AARCH64_FL_SVE2 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | 
AARCH64_FL_MEMTAG, neoversen2, 0x41, 0xd49, -1)
-
 /* ARMv8-A big.LITTLE implementations.  */
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE 
(0xd07, 0xd03), -1)
@@ -172,6 +169,8 @@ AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, 9A,  
AARCH64_FL_FOR_ARCH9 |
 
 AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, 9A,  AARCH64_FL_FOR_ARCH9 | 
AARCH64_FL_SVE2_BITPERM | AARCH64_FL_MEMTAG | AARCH64_FL_I8MM | 
AARCH64_FL_BF16, neoversen2, 0x41, 0xd48, -1)
 
+AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | 
AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | 
AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, neoversen2, 0x41, 0xd49, -1)
+
 AARCH64_CORE("demeter", demeter, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | 
AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | 
AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, demeter, 0x41, 0xd4f, -1)
 
 #undef AARCH64_CORE


Re: [PATCH] tree-optimization/104941: Actually assign the conversion result

2022-03-16 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 16, 2022 at 08:24:24PM +0530, Siddhesh Poyarekar wrote:
> Assign the result of fold_convert to offset.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/104941
>   * tree-object-size.cc (size_for_offset): Assign result of
>   fold_convert to OFFSET.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/104941
>   * gcc.dg/builtin-dynamic-object-size-0.c (S1, S2): New structs.
>   (test_alloc_nested_structs, g): New functions.
>   (main): Call test_alloc_nested_structs.
> 
> Signed-off-by: Siddhesh Poyarekar 

> --- a/gcc/tree-object-size.cc
> +++ b/gcc/tree-object-size.cc
> @@ -372,7 +372,7 @@ size_for_offset (tree sz, tree offset, tree wholesize = 
> NULL_TREE)
>  
>/* Safe to convert now, since a valid net offset should be non-negative.  
> */
>if (!types_compatible_p (TREE_TYPE (offset), sizetype))
> -fold_convert (sizetype, offset);
> +offset = fold_convert (sizetype, offset);
>  
>if (TREE_CODE (offset) == INTEGER_CST)
>  {

I think the types_compatible_p call is too heavy, you want to check if
emitting the fold_convert to sizetype is unnecessary, which is
  if (!useless_type_conversion_p (sizetype, TREE_TYPE (offset)))
offset = fold_convert (sizetype, offset);
types_compatible_p checks that the conversion is useless in both directions,
which is more than is needed here.

Ok for trunk with that change.

Jakub
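
For readers outside GCC internals: fold_convert is a pure, value-returning
function, so calling it without assigning the result has no effect.  A
minimal analogue of the same bug class, using a hypothetical helper rather
than GCC code:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// A pure function that returns a converted value.  Calling it without
// assigning the result, as the original patch's bug did with
// fold_convert, silently discards the conversion.
static std::string to_upper (std::string s)
{
  for (char &c : s)
    c = (char) std::toupper ((unsigned char) c);
  return s;
}
```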



Re: [PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]

2022-03-16 Thread Iain Sandoe via Gcc-patches



> On 16 Mar 2022, at 13:28, Rainer Orth  wrote:
> 
> This patch documents the Solaris-specific D bootstrap requirements.
> 
> Tested by building and inspecting gccinstall.{pdf,info}.
> 
> Ok for trunk?
> 
> I've omitted the Darwin-specific stuff so far documented in PRs d/103577
> and d/103578:
> 
> * needs --enable-libphobos
> * top of gcc-11 branch only
> * backport of -static-libphobos patch
> * Darwin/i386 doesn't work at all

— hopefully we will not need to add that  —

The actual problem is known (FWIW there is also some wrong code generated for
x86_64, we just happen to link it to something harmless instead of failing to 
link).

- I remain hopeful that a solution for 11.x can be found before gcc-12 branches,
since without m32 support it is hard for me to fit D into regular Darwin 
testing (at
present D is effectively untested on Darwin since the change to require a D 
bootstrap)
 .. at one stage, D did work on m32 Darwin.

Iain

> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
> 
> 
> 2022-03-16  Rainer Orth  
> 
>   gcc:
>   PR d/103528
>   * doc/install.texi (Tools/packages necessary for building GCC)
>   (GDC): Document libphobos requirement.
>   (Host/target specific installation notes for GCC, *-*-solaris2*):
>   Document libphobos and GDC specifics.
> 
> # HG changeset patch
> # Parent  33de4c9d886299fd8cc97e20c0f761c2f28a3eef
> doc: Document Solaris D bootstrap requirements [PR 103528]
> 
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -287,7 +287,8 @@ section.
> @item @anchor{GDC-prerequisite}GDC
> 
> In order to build GDC, the D compiler, you need a working GDC
> -compiler (GCC version 9.1 or later), as the D front end is written in D.
> +compiler (GCC version 9.1 or later) and D runtime library,
> +@samp{libphobos}, as the D front end is written in D.
> 
> Versions of GDC prior to 12 can be built with an ISO C++11 compiler, which can
> then be installed and used to bootstrap newer versions of the D front end.
> @@ -303,6 +304,10 @@ front end does not make use of any GDC-s
> of the D language, if too old a GDC version is installed and
> @option{--enable-languages=d} is used, the build will fail.
> 
> +On some targets, @samp{libphobos} isn't enabled by default, but compiles
> +and works if @option{--enable-libphobos} is used.  Specifics are
> +documented for affected targets.
> +
> @item A ``working'' POSIX compatible shell, or GNU bash
> 
> Necessary when running @command{configure} because some
> @@ -4851,6 +4856,12 @@ GNU binutils.  @samp{libstdc++} symbol v
> appropriate version is found.  Solaris @command{c++filt} from the Solaris
> Studio compilers does @emph{not} work.
> 
> +In order to build the GNU D compiler, GDC, a working @samp{libphobos} is
> +needed.  That library wasn't built by default in GCC 9--11 on SPARC, or
> +on x86 when the Solaris assembler is used, but can be enabled by
> +configuring with @option{--enable-libphobos}.  Also, GDC 9.4.0 is
> +required on x86, while GDC 9.3.0 is known to work on SPARC.
> +
> The versions of the GNU Multiple Precision Library (GMP), the MPFR
> library and the MPC library bundled with Solaris 11.3 and later are
> usually recent enough to match GCC's requirements.  There are two
> @@ -4864,6 +4875,7 @@ need to configure with @option{--with-gm
> @item
> The version of the MPFR libary included in Solaris 11.3 is too old; you
> need to provide a more recent one.
> +
> @end itemize
> 
> @html



Re: [PATCH] handle "invisible" reference in -Wdangling-pointer (PR104436)

2022-03-16 Thread Martin Sebor via Gcc-patches

On 3/9/22 06:17, Richard Biener wrote:

On Fri, Feb 11, 2022 at 12:05 AM Martin Sebor via Gcc-patches
 wrote:


On 2/8/22 15:37, Jason Merrill wrote:

On 2/8/22 16:59, Martin Sebor wrote:

Transforming a by-value arguments to by-reference as GCC does for some
class types can trigger -Wdangling-pointer when the argument is used
to store the address of a local variable.  Since the stored value is
not accessible in the caller the warning is a false positive.

The attached patch handles this case by excluding PARM_DECLs with
the DECL_BY_REFERENCE bit set from consideration.
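
A sketch of the scenario (illustrative, not from the patch): because the
class has a nontrivial destructor, the psABI on many targets passes it by
"invisible reference", so a store into the by-value parameter only touches
the caller's hidden copy and never escapes:

```cpp
#include <cassert>

// T's user-provided destructor makes it non-trivial for calls, so on
// many targets it is passed by invisible reference (the caller makes a
// hidden copy and passes its address).
struct T
{
  int *p;
  ~T () { }
};

void f (T t)
{
  int x = 0;
  t.p = &x;   // dangling address stored only in the hidden copy:
              // invisible to the caller, so no real dangling pointer
}
```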

While testing the patch I noticed some instances of the warning are
unintentionally duplicated as the pass runs more than once.  To avoid
that, I also introduce warning suppression into the handler for this
instance of the warning.  (There might still be others.)


The second test should verify that we do warn about returning 't' from a
function; we don't want to ignore the DECL_BY_REFERENCE RESULT_DECL.


The indirect aggregate case isn't handled and needs more work but
since you brought it up I thought I should look into finishing it.
The attached patch #2 adds support for it.  It also incorporates
Richard's suggestion to test PARM_DECL.  Patch #2 applies on top
of patch #1 which is unchanged from the first revision.


patch #1 would be OK if you'd do the PARM_DECL check there - I'd
rather defer #2 to stage1.

Richard.



I have retested it on x86_64-linux and by building Glibc and
Binutils + GDB.

If now is too late for the aggregate enhancement I'm okay with
deferring it until stage 1.


I committed the adjusted patch in r12-7650.  I'll try to remember to
ping the second patch in stage 1.

Martin






+  tree var = SSA_NAME_VAR (lhs_ref.ref);
+  if (DECL_BY_REFERENCE (var))
+/* Avoid by-value arguments transformed into by-reference.  */
+continue;


I wonder if we can we express this property of invisiref parms somewhere
more general?  I imagine optimizations would find it useful as well.
Could pointer_query somehow treat the reference as pointing to a
function-local object?


I don't quite see where in the pointer_query class this would be
useful (the class also isn't used for optimization).  I could add
a helper to the access_ref class to query this property in warning
code but as this is the only potential client I'm also not sure
that's quite what you had in mind.  I'd need some guidance as to
what you're thinking of here.

Martin




I previously tried to express this by marking the reference as
'restrict', but that was wrong
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97474).

Jason





Re: [PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]

2022-03-16 Thread Rainer Orth
Hi Iain,

>> I've omitted the Darwin-specific stuff so far documented in PRs d/103577
>> and d/103578:
>> 
>> * needs --enable-libphobos
>> * top of gcc-11 branch only
>> * backport of -static-libphobos patch
>> * Darwin/i386 doesn't work at all
>
> — hopefully we will not need to add that  —
>
> The actual problem is known (FWIW there is also some wrong code generated for
> x86_64, we just happen to link it to something harmless instead of failing
> to link).

I see.  One issue I see during x86_64-darwin testing (10.7 only?) is
that the forkgc2 test hangs and has to be killed manually.

> - I remain hopeful that a solution for 11.x can be found before gcc-12 
> branches,
> since without m32 support it is hard for me to fit D into regular Darwin
> testing (at
> present D is effectively untested on Darwin since the change to require a D
> bootstrap)
>  .. at one stage, D did work on m32 Darwin.

I suspected that the 32-bit issue might be due to several stdint types
being wrong, which was just fixed on master.  At least the issue seemed
similar to PR d/104738.  I'm building a 64-bit trunk gdc as we speak in
order to try that as a bootstrap compiler for a 32-bit build.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]

2022-03-16 Thread Iain Sandoe via Gcc-patches
Hi Rainer,

> On 16 Mar 2022, at 15:55, Rainer Orth  wrote:
> 
>>> I've omitted the Darwin-specific stuff so far documented in PRs d/103577
>>> and d/103578:
>>> 
>>> * needs --enable-libphobos
>>> * top of gcc-11 branch only
>>> * backport of -static-libphobos patch
>>> * Darwin/i386 doesn't work at all
>> 
>> — hopefully we will not need to add that  —
>> 
>> The actual problem is known (FWIW there is also some wrong code generated for
>> x86_64, we just happen to link it to something harmless instead of failing
>> to link).
> 
> I see.  One issue I see during x86_64-darwin testing (10.7 only?) is
> that the forkgc2 test hangs and has to be killed manually.

yes, there are known issues with GC and threading - but those do not break
bootstrap
since the compiler is single-threaded.

>> - I remain hopeful that a solution for 11.x can be found before gcc-12 
>> branches,
>> since without m32 support it is hard for me to fit D into regular Darwin
>> testing (at
>> present D is effectively untested on Darwin since the change to require a D
>> bootstrap)
>> .. at one stage, D did work on m32 Darwin.
> 
> I suspected that the 32-bit issue might be due to several stdint types
> being wrong, which was just fixed on master.  At least the issue seemed
> similar to PR d/104738.  I'm building a 64-bit trunk gdc as we speak in
> order to try that as a bootstrap compiler for a 32-bit build.

The issue on 11.x is related to code that tries to synthesize section-start
symbols, it causes a link fail on m32 - which is a bootstrap breaker.

..  if that has been fixed for m32 codegen on master then, yes - presumably we
could build an x86_64 compiler and use that “-m32” to make an i686 bootstrap.

Iain.



Re: [aarch64] Add Neoverse N2 tuning structs

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch adds tuning structures for Neoverse N2.
>
> 2022-03-16  Tamar Christina  
>                 Andre Vieira 
>
>      * config/aarch64/aarch64.cc (neoversen2_addrcost_table, 
> neoversen2_regmove_cost,
>      neoversen2_advsimd_vector_cost, neoversen2_sve_vector_cost, 
> neoversen2_scalar_issue_info,
>      neoversen2_advsimd_issue_info, neoversen2_sve_issue_info, 
> neoversen2_vec_issue_info,
>      neoversen2_tunings): New structs.
>      (neoversen2_tunings): Use new structs and update tuning flags.
>      (aarch64_vec_op_count::rename_cycles_per_iter): Enable for 
> neoversen2 tuning.
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> d504fe7607b66a9c9ed9b183a2d3c03d34fb0f80..e0bb447beb9eae74551d863505eb265737d36334
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -519,6 +519,24 @@ static const struct cpu_addrcost_table 
> neoversev1_addrcost_table =
>0 /* imm_offset  */
>  };
>  
> +static const struct cpu_addrcost_table neoversen2_addrcost_table =
> +{
> +{
> +  1, /* hi  */
> +  0, /* si  */
> +  0, /* di  */
> +  1, /* ti  */
> +},
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  2, /* post_modify_ld3_st3  */
> +  2, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
>  static const struct cpu_regmove_cost generic_regmove_cost =
>  {
>1, /* GP2GP  */
> @@ -624,6 +642,16 @@ static const struct cpu_regmove_cost a64fx_regmove_cost =
>2 /* FP2FP  */
>  };
>  
> +static const struct cpu_regmove_cost neoversen2_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Spilling to int<->fp instead of memory is recommended so set
> + realistic costs compared to memmv_cost.  */
> +  3, /* GP2FP  */
> +  2, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
>  /* Generic costs for Advanced SIMD vector operations.   */
>  static const advsimd_vec_cost generic_advsimd_vector_cost =
>  {
> @@ -2174,12 +2202,166 @@ static const struct tune_params 
> neoverse512tvb_tunings =
>&generic_prefetch_tune
>  };
>  
> +static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
> +{
> +  2, /* int_stmt_cost  */
> +  2, /* fp_stmt_cost  */
> +  2, /* ld2_st2_permute_cost */
> +  2, /* ld3_st3_permute_cost  */
> +  3, /* ld4_st4_permute_cost  */
> +  3, /* permute_cost  */
> +  4, /* reduc_i8_cost  */
> +  4, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  6, /* reduc_f16_cost  */
> +  4, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  2, /* store_elt_extra_cost  */
> +  /* This value is just inherited from the Cortex-A57 table.  */
> +  8, /* vec_to_scalar_cost  */
> +  /* This depends very much on what the scalar value is and
> + where it comes from.  E.g. some constants take two dependent
> + instructions or a load, while others might be moved from a GPR.
> + 4 seems to be a reasonable compromise in practice.  */
> +  4, /* scalar_to_vec_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  /* Although stores have a latency of 2 and compete for the
> + vector pipes, in practice it's better not to model that.  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
> +};
> +
> +static const sve_vec_cost neoversen2_sve_vector_cost =
> +{
> +  {
> +2, /* int_stmt_cost  */
> +2, /* fp_stmt_cost  */
> +3, /* ld2_st2_permute_cost  */
> +4, /* ld3_st3_permute_cost  */
> +4, /* ld4_st4_permute_cost  */
> +3, /* permute_cost  */
> +/* Theoretically, a reduction involving 15 scalar ADDs could
> +   complete in ~5 cycles and would have a cost of 15.  [SU]ADDV
> +   completes in 11 cycles, so give it a cost of 15 + 6.  */
> +21, /* reduc_i8_cost  */
> +/* Likewise for 7 scalar ADDs (~3 cycles) vs. 9: 7 + 6.  */
> +13, /* reduc_i16_cost  */
> +/* Likewise for 3 scalar ADDs (~2 cycles) vs. 8: 3 + 6.  */
> +9, /* reduc_i32_cost  */
> +/* Likewise for 1 scalar ADDs (~1 cycles) vs. 2: 1 + 1.  */

typo: 1 scalar ADD.

> +2, /* reduc_i64_cost  */
> +/* Theoretically, a reduction involving 7 scalar FADDs could
> +   complete in ~8 cycles and would have a cost of 14.  FADDV
> +   completes in 6 cycles, so give it a cost of 14 - 2.  */
> +12, /* reduc_f16_cost  */
> +/* Likewise for 3 scalar FADDs (~4 cycles) vs. 4: 6 - 0.  */
> +6, /* reduc_f32_cost  */
> +/* Likewise for 1 scalar FADDs (~2 cycles) vs. 2: 2 - 0.  */

Similarly here.

OK with those changes, thanks.

Richard

> +2, /* reduc_f64_cost  */
> +2, /* store_elt_extra_cost  */
> +/* This value is just inherited from the Cortex-A57 table.  */
> +8, /* vec_to_scalar_cost  */
> +/* See the comment above the Advanced SIMD versions.  */
> +4, /* scalar_to_vec_cost  */
> +4, /* align_load_cost  */
>

[C PATCH] PR c/98198: ICE-on-invalid-code error recovery.

2022-03-16 Thread Roger Sayle

This is Christophe Lyon's fix to PR c/98198, an ICE-on-invalid-code
regression affecting mainline, and a suitable testcase.
Tested on x86_64-pc-linux-gnu with make bootstrap and make -k check
with no new failures.  Ok for mainline?


2022-03-16  Christophe Lyon  
Roger Sayle  

gcc/c-family/ChangeLog
PR c/98198
* c-attribs.cc (decl_or_type_attrs): Add error_mark_node check.

gcc/testsuite/ChangeLog
PR c/98198
* gcc.dg/pr98198.c: New test case.

Thanks in advance,
Roger
--

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index c885d88..111a33f 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -806,6 +806,8 @@ decl_or_type_attrs (tree node)
return attrs;
 
   tree type = TREE_TYPE (node);
+  if (type == error_mark_node)
+   return NULL_TREE;
   return TYPE_ATTRIBUTES (type);
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr98198.c b/gcc/testsuite/gcc.dg/pr98198.c
new file mode 100644
index 000..489afae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98198.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+static inline void sub_1 ( ) { 
+   struct struct_1 var_9 , var_10
+}
+
+static int var_9[1] __attribute__ ( ( section ( ".data" ) ) ) ; 
+/* { dg-excess-errors "" } */


Re: [PATCH] libstdc++: Ensure that std::from_chars is declared when supported

2022-03-16 Thread Jonathan Wakely via Gcc-patches
On Tue, 15 Mar 2022 at 14:12, Patrick Palka wrote:
>
> On Mon, 14 Mar 2022, Jonathan Wakely wrote:
>
> > On Mon, 14 Mar 2022 at 14:17, Patrick Palka via Libstdc++
> >  wrote:
> > >
> > > On Fri, 11 Mar 2022, Jonathan Wakely wrote:
> > >
> > > > Patrick, I think this is right, but please take a look to double check.
> > > >
> > > > I think we should fix the feature-test macro conditions for gcc-11 too,
> > > > although it's a bit more complicated there. It should depend on IEEE
> > > > float and double *and* uselocale. We don't need the other changes on the
> > > > branch.
> > >
> > > Don't we still depend on uselocale in GCC 12 for long double from_chars,
> > > at least on targets where long double != binary64?
> >
> > Not after this patch:
> >
> > from_chars(const char* first, const char* last, long double& value,
> >   chars_format fmt) noexcept
> > {
> > -#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
> > -  && ! USE_STRTOD_FOR_FROM_CHARS
> > +#if ! USE_STRTOD_FOR_FROM_CHARS
> > +  // Either long double is the same as double, or we can't use strtold.
> > +  // In the latter case, this might give an incorrect result (e.g. values
> > +  // out of range of double give an error, even if they fit in long 
> > double).
> >
> > If uselocale isn't available, this defines the long double overload in
> > terms of the double one, even if that doesn't always give the right
> > answers. That greatly simplifies the preprocessor conditions for when
> > it's supported. If the float and double forms are present, so is the
> > long double one.
>
> Ah sorry, I had overlooked that part of the patch.  Makes sense and LGTM!

Pushed to trunk now.



Re: [aarch64] Add Demeter tuning structs

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch adds tuning structs for -mcpu/-mtune=demeter.
>
>
> 2022-03-16  Tamar Christina  
>     Andre Vieira 
>
>      * config/aarch64/aarch64.cc (demeter_addrcost_table, 
> demeter_regmove_cost,
>      demeter_advsimd_vector_cost, demeter_sve_vector_cost, 
> demeter_scalar_issue_info,
>      demeter_advsimd_issue_info, demeter_sve_issue_info, 
> demeter_vec_issue_info,
>      demeter_vector_cost, demeter_tunings): New tuning structs.
>      (aarch64_vec_op_count::rename_cycles_per_iter): Enable for 
> demeter tuning.
>      * config/aarch64/aarch64-cores.def: Add entry for demeter.
>      * config/aarch64/aarch64-tune.md (tune): Add demeter to list.
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index 
> eda3a4eb6028d97d50295773c87119d9fa0c41bf..9e6ca84bd4b277ccf2c1809c419703a23075f315
>  100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -172,4 +172,6 @@ AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, 9A,  
> AARCH64_FL_FOR_ARCH9 |
>  
>  AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, 9A,  AARCH64_FL_FOR_ARCH9 | 
> AARCH64_FL_SVE2_BITPERM | AARCH64_FL_MEMTAG | AARCH64_FL_I8MM | 
> AARCH64_FL_BF16, neoversen2, 0x41, 0xd48, -1)
>  
> +AARCH64_CORE("demeter", demeter, cortexa57, 9A, AARCH64_FL_FOR_ARCH9 | 
> AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG 
> | AARCH64_FL_MEMTAG | AARCH64_FL_PROFILE, demeter, 0x41, 0xd4f, -1)
> +
>  #undef AARCH64_CORE
> diff --git a/gcc/config/aarch64/aarch64-tune.md 
> b/gcc/config/aarch64/aarch64-tune.md
> index 
> 3eed700c063c2abf36cd648b22f713e887255f96..dda5dbdda75c1383004f5dac3f249909f7f23589
>  100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> - 
> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2"
> + 
> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,neoversen2,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexx2,demeter"
>   (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> e0bb447beb9eae74551d863505eb265737d36334..9b6f67dc592d8a447d6b28390c90abe5dcfa5f08
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -537,6 +537,24 @@ static const struct cpu_addrcost_table 
> neoversen2_addrcost_table =
>0 /* imm_offset  */
>  };
>  
> +static const struct cpu_addrcost_table demeter_addrcost_table =
> +{
> +{
> +  1, /* hi  */
> +  0, /* si  */
> +  0, /* di  */
> +  1, /* ti  */
> +},
> +  0, /* pre_modify  */
> +  0, /* post_modify  */
> +  2, /* post_modify_ld3_st3  */
> +  2, /* post_modify_ld4_st4  */
> +  0, /* register_offset  */
> +  0, /* register_sextend  */
> +  0, /* register_zextend  */
> +  0 /* imm_offset  */
> +};
> +
>  static const struct cpu_regmove_cost generic_regmove_cost =
>  {
>1, /* GP2GP  */
> @@ -652,6 +670,16 @@ static const struct cpu_regmove_cost 
> neoversen2_regmove_cost =
>2 /* FP2FP  */
>  };
>  
> +static const struct cpu_regmove_cost demeter_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Spilling to int<->fp instead of memory is recommended so set
> + realistic costs compared to memmv_cost.  */
> +  3, /* GP2FP  */
> +  2, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +

Given that these values are the same as for Neoverse V1, I wonder
whether we should reuse the existing structures, like we do for
cortexa76.  I can see there are

Re: [PATCH] libstdc++: Ensure that std::from_chars is declared when supported

2022-03-16 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Mar 2022 at 14:15, Patrick Palka wrote:
> I think __floating_from_chars_hex should work fine on 16 bit targets,
> so I suppose we could use it in the !USE_LIB_FAST_FLOAT branch as well.

Good point, and even for SIZE_WIDTH >= 32, it might be faster to use
your __floating_from_chars_hex function than to change locale twice
and allocate memory and call strto{f,d,ld}. It certainly avoids the
non-standard errc::out_of_memory result.

So maybe something like the attached semi-tested patch.
commit 57b9e2227c7b14f04f2593ec41e177631b6f3fc2
Author: Jonathan Wakely 
Date:   Wed Mar 16 16:46:07 2022

libstdc++: Prefer our own __floating_from_chars_hex if supported

This adjusts the floating-point overloads of std::from_chars so that they
prefer to use __floating_from_chars_hex if possible, instead of using
strtof/strtod/strtold. This avoids changing the per-thread C locale and
allocating memory for a null-terminated string when we can use the
hex-specific function instead.

libstdc++-v3/ChangeLog:

* src/c++17/floating_from_chars.cc (from_chars): Always use
__floating_from_chars_hex for hex format if possible.

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
b/libstdc++-v3/src/c++17/floating_from_chars.cc
index 4aa2483bc28..ce30752ee95 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -821,13 +821,12 @@ from_chars_result
 from_chars(const char* first, const char* last, float& value,
   chars_format fmt) noexcept
 {
-#if USE_LIB_FAST_FLOAT
+#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64
   if (fmt == chars_format::hex)
 return __floating_from_chars_hex(first, last, value);
-  else
-{
-  return fast_float::from_chars(first, last, value, fmt);
-}
+#endif
+#if USE_LIB_FAST_FLOAT
+  return fast_float::from_chars(first, last, value, fmt);
 #else
   return from_chars_strtod(first, last, value, fmt);
 #endif
@@ -837,13 +836,12 @@ from_chars_result
 from_chars(const char* first, const char* last, double& value,
   chars_format fmt) noexcept
 {
-#if USE_LIB_FAST_FLOAT
+#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64
   if (fmt == chars_format::hex)
 return __floating_from_chars_hex(first, last, value);
-  else
-{
-  return fast_float::from_chars(first, last, value, fmt);
-}
+#endif
+#if USE_LIB_FAST_FLOAT
+  return fast_float::from_chars(first, last, value, fmt);
 #else
   return from_chars_strtod(first, last, value, fmt);
 #endif
@@ -853,18 +851,12 @@ from_chars_result
 from_chars(const char* first, const char* last, long double& value,
   chars_format fmt) noexcept
 {
-#if ! USE_STRTOD_FOR_FROM_CHARS
+#if ! USE_STRTOD_FOR_FROM_CHARS || __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
   // Either long double is the same as double, or we can't use strtold.
   // In the latter case, this might give an incorrect result (e.g. values
   // out of range of double give an error, even if they fit in long double).
   double dbl_value;
-  from_chars_result result;
-  if (fmt == chars_format::hex)
-result = __floating_from_chars_hex(first, last, dbl_value);
-  else
-{
-  result = fast_float::from_chars(first, last, dbl_value, fmt);
-}
+  from_chars_result result = std::from_chars(first, last, dbl_value, fmt);
   if (result.ec == errc{})
 value = dbl_value;
   return result;
@@ -888,7 +880,8 @@ from_chars_result
 from_chars(const char* first, const char* last, __ieee128& value,
   chars_format fmt) noexcept
 {
-  // fast_float doesn't support IEEE binary128 format, but we can use strtold.
+  // fast_float doesn't support IEEE binary128 format, and neither does our
+  // __floating_from_chars_hex, but we can use __strtoieee128 on this target.
   return from_chars_strtod(first, last, value, fmt);
 }
 #endif
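As a user-level sanity check of the dispatch above, here is a minimal sketch of the std::from_chars floating-point API these overloads implement (standard C++17 <charconv> interface; the helper name parse_double is my own):

```cpp
#include <cassert>
#include <charconv>
#include <cstring>
#include <system_error>

// Parse a complete string as a double via std::from_chars; returns true
// only when the whole input was consumed without error.
static bool
parse_double(const char* s, double& out,
             std::chars_format fmt = std::chars_format::general)
{
  const char* end = s + std::strlen(s);
  std::from_chars_result r = std::from_chars(s, end, out, fmt);
  return r.ec == std::errc{} && r.ptr == end;
}
```

Note that with chars_format::hex the input carries no "0x" prefix and uses a binary "p" exponent, which is exactly the form __floating_from_chars_hex handles.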


[PATCH] c++, v2: Fix up constexpr evaluation of new with zero sized types [PR104568]

2022-03-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 15, 2022 at 04:19:05PM -0400, Jason Merrill wrote:
> > But if you strongly prefer it that way, I can do that.
> > Note, probably not 3 new args but 4, depends on whether we could turn
> > all those cases where the tree arg0 = CALL_EXPR_ARG (oldop, 0);
> > is done but var_size_adjusted is false into assertion failures.
> > I'm worried that with the zero size of element we could end up with
> > a variable number of elements which when multiplied by 0 gives constant 0,
> > though hopefully that would be rejected earlier during constant evaluation.
> 
> Or we could move all the adjustment into a separate function and only ever
> pass the number of elements to build_new_constexpr_heap_type?

So like this?

2022-03-16  Jakub Jelinek  

PR c++/104568
* init.cc (build_new_constexpr_heap_type): Remove FULL_SIZE
argument and its handling, instead add ITYPE2 argument.  Only
support COOKIE_SIZE != NULL.
(build_new_1): If size is 0, change it to 0 * outer_nelts if
outer_nelts is non-NULL.  Pass type rather than elt_type to
maybe_wrap_new_for_constexpr.
* constexpr.cc (build_new_constexpr_heap_type): New function.
(cxx_eval_constant_expression) :
If elt_size is zero sized type, try to recover outer_nelts from
the size argument to operator new/new[] and pass that as
arg_size to build_new_constexpr_heap_type.  Pass ctx,
non_constant_p and overflow_p to that call too.

* g++.dg/cpp2a/constexpr-new22.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-03-16 15:25:26.294551244 +0100
+++ gcc/cp/constexpr.cc 2022-03-16 17:27:08.202184961 +0100
@@ -6422,6 +6422,84 @@ maybe_warn_about_constant_value (locatio
 }
 }
 
+/* For element type ELT_TYPE, return the appropriate type of the heap object
+   containing such element(s).  COOKIE_SIZE is NULL or the size of cookie
+   in bytes.  If COOKIE_SIZE is NULL, return array type
+   ELT_TYPE[FULL_SIZE / sizeof(ELT_TYPE)], otherwise return
+   struct { size_t[COOKIE_SIZE/sizeof(size_t)]; ELT_TYPE[N]; }
+   where N is computed such that the size of the struct fits into FULL_SIZE.
+   If ARG_SIZE is non-NULL, it is the first argument to the new operator.
+   It should be passed if ELT_TYPE is zero sized type in which case FULL_SIZE
+   will be also 0 and so it is not possible to determine the actual array
+   size.  CTX, NON_CONSTANT_P and OVERFLOW_P are used during constant
+   expression evaluation of subexpressions of ARG_SIZE.  */
+
+tree
+build_new_constexpr_heap_type (const constexpr_ctx *ctx, tree elt_type,
+  tree cookie_size, tree full_size, tree arg_size,
+  bool *non_constant_p, bool *overflow_p)
+{
+  gcc_assert (cookie_size == NULL_TREE || tree_fits_uhwi_p (cookie_size));
+  gcc_assert (tree_fits_uhwi_p (full_size));
+  unsigned HOST_WIDE_INT csz = cookie_size ? tree_to_uhwi (cookie_size) : 0;
+  if (arg_size)
+{
+  STRIP_NOPS (arg_size);
+  if (cookie_size)
+   {
+ if (TREE_CODE (arg_size) != PLUS_EXPR)
+   arg_size = NULL_TREE;
+ else if (TREE_CODE (TREE_OPERAND (arg_size, 0)) == INTEGER_CST
+  && tree_int_cst_equal (cookie_size,
+ TREE_OPERAND (arg_size, 0)))
+   {
+ arg_size = TREE_OPERAND (arg_size, 1);
+ STRIP_NOPS (arg_size);
+   }
+ else if (TREE_CODE (TREE_OPERAND (arg_size, 1)) == INTEGER_CST
+  && tree_int_cst_equal (cookie_size,
+ TREE_OPERAND (arg_size, 1)))
+   {
+ arg_size = TREE_OPERAND (arg_size, 0);
+ STRIP_NOPS (arg_size);
+   }
+ else
+   arg_size = NULL_TREE;
+   }
+  if (arg_size && TREE_CODE (arg_size) == MULT_EXPR)
+   {
+ tree op0 = TREE_OPERAND (arg_size, 0);
+ tree op1 = TREE_OPERAND (arg_size, 1);
+ if (integer_zerop (op0))
+   arg_size
+ = cxx_eval_constant_expression (ctx, op1, false, non_constant_p,
+ overflow_p);
+ else if (integer_zerop (op1))
+   arg_size
+ = cxx_eval_constant_expression (ctx, op0, false, non_constant_p,
+ overflow_p);
+ else
+   arg_size = NULL_TREE;
+   }
+  else
+   arg_size = NULL_TREE;
+}
+
+  unsigned HOST_WIDE_INT fsz = tree_to_uhwi (arg_size ? arg_size : full_size);
+  unsigned HOST_WIDE_INT esz = int_size_in_bytes (elt_type);
+  if (!arg_size)
+{
+  gcc_assert (fsz >= csz);
+  fsz -= csz;
+  if (esz)
+   fsz /= esz;
+}
+  tree itype2 = build_index_type (size_int (fsz - 1));
+  if (!cookie_size)
+return build_cplus_array_type (elt_type, itype2);
+  return build_new_constexpr_heap_type (elt_type, cookie_size, itype2);
+}
+
 /* Attempt to reduce the expressio

Re: RFA: crc builtin functions & optimizations

2022-03-16 Thread Joern Rennecke
On Wed, 16 Mar 2022 at 08:15, Richard Biener  wrote:

> The canonical place to transform loops into builtins would be loop 
> distribution.
> Another place would be final value replacement since you basically replace
> the reduction result with a call to the builtin, but I think
> loop-distribution is
> the better overall place.  See how we match strlen() there.
>
> Richard.

So I suppose that would be something along the lines of the patch below?
Except it'd need a lot more checks to make sure it's actually processing a
CRC computation,
and there needs to be code to actually expand the builtin using optabs.
And something needs to be done to make match.pd work on the output.
I'm seeing crcu16 being unrolled in the *.cunroll dump and the remains
of the loops deleted in the *.dse4 dump, but the match.pd clause to simplify
the two __builtin_crc8s calls never matches, so we end up with this in
*.optimized:

 ee_u16 crcu16 (ee_u16 newval, ee_u16 crc)
{
  unsigned char _1;
  short unsigned int _2;
  unsigned char _3;
  short unsigned int _24;
  short unsigned int _26;

   [local count: 1073741824]:
  _1 = (unsigned char) newval_4(D);
  _24 = __builtin_crc8s (crc_6(D), _1, 40961);
  _2 = newval_4(D) >> 8;
  _3 = (unsigned char) _2;
  _26 = __builtin_crc8s (_24, _3, 40961); [tail call]
  return _26;

}
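For reference, the C-level loop shape the matcher would have to recognize — the Coremark-style bit-serial CRC-16 with polynomial 0xA001, i.e. the 40961 constant seen in the dump above — is sketched here (my own reconstruction, not code from the patch):

```cpp
#include <cassert>
#include <cstdint>

// One byte of a bit-serial CRC-16 with polynomial 0xA001 (decimal 40961).
// The loop body corresponds to the GIMPLE in the *.optimized dump above:
// xor crc with data, test bit 0, shift both right, conditionally xor in
// the polynomial.
static uint16_t
crcu8(uint8_t data, uint16_t crc)
{
  for (int i = 0; i < 8; ++i)
    {
      uint8_t x16 = static_cast<uint8_t>((crc ^ data) & 1);
      data >>= 1;
      crc >>= 1;
      if (x16)
        crc ^= 0xA001;
    }
  return crc;
}

// 16-bit input processed low byte first, matching the two __builtin_crc8s
// calls in the crcu16 dump above.
static uint16_t
crcu16(uint16_t newval, uint16_t crc)
{
  crc = crcu8(static_cast<uint8_t>(newval), crc);
  return crcu8(static_cast<uint8_t>(newval >> 8), crc);
}
```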
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index db6e9096a86..b74e8569b94 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -659,6 +659,9 @@ class loop_distribution
  replace them accordingly.  */
   bool transform_reduction_loop (loop_p loop);
 
+  /* Transform some loops which calculate a CRC.  */
+  bool transform_reduction_loop (loop_p loop, tree niters);
+
   /* Compute topological order for basic blocks.  Topological order is
  needed because data dependence is computed for data references in
  lexicographical order.  */
@@ -3432,6 +3435,26 @@ generate_strlen_builtin_using_rawmemchr (loop_p loop, tree reduction_var,
 start_len);
 }
 
+static void
+generate_crc_builtin (loop_p loop, tree reduction_var,
+ tree crc_in, tree data_in, tree xor_val,
+ location_t loc)
+{
+  gimple_seq seq = NULL;
+  tree reduction_var_new = make_ssa_name (TREE_TYPE (reduction_var));
+
+  crc_in = force_gimple_operand (crc_in, &seq, true, NULL_TREE);
+  data_in = force_gimple_operand (data_in, &seq, true, NULL_TREE);
+  tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_CRC8S));
+  gimple *fn_call = gimple_build_call (fn, 3, crc_in, data_in, xor_val);
+  gimple_call_set_lhs (fn_call, reduction_var_new);
+  gimple_set_location (fn_call, loc);
+  gimple_seq_add_stmt (&seq, fn_call);
+
+  generate_reduction_builtin_1 (loop, seq, reduction_var, reduction_var_new,
+   "generated crc%s", E_QImode);
+}
+
 /* Return true if we can count at least as many characters by taking pointer
difference as we can count via reduction_var without an overflow.  Thus
compute 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var_type),
@@ -3713,6 +3736,128 @@ loop_distribution::transform_reduction_loop (loop_p loop)
   return false;
 }
 
+/* Match loops like:
+
+   [local count: 954449105]:
+  # data_16 = PHI 
+  # crc_22 = PHI 
+  # i_3 = PHI 
+  # ivtmp_7 = PHI 
+  _8 = (unsigned char) crc_22;
+  _1 = _8 ^ data_16;
+  x16_12 = _1 & 1;
+  data_13 = data_16 >> 1;
+  _19 = crc_22 >> 1;
+  if (x16_12 != 0)
+goto ; [50.00%]
+  else
+goto ; [50.00%]
+
+   [local count: 477224553]:
+  goto ; [100.00%]
+
+   [local count: 477224552]:
+  crc_17 = _19 ^ 40961;
+
+   [local count: 954449105]:
+  # crc_4 = PHI 
+  i_18 = i_3 + 1;
+  ivtmp_6 = ivtmp_7 - 1;
+  if (ivtmp_6 != 0)
+goto ; [88.89%]
+  else
+goto ; [11.11%]
+
+   [local count: 848409806]:
+  goto ; [100.00%] */
+
+bool
+loop_distribution::transform_reduction_loop (loop_p loop, tree niters)
+{
+  gimple *reduction_stmt;
+
+  if (!wi::eq_p (wi::to_widest (niters), 7))
+return false;
+
+  if (loop->num_nodes != 5)
+return false;
+
+  reduction_stmt = determine_reduction_stmt (loop);
+  if (reduction_stmt == NULL)
+return false;
+
+  /* Reduction variables are guaranteed to be SSA names.  */
+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+{
+case GIMPLE_ASSIGN:
+case GIMPLE_PHI:
+  reduction_var = gimple_get_lhs (reduction_stmt);
+  break;
+default:
+  /* Bail out e.g. for GIMPLE_CALL.  */
+  return false;
+}
+
+  if (EDGE_COUNT (loop->header->preds) != 2)
+return false;
+
+  edge e, entry_edge = NULL, backedge = NULL;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+if (e->src->loop_father != loop)
+  entry_edge = e;
+else
+  backedge = e;
+
+  if (!entry_edge || !backedge)
+return false;
+
+  tree crc_in = NULL_TREE, data_in = NULL_TREE;
+
+  for (gphi_iterator gsi = gsi_start_phis (loop

Re: [aarch64] Update reg-costs to differentiate between memmove costs

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> This patch introduces a struct to differentiate between different 
> memmove costs to enable a better modeling of memory operations. These 
> have been modelled for 
> -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb; for 
> all other tunings, all entries are equal to the old single memmove cost 
> to ensure the behaviour remains the same.

Thanks for doing this.  Having the same cost for loads and stores
has been a long-standing wart.

> 2022-03-16  Tamar Christina  
>                 Andre Vieira 
>
> gcc/ChangeLog:
>
>      * config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New 
> struct.
>      (struct tune_params): Change type of memmov_cost to use 
> cpu_memmov_cost.
>      * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update 
> all tunings
>      to use new cpu_memmov_cost struct.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index f2fde35c6eb4989af8736db8fad004171c160282..5190eb8b96ea9af809a28470905b8b85ee720b09 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -508,6 +508,18 @@ struct cpu_prefetch_tune
>const int default_opt_level;
>  };
>  
> +/* Model the costs for loads/stores for reload so that it can do more

I'd say s/reload/the register allocators/ here, since the costs affect
decisions made by IRA too.

> +   accurate spill heuristics.  */
> +struct cpu_memmov_cost
> +{
> +  int load_int;
> +  int store_int;
> +  int load_fp;
> +  int store_fp;
> +  int load_pred;
> +  int store_pred;
> +};
> +
>  struct tune_params
>  {
>const struct cpu_cost_table *insn_extra_cost;
> […]
> @@ -14501,12 +14633,41 @@ aarch64_register_move_cost (machine_mode mode,
>return regmove_cost->FP2FP;
>  }
>  
> +/* Implements TARGET_MEMORY_MOVE_COST.  */
>  static int
> -aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
> -   reg_class_t rclass ATTRIBUTE_UNUSED,
> -   bool in ATTRIBUTE_UNUSED)
> +aarch64_memory_move_cost (machine_mode mode, reg_class_t rclass_i, bool in)
>  {
> -  return aarch64_tune_params.memmov_cost;
> +  enum reg_class rclass = (enum reg_class) rclass_i;
> +  switch (rclass)
> +{
> +case PR_LO_REGS:
> +case PR_HI_REGS:
> +case PR_REGS:
> +  return in ? aarch64_tune_params.memmov_cost.load_pred
> + : aarch64_tune_params.memmov_cost.store_pred;
> +case POINTER_AND_FP_REGS:
> +case ALL_REGS:
> +  {
> + if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
> +   return in ? aarch64_tune_params.memmov_cost.load_pred
> + : aarch64_tune_params.memmov_cost.store_pred;
> +
> + if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode))
> +   return in ? aarch64_tune_params.memmov_cost.load_fp
> + : aarch64_tune_params.memmov_cost.store_fp;
> +
> + return in ? aarch64_tune_params.memmov_cost.load_int
> +   : aarch64_tune_params.memmov_cost.store_int;
> +  }
> +case FP_LO8_REGS:
> +case FP_LO_REGS:
> +case FP_REGS:
> +  return in ? aarch64_tune_params.memmov_cost.load_fp
> + : aarch64_tune_params.memmov_cost.store_fp;
> +default:
> +  return in ? aarch64_tune_params.memmov_cost.load_int
> + : aarch64_tune_params.memmov_cost.store_int;
> +}
>  }

It would be good to avoid listing individual subclasses if possible,
since it's easy for the list to get out of date if more subclasses
are added.

An alternative would be:

  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
  ? reg_classes_intersect_p (rclass, PR_REGS)
  : reg_class_subset_p (rclass, PR_REGS))
return (in
? aarch64_tune_params.memmov_cost.load_pred
: aarch64_tune_params.memmov_cost.store_pred);

  if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode)
  ? reg_classes_intersect_p (rclass, FP_REGS)
  : reg_class_subset_p (rclass, FP_REGS))
return (in
? aarch64_tune_params.memmov_cost.load_fp
: aarch64_tune_params.memmov_cost.store_fp);

  return (in
  ? aarch64_tune_params.memmov_cost.load_int
  : aarch64_tune_params.memmov_cost.store_int);

OK with that change, if it works.

Thanks,
Richard


Re: [aarch64] Update regmove costs for neoverse-v1 and neoverse-512tvb tunings

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch updates the register move tunings for 
> -mcpu/-mtune={neoverse-v1,neoverse-512tvb}.
>
> 2022-03-16  Tamar Christina  
>     Andre Vieira 
>
>      * config/aarch64/aarch64.cc (neoversev1_regmove_cost): New 
> tuning struct.
>      (neoversev1_tunings): Use neoversev1_regmove_cost and update 
> store_int cost.
>      (neoverse512tvb_tunings): Likewise.
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 9b6f67dc592d8a447d6b28390c90abe5dcfa5f08..f0485574528c47221e17a3aa5aee70a56508f61e 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -670,6 +670,16 @@ static const struct cpu_regmove_cost neoversen2_regmove_cost =
>2 /* FP2FP  */
>  };
>  
> +static const struct cpu_regmove_cost neoversev1_regmove_cost =
> +{
> +  1, /* GP2GP  */
> +  /* Spilling to int<->fp instead of memory is recommended so set
> + realistic costs compared to memmv_cost.  */

s/memmv_cost/memmov_cost/

Same for the earlier patches, I just didn't see it till now :-)

OK with that change on the basis that it forms a pair with the follow-on
memmov patch.

Thanks,
Richard

> +  3, /* GP2FP  */
> +  2, /* FP2GP  */
> +  2 /* FP2FP  */
> +};
> +
>  static const struct cpu_regmove_cost demeter_regmove_cost =
>  {
>1, /* GP2GP  */
> @@ -2063,13 +2073,13 @@ static const struct tune_params neoversev1_tunings =
>  {
>&cortexa76_extra_costs,
>&neoversev1_addrcost_table,
> -  &generic_regmove_cost,
> +  &neoversev1_regmove_cost,
>&neoversev1_vector_cost,
>&generic_branch_cost,
>&generic_approx_modes,
>SVE_256, /* sve_width  */
>{ 4, /* load_int.  */
> -1, /* store_int.  */
> +2, /* store_int.  */
>  6, /* load_fp.  */
>  2, /* store_fp.  */
>  6, /* load_pred.  */
> @@ -2200,13 +2210,13 @@ static const struct tune_params neoverse512tvb_tunings =
>  {
>&cortexa76_extra_costs,
>&neoversev1_addrcost_table,
> -  &generic_regmove_cost,
> +  &neoversev1_regmove_cost,
>&neoverse512tvb_vector_cost,
>&generic_branch_cost,
>&generic_approx_modes,
>SVE_128 | SVE_256, /* sve_width  */
>{ 4, /* load_int.  */
> -1, /* store_int.  */
> +2, /* store_int.  */
>  6, /* load_fp.  */
>  2, /* store_fp.  */
>  6, /* load_pred.  */


rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread will schmidt via Gcc-patches
Hi,

RFC/Update support for addg6s instruction.  PR100693

For PR100693, we currently provide an addg6s builtin using unsigned
int arguments, but we are missing an unsigned long long argument
equivalent.  This patch adds an overload to provide the long long
version of the builtin.

unsigned long long __builtin_addg6s (unsigned long long, unsigned long long);

RFC/concerns: This patch works, but looking briefly at the intermediate
stages, it is not behaving quite as I expected.  Looking at the intermediate dumps, I
see in pr100693.original that calls I expect to be routed to the internal
__builtin_addg6s_si() that uses (unsigned int) arguments are instead being
handled by __builtin_addg6s_di() with casts that convert the arguments to
(unsigned long long).
i.e.
 return (unsigned int) __builtin_addg6s_di
 ((long long unsigned int) a, (long long unsigned int) b);

As a test, if I swap the order of the builtins in rs6000-overload.def,
I end up with code casting the ULL values to UI, which produces truncated
results, similar to what occurs today without this patch.

All that said, this patch seems to work.  OK for next stage 1?
Tested on power8BE as well as LE power8,power9,power10.
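For readers unfamiliar with the instruction: addg6s ("add and generate sixes") assists packed-BCD addition.  The following is my own portable emulation of the correction sequence it enables, based on the ISA's described semantics — it is illustrative only and not code from this patch:

```cpp
#include <cassert>
#include <cstdint>

// Emulate addg6s: compute a + b nibble-by-nibble and place 0x6 in every
// nibble position whose 4-bit carry-out is zero, 0x0 where a carry occurred.
static uint64_t
addg6s_emul(uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  unsigned carry = 0;
  for (int i = 0; i < 16; ++i)
    {
      unsigned s = ((a >> (4 * i)) & 0xf) + ((b >> (4 * i)) & 0xf) + carry;
      carry = s >> 4;
      if (!carry)
        r |= static_cast<uint64_t>(0x6) << (4 * i);
    }
  return r;
}

// Packed-BCD addition via the classic addg6s recipe: pre-bias every digit
// by 6 so decimal carries become binary carries, then subtract the sixes
// from the digit positions that did not carry.  (Overflow past 16 digits
// is ignored in this sketch.)
static uint64_t
bcd_add(uint64_t a, uint64_t b)
{
  uint64_t t = a + 0x6666666666666666ULL;
  return (t + b) - addg6s_emul(t, b);
}
```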

2022-03-15  Will Schmidt  

gcc/
PR target/100693
* config/rs6000/rs6000-builtins.def: Remove entry for __builtin_addg6s()
  and add entries for __builtin_addg6s_di() and __builtin_addg6s_si().
* config/rs6000/rs6000-overload.def: Add overloaded entries allowing
  __builtin_addg6s() to map to either of the __builtin_addg6s_{di,si}
  builtins.
* config/rs6000/rs6000.md: Add UNSPEC_ADDG6S_SI and UNSPEC_ADDG6S_DI
  unspecs.   Add define_insn entries for addg6s_si and addg6s_di based
  on those unspecs.
* doc/extend.texi:  Add entry for ULL __builtin_addg6s (ULL, ULL);

testsuite/
PR target/100693
* gcc.target/powerpc/pr100693.c:  New test.

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index ae2760c33389..4c23cac26932 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1993,12 +1993,16 @@
 XXSPLTD_V2DI vsx_xxspltd_v2di {}
 
 
 ; Power7 builtins (ISA 2.06).
 [power7]
-  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
-ADDG6S addg6s {}
+  const unsigned long long __builtin_addg6s_di (unsigned long long, \
+   unsigned long long);
+ADDG6S_DI addg6s_di {}
+
+  const unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+ADDG6S_SI addg6s_si {}
 
   const signed long __builtin_bpermd (signed long, signed long);
 BPERMD bpermd_di {32bit}
 
   const unsigned int __builtin_cbcdtd (unsigned int);
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 44e2945aaa0e..931f85b738c5 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -76,10 +76,15 @@
 ; Blank lines may be used as desired in this file between the lines as
 ; defined above; that is, you can introduce as many extra newlines as you
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[ADDG6S, __builtin_i_addg6s, __builtin_addg6s]
+  unsigned long long __builtin_addg6s_di (signed long long, unsigned long long);
+ADDG6S_DI
+  unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+ADDG6S_SI
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
 BCDADD_V1TI
   vuc __builtin_vec_bcdadd (vuc, vuc, const int);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fdfbc6566a5c..d040f127eb55 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -122,11 +122,12 @@ (define_c_enum "unspec"
UNSPEC_P8V_MTVSRWZ
UNSPEC_P8V_RELOAD_FROM_GPR
UNSPEC_P8V_MTVSRD
UNSPEC_P8V_XXPERMDI
UNSPEC_P8V_RELOAD_FROM_VSX
-   UNSPEC_ADDG6S
+   UNSPEC_ADDG6S_SI
+   UNSPEC_ADDG6S_DI
UNSPEC_CDTBCD
UNSPEC_CBCDTD
UNSPEC_DIVE
UNSPEC_DIVEU
UNSPEC_UNPACK_128BIT
@@ -14495,15 +14496,24 @@ (define_peephole2
   operands[5] = change_address (mem, mode, new_addr);
 })

 
 ;; Miscellaneous ISA 2.06 (power7) instructions
-(define_insn "addg6s"
+(define_insn "addg6s_si"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 2 "register_operand" "r")]
-  UNSPEC_ADDG6S))]
+  UNSPEC_ADDG6S_SI))]
+  "TARGET_POPCNTD"
+  "addg6s %0,%1,%2"
+  [(set_attr "type" "integer")])
+
+(define_insn "addg6s_di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "r")
+   (match_operand:DI 2 "register_operand" "r")]
+  UNSPEC_

[PATCH] gimplify: Emit clobbers for TARGET_EXPR_SLOT vars later [PR103984]

2022-03-16 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, we emit a bogus uninitialized warning but
easily could emit wrong-code for it or similar testcases too.
The bug is that we emit clobber for a TARGET_EXPR_SLOT too early:
  D.2499.e = B::qux (&h); [return slot optimization]
  D.2516 = 1;
  try
{
  B::B (&D.2498, &h);
  try
{
  _2 = baz (&D.2498);
  D.2499.f = _2;
  D.2516 = 0;
  try
{
  try
{
  bar (&D.2499);
}
  finally
{
  C::~C (&D.2499);
}
}
  finally
{
  D.2499 = {CLOBBER(eol)};
}
}
  finally
{
  D.2498 = {CLOBBER(eol)};
}
}
  catch
{
  if (D.2516 != 0) goto ; else goto ;
  :
  A::~A (&D.2499.e);
  goto ;
  :
  :
}
The CLOBBER for D.2499 is essentially only emitted on the non-exceptional
path, if B::B or baz throws, then there is no CLOBBER for it but there
is a conditional destructor A::~A (&D.2499.e).  Now, ehcleanup1
sink_clobbers optimization assumes that clobbers in the EH cases are
emitted after last use and so sinks the D.2499 = {CLOBBER(eol)}; later,
so we then have
  # _3 = PHI <1(3), 0(9)>
:
  D.2499 ={v} {CLOBBER(eol)};
  D.2498 ={v} {CLOBBER(eol)};
  if (_3 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  _35 = D.2499.a;
  if (&D.2499.b != _35)
where that _35 = D.2499.a comes from inline expansion of the A::~A dtor,
and that is a load from a clobbered memory.

Now, what the gimplifier sees in this case is a CLEANUP_POINT_EXPR with
somewhere inside of it a TARGET_EXPR for D.2499 (with the C::~C (&D.2499)
cleanup) which in its TARGET_EXPR_INITIAL has another TARGET_EXPR for
D.2516 bool flag which has CLEANUP_EH_ONLY which performs that conditional
A::~A (&D.2499.e) call.
The following patch ensures that CLOBBERs (and asan poisoning) are emitted
after even those gimple_push_cleanup pushed cleanups from within the
TARGET_EXPR_INITIAL gimplification (i.e. the last point where the slot could
in theory be used).  In my first version of the patch I've done it by just
moving the
  /* Add a clobber for the temporary going out of scope, like
 gimplify_bind_expr.  */
  if (gimplify_ctxp->in_cleanup_point_expr
  && needs_to_live_in_memory (temp))
{
...
}
block earlier in gimplify_target_expr, but that regressed a couple of tests
where temp is marked TREE_ADDRESSABLE only during (well, very early during
that) the gimplification of TARGET_EXPR_INITIAL, so we didn't emit e.g. on
pr80032.C or stack2.C tests any clobbers for the slots and thus stack slot
reuse wasn't performed.
So that we don't regress those tests, this patch gimplifies
TARGET_EXPR_INITIAL as before, but doesn't emit it directly into pre_p,
emits it into a temporary sequence.  Then emits the CLOBBER cleanup
into pre_p, then asan poisoning if needed, then appends the
TARGET_EXPR_INITIAL temporary sequence and finally adds TARGET_EXPR_CLEANUP
gimple_push_cleanup.  The earlier a GIMPLE_WCE appears in the sequence, the
further out the corresponding try/finally or try/catch ends up.
So, with this patch the part of the testcase in gimple dump cited above
looks instead like:
  try
{
  D.2499.e = B::qux (&h); [return slot optimization]
  D.2516 = 1;
  try
{
  try
{
  B::B (&D.2498, &h);
  _2 = baz (&D.2498);
  D.2499.f = _2;
  D.2516 = 0;
  try
{
  bar (&D.2499);
}
  finally
{
  C::~C (&D.2499);
}
}
  finally
{
  D.2498 = {CLOBBER(eol)};
}
}
  catch
{
  if (D.2516 != 0) goto ; else goto ;
  :
  A::~A (&D.2499.e);
  goto ;
  :
  :
}
}
  finally
{
  D.2499 = {CLOBBER(eol)};
}

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-16  Jakub Jelinek  

PR middle-end/103984
* gimplify.cc (gimplify_target_expr): Gimplify type sizes and
TARGET_EXPR_INITIAL into a temporary sequence, then push clobbers
and asan unpois

Patch ping (Re: [PATCH] libatomic: Improve 16-byte atomics on Intel AVX [PR104688])

2022-03-16 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590960.html

Thanks.

On Mon, Feb 28, 2022 at 07:06:30AM +0100, Jakub Jelinek wrote:
> As mentioned in the PR, the latest Intel SDM has added:
> "Processors that enumerate support for Intel® AVX (by setting the feature 
> flag CPUID.01H:ECX.AVX[bit 28])
> guarantee that the 16-byte memory operations performed by the following 
> instructions will always be
> carried out atomically:
> • MOVAPD, MOVAPS, and MOVDQA.
> • VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
> • VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with EVEX.128 and 
> k0 (masking disabled).
> (Note that these instructions require the linear addresses of their memory 
> operands to be 16-byte
> aligned.)"
> 
> The following patch deals with it just on the libatomic library side so far,
> currently (since ~ 2017) we emit all the __atomic_* 16-byte builtins as
> library calls since and this is something that we can hopefully backport.
> 
> The patch simply introduces yet another ifunc variant that takes priority
> over the pure CMPXCHG16B one, one that checks AVX and CMPXCHG16B bits and
> on non-Intel clears the AVX bit during detection for now (if AMD comes
> with the same guarantee, we could revert the config/x86/init.c hunk),
> which implements 16-byte atomic load as vmovdqa and 16-byte atomic store
> as vmovdqa followed by mfence.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk so far?
> 
> 2022-02-28  Jakub Jelinek  
> 
>   PR target/104688
>   * Makefile.am (IFUNC_OPTIONS): Change on x86_64 to -mcx16 -mcx16.
>   (libatomic_la_LIBADD): Add $(addsuffix _16_2_.lo,$(SIZEOBJS)) for
>   x86_64.
>   * Makefile.in: Regenerated.
>   * config/x86/host-config.h (IFUNC_COND_1): For x86_64 define to
>   both AVX and CMPXCHG16B bits.
>   (IFUNC_COND_2): Define.
>   (IFUNC_NCOND): For x86_64 define to 2 * (N == 16).
>   (MAYBE_HAVE_ATOMIC_CAS_16, MAYBE_HAVE_ATOMIC_EXCHANGE_16,
>   MAYBE_HAVE_ATOMIC_LDST_16): Define to IFUNC_COND_2 rather than
>   IFUNC_COND_1.
>   (HAVE_ATOMIC_CAS_16): Redefine to 1 whenever IFUNC_ALT != 0.
>   (HAVE_ATOMIC_LDST_16): Redefine to 1 whenever IFUNC_ALT == 1.
>   (atomic_compare_exchange_n): Define whenever IFUNC_ALT != 0
>   on x86_64 for N == 16.
>   (__atomic_load_n, __atomic_store_n): Redefine whenever IFUNC_ALT == 1
>   on x86_64 for N == 16.
>   (atomic_load_n, atomic_store_n): New functions.
>   * config/x86/init.c (__libat_feat1_init): On x86_64 clear bit_AVX
>   if CPU vendor is not Intel.

Jakub



Re: [C PATCH] PR c/98198: ICE-on-invalid-code error recovery.

2022-03-16 Thread Joseph Myers
On Wed, 16 Mar 2022, Roger Sayle wrote:

> This is Christophe Lyon's fix to PR c/98198, an ICE-on-invalid-code
> regression affecting mainline, and a suitable testcase.
> Tested on x86_64-pc-linux-gnu with make bootstrap and make -k check
> with no new failures.  Ok for mainline?
> 
> 
> 2022-03-16  Christophe Lyon  
>   Roger Sayle  
> 
> gcc/c-family/ChangeLog
>   PR c/98198
>   * c-attribs.cc (decl_or_type_attrs): Add error_mark_node check.
> 
> gcc/testsuite/ChangeLog
>   PR c/98198
>   * gcc.dg/pr98198.c: New test case.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch implements the costing function 
> determine_suggested_unroll_factor for aarch64.
> It determines the unrolling factor by dividing the number of X 
> operations we can do per cycle by the number of X operations in the loop 
> body, taking this information from the vec_ops analysis during vector 
> costing and the available issue_info information.
> We multiply the dividend by a potential reduction_latency, to improve 
> our pipeline utilization if we are stalled waiting on a particular 
> reduction operation.
>
> Right now we also have a workaround for vectorization choices where the 
> main loop uses a NEON mode and predication is available, such that if 
> the main loop makes use of a NEON pattern that is not directly supported 
> by SVE we do not unroll, as that might cause performance regressions in 
> cases where we would enter the original main loop's VF. As an example if 
> you have a loop where you could use AVG_CEIL with a V8HI mode, you would 
> originally get 8x NEON using AVG_CEIL followed by a 8x SVE predicated 
> epilogue, using other instructions. Whereas with the unrolling you would 
> end up with 16x AVG_CEIL NEON + 8x SVE predicated loop, thus skipping 
> the original 8x NEON. In the future, we could handle this differently, 
> by either using a different costing model for epilogues, or potentially 
> vectorizing more than a single epilogue.
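The heuristic described in the quoted paragraphs above can be sketched roughly as follows — an illustrative model only, with invented parameter names, not the patch's actual code:

```cpp
#include <algorithm>
#include <cassert>

// Rough model: how many copies of the loop body can issue before the
// pipeline would stall on the critical reduction.  Scale the per-cycle
// issue rate by the reduction latency, divide by the number of such
// operations in one body, and clamp to [1, max_factor].
static unsigned
suggested_unroll_factor(unsigned ops_per_cycle, unsigned body_ops,
                        unsigned reduction_latency, unsigned max_factor)
{
  if (body_ops == 0 || ops_per_cycle == 0)
    return 1;
  unsigned f = (ops_per_cycle * std::max(1u, reduction_latency)) / body_ops;
  return std::clamp(f, 1u, max_factor);
}
```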
>
> gcc/ChangeLog:
>
>      * config/aarch64/aarch64.cc (aarch64_vector_costs): Define 
> determine_suggested_unroll_factor.
>      (determine_suggested_unroll_factor): New function.
>      (aarch64_vector_costs::finish_costs): Use 
> determine_suggested_unroll_factor.
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index b5687aab59f630920e51b742b80a540c3a56c6c8..9d3a607d378d6a2792efa7c6dece2a65c24e4521 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -15680,6 +15680,7 @@ private:
>unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
>unsigned int);
>bool prefer_unrolled_loop () const;
> +  unsigned int determine_suggested_unroll_factor ();
>  
>/* True if we have performed one-time initialization based on the
>   vec_info.  */
> @@ -16768,6 +16769,105 @@ adjust_body_cost_sve (const aarch64_vec_op_count *ops,
>return sve_cycles_per_iter;
>  }
>  
> +unsigned int
> +aarch64_vector_costs::determine_suggested_unroll_factor ()
> +{
> +  auto *issue_info = aarch64_tune_params.vec_costs->issue_info;
> +  if (!issue_info)
> +return 1;
> +  bool sve = false;
> +  if (aarch64_sve_mode_p (m_vinfo->vector_mode))

Other code uses m_vec_flags & VEC_ANY_SVE for this.

> +{
> +  if (!issue_info->sve)
> + return 1;
> +  sve = true;
> +}
> +  else
> +{
> +  if (!issue_info->advsimd)
> + return 1;

The issue info should instead be taken from vec_ops.simd_issue_info ()
in the loop below.  It can vary for each entry.

> +  /* If we are trying to unroll a NEON main loop that contains patterns

s/a NEON/an Advanced SIMD/

> +  that we do not support with SVE and we might use a predicated
> +  epilogue, we need to be conservative and block unrolling as this might
> +  lead to a less optimal loop for the first and only epilogue using the
> +  original loop's vectorization factor.
> +  TODO: Remove this constraint when we add support for multiple epilogue
> +  vectorization.  */
> +  if (partial_vectors_supported_p ()
> +   && param_vect_partial_vector_usage != 0
> +   && !TARGET_SVE2)
> + {
> +   unsigned int i;
> +   stmt_vec_info stmt_vinfo;
> +   FOR_EACH_VEC_ELT (m_vinfo->stmt_vec_infos, i, stmt_vinfo)
> + {
> +   if (is_pattern_stmt_p (stmt_vinfo))
> + {
> +   gimple *stmt = stmt_vinfo->stmt;
> +   if (is_gimple_call (stmt)
> +   && gimple_call_internal_p (stmt))
> + {
> +   enum internal_fn ifn
> + = gimple_call_internal_fn (stmt);
> +   switch (ifn)
> + {
> + case IFN_AVG_FLOOR:
> + case IFN_AVG_CEIL:
> +   return 1;
> + default:
> +   break;
> + }
> + }
> + }
> + }
> + }

I think we should instead record this during add_stmt_cost, like we do
for the other data that this function uses.

We need to drop the partial_vectors_supported_p () condition though:
enabling SVE must not make Advanced SIMD code worse.  So for now,
Advanced SIMD-only code will have to suffer the missed unrolling
opportunity too.

> +}
> +
> +  unsigned int max_unroll_factor = 1;
> +  aarch64_simd_vec_issue_info const *vec_issue
> += sve ? issue_info->sve : issue_info->a

[committed] analyzer: early rejection of disabled warnings [PR104955]

2022-03-16 Thread David Malcolm via Gcc-patches
Avoid generating execution paths for warnings that are ultimately
rejected due to -Wno-analyzer-* flags.

This improves the test case from taking at least several minutes
(before I killed it) to taking under a second.

This doesn't fix the slowdown seen in PR analyzer/104955 with large
numbers of warnings when the warnings are still enabled.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7677-g7fd6e36ea9aa85.

gcc/analyzer/ChangeLog:
PR analyzer/104955
* diagnostic-manager.cc (get_emission_location): New.
(diagnostic_manager::diagnostic_manager): Initialize
m_num_disabled_diagnostics.
(diagnostic_manager::add_diagnostic): Reject diagnostics that
will eventually be rejected due to being disabled.
(diagnostic_manager::emit_saved_diagnostics): Log the number
of disabled diagnostics.
(diagnostic_manager::emit_saved_diagnostic): Split out logic for
determining emission location to get_emission_location.
* diagnostic-manager.h
(diagnostic_manager::m_num_disabled_diagnostics): New field.
* engine.cc (stale_jmp_buf::get_controlling_option): New.
(stale_jmp_buf::emit): Use it.
* pending-diagnostic.h
(pending_diagnostic::get_controlling_option): New vfunc.
* region-model.cc
(poisoned_value_diagnostic::get_controlling_option): New.
(poisoned_value_diagnostic::emit): Use it.
(shift_count_negative_diagnostic::get_controlling_option): New.
(shift_count_negative_diagnostic::emit): Use it.
(shift_count_overflow_diagnostic::get_controlling_option): New.
(shift_count_overflow_diagnostic::emit): Use it.
(dump_path_diagnostic::get_controlling_option): New.
(dump_path_diagnostic::emit): Use it.
(write_to_const_diagnostic::get_controlling_option): New.
(write_to_const_diagnostic::emit): Use it.
(write_to_string_literal_diagnostic::get_controlling_option): New.
(write_to_string_literal_diagnostic::emit): Use it.
* sm-file.cc (double_fclose::get_controlling_option): New.
(double_fclose::emit): Use it.
(file_leak::get_controlling_option): New.
(file_leak::emit): Use it.
* sm-malloc.cc (mismatching_deallocation::get_controlling_option):
New.
(mismatching_deallocation::emit): Use it.
(double_free::get_controlling_option): New.
(double_free::emit): Use it.
(possible_null_deref::get_controlling_option): New.
(possible_null_deref::emit): Use it.
(possible_null_arg::get_controlling_option): New.
(possible_null_arg::emit): Use it.
(null_deref::get_controlling_option): New.
(null_deref::emit): Use it.
(null_arg::get_controlling_option): New.
(null_arg::emit): Use it.
(use_after_free::get_controlling_option): New.
(use_after_free::emit): Use it.
(malloc_leak::get_controlling_option): New.
(malloc_leak::emit): Use it.
(free_of_non_heap::get_controlling_option): New.
(free_of_non_heap::emit): Use it.
* sm-pattern-test.cc (pattern_match::get_controlling_option): New.
(pattern_match::emit): Use it.
* sm-sensitive.cc
(exposure_through_output_file::get_controlling_option): New.
(exposure_through_output_file::emit): Use it.
* sm-signal.cc (signal_unsafe_call::get_controlling_option): New.
(signal_unsafe_call::emit): Use it.
* sm-taint.cc (tainted_array_index::get_controlling_option): New.
(tainted_array_index::emit): Use it.
(tainted_offset::get_controlling_option): New.
(tainted_offset::emit): Use it.
(tainted_size::get_controlling_option): New.
(tainted_size::emit): Use it.
(tainted_divisor::get_controlling_option): New.
(tainted_divisor::emit): Use it.
(tainted_allocation_size::get_controlling_option): New.
(tainted_allocation_size::emit): Use it.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/many-disabled-diagnostics.c: New test.
* gcc.dg/plugin/analyzer_gil_plugin.c
(gil_diagnostic::get_controlling_option): New.
(double_save_thread::emit): Use it.
(fncall_without_gil::emit): Likewise.
(pyobject_usage_without_gil::emit): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/diagnostic-manager.cc| 44 ++--
 gcc/analyzer/diagnostic-manager.h |  1 +
 gcc/analyzer/engine.cc|  7 +-
 gcc/analyzer/pending-diagnostic.h |  6 ++
 gcc/analyzer/region-model.cc  | 61 
 gcc/analyzer/sm-file.cc   | 16 -
 gcc/analyzer/sm-malloc.cc | 72 +++
 gcc/analyzer/sm-pattern-test.cc   |  8 ++-
 gcc/analyzer/sm-sensitive.cc  |  8 ++-
 gcc/analyzer/sm-signa

Re: rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread Segher Boessenkool
Hi!

On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote:
> For PR100693, we currently provide an addg6s builtin using unsigned
> int arguments, but we are missing an unsigned long long argument
> equivalent.  This patch adds an overload to provide the long long
> version of the builtin.
> 
> unsigned long long __builtin_addg6s (unsigned long long, unsigned long long);
> 
> RFC/concerns: This patch works, but looking briefly at intermediate stages
> is not behaving quite as I expected.   Looking at the intermediate dumps, I
> see in pr100693.original that calls I expect to be routed to the internal
> __builtin_addg6s_si() that uses (unsigned int) arguments are instead being
> handled by __builtin_addg6s_di() with casts that convert the arguments to
> (unsigned long long).

Did you test with actual 32-bit variables, instead of just function
arguments?  Function arguments are always passed in (sign-extended)
registers.

Like,

unsigned int f(unsigned int *a, unsigned int *b)
{
  return __builtin_addg6s(*a, *b);
}

> As a test, I see if I swap the order of the builtins in rs6000-overload.def
> I end up with code casting the ULL values to UI, which provides truncated
> results, and is similar to what occurs today without this patch.
> 
> All that said, this patch seems to work.  OK for next stage 1?
> Tested on power8BE as well as LE power8,power9,power10.

Please ask again when stage 1 has started?

> gcc/
>   PR target/100693
>   * config/rs6000/rs6000-builtins.def: Remove entry for __builtin_addg6s()
> and add entries for __builtin_addg6s_di() and __builtin_addg6s_si().

Indent of second and further lines should be at the "*", not two spaces
after that.

> -   UNSPEC_ADDG6S
> +   UNSPEC_ADDG6S_SI
> +   UNSPEC_ADDG6S_DI

You do not need multiple unspec numbers.  You can differentiate them
based on the modes of the arguments, already :-)

>  ;; Miscellaneous ISA 2.06 (power7) instructions
> -(define_insn "addg6s"
> +(define_insn "addg6s_si"
>[(set (match_operand:SI 0 "register_operand" "=r")
>   (unspec:SI [(match_operand:SI 1 "register_operand" "r")
>   (match_operand:SI 2 "register_operand" "r")]
> -UNSPEC_ADDG6S))]
> +UNSPEC_ADDG6S_SI))]
> +  "TARGET_POPCNTD"
> +  "addg6s %0,%1,%2"
> +  [(set_attr "type" "integer")])
> +
> +(define_insn "addg6s_di"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (unspec:DI [(match_operand:DI 1 "register_operand" "r")
> + (match_operand:DI 2 "register_operand" "r")]
> +UNSPEC_ADDG6S_DI))]
>"TARGET_POPCNTD"
>"addg6s %0,%1,%2"
>[(set_attr "type" "integer")])

(define_insn "addg6s"
  [(set (match_operand:GPR 0 "register_operand" "=r")
(unspec:GPR [(match_operand:GPR 1 "register_operand" "r")
 (match_operand:GPR 2 "register_operand" "r")]
UNSPEC_ADDG6S))]
  "TARGET_POPCNTD"
  "addg6s %0,%1,%2"
  [(set_attr "type" "integer")])

We do not want DI (here, and in most places) for -m32!

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c
> @@ -0,0 +1,68 @@
> +/* { dg-do compile { target { powerpc*-*-linux* } } } */

Why only on Linux?

> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */

Why not on Darwin?  And why skip it anyway, given the previous line :-)

> +/* { dg-require-effective-target powerpc_vsx_ok } */

That is the wrong requirement.  You want to test for Power7, not for
VSX.  I realise you probably copied this from elsewhere :-(  (If from
another addg6s testcase, just keep it).


Segher


[r12-7670 Regression] FAIL: g++.dg/tree-ssa/pr86544.C -std=gnu++98 scan-tree-dump-times phiopt4 "if" 0 on Linux/x86_64

2022-03-16 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

f6fb661ea8ac7e17c6924719de6219f002c4efef is the first bad commit
commit f6fb661ea8ac7e17c6924719de6219f002c4efef
Author: Richard Biener 
Date:   Wed Mar 16 13:39:31 2022 +0100

tree-optimization/102008 - restore if-conversion of adjacent loads

caused

FAIL: gcc.dg/tree-ssa/phi-opt-21.c scan-tree-dump phiopt4 "converted to 
straightline code"
FAIL: gcc.dg/tree-ssa/popcount3.c scan-tree-dump-times phiopt4 "if" 0
FAIL: gcc.dg/tree-ssa/pr18134.c scan-tree-dump-times optimized "= a_..D. != 0" 1
FAIL: gcc.dg/tree-ssa/pr96480.c scan-tree-dump optimized " = _[0-9]* <= 3;"
FAIL: gcc.dg/tree-ssa/sccp-2.c scan-tree-dump-times optimized "bb" 1
FAIL: gcc.dg/tree-ssa/ssa-hoist-4.c scan-tree-dump-times optimized "MAX_EXPR" 1
FAIL: gcc.target/i386/pr95852-1.c scan-assembler-times \tjn?o\t 16
FAIL: gcc.target/i386/pr95852-1.c scan-assembler-times \tsetno\t 8
FAIL: gcc.target/i386/pr95852-1.c scan-assembler-times \tseto\t 8
FAIL: gcc.target/i386/pr95852-2.c scan-assembler-times \tjn?o\t 16
FAIL: gcc.target/i386/pr95852-2.c scan-assembler-times \tseto\t 8
FAIL: gcc.target/i386/pr95852-3.c scan-assembler-times \tjn?o\t 16
FAIL: gcc.target/i386/pr95852-3.c scan-assembler-times \tsetno\t 8
FAIL: gcc.target/i386/pr95852-3.c scan-assembler-times \tseto\t 8
FAIL: gcc.target/i386/pr95852-4.c scan-assembler-times \tjn?o\t 16
FAIL: gcc.target/i386/pr95852-4.c scan-assembler-times \tseto\t 8
FAIL: g++.dg/tree-ssa/pr86544.C  -std=gnu++14  scan-tree-dump-times phiopt4 
"if" 0
FAIL: g++.dg/tree-ssa/pr86544.C  -std=gnu++17  scan-tree-dump-times phiopt4 
"if" 0
FAIL: g++.dg/tree-ssa/pr86544.C  -std=gnu++20  scan-tree-dump-times phiopt4 
"if" 0
FAIL: g++.dg/tree-ssa/pr86544.C  -std=gnu++98  scan-tree-dump-times phiopt4 
"if" 0

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-7670/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/phi-opt-21.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/popcount3.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/popcount3.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr18134.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/pr96480.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/sccp-2.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-hoist-4.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make chec

Re: rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread will schmidt via Gcc-patches
On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote:
> > For PR100693, we currently provide an addg6s builtin using unsigned
> > int arguments, but we are missing an unsigned long long argument
> > equivalent.  This patch adds an overload to provide the long long
> > version of the builtin.
> > 
> > unsigned long long __builtin_addg6s (unsigned long long, unsigned
> > long long);
> > 
> > RFC/concerns: This patch works, but looking briefly at intermediate
> > stages
> > is not behaving quite as I expected.   Looking at the intermediate
> > dumps, I
> > see in pr100693.original that calls I expect to be routed to the
> > internal
> > __builtin_addg6s_si() that uses (unsigned int) arguments are
> > instead being
> > handled by __builtin_addg6s_di() with casts that convert the
> > arguments to
> > (unsigned long long).
> 
> Did you test with actual 32-bit variables, instead of just function
> arguments?  Function arguments are always passed in (sign-extended)
> registers.
> 
> Like,
> 
> unsigned int f(unsigned int *a, unsigned int *b)
> {
>   return __builtin_addg6s(*a, *b);
> }


I perhaps missed that subtlety.  I'll investigate that further.

> 
> > As a test, I see if I swap the order of the builtins in rs6000-
> > overload.def
> > I end up with code casting the ULL values to UI, which provides
> > truncated
> > results, and is similar to what occurs today without this patch.
> > 
> > All that said, this patch seems to work.  OK for next stage 1?
> > Tested on power8BE as well as LE power8,power9,power10.
> 
> Please ask again when stage 1 has started?
> 
> > gcc/
> > PR target/100693
> > * config/rs6000/rs6000-builtins.def: Remove entry for
> > __builtin_addg6s()
> >   and add entries for __builtin_addg6s_di() and
> > __builtin_addg6s_si().
> 
> Indent of second and further lines should be at the "*", not two
> spaces
> after that.
> 
> > -   UNSPEC_ADDG6S
> > +   UNSPEC_ADDG6S_SI
> > +   UNSPEC_ADDG6S_DI
> 
> You do not need multiple unspec numbers.  You can differentiate them
> based on the modes of the arguments, already :-)
> 
> >  ;; Miscellaneous ISA 2.06 (power7) instructions
> > -(define_insn "addg6s"
> > +(define_insn "addg6s_si"
> >[(set (match_operand:SI 0 "register_operand" "=r")
> > (unspec:SI [(match_operand:SI 1 "register_operand" "r")
> > (match_operand:SI 2 "register_operand" "r")]
> > -  UNSPEC_ADDG6S))]
> > +  UNSPEC_ADDG6S_SI))]
> > +  "TARGET_POPCNTD"
> > +  "addg6s %0,%1,%2"
> > +  [(set_attr "type" "integer")])
> > +
> > +(define_insn "addg6s_di"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (unspec:DI [(match_operand:DI 1 "register_operand" "r")
> > +   (match_operand:DI 2 "register_operand" "r")]
> > +  UNSPEC_ADDG6S_DI))]
> >"TARGET_POPCNTD"
> >"addg6s %0,%1,%2"
> >[(set_attr "type" "integer")])
> 
> (define_insn "addg6s"
>   [(set (match_operand:GPR 0 "register_operand" "=r")
>   (unspec:GPR [(match_operand:GPR 1 "register_operand" "r")
>(match_operand:GPR 2 "register_operand" "r")]
>   UNSPEC_ADDG6S))]
>   "TARGET_POPCNTD"
>   "addg6s %0,%1,%2"
>   [(set_attr "type" "integer")])
> You do not need multiple unspec numbers.  You can differentiate
> them
> based on the modes of the arguments, already :-)


Yeah, that's what I thought, which is a big part of why I posted this
with RFC. :-)  When I attempted this there was an issue with multiple
<mode>s (behind the GPR predicate) versus the singular "addg6s"
define_insn.  It's possible I had something else wrong there, but I'll
go back to that attempt and work in that direction.

> 
> We do not want DI (here, and in most places) for -m32!
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile { target { powerpc*-*-linux* } } } */
> 
> Why only on Linux?
> 
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> 
> Why not on Darwin?  And why skip it anyway, given the previous line
> :-)
> 
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
> That is the wrong requirement.  You want to test for Power7, not for
> VSX.  I realise you probably copied this from elsewhere :-(  (If from
> another addg6s testcase, just keep it).

Because reasons. :-)   The stanzas are copied from the nearby bcd-1.c
testcase that has a simpler test for addg6s.  Given the input I'll
try to correct the stanzas here and limit how much error I carry along.

Thanks for the feedback and review.   I'll investigate further, and
resubmit at stage1.   

Thanks,
-Will

> 
> 
> Segher



Re: [PATCH][Middle-end][Backport to GCC11][PR100775]Updating the reg use in exit block for -fzero-call-used-regs

2022-03-16 Thread Qing Zhao via Gcc-patches
Thanks. Just committed the patch to gcc11 branch.

Qing

> On Mar 14, 2022, at 2:46 AM, Richard Biener  
> wrote:
> 
> On Fri, Mar 11, 2022 at 5:31 PM Qing Zhao via Gcc-patches
>  wrote:
>> 
>> 
>> Hi,
>> 
>> I plan to backport the patch to fix PR100775:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100775
>> 
>> To gcc11 since this is a general bug to -fzero-call-used-regs. And should be 
>> fixed in gcc11 as well.
>> 
>> I have tested the patch with gcc11 release branch on both x86 and aarch64, 
>> no regression.
>> 
>> Okay for commit?
> 
> OK
> 
>> thanks.
>> 
>> Qing.
>> 
>> ==
>> 
>> 
>> From 737a0b0824111f46da44c5578b9769070c52 Mon Sep 17 00:00:00 2001
>> From: Qing Zhao 
>> Date: Thu, 10 Mar 2022 23:22:29 +
>> Subject: [PATCH] middle-end: updating the reg use in exit block for
>> -fzero-call-used-regs [PR100775] GCC11 backport.
>> 
>> In the pass_zero_call_used_regs, when updating dataflow info after adding
>> the register zeroing sequence in the epilogue of the function, we should
>> call "df_update_exit_block_uses" to update the register use information in
>> the exit block to include all the registers that have been zeroed.
>> 
>> gcc/ChangeLog:
>> 
>>PR middle-end/100775
>>* function.c (gen_call_used_regs_seq): Call
>>df_update_exit_block_uses when updating df.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>PR middle-end/100775
>>* gcc.target/arm/pr100775.c: New test.
>> ---
>> gcc/function.c  | 2 +-
>> gcc/testsuite/gcc.target/arm/pr100775.c | 9 +
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/arm/pr100775.c
>> 
>> diff --git a/gcc/function.c b/gcc/function.c
>> index fc7b147b5f1..0495e9f1b81 100644
>> --- a/gcc/function.c
>> +++ b/gcc/function.c
>> @@ -5922,7 +5922,7 @@ gen_call_used_regs_seq (rtx_insn *ret, unsigned int zero_regs_type)
>> 
>>   /* Update the data flow information.  */
>>   crtl->must_be_zero_on_return |= zeroed_hardregs;
>> -  df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
>> +  df_update_exit_block_uses ();
>> }
>> }
>> 
>> diff --git a/gcc/testsuite/gcc.target/arm/pr100775.c b/gcc/testsuite/gcc.target/arm/pr100775.c
>> new file mode 100644
>> index 000..c648cd5e8f7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr100775.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>> +/* { dg-options "-mthumb -fzero-call-used-regs=used" } */
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> --
>> 2.27.0
>> 
>> 



Re: RFA: crc builtin functions & optimizations

2022-03-16 Thread Joern Rennecke
> and there needs to be code to actually expand the builtin using optabs.
> And something needs to be done to make match.pd work on the output.

Never mind that bit, that was just due to a function argument type mismatch
on the last argument of the crc built-in functions.


Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-16 Thread Qing Zhao via Gcc-patches


> On Mar 14, 2022, at 11:04 AM, Richard Sandiford  
> wrote:
> 
> Sorry for the slow response, was out for a few days.
> 
> Xi Ruoyao  writes:
>> On Sat, 2022-03-12 at 18:48 +0800, Xi Ruoyao via Gcc-patches wrote:
>>> On Fri, 2022-03-11 at 21:26 +, Qing Zhao wrote:
 Hi, Ruoyao,
 
 (I might not be able to reply to this thread till next Wed due to a
 short vacation).
 
 First, some comments on opening bugs against Gcc:
 
 I took a look at the bug reports PR104817 and PR104820:
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104820
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104817
 
 I didn’t see a testing case and a script to repeat the error, so I
 cannot repeat the error at my side.
>>> 
>>> I've put the test case, but maybe you didn't see it because it is too
>>> simple:
>>> 
>>> echo 'int t() {}' | /home/xry111/git-repos/gcc-test-mips/gcc/cc1 -nostdinc -fzero-call-used-regs=all
>>> 
>>> An empty function is enough to break -fzero-call-used-regs=all.  And
>>> if
>>> you append -mips64r2 to the cc1 command line you'll get 104820.  I
>>> enabled 4 existing tests for MIPS (reported "not work" on MIPS) in the
>>> patch so I think it's unnecessary to add new test cases.
>>> 
>>> Richard: can we use MIPS_EPILOGUE_TEMP as a scratch register in the
>>> sequence for zeroing the call-used registers, and then zero itself
>>> (despite it's not in need_zeroed_hardregs)?
>> 
>> No, it leads to an ICE at stage 3 bootstrapping :(.
>> 
>> Now I think the only rational ways are:
>> 
>> (1) allow zeroing more registers than need_zeroed_hardregs.
> 
> I think this is the way to go.  I agree it's a bit hacky, but it seems
> like the least worst option.
> 
> A less hacky alternative would be to pass an extra argument to the hook
> that contains the set of registers that the hook is *allowed* to clobber.
> For -fzero-call-used-regs=X, this new argument would be the set that
> would have been chosen for -fzero-call-used-regs=all, regardless of
> what X actually is.  We could then assert that the extra registers we
> want to clobber are in that set (which will be true for all values of X).

If we have to go this way, I think it’s better to make the change you
suggested above, and then also update the documentation, both the internal
documentation on how to define the hook and the user-level documentation on
what the user might expect when using this option (i.e., it’s possible that
the compiler might clear more registers than the user requests on some
targets due to the implementation limitation).

I can make this change if we decide to do this.

Thanks.

Qing

> 
>> Or
>> 
>> (2) allow zeroing less registers than need_zeroed_hardregs (then I'll
>> skip ST_REGS, after all they are just 8 bits in total).
> 
> Yeah, this is explicitly OK, provided that the target maintainers
> feel that the contents of the registers in question are not a significant
> security concern.  I don't feel I can make that call though.  It's really
> a question for the userbase.
> 
> Thanks,
> Richard
> 
>> If all these are unacceptable, then
>> 
>> (3) I'll just call sorry in MIPS target hook to tell not to use this
>> feature on MIPS.



libgo: update to 1.18 release

2022-03-16 Thread Ian Lance Taylor via Gcc-patches
This patch updates libgo to the final 1.18 release.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian


patch.txt.gz
Description: application/gzip


Re: rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread Segher Boessenkool
On Wed, Mar 16, 2022 at 03:06:42PM -0500, will schmidt wrote:
> On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote:
> > (define_insn "addg6s"
> >   [(set (match_operand:GPR 0 "register_operand" "=r")
> > (unspec:GPR [(match_operand:GPR 1 "register_operand" "r")
> >  (match_operand:GPR 2 "register_operand" "r")]
> > UNSPEC_ADDG6S))]
> >   "TARGET_POPCNTD"
> >   "addg6s %0,%1,%2"
> >   [(set_attr "type" "integer")])
> > You do not need multiple unspec numbers.  You can differentiate
> > them
> > based on the modes of the arguments, already :-)
> 
> Yeah, that's what I thought, which is a big part of why I posted this
> with RFC. :-)  When I attempted this there was an issue with multiple
> <mode>s (behind the GPR predicate) versus the singular "addg6s"
> define_insn.  It's possible I had something else wrong there, but I'll
> go back to that attempt and work in that direction.

No, that is still there.  One way out is to make this an unnamed pattern
(like "*addg6s").  Another is to put the mode in the name, and then you
probably want to make it a parameterized name as well, which then hides
all that again:

(define_insn "@addg6s<mode>"
  ...
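Spelled out, the parameterized-name suggestion might look like the sketch below (reconstructed from the GPR pattern quoted earlier in the thread; illustrative, not a tested patch):

```
;; The '@' prefix makes genemit create an overloaded gen_addg6s ()
;; taking the mode as its first argument, so the <mode> suffix in the
;; pattern name stays hidden from callers.
(define_insn "@addg6s<mode>"
  [(set (match_operand:GPR 0 "register_operand" "=r")
	(unspec:GPR [(match_operand:GPR 1 "register_operand" "r")
		     (match_operand:GPR 2 "register_operand" "r")]
		    UNSPEC_ADDG6S))]
  "TARGET_POPCNTD"
  "addg6s %0,%1,%2"
  [(set_attr "type" "integer")])
```

Callers can then write `emit_insn (gen_addg6s (SImode, dest, src1, src2));` instead of choosing between addg6s_si/addg6s_di by name (see the "Parameterized Names" section of the GCC internals manual).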

> > > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > 
> > That is the wrong requirement.  You want to test for Power7, not for
> > VSX.  I realise you probably copied this from elsewhere :-(  (If from
> > another addg6s testcase, just keep it).
> 
> Because reasons. :-)   The stanzas are copied from the nearby bcd-1.c
> testcase that has a simpler test for addg6s.  Given the input I'll
> try to correct the stanzas here and limit how much error I carry along.

If you do not improve the existing ones it may be best to just keep this
the same in the new testcases as well.  Consistency is a good thing.
Consistent wrong is not so great of course :-)


Segher


[PATCH] c++: alias template and empty parameter packs [PR104008]

2022-03-16 Thread Marek Polacek via Gcc-patches
Zero-length pack expansions are treated as if no list were provided
at all, that is, with

  template<typename...> struct S { };
  template<typename... T>
  void g() {
    S<S<T>...>;
  }

g<T...> will result in S<>.  In the following test we have something
similar:

  template <typename T, typename... Ts>
  using IsOneOf = disjunction<is_same<T, Ts>...>;

and then we have "IsOneOf<OtherHolders, badge>..." where OtherHolders is an
empty pack.  Since r11-7931, we strip_typedefs in TYPE_PACK_EXPANSION.
In this test that results in "IsOneOf<OtherHolders, badge>" being turned into
"disjunction<>".  So the whole expansion is now "disjunction<>...".  But
then we error in make_pack_expansion because find_parameter_packs_r won't
find the pack OtherHolders.

We strip the alias template because dependent_alias_template_spec_p says
it's not dependent.  It is not dependent because this alias is not
TEMPLATE_DECL_COMPLEX_ALIAS_P.  My understanding is that currently we
consider an alias complex if it

1) expands a pack from the enclosing class, as in

template<template<typename> typename... TT>
struct S {
  template
  using X = P...>;
};

   where the alias expands TT; or

2) the expansion does *not* name all the template parameters, as in

template<typename...> struct R;
template<typename T, typename... Us>
using U = R<R<Us>...>;

   where T is not named in the expansion.

But IsOneOf is neither.  And it can't know how it's going to be used.
Therefore I think we cannot make it complex (and in turn dependent) to fix
this bug.

After much gnashing of teeth, I think we simply want to avoid stripping
the alias if the new pattern doesn't have any parameter packs to expand.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?

PR c++/104008

gcc/cp/ChangeLog:

* tree.cc (strip_typedefs): Don't strip an alias template when
doing so would result in losing a parameter pack.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic-alias3.C: New test.
* g++.dg/cpp0x/variadic-alias4.C: New test.
---
 gcc/cp/tree.cc   | 13 +-
 gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C | 45 ++
 gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C | 48 
 3 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 6e9be713c51..eb59e56610b 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -1778,7 +1778,18 @@ strip_typedefs (tree t, bool *remove_attributes, 
unsigned int flags)
if (TYPE_P (pat))
  {
type = strip_typedefs (pat, remove_attributes, flags);
-   if (type != pat)
+   /* Empty packs can thwart our efforts here.  Consider
+
+   template <typename T, typename... Ts>
+   using IsOneOf = disjunction<is_same<T, Ts>...>;
+
+ where IsOneOf seemingly uses all of its template parameters in
+ its expansion (and does not expand a pack from the enclosing
+ class), so the alias is not marked as complex.  However, it may
+ be used as in "IsOneOf<T>", where Ts is an empty parameter pack,
+ and stripping it down into "disjunction<>" here would exclude the
+ Ts pack, resulting in an error.  */
+   if (type != pat && uses_parameter_packs (type))
  {
result = copy_node (t);
PACK_EXPANSION_PATTERN (result) = type;
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C b/gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C
new file mode 100644
index 000..6b6dd9f4c85
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-alias3.C
@@ -0,0 +1,45 @@
+// PR c++/104008
+// { dg-do compile { target c++11 } }
+
+template <typename...> struct conjunction;
+template <typename...> struct disjunction;
+template <typename, typename> struct is_same;
+template <bool> struct enable_if;
+template <bool _Cond> using enable_if_t = typename enable_if<_Cond>::type;
+struct B;
+struct __uniq_ptr_impl {
+  struct _Ptr {
+    using type = B *;
+  };
+  using pointer = _Ptr::type;
+};
+struct unique_ptr {
+  using pointer = __uniq_ptr_impl::pointer;
+  unique_ptr(pointer);
+};
+template <typename T, typename... Ts>
+using IsOneOf = disjunction<is_same<T, Ts>...>;
+
+template <typename...> struct any_badge;
+
+struct badge {
+  badge(any_badge<>);
+  badge();
+};
+
+template <typename... OtherHolders> struct any_badge {
+  template <typename = enable_if_t<conjunction<IsOneOf<OtherHolders, badge>...>::value>>
+  any_badge();
+};
+
+template <typename... _Args> unique_ptr make_unique(_Args... __args);
+
+struct B {
+  B(badge);
+  unique_ptr b_ = make_unique(badge{});
+};
+
+template <typename... _Args> unique_ptr make_unique(_Args... __args) {
+  return new B(__args...);
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C b/gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C
new file mode 100644
index 000..896a4725627
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-alias4.C
@@ -0,0 +1,48 @@
+// PR c++/104008
+// { dg-do compile { target c++11 } }
+// Differs from variadic-alias3.C only in the pattern of a pack expansion
+// in line 34.  But it's important to check that we also deal with more
+// complex patterns.
+
+template  struct conjunctio

Re: [PATCH] fortran: Separate associate character lengths earlier [PR104570]

2022-03-16 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

Am 14.03.22 um 19:28 schrieb Mikael Morin:

Hello,

this workarounds the regression I introduced with the fix for pr104228.
The problem comes from the evaluation of character length for the
associate target (y) in the testcase.
The expression is non-scalar which is not supported at that point, as no
scalarization setup is made there.

My initial direction to fix this was trying to modify the evaluation of
expressions so that the rank was ignored in the context of character
length evaluation.
That’s the patch I attached to the PR.
The patch I’m proposing here tries instead to avoid the need to evaluate
character length through sharing of gfc_charlen between symbols and
expressions.
I slightly prefer the latter direction, though I’m not sure it’s more
robust.  At least it passes regression testing, which proves some basic
level of goodness.
OK for master?


I tried a few minor variations of the testcase, and they
also seemed to work.  The patch looks simple enough, so
OK from my side.


Even if PR104570 is a GCC 12 regression only, the PR104228 fix is targeted
at all the open branches, so this one is as well.
OK for 11/10/9?


Yes, after giving it a short time on master to "burn in".

Thanks for the patch!

Harald


[committed] libstdc++: Fix symbol versioning for Solaris 11.3 [PR103407]

2022-03-16 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and sparc-sun-solaris2.11 (but 11.3 only).

Pushed to trunk.

Rainer, this should allow you to continue omitting the
_ZSt10from_charsPKcS0_R[def]St12chars_format symbols from the baseline,
without the current FAIL. Please check on your other Solaris targets.

-- >8 --

The new std::from_chars implementation means that those symbols are now
defined on Solaris 11.3, which lacks uselocale. They were not present in
gcc-11, but the linker script gives them the GLIBCXX_3.4.29 symbol
version because that is the version where they appeared for systems with
uselocale.

This makes the version for those symbols depend on whether uselocale is
available or not, so that they get version GLIBCXX_3.4.30 on targets
where they weren't defined in gcc-11.

In order to avoid needing separate ABI baseline files for Solaris 11.3
and 11.4, the ABI checker program now treats the floating-point
std::from_chars overloads as undesignated if they are not found in the
baseline symbols file. This means they can be left out of the Solaris
baseline without causing the check-abi target to fail.

libstdc++-v3/ChangeLog:

PR libstdc++/103407
* config/abi/pre/gnu.ver: Make version for std::from_chars
depend on HAVE_USELOCALE macro.
* testsuite/util/testsuite_abi.cc (compare_symbols): Treat
std::from_chars for floating-point types as undesignated if
not found in the baseline symbols file.
---
 libstdc++-v3/config/abi/pre/gnu.ver  |  7 +++
 libstdc++-v3/testsuite/util/testsuite_abi.cc | 13 +
 2 files changed, 20 insertions(+)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index dedee8c1b0a..9b80a31768b 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2337,8 +2337,10 @@ GLIBCXX_3.4.28 {
 
 GLIBCXX_3.4.29 {
 
+#ifdef HAVE_USELOCALE
 # std::from_chars
 _ZSt10from_charsPKcS0_R[def]St12chars_format;
+#endif
 
 # std::__istream_extract(istream&, char*, streamsize)
 _ZSt17__istream_extractRSiPc[ilx];
@@ -2416,6 +2418,11 @@ GLIBCXX_3.4.29 {
 
 GLIBCXX_3.4.30 {
 
+#ifndef HAVE_USELOCALE
+# std::from_chars
+_ZSt10from_charsPKcS0_R[def]St12chars_format;
+#endif
+
 _ZSt21__glibcxx_assert_fail*;
 
 #ifdef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc b/libstdc++-v3/testsuite/util/testsuite_abi.cc
index 3dee737b2e5..5c8383554e5 100644
--- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
+++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
@@ -499,6 +499,19 @@ compare_symbols(const char* baseline_file, const char* test_file,
  else
undesignated.push_back(stest);
}
+  // See PR libstdc++/103407 -  abi_check FAILs on Solaris
+  else if (stest.type == symbol::function
+&& stest.name.compare(0, 23, "_ZSt10from_charsPKcS0_R") == 0
+&& stest.name.find_first_of("def", 23) == 23
+&& (stest.version_name == "GLIBCXX_3.4.29"
+  || stest.version_name == "GLIBCXX_3.4.30"))
+   {
+ stest.status = symbol::undesignated;
+ if (!check_version(stest, false))
+   incompatible.push_back(symbol_pair(stest, stest));
+ else
+   undesignated.push_back(stest);
+   }
   else
{
  stest.status = symbol::added;
-- 
2.34.1



Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-16 Thread Martin Sebor via Gcc-patches

On 3/15/22 19:24, Siddhesh Poyarekar wrote:

On 16/03/2022 02:06, Martin Sebor wrote:

The intended use of the strncmp bound is to limit the comparison to
at most the size of the arrays or (in a subset of cases) the length
of an initial substring. Providing an arbitrary bound that's not
related to the sizes as you describe sounds very much like a misuse.


Nothing in the standard says that the bound is related to the sizes of 
input buffers.  I don't think deducing that intent makes sense either, 
nor concluding that any other use case is misuse.



As a historical note, strncmp was first introduced in UNIX v7 where
its purpose, alongside strncpy, was to manipulate (potentially)
unterminated character arrays like file names stored in fixed size
arrays (typically 14 bytes).  Strncpy would fill the buffers with
ASCII data up to their size and pad the rest with nuls only if there
was room.

Strncmp was then used to compare these potentially unterminated
character arrays (e.g., archive headers in ld and ranlib).  The bound
was the size of the fixed size array.  Its other use case was to compare
leading portions of strings (e.g, when looking for an environment
variable or when stripping "./" from path names).
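As a hypothetical sketch of that historical pattern (all names here are invented for illustration, not taken from the UNIX v7 sources): fixed-size, possibly unterminated name fields are compared with the strncmp bound set to the size of the array, so no read ever goes past either buffer.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical fixed-size header field, in the style of early UNIX
// archive/directory entries: the name may fill all 14 bytes with no
// terminating NUL.
struct entry {
  char name[14];
};

// Compare two possibly unterminated name fields.  The strncmp bound is
// the size of the fixed-size array, so the comparison never reads past
// either buffer even when no NUL byte is present.
bool same_name(const entry& a, const entry& b) {
  return std::strncmp(a.name, b.name, sizeof a.name) == 0;
}
```

Using the array size as the bound is exactly the usage the coding guidelines mentioned below (e.g. CERT ARR38-C) expect.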


Thanks for sharing the historical perspective.


Since the early UNIX days, both strncpy and to a lesser extent strncmp
have been widely misused and, along with many other functions in
, a frequent source of bugs due to common misunderstanding
of their intended purpose.  The aim of these warnings is to detect
the common (and sometimes less common) misuses and bugs.


They're all valid uses, however, since they do not violate the standard.
If we find at compile time that the strings don't terminate at the
bounds, emitting the warning is OK, but the more pessimistic check seems
like overkill.



I haven't seen these so I can't very well comment on them.  But I can
assure you that warning for the code above is intentional.  Whether
or not the arrays are nul-terminated, the expected way to call
the function is with a bound no greater than their size (some coding
guidelines are explicit about this; see for example the CERT C Secure
Coding standard rule ARR38-C).

(Granted, the manual makes it sound like -Wstringop-overread only
detects provable past-the-end reads.  That's a mistake in
the documentation that should be fixed.  The warning was never quite
so limited, nor was it intended to be.)


The contention is not that it's not provable; it's more that it doesn't
even pass the "based on available information this is definitely buggy"
assertion, making it more a strong suggestion than a warning that
something is definitely amiss.  Which is why IMO it is more suitable as
an analyzer check than a warning.


As the GCC manual prominently states (and as I already pointed out)
warnings are:

  constructions that are not inherently erroneous but that are risky
  or suggest there may have been an error.

None of them implies a definite bug, and only a minority are about
violations of the standard.  Many point out benign code that's just
suspicious.

We can disagree (and many of us do) about the value of this or that
warning but that alone is not sufficient reason to change it.  Least
of all in stage 4, and with no details about the issues you say you
found.

Martin



Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]

2022-03-16 Thread Zhao Wei Liew via Gcc-patches
Thanks for the detailed review. I have uploaded patch v2 based on the review.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html
Changes since v1:
1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix
test regressions caused by those patterns.
2. Support signed integers except for the possible INT_MIN / -1 case.
3. Support cases where X and Y have different precisions.

On Tue, 22 Feb 2022 at 15:54, Richard Biener  wrote:
> +/* (type)X / (type)Y -> (type)(X / Y)
> +   when the resulting type is at least precise as the original types
> +   and when all the types are unsigned integral types. */
> +(simplify
> + (trunc_div (convert @0) (convert @1))
> + (if (INTEGRAL_TYPE_P (type)
> +  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +  && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0))
> +  && TYPE_UNSIGNED (TREE_TYPE (@1))
> +  && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> +  && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
> +  (convert (trunc_div @0 @1
>
> since you are requiring the types of @0 and @1 to match it's easier to write
>
>  && types_match (TREE_TYPE(@0), TREE_TYPE (@1))
>

Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now
have different precisions, so I believe that I can't use types_match()
anymore.

> that allows you to elide checks on either @0 or @1.  I suppose the transform
> does not work for signed types because of the -INT_MIN / -1 overflow case?
> It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
> expr_not_equal_to (@1, -1)
> (correctly written, lookup the existing examples in match.pd for the X
> % -Y transform)

That's right. I have made changes to support signed types except for
the INT_MIN / -1 case.
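The unsigned case of the transform can be checked numerically in plain C++ (a sketch with invented helper names, not the match.pd pattern itself): a narrow unsigned value divides identically before and after widening, which is why the narrowed division is safe.

```cpp
#include <cassert>
#include <cstdint>

// The fold rewrites (type)X / (type)Y as (type)(X / Y) when X and Y are
// unsigned and 'type' is at least as wide.  These two helpers compute
// the "wide" and "narrow" forms; they must agree for all 32-bit inputs.
uint64_t div_wide(uint32_t x, uint32_t y) {
  return static_cast<uint64_t>(x) / static_cast<uint64_t>(y);
}

uint64_t div_narrow(uint32_t x, uint32_t y) {
  return static_cast<uint64_t>(x / y);
}
```

The signed case needs the extra INT_MIN / -1 exclusion because that one quotient overflows in the narrow type but not in the wide one.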

> I'll note that as written the transform will not catch CST / (T)x or
> (T)x / CST since
> you'll not see conversions around constants.  I'm not sure whether
> using (convert[12]? ...)
> or writing special patterns with INTEGER_CST operands is more convenient.
> There is int_fits_type_p to check whether a constant will fit in a
> type without truncation
> or sign change.

I see. I couldn't find an easy way to use (convert[12]? ...) in this
case so I added 2 other patterns for the CST / (T)x and (T)x / CST
cases.

> When @0 and @1 do not have the same type there might still be a common type
> that can be used and is smaller than 'type', it might be as simple as using
> build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).

Thanks for the tip. I've made a similar change, but using either
TREE_TYPE (@0) or TREE_TYPE (@1) instead of
build_nonstandard_integer_type().

>
> In the past there have been attempts to more globally narrow operations using
> a new pass rather than using individual patterns.  So for more complicated 
> cases
> that might be the way to go.  There's now also the ISEL pass which does
> pre-RTL expansion transforms that need some global context and for example
> can look at SSA uses.

I had wanted to do something similar to handle all integral
trunc_divs, but I'm not sure where/how to start. It seems out of my
league at this moment. I'll gladly explore it after this change
though!
From f5f768b55f6cadcd9eba459561abfc1d7e28f94e Mon Sep 17 00:00:00 2001
From: Zhao Wei Liew 
Date: Sat, 19 Feb 2022 16:28:38 +0800
Subject: [PATCH] tree-optimization: [PR103855] Fold (type)X / (type)Y

This pattern converts (trunc_div (convert X) (convert Y)) to
(convert (trunc_div X Y)) when:

1. type, X, and Y are all INTEGRAL_TYPES with the same signedness.
2. type has TYPE_PRECISION at least as large as those of X and Y.
3. the result of the division is guaranteed to fit in either
+   the type of X or the type of Y.

This patch also adds corresponding patterns for CST / (type)Y
and (type)X / CST.

+These patterns are useful as wider divisions are typically more expensive.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Signed-off-by: Zhao Wei Liew 

PR tree-optimization/103855

gcc/ChangeLog:

* match.pd: Add patterns for (type)X / (type)Y,
  CST / (type)Y, and (type)X / CST.

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/pr91088.c: Fix a test regression.
* gcc.dg/tree-ssa/divide-8.c: New test.
---
 gcc/match.pd | 57 
 gcc/testsuite/gcc.dg/ipa/pr91088.c   |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c | 45 +++
 3 files changed, 104 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8b44b5cc92c..b0bfb94f506 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -687,6 +687,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
   (convert (trunc_mod @0 @1
 
+/* (type)X / (type)Y -> (type)(X / Y)
+   when all types are integral and have th

Re: [PATCH] c++/96765: warn when casting "this" to Derived* in Base ctor/dtor

2022-03-16 Thread Jason Merrill via Gcc-patches

On 3/14/22 02:36, Zhao Wei Liew via Gcc-patches wrote:


This patch adds a warning when casting "this" to Derived* in the Base
class constructor and destructor. I've added the warning under the
-Wextra flag as I can't find a more appropriate flag for it.


It's generally best to add a new warning flag if an existing one isn't a 
good match.  Maybe -Wderived-cast-undefined?  It seems like it could be 
part of -Wall.



+  if (current_function_decl
+  && (DECL_CONSTRUCTOR_P (current_function_decl)
+  || DECL_DESTRUCTOR_P (current_function_decl))
+  && TREE_CODE (expr) == NOP_EXPR
+  && is_this_parameter (TREE_OPERAND (expr, 0)))


I think the resolves_to_fixed_type_p function would be useful here; 
you're testing for a subset of the cases it handles.



+warning_at(loc, OPT_Wextra,


Missing space before '('.

Jason



Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-16 Thread Siddhesh Poyarekar

On 17/03/2022 05:11, Martin Sebor wrote:

As the GCC manual prominently states (and as I already pointed out)
warnings are:


We are indeed going around in circles.  Hopefully someone else will 
pitch in and break the deadlock.


Siddhesh


[BACKPORT] Backport PR fortran/96983 fix to GCC 11

2022-03-16 Thread Michael Meissner via Gcc-patches
Backport PR fortran/96983 patch to GCC 11.

I applied a patch on the trunk on April 22nd, 2021 that fixes an issue (PR
fortran/96983) where we could fail for 128-bit floating point types
because we don't have a built-in function that is equivalent to llround
for 128-bit integer types.  Instead, the patch does a round operation
followed by conversion to the appropriate integer size if we don't have
the specialized round to specific integer type built-in functions.
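The fallback strategy can be pictured in plain C++ (a sketch under assumptions, not the gfortran trans-intrinsic code; `__int128` is a GCC/Clang extension standing in for a 128-bit integer kind): when no llround-style built-in exists for the result width, round in the floating-point domain first and convert afterwards.

```cpp
#include <cmath>

// Sketch of the fallback: instead of requiring a direct
// round-to-integer built-in for the target width (like llround for
// long long), round with std::round in the floating-point type and
// then convert the already-integral value to the wide integer type.
__int128 round_to_int128(double x) {
  return static_cast<__int128>(std::round(x));
}
```

This mirrors the patched build_round_expr: ROUND followed by a conversion, rather than searching for a floating-point kind that exactly matches the integer size.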

When I checked in the change, I was told to wait until after GCC 11.1
shipped before doing the backport.  I forgot about the change until
recently.  Before checking it in, I did bootstraps and ran regression
tests on:

1)  little endian power9 using 128-bit IBM long double
2)  little endian power9 using 128-bit IEEE long double
3)  big endian power8 (both 64-bit and 32-bit) (and)
4)  x86_64 native build

There were no new regressions.  The test gfortran.dg/pr96711.f90 now
passes on PowerPC.

In the 10 months or so since the change was made on the trunk, the code in
build_round_expr did not change.

2022-03-16   Michael Meissner  

gcc/fortran/
PR fortran/96983
* trans-intrinsic.c (build_round_expr): If int type is larger than
long long, do the round and convert to the integer type.  Do not
try to find a floating point type the exact size of the integer
type.  Backport patch from 2021-04-22 on the trunk.

---
 gcc/fortran/trans-intrinsic.c | 26 --
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index bd6e6039429..313440d7022 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -386,30 +386,20 @@ build_round_expr (tree arg, tree restype)
   argprec = TYPE_PRECISION (argtype);
   resprec = TYPE_PRECISION (restype);
 
-  /* Depending on the type of the result, choose the int intrinsic
- (iround, available only as a builtin, therefore cannot use it for
- __float128), long int intrinsic (lround family) or long long
- intrinsic (llround).  We might also need to convert the result
- afterwards.  */
+  /* Depending on the type of the result, choose the int intrinsic (iround,
+ available only as a builtin, therefore cannot use it for _Float128), long
+ int intrinsic (lround family) or long long intrinsic (llround).  If we
+ don't have an appropriate function that converts directly to the integer
+ type (such as kind == 16), just use ROUND, and then convert the result to
+ an integer.  We might also need to convert the result afterwards.  */
   if (resprec <= INT_TYPE_SIZE && argprec <= LONG_DOUBLE_TYPE_SIZE)
 fn = builtin_decl_for_precision (BUILT_IN_IROUND, argprec);
   else if (resprec <= LONG_TYPE_SIZE)
 fn = builtin_decl_for_precision (BUILT_IN_LROUND, argprec);
   else if (resprec <= LONG_LONG_TYPE_SIZE)
 fn = builtin_decl_for_precision (BUILT_IN_LLROUND, argprec);
-  else if (resprec >= argprec && resprec == 128)
-{
-  /* Search for a real kind suitable as temporary for conversion.  */
-  int kind = -1;
-  for (int i = 0; kind < 0 && gfc_real_kinds[i].kind != 0; i++)
-   if (gfc_real_kinds[i].mode_precision >= resprec)
- kind = gfc_real_kinds[i].kind;
-  if (kind < 0)
-   gfc_internal_error ("Could not find real kind with at least %d bits",
-   resprec);
-  arg = fold_convert (gfc_get_real_type (kind), arg);
-  fn = gfc_builtin_decl_for_float_kind (BUILT_IN_ROUND, kind);
-}
+  else if (resprec >= argprec)
+fn = builtin_decl_for_precision (BUILT_IN_ROUND, argprec);
   else
 gcc_unreachable ();
 
-- 
2.35.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] [i386] Add extra cost for unsigned_load which may have stall forward issue.

2022-03-16 Thread liuhongt via Gcc-patches
This patch only handles pure-slp for by-value passed parameters, which
has nothing to do with IPA but with the psABI. For by-reference passed
parameters IPA is required.

The patch is aggressive in determining STLF failure: any
unaligned_load for a parm_decl passed on the stack is assumed to have an
STLF stall issue. It could lose some perf where there's no such issue (1
vector_load vs. n scalar_load + CTOR).

According to the microbenchmark in the PR, the cost of an STLF failure is
generally between 8 and 16 scalar loads on most recent Intel/AMD
processors.
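The access pattern in question can be sketched in C++ (a hypothetical example, not taken from the PR; the stall itself is a microarchitectural effect that this code only illustrates, not measures): the caller stores a by-value struct to the stack in scalar pieces, and a single wide vector load of the whole argument then overlaps several smaller stores, which can defeat store-to-load forwarding.

```cpp
// Hypothetical by-value parameter: per the psABI the caller writes the
// four ints to the stack argument area with scalar stores.  If the
// vectorizer then reads all four back with one 16-byte vector load,
// that load overlaps several narrower stores and may fail
// store-to-load forwarding (STLF), stalling until the stores retire.
struct quad {
  int a, b, c, d;
};

int sum(quad q) {                  // passed by value on the stack
  return q.a + q.b + q.c + q.d;    // candidate for one vector load + reduction
}
```

The patch biases the vectorizer's cost model against exactly this kind of load, trading the occasional false positive for avoiding the stall.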

gcc/ChangeLog:

PR target/101908
* config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
(ix86_vector_costs::add_stmt_cost): Add extra cost for
unsigned_load which may have store forwarding stall issue.
* config/i386/i386.h (processor_costs): Add new member
stfs.
* config/i386/x86-tune-costs.h (i386_size_cost): Initialize
stfs.
(i386_cost, i486_cost, pentium_cost, lakemont_cost,
pentiumpro_cost, geode_cost, k6_cost, athlon_cost, k8_cost,
amdfam10_cost, bdver_cost, znver1_cost, znver2_cost,
znver3_cost, skylake_cost, icelake_cost, alderlake_cost,
btver1_cost, btver2_cost, pentium4_cost, nocano_cost,
atom_cost, slm_cost, tremont_cost, intel_cost, generic_cost,
core_cost): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101908-1.c: New test.
* gcc.target/i386/pr101908-2.c: New test.
* gcc.target/i386/pr101908-3.c: New test.
* gcc.target/i386/pr101908-v16hi.c: New test.
* gcc.target/i386/pr101908-v16qi.c: New test.
* gcc.target/i386/pr101908-v16sf.c: New test.
* gcc.target/i386/pr101908-v16si.c: New test.
* gcc.target/i386/pr101908-v2df.c: New test.
* gcc.target/i386/pr101908-v2di.c: New test.
* gcc.target/i386/pr101908-v2hi.c: New test.
* gcc.target/i386/pr101908-v2qi.c: New test.
* gcc.target/i386/pr101908-v2sf.c: New test.
* gcc.target/i386/pr101908-v2si.c: New test.
* gcc.target/i386/pr101908-v4df.c: New test.
* gcc.target/i386/pr101908-v4di.c: New test.
* gcc.target/i386/pr101908-v4hi.c: New test.
* gcc.target/i386/pr101908-v4qi.c: New test.
* gcc.target/i386/pr101908-v4sf.c: New test.
* gcc.target/i386/pr101908-v4si.c: New test.
* gcc.target/i386/pr101908-v8df-adl.c: New test.
* gcc.target/i386/pr101908-v8df.c: New test.
* gcc.target/i386/pr101908-v8di-adl.c: New test.
* gcc.target/i386/pr101908-v8di.c: New test.
* gcc.target/i386/pr101908-v8hi-adl.c: New test.
* gcc.target/i386/pr101908-v8hi.c: New test.
* gcc.target/i386/pr101908-v8qi-adl.c: New test.
* gcc.target/i386/pr101908-v8qi.c: New test.
* gcc.target/i386/pr101908-v8sf-adl.c: New test.
* gcc.target/i386/pr101908-v8sf.c: New test.
* gcc.target/i386/pr101908-v8si-adl.c: New test.
* gcc.target/i386/pr101908-v8si.c: New test.
---
 gcc/config/i386/i386.cc   | 51 +++
 gcc/config/i386/i386.h|  1 +
 gcc/config/i386/x86-tune-costs.h  | 28 ++
 gcc/testsuite/gcc.target/i386/pr101908-1.c| 12 +++
 gcc/testsuite/gcc.target/i386/pr101908-2.c| 12 +++
 gcc/testsuite/gcc.target/i386/pr101908-3.c| 90 +++
 .../gcc.target/i386/pr101908-v16hi.c  |  6 ++
 .../gcc.target/i386/pr101908-v16qi.c  | 30 +++
 .../gcc.target/i386/pr101908-v16sf.c  |  6 ++
 .../gcc.target/i386/pr101908-v16si.c  |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2df.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2di.c |  7 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2hi.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2qi.c | 16 
 gcc/testsuite/gcc.target/i386/pr101908-v2sf.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v2si.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4df.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4di.c |  7 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4hi.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4qi.c | 18 
 gcc/testsuite/gcc.target/i386/pr101908-v4sf.c |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v4si.c |  6 ++
 .../gcc.target/i386/pr101908-v8df-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8df.c |  6 ++
 .../gcc.target/i386/pr101908-v8di-adl.c   |  7 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8di.c |  7 ++
 .../gcc.target/i386/pr101908-v8hi-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8hi.c |  6 ++
 .../gcc.target/i386/pr101908-v8qi-adl.c   | 22 +
 gcc/testsuite/gcc.target/i386/pr101908-v8qi.c | 22 +
 .../gcc.target/i386/pr101908-v8sf-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8sf.c |  6 ++
 .../gcc.target/i386/pr101908-v8si-adl.c   |  6 ++
 gcc/testsuite/gcc.target/i386/pr101908-v8si.c |  6 ++
 34 files changed, 444 insertions(+)
 create mo


[PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-16 Thread HAO CHEN GUI via Gcc-patches
Hi,
   This patch adds V1TI mode into a new mode iterator used in the vector
comparison expanders.  With the patch, both built-ins and direct comparisons
can generate the new Power10 V1TI comparison instructions.

   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-03-16 Haochen Gui 

gcc/
PR target/103316
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
* config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
V1TI instructions.
(vec_cmp): Set mode iterator to VEC_IC.
(vec_cmpu): Likewise.

gcc/testsuite/
PR target/103316
* gcc.target/powerpc/pr103316.c: New.
* gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
__int128.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 5d34c1bcfc9..fac7f43f438 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_VCMPEQUH:
 case RS6000_BIF_VCMPEQUW:
 case RS6000_BIF_VCMPEQUD:
-/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
-   folding produces worse code for 128-bit compares.  */
+case RS6000_BIF_VCMPEQUT:
   fold_compare_helper (gsi, EQ_EXPR, stmt);
   return true;

 case RS6000_BIF_VCMPNEB:
 case RS6000_BIF_VCMPNEH:
 case RS6000_BIF_VCMPNEW:
-/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
-   folding produces worse code for 128-bit compares.  */
+case RS6000_BIF_VCMPNET:
   fold_compare_helper (gsi, NE_EXPR, stmt);
   return true;

@@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_CMPGE_U4SI:
 case RS6000_BIF_CMPGE_2DI:
 case RS6000_BIF_CMPGE_U2DI:
-/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */
+case RS6000_BIF_CMPGE_1TI:
+case RS6000_BIF_CMPGE_U1TI:
   fold_compare_helper (gsi, GE_EXPR, stmt);
   return true;

@@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_VCMPGTUW:
 case RS6000_BIF_VCMPGTUD:
 case RS6000_BIF_VCMPGTSD:
-/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */
+case RS6000_BIF_VCMPGTUT:
+case RS6000_BIF_VCMPGTST:
   fold_compare_helper (gsi, GT_EXPR, stmt);
   return true;

@@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_CMPLE_U4SI:
 case RS6000_BIF_CMPLE_2DI:
 case RS6000_BIF_CMPLE_U2DI:
-/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */
+case RS6000_BIF_CMPLE_1TI:
+case RS6000_BIF_CMPLE_U1TI:
   fold_compare_helper (gsi, LE_EXPR, stmt);
   return true;

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index b87a742cca8..d88869cc8d0 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])

+;; Vector int modes for comparison
+(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])
+
 ;; 128-bit int modes
 (define_mode_iterator VEC_TI [V1TI TI])

@@ -533,10 +536,10 @@ (define_expand "vcond_mask_"

 ;; For signed integer vectors comparison.
 (define_expand "vec_cmp"
-  [(set (match_operand:VEC_I 0 "vint_operand")
+  [(set (match_operand:VEC_IC 0 "vint_operand")
(match_operator 1 "signed_or_equality_comparison_operator"
- [(match_operand:VEC_I 2 "vint_operand")
-  (match_operand:VEC_I 3 "vint_operand")]))]
+ [(match_operand:VEC_IC 2 "vint_operand")
+  (match_operand:VEC_IC 3 "vint_operand")]))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
 {
   enum rtx_code code = GET_CODE (operands[1]);
@@ -573,10 +576,10 @@ (define_expand "vec_cmp"

 ;; For unsigned integer vectors comparison.
 (define_expand "vec_cmpu"
-  [(set (match_operand:VEC_I 0 "vint_operand")
+  [(set (match_operand:VEC_IC 0 "vint_operand")
(match_operator 1 "unsigned_or_equality_comparison_operator"
- [(match_operand:VEC_I 2 "vint_operand")
-  (match_operand:VEC_I 3 "vint_operand")]))]
+ [(match_operand:VEC_IC 2 "vint_operand")
+  (match_operand:VEC_IC 3 "vint_operand")]))]
   "VECTOR_UNIT_ALTIVEC_OR_V

Re: [PATCH v2] c++/96765: warn when casting Base* to Derived* in Base ctor/dtor

2022-03-16 Thread Zhao Wei Liew via Gcc-patches
Thanks for the review. I've tested and uploaded a new patch v2 with
the requested changes.

On Thu, 17 Mar 2022 at 09:20, Jason Merrill  wrote:
>
> On 3/14/22 02:36, Zhao Wei Liew via Gcc-patches wrote:
>
> > This patch adds a warning when casting "this" to Derived* in the Base
> > class constructor and destructor. I've added the warning under the
> > -Wextra flag as I can't find a more appropriate flag for it.
>
> It's generally best to add a new warning flag if an existing one isn't a
> good match.  Maybe -Wderived-cast-undefined?  It seems like it could be
> part of -Wall.

Hmm, I agree. I've added a new -Wderived-cast-undefined flag for this.

> > +  if (current_function_decl
> > +  && (DECL_CONSTRUCTOR_P (current_function_decl)
> > +  || DECL_DESTRUCTOR_P (current_function_decl))
> > +  && TREE_CODE (expr) == NOP_EXPR
> > +  && is_this_parameter (TREE_OPERAND (expr, 0)))
>
> I think the resolves_to_fixed_type_p function would be useful here;
> you're testing for a subset of the cases it handles.

Does the new patch look like what you intended? The new patch passes
all regression tests and the newly added tests. However, I don't fully
understand the code for resolves_to_fixed_type_p() and
fixed_type_or_null(), beyond seeing that they contain logic to return
-1 when within a ctor/dtor.
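For readers unfamiliar with the helper, here is a hedged summary of the
contract the patch relies on (a simplified reading of the helpers in
gcc/cp/class.cc; treat the details as an assumption, not documentation):

```
// resolves_to_fixed_type_p (expr) -- simplified tri-state contract:
//    1 : the dynamic type of *expr provably equals its static type
//   -1 : the dynamic type is "fixed" only because we are inside a
//        constructor or destructor, i.e. the object may still be
//        mid-construction or mid-destruction
//    0 : the dynamic type cannot be determined statically
//
// The patch warns exactly in the -1 case: at that point the object
// cannot yet (or can no longer) be of the derived type being cast to.
```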

Thanks!
From a8bbd4158f3311372b67d32816ce40d5613ae0d8 Mon Sep 17 00:00:00 2001
From: Zhao Wei Liew 
Date: Tue, 22 Feb 2022 16:03:17 +0800
Subject: [PATCH] c++: warn when casting Base* to Derived* in Base ctor/dtor
 [PR96765]

Casting "this" in a base class constructor to a derived class type is
undefined behaviour, but there is no warning when doing so.

Add a warning for this.

Signed-off-by: Zhao Wei Liew 

PR c++/96765

gcc/c-family/ChangeLog:

* c.opt (-Wderived-cast-undefined): New option.

gcc/cp/ChangeLog:

* typeck.cc (build_static_cast_1): Add a warning when casting
  Base* to Derived* in Base constructor and destructor.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wderived-cast-undefined.C: New test.
---
 gcc/c-family/c.opt|  4 +++
 gcc/cp/typeck.cc  |  6 
 .../g++.dg/warn/Wderived-cast-undefined.C | 33 +++
 3 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 9cfd2a6bc4e..1681395c529 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -598,6 +598,10 @@ C++ ObjC++ Var(warn_deprecated_enum_float_conv) Warning
 Warn about deprecated arithmetic conversions on operands where one is of enumeration type and the other is of a floating-point type.
 
+Wderived-cast-undefined
+C++ Var(warn_derived_cast_undefined) Warning LangEnabledBy(C++, Wall)
+Warn about undefined casts from Base* to Derived* in the Base constructor or destructor.
+
 Wdesignated-init
 C ObjC Var(warn_designated_init) Init(1) Warning
 Warn about positional initialization of structs requiring designated initializers.
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 516fa574ef6..0bbbc378806 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8079,6 +8079,12 @@ build_static_cast_1 (location_t loc, tree type, tree expr, bool c_cast_p,
 {
   tree base;
 
+  if (TREE_CODE (expr) == NOP_EXPR
+  && resolves_to_fixed_type_p (TREE_OPERAND (expr, 0)) == -1)
+    warning_at (loc, OPT_Wderived_cast_undefined,
+		"invalid %<static_cast%> from type %qT to type %qT before the latter is constructed",
+		intype, type);
+
   if (processing_template_decl)
return expr;
 
diff --git a/gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C b/gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C
new file mode 100644
index 000..ea7388514dc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wderived-cast-undefined.C
@@ -0,0 +1,33 @@
+// PR c++/96765
+// { dg-options "-Wderived-cast-undefined" }
+
+struct Derived;
+struct Base {
+  Derived *x;
+  Derived *y;
+  Base();
+  ~Base();
+};
+
+struct Derived : Base {};
+
+Base::Base()
+: x(static_cast<Derived *>(this)), // { dg-warning "invalid 'static_cast'" }
+  y((Derived *)this) // { dg-warning "invalid 'static_cast'" }
+{ 
+  static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+Base::~Base() {
+  static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+struct Other {
+  Other() {
+    Base b;
+    static_cast<Derived *>(&b);
+    (Derived *)(&b);
+  }
+};
-- 
2.25.1