[PATCH] [rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element.

2021-08-06 Thread liuhongt via Gcc-patches
Hi:
  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
  Ok for trunk?

gcc/ChangeLog:

PR rtl-optimization/101796
* simplify-rtx.c
(simplify_context::simplify_binary_operation_1): Simplify
vector shift/rotate with const_vec_duplicate to vector
shift/rotate with const_int element.

gcc/testsuite/ChangeLog:

PR rtl-optimization/101796
* gcc.target/i386/pr101796.c: New test.
---
 gcc/simplify-rtx.c   | 15 ++
 gcc/testsuite/gcc.target/i386/pr101796.c | 65 
 2 files changed, 80 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101796.c

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index a719f57870f..75f3e455562 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3970,6 +3970,21 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
return simplify_gen_binary (code, mode, op0,
gen_int_shift_amount (mode, val));
}
+
+  /* Optimize vector shift/rotate with const_vec_duplicate
+to vector shift/rotate with const_int element.
+  /* TODO: vec_duplicate with variable can also be simplified,
+but GCC only require operand 2 of shift/rotate to be a scalar type
+which can have different modes in different backends, it makes
+simplication difficult to decide which mode should be choosed
+for shift/rotate count.  */
+  if ((code == ASHIFTRT || code == LSHIFTRT
+  || code == ASHIFT || code == ROTATERT
+  || code == ROTATE)
+ && const_vec_duplicate_p (op1))
+   return simplify_gen_binary (code, mode, op0,
+   unwrap_const_vec_duplicate (op1));
+
   break;
 
 case ASHIFT:
diff --git a/gcc/testsuite/gcc.target/i386/pr101796.c 
b/gcc/testsuite/gcc.target/i386/pr101796.c
new file mode 100644
index 000..c22d6267fe5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101796.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2 " } */
+/* { dg-final { scan-assembler-not "vpbroadcast" } }  */
+/* { dg-final { scan-assembler-not "vpsrlv\[dwq\]" } }  */
+/* { dg-final { scan-assembler-not "vpsllv\[dwq\]" } }  */
+/* { dg-final { scan-assembler-not "vpsrav\[dwq\]" } }  */
+/* { dg-final { scan-assembler-times "vpsrl\[dwq\]" 3 } }  */
+/* { dg-final { scan-assembler-times "vpsll\[dwq\]" 3 } }  */
+/* { dg-final { scan-assembler-times "vpsra\[dwq\]" 3 } }  */
+
+#include 
+
+__m512i
+foo (__m512i a)
+{
+  return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3));
+}
+
+__m512i
+foo1 (__m512i a)
+{
+  return _mm512_srlv_epi32 (a, _mm512_set1_epi32 (3));
+}
+
+__m512i
+foo2 (__m512i a, long long b)
+{
+  return _mm512_srlv_epi64 (a, _mm512_set1_epi64 (3));
+}
+
+__m512i
+foo3 (__m512i a)
+{
+  return _mm512_srav_epi16 (a, _mm512_set1_epi16 (3));
+}
+
+__m512i
+foo4 (__m512i a)
+{
+  return _mm512_srav_epi32 (a, _mm512_set1_epi32 (3));
+}
+
+__m512i
+foo5 (__m512i a, long long b)
+{
+  return _mm512_srav_epi64 (a, _mm512_set1_epi64 (3));
+}
+
+__m512i
+foo6 (__m512i a)
+{
+  return _mm512_sllv_epi16 (a, _mm512_set1_epi16 (3));
+}
+
+__m512i
+foo7 (__m512i a)
+{
+  return _mm512_sllv_epi32 (a, _mm512_set1_epi32 (3));
+}
+
+__m512i
+foo8 (__m512i a, long long b)
+{
+  return _mm512_sllv_epi64 (a, _mm512_set1_epi64 (3));
+}
-- 
2.27.0



Re: [PATCH v3] gcov: Add __gcov_info_to_gdca()

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 7:31 AM Sebastian Huber <
sebastian.hu...@embedded-brains.de> wrote:

> On 05/08/2021 14:53, Martin Liška wrote:
> > On 7/23/21 11:39 AM, Sebastian Huber wrote:
> >> Add __gcov_info_to_gcda() to libgcov to get the gcda data for a gcda
> >> info in a
> >> freestanding environment.  It is intended to be used with the
> >> -fprofile-info-section option.  A crude test program which doesn't use
> >> a linker
> >> script is (use "gcc -coverage -fprofile-info-section -lgcc test.c" to
> >> compile
> >> it):
> >
> > The patch can be installed once the following nits are fixed:
>
> Thanks for the review, I checked it in like this:
>
>
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=9124bbe1857f0d3a3015d6461d5f8d04f07cab85
>
> I hope the format is now all right.
>
>
Hi,

Looks like there's a problem with your patch:
 
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/./gcc/xgcc
-B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/./gcc/
-B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/bin/
-B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/lib/
-isystem
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include
-isystem
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/sys-include
   -g -O2 -O2  -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include
-fno-inline -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector
-Dinhibit_libc  -fno-inline -I. -I. -I../.././gcc
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/.
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../gcc
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../include
   -o _gcov.o -MT _gcov.o -MD -MP -MF _gcov.dep -DL_gcov -c
/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/libgcov-driver.c

In file included from
/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/libgcov-driver.c:29:
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/include/stdint.h:9:16:
fatal error: stdint.h: No such file or directory
9 | # include_next 
  |^~
compilation terminated.
make[2]: *** [Makefile:928: _gcov.o] Error 1
make[2]: *** Waiting for unfinished jobs

Can you check?

Thanks,

Christophe

-- 
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/
>


Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 4:07 AM Martin Sebor via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On 7/30/21 9:06 AM, Jason Merrill wrote:
> > On 7/27/21 2:56 PM, Martin Sebor wrote:
> >> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575690.html
> >>
> >> Are there any other suggestions or comments or is the latest revision
> >> okay to commit?
> >
> > OK.
>
> I had to make a few more adjustments to fix up code that's snuck
> in since I last tested the patch.  I committed r12-2776 after
> retesting on x86_64-linux.
>
> With the cleanup out of the way I'll resubmit the copy ctor patch
> next.
>
>
Hi Martin,

Your patch breaks the aarch64 build:
 
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:
In function 'void aarch64_sve::register_svpattern()':
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:3502:27:
error: use of deleted function 'vec::vec(auto_vec&) [with long
unsigned int N = 32ul;
T = std::pair]'
"svpattern", values);
   ^
In file included from
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/hash-table.h:248:0,
 from
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:480,
 from
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:24:
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vec.h:1455:3:
error: declared here
   vec (auto_vec &) = delete;
   ^
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:
In function 'void aarch64_sve::register_svprfop()':
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:3516:30:
error: use of deleted function 'vec::vec(auto_vec&) [with long
unsigned int N = 16ul;
T = std::pair]'
 "svprfop", values);
  ^
In file included from
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/hash-table.h:248:0,
 from
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:480,
 from
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:24:
/tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vec.h:1455:3:
error: declared here
   vec (auto_vec &) = delete;
   ^

Can you check?

Thanks,

Christophe

>
> Martin
>
> >
> >> On 7/20/21 12:34 PM, Martin Sebor wrote:
> >>> On 7/14/21 10:23 AM, Jason Merrill wrote:
>  On 7/14/21 10:46 AM, Martin Sebor wrote:
> > On 7/13/21 9:39 PM, Jason Merrill wrote:
> >> On 7/13/21 4:02 PM, Martin Sebor wrote:
> >>> On 7/13/21 12:37 PM, Jason Merrill wrote:
>  On 7/13/21 10:08 AM, Jonathan Wakely wrote:
> > On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:
> >> Somebody with more C++ knowledge than me needs to approve the
> >> vec.h changes - I don't feel competent to assess all effects
> >> of the change.
> >
> > They look OK to me except for:
> >
> > -extern vnull vNULL;
> > +static constexpr vnull vNULL{ };
> >
> > Making vNULL have static linkage can make it an ODR violation
> > to use
> > vNULL in templates and inline functions, because different
> > instantiations will refer to a different "vNULL" in each
> > translation
> > unit.
> 
>  The ODR says this is OK because it's a literal constant with the
>  same value (6.2/12.2.1).
> 
>  But it would be better without the explicit 'static'; then in
>  C++17 it's implicitly inline instead of static.
> >>>
> >>> I'll remove the static.
> >>>
> 
>  But then, do we really want to keep vNULL at all?  It's a weird
>  blurring of the object/pointer boundary that is also dependent
>  on vec being a thin wrapper around a pointer.  In almost all
>  cases it can be replaced with {}; one exception is ==
>  comparison, where it seems to be testing that the embedded
>  pointer is null, which is a weird thing to want to test.
> >>>
> >>> The one use case I know of for vNULL where I can't think of
> >>> an equally good substitute is in passing a vec as an argument by
> >>> value.  The only way to do that that I can think of is to name
> >>> the full vec type (i.e., the specialization) which is more typing
> >>> and less generic than vNULL.  I don't use vNULL myself so I
> wouldn't
> >>> miss this trick if it were to be removed but others might feel
> >>> differently.
> >>
> >> In C++11, it can be replaced by {} in that context as well.
> >
> > Cool.  I thought I'd tried { } here but I guess not.
> >
> >>
> >>> If not, I'm all for getting rid of vNULL but with over 350 uses
> >>> 

Re: [PATCH v3] gcov: Add __gcov_info_to_gdca()

2021-08-06 Thread Sebastian Huber

Hello Christophe,

On 06/08/2021 09:50, Christophe Lyon wrote:

Looks like there's a problem with your patch:
  
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/./gcc/xgcc
 
-B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/./gcc/
 
-B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/bin/
 
-B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/lib/
 -isystem 
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include
 -isystem 
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/sys-include
    -g -O2 -O2  -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -W -Wall 
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes 
-Wmissing-prototypes -Wold-style-definition  -isystem ./include -fno-inline -g 
-DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc  -fno-inline 
-I. -I. -I../.././gcc 
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc 
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/. 
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../gcc 
-I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../include    
-o _gcov.o -MT _gcov.o -MD -MP -MF _gcov.dep -DL_gcov -c 
/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/libgcov-driver.c

In file included from 
/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/libgcov-driver.c:29:
/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/include/stdint.h:9:16: 
fatal error: stdint.h: No such file or directory

     9 | # include_next 
       |                ^~
compilation terminated.
make[2]: *** [Makefile:928: _gcov.o] Error 1
make[2]: *** Waiting for unfinished jobs

Can you check?


I already feared that the  include may cause problems, from 
the commit message:


"With this patch,  is included in libgcov-driver.c even if 
inhibit_libc is defined.  This header file should be also available for 
freestanding environments.  If this is not the case, then we have to 
define intptr_t somehow."


What about the attached patch?

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/
>From 9e91c623116313312408a8809f32eac1f7ef6b16 Mon Sep 17 00:00:00 2001
From: Sebastian Huber 
Date: Fri, 6 Aug 2021 09:57:43 +0200
Subject: [PATCH] gcov: Remove  from libgcov-driver.c

In the patch to add __gcov_info_to_gcda(), the include of  was added
to libgcov-driver.c even if inhibit_libc is defined.  It turned out that this
header file is not always available.  Remove the include of  and
replace the intptr_t with the compiler provided __INTPTR_TYPE__.

libgcc/

	* libgcov-driver.c (#include ): Remove.
	(write_topn_counters): Use __INTPTR_TYPE__ instead of intptr_t.
---
 libgcc/libgcov-driver.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index 9d7bc9c79950..087f71e01077 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -26,8 +26,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "libgcov.h"
 #include "gcov-io.h"
 
-#include 
-
 /* Return 1, if all counter values are zero, otherwise 0. */
 
 static inline int
@@ -453,7 +451,7 @@ write_topn_counters (const struct gcov_ctr_info *ci_ptr,
   gcov_type start = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 2];
   unsigned sizes = 0;
 
-  for (struct gcov_kvp *node = (struct gcov_kvp *)(intptr_t)start;
+  for (struct gcov_kvp *node = (struct gcov_kvp *)(__INTPTR_TYPE__)start;
 	   node != NULL; node = node->next)
 	++sizes;
 
@@ -472,7 +470,7 @@ write_topn_counters (const struct gcov_ctr_info *ci_ptr,
   gcov_type start = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 2];
 
   unsigned j = 0;
-  for (struct gcov_kvp *node = (struct gcov_kvp *)(intptr_t)start;
+  for (struct gcov_kvp *node = (struct gcov_kvp *)(__INTPTR_TYPE__)start;
 	   j < list_sizes[i]; node = node->next, j++)
 	{
 	  dump_counter (node->value, dump_fn, arg);
-- 
2.26.2



[PATCH] c++: Optimize constinit thread_local vars [PR101786]

2021-08-06 Thread Jakub Jelinek via Gcc-patches
Hi!

The paper that introduced constinit mentioned in rationale that constinit
can be used on externs as well and that it can be used to avoid the
thread_local initialization wrappers, because the standard requires that
if constinit is present on any declaration, it is also present on the
initialization declaration, even if it is in some other TU etc.

There is a small problem though, we use the tls wrappers not just if
the thread_local variable needs dynamic initialization, but also when
it has static initialization, but non-trivial destructor, as the
"dynamic initialization" in that case needs to register the destructor.

So, the following patch optimizes constinit thread_local vars only
if we can prove they will not have non-trivial destructors.  That includes
the case where we have incomplete type where we don't know and need to
conservatively assume the type will have non-trivial destructor at the
initializing declaration side.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-08-06  Jakub Jelinek  

PR c++/101786
* decl2.c (var_defined_without_dynamic_init): Return true for
DECL_DECLARED_CONSTINIT_P with complete type and trivial destructor.

* g++.dg/cpp2a/constinit16.C: New test.

--- gcc/cp/decl2.c.jj   2021-07-02 21:59:12.359171627 +0200
+++ gcc/cp/decl2.c  2021-08-05 16:09:39.833599188 +0200
@@ -3447,6 +3447,12 @@ set_guard (tree guard)
 static bool
 var_defined_without_dynamic_init (tree var)
 {
+  /* constinit vars are guaranteed to not have dynamic initializer,
+ but still registering the destructor counts as dynamic initialization.  */
+  if (DECL_DECLARED_CONSTINIT_P (var)
+  && COMPLETE_TYPE_P (TREE_TYPE (var))
+  && !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (var)))
+return true;
   /* If it's defined in another TU, we can't tell.  */
   if (DECL_EXTERNAL (var))
 return false;
--- gcc/testsuite/g++.dg/cpp2a/constinit16.C.jj 2021-08-05 15:50:49.702463664 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/constinit16.C2021-08-05 16:14:52.893202685 
+0200
@@ -0,0 +1,21 @@
+// PR c++/101786
+// { dg-do compile { target c++20 } }
+// { dg-add-options tls }
+// { dg-require-alias "" }
+// { dg-require-effective-target tls_runtime }
+// { dg-final { scan-assembler-not "_ZTH17mythreadlocalvar1" } }
+// { dg-final { scan-assembler "_ZTH17mythreadlocalvar2" } }
+// { dg-final { scan-assembler-not "_ZTH17mythreadlocalvar3" } }
+// { dg-final { scan-assembler "_ZTH17mythreadlocalvar4" } }
+
+extern thread_local constinit int mythreadlocalvar1;
+struct S;
+extern thread_local constinit S mythreadlocalvar2;
+struct T { int t; };
+extern thread_local constinit T mythreadlocalvar3;
+struct U { int u; ~U (); };
+extern thread_local constinit U mythreadlocalvar4;
+int foo () { return mythreadlocalvar1; }
+S *bar () { return &mythreadlocalvar2; }
+T *baz () { return &mythreadlocalvar3; }
+U *qux () { return &mythreadlocalvar4; }

Jakub



Re: [PATCH 1/4] openacc: Middle-end worker-partitioning support

2021-08-06 Thread Julian Brown
On Wed, 4 Aug 2021 15:13:30 +0200
Thomas Schwinge  wrote:

> 'oacc_do_neutering' is the 'execute' function of the pass, so that
> means every time this executes, a fresh 'field_map' is set up, no
> state persists across runs (assuming I'm understanding that
> correctly).  Why don't we simply use standard (non-GC) memory
> management for that?  "For convenience" shall be fine as an answer
> ;-) -- but maybe instead of figuring out the right GC annotations,
> changing the memory management will be easier?  (Or, of course, maybe
> I completely misunderstood that?)

I suspect you're right, and there's no need for this to be GC-allocated
memory. If non-standard memory allocation will work out fine, we should
probably use that instead.

Thanks,

Julian


Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-08-06 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
 wrote:
>
>
>
> On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni 
>  wrote:
>>
>> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
>>  wrote:
>> >
>> >
>> >
>> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni 
>> >  wrote:
>> >>
>> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
>> >>  wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches 
>> >> >  wrote:
>> >> >>
>> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov  
>> >> >> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > > -Original Message-
>> >> >> > > From: Prathamesh Kulkarni 
>> >> >> > > Sent: 06 July 2021 08:06
>> >> >> > > To: Christophe LYON 
>> >> >> > > Cc: Kyrylo Tkachov ; gcc Patches > >> >> > > patc...@gcc.gnu.org>
>> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
>> >> >> > > constructor
>> >> >> > >
>> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
>> >> >> > >  wrote:
>> >> >> > > >
>> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
>> >> >> > > >  wrote:
>> >> >> > > > >
>> >> >> > > > >
>> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
>> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
>> >> >> > > > > >  wrote:
>> >> >> > > > > >>
>> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>> >> >> > > > >  -Original Message-
>> >> >> > > > >  From: Prathamesh Kulkarni 
>> >> >> > > > >  Sent: 28 June 2021 09:38
>> >> >> > > > >  To: Kyrylo Tkachov 
>> >> >> > > > >  Cc: Christophe Lyon ; gcc 
>> >> >> > > > >  Patches
>> >> >> > > > >> >> > > > >  patc...@gcc.gnu.org>
>> >> >> > > > >  Subject: Re: [ARM] PR98435: Missed optimization in 
>> >> >> > > > >  expanding
>> >> >> > > vector
>> >> >> > > > >  constructor
>> >> >> > > > > 
>> >> >> > > > >  On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
>> >> >> > > 
>> >> >> > > > >  wrote:
>> >> >> > > > > >
>> >> >> > > > > >> -Original Message-
>> >> >> > > > > >> From: Prathamesh Kulkarni 
>> >> >> > > > > >> 
>> >> >> > > > > >> Sent: 14 June 2021 09:02
>> >> >> > > > > >> To: Christophe Lyon 
>> >> >> > > > > >> Cc: gcc Patches ; Kyrylo 
>> >> >> > > > > >> Tkachov
>> >> >> > > > > >> 
>> >> >> > > > > >> Subject: Re: [ARM] PR98435: Missed optimization in 
>> >> >> > > > > >> expanding
>> >> >> > > vector
>> >> >> > > > > >> constructor
>> >> >> > > > > >>
>> >> >> > > > > >> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>> >> >> > > > > >>  wrote:
>> >> >> > > > > >>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
>> >> >> > > > >  
>> >> >> > > > > >> wrote:
>> >> >> > > > >  On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via 
>> >> >> > > > >  Gcc-
>> >> >> > > patches
>> >> >> > > > >   wrote:
>> >> >> > > > > > Hi,
>> >> >> > > > > > As mentioned in PR, for the following test-case:
>> >> >> > > > > >
>> >> >> > > > > > #include 
>> >> >> > > > > >
>> >> >> > > > > > bfloat16x4_t f1 (bfloat16_t a)
>> >> >> > > > > > {
>> >> >> > > > > > return vdup_n_bf16 (a);
>> >> >> > > > > > }
>> >> >> > > > > >
>> >> >> > > > > > bfloat16x4_t f2 (bfloat16_t a)
>> >> >> > > > > > {
>> >> >> > > > > > return (bfloat16x4_t) {a, a, a, a};
>> >> >> > > > > > }
>> >> >> > > > > >
>> >> >> > > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon 
>> >> >> > > > > > -mfloat-
>> >> >> > > > >  abi=softfp
>> >> >> > > > > > -march=armv8.2-a+bf16+fp16 results in f2 not being
>> >> >> > > vectorized:
>> >> >> > > > > >
>> >> >> > > > > > f1:
>> >> >> > > > > >   vdup.16 d16, r0
>> >> >> > > > > >   vmovr0, r1, d16  @ v4bf
>> >> >> > > > > >   bx  lr
>> >> >> > > > > >
>> >> >> > > > > > f2:
>> >> >> > > > > >   mov r3, r0  @ __bf16
>> >> >> > > > > >   adr r1, .L4
>> >> >> > > > > >   ldrdr0, [r1]
>> >> >> > > > > >   mov r2, r3  @ __bf16
>> >> >> > > > > >   mov ip, r3  @ __bf16
>> >> >> > > > > >   bfi r1, r2, #0, #16
>> >> >> > > > > >   bfi r0, ip, #0, #16
>> >> >> > > > > >   bfi r1, r3, #16, #16
>> >> >> > > > > >   bfi r0, r2, #16, #16
>> >> >> > > > > >   bx  lr
>> >> >> > > > > >
>> >> >> > > > > > This seems to happen because vec_init pattern in 
>> >> >> > > > > > neon.md
>> >> >> > > has VDQ
>> >> >> > > > > >> mode
>> >> >> > > > > > iterator, which doesn't include V4BF. In attached 
>> >> >> > > > > > patch, I
>> >> >> > > changed
>> >> >> 

Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Fri, Aug 6, 2021 at 5:32 AM liuhongt  wrote:
>>
>> Hi:
>> ---
>> OK, I think sth is amiss here upthread.  insv/extv do look like they
>> are designed
>> to work on integer modes (but docs do not say anything about this here).
>> In fact the caller of extract_bit_field_using_extv is named
>> extract_integral_bit_field.  Of course nothing seems to check what kind of
>> modes we're dealing with, but we're for example happily doing
>> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
>> some integer mode and op0 is HFmode?  From the above I get it's
>> the other way around?  In that case we should wrap the
>> call to extract_integral_bit_field, extracting in an integer mode with the
>> same size as 'mode' and then converting the result as (subreg:HF (reg:HI 
>> ...)).
>> ---
>>   This is a separate patch as a follow up of upper comments.
>>
>> gcc/ChangeLog:
>>
>> * expmed.c (extract_bit_field_1): Wrap the call to
>> extract_integral_bit_field, extracting in an integer mode with
>> the same size as 'tmode' and then converting the result
>> as (subreg:tmode (reg:imode)).
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/i386/float16-5.c: New test.
>> ---
>>  gcc/expmed.c  | 19 +++
>>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
>>  2 files changed, 31 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>>
>> diff --git a/gcc/expmed.c b/gcc/expmed.c
>> index 3143f38e057..72790693ef0 100644
>> --- a/gcc/expmed.c
>> +++ b/gcc/expmed.c
>> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
>> bitsize, poly_uint64 bitnum,
>>op0_mode = opt_scalar_int_mode ();
>>  }
>>
>> +  /* Make sure we are playing with integral modes.  Pun with subregs
>> + if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
>> + in extract_integral_bit_field.  */
>> +  if (int_mode_for_mode (tmode).exists (&imode)
>
> check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> cheaper.  Then imode should always be != tmode.  Maybe
> even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure
> how it behaves for composite modes.
>
> Of course the least surprises would happen when we restrict this
> to FLOAT_MODE_P (tmode).
>
> Richard - any preferences?

If the bug is that extract_integral_bit_field is being called with
a non-integral mode parameter, then it looks odd that we can still
fall through to it without an integral mode (when exists is false).

If calling extract_integral_bit_field without an integral mode is
a bug then I think we should have:

  int_mode_for_mode (mode).require ()

whenever mode is not already SCALAR_INT_MODE_P/is_a.
Ideally we'd make the mode parameter scalar_int_mode too.

extract_integral_bit_field currently has:

  /* Find a correspondingly-sized integer field, so we can apply
 shifts and masks to it.  */
  scalar_int_mode int_mode;
  if (!int_mode_for_mode (tmode).exists (&int_mode))
/* If this fails, we should probably push op0 out to memory and then
   do a load.  */
int_mode = int_mode_for_mode (mode).require ();

which would seem to be redundant after this change.

>> +  && imode != tmode
>> +  && imode != GET_MODE (op0))
>> +{
>> +  rtx ret = extract_integral_bit_field (op0, op0_mode,
>> +   bitsize.to_constant (),
>> +   bitnum.to_constant (), unsignedp,
>> +   NULL, imode, imode,
>> +   reverse, fallback_p);
>> +  gcc_assert (ret);
>> +
>> +  if (!REG_P (ret))
>> +   ret = force_reg (imode, ret);
>> +  return gen_lowpart_SUBREG (tmode, ret);
>> +}
>> +
>>/* It's possible we'll need to handle other cases here for
>>   polynomial bitnum and bitsize.  */

Minor nit, but since the code is using to_constant, it should go after
this comment rather than before it.

Thanks,
Richard

>>
>> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c 
>> b/gcc/testsuite/gcc.target/i386/float16-5.c
>> new file mode 100644
>> index 000..ebc0af1490b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-msse2 -O2" } */
>> +_Float16
>> +foo (int a)
>> +{
>> +  union {
>> +int a;
>> +_Float16 b;
>> +  }c;
>> +  c.a = a;
>> +  return c.b;
>> +}
>> --
>> 2.27.0
>>


Re: [PATCH v3] gcov: Add __gcov_info_to_gdca()

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 10:05 AM Sebastian Huber <
sebastian.hu...@embedded-brains.de> wrote:

> Hello Christophe,
>
> On 06/08/2021 09:50, Christophe Lyon wrote:
> > Looks like there's a problem with your patch:
> >
>  
> /tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/./gcc/xgcc
> -B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/./gcc/
> -B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/bin/
> -B/tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/lib/
> -isystem
> /tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/include
> -isystem
> /tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-eabi/sys-include
>-g -O2 -O2  -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -W -Wall
> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes
> -Wmissing-prototypes -Wold-style-definition  -isystem ./include -fno-inline
> -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -Dinhibit_libc
>  -fno-inline -I. -I. -I../.././gcc
> -I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc
> -I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/.
> -I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../gcc
> -I/tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../include
>-o _gcov.o -MT _gcov.o -MD -MP -MF _gcov.dep -DL_gcov -c
> /tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/libgcov-driver.c
> >
> > In file included from
> >
> /tmp/1784442_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/libgcov-driver.c:29:
> >
> /tmp/1784442_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/include/stdint.h:9:16:
>
> > fatal error: stdint.h: No such file or directory
> >  9 | # include_next 
> >|^~
> > compilation terminated.
> > make[2]: *** [Makefile:928: _gcov.o] Error 1
> > make[2]: *** Waiting for unfinished jobs
> >
> > Can you check?
>
> I already feared that the  include may cause problems, from
> the commit message:
>
> "With this patch,  is included in libgcov-driver.c even if
> inhibit_libc is defined.  This header file should be also available for
> freestanding environments.  If this is not the case, then we have to
> define intptr_t somehow."
>
> What about the attached patch?
>

Thanks, it does fix the build issue for me.

Christophe


>
> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/
>


Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:

> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
>  wrote:
> >
> >
> >
> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org> wrote:
> >>
> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org> wrote:
> >> >>
> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> >> >>  wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via
> Gcc-patches  wrote:
> >> >> >>
> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <
> kyrylo.tkac...@arm.com> wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > > -Original Message-
> >> >> >> > > From: Prathamesh Kulkarni 
> >> >> >> > > Sent: 06 July 2021 08:06
> >> >> >> > > To: Christophe LYON 
> >> >> >> > > Cc: Kyrylo Tkachov ; gcc Patches
>  >> >> >> > > patc...@gcc.gnu.org>
> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding
> vector
> >> >> >> > > constructor
> >> >> >> > >
> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> >> >> >> > >  wrote:
> >> >> >> > > >
> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> >> >> >> > > >  wrote:
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> >> >> >> > > > > >  wrote:
> >> >> >> > > > > >>
> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches
> wrote:
> >> >> >> > > > >  -Original Message-
> >> >> >> > > > >  From: Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org>
> >> >> >> > > > >  Sent: 28 June 2021 09:38
> >> >> >> > > > >  To: Kyrylo Tkachov 
> >> >> >> > > > >  Cc: Christophe Lyon ;
> gcc Patches
> >> >> >> > >  >> >> >> > > > >  patc...@gcc.gnu.org>
> >> >> >> > > > >  Subject: Re: [ARM] PR98435: Missed optimization in
> expanding
> >> >> >> > > vector
> >> >> >> > > > >  constructor
> >> >> >> > > > > 
> >> >> >> > > > >  On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
> >> >> >> > > 
> >> >> >> > > > >  wrote:
> >> >> >> > > > > >
> >> >> >> > > > > >> -Original Message-
> >> >> >> > > > > >> From: Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org>
> >> >> >> > > > > >> Sent: 14 June 2021 09:02
> >> >> >> > > > > >> To: Christophe Lyon 
> >> >> >> > > > > >> Cc: gcc Patches ; Kyrylo
> Tkachov
> >> >> >> > > > > >> 
> >> >> >> > > > > >> Subject: Re: [ARM] PR98435: Missed optimization in
> expanding
> >> >> >> > > vector
> >> >> >> > > > > >> constructor
> >> >> >> > > > > >>
> >> >> >> > > > > >> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >> >> >> > > > > >>  wrote:
> >> >> >> > > > > >>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> >> >> >> > > > >  
> >> >> >> > > > > >> wrote:
> >> >> >> > > > >  On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni
> via Gcc-
> >> >> >> > > patches
> >> >> >> > > > >   wrote:
> >> >> >> > > > > > Hi,
> >> >> >> > > > > > As mentioned in PR, for the following test-case:
> >> >> >> > > > > >
> >> >> >> > > > > > #include 
> >> >> >> > > > > >
> >> >> >> > > > > > bfloat16x4_t f1 (bfloat16_t a)
> >> >> >> > > > > > {
> >> >> >> > > > > > return vdup_n_bf16 (a);
> >> >> >> > > > > > }
> >> >> >> > > > > >
> >> >> >> > > > > > bfloat16x4_t f2 (bfloat16_t a)
> >> >> >> > > > > > {
> >> >> >> > > > > > return (bfloat16x4_t) {a, a, a, a};
> >> >> >> > > > > > }
> >> >> >> > > > > >
> >> >> >> > > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon
> -mfloat-
> >> >> >> > > > >  abi=softfp
> >> >> >> > > > > > -march=armv8.2-a+bf16+fp16 results in f2 not
> being
> >> >> >> > > vectorized:
> >> >> >> > > > > >
> >> >> >> > > > > > f1:
> >> >> >> > > > > >   vdup.16 d16, r0
> >> >> >> > > > > >   vmovr0, r1, d16  @ v4bf
> >> >> >> > > > > >   bx  lr
> >> >> >> > > > > >
> >> >> >> > > > > > f2:
> >> >> >> > > > > >   mov r3, r0  @ __bf16
> >> >> >> > > > > >   adr r1, .L4
> >> >> >> > > > > >   ldrdr0, [r1]
> >> >> >> > > > > >   mov r2, r3  @ __bf16
> >> >> >> > > > > >   mov ip, r3  @ __bf16
> >> >> >> > > > > >   bfi r1, r2, #0, #16
> >> >> >> > > > > >   bfi r0, ip, #0, #16
> >> >> >> > > > > >   bfi r1, r3, #16, #16
> >> >> >> > > > > >   bfi r0, r2, #16, #16
> >> >> >> > > > > >   bx  lr
> >> >> >> > > > > >

Re: [PATCH 1/4] openacc: Middle-end worker-partitioning support

2021-08-06 Thread Julian Brown
On Wed, 4 Aug 2021 15:56:49 +0200
Thomas Schwinge  wrote:

> > This version of the patch [...]
> > avoids moving SESE-region finding code out
> > of the NVPTX backend  
> 
> So that's 'struct bb_sese' and following functions.
> 
> > since that code isn't used by the middle-end worker
> > partitioning neutering/broadcasting implementation yet.  
> 
> I understand correctly that "isn't used [...] yet" means that (a)
> "this isn't implemented yet" (on og11 etc.), and doesn't mean (b) "is
> missing from this patch submission"?  ... thus from (a) it follows
> that we may later also drop from the og11 branch these changes?

Yes, the former -- the SESE region-finding code isn't used anywhere for
the middle-end worker partitioning support. Thus if we happen to have
two adjacent blocks that both use worker-single mode, we will
conditionalise them to run on one worker only separately, with
redundant tests/branches.

I'm not sure how often that happens in practice. We don't need to
handle vector-single mode for GCN, which possibly means that the
SESE-finding code's ability to skip entire inner loop nests (IIRC!) is
unnecessary.

So yes, that code could probably be dropped for og11 too, though
perhaps we should try to evaluate if it would still be useful first.

> Relatedly, a nontrivial amount of data structures/logic/code did get
> duplicated from the nvptx back end, and modified slightly or
> not-so-slightly (RTL vs. GIMPLE plus certain implementation
> "details").
> 
> We should at least cross reference the two instances, to make sure
> that any changes to one are also propagated to the other.  (I'll take
> care.)

OK, thanks,

> And then, do you (or anyone else, of course) happen to have any clever
> idea about how to avoid the duplication, and somehow combine the RTL
> vs. GIMPLE implementations?  Given that we nowadays may use C++ -- do
> you foresee it feasible to have an abstract base class capturing
> basically the data structures, logic, common code, and then
> RTL-specialized plus GIMPLE-specialized classes inheriting from that?

I suppose one could either use "old-style" inheritance, or maybe do
it with templates. There's probably both costs & benefits when it comes
to maintenance, either way -- having this code shared would mean any
changes need testing for both nvptx & GCN targets, and risks making it
harder to follow. OTOH, like you say, changes would only need to be
made in one place.

TBH, I'd spend effort on trying to integrate the SESE code (if it'd be
beneficial) first, before trying to de-duplicate those other bits.

Thanks,

Julian


Re: [PATCH 1/4] aarch64: Use memcpy to copy structures in vst4[q]_lane intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst4[q]_lane Neon intrinsics in
> arm_neon.h.
>
> It also adds new code generation tests to verify that superfluous move
> instructions are not generated for the vst4q_lane intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-29  Jonathan Wright  
>
> * config/aarch64/arm_neon.h (__ST4_LANE_FUNC): Delete.
> (__ST4Q_LANE_FUNC): Delete.
> (vst4_lane_f16): Use __builtin_memcpy to copy vector
> structure instead of constructing __builtin_aarch64_simd_xi
> one vector at a time.
> (vst4_lane_f32): Likewise.
> (vst4_lane_f64): Likewise.
> (vst4_lane_p8): Likewise.
> (vst4_lane_p16): Likewise.
> (vst4_lane_p64): Likewise.
> (vst4_lane_s8): Likewise.
> (vst4_lane_s16): Likewise.
> (vst4_lane_s32): Likewise.
> (vst4_lane_s64): Likewise.
> (vst4_lane_u8): Likewise.
> (vst4_lane_u16): Likewise.
> (vst4_lane_u32): Likewise.
> (vst4_lane_u64): Likewise.
> (vst4_lane_bf16): Likewise.
> (vst4q_lane_f16): Use __builtin_memcpy to copy vector
> structure instead of using a union.
> (vst4q_lane_f32): Likewise.
> (vst4q_lane_f64): Likewise.
> (vst4q_lane_p8): Likewise.
> (vst4q_lane_p16): Likewise.
> (vst4q_lane_p64): Likewise.
> (vst4q_lane_s8): Likewise.
> (vst4q_lane_s16): Likewise.
> (vst4q_lane_s32): Likewise.
> (vst4q_lane_s64): Likewise.
> (vst4q_lane_u8): Likewise.
> (vst4q_lane_u16): Likewise.
> (vst4q_lane_u32): Likewise.
> (vst4q_lane_u64): Likewise.
> (vst4q_lane_bf16): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
> tests.
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 752397418afba8d17362f904556e7d2e88872eb8..dc04d14b87ce76d6369773a605795c0b05a6e8ad
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -9369,94 +9369,411 @@ __ST3Q_LANE_FUNC (uint16x8x3_t, uint16_t, v8hi, hi, 
> u16)
>  __ST3Q_LANE_FUNC (uint32x4x3_t, uint32_t, v4si, si, u32)
>  __ST3Q_LANE_FUNC (uint64x2x3_t, uint64_t, v2di, di, u64)
>  
> -#define __ST4_LANE_FUNC(intype, largetype, ptrtype, mode, \
> - qmode, ptr_mode, funcsuffix, signedtype) \
> -__extension__ extern __inline void\
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
> -vst4_lane_ ## funcsuffix (ptrtype *__ptr, \
> -   intype __b, const int __c) \
> -{ \
> -  __builtin_aarch64_simd_xi __o;  \
> -  largetype __temp;   \
> -  __temp.val[0]  
>  \
> -= vcombine_##funcsuffix (__b.val[0],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __temp.val[1]  
>  \
> -= vcombine_##funcsuffix (__b.val[1],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __temp.val[2]  
>  \
> -= vcombine_##funcsuffix (__b.val[2],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __temp.val[3]  
>  \
> -= vcombine_##funcsuffix (__b.val[3],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __o = __builtin_aarch64_set_qregxi##qmode (__o, \
> -  (signedtype) __temp.val[0], 0); \
> -  __o = __builtin_aarch64_set_qregxi##qmode (__o, \
> -  (signedtype) __temp.val[1], 1); \
> -  __o = __builtin_aarch64_set_qregxi##qmode (__o, \
> -  (signedtype) __temp.val[2], 2); \
> -  __o = __builtin_aarch64_set_qregxi##qmode (__o, \
> -  (signedtype) __temp.val[3], 3); \
> -  __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mod

Re: [PATCH 2/4] aarch64: Use memcpy to copy structures in vst3[q]_lane intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst3[q]_lane Neon intrinsics in
> arm_neon.h.
>
> It also adds new code generation tests to verify that superfluous move
> instructions are not generated for the vst3q_lane intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

Ok with the same s/\t=/ =/ comment as for 1/4.

Thanks,
Richard

>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-30  Jonathan Wright  
>
> * config/aarch64/arm_neon.h (__ST3_LANE_FUNC): Delete.
> (__ST3Q_LANE_FUNC): Delete.
> (vst3_lane_f16): Use __builtin_memcpy to copy vector
> structure instead of constructing __builtin_aarch64_simd_ci
> one vector at a time.
> (vst3_lane_f32): Likewise.
> (vst3_lane_f64): Likewise.
> (vst3_lane_p8): Likewise.
> (vst3_lane_p16): Likewise.
> (vst3_lane_p64): Likewise.
> (vst3_lane_s8): Likewise.
> (vst3_lane_s16): Likewise.
> (vst3_lane_s32): Likewise.
> (vst3_lane_s64): Likewise.
> (vst3_lane_u8): Likewise.
> (vst3_lane_u16): Likewise.
> (vst3_lane_u32): Likewise.
> (vst3_lane_u64): Likewise.
> (vst3_lane_bf16): Likewise.
> (vst3q_lane_f16): Use __builtin_memcpy to copy vector
> structure instead of using a union.
> (vst3q_lane_f32): Likewise.
> (vst3q_lane_f64): Likewise.
> (vst3q_lane_p8): Likewise.
> (vst3q_lane_p16): Likewise.
> (vst3q_lane_p64): Likewise.
> (vst3q_lane_s8): Likewise.
> (vst3q_lane_s16): Likewise.
> (vst3q_lane_s32): Likewise.
> (vst3q_lane_s64): Likewise.
> (vst3q_lane_u8): Likewise.
> (vst3q_lane_u16): Likewise.
> (vst3q_lane_u32): Likewise.
> (vst3q_lane_u64): Likewise.
> (vst3q_lane_bf16): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
> tests.
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> dc04d14b87ce76d6369773a605795c0b05a6e8ad..d78ced8968869d9317d76368554bf6ce8f7e3afe
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -9285,89 +9285,383 @@ __ST2Q_LANE_FUNC (uint16x8x2_t, uint16_t, v8hi, hi, 
> u16)
>  __ST2Q_LANE_FUNC (uint32x4x2_t, uint32_t, v4si, si, u32)
>  __ST2Q_LANE_FUNC (uint64x2x2_t, uint64_t, v2di, di, u64)
>  
> -#define __ST3_LANE_FUNC(intype, largetype, ptrtype, mode, \
> - qmode, ptr_mode, funcsuffix, signedtype) \
> -__extension__ extern __inline void\
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
> -vst3_lane_ ## funcsuffix (ptrtype *__ptr, \
> -   intype __b, const int __c) \
> -{ \
> -  __builtin_aarch64_simd_ci __o;  \
> -  largetype __temp;   \
> -  __temp.val[0]  
>  \
> -= vcombine_##funcsuffix (__b.val[0],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __temp.val[1]  
>  \
> -= vcombine_##funcsuffix (__b.val[1],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __temp.val[2]  
>  \
> -= vcombine_##funcsuffix (__b.val[2],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __o = __builtin_aarch64_set_qregci##qmode (__o, \
> -  (signedtype) __temp.val[0], 0); \
> -  __o = __builtin_aarch64_set_qregci##qmode (__o, \
> -  (signedtype) __temp.val[1], 1); \
> -  __o = __builtin_aarch64_set_qregci##qmode (__o, \
> -  (signedtype) __temp.val[2], 2); \
> -  __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *)  
> \
> -  __ptr, __o, __c);   \
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vst3_lane_f16 (float16_t *__ptr, float16x4x3_t __val, const int __lane)
> +{
> +  __builtin_aarch64_simd_ci __o;
> +  float16x8x3_t __temp

Re: [PATCH 3/4] aarch64: Use memcpy to copy structures in vst2[q]_lane intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst2[q]_lane Neon intrinsics in
> arm_neon.h.
>
> It also adds new code generation tests to verify that superfluous move
> instructions are not generated for the vst2q_lane intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

Ok with the same s/\t=/ =/ comment as for 1/4.

Thanks,
Richard

> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-30  Jonathan Wright  
>
> * config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Delete.
> (__ST2Q_LANE_FUNC): Delete.
> (vst2_lane_f16): Use __builtin_memcpy to copy vector
> structure instead of constructing __builtin_aarch64_simd_oi
> one vector at a time.
> (vst2_lane_f32): Likewise.
> (vst2_lane_f64): Likewise.
> (vst2_lane_p8): Likewise.
> (vst2_lane_p16): Likewise.
> (vst2_lane_p64): Likewise.
> (vst2_lane_s8): Likewise.
> (vst2_lane_s16): Likewise.
> (vst2_lane_s32): Likewise.
> (vst2_lane_s64): Likewise.
> (vst2_lane_u8): Likewise.
> (vst2_lane_u16): Likewise.
> (vst2_lane_u32): Likewise.
> (vst2_lane_u64): Likewise.
> (vst2_lane_bf16): Likewise.
> (vst2q_lane_f16): Use __builtin_memcpy to copy vector
> structure instead of using a union.
> (vst2q_lane_f32): Likewise.
> (vst2q_lane_f64): Likewise.
> (vst2q_lane_p8): Likewise.
> (vst2q_lane_p16): Likewise.
> (vst2q_lane_p64): Likewise.
> (vst2q_lane_s8): Likewise.
> (vst2q_lane_s16): Likewise.
> (vst2q_lane_s32): Likewise.
> (vst2q_lane_s64): Likewise.
> (vst2q_lane_u8): Likewise.
> (vst2q_lane_u16): Likewise.
> (vst2q_lane_u32): Likewise.
> (vst2q_lane_u64): Likewise.
> (vst2q_lane_bf16): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
> tests.
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> d78ced8968869d9317d76368554bf6ce8f7e3afe..ed6ce179d76f34e1f946adb75bb20a947b67ab82
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -9206,84 +9206,355 @@ __STRUCTN (float, 64, 4)
>  #undef __STRUCTN
>  
>  
> -#define __ST2_LANE_FUNC(intype, largetype, ptrtype, mode, \
> - qmode, ptr_mode, funcsuffix, signedtype) \
> -__extension__ extern __inline void\
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
> -vst2_lane_ ## funcsuffix (ptrtype *__ptr, \
> -   intype __b, const int __c) \
> -{ \
> -  __builtin_aarch64_simd_oi __o;  \
> -  largetype __temp;   \
> -  __temp.val[0]  
>  \
> -= vcombine_##funcsuffix (__b.val[0],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __temp.val[1]  
>  \
> -= vcombine_##funcsuffix (__b.val[1],  \
> -  vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
> -  __o = __builtin_aarch64_set_qregoi##qmode (__o, \
> -  (signedtype) __temp.val[0], 0); \
> -  __o = __builtin_aarch64_set_qregoi##qmode (__o, \
> -  (signedtype) __temp.val[1], 1); \
> -  __builtin_aarch64_st2_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *)  
> \
> -  __ptr, __o, __c);   \
> -}
> -
> -__ST2_LANE_FUNC (float16x4x2_t, float16x8x2_t, float16_t, v4hf, v8hf, hf, 
> f16,
> -  float16x8_t)
> -__ST2_LANE_FUNC (float32x2x2_t, float32x4x2_t, float32_t, v2sf, v4sf, sf, 
> f32,
> -  float32x4_t)
> -__ST2_LANE_FUNC (float64x1x2_t, float64x2x2_t, float64_t, df, v2df, df, f64,
> -  float64x2_t)
> -__ST2_LANE_FUNC (poly8x8x2_t, poly8x16x2_t, poly8_t, v8qi, v16qi, qi, p8,
> -  int8x16_t)
> -__ST2_LANE_FUNC (poly16x4x2_t, poly16x8x2_t, poly16_t, v4hi, v8hi, hi, p16,
> -  int16x8_t)
> -__ST2_LANE_FUNC (poly64x1x2_t, poly64x2x2_t, poly64_t, di, v2di_ssps, di, 
> p64,
> -  poly64x2_t)
> -__ST2_LANE_FUNC (int8x8x2_t, int8x16x2_t, int8_t, v8qi, v16qi, qi, s8,
> -  int8x16_t)
> -__ST2_LAN

Re: [PATCH 4/4] aarch64: Use memcpy to copy structures in bfloat vst* intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst[234][q] and vst1[q]_x[234] bfloat
> Neon intrinsics in arm_neon.h.
>
> It also adds new code generation tests to verify that superfluous move
> instructions are not generated for the vst[234]q or vst1q_x[234] bfloat
> intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

OK, thanks.

Richard

> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-30  Jonathan Wright  
>
> * config/aarch64/arm_neon.h (vst1_bf16_x2): Use
> __builtin_memcpy instead of constructing an additional
> __builtin_aarch64_simd_oi one vector at a time.
> (vst1q_bf16_x2): Likewise.
> (vst1_bf16_x3): Use __builtin_memcpy instead of constructing
> an additional __builtin_aarch64_simd_ci one vector at a time.
> (vst1q_bf16_x3): Likewise.
> (vst1_bf16_x4): Use __builtin_memcpy instead of a union.
> (vst1q_bf16_x4): Likewise.
> (vst2_bf16): Use __builtin_memcpy instead of constructing an
> additional __builtin_aarch64_simd_oi one vector at a time.
> (vst2q_bf16): Likewise.
> (vst3_bf16): Use __builtin_memcpy instead of constructing an
> additional __builtin_aarch64_simd_ci mode one vector at a
> time.
> (vst3q_bf16): Likewise.
> (vst4_bf16): Use __builtin_memcpy instead of constructing an
> additional __builtin_aarch64_simd_xi one vector at a time.
> (vst4q_bf16): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vector_structure_intrinsics.c: Add new
> tests.
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> ed6ce179d76f34e1f946adb75bb20a947b67ab82..a16ee4d534fb6f15047b08a951adcc87c5c9ac3f
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -33839,8 +33839,7 @@ vst1_bf16_x2 (bfloat16_t * __a, bfloat16x4x2_t __val)
>bfloat16x8x2_t __temp;
>__temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 
> (__AARCH64_UINT64_C (0)));
>__temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 
> (__AARCH64_UINT64_C (0)));
> -  __o = __builtin_aarch64_set_qregoiv8bf (__o, __temp.val[0], 0);
> -  __o = __builtin_aarch64_set_qregoiv8bf (__o, __temp.val[1], 1);
> +  __builtin_memcpy (&__o, &__temp, sizeof (__temp));
>__builtin_aarch64_st1x2v4bf (__a, __o);
>  }
>  
> @@ -33849,8 +33848,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, 
> __artificial__))
>  vst1q_bf16_x2 (bfloat16_t * __a, bfloat16x8x2_t __val)
>  {
>__builtin_aarch64_simd_oi __o;
> -  __o = __builtin_aarch64_set_qregoiv8bf (__o, __val.val[0], 0);
> -  __o = __builtin_aarch64_set_qregoiv8bf (__o, __val.val[1], 1);
> +  __builtin_memcpy (&__o, &__val, sizeof (__val));
>__builtin_aarch64_st1x2v8bf (__a, __o);
>  }
>  
> @@ -33863,9 +33861,7 @@ vst1_bf16_x3 (bfloat16_t * __a, bfloat16x4x3_t __val)
>__temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 
> (__AARCH64_UINT64_C (0)));
>__temp.val[1] = vcombine_bf16 (__val.val[1], vcreate_bf16 
> (__AARCH64_UINT64_C (0)));
>__temp.val[2] = vcombine_bf16 (__val.val[2], vcreate_bf16 
> (__AARCH64_UINT64_C (0)));
> -  __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __temp.val[0], 
> 0);
> -  __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __temp.val[1], 
> 1);
> -  __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __temp.val[2], 
> 2);
> +  __builtin_memcpy (&__o, &__temp, sizeof (__temp));
>__builtin_aarch64_st1x3v4bf ((__builtin_aarch64_simd_bf *) __a, __o);
>  }
>  
> @@ -33874,26 +33870,31 @@ __attribute__ ((__always_inline__, __gnu_inline__, 
> __artificial__))
>  vst1q_bf16_x3 (bfloat16_t * __a, bfloat16x8x3_t __val)
>  {
>__builtin_aarch64_simd_ci __o;
> -  __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __val.val[0], 
> 0);
> -  __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __val.val[1], 
> 1);
> -  __o = __builtin_aarch64_set_qregciv8bf (__o, (bfloat16x8_t) __val.val[2], 
> 2);
> +  __builtin_memcpy (&__o, &__val, sizeof (__val));
>__builtin_aarch64_st1x3v8bf ((__builtin_aarch64_simd_bf *) __a, __o);
>  }
>  
>  __extension__ extern __inline void
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vst1_bf16_x4 (bfloat16_t * __a, bfloat16x4x4_t val)
> +vst1_bf16_x4 (bfloat16_t * __a, bfloat16x4x4_t __val)
>  {
> -  union { bfloat16x4x4_t __i; __builtin_aarch64_simd_xi __o; } __u = { val };
> -  __builtin_aarch64_st1x4v4bf ((__builtin_aarch64_simd_bf *) __a, __u.__o);
> +  __builtin_aarch64_simd_xi __o;
> +  bfloat16x8x4_t __temp;
> +  __temp.val[0] = vcombine_bf16 (__val.val[0], vcreate_bf16 
> (__AARCH64_UINT64_C (0)));
> +  __temp.val[1] = vcombine_bf16 (__val.val[1], 

Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-08-06 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 6 Aug 2021 at 14:49, Christophe Lyon
 wrote:
>
>
>
> On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni 
>  wrote:
>>
>> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
>>  wrote:
>> >
>> >
>> >
>> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni 
>> >  wrote:
>> >>
>> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
>> >>  wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni 
>> >> >  wrote:
>> >> >>
>> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
>> >> >>  wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches 
>> >> >> >  wrote:
>> >> >> >>
>> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov 
>> >> >> >>  wrote:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > > -Original Message-
>> >> >> >> > > From: Prathamesh Kulkarni 
>> >> >> >> > > Sent: 06 July 2021 08:06
>> >> >> >> > > To: Christophe LYON 
>> >> >> >> > > Cc: Kyrylo Tkachov ; gcc Patches > >> >> >> > > patc...@gcc.gnu.org>
>> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding 
>> >> >> >> > > vector
>> >> >> >> > > constructor
>> >> >> >> > >
>> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
>> >> >> >> > >  wrote:
>> >> >> >> > > >
>> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
>> >> >> >> > > >  wrote:
>> >> >> >> > > > >
>> >> >> >> > > > >
>> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
>> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
>> >> >> >> > > > > >  wrote:
>> >> >> >> > > > > >>
>> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches 
>> >> >> >> > > > > >> wrote:
>> >> >> >> > > > >  -Original Message-
>> >> >> >> > > > >  From: Prathamesh Kulkarni 
>> >> >> >> > > > >  
>> >> >> >> > > > >  Sent: 28 June 2021 09:38
>> >> >> >> > > > >  To: Kyrylo Tkachov 
>> >> >> >> > > > >  Cc: Christophe Lyon ; gcc 
>> >> >> >> > > > >  Patches
>> >> >> >> > > > >> >> >> > > > >  patc...@gcc.gnu.org>
>> >> >> >> > > > >  Subject: Re: [ARM] PR98435: Missed optimization in 
>> >> >> >> > > > >  expanding
>> >> >> >> > > vector
>> >> >> >> > > > >  constructor
>> >> >> >> > > > > 
>> >> >> >> > > > >  On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
>> >> >> >> > > 
>> >> >> >> > > > >  wrote:
>> >> >> >> > > > > >
>> >> >> >> > > > > >> -Original Message-
>> >> >> >> > > > > >> From: Prathamesh Kulkarni 
>> >> >> >> > > > > >> 
>> >> >> >> > > > > >> Sent: 14 June 2021 09:02
>> >> >> >> > > > > >> To: Christophe Lyon 
>> >> >> >> > > > > >> Cc: gcc Patches ; Kyrylo 
>> >> >> >> > > > > >> Tkachov
>> >> >> >> > > > > >> 
>> >> >> >> > > > > >> Subject: Re: [ARM] PR98435: Missed optimization in 
>> >> >> >> > > > > >> expanding
>> >> >> >> > > vector
>> >> >> >> > > > > >> constructor
>> >> >> >> > > > > >>
>> >> >> >> > > > > >> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>> >> >> >> > > > > >>  wrote:
>> >> >> >> > > > > >>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
>> >> >> >> > > > >  
>> >> >> >> > > > > >> wrote:
>> >> >> >> > > > >  On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni 
>> >> >> >> > > > >  via Gcc-
>> >> >> >> > > patches
>> >> >> >> > > > >   wrote:
>> >> >> >> > > > > > Hi,
>> >> >> >> > > > > > As mentioned in PR, for the following test-case:
>> >> >> >> > > > > >
>> >> >> >> > > > > > #include 
>> >> >> >> > > > > >
>> >> >> >> > > > > > bfloat16x4_t f1 (bfloat16_t a)
>> >> >> >> > > > > > {
>> >> >> >> > > > > > return vdup_n_bf16 (a);
>> >> >> >> > > > > > }
>> >> >> >> > > > > >
>> >> >> >> > > > > > bfloat16x4_t f2 (bfloat16_t a)
>> >> >> >> > > > > > {
>> >> >> >> > > > > > return (bfloat16x4_t) {a, a, a, a};
>> >> >> >> > > > > > }
>> >> >> >> > > > > >
>> >> >> >> > > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon 
>> >> >> >> > > > > > -mfloat-
>> >> >> >> > > > >  abi=softfp
>> >> >> >> > > > > > -march=armv8.2-a+bf16+fp16 results in f2 not being
>> >> >> >> > > vectorized:
>> >> >> >> > > > > >
>> >> >> >> > > > > > f1:
>> >> >> >> > > > > >   vdup.16 d16, r0
>> >> >> >> > > > > >   vmovr0, r1, d16  @ v4bf
>> >> >> >> > > > > >   bx  lr
>> >> >> >> > > > > >
>> >> >> >> > > > > > f2:
>> >> >> >> > > > > >   mov r3, r0  @ __bf16
>> >> >> >> > > > > >   adr r1, .L4
>> >> >> >> > > > > >   ldrdr0, [r1]
>> >> >> >> > > > > >   mov r2, r3  @ __bf16
>> >> >> >> > > > > >   mov ip, r3  @ __bf16
>> >> >> >> > > > > >   bfi r1, r2, #0, #16
>> >> 

Re: [OpenACC] Extract 'pass_oacc_loop_designation' out of 'pass_oacc_device_lower' (was: [PATCH 1/4] openacc: Middle-end worker-partitioning support)

2021-08-06 Thread Julian Brown
On Thu, 29 Jul 2021 09:49:05 +0200
Thomas Schwinge  wrote:

> >  namespace {
> >  
> > +const pass_data pass_data_oacc_loop_designation =
> > +{
> > +  GIMPLE_PASS, /* type */
> > +  "oaccloops", /* name */
> > +  OPTGROUP_OMP, /* optinfo_flags */
> > +  TV_NONE, /* tv_id */
> > +  PROP_cfg, /* properties_required */
> > +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
> > +  0, /* properties_destroyed */
> > +  0, /* todo_flags_start */
> > +  TODO_update_ssa | TODO_cleanup_cfg
> > +  | TODO_rebuild_alias, /* todo_flags_finish */
> > +};  
> 
> Do you remember why you added 'TODO_rebuild_alias' here?
> 'pass_oacc_device_lower' doesn't have it, and neither does
> 'pass_oacc_loop_designation' in your original (2017-11-27) internal
> gcn/master branch commit 81ee7ef64cdfa47c01f24c79b8ebd03242c9f3eb
> "Split device-lowering/gimple workers into three passes".  So I
> removed that -- but please do tell if there is a reason to keep it.

I do not :-). My suspicion is that it was leftover debug code, or
possibly the result of a bad merge.

(Another possibility is that the alias information needs updating when
we change the address space of certain entities during gimple rewriting
for worker partitioning -- but this is the wrong place for that, I
think?).

Thanks,

Julian


Re: [PATCH v3] gcov: Add __gcov_info_to_gdca()

2021-08-06 Thread Martin Liška

On 8/6/21 10:05 AM, Sebastian Huber wrote:

What about the attached patch?


The patch is fine, please install it.

Martin


Re: [PATCH v3] gcov: Add __gcov_info_to_gdca()

2021-08-06 Thread Sebastian Huber

On 06/08/2021 12:26, Martin Liška wrote:

On 8/6/21 10:05 AM, Sebastian Huber wrote:

What about the attached patch?


The patch is fine, please install it.


Thanks, I checked it in.

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


[PATCH] middle-end/AArch64 Fix bootstrap after vec changes

2021-08-06 Thread Tamar Christina via Gcc-patches
Hi All,

The build is broken since a3d3e8c362c2 since it's deleted the ability to pass
vec<> by value and now much be past by reference.

However some language hooks used by AArch64 were not updated and breaks the
build on AArch64.  This patch updates these hooks.

However most of the changes are generic... so I'm sending to a cross section
of approvers.

Bootstrapped aarch64-none-linux-gnu and works again.

Ok for master?

Thanks,
Tamar

gcc/c/ChangeLog:

* c-decl.c (c_simulate_enum_decl): Pass vec<> by pointer.
* c-tree.h (c_simulate_enum_decl): Likewise.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (register_svpattern,
register_svprfop): Pass vec<> by pointer.
* langhooks-def.h (lhd_simulate_enum_decl): Likewise.
* langhooks.c (lhd_simulate_enum_decl): Likewise.
* langhooks.h (struct lang_hooks_for_types): Likewise.

gcc/cp/ChangeLog:

* cp-objcp-common.h (cxx_simulate_enum_decl): Pass vec<> by pointer.
* decl.c (cxx_simulate_enum_decl): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 
234ee16fe4afe5baf3490596d27662c7acee8126..221a67fe57be105dfb88f5053179adb62c9cc47d
 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9379,7 +9379,7 @@ build_enumerator (location_t decl_loc, location_t loc,
 
 tree
 c_simulate_enum_decl (location_t loc, const char *name,
- vec values)
+ vec *values_ptr)
 {
   location_t saved_loc = input_location;
   input_location = loc;
@@ -9389,6 +9389,7 @@ c_simulate_enum_decl (location_t loc, const char *name,
 
   tree value_chain = NULL_TREE;
   string_int_pair *value;
+  vec values = *values_ptr;
   unsigned int i;
   FOR_EACH_VEC_ELT (values, i, value)
 {
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 
ab6db3860f5062d5d73e3ffacf123a992b9c6c48..a8a90eae30d54006e83b20cdcc7aa7a582686c6c
 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -595,7 +595,7 @@ extern void finish_function (location_t = input_location);
 extern tree finish_struct (location_t, tree, tree, tree,
   class c_struct_parse_info *);
 extern tree c_simulate_enum_decl (location_t, const char *,
- vec);
+ vec *);
 extern struct c_arg_info *build_arg_info (void);
 extern struct c_arg_info *get_parm_info (bool, tree);
 extern tree grokfield (location_t, struct c_declarator *,
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 
f44f81f13754b2d7f7391086c846ee2f966d54a7..f71b287570e4c8c00149e864db4bf03941382672
 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3499,7 +3499,7 @@ register_svpattern ()
 #undef PUSH
 
   acle_svpattern = lang_hooks.types.simulate_enum_decl (input_location,
-   "svpattern", values);
+   "svpattern", &values);
 }
 
 /* Register the svprfop enum.  */
@@ -3513,7 +3513,7 @@ register_svprfop ()
 #undef PUSH
 
   acle_svprfop = lang_hooks.types.simulate_enum_decl (input_location,
- "svprfop", values);
+ "svprfop", &values);
 }
 
 /* Implement #pragma GCC aarch64 "arm_sve.h".  */
diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index 
53c6e4c2c8859be51b7bbcc87ce688d35c10f602..f1704aad5578c4132bcee5b3f9799dbacb6fd114
 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -38,7 +38,7 @@ extern bool cp_handle_option (size_t, const char *, 
HOST_WIDE_INT, int,
  location_t, const struct cl_option_handlers *);
 extern tree cxx_make_type_hook (tree_code);
 extern tree cxx_simulate_enum_decl (location_t, const char *,
-   vec);
+   vec *);
 
 /* Lang hooks that are shared between C++ and ObjC++ are defined here.  Hooks
specific to C++ or ObjC++ go in cp/cp-lang.c and objcp/objcp-lang.c,
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 
6fa6b9adc8778ff641f6a1c6ca9083ee74863331..f626f1e65ee644a494eb57fa5bcb7f1eb05ff667
 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -16408,7 +16408,7 @@ lookup_enumerator (tree enumtype, tree name)
 
 tree
 cxx_simulate_enum_decl (location_t loc, const char *name,
-   vec values)
+   vec *values)
 {
   location_t saved_loc = input_location;
   input_location = loc;
diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index 
8b744d96fb23af9c16c4860a3ac1c715b292fc24..02b4681dd96e1479cad1d9a233f308c2c1acba9f
 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -55,7 +55,7 @@ extern void lhd_set_decl_assembler_name (tree decl);
 extern void lhd_overwrite_decl_assembler_name (tree decl, tree name);
 ext

Re: [PATCH] [rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element.

2021-08-06 Thread Richard Sandiford via Gcc-patches
liuhongt via Gcc-patches  writes:
> Hi:
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
>   Ok for trunk?

I think if anything the canonicalisation should be the other way:
if the shift amount is an in-range constant, we know that it fits
within a vector element, and so the vector form should be preferred.

IMO it's a wart that we support (say):

  (ashift:V4SI (reg:V4SI R) (const_int 1))

even though:

  (plus:V4SI (reg:V4SI R) (const_int 1))
  (minus:V4SI (const_int 1) (reg:V4SI R))

are not well-formed (at least AFAIK).

Thanks,
Richard

>
> gcc/ChangeLog:
>
>   PR rtl-optimization/101796
>   * simplify-rtx.c
>   (simplify_context::simplify_binary_operation_1): Simplify
>   vector shift/rotate with const_vec_duplicate to vector
>   shift/rotate with const_int element.
>
> gcc/testsuite/ChangeLog:
>
>   PR rtl-optimization/101796
>   * gcc.target/i386/pr101796.c: New test.
> ---
>  gcc/simplify-rtx.c   | 15 ++
>  gcc/testsuite/gcc.target/i386/pr101796.c | 65 
>  2 files changed, 80 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101796.c
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index a719f57870f..75f3e455562 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -3970,6 +3970,21 @@ simplify_context::simplify_binary_operation_1 
> (rtx_code code,
>   return simplify_gen_binary (code, mode, op0,
>   gen_int_shift_amount (mode, val));
>   }
> +
> +  /* Optimize vector shift/rotate with const_vec_duplicate
> +  to vector shift/rotate with const_int element.
> +  /* TODO: vec_duplicate with variable can also be simplified,
> +  but GCC only require operand 2 of shift/rotate to be a scalar type
> +  which can have different modes in different backends, it makes
> +  simplication difficult to decide which mode should be choosed
> +  for shift/rotate count.  */
> +  if ((code == ASHIFTRT || code == LSHIFTRT
> +|| code == ASHIFT || code == ROTATERT
> +|| code == ROTATE)
> +   && const_vec_duplicate_p (op1))
> + return simplify_gen_binary (code, mode, op0,
> + unwrap_const_vec_duplicate (op1));
> +
>break;
>  
>  case ASHIFT:
> diff --git a/gcc/testsuite/gcc.target/i386/pr101796.c 
> b/gcc/testsuite/gcc.target/i386/pr101796.c
> new file mode 100644
> index 000..c22d6267fe5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101796.c
> @@ -0,0 +1,65 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -O2 " } */
> +/* { dg-final { scan-assembler-not "vpbroadcast" } }  */
> +/* { dg-final { scan-assembler-not "vpsrlv\[dwq\]" } }  */
> +/* { dg-final { scan-assembler-not "vpsllv\[dwq\]" } }  */
> +/* { dg-final { scan-assembler-not "vpsrav\[dwq\]" } }  */
> +/* { dg-final { scan-assembler-times "vpsrl\[dwq\]" 3 } }  */
> +/* { dg-final { scan-assembler-times "vpsll\[dwq\]" 3 } }  */
> +/* { dg-final { scan-assembler-times "vpsra\[dwq\]" 3 } }  */
> +
> +#include 
> +
> +__m512i
> +foo (__m512i a)
> +{
> +  return _mm512_srlv_epi16 (a, _mm512_set1_epi16 (3));
> +}
> +
> +__m512i
> +foo1 (__m512i a)
> +{
> +  return _mm512_srlv_epi32 (a, _mm512_set1_epi32 (3));
> +}
> +
> +__m512i
> +foo2 (__m512i a, long long b)
> +{
> +  return _mm512_srlv_epi64 (a, _mm512_set1_epi64 (3));
> +}
> +
> +__m512i
> +foo3 (__m512i a)
> +{
> +  return _mm512_srav_epi16 (a, _mm512_set1_epi16 (3));
> +}
> +
> +__m512i
> +foo4 (__m512i a)
> +{
> +  return _mm512_srav_epi32 (a, _mm512_set1_epi32 (3));
> +}
> +
> +__m512i
> +foo5 (__m512i a, long long b)
> +{
> +  return _mm512_srav_epi64 (a, _mm512_set1_epi64 (3));
> +}
> +
> +__m512i
> +foo6 (__m512i a)
> +{
> +  return _mm512_sllv_epi16 (a, _mm512_set1_epi16 (3));
> +}
> +
> +__m512i
> +foo7 (__m512i a)
> +{
> +  return _mm512_sllv_epi32 (a, _mm512_set1_epi32 (3));
> +}
> +
> +__m512i
> +foo8 (__m512i a, long long b)
> +{
> +  return _mm512_sllv_epi64 (a, _mm512_set1_epi64 (3));
> +}


Re: [PATCH] middle-end/AArch64 Fix bootstrap after vec changes

2021-08-06 Thread Richard Biener via Gcc-patches
On Fri, 6 Aug 2021, Tamar Christina wrote:

> Hi All,
> 
> The build is broken since a3d3e8c362c2 since it's deleted the ability to pass
> vec<> by value and now much be past by reference.
> 
> However some language hooks used by AArch64 were not updated and breaks the
> build on AArch64.  This patch updates these hooks.
> 
> However most of the changes are generic... so I'm sending to a cross section
> of approvers.
> 
> Bootstrapped aarch64-none-linux-gnu and works again.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/c/ChangeLog:
> 
>   * c-decl.c (c_simulate_enum_decl): Pass vec<> by pointer.
>   * c-tree.h (c_simulate_enum_decl): Likewise.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-sve-builtins.cc (register_svpattern,
>   register_svprfop): Pass vec<> by pointer.
>   * langhooks-def.h (lhd_simulate_enum_decl): Likewise.
>   * langhooks.c (lhd_simulate_enum_decl): Likewise.
>   * langhooks.h (struct lang_hooks_for_types): Likewise.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-objcp-common.h (cxx_simulate_enum_decl): Pass vec<> by pointer.
>   * decl.c (cxx_simulate_enum_decl): Likewise.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 
> 234ee16fe4afe5baf3490596d27662c7acee8126..221a67fe57be105dfb88f5053179adb62c9cc47d
>  100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -9379,7 +9379,7 @@ build_enumerator (location_t decl_loc, location_t loc,
>  
>  tree
>  c_simulate_enum_decl (location_t loc, const char *name,
> -   vec values)
> +   vec *values_ptr)
>  {
>location_t saved_loc = input_location;
>input_location = loc;
> @@ -9389,6 +9389,7 @@ c_simulate_enum_decl (location_t loc, const char *name,
>  
>tree value_chain = NULL_TREE;
>string_int_pair *value;
> +  vec values = *values_ptr;
>unsigned int i;
>FOR_EACH_VEC_ELT (values, i, value)
>  {
> diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
> index 
> ab6db3860f5062d5d73e3ffacf123a992b9c6c48..a8a90eae30d54006e83b20cdcc7aa7a582686c6c
>  100644
> --- a/gcc/c/c-tree.h
> +++ b/gcc/c/c-tree.h
> @@ -595,7 +595,7 @@ extern void finish_function (location_t = input_location);
>  extern tree finish_struct (location_t, tree, tree, tree,
>  class c_struct_parse_info *);
>  extern tree c_simulate_enum_decl (location_t, const char *,
> -   vec);
> +   vec *);
>  extern struct c_arg_info *build_arg_info (void);
>  extern struct c_arg_info *get_parm_info (bool, tree);
>  extern tree grokfield (location_t, struct c_declarator *,
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 
> f44f81f13754b2d7f7391086c846ee2f966d54a7..f71b287570e4c8c00149e864db4bf03941382672
>  100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3499,7 +3499,7 @@ register_svpattern ()
>  #undef PUSH
>  
>acle_svpattern = lang_hooks.types.simulate_enum_decl (input_location,
> - "svpattern", values);
> + "svpattern", &values);
>  }
>  
>  /* Register the svprfop enum.  */
> @@ -3513,7 +3513,7 @@ register_svprfop ()
>  #undef PUSH
>  
>acle_svprfop = lang_hooks.types.simulate_enum_decl (input_location,
> -   "svprfop", values);
> +   "svprfop", &values);
>  }
>  
>  /* Implement #pragma GCC aarch64 "arm_sve.h".  */
> diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
> index 
> 53c6e4c2c8859be51b7bbcc87ce688d35c10f602..f1704aad5578c4132bcee5b3f9799dbacb6fd114
>  100644
> --- a/gcc/cp/cp-objcp-common.h
> +++ b/gcc/cp/cp-objcp-common.h
> @@ -38,7 +38,7 @@ extern bool cp_handle_option (size_t, const char *, 
> HOST_WIDE_INT, int,
> location_t, const struct cl_option_handlers *);
>  extern tree cxx_make_type_hook   (tree_code);
>  extern tree cxx_simulate_enum_decl (location_t, const char *,
> - vec);
> + vec *);
>  
>  /* Lang hooks that are shared between C++ and ObjC++ are defined here.  Hooks
> specific to C++ or ObjC++ go in cp/cp-lang.c and objcp/objcp-lang.c,
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 
> 6fa6b9adc8778ff641f6a1c6ca9083ee74863331..f626f1e65ee644a494eb57fa5bcb7f1eb05ff667
>  100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -16408,7 +16408,7 @@ lookup_enumerator (tree enumtype, tree name)
>  
>  tree
>  cxx_simulate_enum_decl (location_t loc, const char *name,
> - vec values)
> + vec *values)
>  {
>location_t saved_loc = input_location;
>input_location = loc;
> diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
> index 
> 8b744d96fb23af

[Committed] Use CFN_BUILT_IN_CLRSB instead of BUILT_IN_CLRSB in switch.

2021-08-06 Thread Roger Sayle

This patch replaces the use of BUILT_IN_CLRSB with CFN_BUILT_IN_CLRSB
in my recent patch to tree-ssa-phiopt.c.  Both of these have identical
values, so there's no change in behavior, but consistent use of the same
enumeration avoids warnings when using clang (or static analysis tools).

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  Committed as obvious.


2021-08-06  Roger Sayle  

gcc/ChangeLog
* tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern): Use
CFN_BUILT_IN_CLRSB* instead of BUILT_IN_CLRSB* for consistency.

Roger
--

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 66af902..0e339c4 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2583,13 +2583,13 @@ cond_removal_in_builtin_zero_pattern (basic_block 
cond_bb,
}
}
   return false;
-case BUILT_IN_CLRSB:
+case CFN_BUILT_IN_CLRSB:
   val = TYPE_PRECISION (integer_type_node) - 1;
   break;
-case BUILT_IN_CLRSBL:
+case CFN_BUILT_IN_CLRSBL:
   val = TYPE_PRECISION (long_integer_type_node) - 1;
   break;
-case BUILT_IN_CLRSBLL:
+case CFN_BUILT_IN_CLRSBLL:
   val = TYPE_PRECISION (long_long_integer_type_node) - 1;
   break;
 default:


Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-06 Thread Richard Biener via Gcc-patches
On Fri, Aug 6, 2021 at 11:05 AM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Fri, Aug 6, 2021 at 5:32 AM liuhongt  wrote:
> >>
> >> Hi:
> >> ---
> >> OK, I think sth is amiss here upthread.  insv/extv do look like they
> >> are designed
> >> to work on integer modes (but docs do not say anything about this here).
> >> In fact the caller of extract_bit_field_using_extv is named
> >> extract_integral_bit_field.  Of course nothing seems to check what kind of
> >> modes we're dealing with, but we're for example happily doing
> >> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> >> some integer mode and op0 is HFmode?  From the above I get it's
> >> the other way around?  In that case we should wrap the
> >> call to extract_integral_bit_field, extracting in an integer mode with the
> >> same size as 'mode' and then converting the result as (subreg:HF (reg:HI 
> >> ...)).
> >> ---
> >>   This is a separate patch as a follow up of upper comments.
> >>
> >> gcc/ChangeLog:
> >>
> >> * expmed.c (extract_bit_field_1): Wrap the call to
> >> extract_integral_bit_field, extracting in an integer mode with
> >> the same size as 'tmode' and then converting the result
> >> as (subreg:tmode (reg:imode)).
> >>
> >> gcc/testsuite/ChangeLog:
> >> * gcc.target/i386/float16-5.c: New test.
> >> ---
> >>  gcc/expmed.c  | 19 +++
> >>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
> >>  2 files changed, 31 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> >>
> >> diff --git a/gcc/expmed.c b/gcc/expmed.c
> >> index 3143f38e057..72790693ef0 100644
> >> --- a/gcc/expmed.c
> >> +++ b/gcc/expmed.c
> >> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
> >> bitsize, poly_uint64 bitnum,
> >>op0_mode = opt_scalar_int_mode ();
> >>  }
> >>
> >> +  /* Make sure we are playing with integral modes.  Pun with subregs
> >> + if we aren't. When tmode is HFmode, op0 is SImode, there will be ICE
> >> + in extract_integral_bit_field.  */
> >> +  if (int_mode_for_mode (tmode).exists (&imode)
> >
> > check !INTEGRAL_MODE_P (tmode) before, that should be slightly
> > cheaper.  Then imode should always be != tmode.  Maybe
> > even GET_MDOE_CLASS (tmode) != MODE_INT since I'm not sure
> > how it behaves for composite modes.
> >
> > Of course the least surprises would happen when we restrict this
> > to FLOAT_MODE_P (tmode).
> >
> > Richard - any preferences?
>
> If the bug is that extract_integral_bit_field is being called with
> a non-integral mode parameter, then it looks odd that we can still
> fall through to it without an integral mode (when exists is false).
>
> If calling extract_integral_bit_field without an integral mode is
> a bug then I think we should have:
>
>   int_mode_for_mode (mode).require ()
>
> whenever mode is not already SCALAR_INT_MODE_P/is_a.
> Ideally we'd make the mode parameter scalar_int_mode too.
>
> extract_integral_bit_field currently has:
>
>   /* Find a correspondingly-sized integer field, so we can apply
>  shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
> /* If this fails, we should probably push op0 out to memory and then
>do a load.  */
> int_mode = int_mode_for_mode (mode).require ();
>
> which would seem to be redundant after this change.

I'm not sure what exactly the bug is, but extract_integral_bit_field ends
up creating a lowpart subreg that's not allowed and that ICEs (and I
can't see a way to check beforehand).  So it seems to me at least
part of that function doesn't expect non-integral extraction modes.

But who knows - the code is older than I am (OK, not, but older than
my involvment in GCC ;))

Richard.

> >> +  && imode != tmode
> >> +  && imode != GET_MODE (op0))
> >> +{
> >> +  rtx ret = extract_integral_bit_field (op0, op0_mode,
> >> +   bitsize.to_constant (),
> >> +   bitnum.to_constant (), 
> >> unsignedp,
> >> +   NULL, imode, imode,
> >> +   reverse, fallback_p);
> >> +  gcc_assert (ret);
> >> +
> >> +  if (!REG_P (ret))
> >> +   ret = force_reg (imode, ret);
> >> +  return gen_lowpart_SUBREG (tmode, ret);
> >> +}
> >> +
> >>/* It's possible we'll need to handle other cases here for
> >>   polynomial bitnum and bitsize.  */
>
> Minor nit, but since the code is using to_constant, it should go after
> this comment rather than before it.
>
> Thanks,
> Richard
>
> >>
> >> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c 
> >> b/gcc/testsuite/gcc.target/i386/float16-5.c
> >> new file mode 100644
> >> index 000..ebc0af1490b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/i38

Re: [PATCH] Fix loop split incorrect count and probability

2021-08-06 Thread Richard Biener via Gcc-patches
On Tue, 3 Aug 2021, Xionghu Luo wrote:

> loop split condition is moved between loop1 and loop2, the split bb's
> count and probability should also be duplicated instead of (100% vs INV),
> secondly, the original loop1 and loop2 count need be propotional from the
> original loop.
> 
> Regression tested pass, OK for master?
> 
> diff base/loop-cond-split-1.c.151t.lsplit  
> patched/loop-cond-split-1.c.151t.lsplit:
> ...
>int prephitmp_16;
>int prephitmp_25;
> 
> [local count: 118111600]:
>if (n_7(D) > 0)
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
> 
> [local count: 118111600]:
>return;
> 
> [local count: 105119324]:
>pretmp_3 = ga;
> 
> -   [local count: 955630225]:
> +   [local count: 315357973]:
># i_13 = PHI 
># prephitmp_12 = PHI 
>if (prephitmp_12 != 0)
>  goto ; [33.00%]
>else
>  goto ; [67.00%]
> 
> -   [local count: 315357972]:
> +   [local count: 104068130]:
>_2 = do_something ();
>ga = _2;
> 
> -   [local count: 955630225]:
> +   [local count: 315357973]:
># prephitmp_5 = PHI 
>i_10 = inc (i_13);
>if (n_7(D) > i_10)
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
> 
> [local count: 105119324]:
>goto ; [100.00%]
> 
> -   [local count: 850510901]:
> +   [local count: 280668596]:
>if (prephitmp_12 != 0)
> -goto ; [100.00%]
> +goto ; [33.00%]
>else
> -goto ; [INV]
> +goto ; [67.00%]
> 
> -   [local count: 850510901]:
> +   [local count: 280668596]:
>goto ; [100.00%]
> 
> -   [count: 0]:
> +   [local count: 70429947]:
># i_23 = PHI 
># prephitmp_25 = PHI 
> 
> -   [local count: 955630225]:
> +   [local count: 640272252]:
># i_15 = PHI 
># prephitmp_16 = PHI 
>i_22 = inc (i_15);
>if (n_7(D) > i_22)
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
> 
> -   [local count: 850510901]:
> +   [local count: 569842305]:
>goto ; [100.00%]
> 
>  }
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>   (do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3a09bbc39e5..8e5a7ded0f7 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>   basic_block cond_bb;
>  
>   class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -profile_probability::always (),
> -profile_probability::always (),
> -profile_probability::always (),
> -profile_probability::always (),
> +true_edge->probability,
> +true_edge->probability.invert (),
> +true_edge->probability,
> +true_edge->probability.invert (),
>  true);

there is no 'true_edge' variable at this point.

>   gcc_assert (loop2);
>  
> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> invar_branch)
>initialize_original_copy_tables ();
>  
>struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -  profile_probability::always (),
> -  profile_probability::never (),
> -  profile_probability::always (),
> -  profile_probability::always (),
> +  invar_branch->probability.invert (),
> +  invar_branch->probability,
> +  invar_branch->probability.invert (),
> +  invar_branch->probability,
>true);
>if (!loop2)
>  {

The patch introduction seems to talk about do_split_loop_on_cond only.
Since loop versioning inserts a condition with the passed probabilities
but in this case a 'boolean_true_node' condition the then and else
probabilities passed look correct.  It's just the scaling arguments
that look wrong?  This loop_version call should get a comment as to
why we are passing probabilities the way we do.

It does seem that scaling the loop by the invar_branch probability
is correct.  Since this does similar things to unswitching, I see
that unswitching does

prob_true = edge_true->probability;
loop_version (loop, unshare_expr (cond),
   NULL, prob_true,
   prob_true.invert (),
   prob_true, prob_true.invert (),
   false);

which maybe suggests that your invar_branch based passing should
depend on 'true_invar'?

Also compared to unswitching the first lo

Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 11:51 AM Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:

> On Fri, 6 Aug 2021 at 14:49, Christophe Lyon
>  wrote:
> >
> >
> >
> > On Fri, Aug 6, 2021 at 11:00 AM Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org> wrote:
> >>
> >> On Thu, 5 Aug 2021 at 18:05, Christophe Lyon
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org> wrote:
> >> >>
> >> >> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
> >> >>  wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org> wrote:
> >> >> >>
> >> >> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via
> Gcc-patches  wrote:
> >> >> >> >>
> >> >> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov <
> kyrylo.tkac...@arm.com> wrote:
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > > -Original Message-
> >> >> >> >> > > From: Prathamesh Kulkarni 
> >> >> >> >> > > Sent: 06 July 2021 08:06
> >> >> >> >> > > To: Christophe LYON 
> >> >> >> >> > > Cc: Kyrylo Tkachov ; gcc Patches
>  >> >> >> >> > > patc...@gcc.gnu.org>
> >> >> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in
> expanding vector
> >> >> >> >> > > constructor
> >> >> >> >> > >
> >> >> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> >> >> >> >> > >  wrote:
> >> >> >> >> > > >
> >> >> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> >> >> >> >> > > >  wrote:
> >> >> >> >> > > > >
> >> >> >> >> > > > >
> >> >> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> >> >> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> >> >> >> >> > > > > >  wrote:
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches
> wrote:
> >> >> >> >> > > > >  -Original Message-
> >> >> >> >> > > > >  From: Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org>
> >> >> >> >> > > > >  Sent: 28 June 2021 09:38
> >> >> >> >> > > > >  To: Kyrylo Tkachov 
> >> >> >> >> > > > >  Cc: Christophe Lyon ;
> gcc Patches
> >> >> >> >> > >  >> >> >> >> > > > >  patc...@gcc.gnu.org>
> >> >> >> >> > > > >  Subject: Re: [ARM] PR98435: Missed optimization
> in expanding
> >> >> >> >> > > vector
> >> >> >> >> > > > >  constructor
> >> >> >> >> > > > > 
> >> >> >> >> > > > >  On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
> >> >> >> >> > > 
> >> >> >> >> > > > >  wrote:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > >> -Original Message-
> >> >> >> >> > > > > >> From: Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org>
> >> >> >> >> > > > > >> Sent: 14 June 2021 09:02
> >> >> >> >> > > > > >> To: Christophe Lyon  >
> >> >> >> >> > > > > >> Cc: gcc Patches ;
> Kyrylo Tkachov
> >> >> >> >> > > > > >> 
> >> >> >> >> > > > > >> Subject: Re: [ARM] PR98435: Missed optimization
> in expanding
> >> >> >> >> > > vector
> >> >> >> >> > > > > >> constructor
> >> >> >> >> > > > > >>
> >> >> >> >> > > > > >> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >> >> >> >> > > > > >>  wrote:
> >> >> >> >> > > > > >>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> >> >> >> >> > > > >  
> >> >> >> >> > > > > >> wrote:
> >> >> >> >> > > > >  On Fri, 4 Jun 2021 at 09:27, Prathamesh
> Kulkarni via Gcc-
> >> >> >> >> > > patches
> >> >> >> >> > > > >   wrote:
> >> >> >> >> > > > > > Hi,
> >> >> >> >> > > > > > As mentioned in PR, for the following
> test-case:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > #include 
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > bfloat16x4_t f1 (bfloat16_t a)
> >> >> >> >> > > > > > {
> >> >> >> >> > > > > > return vdup_n_bf16 (a);
> >> >> >> >> > > > > > }
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > bfloat16x4_t f2 (bfloat16_t a)
> >> >> >> >> > > > > > {
> >> >> >> >> > > > > > return (bfloat16x4_t) {a, a, a, a};
> >> >> >> >> > > > > > }
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > Compiling with arm-linux-gnueabi -O3
> -mfpu=neon -mfloat-
> >> >> >> >> > > > >  abi=softfp
> >> >> >> >> > > > > > -march=armv8.2-a+bf16+fp16 results in f2 not
> being
> >> >> >> >> > > vectorized:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > f1:
> >> >> >> >> > > > > >   vdup.16 d16, r0
> >> >> >> >> > > > > >   vmovr0, r1, d16  @ v4bf
> >> >> >> >> > > > > >   bx  lr
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > f2:
> >> >> >> >> > > > > >   mov r3, r0  @ __bf16
> >> >> >> >> > > > > >   adr r1, .L4
> >> >> >> >> 

Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

2021-08-06 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply.

Robin Dapp via Gcc-patches  writes:
>> Hmm, OK.  Doesn't expanding both versions up-front create the same kind of
>> problem that the patch is fixing, in that we expand (and therefore cost)
>> both the reversed and unreversed comparison?  Also…
>> 
> [..]
>> 
>> …for min/max, I would have expected this swap to create the canonical
>> operand order for the min and max too.  What causes it to be rejected?
>> 
>
> We should not be expanding two comparisons but only emit (and cost) the 
> reversed comparison if expanding the non-reversed one failed.

The (potential) problem is that prepare_cmp_insn can itself emit
instructions.  With the current code we rewind any prepare_cmp_insn
that isn't needed, whereas with the new code we might keep both.

This also means that prepare_cmp_insn calls need to stay inside the:

  saved_pending_stack_adjust save;
  save_pending_stack_adjust (&save);
  last = get_last_insn ();
  do_pending_stack_adjust ();

  …

  delete_insns_since (last);
  restore_pending_stack_adjust (&save);

block.

> Regarding the reversal, I checked again - the commit introducing the 
> op2/op3 swap is g:deed3da9af697ecf073aea855ecce2d22d85ef71, the 
> corresponding test case is gcc.target/i386/pr70465-2.c.  It inlines one 
> long double ternary operation into another, probably causing  not for 
> multiple sets, mind you.  The situation doesn't occur with double.

OK, so going back to that revision and using the original SciMark test
case, we first try:

  (lt (reg/v:DF 272 [ ab ])
  (reg/v:DF 271 [ t ]))
  (reg/v:SI 227 [ jp ])
  (subreg:SI (reg:DI 346 [ ivtmp.59 ]) 0)

but i386 doesn't provide a native cbranchdf4 for lt and so the
prepare_cmp_insn fails.  Interesting that we use cbranch4
as the test for what conditional moves should accept, but I guess
that isn't something to change now.

So the key piece of information that I didn't realise before is
that it was the prepare_cmp_insn that failed, not the movcc
expander.  I think we can accomodate that in the new scheme
by doing:

  if (rev_comparison && COMPARISON_P (rev_comparison))
prepare_cmp_insn (XEXP (rev_comparison, 0), XEXP (rev_comparison, 1),
  GET_CODE (rev_comparison), NULL_RTX,
  unsignedp, OPTAB_WIDEN, &rev_comparison, &cmode);

first and then making:

  if (comparison && COMPARISON_P (comparison))
prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
  GET_CODE (comparison), NULL_RTX,
  unsignedp, OPTAB_WIDEN, &comparison, &cmode);

conditional on !rev_comparison.  But maybe the above makes
that moot.

>>> +
>>> +  rtx rev_comparison = NULL_RTX;
>>> bool swapped = false;
>>> -  if (swap_commutative_operands_p (op2, op3)
>>> -  && ((reversed = reversed_comparison_code_parts (code, op0, op1, 
>>> NULL))
>>> -  != UNKNOWN))
>>> +
>>> +  code = unsignedp ? unsigned_condition (code) : code;
>>> +  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
>>> +
>>> +  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
>>> +  != UNKNOWN)
>>>   {
>>> -  std::swap (op2, op3);
>>> -  code = reversed;
>>> -  swapped = true;
>>> +  reversed = unsignedp ? unsigned_condition (reversed) : reversed;
>> 
>> When is this needed?  I'd have expected the reversed from of an unsigned
>> code to be naturally unsigned.
>
> This was also introduced by the commit above, probably just repeating 
> what was done for the non-reversed comparison.

Yeah, but in the original code, the first reverse_comparison_code_parts
happens outside the loop, before the first unsigned_condition (which
happens inside the loop).  In the new code, the unsigned_condition
happens first, before we try reversing it.

IMO the new order makes more sense than the old one.  But it means that
reversed_comparison_code_parts always sees a comparison of the right
signedness, so we shouldn't need to adjust the result.

Thanks,
Richard


Re: [RFC] Don't move cold code out of loop by checking bb count

2021-08-06 Thread Richard Biener via Gcc-patches
On Mon, Aug 2, 2021 at 7:05 AM Xiong Hu Luo  wrote:
>
> There was a patch trying to avoid move cold block out of loop:
>
> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>
> Richard suggested to "never hoist anything from a bb with lower execution
> frequency to a bb with higher one in LIM invariantness_dom_walker
> before_dom_children".
>
> This patch does this profile count check in both gimple LIM
> move_computations_worker and RTL loop-invariant.c find_invariants_bb,
> if the loop bb is colder than loop preheader, don't hoist it out of
> loop.
>
> Also, the profile count in loop split pass should be corrected to avoid
> lim2 and lim4 mismatch behavior, currently, the new loop preheader generated
> by loop_version is set to "[count: 0]:", then lim4 after lsplt pass will
> move statement out of loop unexpectely when lim2 didn't move it.  This
> change could fix regression on 544.nab_r from -1.55% to +0.46%.
>
> SPEC2017 performance evaluation shows 1% performance improvement for
> intrate GEOMEAN and no obvious regression for others.  Especially,
> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
> on P8LE.
>
> Regression and bootstrap tested pass on P8LE, any comments?  Thanks.

While I'm not familiar with the RTL invariant motion pass the patch there
looks reasonable.  Note that we should assess the profile quality
somehow - I'm not sure how to do that, CCed Honza for that.

For the GIMPLE part the patch looks quite complicated - but note it
probably has to be since LIM performs kind of a "CSE" on loads
(and stores for store-motion), so when there are multiple stmts
affected by a hoisting decision the biggest block count has to be
accounted.  Likewise when there are dependent stmts involved
that might include conditional stmts (a "PHI"), but the overall
cost should be looked at.

Now - GIMPLE LIM "costing" is somewhat backward right now
and it isn't set up to consider those multiple involved stmts.  Plus
the store-motion part does not have any cost part (but it depends
on previously decided invariant motions).

I think the way you implemented the check will cause no hoisting
to be performed instead of, say, hoisting to a different loop level
only.  Possibly shown when you consider a loop nest like

  for (;;)
if (unlikely_cond)
  for (;;)
 invariant;

we want to hoist 'invariant' but only from the inner loop even if it
is invariant also in the outer loop.  But for example if there is
a store motion opportunity like

  for (;;)
 {
if (unlikely_cond)
  for (;;)
a = ...;
a = ...;
 }

we'd still want to perform the store motion on the outer loop.

Note that store-motion already performs part of the transform
before dependent code is moved in move_computations (that
you patched).

IIRC your main concern were the COND_EXPRs we insert
for hoisted conditional stmts?

Thanks,
Richard.

> gcc/ChangeLog:
>
> * loop-invariant.c (find_invariants_bb): Check profile count
> before motion.
> (find_invariants_body): Add argument.
> * tree-ssa-loop-im.c (move_computations_worker): Check profile
> count before motion.
> (execute_sm): Likewise.
> (execute_sm_exit): Check pointer validness.
> * tree-ssa-loop-split.c (split_loop): Correct probability.
> (do_split_loop_on_cond): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> ---
>  gcc/loop-invariant.c|  10 +-
>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c |   2 +-
>  gcc/tree-ssa-loop-im.c  | 164 +++-
>  gcc/tree-ssa-loop-split.c   |  14 +-
>  4 files changed, 177 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index bdc7b59dd5f..7b5d64d11f9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
> always_reached, bool always_executed)
> call.  */
>
>  static void
> -find_invariants_bb (basic_block bb, bool always_reached, bool 
> always_executed)
> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> +   bool always_executed)
>  {
>rtx_insn *insn;
> +  basic_block preheader = loop_preheader_edge (loop)->src;
> +
> +  if (preheader->count > bb->count)
> +return;
>
>FOR_BB_INSNS (bb, insn)
>  {
> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block 
> *body,
>unsigned i;
>
>for (i = 0; i < loop->num_nodes; i++)
> -find_invariants_bb (body[i],
> -   bitmap_bit_p (always_reached, i),
> +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
> bitmap_bit_p (always_executed, i));
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c 
> b/gcc/testsuite/gcc.dg/tr

Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 9:52 AM Christophe Lyon <
christophe.lyon@gmail.com> wrote:

>
>
> On Fri, Aug 6, 2021 at 4:07 AM Martin Sebor via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
>> On 7/30/21 9:06 AM, Jason Merrill wrote:
>> > On 7/27/21 2:56 PM, Martin Sebor wrote:
>> >> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575690.html
>> >>
>> >> Are there any other suggestions or comments or is the latest revision
>> >> okay to commit?
>> >
>> > OK.
>>
>> I had to make a few more adjustments to fix up code that's snuck
>> in since I last tested the patch.  I committed r12-2776 after
>> retesting on x86_64-linux.
>>
>> With the cleanup out of the way I'll resubmit the copy ctor patch
>> next.
>>
>>
> Hi Martin,
>
> Your patch breaks the aarch64 build:
>  
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:
> In function 'void aarch64_sve::register_svpattern()':
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:3502:27:
> error: use of deleted function 'vec::vec(auto_vec&) [with long
> unsigned int N = 32ul;
> T = std::pair]'
> "svpattern", values);
>^
> In file included from
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/hash-table.h:248:0,
>  from
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:480,
>  from
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:24:
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vec.h:1455:3:
> error: declared here
>vec (auto_vec &) = delete;
>^
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:
> In function 'void aarch64_sve::register_svprfop()':
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:3516:30:
> error: use of deleted function 'vec::vec(auto_vec&) [with long
> unsigned int N = 16ul;
> T = std::pair]'
>  "svprfop", values);
>   ^
> In file included from
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/hash-table.h:248:0,
>  from
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:480,
>  from
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64-sve-builtins.cc:24:
> /tmp/1784440_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/vec.h:1455:3:
> error: declared here
>vec (auto_vec &) = delete;
>^
>
> Can you check?
>
> Thanks,
>

This has now been fixed by Tamar, thanks!

Christophe


>
> Christophe
>
>>
>> Martin
>>
>> >
>> >> On 7/20/21 12:34 PM, Martin Sebor wrote:
>> >>> On 7/14/21 10:23 AM, Jason Merrill wrote:
>>  On 7/14/21 10:46 AM, Martin Sebor wrote:
>> > On 7/13/21 9:39 PM, Jason Merrill wrote:
>> >> On 7/13/21 4:02 PM, Martin Sebor wrote:
>> >>> On 7/13/21 12:37 PM, Jason Merrill wrote:
>>  On 7/13/21 10:08 AM, Jonathan Wakely wrote:
>> > On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:
>> >> Somebody with more C++ knowledge than me needs to approve the
>> >> vec.h changes - I don't feel competent to assess all effects
>> >> of the change.
>> >
>> > They look OK to me except for:
>> >
>> > -extern vnull vNULL;
>> > +static constexpr vnull vNULL{ };
>> >
>> > Making vNULL have static linkage can make it an ODR violation
>> > to use
>> > vNULL in templates and inline functions, because different
>> > instantiations will refer to a different "vNULL" in each
>> > translation
>> > unit.
>> 
>>  The ODR says this is OK because it's a literal constant with the
>>  same value (6.2/12.2.1).
>> 
>>  But it would be better without the explicit 'static'; then in
>>  C++17 it's implicitly inline instead of static.
>> >>>
>> >>> I'll remove the static.
>> >>>
>> 
>>  But then, do we really want to keep vNULL at all?  It's a weird
>>  blurring of the object/pointer boundary that is also dependent
>>  on vec being a thin wrapper around a pointer.  In almost all
>>  cases it can be replaced with {}; one exception is ==
>>  comparison, where it seems to be testing that the embedded
>>  pointer is null, which is a weird thing to want to test.
>> >>>
>> >>> The one use case I know of for vNULL where I can't think of
>> >>> an equally good substitute is in passing a vec as an argument by
>> >>> value.  The only way to do that that I can think of is to name
>> >>> the full vec type (i.e., the specialization) which is more typing
>> >>> and less generic than vNULL.  I don't use vNULL myself so I
>> wouldn't
>> >>> miss this trick if it

Re: [PATCH] testsuite: aarch64: Fix failing vector structure tests on big-endian

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c 
> b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> index 
> 60c53bc27f8378c78b119576ed19fde0e5743894..a8e31ab85d6fd2a045c8efaf2cbc42b5f40d2411
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> @@ -197,7 +197,8 @@ TEST_ST1x3 (vst1q, uint64x2x3_t, uint64_t*, u64, x3);
>  TEST_ST1x3 (vst1q, poly64x2x3_t, poly64_t*, p64, x3);
>  TEST_ST1x3 (vst1q, float64x2x3_t, float64_t*, f64, x3);
>  
> -/* { dg-final { scan-assembler-not "mov\\t" } } */
> +/* { dg-final { scan-assembler-not {"mov\\t"} {
> + target { aarch64_little_endian } } ) }  */

I think this needs to stay on line.  We should also either keep the
original quoting on the regexp or use {mov\t}.  Having both forms
of quote would turn it into a test for the characters:

   "mov\t"

(including quotes and backslash).

Thanks,
Richard


>  
>  /* { dg-final { scan-assembler-times "tbl\\t" 18} }  */
>  /* { dg-final { scan-assembler-times "tbx\\t" 18} }  */



[PATCH] i386: Fix conditional move reg-to-reg move elimination peepholes [PR101797]

2021-08-06 Thread Uros Bizjak via Gcc-patches
Add missing operand predicate, otherwise any RTX will match.

2021-08-06  Uroš Bizjak  

gcc/
PR target/101797
* config/i386/i386.md (cmove reg-to-reg move elimination peephole2s):
Add general_gr_operand predicate to operand 3.

gcc/testsuite/
PR target/101797
* gcc.target/i386/pr101797.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 51e8b475bca..bc1c30b77f4 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19428,7 +19428,7 @@
   (parallel [(set (reg FLAGS_REG) (match_operand 5))
 (set (match_dup 0) (match_operand:SWI248 6))])
   (set (match_operand:SWI248 2 "general_reg_operand")
-   (match_operand:SWI248 3))
+   (match_operand:SWI248 3 "general_gr_operand"))
   (set (match_dup 0)
(if_then_else:SWI248 (match_operator 4 "ix86_comparison_operator"
 [(reg FLAGS_REG) (const_int 0)])
@@ -19456,7 +19456,7 @@
 ;; mov r2,r3; mov r0,r1; dec r0; cmov r0,r2 -> dec r1; mov r0,r3; cmov r0, r1
 (define_peephole2
  [(set (match_operand:SWI248 2 "general_reg_operand")
-   (match_operand:SWI248 3))
+   (match_operand:SWI248 3 "general_gr_operand"))
   (set (match_operand:SWI248 0 "general_reg_operand")
(match_operand:SWI248 1 "general_reg_operand"))
   (parallel [(set (reg FLAGS_REG) (match_operand 5))
diff --git a/gcc/testsuite/gcc.target/i386/pr101797.c 
b/gcc/testsuite/gcc.target/i386/pr101797.c
new file mode 100644
index 000..d5cc34e54a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101797.c
@@ -0,0 +1,15 @@
+/* PR target/101797 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int a;
+int main() {
+  int b, c, d, e = 0;
+  if (a) {
+c += a;
+e = ~(a % c);
+e || c || (b & d);
+  }
+  a = e;
+  return 0;
+}


Re: [PATCH] [i386] Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

2021-08-06 Thread H.J. Lu via Gcc-patches
cast_from_integer_constant (machine_mode 
> > > > mode, rtx op)
> > > >
> > > >/* Don't use integer vector broadcast if we can't move from GPR to 
> > > > SSE
> > > >   register directly.  */
> > > > -  if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> > > > +  if (!TARGET_INTER_UNIT_MOVES_TO_VEC
> > > > +  && INTEGRAL_MODE_P (mode))
> > > >  return nullptr;
> > > >
> > > >/* Convert CONST_VECTOR to a non-standard SSE constant integer
> > > > @@ -470,12 +473,17 @@ ix86_broadcast_from_integer_constant 
> > > > (machine_mode mode, rtx op)
> > > >if (!(TARGET_AVX2
> > > > || (TARGET_AVX
> > > > && (GET_MODE_INNER (mode) == SImode
> > > > -   || GET_MODE_INNER (mode) == DImode)))
> > > > +   || GET_MODE_INNER (mode) == DImode))
> > > > +   || FLOAT_MODE_P (mode))
> > > >|| standard_sse_constant_p (op, mode))
> > > >  return nullptr;
> > > >
> > > > -  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.  */
> > > > -  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT)
> > > > +  /* Don't broadcast from a 64-bit integer constant in 32-bit mode.
> > > > + We can still put 64-bit integer constant in memory when
> > > > + avx512 embed broadcast is available.  */
> > > > +  if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT
> > > > +  && (!TARGET_AVX512F
> > > > + || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL)))
> > > >  return nullptr;
> > > >
> > > >if (GET_MODE_INNER (mode) == TImode)
> > > > @@ -561,17 +569,20 @@ ix86_expand_vector_move (machine_mode mode, rtx 
> > > > operands[])
> > > >
> > > >if (can_create_pseudo_p ()
> > > >&& GET_MODE_SIZE (mode) >= 16
> > > > -  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > > > +  && VECTOR_MODE_P (mode)
> > > >&& (MEM_P (op1)
> > > >   && SYMBOL_REF_P (XEXP (op1, 0))
> > > >   && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0
> > > >  {
> > > > -  rtx first = ix86_broadcast_from_integer_constant (mode, op1);
> > > > +  rtx first = ix86_broadcast_from_constant (mode, op1);
> > > >if (first != nullptr)
> > > > {
> > > >   /* Broadcast to XMM/YMM/ZMM register from an integer
> > > > -constant.  */
> > > > - op1 = ix86_gen_scratch_sse_rtx (mode);
> > > > +constant or scalar mem.  */
> > > > + op1 = gen_reg_rtx (mode);
> > > I've changed ix86_gen_scratch_sse_rtx to gen_reg_rtx to let LRA/reload
> > > make the final decision for register usage, would that make sense?
> >
> > Hard registers are used for 2 purposes:
> >
> > 1.  Prevent stack realignment when the original code doesn't use vector
> > registers, which is the same for memcpy and memset.
> > 2.  Prevent combine to convert constant broadcast to load from constant
> > pool.
> >
> W/ gcc version 12.0.0 20210806 (experimental) (GCC) and replaced
> ix86_gen_scratch_sse_rtx with gen_reg_rtx i didn't observe the failure
> related to upper cases.

Did you run full GCC testsuite with

RUNTESTFLAGS="--target_board='unix{-m32,}'"

My partial results show

FAIL: gcc.target/i386/pieces-memset-3.c scan-assembler-not and[^\n\r]*%[re]sp
FAIL: gcc.target/i386/pieces-memset-37.c scan-assembler-not and[^\n\r]*%[re]sp
FAIL: gcc.target/i386/pieces-memset-37.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-39.c scan-assembler-not and[^\n\r]*%[re]sp
FAIL: gcc.target/i386/pieces-memset-39.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-3.c scan-assembler-not and[^\n\r]*%[re]sp
FAIL: gcc.target/i386/pieces-memset-3.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-37.c scan-assembler-not and[^\n\r]*%[re]sp
FAIL: gcc.target/i386/pieces-memset-37.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pieces-memset-39.c scan-assembler-not and[^\n\r]*%[re]sp
FAIL: gcc.target/i386/pieces-memset-39.c scan-assembler-not %[re]bp
FAIL: gcc.target/i386/pr90773-14.c scan-assembler-times movd[\\t
]+%xmm[0-9]+, 16\\(%[^,]+\\) 1
FAIL: gcc.target/i386/pr90773-17.c scan-assembler-times vmovd[\\t
]+%xmm[0-9]+, 15\\(%

Re: [PATCH, AArch64] PR target/101609 - Use the correct iterator for AArch64 vector right shift pattern.

2021-08-06 Thread Richard Sandiford via Gcc-patches
Tejas Belagod via Gcc-patches  writes:
> Hi,
>
> Loops containing long long shifts fail to vectorize due to the vectorizer
> not being able to recognize long long right shifts. This is due to a bug
> in the iterator used for the vashr and vlshr patterns in aarch64-simd.md.
>
> Tested and bootstrapped on aarch64-linux. OK?
>
> 2021-08-05  Tejas Belagod  
>
> gcc/ChangeLog:
>
> PR target/101609
> * config/aarch64/aarch64-simd.md (vlshr3, vashr3): Use
>   the right iterator.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vect-shr-reg.c: New testcase.
> * gcc.target/aarch64/vect-shr-reg-run.c: Likewise.

OK, thanks.  Nice how we're still finding these little easter eggs
from the dawn of the port. :-)

Richard

>
>
> Thanks,
> Tejas Belagod.
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> c5638d096fa84a27b4ea397f62cd0d05a28e7c8c..48eddf64e05afe3788abfa05141f6544a9323ea1
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1299,13 +1299,10 @@ (define_expand "vashl3"
>DONE;
>  })
>  
> -;; Using mode VDQ_BHSI as there is no V2DImode neg!
> -;; Negating individual lanes most certainly offsets the
> -;; gain from vectorization.
>  (define_expand "vashr3"
> - [(match_operand:VDQ_BHSI 0 "register_operand")
> -  (match_operand:VDQ_BHSI 1 "register_operand")
> -  (match_operand:VDQ_BHSI 2 "register_operand")]
> + [(match_operand:VDQ_I 0 "register_operand")
> +  (match_operand:VDQ_I 1 "register_operand")
> +  (match_operand:VDQ_I 2 "register_operand")]
>   "TARGET_SIMD"
>  {
>rtx neg = gen_reg_rtx (mode);
> @@ -1333,9 +1330,9 @@ (define_expand "aarch64_ashr_simddi"
>  )
>  
>  (define_expand "vlshr3"
> - [(match_operand:VDQ_BHSI 0 "register_operand")
> -  (match_operand:VDQ_BHSI 1 "register_operand")
> -  (match_operand:VDQ_BHSI 2 "register_operand")]
> + [(match_operand:VDQ_I 0 "register_operand")
> +  (match_operand:VDQ_I 1 "register_operand")
> +  (match_operand:VDQ_I 2 "register_operand")]
>   "TARGET_SIMD"
>  {
>rtx neg = gen_reg_rtx (mode);
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c
> new file mode 100644
> index 
> ..3190448e0936b9d5265f538304f9d20f13927339
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -march=armv8.2-a" } */
> +
> +#include "vect-shr-reg.c"
> +
> +int
> +main(void)
> +{
> +  int64_t a[16];
> +  int64_t b[16];
> +  int64_t c[17];
> +
> +  uint64_t ua[16];
> +  uint64_t ub[16];
> +  uint64_t uc[17];
> +
> +  int64_t res_a[16];
> +  uint64_t res_ua[16];
> +
> +  int i;
> +
> +  /* Set up inputs.  */
> +  for (i = 0; i < 16; i++)
> +{
> +  b[i] = -2;
> +  c[i] = 34;
> +  ub[i] = 0x;
> +  uc[i] = 52;
> +}
> +
> +  /* Set up reference values.  */
> +  for (i = 0; i < 16; i++)
> +{
> +  res_a[i] = -1LL;
> +  res_ua[i] = 0x0fffLL;
> +}
> +
> +  /* Do the shifts.  */
> +  f (ua, ub, uc);
> +  g (a, b, c);
> +
> +  /* Compare outputs against reference values.  */
> +  for (i = 0; i < 16; i++)
> +{
> +  if (a[i] != res_a[i])
> + __builtin_abort ();
> +
> +  if (ua[i] != res_ua[i])
> + __builtin_abort ();
> +}
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c
> new file mode 100644
> index 
> ..5736dafb5a19957032e7b4bc1e90b218f52788fb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a" } */
> +
> +#include 
> +#include 
> +
> +#pragma GCC target "+nosve"
> +
> +int __attribute__((noinline))
> +f(uint64_t *__restrict a, uint64_t *__restrict b, uint64_t *__restrict c)
> +{
> +  int i;
> +
> +  for (i = 0; i < 16; i++)
> +a[i] = b[i] >> c[i];
> +}
> +
> +
> +int __attribute__((noinline))
> +g(int64_t *__restrict a, int64_t *__restrict b, int64_t *__restrict c)
> +{
> +  int i;
> +
> +  for (i = 0; i < 16; i++)
> +a[i] = b[i] >> c[i];
> +}
> +
> +/* { dg-final { scan-assembler "neg\\tv" } } */
> +/* { dg-final { scan-assembler "ushl\\tv" } } */
> +/* { dg-final { scan-assembler "sshl\\tv" } } */


Re: [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si

2021-08-06 Thread Bill Schmidt via Gcc-patches

Hi Kewen,

On 8/4/21 9:06 PM, Kewen.Lin wrote:

Hi,

The existing vec_unpacku_{hi,lo} supports emulated unsigned
unpacking for short and char but misses the support for int.
This patch adds the support for vec_unpacku_{hi,lo}_v4si.

Meanwhile, the current implementation uses vector permutation
way, which requires one extra customized constant vector as
the permutation control vector.  It's better to use vector
merge high/low with zero constant vector, to save the space
in constant area as well as the cost to initialize pcv in
prologue.  This patch updates it with vector merging and
simplify it with iterators.

Bootstrapped & regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8.

btw, the loop in unpack-vectorize-2.c doesn't get vectorized
without this patch, unpack-vectorize-[13]* is to verify
the vector merging and simplification works expectedly.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
(vec_unpacku_hi_v8hi): Likewise.
(vec_unpacku_lo_v16qi): Likewise.
(vec_unpacku_lo_v8hi): Likewise.
(vec_unpacku_hi_): New define_expand.
(vec_unpacku_lo_): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/unpack-vectorize-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-1.h: New test.
* gcc.target/powerpc/unpack-vectorize-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-2.h: New test.
* gcc.target/powerpc/unpack-vectorize-3.c: New test.
* gcc.target/powerpc/unpack-vectorize-3.h: New test.
* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
* gcc.target/powerpc/unpack-vectorize.h: New test.
---
 gcc/config/rs6000/altivec.md  | 158 --
 .../gcc.target/powerpc/unpack-vectorize-1.c   |  18 ++
 .../gcc.target/powerpc/unpack-vectorize-1.h   |  14 ++
 .../gcc.target/powerpc/unpack-vectorize-2.c   |  12 ++
 .../gcc.target/powerpc/unpack-vectorize-2.h   |   7 +
 .../gcc.target/powerpc/unpack-vectorize-3.c   |  11 ++
 .../gcc.target/powerpc/unpack-vectorize-3.h   |   7 +
 .../powerpc/unpack-vectorize-run-1.c  |  24 +++
 .../powerpc/unpack-vectorize-run-2.c  |  16 ++
 .../powerpc/unpack-vectorize-run-3.c  |  16 ++
 .../gcc.target/powerpc/unpack-vectorize.h |  42 +
 11 files changed, 196 insertions(+), 129 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c

 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d70c17e6bc2..0e8b66cd6a5 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -134,10 +134,8 @@ (define_c_enum "unspec"
    UNSPEC_VMULWLUH
    UNSPEC_VMULWHSH
    UNSPEC_VMULWLSH
-   UNSPEC_VUPKHUB
-   UNSPEC_VUPKHUH
-   UNSPEC_VUPKLUB
-   UNSPEC_VUPKLUH
+   UNSPEC_VUPKHUBHW
+   UNSPEC_VUPKLUBHW



Up to you, but... maybe just UNSPEC_VUPKHU and UNSPEC_VUPKLU, in case we 
extend this later to other types.  Fine either way.



    UNSPEC_VPERMSI
    UNSPEC_VPERMHI
    UNSPEC_INTERHI
@@ -3885,143 +3883,45 @@ (define_insn "xxeval"
    [(set_attr "type" "vecsimple")
 (set_attr "prefixed" "yes")])

-(define_expand "vec_unpacku_hi_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-    (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
- UNSPEC_VUPKHUB))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V8HImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
-
-  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT 

Re: [committed] libstdc++: Move attributes that follow requires-clauses [PR101782]

2021-08-06 Thread Jonathan Wakely via Gcc-patches

On 05/08/21 19:02 +0100, Jonathan Wakely wrote:

On 05/08/21 15:40 +0100, Jonathan Wakely wrote:

On 05/08/21 15:19 +0100, Jonathan Wakely wrote:

On 04/08/21 12:55 +0100, Jonathan Wakely wrote:

This adds [[nodiscard]] throughout , as proposed by P2377R0
(with some minor corrections).

The attribute is added for all modes from C++11 up, using
[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
be used directly.


This change causes errors when -fconcepts-ts is used. Fixed like so.

Tested powerpc64le-linux, committed to trunk.




commit 7b1de3eb9ed3f8dde54732d88520292c5ad1157d
Author: Jonathan Wakely 
Date:   Thu Aug 5 13:34:00 2021

 libstdc++: Move attributes that follow requires-clauses [PR101782]
 As explained in the PR, the grammar in the Concepts TS means that a [
 token following a requires-clause is parsed as part of the
 logical-or-expression rather than the start of an attribute. That makes
 the following ill-formed when using -fconcepts-ts:
   template requires foo [[nodiscard]] int f(T);
 This change moves all attributes that follow a requires-clause to the
 end of the function declarator.



Except that as Jakub pointed out, putting it there doesn't work.

It needs to be:

template requires foo int f [[nodiscard]] (T);

At least the testsuite isn't failing now, but the attributes I moved
have no effect. I'll fix it ... some time.


This should be correct now.


Except I needed to move the ones I added to  as well.

Tested powerpc64le-linux, pushed to trunk.


commit c2a984a3570b908a44a35e43bb48f0a05196156a
Author: Jonathan Wakely 
Date:   Fri Aug 6 13:43:26 2021

libstdc++: Also move the [[nodiscard]] attributes in 

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* libsupc++/compare (compare_three_way, strong_order)
(weak_order, partial_order, compare_strong_order_fallback)
(compare_weak_order_fallback, compare_partial_order_fallback):
Move nodiscard attributes to correct location.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index faeff641437..5aee89e3a6e 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -548,9 +548,8 @@ namespace std
 template
   requires three_way_comparable_with<_Tp, _Up>
   constexpr auto
-  operator()(_Tp&& __t, _Up&& __u) const
+  operator() [[nodiscard]] (_Tp&& __t, _Up&& __u) const
   noexcept(noexcept(std::declval<_Tp>() <=> std::declval<_Up>()))
-  [[nodiscard]]
   {
 	if constexpr (__detail::__3way_builtin_ptr_cmp<_Tp, _Up>)
 	  {
@@ -672,9 +671,8 @@ namespace std
   template _Up>
 	requires __strongly_ordered<_Tp, _Up>
 	constexpr strong_ordering
-	operator()(_Tp&& __e, _Up&& __f) const
+	operator() [[nodiscard]] (_Tp&& __e, _Up&& __f) const
 	noexcept(_S_noexcept<_Tp, _Up>())
-	[[nodiscard]]
 	{
 	  /* FIXME:
 	  if constexpr (floating_point>)
@@ -720,9 +718,8 @@ namespace std
   template _Up>
 	requires __weakly_ordered<_Tp, _Up>
 	constexpr weak_ordering
-	operator()(_Tp&& __e, _Up&& __f) const
+	operator() [[nodiscard]] (_Tp&& __e, _Up&& __f) const
 	noexcept(_S_noexcept<_Tp, _Up>())
-	[[nodiscard]]
 	{
 	  if constexpr (floating_point>)
 	return __cmp_cust::__fp_weak_ordering(__e, __f);
@@ -766,9 +763,8 @@ namespace std
   template _Up>
 	requires __partially_ordered<_Tp, _Up>
 	constexpr partial_ordering
-	operator()(_Tp&& __e, _Up&& __f) const
+	operator() [[nodiscard]] (_Tp&& __e, _Up&& __f) const
 	noexcept(_S_noexcept<_Tp, _Up>())
-	[[nodiscard]]
 	{
 	  if constexpr (__adl_partial<_Tp, _Up>)
 	return partial_ordering(partial_order(static_cast<_Tp&&>(__e),
@@ -808,9 +804,8 @@ namespace std
   template _Up>
 	requires __strongly_ordered<_Tp, _Up> || __op_eq_lt<_Tp, _Up>
 	constexpr strong_ordering
-	operator()(_Tp&& __e, _Up&& __f) const
+	operator() [[nodiscard]] (_Tp&& __e, _Up&& __f) const
 	noexcept(_S_noexcept<_Tp, _Up>())
-	[[nodiscard]]
 	{
 	  if constexpr (__strongly_ordered<_Tp, _Up>)
 	return _Strong_order{}(static_cast<_Tp&&>(__e),
@@ -841,9 +836,8 @@ namespace std
   template _Up>
 	requires __weakly_ordered<_Tp, _Up> || __op_eq_lt<_Tp, _Up>
 	constexpr weak_ordering
-	operator()(_Tp&& __e, _Up&& __f) const
+	operator() [[nodiscard]] (_Tp&& __e, _Up&& __f) const
 	noexcept(_S_noexcept<_Tp, _Up>())
-	[[nodiscard]]
 	{
 	  if constexpr (__weakly_ordered<_Tp, _Up>)
 	return _Weak_order{}(static_cast<_Tp&&>(__e),
@@ -884,9 +878,8 @@ namespace std
   template _Up>
 	requires __partially_ordered<_Tp, _Up> || __op_eq_lt_lt<_Tp, _Up>
 	constexpr partial_ordering
-	operator()(_Tp&& __e, _Up&& __f) const
+	operator() [[nodiscard]] (_Tp&& __e, _Up&& __f) const
 	noexcept(_S_noexcept<_Tp, _Up>())
-	[[nodiscard]]
 	{
 	  if constexpr (__partially_ordered<_Tp, _Up>)
 	return _Partial_order{}(static_cast<_Tp&&>(__e),


[PATCH] tree-optimization/101801 - remove vect_worthwhile_without_simd_p

2021-08-06 Thread Richard Biener via Gcc-patches
This removes the cost part of vect_worthwhile_without_simd_p, retaining
only the correctness bits.  The reason is that the cost heuristic
do not properly account for SLP plus the check whether "without simd"
applies misfires for AVX512 mask vectors at the moment, leading to
missed vectorizations there.

Any costing decision should take place in the cost modeling, no
single stmt is to disable all vectorization on its own.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.

2021-08-06  Richard Biener  

PR tree-optimization/101801
* tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
(vect_can_vectorize_without_simd_p): ... to this.
* tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
(vect_can_vectorize_without_simd_p): ... to this and fold
in vect_min_worthwhile_factor.
(vect_min_worthwhile_factor): Remove.
(vectorizable_reduction): Adjust and remove the cost part.
* tree-vect-stmts.c (vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
---
 gcc/tree-vect-loop.c  | 43 +++
 gcc/tree-vect-stmts.c | 26 ++
 gcc/tree-vectorizer.h |  2 +-
 3 files changed, 10 insertions(+), 61 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 1e21fe6b13d..37c7daa7f9e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7227,24 +7227,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (dump_enabled_p ())
 dump_printf (MSG_NOTE, "op not supported by target.\n");
  if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
- || !vect_worthwhile_without_simd_p (loop_vinfo, code))
+ || !vect_can_vectorize_without_simd_p (code))
ok = false;
  else
if (dump_enabled_p ())
  dump_printf (MSG_NOTE, "proceeding using word mode.\n");
 }
 
-  /* Worthwhile without SIMD support?  */
-  if (ok
- && !VECTOR_MODE_P (TYPE_MODE (vectype_in))
- && !vect_worthwhile_without_simd_p (loop_vinfo, code))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not worthwhile without SIMD support.\n");
- ok = false;
-}
-
   /* lane-reducing operations have to go through vect_transform_reduction.
  For the other cases try without the single cycle optimization.  */
   if (!ok)
@@ -7948,46 +7937,28 @@ vectorizable_phi (vec_info *,
 }
 
 
-/* Function vect_min_worthwhile_factor.
+/* Return true if we can emulate CODE on an integer mode representation
+   of a vector.  */
 
-   For a loop where we could vectorize the operation indicated by CODE,
-   return the minimum vectorization factor that makes it worthwhile
-   to use generic vectors.  */
-static unsigned int
-vect_min_worthwhile_factor (enum tree_code code)
+bool
+vect_can_vectorize_without_simd_p (tree_code code)
 {
   switch (code)
 {
 case PLUS_EXPR:
 case MINUS_EXPR:
 case NEGATE_EXPR:
-  return 4;
-
 case BIT_AND_EXPR:
 case BIT_IOR_EXPR:
 case BIT_XOR_EXPR:
 case BIT_NOT_EXPR:
-  return 2;
+  return true;
 
 default:
-  return INT_MAX;
+  return false;
 }
 }
 
-/* Return true if VINFO indicates we are doing loop vectorization and if
-   it is worth decomposing CODE operations into scalar operations for
-   that loop's vectorization factor.  */
-
-bool
-vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
-{
-  loop_vec_info loop_vinfo = dyn_cast  (vinfo);
-  unsigned HOST_WIDE_INT value;
-  return (loop_vinfo
- && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&value)
- && value >= vect_min_worthwhile_factor (code));
-}
-
 /* Function vectorizable_induction
 
Check if STMT_INFO performs an induction computation that can be vectorized.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 94bdb74ea8d..5b94d41e292 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5685,24 +5685,13 @@ vectorizable_shift (vec_info *vinfo,
   /* Check only during analysis.  */
   if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
  || (!vec_stmt
- && !vect_worthwhile_without_simd_p (vinfo, code)))
+ && !vect_can_vectorize_without_simd_p (code)))
 return false;
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
  "proceeding using word mode.\n");
 }
 
-  /* Worthwhile without SIMD support?  Check only during analysis.  */
-  if (!vec_stmt
-  && !VECTOR_MODE_P (TYPE_MODE (vectype))
-  && !vect_worthwhile_without_simd_p (vinfo, code))
-{
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "not worthwhile without SIMD support.\n");

Re: [PATCH] tree-optimization/101801 - remove vect_worthwhile_without_simd_p

2021-08-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> This removes the cost part of vect_worthwhile_without_simd_p, retaining
> only the correctness bits.  The reason is that the cost heuristic
> do not properly account for SLP plus the check whether "without simd"
> applies misfires for AVX512 mask vectors at the moment, leading to
> missed vectorizations there.
>
> Any costing decision should take place in the cost modeling, no
> single stmt is to disable all vectorization on its own.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.
>
> 2021-08-06  Richard Biener  
>
>   PR tree-optimization/101801
>   * tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
>   (vect_can_vectorize_without_simd_p): ... to this.
>   * tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
>   (vect_can_vectorize_without_simd_p): ... to this and fold
>   in vect_min_worthwhile_factor.
>   (vect_min_worthwhile_factor): Remove.
>   (vectorizable_reduction): Adjust and remove the cost part.
>   * tree-vect-stmts.c (vectorizable_shift): Likewise.
>   (vectorizable_operation): Likewise.
> ---
>  gcc/tree-vect-loop.c  | 43 +++
>  gcc/tree-vect-stmts.c | 26 ++
>  gcc/tree-vectorizer.h |  2 +-
>  3 files changed, 10 insertions(+), 61 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 1e21fe6b13d..37c7daa7f9e 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -7227,24 +7227,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>if (dump_enabled_p ())
>  dump_printf (MSG_NOTE, "op not supported by target.\n");
> if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
> -   || !vect_worthwhile_without_simd_p (loop_vinfo, code))
> +   || !vect_can_vectorize_without_simd_p (code))
>   ok = false;
> else
>   if (dump_enabled_p ())
> dump_printf (MSG_NOTE, "proceeding using word mode.\n");
>  }
>  
> -  /* Worthwhile without SIMD support?  */
> -  if (ok
> -   && !VECTOR_MODE_P (TYPE_MODE (vectype_in))
> -   && !vect_worthwhile_without_simd_p (loop_vinfo, code))
> -{
> -  if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -  "not worthwhile without SIMD support.\n");
> -   ok = false;
> -}
> -
>/* lane-reducing operations have to go through 
> vect_transform_reduction.
>   For the other cases try without the single cycle optimization.  */
>if (!ok)
> @@ -7948,46 +7937,28 @@ vectorizable_phi (vec_info *,
>  }
>  
>  
> -/* Function vect_min_worthwhile_factor.
> +/* Return true if we can emulate CODE on an integer mode representation
> +   of a vector.  */
>  
> -   For a loop where we could vectorize the operation indicated by CODE,
> -   return the minimum vectorization factor that makes it worthwhile
> -   to use generic vectors.  */
> -static unsigned int
> -vect_min_worthwhile_factor (enum tree_code code)
> +bool
> +vect_can_vectorize_without_simd_p (tree_code code)
>  {
>switch (code)
>  {
>  case PLUS_EXPR:
>  case MINUS_EXPR:
>  case NEGATE_EXPR:
> -  return 4;
> -
>  case BIT_AND_EXPR:
>  case BIT_IOR_EXPR:
>  case BIT_XOR_EXPR:
>  case BIT_NOT_EXPR:
> -  return 2;
> +  return true;
>  
>  default:
> -  return INT_MAX;
> +  return false;
>  }
>  }
>  
> -/* Return true if VINFO indicates we are doing loop vectorization and if
> -   it is worth decomposing CODE operations into scalar operations for
> -   that loop's vectorization factor.  */
> -
> -bool
> -vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
> -{
> -  loop_vec_info loop_vinfo = dyn_cast  (vinfo);
> -  unsigned HOST_WIDE_INT value;
> -  return (loop_vinfo
> -   && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&value)
> -   && value >= vect_min_worthwhile_factor (code));
> -}
> -
>  /* Function vectorizable_induction
>  
> Check if STMT_INFO performs an induction computation that can be 
> vectorized.
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 94bdb74ea8d..5b94d41e292 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -5685,24 +5685,13 @@ vectorizable_shift (vec_info *vinfo,
>/* Check only during analysis.  */
>if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
> || (!vec_stmt
> -   && !vect_worthwhile_without_simd_p (vinfo, code)))
> +   && !vect_can_vectorize_without_simd_p (code)))
>  return false;
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location,
>   "proceeding using word mode.\n");
>  }
>  
> -  /* Worthwhile without SIMD support?  Check only during analysis.  */
> -  if (!vec_stmt
> -  && !VECTOR_MODE_P (T

Re: [RFC] ldist: Recognize rawmemchr loop patterns

2021-08-06 Thread Stefan Schulze Frielinghaus via Gcc-patches
ping

On Fri, Jun 25, 2021 at 12:23:32PM +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Jun 16, 2021 at 04:22:35PM +0200, Richard Biener wrote:
> > On Mon, Jun 14, 2021 at 7:26 PM Stefan Schulze Frielinghaus
> >  wrote:
> > >
> > > On Thu, May 20, 2021 at 08:37:24PM +0200, Stefan Schulze Frielinghaus 
> > > wrote:
> > > [...]
> > > > > but we won't ever arrive here because of the niters condition.  But
> > > > > yes, doing the pattern matching in the innermost loop processing code
> > > > > looks good to me - for the specific case it would be
> > > > >
> > > > >   /* Don't distribute loop if niters is unknown.  */
> > > > >   tree niters = number_of_latch_executions (loop);
> > > > >   if (niters == NULL_TREE || niters == chrec_dont_know)
> > > > > ---> here?
> > > > > continue;
> > > >
> > > > Right, please find attached a new version of the patch where everything
> > > > is included in the loop distribution pass.  I will do a bootstrap and
> > > > regtest on IBM Z over night.  If you give me green light I will also do
> > > > the same on x86_64.
> > >
> > > Meanwhile I gave it a shot on x86_64 where the testsuite runs fine (at
> > > least the ldist-strlen testcase).  If you are Ok with the patch, then I
> > > would rebase and run the testsuites again and post a patch series
> > > including the rawmemchr implementation for IBM Z.
> > 
> > @@ -3257,6 +3261,464 @@ find_seed_stmts_for_distribution (class loop
> > *loop, vec *work_list)
> >return work_list->length () > 0;
> >  }
> > 
> > +static void
> > +generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
> > +   data_reference_p store_dr, tree base, tree 
> > pattern,
> > +   location_t loc)
> > +{
> > 
> > this new function needs a comment.  Applies to all of the new ones, btw.
> 
> Done.
> 
> > +  gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (base))
> > +  && TREE_TYPE (TREE_TYPE (base)) == TREE_TYPE 
> > (pattern));
> > 
> > this looks fragile and is probably unnecessary as well.
> > 
> > +  gcc_checking_assert (TREE_TYPE (reduction_var) == TREE_TYPE (base));
> > 
> > in general you want types_compatible_p () checks which for pointers means
> > all pointers are compatible ...
> 
> True, I removed both asserts.
> 
> > (skipping stuff)
> > 
> > @@ -3321,10 +3783,20 @@ loop_distribution::execute (function *fun)
> >   && !optimize_loop_for_speed_p (loop)))
> > continue;
> > 
> > -  /* Don't distribute loop if niters is unknown.  */
> > +  /* If niters is unknown don't distribute loop but rather try to 
> > transform
> > +it to a call to a builtin.  */
> >tree niters = number_of_latch_executions (loop);
> >if (niters == NULL_TREE || niters == chrec_dont_know)
> > -   continue;
> > +   {
> > + if (transform_reduction_loop (loop))
> > +   {
> > + changed = true;
> > + loops_to_be_destroyed.safe_push (loop);
> > + if (dump_file)
> > +   fprintf (dump_file, "Loop %d transformed into a
> > builtin.\n", loop->num);
> > +   }
> > + continue;
> > +   }
> > 
> > please look at
> > 
> >   if (nb_generated_loops + nb_generated_calls > 0)
> > {
> >   changed = true;
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >  loc, "Loop%s %d distributed: split to
> > %d loops "
> >  "and %d library calls.\n", str, loop->num,
> >  nb_generated_loops, nb_generated_calls);
> > 
> > and follow the use of dump_* and MSG_OPTIMIZED_LOCATIONS so the
> > transforms are reported with -fopt-info-loop
> 
> Done.
> 
> > +
> > +  return transform_reduction_loop_1 (loop, load_dr, store_dr, 
> > reduction_var);
> > +}
> > 
> > what's the point in tail-calling here and visually splitting the
> > function in half?
> 
> In the first place I thought that this is more pleasant since in
> transform_reduction_loop_1 it is settled that we have a single load,
> store, and reduction variable.  After refactoring this isn't true
> anymore and I inlined the function and made this clear via a comment.
> 
> > 
> > (sorry for picking random pieces now ;))
> > 
> > +  for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
> > +  gsi_next (&bsi), ++ninsns)
> > +   {
> > 
> > this counts debug insns, I guess you want gsi_next_nondebug at least.
> > not sure why you are counting PHIs at all btw - for the loops you match
> > you are expecting at most two, one IV and eventually one for the virtual
> > operand of the store?
> 
> Yes, I removed the counting for the phi loop and changed to
> gsi_next_nondebug for both loops.
> 
> > 
> > + if (gimple_has_volatile_ops (phi))
> > +   return false;
> > 
> > PHIs never have volatile ops.
> > 
> > + 

Re: [committed v2 3/3] arm: reorder assembler architecture directives [PR101723]

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Thu, Aug 5, 2021 at 1:58 PM Richard Earnshaw  wrote:

>
> A change to the way gas interprets the .fpu directive in binutils-2.34
> means that issuing .fpu will clear any features set by .arch_extension
> that apply to the floating point or simd units.  This unfortunately
> causes problems for more recent versions of the architecture because
> we currently emit .arch, .arch_extension and .fpu directives at
> different times and try to suppress redundant changes.
>
> This change addresses this by firstly unifying all the places where we
> emit these directives to a single block of code and secondly
> (re)emitting all the directives if any changes have been made to the
> target options.  Whilst this is slightly more than the strict minimum
> it should be enough to catch all cases where a change could have
> happened.  The new code also emits the directives in the order: .arch,
> .fpu, .arch_extension.  This ensures that the additional architectural
> extensions are not removed by a later .fpu directive.
>
> Whilst writing this patch I also noticed that in the corner case where
> the last function to be compiled had a non-standard set of
> architecture flags, the assembler would add an incorrect set of
> derived attributes for the file as a whole.  Instead of reflecting the
> command-line options it would reflect the flags from the last file in
> the function.  To address this I've also added a call to re-emit the
> flags from the asm_file_end callback so the assembler will be in the
> correct state when it finishes processing the intput.
>
> There's some slight churn to the testsuite as a consequence of this,
> because previously we had a hack to suppress emitting a .fpu directive
> for one specific case, but with the new order this is no-longer
> necessary.
>
> gcc/ChangeLog:
>
> PR target/101723
> * config/arm/arm-cpus.in (generic-armv7-a): Add quirk to suppress
> writing .cpu directive in asm output.
> * config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
> (arm_last_printed_arch_string): Delete.
> (arm_last-printed_fpu_string): Delete.
> (arm_configure_build_target): If use of floating-point/SIMD is
> disabled, remove all fp/simd related features from the target ISA.
> (last_arm_targ_options): New variable.
> (arm_print_asm_arch_directives): Add new parameters.  Change order
> of emitted directives and handle all cases here.
> (arm_file_start): Always call arm_print_asm_arch_directives, move
> all generation of .arch/.arch_extension here.
> (arm_file_end): Call arm_print_asm_arch.
> (arm_declare_function_name): Call arm_print_asm_arch_directives
> instead of printing .arch/.fpu directives directly.
>
> gcc/testsuite/ChangeLog:
>
> PR target/101723
> * gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected
> output.
> * gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
> * gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise.
> * gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do
> assemble.
> Add a non-no-op function body.
> * gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
> * gcc.target/arm/pr98636.c (dg-options): Add -mfloat-abi=softfp.
> * gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
> * gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
> check-function-bodies.
> * gcc.target/arm/attr-neon3.c: Likewise.
> * gcc.target/arm/pr69245.c: Tighten scan-assembler match, but allow
> multiple instances.
> * gcc.target/arm/pragma_fpu_attribute.c: Likewise.
> * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
>


Hi Richard,

There were a couple of typos in the tests updates, fixed as follows:

 Author: Christophe Lyon 
Date:   Fri Aug 6 14:06:44 2021 +

arm: Fix typos for reorder assembler architecture directives [PR101723]

Two tests had typos preventing them from passing, committed as obvious.

2021-08-06  Christophe Lyon  

gcc/testsuite/
PR target/101723
* gcc.target/arm/attr-neon3.c: Fix typo.
* gcc.target/arm/pragma_fpu_attribute_2.c: Fix typo.

diff --git a/gcc/testsuite/gcc.target/arm/attr-neon3.c
b/gcc/testsuite/gcc.target/arm/attr-neon3.c
index 0fbce6e4cd4..b6171e72d89 100644
--- a/gcc/testsuite/gcc.target/arm/attr-neon3.c
+++ b/gcc/testsuite/gcc.target/arm/attr-neon3.c
@@ -33,7 +33,7 @@ my (float32x2_t __a, float32x2_t __b)
 ** |
 ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]!
 ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]
-** }
+** )
 ** ...
 ** bx  lr
 */
diff --git a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
index 3d33b04b787..398d8fff35c 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_fpu

Re: [PATCH] [rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element.

2021-08-06 Thread Segher Boessenkool
On Fri, Aug 06, 2021 at 11:55:52AM +0100, Richard Sandiford wrote:
> liuhongt via Gcc-patches  writes:
> > Hi:
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> >   Ok for trunk?
> 
> I think if anything the canonicalisation should be the other way:
> if the shift amount is an in-range constant, we know that it fits
> within a vector element, and so the vector form should be preferred.

Yeah.  And the canonicalisation needs to be documented *first*, i.e. we
have to agree on it first, *before* patches doing this to simplify-rtx
are acceptable.  We don't do design-by-fait-accompli.

Any canonicalisation also has to fit in well with other canonicalisations,
or we will be better off not having canonical forms.

If it turns out there is no good canonical form, we will simply have to
handle both forms (or more than two perhaps).  This isn't the end of the
world, we have to do that already.  If we can simplify things with a
canonical form, that is great; if that causes too much extra work
instead, it is not so great.  These things have to be thought about.


Segher


Valgrind '--show-leak-kinds=all'

2021-08-06 Thread Thomas Schwinge
Hi!

I'm working on plugging a memory leak in an entirely different
compartment of GCC, but also ran into this issue:

On 2021-02-12T08:35:52+0100, Richard Biener via Gcc-patches 
 wrote:
> On Thu, Feb 11, 2021 at 7:35 PM Martin Sebor  wrote:
>> On 2/11/21 12:59 AM, Richard Biener wrote:
>> > On Wed, Feb 10, 2021 at 6:16 PM Martin Sebor  wrote:
>> >> [...] Valgrind shows more leaks in this code that
>> >> I'm not sure what to do about:
>> >>
>> >> 1) A tree built by build_type_attribute_qual_variant() called from
>> >>  attr_access::array_as_string() to build a temporary type only
>> >>  for the purposes of formatting it.
>> >>
>> >> 2) A tree (an attribute list) built by tree_cons() called from
>> >>  build_attr_access_from_parms() that's used only for the duration
>> >>  of the caller.
>> >>
>> >> Do these temporary trees need to be released somehow or are the leaks
>> >> expected?
>> >
>> > You should configure GCC with --enable-valgrind-annotations to make
>> > it aware of our GC.
>>
>> I did configure with that option:
>>
>> $ /src/gcc/master/configure --enable-checking=yes
>> --enable-languages=all,jit,lto --enable-host-shared
>> --enable-valgrind-annotations

>> $ /build/gcc-master/gcc/xgcc -B /build/gcc-master/gcc -S -Wall
>> /src/gcc/master/gcc/testsuite/gcc.dg/Wvla-parameter.c -wrapper
>> valgrind,--leak-check=full,--show-leak-kinds=all,--track-origins=yes,--log-file=valgrind-out.txt
>>
>> Do you not see the same leaks?

I do; also stuff like:

56 bytes in 1 blocks are still reachable in loss record 152 of 875
   at 0x483DD99: calloc (vg_replace_malloc.c:762)
   by 0x1753240: xcalloc (xmalloc.c:162)
   by 0x669C83: ggc_internal_alloc(unsigned long, void (*)(void*), unsigned 
long, unsigned long) (ggc-page.c:918)
   by 0x89E07D: ggc_internal_cleared_alloc(unsigned long, void (*)(void*), 
unsigned long, unsigned long) (ggc-common.c:117)
   by 0xF65D0D: make_node(tree_code) (ggc.h:143)
   by 0xF6632B: build_decl(unsigned int, tree_code, tree_node*, tree_node*) 
(tree.c:5264)
   by 0xA28ADC: build_builtin_function(unsigned int, char const*, 
tree_node*, int, built_in_class, char const*, tree_node*) (langhooks.c:681)
   by 0xA29FDD: add_builtin_function(char const*, tree_node*, int, 
built_in_class, char const*, tree_node*) (langhooks.c:716)
   by 0x622BFB: def_builtin_1(built_in_function, char const*, 
built_in_class, tree_node*, tree_node*, bool, bool, bool, tree_node*, bool) 
[clone .constprop.25] (lto-lang.c:650)
   by 0x640709: lto_define_builtins(tree_node*, tree_node*) 
(omp-builtins.def:46)
   by 0x641EE3: lto_init() (lto-lang.c:1339)
   by 0x61E26A: toplev::main(int, char**) (toplev.c:1921)

... and many, many more.

> Err, well.  --show-leak-kinds=all is probably the cause.

Before finding this email, I too had convinced myself that everying that
came by 'ggc_*' I may ignore, because:

> We
> definitely do not force-release
> all reachable GC allocated memory at program end.

... of this: these blocks simply had not been GCed at program end.

It's however a bit tedious to filter, in my case, 11864 lines of Valgrind
output.

> Not sure if
> valgrind annotations can
> make that obvious to valgrind.

Or, if that's not feasible (I don't know much about Valgrind...), then
instead would it help to force a final GC at program end if we're running
in "valgrind mode"?  If that's a plausible thing to do, would guarding
that by GCC having been configured with '--enable-valgrind-annotations'
be OK, or do we need a '--param', or something else?

> I'm just using --leak-check=full and
> thus look for
> unreleased and no longer reachable memory.

ACK, in my case, that only shows seven errors (not related to my stuff).


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [committed v2 3/3] arm: reorder assembler architecture directives [PR101723]

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 4:08 PM Christophe Lyon <
christophe.lyon@gmail.com> wrote:

>
>
> On Thu, Aug 5, 2021 at 1:58 PM Richard Earnshaw  wrote:
>
>>
>> A change to the way gas interprets the .fpu directive in binutils-2.34
>> means that issuing .fpu will clear any features set by .arch_extension
>> that apply to the floating point or simd units.  This unfortunately
>> causes problems for more recent versions of the architecture because
>> we currently emit .arch, .arch_extension and .fpu directives at
>> different times and try to suppress redundant changes.
>>
>> This change addresses this by firstly unifying all the places where we
>> emit these directives to a single block of code and secondly
>> (re)emitting all the directives if any changes have been made to the
>> target options.  Whilst this is slightly more than the strict minimum
>> it should be enough to catch all cases where a change could have
>> happened.  The new code also emits the directives in the order: .arch,
>> .fpu, .arch_extension.  This ensures that the additional architectural
>> extensions are not removed by a later .fpu directive.
>>
>> Whilst writing this patch I also noticed that in the corner case where
>> the last function to be compiled had a non-standard set of
>> architecture flags, the assembler would add an incorrect set of
>> derived attributes for the file as a whole.  Instead of reflecting the
>> command-line options it would reflect the flags from the last file in
>> the function.  To address this I've also added a call to re-emit the
>> flags from the asm_file_end callback so the assembler will be in the
>> correct state when it finishes processing the intput.
>>
>> There's some slight churn to the testsuite as a consequence of this,
>> because previously we had a hack to suppress emitting a .fpu directive
>> for one specific case, but with the new order this is no-longer
>> necessary.
>>
>> gcc/ChangeLog:
>>
>> PR target/101723
>> * config/arm/arm-cpus.in (generic-armv7-a): Add quirk to suppress
>> writing .cpu directive in asm output.
>> * config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
>> (arm_last_printed_arch_string): Delete.
>> (arm_last-printed_fpu_string): Delete.
>> (arm_configure_build_target): If use of floating-point/SIMD is
>> disabled, remove all fp/simd related features from the target ISA.
>> (last_arm_targ_options): New variable.
>> (arm_print_asm_arch_directives): Add new parameters.  Change order
>> of emitted directives and handle all cases here.
>> (arm_file_start): Always call arm_print_asm_arch_directives, move
>> all generation of .arch/.arch_extension here.
>> (arm_file_end): Call arm_print_asm_arch.
>> (arm_declare_function_name): Call arm_print_asm_arch_directives
>> instead of printing .arch/.fpu directives directly.
>>
>> gcc/testsuite/ChangeLog:
>>
>> PR target/101723
>> * gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected
>> output.
>> * gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
>> * gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise.
>> * gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do
>> assemble.
>> Add a non-no-op function body.
>> * gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
>> * gcc.target/arm/pr98636.c (dg-options): Add -mfloat-abi=softfp.
>> * gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
>> * gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
>> check-function-bodies.
>> * gcc.target/arm/attr-neon3.c: Likewise.
>> * gcc.target/arm/pr69245.c: Tighten scan-assembler match, but
>> allow
>> multiple instances.
>> * gcc.target/arm/pragma_fpu_attribute.c: Likewise.
>> * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
>>
>
>
> Hi Richard,
>
> There were a couple of typos in the tests updates, fixed as follows:
>
>  Author: Christophe Lyon 
> Date:   Fri Aug 6 14:06:44 2021 +
>
> arm: Fix typos for reorder assembler architecture directives [PR101723]
>
> Two tests had typos preventing them from passing, committed as obvious.
>
> 2021-08-06  Christophe Lyon  
>
> gcc/testsuite/
> PR target/101723
> * gcc.target/arm/attr-neon3.c: Fix typo.
> * gcc.target/arm/pragma_fpu_attribute_2.c: Fix typo.
>
> diff --git a/gcc/testsuite/gcc.target/arm/attr-neon3.c
> b/gcc/testsuite/gcc.target/arm/attr-neon3.c
> index 0fbce6e4cd4..b6171e72d89 100644
> --- a/gcc/testsuite/gcc.target/arm/attr-neon3.c
> +++ b/gcc/testsuite/gcc.target/arm/attr-neon3.c
> @@ -33,7 +33,7 @@ my (float32x2_t __a, float32x2_t __b)
>  ** |
>  ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]!
>  ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]
> -** }
> +** )
>  ** ...
>  ** bx  lr
>  */
> diff --git a/gcc/testsuite/gcc.targe

Re: [PATCH] tree-optimization/101801 - remove vect_worthwhile_without_simd_p

2021-08-06 Thread Richard Biener via Gcc-patches
On August 6, 2021 3:47:44 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener via Gcc-patches  writes:
>> This removes the cost part of vect_worthwhile_without_simd_p, retaining
>> only the correctness bits.  The reason is that the cost heuristic
>> do not properly account for SLP plus the check whether "without simd"
>> applies misfires for AVX512 mask vectors at the moment, leading to
>> missed vectorizations there.
>>
>> Any costing decision should take place in the cost modeling, no
>> single stmt is to disable all vectorization on its own.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.
>>
>> 2021-08-06  Richard Biener  
>>
>>  PR tree-optimization/101801
>>  * tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
>>  (vect_can_vectorize_without_simd_p): ... to this.
>>  * tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
>>  (vect_can_vectorize_without_simd_p): ... to this and fold
>>  in vect_min_worthwhile_factor.
>>  (vect_min_worthwhile_factor): Remove.
>>  (vectorizable_reduction): Adjust and remove the cost part.
>>  * tree-vect-stmts.c (vectorizable_shift): Likewise.
>>  (vectorizable_operation): Likewise.
>> ---
>>  gcc/tree-vect-loop.c  | 43 +++
>>  gcc/tree-vect-stmts.c | 26 ++
>>  gcc/tree-vectorizer.h |  2 +-
>>  3 files changed, 10 insertions(+), 61 deletions(-)
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 1e21fe6b13d..37c7daa7f9e 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -7227,24 +7227,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>>if (dump_enabled_p ())
>>  dump_printf (MSG_NOTE, "op not supported by target.\n");
>>if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
>> -  || !vect_worthwhile_without_simd_p (loop_vinfo, code))
>> +  || !vect_can_vectorize_without_simd_p (code))
>>  ok = false;
>>else
>>  if (dump_enabled_p ())
>>dump_printf (MSG_NOTE, "proceeding using word mode.\n");
>>  }
>>  
>> -  /* Worthwhile without SIMD support?  */
>> -  if (ok
>> -  && !VECTOR_MODE_P (TYPE_MODE (vectype_in))
>> -  && !vect_worthwhile_without_simd_p (loop_vinfo, code))
>> -{
>> -  if (dump_enabled_p ())
>> -dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> - "not worthwhile without SIMD support.\n");
>> -  ok = false;
>> -}
>> -
>>/* lane-reducing operations have to go through 
>> vect_transform_reduction.
>>   For the other cases try without the single cycle optimization.  */
>>if (!ok)
>> @@ -7948,46 +7937,28 @@ vectorizable_phi (vec_info *,
>>  }
>>  
>>  
>> -/* Function vect_min_worthwhile_factor.
>> +/* Return true if we can emulate CODE on an integer mode representation
>> +   of a vector.  */
>>  
>> -   For a loop where we could vectorize the operation indicated by CODE,
>> -   return the minimum vectorization factor that makes it worthwhile
>> -   to use generic vectors.  */
>> -static unsigned int
>> -vect_min_worthwhile_factor (enum tree_code code)
>> +bool
>> +vect_can_vectorize_without_simd_p (tree_code code)
>>  {
>>switch (code)
>>  {
>>  case PLUS_EXPR:
>>  case MINUS_EXPR:
>>  case NEGATE_EXPR:
>> -  return 4;
>> -
>>  case BIT_AND_EXPR:
>>  case BIT_IOR_EXPR:
>>  case BIT_XOR_EXPR:
>>  case BIT_NOT_EXPR:
>> -  return 2;
>> +  return true;
>>  
>>  default:
>> -  return INT_MAX;
>> +  return false;
>>  }
>>  }
>>  
>> -/* Return true if VINFO indicates we are doing loop vectorization and if
>> -   it is worth decomposing CODE operations into scalar operations for
>> -   that loop's vectorization factor.  */
>> -
>> -bool
>> -vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
>> -{
>> -  loop_vec_info loop_vinfo = dyn_cast  (vinfo);
>> -  unsigned HOST_WIDE_INT value;
>> -  return (loop_vinfo
>> -  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&value)
>> -  && value >= vect_min_worthwhile_factor (code));
>> -}
>> -
>>  /* Function vectorizable_induction
>>  
>> Check if STMT_INFO performs an induction computation that can be 
>> vectorized.
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 94bdb74ea8d..5b94d41e292 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -5685,24 +5685,13 @@ vectorizable_shift (vec_info *vinfo,
>>/* Check only during analysis.  */
>>if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
>>|| (!vec_stmt
>> -  && !vect_worthwhile_without_simd_p (vinfo, code)))
>> +  && !vect_can_vectorize_without_simd_p (code)))
>>  return false;
>>if (dump_enabled_p ())
>>  dump_printf_loc (MSG_NOTE, vect_location,
>>   

Re: [committed v2 3/3] arm: reorder assembler architecture directives [PR101723]

2021-08-06 Thread Richard Earnshaw via Gcc-patches




On 06/08/2021 15:08, Christophe Lyon via Gcc-patches wrote:

On Thu, Aug 5, 2021 at 1:58 PM Richard Earnshaw  wrote:



A change to the way gas interprets the .fpu directive in binutils-2.34
means that issuing .fpu will clear any features set by .arch_extension
that apply to the floating point or simd units.  This unfortunately
causes problems for more recent versions of the architecture because
we currently emit .arch, .arch_extension and .fpu directives at
different times and try to suppress redundant changes.

This change addresses this by firstly unifying all the places where we
emit these directives to a single block of code and secondly
(re)emitting all the directives if any changes have been made to the
target options.  Whilst this is slightly more than the strict minimum
it should be enough to catch all cases where a change could have
happened.  The new code also emits the directives in the order: .arch,
.fpu, .arch_extension.  This ensures that the additional architectural
extensions are not removed by a later .fpu directive.

Whilst writing this patch I also noticed that in the corner case where
the last function to be compiled had a non-standard set of
architecture flags, the assembler would add an incorrect set of
derived attributes for the file as a whole.  Instead of reflecting the
command-line options it would reflect the flags from the last file in
the function.  To address this I've also added a call to re-emit the
flags from the asm_file_end callback so the assembler will be in the
correct state when it finishes processing the intput.

There's some slight churn to the testsuite as a consequence of this,
because previously we had a hack to suppress emitting a .fpu directive
for one specific case, but with the new order this is no-longer
necessary.

gcc/ChangeLog:

 PR target/101723
 * config/arm/arm-cpus.in (generic-armv7-a): Add quirk to suppress
 writing .cpu directive in asm output.
 * config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
 (arm_last_printed_arch_string): Delete.
 (arm_last-printed_fpu_string): Delete.
 (arm_configure_build_target): If use of floating-point/SIMD is
 disabled, remove all fp/simd related features from the target ISA.
 (last_arm_targ_options): New variable.
 (arm_print_asm_arch_directives): Add new parameters.  Change order
 of emitted directives and handle all cases here.
 (arm_file_start): Always call arm_print_asm_arch_directives, move
 all generation of .arch/.arch_extension here.
 (arm_file_end): Call arm_print_asm_arch.
 (arm_declare_function_name): Call arm_print_asm_arch_directives
 instead of printing .arch/.fpu directives directly.

gcc/testsuite/ChangeLog:

 PR target/101723
 * gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected
output.
 * gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
 * gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise.
 * gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do
assemble.
 Add a non-no-op function body.
 * gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
 * gcc.target/arm/pr98636.c (dg-options): Add -mfloat-abi=softfp.
 * gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
 * gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
 check-function-bodies.
 * gcc.target/arm/attr-neon3.c: Likewise.
 * gcc.target/arm/pr69245.c: Tighten scan-assembler match, but allow
 multiple instances.
 * gcc.target/arm/pragma_fpu_attribute.c: Likewise.
 * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.




Hi Richard,

There were a couple of typos in the tests updates, fixed as follows:

  Author: Christophe Lyon 
Date:   Fri Aug 6 14:06:44 2021 +

 arm: Fix typos for reorder assembler architecture directives [PR101723]

 Two tests had typos preventing them from passing, committed as obvious.

 2021-08-06  Christophe Lyon  

 gcc/testsuite/
 PR target/101723
 * gcc.target/arm/attr-neon3.c: Fix typo.
 * gcc.target/arm/pragma_fpu_attribute_2.c: Fix typo.

diff --git a/gcc/testsuite/gcc.target/arm/attr-neon3.c
b/gcc/testsuite/gcc.target/arm/attr-neon3.c
index 0fbce6e4cd4..b6171e72d89 100644
--- a/gcc/testsuite/gcc.target/arm/attr-neon3.c
+++ b/gcc/testsuite/gcc.target/arm/attr-neon3.c
@@ -33,7 +33,7 @@ my (float32x2_t __a, float32x2_t __b)
  ** |
  ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]!
  ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]
-** }
+** )
  ** ...
  ** bx  lr
  */
diff --git a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
index 3d33b04b787..398d8fff35c 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
@@ -2

Re: [committed v2 3/3] arm: reorder assembler architecture directives [PR101723]

2021-08-06 Thread Richard Earnshaw via Gcc-patches




On 06/08/2021 15:21, Christophe Lyon via Gcc-patches wrote:

On Fri, Aug 6, 2021 at 4:08 PM Christophe Lyon <
christophe.lyon@gmail.com> wrote:




On Thu, Aug 5, 2021 at 1:58 PM Richard Earnshaw  wrote:



A change to the way gas interprets the .fpu directive in binutils-2.34
means that issuing .fpu will clear any features set by .arch_extension
that apply to the floating point or simd units.  This unfortunately
causes problems for more recent versions of the architecture because
we currently emit .arch, .arch_extension and .fpu directives at
different times and try to suppress redundant changes.

This change addresses this by firstly unifying all the places where we
emit these directives to a single block of code and secondly
(re)emitting all the directives if any changes have been made to the
target options.  Whilst this is slightly more than the strict minimum
it should be enough to catch all cases where a change could have
happened.  The new code also emits the directives in the order: .arch,
.fpu, .arch_extension.  This ensures that the additional architectural
extensions are not removed by a later .fpu directive.

Whilst writing this patch I also noticed that in the corner case where
the last function to be compiled had a non-standard set of
architecture flags, the assembler would add an incorrect set of
derived attributes for the file as a whole.  Instead of reflecting the
command-line options it would reflect the flags from the last file in
the function.  To address this I've also added a call to re-emit the
flags from the asm_file_end callback so the assembler will be in the
correct state when it finishes processing the intput.

There's some slight churn to the testsuite as a consequence of this,
because previously we had a hack to suppress emitting a .fpu directive
for one specific case, but with the new order this is no-longer
necessary.

gcc/ChangeLog:

 PR target/101723
 * config/arm/arm-cpus.in (generic-armv7-a): Add quirk to suppress
 writing .cpu directive in asm output.
 * config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
 (arm_last_printed_arch_string): Delete.
 (arm_last-printed_fpu_string): Delete.
 (arm_configure_build_target): If use of floating-point/SIMD is
 disabled, remove all fp/simd related features from the target ISA.
 (last_arm_targ_options): New variable.
 (arm_print_asm_arch_directives): Add new parameters.  Change order
 of emitted directives and handle all cases here.
 (arm_file_start): Always call arm_print_asm_arch_directives, move
 all generation of .arch/.arch_extension here.
 (arm_file_end): Call arm_print_asm_arch.
 (arm_declare_function_name): Call arm_print_asm_arch_directives
 instead of printing .arch/.fpu directives directly.

gcc/testsuite/ChangeLog:

 PR target/101723
 * gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected
output.
 * gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
 * gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise.
 * gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do
assemble.
 Add a non-no-op function body.
 * gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
 * gcc.target/arm/pr98636.c (dg-options): Add -mfloat-abi=softfp.
 * gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
 * gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
 check-function-bodies.
 * gcc.target/arm/attr-neon3.c: Likewise.
 * gcc.target/arm/pr69245.c: Tighten scan-assembler match, but
allow
 multiple instances.
 * gcc.target/arm/pragma_fpu_attribute.c: Likewise.
 * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.




Hi Richard,

There were a couple of typos in the tests updates, fixed as follows:

  Author: Christophe Lyon 
Date:   Fri Aug 6 14:06:44 2021 +

 arm: Fix typos for reorder assembler architecture directives [PR101723]

 Two tests had typos preventing them from passing, committed as obvious.

 2021-08-06  Christophe Lyon  

 gcc/testsuite/
 PR target/101723
 * gcc.target/arm/attr-neon3.c: Fix typo.
 * gcc.target/arm/pragma_fpu_attribute_2.c: Fix typo.

diff --git a/gcc/testsuite/gcc.target/arm/attr-neon3.c
b/gcc/testsuite/gcc.target/arm/attr-neon3.c
index 0fbce6e4cd4..b6171e72d89 100644
--- a/gcc/testsuite/gcc.target/arm/attr-neon3.c
+++ b/gcc/testsuite/gcc.target/arm/attr-neon3.c
@@ -33,7 +33,7 @@ my (float32x2_t __a, float32x2_t __b)
  ** |
  ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]!
  ** vld1.64 {d[0-9]+}, \[r[0-9]+:64\]
-** }
+** )
  ** ...
  ** bx  lr
  */
diff --git a/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
b/gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c
index 3d33b04b787..398d8fff35c 100644
--- a/gcc/testsuite/gcc.target/arm

Re: [committed v2 3/3] arm: reorder assembler architecture directives [PR101723]

2021-08-06 Thread Christophe Lyon via Gcc-patches
On Fri, Aug 6, 2021 at 4:28 PM Richard Earnshaw <
richard.earns...@foss.arm.com> wrote:

>
>
> On 06/08/2021 15:21, Christophe Lyon via Gcc-patches wrote:
> > On Fri, Aug 6, 2021 at 4:08 PM Christophe Lyon <
> > christophe.lyon@gmail.com> wrote:
> >
> >>
> >>
> >> On Thu, Aug 5, 2021 at 1:58 PM Richard Earnshaw 
> wrote:
> >>
> >>>
> >>> A change to the way gas interprets the .fpu directive in binutils-2.34
> >>> means that issuing .fpu will clear any features set by .arch_extension
> >>> that apply to the floating point or simd units.  This unfortunately
> >>> causes problems for more recent versions of the architecture because
> >>> we currently emit .arch, .arch_extension and .fpu directives at
> >>> different times and try to suppress redundant changes.
> >>>
> >>> This change addresses this by firstly unifying all the places where we
> >>> emit these directives to a single block of code and secondly
> >>> (re)emitting all the directives if any changes have been made to the
> >>> target options.  Whilst this is slightly more than the strict minimum
> >>> it should be enough to catch all cases where a change could have
> >>> happened.  The new code also emits the directives in the order: .arch,
> >>> .fpu, .arch_extension.  This ensures that the additional architectural
> >>> extensions are not removed by a later .fpu directive.
> >>>
> >>> Whilst writing this patch I also noticed that in the corner case where
> >>> the last function to be compiled had a non-standard set of
> >>> architecture flags, the assembler would add an incorrect set of
> >>> derived attributes for the file as a whole.  Instead of reflecting the
> >>> command-line options it would reflect the flags from the last file in
> >>> the function.  To address this I've also added a call to re-emit the
> >>> flags from the asm_file_end callback so the assembler will be in the
> >>> correct state when it finishes processing the intput.
> >>>
> >>> There's some slight churn to the testsuite as a consequence of this,
> >>> because previously we had a hack to suppress emitting a .fpu directive
> >>> for one specific case, but with the new order this is no-longer
> >>> necessary.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>  PR target/101723
> >>>  * config/arm/arm-cpus.in (generic-armv7-a): Add quirk to
> suppress
> >>>  writing .cpu directive in asm output.
> >>>  * config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
> >>>  (arm_last_printed_arch_string): Delete.
> >>>  (arm_last-printed_fpu_string): Delete.
> >>>  (arm_configure_build_target): If use of floating-point/SIMD is
> >>>  disabled, remove all fp/simd related features from the target
> ISA.
> >>>  (last_arm_targ_options): New variable.
> >>>  (arm_print_asm_arch_directives): Add new parameters.  Change
> order
> >>>  of emitted directives and handle all cases here.
> >>>  (arm_file_start): Always call arm_print_asm_arch_directives,
> move
> >>>  all generation of .arch/.arch_extension here.
> >>>  (arm_file_end): Call arm_print_asm_arch.
> >>>  (arm_declare_function_name): Call
> arm_print_asm_arch_directives
> >>>  instead of printing .arch/.fpu directives directly.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>  PR target/101723
> >>>  * gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected
> >>> output.
> >>>  * gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
> >>>  * gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c:
> Likewise.
> >>>  * gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do
> >>> assemble.
> >>>  Add a non-no-op function body.
> >>>  * gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
> >>>  * gcc.target/arm/pr98636.c (dg-options): Add
> -mfloat-abi=softfp.
> >>>  * gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
> >>>  * gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
> >>>  check-function-bodies.
> >>>  * gcc.target/arm/attr-neon3.c: Likewise.
> >>>  * gcc.target/arm/pr69245.c: Tighten scan-assembler match, but
> >>> allow
> >>>  multiple instances.
> >>>  * gcc.target/arm/pragma_fpu_attribute.c: Likewise.
> >>>  * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
> >>>
> >>
> >>
> >> Hi Richard,
> >>
> >> There were a couple of typos in the tests updates, fixed as follows:
> >>
> >>   Author: Christophe Lyon 
> >> Date:   Fri Aug 6 14:06:44 2021 +
> >>
> >>  arm: Fix typos for reorder assembler architecture directives
> [PR101723]
> >>
> >>  Two tests had typos preventing them from passing, committed as
> obvious.
> >>
> >>  2021-08-06  Christophe Lyon  
> >>
> >>  gcc/testsuite/
> >>  PR target/101723
> >>  * gcc.target/arm/attr-neon3.c: Fix typo.
> >>  * gcc.target/arm/pragma_fpu_attribute

[PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
Add additional checks to verify destination[source] of a load[store]
instruction is a register.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for master?

-Pat


2021-08-06  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a
register.
(is_store_insn1): Verify source is a register.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..1460a0d7c5c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
 return find_mem_ref (SET_SRC (pat), load_mem);
 
   if (GET_CODE (pat) == PARALLEL)
@@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET
+  && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat
 return find_mem_ref (SET_DEST (pat), str_mem);
 
   if (GET_CODE (pat) == PARALLEL)


[PATCH] libcpp: For C++23 treat UCNs and UTF-8 chars not valid in identifiers as separate tokens

2021-08-06 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 06, 2021 at 11:53:56AM +0200, Jakub Jelinek via Gcc-patches wrote:
> Actually, there is another change in P1949R7 that I haven't touched
> in the patch and not sure what the implications are.
> 
> To the preprocessing-token non-terminal it adds
>   each universal-character-name that cannot be one of the above
> and changes the following paragraph:
>  ...
>  preprocessing operators and punctuators, and single
> +universal-character-names and
>  non-whitespace characters that do not lexically match the other
>  preprocessing token categories.
> +If a single universal-character-name does not match any of the other
> +preprocessing token categories, the program is ill-formed.
>  If a ' or a " character matches the last category, the behavior
>  is undefined.
>  ...

If the above (and identifier-start and identifier-continue non-terminals
only mentioning XID_Start+0x5F and XID_Continue UCNs) means that we should
indeed put each such UTF-8 char or UCN into a separate CPP_OTHER token
for C++23, then we need something like this incremental patch.
The drawback is worse diagnostics though, so maybe it would be useful if
the cpp_error that ... is not valid in an identifier or is not
valid at the start of an identifier would be emitted as a warning (and not
warn when skipping)?

2021-08-06  Jakub Jelinek  

gcc/
* c-lex.c (c_lex_with_flags): For CPP_OTHER with UCN in the
text report the UCN instead of backslash in the error message.
gcc/testsuite/
* g++.dg/cpp23/ucnid-1-utf8.C: Adjust expected diagnostics for
C++23.
* g++.dg/cpp23/ucnid-2-utf8.C: Likewise.
* g++.dg/cpp23/normalize3.C: Likewise.
libcpp/
* charset.c (_cpp_valid_ucn): For cxx23_identifiers, return false
instead of emitting cpp_error for UCNs not valid in identifiers.
(_cpp_valid_utf8): Similarly for UTF-8 characters not valid in
identifiers.
* lex.c (_cpp_lex_direct): For UCNs not valid in identifiers,
create CPP_OTHER token for the whole UCN.

--- libcpp/charset.c.jj 2021-08-06 11:01:09.052644793 +0200
+++ libcpp/charset.c2021-08-06 14:49:15.648388743 +0200
@@ -1158,6 +1158,15 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 {
   int validity = ucn_valid_in_identifier (pfile, result, nst);
 
+  if (CPP_OPTION (pfile, cxx23_identifiers))
+   {
+ if (validity == 0 || (validity == 2 && identifier_pos == 1))
+   {
+ *cp = result;
+ *pstr = base + 2;
+ return false;
+   }
+   }
   if (validity == 0)
cpp_error (pfile, CPP_DL_ERROR,
   "universal character %.*s is not valid in an identifier",
@@ -1283,9 +1292,11 @@ _cpp_valid_utf8 (cpp_reader *pfile,
 because logically, the UTF-8 was converted to a UCN during
 translation phase 1 (even though we don't physically do it that
 way).  In C, this byte rather becomes grammatically a separate
-token.  */
+token.  In C++23, it should become gramatically a separate
+token as well.  */
 
- if (CPP_OPTION (pfile, cplusplus))
+ if (CPP_OPTION (pfile, cplusplus)
+ && !CPP_OPTION (pfile, cxx23_identifiers))
cpp_error (pfile, CPP_DL_ERROR,
   "extended character %.*s is not valid in an identifier",
   (int) (*pstr - base), base);
@@ -1300,6 +1311,11 @@ _cpp_valid_utf8 (cpp_reader *pfile,
case 2:
  if (identifier_pos == 1)
{
+ if (CPP_OPTION (pfile, cxx23_identifiers))
+   {
+ *pstr = base;
+ return false;
+   }
  /* This is treated the same way in C++ or C99 -- lexed as an
 identifier which is then invalid because an identifier is
 not allowed to start with this character.  */
--- libcpp/lex.c.jj 2021-08-05 21:52:45.491176158 +0200
+++ libcpp/lex.c2021-08-06 14:49:38.861067796 +0200
@@ -3398,6 +3398,16 @@ _cpp_lex_direct (cpp_reader *pfile)
if (_cpp_valid_utf8 (pfile, &pstr, buffer->rlimit, 0, NULL, &s))
  buffer->cur = pstr;
  }
+   else if (c == '\\'
+&& (buffer->cur[0] == 'u' || buffer->cur[0] == 'U')
+&& CPP_OPTION (pfile, cxx23_identifiers))
+ {
+   const uchar *pstr = base + 2;
+   cppchar_t s;
+   if (_cpp_valid_ucn (pfile, &pstr, buffer->rlimit, 0, NULL, &s,
+   NULL, NULL))
+ buffer->cur = pstr;
+ }
create_literal (pfile, result, base, buffer->cur - base, CPP_OTHER);
break;
   }
--- gcc/c-family/c-lex.c.jj 2021-07-23 22:06:02.236084923 +0200
+++ gcc/c-family/c-lex.c2021-08-06 15:04:29.817748967 +0200
@@ -613,6 +613,16 @@ c_lex_with_flags (tree *value, location_
 
if (c == '"' || c == '\'')
  error_at (*loc, "m

Re: [PATCH] Use _GLIBCXX_ASSERTIONS as _GLIBCXX_DEBUG light

2021-08-06 Thread François Dumont via Gcc-patches

On 07/06/21 6:25 am, François Dumont wrote:

On 03/06/21 2:31 pm, Jonathan Wakely wrote:



+  }
+
  /* Checks that [first, last) is a valid range, and then returns
   * __first. This routine is useful when we can't use a separate
   * assertion statement because, e.g., we are in a constructor.
@@ -260,8 +279,9 @@ namespace __gnu_debug
    inline bool
    __check_sorted(const _InputIterator& __first, const 
_InputIterator& __last)

    {
-  return __check_sorted_aux(__first, __last,
-    std::__iterator_category(__first));
+  return __skip_debug_runtime_check()
+    || __check_sorted_aux(__first, __last,
+  std::__iterator_category(__first));


Currently this function is never called at all ifndef _GLIBCXX_DEBUG.
With this change, it's going to be present for _GLIBCXX_ASSERTIONS,
and if it isn't inlined it's going to explode the code size.

Some linux distros are already building the entire distro with
_GLIBCXX_ASSERTIONS so I think we need to be quite careful about this
kind of large change affecting every algo.

So maybe we shouldn't enable these checks via _GLIBCXX_ASSERTIONS, but
a new macro.


_GLIBCXX_DEBUG is already rarely used, so will be yet another mode.

So let's forget about all this, thanks.

I eventually wonder if your feedback was limited to the use of 
__check_sorted and some other codes perhaps.


So here is another proposal which activate a small subset of the 
_GLIBCXX_DEBUG checks in _GLIBCXX_ASSERTIONS but with far less code.


First, the _Error_formatter is not used, the injected checks are simply 
using __glibcxx_assert.


Second I reduced the number of accitaved checks, mostly the __valid_range.

I also enhance the valid_range check for constexpr because sometimes the 
normal implementation is good enough to let the compiler diagnose a 
potential issue in this context. This is for example the case of the 
std::equal implementation whereas the std::copy implementation is too 
defensive.


    libstdc++: [_GLIBCXX_ASSERTIONS] Activate basic debug checks

    libstdc++-v3/ChangeLog:

    * include/bits/stl_algobase.h (equal): Use runtime-only 
_GLIBCXX_DEBUG check.
    * include/bits/stl_iterator.h [_GLIBCXX_ASSERTIONS]: 
Include .
    * include/debug/debug.h [_GLIBCXX_ASSERTIONS]: Define debug 
macros non-empty. Most of

    the time do a simple valid_range check.
    * include/debug/helper_functions.h: Cleanup comment about 
removed _Iter_base.
    (__valid_range): Add __skip_if_constexpr parameter and skip 
check when in a constexpr

    context.
    * include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY): Define as 
__glibcxx_assert when only

    _GLIBCXX_ASSERTIONS is defined.
    (__glibcxx_check_valid_range): Add _SkipIfConstexpr parameter.
    (__glibcxx_check_can_increment_range): Likewise.
    * testsuite/24_iterators/istream_iterator/1.cc (test01): 
Skip iterator increment when

    _GLIBCXX_ASSERTIONS is defined.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New test.
    * testsuite/25_algorithms/heap/1.cc: Skip operation 
complexity checks when _GLIBCXX_ASSERTIONS

    is defined.
    * 
testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_neg.cc: 
Fix dg-prune-output reason.
    * 
testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_pred_neg.cc: 
Likewise.
    * 
testsuite/25_algorithms/lower_bound/debug/constexpr_valid_range_neg.cc: 
Likewise.
    * 
testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_neg.cc: 
Likewise.
    * 
testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_pred_neg.cc: 
Likewise.
    * 
testsuite/25_algorithms/upper_bound/debug/constexpr_valid_range_neg.cc: 
Likewise.


The last fixes below are due to the recent changes to the 
__glibcxx_assert macro but it is close to the code I am changing so I 
prefer to fix those here.


Tested under Linux x86_64 w/o _GLIBCXX_ASSERTIONS.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index d0c49628d7f..d45eec4339b 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1551,7 +1551,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   __glibcxx_function_requires(_EqualOpConcept<
 	typename iterator_traits<_II1>::value_type,
 	typename iterator_traits<_II2>::value_type>)
-  __glibcxx_requires_can_increment_range(__first1, __last1, __first2);
+  __glibcxx_requires_can_increment_range_runtime(__first1, __last1, __first2);
 
   return std::__equal_aux(__first1, __last1, __first2);
 }
diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 3773d600b8f..37c348adfd3 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -2465,7 +24

Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Segher Boessenkool
On Fri, Aug 06, 2021 at 09:47:40AM -0500, Pat Haugen wrote:
> Add additional checks to verify destination[source] of a load[store]
> instruction is a register.

>   * config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a
>   register.

No colon before " (".

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
>if (!pat || pat == NULL_RTX)
>  return false;
>  
> -  if (GET_CODE (pat) == SET)
> +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
>  return find_mem_ref (SET_SRC (pat), load_mem);

So this now falls through if it is a SET of something else than a reg.
Is that intentional?  If so, this should be in the changelog.

> @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem)
>if (!pat || pat == NULL_RTX)
>  return false;
>  
> -  if (GET_CODE (pat) == SET)
> +  if (GET_CODE (pat) == SET
> +  && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat
>  return find_mem_ref (SET_DEST (pat), str_mem);

Similar here.


Segher


Re: Valgrind '--show-leak-kinds=all'

2021-08-06 Thread Richard Biener via Gcc-patches
On August 6, 2021 4:09:37 PM GMT+02:00, Thomas Schwinge 
 wrote:
>Hi!
>
>I'm working on plugging a memory leak in an entirely different
>compartment of GCC, but also ran into this issue:
>
>On 2021-02-12T08:35:52+0100, Richard Biener via Gcc-patches 
> wrote:
>> On Thu, Feb 11, 2021 at 7:35 PM Martin Sebor  wrote:
>>> On 2/11/21 12:59 AM, Richard Biener wrote:
>>> > On Wed, Feb 10, 2021 at 6:16 PM Martin Sebor  wrote:
>>> >> [...] Valgrind shows more leaks in this code that
>>> >> I'm not sure what to do about:
>>> >>
>>> >> 1) A tree built by build_type_attribute_qual_variant() called from
>>> >>  attr_access::array_as_string() to build a temporary type only
>>> >>  for the purposes of formatting it.
>>> >>
>>> >> 2) A tree (an attribute list) built by tree_cons() called from
>>> >>  build_attr_access_from_parms() that's used only for the duration
>>> >>  of the caller.
>>> >>
>>> >> Do these temporary trees need to be released somehow or are the leaks
>>> >> expected?
>>> >
>>> > You should configure GCC with --enable-valgrind-annotations to make
>>> > it aware of our GC.
>>>
>>> I did configure with that option:
>>>
>>> $ /src/gcc/master/configure --enable-checking=yes
>>> --enable-languages=all,jit,lto --enable-host-shared
>>> --enable-valgrind-annotations
>
>>> $ /build/gcc-master/gcc/xgcc -B /build/gcc-master/gcc -S -Wall
>>> /src/gcc/master/gcc/testsuite/gcc.dg/Wvla-parameter.c -wrapper
>>> valgrind,--leak-check=full,--show-leak-kinds=all,--track-origins=yes,--log-file=valgrind-out.txt
>>>
>>> Do you not see the same leaks?
>
>I do; also stuff like:
>
>56 bytes in 1 blocks are still reachable in loss record 152 of 875
>   at 0x483DD99: calloc (vg_replace_malloc.c:762)
>   by 0x1753240: xcalloc (xmalloc.c:162)
>   by 0x669C83: ggc_internal_alloc(unsigned long, void (*)(void*), 
> unsigned long, unsigned long) (ggc-page.c:918)
>   by 0x89E07D: ggc_internal_cleared_alloc(unsigned long, void (*)(void*), 
> unsigned long, unsigned long) (ggc-common.c:117)
>   by 0xF65D0D: make_node(tree_code) (ggc.h:143)
>   by 0xF6632B: build_decl(unsigned int, tree_code, tree_node*, 
> tree_node*) (tree.c:5264)
>   by 0xA28ADC: build_builtin_function(unsigned int, char const*, 
> tree_node*, int, built_in_class, char const*, tree_node*) (langhooks.c:681)
>   by 0xA29FDD: add_builtin_function(char const*, tree_node*, int, 
> built_in_class, char const*, tree_node*) (langhooks.c:716)
>   by 0x622BFB: def_builtin_1(built_in_function, char const*, 
> built_in_class, tree_node*, tree_node*, bool, bool, bool, tree_node*, bool) 
> [clone .constprop.25] (lto-lang.c:650)
>   by 0x640709: lto_define_builtins(tree_node*, tree_node*) 
> (omp-builtins.def:46)
>   by 0x641EE3: lto_init() (lto-lang.c:1339)
>   by 0x61E26A: toplev::main(int, char**) (toplev.c:1921)
>
>... and many, many more.
>
>> Err, well.  --show-leak-kinds=all is probably the cause.
>
>Before finding this email, I too had convinced myself that everying that
>came by 'ggc_*' I may ignore, because:
>
>> We
>> definitely do not force-release
>> all reachable GC allocated memory at program end.
>
>... of this: these blocks simply had not been GCed at program end.
>
>It's however a bit tedious to filter, in my case, 11864 lines of Valgrind
>output.
>
>> Not sure if
>> valgrind annotations can
>> make that obvious to valgrind.
>
>Or, if that's not feasible (I don't know much about Valgrind...), then
>instead would it help to force a final GC at program end if we're running
>in "valgrind mode"?  If that's a plausible thing to do, would guarding
>that by GCC having been configured with '--enable-valgrind-annotations'
>be OK, or do we need a '--param', or something else?

Well, instead of a final GC we could explicitly release all GC managed memory. 

>> I'm just using --leak-check=full and
>> thus look for
>> unreleased and no longer reachable memory.
>
>ACK, in my case, that only shows seven errors (not related to my stuff).
>
>
>Grüße
> Thomas
>-
>Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
>München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
>Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
>München, HRB 106955



Re: [PATCH] middle-end/AArch64 Fix bootstrap after vec changes

2021-08-06 Thread Martin Sebor via Gcc-patches

On 8/6/21 4:50 AM, Tamar Christina wrote:

Hi All,

The build is broken since a3d3e8c362c2 since it's deleted the ability to pass
vec<> by value and now much be past by reference.

However some language hooks used by AArch64 were not updated and breaks the
build on AArch64.  This patch updates these hooks.


Thanks.  It occurred to only after I committed the change that I should
have also tested a few other targets.  Sorry about that!

Martin



However most of the changes are generic... so I'm sending to a cross section
of approvers.

Bootstrapped aarch64-none-linux-gnu and works again.

Ok for master?

Thanks,
Tamar

gcc/c/ChangeLog:

* c-decl.c (c_simulate_enum_decl): Pass vec<> by pointer.
* c-tree.h (c_simulate_enum_decl): Likewise.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (register_svpattern,
register_svprfop): Pass vec<> by pointer.
* langhooks-def.h (lhd_simulate_enum_decl): Likewise.
* langhooks.c (lhd_simulate_enum_decl): Likewise.
* langhooks.h (struct lang_hooks_for_types): Likewise.

gcc/cp/ChangeLog:

* cp-objcp-common.h (cxx_simulate_enum_decl): Pass vec<> by pointer.
* decl.c (cxx_simulate_enum_decl): Likewise.

--- inline copy of patch --
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 
234ee16fe4afe5baf3490596d27662c7acee8126..221a67fe57be105dfb88f5053179adb62c9cc47d
 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9379,7 +9379,7 @@ build_enumerator (location_t decl_loc, location_t loc,
  
  tree

  c_simulate_enum_decl (location_t loc, const char *name,
- vec values)
+ vec *values_ptr)
  {
location_t saved_loc = input_location;
input_location = loc;
@@ -9389,6 +9389,7 @@ c_simulate_enum_decl (location_t loc, const char *name,
  
tree value_chain = NULL_TREE;

string_int_pair *value;
+  vec values = *values_ptr;
unsigned int i;
FOR_EACH_VEC_ELT (values, i, value)
  {
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 
ab6db3860f5062d5d73e3ffacf123a992b9c6c48..a8a90eae30d54006e83b20cdcc7aa7a582686c6c
 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -595,7 +595,7 @@ extern void finish_function (location_t = input_location);
  extern tree finish_struct (location_t, tree, tree, tree,
   class c_struct_parse_info *);
  extern tree c_simulate_enum_decl (location_t, const char *,
- vec);
+ vec *);
  extern struct c_arg_info *build_arg_info (void);
  extern struct c_arg_info *get_parm_info (bool, tree);
  extern tree grokfield (location_t, struct c_declarator *,
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 
f44f81f13754b2d7f7391086c846ee2f966d54a7..f71b287570e4c8c00149e864db4bf03941382672
 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3499,7 +3499,7 @@ register_svpattern ()
  #undef PUSH
  
acle_svpattern = lang_hooks.types.simulate_enum_decl (input_location,

-   "svpattern", values);
+   "svpattern", &values);
  }
  
  /* Register the svprfop enum.  */

@@ -3513,7 +3513,7 @@ register_svprfop ()
  #undef PUSH
  
acle_svprfop = lang_hooks.types.simulate_enum_decl (input_location,

- "svprfop", values);
+ "svprfop", &values);
  }
  
  /* Implement #pragma GCC aarch64 "arm_sve.h".  */

diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index 
53c6e4c2c8859be51b7bbcc87ce688d35c10f602..f1704aad5578c4132bcee5b3f9799dbacb6fd114
 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -38,7 +38,7 @@ extern bool cp_handle_option (size_t, const char *, 
HOST_WIDE_INT, int,
  location_t, const struct cl_option_handlers *);
  extern tree cxx_make_type_hook(tree_code);
  extern tree cxx_simulate_enum_decl (location_t, const char *,
-   vec);
+   vec *);
  
  /* Lang hooks that are shared between C++ and ObjC++ are defined here.  Hooks

 specific to C++ or ObjC++ go in cp/cp-lang.c and objcp/objcp-lang.c,
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 
6fa6b9adc8778ff641f6a1c6ca9083ee74863331..f626f1e65ee644a494eb57fa5bcb7f1eb05ff667
 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -16408,7 +16408,7 @@ lookup_enumerator (tree enumtype, tree name)
  
  tree

  cxx_simulate_enum_decl (location_t loc, const char *name,
-   vec values)
+   vec *values)
  {
location_t saved_loc = input_location;
input_location = loc;
diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index 
8b744d96fb23af9c16c4860a3ac1c715b292fc24..02b4681d

Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
On 8/6/21 10:02 AM, Segher Boessenkool wrote:
> On Fri, Aug 06, 2021 at 09:47:40AM -0500, Pat Haugen wrote:
>> Add additional checks to verify destination[source] of a load[store]
>> instruction is a register.
> 
>>  * config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a
>>  register.
> 
> No colon before " (".
> 
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
>>if (!pat || pat == NULL_RTX)
>>  return false;
>>  
>> -  if (GET_CODE (pat) == SET)
>> +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
>>  return find_mem_ref (SET_SRC (pat), load_mem);
> 
> So this now falls through if it is a SET of something else than a reg.
> Is that intentional?  If so, this should be in the changelog.
> 

Falling through eventually resulted in returning false, but no, wasn't really 
intentional to change the logic that way. I'll fix up both places to look like 
the following and resubmit.

  if (GET_CODE (pat)== SET) 
if (REG_P (SET_DEST (pat)))
  return find_mem_ref (SET_SRC (pat), load_mem);
else
  return false;

-Pat


Re: [PATCH] c++: suppress all warnings on memper pointers to work around dICE [PR101219]

2021-08-06 Thread Sergei Trofimovich via Gcc-patches
On Thu, 29 Jul 2021 11:41:39 -0400
Jason Merrill  wrote:

> On 7/22/21 7:15 PM, Sergei Trofimovich wrote:
> > From: Sergei Trofimovich 
> > 
> > r12-1804 ("cp: add support for per-location warning groups.") among other
> > things removed warning suppression from a few places including ptrmemfuncs.
> > 
> > Currently ptrmemfuncs don't have valid BINFO attached which causes ICEs
> > in access checks:
> > 
> >  crash_signal
> >  gcc/toplev.c:328
> >  perform_or_defer_access_check(tree_node*, tree_node*, tree_node*, int, 
> > access_failure_info*)
> >  gcc/cp/semantics.c:490
> >  finish_non_static_data_member(tree_node*, tree_node*, tree_node*)
> >  gcc/cp/semantics.c:2208
> >  ...
> > 
> > The change suppresses warnings again until we provide BINFOs for 
> > ptrmemfuncs.  
> 
> We don't need BINFOs for PMFs, we need to avoid paths that expect them.
> 
> It looks like the problem is with tsubst_copy_and_build calling 
> finish_non_static_data_member instead of build_ptrmemfunc_access_expr.

Sounds good. I'm not sure what would be the best way to match it. Here is
my attempt seems to survive all regtests:

--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20530,7 +20530,13 @@ tsubst_copy_and_build (tree t,
if (member == error_mark_node)
  RETURN (error_mark_node);

-   if (TREE_CODE (member) == FIELD_DECL)
+   if (object_type && TYPE_PTRMEMFUNC_P(object_type)
+   && TREE_CODE (member) == FIELD_DECL)
+ {
+   r = build_ptrmemfunc_access_expr (object, DECL_NAME(member));
+   RETURN (r);
+ }
+   else if (TREE_CODE (member) == FIELD_DECL)
  {
r = finish_non_static_data_member (member, object, NULL_TREE);
if (TREE_CODE (r) == COMPONENT_REF)

> > PR c++/101219
> > 
> > gcc/cp/ChangeLog:
> > 
> > * typeck.c (build_ptrmemfunc_access_expr): Suppress all warnings
> > to avoid ICE.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/torture/pr101219.C: New test.  
> 
> This doesn't need to be in torture; it has nothing to do with optimization.

Aha, moved to gcc/testsuite/g++.dg/warn/pr101219.C.

--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/pr101219.C
@@ -0,0 +1,11 @@
+/* PR c++/101219 - ICE on use of uninitialized memfun pointer
+   { dg-do compile }
+   { dg-options "-Wall" } */
+
+struct S { void m(); };
+
+template  bool f() {
+  void (S::*mp)();
+
+  return &S::m == mp; // no warning emitted here (no instantiation)
+}

Another question: Is it expected that gcc generates no warnings here?
It's an uninstantiated function (-1 for warn), but from what I
understand it's guaranteed to generate comparison with uninitialized
data if it ever gets instantiated. Given that we used to ICE in
warning code gcc could possibly flag it? (+1 for warn)

Attached full patch as well. Full 'make check' shows no regressions on
x86_64-linux.

-- 

  Sergei
>From 9c51dbc598d8633167729de9637c8cdb5f3089fe Mon Sep 17 00:00:00 2001
From: Sergei Trofimovich 
Date: Fri, 6 Aug 2021 16:14:16 +0100
Subject: [PATCH v2] c++: fix ptrmemfunc template instantiation [PR101219]

r12-1804 ("cp: add support for per-location warning groups.") among other
things removed warning suppression from a few places including ptrmemfuncs.

This exposed a bug in warning detection code as a reference to missing
BINFO (it's intentionally missing for ptrmemfunc types):

crash_signal
gcc/toplev.c:328
perform_or_defer_access_check(tree_node*, tree_node*, tree_node*, int, access_failure_info*)
gcc/cp/semantics.c:490
finish_non_static_data_member(tree_node*, tree_node*, tree_node*)
gcc/cp/semantics.c:2208
...

The change special cases ptrmemfuncs in templace substitution by using
build_ptrmemfunc_access_expr() instead of finish_non_static_data_member().

PR c++/101219

gcc/cp/ChangeLog:

* pt.c (tsubst_copy_and_build): Use build_ptrmemfunc_access_expr
to construct ptrmemfunc expression instantiation.

gcc/testsuite/ChangeLog:

* g++.dg/warn/pr101219.C: New test.

Signed-off-by: Sergei Trofimovich 
---
 gcc/cp/pt.c  |  8 +++-
 gcc/testsuite/g++.dg/warn/pr101219.C | 11 +++
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/pr101219.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b396ddd0089..c7a0317cbfb 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20530,7 +20530,13 @@ tsubst_copy_and_build (tree t,
 	if (member == error_mark_node)
 	  RETURN (error_mark_node);
 
-	if (TREE_CODE (member) == FIELD_DECL)
+	if (object_type && TYPE_PTRMEMFUNC_P(object_type)
+	&& TREE_CODE (member) == FIELD_DECL)
+	  {
+	r = build_ptrmemfunc_access_expr (object, DECL_NAME(member));
+	RETURN (r);
+	  }
+	else if (TREE_CODE (member) == FIELD_DECL)
 	  {
 	r = finish_non_static_data_member (member, object, NULL_TREE);
 	if (TREE_CODE (r) == COMPONENT_REF)
diff --git a/gcc/testsuite/g++.dg/w

Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Segher Boessenkool
On Fri, Aug 06, 2021 at 10:29:40AM -0500, Pat Haugen wrote:
> On 8/6/21 10:02 AM, Segher Boessenkool wrote:
> >> -  if (GET_CODE (pat) == SET)
> >> +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
> >>  return find_mem_ref (SET_SRC (pat), load_mem);
> > 
> > So this now falls through if it is a SET of something else than a reg.
> > Is that intentional?  If so, this should be in the changelog.
> > 
> 
> Falling through eventually resulted in returning false, but no, wasn't really 
> intentional to change the logic that way. I'll fix up both places to look 
> like the following and resubmit.
> 
>   if (GET_CODE (pat)== SET) 
> if (REG_P (SET_DEST (pat)))
>   return find_mem_ref (SET_SRC (pat), load_mem);
> else
>   return false;

Well, that isn't quiet it either, no?  It returns false for all stores
then?


Segher


Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
On 8/6/21 11:05 AM, Segher Boessenkool wrote:
> On Fri, Aug 06, 2021 at 10:29:40AM -0500, Pat Haugen wrote:
>> On 8/6/21 10:02 AM, Segher Boessenkool wrote:
 -  if (GET_CODE (pat) == SET)
 +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
  return find_mem_ref (SET_SRC (pat), load_mem);
>>>
>>> So this now falls through if it is a SET of something else than a reg.
>>> Is that intentional?  If so, this should be in the changelog.
>>>
>>
>> Falling through eventually resulted in returning false, but no, wasn't 
>> really intentional to change the logic that way. I'll fix up both places to 
>> look like the following and resubmit.
>>
>>   if (GET_CODE (pat)== SET) 
>> if (REG_P (SET_DEST (pat)))
>>   return find_mem_ref (SET_SRC (pat), load_mem);
>> else
>>   return false;
> 
> Well, that isn't quiet it either, no?  It returns false for all stores
> then?

Yeah, I didn't word that clearly, meant I'd fix up both appropriately with the 
inner if-else logic.

Here's what it looks like now.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..ad856254f52 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18357,8 +18357,11 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_SRC (pat), load_mem);
+  if (GET_CODE (pat)== SET)
+if (REG_P (SET_DEST (pat)))
+  return find_mem_ref (SET_SRC (pat), load_mem);
+else
+  return false;
 
   if (GET_CODE (pat) == PARALLEL)
 {
@@ -18395,7 +18398,10 @@ is_store_insn1 (rtx pat, rtx *str_mem)
 return false;
 
   if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_DEST (pat), str_mem);
+if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat)))
+  return find_mem_ref (SET_DEST (pat), str_mem);
+else
+  return false;
 
   if (GET_CODE (pat) == PARALLEL)
 {

Unless I'm totally not understanding the point you're trying to get across to 
me.

-Pat


Re: [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si

2021-08-06 Thread Segher Boessenkool
On Fri, Aug 06, 2021 at 08:10:05AM -0500, Bill Schmidt wrote:
> On 8/4/21 9:06 PM, Kewen.Lin wrote:
> >-   UNSPEC_VUPKHUB
> >-   UNSPEC_VUPKHUH
> >-   UNSPEC_VUPKLUB
> >-   UNSPEC_VUPKLUH
> >+   UNSPEC_VUPKHUBHW
> >+   UNSPEC_VUPKLUBHW
> 
> Up to you, but... maybe just UNSPEC_VUPKHU and UNSPEC_VUPKLU, in case we 
> extend this later to other types.

Yes please.  The "BHW" isn't useful.

> Otherwise the patch looks fine to me.  Recommend maintainers approve 
> with or without changes.

With.  I'll reply to Ke Wen's mail separately, your reply is whitespace
damaged (format=flawed it looks like).


Segher


[PATCH, rs6000 V2] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
Add additional checks to verify destination[source] of a load[store]
instruction is a register.

Modified code based on review comment to not change general logic of the flow. 
Braces needed on inner if-else to prevent error during bootstrap for ambiguous 
'else'.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for master?

-Pat


2021-08-07  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000.c (is_load_insn1): Verify destination is a
register.
(is_store_insn1): Verify source is a register.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..c8a146a7d18 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18357,8 +18357,13 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_SRC (pat), load_mem);
+  if (GET_CODE (pat)== SET)
+{
+  if (REG_P (SET_DEST (pat)))
+   return find_mem_ref (SET_SRC (pat), load_mem);
+  else
+   return false;
+}
 
   if (GET_CODE (pat) == PARALLEL)
 {
@@ -18395,7 +18400,12 @@ is_store_insn1 (rtx pat, rtx *str_mem)
 return false;
 
   if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_DEST (pat), str_mem);
+{
+  if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat)))
+   return find_mem_ref (SET_DEST (pat), str_mem);
+  else
+   return false;
+}
 
   if (GET_CODE (pat) == PARALLEL)
 {


Re: Go patch committed: Be strict about escape analysis of builtin functions

2021-08-06 Thread Ian Lance Taylor via Gcc-patches
On Wed, Aug 4, 2021 at 9:24 PM Ian Lance Taylor  wrote:
>
> This Go frontend patch by Cherry Mui makes the escape analysis pass
> stricter about builtin functions  In the places where we handle
> builtin functions, list all supported ones, and fail if an unexpected
> one is seen. So if a new builtin function is added in the future we
> can detect it, instead of silently treating it as nonescaping.
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

Here is a follow-on patch by Cherry Mui that does the same thing for
runtime functions.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
1157b8141443caed32f0fb7ea40aa74a1ba36fac
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 19ab2de5c18..9ed527f7eb4 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-32590102c464679f845667b5554e1dcce2549ad2
+747f3a2d78c073e9b03dd81914d0edb7ddc5be14
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/escape.cc b/gcc/go/gofrontend/escape.cc
index c8978ac9239..6da29edc419 100644
--- a/gcc/go/gofrontend/escape.cc
+++ b/gcc/go/gofrontend/escape.cc
@@ -1646,6 +1646,7 @@ Escape_analysis_assign::expression(Expression** pexpr)
  case Runtime::MAKECHAN:
  case Runtime::MAKECHAN64:
  case Runtime::MAKEMAP:
+ case Runtime::MAKEMAP64:
  case Runtime::MAKESLICE:
  case Runtime::MAKESLICE64:
 this->context_->track(n);
@@ -1705,8 +1706,52 @@ Escape_analysis_assign::expression(Expression** pexpr)
 }
 break;
 
+  case Runtime::MEMCMP:
+  case Runtime::DECODERUNE:
+  case Runtime::INTSTRING:
+  case Runtime::MAKEMAP_SMALL:
+  case Runtime::MAPACCESS1:
+  case Runtime::MAPACCESS1_FAST32:
+  case Runtime::MAPACCESS1_FAST64:
+  case Runtime::MAPACCESS1_FASTSTR:
+  case Runtime::MAPACCESS1_FAT:
+  case Runtime::MAPACCESS2:
+  case Runtime::MAPACCESS2_FAST32:
+  case Runtime::MAPACCESS2_FAST64:
+  case Runtime::MAPACCESS2_FASTSTR:
+  case Runtime::MAPACCESS2_FAT:
+  case Runtime::MAPASSIGN_FAST32:
+  case Runtime::MAPASSIGN_FAST64:
+  case Runtime::MAPITERINIT:
+  case Runtime::MAPITERNEXT:
+  case Runtime::MAPCLEAR:
+  case Runtime::CHANRECV2:
+  case Runtime::SELECTGO:
+  case Runtime::SELECTNBSEND:
+  case Runtime::SELECTNBRECV:
+  case Runtime::BLOCK:
+  case Runtime::IFACET2IP:
+  case Runtime::EQTYPE:
+  case Runtime::MEMCLRHASPTR:
+  case Runtime::FIELDTRACK:
+  case Runtime::BUILTIN_MEMSET:
+  case Runtime::PANIC_SLICE_CONVERT:
+// these do not escape.
+break;
+
+  case Runtime::IFACEE2E2:
+  case Runtime::IFACEI2E2:
+  case Runtime::IFACEE2I2:
+  case Runtime::IFACEI2I2:
+  case Runtime::IFACEE2T2P:
+  case Runtime::IFACEI2T2P:
+// handled in ::assign.
+break;
+
  default:
-   break;
+// should not see other runtime calls. they are not yet
+// lowered to runtime calls at this point.
+go_unreachable();
  }
  }
 else


Re: [PATCH] libcpp: For C++23 treat UCNs and UTF-8 chars not valid in identifiers as separate tokens

2021-08-06 Thread Joseph Myers
On Fri, 6 Aug 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Aug 06, 2021 at 11:53:56AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Actually, there is another change in P1949R7 that I haven't touched
> > in the patch and not sure what the implications are.
> > 
> > To the preprocessing-token non-terminal it adds
> > each universal-character-name that cannot be one of the above
> > and changes the following paragraph:
> >  ...
> >  preprocessing operators and punctuators, and single
> > +universal-character-names and
> >  non-whitespace characters that do not lexically match the other
> >  preprocessing token categories.
> > +If a single universal-character-name does not match any of the other
> > +preprocessing token categories, the program is ill-formed.
> >  If a ' or a " character matches the last category, the behavior
> >  is undefined.
> >  ...
> 
> If the above (and identifier-start and identifier-continue non-terminals
> only mentioning XID_Start+0x5F and XID_Continue UCNs) means that we should
> indeed put each such UTF-8 char or UCN into a separate CPP_OTHER token
> for C++23, then we need something like this incremental patch.
> The drawback is worse diagnostics though, so maybe it would be useful if
> the cpp_error that ... is not valid in an identifier or is not
> valid at the start of an identifier would be emitted as a warning (and not
> warn when skipping)?

It's not clear to me that this change to the standard actually requires 
any change in how GCC behaves.  A UCN (or character considered to be 
converted to a UCN) that's not valid in identifiers is still invalid in a 
context where an identifier preprocessing token could occur (including in 
#if 0), whether it's interpreted as a "single UCN" preprocessing token 
(stated to be ill-formed) or (part of) an invalid identifier preprocessing 
token.

-- 
Joseph S. Myers
jos...@codesourcery.com


[r12-2729 Regression] FAIL: g++.dg/cpp2a/concepts-pr67774.C -std=c++2a (test for excess errors) on Linux/x86_64

2021-08-06 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

240b01b0215f9e46ecf04267c8a3faeb19d4fe3c is the first bad commit
commit 240b01b0215f9e46ecf04267c8a3faeb19d4fe3c
Author: Jonathan Wakely 
Date:   Tue Aug 3 18:06:27 2021 +0100

libstdc++: Add [[nodiscard]] to iterators and related utilities

caused

FAIL: g++.dg/cpp2a/concepts-pr67774.C  -std=c++2a (test for excess errors)

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/concepts-pr67774.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/concepts-pr67774.C --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/concepts-pr67774.C 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/concepts-pr67774.C --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r12-2733 Regression] FAIL: gcc.target/i386/vect-gather-1.c scan-tree-dump vect "loop vectorized" on Linux/x86_64

2021-08-06 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

31855ba6b16cd138d7484076a08cd40d609654b8 is the first bad commit
commit 31855ba6b16cd138d7484076a08cd40d609654b8
Author: Richard Biener 
Date:   Thu Jul 29 14:14:48 2021 +0200

Add emulated gather capability to the vectorizer

caused

FAIL: gcc.target/i386/vect-gather-1.c scan-tree-dump vect "loop vectorized"

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-gather-1.c 
--target_board='unix{-m32}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r12-2730 Regression] FAIL: g++.old-deja/g++.other/inline7.C -std=gnu++2a (test for excess errors) on Linux/x86_64

2021-08-06 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

0d04fe49239d91787850036599164788f1c87785 is the first bad commit
commit 0d04fe49239d91787850036599164788f1c87785
Author: Jonathan Wakely 
Date:   Tue Aug 3 20:50:52 2021 +0100

libstdc++: Add [[nodiscard]] to sequence containers

caused

FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++17 (test for excess errors)
FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++2a (test for excess errors)

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="old-deja.exp=g++.old-deja/g++.other/inline7.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="old-deja.exp=g++.old-deja/g++.other/inline7.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="old-deja.exp=g++.old-deja/g++.other/inline7.C 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="old-deja.exp=g++.old-deja/g++.other/inline7.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r12-2789 Regression] FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0 loops" 1 on Linux/x86_64

2021-08-06 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

f31da42e047e8018ca6ad9809273bc7efb6ffcaf is the first bad commit
commit f31da42e047e8018ca6ad9809273bc7efb6ffcaf
Author: Richard Biener 
Date:   Fri Aug 6 14:39:05 2021 +0200

tree-optimization/101801 - remove vect_worthwhile_without_simd_p

caused

FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0 
loops" 1

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/gen-vect-11b.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/gen-vect-11b.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r12-2766 Regression] FAIL: g++.dg/warn/Wstringop-overflow-6.C -std=gnu++2a (test for excess errors) on Linux/x86_64

2021-08-06 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

7b1de3eb9ed3f8dde54732d88520292c5ad1157d is the first bad commit
commit 7b1de3eb9ed3f8dde54732d88520292c5ad1157d
Author: Jonathan Wakely 
Date:   Thu Aug 5 13:34:00 2021 +0100

libstdc++: Move attributes that follow requires-clauses [PR101782]

caused

FAIL: g++.dg/warn/Wstringop-overflow-6.C  -std=gnu++2a (test for excess errors)

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] PR fortran/68568 - ICE with automatic character object and save, in combination with some options

2021-08-06 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

I intend to commit the following pretty obvious fix within the next
24 hours unless there are objections.

Regtested on x86_64-pc-linux-gnu.

Thanks,
Harald

Fortran: ICE with automatic character object, save, and various options

gcc/fortran/ChangeLog:

PR fortran/68568
* primary.c (gfc_expr_attr): Variable attribute can only be
inquired when symtree is non-NULL.

gcc/testsuite/ChangeLog:

PR fortran/68568
* gfortran.dg/pr68568.f90: New test.

diff --git a/gcc/fortran/primary.c b/gcc/fortran/primary.c
index 9fe8d1ee20c..56a78d6f89f 100644
--- a/gcc/fortran/primary.c
+++ b/gcc/fortran/primary.c
@@ -2779,7 +2779,7 @@ gfc_expr_attr (gfc_expr *e)
 	   && e->value.function.isym->transformational
 	   && e->ts.type == BT_CLASS)
 	attr = CLASS_DATA (e)->attr;
-  else
+  else if (e->symtree)
 	attr = gfc_variable_attr (e, NULL);

   /* TODO: NULL() returns pointers.  May have to take care of this
diff --git a/gcc/testsuite/gfortran.dg/pr68568.f90 b/gcc/testsuite/gfortran.dg/pr68568.f90
new file mode 100644
index 000..eab5744b363
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr68568.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options "-finit-character=7 -Wsurprising" }
+! PR fortran/68568 - ICE with automatic character object, save, and options
+
+subroutine s(n) ! { dg-error "non-constant initialization expression" }
+  integer, intent(in) :: n
+  character(len=n) :: x
+  save
+  x = 'a'
+end


Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-08-06 Thread Victor Tong via Gcc-patches
Thanks for the feedback. Here's the updated pattern:

  /* X - (X - Y) --> Y */
  (simplify
(minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
(if (ANY_INTEGRAL_TYPE_P (type)
&& TYPE_OVERFLOW_UNDEFINED(type)
&& !TYPE_OVERFLOW_SANITIZED(type)
&& ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
&& TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
&& !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
&& ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
&& TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
&& !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
&& TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
&& TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
(convert @1)))

I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns that 
leverage undefined overflows check for it. I think this new pattern shouldn't 
be applied if overflow sanitizer checks are enabled.

>> why is this testing TREE_TYPE (@0)?

I'm checking the type of @0 because I'm concerned that there could be a case 
where @0's type isn't an integer type with undefined overflow. I tried creating 
a test case and couldn't seem to create one where this is violated but I kept 
the checks to avoid causing a regression. If I'm being overcautious and you 
feel that the type checks on @0 aren't needed, I can remove them. I think the 
precision check on TREE_TYPE(@0) is needed to avoid truncation cases though.

>> Once we'd "inline" nop_convert genmatch would complain
about this.

Is someone working on inlining nop_convert? I'd like to avoid breaking someone 
else's work if that's being worked on right now.

I've also added some extra tests to cover this new pattern. I've attached a 
patch with my latest changes.


From: Richard Biener 
Sent: Wednesday, July 28, 2021 2:59 AM
To: Victor Tong 
Cc: Marc Glisse ; gcc-patches@gcc.gnu.org 

Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division 
followed by multiply [PR95176] 
 
On Tue, Jun 29, 2021 at 1:10 AM Victor Tong  wrote:
>
> Thanks Richard and Marc.
>
> I wrote the following test case to compare the outputs of fn1() and 
> fn1NoOpt() below with my extra pattern being applied. I tested the two 
> functions with all of the integers from INT_MIN to INT_MAX.
>
> long
> fn1 (int x)
> {
>   return 42L - (long)(42 - x);
> }
>
> #pragma GCC push_options
> #pragma GCC optimize ("O0")
> long
> fn1NoOpt (int x)
> {
>   volatile int y = (42 - x);
>   return 42L - (long)y;
> }
> #pragma GCC pop_options
>
> int main ()
> {
> for (long i=INT_MIN; i<=INT_MAX;i++)
> {
> auto valNoOpt = fn1NoOpt(i);
> auto valOpt = fn1(i);
> if (valNoOpt != valOpt)
> printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, 
>valNoOpt);
> }
> return 0;
> }
>
> I saw that the return values of fn1() and fn1NoOpt() differed when the input 
> was between INT_MIN and INT_MIN+42 inclusive. When passing values in this 
> range to fn1NoOpt(), a signed overflow is triggered which causes the value to 
> differ (undefined behavior). This seems to go in line with what Marc 
> described and I think the transformation is correct in the scenario above. I 
> do think that type casts that result in truncation (i.e. from a higher 
> precision to a lower one) or with unsigned types will result in an incorrect 
> transformation so those scenarios need to be avoided.
>
> Given that the extra pattern I'm adding is taking advantage the undefined 
> behavior of signed integer overflow, I'm considering keeping the existing 
> nop_convert pattern in place and adding a new pattern to cover these new 
> cases. I'd also like to avoid touching nop_convert given that it's used in a 
> number of other patterns.
>
> This is the pattern I have currently:
>
>   (simplify
> (minus (convert1? @0) (convert2? (minus (convert3? @2) @1)))
> (if (operand_equal_p(@0, @2, 0)

The operand_equal_p should be reflected by using @0 in place of @2.

> && INTEGRAL_TYPE_P (type)
> && TYPE_OVERFLOW_UNDEFINED(type)
> && !TYPE_OVERFLOW_SANITIZED(type)
> && INTEGRAL_TYPE_P (TREE_TYPE(@1))
> && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
> && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1))
> && !TYPE_UNSIGNED (TREE_TYPE (@1))
> && !TYPE_UNSIGNED (type)

please group checks on common argument.  I think a single test
on INTEGRAL_TYPE_P (type) is enough, it could be ANY_INTEGRAL_TYPE_P
to include vector and complex types.

> && TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type)
> && INTEGRAL_TYPE_P (TREE_TYPE(@0))
> && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))

why is this testing TREE_TYPE (@0)?  The outer subtract is using 'type',
the inner subtract uses TREE_TYPE (@1) though you could place
a capture on the minus to make what you test more obvious, like

  (minus (convert1? @0) (convert2? (minus@3 (convert3? @2) @1)))

TYPE_OVERFLOW_

Re: [PATCH, rs6000 V2] Add additional checks when identifying load/store instructions

2021-08-06 Thread Segher Boessenkool
On Fri, Aug 06, 2021 at 02:07:10PM -0500, Pat Haugen wrote:
> Add additional checks to verify destination[source] of a load[store]
> instruction is a register.

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -18357,8 +18357,13 @@ is_load_insn1 (rtx pat, rtx *load_mem)
>if (!pat || pat == NULL_RTX)
>  return false;
>  
> -  if (GET_CODE (pat) == SET)
> -return find_mem_ref (SET_SRC (pat), load_mem);
> +  if (GET_CODE (pat)== SET)

(space before ==)

Okay for trunk with that.  Thanks!


Segher


Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-08-06 Thread Marc Glisse

On Fri, 6 Aug 2021, Victor Tong wrote:


Thanks for the feedback. Here's the updated pattern:

 /* X - (X - Y) --> Y */
 (simplify
   (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1)))
   (if (ANY_INTEGRAL_TYPE_P (type)
   && TYPE_OVERFLOW_UNDEFINED(type)
   && !TYPE_OVERFLOW_SANITIZED(type)
   && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2))
   && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2))
   && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2))
   && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0))
   && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0))
   && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0))
   && TYPE_PRECISION (TREE_TYPE (@2)) <= TYPE_PRECISION (type)
   && TYPE_PRECISION (TREE_TYPE (@0)) <= TYPE_PRECISION (type))
   (convert @1)))

I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns that 
leverage undefined overflows check for it. I think this new pattern shouldn't 
be applied if overflow sanitizer checks are enabled.


why is this testing TREE_TYPE (@0)?


I'm checking the type of @0 because I'm concerned that there could be a case 
where @0's type isn't an integer type with undefined overflow. I tried creating 
a test case and couldn't seem to create one where this is violated but I kept 
the checks to avoid causing a regression. If I'm being overcautious and you 
feel that the type checks on @0 aren't needed, I can remove them. I think the 
precision check on TREE_TYPE(@0) is needed to avoid truncation cases though.


It doesn't matter if @0 has undefined overflow, but it can matter that it 
be signed (yes, the 2 are correlated...) if it has the same precision as 
@2. Otherwise (int64_t)(-1u)-(int64_t)((int)(-1u)-0) is definitely not 0 
and it has type:int64_t, @2:int, @0:unsigned.


Ignoring the sanitizer, the confusing double matching of constants, and 
restricting to scalars, I think the tightest condition (without vrp) that 
works is


TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)) ||
 TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2)) &&
  (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2)) ||
   TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@2)) &&
   !TYPE_UNSIGNED (TREE_TYPE (@0))
  )

(where implicitly undefined => signed) but of course it is ok to handle
only a subset. It is too late for me to think about @0 vs @@0.

The patch you posted seems to accept the wrong 
(int128t)LLONG_MAX-(int128_t)((int)LLONG_MAX-0) -> (int128_t)0


Using ANY_INTEGRAL_TYPE_P allows vectors, and TYPE_PRECISION mustn't be 
used for vectors (maybe we should add more checking to catch such uses), 
and they may not be very happy with convert either...



Once we'd "inline" nop_convert genmatch would complain about this.


Maybe the new transform could be about scalars, and we could restrict the 
old one to vectors, to simplify the code, but that wouldn't help genmatch 
with the fact that the pattern x-(x-y) would appear twice. Is it really 
that bad? It is suspicious, but can be justified.



Is someone working on inlining nop_convert? I'd like to avoid breaking someone 
else's work if that's being worked on right now.

I've also added some extra tests to cover this new pattern. I've attached a 
patch with my latest changes.


From: Richard Biener 
Sent: Wednesday, July 28, 2021 2:59 AM
To: Victor Tong 
Cc: Marc Glisse ; gcc-patches@gcc.gnu.org 

Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] 
 

On Tue, Jun 29, 2021 at 1:10 AM Victor Tong  wrote:


Thanks Richard and Marc.

I wrote the following test case to compare the outputs of fn1() and fn1NoOpt() 
below with my extra pattern being applied. I tested the two functions with all 
of the integers from INT_MIN to INT_MAX.

long
fn1 (int x)
{
   return 42L - (long)(42 - x);
}

#pragma GCC push_options
#pragma GCC optimize ("O0")
long
fn1NoOpt (int x)
{
   volatile int y = (42 - x);
   return 42L - (long)y;
}
#pragma GCC pop_options

int main ()
{
 for (long i=INT_MIN; i<=INT_MAX;i++)
 {
 auto valNoOpt = fn1NoOpt(i);
 auto valOpt = fn1(i);
 if (valNoOpt != valOpt)
 printf("valOpt=%ld, valNoOpt=%ld\n", valOpt, valNoOpt);
 }
 return 0;
}

I saw that the return values of fn1() and fn1NoOpt() differed when the input 
was between INT_MIN and INT_MIN+42 inclusive. When passing values in this range 
to fn1NoOpt(), a signed overflow is triggered which causes the value to differ 
(undefined behavior). This seems to go in line with what Marc described and I 
think the transformation is correct in the scenario above. I do think that type 
casts that result in truncation (i.e. from a higher precision to a lower one) 
or with unsigned types will result in an incorrect transformation so those 
scenarios need to be avoided.

Given that the extra pattern I'm adding is taking advantage the undefined 
behavior of signed integer overflow, I'm considering keeping the 

Re: [PATCH 03/34] rs6000: Add the rest of the [altivec] stanza to the builtins file

2021-08-06 Thread Segher Boessenkool
Hi!

On Thu, Jul 29, 2021 at 08:30:50AM -0500, Bill Schmidt wrote:
> +  const vsc __builtin_altivec_abss_v16qi (vsc);
> +ABSS_V16QI altivec_abss_v16qi {}
> +
> +  const vsi __builtin_altivec_abss_v4si (vsi);
> +ABSS_V4SI altivec_abss_v4si {}
> +
> +  const vss __builtin_altivec_abss_v8hi (vss);
> +ABSS_V8HI altivec_abss_v8hi {}

Is there any ordering used here?  What is it, then?  Just alphabetical?

That order does not really allow breaking things up into groups, which
is the main tool to keep things manageable.

> +  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, 
> signed char, signed char, signed char, signed char, signed char, signed char, 
> signed char, signed char, signed char, signed char, signed char, signed char, 
> signed char);

That is a very long line, can you do something about that, or is that
forced by the file format?  Can you use just "char"?  "signed char" is a
very strange choice.

> +  pcvoid_type_node
> += build_pointer_type (build_qualified_type (void_type_node,
> + TYPE_QUAL_CONST));

A const void?  Interesting.  You are building a pointer to a const void
here, not a const pointer to void.  Is that what you wanted?

(And yes I do realise this is just moved, not new code).

> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2460,6 +2460,7 @@ enum rs6000_builtin_type_index
>RS6000_BTI_const_str,   /* pointer to const char * */
>RS6000_BTI_vector_pair, /* unsigned 256-bit types (vector pair).  */
>RS6000_BTI_vector_quad, /* unsigned 512-bit types (vector quad).  */
> +  RS6000_BTI_const_ptr_void, /* const pointer to void */
>RS6000_BTI_MAX
>  };

That is not what
  build_pointer_type (build_qualified_type (void_type_node, TYPE_QUAL_CONST));
builds though?

Okay for trunk, but please look at those things, especially the pcvoid
one!  Thanks,


Segher


Re: [r12-2789 Regression] FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0 loops" 1 on Linux/x86_64

2021-08-06 Thread Jeff Law via Gcc-patches




On 8/6/2021 2:59 PM, sunil.k.pandey via Gcc-patches wrote:

On Linux/x86_64,

f31da42e047e8018ca6ad9809273bc7efb6ffcaf is the first bad commit
commit f31da42e047e8018ca6ad9809273bc7efb6ffcaf
Author: Richard Biener 
Date:   Fri Aug 6 14:39:05 2021 +0200

 tree-optimization/101801 - remove vect_worthwhile_without_simd_p

caused

FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect "vectorized 0 
loops" 1
Note I'm seeing this on some of the embedded targets as well.  Just to 
pick one, xstormy16-elf.  There are certainly others though.


jeff