[committed] gomp/declare-variant-1*.f90: Update for Windows

2023-01-27 Thread Tobias Burnus

Tested on x86_64-gnu-linux with -m32 and -m64. It was discussed on
#gfortran IRC and tested with MinGW64 with/by nightstrike.

Committed to mainline.

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Registry court:
München, HRB 106955
commit d1e0575fdc9216f96c4f88f9f41a25b854300c0b
Author: Tobias Burnus 
Date:   Fri Jan 27 09:13:16 2023 +0100

gomp/declare-variant-1*.f90: Update for Windows

Replace target selector 'lp64' by '! ilp32' to handle
Windows which uses 32bit long (and vice versa for '! lp64').

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/declare-variant-10.f90: Update scan-tree's
target selector to handle Windows.
* gfortran.dg/gomp/declare-variant-11.f90: Likewise.
* gfortran.dg/gomp/declare-variant-12.f90: Likewise.

diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
index d6d2c8c262b..2f09146a10d 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-10.f90
@@ -72,2 +72,2 @@ contains
-  call f04 ()	! { dg-final { scan-tree-dump-times "f03 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
-			! { dg-final { scan-tree-dump-times "f04 \\\(\\\);" 1 "gimple" { target { { ! lp64 } || { ! { i?86-*-* x86_64-*-* } } } } } }
+  call f04 () ! { dg-final { scan-tree-dump-times "f03 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+  ! { dg-final { scan-tree-dump-times "f04 \\\(\\\);" 1 "gimple" { target { { ilp32 } || { ! { i?86-*-* x86_64-*-* } } } } } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
index 60aa0fcb3b0..3593c9a5bb3 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-11.f90
@@ -129,2 +129,2 @@ contains
-call f27 ()	! { dg-final { scan-tree-dump-times "f25 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
-		! { dg-final { scan-tree-dump-times "f24 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } }
+call f27 () ! { dg-final { scan-tree-dump-times "f25 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+! { dg-final { scan-tree-dump-times "f24 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ilp32 } } } } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
index 610693e9807..2fd8abd0dc7 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-12.f90
@@ -136,2 +136,2 @@ contains
-	  call f13 ()	! { dg-final { scan-tree-dump-times "f09 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && lp64 } } } }
-			! { dg-final { scan-tree-dump-times "f11 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! lp64 } } } } }
+  call f13 ()   ! { dg-final { scan-tree-dump-times "f09 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ! ilp32 } } } } }
+! { dg-final { scan-tree-dump-times "f11 \\\(\\\);" 1 "gimple" { target { { i?86-*-* x86_64-*-* } && { ilp32 } } } } }
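
Editorial note, not part of the patch: the selector change is easier to follow with the three data models side by side. The sketch below classifies a target from the sizes of int, long and pointers, using the DejaGnu effective-target names ilp32, lp64 and llp64. The function names are illustrative assumptions and do not exist in the testsuite.

```cpp
#include <cstddef>
#include <string>

// Classify a C data model from the sizes (in bytes) of int, long and pointers.
// The returned names mirror the effective-target keywords ilp32, lp64, llp64.
std::string data_model(std::size_t int_bytes, std::size_t long_bytes,
                       std::size_t pointer_bytes)
{
  if (int_bytes == 4 && long_bytes == 4 && pointer_bytes == 4)
    return "ilp32";   // e.g. x86 with -m32
  if (int_bytes == 4 && long_bytes == 8 && pointer_bytes == 8)
    return "lp64";    // e.g. x86_64 GNU/Linux with -m64
  if (int_bytes == 4 && long_bytes == 4 && pointer_bytes == 8)
    return "llp64";   // e.g. 64-bit Windows (MinGW64): long stays 32-bit
  return "other";
}

// Old selector: the 64-bit expectation was checked only on lp64 targets.
bool old_selector_64bit(const std::string &model) { return model == "lp64"; }

// New selector: every target that is not ilp32 gets the 64-bit expectation.
bool new_selector_64bit(const std::string &model) { return model != "ilp32"; }
```

With these definitions an llp64 target is skipped by the old selector but matched by the new one, which is the Windows case the commit message describes.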


Re: [PATCH] tree: Fix up tree_code_{length,type}

2023-01-27 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 27, 2023 at 07:42:39AM +, Richard Biener wrote:
> > BTW, wonder if tree_code_type couldn't be an array of unsigned char
> > elements rather than enum tree_code_class and we'd then cast it
> > to the enum in the macro, that would shrink that array from 1496 bytes
> > to 374.  Of course, that sounds like stage1 material.
> 
> One could argue the same way for this patch (and instead revert),

Well, this patch is in fact a conditional reversion (revert for
C++11/14, add one keyword to 2 declarations otherwise).

> I'd say if we tweak this now then tweak it to the maximum extent?
> Isn't sth like 'enum unsigned char tree_code_class' now possible?
> (and a static assert the enum values all fit, though that would
> be diagnosed anyway?)

C++11 indeed has
enum tree_code_class : unsigned char {
  tcc_exceptional,
  ...
  tcc_expression
};
and one indeed gets an error if some enumerator doesn't fit.
The problem I see with this is that the type is 8-bit everywhere,
which I'd be afraid could cause worse code generation (of course,
one would need to try to see how much; e.g. build the compiler
unmodified, with the unsigned char array plus explicit casts from
the array and finally with unsigned char as underlying type).
When passing around enum tree_code_class etc., it is fine if it
is 32-bit.  And there isn't a way to create an enum with different
underlying type but with the same enumerators as in another enum.
Perhaps for tree_code_class we could get away with the underlying type
because it is mostly used in the macros which immediately compare
it, in gcc/*.cc just in the following explicitly:
expr.cc:get_def_for_expr_class (tree name, enum tree_code_class tclass)
fold-const.cc:  enum tree_code_class tclass;
fold-const.cc:  enum tree_code_class tclass = TREE_CODE_CLASS (code);
fold-const.cc:  enum tree_code_class tclass = TREE_CODE_CLASS (code);
fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
gimple-fold.cc:  enum tree_code_class kind = TREE_CODE_CLASS (subcode);
print-tree.cc:  enum tree_code_class tclass;
print-tree.cc:  enum tree_code_class tclass;
tree.cc:   These must correspond to the tree_code_class entries.  */
tree.cc:const char *const tree_code_class_strings[] =
tree.cc:  enum tree_code_class type = TREE_CODE_CLASS (code);
tree.cc:  enum tree_code_class type = TREE_CODE_CLASS (code);
tree.cc:tree_class_check_failed (const_tree node, const enum tree_code_class cl,
tree.cc:tree_not_class_check_failed (const_tree node, const enum tree_code_class cl,
tree.cc:  const enum tree_code_class c = TREE_CODE_CLASS (TREE_CODE (t));
tree.cc:  const enum tree_code_class c = TREE_CODE_CLASS (TREE_CODE (t));
tree-dump.cc:  enum tree_code_class code_class;
tree-inline.cc:  enum tree_code_class cl = TREE_CODE_CLASS (code);
tree-pretty-print.cc:   enum tree_code_class tclass;
tree-ssa-live.cc:  enum tree_code_class c = TREE_CODE_CLASS (TREE_CODE (t));
tree-ssa-operands.cc:  enum tree_code_class codeclass;
But as I said, one would need to watch for code generation at least on
a couple of common hosts, and while x86_64 should be one of them, it might
have bigger effects on others as x86 has byte comparison etc. instructions.
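
Editorial note: the two alternatives discussed above can be compared in a small sketch. The enum and type names below are hypothetical stand-ins with shortened enumerator lists, not the real GCC declarations. The element count 374 is derived from the 1496-byte figure, assuming a 4-byte enum.

```cpp
#include <cstddef>

// Hypothetical stand-ins for enum tree_code_class (enumerator list shortened).
enum wide_class { wc_exceptional, wc_constant, wc_expression };

// Alternative A: a fixed 8-bit underlying type (C++11).  The type is then
// 8-bit everywhere it is used, not only in the table.
enum narrow_class : unsigned char { nc_exceptional, nc_constant, nc_expression };

constexpr std::size_t num_codes = 374;

// Size of a table of the plain enum; typically 374 * 4 = 1496 bytes,
// but the size of an enum without a fixed underlying type is
// implementation-defined.
constexpr std::size_t wide_table_bytes = num_codes * sizeof (wide_class);

// Alternative B: keep the enum as is, store unsigned char in the table and
// cast back to the enum on lookup (what a macro could do).
struct byte_table
{
  unsigned char data[num_codes];

  wide_class lookup (std::size_t code) const
  { return static_cast<wide_class> (data[code]); }
};

static_assert (sizeof (narrow_class) == 1, "fixed underlying type is one byte");
static_assert (sizeof (byte_table) == num_codes, "one byte per tree code");
```

Alternative B shrinks only the table, while values passed around as the enum keep their usual width; alternative A narrows every use, which is the code generation concern raised above.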

> 
> > 2023-01-26  Patrick Palka  
> > Jakub Jelinek  
> > 
> > * tree-core.h (tree_code_type, tree_code_length): For
> > C++17 and later, add inline keyword, otherwise don't define
> > the arrays, but declare extern arrays.
> > * tree.cc (tree_code_type, tree_code_length): Define these
> > arrays for C++14 and older.
> > 
> > --- gcc/tree-core.h.jj  2023-01-02 09:32:31.188158094 +0100
> > +++ gcc/tree-core.h 2023-01-26 16:02:34.212113251 +0100
> > @@ -2284,17 +2284,20 @@ struct floatn_type_info {
> >  /* Matrix describing the structures contained in a given tree code.  */
> >  extern bool tree_contains_struct[MAX_TREE_CODES][64];
> >  
> > +/* Class of tree given its code.  */
> > +#if __cpp_inline_variables >= 201606L
> >  #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> >  #define END_OF_BASE_TREE_CODES tcc_exceptional,
> >  
> > -
> > -/* Class of tree given its code.  */
> > -constexpr enum tree_code_class tree_code_type[] = {
> > +constexpr inline enum tree_code_class tree_code_type[] = {
> >  #include "all-tree.def"
> >  };
> 
> Do we need an explicit external definition somewhere when
> constant folding isn't possible?

> 
> Otherwise looks good to me.
> 
> Thanks,
> Richard.
> 
> >  #undef DEFTREECODE
> >  #undef END_OF_BASE_TREE_CODES
> > +#else
> > +extern const enum tree_code_class tree_code_type[];

There is one here for the C++11 and C++14 cases.
For C++17 and later it isn't needed,
constexpr inline enum tree_code_class tree_code_type[] = {
...
};
means this is a comdat variable in all TUs which need non-ODR
uses

[PATCH] cgraph: Adjust verify_corresponds_to_fndecl [PR106061]

2023-01-27 Thread Jakub Jelinek via Gcc-patches
Hi!

IPA passes redirect some calls in what they determine to be unreachable code
to builtin_decl_unreachable.  But that function sometimes returns
builtin_decl_explicit (BUILT_IN_UNREACHABLE) (which is what GCC 12
and earlier always did), and sometimes builtin_decl_explicit (BUILT_IN_TRAP)
(e.g. for -funreachable-traps, -O0, -Og).
Now, the cgraph verification code has code to verify cgraph edges
and has an exception there for these redirections to BUILT_IN_UNREACHABLE,
but doesn't have one for BUILT_IN_TRAP, so e.g. the following testcase
ICEs during that verification.

The following patch just adds BUILT_IN_TRAP to those exceptions.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-01-27  Jakub Jelinek  

PR ipa/106061
* cgraph.cc (cgraph_edge::verify_corresponds_to_fndecl): Allow
redirection of calls to __builtin_trap in addition to redirection
to __builtin_unreachable.

* gcc.dg/pr106061.c: New test.

--- gcc/cgraph.cc.jj  2023-01-19 09:58:50.0 +0100
+++ gcc/cgraph.cc   2023-01-26 15:30:50.422759246 +0100
@@ -3248,9 +3248,11 @@ cgraph_edge::verify_corresponds_to_fndec
   node = node->ultimate_alias_target ();
 
   /* Optimizers can redirect unreachable calls or calls triggering undefined
- behavior to builtin_unreachable.  */
+ behavior to __builtin_unreachable or __builtin_trap.  */
 
-  if (fndecl_built_in_p (callee->decl, BUILT_IN_UNREACHABLE))
+  if (fndecl_built_in_p (callee->decl, BUILT_IN_NORMAL)
+  && (DECL_FUNCTION_CODE (callee->decl) == BUILT_IN_UNREACHABLE
+ || DECL_FUNCTION_CODE (callee->decl) == BUILT_IN_TRAP))
 return false;
 
   if (callee->former_clone_of != node->decl
--- gcc/testsuite/gcc.dg/pr106061.c.jj  2023-01-26 15:40:06.002721103 +0100
+++ gcc/testsuite/gcc.dg/pr106061.c 2023-01-26 15:41:32.553468886 +0100
@@ -0,0 +1,18 @@
+/* PR ipa/106061 */
+/* { dg-do compile } */
+/* { dg-options "-Og" } */
+
+extern void foo (void);
+
+inline void
+bar (int x)
+{
+  if (x)
+    foo ();
+}
+
+void
+baz (void)
+{
+  bar (0);
+}

Jakub
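
Editorial note: the logic of the adjusted check can be modelled outside of GCC as follows. This is a toy sketch under stated assumptions; the enums and the callee_decl struct are hypothetical and much simpler than GCC's real fndecl_built_in_p / DECL_FUNCTION_CODE machinery.

```cpp
// Toy model of a callee declaration (hypothetical, not GCC's tree type).
enum class builtin_class { not_builtin, normal, target_specific };
enum class builtin_code { none, unreachable, trap, memcpy_like };

struct callee_decl
{
  builtin_class cls;
  builtin_code code;
};

// Before the patch: only redirections to the unreachable builtin were exempt
// from the "callee must correspond to the fndecl" verification.
bool exempt_before_patch (const callee_decl &callee)
{
  return callee.cls == builtin_class::normal
	 && callee.code == builtin_code::unreachable;
}

// After the patch: redirections to the trap builtin are exempt as well.
bool exempt_after_patch (const callee_decl &callee)
{
  return callee.cls == builtin_class::normal
	 && (callee.code == builtin_code::unreachable
	     || callee.code == builtin_code::trap);
}
```

In this model a call redirected to the trap builtin is rejected by the old predicate and accepted by the new one, which corresponds to the ICE with -Og going away.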



[PATCH] doc: Fix up return type of __builtin_va_arg_pack_len [PR108560]

2023-01-27 Thread Jakub Jelinek via Gcc-patches
Hi!

__builtin_va_arg_pack_len as implemented has returned int since its introduction
in 2007.  The initial documentation didn't mention any return type,
which changed in 2010 in r0-103077-gab940b73bfabe2cec4 during some
documentation formatting cleanups
https://gcc.gnu.org/legacy-ml/gcc-patches/2010-09/msg01632.html
I can understand that some type was needed there for formatting,
but which one exactly wasn't really discussed.

So, I think we should change documentation to match the implementation,
rather than change implementation to match the documentation.
Most people don't use more than 2147483647 arguments to inline functions,
and on poor targets with 16-bit ints I bet even having more than 65535
arguments to inline functions would be highly unexpected.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-01-27  Jakub Jelinek  

PR other/108560
* doc/extend.texi: Fix up return type of __builtin_va_arg_pack_len
from size_t to int.

--- gcc/doc/extend.texi.jj  2023-01-24 11:10:13.218075138 +0100
+++ gcc/doc/extend.texi 2023-01-26 17:13:47.428496682 +0100
@@ -688,7 +688,7 @@ myprintf (FILE *f, const char *format, .
 @end smallexample
 @end deftypefn
 
-@deftypefn {Built-in Function} {size_t} __builtin_va_arg_pack_len ()
+@deftypefn {Built-in Function} {int} __builtin_va_arg_pack_len ()
 This built-in function returns the number of anonymous arguments of
 an inline function.  It can be used only in inline functions that
 are always inlined, never compiled as a separate function, such

Jakub



[Patch] OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]

2023-01-27 Thread Tobias Burnus

Rather obvious fix. Hence, I intend to commit it later as obvious,
unless there are any comments.

Tobias

PS: Thanks go to Thomas for finding + reporting the issue.
-
OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]

gcc/fortran/ChangeLog:

	PR fortran/108558
	* trans-openmp.cc (gfc_split_omp_clauses): Handle has_device_addr.

libgomp/ChangeLog:

	PR fortran/108558
	* testsuite/libgomp.fortran/has_device_addr.f90: New test.

 gcc/fortran/trans-openmp.cc|  2 +
 .../testsuite/libgomp.fortran/has_device_addr.f90  | 59 ++
 2 files changed, 61 insertions(+)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 87213de0918..5283d0ce5f3 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -6205,6 +6205,8 @@ gfc_split_omp_clauses (gfc_code *code,
 	= code->ext.omp_clauses->lists[OMP_LIST_MAP];
 	  clausesa[GFC_OMP_SPLIT_TARGET].lists[OMP_LIST_IS_DEVICE_PTR]
 	= code->ext.omp_clauses->lists[OMP_LIST_IS_DEVICE_PTR];
+	  clausesa[GFC_OMP_SPLIT_TARGET].lists[OMP_LIST_HAS_DEVICE_ADDR]
+	= code->ext.omp_clauses->lists[OMP_LIST_HAS_DEVICE_ADDR];
 	  clausesa[GFC_OMP_SPLIT_TARGET].device
 	= code->ext.omp_clauses->device;
 	  clausesa[GFC_OMP_SPLIT_TARGET].thread_limit
diff --git a/libgomp/testsuite/libgomp.fortran/has_device_addr.f90 b/libgomp/testsuite/libgomp.fortran/has_device_addr.f90
new file mode 100644
index 000..95cc7788f2d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/has_device_addr.f90
@@ -0,0 +1,59 @@
+! { dg-additional-options "-fdump-tree-original" }
+
+!
+! PR fortran/108558
+!
+
+! { dg-final { scan-tree-dump-times "#pragma omp target has_device_addr\\(x\\) has_device_addr\\(y\\)" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:x\\) map\\(tofrom:y\\)" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data use_device_addr\\(x\\) use_device_addr\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target update from\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:x\\) map\\(tofrom:y\\) use_device_addr\\(x\\) use_device_addr\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp teams" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp distribute" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp parallel" 2 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp for nowait" 2 "original" } }
+
+module m
+contains
+subroutine vectorAdd(x, y, N)
+  implicit none
+  integer :: N
+  integer(4) :: x(N), y(N)
+  integer :: i
+
+  !$omp target teams distribute parallel do has_device_addr(x, y)
+  do i = 1, N
+y(i) = x(i) + y(i)
+  end do
+end subroutine vectorAdd
+end module m
+
+program main
+  use m
+  implicit none
+  integer, parameter :: N = 9876
+  integer(4) :: x(N), y(N)
+  integer :: i
+
+  x(:) = 1
+  y(:) = 2
+
+  !$omp target data map(x, y)
+!$omp target data use_device_addr(x, y)
+  call vectorAdd(x, y, N)
+!$omp end target data
+!$omp target update from(y)
+if (any (y /= 3)) error stop
+  !$omp end target data
+
+  x = 1
+  y = 2
+  !$omp target data map(x, y) use_device_addr(x, y)
+!$omp target teams distribute parallel do has_device_addr(x, y)
+do i = 1, N
+  y(i) = x(i) + y(i)
+end do
+ !$omp end target data
+ if (any (y /= 3)) error stop
+end program


Re: [Patch] OpenMP/Fortran: Fix has_device_addr clause splitting [PR108558]

2023-01-27 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 27, 2023 at 10:19:42AM +0100, Tobias Burnus wrote:
> Rather obvious fix. Hence, I intend to commit it later as obvious,
> unless there are any comments.

Yeah, this is obviously correct.

Have you checked the function if we don't miss other clauses in there
(e.g. compared to the C implementation)?



Jakub



[PATCH] libstdc++: Fix up FAIL in 17_intro/names.cc on glibc < 2.19 [PR108568]

2023-01-27 Thread Jakub Jelinek via Gcc-patches
Hi!

On gcc112 which has glibc 2.17 I've noticed
FAIL: 17_intro/names.cc (test for excess errors)
FAIL: experimental/names.cc (test for excess errors)
These are because glibc < 2.19 used __unused as field member of various structs,
including mcontext_t in sys/ucontext.h on ppc64le.
This was changed in glibc with
https://gcc.gnu.org/pipermail/libc-alpha/2013-November/045766.html
names.cc even has
#ifdef __GLIBC_PREREQ
#if ! __GLIBC_PREREQ(2, 19)
// Glibc defines this prior to 2.19
#undef __unused
#endif
#endif
for it, but it doesn't work.  The reason is that __GLIBC_PREREQ is defined in
<features.h> but nothing included that header before this spot (it is included
later from bits/stdc++.h).

The following patch on Linux/Hurd conditionally includes features.h to get
the needed macros before deciding if __unused should be undefined or not.
If needed, I could use __GLIBC_PREREQ then but would need to check if it is
defined and between 1996 and 1999 it wasn't.

Tested on powerpc64le-linux with glibc 2.17 (where it fixes the
regressions), on x86_64-linux with glibc 2.35 (where it still PASSes),
plus on the latter with -E -dD on the test to verify __unused is just
defined and not undefined later on, ok for trunk?

2023-01-27  Jakub Jelinek  

PR libstdc++/108568
* testsuite/17_intro/names.cc (__unused): For linux or GNU hurd
include features.h if present and then check __GLIBC__ and
__GLIBC_MINOR__ macros for glibc prior to 2.19, instead of testing
__GLIBC_PREREQ which isn't defined yet.

--- libstdc++-v3/testsuite/17_intro/names.cc.jj 2023-01-16 23:19:06.292716661 +0100
+++ libstdc++-v3/testsuite/17_intro/names.cc    2023-01-27 10:20:20.787645823 +0100
@@ -252,12 +252,15 @@
 #undef y
 #endif
 
-#ifdef __GLIBC_PREREQ
-#if ! __GLIBC_PREREQ(2, 19)
+#if defined (__linux__) || defined (__gnu_hurd__)
+#if __has_include(<features.h>)
+#include <features.h>
+#if __GLIBC__ == 2 && __GLIBC_MINOR__ < 19
 // Glibc defines this prior to 2.19
 #undef __unused
 #endif
 #endif
+#endif
 
 #if __has_include()
 // newlib's  defines these as macros.

Jakub
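
Editorial note: the ordering problem described above (testing a macro before the header that defines it has been included) and its fix can be shown in isolation. This is a minimal sketch, not the testsuite code; the helper older_than and the constant glibc_before_2_19 are names made up for illustration, and the comparison is slightly more general than the `__GLIBC__ == 2 && __GLIBC_MINOR__ < 19` test in the patch.

```cpp
// Include <features.h> only where it exists, so that the glibc version
// macros are defined before they are tested below.
#if defined(__has_include)
# if __has_include(<features.h>)
#  include <features.h>
# endif
#endif

// Is version (major, minor) older than (want_major, want_minor)?
constexpr bool older_than (int major, int minor, int want_major, int want_minor)
{
  return major < want_major || (major == want_major && minor < want_minor);
}

#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
// glibc: decide from the version macros, which exist on all glibc versions.
constexpr bool glibc_before_2_19
  = older_than (__GLIBC__, __GLIBC_MINOR__, 2, 19);
#else
// Not glibc, or the header was not available.
constexpr bool glibc_before_2_19 = false;
#endif
```

Testing __GLIBC__ and __GLIBC_MINOR__ directly avoids depending on __GLIBC_PREREQ being defined, which is the reason given in the ChangeLog entry.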



Re: [PATCH] RISC-V: Add testcases for IMM (0 ~ 31) AVL

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Wed, Jan 4, 2023 at 9:51 PM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-10.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-11.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-12.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-13.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-2.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-3.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-4.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-5.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-6.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-7.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-8.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-9.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_conflict-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_conflict-2.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_conflict-4.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_conflict-5.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-11.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-12.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-13.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-14.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-15.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-16.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-17.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-2.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-3.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-4.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-5.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-6.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-7.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-8.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-9.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-2.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-3.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-4.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-5.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-6.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-7.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-8.c: New test.
> * gcc.target/riscv/rvv/vsetvl/imm_switch-9.c: New test.
>
> ---
>  .../riscv/rvv/vsetvl/imm_bb_prop-1.c  |  32 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-10.c |  42 
>  .../riscv/rvv/vsetvl/imm_bb_prop-11.c |  42 
>  .../riscv/rvv/vsetvl/imm_bb_prop-12.c |  31 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-13.c |  29 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-2.c  |  29 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-3.c  |  22 ++
>  .../riscv/rvv/vsetvl/imm_bb_prop-4.c  |  25 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-5.c  |  33 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-6.c  |  30 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-7.c  |  31 +++
>  .../riscv/rvv/vsetvl/imm_bb_prop-8.c  |  37 
>  .../riscv/rvv/vsetvl/imm_bb_prop-9.c  |  37 
>  .../riscv/rvv/vsetvl/imm_conflict-1.c |  22 ++
>  .../riscv/rvv/vsetvl/imm_conflict-2.c |  22 ++
>  .../riscv/rvv/vsetvl/imm_conflict-3.c |  26 +++
>  .../riscv/rvv/vsetvl/imm_conflict-4.c |  38 
>  .../riscv/rvv/vsetvl/imm_conflict-5.c |  45 
>  .../riscv/rvv/vsetvl/imm_loop_invariant-1.c   | 195 ++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-10.c  |  41 
>  .../riscv/rvv/vsetvl/imm_loop_invariant-11.c  |  41 
>  .../riscv/rvv/vsetvl/imm_loop_invariant-12.c  |  28 +++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-13.c  |  30 +++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-14.c  |  31 +++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-15.c  |  32 +++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-16.c  |  29 +++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-17.c  |  23 +++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-2.c   | 168 +++
>  .../riscv/rvv/vsetvl/imm_loop_invariant-3.c   | 141 +
>  .../riscv/rvv/vsetvl/imm_loop_invariant-4.c   |

Re: [PATCH] RISC-V: Fix incorrect attributes of vsetvl instructions pattern

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Wed, Jan 18, 2023 at 10:44 AM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Fix incorrect attributes.
>
> ---
>  gcc/config/riscv/vector.md | 27 ---
>  1 file changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 4e93b7fead5..37cf4d6bcbf 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -95,13 +95,7 @@
>  (const_int 32)
>  (eq_attr "mode" "VNx1DI,VNx2DI,VNx4DI,VNx8DI,\
>   VNx1DF,VNx2DF,VNx4DF,VNx8DF")
> -(const_int 64)
> -
> -(eq_attr "type" "vsetvl")
> -(if_then_else (eq_attr "INSN_CODE (curr_insn) == CODE_FOR_vsetvldi
> -|| INSN_CODE (curr_insn) == CODE_FOR_vsetvlsi")
> -  (symbol_ref "INTVAL (operands[2])")
> -  (const_int INVALID_ATTRIBUTE))]
> +(const_int 64)]
> (const_int INVALID_ATTRIBUTE)))
>
>  ;; Ditto to LMUL.
> @@ -149,12 +143,7 @@
>  (eq_attr "mode" "VNx4DI,VNx4DF")
>(symbol_ref "riscv_vector::get_vlmul(E_VNx4DImode)")
>  (eq_attr "mode" "VNx8DI,VNx8DF")
> -  (symbol_ref "riscv_vector::get_vlmul(E_VNx8DImode)")
> -(eq_attr "type" "vsetvl")
> -(if_then_else (eq_attr "INSN_CODE (curr_insn) == CODE_FOR_vsetvldi
> -|| INSN_CODE (curr_insn) == CODE_FOR_vsetvlsi")
> -  (symbol_ref "INTVAL (operands[3])")
> -  (const_int INVALID_ATTRIBUTE))]
> +  (symbol_ref "riscv_vector::get_vlmul(E_VNx8DImode)")]
> (const_int INVALID_ATTRIBUTE)))
>
>  ;; It is valid for instruction that require sew/lmul ratio.
> @@ -531,7 +520,11 @@
>"TARGET_VECTOR"
>"vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"
>[(set_attr "type" "vsetvl")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "sew") (symbol_ref "INTVAL (operands[2])"))
> +   (set (attr "vlmul") (symbol_ref "INTVAL (operands[3])"))
> +   (set (attr "ta") (symbol_ref "INTVAL (operands[4])"))
> +   (set (attr "ma") (symbol_ref "INTVAL (operands[5])"))])
>
>  ;; vsetvl zero,zero,vtype instruction.
>  ;; This pattern has no side effects and does not set X0 register.
> @@ -563,7 +556,11 @@
>"TARGET_VECTOR"
>"vset%i0vli\tzero,%0,e%1,%m2,t%p3,m%p4"
>[(set_attr "type" "vsetvl")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "sew") (symbol_ref "INTVAL (operands[1])"))
> +   (set (attr "vlmul") (symbol_ref "INTVAL (operands[2])"))
> +   (set (attr "ta") (symbol_ref "INTVAL (operands[3])"))
> +   (set (attr "ma") (symbol_ref "INTVAL (operands[4])"))])
>
>  ;; It's emit by vsetvl/vsetvlmax intrinsics with no side effects.
>  ;; Since we have many optmization passes from "expand" to "reload_completed",
> --
> 2.36.3
>
>


Re: [PATCH] RISC-V: Change VSETVL PASS always call split_all_insns

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks

On Mon, Jan 23, 2023 at 3:39 AM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

>
>
> On 1/17/23 19:50, juzhe.zh...@rivai.ai wrote:
> > From: Ju-Zhe Zhong 
> >
> > Since LCM will destroy CFG, we are going to reorder the location of VSETVL PASS
> > at least before bbro (block-reorder PASS) which is before split3 PASS. We need
> > to call it in VSETVL PASS to get final RVV instructions patterns.
> Just for the record.  LCM does not destroy the CFG, it just splits
> critical edges.
>
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/riscv-vsetvl.cc (pass_vsetvl::execute): Always call split_all_insns.
> OK.
> jeff
>
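
Editorial note: Jeff's remark about critical edges can be illustrated on a toy CFG. An edge is critical when its source block has more than one successor and its destination block has more than one predecessor; splitting it inserts a new empty block on that edge. The representation below (edges as pairs of block numbers) and both function names are assumptions for illustration and are unrelated to GCC's actual CFG data structures.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using edge = std::pair<int, int>;  // (source block, destination block)

// An edge is critical when its source has several successors and its
// destination has several predecessors.
bool is_critical (const std::vector<edge> &edges, const edge &e)
{
  std::size_t succs = 0, preds = 0;
  for (const edge &other : edges)
    {
      if (other.first == e.first)
	++succs;
      if (other.second == e.second)
	++preds;
    }
  return succs > 1 && preds > 1;
}

// Split every critical edge A->B into A->N and N->B with a fresh block N.
// next_block must be the first unused block number.
std::vector<edge> split_critical_edges (const std::vector<edge> &edges,
					int next_block)
{
  std::vector<edge> result;
  for (const edge &e : edges)
    {
      if (is_critical (edges, e))
	{
	  result.push_back ({e.first, next_block});
	  result.push_back ({next_block, e.second});
	  ++next_block;
	}
      else
	result.push_back (e);
    }
  return result;
}
```

All original blocks and their ordering survive the transformation; only new blocks are added on the split edges, which is why the CFG is changed rather than destroyed.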


Re: [PATCH] RISC-V: Reorder VSETVL PASS location

2023-01-27 Thread Kito Cheng via Gcc-patches
Added more comments and committed, thanks!

On Mon, Jan 23, 2023 at 3:36 AM Jeff Law  wrote:

>
>
> On 1/17/23 20:03, juzhe.zh...@rivai.ai wrote:
> > From: Ju-Zhe Zhong 
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/riscv-passes.def (INSERT_PASS_BEFORE): Reorder VSETVL PASS.
> It'd be useful to know the motivation here, I could easily see someone
> in the future finding a case where the location of the vsetvl gets
> influenced by some dead code that would have been removed by the DCE
> pass.  Then they're going to want to know the motivation behind the
> current pass placement.
>
> So, OK after adding a comment describing why the pass is placed where it
> is.
>
> Jeff
>


Re: [PATCH] RISC-V: Change parse_insn into public for future use.

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Mon, Jan 23, 2023 at 3:32 AM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

>
>
> On 1/17/23 20:06, juzhe.zh...@rivai.ai wrote:
> > From: Ju-Zhe Zhong 
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/riscv-vsetvl.h: Change it into public.
> OK.
> jeff
>


Re: [PATCH] tree: Fix up tree_code_{length,type}

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, 27 Jan 2023, Jakub Jelinek wrote:

> On Fri, Jan 27, 2023 at 07:42:39AM +, Richard Biener wrote:
> > > BTW, wonder if tree_code_type couldn't be an array of unsigned char
> > > elements rather than enum tree_code_class and we'd then cast it
> > > to the enum in the macro, that would shrink that array from 1496 bytes
> > > to 374.  Of course, that sounds like stage1 material.
> > 
> > One could argue the same way for this patch (and instead revert),
> 
> Well, this patch is in fact a conditional reversion (revert for
> C++11/14, add one keyword to 2 declarations otherwise).
> 
> > I'd say if we tweak this now then tweak it to the maximum extent?
> > Isn't sth like 'enum unsigned char tree_code_class' now possible?
> > (and a static assert the enum values all fit, though that would
> > be diagnosed anyway?)
> 
> C++11 indeed has
> enum tree_code_class : unsigned char {
>   tcc_exceptional,
>   ...
>   tcc_expression
> };
> and one indeed gets an error if some enumerator doesn't fit.
> The problem I see with this is that the type is 8-bit everywhere,
> which I'd be afraid could cause worse code generation (of course,
> one would need to try to see how much; e.g. build the compiler
> unmodified, with the unsigned char array plus explicit casts from
> the array and finally with unsigned char as underlying type).
> When passing around enum tree_code_class etc., it is fine if it
> is 32-bit.  And there isn't a way to create an enum with different
> underlying type but with the same enumerators as in another enum.
> Perhaps for tree_code_class we could get away with the underlying type
> because it is mostly used in the macros which immediately compare
> it, in gcc/*.cc just in the following explicitly:
> expr.cc:get_def_for_expr_class (tree name, enum tree_code_class tclass)
> fold-const.cc:  enum tree_code_class tclass;
> fold-const.cc:  enum tree_code_class tclass = TREE_CODE_CLASS (code);
> fold-const.cc:  enum tree_code_class tclass = TREE_CODE_CLASS (code);
> fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
> fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
> fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
> fold-const.cc:  enum tree_code_class kind = TREE_CODE_CLASS (code);
> gimple-fold.cc:  enum tree_code_class kind = TREE_CODE_CLASS (subcode);
> print-tree.cc:  enum tree_code_class tclass;
> print-tree.cc:  enum tree_code_class tclass;
> tree.cc:   These must correspond to the tree_code_class entries.  */
> tree.cc:const char *const tree_code_class_strings[] =
> tree.cc:  enum tree_code_class type = TREE_CODE_CLASS (code);
> tree.cc:  enum tree_code_class type = TREE_CODE_CLASS (code);
> tree.cc:tree_class_check_failed (const_tree node, const enum tree_code_class cl,
> tree.cc:tree_not_class_check_failed (const_tree node, const enum tree_code_class cl,
> tree.cc:  const enum tree_code_class c = TREE_CODE_CLASS (TREE_CODE (t));
> tree.cc:  const enum tree_code_class c = TREE_CODE_CLASS (TREE_CODE (t));
> tree-dump.cc:  enum tree_code_class code_class;
> tree-inline.cc:  enum tree_code_class cl = TREE_CODE_CLASS (code);
> tree-pretty-print.cc: enum tree_code_class tclass;
> tree-ssa-live.cc:  enum tree_code_class c = TREE_CODE_CLASS (TREE_CODE (t));
> tree-ssa-operands.cc:  enum tree_code_class codeclass;
> But as I said, one would need to watch for code generation at least on
> a couple of common hosts, and while x86_64 should be one of them, it might
> have bigger effects on others as x86 has byte comparison etc. instructions.

Hm, yes.  Not sure if using uint_fast8_t would make a difference where
it should.  So let's keep this change separate.

Richard.
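
The two storage options weighed above can be sketched as follows. This is a minimal illustration, not GCC code: the enum names, the four-entry table and `code_class_of` are hypothetical stand-ins for `tree_code_class`, `tree_code_type` and `TREE_CODE_CLASS`. The quoted sizes are consistent with 374 table entries: 374 * 4 = 1496 bytes with a 4-byte enum, 374 bytes with one byte per entry.

```cpp
#include <cassert>

// Hypothetical stand-in for tree_code_class; only three enumerators shown.
enum code_class_wide { wide_exceptional, wide_constant, wide_expression };

// Option 1: C++11 fixed underlying type.  Every object of this enum is
// 8 bits wide everywhere it is used, not only inside the table.
enum code_class_narrow : unsigned char {
  narrow_exceptional,
  narrow_constant,
  narrow_expression
};

// Option 2: keep the wide enum, store the table as bytes, and cast back
// to the enum on access (the "cast it in the macro" idea).
constexpr unsigned char code_type_bytes[] = {
  wide_exceptional, wide_constant, wide_expression, wide_constant
};

constexpr code_class_wide
code_class_of (unsigned code)
{
  return static_cast<code_class_wide> (code_type_bytes[code]);
}

static_assert (sizeof (code_class_narrow) == 1,
               "fixed underlying type makes the enum one byte");
static_assert (sizeof (code_type_bytes) == 4,
               "byte table costs one byte per code");
static_assert (code_class_of (1) == wide_constant,
               "lookup converts the byte back to the enum");
```

With option 2 only the table shrinks; values passed around as `code_class_wide` keep whatever width the compiler picks for the plain enum, which is the code-generation concern raised in the quoted message.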

> > 
> > > 2023-01-26  Patrick Palka  
> > >   Jakub Jelinek  
> > > 
> > >   * tree-core.h (tree_code_type, tree_code_length): For
> > >   C++17 and later, add inline keyword, otherwise don't define
> > >   the arrays, but declare extern arrays.
> > >   * tree.cc (tree_code_type, tree_code_length): Define these
> > >   arrays for C++14 and older.
> > > 
> > > --- gcc/tree-core.h.jj   2023-01-02 09:32:31.188158094 +0100
> > > +++ gcc/tree-core.h   2023-01-26 16:02:34.212113251 +0100
> > > @@ -2284,17 +2284,20 @@ struct floatn_type_info {
> > >  /* Matrix describing the structures contained in a given tree code.  */
> > >  extern bool tree_contains_struct[MAX_TREE_CODES][64];
> > >  
> > > +/* Class of tree given its code.  */
> > > +#if __cpp_inline_variables >= 201606L
> > >  #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> > >  #define END_OF_BASE_TREE_CODES tcc_exceptional,
> > >  
> > > -
> > > -/* Class of tree given its code.  */
> > > -constexpr enum tree_code_class tree_code_type[] = {
> > > +constexpr inline enum tree_code_class tree_code_type[] = {
> > >  #include "all-tree.def"
> > >  };
> > 
> > Do we need an explicit external definition somewhere when
> > constant folding isn't possible?
> 
> > 
> > Otherwise looks good to me.
> > 
> > Thanks,
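
The header arrangement described in the quoted ChangeLog can be sketched like this. It is an illustration under assumptions: `demo_code_length` and `demo_length` are hypothetical names, and header and source file are collapsed into one block so that it compiles on its own.

```cpp
#include <cassert>

// --- would live in a header included by many translation units ---
#if __cpp_inline_variables >= 201606L
// C++17 and later: an inline variable is a definition that the linker
// merges across translation units, so no separate .cc definition is needed.
constexpr inline unsigned char demo_code_length[] = { 0, 1, 2, 3 };
#define DEMO_NEEDS_OUT_OF_LINE_DEFINITION 0
#else
// C++11/14: only declare the array in the header.
extern const unsigned char demo_code_length[];
#define DEMO_NEEDS_OUT_OF_LINE_DEFINITION 1
#endif

// --- would live in exactly one .cc file ---
#if DEMO_NEEDS_OUT_OF_LINE_DEFINITION
const unsigned char demo_code_length[] = { 0, 1, 2, 3 };
#endif

// Accessor that works identically under either configuration.
inline unsigned
demo_length (unsigned code)
{
  return demo_code_length[code];
}
```

In the C++17 branch the array can still be constant-folded at its uses, while any use that needs an address refers to the single merged object.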

Re: [PATCH 1/3] arm: Fix sign of MVE predicate mve_pred16_t [PR 107674]

2023-01-27 Thread Andre Vieira (lists) via Gcc-patches




On 26/01/2023 15:02, Kyrylo Tkachov wrote:

Hi Andre,


-Original Message-
From: Andre Vieira (lists) 
Sent: Tuesday, January 24, 2023 1:41 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw

Subject: [PATCH 1/3] arm: Fix sign of MVE predicate mve_pred16_t [PR
107674]

Hi,

The ACLE defines mve_pred16_t as an unsigned short.  This patch makes
sure GCC treats the predicate as an unsigned type, rather than signed.

Bootstrapped on aarch64-none-eabi and regression tested on arm-none-eabi
and armeb-none-eabi for armv8.1-m.main+mve.fp.

OK for trunk?

gcc/ChangeLog:

PR target/107674
* config/arm/arm-builtins.cc (arm_simd_builtin_type): Rewrite to
use
new qualifiers parameter and use unsigned short type for MVE
predicate.
(arm_init_builtin): Call arm_simd_builtin_type with qualifiers
parameter.
(arm_init_crypto_builtins): Likewise.

gcc/testsuite/ChangeLog:

PR target/107674
* gcc.target/arm/mve/mve_vpt.c: New test.


diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 11d7478d9df69139802a9d42c09dd0de7480b60e..6c67cec93ff76a4b42f3a0b305f697142e88fcd9 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -1489,12 +1489,14 @@ arm_lookup_simd_builtin_type (machine_mode mode,
  }
  
  static tree

-arm_simd_builtin_type (machine_mode mode, bool unsigned_p, bool poly_p)
+arm_simd_builtin_type (machine_mode mode, enum arm_type_qualifiers qualifiers)
  {

I think in C++ now we can leave out the "enum" here.

diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
new file mode 100644
index 
..26a565b79dd1348e361b3aa23a1d6e6d13bffce8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
@@ -0,0 +1,27 @@
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-final { check-function-bodies "**" "" } } */
+#include 
+void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
+{
+uint8x16_t va = vldrbq_u8 (a);
+uint8x16_t vb = vldrbq_u8 (b);
+mve_pred16_t p = vcmpeqq_u8 (va, vb);
+uint8x16_t vc = vaddq_x_u8 (va, vb, p);
+vstrbq_p_u8 (c, vc, p);
+}
+/*
+** test0:
+** vldrb.8 q2, \[r0\]
+** vldrb.8 q1, \[r1\]
+** vcmp.i8 eq, q2, q1
+** vmrs r3, p0  @ movhi
+** uxth r3, r3
+** vmsr p0, r3  @ movhi
+** vpst
+** vaddt.i8 q3, q2, q1
+** vpst
+** vstrbt.8 q3, \[r2\]
+** bx  lr
+*/

This explicit assembly matching looks quite fragile and sensitive to future 
scheduling and RA changes.
Is there something more targeted we could scan for to check that the predicate 
is unsigned now?
No, not really, as it's not unsigned everywhere, only in its intermediate
representation between intrinsics. GCC is aware that mve_pred16_t is an 
unsigned short, so as soon as you try to use the value on its own or 
pass it as an argument or return, there is an implicit cast.


I could make this particular test-case more robust by not checking 
specific registers. Though the sequence of loads-cmp-add-store will 
always be the same as it's data-dependent.


RE: [PATCH 1/3] arm: Fix sign of MVE predicate mve_pred16_t [PR 107674]

2023-01-27 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Friday, January 27, 2023 9:54 AM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw 
> Subject: Re: [PATCH 1/3] arm: Fix sign of MVE predicate mve_pred16_t [PR
> 107674]
> 
> 
> 
> On 26/01/2023 15:02, Kyrylo Tkachov wrote:
> > Hi Andre,
> >
> >> -Original Message-
> >> From: Andre Vieira (lists) 
> >> Sent: Tuesday, January 24, 2023 1:41 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Kyrylo Tkachov ; Richard Earnshaw
> >> 
> >> Subject: [PATCH 1/3] arm: Fix sign of MVE predicate mve_pred16_t [PR
> >> 107674]
> >>
> >> Hi,
> >>
> >> The ACLE defines mve_pred16_t as an unsigned short.  This patch makes
> >> sure GCC treats the predicate as an unsigned type, rather than signed.
> >>
> >> Bootstrapped on aarch64-none-eabi and regression tested on arm-none-
> eabi
> >> and armeb-none-eabi for armv8.1-m.main+mve.fp.
> >>
> >> OK for trunk?
> >>
> >> gcc/ChangeLog:
> >>
> >>PR target/107674
> >>* config/arm/arm-builtins.cc (arm_simd_builtin_type): Rewrite to
> >> use
> >>new qualifiers parameter and use unsigned short type for MVE
> >> predicate.
> >>(arm_init_builtin): Call arm_simd_builtin_type with qualifiers
> >>parameter.
> >>(arm_init_crypto_builtins): Likewise.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>PR target/107674
> >>* gcc.target/arm/mve/mve_vpt.c: New test.
> >
> > diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> > index
> 11d7478d9df69139802a9d42c09dd0de7480b60e..6c67cec93ff76a4b42f3a0b3
> 05f697142e88fcd9 100644
> > --- a/gcc/config/arm/arm-builtins.cc
> > +++ b/gcc/config/arm/arm-builtins.cc
> > @@ -1489,12 +1489,14 @@ arm_lookup_simd_builtin_type
> (machine_mode mode,
> >   }
> >
> >   static tree
> > -arm_simd_builtin_type (machine_mode mode, bool unsigned_p, bool
> poly_p)
> > +arm_simd_builtin_type (machine_mode mode, enum arm_type_qualifiers
> qualifiers)
> >   {
> >
> > I think in C++ now we can leave out the "enum" here.
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > new file mode 100644
> > index
> ..26a565b79dd1348e361b3a
> a23a1d6e6d13bffce8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-options "-O2" } */
> > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> > +/* { dg-add-options arm_v8_1m_mve } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +#include 
> > +void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
> > +{
> > +uint8x16_t va = vldrbq_u8 (a);
> > +uint8x16_t vb = vldrbq_u8 (b);
> > +mve_pred16_t p = vcmpeqq_u8 (va, vb);
> > +uint8x16_t vc = vaddq_x_u8 (va, vb, p);
> > +vstrbq_p_u8 (c, vc, p);
> > +}
> > +/*
> > +** test0:
> > +** vldrb.8 q2, \[r0\]
> > +** vldrb.8 q1, \[r1\]
> > +** vcmp.i8 eq, q2, q1
> > +** vmrs r3, p0  @ movhi
> > +** uxth r3, r3
> > +** vmsr p0, r3  @ movhi
> > +** vpst
> > +** vaddt.i8 q3, q2, q1
> > +** vpst
> > +** vstrbt.8 q3, \[r2\]
> > +** bx  lr
> > +*/
> >
> > This explicit assembly matching looks quite fragile and sensitive to future
> scheduling and RA changes.
> > Is there something more targeted we could scan for to check that the
> predicate is unsigned now?
> No, not really, as it's not unsigned everywhere, only in its intermediate
> representation between intrinsics. GCC is aware that mve_pred16_t is an
> unsigned short, so as soon as you try to use the value on its own or
> pass it as an argument or return, there is an implicit cast.
> 
> I could make this particular test-case more robust by not checking
> specific registers. Though the sequence of loads-cmp-add-store will
> always be the same as it's data-dependent.

Yeah, I suspected as much. Ok, let's abstract the registers away (I think 
check-function-bodies can use regex capturing to record particular registers) 
then.
Thanks,
Kyrill



Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-27 Thread Andre Vieira (lists) via Gcc-patches




On 26/01/2023 15:06, Kyrylo Tkachov wrote:

Hi Andre,


-Original Message-
From: Andre Vieira (lists) 
Sent: Tuesday, January 24, 2023 1:54 PM
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford ; Richard Earnshaw
; Richard Biener ;
Kyrylo Tkachov 
Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
predicates before use [PR 107674]

Hi,

This patch teaches GCC that zero-extending an MVE predicate from 16 bits
to 32 bits and then only using 16 bits is a no-op.
It does so in two steps:
- it lets GCC know that it can access any MVE predicate mode using any
other MVE predicate mode without needing to copy it, using the
TARGET_MODES_TIEABLE_P hook,
- it teaches simplify_subreg to optimize a subreg with a vector
outermode, by replacing this outermode with a same-sized integer mode
and trying the available optimizations, then if successful it
surrounds the result with a subreg casting it back to the original
vector outermode.

This removes the unnecessary zero-extending shown on PR 107674 (though
it's a sign-extend there), that was introduced in gcc 11.
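
The claim that extending and then using only the low 16 bits is a no-op can be checked at the value level with a small sketch. `pred16` is a hypothetical stand-in for `mve_pred16_t`; the patch itself works on RTL subregs, not on C++ casts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the 16-bit unsigned predicate type.
using pred16 = std::uint16_t;

// Zero-extend to 32 bits, then keep only the low 16 bits again.
constexpr pred16
round_trip_zext (pred16 p)
{
  return static_cast<pred16> (static_cast<std::uint32_t> (p));
}

// Same round trip through a sign-extension, as seen in the PR.
constexpr pred16
round_trip_sext (pred16 p)
{
  return static_cast<pred16> (
    static_cast<std::int32_t> (static_cast<std::int16_t> (p)));
}

static_assert (round_trip_zext (0xFFFF) == 0xFFFF, "zext round trip");
static_assert (round_trip_sext (0x8000) == 0x8000, "sext round trip");
```

Either extension only changes bits 16 and above, which the consumer of the predicate never reads, so the extend instruction carries no information.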

Bootstrapped on aarch64-none-linux-gnu and regression tested on
arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.

OK for trunk?

gcc/ChangeLog:

PR target/107674
  * config/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
  (arm_modes_tieable_p): Make MVE predicate modes tieable.
* config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
* simplify-rtx.cc (simplify_context::simplify_subreg): Teach
simplify_subreg to simplify subregs where the outermode is not
scalar.


The arm changes look ok to me. We'll want a midend maintainer to have a look at 
simplify-rtx.cc



gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/mve_vpt.c: Change to remove unnecessary
zero-extend.


diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c 
b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
index 
26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5
 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
@@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
  ** vldrb.8 q2, \[r0\]
  ** vldrb.8 q1, \[r1\]
  ** vcmp.i8 eq, q2, q1
-** vmrs r3, p0  @ movhi
-** uxth r3, r3
-** vmsr p0, r3  @ movhi
  ** vpst
  ** vaddt.i8 q3, q2, q1
  ** vpst

Ah I see, that's the testcase from patch 1/3 that I criticized :)
Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more 
robust?
Thanks,
Kyrill
I could, but I would rather not. I have a patch series waiting for GCC 
14 that does further improvements to this (and other VPST codegen) 
sequences and if I do scan for 'absence' of an instruction I have to 
break them up into single tests each. Also it wouldn't then fail if we 
start spilling the predicate directly to memory for instance. Like I 
mentioned in the previous patch, the sequence is unlikely to be able to 
change through scheduling (other than maybe the reordering of the loads 
through some bad luck, but I could make it robust to that).


Re: [PATCH] cgraph: Adjust verify_corresponds_to_fndecl [PR106061]

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, 27 Jan 2023, Jakub Jelinek wrote:

> Hi!
> 
> IPA passes redirect some calls in what they determine to be unreachable code
> to builtin_decl_unreachable.  But that function sometimes returns
> builtin_decl_explicit (BUILT_IN_UNREACHABLE) (which is what GCC 12
> and earlier always did), and sometimes builtin_decl_explicit (BUILT_IN_TRAP)
> (e.g. for -funreachable-traps, -O0, -Og).
> The cgraph verification code verifies cgraph edges and has an
> exception for these redirections to BUILT_IN_UNREACHABLE, but has none
> for BUILT_IN_TRAP, so e.g. the following testcase ICEs during that
> verification.
> 
> The following patch just adds BUILT_IN_TRAP to those exceptions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-01-27  Jakub Jelinek  
> 
>   PR ipa/106061
>   * cgraph.cc (cgraph_edge::verify_corresponds_to_fndecl): Allow
>   redirection of calls to __builtin_trap in addition to redirection
>   to __builtin_unreachable.
> 
>   * gcc.dg/pr106061.c: New test.
> 
> --- gcc/cgraph.cc.jj  2023-01-19 09:58:50.0 +0100
> +++ gcc/cgraph.cc 2023-01-26 15:30:50.422759246 +0100
> @@ -3248,9 +3248,11 @@ cgraph_edge::verify_corresponds_to_fndec
>node = node->ultimate_alias_target ();
>  
>/* Optimizers can redirect unreachable calls or calls triggering undefined
> - behavior to builtin_unreachable.  */
> + behavior to __builtin_unreachable or __builtin_trap.  */
>  
> -  if (fndecl_built_in_p (callee->decl, BUILT_IN_UNREACHABLE))
> +  if (fndecl_built_in_p (callee->decl, BUILT_IN_NORMAL)
> +  && (DECL_FUNCTION_CODE (callee->decl) == BUILT_IN_UNREACHABLE
> +   || DECL_FUNCTION_CODE (callee->decl) == BUILT_IN_TRAP))
>  return false;
>  
>if (callee->former_clone_of != node->decl
> --- gcc/testsuite/gcc.dg/pr106061.c.jj  2023-01-26 15:40:06.002721103 +0100
> +++ gcc/testsuite/gcc.dg/pr106061.c   2023-01-26 15:41:32.553468886 +0100
> @@ -0,0 +1,18 @@
> +/* PR ipa/106061 */
> +/* { dg-do compile } */
> +/* { dg-options "-Og" } */
> +
> +extern void foo (void);
> +
> +inline void
> +bar (int x)
> +{
> +  if (x)
> +foo ();
> +}
> +
> +void
> +baz (void)
> +{
> +  bar (0);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


RE: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-27 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Friday, January 27, 2023 9:58 AM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Richard Biener 
> Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
> predicates before use [PR 107674]
> 
> 
> 
> On 26/01/2023 15:06, Kyrylo Tkachov wrote:
> > Hi Andre,
> >
> >> -Original Message-
> >> From: Andre Vieira (lists) 
> >> Sent: Tuesday, January 24, 2023 1:54 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Richard Sandiford ; Richard Earnshaw
> >> ; Richard Biener ;
> >> Kyrylo Tkachov 
> >> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
> >> predicates before use [PR 107674]
> >>
> >> Hi,
> >>
> >> This patch teaches GCC that zero-extending an MVE predicate from 16 bits
> >> to 32 bits and then only using 16 bits is a no-op.
> >> It does so in two steps:
> >> - it lets GCC know that it can access any MVE predicate mode using any
> >> other MVE predicate mode without needing to copy it, using the
> >> TARGET_MODES_TIEABLE_P hook,
> >> - it teaches simplify_subreg to optimize a subreg with a vector
> >> outermode, by replacing this outermode with a same-sized integer mode
> >> and trying the available optimizations, then if successful it
> >> surrounds the result with a subreg casting it back to the original
> >> vector outermode.
> >>
> >> This removes the unnecessary zero-extending shown on PR 107674
> (though
> >> it's a sign-extend there), that was introduced in gcc 11.
> >>
> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on
> >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.
> >>
> >> OK for trunk?
> >>
> >> gcc/ChangeLog:
> >>
> >>PR target/107674
> >>   * config/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
> >>   (arm_modes_tieable_p): Make MVE predicate modes tieable.
> >>* config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
> >>* simplify-rtx.cc (simplify_context::simplify_subreg): Teach
> >>simplify_subreg to simplify subregs where the outermode is not
> >> scalar.
> >
> > The arm changes look ok to me. We'll want a midend maintainer to have a
> look at simplify-rtx.cc
> >
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/arm/mve/mve_vpt.c: Change to remove unnecessary
> >>zero-extend.
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > index
> 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5
> acf9af0a2155b5c5 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
> >   ** vldrb.8 q2, \[r0\]
> >   ** vldrb.8 q1, \[r1\]
> >   ** vcmp.i8 eq, q2, q1
> > -** vmrs r3, p0  @ movhi
> > -** uxth r3, r3
> > -** vmsr p0, r3  @ movhi
> >   ** vpst
> >   ** vaddt.i8 q3, q2, q1
> >   ** vpst
> >
> > Ah I see, that's the testcase from patch 1/3 that I criticized :)
> > Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more
> robust?
> > Thanks,
> > Kyrill
> I could, but I would rather not. I have a patch series waiting for GCC
> 14 that does further improvements to this (and other VPST codegen)
> sequences and if I do scan for 'absence' of an instruction I have to
> break them up into single tests each. Also it wouldn't then fail if we
> start spilling the predicate directly to memory for instance. Like I
> mentioned in the previous patch, the sequence is unlikely to be able to
> change through scheduling (other than maybe the reordering of the loads
> through some bad luck, but I could make it robust to that).

Ok, looks like it was thought through, so fine by me.
Thanks,
Kyrill


Re: [PATCH] RISC-V: Add TARGET_MIN_VLEN > 32 into iterators of EEW = 64 vector modes

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks :)

On Mon, Jan 23, 2023 at 3:29 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 1/20/23 02:33, juzhe.zh...@rivai.ai wrote:
> > From: Ju-Zhe Zhong 
> >
> > According to the RVV ISA, RVV doesn't support EEW == 64 vector types for
> > zve32x and zve32f. So it makes sense to add a predicate to the iterators
> > of EEW = 64 vector modes.
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/vector-iterators.md: Add TARGET_MIN_VLEN > 32
> predicates.
> OK.
>
> Jeff
>


Re: [PATCH] doc: Fix up return type of __builtin_va_arg_pack_len [PR108560]

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, 27 Jan 2023, Jakub Jelinek wrote:

> Hi!
> 
> __builtin_va_arg_pack_len as implemented returned int since its introduction
> in 2007.  The initial documentation didn't mention any return type,
> which changed in 2010 in r0-103077-gab940b73bfabe2cec4 during some
> documentation formatting cleanups
> https://gcc.gnu.org/legacy-ml/gcc-patches/2010-09/msg01632.html
> I can understand that for formatting some type was needed there
> but what exactly hasn't been really discussed.
> 
> So, I think we should change documentation to match the implementation,
> rather than change implementation to match the documentation.
> Most people don't use more than 2147483647 arguments to inline functions,
> and on poor targets with 16-bit ints I bet even having more than 65535
> arguments to inline functions would be highly unexpected.

Agreed.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-01-27  Jakub Jelinek  
> 
>   PR other/108560
>   * doc/extend.texi: Fix up return type of __builtin_va_arg_pack_len
>   from size_t to int.
> 
> --- gcc/doc/extend.texi.jj  2023-01-24 11:10:13.218075138 +0100
> +++ gcc/doc/extend.texi   2023-01-26 17:13:47.428496682 +0100
> @@ -688,7 +688,7 @@ myprintf (FILE *f, const char *format, .
>  @end smallexample
>  @end deftypefn
>  
> -@deftypefn {Built-in Function} {size_t} __builtin_va_arg_pack_len ()
> +@deftypefn {Built-in Function} {int} __builtin_va_arg_pack_len ()
>  This built-in function returns the number of anonymous arguments of
>  an inline function.  It can be used only in inline functions that
>  are always inlined, never compiled as a separate function, such
> 
>   Jakub
> 
> 



Re: [PATCH] RISC-V: Fix pred_mov constraint for vle.v

2023-01-27 Thread Kito Cheng via Gcc-patches
Committed, thanks!

On Thu, Jan 19, 2023 at 3:03 PM  wrote:

> From: Ju-Zhe Zhong 
>
> The original constraint in the pred_mov pattern is incorrect.
> In alternative 2, operands[0] is "vr" while operands[1], the mask
> operand, can be "vm".  Matching that alternative can produce wrong
> codegen such as (vle.v v0,0(a5),v0.t), where the destination overlaps
> the mask register v0.  This is illegal according to the RVV ISA.
>
> This patch fixes the pattern without hurting RA performance.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Fix constraints.
>
> ---
>  gcc/config/riscv/vector.md | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 48414e200cf..e1173f2d5a6 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -633,22 +633,23 @@
>  ;;2. (const_vector:VNx1SF repeat [
>  ;;(const_double:SF 0.0 [0x0.0p+0])]).
>  (define_insn_and_split "@pred_mov"
> -  [(set (match_operand:V 0 "nonimmediate_operand"  "=vd,vr,
>m,vr,vr")
> -   (if_then_else:V
> - (unspec:
> -   [(match_operand: 1 "vector_mask_operand" "vmWc1, vmWc1,
> vmWc1,   Wc1,   Wc1")
> -(match_operand 4 "vector_length_operand""   rK,rK,
> rK,rK,rK")
> -(match_operand 5 "const_int_operand""i, i,
>  i, i, i")
> -(match_operand 6 "const_int_operand""i, i,
>  i, i, i")
> -(match_operand 7 "const_int_operand""i, i,
>  i, i, i")
> -(reg:SI VL_REGNUM)
> -(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> - (match_operand:V 3 "vector_move_operand"   "m, m,
> vr,vr, viWc0")
> - (match_operand:V 2 "vector_merge_operand"  "0,vu,
> vu,   vu0,   vu0")))]
> +  [(set (match_operand:V 0 "nonimmediate_operand"  "=vr,vr,
> vd, m,vr,vr")
> +(if_then_else:V
> +  (unspec:
> +[(match_operand: 1 "vector_mask_operand" "vmWc1,   Wc1,
> vm, vmWc1,   Wc1,   Wc1")
> + (match_operand 4 "vector_length_operand""   rK,rK,
> rK,rK,rK,rK")
> + (match_operand 5 "const_int_operand""i, i,
>  i, i, i, i")
> + (match_operand 6 "const_int_operand""i, i,
>  i, i, i, i")
> + (match_operand 7 "const_int_operand""i, i,
>  i, i, i, i")
> + (reg:SI VL_REGNUM)
> + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +  (match_operand:V 3 "vector_move_operand"   "m, m,
>  m,vr,vr, viWc0")
> +  (match_operand:V 2 "vector_merge_operand"  "0,vu,
> vu,vu,   vu0,   vu0")))]
>"TARGET_VECTOR"
>"@
> vle.v\t%0,%3%p1
> -   vle.v\t%0,%3%p1
> +   vle.v\t%0,%3
> +   vle.v\t%0,%3,%1.t
> vse.v\t%3,%0%p1
> vmv.v.v\t%0,%3
> vmv.v.i\t%0,%v3"
> @@ -657,7 +658,7 @@
> && satisfies_constraint_vu (operands[2])"
>[(set (match_dup 0) (match_dup 3))]
>""
> -  [(set_attr "type" "vlde,vlde,vste,vimov,vimov")
> +  [(set_attr "type" "vlde,vlde,vlde,vste,vimov,vimov")
> (set_attr "mode" "")])
>
>  ;; Dedicated pattern for vse.v instruction since we can't reuse pred_mov
> pattern to include
> --
> 2.36.3
>
>


Re: [PATCH] RISC-V: Refine function args of some functions.

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Wed, Jan 18, 2023 at 11:13 AM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Refine function
> args.
> (emit_vsetvl_insn): Ditto.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 728a32dacd6..e11751f00af 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -580,7 +580,7 @@ has_vector_insn (function *fn)
>
>  /* Emit vsetvl instruction.  */
>  static rtx
> -gen_vsetvl_pat (enum vsetvl_type insn_type, vl_vtype_info info, rtx vl)
> +gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info,
> rtx vl)
>  {
>rtx avl = info.get_avl ();
>rtx sew = gen_int_mode (info.get_sew (), Pmode);
> @@ -600,7 +600,7 @@ gen_vsetvl_pat (enum vsetvl_type insn_type,
> vl_vtype_info info, rtx vl)
>  }
>
>  static rtx
> -gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info info)
> +gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
>  {
>rtx new_pat;
>if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
> @@ -617,7 +617,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const
> vector_insn_info info)
>
>  static void
>  emit_vsetvl_insn (enum vsetvl_type insn_type, enum emit_type emit_type,
> - vl_vtype_info info, rtx vl, rtx_insn *rinsn)
> + const vl_vtype_info &info, rtx vl, rtx_insn *rinsn)
>  {
>rtx pat = gen_vsetvl_pat (insn_type, info, vl);
>if (dump_file)
> --
> 2.36.3
>
>
>
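
The effect of switching these parameters from by-value to const reference can be sketched with a copy counter. `demo_info` is a hypothetical stand-in for `vl_vtype_info` / `vector_insn_info`; the real classes and their copy cost are not shown in this thread.

```cpp
#include <cassert>

// Hypothetical info object that counts how often it is copied.
struct demo_info
{
  static int copies;
  int sew;

  explicit demo_info (int s) : sew (s) {}
  demo_info (const demo_info &other) : sew (other.sew) { ++copies; }

  int get_sew () const { return sew; }
};

int demo_info::copies = 0;

// Old style: the argument is copied on every call.
int
sew_by_value (demo_info info)
{
  return info.get_sew ();
}

// New style: the callee reads the caller's object directly.
int
sew_by_ref (const demo_info &info)
{
  return info.get_sew ();
}
```

Both functions return the same value; only the by-value variant invokes the copy constructor when called with an existing object.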


Re: [PATCH] RISC-V: Fix bug of before_p function

2023-01-27 Thread Kito Cheng via Gcc-patches
Committed with more comments to describe why this should be fixed.

On Wed, Jan 18, 2023 at 11:10 AM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (before_p): Fix bug.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 26d096ea939..728a32dacd6 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -188,7 +188,7 @@ real_insn_and_same_bb_p (const insn_info *insn, const
> bb_info *bb)
>  static bool
>  before_p (const insn_info *insn1, const insn_info *insn2)
>  {
> -  return insn1->compare_with (insn2) == -1;
> +  return insn1->compare_with (insn2) < 0;
>  }
>
>  static bool
> --
> 2.36.3
>
>


[PATCH]AArch64: Fix native detection in the presence of mandatory features which don't have midr values

2023-01-27 Thread Tamar Christina via Gcc-patches
Hi All,

aarch64-option-extensions.def explicitly defines the semantics for an empty midr
field as being:

 In that case this field
 should contain a space (" ") separated list of the strings in 'Features'
 that are required.  Their order is not important.  An empty string means
 do not detect this feature during auto detection.

That is to say, an empty string means that we don't know the midr value for this
feature and so it just shouldn't be taken into account for native features
detection.  However this meaning seems to have gotten lost at some point.

This results in e.g. -mcpu=native on a Neoverse N2 disabling features it does
have.  Essentially we disabled any mandatory feature for which there is no midr
entry.

The rationale for letting -mcpu=native disable features at all is that
the kernel is able to disable a mandatory feature for correctness
reasons.  Unfortunately we can't distinguish between "old kernel"
and "kernel disabled".

This patch adds a new field that indicates whether the midr field has any value
at all.  If there's no value we skip the extension when determining the "off"
flags.
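
The skipping rule can be sketched with a toy extension table. Everything here is hypothetical: the extension names, the flag values and `demo_off_string` only mirror the shape of the loop in the patch, not the real AArch64 tables.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical extension entry.  native_detect is false when the
// detection string for the extension is empty.
struct demo_extension
{
  const char *name;
  std::uint64_t flag;
  bool native_detect;
};

constexpr demo_extension demo_extensions[] = {
  { "alpha", std::uint64_t (1) << 0, true },
  { "beta",  std::uint64_t (1) << 1, true },
  { "gamma", std::uint64_t (1) << 2, false }, // no detection string
};

// Build the "+no..." suffix for features that the CPU default enables
// (current_flags) but that detection did not report (isa_flags).
std::string
demo_off_string (std::uint64_t current_flags, std::uint64_t isa_flags)
{
  std::string out;
  for (const auto &opt : demo_extensions)
    if (opt.native_detect && (opt.flag & current_flags & ~isa_flags))
      {
        out += "+no";
        out += opt.name;
      }
  return out;
}
```

Calling `demo_off_string (0x7, 0x1)` turns off only "beta": "gamma" is also missing from the detected set, but since it cannot be detected at all, its absence says nothing and it is left enabled.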

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master? I think it needs backporting, but I need to verify older
compilers first.  If a backport is required, OK for backporting?

Thanks,
Tamar

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(struct aarch64_option_extension): Add native_detect and document struct
a bit more.
(all_extensions): Set new field native_detect.
* config/aarch64/aarch64.cc (struct aarch64_option_extension): Delete
unused struct.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_19: New test.
* gcc.target/aarch64/cpunative/info_20: New test.
* gcc.target/aarch64/cpunative/info_21: New test.
* gcc.target/aarch64/cpunative/info_22: New test.
* gcc.target/aarch64/cpunative/native_cpu_19.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_20.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_21.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_22.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
a9695d60197e6585957b293d2d755a557e124d4f..4e9e9c0bf86a5ef2667f0bb7e646ba06152aa982
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -139,10 +139,17 @@ aarch64_handle_option (struct gcc_options *opts,
 /* An ISA extension in the co-processor and main instruction set space.  */
 struct aarch64_option_extension
 {
-  const char *name;
+  /* The extension name to pass on to the assembler.  */
+  const char *const name;
+  /* The smallest set of feature bits to toggle to enable this option.  */
   aarch64_feature_flags flag_canonical;
-  aarch64_feature_flags flags_on;
-  aarch64_feature_flags flags_off;
+  /* If this feature is turned on, these bits also need to be turned on.  */
+  const unsigned long flags_on;
+  /* If this feature is turned off, these bits also need to be turned off.  */
+  const unsigned long flags_off;
+  /* Indicates whether this feature is taken into account during native cpu
+ detection.  */
+  bool native_detect;
 };
 
 /* ISA extensions in AArch64.  */
@@ -150,9 +157,9 @@ static constexpr aarch64_option_extension all_extensions[] =
 {
 #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, F) \
   {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
-   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
+   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), strlen (F)},
 #include "config/aarch64/aarch64-option-extensions.def"
-  {NULL, 0, 0, 0}
+  {NULL, 0, 0, 0, false}
 };
 
 struct processor_name_to_arch
@@ -325,9 +332,13 @@ aarch64_get_extension_string_for_isa_flags
outstr += opt.name;
   }
 
-  /* Remove the features in current_flags & ~isa_flags.  */
+  /* Remove the features in current_flags & ~isa_flags.  If the feature does
+ not have an HWCAPs then it shouldn't be taken into account for feature
+ detection because one way or another we can't tell if it's available
+ or not.  */
   for (auto &opt : all_extensions)
-if (opt.flag_canonical & current_flags & ~isa_flags)
+if (opt.native_detect
+   && (opt.flag_canonical & current_flags & ~isa_flags))
   {
current_flags &= ~opt.flags_off;
outstr += "+no";
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d36b57341b336a81dc2e1a975986b3e37402602a..860aeb3e5fbf655e87284be28cc72648c1cd71f9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2808,14 +2808,6 @@ static const struct attribute_spec aarch64_attribute_table[] =
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
-/* An ISA extension in 

[PATCH]AArch64: Fix codegen regressions around tbz.

2023-01-27 Thread Tamar Christina via Gcc-patches
Hi All,

We were analyzing code quality after recent changes and noticed that the
tbz support somehow managed to increase the overall number of branches rather
than decrease it.

While investigating this we figured out that the problem occurs when an AND
with a constant already exists in gimple and the instruction is generated
because of the range information obtained from the ANDed constant.  In that
situation we end up with a NOP AND in the RTL expansion.

This is not a problem as CSE will take care of it normally.   The issue is when
this original AND was done in a location where PRE or FRE "lift" the AND to a
different basic block.  This triggers a problem when the resulting value is not
single use.  Instead of having an AND and tbz, we end up generating an
AND + TST + BR if the mode is HI or QI.

This CSE across BB was a problem before but this change made it worse. Our
branch patterns rely on combine being able to fold AND or zero_extends into the
instructions.

To work around this (since a proper fix is outside of the scope of stage-4) we
are limiting the new tbranch optab to only HI and QI mode values.  This isn't a
problem because these two modes are modes for which we don't have CBZ support,
so they are the problematic cases to begin with.  Additionally booleans are QI.

The second thing we're doing is limiting the only legal bitpos to position 0,
i.e. only the bottom bit.  This prevents the double ANDs as much as
possible.
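
To make the "double AND" concrete, here is a small C sketch (the function names
are made up for illustration, and what the compiler actually emits for them
depends on version and flags).  Once a value is known to be 0 or 1, masking it
with 1 again changes nothing, so an AND expanded for the bit test is a no-op
that later passes have to clean up:

```c
/* Test only the bottom bit of a char-sized value.  This is the shape
   that a bottom-bit-only branch test is meant to cover.  */
static int
bottom_bit_set (unsigned char x)
{
  return (x & 1) != 0;
}

/* MASKED is already in the range [0, 1], so the second "& 1" is a
   no-op AND.  */
static int
bottom_bit_set_twice (unsigned char x)
{
  unsigned char masked = x & 1;
  return (masked & 1) != 0;
}
```

Both functions return the same result for every input; the second merely
spells out the redundant mask.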

Now most other cases, i.e. where we had an explicit & in the source code are
still handled correctly by the anonymous (*tb1)
pattern that was added along with tbranch support.

This means we don't expand the superfluous AND here, and while it doesn't fix the
problem that in the cross-BB case we lose tbz, it also doesn't make things
worse.

With these tweaks we've now reduced the number of insn uniformly as originally
expected.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.md (tbranch_3): Restrict to SHORT
and bottom bit only.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/tbz_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2c1367977a68fc8e4289118e07bb61398856791e..aa09e93d85e9628e8944e03498697eb9597ef867 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -949,8 +949,8 @@ (define_insn "*cb1"
 
 (define_expand "tbranch_3"
   [(set (pc) (if_then_else
-  (EQL (match_operand:ALLI 0 "register_operand")
-   (match_operand 1 "aarch64_simd_shift_imm_"))
+  (EQL (match_operand:SHORT 0 "register_operand")
+   (match_operand 1 "const0_operand"))
   (label_ref (match_operand 2 ""))
   (pc)))]
   ""
diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_2.c b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
new file mode 100644
index ..ec128b58f35276a7c5452685a65c73f95f2d5f9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
@@ -0,0 +1,130 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99  -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdbool.h>
+
+void h(void);
+
+/*
+** g1:
+** cbnz w0, .L[0-9]+
+** ret
+** ...
+*/
+void g1(int x)
+{
+  if (__builtin_expect (x, 0))
+h ();
+}
+
+/* 
+** g2:
+** tbnz x0, 0, .L[0-9]+
+** ret
+** ...
+*/
+void g2(int x)
+{
+  if (__builtin_expect (x & 1, 0))
+h ();
+}
+
+/* 
+** g3:
+** tbnz x0, 3, .L[0-9]+
+** ret
+** ...
+*/
+void g3(int x)
+{
+  if (__builtin_expect (x & 8, 0))
+h ();
+}
+
+/* 
+** g4:
+** tbnz w0, #31, .L[0-9]+
+** ret
+** ...
+*/
+void g4(int x)
+{
+  if (__builtin_expect (x & (1 << 31), 0))
+h ();
+}
+
+/* 
+** g5:
+** tst w0, 255
+** bne .L[0-9]+
+** ret
+** ...
+*/
+void g5(char x)
+{
+  if (__builtin_expect (x, 0))
+h ();
+}
+
+/* 
+** g6:
+** tbnz w0, 0, .L[0-9]+
+** ret
+** ...
+*/
+void g6(char x)
+{
+  if (__builtin_expect (x & 1, 0))
+h ();
+}
+
+/* 
+** g7:
+** tst w0, 3
+** bne .L[0-9]+
+** ret
+** ...
+*/
+void g7(char x)
+{
+  if (__builtin_expect (x & 3, 0))
+h ();
+}
+
+/* 
+** g8:
+** tbnz w0, 7, .L[0-9]+
+** ret
+** ...
+*/
+void g8(char x)
+{
+  if (__builtin_expect (x & (1 << 7), 0))
+h ();
+}
+
+/* 
+** g9:
+** tbnz w0, 0, .L[0-9]+
+** ret
+** ...
+*/
+void g9(bool x)
+{
+  if (__builtin_expect (x, 0))
+h ();
+}
+
+/* 
+** g10:
+** tbnz w0, 0, .L[0-9]+
+** ret
+** ...
+*/
+void g10(bool x)
+{
+  if (__builtin_expect (x & 1, 0))
+h ();
+}
+





Re: [PATCH 3/7] **/*.texi: Reorder index entries

2023-01-27 Thread Iain Buclaw via Gcc-patches
Excerpts from Arsen Arsenović via Gcc-patches's message of January 27, 2023 1:18 am:
> 
> gcc/d/ChangeLog:
> 
>   * implement-d.texi: Reorder index entries around @items.
> 
> ---
>  gcc/d/implement-d.texi  |  66 ++---
> 
> diff --git a/gcc/d/implement-d.texi b/gcc/d/implement-d.texi
> index 6d0c1ec3661..89a17916a83 100644
> --- a/gcc/d/implement-d.texi
> +++ b/gcc/d/implement-d.texi
> @@ -126,11 +126,11 @@ The following attributes are supported on most targets.
> 

I don't have much to comment on the D-specific documentation changes,
other than that they seem reasonable to me.

OK.

Iain.


Re: [PATCH] libstdc++: Fix up FAIL in 17_intro/names.cc on glibc < 2.19 [PR108568]

2023-01-27 Thread Jonathan Wakely via Gcc-patches
On Fri, 27 Jan 2023 at 09:29, Jakub Jelinek  wrote:
>
> Hi!
>
> On gcc112 which has glibc 2.17 I've noticed
> FAIL: 17_intro/names.cc (test for excess errors)
> FAIL: experimental/names.cc (test for excess errors)
> These are because glibc < 2.19 used __unused as field member of various
> structs, including mcontext_t in sys/ucontext.h on ppc64le.
> This was changed in glibc with
> https://gcc.gnu.org/pipermail/libc-alpha/2013-November/045766.html
> names.cc even has
> #ifdef __GLIBC_PREREQ
> #if ! __GLIBC_PREREQ(2, 19)
> // Glibc defines this prior to 2.19
> #undef __unused
> #endif
> #endif
> for it, but it doesn't work.  The reason is that __GLIBC_PREREQ is defined in
> <features.h> but nothing included that header before this spot (it is
> included later from bits/stdc++.h).
>
> The following patch on Linux/Hurd conditionally includes features.h to get
> the needed macros before deciding if __unused should be undefined or not.
> If needed, I could use __GLIBC_PREREQ then but would need to check if it is
> defined and between 1996 and 1999 it wasn't.
>
> Tested on powerpc64le-linux with glibc 2.17 (where it fixes the
> regressions), on x86_64-linux with glibc 2.35 (where it still PASSes),
> plus on the latter with -E -dD on the test to verify __unused is just
> defined and not undefined later on, ok for trunk?

OK, thanks.
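
For readers following along, the ordering problem described in the quoted
message can be sketched as follows.  This is an illustration only, not the
committed names.cc change: glibc_older_than_2_19 is a hypothetical helper and
the platform check is an assumption.

```c
/* Pull in the glibc version macros first.  Without this include,
   __GLIBC__ and __GLIBC_PREREQ are simply undefined at this point and
   any #if that depends on them silently takes the wrong branch.  */
#if defined (__linux__) || defined (__gnu_hurd__)
# include <features.h>
#endif

/* Return 1 when building against glibc older than 2.19, else 0.
   Uses __GLIBC__ and __GLIBC_MINOR__ directly so that it does not
   depend on __GLIBC_PREREQ being defined.  */
static int
glibc_older_than_2_19 (void)
{
#if defined (__GLIBC__) && defined (__GLIBC_MINOR__)
  return __GLIBC__ < 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ < 19);
#else
  return 0;
#endif
}
```

The same condition, evaluated in the preprocessor after the include, is what
would guard an "#undef __unused".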



Re: [PATCH] tree-optimization/108522 Use component_ref_field_offset

2023-01-27 Thread Eric Botcazou via Gcc-patches
> OK.  PLACEHOLDER_EXPR are only relevant pre simplification.

I presume you mean "pre gimplification" here?

-- 
Eric Botcazou





[PATCH 1/2] Add support for conditional xorsign [PR96373]

2023-01-27 Thread Richard Sandiford via Gcc-patches
This patch is an optimisation, but it's also a prerequisite for
fixing PR96373 without regressing vect-xorsign_exec.c.

Currently the vectoriser vectorises:

  for (i = 0; i < N; i++)
r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);

as two unconditional operations (copysign and mult).
tree-ssa-math-opts.cc later combines them into an "xorsign" function.
This works for both Advanced SIMD and SVE.

However, with the fix for PR96373, the vectoriser will instead
generate a conditional multiplication (IFN_COND_MUL).  Something then
needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional
xorsign.  Three obvious options were:

(1) Extend tree-ssa-math-opts.cc.
(2) Do the fold in match.pd.
(3) Leave it to rtl combine.

I'm against (3), because this isn't a target-specific optimisation.
(1) would be possible, but would involve open-coding a lot of what
match.pd does for us.  And, in contrast to doing the current
tree-ssa-math-opts.cc optimisation in match.pd, there should be
no danger of (2) happening too early.  If we have an IFN_COND_MUL
then we're already past the stage of simplifying the original
source code.

There was also a choice between adding a conditional xorsign ifn
and simply open-coding the xorsign.  The latter seems simpler,
and means less boiler-plate for target-specific code.
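
As a scalar illustration of the open-coded form (a sketch assuming IEEE
binary32 floats; the helper names float_bits, bits_float and cond_xorsign are
invented for the example and are not part of the patch), the conditional
multiplication by copysign (1, y) becomes a conditional XOR with the sign bit
of y:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer and back.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* c ? x * copysign (1, y) : z, written as c ? x ^ signs (y) : z.  */
static float
cond_xorsign (int c, float x, float y, float z)
{
  const uint32_t sign_mask = UINT32_C (0x80000000);
  if (!c)
    return z;
  return bits_float (float_bits (x) ^ (float_bits (y) & sign_mask));
}
```

The vector version does the same thing per lane, with the condition supplied
by the loop predicate.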

The signed_or_unsigned_type_for change is needed to make sure
that we stay in "SVE space" when doing the optimisation on 128-bit
fixed-length SVE.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR tree-optimization/96373
* tree.h (sign_mask_for): Declare.
* tree.cc (sign_mask_for): New function.
(signed_or_unsigned_type_for): For vector types, try to use the
related_int_vector_mode.
* genmatch.cc (commutative_op): Handle conditional internal functions.
* match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and.

gcc/testsuite/
PR tree-optimization/96373
* gcc.target/aarch64/sve/cond_xorsign_1.c: New test.
* gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
---
 gcc/genmatch.cc   | 15 
 gcc/match.pd  | 14 
 .../gcc.target/aarch64/sve/cond_xorsign_1.c   | 34 +++
 .../gcc.target/aarch64/sve/cond_xorsign_2.c   | 17 ++
 gcc/tree.cc   | 33 ++
 gcc/tree.h|  1 +
 6 files changed, 114 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index d4cb439a851..e147ab9db7a 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -489,6 +489,21 @@ commutative_op (id_base *id)
   case CFN_FNMS:
return 0;
 
+  case CFN_COND_ADD:
+  case CFN_COND_MUL:
+  case CFN_COND_MIN:
+  case CFN_COND_MAX:
+  case CFN_COND_FMIN:
+  case CFN_COND_FMAX:
+  case CFN_COND_AND:
+  case CFN_COND_IOR:
+  case CFN_COND_XOR:
+  case CFN_COND_FMA:
+  case CFN_COND_FMS:
+  case CFN_COND_FNMA:
+  case CFN_COND_FNMS:
+   return 1;
+
   default:
return -1;
   }
diff --git a/gcc/match.pd b/gcc/match.pd
index 56ac743aa6d..f605b798c44 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -339,6 +339,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0)))
   (COPYSIGN_ALL (negate @0) @1)))
 
+/* Transform c ? x * copysign (1, y) : z to c ? x ^ signs(y) : z.
+   tree-ssa-math-opts.cc does the corresponding optimization for
+   unconditional multiplications (via xorsign).  */
+(simplify
+ (IFN_COND_MUL:c @0 @1 (IFN_COPYSIGN real_onep @2) @3)
+ (with { tree signs = sign_mask_for (type); }
+  (if (signs)
+   (with { tree inttype = TREE_TYPE (signs); }
+(view_convert:type
+ (IFN_COND_XOR:inttype @0
+  (view_convert:inttype @1)
+  (bit_and (view_convert:inttype @2) { signs; })
+  (view_convert:inttype @3)))
+
 /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
 (simplify
   (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
new file mode 100644
index 000..338ca605923
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define xorsign(A, B, SUFFIX) ((A) * __builtin_copysign##SUFFIX (1.0, B))
+
+#define DEF_LOOP(TYPE, SUFFIX) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##TYPE (TYPE *__restrict r, TYPE *__restrict a, \
+  TYPE *__restrict b, TYPE *__restrict c,  \
+  int n)   \
+  {

[PATCH 2/2] vect: Make partial trapping ops use predication [PR96373]

2023-01-27 Thread Richard Sandiford via Gcc-patches
PR96373 points out that a predicated SVE loop currently converts
trapping unconditional ops into unpredicated vector ops.  Doing
the operation on inactive lanes can then raise an exception.
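
A scalar model may help here.  The sketch below is not how the vectoriser
represents things; predicated_div and its parameters are hypothetical names.
It shows why predication matters: an inactive lane is never computed, so
whatever value sits in it cannot cause a trap.

```c
/* Scalar model of a predicated vector divide over LANES lanes.
   Inactive lanes are skipped entirely, so a zero divisor sitting in an
   inactive lane is never divided by and cannot raise an exception.
   An unpredicated version would divide in every lane instead.  */
static void
predicated_div (float *r, const float *a, const float *b,
                const unsigned char *active, int lanes)
{
  for (int i = 0; i < lanes; i++)
    if (active[i])
      r[i] = a[i] / b[i];
}
```

For example, with the last two lanes inactive and holding zero divisors, only
the first two results are written and the rest of the output is left alone.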

As discussed in the PR trail, we aren't 100% consistent about
whether we preserve traps or not.  But the direction of travel
is clearly to improve that rather than live with it.  This patch
tries to do that for the SVE case.

Doing this regresses gcc.target/aarch64/sve/fabd_1.c.  I've added
-fno-trapping-math for now and filed PR108571 to track it.
A similar problem applies to fsubr_1.c.

I think this is likely to regress Power 10, since conditional
operations are only available for masked loops.  I think we'll
need to add -fno-trapping-math to any affected testcases,
but I don't have a Power 10 system to test on.  Kewen, would you
mind giving this a spin and seeing how bad the fallout is?

Tested on aarch64-linux-gnu.  OK to install assuming no blockers
on the Power 10 side?

Richard


gcc/
PR tree-optimization/96373
* tree-vect-stmts.cc (vectorizable_operation): Predicate trapping
operations on the loop mask.  Reject partial vectors if this isn't
possible.

gcc/testsuite/
PR tree-optimization/96373
PR tree-optimization/108571
* gcc.target/aarch64/sve/fabd_1.c: Add -fno-trapping-math.
* gcc.target/aarch64/sve/fsubr_1.c: Likewise.
* gcc.target/aarch64/sve/fmul_1.c: Expect predicate ops.
* gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c |  2 +-
 gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c | 12 +++
 .../gcc.target/aarch64/sve/fp_arith_1.c   | 12 +++
 .../gcc.target/aarch64/sve/fsubr_1.c  |  2 +-
 gcc/tree-vect-stmts.cc| 32 ++-
 5 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
index 13ad83be24c..30bde6f0df7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O3 --save-temps" } */
+/* { dg-options "-O3 --save-temps -fno-trapping-math" } */
 
 #define N 16
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
index 4a3e7c06745..0245a8c1422 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
@@ -27,20 +27,20 @@ DO_ARITH_OPS (_Float16, *, mul)
 DO_ARITH_OPS (float, *, mul)
 DO_ARITH_OPS (double, *, mul)
 
-/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
 /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 1 } } */
-/* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2.0\n} 1 } } */
 /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #5} } } */
 /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */
 
-/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
 /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 1 } } */
-/* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2.0\n} 1 } } */
 /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #5} } } */
 /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */
 
-/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
 /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 1 } } */
-/* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2.0\n} 1 } } */
 /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #5} } } */
 /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #-} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c
index 5aed0dcb490..419d6e1b5ec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fp_arith_1.c
@@ -34,

Re: [PATCH 2/2] vect: Make partial trapping ops use predication [PR96373]

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, 27 Jan 2023, Richard Sandiford wrote:

> PR96373 points out that a predicated SVE loop currently converts
> trapping unconditional ops into unpredicated vector ops.  Doing
> the operation on inactive lanes can then raise an exception.
> 
> As discussed in the PR trail, we aren't 100% consistent about
> whether we preserve traps or not.  But the direction of travel
> is clearly to improve that rather than live with it.  This patch
> tries to do that for the SVE case.
> 
> Doing this regresses gcc.target/aarch64/sve/fabd_1.c.  I've added
> -fno-trapping-math for now and filed PR108571 to track it.
> A similar problem applies to fsubr_1.c.
> 
> I think this is likely to regress Power 10, since conditional
> operations are only available for masked loops.  I think we'll
> need to add -fno-trapping-math to any affected testcases,
> but I don't have a Power 10 system to test on.  Kewen, would you
> mind giving this a spin and seeing how bad the fallout is?
> 
> Tested on aarch64-linux-gnu.  OK to install assuming no blockers
> on the Power 10 side?

OK.

Thanks,
Richard.

> Richard
> 
> 
> gcc/
>   PR tree-optimization/96373
>   * tree-vect-stmts.cc (vectorizable_operation): Predicate trapping
>   operations on the loop mask.  Reject partial vectors if this isn't
>   possible.
> 
> gcc/testsuite/
>   PR tree-optimization/96373
>   PR tree-optimization/108571
>   * gcc.target/aarch64/sve/fabd_1.c: Add -fno-trapping-math.
>   * gcc.target/aarch64/sve/fsubr_1.c: Likewise.
>   * gcc.target/aarch64/sve/fmul_1.c: Expect predicate ops.
>   * gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c |  2 +-
>  gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c | 12 +++
>  .../gcc.target/aarch64/sve/fp_arith_1.c   | 12 +++
>  .../gcc.target/aarch64/sve/fsubr_1.c  |  2 +-
>  gcc/tree-vect-stmts.cc| 32 ++-
>  5 files changed, 38 insertions(+), 22 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
> index 13ad83be24c..30bde6f0df7 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fabd_1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do assemble { target aarch64_asm_sve_ok } } */
> -/* { dg-options "-O3 --save-temps" } */
> +/* { dg-options "-O3 --save-temps -fno-trapping-math" } */
>  
>  #define N 16
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
> index 4a3e7c06745..0245a8c1422 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fmul_1.c
> @@ -27,20 +27,20 @@ DO_ARITH_OPS (_Float16, *, mul)
>  DO_ARITH_OPS (float, *, mul)
>  DO_ARITH_OPS (double, *, mul)
>  
> -/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
>  /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #0.5\n} 1 } } */
> -/* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2} } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #2.0\n} 1 } } */
>  /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #5} } } */
>  /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, #-} } } */
>  
> -/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
>  /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #0.5\n} 1 } } */
> -/* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2} } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #2.0\n} 1 } } */
>  /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #5} } } */
>  /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, #-} } } */
>  
> -/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
>  /* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #0.5\n} 1 } } */
> -/* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2} } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2.0\n} 1 } } */
>  /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #5} } } */
>  /* { dg-final { scan-assembler-not   {\tfmul\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #-} } } */
> d

Re: [PATCH 1/2] Add support for conditional xorsign [PR96373]

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, 27 Jan 2023, Richard Sandiford wrote:

> This patch is an optimisation, but it's also a prerequisite for
> fixing PR96373 without regressing vect-xorsign_exec.c.
> 
> Currently the vectoriser vectorises:
> 
>   for (i = 0; i < N; i++)
> r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
> 
> as two unconditional operations (copysign and mult).
> tree-ssa-math-opts.cc later combines them into an "xorsign" function.
> This works for both Advanced SIMD and SVE.
> 
> However, with the fix for PR96373, the vectoriser will instead
> generate a conditional multiplication (IFN_COND_MUL).  Something then
> needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional
> xorsign.  Three obvious options were:
> 
> (1) Extend tree-ssa-math-opts.cc.
> (2) Do the fold in match.pd.
> (3) Leave it to rtl combine.
> 
> I'm against (3), because this isn't a target-specific optimisation.
> (1) would be possible, but would involve open-coding a lot of what
> match.pd does for us.  And, in contrast to doing the current
> tree-ssa-math-opts.cc optimisation in match.pd, there should be
> no danger of (2) happening too early.  If we have an IFN_COND_MUL
> then we're already past the stage of simplifying the original
> source code.
> 
> There was also a choice between adding a conditional xorsign ifn
> and simply open-coding the xorsign.  The latter seems simpler,
> and means less boiler-plate for target-specific code.
> 
> The signed_or_unsigned_type_for change is needed to make sure
> that we stay in "SVE space" when doing the optimisation on 128-bit
> fixed-length SVE.
> 
> Tested on aarch64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
> 
> 
> gcc/
>   PR tree-optimization/96373
>   * tree.h (sign_mask_for): Declare.
>   * tree.cc (sign_mask_for): New function.
>   (signed_or_unsigned_type_for): For vector types, try to use the
>   related_int_vector_mode.
>   * genmatch.cc (commutative_op): Handle conditional internal functions.
>   * match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and.
> 
> gcc/testsuite/
>   PR tree-optimization/96373
>   * gcc.target/aarch64/sve/cond_xorsign_1.c: New test.
>   * gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
> ---
>  gcc/genmatch.cc   | 15 
>  gcc/match.pd  | 14 
>  .../gcc.target/aarch64/sve/cond_xorsign_1.c   | 34 +++
>  .../gcc.target/aarch64/sve/cond_xorsign_2.c   | 17 ++
>  gcc/tree.cc   | 33 ++
>  gcc/tree.h|  1 +
>  6 files changed, 114 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c
> 
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index d4cb439a851..e147ab9db7a 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -489,6 +489,21 @@ commutative_op (id_base *id)
>case CFN_FNMS:
>   return 0;
>  
> +  case CFN_COND_ADD:
> +  case CFN_COND_MUL:
> +  case CFN_COND_MIN:
> +  case CFN_COND_MAX:
> +  case CFN_COND_FMIN:
> +  case CFN_COND_FMAX:
> +  case CFN_COND_AND:
> +  case CFN_COND_IOR:
> +  case CFN_COND_XOR:
> +  case CFN_COND_FMA:
> +  case CFN_COND_FMS:
> +  case CFN_COND_FNMA:
> +  case CFN_COND_FNMS:
> + return 1;
> +
>default:
>   return -1;
>}
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 56ac743aa6d..f605b798c44 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -339,6 +339,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0)))
>(COPYSIGN_ALL (negate @0) @1)))
>  
> +/* Transform c ? x * copysign (1, y) : z to c ? x ^ signs(y) : z.
> +   tree-ssa-math-opts.cc does the corresponding optimization for
> +   unconditional multiplications (via xorsign).  */
> +(simplify
> + (IFN_COND_MUL:c @0 @1 (IFN_COPYSIGN real_onep @2) @3)
> + (with { tree signs = sign_mask_for (type); }
> +  (if (signs)
> +   (with { tree inttype = TREE_TYPE (signs); }
> +(view_convert:type
> + (IFN_COND_XOR:inttype @0
> +  (view_convert:inttype @1)
> +  (bit_and (view_convert:inttype @2) { signs; })
> +  (view_convert:inttype @3)))
> +
>  /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
>  (simplify
>(plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
> new file mode 100644
> index 000..338ca605923
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#define xorsign(A, B, SUFFIX) ((A) * __builtin_copysign##SUFFIX (1.0, B))
> +
> +#define DEF_LOOP(TYPE, SUFFIX) 

[PATCH] testsuite: Use noipa and noinline attributes for pr95115 test

2023-01-27 Thread Xi Ruoyao via Gcc-patches
They prevent the compiler from deeming the NaN result "unused" and removing
the calculation that raises the INVALID exception.  See the discussion
in PR107608 for details.

Tested on x86_64-linux-gnu where the change fixes the test failure.
Ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/pr95115.c (x): Add noipa and noinline attributes.
---
 gcc/testsuite/gcc.dg/pr95115.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr95115.c b/gcc/testsuite/gcc.dg/pr95115.c
index 69c4f83250c..11620acccae 100644
--- a/gcc/testsuite/gcc.dg/pr95115.c
+++ b/gcc/testsuite/gcc.dg/pr95115.c
@@ -6,7 +6,7 @@
 #include 
 #include 
 
-double
+__attribute__ ((noipa, noinline)) double
 x (void)
 {
   double d = __builtin_inf ();
-- 
2.39.1



Re: [PATCH] tree-optimization/108522 Use component_ref_field_offset

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, Jan 27, 2023 at 12:05 PM Eric Botcazou  wrote:
>
> > OK.  PLACEHOLDER_EXPR are only relevant pre simplification.
>
> I presume you mean "pre gimplification" here?

Eh, yes.  Spell-checkers ...

> --
> Eric Botcazou
>
>
>


Re: [PATCH] testsuite: Use noipa and noinline attributes for pr95115 test

2023-01-27 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 27, 2023 at 07:46:27PM +0800, Xi Ruoyao wrote:
> They prevent the compiler from deeming the NaN result "unused" and
> remove the calculation raising INVALID exception. See the discussion
> in PR107608 for details.
> 
> Tested on x86_64-linux-gnu where the change fixes the test failure.
> Ok for trunk?
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr95115.c (x): Add noipa and noinline attributes.

noipa implies noinline, so unless one targets both very old gcc versions
which didn't have noipa attribute in addition to current ones, it is
sufficient to specify just noipa.
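
A minimal sketch of that form, assuming a GCC recent enough to know the noipa
attribute (make_nan is a made-up name; the real test's function is x):

```c
/* noipa stops GCC from using any interprocedural knowledge about this
   function, so a call to it cannot be inlined or dropped as unused.
   No separate noinline is needed.  */
__attribute__ ((noipa)) double
make_nan (void)
{
  double d = __builtin_inf ();
  return d / d;   /* inf / inf yields NaN.  */
}
```

A caller can confirm the result is a NaN by checking that it does not compare
equal to itself.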

Otherwise LGTM.

> diff --git a/gcc/testsuite/gcc.dg/pr95115.c b/gcc/testsuite/gcc.dg/pr95115.c
> index 69c4f83250c..11620acccae 100644
> --- a/gcc/testsuite/gcc.dg/pr95115.c
> +++ b/gcc/testsuite/gcc.dg/pr95115.c
> @@ -6,7 +6,7 @@
>  #include 
>  #include 
>  
> -double
> +__attribute__ ((noipa, noinline)) double
>  x (void)
>  {
>double d = __builtin_inf ();
> -- 
> 2.39.1

Jakub



Re: [PATCH] arm: Fix MVE's vcmp vector-scalar patterns [PR107987]

2023-01-27 Thread Andre Vieira (lists) via Gcc-patches

This applies cleanly to gcc-12 and regressions for arm-none-eabi look clean.

OK to apply to gcc-12?



On 06/12/2022 11:23, Kyrylo Tkachov wrote:




-Original Message-
From: Andre Simoes Dias Vieira 
Sent: Tuesday, December 6, 2022 11:19 AM
To: 'gcc-patches@gcc.gnu.org' 
Cc: Kyrylo Tkachov ; Richard Earnshaw

Subject: [PATCH] arm: Fix MVE's vcmp vector-scalar patterns [PR107987]

Hi,

This patch surrounds the scalar operand of the MVE vcmp patterns with a
vec_duplicate to ensure both operands of the comparison operator have the
same (vector) mode.
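
In plain C terms, and only as a model of the idea rather than of the RTL
patterns or the MVE intrinsics (LANES, broadcast_lanes, vcmp_eq and
vcmp_eq_scalar are all hypothetical names), the scalar is first broadcast to a
full vector so that the comparison always sees two operands of the same shape:

```c
#include <stdint.h>

#define LANES 4

/* Model of vec_duplicate: copy SCALAR into every lane of OUT.  */
static void
broadcast_lanes (int32_t out[LANES], int32_t scalar)
{
  for (int i = 0; i < LANES; i++)
    out[i] = scalar;
}

/* Lane-wise equality of two vectors; bit I of the result is set when
   lane I compares equal.  */
static unsigned
vcmp_eq (const int32_t a[LANES], const int32_t b[LANES])
{
  unsigned mask = 0;
  for (int i = 0; i < LANES; i++)
    if (a[i] == b[i])
      mask |= 1u << i;
  return mask;
}

/* Vector-scalar compare expressed as vector-vector compare.  */
static unsigned
vcmp_eq_scalar (const int32_t a[LANES], int32_t scalar)
{
  int32_t dup[LANES];
  broadcast_lanes (dup, scalar);
  return vcmp_eq (a, dup);
}
```
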

Regression tested on arm-none-eabi. Is this OK for trunk? And a backport to
GCC 12?


Ok.
Thanks,
Kyrill



gcc/ChangeLog:

PR target/107987
* config/arm/mve.md (mve_vcmpq_n_,
@mve_vcmpq_n_f): Apply vec_duplicate to scalar operand.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/arm/mve/pr107987.c: New test.




RE: [PATCH] arm: Fix MVE's vcmp vector-scalar patterns [PR107987]

2023-01-27 Thread Kyrylo Tkachov via Gcc-patches


> -----Original Message-----
> From: Andre Vieira (lists) 
> Sent: Friday, January 27, 2023 12:07 PM
> To: Kyrylo Tkachov ; 'gcc-patches@gcc.gnu.org'
> 
> Cc: Richard Earnshaw 
> Subject: Re: [PATCH] arm: Fix MVE's vcmp vector-scalar patterns [PR107987]
> 
> This applies cleanly to gcc-12 and regressions for arm-none-eabi look clean.
> 
> OK to apply to gcc-12?

Yes, thanks.
Kyrill

> 
> 
> 
> On 06/12/2022 11:23, Kyrylo Tkachov wrote:
> >
> >
> >> -----Original Message-----
> >> From: Andre Simoes Dias Vieira 
> >> Sent: Tuesday, December 6, 2022 11:19 AM
> >> To: 'gcc-patches@gcc.gnu.org' 
> >> Cc: Kyrylo Tkachov ; Richard Earnshaw
> >> 
> >> Subject: [PATCH] arm: Fix MVE's vcmp vector-scalar patterns [PR107987]
> >>
> >> Hi,
> >>
> >> This patch surrounds the scalar operand of the MVE vcmp patterns with a
> >> vec_duplicate to ensure both operands of the comparison operator have the
> >> same (vector) mode.
> >>
> >> Regression tested on arm-none-eabi. Is this OK for trunk? And a backport to
> >> GCC 12?
> >
> > Ok.
> > Thanks,
> > Kyrill
> >
> >>
> >> gcc/ChangeLog:
> >>
> >>   PR target/107987
> >>   * config/arm/mve.md (mve_vcmpq_n_,
> >>   @mve_vcmpq_n_f): Apply vec_duplicate to scalar operand.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc/testsuite/gcc.target/arm/mve/pr107987.c: New test.
> >


Pushed: [PATCH] testsuite: Use noipa attribute for pr95115 test

2023-01-27 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-01-27 at 12:50 +0100, Jakub Jelinek wrote:
> On Fri, Jan 27, 2023 at 07:46:27PM +0800, Xi Ruoyao wrote:
> > They prevent the compiler from deeming the NaN result "unused" and
> > removing the calculation that raises the INVALID exception. See the discussion
> > in PR107608 for details.
> > 
> > Tested on x86_64-linux-gnu where the change fixes the test failure.
> > Ok for trunk?
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/pr95115.c (x): Add noipa and noinline attributes.
> 
> noipa implies noinline, so unless one targets both very old gcc
> versions
> which didn't have noipa attribute in addition to current ones, it is
> sufficient to specify just noipa.
> 
> Otherwise LGTM.

Pushed with only noipa.  I must have some flawed memory about the
interaction between noipa and noinline.  Thanks for pointing that out!


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH]AArch64: Fix native detection in the presence of mandatory features which don't have midr values

2023-01-27 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> aarch64-option-extensions.def explicitly defines the semantics for an empty
> midr field as being:
>
>  In that case this field
>  should contain a space (" ") separated list of the strings in 'Features'
>  that are required.  Their order is not important.  An empty string means
>  do not detect this feature during auto detection.
>
> That is to say, an empty string means that we don't know the midr value for
> this feature and so it just shouldn't be taken into account for native feature
> detection.  However this meaning seems to have gotten lost at some point.
>
> This results in e.g. -mcpu=native on a Neoverse N2 disabling features it does
> have.  Essentially we disabled any mandatory feature for which there is no
> midr entry.
>
> The rationale for -mcpu=native being able to disable features at all is that
> the kernel is able to disable a mandatory feature for correctness issues.
> Unfortunately we can't distinguish between "old kernel" and "kernel disabled".
>
> This patch adds a new field that indicates whether the midr field has any
> value at all.  If there's no value we skip the extension when determining the
> "off" flags.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master? I think it needs backporting but I need to verify older
> compilers.  If one is required, OK for backporting?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * common/config/aarch64/aarch64-common.cc
>   (struct aarch64_option_extension): Add native_detect and document struct
>   a bit more.
>   (all_extensions): Set new field native_detect.
>   * config/aarch64/aarch64.cc (struct aarch64_option_extension): Delete
>   unused struct.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/cpunative/info_19: New test.
>   * gcc.target/aarch64/cpunative/info_20: New test.
>   * gcc.target/aarch64/cpunative/info_21: New test.
>   * gcc.target/aarch64/cpunative/info_22: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_19.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_20.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_21.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_22.c: New test.

Mostly LGTM, but some nits below.

> --- inline copy of patch -- 
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 
> a9695d60197e6585957b293d2d755a557e124d4f..4e9e9c0bf86a5ef2667f0bb7e646ba06152aa982
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -139,10 +139,17 @@ aarch64_handle_option (struct gcc_options *opts,
>  /* An ISA extension in the co-processor and main instruction set space.  */
>  struct aarch64_option_extension
>  {
> -  const char *name;
> +  /* The extension name to pass on to the assembler.  */
> +  const char *const name;

There's no need to make name itself const.

> +  /* The smallest set of feature bits to toggle to enable this option.  */
>aarch64_feature_flags flag_canonical;
> -  aarch64_feature_flags flags_on;
> -  aarch64_feature_flags flags_off;
> +  /* If this feature is turned on, these bits also need to be turned on.  */
> +  const unsigned long flags_on;
> +  /* If this feature is turned off, these bits also need to be turned off.  */
> +  const unsigned long flags_off;

Please don't undo the aarch64_feature_flags abstraction.  "long" isn't
enough for x86_32 to aarch64 cross-compilers (yes, I know, but still),
and we're not far off running out of room in the uint64_t.  The point
of the abstraction was to reduce the number of changes that we need
once we have 65 or more features, architecture levels, etc.
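
To make the width concern concrete, here is a minimal sketch.  It assumes
the flags type is a 64-bit integer, as the uint64_t remark above suggests.
The names feature_flags, FL_HIGH, bit_fits and survives_32bit_storage are
made up for illustration and are not GCC identifiers.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical stand-in for the aarch64_feature_flags typedef.
using feature_flags = std::uint64_t;

// A made-up feature bit above bit 31.
constexpr feature_flags FL_HIGH = feature_flags (1) << 40;

// True if bit number BIT can be stored in type T without truncation.
template <typename T>
constexpr bool
bit_fits (unsigned bit)
{
  return bit < sizeof (T) * CHAR_BIT;
}

static_assert (bit_fits<feature_flags> (40), "64-bit type holds bit 40");
static_assert (!bit_fits<std::uint32_t> (40), "32-bit type does not");

// Storing a flag in a 32-bit field (the width of "unsigned long" on an
// ILP32 host) silently drops any bit above bit 31.
inline bool
survives_32bit_storage (feature_flags f)
{
  return static_cast<std::uint32_t> (f) == f;
}
```

Keeping a single typedef means only that one line has to change when the
flag set outgrows 64 bits, which is the point of the abstraction.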

> +  /* Indicates whether this feature is taken into account during native cpu
> + detection.  */
> +  bool native_detect;
>  };
>  
>  /* ISA extensions in AArch64.  */
> @@ -150,9 +157,9 @@ static constexpr aarch64_option_extension 
> all_extensions[] =
>  {
>  #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, F) \
>{NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
> -   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
> +   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), strlen (F)},

strlen isn't guaranteed to be evaluated at compile time.  How about
F[0] instead?  Would be good to rename F to FEATURE_STRING, now that
the expansion actually uses it.
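
A rough sketch of the F[0] idea, under stated assumptions: the struct, the
ENTRY macro, the entry names and flags_to_disable are all hypothetical and
much simpler than the real table.  It only shows that indexing a string
literal is a constant expression, and one possible way to skip entries
with an empty feature string when collecting the "off" flags.

```cpp
#include <cassert>
#include <cstdint>

using feature_flags = std::uint64_t;   // hypothetical stand-in

// Hypothetical, simplified table entry.
struct extension_entry
{
  const char *name;
  feature_flags flag;
  bool native_detect;   // false when the feature string is empty
};

// FEATURE_STRING[0] != '\0' is a constant expression for a string
// literal, so the table can stay constexpr without strlen.
#define ENTRY(NAME, FLAG, FEATURE_STRING) \
  { NAME, FLAG, FEATURE_STRING[0] != '\0' }

constexpr extension_entry entries[] = {
  ENTRY ("ext-a", feature_flags (1) << 0, "feat_a"),
  ENTRY ("ext-b", feature_flags (1) << 1, ""),
};

static_assert (entries[0].native_detect, "non-empty feature string");
static_assert (!entries[1].native_detect, "empty feature string");

// Collect the flags to turn off given the DETECTED set, ignoring
// entries that cannot be detected natively.
inline feature_flags
flags_to_disable (feature_flags detected)
{
  feature_flags off = 0;
  for (const extension_entry &e : entries)
    if (e.native_detect && !(detected & e.flag))
      off |= e.flag;
  return off;
}
```

With nothing detected, only ext-a ends up in the "off" set; ext-b is left
alone because there is no way to tell whether it is present.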

>  #include "config/aarch64/aarch64-option-extensions.def"
> -  {NULL, 0, 0, 0}
> +  {NULL, 0, 0, 0, false}
>  };
>  
>  struct processor_name_to_arch
> @@ -325,9 +332,13 @@ aarch64_get_extension_string_for_isa_flags
>   outstr += opt.name;
>}
>  
> -  /* Remove the features in current_flags & ~isa_flags.  */
> +  /* Remove the features in current_flags & ~isa_flags.  If the feature does
> + not have an HWCAPs then it shouldn't be taken into acco

Re: Pushed: [PATCH] testsuite: Use noipa attribute for pr95115 test

2023-01-27 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 27, 2023 at 08:08:26PM +0800, Xi Ruoyao via Gcc-patches wrote:
> On Fri, 2023-01-27 at 12:50 +0100, Jakub Jelinek wrote:
> > On Fri, Jan 27, 2023 at 07:46:27PM +0800, Xi Ruoyao wrote:
> > > > They prevent the compiler from deeming the NaN result "unused" and
> > > > removing the calculation that raises the INVALID exception. See the discussion
> > > in PR107608 for details.
> > > 
> > > Tested on x86_64-linux-gnu where the change fixes the test failure.
> > > Ok for trunk?
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > * gcc.dg/pr95115.c (x): Add noipa and noinline attributes.
> > 
> > noipa implies noinline, so unless one targets both very old gcc
> > versions
> > which didn't have noipa attribute in addition to current ones, it is
> > sufficient to specify just noipa.
> > 
> > Otherwise LGTM.
> 
> Pushed with only noipa.  I must have some flawed memory about the
> interaction between noipa and noinline.  Thanks for pointing that out!

  /* A "noipa" function attribute implies "noinline", "noclone" and "no_icf"
 for those targets that support it.  */
  if (TREE_CODE (*node) == FUNCTION_DECL
  && attributes
  && lookup_attribute ("noipa", attributes) != NULL
  && lookup_attribute_spec (get_identifier ("noipa")))
{
  if (lookup_attribute ("noinline", attributes) == NULL)
attributes = tree_cons (get_identifier ("noinline"), NULL, attributes);

  if (lookup_attribute ("noclone", attributes) == NULL)
attributes = tree_cons (get_identifier ("noclone"),  NULL, attributes);

  if (lookup_attribute ("no_icf", attributes) == NULL)
attributes = tree_cons (get_identifier ("no_icf"),  NULL, attributes);
}

plus various spots check just for "noipa", so noipa isn't just equivalent to
noinline, noclone, no_icf.
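
For illustration only, a small sketch of how a test might use the
attribute.  This is not the pr95115.c source.  The TEST_NOIPA macro and
raises_invalid are made-up names, the noinline fallback for compilers
without noipa is an assumption of this sketch, and volatile is used so the
example does not depend on any particular optimization flags.

```cpp
#include <cassert>
#include <cfenv>
#include <limits>

// Use noipa where the compiler knows it, otherwise fall back to
// noinline (hypothetical helper macro, GCC-compatible compilers only).
#if defined(__has_attribute)
# if __has_attribute(noipa)
#  define TEST_NOIPA __attribute__ ((noipa))
# endif
#endif
#ifndef TEST_NOIPA
# define TEST_NOIPA __attribute__ ((noinline))
#endif

// inf / inf yields NaN and raises the INVALID floating-point exception.
TEST_NOIPA double
x ()
{
  volatile double inf = std::numeric_limits<double>::infinity ();
  return inf / inf;
}

// Return true if calling x () left FE_INVALID set.
inline bool
raises_invalid ()
{
  std::feclearexcept (FE_ALL_EXCEPT);
  volatile double r = x ();
  (void) r;
  return std::fetestexcept (FE_INVALID) != 0;
}
```

Because the callee is opaque to the caller, the caller cannot conclude
that the result is unused and drop the division.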

Jakub



RE: [PATCH 0/9] Don't add crtfastmath.o for -shared

2023-01-27 Thread Kyrylo Tkachov via Gcc-patches
Thanks for fixing this Richard.

> -----Original Message-----
> From: Gcc-patches On Behalf Of Richard
> Biener via Gcc-patches
> Sent: Friday, January 13, 2023 8:05 AM
> To: gcc-patches@gcc.gnu.org
> Cc: hongtao@intel.com; ubiz...@gmail.com
> Subject: [PATCH 0/9] Don't add crtfastmath.o for -shared
> 
> 
> This is a series completing the fix for PR55522 which got a fix for
> x86-linux already but left all other targets unfixed (including
> x86-cygwin, x86-darwin and x86-mingw32).  The following series
> applies a similar change to other specs using crtfastmath.o,
> the changes are untested.
> 
> Target maintainers are CCed and I hope they can smoke-test the
> changes.
> 

Do you think it's something we should mention in changes.html for GCC 13?
Kyrill

> x86 maintainers, can you please adjust the missed specs yourself?
> 
> Thanks,
> Richard.


Re: [pushed] wwwdocs: codingconventions: Update upstream instructions for libstdc++

2023-01-27 Thread Jonathan Wakely via Gcc-patches

On 27/01/23 01:16 +0100, Gerald Pfeifer wrote:

Jonathan (or some other libstdc++ developer), would you mind having a
look at that section of https://gcc.gnu.org/codingconventions.html to
see whether we should do further changes?


Oh wow, it's all wrong. I've pushed the patch below, thanks for
pointing it out.


commit 17f88fe2b73fe50b1831ece5dd40bf29151899ab
Author: Jonathan Wakely 
Date:   Fri Jan 27 12:13:42 2023 +

Replace outdated info about libstdc++-v3/doc files

diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index 24365815..7e2a092d 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -754,13 +754,13 @@ autoconf-based configury is a local GCC invention.  Changes to zlib
 outside the build system are discouraged, and should be sent upstream
 first. 
 
-libstdc++-v3:  In docs/doxygen, comments in *.cfg.in are
-partially autogenerated from https://www.doxygen.nl";>the
-Doxygen tool.  In docs/html, 
-the 27_io/binary_iostream_* files are copies of Usenet postings, and most
-of the files in 17_intro are either copied from elsewhere in GCC or the
-FSF website, or are autogenerated.  These files should not be changed
-without prior permission, if at all.
+libstdc++-v3:  The doc/doxygen/user.cfg.in file is partially autogenerated
+from https://www.doxygen.nl";>the Doxygen tool (and regenerated
+using doxygen -u).
+The files in doc/html are generated from the Docbook sources in doc/xml
+and should not be changed manually.
+The files in doc/xml/gnu are based on the GNU licenses and should not
+be changed without prior permission, if at all.
 
 libgcc/config/libbid: The master sources come from Intel BID library
 https://netlib.org/misc/intel/";>Intel BID library. 


RE: [PATCH 0/9] Don't add crtfastmath.o for -shared

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, 27 Jan 2023, Kyrylo Tkachov wrote:

> Thanks for fixing this Richard.
> 
> > -----Original Message-----
> > From: Gcc-patches On Behalf Of Richard
> > Biener via Gcc-patches
> > Sent: Friday, January 13, 2023 8:05 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: hongtao@intel.com; ubiz...@gmail.com
> > Subject: [PATCH 0/9] Don't add crtfastmath.o for -shared
> > 
> > 
> > This is a series completing the fix for PR55522 which got a fix for
> > x86-linux already but left all other targets unfixed (including
> > x86-cygwin, x86-darwin and x86-mingw32).  The following series
> > applies a similar change to other specs using crtfastmath.o,
> > the changes are untested.
> > 
> > Target maintainers are CCed and I hope they can smoke-test the
> > changes.
> > 
> 
> Do you think it's something we should mention in changes.html for GCC 13?

Sure, I will add something once the rest of the series is approved.

Richard.


Re: [PATCH]AArch64: Fix codegen regressions around tbz.

2023-01-27 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> We were analyzing code quality after recent changes and have noticed that the
> tbz support somehow managed to increase the number of branches overall rather
> than decrease them.
>
> While investigating this we figured out that the problem is that when an
> existing &  exists in gimple and the instruction is generated because of the
> range information obtained from the ANDed constant, we end up with a NOP AND
> in the RTL expansion.
>
> This is not a problem as CSE will take care of it normally.  The issue is when
> this original AND was done in a location where PRE or FRE "lift" the AND to a
> different basic block.  This triggers a problem when the resulting value is not
> single use.  Instead of having an AND and tbz, we end up generating an
> AND + TST + BR if the mode is HI or QI.
>
> This CSE across BB was a problem before but this change made it worse. Our
> branch patterns rely on combine being able to fold AND or zero_extends into the
> instructions.
>
> To work around this (since a proper fix is outside of the scope of stage-4) we
> are limiting the new tbranch optab to only HI and QI mode values.  This isn't a
> problem because these two modes are modes for which we don't have CBZ support,
> so they are the problematic cases to begin with.  Additionally booleans are QI.
>
> The second thing we're doing is limiting the only legal bitpos to pos 0, i.e.
> only the bottom bit.  This is so that we prevent the double ANDs as much as
> possible.
>
> Now most other cases, i.e. where we had an explicit & in the source code are
> still handled correctly by the anonymous (*tb1)
> pattern that was added along with tbranch support.
>
> This means we don't expand the superfluous AND here, and while it doesn't fix
> the problem that in the cross BB case we lose tbz, it also doesn't make things
> worse.
>
> With these tweaks we've now reduced the number of insns uniformly as
> originally expected.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (tbranch_3): Restrict to SHORT
>   and bottom bit only.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/tbz_2.c: New test.

Agreed that reducing the scope of the new optimisation seems like a safe
compromise for GCC 13.  But could you add a testcase that shows the
effect of both changes (reducing the mode selection and the bit
selection)?  The test above passes even without the patch.
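
As a starting point, source shapes along the following lines might cover
both restrictions.  This is only a sketch: the function names are made up,
and which instructions GCC emits for each shape is deliberately not
asserted here.

```cpp
#include <cassert>
#include <cstdint>

static int calls;   // counts how often h () ran

// In a real codegen test h () would be an external function.
void h () { ++calls; }

// QI mode, bottom bit: inside both restrictions.
void qi_bit0 (std::uint8_t x) { if (x & 1) h (); }

// HI mode, bottom bit: inside both restrictions.
void hi_bit0 (std::uint16_t x) { if (x & 1) h (); }

// HI mode, bit 3: right mode, but not the bottom bit.
void hi_bit3 (std::uint16_t x) { if (x & 8) h (); }

// SI mode, bottom bit: right bit, but outside the HI/QI modes.
void si_bit0 (std::uint32_t x) { if (x & 1) h (); }
```

Each function isolates one combination of mode and bit position, so a
change in either restriction should show up in exactly one of them.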

It would be good to have a PR tracking this too.

Personally, I think we should try to get to the stage where gimple
does the CSE optimisations we need, and where the tbranch optab can
generate a tbz directly (rather than splitting it apart and hoping
that combine will put it back together later).

Thanks,
Richard

> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 2c1367977a68fc8e4289118e07bb61398856791e..aa09e93d85e9628e8944e03498697eb9597ef867
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -949,8 +949,8 @@ (define_insn "*cb1"
>  
>  (define_expand "tbranch_3"
>[(set (pc) (if_then_else
> -  (EQL (match_operand:ALLI 0 "register_operand")
> -   (match_operand 1 "aarch64_simd_shift_imm_"))
> +  (EQL (match_operand:SHORT 0 "register_operand")
> +   (match_operand 1 "const0_operand"))
>(label_ref (match_operand 2 ""))
>(pc)))]
>""
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_2.c 
> b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> new file mode 100644
> index 
> ..ec128b58f35276a7c5452685a65c73f95f2d5f9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbz_2.c
> @@ -0,0 +1,130 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99  -fno-unwind-tables 
> -fno-asynchronous-unwind-tables" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include 
> +
> +void h(void);
> +
> +/*
> +** g1:
> +**   cbnzw0, .L[0-9]+
> +**   ret
> +**   ...
> +*/
> +void g1(int x)
> +{
> +  if (__builtin_expect (x, 0))
> +h ();
> +}
> +
> +/* 
> +** g2:
> +**   tbnzx0, 0, .L[0-9]+
> +**   ret
> +**   ...
> +*/
> +void g2(int x)
> +{
> +  if (__builtin_expect (x & 1, 0))
> +h ();
> +}
> +
> +/* 
> +** g3:
> +**   tbnzx0, 3, .L[0-9]+
> +**   ret
> +**   ...
> +*/
> +void g3(int x)
> +{
> +  if (__builtin_expect (x & 8, 0))
> +h ();
> +}
> +
> +/* 
> +** g4:
> +**   tbnzw0, #31, .L[0-9]+
> +**   ret
> +**   ...
> +*/
> +void g4(int x)
> +{
> +  if (__builtin_expect (x & (1 << 31), 0))
> +h ();
> +}
> +
> +/* 
> +** g5:
> +**   tst w0, 255
> +**   bne .L[0-9]+
> +**   ret
> +**   ...
> +*/
> +void g5(char x)
> +{
> +  if (__builtin_expect (x, 0))
> +

Re: [PATCH 0/9] Don't add crtfastmath.o for -shared

2023-01-27 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Fri, 27 Jan 2023, Kyrylo Tkachov wrote:
>
>> Thanks for fixing this Richard.
>> 
>> > -----Original Message-----
>> > From: Gcc-patches On Behalf Of Richard
>> > Biener via Gcc-patches
>> > Sent: Friday, January 13, 2023 8:05 AM
>> > To: gcc-patches@gcc.gnu.org
>> > Cc: hongtao@intel.com; ubiz...@gmail.com
>> > Subject: [PATCH 0/9] Don't add crtfastmath.o for -shared
>> > 
>> > 
>> > This is a series completing the fix for PR55522 which got a fix for
>> > x86-linux already but left all other targets unfixed (including
>> > x86-cygwin, x86-darwin and x86-mingw32).  The following series
>> > applies a similar change to other specs using crtfastmath.o,
>> > the changes are untested.
>> > 
>> > Target maintainers are CCed and I hope they can smoke-test the
>> > changes.
>> > 
>> 
>> Do you think it's something we should mention in changes.html for GCC 13?
>
> Sure, I will add something once the rest of the series is approved.

Mind if I rubber-stamp OK the unreviewed changes?  I don't think there's
a good justification for making a different choice on different targets.

Thanks,
Richard


[PATCH] RISC-V: Fix testcases check.

2023-01-27 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Fix testcase check.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Ditto.

---
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c  | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
index e855f86b9a3..2e1f68f9bdc 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
@@ -37,4 +37,4 @@ void f (void * restrict in, void * restrict out, int l, int 
n, int m, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times 
{add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts 
"-funroll-loops" } } } } */
+/* { dg-final { scan-assembler 
{add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts 
"-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
index 316a4ce6193..a3dca3834e3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
@@ -36,4 +36,4 @@ void f (void * restrict in, void * restrict out, int l, int 
n, int m, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" 
no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler 
{add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts 
"-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c
index df4fdf24a4a..7ad277e0266 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c
@@ -16,4 +16,5 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t 
n) {
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0" 
no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target { no-opts 
"-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf4,\s*t[au],\s*m[au]} { target { no-opts 
"-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
-- 
2.36.3
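
The practical difference between scan-assembler-times and scan-assembler
is counting matches versus checking that one exists.  The sketch below
mimics that with std::regex.  It is an approximation: DejaGnu uses Tcl
regular expressions, the helper names are made up, and the sample assembly
is invented rather than taken from a real compilation.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Roughly what scan-assembler checks: does PATTERN occur at all?
inline bool
occurs (const std::string &text, const std::string &pattern)
{
  return std::regex_search (text, std::regex (pattern));
}

// Roughly what scan-assembler-times checks: how often does it occur?
inline std::size_t
occurrences (const std::string &text, const std::string &pattern)
{
  const std::regex re (pattern);
  return static_cast<std::size_t> (
    std::distance (std::sregex_iterator (text.begin (), text.end (), re),
                   std::sregex_iterator ()));
}

// Made-up assembly output for illustration.
const std::string sample_asm =
  "\tvsetvli\tzero,a5,e8,mf8,ta,ma\n"
  "\tvle8.v\tv1,0(a0)\n"
  "\tvsetvli\tzero,a5,e8,mf4,ta,ma\n"
  "\tvsetvli\tzero,a4,e8,mf8,tu,mu\n";

// Same shape as the mf8 pattern in the vsetvl-18.c directive.
const std::string mf8_pattern =
  R"(vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au])";
```

A presence check keeps passing if a later change adds or removes an
unrelated vsetvli, whereas an exact count would need updating each time.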



Re: [PATCH] RISC-V: Finalize VSETVL PASS implementation

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Wed, Jan 18, 2023 at 11:25 AM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (vsetvl_insn_p): Add condition to
> avoid ICE.
> (vsetvl_discard_result_insn_p): New function.
> (reg_killed_by_bb_p): rename to find_reg_killed_by.
> (find_reg_killed_by): New name.
> (get_vl): allow it to be called by more functions.
> (has_vsetvl_killed_avl_p): Add condition.
> (get_avl): allow it to be called by more functions.
> (insn_should_be_added_p): New function.
> (get_all_nonphi_defs): Refine function.
> (get_all_sets): Ditto.
> (get_same_bb_set): New function.
> (any_insn_in_bb_p): Ditto.
> (any_set_in_bb_p): Ditto.
> (get_vl_vtype_info): Add VLMAX forward optimization.
> (source_equal_p): Fix issues.
> (extract_single_source): Refine.
> (avl_info::multiple_source_equal_p): New function.
> (avl_info::operator==): Adjust for final version.
> (vl_vtype_info::operator==): Ditto.
> (vl_vtype_info::same_avl_p): Ditto.
> (vector_insn_info::parse_insn): Ditto.
> (vector_insn_info::available_p): New function.
> (vector_insn_info::merge): Adjust for final version.
> (vector_insn_info::dump): Add hard_empty.
> (pass_vsetvl::hard_empty_block_p): New function.
> (pass_vsetvl::backward_demand_fusion): Adjust for final version.
> (pass_vsetvl::forward_demand_fusion): Ditto.
> (pass_vsetvl::demand_fusion): Ditto.
> (pass_vsetvl::cleanup_illegal_dirty_blocks): New function.
> (pass_vsetvl::compute_local_properties): Adjust for final version.
> (pass_vsetvl::can_refine_vsetvl_p): Ditto.
> (pass_vsetvl::refine_vsetvls): Ditto.
> (pass_vsetvl::commit_vsetvls): Ditto.
> (pass_vsetvl::propagate_avl): New function.
> (pass_vsetvl::lazy_vsetvl): Adjust for new version.
> * config/riscv/riscv-vsetvl.h (enum def_type): New enum.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 930 +++
>  gcc/config/riscv/riscv-vsetvl.h  |  30 +-
>  2 files changed, 737 insertions(+), 223 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index b33c198bbd6..253bfc7b210 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -54,6 +54,8 @@ along with GCC; see the file COPYING3.  If not see
> used any more and VL operand of VSETVL instruction if it is not
> used by
> any non-debug instructions.
>
> +-  Phase 6 - Propagate AVL between vsetvl instructions.
> +
>  Implementation:
>
>  -  The subroutine of optimize == 0 is simple_vsetvl.
> @@ -175,8 +177,20 @@ vector_config_insn_p (rtx_insn *rinsn)
>  static bool
>  vsetvl_insn_p (rtx_insn *rinsn)
>  {
> +  if (!vector_config_insn_p (rinsn))
> +return false;
>return (INSN_CODE (rinsn) == CODE_FOR_vsetvldi
> -|| INSN_CODE (rinsn) == CODE_FOR_vsetvlsi);
> + || INSN_CODE (rinsn) == CODE_FOR_vsetvlsi);
> +}
> +
> +/* Return true if it is vsetvl zero, rs1.  */
> +static bool
> +vsetvl_discard_result_insn_p (rtx_insn *rinsn)
> +{
> +  if (!vector_config_insn_p (rinsn))
> +return false;
> +  return (INSN_CODE (rinsn) == CODE_FOR_vsetvl_discard_resultdi
> + || INSN_CODE (rinsn) == CODE_FOR_vsetvl_discard_resultsi);
>  }
>
>  static bool
> @@ -191,15 +205,27 @@ before_p (const insn_info *insn1, const insn_info
> *insn2)
>return insn1->compare_with (insn2) < 0;
>  }
>
> -static bool
> -reg_killed_by_bb_p (const bb_info *bb, rtx x)
> +static insn_info *
> +find_reg_killed_by (const bb_info *bb, rtx x)
>  {
> -  if (!x || vlmax_avl_p (x))
> -return false;
> -  for (const insn_info *insn : bb->real_nondebug_insns ())
> +  if (!x || vlmax_avl_p (x) || !REG_P (x))
> +return nullptr;
> +  for (insn_info *insn : bb->reverse_real_nondebug_insns ())
>  if (find_access (insn->defs (), REGNO (x)))
> -  return true;
> -  return false;
> +  return insn;
> +  return nullptr;
> +}
> +
> +/* Helper function to get VL operand.  */
> +static rtx
> +get_vl (rtx_insn *rinsn)
> +{
> +  if (has_vl_op (rinsn))
> +{
> +  extract_insn_cached (rinsn);
> +  return recog_data.operand[get_attr_vl_op_idx (rinsn)];
> +}
> +  return SET_DEST (XVECEXP (PATTERN (rinsn), 0, 0));
>  }
>
>  static bool
> @@ -208,6 +234,9 @@ has_vsetvl_killed_avl_p (const bb_info *bb, const
> vector_insn_info &info)
>if (info.dirty_with_killed_avl_p ())
>  {
>rtx avl = info.get_avl ();
> +  if (vlmax_avl_p (avl))
> +   return find_reg_killed_by (bb, get_vl (info.get_insn ()->rtl ()))
> +  != nullptr;
>for (const insn_info *insn : bb->reverse_real_nondebug_insns ())
> {
>   def_info *def = find_access (insn->defs (), REGNO (avl));
> @@ -229,18 +258,6 @@ has_vsetvl_k

Re: [PATCH] tree: Fix up tree_code_{length,type}

2023-01-27 Thread Patrick Palka via Gcc-patches
On Thu, 26 Jan 2023, Patrick Palka wrote:

> On Thu, 26 Jan 2023, Jakub Jelinek wrote:
> 
> > > On Thu, Jan 26, 2023 at 09:45:35AM -0500, Patrick Palka via Gcc-patches wrote:
> > > > +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> > > > +#define END_OF_BASE_TREE_CODES tcc_exceptional,
> > > > +
> > > > +
> > > >  /* Class of tree given its code.  */
> > > > -extern const enum tree_code_class tree_code_type[];
> > > > +constexpr enum tree_code_class tree_code_type[] = {
> > > > +#include "all-tree.def"
> > > > +};
> > > > +
> > > > +#undef DEFTREECODE
> > > > +#undef END_OF_BASE_TREE_CODES
> > > >  
> > > >  /* Each tree code class has an associated string representation.
> > > > These must correspond to the tree_code_class entries.  */
> > > >  extern const char *const tree_code_class_strings[];
> > > >  
> > > >  /* Number of argument-words in each kind of tree-node.  */
> > > > -extern const unsigned char tree_code_length[];
> > > > +
> > > > +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> > > > +#define END_OF_BASE_TREE_CODES 0,
> > > > +constexpr unsigned char tree_code_length[] = {
> > > > +#include "all-tree.def"
> > > > +};
> > > > +
> > > > +#undef DEFTREECODE
> > > > +#undef END_OF_BASE_TREE_CODES
> > > 
> > > IIUC defining these globals as non-inline constexpr gives them internal
> > > linkage, and so each TU contains its own unique copy of these globals.
> > > This bloats cc1plus by a tiny bit and is technically an ODR violation
> > > because some inline functions such as tree_class_check also ODR-use
> > > these variables and so each defn of tree_class_check will refer to a
> > > "different" tree_code_class.  Since inline variables are a C++17
> > > feature, I guess we could fix this by defining the globals the old way
> > > before C++17 and as inline constexpr otherwise?
> > 
> > And I'd argue with the tiny bit.
> > In my x86_64-linux cc1plus from today, I see 193 _ZL16tree_code_length vars,
> > 374 bytes each, and 324 _ZL14tree_code_type vars, 1496 bytes each.
> > So, that means waste of 555016 .rodata bytes, plus being highly non-cache
> > friendly.
> > 
> > The following patch does that.
> > 
> > So far tested on x86_64-linux in my -O0 working tree (system gcc 12
> > compiler) where .rodata shrunk with the patch by 928896 bytes, in last
> > stage of a bootstrapped tree (built by today's prev-gcc) where .rodata
> > shrunk by 561728 bytes (in neither case .text or most other sections
> > changed sizes) and on powerpc64le-linux --disable-bootstrap
> > (system gcc 4.8.5) to test also the non-C++17 case.
> 
> LGTM FWIW.  On a related note I noticed the function
> tree.h:tree_operand_length is declared static and is then used in the
> non-static inline functions tree_operand_check etc, which seems to
> also be a (harmless) ODR violation?
> 
> We probably should do s/static inline/inline throughout the header files
> at some point, which'd hopefully reduce the size of and speed up stage1
> cc1plus.

Mechanically replacing uses of static inline in headers via

  echo gcc/*.h gcc/*/*.h | xargs sed -i 's/^static inline/inline/g'

reduces rodata size of stage1 cc1plus by ~1.5MB and seems to make it ~2%
faster.  Not bad..
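
For readers following along, a reduced sketch of the pattern from the
quoted patch: an inline constexpr table when C++17 inline variables are
available, and an extern declaration plus one out-of-line definition
otherwise.  The enum, array and function names are simplified stand-ins,
and everything is put in one file only to keep the sketch self-contained.

```cpp
#include <cassert>

enum code_class { cls_exceptional, cls_expression };

#if __cpp_inline_variables >= 201606L
// C++17 and later: a single entity shared by every translation unit.
inline constexpr code_class code_type[] = { cls_exceptional, cls_expression };
#else
// Older standards: declare here, define in exactly one source file.
extern const code_class code_type[2];
#endif

// A plain inline function (not static), so every translation unit
// refers to the same function and the same table.
inline code_class
class_of (unsigned code)
{
  return code_type[code];
}

#if !(__cpp_inline_variables >= 201606L)
// In a real project this definition would live in a single .cc file.
const code_class code_type[2] = { cls_exceptional, cls_expression };
#endif
```

A non-inline constexpr array at namespace scope has internal linkage, so
each translation unit would carry its own copy; the inline keyword is what
removes the duplicates.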

> 
> > 
> > Ok for trunk if it passes full bootstrap/regtest?
> > 
> > BTW, wonder if tree_code_type couldn't be an array of unsigned char
> > elements rather than enum tree_code_class and we'd then cast it
> > to the enum in the macro, that would shrink that array from 1496 bytes
> > to 374.  Of course, that sounds like stage1 material.
> > 
> > 2023-01-26  Patrick Palka  
> > Jakub Jelinek  
> > 
> > * tree-core.h (tree_code_type, tree_code_length): For
> > C++17 and later, add inline keyword, otherwise don't define
> > the arrays, but declare extern arrays.
> > * tree.cc (tree_code_type, tree_code_length): Define these
> > arrays for C++14 and older.
> > 
> > --- gcc/tree-core.h.jj  2023-01-02 09:32:31.188158094 +0100
> > +++ gcc/tree-core.h 2023-01-26 16:02:34.212113251 +0100
> > @@ -2284,17 +2284,20 @@ struct floatn_type_info {
> >  /* Matrix describing the structures contained in a given tree code.  */
> >  extern bool tree_contains_struct[MAX_TREE_CODES][64];
> >  
> > +/* Class of tree given its code.  */
> > +#if __cpp_inline_variables >= 201606L
> >  #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> >  #define END_OF_BASE_TREE_CODES tcc_exceptional,
> >  
> > -
> > -/* Class of tree given its code.  */
> > -constexpr enum tree_code_class tree_code_type[] = {
> > +constexpr inline enum tree_code_class tree_code_type[] = {
> >  #include "all-tree.def"
> >  };
> >  
> >  #undef DEFTREECODE
> >  #undef END_OF_BASE_TREE_CODES
> > +#else
> > +extern const enum tree_code_class tree_code_type[];
> > +#endif
> >  
> >  /* Each tree code class has an associated string representation.
> > These must correspond to the tree_code_class entries.  */
> > @@ -2302,14 +2305,18 @@ extern const char *const tree_code_class
> >  
> >
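
The unsigned char idea quoted above can be sketched as follows. This is only an illustration under my own assumptions: the enum, the five-entry table and the accessor demo_narrow_class_of are hypothetical, and the real table would still be generated from all-tree.def.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for enum tree_code_class.
enum demo_narrow_class { demo_n_exceptional, demo_n_constant,
                         demo_n_type, demo_n_declaration };

// One byte per entry instead of sizeof (enum) bytes per entry.
constexpr unsigned char demo_code_type_narrow[] = {
  demo_n_exceptional, demo_n_constant, demo_n_type,
  demo_n_declaration, demo_n_type
};

// Plays the role of the cast that the thread suggests doing in the macro.
constexpr demo_narrow_class demo_narrow_class_of (std::size_t code)
{
  return static_cast<demo_narrow_class> (demo_code_type_narrow[code]);
}

// Size of the narrow table versus the same table stored as enum values.
constexpr std::size_t demo_narrow_bytes = sizeof (demo_code_type_narrow);
constexpr std::size_t demo_wide_bytes = 5 * sizeof (demo_narrow_class);
```

With a 4-byte enum the table shrinks by a factor of four, which matches the 1496 to 374 byte figures mentioned in the quote.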

Re: [PATCH] RISC-V: Fix testcases check.

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Fri, Jan 27, 2023 at 8:30 PM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Fix testcase check.
> * gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Ditto.
>
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c | 2 +-
>  gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c | 2 +-
>  gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c  | 3 ++-
>  3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
> index e855f86b9a3..2e1f68f9bdc 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c
> @@ -37,4 +37,4 @@ void f (void * restrict in, void * restrict out, int l,
> int n, int m, int cond)
>}
>  }
>
> -/* { dg-final { scan-assembler-times
> {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
> 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts
> "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler
> {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
> { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts
> "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
> index 316a4ce6193..a3dca3834e3 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c
> @@ -36,4 +36,4 @@ void f (void * restrict in, void * restrict out, int l,
> int n, int m, int cond)
>}
>  }
>
> -/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0"
> no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler
> {add\s+\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+\s+vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]}
> { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-g" no-opts
> "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c
> index df4fdf24a4a..7ad277e0266 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-18.c
> @@ -16,4 +16,5 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m,
> size_t n) {
>}
>  }
>
> -/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0"
> no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*t[au],\s*m[au]} { target {
> no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf4,\s*t[au],\s*m[au]} { target {
> no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
> --
> 2.36.3
>
>


Re: [PATCH] RISC-V: Add vlm/vsm C/C++ API intrinsics support

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Thu, Jan 19, 2023 at 2:08 PM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc (BASE): Add vlm/vsm
> support.
> * config/riscv/riscv-vector-builtins-bases.h: Ditto.
> * config/riscv/riscv-vector-builtins-functions.def (vlm): New
> define.
> (vsm): Ditto.
> * config/riscv/riscv-vector-builtins-shapes.cc (struct
> loadstore_def): Add vlm/vsm support.
> * config/riscv/riscv-vector-builtins-types.def (DEF_RVV_B_OPS):
> Ditto.
> (vbool64_t): Ditto.
> (vbool32_t): Ditto.
> (vbool16_t): Ditto.
> (vbool8_t): Ditto.
> (vbool4_t): Ditto.
> (vbool2_t): Ditto.
> (vbool1_t): Ditto.
> * config/riscv/riscv-vector-builtins.cc (DEF_RVV_B_OPS): Ditto.
> (rvv_arg_type_info::get_tree_type): Ditto.
> (function_expander::use_contiguous_load_insn): Ditto.
> * config/riscv/vector.md (@pred_store): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/vsm-1.C: New test.
> * g++.target/riscv/rvv/rvv.exp: New test.
> * gcc.target/riscv/rvv/base/vlm_vsm-1.c: New test.
> * gcc.target/riscv/rvv/base/vlm_vsm-2.c: New test.
> * gcc.target/riscv/rvv/base/vlm_vsm-3.c: New test.
>
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  |  6 +-
>  .../riscv/riscv-vector-builtins-bases.h   |  2 +
>  .../riscv/riscv-vector-builtins-functions.def |  2 +
>  .../riscv/riscv-vector-builtins-shapes.cc |  3 +-
>  .../riscv/riscv-vector-builtins-types.def | 15 
>  gcc/config/riscv/riscv-vector-builtins.cc | 43 ++-
>  gcc/config/riscv/vector.md| 23 +-
>  .../g++.target/riscv/rvv/base/vsm-1.C | 40 ++
>  gcc/testsuite/g++.target/riscv/rvv/rvv.exp| 44 +++
>  .../gcc.target/riscv/rvv/base/vlm_vsm-1.c | 75 +++
>  .../gcc.target/riscv/rvv/base/vlm_vsm-2.c | 75 +++
>  .../gcc.target/riscv/rvv/base/vlm_vsm-3.c | 75 +++
>  12 files changed, 395 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vsm-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/rvv.exp
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlm_vsm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlm_vsm-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlm_vsm-3.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index af66b016b49..0da4797d272 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -84,7 +84,7 @@ public:
>}
>  };
>
> -/* Implements vle.v/vse.v codegen.  */
> +/* Implements vle.v/vse.v/vlm.v/vsm.v codegen.  */
>  template 
>  class loadstore : public function_base
>  {
> @@ -116,6 +116,8 @@ static CONSTEXPR const vsetvl vsetvl_obj;
>  static CONSTEXPR const vsetvl vsetvlmax_obj;
>  static CONSTEXPR const loadstore vle_obj;
>  static CONSTEXPR const loadstore vse_obj;
> +static CONSTEXPR const loadstore vlm_obj;
> +static CONSTEXPR const loadstore vsm_obj;
>
>  /* Declare the function base NAME, pointing it to an instance
> of class _obj.  */
> @@ -126,5 +128,7 @@ BASE (vsetvl)
>  BASE (vsetvlmax)
>  BASE (vle)
>  BASE (vse)
> +BASE (vlm)
> +BASE (vsm)
>
>  } // end namespace riscv_vector
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h
> b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index 79684bcb50d..28151a8d8d2 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -28,6 +28,8 @@ extern const function_base *const vsetvl;
>  extern const function_base *const vsetvlmax;
>  extern const function_base *const vle;
>  extern const function_base *const vse;
> +extern const function_base *const vlm;
> +extern const function_base *const vsm;
>  }
>
>  } // end namespace riscv_vector
> diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def
> b/gcc/config/riscv/riscv-vector-builtins-functions.def
> index e5ebb7d829c..63aa8fe32c8 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-functions.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
> @@ -42,5 +42,7 @@ DEF_RVV_FUNCTION (vsetvlmax, vsetvlmax, none_preds,
> i_none_size_void_ops)
>  /* 7. Vector Loads and Stores. */
>  DEF_RVV_FUNCTION (vle, loadstore, full_preds, all_v_scalar_const_ptr_ops)
>  DEF_RVV_FUNCTION (vse, loadstore, none_m_preds, all_v_scalar_ptr_ops)
> +DEF_RVV_FUNCTION (vlm, loadstore, none_preds, b_v_scalar_const_ptr_ops)
> +DEF_RVV_FUNCTION (vsm, loadstore, none_preds, b_v_scalar_ptr_ops)
>
>  #undef DEF_RVV_FUNCTION
> diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> index 0332c031ce4..76cf14a8cc4 100644
> --- 

Re: [PATCH] RISC-V: Fix vop_m overloaded C++ API name.

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Fri, Jan 20, 2023 at 10:21 AM  wrote:

> From: Ju-Zhe Zhong 
>
> According to
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/master/
> For "vop_m" intrinsics, C++ overloaded API does not have "_m" suffix.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-shapes.cc (struct
> loadstore_def): Remove _m suffix for "vop_m" C++ overloaded API name.
>
> ---
>  gcc/config/riscv/riscv-vector-builtins-shapes.cc | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> index 76cf14a8cc4..56697f71cbd 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> @@ -128,6 +128,10 @@ struct loadstore_def : public build_base
> b.append_name (type_suffixes[instance.type.index].vector);
>}
>
> +/* According to rvv-intrinsic-doc, it does not add "_m" suffix
> +   for vop_m C++ overloaded API.  */
> +if (overloaded_p && instance.pred == PRED_TYPE_m)
> +  return b.finish_name ();
>  b.append_name (predication_suffixes[instance.pred]);
>  return b.finish_name ();
>}
> --
> 2.36.3
>
>
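
A rough model of the naming rule in the patch above, for readers who do not have the builder code in front of them. Everything here is a simplified assumption: demo_get_name, the demo_pred enum and the suffix strings are hypothetical and only mimic the shape of the real function_builder logic.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of predication kinds.
enum demo_pred { DEMO_PRED_none, DEMO_PRED_m, DEMO_PRED_tu };

// Build an intrinsic name.  For the overloaded C++ API the type suffix is
// omitted, and per rvv-intrinsic-doc the "_m" suffix is omitted as well.
std::string demo_get_name (const std::string &base,
                           const std::string &type_suffix,
                           demo_pred pred, bool overloaded_p)
{
  static const char *const pred_suffixes[] = { "", "_m", "_tu" };
  std::string name = "__riscv_" + base;
  if (!overloaded_p)
    name += type_suffix;
  // Mirrors the early return added by the patch.
  if (overloaded_p && pred == DEMO_PRED_m)
    return name;
  name += pred_suffixes[pred];
  return name;
}
```

Under this model the masked overloaded load is spelled __riscv_vle8, while other predication suffixes such as _tu are still appended.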


Re: [PATCH] RISC-V: Add vle/vse C++ overloaded API intrinsic testcases

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Fri, Jan 20, 2023 at 10:26 AM  wrote:

> From: Ju-Zhe Zhong 
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/vle-1.C: New test.
> * g++.target/riscv/rvv/base/vle_tu-1.C: New test.
> * g++.target/riscv/rvv/base/vle_tum-1.C: New test.
> * g++.target/riscv/rvv/base/vle_tumu-1.C: New test.
> * g++.target/riscv/rvv/base/vse-1.C: New test.
>
> ---
>  .../g++.target/riscv/rvv/base/vle-1.C | 345 +
>  .../g++.target/riscv/rvv/base/vle_tu-1.C  | 345 +
>  .../g++.target/riscv/rvv/base/vle_tum-1.C | 345 +
>  .../g++.target/riscv/rvv/base/vle_tumu-1.C| 345 +
>  .../g++.target/riscv/rvv/base/vse-1.C | 685 ++
>  5 files changed, 2065 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vle-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vle_tu-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vle_tum-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vle_tumu-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/vse-1.C
>
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/vle-1.C
> b/gcc/testsuite/g++.target/riscv/rvv/base/vle-1.C
> new file mode 100644
> index 000..e06f62a8fb9
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/vle-1.C
> @@ -0,0 +1,345 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32d -O3 -fno-schedule-insns
> -fno-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +vint8mf8_t
> +test___riscv_vle8(vbool64_t mask,int8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vint8mf4_t
> +test___riscv_vle8(vbool32_t mask,int8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vint8mf2_t
> +test___riscv_vle8(vbool16_t mask,int8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vint8m1_t
> +test___riscv_vle8(vbool8_t mask,int8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vint8m2_t
> +test___riscv_vle8(vbool4_t mask,int8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vint8m4_t
> +test___riscv_vle8(vbool2_t mask,int8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vint8m8_t
> +test___riscv_vle8(vbool1_t mask,int8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vuint8mf8_t
> +test___riscv_vle8(vbool64_t mask,uint8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vuint8mf4_t
> +test___riscv_vle8(vbool32_t mask,uint8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vuint8mf2_t
> +test___riscv_vle8(vbool16_t mask,uint8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vuint8m1_t
> +test___riscv_vle8(vbool8_t mask,uint8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vuint8m2_t
> +test___riscv_vle8(vbool4_t mask,uint8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vuint8m4_t
> +test___riscv_vle8(vbool2_t mask,uint8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vuint8m8_t
> +test___riscv_vle8(vbool1_t mask,uint8_t* base,size_t vl)
> +{
> +  return __riscv_vle8(mask,base,vl);
> +}
> +
> +vint16mf4_t
> +test___riscv_vle16(vbool64_t mask,int16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vint16mf2_t
> +test___riscv_vle16(vbool32_t mask,int16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vint16m1_t
> +test___riscv_vle16(vbool16_t mask,int16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vint16m2_t
> +test___riscv_vle16(vbool8_t mask,int16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vint16m4_t
> +test___riscv_vle16(vbool4_t mask,int16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vint16m8_t
> +test___riscv_vle16(vbool2_t mask,int16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vuint16mf4_t
> +test___riscv_vle16(vbool64_t mask,uint16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vuint16mf2_t
> +test___riscv_vle16(vbool32_t mask,uint16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vuint16m1_t
> +test___riscv_vle16(vbool16_t mask,uint16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vuint16m2_t
> +test___riscv_vle16(vbool8_t mask,uint16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vuint16m4_t
> +test___riscv_vle16(vbool4_t mask,uint16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vuint16m8_t
> +test___riscv_vle16(vbool2_t mask,uint16_t* base,size_t vl)
> +{
> +  return __riscv_vle16(mask,base,vl);
> +}
> +
> +vint32mf2_t
> +test___riscv_vle32(vbool

Re: [PATCH] tree: Fix up tree_code_{length,type}

2023-01-27 Thread Richard Biener via Gcc-patches



> On 27.01.2023 at 13:41, Patrick Palka via Gcc-patches wrote:
> 
> On Thu, 26 Jan 2023, Patrick Palka wrote:
> 
>>> On Thu, 26 Jan 2023, Jakub Jelinek wrote:
>>> 
>>> On Thu, Jan 26, 2023 at 09:45:35AM -0500, Patrick Palka via Gcc-patches 
>>> wrote:
> +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> +#define END_OF_BASE_TREE_CODES tcc_exceptional,
> +
> +
> /* Class of tree given its code.  */
> -extern const enum tree_code_class tree_code_type[];
> +constexpr enum tree_code_class tree_code_type[] = {
> +#include "all-tree.def"
> +};
> +
> +#undef DEFTREECODE
> +#undef END_OF_BASE_TREE_CODES
> 
> /* Each tree code class has an associated string representation.
>These must correspond to the tree_code_class entries.  */
> extern const char *const tree_code_class_strings[];
> 
> /* Number of argument-words in each kind of tree-node.  */
> -extern const unsigned char tree_code_length[];
> +
> +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> +#define END_OF_BASE_TREE_CODES 0,
> +constexpr unsigned char tree_code_length[] = {
> +#include "all-tree.def"
> +};
> +
> +#undef DEFTREECODE
> +#undef END_OF_BASE_TREE_CODES
 
 IIUC defining these globals as non-inline constexpr gives them internal
 linkage, and so each TU contains its own unique copy of these globals.
 This bloats cc1plus by a tiny bit and is technically an ODR violation
 because some inline functions such as tree_class_check also ODR-use
 these variables and so each defn of tree_class_check will refer to a
 "different" tree_code_class.  Since inline variables are a C++17
 feature, I guess we could fix this by defining the globals the old way
 before C++17 and as inline constexpr otherwise?
>>> 
>>> And I'd argue with the tiny bit.
>>> In my x86_64-linux cc1plus from today, I see 193 _ZL16tree_code_length vars,
>>> 374 bytes each, and 324 _ZL14tree_code_type vars, 1496 bytes each.
>>> So, that means waste of 555016 .rodata bytes, plus being highly non-cache
>>> friendly.
>>> 
>>> The following patch does that.
>>> 
>>> So far tested on x86_64-linux in my -O0 working tree (system gcc 12
>>> compiler) where .rodata shrunk with the patch by 928896 bytes, in last
>>> stage of a bootstrapped tree (built by today's prev-gcc) where .rodata
>>> shrunk by 561728 bytes (in neither case .text or most other sections
>>> changed sizes) and on powerpc64le-linux --disable-bootstrap
>>> (system gcc 4.8.5) to test also the non-C++17 case.
>> 
>> LGTM FWIW.  On a related note I noticed the function
>> tree.h:tree_operand_length is declared static and is then used in the
>> non-static inline functions tree_operand_check etc, which seems to
>> also be a (harmless) ODR violation?
>> 
>> We probably should do s/static inline/inline throughout the header files
>> at some point, which'd hopefully reduce the size of and speed up stage1
>> cc1plus.
> 
> Mechanically replacing uses of static inline in headers via
> 
>  echo gcc/*.h gcc/*/*.h | xargs sed -i 's/^static inline/inline/g'
> 
> reduces rodata size of stage1 cc1plus by ~1.5MB and seems to make it ~2%
> faster.  Not bad..

Nice.

Richard 
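
The 555016-byte figure quoted earlier in this thread is consistent with counting every copy of each array except the one that is actually needed. A small compile-time check of that arithmetic, using a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Bytes wasted when `copies` translation units each carry a private copy of
// an array that only needs to exist once.
constexpr std::size_t demo_wasted_bytes (std::size_t copies,
                                         std::size_t bytes_each)
{
  return (copies - 1) * bytes_each;
}

// 193 copies of tree_code_length (374 bytes each) plus
// 324 copies of tree_code_type (1496 bytes each).
constexpr std::size_t demo_total_waste
  = demo_wasted_bytes (193, 374) + demo_wasted_bytes (324, 1496);

static_assert (demo_total_waste == 555016,
               "matches the figure quoted in the thread");
```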


>> 
>>> 
>>> Ok for trunk if it passes full bootstrap/regtest?
>>> 
>>> BTW, wonder if tree_code_type couldn't be an array of unsigned char
>>> elements rather than enum tree_code_class and we'd then cast it
>>> to the enum in the macro, that would shrink that array from 1496 bytes
>>> to 374.  Of course, that sounds like stage1 material.
>>> 
>>> 2023-01-26  Patrick Palka  
>>>Jakub Jelinek  
>>> 
>>>* tree-core.h (tree_code_type, tree_code_length): For
>>>C++17 and later, add inline keyword, otherwise don't define
>>>the arrays, but declare extern arrays.
>>>* tree.cc (tree_code_type, tree_code_length): Define these
>>>arrays for C++14 and older.
>>> 
>>> --- gcc/tree-core.h.jj2023-01-02 09:32:31.188158094 +0100
>>> +++ gcc/tree-core.h2023-01-26 16:02:34.212113251 +0100
>>> @@ -2284,17 +2284,20 @@ struct floatn_type_info {
>>> /* Matrix describing the structures contained in a given tree code.  */
>>> extern bool tree_contains_struct[MAX_TREE_CODES][64];
>>> 
>>> +/* Class of tree given its code.  */
>>> +#if __cpp_inline_variables >= 201606L
>>> #define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
>>> #define END_OF_BASE_TREE_CODES tcc_exceptional,
>>> 
>>> -
>>> -/* Class of tree given its code.  */
>>> -constexpr enum tree_code_class tree_code_type[] = {
>>> +constexpr inline enum tree_code_class tree_code_type[] = {
>>> #include "all-tree.def"
>>> };
>>> 
>>> #undef DEFTREECODE
>>> #undef END_OF_BASE_TREE_CODES
>>> +#else
>>> +extern const enum tree_code_class tree_code_type[];
>>> +#endif
>>> 
>>> /* Each tree code class has an associated string representation.
>>>These must correspond to the tree_code_class entries.  */
>>> @@ -2302,14 +2305,18 @@ extern const char

[PATCH] driver: fix -gz=none error message with missing zstd

2023-01-27 Thread Martin Liška
We wrongly report:

$ echo "int main () {}" | gcc -xc -gz=none -
gcc: error: -gz=zstd is not supported in this configuration

if zstd compression is not supported by binutils. We should emit the
error message only if -gz=zstd.
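
To illustrate why the old condition fires for -gz=none, here is a toy model of the two spec conditions. It assumes only the documented rule that %{S*:X} matches any switch whose name starts with S; the function names are hypothetical and this is not how the driver is implemented.

```cpp
#include <cassert>
#include <string>

// Models "%{gz*:...}": true for any switch whose name starts with PREFIX.
bool demo_wildcard_matches (const std::string &sw, const std::string &prefix)
{
  return sw.compare (0, prefix.size (), prefix) == 0;
}

// Models "%{gz=zstd:...}": true only for exactly that switch.
bool demo_exact_matches (const std::string &sw, const std::string &name)
{
  return sw == name;
}

// Old spec: the error is attached to the wildcard condition.
bool demo_old_spec_reports_error (const std::string &sw)
{
  return demo_wildcard_matches (sw, "gz");
}

// New spec: the error is attached to the exact condition.
bool demo_new_spec_reports_error (const std::string &sw)
{
  return demo_exact_matches (sw, "gz=zstd");
}
```

In this model the old condition reports the error for gz=none, while the new one reports it only for gz=zstd.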

PR driver/108572

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* gcc.cc (LINK_COMPRESS_DEBUG_SPEC): Report error only for
-gz=zstd.
---
 gcc/gcc.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index d8130024a8c..becc56051a8 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -842,7 +842,7 @@ proper position among the other output files.  */
 #define LINK_COMPRESS_DEBUG_SPEC \
" %{gz|gz=zlib:"  LD_COMPRESS_DEBUG_OPTION "=zlib}" \
" %{gz=none:" LD_COMPRESS_DEBUG_OPTION "=none}" \
-   " %{gz*:%e-gz=zstd is not supported in this configuration} " \
+   " %{gz=zstd:%e-gz=zstd is not supported in this configuration} " \
" %{gz=zlib-gnu:}" /* Ignore silently zlib-gnu option value.  */
 #elif HAVE_LD_COMPRESS_DEBUG == 2
 /* ELF gABI style and ZSTD.  */
-- 
2.39.1



RE: [PATCH]AArch64: Fix native detection in the presence of mandatory features which don't have midr values

2023-01-27 Thread Tamar Christina via Gcc-patches
Hi Richard,

> > +  /* The smallest set of feature bits to toggle to enable this
> > + option.  */
> >aarch64_feature_flags flag_canonical;
> > -  aarch64_feature_flags flags_on;
> > -  aarch64_feature_flags flags_off;
> > +  /* If this feature is turned on, these bits also need to be turned
> > + on.  */  const unsigned long flags_on;
> > +  /* If this feature is turned off, these bits also need to be turned
> > + off.  */  const unsigned long flags_off;
> 
> Please don't undo the aarch64_feature_flags abstraction.  "long" isn't
> enough for x86_32 to aarch64 cross-compilers (yes, I know, but still), and
> we're not far off running out of room in the uint64_t.  The point of the
> abstraction was to reduce the number of changes that we need once we
> have 65 or more features, architecture levels, etc.
> 

Sorry, the duplicate copy confused me.


Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

OK for master? I think it needs backporting, but I need to verify older compilers first.
If a backport is required, is it OK for that as well?

Thanks,
Tamar

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(struct aarch64_option_extension): Add native_detect_p and document
struct a bit more.
(all_extensions): Set new field native_detect_p.
* config/aarch64/aarch64.cc (struct aarch64_option_extension): Delete
unused struct.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_19: New test.
* gcc.target/aarch64/cpunative/info_20: New test.
* gcc.target/aarch64/cpunative/info_21: New test.
* gcc.target/aarch64/cpunative/info_22: New test.
* gcc.target/aarch64/cpunative/native_cpu_19.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_20.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_21.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_22.c: New test.

--- inline copy of patch ---

diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
index a9695d60197e6585957b293d2d755a557e124d4f..5a5ebfa1b724b173dd01ec71ffd63662037b3b74 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -139,20 +139,28 @@ aarch64_handle_option (struct gcc_options *opts,
 /* An ISA extension in the co-processor and main instruction set space.  */
 struct aarch64_option_extension
 {
+  /* The extension name to pass on to the assembler.  */
   const char *name;
+  /* The smallest set of feature bits to toggle to enable this option.  */
   aarch64_feature_flags flag_canonical;
+  /* If this feature is turned on, these bits also need to be turned on.  */
   aarch64_feature_flags flags_on;
+  /* If this feature is turned off, these bits also need to be turned off.  */
   aarch64_feature_flags flags_off;
+  /* Indicates whether this feature is taken into account during native cpu
+ detection.  */
+  bool native_detect_p;
 };
 
 /* ISA extensions in AArch64.  */
 static constexpr aarch64_option_extension all_extensions[] =
 {
-#define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, F) \
+#define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
   {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
-   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
+   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
+   FEATURE_STRING[0]},
 #include "config/aarch64/aarch64-option-extensions.def"
-  {NULL, 0, 0, 0}
+  {NULL, 0, 0, 0, false}
 };
 
 struct processor_name_to_arch
@@ -325,9 +333,13 @@ aarch64_get_extension_string_for_isa_flags
outstr += opt.name;
   }
 
-  /* Remove the features in current_flags & ~isa_flags.  */
+  /* Remove the features in current_flags & ~isa_flags.  If the feature does
+ not have an HWCAPs then it shouldn't be taken into account for feature
+ detection because one way or another we can't tell if it's available
+ or not.  */
   for (auto &opt : all_extensions)
-if (opt.flag_canonical & current_flags & ~isa_flags)
+if (opt.native_detect_p
+   && (opt.flag_canonical & current_flags & ~isa_flags))
   {
current_flags &= ~opt.flags_off;
outstr += "+no";
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d36b57341b336a81dc2e1a975986b3e37402602a..860aeb3e5fbf655e87284be28cc72648c1cd71f9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2808,14 +2808,6 @@ static const struct attribute_spec aarch64_attribute_table[] =
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
-/* An ISA extension in the co-processor and main instruction set space.  */
-struct aarch64_option_extension
-{
-  const char *const name;
-  const unsigned long flags_on;
-  const unsigned long flags_off;
-};
-
 typedef enum aarch64_cond_code
 {
   AARCH64_EQ = 0, AARCH64_NE, AARCH64_CS, AARCH64_CC, AARCH64_MI, AARCH64_PL,
diff --g
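
As a reading aid for the hunk above, here is a cut-down sketch of the filtering it adds. The two-entry table, the demo_ names and the bit assignments are assumptions for illustration; only the shape of the loop and the "empty feature string means not detectable" rule are taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for aarch64_feature_flags.
using demo_feature_flags = std::uint64_t;

struct demo_extension
{
  const char *name;
  demo_feature_flags flag_canonical;
  demo_feature_flags flags_off;
  /* True if the feature can be seen during native CPU detection.  */
  bool native_detect_p;
};

// A feature with an empty feature string cannot be detected natively.
constexpr bool demo_has_feature_string (const char *s)
{
  return s[0] != '\0';
}

static constexpr demo_extension demo_extensions[] = {
  { "crc", 1u << 0, 1u << 0, demo_has_feature_string ("crc32") },
  { "nohwcap", 1u << 1, 1u << 1, demo_has_feature_string ("") },
};

// Emit "+no<name>" only for missing features that detection can observe.
std::string demo_removed_features (demo_feature_flags current_flags,
                                   demo_feature_flags isa_flags)
{
  std::string outstr;
  for (const auto &opt : demo_extensions)
    if (opt.native_detect_p
        && (opt.flag_canonical & current_flags & ~isa_flags))
      {
        current_flags &= ~opt.flags_off;
        outstr += "+no";
        outstr += opt.name;
      }
  return outstr;
}
```

With both demo features enabled by default and neither reported by the CPU, only "+nocrc" is emitted; the undetectable feature is left alone rather than being switched off.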

[PATCH] arm: Implement arm Function target attribute 'branch-protection'

2023-01-27 Thread Andrea Corallo via Gcc-patches
gcc/

* config/arm/arm.cc (arm_valid_target_attribute_rec): Add ARM function
attribute 'branch-protection' and parse its options.
* doc/extend.texi: Document ARM Function attribute 'branch-protection'.

gcc/testsuite/

* gcc.target/arm/acle/pacbti-m-predef-13.c: New test.

Co-Authored-By: Tejas Belagod  
---
 gcc/config/arm/arm.cc | 16 
 gcc/doc/extend.texi   |  7 
 .../gcc.target/arm/acle/pacbti-m-predef-13.c  | 41 +++
 3 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index efc48349dd3..add33090f18 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -33568,6 +33568,22 @@ arm_valid_target_attribute_rec (tree args, struct gcc_options *opts)
 
  opts->x_arm_arch_string = xstrndup (arch, strlen (arch));
}
+  else if (startswith (q, "branch-protection="))
+   {
+ char *bp_str = q + strlen ("branch-protection=");
+
+ opts->x_arm_branch_protection_string
+   = xstrndup (bp_str, strlen (bp_str));
+
+ /* Capture values from target attribute.  */
+ aarch_validate_mbranch_protection
+   (opts->x_arm_branch_protection_string);
+
+ /* Init function target attr values.  */
+ opts->x_aarch_ra_sign_scope = aarch_ra_sign_scope;
+ opts->x_aarch_enable_bti = aarch_enable_bti;
+
+   }
   else if (q[0] == '+')
{
  opts->x_arm_arch_string
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4a89a3eae7c..23ee43919dd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4492,6 +4492,13 @@ Enable or disable calls to out-of-line helpers to implement atomic operations.
 This corresponds to the behavior of the command line options
 @option{-moutline-atomics} and @option{-mno-outline-atomics}.
 
+@item branch-protection=
+@cindex @code{branch-protection=} function attribute, arm
+Select the function scope on which branch protection will be applied.
+The behavior and permissible arguments are the same as for the
+command-line option @option{-mbranch-protection=}.  The default value
+is @code{none}.
+
 @end table
 
 The above target attributes can be specified as follows:
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c
new file mode 100644
index 000..b6d2df53072
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target mbranch_protection_ok } */
+/* { dg-options "-march=armv8.1-m.main+fp -mbranch-protection=pac-ret+leaf -mfloat-abi=hard --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#if defined (__ARM_FEATURE_BTI_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be undefined."
+#endif
+
+#if !defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
+#endif
+
+/*
+**foo:
+** bti
+** ...
+*/
+__attribute__((target("branch-protection=pac-ret+bti"), noinline))
+int foo ()
+{
+  return 3;
+}
+
+/*
+**main:
+** pac ip, lr, sp
+** ...
+** aut ip, lr, sp
+** bx  lr
+*/
+int
+main()
+{
+  return 1 + foo ();
+}
+
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 1" } } */
+/* { dg-final { scan-assembler-not "\.eabi_attribute 74" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
-- 
2.25.1
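
For illustration, the prefix handling in the arm.cc hunk can be modelled in isolation like this. The helper demo_parse_branch_protection is hypothetical; it only mirrors the "starts with branch-protection=, take the rest as the value" step and does no validation of the value itself.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// If Q has the form "branch-protection=<value>", store <value> in VALUE
// and return true; otherwise leave VALUE untouched and return false.
bool demo_parse_branch_protection (const char *q, std::string &value)
{
  static const char prefix[] = "branch-protection=";
  const std::size_t len = sizeof (prefix) - 1;
  if (std::strncmp (q, prefix, len) != 0)
    return false;
  value = q + len;
  return true;
}
```

Given the attribute string from the new test, branch-protection=pac-ret+bti, the extracted value is pac-ret+bti, which is what would then be handed to the validation routine.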



Re: [PATCH 0/9] Don't add crtfastmath.o for -shared

2023-01-27 Thread Richard Biener via Gcc-patches
On Fri, 27 Jan 2023, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > On Fri, 27 Jan 2023, Kyrylo Tkachov wrote:
> >
> >> Thanks for fixing this Richard.
> >> 
> >> > -Original Message-
> >> > From: Gcc-patches On Behalf Of Richard
> >> > Sent: Friday, January 13, 2023 8:05 AM
> >> > To: gcc-patches@gcc.gnu.org
> >> > Cc: hongtao@intel.com; ubiz...@gmail.com
> >> > Subject: [PATCH 0/9] Don't add crtfastmath.o for -shared
> >> > 
> >> > 
> >> > This is a series completing the fix for PR55522 which got a fix for
> >> > x86-linux already but left all other targets unfixed (including
> >> > x86-cygwin, x86-darwin and x86-mingw32).  The following series
> >> > applies a similar change to other specs using crtfastmath.o,
> >> > the changes are untested.
> >> > 
> >> > Target maintainers are CCed and I hope they can smoke-test the
> >> > changes.
> >> > 
> >> 
> >> Do you think it's something we should mention in changes.html for GCC 13?
> >
> > Sure, I will add something once the rest of the series is approved.
> 
> Mind if I rubber-stamp OK the unreviewed changes?  I don't think there's
> a good justification for making a different choice on different targets.

Sure, I've pushed the rest of the changes now.  I'm installing the change
below to changes.html.

Richard.

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 6cd5dd64..9ecd115c 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -123,6 +123,10 @@ a work-in-progress.
 system overcommitting.
 
   
+  -Ofast, -ffast-math and -funsafe-math-optimizations
+  will no longer add startup code to alter the floating-point environment
+  when producing a shared object with -shared.
+  
 
 
 


[committed] libstdc++: Use dg-bogus in new test [PR108554]

2023-01-27 Thread Jonathan Wakely via Gcc-patches
I messed up my first attempt to use dg-bogus with a typo, so didn't
include it in this new test. But it works if I fix the typo.

Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/108554
* testsuite/23_containers/map/modifiers/108554.cc: Use dg-bogus.
---
 libstdc++-v3/testsuite/23_containers/map/modifiers/108554.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/testsuite/23_containers/map/modifiers/108554.cc b/libstdc++-v3/testsuite/23_containers/map/modifiers/108554.cc
index 7076682f4c9..811a479b382 100644
--- a/libstdc++-v3/testsuite/23_containers/map/modifiers/108554.cc
+++ b/libstdc++-v3/testsuite/23_containers/map/modifiers/108554.cc
@@ -4,6 +4,8 @@
 // PR libstdc++/108554
 // Warning from -Wnull-dereference when extracting a unique_ptr from a map.
 
+// { dg-bogus "null pointer dereference" "PR 108554" { target *-*-* } 0 }
+
 #include 
 #include 
 #include 
-- 
2.39.1



[committed] libstdc++: Use constant for name of tzdata file

2023-01-27 Thread Jonathan Wakely via Gcc-patches
Thanks to Michael Welsh Duggan for pointing this out.

Tested x86_64-linux. Pushed to trunk.

-- >8 --

There's a string_view with this filename, which should have been used
instead of a string literal.

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (tzdata_stream): Use constant instead of
string literal.
---
 libstdc++-v3/src/c++20/tzdb.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index c956e861891..2e7e173f0ef 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -1106,7 +1106,7 @@ namespace std::chrono
 
   tzdata_stream() : istream(nullptr)
   {
-   if (string path = zoneinfo_file("/tzdata.zi"); !path.empty())
+   if (string path = zoneinfo_file(tzdata_file); !path.empty())
{
  filebuf fbuf;
  if (fbuf.open(path, std::ios::in))
-- 
2.39.1



Re: [PATCH]AArch64: Fix native detection in the presence of mandatory features which don't have midr values

2023-01-27 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi Richard,
>
>> > +  /* The smallest set of feature bits to toggle to enable this
>> > + option.  */
>> >aarch64_feature_flags flag_canonical;
>> > -  aarch64_feature_flags flags_on;
>> > -  aarch64_feature_flags flags_off;
>> > +  /* If this feature is turned on, these bits also need to be turned
>> > + on.  */  const unsigned long flags_on;
>> > +  /* If this feature is turned off, these bits also need to be turned
>> > + off.  */  const unsigned long flags_off;
>> 
>> Please don't undo the aarch64_feature_flags abstraction.  "long" isn't
>> enough for x86_32 to aarch64 cross-compilers (yes, I know, but still), and
>> we're not far off running out of room in the uint64_t.  The point of the
>> abstraction was to reduce the number of changes that we need once we
>> have 65 or more features, architecture levels, etc.
>> 
>
> Sorry, the duplicate copy confused me.
>
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? I think it needs backporting but need to verify older 
> compilers.
> If one is required, OK for backporting?

OK for both, thanks.

Richard

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * common/config/aarch64/aarch64-common.cc
>   (struct aarch64_option_extension): Add native_detect_p and document 
> struct
>   a bit more.
>   (all_extensions): Set new field native_detect_p.
>   * config/aarch64/aarch64.cc (struct aarch64_option_extension): Delete
>   unused struct.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/cpunative/info_19: New test.
>   * gcc.target/aarch64/cpunative/info_20: New test.
>   * gcc.target/aarch64/cpunative/info_21: New test.
>   * gcc.target/aarch64/cpunative/info_22: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_19.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_20.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_21.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_22.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 
> a9695d60197e6585957b293d2d755a557e124d4f..5a5ebfa1b724b173dd01ec71ffd63662037b3b74
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -139,20 +139,28 @@ aarch64_handle_option (struct gcc_options *opts,
>  /* An ISA extension in the co-processor and main instruction set space.  */
>  struct aarch64_option_extension
>  {
> +  /* The extension name to pass on to the assembler.  */
>const char *name;
> +  /* The smallest set of feature bits to toggle to enable this option.  */
>aarch64_feature_flags flag_canonical;
> +  /* If this feature is turned on, these bits also need to be turned on.  */
>aarch64_feature_flags flags_on;
> +  /* If this feature is turned off, these bits also need to be turned off.  
> */
>aarch64_feature_flags flags_off;
> +  /* Indicates whether this feature is taken into account during native cpu
> + detection.  */
> +  bool native_detect_p;
>  };
>  
>  /* ISA extensions in AArch64.  */
>  static constexpr aarch64_option_extension all_extensions[] =
>  {
> -#define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, F) \
> +#define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
>{NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
> -   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
> +   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
> +   FEATURE_STRING[0]},
>  #include "config/aarch64/aarch64-option-extensions.def"
> -  {NULL, 0, 0, 0}
> +  {NULL, 0, 0, 0, false}
>  };
>  
>  struct processor_name_to_arch
> @@ -325,9 +333,13 @@ aarch64_get_extension_string_for_isa_flags
>   outstr += opt.name;
>}
>  
> -  /* Remove the features in current_flags & ~isa_flags.  */
> +  /* Remove the features in current_flags & ~isa_flags.  If the feature does
> + not have an HWCAPs then it shouldn't be taken into account for feature
> + detection because one way or another we can't tell if it's available
> + or not.  */
>for (auto &opt : all_extensions)
> -if (opt.flag_canonical & current_flags & ~isa_flags)
> +if (opt.native_detect_p
> + && (opt.flag_canonical & current_flags & ~isa_flags))
>{
>   current_flags &= ~opt.flags_off;
>   outstr += "+no";
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> d36b57341b336a81dc2e1a975986b3e37402602a..860aeb3e5fbf655e87284be28cc72648c1cd71f9
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2808,14 +2808,6 @@ static const struct attribute_spec 
> aarch64_attribute_table[] =
>{ NULL, 0, 0, false, false, false, false, NULL, NULL }
>  };
>  
> -/* An ISA extension in the co-processor and main instruction set space.  */
> -struct aarch64_opt

Re: [PATCH]AArch64: Fix native detection in the presence of mandatory features which don't have midr values

2023-01-27 Thread Andrew Pinski via Gcc-patches
On Fri, Jan 27, 2023 at 4:12 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Tamar Christina  writes:
> > Hi All,
> >
> > aarch64-option-extensions.def explicitly defines the semantics for an empty 
> > midr
> > field as being:
> >
> >  In that case this field
> >  should contain a space (" ") separated list of the strings in 
> > 'Features'
> >  that are required.  Their order is not important.  An empty string 
> > means
> >  do not detect this feature during auto detection.
> >
> > That is to say, an empty string means that we don't know the midr value for 
> > this
> > feature and so it just shouldn't be taken into account for native features
> > detection.  However this meaning seems to have gotten lost at some point.
> >
> > This results in e.g. -mcpu=native on a Neoverse N2 disabling features it 
> > does
> > have.  Essentially we disabled any mandatory feature for which there is no 
> > midr
> > entry.
> >
> > The rationale for having -mcpu=native being able to disable features at 
> > all, is
> > because the kernel is able to disable a mandatory feature for correctness
> > issues.  Unfortunately we can't distinguish between "old kernel"
> > and "kernel disabled".
> >
> > This patch adds a new field that indicates whether the midr field has any 
> > value
> > at all.  If there's no value we skip the extension when determining the 
> > "off"
> > flags.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? I think it needs backporting but need to verify older 
> > compilers.
> > If one is required, OK for backporting?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >   * common/config/aarch64/aarch64-common.cc
> >   (struct aarch64_option_extension): Add native_detect and document 
> > struct
> >   a bit more.
> >   (all_extensions): Set new field native_detect.
> >   * config/aarch64/aarch64.cc (struct aarch64_option_extension): Delete
> >   unused struct.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/cpunative/info_19: New test.
> >   * gcc.target/aarch64/cpunative/info_20: New test.
> >   * gcc.target/aarch64/cpunative/info_21: New test.
> >   * gcc.target/aarch64/cpunative/info_22: New test.
> >   * gcc.target/aarch64/cpunative/native_cpu_19.c: New test.
> >   * gcc.target/aarch64/cpunative/native_cpu_20.c: New test.
> >   * gcc.target/aarch64/cpunative/native_cpu_21.c: New test.
> >   * gcc.target/aarch64/cpunative/native_cpu_22.c: New test.
>
> Mostly LGTM, but some nits below.
>
> > --- inline copy of patch --
> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> > b/gcc/common/config/aarch64/aarch64-common.cc
> > index 
> > a9695d60197e6585957b293d2d755a557e124d4f..4e9e9c0bf86a5ef2667f0bb7e646ba06152aa982
> >  100644
> > --- a/gcc/common/config/aarch64/aarch64-common.cc
> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
> > @@ -139,10 +139,17 @@ aarch64_handle_option (struct gcc_options *opts,
> >  /* An ISA extension in the co-processor and main instruction set space.  */
> >  struct aarch64_option_extension
> >  {
> > -  const char *name;
> > +  /* The extension name to pass on to the assembler.  */
> > +  const char *const name;
>
> There's no need to make name itself const.
>
> > +  /* The smallest set of feature bits to toggle to enable this option.  */
> >aarch64_feature_flags flag_canonical;
> > -  aarch64_feature_flags flags_on;
> > -  aarch64_feature_flags flags_off;
> > +  /* If this feature is turned on, these bits also need to be turned on.  
> > */
> > +  const unsigned long flags_on;
> > +  /* If this feature is turned off, these bits also need to be turned off. 
> >  */
> > +  const unsigned long flags_off;
>
> Please don't undo the aarch64_feature_flags abstraction.  "long" isn't
> enough for x86_32 to aarch64 cross-compilers (yes, I know, but still),
> and we're not far off running out of room in the uint64_t.  The point
> of the abstraction was to reduce the number of changes that we need
> once we have 65 or more features, architecture levels, etc.

And for a cross from x86_64-mingw to aarch64, long would still be 32 bits,
as mingw (and Windows in general) is an LLP64IL32 target.  x86_64-mingw is
a less obscure target too.
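
To make the width concern concrete, here is a minimal C++ sketch.  The
feature_flags alias and both helper names are hypothetical stand-ins and
not the real aarch64_feature_flags definition; the only assumption is
that the flags need up to 64 bits while unsigned long may be narrower on
the host that builds the compiler.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical stand-in for an abstraction such as aarch64_feature_flags.
using feature_flags = std::uint64_t;

// Mask with only bit N set; valid for N in [0, 63].
constexpr feature_flags feature_bit(unsigned n) {
  return feature_flags{1} << n;
}

// True if bit N would survive being stored in a plain unsigned long on
// the host that builds the compiler.
constexpr bool fits_in_unsigned_long(unsigned n) {
  return n < sizeof(unsigned long) * CHAR_BIT;
}

// Bit 40 is representable in the 64-bit alias on every host.
static_assert(feature_bit(40) == 0x10000000000ULL, "bit 40 needs 64 bits");

// Bit 31 fits in unsigned long on both LP64 and LLP64 hosts.
void check_low_bits() { assert(fits_in_unsigned_long(31)); }
```

On an LLP64 host such as x86_64-mingw, fits_in_unsigned_long (40) would
be false, which is the case the typedef guards against.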

Thanks,
Andrew Pinski


>
> > +  /* Indicates whether this feature is taken into account during native cpu
> > + detection.  */
> > +  bool native_detect;
> >  };
> >
> >  /* ISA extensions in AArch64.  */
> > @@ -150,9 +157,9 @@ static constexpr aarch64_option_extension 
> > all_extensions[] =
> >  {
> >  #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, F) \
> >{NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
> > -   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
> > +   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), strlen 
> > (F)},
>
> strlen isn't guaranteed to be evaluated at compile time.  How about
> F[0] instead?  Would be good to rename F to 
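
A minimal sketch of the F[0] suggestion and of the skip logic described
in the cover letter, under stated assumptions: the extension names, the
EXT macro and disabled_suffix are hypothetical and are not the GCC
sources.  Indexing a string literal is a constant expression, so the
table can stay constexpr without relying on strlen being folded.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using feature_flags = std::uint64_t;  // hypothetical stand-in type

struct extension {
  const char *name;
  feature_flags flag;
  bool native_detect_p;  // false when the feature string is empty
};

// FEATURE_STRING[0] is '\0' exactly when the string is empty.
#define EXT(NAME, BIT, FEATURE_STRING) \
  {NAME, feature_flags{1} << (BIT), FEATURE_STRING[0] != '\0'},

static constexpr extension all_extensions[] = {
  EXT("ext_a", 0, "feat_a")
  EXT("ext_b", 1, "feat_b")
  EXT("ext_c", 2, "")  // no feature string: cannot be detected natively
};

static_assert(all_extensions[0].native_detect_p, "");
static_assert(!all_extensions[2].native_detect_p, "");

// Build the "+no<ext>" suffix for features implied by the CPU but not
// detected, skipping extensions that cannot be detected at all.
std::string disabled_suffix(feature_flags implied, feature_flags detected) {
  std::string out;
  for (const auto &ext : all_extensions)
    if (ext.native_detect_p && (ext.flag & implied & ~detected)) {
      out += "+no";
      out += ext.name;
    }
  return out;
}

void check_table() { assert(disabled_suffix(0x7, 0x7).empty()); }
```

With all three bits implied and only ext_a detected, ext_b is reported
as disabled while ext_c is left alone because nothing can be said about
it either way.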

[pushed] aarch64: Prevent simd tests from being optimised away

2023-01-27 Thread Richard Sandiford via Gcc-patches
The vqdml[as]l[hs]_laneq_* tests were folded at compile time, meaning
that we didn't have any Advanced SIMD instructions in the assembly.
Kyrill's preference was to use wrapper functions, so this patch does
that for the failing tests and for others that had scan-assemblers
with inline intrinsics calls.  (There were some tests that already
used wrapper functions, some that used volatile, some that used
inline asm barriers, and some that had no separation.)
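
As a rough illustration of the wrapper-function approach, the sketch
below uses std::fma in place of the Advanced SIMD intrinsics so that it
builds on any target.  The names test_fma, check_fma and TEST_NOIPA are
hypothetical; noipa is a GCC attribute, and the noinline fallback for
other compilers is an assumption that gives weaker guarantees.

```cpp
#include <cassert>
#include <cmath>

// GCC's noipa keeps the call opaque to interprocedural optimisation, so
// constant arguments at the call site cannot be folded into the callee.
#if defined(__GNUC__) && !defined(__clang__)
# define TEST_NOIPA __attribute__((noipa))
#else
# define TEST_NOIPA __attribute__((noinline))
#endif

// Wrapper: the operation under test must be emitted as real code here.
TEST_NOIPA double test_fma(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Caller passes constants, as the testsuite files do.
bool check_fma() {
  const double expected = 10.0;
  const double actual = test_fma(2.0, 3.0, 4.0);
  return std::fabs(expected - actual) < 1.0e-15;
}

void run_check() { assert(check_fma()); }
```

Compared with an asm barrier or volatile, the wrapper leaves the
operands in ordinary registers, so the scanned assembly contains just
the instruction being tested.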

Doing that for vqdmulhs_lane_s32.c meant that we generated the scalar
form of the instruction, rather than a vector instruction operating
on lane 0.  That seems fair enough, so the patch keeps that test but
adds a second one for lane 1.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/testsuite/
* gcc.target/aarch64/simd/vfma_f64.c: Use a wrapper function
rather than an asm barrier.
* gcc.target/aarch64/simd/vfms_f64.c: Likewise.
* gcc.target/aarch64/simd/vmul_f64_1.c: Use a wrapper function
rather than volatile.
* gcc.target/aarch64/simd/vmul_n_f64_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmlalh_laneq_s16_1.c: Use a wrapper
function.  Remove -fno-inline.
* gcc.target/aarch64/simd/vqdmlals_laneq_s32_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmlslh_laneq_s16_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmlsls_laneq_s32_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmulhh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmulhh_laneq_s16_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmulhs_laneq_s32_1.c: Likewise.
* gcc.target/aarch64/simd/vqrdmulhh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqrdmulhh_laneq_s16_1.c: Likewise.
* gcc.target/aarch64/simd/vqrdmulhs_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqrdmulhs_laneq_s32_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmulhs_lane_s32.c: Likewise.
Allow the scalar form to be used when operating on lane 0.
Add a test for lane 1.
---
 .../gcc.target/aarch64/simd/vfma_f64.c| 27 +--
 .../gcc.target/aarch64/simd/vfms_f64.c| 27 +--
 .../gcc.target/aarch64/simd/vmul_f64_1.c  | 12 ---
 .../gcc.target/aarch64/simd/vmul_n_f64_1.c| 12 ---
 .../aarch64/simd/vqdmlalh_laneq_s16_1.c   | 20 +--
 .../aarch64/simd/vqdmlals_laneq_s32_1.c   | 20 +--
 .../aarch64/simd/vqdmlslh_laneq_s16_1.c   | 20 +--
 .../aarch64/simd/vqdmlsls_laneq_s32_1.c   | 21 ++--
 .../aarch64/simd/vqdmulhh_lane_s16.c  | 15 +
 .../aarch64/simd/vqdmulhh_laneq_s16_1.c   | 18 +-
 .../aarch64/simd/vqdmulhs_lane_s32.c  | 33 ++-
 .../aarch64/simd/vqdmulhs_laneq_s32_1.c   | 18 +-
 .../aarch64/simd/vqrdmulhh_lane_s16.c | 15 +
 .../aarch64/simd/vqrdmulhh_laneq_s16_1.c  | 18 +-
 .../aarch64/simd/vqrdmulhs_lane_s32.c | 15 +
 .../aarch64/simd/vqrdmulhs_laneq_s32_1.c  | 18 +-
 16 files changed, 163 insertions(+), 146 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vfma_f64.c b/gcc/testsuite/gcc.target/aarch64/simd/vfma_f64.c
index ef414f1b2fc..467c740ea12 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vfma_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vfma_f64.c
@@ -7,33 +7,24 @@
 
 #define EPS 1.0e-15
 
-#define INHIB_OPT(x) asm volatile ("mov %d0, %1.d[0]"  \
-  : "=w"(x)\
-  : "w"(x) \
-  : /* No clobbers. */);
-
 extern void abort (void);
 
+float64_t __attribute__((noipa))
+test_vfma (float64x1_t arg1, float64x1_t arg2, float64x1_t arg3)
+{
+  return vget_lane_f64 (vfma_f64 (arg1, arg2, arg3), 0);
+}
+
 int
 main (void)
 {
-  float64x1_t arg1;
-  float64x1_t arg2;
-  float64x1_t arg3;
-
   float64_t expected;
   float64_t actual;
 
-  arg1 = vcreate_f64 (0x3fe3955382d35b0eULL);
-  arg2 = vcreate_f64 (0x3fa88480812d6670ULL);
-  arg3 = vcreate_f64 (0x3fd5791ae2a92572ULL);
-
-  INHIB_OPT (arg1);
-  INHIB_OPT (arg2);
-  INHIB_OPT (arg3);
-
   expected = 0.6280448184360076;
-  actual = vget_lane_f64 (vfma_f64 (arg1, arg2, arg3), 0);
+  actual = test_vfma (vcreate_f64 (0x3fe3955382d35b0eULL),
+ vcreate_f64 (0x3fa88480812d6670ULL),
+ vcreate_f64 (0x3fd5791ae2a92572ULL));
 
   if (__builtin_fabs (expected - actual) > EPS)
 abort ();
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vfms_f64.c b/gcc/testsuite/gcc.target/aarch64/simd/vfms_f64.c
index afbb8a892c6..af6ca6ff11e 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vfms_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vfms_f64.c
@@ -7,33 +7,24 @@
 
 #define EPS 1.0e-15
 
-#define INHIB_OPT(x) asm volatile ("mov %d0, %1.d[0]"   \
-   : "=w"(x)   \
-   : "w"(x)   

[pushed] testsuite: Two adjustments to gcc.dg/vect/complex

2023-01-27 Thread Richard Sandiford via Gcc-patches
fast-math-bb-slp-complex-add-pattern-half-float.c no longer fails.
The scans in (loop test) fast-math-complex-add-half-float.c were
marked UNRESOLVED because they scanned slp1 rather than vect.

Tested on aarch64-linux-gnu & pushed as obvious.

Richard


gcc/testsuite/
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Remove XFAIL.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Fix names
of dump files.
---
 .../complex/fast-math-bb-slp-complex-add-pattern-half-float.c | 2 +-
 .../gcc.dg/vect/complex/fast-math-complex-add-half-float.c| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
index 885fd97c5d2..e30df0ff0b0 100644
--- a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
@@ -12,5 +12,5 @@
 
 /* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" { target { vect_complex_add_half } } } } */
 /* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { target { vect_complex_add_half } && ! target { arm*-*-* } } } } */
-/* { dg-final { scan-tree-dump "Found COMPLEX_ADD_ROT270" "slp1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "Found COMPLEX_ADD_ROT270" "slp1" } } */
 /* { dg-final { scan-tree-dump "Found COMPLEX_ADD_ROT90" "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
index c656a2f6d56..046f014240b 100644
--- a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
@@ -9,5 +9,5 @@
 
 /* Vectorization is failing for these cases.  They should work but for now ignore.  */
 
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" { xfail *-*-* } } } */
-- 
2.25.1



[PATCH][GCC] arm: Optimize arm-mlib.h header inclusion (pr108505).

2023-01-27 Thread Srinath Parvathaneni via Gcc-patches
Hello,

I have committed a fix [1] to GCC trunk for a build issue mentioned in
PR108505, and later received a few upstream comments proposing a more
robust fix for this issue.

In this patch I'm addressing those comments and sending it as a follow-up
patch.

Regression tested on arm-none-eabi target and found no regressions.

Ok for master?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610513.html

Regards,
Srinath.

gcc/ChangeLog:

2023-01-27  Srinath Parvathaneni  

PR target/108505
* config.gcc (tm_mlib_file): Define new variable.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 89f56047cfe3126bc6c8e90c8b4840dea13538f9..2aab92bbfd8b4088259ebf9b565af8e8bbef1122 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4355,6 +4355,7 @@ case "${target}" in
case ${arm_multilib} in
aprofile|rmprofile)

tmake_profile_file="arm/t-multilib"
+   tm_mlib_file="arm/arm-mlib.h"
;;
@*)
ml=`echo "X$arm_multilib" | sed 
'1s,^X@,,'`
@@ -4393,7 +4394,7 @@ case "${target}" in
# through to the multilib selector
with_float="soft"
tmake_file="${tmake_file} ${tmake_profile_file}"
-   tm_file="$tm_file arm/arm-mlib.h"
+   tm_file="$tm_file $tm_mlib_file"
TM_MULTILIB_CONFIG="$with_multilib_list"
fi
fi



Re: [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future)

2023-01-27 Thread Segher Boessenkool
Hi!

On Wed, Nov 09, 2022 at 09:43:16PM -0500, Michael Meissner wrote:
> This patch is very preliminary support for a potential new feature to the
> PowerPC that extends the current power10 MMA architecture.  This feature may 
> or
> may not be present in any specific future PowerPC processor.

MMA is an optional facility in ISA 3.1 -- please don't say it is power10
only.

> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers.  These accumulators are each tied to sets of 4 FPR registers.

Four VSRs.  FPRs are only 64 bits.  You mean this is VSRs 0..31.

> When
> you issue a prime instruction, it makes sure the accumulator is a copy of the 
> 4

I suppose you mean the xxmtacc instruction?

> FPR registers the accumulator is tied to.  When you issue a deprime
> instruction, it makes sure that the accumulator data content is logically
> copied to the matching FPR register.

And xxmfacc.

Very importantly all the other rules in 7.2.1.3 "VSX Accumulators"
apply as well.  That should make old code work on new systems
transparently.

> In terms of changes, we now use the wD constraint for accumulators.  If you
> compile with -mcpu=power10, the wD constraint will match the equivalent FPR
> register that overlaps with the accumulator.

The set of *four* *VSX* registers.  Of course in the end it is just a
number, but :-)

> If you compile with -mcpu=future,
> the wD constraint will match the DMR register and not the FPR register.

Constraints do not "match" anything.  "Will allow" perhaps?

> In general, if you only use the built-in functions, things work between the 
> two
> systems.  If you use extended asm, you will likely need to modify the code.
> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.

You *already* are required to follow all these rules that make this
painless and transparent.

> There is one bug that I noticed.  When you use the full DMR instruction the
> constant copy propagation patch issues internal errors.  I believe this is due
> to the CCP pass not handling opaque types cleanly enough, and it only shows up
> in larger types.  I would like to get these patches committed, and then work
> the maintainers of the CCP to fix the problem.

Erm.  If the compiler ICEs, we can not include this code.  But hopefully
you mean something else?


Segher


Re: Ping: [PATCH 1/6] PowerPC: Add -mcpu=future

2023-01-27 Thread Segher Boessenkool
On Fri, Jan 20, 2023 at 04:05:58PM -0500, Michael Meissner wrote:
> Ping patch.  We really would like the patches to enable the possible future
> MMA+ instructions into GCC 13.

Please send a version with Peter's comments taken into account?


Segher


[PATCH] tree: Fix up tree_code_{length,type}

2023-01-27 Thread Maciej Cencora via Gcc-patches
Hi,

You can emulate C++17 inline variables in C++11 in either of two ways:

1) via a template helper
template <typename = void>
struct Helper
{
static constexpr unsigned value[4] = {1, 2, 3, 4};
};

template <typename T>
constexpr unsigned Helper<T>::value[4];

static constexpr auto& arr = Helper<>::value;

2) extern constexpr + weak attribute
[[gnu::weak]] extern constexpr unsigned arr[] = {1, 2, 3, 4};

Regards,
Maciej


Re: [PATCH] driver: fix -gz=none error message with missing zstd

2023-01-27 Thread Joseph Myers
On Fri, 27 Jan 2023, Martin Liška wrote:

> We wrongly report:
> 
> $ echo "int main () {}" | gcc -xc -gz=none -
> gcc: error: -gz=zstd is not supported in this configuration
> 
> if zstd compression is not supported by binutils. We should emit the
> error message only if -gz=zstd.
> 
>   PR driver/108572
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * gcc.cc (LINK_COMPRESS_DEBUG_SPEC): Report error only for
>   -gz=zstd.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed] c: Disallow braces around C2x auto initializers

2023-01-27 Thread Joseph Myers
WG14 agreed at this week's meeting to remove support for braces around
auto scalar initializers, as incompatible with C++ auto handling of
braced initializers; thus remove that support in GCC.
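
For context on the incompatibility, a small C++ sketch (not part of the
patch; the variable names are made up) showing what C++ deduces for the
two spellings.  With the braces, C++ deduces a std::initializer_list
rather than the scalar type, which differs from what the removed C2x
rule would have given.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>

auto plain = 0L;      // C++ deduces long
auto braced = {0L};   // C++ deduces std::initializer_list<long>

static_assert(std::is_same<decltype(plain), long>::value,
              "plain initializer deduces the scalar type");
static_assert(std::is_same<decltype(braced),
                           std::initializer_list<long>>::value,
              "braced initializer deduces an initializer_list");

// The braced form is a one-element list, not a long.
void check_braced() { assert(braced.size() == 1); }
```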

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-parser.cc (c_parser_declaration_or_fndef): Do not allow braces
around auto initializer.

gcc/testsuite/
* gcc.dg/c2x-auto-1.c, gcc.dg/c2x-auto-3.c: Expect braces around
auto initializers to be disallowed.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 803b04b8dc1..69230002bc8 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -2480,18 +2480,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
  int flag_sanitize_save = flag_sanitize;
  if (nested && !empty_ok)
flag_sanitize = 0;
- if (std_auto_type_p
- && c_parser_next_token_is (parser, CPP_OPEN_BRACE))
-   {
- matching_braces braces;
- braces.consume_open (parser);
- init = c_parser_expr_no_commas (parser, NULL);
- if (c_parser_next_token_is (parser, CPP_COMMA))
-   c_parser_consume_token (parser);
- braces.skip_until_found_close (parser);
-   }
- else
-   init = c_parser_expr_no_commas (parser, NULL);
+ init = c_parser_expr_no_commas (parser, NULL);
  if (std_auto_type_p)
finish_underspecified_init (underspec_name,
underspec_state);
diff --git a/gcc/testsuite/gcc.dg/c2x-auto-1.c b/gcc/testsuite/gcc.dg/c2x-auto-1.c
index f8460fb3bfb..c50daccfe89 100644
--- a/gcc/testsuite/gcc.dg/c2x-auto-1.c
+++ b/gcc/testsuite/gcc.dg/c2x-auto-1.c
@@ -4,14 +4,14 @@
 
 auto i = 1;
 extern int i;
-static auto l = { 0L };
+static auto l = 0L;
 extern long l;
 extern auto const d = 0.0; /* { dg-warning "initialized and declared 'extern'" } */
 extern const double d;
 double dx;
 auto ((i2)) = 3;
 extern int i2;
-const auto i3 [[]] = { 4, };
+const auto i3 [[]] = 4;
 extern int i4;
 thread_local auto f = 1.0f;
 float ff;
diff --git a/gcc/testsuite/gcc.dg/c2x-auto-3.c b/gcc/testsuite/gcc.dg/c2x-auto-3.c
index a34ce31f6be..1ab3cc74d35 100644
--- a/gcc/testsuite/gcc.dg/c2x-auto-3.c
+++ b/gcc/testsuite/gcc.dg/c2x-auto-3.c
@@ -62,3 +62,10 @@ f5 ()
 {
   static int auto e10 = 3; /* { dg-error "multiple storage classes in declaration specifiers" } */
 }
+
+void
+f6 ()
+{
+  static auto l = { 0L }; /* { dg-error "expected expression" } */
+  const auto i3 [[]] = { 4, }; /* { dg-error "expected expression" } */
+}

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 2/2] c++: speculative constexpr and is_constant_evaluated [PR108243]

2023-01-27 Thread Patrick Palka via Gcc-patches
This PR illustrates that __builtin_is_constant_evaluated currently acts
as an optimization barrier for our speculative constexpr evaluation,
since we don't want to prematurely fold the builtin to false if the
expression in question would be later manifestly constant evaluated (in
which case it must be folded to true).

This patch fixes this by permitting __builtin_is_constant_evaluated
to get folded as false during cp_fold_function, since at that point
we're sure we're done with manifestly constant evaluation.  To that end
we add a flags parameter to cp_fold that controls what mce_value the
CALL_EXPR case passes to maybe_constant_value.
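
For readers less familiar with the builtin, here is a minimal sketch of
the two outcomes it can have.  The function names are hypothetical and
this is not the testcase from the PR; it only shows that the same
constexpr function takes different branches depending on whether the
call is manifestly constant-evaluated.

```cpp
#include <cassert>

// Single-return form so that it is also valid as a C++11 constexpr function.
constexpr int probe(int x) {
  return __builtin_is_constant_evaluated() ? x * 2 : x + 100;
}

// Initializer of a constexpr variable: manifestly constant-evaluated,
// so the builtin is true.
constexpr int at_compile_time = probe(1);
static_assert(at_compile_time == 2, "builtin folds to true here");

// Ordinary call with a runtime argument: the builtin is false, and that
// stays true even if the optimizer later sees a constant argument.
int at_run_time(int x) { return probe(x); }

void check_probe() { assert(at_run_time(0) == 100); }
```

Once no manifestly constant-evaluated context can still apply, folding
the builtin to false lets the rest of a call like at_run_time's be
simplified.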

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/108243

gcc/cp/ChangeLog:

* cp-gimplify.cc (enum fold_flags): Define.
(cp_fold_data::genericize): Replace this data member with ...
(cp_fold_data::fold_flags): ... this.
(cp_fold_r): Adjust cp_fold_data use and cp_fold_calls.
(cp_fold_function): Likewise.
(cp_fold_maybe_rvalue): Likewise.
(cp_fully_fold_init): Likewise.
(cp_fold): Add fold_flags parameter.  Don't cache if flags
isn't empty.
<case CALL_EXPR>: Pass mce_false to maybe_constant_value
if ff_genericize is set.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr108243.C: New test.
---
 gcc/cp/cp-gimplify.cc   | 76 ++---
 gcc/testsuite/g++.dg/opt/pr108243.C | 29 +++
 2 files changed, 76 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr108243.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index a35cedd05cc..d023a63768f 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -43,12 +43,20 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-general.h"
 #include "opts.h"
 
+/* Flags for cp_fold and cp_fold_r.  */
+
+enum fold_flags {
+  ff_none = 0,
+  /* Whether we're being called from cp_fold_function.  */
+  ff_genericize = 1 << 0,
+};
+
 /* Forward declarations.  */
 
 static tree cp_genericize_r (tree *, int *, void *);
 static tree cp_fold_r (tree *, int *, void *);
 static void cp_genericize_tree (tree*, bool);
-static tree cp_fold (tree);
+static tree cp_fold (tree, fold_flags);
 
 /* Genericize a TRY_BLOCK.  */
 
@@ -996,9 +1004,8 @@ struct cp_genericize_data
 struct cp_fold_data
 {
   hash_set<tree> pset;
-  bool genericize; // called from cp_fold_function?
-
-  cp_fold_data (bool g): genericize (g) {}
+  fold_flags flags;
+  cp_fold_data (fold_flags flags): flags (flags) {}
 };
 
 static tree
@@ -1039,7 +1046,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
   break;
 }
 
-  *stmt_p = stmt = cp_fold (*stmt_p);
+  *stmt_p = stmt = cp_fold (*stmt_p, data->flags);
 
   if (data->pset.add (stmt))
 {
@@ -1119,12 +1126,12 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
 here rather than in cp_genericize to avoid problems with the invisible
 reference transition.  */
 case INIT_EXPR:
-  if (data->genericize)
+  if (data->flags & ff_genericize)
cp_genericize_init_expr (stmt_p);
   break;
 
 case TARGET_EXPR:
-  if (data->genericize)
+  if (data->flags & ff_genericize)
cp_genericize_target_expr (stmt_p);
 
   /* Folding might replace e.g. a COND_EXPR with a TARGET_EXPR; in
@@ -1157,7 +1164,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
 void
 cp_fold_function (tree fndecl)
 {
-  cp_fold_data data (/*genericize*/true);
+  cp_fold_data data (ff_genericize);
   cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, &data, NULL);
 }
 
@@ -2375,7 +2382,7 @@ cp_fold_maybe_rvalue (tree x, bool rval)
 {
   while (true)
 {
-  x = cp_fold (x);
+  x = cp_fold (x, ff_none);
   if (rval)
x = mark_rvalue_use (x);
   if (rval && DECL_P (x)
@@ -2434,7 +2441,7 @@ cp_fully_fold_init (tree x)
   if (processing_template_decl)
 return x;
   x = cp_fully_fold (x);
-  cp_fold_data data (/*genericize*/false);
+  cp_fold_data data (ff_none);
   cp_walk_tree (&x, cp_fold_r, &data, NULL);
   return x;
 }
@@ -2469,7 +2476,7 @@ clear_fold_cache (void)
 Function returns X or its folded variant.  */
 
 static tree
-cp_fold (tree x)
+cp_fold (tree x, fold_flags flags)
 {
   tree op0, op1, op2, op3;
   tree org_x = x, r = NULL_TREE;
@@ -2490,8 +2497,11 @@ cp_fold (tree x)
   if (fold_cache == NULL)
     fold_cache = hash_map<tree, tree>::create_ggc (101);
 
-  if (tree *cached = fold_cache->get (x))
-return *cached;
+  bool cache_p = (flags == ff_none);
+
+  if (cache_p)
+if (tree *cached = fold_cache->get (x))
+  return *cached;
 
   uid_sensitive_constexpr_evaluation_checker c;
 
@@ -2526,7 +2536,7 @@ cp_fold (tree x)
 Don't create a new tree if op0 != TREE_OPERAND (x, 0), the
 folding of the operand should be in the caches and if in cp_fold_r
 it will modify it in place.  */
- op0 =

[PATCH 1/2] c++: make manifestly_const_eval tri-state

2023-01-27 Thread Patrick Palka via Gcc-patches
This patch turns the manifestly_const_eval flag used by the constexpr
machinery into a tri-state enum so that we're able to express wanting
to fold __builtin_is_constant_evaluated to false via late speculative
constexpr evaluation.  Of all the entry points to constexpr evaluation
only maybe_constant_value is changed to take a tri-state value; the
others continue to take bool.  The subsequent patch will use this to fold
the builtin to false when called from cp_fold_function.
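
A minimal sketch of why two states are not enough, assuming nothing
about the real definition: tri_state, fold_result and
fold_is_constant_evaluated are hypothetical names, and the actual
mce_value enumerators and their values may differ.  The point is that
"leave the builtin alone" has to be distinguishable from "fold it to
false".

```cpp
#include <cassert>

// Hypothetical tri-state; not the actual mce_value definition.
enum class tri_state { unknown, no, yes };

struct fold_result {
  bool folded;  // was the builtin replaced by a constant?
  bool value;   // the constant, meaningful only if folded is true
};

// unknown: keep the call as is, a later evaluation may still decide.
// no / yes: fold to false / true respectively.
constexpr fold_result fold_is_constant_evaluated(tri_state s) {
  return s == tri_state::unknown
           ? fold_result{false, false}
           : fold_result{true, s == tri_state::yes};
}

static_assert(!fold_is_constant_evaluated(tri_state::unknown).folded, "");
static_assert(fold_is_constant_evaluated(tri_state::yes).value, "");

void check_fold() {
  assert(fold_is_constant_evaluated(tri_state::no).folded);
}
```

With a plain bool, the "no" and "unknown" cases collapse into one, which
is what previously forced the builtin to act as an optimization barrier.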

gcc/cp/ChangeLog:

* constexpr.cc (constexpr_call::manifestly_const_eval): Give
it type int instead of bool.
(constexpr_ctx::manifestly_const_eval): Give it type mce_value
instead of bool.
(cxx_eval_builtin_function_call): Adjust after making
manifestly_const_eval tri-state.
(cxx_eval_call_expression): Likewise.
(cxx_eval_binary_expression): Likewise.
(cxx_eval_conditional_expression): Likewise.
(cxx_eval_constant_expression): Likewise.
(cxx_eval_outermost_constant_expr): Likewise.
(cxx_constant_value): Likewise.
(cxx_constant_dtor): Likewise.
(maybe_constant_value): Give manifestly_const_eval parameter
type mce_value instead of bool and adjust accordingly.
(fold_non_dependent_expr_template): Adjust call
to cxx_eval_outermost_constant_expr.
(fold_non_dependent_expr): Likewise.
(maybe_constant_init_1): Likewise.
* constraint.cc (satisfy_atom): Adjust call to
maybe_constant_value.
* cp-tree.h (enum class mce_value): Define.
(maybe_constant_value): Adjust manifestly_const_eval parameter
type and default argument.
* decl.cc (compute_array_index_type_loc): Adjust call to
maybe_constant_value.
* pt.cc (convert_nontype_argument): Likewise.
---
 gcc/cp/constexpr.cc  | 61 
 gcc/cp/constraint.cc |  3 +--
 gcc/cp/cp-tree.h | 18 -
 gcc/cp/decl.cc   |  2 +-
 gcc/cp/pt.cc |  6 ++---
 5 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index be99bec17e7..34662198903 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1119,8 +1119,8 @@ struct GTY((for_user)) constexpr_call {
   /* The hash of this call; we remember it here to avoid having to
  recalculate it when expanding the hash table.  */
   hashval_t hash;
-  /* Whether __builtin_is_constant_evaluated() should evaluate to true.  */
-  bool manifestly_const_eval;
+  /* The raw value of constexpr_ctx::manifestly_const_eval.  */
+  int manifestly_const_eval;
 };
 
 struct constexpr_call_hasher : ggc_ptr_hash
@@ -1248,7 +1248,7 @@ struct constexpr_ctx {
  trying harder to get a constant value.  */
   bool strict;
   /* Whether __builtin_is_constant_evaluated () should be true.  */
-  bool manifestly_const_eval;
+  mce_value manifestly_const_eval;
 };
 
 /* This internal flag controls whether we should avoid doing anything during
@@ -1463,7 +1463,7 @@ cxx_eval_builtin_function_call (const constexpr_ctx *ctx, tree t, tree fun,
   /* If we aren't requiring a constant expression, defer __builtin_constant_p
  in a constexpr function until we have values for the parameters.  */
   if (bi_const_p
-  && !ctx->manifestly_const_eval
+  && ctx->manifestly_const_eval == mce_unknown
   && current_function_decl
   && DECL_DECLARED_CONSTEXPR_P (current_function_decl))
 {
@@ -1479,12 +1479,13 @@ cxx_eval_builtin_function_call (const constexpr_ctx *ctx, tree t, tree fun,
   if (fndecl_built_in_p (fun, CP_BUILT_IN_IS_CONSTANT_EVALUATED,
 BUILT_IN_FRONTEND))
 {
-  if (!ctx->manifestly_const_eval)
+  if (ctx->manifestly_const_eval == mce_unknown)
{
  *non_constant_p = true;
  return t;
}
-  return boolean_true_node;
+  return constant_boolean_node (ctx->manifestly_const_eval == mce_true,
+   boolean_type_node);
 }
 
   if (fndecl_built_in_p (fun, CP_BUILT_IN_SOURCE_LOCATION, BUILT_IN_FRONTEND))
@@ -1591,7 +1592,7 @@ cxx_eval_builtin_function_call (const constexpr_ctx *ctx, tree t, tree fun,
 }
 
   bool save_ffbcp = force_folding_builtin_constant_p;
-  force_folding_builtin_constant_p |= ctx->manifestly_const_eval;
+  force_folding_builtin_constant_p |= ctx->manifestly_const_eval != mce_unknown;
   tree save_cur_fn = current_function_decl;
   /* Return name of ctx->call->fundef->decl for __builtin_FUNCTION ().  */
   if (fndecl_built_in_p (fun, BUILT_IN_FUNCTION)
@@ -2644,7 +2645,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
   location_t loc = cp_expr_loc_or_input_loc (t);
   tree fun = get_function_named_in_call (t);
   constexpr_call new_call
-= { NULL, NULL, NULL, 0, ctx->manifestly_const_eval };
+= { NULL, NULL, NULL, 0, (int)ctx->manifestly_const_eval };
   int depth_ok;
 
   if (fun == NUL

Re: [PATCH 2/2] c++: speculative constexpr and is_constant_evaluated [PR108243]

2023-01-27 Thread Patrick Palka via Gcc-patches
On Fri, 27 Jan 2023, Patrick Palka wrote:

> This PR illustrates that __builtin_is_constant_evaluated currently acts
> as an optimization barrier for our speculative constexpr evaluation,
> since we don't want to prematurely fold the builtin to false if the
> expression in question would be later manifestly constant evaluated (in
> which case it must be folded to true).
> 
> This patch fixes this by permitting __builtin_is_constant_evaluated
> to get folded as false during cp_fold_function, since at that point
> we're sure we're doing manifestly constant evaluation.  To that end

"we're sure we're done with manifestly constant evaluation" rather

> we add a flags parameter to cp_fold that controls what mce_value the
> CALL_EXPR case passes to maybe_constant_value.
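
For readers skimming the thread, the reason the cache has to be bypassed
can be shown with a toy model (a sketch only: toy_fold, toy_fold_uncached
and toy_cache are hypothetical names, and the "folding" is just integer
arithmetic; only the enumerators and the cache_p check are taken from the
patch):

```cpp
#include <unordered_map>

enum fold_flags { ff_none = 0, ff_genericize = 1 << 0 };  // as in the patch

// Toy "fold": the result depends on FLAGS, so a cache keyed only on the
// input is valid for exactly one flag setting.
int
toy_fold_uncached (int x, int flags)
{
  return (flags & ff_genericize) ? x * 2 : x;
}

// Hypothetical stand-in for fold_cache.
std::unordered_map<int, int> toy_cache;

int
toy_fold (int x, int flags)
{
  bool cache_p = (flags == ff_none);	// mirrors the check in the patch
  if (cache_p)
    {
      auto it = toy_cache.find (x);
      if (it != toy_cache.end ())
	return it->second;
    }
  int r = toy_fold_uncached (x, flags);
  if (cache_p)
    toy_cache.emplace (x, r);		// only flag-free results are stored
  return r;
}
```

If the ff_genericize result were stored too, a later ff_none lookup of the
same input would return the wrong answer.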
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/108243
> 
> gcc/cp/ChangeLog:
> 
>   * cp-gimplify.cc (enum fold_flags): Define.
>   (cp_fold_data::genericize): Replace this data member with ...
>   (cp_fold_data::fold_flags): ... this.
>   (cp_fold_r): Adjust cp_fold_data use and cp_fold_calls.
>   (cp_fold_function): Likewise.
>   (cp_fold_maybe_rvalue): Likewise.
>   (cp_fully_fold_init): Likewise.
>   (cp_fold): Add fold_flags parameter.  Don't cache if flags
>   isn't empty.
>   : Pass mce_false to maybe_constant_value
>   if ff_genericize is set.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/opt/pr108243.C: New test.
> ---
>  gcc/cp/cp-gimplify.cc   | 76 ++---
>  gcc/testsuite/g++.dg/opt/pr108243.C | 29 +++
>  2 files changed, 76 insertions(+), 29 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/opt/pr108243.C
> 
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index a35cedd05cc..d023a63768f 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -43,12 +43,20 @@ along with GCC; see the file COPYING3.  If not see
>  #include "omp-general.h"
>  #include "opts.h"
>  
> +/* Flags for cp_fold and cp_fold_r.  */
> +
> +enum fold_flags {
> +  ff_none = 0,
> +  /* Whether we're being called from cp_fold_function.  */
> +  ff_genericize = 1 << 0,
> +};
> +
>  /* Forward declarations.  */
>  
>  static tree cp_genericize_r (tree *, int *, void *);
>  static tree cp_fold_r (tree *, int *, void *);
>  static void cp_genericize_tree (tree*, bool);
> -static tree cp_fold (tree);
> +static tree cp_fold (tree, fold_flags);
>  
>  /* Genericize a TRY_BLOCK.  */
>  
> @@ -996,9 +1004,8 @@ struct cp_genericize_data
>  struct cp_fold_data
>  {
>hash_set pset;
> -  bool genericize; // called from cp_fold_function?
> -
> -  cp_fold_data (bool g): genericize (g) {}
> +  fold_flags flags;
> +  cp_fold_data (fold_flags flags): flags (flags) {}
>  };
>  
>  static tree
> @@ -1039,7 +1046,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>break;
>  }
>  
> -  *stmt_p = stmt = cp_fold (*stmt_p);
> +  *stmt_p = stmt = cp_fold (*stmt_p, data->flags);
>  
>if (data->pset.add (stmt))
>  {
> @@ -1119,12 +1126,12 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>here rather than in cp_genericize to avoid problems with the invisible
>reference transition.  */
>  case INIT_EXPR:
> -  if (data->genericize)
> +  if (data->flags & ff_genericize)
>   cp_genericize_init_expr (stmt_p);
>break;
>  
>  case TARGET_EXPR:
> -  if (data->genericize)
> +  if (data->flags & ff_genericize)
>   cp_genericize_target_expr (stmt_p);
>  
>/* Folding might replace e.g. a COND_EXPR with a TARGET_EXPR; in
> @@ -1157,7 +1164,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
>  void
>  cp_fold_function (tree fndecl)
>  {
> -  cp_fold_data data (/*genericize*/true);
> +  cp_fold_data data (ff_genericize);
>cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, &data, NULL);
>  }
>  
> @@ -2375,7 +2382,7 @@ cp_fold_maybe_rvalue (tree x, bool rval)
>  {
>while (true)
>  {
> -  x = cp_fold (x);
> +  x = cp_fold (x, ff_none);
>if (rval)
>   x = mark_rvalue_use (x);
>if (rval && DECL_P (x)
> @@ -2434,7 +2441,7 @@ cp_fully_fold_init (tree x)
>if (processing_template_decl)
>  return x;
>x = cp_fully_fold (x);
> -  cp_fold_data data (/*genericize*/false);
> +  cp_fold_data data (ff_none);
>cp_walk_tree (&x, cp_fold_r, &data, NULL);
>return x;
>  }
> @@ -2469,7 +2476,7 @@ clear_fold_cache (void)
>  Function returns X or its folded variant.  */
>  
>  static tree
> -cp_fold (tree x)
> +cp_fold (tree x, fold_flags flags)
>  {
>tree op0, op1, op2, op3;
>tree org_x = x, r = NULL_TREE;
> @@ -2490,8 +2497,11 @@ cp_fold (tree x)
>if (fold_cache == NULL)
>  fold_cache = hash_map::create_ggc (101);
>  
> -  if (tree *cached = fold_cache->get (x))
> -return *cached;
> +  bool cache_p = (flags == ff_none);
> +
> +  if (cac

Re: [PATCH] c++: fix ICE with -Wduplicated-cond [PR107593]

2023-01-27 Thread Patrick Palka via Gcc-patches
On Thu, 26 Jan 2023, Marek Polacek via Gcc-patches wrote:

> Here we crash because a CAST_EXPR, representing T(), doesn't have
> its operand, and operand_equal_p's STRIP_ANY_LOCATION_WRAPPER doesn't
> expect that.  (o_e_p is called from warn_duplicated_cond_add_or_warn.)
> 
> In the past we've adjusted o_e_p to better cope with template codes,
> but in this case I think we just want to avoid attempting to warn
> about inst-dependent expressions; I don't think I've ever envisioned
> -Wduplicated-cond to warn about them.
> 
> The ICE started with r12-6022, two-stage name lookup for overloaded
> operators, which gave dependent operators a TREE_TYPE (in particular,
> DEPENDENT_OPERATOR_TYPE), so we no longer bail out here in o_e_p:
> 
>   /* Similar, if either does not have a type (like a template id),
>  they aren't equal.  */
>   if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
> return false;
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
>   PR c++/107593
> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_selection_statement): Don't do
>   -Wduplicated-cond when the condition is dependent.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/warn/Wduplicated-cond3.C: New test.
> ---
>  gcc/cp/parser.cc  |  3 +-
>  gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C | 38 +++
>  2 files changed, 40 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 4cdc1cd472f..3df85d49e16 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -13209,7 +13209,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
>   /* Add the condition.  */
>   condition = finish_if_stmt_cond (condition, statement);
>  
> - if (warn_duplicated_cond)
> + if (warn_duplicated_cond
> + && !instantiation_dependent_expression_p (condition))
> warn_duplicated_cond_add_or_warn (token->location, condition,
>   &chain);

I noticed warn_duplicated_cond_add_or_warn already has logic to handle
TREE_SIDE_EFFECTS conditions by invalidating the entire chain.  I wonder
if we'd want to do the same for instantiation-dep conditions?

>  
> diff --git a/gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C b/gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C
> new file mode 100644
> index 000..3da054e5485
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C
> @@ -0,0 +1,38 @@
> +// PR c++/107593
> +// { dg-do compile }
> +// { dg-options "-Wduplicated-cond" }
> +
> +template 
> +void
> +foo ()
> +{
> +  if (T() && T() && int())
> +;
> +  else if (T() && T() && int())
> +;
> +}
> +
> +template 
> +void bar(T a)
> +{
> +  if (a)
> +;
> +  else if (a)
> +;
> +}
> +
> +template 
> +void baz(int a)
> +{
> +  if (a)
> +;
> +  else if (a) // { dg-warning "duplicated" }
> +;
> +}
> +void
> +f ()
> +{
> +  foo();
> +  bar(1);
> +  baz(1);
> +}
> 
> base-commit: 94673a121cfc7f9d51c9d05e31795477f4dc8dc7
> -- 
> 2.39.1
> 
> 



Re: [PATCH] c++: fix ICE with -Wduplicated-cond [PR107593]

2023-01-27 Thread Jason Merrill via Gcc-patches

On 1/27/23 17:15, Patrick Palka wrote:

On Thu, 26 Jan 2023, Marek Polacek via Gcc-patches wrote:


Here we crash because a CAST_EXPR, representing T(), doesn't have
its operand, and operand_equal_p's STRIP_ANY_LOCATION_WRAPPER doesn't
expect that.  (o_e_p is called from warn_duplicated_cond_add_or_warn.)

In the past we've adjusted o_e_p to better cope with template codes,
but in this case I think we just want to avoid attempting to warn
about inst-dependent expressions; I don't think I've ever envisioned
-Wduplicated-cond to warn about them.

The ICE started with r12-6022, two-stage name lookup for overloaded
operators, which gave dependent operators a TREE_TYPE (in particular,
DEPENDENT_OPERATOR_TYPE), so we no longer bail out here in o_e_p:

   /* Similar, if either does not have a type (like a template id),
  they aren't equal.  */
   if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
 return false;

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/107593

gcc/cp/ChangeLog:

* parser.cc (cp_parser_selection_statement): Don't do
-Wduplicated-cond when the condition is dependent.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wduplicated-cond3.C: New test.
---
  gcc/cp/parser.cc  |  3 +-
  gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C | 38 +++
  2 files changed, 40 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4cdc1cd472f..3df85d49e16 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13209,7 +13209,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
/* Add the condition.  */
condition = finish_if_stmt_cond (condition, statement);
  
-	if (warn_duplicated_cond)

+   if (warn_duplicated_cond
+   && !instantiation_dependent_expression_p (condition))
  warn_duplicated_cond_add_or_warn (token->location, condition,
&chain);


I noticed warn_duplicated_cond_add_or_warn already has logic to handle
TREE_SIDE_EFFECTS conditions by invalidating the entire chain.  I wonder
if we'd want to do the same for instantiation-dep conditions?


Makes sense.

  
diff --git a/gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C b/gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C

new file mode 100644
index 000..3da054e5485
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C
@@ -0,0 +1,38 @@
+// PR c++/107593
+// { dg-do compile }
+// { dg-options "-Wduplicated-cond" }
+
+template 
+void
+foo ()
+{
+  if (T() && T() && int())
+;
+  else if (T() && T() && int())
+;
+}
+
+template 
+void bar(T a)
+{
+  if (a)
+;
+  else if (a)
+;
+}
+
+template 
+void baz(int a)
+{
+  if (a)
+;
+  else if (a) // { dg-warning "duplicated" }
+;
+}
+void
+f ()
+{
+  foo();
+  bar(1);
+  baz(1);
+}

base-commit: 94673a121cfc7f9d51c9d05e31795477f4dc8dc7
--
2.39.1








Re: [PATCH] c++: fix ICE with -Wduplicated-cond [PR107593]

2023-01-27 Thread Marek Polacek via Gcc-patches
On Fri, Jan 27, 2023 at 05:15:00PM -0500, Patrick Palka wrote:
> On Thu, 26 Jan 2023, Marek Polacek via Gcc-patches wrote:
> 
> > Here we crash because a CAST_EXPR, representing T(), doesn't have
> > its operand, and operand_equal_p's STRIP_ANY_LOCATION_WRAPPER doesn't
> > expect that.  (o_e_p is called from warn_duplicated_cond_add_or_warn.)
> > 
> > In the past we've adjusted o_e_p to better cope with template codes,
> > but in this case I think we just want to avoid attempting to warn
> > about inst-dependent expressions; I don't think I've ever envisioned
> > -Wduplicated-cond to warn about them.
> > 
> > The ICE started with r12-6022, two-stage name lookup for overloaded
> > operators, which gave dependent operators a TREE_TYPE (in particular,
> > DEPENDENT_OPERATOR_TYPE), so we no longer bail out here in o_e_p:
> > 
> >   /* Similar, if either does not have a type (like a template id),
> >  they aren't equal.  */
> >   if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
> > return false;
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > PR c++/107593
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (cp_parser_selection_statement): Don't do
> > -Wduplicated-cond when the condition is dependent.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/warn/Wduplicated-cond3.C: New test.
> > ---
> >  gcc/cp/parser.cc  |  3 +-
> >  gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C | 38 +++
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C
> > 
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index 4cdc1cd472f..3df85d49e16 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -13209,7 +13209,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
> > /* Add the condition.  */
> > condition = finish_if_stmt_cond (condition, statement);
> >  
> > -   if (warn_duplicated_cond)
> > +   if (warn_duplicated_cond
> > +   && !instantiation_dependent_expression_p (condition))
> >   warn_duplicated_cond_add_or_warn (token->location, condition,
> > &chain);
> 
> I noticed warn_duplicated_cond_add_or_warn already has logic to handle
> TREE_SIDE_EFFECTS conditions by invalidating the entire chain.  I wonder
> if we'd want to do the same for instantiation-dep conditions?

warn_duplicated_cond_add_or_warn lives in c-family/c-warn.cc so I can't
use instantiation_dependent_expression_p there.  Sure, I could write a
C++ wrapper but with my patch we just won't add CONDITION to the chain
which I thought would work just as well.

Marek



[PATCH] RISC-V: Remove redundant attributes

2023-01-27 Thread juzhe . zhong
From: Ju-Zhe Zhong 

---
 gcc/config/riscv/vector.md | 20 
 1 file changed, 20 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 8c60eb20d72..4319266974d 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -208,26 +208,6 @@
 (const_int 4)]
(const_int INVALID_ATTRIBUTE)))
 
-;; The index of operand[] to get the tail policy op.
-(define_attr "tail_policy_op_idx" ""
-  (cond [(eq_attr "type" "vlde,vste,vimov,vfmov,vlds")
-(const_int 5)]
-   (const_int INVALID_ATTRIBUTE)))
-
-;; The index of operand[] to get the mask policy op.
-(define_attr "mask_policy_op_idx" ""
-  (cond [(eq_attr "type" "vlde,vste,vlds")
-(const_int 6)]
-   (const_int INVALID_ATTRIBUTE)))
-
-;; The index of operand[] to get the mask policy op.
-(define_attr "avl_type_op_idx" ""
-  (cond [(eq_attr "type" "vlde,vlde,vste,vimov,vimov,vimov,vfmov,vlds,vlds")
-(const_int 7)
-(eq_attr "type" "vldm,vstm,vimov,vmalu,vmalu")
-(const_int 5)]
-   (const_int INVALID_ATTRIBUTE)))
-
 ;; The tail policy op value.
 (define_attr "ta" ""
   (cond [(eq_attr "type" "vlde,vimov,vfmov,vlds")
-- 
2.36.3



Re: [PATCH] c++: fix ICE with -Wduplicated-cond [PR107593]

2023-01-27 Thread Patrick Palka via Gcc-patches
On Fri, 27 Jan 2023, Marek Polacek wrote:

> On Fri, Jan 27, 2023 at 05:15:00PM -0500, Patrick Palka wrote:
> > On Thu, 26 Jan 2023, Marek Polacek via Gcc-patches wrote:
> > 
> > > Here we crash because a CAST_EXPR, representing T(), doesn't have
> > > its operand, and operand_equal_p's STRIP_ANY_LOCATION_WRAPPER doesn't
> > > expect that.  (o_e_p is called from warn_duplicated_cond_add_or_warn.)
> > > 
> > > In the past we've adjusted o_e_p to better cope with template codes,
> > > but in this case I think we just want to avoid attempting to warn
> > > about inst-dependent expressions; I don't think I've ever envisioned
> > > -Wduplicated-cond to warn about them.
> > > 
> > > The ICE started with r12-6022, two-stage name lookup for overloaded
> > > operators, which gave dependent operators a TREE_TYPE (in particular,
> > > DEPENDENT_OPERATOR_TYPE), so we no longer bail out here in o_e_p:
> > > 
> > >   /* Similar, if either does not have a type (like a template id),
> > >  they aren't equal.  */
> > >   if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
> > > return false;
> > > 
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > >   PR c++/107593
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * parser.cc (cp_parser_selection_statement): Don't do
> > >   -Wduplicated-cond when the condition is dependent.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/warn/Wduplicated-cond3.C: New test.
> > > ---
> > >  gcc/cp/parser.cc  |  3 +-
> > >  gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C | 38 +++
> > >  2 files changed, 40 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C
> > > 
> > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > index 4cdc1cd472f..3df85d49e16 100644
> > > --- a/gcc/cp/parser.cc
> > > +++ b/gcc/cp/parser.cc
> > > @@ -13209,7 +13209,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
> > >   /* Add the condition.  */
> > >   condition = finish_if_stmt_cond (condition, statement);
> > >  
> > > - if (warn_duplicated_cond)
> > > + if (warn_duplicated_cond
> > > + && !instantiation_dependent_expression_p (condition))
> > > warn_duplicated_cond_add_or_warn (token->location, condition,
> > >   &chain);
> > 
> > I noticed warn_duplicated_cond_add_or_warn already has logic to handle
> > TREE_SIDE_EFFECTS conditions by invalidating the entire chain.  I wonder
> > if we'd want to do the same for instantiation-dep conditions?
> 
> warn_duplicated_cond_add_or_warn lives in c-family/c-warn.cc so I can't
> use instantiation_dependent_expression_p there.  Sure, I could write a
> C++ wrapper but with my patch we just won't add CONDITION to the chain
> which I thought would work just as well.

Ah that's unfortunate :( ISTM desirable to conservatively assume an
inst-dep cond has side effects though (possibly directly from
cp_parser_selection_statement), to avoid false positives as in:

  int n;

  template bool g() { n = 42; }

  template
  void f() {
if (n)
  ;
else if (g())
  ;
else if (n)
  ;
  }
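
To make the difference between the two policies concrete, here is a toy
model of the chain (a sketch under stated assumptions: cond_chain and
replay are hypothetical names, conditions are plain strings rather than
trees, and none of this is the c-warn.cc implementation):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy model of a -Wduplicated-cond chain.
struct cond_chain
{
  std::vector<std::string> conds;
  int warnings = 0;

  // Non-dependent condition without side effects: warn on a repeat,
  // otherwise remember it.
  void add_or_warn (const std::string &c)
  {
    if (std::find (conds.begin (), conds.end (), c) != conds.end ())
      ++warnings;
    else
      conds.push_back (c);
  }

  // Policy 1: silently skip a dependent condition (chain untouched).
  void skip_dependent () {}

  // Policy 2: treat a dependent condition like one with side effects.
  void clear_on_dependent () { conds.clear (); }
};

// Replays "if (n) ... else if (g()) ... else if (n)" under either policy
// and returns the number of warnings issued.
int
replay (bool clear_chain)
{
  cond_chain chain;
  chain.add_or_warn ("n");
  if (clear_chain)
    chain.clear_on_dependent ();
  else
    chain.skip_dependent ();
  chain.add_or_warn ("n");
  return chain.warnings;
}
```

In this model replay (false) issues one warning for the second "n", which
is the false positive above, while replay (true) issues none.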



Re: [PATCH] c++: fix ICE with -Wduplicated-cond [PR107593]

2023-01-27 Thread Patrick Palka via Gcc-patches
On Fri, 27 Jan 2023, Patrick Palka wrote:

> On Fri, 27 Jan 2023, Marek Polacek wrote:
> 
> > On Fri, Jan 27, 2023 at 05:15:00PM -0500, Patrick Palka wrote:
> > > On Thu, 26 Jan 2023, Marek Polacek via Gcc-patches wrote:
> > > 
> > > > Here we crash because a CAST_EXPR, representing T(), doesn't have
> > > > its operand, and operand_equal_p's STRIP_ANY_LOCATION_WRAPPER doesn't
> > > > expect that.  (o_e_p is called from warn_duplicated_cond_add_or_warn.)
> > > > 
> > > > In the past we've adjusted o_e_p to better cope with template codes,
> > > > but in this case I think we just want to avoid attempting to warn
> > > > about inst-dependent expressions; I don't think I've ever envisioned
> > > > -Wduplicated-cond to warn about them.
> > > > 
> > > > The ICE started with r12-6022, two-stage name lookup for overloaded
> > > > operators, which gave dependent operators a TREE_TYPE (in particular,
> > > > DEPENDENT_OPERATOR_TYPE), so we no longer bail out here in o_e_p:
> > > > 
> > > >   /* Similar, if either does not have a type (like a template id),
> > > >  they aren't equal.  */
> > > >   if (!TREE_TYPE (arg0) || !TREE_TYPE (arg1))
> > > > return false;
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > PR c++/107593
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * parser.cc (cp_parser_selection_statement): Don't do
> > > > -Wduplicated-cond when the condition is dependent.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/warn/Wduplicated-cond3.C: New test.
> > > > ---
> > > >  gcc/cp/parser.cc  |  3 +-
> > > >  gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C | 38 +++
> > > >  2 files changed, 40 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/g++.dg/warn/Wduplicated-cond3.C
> > > > 
> > > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > > index 4cdc1cd472f..3df85d49e16 100644
> > > > --- a/gcc/cp/parser.cc
> > > > +++ b/gcc/cp/parser.cc
> > > > @@ -13209,7 +13209,8 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
> > > > /* Add the condition.  */
> > > > condition = finish_if_stmt_cond (condition, statement);
> > > >  
> > > > -   if (warn_duplicated_cond)
> > > > +   if (warn_duplicated_cond
> > > > +   && !instantiation_dependent_expression_p (condition))
> > > > >   warn_duplicated_cond_add_or_warn (token->location, condition,
> > > > &chain);
> > > 
> > > I noticed warn_duplicated_cond_add_or_warn already has logic to handle
> > > TREE_SIDE_EFFECTS conditions by invalidating the entire chain.  I wonder
> > > if we'd want to do the same for instantiation-dep conditions?
> > 
> > warn_duplicated_cond_add_or_warn lives in c-family/c-warn.cc so I can't
> > use instantiation_dependent_expression_p there.  Sure, I could write a
> > C++ wrapper but with my patch we just won't add CONDITION to the chain
> > which I thought would work just as well.
> 
> Ah that's unfortunate :( ISTM desirable to conservatively assume an
> inst-dep cond has side effects though (possibly directly from

oops, "has side effects and clear the chain" rather

> cp_parser_selection_statement), to avoid false positives as in:
> 
>   int n;
> 
>   template bool g() { n = 42; }
> 
>   template
>   void f() {
> if (n)
>   ;
> else if (g())
>   ;
> else if (n)
>   ;
>   }
> 



[PATCH] sched-deps, cselib: Fix up some -fcompare-debug issues and regressions [PR108463]

2023-01-27 Thread Jakub Jelinek via Gcc-patches
Hi!

On Sat, Jan 14, 2023 at 08:26:00AM -0300, Alexandre Oliva via Gcc-patches wrote:
> The testcase used to get scheduled differently depending on the
> presence of debug insns with MEMs.  It's not clear to me why those
> MEMs affected scheduling, but the cselib pre-canonicalization of the
> MEM address is not used at all when analyzing debug insns, so the
> memory allocation and lookup are pure waste.  Somehow, avoiding that
> waste fixes the problem, or makes it go latent.

Unfortunately, this patch breaks the following testcase.
The code in sched_analyze_2 did 2 things:
1) cselib_lookup_from_insn
2) shallow_copy_rtx + cselib_subst_to_values_from_insn
Now, 1) is a precondition of 2); we can only subst the VALUEs if we
have actually looked the address up, but as can be seen on that testcase,
we are relying on at least the 1) to be done because we subst the values
later on even on DEBUG_INSNs and actually use those when needed.
cselib_subst_to_values_from_insn mostly just replaces stuff in the
returned rtx, except for:
  /* This used to happen for autoincrements, but we deal with them
 properly now.  Remove the if stmt for the next release.  */
  if (! e)
{
  /* Assign a value that doesn't match any other.  */
  e = new_cselib_val (next_uid, GET_MODE (x), x);
}
which has been like that since 2011; I hope it is never reachable and we should
in stage1 replace it with gcc_assert or just remove it (then it will
segfault on the following
  return e->val_rtx;
).

So, I (as done in the patch below) reinstalled the 1) and not 2) for
DEBUG_INSNs.  This fixed the new testcase, but broke again the PR106746
testcases.

I've spent a day debugging that and found the problem is that as documented
in a large comment in cselib.cc above n_useless_values variable definition,
we spend quite a bit of effort on making sure that VALUEs created on
DEBUG_INSNs don't affect the cselib decisions for non-DEBUG_INSNs such as
pruning of useless values etc., but if a VALUE created that way is then
looked up/needed from non-DEBUG_INSNs, we promote it to non-debug.

The reason for -fcompare-debug failure is that there is one large DEBUG_INSN
with 16 MEMs in it mostly with addresses that so far didn't appear in the IL
otherwise.  Later on, we see an instruction storing into MEM destination
and invalidate that MEM.  Unfortunately, there is a bug caused by the
introduction of SP_DERIVED_VALUE_P where alias.cc isn't able to disambiguate
MEMs with sp + optional offset in address vs. MEMs with address being a
VALUE having SP_DERIVED_VALUE_P + constant (or the SP_DERIVED_VALUE_P
itself), which ought to be possible when REG_VALUES (REGNO
(stack_pointer_rtx)) has SP_DERIVED_VALUE_P + constant location.  Not sure
if I should try to fix that in stage4 or defer for stage1.
Anyway, the cselib_invalidate_mem call because of this invalidates basically
all MEMs with the exception of 5 which have MEM_EXPRs that guarantee
non-aliasing with the sp based store.
Unfortunately, n_useless_values which in my understanding should be always
the same between -g and -g0 compilations diverges, has 3 more useless values
for -g.

Now, these were initially VALUEs created for DEBUG_INSN lookups.  As I said,
cselib.cc has code to promote such VALUEs (well, their location elements) to
non-debug if they are looked up from non-DEBUG_INSNs.  The problem is that
when looking up some completely unrelated MEM from a non-DEBUG_INSN we run into
a hash collision and so call cselib_hasher::equal to check if the unrelated
MEM is equal to the one from DEBUG_INSN only element.  The equal static
member function calls rtx_equal_for_cselib_1 and if that returns true,
promotes the location to non-DEBUG, otherwise returns false.  So far so
good.  But rtx_equal_for_cselib_1 internally performs various other cselib
lookups, all done with the non-DEBUG_INSN cselib_current_insn, so they
all promote to non-debug.  And that is wrong, because if it was -g0
compilation, such hashtable entry wouldn't be there at all (or would be
but wouldn't contain that locs element), so with -g0 we wouldn't call
that rtx_equal_for_cselib_1 at all.  So, I think we need to pretend
that such lookup which only happens with -g and not -g0 actually comes
from some DEBUG_INSN (note, the lookups rtx_equal_for_cselib_1 does
are always with create = 0).
The cselib.cc part of the patch does that.
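
The idea can be modelled in a few lines (a toy sketch only, not the
cselib.cc change: toy_value, toy_lookup, toy_equal, pretend_debug_insn
and current_insn_is_debug are all hypothetical names invented for this
illustration):

```cpp
// Hypothetical flag standing in for "is cselib_current_insn a debug insn?".
bool current_insn_is_debug = false;

struct toy_value
{
  bool debug_only;	// created only because of a DEBUG_INSN lookup
};

// A lookup made on behalf of a non-debug insn promotes the value.
void
toy_lookup (toy_value &v)
{
  if (!current_insn_is_debug)
    v.debug_only = false;
}

// RAII guard: pretend lookups come from a debug insn while it is alive.
struct pretend_debug_insn
{
  bool saved;
  pretend_debug_insn () : saved (current_insn_is_debug)
  { current_insn_is_debug = true; }
  ~pretend_debug_insn () { current_insn_is_debug = saved; }
};

// Compare against a hash entry.  If the entry exists only because of
// debug insns, the nested lookups done during the comparison must not
// promote anything, since with -g0 the comparison would not happen.
bool
toy_equal (toy_value &entry, toy_value &nested, bool matches)
{
  if (entry.debug_only)
    {
      pretend_debug_insn guard;
      toy_lookup (nested);
    }
  else
    toy_lookup (nested);
  if (matches)
    toy_lookup (entry);	// only a real match promotes the entry
  return matches;
}
```

In this model a failed comparison against a debug-only entry leaves both
values debug-only, so the -g and -g0 bookkeeping stays in sync.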

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, I'm not really sure how:
  if (num_mems < param_max_cselib_memory_locations
  && ! canon_anti_dependence (x, false, mem_rtx,
  GET_MODE (mem_rtx), mem_addr))
{
  has_mem = true;
  num_mems++;
  p = &(*p)->next;
  continue;
}
num_mems cap can actually work correctly for -fcompare-debug,
I'd think we would need to differentiate between num_debug_mems and
num_mems depending on if setting_insn is no

[PATCH] RISC-V: Add vlse/vsse intrinsics support

2023-01-27 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/predicates.md (pmode_reg_or_0_operand): New predicate.
* config/riscv/riscv-vector-builtins-bases.cc (class loadstore): 
Support vlse/vsse.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vlse): New class.
(vsse): New class.
* config/riscv/riscv-vector-builtins.cc 
(function_expander::use_contiguous_load_insn): Support vlse/vsse.
* config/riscv/vector.md (@pred_strided_load): New md pattern.
(@pred_strided_store): Ditto.

---
 gcc/config/riscv/predicates.md|  4 +
 .../riscv/riscv-vector-builtins-bases.cc  | 26 +-
 .../riscv/riscv-vector-builtins-bases.h   |  2 +
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 gcc/config/riscv/riscv-vector-builtins.cc | 33 ++-
 gcc/config/riscv/vector.md| 90 +--
 6 files changed, 143 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 766a427570c..f9013bbf8bb 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -286,6 +286,10 @@
(match_test "GET_CODE (op) == UNSPEC
 && (XINT (op, 1) == UNSPEC_VUNDEF)"
 
+(define_special_predicate "pmode_reg_or_0_operand"
+  (ior (match_operand 0 "const_0_operand")
+   (match_operand 0 "pmode_register_operand")))
+
 ;; The scalar operand can be directly broadcast by RVV instructions.
 (define_predicate "direct_broadcast_operand"
   (ior (match_operand 0 "register_operand")
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index cf6a060ddfb..f9a16c68e07 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -84,8 +84,8 @@ public:
   }
 };
 
-/* Implements vle.v/vse.v/vlm.v/vsm.v codegen.  */
-template 
+/* Implements vle.v/vse.v/vlm.v/vsm.v/vlse.v/vsse.v codegen.  */
+template 
 class loadstore : public function_base
 {
   unsigned int call_properties (const function_instance &) const override
@@ -106,9 +106,23 @@ class loadstore : public function_base
   rtx expand (function_expander &e) const override
   {
 if (STORE_P)
-  return e.use_contiguous_store_insn (code_for_pred_store (e.vector_mode ()));
+  {
+   if (STRIDED_P)
+ return e.use_contiguous_store_insn (
+   code_for_pred_strided_store (e.vector_mode ()));
+   else
+ return e.use_contiguous_store_insn (
+   code_for_pred_store (e.vector_mode ()));
+  }
 else
-  return e.use_contiguous_load_insn (code_for_pred_mov (e.vector_mode ()));
+  {
+   if (STRIDED_P)
+ return e.use_contiguous_load_insn (
+   code_for_pred_strided_load (e.vector_mode ()));
+   else
+ return e.use_contiguous_load_insn (
+   code_for_pred_mov (e.vector_mode ()));
+  }
   }
 };
 
@@ -118,6 +132,8 @@ static CONSTEXPR const loadstore vle_obj;
 static CONSTEXPR const loadstore vse_obj;
 static CONSTEXPR const loadstore vlm_obj;
 static CONSTEXPR const loadstore vsm_obj;
+static CONSTEXPR const loadstore vlse_obj;
+static CONSTEXPR const loadstore vsse_obj;
 
 /* Declare the function base NAME, pointing it to an instance
of class _obj.  */
@@ -130,5 +146,7 @@ BASE (vle)
 BASE (vse)
 BASE (vlm)
 BASE (vsm)
+BASE (vlse)
+BASE (vsse)
 
 } // end namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 7af462b9530..93999e2cbee 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -30,6 +30,8 @@ extern const function_base *const vle;
 extern const function_base *const vse;
 extern const function_base *const vlm;
 extern const function_base *const vsm;
+extern const function_base *const vlse;
+extern const function_base *const vsse;
 }
 
 } // end namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 8bcaf2e3267..1ddde7b9d76 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -44,5 +44,7 @@ DEF_RVV_FUNCTION (vle, loadstore, full_preds, all_v_scalar_const_ptr_ops)
 DEF_RVV_FUNCTION (vse, loadstore, none_m_preds, all_v_scalar_ptr_ops)
 DEF_RVV_FUNCTION (vlm, loadstore, none_preds, b_v_scalar_const_ptr_ops)
 DEF_RVV_FUNCTION (vsm, loadstore, none_preds, b_v_scalar_ptr_ops)
+DEF_RVV_FUNCTION (vlse, loadstore, full_preds, all_v_scalar_const_ptr_ptrdiff_ops)
+DEF_RVV_FUNCTION (vsse, loadstore, none_m_preds, all_v_scalar_ptr_ptrdiff_ops)
 
 #undef DEF_RVV_FUNCTION
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 90239305

[PATCH 1/3] Properly set GORI relation trios.

2023-01-27 Thread Andrew MacLeod via Gcc-patches
We added the concept of a relation trio this release.  Basically it is a 
group of relations for a range-op statement indicating the relation 
between LHS & OP1, LHS & OP2, and OP1 & OP2.
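
As a rough illustration of the idea, the sketch below builds such a trio for "lhs = op1 OP op2" from a single known relation. Everything in it (`rel`, `relation`, `trio`, `lookup`, `make_trio`) is a hypothetical stand-in written for this note; it is not the value_relation or relation_trio code from the patch, and it only assumes that a relation can be swapped when its operands are reversed and that op1 == op2 implies equality.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for relation kinds; not GCC's relation_kind enum.
enum class rel { varying, lt, le, gt, ge, eq, ne };

// Reverse the operand order of a relation: "a < b" becomes "b > a".
rel swap_rel(rel k) {
  switch (k) {
    case rel::lt: return rel::gt;
    case rel::le: return rel::ge;
    case rel::gt: return rel::lt;
    case rel::ge: return rel::le;
    default: return k;  // eq, ne and varying are symmetric
  }
}

// One known relation "a KIND b" between two names.
struct relation {
  rel kind;
  std::string a, b;
};

// The three pairwise relations for a statement "lhs = op1 OP op2".
struct trio {
  rel lhs_op1 = rel::varying;
  rel lhs_op2 = rel::varying;
  rel op1_op2 = rel::varying;
};

// How X relates to Y according to R, or varying if R says nothing about them.
rel lookup(const relation &r, const std::string &x, const std::string &y) {
  if (r.a == x && r.b == y) return r.kind;
  if (r.a == y && r.b == x) return swap_rel(r.kind);
  return rel::varying;
}

// Fill in all three fields of the trio from the single relation R.
trio make_trio(const relation &r, const std::string &lhs,
               const std::string &op1, const std::string &op2) {
  trio t;
  t.lhs_op1 = lookup(r, lhs, op1);
  t.lhs_op2 = lookup(r, lhs, op2);
  // Identical operands are trivially equal, whatever R says.
  t.op1_op2 = (op1 == op2) ? rel::eq : lookup(r, op1, op2);
  return t;
}
```

For example, given "a_1 < x_2" and the statement "x_2 = a_1 + b_3", the sketch records x_2 > a_1 in the LHS/OP1 slot and leaves the other two slots varying.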


This is primarily used in GORI so we can use relations during range 
calculation on outgoing edges, although it is also used during folding. 
We currently use them in just a couple of places so it was not really 
fleshed out completely.


The next couple of PRs require relations in range-ops, and in working 
with them, I found I had not been complete when I implemented 
relation_trio... just did enough to get it to work.


This patch properly sets each of the fields in relation trio, and then 
queries the proper field in the couple of places that currently use it.  
This patch invokes no new behaviour, just sets up the information 
correctly so future queries can get the right info.  It consolidates the 
relation trio setting code into one place (more utilization of the 
primary value_relation class) and is just basic goodness.


It also allows the next 2 PRs to be fixed with minimal changes.

Bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?

Andrew


From 5b552dde4d597444d867308fb94b9b49fac42927 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 25 Jan 2023 16:26:39 -0500
Subject: [PATCH 1/4] Properly set GORI relation trios.

When relation trios were added to GORI, there was only one use.  As they are
utilized more by range-ops, it is apparent that the implementation was
not complete.  This patch fleshes it out completely so that every GORI
operation has a complete relation trio.

	* gimple-range-gori.cc (gori_compute::compute_operand_range): Do
	not abort calculations if there is a valid relation available.
	(gori_compute::refine_using_relation): Pass correct relation trio.
	(gori_compute::compute_operand1_range): Create trio and use it.
	(gori_compute::compute_operand2_range): Ditto.
	* range-op.cc (operator_plus::op1_range): Use correct trio member.
	(operator_minus::op1_range): Use correct trio member.
	* value-relation.cc (value_relation::create_trio): New.
	* value-relation.h (value_relation::create_trio): New prototype.
---
 gcc/gimple-range-gori.cc | 70 ++--
 gcc/range-op.cc  |  4 +--
 gcc/value-relation.cc| 34 +++
 gcc/value-relation.h |  1 +
 4 files changed, 62 insertions(+), 47 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 930e2a0f0ab..3dc4576ff13 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -632,6 +632,9 @@ gori_compute::compute_operand_range (vrange &r, gimple *stmt,
   if (op1 && op2)
 {
   relation_kind k = handler.op1_op2_relation (lhs);
+  // If there is no relation, and op1 == op2, create a relation.
+  if (!vrel_ptr && k == VREL_VARYING && op1 == op2)
+	k = VREL_EQ;
   if (k != VREL_VARYING)
{
 	 vrel.set_relation (k, op1, op2);
@@ -952,7 +955,9 @@ gori_compute::refine_using_relation (tree op1, vrange &op1_range,
 {
   gcc_checking_assert (TREE_CODE (op1) == SSA_NAME);
   gcc_checking_assert (TREE_CODE (op2) == SSA_NAME);
-  gcc_checking_assert (k != VREL_VARYING && k != VREL_UNDEFINED);
+
+  if (k == VREL_VARYING || k == VREL_EQ || k == VREL_UNDEFINED)
+return false;
 
   bool change = false;
   bool op1_def_p = in_chain_p (op2, op1);
@@ -991,7 +996,7 @@ gori_compute::refine_using_relation (tree op1, vrange &op1_range,
   Value_Range new_result (type);
   if (!op_handler.op1_range (new_result, type,
  op1_def_p ? op1_range : op2_range,
- other_op, relation_trio::lhs_op2 (k)))
+ other_op, relation_trio::lhs_op1 (k)))
 	return false;
   if (op1_def_p)
 	{
@@ -1023,7 +1028,7 @@ gori_compute::refine_using_relation (tree op1, vrange &op1_range,
   Value_Range new_result (type);
   if (!op_handler.op2_range (new_result, type,
  op1_def_p ? op1_range : op2_range,
- other_op, relation_trio::lhs_op1 (k)))
+ other_op, relation_trio::lhs_op2 (k)))
 	return false;
   if (op1_def_p)
 	{
@@ -1062,6 +1067,10 @@ gori_compute::compute_operand1_range (vrange &r,
   tree op2 = handler.operand2 ();
   tree lhs_name = gimple_get_lhs (stmt);
 
+  relation_trio trio;
+  if (rel)
+trio = rel->create_trio (lhs_name, op1, op2);
+
   Value_Range op1_range (TREE_TYPE (op1));
   Value_Range tmp (TREE_TYPE (op1));
   Value_Range op2_range (op2 ? TREE_TYPE (op2) : TREE_TYPE (op1));
@@ -1073,27 +1082,11 @@ gori_compute::compute_operand1_range (vrange &r,
   if (op2)
 {
   src.get_operand (op2_range, op2);
-  relation_kind k = VREL_VARYING;
-  relation_kind op_op = (op1 == op2) ? VREL_EQ : VREL_VARYING;
-  if (rel)
-	{
-	 if (lhs_name == rel->op1 () && op1 == rel->op2 ())
-	   k = rel->kind ();
-	 else if (lhs_name == rel->op2 () && op1 == rel->op1 ())
-	   k = relation_swap (rel->kind ());
-	 else if (op1 == rel->op1 () && op2 == rel->op2 ())
-	   {
-	 op_op = rel->ki

[PATCH 2/3] PR tree-optimization/108359

2023-01-27 Thread Andrew MacLeod via Gcc-patches
If there exists an equivalence relationship between op1 and op2, any 
binary operation can be broken into individual operations and unioned if 
there are sufficiently few elements in the set.


This depends on the first patch as we need to get the relation op1 == 
op2 correct in the relation_trio set.


There were various suggestions on how to "control" when we perform the 
sub-folds and when we continue using the aggregate. I settled on passing 
a limit into wi_fold_in_parts_equiv, which is currently either 0 or 8.  
As long as there are 8 or fewer values in the subrange, we split it up, 
until the range we are accumulating into reaches 32 subranges... then we 
just do aggregates again. That should allow us to do a "good job" for a 
while, but bail before it becomes wasteful.
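
A simplified model of that policy, restricted to addition on plain integers, might look like the following. The names `fold_add_equiv` and `fold_add_equiv_all` are hypothetical, ranges are just inclusive pairs, and adjacent results are not merged the way a real irange union would merge them; the constants 8 and 32 are taken from the description above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using range = std::pair<long, long>;  // inclusive [lo, hi]

// Fold "x + x" for x in [lo, hi].  When the subrange spans fewer than LIMIT
// steps, evaluate each value on its own; otherwise produce one aggregate
// range, which is all we could say without knowing that op1 == op2.
std::vector<range> fold_add_equiv(long lo, long hi, unsigned limit) {
  std::vector<range> out;
  long span = hi - lo;
  if (span >= 0 && span < static_cast<long>(limit)) {
    for (long v = lo; v <= hi; ++v)
      out.push_back({v + v, v + v});
  } else {
    out.push_back({lo + lo, hi + hi});
  }
  return out;
}

// Fold every subrange of X, dropping the limit to 0 (aggregates only) once
// the accumulated result already holds more than 32 subranges.
std::vector<range> fold_add_equiv_all(const std::vector<range> &x) {
  std::vector<range> result;
  for (const range &sub : x) {
    unsigned limit = (result.size() > 32) ? 0 : 8;
    std::vector<range> part = fold_add_equiv(sub.first, sub.second, limit);
    result.insert(result.end(), part.begin(), part.end());
  }
  return result;
}
```

For x in [0, 4] this yields [0,0][2,2][4,4][6,6][8,8] rather than the aggregate [0, 8], which is the precision gain the patch is after.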


Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?

(3rd patch is delayed while I look into it further)

Andrew

From 409fc68cd85953c77e02588057a9eb0d21991475 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 17 Jan 2023 11:14:41 -0500
Subject: [PATCH 2/4] Utilize op1 == op2 when invoking range-ops folding.

If there exists an equivalence relationship between op1 and op2,
any binary operation can be broken into individual operations and
unioned if there are sufficiently few elements in the set.

	PR tree-optimization/108359
	gcc/
	* range-op.cc (range_operator::wi_fold_in_parts_equiv): New.
	(range_operator::fold_range): If op1 is equivalent to op2 then
	invoke new fold_in_parts_equiv to operate on sub-components.
	* range-op.h (wi_fold_in_parts_equiv): New prototype.

	gcc/testsuite/
	* gcc.dg/pr108359.c: New.
---
 gcc/range-op.cc | 54 +
 gcc/range-op.h  |  6 
 gcc/testsuite/gcc.dg/pr108359.c | 52 +++
 3 files changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108359.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index ed2dd1eb99c..f7c1e84e0bd 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -160,6 +160,38 @@ range_operator::wi_fold (irange &r, tree type,
   r.set_varying (type);
 }
 
+// Call wi_fold when both op1 and op2 are equivalent. Further split small
+// subranges into constants.  This can provide better precision.
+// For x + y,  when x == y with a range of [0,4] instead of [0, 8] produce
+// [0,0][2, 2][4,4][6, 6][8, 8]
+// LIMIT is the maximum number of elements in range allowed before we
+// do not processs them individually.
+
+void
+range_operator::wi_fold_in_parts_equiv (irange &r, tree type,
+	const wide_int &lh_lb,
+	const wide_int &lh_ub,
+	unsigned limit) const
+{
+  int_range_max tmp;
+  widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)),
+ widest_int::from (lh_lb, TYPE_SIGN (type)));
+  // if there are 1 to 8 values in the LH range, split them up.
+  r.set_undefined ();
+  if (lh_range >= 0 && lh_range < limit)
+{
+  for (unsigned x = 0; x <= lh_range; x++)
+	{
+	  wide_int val = lh_lb + x;
+	  wi_fold (tmp, type, val, val, val, val);
+	  r.union_ (tmp);
+	}
+}
+  // Otherwise just call wi_fold.
+  else
+wi_fold (r, type, lh_lb, lh_ub, lh_lb, lh_ub);
+}
+
 // Call wi_fold, except further split small subranges into constants.
 // This can provide better precision. For something   8 >> [0,1]
 // Instead of [8, 16], we will produce [8,8][16,16]
@@ -234,6 +266,28 @@ range_operator::fold_range (irange &r, tree type,
   unsigned num_lh = lh.num_pairs ();
   unsigned num_rh = rh.num_pairs ();
 
+  // If op1 and op2 are equivalences, then we don't need a complete cross
+  // product, just pairs of matching elements.
+  if (relation_equiv_p (rel) && lh == rh)
+{
+  int_range_max tmp;
+  r.set_undefined ();
+  for (unsigned x = 0; x < num_lh; ++x)
+	{
+	  // If the number of subranges is too high, limit subrange creation.
+	  unsigned limit = (r.num_pairs () > 32) ? 0 : 8;
+	  wide_int lh_lb = lh.lower_bound (x);
+	  wide_int lh_ub = lh.upper_bound (x);
+	  wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub, limit);
+	  r.union_ (tmp);
+	  if (r.varying_p ())
+	break;
+	}
+  op1_op2_relation_effect (r, type, lh, rh, rel);
+  update_known_bitmask (r, m_code, lh, rh);
+  return true;
+}
+
   // If both ranges are single pairs, fold directly into the result range.
   // If the number of subranges grows too high, produce a summary result as the
   // loop becomes exponential with little benefit.  See PR 103821.
diff --git a/gcc/range-op.h b/gcc/range-op.h
index b7b8a3b9473..f00b747f08a 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -109,6 +109,12 @@ protected:
 			 const wide_int &rh_lb,
 			 const wide_int &rh_ub) const;
 
+  // Called by fold range to split small subranges into parts when op1 == op2
+  void wi_fold_in_parts_equiv (irange &r, tree type,
+			   const wide_int &lb,
+			   const wide_int &ub,
+			   unsigned limit) const;
+
   // Tree code of the range operato

[pushed] wwwdocs: codingconventions: Replace markup by

2023-01-27 Thread Gerald Pfeifer
A small refinement. (Too bad the w3 validator isn't automatically usable 
for us any more, though I'm checking manually these days.)

Pushed.

Gerald
---
 htdocs/codingconventions.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index 7e2a092d..8b3cb8bb 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -756,7 +756,7 @@ first. 
 
 libstdc++-v3:  The doc/doxygen/user.cfg.in file is partially autogenerated
 from https://www.doxygen.nl";>the Doxygen tool (and regenerated
-using doxygen -u).
+using doxygen -u).
 The files in doc/html are generated from the Docbook sources in doc/xml
 and should not be changed manually.
 The files in doc/xml/gnu are based on the GNU licenses and should not
-- 
2.39.1


[pushed] wwwdocs: mirrors: Switch ftp.fu-berlin.de from ftp to https

2023-01-27 Thread Gerald Pfeifer
Back in 2021 http/https were not supported by ftp.fu-berlin.de, now 
they are, so switch over.

Thank you, f...@fu-berlin.de! (And please advise if you'd like to see 
things changed.)

Pushed.

Gerald
---
 htdocs/mirrors.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/mirrors.html b/htdocs/mirrors.html
index 38738bb1..963fc7c3 100644
--- a/htdocs/mirrors.html
+++ b/htdocs/mirrors.html
@@ -20,7 +20,7 @@ mirrors.  The following sites mirror the gcc.gnu.org 
download site
 France (no snapshots): ftp://ftp.lip6.fr/pub/gcc/";>ftp.lip6.fr, thanks to 
ftpma...@lip6.fr
 France, Brittany: ftp://ftp.irisa.fr/pub/mirrors/gcc.gnu.org/gcc/";>ftp.irisa.fr, thanks 
to ftpma...@irisa.fr
 France, Versailles: ftp://ftp.uvsq.fr/pub/gcc/";>ftp.uvsq.fr, 
thanks to ftpma...@uvsq.fr
-Germany, Berlin: ftp://ftp.fu-berlin.de/unix/languages/gcc/";>ftp.fu-berlin.de, thanks 
to f...@fu-berlin.de
+Germany, Berlin: https://ftp.fu-berlin.de/unix/languages/gcc/";>ftp.fu-berlin.de, 
thanks to f...@fu-berlin.de
 Germany: https://ftp.gwdg.de/pub/misc/gcc/";>ftp.gwdg.de, 
thanks to f...@gwdg.de
 Germany: https://ftp.mpi-inf.mpg.de/mirrors/gnu/mirror/gcc.gnu.org/pub/gcc/";>mpi-sb.mpg.de,
 thanks to ftpad...@mpi-sb.mpg.de
 Germany: http://gcc.cybermirror.org";>http://gcc.cybermirror.org, thanks to 
Sascha Schwarz (c...@cybermirror.org)
-- 
2.39.1


Re: [patch, gfortran.dg] Adjust numerous tests so that they pass on line endings

2023-01-27 Thread Jerry D via Gcc-patches

Committed:

It is not apparent to me that the testsuite/ChangeLog was updated. Maybe 
there is a time delay on that?


Please be patient with me as I figure out how all this works.

commit f963705752e9d0b79a340788166269af417e344e (HEAD -> master, origin/master, origin/HEAD)

Author: Jerry DeLisle 
Date:   Sat Jan 21 15:47:19 2023 -0800

Fortran tests: Revise line end tests allowing windows testing.

gcc/testsuite/ChangeLog:

* gfortran.dg/ISO_Fortran_binding_17.f90: Replace (\n|\r\n|\r)
with (\r*\n+).
* gfortran.dg/array_temporaries_2.f90: Likewise.
* gfortran.dg/bind-c-contiguous-1.f90: Likewise.
* gfortran.dg/bind-c-contiguous-4.f90: Likewise.
* gfortran.dg/bind-c-contiguous-5.f90: Likewise.
* gfortran.dg/fmt_error_4.f90: Likewise.
* gfortran.dg/fmt_error_5.f90: Likewise.
* gfortran.dg/fmt_float.f90: Likewise.
* gfortran.dg/fmt_l.f90: Likewise.
* gfortran.dg/fmt_nonchar_2.f90: Likewise.
* gfortran.dg/fmt_zero_precision.f90: Likewise.
* gfortran.dg/g77/f77-edit-apostrophe-out.f: Likewise.
* gfortran.dg/g77/f77-edit-colon-out.f: Likewise.
* gfortran.dg/g77/f77-edit-h-out.f: Likewise.
* gfortran.dg/g77/f77-edit-i-out.f: Likewise.
* gfortran.dg/g77/f77-edit-s-out.f: Likewise.
* gfortran.dg/g77/f77-edit-slash-out.f: Likewise.
* gfortran.dg/g77/f77-edit-t-out.f: Likewise.
* gfortran.dg/g77/f77-edit-x-out.f: Likewise.
* gfortran.dg/namelist_40.f90: Likewise.
* gfortran.dg/namelist_47.f90: Likewise.
* gfortran.dg/namelist_print_1.f: Likewise.
* gfortran.dg/parameter_array_dummy.f90: Likewise.
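
To see how the two patterns differ, the sketch below checks a few line-ending sequences against each. It uses C++ std::regex (ECMAScript syntax) purely as a stand-in for the Tcl regular expressions DejaGnu actually uses, and the "\r\r\n" input is an assumed example of doubled carriage returns, not something stated in this message. `whole_match`, `old_eol` and `new_eol` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the whole of S is accepted by PATTERN (ECMAScript syntax).
bool whole_match(const std::string &s, const std::string &pattern) {
  return std::regex_match(s, std::regex(pattern));
}

// The pattern the tests used before, and the replacement from the ChangeLog.
const std::string old_eol = "(\\n|\\r\\n|\\r)";
const std::string new_eol = "(\\r*\\n+)";
```

Both patterns accept "\n" and "\r\n". Only the new one accepts "\r\r\n" or a run of several newlines, and only the old one accepts a lone "\r".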



Re: [PATCH] RISC-V: Remove redundant attributes

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!

On Sat, Jan 28, 2023 at 6:58 AM  wrote:

> From: Ju-Zhe Zhong 
>
> ---
>  gcc/config/riscv/vector.md | 20 
>  1 file changed, 20 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 8c60eb20d72..4319266974d 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -208,26 +208,6 @@
>  (const_int 4)]
> (const_int INVALID_ATTRIBUTE)))
>
> -;; The index of operand[] to get the tail policy op.
> -(define_attr "tail_policy_op_idx" ""
> -  (cond [(eq_attr "type" "vlde,vste,vimov,vfmov,vlds")
> -(const_int 5)]
> -   (const_int INVALID_ATTRIBUTE)))
> -
> -;; The index of operand[] to get the mask policy op.
> -(define_attr "mask_policy_op_idx" ""
> -  (cond [(eq_attr "type" "vlde,vste,vlds")
> -(const_int 6)]
> -   (const_int INVALID_ATTRIBUTE)))
> -
> -;; The index of operand[] to get the mask policy op.
> -(define_attr "avl_type_op_idx" ""
> -  (cond [(eq_attr "type"
> "vlde,vlde,vste,vimov,vimov,vimov,vfmov,vlds,vlds")
> -(const_int 7)
> -(eq_attr "type" "vldm,vstm,vimov,vmalu,vmalu")
> -(const_int 5)]
> -   (const_int INVALID_ATTRIBUTE)))
> -
>  ;; The tail policy op value.
>  (define_attr "ta" ""
>(cond [(eq_attr "type" "vlde,vimov,vfmov,vlds")
> --
> 2.36.3
>
>


Re: [PATCH] RISC-V: Add vlse/vsse intrinsics support

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!


Re: [PATCH] RISC-V: Add vlse/vsse C/C++ intrinsic testcases

2023-01-27 Thread Kito Cheng via Gcc-patches
committed, thanks!


Re: [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future)

2023-01-27 Thread Michael Meissner via Gcc-patches
On Fri, Jan 27, 2023 at 01:59:00PM -0600, Segher Boessenkool wrote:
> > There is one bug that I noticed.  When you use the full DMR instruction the
> > constant copy propagation patch issues internal errors.  I believe this is
> > due to the CCP pass not handling opaque types cleanly enough, and it only
> > shows up in larger types.  I would like to get these patches committed, and
> > then work with the maintainers of the CCP to fix the problem.
> 
> Erm.  If the compiler ICEs, we can not include this code.  But hopefully
> you mean something else?

I realize we can't include the code for final release.  But as a temporary
measure I was hoping that if we put in the code, we could allow somebody more
familiar with ccp to debug it.  Then if there were changes needed in the PowerPC
back end, we could make them, once ccp was fixed.

But that is a moot point, ccp no longer dies with the code, so I have removed
the comment and the no tree ccp option in the next set of patches.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com